PICList Thread
'UTF-8 to Windows CP1252 conversion code?'
2009\09\04@170816 by Harold Hallikainen

Before I start to write my own, does anyone have some C code to convert a
string in UTF-8 to CP1252?

Thanks!

Harold



--
FCC Rules Updated Daily at http://www.hallikainen.com - Advertising
opportunities available!

2009\09\04@172047 by Mark Rages

Is this for embedded? There's code in here:

http://elm-chan.org/fsw/ff/00index_e.html

On Fri, Sep 4, 2009 at 4:21 PM, Harold
Hallikainen<spam_OUTharoldTakeThisOuTspamhallikainen.org> wrote:
> Before I start to write my own, does anyone have some C code to convert a
> string in UTF-8 to CP1252?
>
> Thanks!
>
> Harold
>

--
Mark Rages, Engineer
Midwest Telecine LLC
.....markragesKILLspamspam@spam@midwesttelecine.com

2009\09\05@000115 by Harold Hallikainen

Thanks! That's a nice site, but I don't see anything about changing UTF-8
strings to CP1252. Is it there somewhere? And yes, this is to run on a
PIC24H.

Thanks!

Harold


2009\09\05@085748 by Gerhard Fiedler

Harold Hallikainen wrote:

> Before I start to write my own, does anyone have some C code to
> convert a string in UTF-8 to CP1252?

I don't, but maybe you can lift something out of there
<http://www.gnu.org/software/libiconv/>

Gerhard

2009\09\05@090758 by Mark Rages
On Fri, Sep 4, 2009 at 11:14 PM, Harold
Hallikainen<EraseMEharoldspam_OUTspamTakeThisOuThallikainen.org> wrote:
> Thanks! That's a nice site, but I don't see anything about changing UTF-8
> strings to CP1252. Is it there somewhere? And yes, this is to run on a
> PIC24H.
>

The code tables are in ff.h.

Regards,
Mark
markrages@gmail
--
Mark Rages, Engineer
Midwest Telecine LLC
markragesspamspam_OUTmidwesttelecine.com

2009\09\05@131049 by Harold Hallikainen

> Harold Hallikainen wrote:
>
>> Before I start to write my own, does anyone have some C code to
>> convert a string in UTF-8 to CP1252?
>
> I don't, but maybe you can lift something out of there
> <http://www.gnu.org/software/libiconv/>
>
> Gerhard

Thanks! That seems a lot more complex than I need, probably because it
does so many different conversions. Also... people sure comment code less
than I do! I've already written a UTF-8 to 16-bit Unicode converter for
another project, so I guess I'll take that and drive an if to pass through
ASCII and a switch case to handle CP1252 above 0x7f. A lot of the codes
line up, but several don't.
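
For readers following along, here is a minimal sketch of the kind of UTF-8 to 16-bit Unicode decoder mentioned above. It is not Harold's code; the function name utf8_decode16 and its return convention are assumptions made for illustration. It handles only 1- to 3-byte sequences (code points that fit in 16 bits), rejects overlong forms, and does not filter surrogate values.

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s into *cp.
 * Hypothetical helper: returns the number of bytes consumed,
 * or 0 if the input is malformed or needs more than 16 bits. */
size_t utf8_decode16(const unsigned char *s, uint16_t *cp)
{
    if (s[0] < 0x80) {                          /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {                /* 110xxxxx 10xxxxxx */
        if ((s[1] & 0xC0) != 0x80)
            return 0;
        *cp = (uint16_t)(((uint16_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return (*cp >= 0x80) ? 2 : 0;           /* reject overlong encodings */
    }
    if ((s[0] & 0xF0) == 0xE0) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        /* Cast before shifting so the shift is safe where int is 16 bits. */
        *cp = (uint16_t)(((uint16_t)(s[0] & 0x0F) << 12) |
                         ((uint16_t)(s[1] & 0x3F) << 6) |
                         (s[2] & 0x3F));
        return (*cp >= 0x800) ? 3 : 0;          /* reject overlong encodings */
    }
    return 0;   /* 4-byte sequences and stray continuation bytes */
}
```

A caller would typically step through the string by the returned length and substitute a placeholder (and skip one byte) whenever 0 comes back.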

Thanks!

Harold


2009\09\05@134839 by Harold Hallikainen


> On Fri, Sep 4, 2009 at 11:14 PM, Harold
> Hallikainen<@spam@haroldKILLspamspamhallikainen.org> wrote:
>> Thanks! That's a nice site, but I don't see anything about changing
>> UTF-8
>> strings to CP1252. Is it there somewhere? And yes, this is to run on a
>> PIC24H.
>>
>
> The code tables are in ff.h.
>
> Regards,
> Mark
> markrages@gmail

Thanks! I see the code tables there. I also see various Unicode tables in
the options directory. It looks like they're doing a brute force lookup.
Since 16 bit Unicode would require 65536 entries and CP1252 only has 256
values, and many of the Unicode values between 0x80 and 0xff match the
CP1252 value, I'm thinking of a switch case to handle the exceptions. I've
already written a UTF-8 to 16 bit Unicode converter for another project,
so I think I'll use that to drive my switch case.
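
The switch-case idea described above might look roughly like the sketch below. The function name and the '?' fallback are assumptions, not anything from the thread. Note that CP1252 assigns only 27 of the 32 bytes in 0x80 to 0x9F (0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned), so there are 27 cases. Treating the Unicode C1 controls (U+0080 to U+009F) as unmappable is a design choice made here.

```c
#include <stdint.h>

#define CP1252_FALLBACK '?'   /* hypothetical substitute for unmappable input */

/* Map one 16-bit Unicode code point to a CP1252 byte. */
uint8_t unicode_to_cp1252_switch(uint16_t cp)
{
    if (cp < 0x80)                      /* ASCII passes straight through */
        return (uint8_t)cp;
    if (cp >= 0xA0 && cp <= 0xFF)       /* same values as Latin-1 / Unicode */
        return (uint8_t)cp;

    switch (cp) {                       /* the 27 CP1252-specific characters */
    case 0x20AC: return 0x80;  /* euro sign */
    case 0x201A: return 0x82;
    case 0x0192: return 0x83;
    case 0x201E: return 0x84;
    case 0x2026: return 0x85;  /* ellipsis */
    case 0x2020: return 0x86;
    case 0x2021: return 0x87;
    case 0x02C6: return 0x88;
    case 0x2030: return 0x89;
    case 0x0160: return 0x8A;
    case 0x2039: return 0x8B;
    case 0x0152: return 0x8C;
    case 0x017D: return 0x8E;
    case 0x2018: return 0x91;  /* left single quote */
    case 0x2019: return 0x92;  /* right single quote */
    case 0x201C: return 0x93;  /* left double quote */
    case 0x201D: return 0x94;  /* right double quote */
    case 0x2022: return 0x95;  /* bullet */
    case 0x2013: return 0x96;  /* en dash */
    case 0x2014: return 0x97;  /* em dash */
    case 0x02DC: return 0x98;
    case 0x2122: return 0x99;  /* trade mark sign */
    case 0x0161: return 0x9A;
    case 0x203A: return 0x9B;
    case 0x0153: return 0x9C;
    case 0x017E: return 0x9E;
    case 0x0178: return 0x9F;
    default:     return CP1252_FALLBACK;
    }
}
```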

Again, thanks for the comments!

Harold



2009\09\06@182257 by peter green


> I've already written a UTF-8 to 16 bit Unicode converter for
> another project, so I guess I'll take that and drive an if to pass through
> ASCII and a switch case to handle CP1252 above 0x7f. A lot of the codes
> line up, but several don't.
>  
Exactly 32 don't and they are all in one continuous block (0x80 to 0x9F)
so a small table for that block might be another option to consider.

2009\09\06@214433 by Harold Hallikainen


>
>> I've already written a UTF-8 to 16 bit Unicode converter for
>> another project, so I guess I'll take that and drive an if to pass
>> through
>> ASCII and a switch case to handle CP1252 above 0x7f. A lot of the codes
>> line up, but several don't.
>>
> Exactly 32 don't and they are all in one continuous block (0x80 to 0x9F)
> so a small table for that block might be another option to consider.

2009\09\06@215354 by peter green

Harold Hallikainen wrote:
>>> I've already written a UTF-8 to 16 bit Unicode converter for
>>> another project, so I guess I'll take that and drive an if to pass
>>> through
>>> ASCII and a switch case to handle CP1252 above 0x7f. A lot of the codes
>>> line up, but several don't.
>>>
>>>      
>> Exactly 32 don't and they are all in one continuous block (0x80 to 0x9F)
>> so a small table for that block might be another option to consider.
>>    
> Excellent!
>  
Unfortunately I've just realised I was wrong: while the exceptions are
in one contiguous block in Windows-1252, they are spread all over the
place in Unicode, so my advice would have been good for a 1252-to-Unicode
converter but not for a Unicode-to-1252 converter.
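
To illustrate the direction where the small table does work (CP1252 to Unicode), here is a sketch with a 32-entry table indexed by the byte value minus 0x80. The names and the use of U+FFFD for the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are assumptions for this example.

```c
#include <stdint.h>

/* Unicode code points for CP1252 bytes 0x80..0x9F.
 * 0xFFFD (replacement character) marks bytes CP1252 leaves unassigned. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 80-87 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD, /* 88-8F */
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 90-97 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178  /* 98-9F */
};

/* Map one CP1252 byte to a 16-bit Unicode code point. */
uint16_t cp1252_to_unicode(uint8_t b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_high[b - 0x80];   /* direct index into the block */
    return b;                           /* ASCII and 0xA0..0xFF are unchanged */
}
```

The lookup is a single index because the exceptions are contiguous on the CP1252 side; in the reverse direction the Unicode values range from U+0152 to U+2122, which is why a direct index is impractical there.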


2009\09\06@220316 by Harold Hallikainen



OK, I'm at home and all that is at work... I guess I'll go back to my
switch case.

Thanks!

Harold




2009\09\06@233636 by Jeremy Lee



Erk. Rather sub-optimal, since unless the compiler is very clever, it will
be running 32 tests per character.

You can still use the table approach: build your table of 32 entries,
where each entry lists the Unicode AND the CP1252 code, and the list is
sorted by Unicode.

Then do a 'phone book' log(n) search (use the middle value to determine
whether you recurse into the top or bottom block) to find whether the
Unicode value is in the list. If so, replace it with the associated
CP1252 code.

Only takes about 5 tests per character, and is super-quick if you unroll
the loop.

Of course, you could hand-optimize the case statement as a tree of nested
IFs that act like the 20-questions game (although in this case you should
only need five questions), which is really a pre-optimized version of the
phone book search.

And if all this seems overcomplicated, speeding up string processing by 6
times is usually worth it.
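
The sorted-table search described above can be sketched as follows. The struct, table and function names and the '?' fallback are assumptions for illustration. The table holds the 27 characters CP1252 actually assigns in 0x80 to 0x9F (5 of the 32 bytes are unassigned), sorted by Unicode value, so the loop still needs at most five comparisons.

```c
#include <stdint.h>
#include <stddef.h>

struct cp_pair {
    uint16_t unicode;   /* search key */
    uint8_t  cp1252;    /* byte to emit */
};

/* Must stay sorted by the unicode field for the binary search to work. */
static const struct cp_pair cp1252_exceptions[] = {
    {0x0152, 0x8C}, {0x0153, 0x9C}, {0x0160, 0x8A}, {0x0161, 0x9A},
    {0x0178, 0x9F}, {0x017D, 0x8E}, {0x017E, 0x9E}, {0x0192, 0x83},
    {0x02C6, 0x88}, {0x02DC, 0x98}, {0x2013, 0x96}, {0x2014, 0x97},
    {0x2018, 0x91}, {0x2019, 0x92}, {0x201A, 0x82}, {0x201C, 0x93},
    {0x201D, 0x94}, {0x201E, 0x84}, {0x2020, 0x86}, {0x2021, 0x87},
    {0x2022, 0x95}, {0x2026, 0x85}, {0x2030, 0x89}, {0x2039, 0x8B},
    {0x203A, 0x9B}, {0x20AC, 0x80}, {0x2122, 0x99}
};

#define N_EXCEPTIONS (sizeof cp1252_exceptions / sizeof cp1252_exceptions[0])

/* Map one 16-bit Unicode code point to a CP1252 byte, '?' if unmappable. */
uint8_t unicode_to_cp1252_bsearch(uint16_t cp)
{
    size_t lo = 0, hi = N_EXCEPTIONS;   /* search the half-open range [lo, hi) */

    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (uint8_t)cp;             /* identical in both encodings */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp1252_exceptions[mid].unicode == cp)
            return cp1252_exceptions[mid].cp1252;
        if (cp1252_exceptions[mid].unicode < cp)
            lo = mid + 1;               /* key is in the upper half */
        else
            hi = mid;                   /* key is in the lower half */
    }
    return '?';                         /* hypothetical fallback character */
}
```

Whether this actually beats a switch depends on the compiler, which may already turn a sparse switch into a comparison tree.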

--
Jeremy Lee BCompSci (Hons)
The Unorthodox Engineers
 http://www.unorthodox.com.au

2009\09\07@001948 by Harold Hallikainen



Thanks! I did something like that (binary search) to look up the Unifont
bit map for a Unicode character when I was storing part of the Unifont
table in internal flash. I've since moved it to an SPI flash, so I just
index directly to the bitmap. Anyway, I'll look at this all this week.
Thanks for the comments!

Harold

