>
>>
>>> Harold Hallikainen wrote:
>>>>>> I've already written a UTF-8 to 16-bit Unicode converter for
>>>>>> another project, so I guess I'll take that and drive an if to pass
>>>>>> through ASCII and a switch case to handle CP1252 above 0x7F. A lot
>>>>>> of the codes line up, but several don't.
>>>>>>
>>>>>>
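This is a minimal C sketch of the if-plus-switch plan. It assumes the input is one 16-bit (BMP) code point already produced by the UTF-8 decoder, and that the caller wants -1 back for any character Windows-1252 cannot represent. The function name and the -1 convention are hypothetical. ASCII is not the only range that lines up: U+00A0 to U+00FF also match CP1252 byte for byte, so both ranges pass straight through. Only the 27 characters that CP1252 places at 0x80 to 0x9F need a case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one 16-bit Unicode code point -> one CP1252 byte.
 * Returns the byte (0..255), or -1 if CP1252 has no such character. */
int unicode_to_cp1252_switch(uint16_t u)
{
    /* ASCII and U+00A0..U+00FF are identical in CP1252: pass through. */
    if (u < 0x80 || (u >= 0xA0 && u <= 0xFF))
        return u;

    /* The 27 characters CP1252 puts at 0x80..0x9F, at scattered code points. */
    switch (u) {
    case 0x20AC: return 0x80;  /* euro sign */
    case 0x201A: return 0x82;
    case 0x0192: return 0x83;
    case 0x201E: return 0x84;
    case 0x2026: return 0x85;
    case 0x2020: return 0x86;
    case 0x2021: return 0x87;
    case 0x02C6: return 0x88;
    case 0x2030: return 0x89;
    case 0x0160: return 0x8A;
    case 0x2039: return 0x8B;
    case 0x0152: return 0x8C;
    case 0x017D: return 0x8E;
    case 0x2018: return 0x91;
    case 0x2019: return 0x92;
    case 0x201C: return 0x93;
    case 0x201D: return 0x94;
    case 0x2022: return 0x95;
    case 0x2013: return 0x96;
    case 0x2014: return 0x97;
    case 0x02DC: return 0x98;
    case 0x2122: return 0x99;  /* trade mark sign */
    case 0x0161: return 0x9A;
    case 0x203A: return 0x9B;
    case 0x0153: return 0x9C;
    case 0x017E: return 0x9E;
    case 0x0178: return 0x9F;
    default:     return -1;    /* C1 controls U+0080..U+009F, or anything else */
    }
}
```

Many optimizing compilers lower a sparse switch like this into a compare tree instead of 27 sequential tests. It is therefore worth checking the generated code before hand-optimizing.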
>>>>> Exactly 32 don't, and they are all in one contiguous block (0x80 to
>>>>> 0x9F), so a small table for that block might be another option to
>>>>> consider.
>>>>>
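In the 1252-to-Unicode direction, the small table suggested here can be a plain array indexed by (byte - 0x80). This is a minimal sketch that assumes 16-bit output. The names are hypothetical. Returning U+FFFD (the Unicode replacement character) for the five byte values Windows-1252 leaves undefined is an assumed convention, not part of the original suggestion.

```c
#include <assert.h>
#include <stdint.h>

/* Unicode code points for CP1252 bytes 0x80..0x9F, indexed by (byte - 0x80).
 * 0xFFFD marks the undefined bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D. */
static const uint16_t cp1252_c1_block[] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 80-87 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD, /* 88-8F */
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 90-97 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178, /* 98-9F */
};
static_assert(sizeof cp1252_c1_block / sizeof cp1252_c1_block[0] == 32,
              "one entry per byte in 0x80..0x9F");

/* Hypothetical helper: one CP1252 byte -> one 16-bit Unicode code point. */
uint16_t cp1252_to_unicode(uint8_t b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_c1_block[b - 0x80];  /* the only block that differs */
    return b;                              /* every other byte maps 1:1 */
}
```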
>>>> Excellent!
>>>>
>>> Unfortunately, I've just realised I was wrong: while the exceptions are
>>> in one contiguous block in Windows-1252, they are spread all over the
>>> place in Unicode, so my advice would have been good for a 1252-to-Unicode
>>> converter but not for a Unicode-to-1252 converter.
>>>
>>
>> OK, I'm at home and all that is at work... I guess I'll go back to my
>> switch case.
>
> Erk. Rather suboptimal, since unless the compiler is very clever, it will
> be running up to 32 tests per character.
>
> You can still use the table approach: build a table of 32 entries, where
> each entry lists the Unicode code point AND the CP1252 code, and sort the
> list by Unicode code point.
>
> Then do a 'phone book' (binary) search taking log2(n) steps (use the
> middle value to decide whether you recurse into the top or bottom half) to
> find whether the code point is in the list. If so, replace it with the
> associated CP1252 code.
>
> Only takes about 5 tests per character, and is super-quick if you unroll
> the loop.
>
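This is a minimal sketch of the sorted-table search for the Unicode-to-1252 direction, with the loop already unrolled. It does not recurse on a midpoint. Instead, it uses the fixed-step form of binary search, with steps of 16, 8, 4, 2 and 1. That form needs the table padded to 32 entries, a power of two. The padding, the struct and function names, and the -1 return for unmappable characters are all assumptions made for illustration. Each lookup costs five comparisons plus one final equality test.

```c
#include <assert.h>
#include <stdint.h>

/* One (Unicode, CP1252) pair per character, sorted by Unicode code point. */
struct u2c { uint32_t ucs; uint8_t cp1252; };

static const struct u2c u2c_table[] = {
    {0x0152, 0x8C}, {0x0153, 0x9C}, {0x0160, 0x8A}, {0x0161, 0x9A},
    {0x0178, 0x9F}, {0x017D, 0x8E}, {0x017E, 0x9E}, {0x0192, 0x83},
    {0x02C6, 0x88}, {0x02DC, 0x98}, {0x2013, 0x96}, {0x2014, 0x97},
    {0x2018, 0x91}, {0x2019, 0x92}, {0x201A, 0x82}, {0x201C, 0x93},
    {0x201D, 0x94}, {0x201E, 0x84}, {0x2020, 0x86}, {0x2021, 0x87},
    {0x2022, 0x95}, {0x2026, 0x85}, {0x2030, 0x89}, {0x2039, 0x8B},
    {0x203A, 0x9B}, {0x20AC, 0x80}, {0x2122, 0x99},
    /* Padding to 32 entries so the search below is exactly 5 steps. */
    {0xFFFFFFFF, 0}, {0xFFFFFFFF, 0}, {0xFFFFFFFF, 0},
    {0xFFFFFFFF, 0}, {0xFFFFFFFF, 0},
};
static_assert(sizeof u2c_table / sizeof u2c_table[0] == 32,
              "unrolled search assumes exactly 32 entries");

/* Hypothetical helper: returns the CP1252 byte for u, or -1 if none. */
int unicode_to_cp1252(uint32_t u)
{
    if (u < 0x80 || (u >= 0xA0 && u <= 0xFF))
        return (int)u;                 /* ASCII and Latin-1 pass through */

    /* Unrolled binary search: afterwards i is the last entry whose ucs is
     * <= u, or 0 if u sorts before the first entry. */
    unsigned i = 0;
    if (u2c_table[i + 16].ucs <= u) i += 16;
    if (u2c_table[i +  8].ucs <= u) i +=  8;
    if (u2c_table[i +  4].ucs <= u) i +=  4;
    if (u2c_table[i +  2].ucs <= u) i +=  2;
    if (u2c_table[i +  1].ucs <= u) i +=  1;

    /* One final equality test decides whether u is actually in the table. */
    return u2c_table[i].ucs == u ? u2c_table[i].cp1252 : -1;
}
```

The padding value 0xFFFFFFFF sorts after every valid code point, since valid code points stop at 0x10FFFF. For real input, the search therefore never stops on a pad entry.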
> Of course, you could hand-optimize the case statement as a tree of nested
> IFs that acts like the 20-questions game (although in this case you should
> only need five questions), which is really a pre-optimized version of the
> phone book search.
>
> And if all this seems overcomplicated, remember that speeding up string
> processing roughly sixfold is usually worth it.
>
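Assuming each test costs about the same, the speedup figure comes from comparing a worst-case linear chain of tests over the 32-entry block with a binary search over the same block. It applies only to characters that reach the lookup. ASCII handled by the initial if is unaffected.

```latex
\lceil \log_2 32 \rceil = 5, \qquad \frac{32}{5} = 6.4 \approx 6
```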
> --
> Jeremy Lee BCompSci (Hons)
> The Unorthodox Engineers
> http://www.unorthodox.com.au