piclist 2001\04\16\185455a >
Thread: How to make a dictionary/notepad using PIC
www.piclist.com/techref/microchip/devices.htm?key=pic
BY : David Cary



Dear Roman Black,

Roman Black on 2001-04-12 08:17:41 AM replied,

...
>>   100: space
>>   101: e
>>   110: t
>>   111: n
>>   0100: r
>>   0101: o
>>   0110: a
>>   0111: i
>>   00_xxxx_xxxx: all other (8 bit) letters.
...

Say I restrict myself to 7 bit characters (changing the last line to
 00xxx_xxxx
).

If the text I'm compressing is mostly-uppercase, then the 8 most-frequent
characters I'm compressing will be uppercase. Capital A will be represented by
the code
 B'01_10' ; A
and then lowercase a will be represented by
 B'00_110_0001' ; a
.

If the text I'm compressing is mostly-lowercase, then the assignments swap:
 B'00_100_0001' ; A
 B'01_10' ; a
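
The table above is easy to try out on a PC before committing it to PIC code.
Here is a minimal sketch in Python, assuming the 7-bit escape variant and the
mostly-lowercase assignment. The names SHORT_CODES, encode and decode are
mine (not from Roman's post), and the bits are kept as a string of '0' and '1'
characters for readability rather than packed into bytes.

```python
# Prefix code from the thread: 7-bit escape variant, lowercase assignment.
SHORT_CODES = {
    " ": "100", "e": "101", "t": "110", "n": "111",
    "r": "0100", "o": "0101", "a": "0110", "i": "0111",
}
DECODE_SHORT = {bits: ch for ch, bits in SHORT_CODES.items()}


def encode(text):
    """Return the code for text as a string of '0' and '1' characters."""
    out = []
    for ch in text:
        if ch in SHORT_CODES:
            out.append(SHORT_CODES[ch])
        else:
            value = ord(ch)
            if value > 0x7F:
                raise ValueError("only 7-bit characters are supported")
            # Escape: prefix 00 followed by the 7-bit character code.
            out.append("00" + format(value, "07b"))
    return "".join(out)


def decode(bits):
    """Invert encode(); the leading bits of each code give its width."""
    out = []
    pos = 0
    while pos < len(bits):
        if bits[pos] == "1":                 # 1xx: 3-bit code
            out.append(DECODE_SHORT[bits[pos:pos + 3]])
            pos += 3
        elif bits[pos:pos + 2] == "01":      # 01xx: 4-bit code
            out.append(DECODE_SHORT[bits[pos:pos + 4]])
            pos += 4
        else:                                # 00 + 7 bits: any other character
            out.append(chr(int(bits[pos + 2:pos + 9], 2)))
            pos += 9
    return "".join(out)
```

With this assignment, encode("a") gives the 4-bit code 0110 and encode("A")
gives the 9-bit code 00 100 0001, matching the two lines above.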

The letter frequencies at
 www.piclist.com/techref/method/compress/embedded.htm
and
 www.piclist.com/techref/method/compress/etxtfreq.htm
seem to imply that the top 4 characters (including space) account for about 1/2
of typical English text.

Then, counting half the characters at 3 bits and (pessimistically) all the
rest as 9-bit escapes, 100 characters of text compress on average to
 100 * (3*1/2 + 9*1/2) bits = 150 + 450 bits = 600 bits, or 6.00 bits/char.
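
That estimate can be checked, and refined against real text, with a few lines
of Python. This is only a sketch: the function names are mine, the code
lengths assume the 7-bit escape variant with the lowercase assignment, and
measured_bits also counts the four 4-bit codes that the rough estimate ignores.

```python
def average_bits(p_short=0.5, short_bits=3, long_bits=9):
    """Expected bits/char if a fraction p_short of the text gets the short
    code and everything else gets the long (escape) code."""
    return p_short * short_bits + (1.0 - p_short) * long_bits


def code_length(ch):
    """Code length in bits for one character (lowercase assignment)."""
    if ch in " etn":
        return 3
    if ch in "roai":
        return 4
    return 9  # 00 prefix + 7-bit character


def measured_bits(text):
    """Actual average bits/char for a given sample of text."""
    return sum(code_length(ch) for ch in text) / len(text)
```

average_bits() reproduces the 6.00 bits/char figure; running measured_bits on
a sample of the text you actually plan to store gives a more honest number.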

Um... not really that good, is it? Ah well -- the best possible
one-letter-at-a-time compression ( Huffman compression ) can't get any better
than about 4 bits/char on English text.

Other popular one-letter-at-a-time compression schemes are ``base 40'' and
``Zork Standard Code for Information Interchange (ZSCII)'' (see
 rdrop.com/~cary/html/data_compression.html#short
for details). They pack 3 letters plus a flag bit into 2 bytes (5 bits/letter,
or 16/3 = 5.33 bits/char once the flag bit is counted, plus overhead to handle
capital/lowercase shifts).
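
The 3-letters-in-2-bytes packing can be sketched in Python as follows. The
function names and the exact bit positions (flag in the top bit, first letter
in the next 5 bits) are my assumptions for illustration; the mapping from
letters to 5-bit codes, and the shift codes for capitals, are left out.

```python
def pack3(a, b, c, flag=False):
    """Pack three 5-bit letter codes and one flag bit into a 16-bit word."""
    for code in (a, b, c):
        if not 0 <= code < 32:
            raise ValueError("letter codes must fit in 5 bits")
    return (int(flag) << 15) | (a << 10) | (b << 5) | c


def unpack3(word):
    """Return (a, b, c, flag) from a 16-bit word made by pack3()."""
    a = (word >> 10) & 0x1F
    b = (word >> 5) & 0x1F
    c = word & 0x1F
    flag = bool((word >> 15) & 1)
    return a, b, c, flag
```

Each 16-bit word carries 3 letters, so the cost is 16/3 bits per letter before
any shift codes are added.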

The really good text compressors (1.5 bits/char or less) have codes that
decompress into entire words or substrings.

Have you looked at LZRW1a ?
 LZRW1A
 http://www.ross.net/compression/lzrw1a.html
The program itself fits into a PIC easily. The standard size output buffer (16
KB) is a bit large for the PIC. I wonder how well it would compress if we
reduced its output buffer to, say, 256 bytes (using 64 bytes from each of the 4
banks of RAM in the PIC16F877) or even a mere 95 bytes (so the output buffer
fits into Bank 3 RAM). The *compression* code (before unrolling) takes less than
100 instructions in 68000 assembler (maybe twice that in PIC code ?). The
decompression code (before unrolling) takes less than 40 instructions in 68000
assembler (maybe twice that in PIC code ?).
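
One way to answer the "how well would it compress" question before writing
any PIC code is to experiment on a PC with the window size as a parameter.
The sketch below is NOT LZRW1-A: it is a plain greedy LZ77 with a brute-force
search (LZRW1-A uses a hash table to find a candidate match), so it shows
roughly the best a given window could do. The function names and the cost
model (1 flag bit per item, 8 bits per literal, 16 bits per copy) are my
assumptions.

```python
def compress(data, window=256, min_len=3, max_len=18):
    """Greedy LZ77 over a bytes object. Returns a list of tokens: an int is
    a literal byte, an (offset, length) tuple is a copy from the window."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        # Try every start position within the last `window` bytes.
        for cand in range(max(0, pos - window), pos):
            length = 0
            while (length < max_len and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            pos += best_len
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def decompress(tokens):
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            for _ in range(length):
                out.append(out[-offset])  # byte-by-byte, so overlaps work
        else:
            out.append(tok)
    return bytes(out)


def compressed_bits(tokens):
    """Size under the assumed cost model: flag bit + 8 or 16 payload bits."""
    return sum(17 if isinstance(tok, tuple) else 9 for tok in tokens)
```

Calling compressed_bits(compress(text, window=95)) and the same with
window=256 on a sample of the dictionary text, and dividing by len(text),
gives bits/char figures that can be compared directly with the 6 bits/char
and 4 bits/char numbers above.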

--
David Cary
