piclist 2001\04\11\154506a >
Thread: How to make a dictionary/notepad using PIC
www.piclist.com/techref/microchip/devices.htm?key=pic
flavicon
face BY : David Cary email (remove spam text)



Roman Black <EraseMEfastvidspamBeGonespamEZY.NET.AU> on 2001-04-10 02:56:18 AM
had some really good points and presents some good ideas.

Most data compression and decompression algorithms don't use any multiplies or
divides, so it's no big deal that PICs have slow multiplies.

The *fastest* data compression routine I've seen is RLE (which is useless on
English text).

The next fastest data compression routine I've seen is:
 LZRW1A
 http://www.ross.net/compression/lzrw1a.html
(compression source code; also decompression source code is here).
but this needs a block of "history" RAM when decompressing. The bigger this
block, the better the compression. Does the PIC have enough RAM for "adequate"
compression ?

Some other ``small'' data compression ideas were discussed on Deja (now Google
Groups)
under subject headings ``compression on a PDA'' and ``Compressing the Bible for
a PDA''.

http://groups.google.com/groups?q=%22Compressing+the+Bible+for+a+PDA%22&hl=en&lr=&safe=off&btnG=Google+Search&meta=site%3Dgroups%26group%3Dcomp.compression.*
http://groups.google.com/groups?q=%22compression+on+a+PDA%22&hl=en&lr=&safe=off&btnG=Google+Search&meta=site%3Dgroups%26group%3Dcomp.compression.*

(be sure to view the whole thread).

I hear that ``spell checkers'' compress their lists of spelling words by storing
a few bits indicating how many letters this word had in common with the start of
the previous word (zero, for the first word (starting with A), the first word
starting with B, etc), a few bits indicating how much to increment the next
letter (unless the previous word didn't have a next letter), followed by the
rest of the word.
For example,
 ex
 fear
 feared
 fearing
 fed
 feed
(26 bytes, not counting end-of-word indicator)
would be encoded
 <0> e x
 <0> <1> e a r
 <4> e d
 <4> <4> ng
 <2> <3>
 <2> <1> d
(If you just put the letters into bytes, and the numbers into nybbles, rather
than something more efficient, this gives 15 bytes, not counting end-of-word
indicator).

I was playing with a very simple compression algorithm based on the letter
frequencies in
 www.piclist.com/techref/method/compress/embedded.htm
and
 www.piclist.com/techref/method/compress/etxtfreq.htm
(which really needs a "up" link to
 www.piclist.com/techref/method/compress.htm
)
that decoded like this:

 100: space
 101: e
 110: t
 111: n
 0100: r
 0101: o
 0110: a
 0111: i
 00xxxx_xxxx: all other (8 bit) letters.

I think I could decode this with a pretty compact program on a PIC.
I think I could even *encode* this with a small subroutine on the PIC.
It has the feature that I can encode *any* sequence of raw bytes, so I can send
funky control codes.
If I didn't need all possible 8 bit values
I could combine this with Roman Black's idea
and further decode some of those ``unused'' 10 bit symbols
(obviously I don't need the ones for ``_etnroai'')
into letter pairs, triplets, or entire words.

True Morse code compresses almost as well as the above scheme -- if I use "1"
for dot, "01" for dash, "00" for end-of-letter, I get
00 : space
100 : e
0100 : t
01100 : n
101100 : r
01010100 : o
10100 : a
1100 : i
11100 : s
...
for the most common letters.

--
David Cary

--
http://www.piclist.com hint: To leave the PICList
@spam@piclist-unsubscribe-requestspamspammitvma.mit.edu


<86256A2B.006C4BBB.00@Brunswickoutdoor.com>

See also: www.piclist.com/techref/microchip/devices.htm?key=pic
Reply You must be a member of the piclist mailing list (not only a www.piclist.com member) to post to the piclist. This form requires JavaScript and a browser/email client that can handle form mailto: posts.
Subject (change) How to make a dictionary/notepad using PIC

month overview.

new search...