PICList Thread
'[PIC] Need help with CRC cost/benefit for new boot'
2012\03\14@122020 by Byron Jeff

Forgot the tag... Sorry.

I'm putting the finishing touches on a bootloader targeted at the
12F18XX/16F19XX enhanced 16F chipset family. It extends the goals of Frank
Sergeant's Pikme bootloader, which I updated a couple of years ago:

http://pygmy.utoh.org/pikme/

The additions are more reliable serial communications and autobauding. I've
accomplished this by encoding the data using Chip Gracey's 3PB protocol
used for the Parallax Propeller bootloader:

http://propeller.wikispaces.com/Download+Protocol

The protocol encodes each data bit with a start and stop bit, extending the
error tolerance from 5% to nearly 50% of the cell width. A preamble is used
to set the timing of the download. I've been testing using the internal
oscillator at 32 MHz and running the serial port at 115.2K 7N1, which
allows encoding 3 start bits, 3 data bits, and 3 stop bits per character.
Seems to work fine so far.
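
For illustration, here's a rough C sketch of packing 3 payload bits into one
7N1 character under this scheme; the LSB-first framing assumption and which
payload bit goes out first are my guesses, not taken from the actual loader:

    #include <stdint.h>

    /* Sketch: the UART's own start and stop bits double as the start of the
       first 3PB cell and the stop of the last one.  Assumed wire sequence:
       start, d0, 1, 0, d1, 1, 0, d2, stop. */
    static uint8_t pack_3pb(unsigned d0, unsigned d1, unsigned d2)
    {
        /* 7 data bits, LSB first: d0, 1, 0, d1, 1, 0, d2  ->  fixed bits = 0x12 */
        return (uint8_t)(0x12u | (d0 & 1u) | ((d1 & 1u) << 3) | ((d2 & 1u) << 6));
    }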

Since this is designed as a blind, half-pin protocol (most likely using
PortE3/MCLR on the larger chips), some sort of error checking is going to
be attached to each data packet. No need to burn a bad row of data if the
error can be detected. It's a simple protocol with a 3 bit command, 6 bit
length, up to 64 15 bit words of address/data, and a checksum. 15 bits is
used because it maps into 5 characters with 3PB, program data is 14 bits,
and addresses are a maximum of 15 bits. So it didn't make sense to encode a
16th bit and an extra character per word for this particular target.
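
As a sketch, the word-to-character split is just a shift and mask; whether
the 3 bit groups go out MSB-first or LSB-first is an assumption here:

    #include <stdint.h>

    /* Sketch: one 15 bit address/data word becomes five 3 bit groups, each
       carried by one 3PB character.  MSB-first grouping is assumed. */
    static void split_word15(uint16_t w, uint8_t groups[5])
    {
        for (int i = 0; i < 5; i++)
            groups[i] = (uint8_t)((w >> (12 - 3 * i)) & 0x7u);
    }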

So I've been working on checksums. It's clear that CRC is the best. However,
it gets murky in terms of running a CRC on 15 bit values. While there are
decent 15 bit CRC polynomials out there, I've had a tough time finding any
table driven algorithms that address anything other than 4, 8, 16, or 32
bits. It looks like the natural breakdown would be to split the 15 bits
into three 5 bit words and use three 5 bit tables to encode. So the tables
would cost 96 words of program memory. But then it looks like some work to
either split the 15 bit word into 5 bit chunks, or the opposite of
combining three 5 bit chunks into a whole 15 bit word. With some careful
coding it would be possible to generate both at the same time, I guess.
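
To make the cost concrete, here is a rough C model of a 5-bits-at-a-time
lookup. It uses a single 32-entry table reused for each chunk (rather than
three position-specific tables), and the published CRC-15/CAN polynomial
0x4599 stands in for whatever 15 bit polynomial would actually be chosen:

    #include <stdint.h>

    #define CRC15_POLY 0x4599u          /* CRC-15/CAN, one of the published 15-bit polys */

    static uint16_t crc5_table[32];     /* 32 entries, reused for all three 5-bit chunks */

    static void build_crc5_table(void)
    {
        for (uint16_t j = 0; j < 32; j++) {
            uint16_t r = (uint16_t)(j << 10);     /* place the 5-bit index at the top */
            for (int i = 0; i < 5; i++)
                r = (r & 0x4000u) ? (uint16_t)(((r << 1) ^ CRC15_POLY) & 0x7FFFu)
                                  : (uint16_t)((r << 1) & 0x7FFFu);
            crc5_table[j] = r;
        }
    }

    /* Fold one 15-bit word in, 5 bits at a time, MSB-first. */
    static uint16_t crc15_word(uint16_t crc, uint16_t word)
    {
        for (int i = 0; i < 3; i++) {
            uint16_t chunk = (uint16_t)((word >> (10 - 5 * i)) & 0x1Fu);
            crc = (uint16_t)(((crc << 5) ^ crc5_table[((crc >> 10) ^ chunk) & 0x1Fu]) & 0x7FFFu);
        }
        return crc;
    }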

So it has me wondering if the cost is worth it. I know that CRC is the
best. Theresa Maxino wrote an excellent Master's thesis examining different
checksumming types and their computational/complexity cost vs. their
ability to catch errors. CRC is clearly superior. The paper is here:

http://www.ece.cmu.edu/~koopman/thesis/maxino_ms.pdf

But she points out that 1's complement sums and Fletcher's checksum are
computationally simpler but do not have nearly the error catching rate of
CRCs.
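
For comparison, a Fletcher-style sum over 15 bit words costs only a couple of
additions per word. The sketch below uses a modulus of 2^15 - 1, which is my
own adaptation to this word size rather than anything from the thesis:

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch of a Fletcher-style checksum over 15-bit words. */
    static uint32_t fletcher15(const uint16_t *words, size_t n)
    {
        uint32_t sum1 = 0, sum2 = 0;
        for (size_t i = 0; i < n; i++) {
            sum1 = (sum1 + (words[i] & 0x7FFFu)) % 0x7FFFu;
            sum2 = (sum2 + sum1) % 0x7FFFu;
        }
        return (sum2 << 15) | sum1;     /* two 15-bit halves: fits the link's word size */
    }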

So I'm open to any thoughts as to whether or not investing in a 5/15 bit
table driven CRC is worth the effort for a 6 foot serial cable whose error
margin has already been widened by the encoding. My current thought is to
implement Fletcher's for now and revisit after getting the bootloader done.

Any help appreciated.

Thanks.

BAJ

-- Byron A. Jeff
Department Chair: IT/CS/CNET
College of Information and Mathematical Sciences
Clayton State University
http://cims.clayton.edu/bjeff


2012\03\14@125746 by alan.b.pearce

PIC topic added ...

> So I've been working on checksums. It's clear that CRC is the best. However it gets
> murky in terms of running CRC on 15 bit values. While there are decent 15 bit CRC
> polynominals out there, I've had a tough time finding any table driven algorithms
> that addresses anything other than 4, 8, 16, or 32 bits. It looks like the natural
> breakdown would be to split the 15 bits into three 5 bit words and use three 5 bit
> tables to encode. So the tables would cost 96 words of program memory. But then it
> looks like some work to either split the 15 bit word into 5 bit chunks, or the
> opposite of combining three 5 bit chunks into a whole 15 bit word. With some careful
> coding it would be possible to generate both at the same time I guess.

My initial thought was that the easiest way is to calculate a 16 bit CRC with the top bit of data always 0. But I guess you also need to transmit the CRC in a 15 bit word, and maybe the way to do that is to have it as a special word.

But if the comms are one way, how are you going to signify an error?



2012\03\14@132853 by Byron Jeff

On Wed, Mar 14, 2012 at 04:56:27PM +0000, alan.b.pearce@stfc.ac.uk wrote:
> PIC topic added ...
>
> > So I've been working on checksums. It's clear that CRC is the best. However it gets
> > murky in terms of running CRC on 15 bit values. While there are decent 15 bit CRC
> > polynominals out there, I've had a tough time finding any table driven algorithms
> > that addresses anything other than 4, 8, 16, or 32 bits. It looks like the natural
> > breakdown would be to split the 15 bits into three 5 bit words and use three 5 bit
> > tables to encode. So the tables would cost 96 words of program memory. But then it
> > looks like some work to either split the 15 bit word into 5 bit chunks, or the
> > opposite of combining three 5 bit chunks into a whole 15 bit word. With some careful
> > coding it would be possible to generate both at the same time I guess.
>
> My initial thought was that the easiest way is to calculate a 16 bit CRC
> with the top bit of data always 0, but I guess you also need to transmit
> the CRC in a 15 bit word, but maybe the way to do that is have it as a
> special word.

Even if it's a 16 bit CRC, there will be some of the same types of
problems. Another thought is to split into a 7 bit table and an 8 bit one.
That will get rid of the split/combine issue at a cost of 384 words of
memory. Again, just wondering if it's worth it.

>
> But if the comms are one way, how are you going to signify an error?

By not loading the bad row and marking the download as invalid. The
bootloader simply will not run application code anymore until a new load is
done. I'm thinking of saving the very last row of memory (or possibly the
2nd row, as the first is reserved for the bootloader under Frank's scheme)
for such flags. Erase the row when the current application is valid, write
a "bad code flag" when it's invalid, and check that the row is blank before
running the application with either the execute command or after a timeout.
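
A rough C model of that blank-row check; the flag row address, the 32 word
row size, and the read_flash_word() helper are all placeholders, and 0x3FFF
is the erased value of a 14 bit program word:

    #include <stdint.h>

    #define FLAG_ROW_ADDR  0x1FE0u      /* placeholder: row reserved for the flag */
    #define ROW_SIZE       32u          /* assumed erase-row size on these parts  */
    #define BLANK_WORD     0x3FFFu      /* erased state of a 14-bit program word  */

    extern uint16_t read_flash_word(uint16_t addr);   /* assumed helper */

    static int application_is_valid(void)
    {
        /* A fully erased flag row means the last download completed cleanly. */
        for (uint16_t a = 0; a < ROW_SIZE; a++)
            if (read_flash_word(FLAG_ROW_ADDR + a) != BLANK_WORD)
                return 0;
        return 1;
    }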

If a developer wanted to, they could certainly use an I/O pin (as opposed to
input only) or even two pins if they wanted to communicate back to the
host. But in my experience bootloaders are reliable virtually all the time,
so there's no real need for two way communication. Neither Frank's
Pikme nor Wouter's ZPL had a defined feedback channel. And even if there's
a burp, there's little cost to simply downloading again.

BAJ



2012\03\14@145641 by Dave Tweed

Byron Jeff wrote:
> On Wed, Mar 14, 2012 at 04:56:27PM +0000, alan.b.pearce@stfc.ac.uk wrote:
> > My initial thought was that the easiest way is to calculate a 16 bit CRC
> > with the top bit of data always 0, but I guess you also need to transmit
> > the CRC in a 15 bit word, but maybe the way to do that is have it as a
> > special word.
>
> Even if it's a 16 bit CRC, there will be some of the same types of
> problems. Another thought is to split into a 7 bit table and a 8 bit one.
> That will git rid of the split/combine issue at a cost of 384 words of
> memory. Again just wondering if it's worth it.

You don't *have* to do a table-driven implementation. Those exist for speed,
but your application doesn't need that. Just calculate the CRC one bit at a
time as they come in.
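
In C, the per-bit update is just a shift and a conditional XOR. A minimal
sketch, shown with the CCITT polynomial since the width and polynomial
choice are still open at this point:

    #include <stdint.h>

    /* Fold one received bit (MSB-first) into a running 16-bit CRC. */
    static uint16_t crc_feed_bit(uint16_t crc, unsigned bit)
    {
        unsigned feedback = ((crc >> 15) ^ (bit & 1u)) & 1u;
        crc = (uint16_t)(crc << 1);
        if (feedback)
            crc ^= 0x1021u;             /* x^16 + x^12 + x^5 + 1 */
        return crc;
    }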

-- Dave Tweed

2012\03\14@150502 by Barry Gershenfeld
You've examined the costs, but what about the benefits?

There's only a narrow area where a simple checksum misses an error that a
CRC would catch.

What error rates do you expect?   Most digital connections don't make
errors, and when they do, the term "noise" often means "I haven't taken the
trouble to find out what the problem is".    Now, when there's an analog
link (telephone, radio, modem), then a CRC is a very good idea.  I can see
that a scheme which times the width of pulses might be counted as "analog"
and therefore subject to jitter, aka, noise.

What is the risk of a bad download?  Medical devices, certainly you demand
reliability.  Lab experiments and consumer toys, not so important. Maybe once
in its lifetime there will be a bad load, and trying the download again
will fix it.  Which sounds an awful lot like your failure scenario with CRC
anyway :)

Barry

2012\03\14@151403 by Byron Jeff

On Wed, Mar 14, 2012 at 02:56:41PM -0400, Dave Tweed wrote:
> Byron Jeff wrote:
> > On Wed, Mar 14, 2012 at 04:56:27PM +0000, alan.b.pearce@stfc.ac.uk wrote:
> > > My initial thought was that the easiest way is to calculate a 16 bit CRC
> > > with the top bit of data always 0, but I guess you also need to transmit
> > > the CRC in a 15 bit word, but maybe the way to do that is have it as a
> > > special word.
> >
> > Even if it's a 16 bit CRC, there will be some of the same types of
> > problems. Another thought is to split into a 7 bit table and a 8 bit one.
> > That will git rid of the split/combine issue at a cost of 384 words of
> > memory. Again just wondering if it's worth it.
>
> You don't *have* to do a table-driven implementation. Those exist for speed,
> but your application doesn't need that. Just calculate the CRC one bit at a
> time as they come in.

Actually speed is an issue because those bits are coming from a spigot I
cannot control. At 115.2 kbits/sec the cell width is 8.6 uS. So I get about
12.9 uS from the data sample time until I have to be back to get the next
bit (as every bit has an 8.6 uS stop bit and I sample in the middle of the
cell). At 8 MIPS each instruction is 125 ns, so I can get 8 in a uS and
about 100 in the 12.9 uS I have after the sample. So I guess there is in
fact enough time to check whether this bit is part of the CRC, shift it in,
and do the XOR along with shifting the bit into the data normally.
I can do some testing of that. Should be less space than using the tables.
And I really don't have anything else to do in those 12.9 uS.

BAJ


2012\03\14@152531 by Paul Hutchinson

> -----Original Message-----
> From: piclist-bounces@mit.edu On Behalf Of Dave Tweed
> Sent: Wednesday, March 14, 2012 2:57 PM
>
> You don't *have* to do a table-driven implementation. Those exist
> for speed, but your application doesn't need that. Just calculate the CRC
> one bit at a time as they come in.
>
> -- Dave Tweed

I like Dr. Imre Bartfai's no tables assembly code for CCITT-CRC16.
http://www.piclist.com/techref/microchip/crc.htm

Paul Hutch

2012\03\14@155049 by Byron Jeff

On Wed, Mar 14, 2012 at 12:05:01PM -0700, Barry Gershenfeld wrote:
> You've examined the costs, but what about the benefits?
>
> There's only a narrow area where a simple checksum misses an error that a
> CRC would catch.

According to the paper, it's not actually so narrow. It's the reason that
CRC "dominates" Fletcher's in all instances and at all Hamming distances.

> What error rates do you expect?   Most digital connections don't make
> errors, and when they do, the term "noise" often means "I haven't taken the
> trouble to find out what the problem is".    Now, when there's an analog
> link (telephone, radio, modem), then a CRC is a very good idea.  I can see
> that a scheme which times the width of pulses might be counted as "analog"
> and therefore subject to jitter, aka, noise.

Like I said, it's a 6 foot serial cable. I fully expect zero errors.
>
> What is the risk of a bad download?  Medical devices, certainly you demand
> liability.  Lab experiments and consumer toys, not so important. Maybe once
> in its lifetime there will be a bad load, and trying the download again
> will fix it.  Which sounds an awful lot like your failure scenario with CRC
> anyway :)

It's a development setup for educational testing. The only harm will be
user frustration.

This is one of the reasons I threw the subject out to the list. Sometimes
it's too easy to overthink a problem, and a fresh perspective helps. I've
already gotten two good options (don't worry about it too much, and use a
bit by bit CRC because there's enough time and the cost is minimal).

Thanks.

BAJ


2012\03\14@161913 by Byron Jeff

On Wed, Mar 14, 2012 at 03:25:30PM -0400, Paul Hutchinson wrote:
> > -----Original Message-----
> > From: piclist-bounces@mit.edu On Behalf Of Dave Tweed
> > Sent: Wednesday, March 14, 2012 2:57 PM
> >
> > You don't *have* to do a table-driven implementation. Those exist
> > for speed, but your application doesn't need that. Just calculate the CRC
> > one bit at a time as they come in.
> >
> > -- Dave Tweed
>
> I like Dr. Imre Bartfai's no tables assembly code for CCITT-CRC16.
> http://www.piclist.com/techref/microchip/crc.htm

It's basically a standard byte based version of the bit by bit
implementation. I can do exactly the same for 15 bit words, realizing that
I have the time to do it. Fundamentally I already have a routine that
receives the data 1 bit at a time. So all I need to do is incorporate the
high bit test and the XOR as soon as I receive the next bit.

Considering that the cost is minimal and that the benefits are useful, I
think I now have a winner.

BAJ


2012\03\14@193933 by Robert Rolf

Is there some reason you cannot just do a SINGLE CRC16 or 32 on the whole
file, at the end of the bootload, to verify that the ENTIRE file is correct?

From what you posted initially, all you care about is good/bad for the
ENTIRE upload. Why bother with line by line checking if all you really need
is that the -whole- load be good? You said you were just going to redo the
entire bootload if a single line didn't verify as correct, so why bother
with all the bit twiddling to reduce individual bit errors?
K.I.S.S.

Do you also validate each flash line write to ensure that it 'took'?

I run 115200 TTL over 6' cables ALL the time for bootloading, and never have
an issue.

I just run a CRC on the entire image at boot time so I KNOW whether my load
was good or not. Doing it at boot time saves one having to do 'on the fly'
CRC computations (if you don't have the CPU power to do that).

Given your concern for a good upload, it would be prudent to also validate
the code (and stored values) image at run time.
Are you doing a CRC of the Flash image on startup to validate that the
memory image is uncorrupted by glitches, runaway code, etc.?
Even a simple checksum is better than doing nothing at all to verify code
integrity.

Robert

2012\03\14@200423 by Scott Dattalo


>
> It's basically an byte based standard version of the bit by bit
> implementation. I can do exactly the same for 15 bit words. Realizing that
> I have the time to do it. Fundamentally I already have a routine that
> receives the data 1 bit at a time. So all I need to do is incorporate the
> high bit test and the xor as soon as I receive the next bit.
>
> Considering that the cost is minimal and that the benefits are useful, I
> think I now have a winner.
>
> BAJ

Byron,

You may wish to look at some of the unrolled CRC implementations I wrote a
long time ago:

http://www.dattalo.com/technical/software/pic/crc.php

The CCITT 16-bit CRC can be implemented in just 17 instruction cycles:
http://www.dattalo.com/technical/software/pic/crc_1021.asm

Scott

2012\03\14@213356 by Byron Jeff

On Wed, Mar 14, 2012 at 05:39:32PM -0600, Robert Rolf wrote:
> Is there some reason you cannot just do a SINGLE CRC16 or 32  on the whole
> file,
> at the end of the bootload, to verify that the ENTIRE file is correct?

It doesn't really buy much. It would be usable if it were possible to load
the entire file into RAM before writing it into the program memory. But
with nearly 16K of program space in a 16F1938/39 and nowhere near that
amount of RAM, it doesn't give too much of a gain over checksumming
packets.

>
> >From what you posted initially, all you care about is good/bad for the
> ENTIRE upload.
> Why bother with line by line checked if all you really need is that the
> -whole- load  be good?
> You said you were just going to redo the entire bootload if a single
> line didn't verify as correct,
> so why bother with all the bit twiddling to reduce individual bit errors?
> K.I.S.S.

Because whether you do it at the block level or the file level, you still
have to process every bit into the CRC computation. The only real
difference is when you clear the remainder. While in theory it will save 15
bits/row, that difference isn't really enough to justify one model over the
other.

>
>  Do you also validate each flash line write to ensure that it 'took'?

Now that is doable since a single row can be saved long enough to verify it
was properly written. I plan to put a pause after every row of data to give
the chip time to write and verify it.
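
A sketch of what that verify step could look like; the 32 word row size and
the read_flash_word() helper are assumptions:

    #include <stdint.h>

    #define ROW_SIZE 32u                              /* assumed row size on the 16F193x */
    extern uint16_t read_flash_word(uint16_t addr);   /* assumed helper */

    /* After programming a row, read it back and compare to the buffered data. */
    static int row_verify(uint16_t row_addr, const uint16_t *buf)
    {
        for (uint16_t i = 0; i < ROW_SIZE; i++)
            if ((read_flash_word(row_addr + i) & 0x3FFFu) != (buf[i] & 0x3FFFu))
                return 0;                             /* mismatch: mark the download invalid */
        return 1;
    }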

>
> I run 115300 TTL over 6' cables ALL the time for bootload, and never have
> an issue.

I don't expect to have issues either. It's one of the reasons that I'm
fretting over how much complexity to encode. On the other hand, by spending
a bit of time preplanning, I can ensure that there is some safety if the
loader needs to be used in less friendly environments.

>
> I just run a CRC on the entire image at boot time so I KNOW my load was
> good or not.
> Doing it at boot time saves one having to do 'on the fly' CRC computations
> (if you don't have
> the CPU power to do that).

Now that's actually an excellent idea! That would be a good reason to have
a CRC over the entire image and simply store it with the image. It could be
generated as a separate command specifying the beginning and end addresses
of the image and the checksum. The loader could store it and verify that
the image is intact before running it.
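
Something along these lines, as a sketch; the helper names, the record
layout, and the initial CRC value are assumptions rather than anything
settled:

    #include <stdint.h>

    extern uint16_t read_flash_word(uint16_t addr);               /* assumed helper */
    extern uint32_t crc32_update15(uint32_t crc, uint16_t word);  /* assumed: folds one 15-bit word */

    /* Recompute the CRC over the stored address range and compare it to the
       checksum recorded by the "store image CRC" command. */
    static int image_is_intact(uint16_t start, uint16_t end, uint32_t stored_crc)
    {
        uint32_t crc = 0xFFFFFFFFu;                   /* initial value is an assumption */
        for (uint16_t a = start; a <= end; a++)
            crc = crc32_update15(crc, read_flash_word(a) & 0x7FFFu);
        return crc == stored_crc;
    }
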
>
> Given you concern for a good upload, it would be prudent to also validate
> the code
> (and stored values) image at run time.

Exactly.
>  Are you doing a CRC of the Flash image on startup to validate that the
> memory image is uncorrupted by glitches, runaway code, etc.?

I haven't gotten that far. I've spent all my time on the communications
side of the loader so far. Writing the image and executing the application
are much better known parts of the bootloader, for which I can pretty much
just lift the code from Frank's Pikme or any other PIC bootloader.

It's a great idea. I will definitely incorporate it.

> Even a simple checksum is better than doing nothing at all to verify code
> integrity.

Thanks for the input. I always knew that there was a use for
crowd-sourcing! ;-)

BAJ


2012\03\19@174523 by Byron Jeff

On Wed, Mar 14, 2012 at 05:04:19PM -0700, Scott Dattalo wrote:

I have an update. I went ahead and implemented a bit by bit version
of CRC32 using 15 data bits that are read from program memory. It's on my
laptop, which I don't have on me at the moment, but when I get a chance
I'll post it here:

http://kahuna.clayton.edu/byron/crc32/crc32.asm

I tested it using the first row of the program as input, processing 32
words of data. Works like a champ. It displays on an attached LED display I
have for the project, but that's tangential to the process.

I did take a look at some of Scott's implementations. Scott has a parallel
CRC-32 implementation here:

http://www.dattalo.com/technical/software/pic/crc_32.asm

that only uses a 32 byte table to do its job. I found a Circuit Cellar
article that uses the same concepts here:

http://outputlogic.com/my-stuff/circuit-cellar-january-2010-crc.pdf

The bottom line is that the table can be reduced from a linear table to a
logarithmic one: instead of having a single entry for each potential data
value, you code an entry for each bit position, then XOR together the table
entries for the bits that are set to get the result. So for example, if
there were a 15 bit element 0x0095, instead of having a specific entry for
it, you XOR the table entries for 0x0080, 0x0010, 0x0004, and 0x0001. Since
there's only a table entry for each bit position, instead of having 32,768
table entries, the table is reduced to 15 entries.
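
Here is that reduction as a C sketch for a CRC-32 over 15 bit input words,
using the standard CRC-32 polynomial and MSB-first shifting (the real
implementation's polynomial and bit ordering may differ):

    #include <stdint.h>

    #define CRC32_POLY 0x04C11DB7u      /* standard CRC-32 polynomial, MSB-first form */

    static uint32_t bit_table[15];      /* one entry per input bit position */

    /* Entry k is the CRC register after clocking a word with only bit k set
       (placed at the top of the 32-bit register) through 15 shift/XOR steps. */
    static void build_bit_table(void)
    {
        for (int k = 0; k < 15; k++) {
            uint32_t r = 1u << (k + 17);
            for (int i = 0; i < 15; i++)
                r = (r & 0x80000000u) ? (r << 1) ^ CRC32_POLY : (r << 1);
            bit_table[k] = r;
        }
    }

    /* Fold one 15-bit word into a running CRC-32: XOR the entries for the set
       bit positions instead of looking up one of 32,768 precomputed values. */
    static uint32_t crc32_word15(uint32_t crc, uint16_t word)
    {
        uint16_t idx = (uint16_t)(((crc >> 17) ^ word) & 0x7FFFu);
        uint32_t acc = crc << 15;
        for (int k = 0; k < 15; k++)
            if (idx & (1u << k))
                acc ^= bit_table[k];
        return acc;
    }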

Now the debate in my head is how much faster an unrolled implementation of:

check the bit - XOR the table entry if the bit is set

is over the bit by bit implementation. It looks like the real advantage is
that the 6 shifts are eliminated for each bit. But the replacement is the 4
table lookups required whenever a table entry is needed.

Since I'm no longer having to do this at wire speed, and my implementation
works, I think I'll just leave it with the one small modification of using
a common buffer for both the program data and the read CRC to save space.

My PIC implementation worked so well that it actually identified an error
in my host implementation that I was using for comparison. The host
implementation used an Intel Hex file reader I located here:

http://www.pjrc.com/tech/8051

I probably should have picked a better C implementation, but the code is
throwaway because I plan on recoding this in Python anyway and it was just
for testing. I had to add a routine that formatted the data into
Microchip's HEX16 format. I then wrote a bit by bit implementation that
computed the CRC for the first 32 words using 15 bits of data from each
word. I checked the CRC by rerunning the computation with the CRC tacked
on. A result of all zeros meant I was good, and a triple check with a bad
bit or two verified that the computation was working.

So when I started on the PIC implementation, it generated a bad checksum. I
finally deduced that the value in the host implementation for address 7
wasn't what was stored in the PIC memory at address 7. I realized that the
config word was being mapped into the program memory space because the hex
file line that changes the upper address was being ignored by the code.
After fixing it, it worked fine.

So I'm hoping to finish up this week. I'll keep everyone updated.

BAJ
