PICList Thread
'EEPROM endurance/error correction (long)'
2000\01\26@020439 by Roland Andrag

Hello Everyone!

Thanks for the overwhelming response to my question! I received posts from
Ken Webster, Ian Rozowsky, Nikolai Golovchenko, Ling SM, Wagner Lipnharski
and Mike Harrison (no particular order), many of which were quite long.
Thanks for the trouble!  I have tried to address all the posts below.  This
may or may not have been a good idea, since the result is an incredible
amount of babbling. Any comments welcome, as always.

I had suggested the following:

>1. Have a pointer (in EEPROM) pointing to where the variable of interest is
>repeatedly stored;
>2. Store the variable a couple of times in successive locations (say three
>times);
>3. When reading, if all the stored values (all three in line 2) do not
>agree, use the value given by the majority;
>4. Once all three positions do not agree, move on to three new locations
>and update the pointer.

Mike mentioned that this could be risky since more than one cell could fail
at the same time. I find this extremely unlikely, unless, as Mike pointed
out, a change in temperature caused a shift in the read threshold. Is it
possible that a cell may read fine at 25 deg. C and not at 75?

The application notes clearly state that the endurance limit goes down as
the temperature increases since more damage is caused to a cell when it is
programmed at a higher temperature.  The notes don't however discuss read
threshold vs. temperature.  Mike, could you also explain why it would be
better to store the byte and its complement, for example, as opposed to just
storing the same byte a couple of times?

Your suggestion about spreading the variable has been taken to heart. It has
two drawbacks I can think of though: 1. Increased memory usage; 2. What if
one of the locations that the data is spread across fails? Would I ever
know? A good idea though.

Wagner, the Microchip AN537 clearly states that: "..if a part is rated to
100K E/W cycles, then each individual byte can be erased and written 100K
times. The part is NOT limited to a total of 100K E/W opcodes or control
bytes".  This is the way that you had it in paragraph 2 of your post, if I
understand you correctly. You cast some doubt on this in paragraph 4 though.
Your statement that an EEPROM lasts longer when programmed in block mode is
also confirmed by AN537.  The pointer I described and you referred to should
not wear out first, since it would only be changed in the case of an EEPROM
bit failure in one of the locations where the data is stored.  An
interesting question though: what if the pointer does fail?

Ian, your idea of reading back directly after writing is of course spot on.
Two possible problems though: 1. Power failure after write; 2. Is it
possible that a location reads fine immediately after being written, but not
a month later? I realise that the probability of case 1 happening on the one
write where there is a problem with the eeprom is staggeringly small.  Using
three locations does solve the problem of power failure then - for example,
say you read back the three locations as:

169 169 169: All is fine, data agrees.
169 169 201: Read data as 169 (majority). Possible that power failure
occurred after writing 1st two, or that last location failed.
169 133 221: Possible that power failure occurred while byte 2 was being
written, or that two locations failed at the same time (unlikely?). Read
169.
222 223 223: First location failed, read 223. Also possible that power
failed after writing location 1. In my case I'm basically storing a 16 bit
counter, so if I end up with the previous value it is not a train smash.
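
The read-back cases above can be sketched as a small voting routine. This is only an illustration in Python, not code from the thread: the name `read_voted` is hypothetical, and falling back to the first copy when no two copies agree is an assumption taken from the "169 133 221" case.

```python
from collections import Counter

def read_voted(copies):
    """Return (value, unanimous) for a list of redundant byte copies.

    A value held by more than half of the copies wins. If no value has a
    majority, fall back to the first copy (assumption: copies are written
    in order, so the first one is the most likely to be complete).
    """
    value, votes = Counter(copies).most_common(1)[0]
    if votes * 2 > len(copies):
        return value, votes == len(copies)
    return copies[0], False
```

A result with `unanimous` false is the signal to move on to three new locations and update the pointer.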

Ling, your idea of storing the data on a different chip is also a good one.
In my case I will have more than one chip, so it will be used.  Multiple
copies in the same chip, however, will NOT hasten the failure of the chip,
if AN537 is to be believed, since the EEPROM deteriorates on a cell by cell
basis.

Nikolai, thanks for your post.  The pointer you (and I) were referring to
would only be updated if one of the locations storing data failed.  It
should thus have a long and prosperous life.  If it does fail,
however, I would be in deep trouble...  I am sorry that I didn't make clear
that I was talking about serial EEPROMS, since the rest of your post seems
to pertain to the 16F84. Thanks again!

Ken, (last but not least!), thanks for your post.  I think I addressed the
idea of spreading the data across a circular buffer somewhere way back in
this way too long message.  I still can't decide which is better for my
purposes.  How would NASA do this so that *nothing* could go wrong?

Thanks again, sorry for the long message..
Roland

2000\01\26@033327 by V sml

What I meant by "multiple (N) copies on the same chip" is that the number
of writes to the chip/cell increases N times.  Assuming an even
distribution, you are reducing the lifetime of the chip/cell by a factor
of N.  If your total number of writes to the cell is X (without
duplication), then it comes down to comparing X, 100K and 100K/N:

If your X is < 100K and more than 100K/N, then implementing N copies
would need more consideration.  Are you pushing the chip to failure
unnecessarily?

If X is > 100K then EEPROM failure would occur N times as often, compared
to no duplication.

If X << 100K/N then .. no issue.

BTW, mind sending me your code when done?

Dwaine Reed, I'm still interested in your code.

Cheers,  Ling SM

>Ling, your idea of storing the data on different chip is also a good
>one.  In my case I will have more than one chip, so it will be used.
>Multiple copies in the same chip, however, will NOT hasten the failure
>of the chip, if AN537 is to be believed, since the EEPROM deteriorates
>on a cell by cell basis.

2000\01\26@033337 by Jason Harper

> In my case I'm basically storing a 16 bit counter

In that specific case, here's another idea: do a web search on Gray code,
an alternative to binary coding in which exactly one bit changes between
successive values.  Store each bit as a byte with a value of 00 or FF, and
use a count of the bits set to determine the value when reading.  Possibly
use multiple bytes to store a bit, especially the lower bits which change
more often, if you have the EEPROM space to spare.  A power failure during
write can cause an error of at most 1 in your count, since only one bit is
actually changing.

This is based on the assumption that the bits in an EEPROM location being
written fail individually, and that at least some of them will continue
working far beyond the time at which the first bit fails.  Does anyone have
any statistics on the types of EEPROM failures that occur (written 1's
reading as 0's versus written 0's reading as 1's)?  That would affect the
optimum threshold for the number of bits set in a byte to decide if it
represents a 0 or a 1 in the Gray coded number.
       Jason Harper
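
A rough sketch of the scheme described above, in Python with a bytearray standing in for the EEPROM. The function names, the 16-cell layout (bit 0 first) and the threshold of 4 set bits are all assumptions made for illustration; as the post notes, the best threshold depends on how cells actually fail.

```python
WIDTH = 16      # counter width in bits, one EEPROM byte per bit
THRESHOLD = 4   # assumed: a cell with at least 4 bits set reads as 1

def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse of to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def cell_is_one(cell):
    """Decide a cell's logical value by counting its set bits."""
    return bin(cell).count("1") >= THRESHOLD

def encode_counter(count):
    """Cells for a count: 0xFF for a Gray-code 1 bit, 0x00 for a 0 bit."""
    g = to_gray(count)
    return bytearray(0xFF if (g >> i) & 1 else 0x00 for i in range(WIDTH))

def decode_counter(cells):
    """Recover the count, tolerating a few bad bits in each cell."""
    g = 0
    for i, cell in enumerate(cells):
        if cell_is_one(cell):
            g |= 1 << i
    return from_gray(g)

def increment(cells):
    """Advance the stored count by one, rewriting only the cell that changes."""
    new_gray = to_gray((decode_counter(cells) + 1) % (1 << WIDTH))
    for i in range(WIDTH):
        want_one = bool((new_gray >> i) & 1)
        if cell_is_one(cells[i]) != want_one:
            cells[i] = 0xFF if want_one else 0x00
```

Because consecutive Gray codes differ in one bit, `increment` touches exactly one cell, so an interrupted write leaves either the old count or the new one.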

2000\01\26@071304 by Roland Andrag

Jason, I'm very impressed - very good idea!!

Use one byte/bit (since that is the smallest unit of memory that can be
programmed), and using Gray code only one bit ever changes from count to
count! If a bit fails there are eight copies! Not bad at all since I have
enough memory going round.

Thanks
Roland




2000\01\26@071510 by Roland Andrag
Ling, I'm with you - and I do have enough memory going round to be able to
afford writing more copies (I'll never be able to wear out all the cells in
the chip, even if I wrote 100 copies of the variable of interest).

Cheers, thanks for your reply
 Roland



2000\01\26@104313 by wwl

> In my case I'm basically storing a 16 bit counter

A really easy to code method for a 16 bit counter, with 16x wear
reduction :
Store the MSbyte at a fixed location (written only once in 256 counts,
so no wear problem), then use the bottom 4 bits of the MSbyte to
select one of 16 locations to store the LSByte. Obviously you could
use 3 or 5 bits etc. as required.
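
One way this layout might look, sketched in Python with a bytearray as a stand-in for the EEPROM. The addresses `MSB_ADDR` and `LSB_BASE` and the function names are hypothetical. Writing the low byte before the high byte is an added assumption, not part of the post; it is chosen so that an interrupted rollover leaves the previous count readable.

```python
MSB_ADDR = 0      # fixed cell for the high byte (assumed layout)
LSB_BASE = 1      # first of the cells that take turns holding the low byte
SELECT_BITS = 4   # 4 bits -> 16 low-byte cells, i.e. 16x less wear on each

def _slot(msb):
    """Low-byte cell selected by the bottom SELECT_BITS bits of the high byte."""
    return LSB_BASE + (msb & ((1 << SELECT_BITS) - 1))

def write_counter(eeprom, count):
    """Store a 16-bit count, spreading low-byte writes over several cells."""
    msb, lsb = (count >> 8) & 0xFF, count & 0xFF
    eeprom[_slot(msb)] = lsb          # low byte first, into the slot the new MSB selects
    if eeprom[MSB_ADDR] != msb:       # high byte changes only once per 256 counts
        eeprom[MSB_ADDR] = msb

def read_counter(eeprom):
    """Read the count back from the cell the stored high byte points at."""
    msb = eeprom[MSB_ADDR]
    return (msb << 8) | eeprom[_slot(msb)]
```

With `SELECT_BITS = 4` the simulated device needs 17 bytes: one for the high byte and 16 for the rotating low byte.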

2000\01\26@134751 by Ken Webster

>purposes.  How would NASA do this so that *nothing* could go wrong?


Dunno how they would do it but I would consider something like:

1). More than one pointer to the active storage location
2). True and complement versions of the data in the active location ...
checksums and CRCs of the data if it is longer than just a couple of bytes
(otherwise just more copies for extra redundancy)
3). A power supply circuit that gives a warning several milliseconds before
the power drops below normal operating voltage so that any activity that
might trigger an EEPROM write is avoided yet there is enough time to
complete an EEPROM write if one is already occurring

As far as point 3 goes, a variation of this is actually how I solved my
EEPROM updating problem.  I simply kept the data in SRAM where frequent
updates were not a problem.  Then, when the "power dropping" interrupt
occurs, the data is copied from SRAM to EEPROM.  Even if someone made a test
fixture that repeatedly cycled the power once per minute, it would still
take about 2 years before the EEPROM had been written 1E6 times.

To give several hundred milliseconds warning when the power is failing I
used a large electrolytic capacitor (10000uF) charged to about 25 volts
along with an LT1111 switching regulator.  The LT1111 has an extra
comparator with open-drain output and an internal voltage reference as one
of its inputs.  A pair of resistors was connected to divide the capacitor
voltage so that the comparator's threshold is set at about 16 volts.  The
comparator output was connected to the PIC's external INT input (with a
pullup resistor, of course, since the comparator output is open-drain).
Since the LT1111 can still produce a 5V output with the input as low as
about 6V, the warning interrupt was delivered with about 1 second worth of
good power left.  This was enough time to write a large block of data to the
EEPROM and even read it back and write the data to an alternate location if
there was any problem.  I think NASA would have approved :o)
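
The figures in this post can be sanity-checked with a little arithmetic. The sketch below is an illustration only: the function names are made up, and the load power and converter efficiency are not given in the post, so they are left as parameters.

```python
def years_to_reach(writes, writes_per_minute=1.0):
    """Years needed to accumulate `writes` EEPROM writes at a steady rate."""
    minutes = writes / writes_per_minute
    return minutes / (60 * 24 * 365)

def holdup_energy_joules(cap_farads, v_start, v_end):
    """Energy released by a capacitor discharging from v_start to v_end."""
    return 0.5 * cap_farads * (v_start ** 2 - v_end ** 2)

def holdup_seconds(cap_farads, v_start, v_end, load_watts, efficiency=1.0):
    """Hold-up time for an assumed constant-power load behind the regulator."""
    usable = holdup_energy_joules(cap_farads, v_start, v_end) * efficiency
    return usable / load_watts

# One power cycle per minute takes about 1.9 years to reach 1E6 writes.
print(round(years_to_reach(1e6), 2))

# 10000 uF falling from the 16 V warning threshold to roughly 6 V.
print(round(holdup_energy_joules(10000e-6, 16.0, 6.0), 3))
```

The capacitor gives up about 1.1 J between the warning and the regulator's minimum input, so "about 1 second" of good power would correspond to a load of roughly 1 W. That load figure is an inference, not something stated in the post.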

Cheers,

Ken

2000\01\26@160904 by Alan King

Not so sure about this: in most EEPROM/flash the whole byte is erased first,
then rewritten.  So you could still end up with bits that should be
programmed reading as clear instead.



> Jason, I'm very impressed - very good idea!!
> > more often, if you have the EEPROM space to spare.  A power failure during
> > write can cause an error of at most 1 in your count, since only one bit is
> > actually changing.

2000\01\26@173410 by Jason Harper

> Not so sure about this, the whole byte is erased first, then rewritten
> in most eeprom/flash.  So could still end up bits that should be
> programmed are clear instead..
>
> > Jason, I'm very impressed - very good idea!!
> > > more often, if you have the EEPROM space to spare.  A power failure
> > > during write can cause an error of at most 1 in your count, since
> > > only one bit is actually changing.

The whole point of using Gray code is that _only one bit changes at a time_
(which means one byte changes at a time, in the form in which it would be
written to EEPROM).  It's completely irrelevant what the actual update
behavior of the EEPROM is - the bit is either going to read back as the
same value as before (in which case the stored value doesn't change, and
the next increment will rewrite the same bit with the right value), or it
will have changed to its proper new setting (in which case everything
continues to work properly, even if the actual byte value wasn't quite what
was intended due to a write failure of some sort).  Again, this suggestion
was for the specific case of storing an incrementing counter that can live
with a very occasional failure to increment.  It's useless for storing an
arbitrarily changing value.
       Jason Harper

2000\01\27@132303 by wagner

.. by paying $500 for a single e2prom... or do you think their contractors
use $2 eeproms?

Why do you think that a missile or a satellite costs millions of dollars?
Or are you trying to say that the MarsLander failed because it overused
a cheap e2prom in the tracking system?

:)

>  How would NASA do this so that *nothing* could go wrong?


2000\02\08@160817 by Mark Willis
Some other thoughts here:

What about this, this method doesn't need any pointer - you can
determine the pointer to the block to use, at run time.

Split the EEPROM up into N sections, whatever size is good for you.

When you first program the EEPROM, you overwrite all sections but the
last with some flag value (0xFF or 0xA5 or whatever's convenient).  You
then start using the last page (which doesn't have the flag value in it)
as the current "valid page".

On PIC code startup, do a "for" loop, searching for the first page in
0..N-1 that doesn't have the Flag value in it.  Use that page to store
your data.

If and when a failure occurs on read-back, decrement the Use_Block
counter by one.  If it hits zero, tell the user "Replace me soon!" - and
you still have use of page 0 for a while at least.

Advantages:  No Index value needed, you just seek the currently used
block and see it when you hit it;  You can forewarn the user with an LED
or Piezo on a "spare" pic pin (not that this mythical beast exists!)
that they're down to the last block of usable Flash.
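
The pointer-free scheme above might be sketched like this, in Python with a bytearray standing in for the EEPROM. All names here are hypothetical, and the sketch assumes that real data never consists entirely of flag bytes, since such a page would look unused.

```python
FLAG = 0xFF   # assumed flag value marking a page that is not yet in use

def format_eeprom(page_size, n_pages, initial, flag=FLAG):
    """Fill every page with the flag, then put the initial data in the last page."""
    assert len(initial) == page_size
    eeprom = bytearray([flag]) * (page_size * n_pages)
    eeprom[(n_pages - 1) * page_size:] = initial
    return eeprom

def find_active_page(eeprom, page_size, n_pages, flag=FLAG):
    """Startup scan: first page in 0..N-1 that is not entirely flag bytes."""
    for page in range(n_pages):
        chunk = eeprom[page * page_size:(page + 1) * page_size]
        if any(b != flag for b in chunk):
            return page
    return None

def retire_page(eeprom, page_size, active, data):
    """After a read-back failure, move the data down one page.

    Returns (new_page, warn); warn is True once page 0 is in use, which is
    the moment to tell the user "Replace me soon!".
    """
    assert len(data) == page_size
    if active == 0:
        raise RuntimeError("no spare pages left")
    new_page = active - 1
    start = new_page * page_size
    eeprom[start:start + page_size] = data
    return new_page, new_page == 0
```

Since the newly used page always sits below the worn-out ones, the startup scan finds it first and no stored index is needed.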

Reasons for using complemented copy storage, or a CRC:  The complement is
fast to calculate (an inverted copy works; you can use a 1's or 2's
complement).  The CRC takes more resources to compute but is a lot smaller
in required storage.  Use what works for you.
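
A minimal sketch of complemented-copy storage using the 1's complement, in Python. The names `pack` and `unpack` are made up for the example.

```python
def pack(value):
    """Return the two bytes to store: the value and its bitwise inverse."""
    return bytes([value & 0xFF, ~value & 0xFF])

def unpack(pair):
    """Return the stored value, or None if the pair fails the complement check."""
    value, check = pair
    if value ^ check != 0xFF:
        return None
    return value
```

One property worth noting: a pair of cells that both read 0x00 or both read 0xFF (for example after an erase with no write) fails this check, whereas two plain copies of the same byte would agree with each other.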

There's not really any advantage to Gray codes here;  The whole reason
for Gray codes is for situations where one bit is likely to be "Fuzzy",
due to sensor "edges" like in an optical encoder - if you get a one bit
error on, say, channel 5, because the black/white transition is in the
middle of passing through the sensor's field of view, your error's a 1/2
bit "Uncertainty" - which is QUITE acceptable, really.  Now, contrast
that with a regular binary count, say moving from position 3 to 4 here:

State  Gray    Binary
0      0000    0000
1      0001    0001
2      0011    0010
3      0010    0011
4      0110    0100

(Hoping I remembered Gray code properly here <G>)  In Gray code, bit 2
can be fuzzy - we are told that we're in either Position 3 or 4, and as
we're in fact somewhere in between those two, we won't complain, we're
pretty happy.  If you want more resolution, just add one more bit of
Gray coding etc.

In Binary, we're told that we're somewhere between positions 0 and 7 -
depending on whether bits 0, 1, and 2 read as a 0 or a 1 - and they're
all three in flux here!  That is just plain NOT acceptable, you end up
with a mess on each edge transition; according to Murphy, you're almost
always AT an edge transition, of course.

In a Flash ROM or EEPROM, there's no mechanical object making the
"fuzziness" of any one bit any more likely than any other bit - so all
Gray codes would do for you is give the PIC something to do in its
spare time, and give you more code to maintain.  Good to know about
these tools, don't use a Hammer to do the jobs that need a jewelers'
screwdriver, though, Please. <G>

 Mark

--
I re-ship for small US & overseas businesses, world-wide.
(For private individuals at cost; ask.)
