PICList Thread
'[PIC]:failed data location?'
2002\11\12@092847 by Micro Eng

Found a strange problem on the '877 I've been using for a few months.  One
of the registers all of a sudden stopped working.  After debugging a small
code change, I finally went back and found that the flags were not being set
right; in fact, I couldn't write to that register at all.  I moved it one
location further down, and now it works.  To verify, I put in a movlw/movwf
pair, and the register never accepts the data.

So, question is...anyone else all of a sudden found a register that decided
to fail? Makes me a little nervous to release something that might have a
failure in the future.

In addition, using the ICD to program the chip in circuit, it seems that I
don't always get a good burn. In other words, I can program it once, and
something doesn't work quite right, and then I can reburn, and it does work.
 Anyone else seeing this as well?


--
http://www.piclist.com hint: To leave the PICList
spam_OUTpiclist-unsubscribe-requestTakeThisOuTspammitvma.mit.edu


2002\11\12@100709 by 4HAZ

----- From: "Micro Eng" <micro_eng@
- snip -
> In addition, using the ICD to program the chip in circuit, it seems that I
> don't always get a good burn. In other words, I can program it once, and
> something doesn't work quite right, and then I can reburn, and it does work.
>   Anyone else seeing this as well?

Back in the days of EPROM programming, the specs said 50ms max burn per byte.
Many EPROM programmers used the full 50ms/byte figure, which resulted in the EPROM getting a bit hot, and the programming time could run close to 15 minutes for a 16k(byte) device, which seemed like forever. Then along came a new "fast" burner that could do the job in a matter of seconds. It seemed like a big improvement until you started having bits fail a day or more after you had got a good verify. Looking back over the EPROM spec sheets, I noticed they said that burning 5 times the minimum required to program (up to a max of 50ms in one session) would ensure the device stayed programmed. New procedure: program at max speed, and if verify failed, program again; then multiply the number of programming cycles it took to put the data in by 4 and burn that many times more. Shortly after this, new software was released that programmed each byte in 1ms bursts, read it back, kept looping for a max of 10 shots, then burned it for 4 times that.
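The "fast" adaptive procedure described above (1ms pulses with a read-back after each, up to 10 shots, then 4 times as many extra pulses) can be sketched in C. This is an illustration under stated assumptions, not any programmer's actual firmware: the EPROM cell is simulated, and sim_cell, apply_pulse, read_back and program_byte are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical simulated EPROM byte: it reads back correctly only after
 * pulses_needed programming pulses have been applied. */
typedef struct {
    uint8_t target;         /* value being programmed */
    int     pulses_needed;  /* cell-specific; unknown to the programmer */
    int     pulses_applied; /* running total of 1 ms pulses */
} sim_cell;

/* Stand-in for one 1 ms programming pulse. */
static void apply_pulse(sim_cell *c) { c->pulses_applied++; }

/* Erased EPROM reads back 0xFF until the cell has taken enough pulses. */
static uint8_t read_back(const sim_cell *c) {
    return (c->pulses_applied >= c->pulses_needed) ? c->target : 0xFF;
}

enum { MAX_PULSES = 10, OVERPROGRAM_FACTOR = 4 };

/* Pulse and verify until the byte reads back correctly (at most MAX_PULSES
 * tries), then apply OVERPROGRAM_FACTOR times as many extra pulses.
 * Returns the total number of pulses applied, or -1 if verify never passed. */
static int program_byte(sim_cell *c) {
    for (int n = 1; n <= MAX_PULSES; n++) {
        apply_pulse(c);
        if (read_back(c) == c->target) {
            for (int i = 0; i < OVERPROGRAM_FACTOR * n; i++)
                apply_pulse(c);
            return c->pulses_applied;
        }
    }
    return -1; /* byte never verified: reject the part */
}
```

For a cell that needs 3 pulses to verify, program_byte applies 3 + 4 x 3 = 15 pulses, i.e. 5 times the minimum, which lines up with the 5x figure from the spec sheets; a cell that still fails after 10 shots is reported as -1.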

My suggestion is to program 5 times in a row to ensure data integrity (while monitoring the temp with a finger to protect against cooking the device), until they come up with a patch for the programming software.

$.02 Lonnie - KF4HAZ -



2002\11\12@104123 by Olin Lathrop

>>
My suggestion is to program 5 times in a row to insure data integrity
(while monitoring the temp with a finger to protect against cooking the
device), until they come up with a patch for the programming software.
<<

This probably won't work at all on PICs.  First, the flash parts have a
self-timed write.  All the programmer has to do is wait the maximum time
the write could take.  Second, it is unlikely that a programmer will
give the user the ability to re-write the data without
first erasing the part.


*****************************************************************
Embed Inc, embedded system specialists in Littleton Massachusetts
(978) 742-9014, http://www.embedinc.com



2002\11\13@043807 by nigel

> So, question is...anyone else all of a sudden found a
> register that decided to fail?

I've had a couple of 16F628's display that fault, straight out of the tube.
Two locations in RAM were 'stuck' at 0xFF.  It was literally only one or
two devices.

I asked here at the time (a few months ago) and no-one else had had the
same experience.  I wrote some code to test for it (write 0xAA to each
location, check it, write 0x55, check it, repeat for all RAM), but it
seemed to be a rare fault.
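A C sketch of that test follows. On a PIC it would normally be written in assembly with indirect addressing (FSR/INDF); here the RAM is a simulated array so the logic can be tried off-chip, and sim_ram, RAM_SIZE, the stuck-cell fields and the function names are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE 96 /* hypothetical size of the simulated RAM */

typedef struct {
    uint8_t cell[RAM_SIZE];
    int     stuck_addr;  /* index of a stuck location, or -1 for none */
    uint8_t stuck_value; /* what the stuck location always reads back */
} sim_ram;

static void ram_write(sim_ram *r, size_t addr, uint8_t v) { r->cell[addr] = v; }

static uint8_t ram_read(const sim_ram *r, size_t addr) {
    if ((int)addr == r->stuck_addr)
        return r->stuck_value; /* simulated fault: writes have no effect */
    return r->cell[addr];
}

/* For each location in turn: write 0xAA, check it, write 0x55, check it.
 * Returns the first failing address, or -1 if every location passed. */
static int ram_pattern_test(sim_ram *r) {
    static const uint8_t patterns[] = { 0xAA, 0x55 };
    for (size_t addr = 0; addr < RAM_SIZE; addr++) {
        for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
            ram_write(r, addr, patterns[p]);
            if (ram_read(r, addr) != patterns[p])
                return (int)addr;
        }
    }
    return -1;
}
```

Because it overwrites every location it tests, a check like this belongs at power-up, before variables are initialised. Checking each location immediately after writing it catches stuck cells such as the 0xFF ones described here, but not faults where writing one address disturbs another; catching those would need something like a march-style test that writes a whole pattern before reading it back.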

Incidentally, all devices were handled with normal static precautions, and
the target board has been in manufacture for several years (a few thousand
per year) without problems.

Nigel

--
Nigel Orr, Design Engineer                     .....nigelKILLspamspam.....axoninstruments.co.uk
Axon Instruments Ltd., Wardes Road, Inverurie, Aberdeenshire, UK, AB51 3TT
             Phone: +44 1467 622 332   Fax: +44 1467 625 235
                     http://www.axoninstruments.co.uk



2002\11\13@081056 by Olin Lathrop

> I've had a couple of 16F628's display that fault, straight out of the tube.
> Two locations in RAM were 'stuck' at 0xFF.  It was literally only one or
> two devices.

I got a batch of 16F628 that didn't verify correctly at 3V when programmed
correctly at 5V.  They did verify correctly at 4V and 5.5V.




2002\11\13@083447 by nigel

Olin wrote:
> I got a batch of 16F628 that didn't verify correctly at 3V
> when programmed correctly at 5V.  They did verify correctly at 4V and 5.5V.

I thought the OP had a problem with bad RAM (Subject: "[PIC]:failed
data location"); you seem to be talking about ROM/FLASH?  To clarify, I've
had a couple of PICs, each with a couple of RAM locations that were 'stuck'
at 0xFF.

Nigel



2002\11\13@102706 by 4HAZ

Shows that as the gray-matter moves toward gray-hair the understanding of the internal workings of the technology passes us by.
Lonnie - KF4HAZ -

----- From: "Olin Lathrop" <olin_piclist@



2002\11\13@165932 by Peter L. Peres

On Tue, 12 Nov 2002, Micro Eng wrote:

*>In addition, using the ICD to program the chip in circuit, it seems that I
*>don't always get a good burn. In other words, I can program it once, and
*>something doesn't work quite right, and then I can reburn, and it does work.
*>  Anyone else seeing this as well?

I haven't seen this, but with some chip+board+ICSP programmer combinations,
and with chips re-used dozens (?) of times in development cycles, I did have to
issue up to four consecutive erase commands (without programming anything)
to restore reliable operation in chips used for development. Windowed parts
in particular (where the erase command was 1/2 day spent under 1kW of Hg
vapor lamps; normal erase time is 10 minutes) had this sort of problem
(like the 16C711JW and 12C509JW).

Peter



2002\11\13@173654 by Micro Eng

To clarify, it was in the data RAM area, not the program space or EEPROM
space.  The location had been working fine and then all of a sudden failed.
Changed to a new location, and it works fine again.






2002\11\14@114317 by Dennis J. Murray

This "Failed Location" problem has me wondering -- has this problem ever
been experienced with other chips (i.e. the '509A series)??  I KNOW it's an old
style chip & I should have moved to its replacement, but I have a lot of
the '509's around.  Ya use what ya got!!

I've had an occurrence recently of a system failure that "can't happen".
The circuit uses a photodetector to measure sunlight, then activates
external circuitry ONCE for dawn, then once again at dusk.  This cycle
repeats every day.  I use a RAM flag that toggles when the chip activates
the external circuit.  The flag must be 1 in order to trigger the circuit at
dusk and 0 in order to trigger the circuit at dawn.  The circuit is now
consistently only triggering at dusk, which technically cannot occur!

I performed a search on the assembly code for the flag and it IS only set in
the external circuit activation routine - which means the flag will toggle
(between 0 and 0xFF) on each call (one at dawn and one at dusk).  The program
will not allow 2 dusk activations back-to-back -- you MUST have a dawn
activation in between in order to activate the logic to check for dusk!  I
read the program back from the chip and compared it to the original, and they
match.

The circuit is running off a 12V motorcycle battery with a drop-down
regulator supplying 5 volts.  I checked - the voltage is still 12.8 volts on
the battery & the regulator's still doing its thing.  This particular
circuit had been running flawlessly since May and just failed 2 weeks ago.

The only way I can figure that the circuit is behaving like this is if the
RAM location holding my flag is stuck.  This would explain everything, but
seems like I'm copping out.  Has anyone else experienced anything similar??

BTW, I'm not a real newbie at programming.  I've been doing assembler
language professionally since the mid-60's (but I'm not senile yet, either -
I don't think!).

Thanks!
Dennis





2002\11\14@130559 by Mike Harrison

I'd be highly surprised if it was a hardware problem - it's probably
doing something you're not looking for: bank-select bit problems,
watchdog timing out, accidental variable overlaps, or unintentionally
overwriting the byte the flag is held in. Just because your search
only shows it being set in one place doesn't mean there aren't other
ways it could get changed. Of course, in this instance you could try
changing the chip - if this fixes it, it may still be an uninitialised
variable issue.

On Thu, 14 Nov 2002 11:42:29 -0500, you wrote:





2002\11\14@131800 by Doug Hewett

Does the software logic make assumptions re the number of daylight hours?  Do you have 'daylight saving time' in the UK?  What factors can fool the hardware?


2002\11\14@132424 by Dennis J. Murray

Uninitialized variables are a real newbie problem - trust me, that ain't it!!
Everything's running in the same bank, so that's not it.  No interrupt
routines (interrupts disabled) - everything's polled, so that's not it.

Reread what I wrote - this same chip's been working properly for about 5
months - day in and day out.  When it failed, it failed consistently - not
random failures!  Hard reset didn't fix it either.

New chip's working fine, but my question is - how long??


2002\11\14@133046 by Dennis J. Murray

The only factor I can think of is a solar eclipse, which might fool the
circuit into thinking dusk came early.  The routine waits 8 hours after
triggering before looking for the opposite event, i.e. if we just triggered
DUSK, it will not look for DAWN for 8 hours, & vice versa.  Even if we had a
solar eclipse, it would resynchronize itself the next day.  Tested that.

There are no code overlaps either - I looked for that early on.  I use ONLY
variable names - never direct RAM locations in code.  All variables are 1
byte long, so none of them would overflow into the adjacent byte.

The routine couldn't care less about daylight saving time - only that 8 hours
or more have elapsed since the last valid event (dusk or dawn).
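To make the suspected failure concrete, here is a minimal C model of the logic described above. It is a sketch, not the original assembly: light_ctl, light_step, the hour-count clock and the flag_stuck fault-injection field are hypothetical. The flag values (0x00 waiting for dawn, 0xFF waiting for dusk), the single place the flag is toggled, and the 8-hour lockout follow the description.

```c
#include <stdbool.h>
#include <stdint.h>

enum event { EV_NONE, EV_DAWN, EV_DUSK };

enum { LOCKOUT_HOURS = 8 }; /* ignore the sensor for 8 h after a trigger */

typedef struct {
    uint8_t flag;          /* 0x00: waiting for dawn; nonzero: waiting for dusk */
    long    last_event_hr; /* hour count of the last trigger */
    bool    flag_stuck;    /* fault injection: writes to flag are lost */
} light_ctl;

/* Called periodically with the current hour count and the photodetector state. */
static enum event light_step(light_ctl *c, long now_hr, bool is_light) {
    if (now_hr - c->last_event_hr < LOCKOUT_HOURS)
        return EV_NONE;
    enum event ev = EV_NONE;
    if (c->flag == 0 && is_light)
        ev = EV_DAWN;
    else if (c->flag != 0 && !is_light)
        ev = EV_DUSK;
    if (ev != EV_NONE) {
        /* the activation routine: the only place the flag is written */
        if (!c->flag_stuck)
            c->flag ^= 0xFF;
        c->last_event_hr = now_hr;
    }
    return ev;
}
```

With flag_stuck set and the flag reading 0xFF, every dark reading after the lockout returns EV_DUSK and EV_DAWN never occurs, which is exactly the reported symptom. The model only shows that a stuck location is sufficient to cause it; a stray write from elsewhere in the program would look the same.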

BTW: For what it's worth, I'm not in the UK, I'm in the USA - Virginia to be
exact.


2002\11\14@134104 by Josh Koffman

Shot in the dark, but maybe the device isn't seeing 8 hours of darkness.
Could something have changed about the housing? Its orientation for
instance? Or what about spill from nearby lighting? I know it's working
with the new chip, which probably means my suggestions are pointless.

Josh
--
A common mistake that people make when trying to design something
completely foolproof is to underestimate the ingenuity of complete
fools.
       -Douglas Adams




2002\11\14@135328 by Paul Hutchinson

Here's a wild guess :-)

What could have changed in the last 5 months, low temperatures?

In the NE US we've recently hit the time of year for regular below 0 degC
nights. So, we have to move commercial temperature range chips indoors.

I think the mountains of VA are starting to get cold too.

Paul


2002\11\14@135331 by Dennis J. Murray

Less than 8 hours of darkness (or daylight) means that the device triggers
that much AFTER dusk/dawn, rather than right at dusk/dawn.  It then waits 8
hours before checking for the next event.



2002\11\14@140718 by Dennis J. Murray

Yes, that is a consideration.  Temps at night have gotten to freezing in the
mountains (where this is located).  Are you thinking that I have a chip
that's sensitive to cold?  The new one seems to be working great in the same
environment.


2002\11\14@143505 by Paul Hutchinson

All chips (all electronic components actually) are sensitive to temperature.
The chip designers have to take the operating range into account when
designing them.

If the chip is a commercial temperature range PIC, it is only rated for
reliable operation from 0 to 70 degC. Industrial temperature range gets
you -40 to +85 degC operating range.

Operating the chip outside the specified temperature range may cause faulty
operation and/or permanent damage to the chip. That's why Microchip, and all
other chip makers, have operating temperature specs.

That said, I'm sure many people on the list have used PICs outside of their
rated temperature range. It may be that the RAM in the first PIC failed due
to temperature outside of the specified range, and that the new chip will work
at much lower temperatures with no failure. Using a part outside its rated
temperature range is like overclocking a micro: it might work, but don't
count on it.

Paul


2002\11\14@150153 by Olin Lathrop

> I've had an occurrence recently of a system failure that "can't happen".

World, please spare me another "the compiler has a bug since my perfect
program won't run" lament!

> The only way I can figure that the circuit is behaving like this is if the
> RAM location holding my flag is stuck.

The ***ONLY*** way!!?  You don't suppose for a second that you could have
done something wrong?

> This would explain everything, but
> seems like I'm copping out.

Yes, you are.

You need a serious attitude adjustment if you ever want to be successful
at bug fixing.  The only valid question is "what did **I** screw up?"
Only then will your mind open to possibilities that you aren't seeing now.
Sometimes in the process of tracking down your screwup and verifying the
operation of what you designed, you run across irrefutable evidence that
something else beyond your control is actually busted.  This is rather
rare, and you're nowhere near that yet.






2002\11\14@174057 by William Chops Westfield

Could your circuit be "triggering" correctly at dawn, but have a hardware
failure that prevents a visible occurrence of the "dawn activity"?

BillW





2002\11\14@184251 by Dennis J. Murray

As I've seen recently, this is typical of Olin!  Of ALL people to talk
attitude adjustments!!  I don't know how old you are (by your response, I'd
guess just out of diapers).  How do you know WHAT stage I'm at??  I've never
met you, nor care to.  You haven't the FOGGIEST idea what I've done to check
out the circuit, other than the salient facts in my first post.  People like
you discourage newbies, who may in fact, make a really dumb error (which you
will so willingly point out).

Yes, my first & only thought was program bugs.  What do you think I've been
doing for the past two weeks on less than 4 pages of code????  I had enough
room in memory to fill the used memory with NOPS & re-insert a fresh copy of
code higher up.  Same problem.  I then moved the code even higher & changed
RAM assignments - problem goes away.

I've got 6 of these units in the field, and this is the most recent -
installed in May (the oldest was put in the field in Dec 01).  They're ALL
running the same code and only this one failed.  It failed hard - even after
numerous hard resets.  Its replacement chip has been working perfectly for
2 weeks.

OK, genius, where would YOU look??


2002\11\14@184902 by Dennis J. Murray

The same hardware is used for both dawn & dusk events, so I don't believe
that's it.  The replacement chip's working fine for now (the '509 was
socketed, so it was easy to replace).  If it IS hardware, Paul Hutchinson is
probably closest to the solution.  He indicated the chip may have failed
because of cold.  I'm reluctant to believe that yet - I don't know if chips
would permanently fail on cold, as they would do when they get too hot (i.e.
thermal runaway).

Whatever's the cause, the failure's hard.  I brought the circuit inside &
it still fails.

As I've said, I replaced the chip & things are OK - for now.  But what's to
prevent a failure down the road?


2002\11\14@200209 by Chris Hunter

----- Original Message -----
From: "Dennis J. Murray"
Sent: Thursday, November 14, 2002 11:48 PM
Subject: Re: [PIC]:failed data location?

//snip//

> If it IS hardware, Paul Hutchinson is
> probably closest to the solution.  He indicated the chip may have failed
> because of cold.  I'm reluctant to believe that yet - I don't know if chips
> would permanently fail on cold, as they would do when they get too hot (i.e.
> thermal runaway).

I have had semiconductors fail in particularly cold environments, but never
down to individual memory cells - I had some high power FETs that had the
internal wires fracture through thermal shock.

Chris





2002\11\14@203605 by Scott Touchton

I will throw my 2 cents into the ring on the issue:

I have seen RAM in the 16C54 that toggles on its own accord.  Even
duplicated it with a combination of temp and supply voltage.  The part was
"in specified operating range" all the time.  No code bugs, just good ol'
marginal wafers.

I was using in excess of 500K per year, and Microchip acknowledged the issue
with sincere apologies, but no remedy.


I find it highly likely this is what you are experiencing, though human
error is usually the main culprit.





2002\11\14@204435 by Dale Botkin

On Thu, 14 Nov 2002, Dennis J. Murray wrote:

> I've got 6 of these units in the field, and this is the most recent -
> installed in May (the oldest was put in the field in Dec 01).  They're ALL
> running the same code and only this one failed.  It failed hard - even after
> numerous hard resets.  It's replacement chip has been working perfectly for
> 2 weeks.

OK, so it's a hardware failure.  It happens, so you toss the chip and move
on with your life.  So why are we now turning this into a pissing match in
front of 1800 people on the PICList??  Both of you need to take it off
list or just drop it.

Dale





2002\11\15@032554 by Michael Rigby-Jones


While I agree about slagging matches on the list, "toss the chip and move on
with your life" isn't what any responsible engineer would do.  If this
happened to me I'd want to know exactly what went wrong, and what the
chances are of this happening to other products I have in the field and what
MChip are going to do about improved testing.  Software bugs are one thing,
but if you can't rely on the hardware to work properly for the expected life
of your product then you have a big problem.

Mike





2002\11\15@050212 by Alan B. Pearce

>As I've said, I replaced the chip & thinks are OK - for
>now.  But what's to prevent a failure down the road?

Nothing. You have just restarted the MTBF clock at zero time :)

You are now looking at the reason why safety critical items are required to
do a power up functionality test of the processor, as has been discussed on
this list a number of times.

There is one way to get around the problem: multiple processors, with
majority voting to determine which is correct/failed. Use multiple flags
within your software, again with majority voting between them to determine the
same thing. Whatever you do, you end up with more code that could go wrong :)
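The in-software half of that suggestion can be sketched as a bitwise 2-of-3 majority vote over three copies of the flag. This is an illustrative C sketch, not a standard API: tmr_byte, tmr_vote, tmr_write and tmr_read_scrub are hypothetical names, and the copies are assumed to sit at separate RAM addresses so that one bad cell can corrupt only one of them.

```c
#include <stdint.h>

/* Three copies of one flag byte, ideally at separate RAM addresses. */
typedef struct { uint8_t copy[3]; } tmr_byte;

/* Bitwise 2-of-3 majority: a bit is set if at least two copies have it set. */
static uint8_t tmr_vote(const tmr_byte *t) {
    uint8_t a = t->copy[0], b = t->copy[1], c = t->copy[2];
    return (uint8_t)((a & b) | (a & c) | (b & c));
}

static void tmr_write(tmr_byte *t, uint8_t v) {
    t->copy[0] = t->copy[1] = t->copy[2] = v;
}

/* Read the voted value and rewrite all three copies with it. This repairs a
 * copy hit by a transient upset; a physically stuck cell stays wrong, but is
 * still outvoted as long as the other two copies agree. */
static uint8_t tmr_read_scrub(tmr_byte *t) {
    uint8_t v = tmr_vote(t);
    tmr_write(t, v);
    return v;
}
```

A single copy stuck at 0xFF is outvoted as long as the other two agree; the price, as noted, is three times the RAM for each protected flag and more code that could itself go wrong.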





2002\11\15@051249 by Alan B. Pearce

>I have seen RAM in the 16C54 that toggles on its own accord.

I had an RCA Video Game (that console built around an 1802 micro in the
1970's) that had a spectacular display of memory problems. It had an "etch a
sketch" type game where you could draw on the screen. By drawing a certain
combination of lines when the cursor was on a particular cell you had about
6 blinking cursors on the screen. It required certain adjacent cells to be
in a 0 state, and others to be in a 1 state for this to occur.

I am a little surprised at all the discussion this particular failure has
produced. Chips do fail. They have an MTBF figure. It may have even got an
ESD zap without anyone realising during assembly/programming handling, which
will often not produce a failure at the time, but some time (often months)
later.

It is just that chips these days are so reliable that we don't expect
failures.





2002\11\15@072822 by Scott Touchton

Well aware of MTBF.... but not 15,000 parts in a row in a very well
controlled facility.   Apparently something in the masking process yielded a
section of memory that got real flaky at 35F (as confirmed by Microchip).
Just sharing, not attacking or trying to bash Microchip.  I love their
products.  Life just sucks sometimes.

And yes, there has been a lot of discussion on this simple issue.  I find it
surprising that the "sounds like a bad chip to me" came out in the thread a
little late although it was the obvious answer  (might have missed it in the
thread, computer is real flaky at the moment)!!


> >I have seen RAM in the 16C54 that toggles on its own accord.
>
>
> I am a little surprised at all the discussion this particular failure has
> produced. Chips do fail. They have an MTBF figure. It may have even got an
> ESD zap without anyone realising during assembly/programming handling, which





2002\11\15@090734 by Dennis J. Murray

You all may be on to something I had not considered - cold failure.  Although
the coldest it got was in the mid 20's, I wouldn't have thought that would be
cold enough to cause a consistent failure.  The replacement's doing fine, and
has survived one night with temperatures in the very low 20's with no problem.

I know Microchip makes an industrial version that goes down to -40 degrees.
I might just buy a handful & replace all 7 of these devices with the
extended temperature range version, if for no other reason than customer
satisfaction!
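A quick check of the numbers, assuming the temperatures above are in degrees Fahrenheit (the unit isn't stated, but the units are installed in Virginia): convert with C = (F - 32) x 5/9 and compare against the ranges quoted earlier in the thread, 0 to 70 degC for commercial parts and -40 to +85 degC for industrial. The function names are illustrative.

```c
#include <stdbool.h>

/* Degrees Fahrenheit to degrees Celsius. */
static double f_to_c(double f) { return (f - 32.0) * 5.0 / 9.0; }

/* Operating ranges quoted in the thread, in degrees Celsius. */
static bool in_commercial_range(double c) { return c >= 0.0 && c <= 70.0; }
static bool in_industrial_range(double c) { return c >= -40.0 && c <= 85.0; }
```

At 25 degF the result is about -3.9 degC, below the commercial minimum but comfortably inside the industrial range. -40 is the same on both scales, so the -40 degree rating needs no conversion. By the same arithmetic, the 35F mentioned earlier is about 1.7 degC, inside the commercial range, which fits the report that those parts were within their specified operating range.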

Thanks for your input.


2002\11\15@091732 by Dennis J. Murray

I've been programming a LOOOONG time in many different languages, mostly
assembler.  Believe me when I say I've made more programming mistakes and
snafus than most of you've EVER thought possible!!!  As a point of
reference, my first FORTRAN program back in the 60's had well over 200
errors, yet the program had less than 100 lines of code!!  Beat that!  Over
the years, I've become very adept at tracking down programming errors - my
career depended on it!

In this case, my first & only thought was "I screwed up in the program
somewhere", even though this particular unit had been running flawlessly
since May (no, I never considered the assembler - I've NEVER had an
assembler screw up my code!).

I simulated the chip under MPLAB, isolating different sections & thrashing
them heartily - no failure.  I figured it must be a timing-related program
failure, so I programmed a new chip and thrashed it every way I could think
of - no failure.

Thanks for your input, Scott.  I just don't feel comfortable that I'm out of
the woods by just replacing the chip.

Dennis


2002\11\15@091940 by Dale Botkin

On Fri, 15 Nov 2002, Michael Rigby-Jones wrote:

> > OK, so it's a hardware failure.  It happens, so you toss the chip and move
> > on with your life.  So why are we now turning this into a pissing match in
> > front of 1800 people on the PICList??  Both of you need to take it off
> > list or just drop it.
> >
> While I agree about slagging matches on the list, "toss the chip and move on
> with your life" isn't what any responsible engineer would do.  If this
> happened to me I'd want to know exactly what went wrong, and what the
> chances are of this happening to other products I have in the field and what
> MChip are going to do about improved testing.  Software bugs are one thing,
> but if you can't rely on the hardware to work properly for the expected life
> of your product then you have a big problem.

Oh ferheavenssake.  ONE $3 chip fails.  You KNOW it failed (worked fine
for weeks, failed, replacement works fine).  The replacement works
perfectly, as do other identical units in the field.  Call it infant
mortality, since it's obviously on the leading edge of the bathtub curve.
How many thousands are you willing to spend on failure analysis to find
out that the chip did indeed fail because either A.) it was flippin'
defective, or B.) you broke it by operating it outside of its safe
operating area?

If you're using the chip within the power, temperature, humidity,
vibration and other limits specified in the data sheet, you can reasonably
expect the performance specified by Microchip or whatever manufacturer.
If you're outside any or all of these limits, all bets are off.  QA and
testing data are available on their web site.  Using THIS information to
determine what the chances are of the same thing happening to other units
in the field is what "any responsible engineer" should be doing.

Component failures are a fact of life that we have to deal with, no matter
what we build.  Even a good manufacturing lot will have some number of
premature failures.  If you get a whole bunch of failures and you know
it's NOT your fault, then it's time to call the manufacturer and
investigate.  But excepting very special applications (life support,
spacecraft, weapons systems) a single failure is pretty much a non-event.

Dale





2002\11\15@092145 by Dennis J. Murray

I agree with you, Mike.  I'm sorry I lost it with Olin.  Olin, if you're
reading this, I apologize.  We do need to keep this list impersonal and
address the issues at hand and not degrade someone's programming ability (or
lack of it).


2002\11\15@093159 by Scott Touchton

>
> Thanks for your input, Scott.  I just don't feel comfortable that I'm out of
> the woods by just replacing the chip.
>


I would highly recommend using the industrial temp part and making
absolutely sure your supply cannot dip.  And as Dale said, we will always
have failures.  I always hated building a few of something and getting that
one part.  Always casts a shroud of doubt on the design.





2002\11\15@093606 by Dennis J. Murray

Thanks for your input, Scott.  I only had one failure, but it was enough to
get my attention.  I don't LIKE failures I can't explain.  I guess I'm too
complacent about the inherent reliability of the PIC chips - I just never
expected one to fail without help!

{Original Message removed}

2002\11\15@093610 by Michael Rigby-Jones

> -----Original Message-----
> From: Dale Botkin [SMTP:daleSTOPspamspamspam_OUTBOTKIN.ORG]
> Sent: Friday, November 15, 2002 2:18 PM
> To:   spamBeGonePICLISTSTOPspamspamEraseMEmitvma.mit.edu
> Subject:      Re: [PIC]:failed data location?
>
> How many thousands are you willing to spend on failure analysis to find
> out that the chip did indeed fail because either A.) it was flippin'
> defective, or B.) you broke it by operating it outside of its safe
> operating area?
>
I guess that would depend on how much you could potentially lose.  In the
market I currently work for, where a single product could cost upwards of
$4000, even a single failure within a couple of months is bad news,
especially as we thoroughly test everything over voltage and temperature
extremes.

> If you're using the chip within the power, temperature, humidity,
> vibration and other limits specified in the data sheet, you can reasonably
> expect the performance specified by Microchip or whatever manufacturer.
>
Which doesn't explain another poster's experience with the duff batch of
16C5x devices.

{Quote hidden}

That depends entirely on the customer.  We have had ICs x-rayed and then
decapsulated to try to determine the failure mode after a SINGLE failure, at
our customer's insistence.  It would have cost us far more not to have
investigated the failure, even though it was apparently a one-off.

Mike


2002\11\15@094404 by Roman Black

Dennis J. Murray wrote:

> I've got 6 of these units in the field, and this is the most recent -
> installed in May (the oldest was put in the field in Dec 01).  They're ALL
> running the same code and only this one failed.  It failed hard


Send that chip back to Microchip and demand a
replacement. And always assume that SOME chips
will be faulty although PICs are generally pretty
good.
-Roman


2002\11\15@100508 by Sean Alcorn - PIC Stuff

> I would highly recommend using the industrial temp part and making
> absolutely sure your supply cannot dip.  And as Dale said, we will always
> have failures.  I always hated building a few of something and getting that
> one part.  Always casts a shroud of doubt on the design.

There is not that much difference in price between "Industrial" Temp
and "Extended" Temp versions.

We only use extended temp PICs in our applications for this reason.

Cheers,

Sean


2002\11\15@102723 by Paul Hutchinson

> -----Original Message-----
> [KILLspamPICLISTspamBeGonespamMITVMA.MIT.EDU]On Behalf Of Dale Botkin
> Sent: Friday, November 15, 2002 9:18 AM
>
> Oh ferheavenssake.  ONE $3 chip fails.  You KNOW it failed (worked fine
<snip>
>
> If you're using the chip within the power, temperature, humidity,

I agree 100% with you, Dale, but he is _not_ following the specification for
temperature range. He has a PIC rated for 0 degC and is using it at lower
temperatures, probably as low as -5 degC. Whether or not this is the cause of
the failure, it is still a _major problem_ for anything other than a hobbyist
one-off. Oh, and with PICs the price difference between 0 degC parts
and -40 degC parts is tiny.
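
Paul's point can be made mechanical: compare the required operating range against each temperature grade and pick the narrowest grade that covers it. The ranges below are the usual Microchip grades for parts like the '877 (commercial 0 to +70 degC, industrial -40 to +85 degC, extended -40 to +125 degC); treat them as an assumption and confirm them against the data sheet for the specific part. The function names are hypothetical.

```python
# Temperature grades as (name, min_degC, max_degC), narrowest first.
# Assumed values; verify against the data sheet of the actual part.
GRADES = [
    ("commercial", 0.0, 70.0),
    ("industrial", -40.0, 85.0),
    ("extended", -40.0, 125.0),
]

def covers(grade, t_min: float, t_max: float) -> bool:
    """True if the grade's rated range contains the whole required range."""
    _, lo, hi = grade
    return lo <= t_min and t_max <= hi

def pick_grade(t_min: float, t_max: float):
    """Return the name of the narrowest grade covering [t_min, t_max], or None."""
    for grade in GRADES:
        if covers(grade, t_min, t_max):
            return grade[0]
    return None

if __name__ == "__main__":
    # An outdoor unit that may see -5 degC, as in Paul's example:
    # a commercial (0 degC) part does not cover it.
    print(pick_grade(-5, 40))  # industrial
```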

<rant on>
There is a very common misconception among the general public and to a
smaller extent among engineers. They seem to believe that only high
temperatures are a problem for electronic components. I am always hearing
statements like "colder is always better for electronics".

Earlier this year I even had an FAE from Linx Technologies (RF module maker)
insist that even though the specs for one of their modules clearly stated 0
degC as the low temperature spec it would be fine to -40 deg. The FAE had
the contract consultant and our management convinced to simply ignore the
temperature specification. I asked for more details and heard that there is
one part in the module rated only to 0 degC by Philips, and the FAE said he was
100% confident it would be OK to -40. I replied: fine, change the spec sheet
or give us a letter saying that this module is OK to -40. The FAE said the
engineering manager would not allow either option and we would have to take
his word for it.  We have released the product, but using modules from RF
Monolithics that are rated for -40 degC operation instead of the Linx module.

I've been designing meteorological instruments for a living for over 20
years now and a few times every year I run up against this misconception.
<rant off>

Paul

=========================================
Paul Hutchinson
Chief Engineer
Maximum Inc., 30 Samuel Barnet Blvd.
New Bedford, MA 02745
EraseMEphutchinsonspamEraseMEimtra.com
http://www.maximum-inc.com
=========================================

> vibration and other limits specified in the data sheet, you can reasonably
> expect the performance specified by Microchip or whatever manufacturer.
> If you're outside any or all of these limits, all bets are off.  QA and
> testing data are available on their web site.  Using THIS information to
> determine what the chances are of the same thing happening to other units
> in the field is what "any responsible engineer" should be doing.
>
> Component failures are a fact of life that we have to deal with, no matter
<snip>
> spacecraft, weapons systems) a single failure is pretty much a non-event.
>
> Dale
>


2002\11\15@114344 by Robert Rolf

I'd be more inclined to demand that the whole date code batch
be replaced. If the production masks were off by a bit, then the WHOLE LOT
is likely to eventually fail.

R

Roman Black wrote:
{Quote hidden}


2002\11\15@133329 by Dennis J. Murray

You're right, Roman: you will occasionally find bad chips (rarely,
hopefully).  I've had bad 74LS logic chips before, so I don't know why I
assumed a much more complex chip (i.e. the '509) wouldn't fail.  Tunnel
vision, I guess.

It's not the cost of the chip; it only cost me $1-$1.25 each in moderate
quantities.  I spent more money than that in labor scratching my head when
the customer called to say it failed!

Hindsight being more accurate than foresight, I should have used the
industrial grade units.  Shame on me.  I'll remedy that in the interest of
customer relations.

Thanks for your input!
Dennis

{Original Message removed}

2002\11\15@133745 by Dennis J. Murray
Thanks, Sean!  You're right, of course.  And, as I mentioned to Roman, I'll
remedy that issue immediately!

These chips are just too cheap not to go Industrial!

Dennis

{Original Message removed}

2002\11\15@155451 by Micro Eng

ahhh...but Olin....

Location 0x70 contains a variable called SysFlags.  I use its 8 bits for
system-wide flags, and it's been working just fine for the past several
months.  Then I added a routine to check the eighth bit (bit 7) of the
register.  I set that bit in the ISR, and it had been setting for a while
(read: working fine according to the watch window in the ICD).  So now my
simple btfss isn't working.  Strange...so I go back and watch it.  Wait...the
bit...isn't being set anymore.  Strange, in that (OK...no flames...the "I didn't
change the code, yet it broke" thing) I had simply added a subroutine that
looked at that bit.  So, being the simple thing to do, I changed the location
from 0x70 to 0x74 (next available location) and whoa...it WORKS! So then I
pointed a new variable at 0x70 and simply did a movlw 0x55 and a movwf
to 0x70, and what do I see...it never sets the bits.  I suppose the silicon
could have been punched through, or the addressing isn't correct (yet all other
registers still work fine).  In other words, a functional register died.
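
The movlw 0x55 / movwf check above generalizes to a simple pattern test: write a value and its complement to each location and compare on read-back, so every bit is exercised at both 0 and 1. The sketch below runs on a PC against a simulated register file with a hypothetical stuck-at fault injected at 0x70; it scans 0x20-0x7F, the bank 0 general-purpose registers on the '877 (check the data sheet memory map). On the target, the same loop would be written in assembly or C and must skip any locations the running code depends on.

```python
class RegisterFile:
    """Simulated 8-bit RAM with optional stuck-at bits (hypothetical fault model)."""

    def __init__(self, size: int = 0x80):
        self.mem = [0] * size
        self.stuck_zero = {}  # addr -> mask of bits stuck at 0
        self.stuck_one = {}   # addr -> mask of bits stuck at 1

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        v = self.mem[addr]
        v &= ~self.stuck_zero.get(addr, 0) & 0xFF  # force stuck-at-0 bits low
        v |= self.stuck_one.get(addr, 0)           # force stuck-at-1 bits high
        return v

def pattern_test(rf: RegisterFile, addrs, patterns=(0x55, 0xAA)):
    """Write each pattern, read it back, and return {addr: mask of failing bits}."""
    failures = {}
    for addr in addrs:
        bad = 0
        for p in patterns:
            rf.write(addr, p)
            bad |= rf.read(addr) ^ p  # any differing bit is a failure
        if bad:
            failures[addr] = bad
    return failures

if __name__ == "__main__":
    rf = RegisterFile()
    rf.stuck_zero[0x70] = 0xFF  # the "dead" location: every bit reads 0
    print(pattern_test(rf, range(0x20, 0x80)))
```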

So yes, at times you accidentally overwrite a bit, but those are usually easy
to spot.  If I strip everything out of the code down to a simple looping
routine and it still fails...then I would tend to think it's not code but
hardware.  It just bothers me to find a failure such as this, because I don't
want to see the same thing occur in the field.  I suppose I could add
redundancy by using a shadow register and comparing the two to determine if a
register failed, but who's to say the shadow won't be a failure point?
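
On the shadow-register question: a single shadow copy can detect a disagreement but cannot tell which copy is wrong. One alternative (not something proposed in the thread) is to keep three copies and take a bitwise 2-of-3 majority vote, which masks a fault confined to any one location at the cost of two extra bytes and some cycles per access. A minimal sketch, with three Python integers standing in for three RAM locations; the function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each result bit is the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)

def write_flags(copies: list, value: int) -> None:
    """Store the same byte in all three locations."""
    for i in range(3):
        copies[i] = value & 0xFF

def read_flags(copies: list) -> int:
    """Vote across the three copies and rewrite any copy that disagrees."""
    value = majority(*copies) & 0xFF
    for i in range(3):
        if copies[i] != value:
            # Scrub the outlier. On real hardware a dead cell stays wrong,
            # but the vote still outvotes it on every read.
            copies[i] = value
    return value

if __name__ == "__main__":
    sys_flags = [0, 0, 0]   # three hypothetical RAM locations
    write_flags(sys_flags, 0x80)
    sys_flags[0] = 0x00     # simulate one copy losing its data
    print(hex(read_flags(sys_flags)))
```

Because each copy lives at a different address, a single failed location is outvoted; two copies failing in the same bit position are not.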

I've actually seen a similar thing in CPLDs before.  Simulation showed it
should work, but connecting a JTAG programmer and routing the data out to
unused pins showed that IT DID BREAK.  Being under a tight schedule, and having
used thousands of the part without ever seeing the problem, I simply
replaced it.

I don't really have a problem with this being a one-in-a-million
failure.  My point in asking the question to begin with was...has anyone
else seen the same thing?  So far, it seems one other person has.  That might
have been an environmental issue.  Mine is not.






{Quote hidden}


2002\11\15@183949 by Sean Alcorn - PIC Stuff

Dennis,

> Thanks, Sean!  You're right, of course.  And, as I mentioned to Roman, I'll
> remedy that issue immediately!
>
> These chips are just too cheap not to go Industrial!

You want "extended temperature" - these are the widest range.
Previously called "Automotive" by Microchip, I think.

Cheers,

Sean


2002\11\16@093727 by Tim Tapio

One question though, if the other systems are working fine, why would you
first suspect software?

Component failure is a fact of life...no matter what the MTBF might be
(usually a very educated estimate), there will always be units that are DOA
or fail within 24-48 hours...and some that wait 5 months.
Granted, failures are not common, but if something has been working for
months and suddenly stops, my first inclination would not be software issues.

But, that's just me.

Tim


{Original Message removed}

2002\11\18@090948 by Dennis J. Murray

I guess I've been a programmer too long.  I ALWAYS suspect software first, no
matter how long it has been running.  I originally suspected I had a "window of
vulnerability", where an interrupt would arrive at the most inopportune time
and I didn't handle everything properly (because I didn't expect ANY
interrupts).  I verified I had disabled ALL interrupts, but suspected that
MAYBE, somewhere in the code, I had re-enabled them.
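
One common source of such a "window of vulnerability" is a read-modify-write on a byte that an ISR also modifies. If main-line code copies the byte into W, changes it, and writes it back over several instructions, an interrupt that sets a bit in between is silently undone. (A single bsf/bcf on the register itself does not have this gap, since the PIC only takes interrupts between instructions.) The deterministic simulation below uses hypothetical names and shows the lost update, plus the usual fix of disabling interrupts around the sequence.

```python
class Mcu:
    """Toy model: one shared flags byte and an ISR that sets bit 7."""

    def __init__(self):
        self.flags = 0x00
        self.gie = True       # global interrupt enable
        self.pending = False  # an interrupt request is waiting

    def isr(self) -> None:
        self.flags |= 0x80

    def service(self) -> None:
        """Run the ISR if one is pending and interrupts are enabled."""
        if self.pending and self.gie:
            self.pending = False
            self.isr()

def set_bit0(mcu: Mcu, irq_mid_sequence: bool, protect: bool) -> int:
    """Main-line read-modify-write of bit 0, in three steps like movf/iorlw/movwf."""
    if protect:
        mcu.gie = False
    w = mcu.flags              # read into W
    if irq_mid_sequence:
        mcu.pending = True     # interrupt arrives between read and write
        mcu.service()
    w |= 0x01                  # modify
    mcu.flags = w              # write back (overwrites anything the ISR did)
    if protect:
        mcu.gie = True
    mcu.service()              # an interrupt held off by GIE=0 runs now
    return mcu.flags

if __name__ == "__main__":
    print(hex(set_bit0(Mcu(), irq_mid_sequence=True, protect=False)))  # ISR's bit lost
    print(hex(set_bit0(Mcu(), irq_mid_sequence=True, protect=True)))   # both bits kept
```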

Once I start digging into code, it's easy to put blinders on and become
convinced it's a code problem.

'Nuff said.  The new chip's working fine.

Thanks for your input.
Dennis
----- Original Message -----
From: "Tim Tapio" <TakeThisOuTtim.....spamTakeThisOuTTIMTAPIO.COM>
To: <TakeThisOuTPICLISTKILLspamspamspamMITVMA.MIT.EDU>
Sent: Saturday, November 16, 2002 9:36 AM
Subject: Re: [PIC]:failed data location?


> One question though, if the other systems are working fine, why would you
> first suspect software?
>
> Component failure is a fact of life...no matter what the MTBF might be
> (usually a very educated estimate), there will always be those that are DOA
> or fail within a 24 - 48 hour time span...and some that wait 5 months.
> Granted, failures are not common, but if something is working for months and
> suddenly stops, my first inclination would not be software issues.
>
> But, that's just me.
>
> Tim
>
>
> {Original Message removed}
