PICList Thread
'Failure modes (was: wierd jump)'
1999\02\08@203845 by Clyde Smith-Stubbs

On Mon, Feb 08, 1999 at 11:09:16AM -0600, John Payson wrote:

> user may be somewhat confused when his product shows bogus
> numbers on the display, but that's better than having it
> go totally bonkers.

I disagree strongly with that - it's far better IMHO for the user
to know the device has failed (or for a watchdog reset to occur)
than for it to malfunction in a subtle manner. Or to put it another
way, a device should either work or not, "sort-of" working is far
more dangerous. I just don't understand how you could believe that
giving a wrong answer is preferable to giving no answer at all.

--
Clyde Smith-Stubbs               |            HI-TECH Software
Email: spam_OUTclydeTakeThisOuTspamhtsoft.com          |          Phone            Fax
WWW:   http://www.htsoft.com/    | USA: (408) 490 2885  (408) 490 2885
PGP:   finger .....clydeKILLspamspam@spam@htsoft.com   | AUS: +61 7 3355 8333 +61 7 3355 8334
---------------------------------------------------------------------------
HI-TECH C: compiling the real world.

1999\02\08@210806 by Sean Breheny

Hi Clyde,

I understand your point, and it certainly is true in many situations, but
if a device can function partially (and usefully) while being partially
disabled, it may be better to allow that than to shut it down completely
for a small failure. For example, would it be good if your car suddenly
refused to start its engine just because the fuel level sensor had drifted
outside of calibration and was giving false fuel gauge readings? Of course
not. In fact, it wouldn't even be good for the fuel gauge to shut down
completely, although a warning light might be good.

For a real-world example, my father uses a piece of medical equipment
called a "Peristaltic Compression Unit". It is somewhat akin to a fighter
jet's "G" suit but is used for people who have fluid buildup in the legs.
It is run by a microcontroller and recently it began refusing to operate
at all because its stepper motors were "out of sync" (this is what the
error code translated to in the manual; I'm not really sure what it
means, since it didn't say exactly which stepper motors it was talking
about, or what they were out of sync with). Now, I agree, normally you
want medical equipment to let you know it failed, but you don't want it to
stop operating completely! In this case, this piece of equipment is so rare
that it could take >1 month for him to get a replacement; meanwhile his legs
would swell up to the point of having gaping, open sores. Thank God, the
unit suddenly decided to work again (and, as far as my Dad's legs go, it is
working fine), but it will still be replaced in a month or two.

To make a long story short, wouldn't it be best to give a warning to the
user, but still allow operation, at least in cases where the life of the
person isn't placed in jeopardy by the malfunction?

Sean


| Sean Breheny
| Amateur Radio Callsign: KA3YXM
| Electrical Engineering Student
\--------------=----------------
Save lives, please look at http://www.all.org
Personal page: http://www.people.cornell.edu/pages/shb7
EraseMEshb7spam_OUTspamTakeThisOuTcornell.edu  Phone(USA): (607) 253-0315 ICQ #: 3329174

1999\02\08@214653 by Alan King

I'll rewrite what
Clyde Smith-Stubbs wrote:

> xxxxxxxxxxxxxxx I just don't understand how you could believe that
> giving _______ answer is preferable to giving _____ answer at all.
           no                                  a wrong

As you said, IYHO..  And just like everything else that's opinion,
perceiving yours to be completely correct can be just as incorrect as
you see the other side of things.  For sure, when my pic controlled
parachute opener gives a "NO OPEN" answer instead of the "30 SECOND
OPEN" answer you asked for, you'll be wishing it came up with a wrong "5
SECOND OPEN" answer.  At least for the little while before you go splat,
you'll see some merit to other methods.  Your way works fine for LCDs
and such, but if the pic's running anything that could threaten
someone's safety, limited erroneous control is better than no control.
Even with independent mechanical safeguards, I'd still much rather not
have the controller sending "chop my head off" or "blow me up"
commands.  Whether it makes the code harder to troubleshoot or not.
Besides, falling through into unexpected territory can be just as
subtle.
 Just tell me your pics aren't running nuclear reactors and I'll sleep
a lot better tonight!  ;)



Clyde Smith-Stubbs wrote:

> I disagree strongly with that - it's far better IMHO for the user
> to know the device has failed (or for a watchdog reset to occur)
> than for it to malfunction in a subtle manner. Or to put it another
> way, a device should either work or not, "sort-of" working is far
> more dangerous. I just don't understand how you could believe that
> giving a wrong answer is preferable to giving no answer at all.

1999\02\08@215509 by Gerhard Fiedler

At 21:06 02/08/99 -0500, Sean Breheny wrote:
>To make a long story short, wouldn't it be best to give a warning to the
>user,but still allow operation, at least in cases where the life of the
>person isn't placed in jeopardy by the malfunction.

i think it depends a lot on the actual situation. for a simple remote
sensor, for example, you might just want it to shut down if anything
unpredicted occurs, so that the system it's connected to can raise an
alarm and a person can go there and see what's up, exchange the sensor or
fix whatever. if you have the possibility of blowing a whistle and
continuing to do something (preferably not dangerous stuff :), this might
actually be preferable, but the situation doesn't always allow for it.

ge

1999\02\08@220438 by Gerhard Fiedler

At 21:47 02/08/99 -0500, Alan King wrote:
>you see the other side of things.  For sure, when my pic controlled
>parachute opener gives a "NO OPEN" answer instead of the "30 SECOND
>OPEN" answer you asked for, you'll be wishing it came up with a wrong "5
>SECOND OPEN" answer.  At least for the little while before you go splat,
>you'll see some merit to other methods.

actually, from a parachute opener i'd very much like to have a strongly
audible beep -- and NOTHING else -- if =anything= goes wrong with it. and
if it's a situation where the beep is not enough, maybe a complete
shutdown, followed of course by a fail-safe parachute opening. but not any
kind of "subtle malfunctioning"...

ge

1999\02\08@221018 by dave vanhorn

At 11:36 AM 2/9/99 +1000, Clyde Smith-Stubbs wrote:
>On Mon, Feb 08, 1999 at 11:09:16AM -0600, John Payson wrote:
>
>> user may be somewhat confused when his product shows bogus
>> numbers on the display, but that's better than having it
>> go totally bonkers.
>
>I disagree strongly with that - it's far better IMHO for the user
>to know the device has failed (or for a watchdog reset to occur)
>than for it to malfunction in a subtle manner. Or to put it another
>way, a device should either work or not, "sort-of" working is far
>more dangerous. I just don't understand how you could believe that
>giving a wrong answer is preferable to giving no answer at all.


I agree completely. Much better to crash hard.
I've done the "slight error" approach as copy protection though.
If the right conditions aren't satisfied, then the software gradually gets
sicker and sicker, but it will run ok for a while first.  Let he who rips
it off beware.

1999\02\09@002156 by Clyde Smith-Stubbs

On Mon, Feb 08, 1999 at 09:06:45PM -0500, Sean Breheny wrote:

> To make a long story short, wouldn't it be best to give a warning to the
> user,but still allow operation, at least in cases where the life of the

Sure, if you diagnose a failure, then you can drop into a "limp-home"
mode to allow things to continue, with reduced functionality. I was
talking about what some people see as "defensive coding" where you
effectively ignore failures of the software (as opposed to hardware
failures) rather than 1) making sure design errors are detected
at design time instead of in the field and 2) when things get totally
pear-shaped taking positive action (either shutdown, or enter limp-home
mode) rather than just blundering along hoping for the best.

There's a big difference between what you do when e.g. a sensor
malfunctions (in this case, limiting the sensor reading to a sensible
value may be very appropriate) and what you do when it's clear your
very core is damaged - in this case wishful thinking will just make
things worse.
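
The sensor case described above (limit the reading to a sensible value,
but take note that something is wrong) could look roughly like the sketch
below. It is only an illustration: the names fuel_read_checked and
fuel_reading_t and the FUEL_ADC_MIN / FUEL_ADC_MAX limits are hypothetical
and do not come from anyone's actual code in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plausibility limits for a fuel-level ADC reading.
   These numbers are assumptions for illustration only. */
#define FUEL_ADC_MIN  50u
#define FUEL_ADC_MAX  950u

typedef struct {
    uint16_t value;      /* reading limited to the plausible range */
    bool     limp_home;  /* true when the raw reading was implausible */
} fuel_reading_t;

/* Clamp an out-of-range reading AND report the fault, so the caller can
   light a warning or enter a reduced-functionality mode instead of
   silently carrying on. */
fuel_reading_t fuel_read_checked(uint16_t raw_adc)
{
    fuel_reading_t r = { raw_adc, false };

    if (raw_adc < FUEL_ADC_MIN) {
        r.value = FUEL_ADC_MIN;
        r.limp_home = true;
    } else if (raw_adc > FUEL_ADC_MAX) {
        r.value = FUEL_ADC_MAX;
        r.limp_home = true;
    }
    return r;
}
```

The point of returning the flag alongside the clamped value is that the
failure is diagnosed rather than hidden: the gauge keeps showing something
sensible while the rest of the system knows not to trust it.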

To use the parachute example, I'd rather a malfunctioning chute opener
shut itself down and let me pull the cord myself rather than open the chute
before I'm ready and without warning. And I'd rather a GPS navigator
turned itself off rather than sending me in the wrong direction. Any
life-critical apparatus needs a backup anyway, even if it's a manual
one, and I'll choose positive action any day rather than crossing fingers.

Clyde
--
Clyde Smith-Stubbs               |            HI-TECH Software
Email: clydespamspam_OUThtsoft.com          |          Phone            Fax
WWW:   http://www.htsoft.com/    | USA: (408) 490 2885  (408) 490 2885
PGP:   finger @spam@clydeKILLspamspamhtsoft.com   | AUS: +61 7 3355 8333 +61 7 3355 8334
---------------------------------------------------------------------------
HI-TECH C: compiling the real world.

1999\02\09@115831 by Alan King

Clyde Smith-Stubbs wrote:
>
> On Mon, Feb 08, 1999 at 09:06:45PM -0500, Sean Breheny wrote:
>
> > To make a long story short, wouldn't it be best to give a warning to the
> > user,but still allow operation, at least in cases where the life of the
>
> Sure, if you diagnose a failure, then you can drop into a "limp-home"
> mode to allow things to continue, with reduced functionality. I was
> talking about what some people see as "defensive coding" where you
> effectively ignore failures of the software (as opposed to hardware
> failures) rather than 1) making sure design errors are detected
> at design time instead of in the field and 2) when things get totally
> pear-shaped taking positive action (either shutdown, or enter limp-home
> mode) rather than just blundering along hoping for the best.

 When you find anyone who can do #1 with 100.00 percent reliability,
let's get them a job at Microsoft so Windows can get a little better.
 For #2, at least my code, with the index masked to stay on the table,
comes back and has the ability to do something.  Your code CAN'T take
positive action, because you have no idea where it went after a faulty
value was fed to the table.
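
For readers who have not seen the masking trick under discussion, here is
a minimal sketch in C. It assumes a 16-entry seven-segment table (the
table contents, the names seg_table and seg_lookup_masked, and the choice
to blank invalid digits are all hypothetical). On a PIC the original
concern is a computed jump into a lookup table; an out-of-bounds array
index in C is only a rough analogue of that.

```c
#include <stdint.h>

/* A power-of-two table size lets a single AND keep ANY index in range. */
#define SEG_TABLE_SIZE 16u
#define SEG_INDEX_MASK (SEG_TABLE_SIZE - 1u)

/* Hypothetical seven-segment patterns, bit 0 = segment a ... bit 6 = g.
   Entries 10-15 are not valid BCD digits, so they blank the display. */
static const uint8_t seg_table[SEG_TABLE_SIZE] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,  /* digits 0-7 */
    0x7F, 0x6F,                                       /* digits 8-9 */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00               /* invalid: blank */
};

/* The lookup can return a wrong pattern for a corrupted index, but it
   can never read outside the table. */
uint8_t seg_lookup_masked(uint8_t index)
{
    return seg_table[index & SEG_INDEX_MASK];
}
```

Note what this does and does not buy: a glitched index of 0x13 quietly
displays a "3", which is exactly the "wrong answer" being argued about,
but control always returns to the caller.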

1999\02\10@153155 by John Payson

On Mon, Feb 08, 1999 at 11:09:16AM -0600, John Payson wrote:

> user may be somewhat confused when his product shows bogus
> numbers on the display, but that's better than having it
> go totally bonkers.

|I disagree strongly with that - it's far better IMHO for the user
|to know the device has failed (or for a watchdog reset to occur)
|than for it to malfunction in a subtle manner. Or to put it another
|way, a device should either work or not, "sort-of" working is far
|more dangerous. I just don't understand how you could believe that
|giving a wrong answer is preferable to giving no answer at all.

If you wanted to add code to validate table lookups and provide
the ability to call a user-defined function on a range violation,
that would be useful.  And if there were a guarantee that an
invalid table reference would result in a specific form of
misbehavior such as tripping the watchdog, that might be okay.  But
merely jumping to an arbitrary location in code space is not IMHO
likely to produce anything resembling a useful failure mode.
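
A validated lookup with a user-defined hook on a range violation, as
suggested above, might be sketched like this. All names here
(table_lookup_checked, set_range_fault_handler, record_range_fault) are
hypothetical; this is not the API of any particular compiler or library.

```c
#include <stddef.h>
#include <stdint.h>

/* User-supplied hook called whenever an index is out of range. */
typedef void (*range_fault_fn)(size_t bad_index);

range_fault_fn range_fault_handler = NULL;

void set_range_fault_handler(range_fault_fn fn)
{
    range_fault_handler = fn;
}

/* Example hook: just record the fault. A real one might log it, enter a
   limp-home mode, or stop kicking the watchdog to force a reset. */
unsigned range_fault_count = 0;
size_t   last_bad_index = 0;

void record_range_fault(size_t bad_index)
{
    range_fault_count++;
    last_bad_index = bad_index;
}

/* Validate the index before using it. On a violation, call the hook (if
   one is installed) and return a caller-chosen fallback value. */
uint8_t table_lookup_checked(const uint8_t *table, size_t size,
                             size_t index, uint8_t fallback)
{
    if (index >= size) {
        if (range_fault_handler != NULL) {
            range_fault_handler(index);
        }
        return fallback;
    }
    return table[index];
}
```

Compared with masking, this costs a compare and a branch per lookup, but
the failure becomes a defined, observable event instead of either a
silently wrong value or a jump to an arbitrary location.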

In the programs I've written where I mask the table lookup indices,
conditions which would produce an incorrect index are likely
to resolve themselves in short order.  For example, if I'm writing
a digital countdown timer, a glitch which causes the count value
(kept as BCD) to assume an invalid value may result in a goofy
display and may cause the count to terminate more quickly or slowly
than it should, but the system WILL recover (if a BCD 'seconds'
counter gets glitched to $FF, e.g., it will take 165 seconds for it to
count down to zero, but it WILL hit zero).  By contrast, if the
display-segment data table weren't protected against out-of-range
values, that same glitch could cause the software to do just about
anything.
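
The 165-second figure can be checked with a small simulation. The sketch
below assumes a simple nibble-wise decrement (borrow from the tens nibble
only when the ones nibble reaches zero); the post does not say which
decrement routine was actually used, and the function names are
hypothetical.

```c
#include <stdint.h>

/* Decrement a packed-BCD byte. If the ones nibble is 0, borrow from the
   tens nibble and set the ones nibble to 9; otherwise just subtract 1.
   Invalid nibbles (A-F) are simply counted down like any other value. */
uint8_t bcd_decrement(uint8_t v)
{
    if ((v & 0x0Fu) == 0u) {
        return (uint8_t)(((v - 0x10u) & 0xF0u) | 0x09u);
    }
    return (uint8_t)(v - 1u);
}

/* Count how many one-second ticks it takes to reach zero. */
unsigned bcd_ticks_to_zero(uint8_t v)
{
    unsigned ticks = 0;

    while (v != 0u) {
        v = bcd_decrement(v);
        ticks++;
    }
    return ticks;
}
```

Under that assumption, starting from $FF takes 15 ticks to reach $F0 and
then 10 ticks for each of the 15 remaining tens steps, i.e. 15 + 150 =
165 ticks, which agrees with the number quoted above.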
