PICList Thread
'Robust Software:'
1999\02\13@103528 by Graeme Smith

hi guys... and gals...

(I try not to be TOOOOOOOO SEXIST)

but enough about that....

What I am wondering about today, sort of stems from that discussion about
the best way to make a jump table crashproof.

The discussion, as near as I could follow it, centered on using a table
sentry to ensure that any jump coming from a jump table would land within
the jump table....
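
For readers who missed that earlier thread, the "table sentry" idea can be sketched roughly as below. This is host-side C rather than PIC assembly, and the handler names and the `dispatch` function are hypothetical, made up only for illustration: the index is range-checked before the table is used, and anything out of range is routed to a single fault handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers; each returns a code so the caller can see which ran. */
static int handle_idle(void)  { return 0; }
static int handle_read(void)  { return 1; }
static int handle_write(void) { return 2; }
static int handle_fault(void) { return -1; }  /* the "sentry" target */

typedef int (*handler_fn)(void);

static const handler_fn dispatch_table[] = {
    handle_idle, handle_read, handle_write
};
#define TABLE_LEN (sizeof dispatch_table / sizeof dispatch_table[0])

/* Dispatch through the table only if the index is in range;
   any other index is routed to the fault handler. */
int dispatch(size_t index)
{
    if (index >= TABLE_LEN) {
        return handle_fault();
    }
    return dispatch_table[index]();
}
```

Calling `dispatch(3)` or `dispatch(255)` ends up in `handle_fault()` instead of jumping past the end of the table.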

What I am interested in learning is whether anyone has ever come up with a
general case of what I call "ROMAN ENGINEERING"...

(All roads lead to ROME) ;)

for the Pic products.....

Ideally, if this protocol has ever been implemented, it would mean that a
random jump, anywhere in memory, would end up back at the central control
loop fairly quickly.

Comments anyone?

                               GREY

GRAEME SMITH                         email: spam_OUTgrysmithTakeThisOuTspamfreenet.edmonton.ab.ca
YMCA Edmonton

Address has changed with little warning!
(I moved across the hall! :) )

Email will remain constant... at least for now.

1999\02\13@112031 by Sean Breheny

Hi Grey,

At 08:35 AM 2/13/99 -0700, you wrote:
>What I am interested in learning, is if anyone, has ever come up with a
>general case, of, what I call "ROMAN ENGINEERING"...
>
>(All roads lead to ROME) ;)
>
>for the Pic products.....
>
>Ideally, if this protocol has ever been implemented, it would mean that a
>random jump, anywhere in memory, would end up, back at the central control
>loop fairly quickly.

It seems to me that this is the basic idea behind the WDT. Even if the PIC
gets into a metastable state, the WDT will bring it back to the beginning
of your program at the end of the timeout period.

I guess one could intersperse the code with jumps to your main loop. I have
often written routines which end in a "goto main" and work in such a way
that if input is received which is not understood, the code ends up on the
"goto main". This method doesn't really solve the problem of what happens
if there is a loop which doesn't terminate, or if the PIC locks up due to
power supply glitches, a stray cosmic ray, or whatever! However, the
combination of the WDT and a good brownout circuit would, I think, make a
PIC fairly robust against totally locking up. Still, it doesn't
guarantee that the code you want will actually get executed, thereby
actually "deploying the chute" at the correct time, so to speak.
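
The watchdog behaviour described above can be modelled on a host machine with a small sketch like this one. It is only a simulation under assumptions: the `sim_wdt` type and its functions are hypothetical names, `wdt_kick` stands in for the CLRWDT instruction, and `wdt_tick` stands in for the hardware timer advancing.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated watchdog: fires a reset if it is not kicked in time. */
typedef struct {
    unsigned timeout_ticks;  /* ticks allowed between kicks */
    unsigned counter;        /* ticks since the last kick */
    bool     reset_fired;    /* set once the watchdog has timed out */
} sim_wdt;

void wdt_init(sim_wdt *w, unsigned timeout_ticks)
{
    w->timeout_ticks = timeout_ticks;
    w->counter = 0;
    w->reset_fired = false;
}

/* Stand-in for CLRWDT: the main loop calls this when it is healthy. */
void wdt_kick(sim_wdt *w)
{
    w->counter = 0;
}

/* Stand-in for the hardware timer advancing by one tick. */
void wdt_tick(sim_wdt *w)
{
    if (++w->counter >= w->timeout_ticks) {
        w->reset_fired = true;
        w->counter = 0;
    }
}
```

A loop that stops calling `wdt_kick` (because it never terminates, say) lets the counter reach the timeout, which is the case the "goto main" trick alone does not cover.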

Sean

|
| Sean Breheny
| Amateur Radio Callsign: KA3YXM
| Electrical Engineering Student
\--------------=----------------
Save lives, please look at http://www.all.org
Personal page: http://www.people.cornell.edu/pages/shb7
.....shb7KILLspamspam@spam@cornell.edu  Phone(USA): (607) 253-0315 ICQ #: 3329174

1999\02\13@115552 by Wagner Lipnharski

Graeme Smith wrote:
> Ideally, if this protocol has ever been implemented, it would mean that a
> random jump, anywhere in memory, would end up, back at the central control
> loop fairly quickly.

Well, after many years of experience with uCs, even at high clock rates and
with different kinds of microprocessors and microcontrollers, I can tell you
with peace of mind that I have seen random jumps only once, and that was a
software failure (my own fault) that made a unit jump at random.

You may ask: why, then, do they produce chips like "Watch Dog Timers" and
"uP Power Monitors" if problems like that never happen? They do happen, and
very often, if you don't take due care with the software and (mainly) the
hardware.

You really don't need a uC voltage monitor (which resets the CPU when power
goes below a certain point) if you have a perfect power supply, right?  Well,
that doesn't exist, so I use one in all projects: the DS1232, DS1233, or
DS1833, for example. A voltage reduction, even for a few microseconds, can
cause an erroneous jump in program execution. Voltage monitors RESET the uC
before it does something stupid.

External code memory, like EPROMs, offers a nice "antenna" situation for the
processor.  Some people really think that a circuit board layout is like the
underwear drawer: "it works however it's arranged". LIE!!!  Some people have
never heard (not their fault) of the expression "BUS TERMINATION", which
means installing pull-ups at the END of the bus, and the word "END" or
"TERMINATION" means the far point, not just the first component close to the
microprocessor.  Some install the resistor packs close to the CPU just
because it is easy, since all the lines are together there, huh? But that is
NOT the best place. Termination works better at the end of the bus, mainly at
high clock speeds, where it avoids electrical reflections and gives less
noise and fewer signal echoes.

It is just like buying a new radio transmitter and installing "any kind of
wire, of any shape and length" as the antenna, because anything will work
nicely, right?  WRONG!!!!  To get the best possible transmitted power you
need the right impedance, size, length, shape, style, and so on.  Just ask
somebody who works with radio transmission... there are rules.  Well, what do
you think your microcontroller "is" when running at 24MHz? ... Answer: A
PLAIN RADIO TRANSCEIVER!!!

If you worry about the processor making wrong random jumps, why not worry
about it changing register values, or losing some data bits somewhere in
memory? Then do we need to install a parity checker too?  Perhaps parity is
not enough...

Do you guys realize that the "parity bit and check" has been removed from
most current PC main memory?  At first I thought it was a crime, just
marketing price competition at the buyer's expense, with bad memory cards as
the consequence... but then I realized that PC technology evolved and memory
problems were almost eliminated, so rare that they happen just once a week,
when your computer freezes... :)  It doesn't matter if (for safety) you use
"memory cards with parity"; parity will not keep your computer from locking
up, it only means the lockup happens for a reason: "a parity error happened".
To avoid it, you need a complex LRC and CRC check, with a circuit called
"Memory Error Correction", which can fix up to a certain number of bad
bits... price?... hohoho, another story.
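
To make the parity point concrete, here is a minimal C sketch, not taken from any particular memory controller; `parity_bit` and `parity_ok` are hypothetical helper names. One even-parity bit per byte can report that a single bit flipped, but it cannot say which bit, and it misses a double flip entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Even-parity bit: 1 if the byte has an odd number of 1 bits, else 0. */
uint8_t parity_bit(uint8_t value)
{
    uint8_t p = 0;
    while (value) {
        p ^= (uint8_t)(value & 1u);
        value >>= 1;
    }
    return p;
}

/* True if the byte still matches the parity bit stored alongside it. */
bool parity_ok(uint8_t value, uint8_t stored_parity)
{
    return parity_bit(value) == stored_parity;
}
```

Detection is all this gives you; correcting the bad bit needs the extra check bits of an error-correcting scheme.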

So, before worrying about how to identify problems on the fly, why not think
about keeping them from happening in the first place?

For sure, this is my personal (and long) point of view and I can be wrong.

have fun.

Wagner Lipnharski
UST Research Inc.
Orlando, Florida
http://ustr.net

1999\02\15@110316 by Graeme Smith

On Sat, 13 Feb 1999, Wagner Lipnharski wrote:

> Graeme Smith wrote:
> > Ideally, if this protocol has ever been implemented, it would mean that a
> > random jump, anywhere in memory, would end up, back at the central control
> > loop fairly quickly.

> Wagner Lipnharski wrote:
>
> So, before worrying about how to identify problems on the fly, why not think
> about keeping them from happening in the first place?
{Quote hidden}

Ok, I guess to some extent I deserved that, coming so close to the end of
another thread with an argument in it.... Clearly, this is controversial..
and I should be careful what I say.

However, I have been "In and around the business" for 15 years, and if
there is one thing I know about computers, it's that they manage to find
their own reasons to fail, no matter how well they are designed, and no
matter how many hours you spend "Debugging" your software...

Lecture Mode ON.... WARNING HIGH LEVEL of SOMETHING OR OTHER

I have every intention of spending "As much time as it takes" to test my
software/hardware, and so on, but, from experience, I do not expect that
to be the answer to creating a "BugFree" software system.

As computer equipment gets more complex, and the programs we can run in
the same amount of time get more and more sophisticated, we can expect to
find that our "Radio Transceiver" CPUs, and our "Photo-diode" based
circuits on gradually smaller and smaller dies in gradually more and more
integrated circuits, are going to become more and more sensitive to cosmic
rays, static charges, and even Quantum Events, which are going to fry our
assumptions about how robust our systems are.

At some point in the future, I am going to want to push the size limit on
software, with a massive project, that is going to need not only to be
robust, but to run for years, before it can "Prove" its aim. I already
understand that the nature of logic, limits the size of programs that can
be built without breaking, and have, in the wings, a theory of what can be
done about it.

But, if my BASE software, is not glitch free, or at least consistent in
the way it works, I am never going to get the bigger project to run, let
alone, run for years without a major glitch, because it exceeds the
current limit for "Testable Software" by a wide margin.

I know that most people who work with PIC's do so because they are simple
and don't require the same level of "Testing" as the larger systems, but
please don't assume that the only reason for PIC's is one of simple
systems, that can be easily tested, there are a lot of different designs
that are possible with this architecture, and simple ones, that are easily
testable are not the limit.

I recently read a book on "Testing Software" in the library... Ok, I
didn't read the whole book, but in the preface, it says quite clearly,
that you CAN'T TEST EVERYTHING. This from someone who is considered an
expert in the field of TESTING software.

I am no expert, just a guy with big dreams, and a flat pocketbook.

But, I am looking for a way to make my dream real, and that involves
getting ROBUST SOFTWARE, that doesn't break down quite as often, as other
programs do. Whether it is because the system tests itself exhaustively,
or because it makes no assumptions, I want to KNOW that my software will
run at least consistently, so I can test for errors, and make it run
correctly.

Lecture Mode Off.

I haven't been on this list long, but the following tips, have been heard:

1. WDT management

       Trouble with this is that it limits loop length and ties up a
       timer.... Might not be a problem on the "Big Boys", but it's either
       a timer or the WDT on the 16C5x series, on which my first
       experiments are planned.

       Since I plan to implement at least a ghost of an RTOS on this
       architecture, pushing it to the limit of what can be done on it, in
       order to understand what that limit is....

       I need the prescaler, to make my system work. Hence at least for a
       time, I can't be bothered with a WDT.

       So.... I understand that it is considered de rigueur to leave the
       unprogrammed memory in a PIC in the "Programmable State", which
       suggests to me that if a random jump goes into the stratosphere
       (your program goes off to never-never land) it is going to end up
       executing a lot of 1111 1111 1111 or 0000 0000 0000 commands on
       1111 1111 or 0000 0000 data.

       The PIC ML codes are interpreted to NOP's where they don't fall on
       the 33 commands, so I assume that this means that they are going to be
       doing a lot of NOP's, essentially shutting down your system.

       One guy here suggested a jump back to the main loop, (Which
       doesn't give you even a ghost of a chance to recover from the bad
       software) I like this idea... but would jump to an error recovery
       routine instead.

       My routines are going to have a minimal run length in order to
       comply with the RTOS-like system, so I can plan to put a jump in
       every X number of bytes.... I assume that I am going to have to
       plan for the 12-bit NOP, so that means that I get 2 commands every
       three bytes, and a 10-command string would be about 15 bytes
       long.

       Does this seem correct?
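
       The byte arithmetic above can be checked with a few lines of C.
       This is only a sketch, and `packed_bytes` is a hypothetical
       helper: it counts how many bytes a run of tightly packed
       instruction words would occupy. One caveat worth keeping in
       mind is that PIC program memory is addressed in instruction
       words, so the spacing of guard jumps is normally counted in
       words rather than bytes.

```c
#include <assert.h>

/* Bytes needed to hold `count` instruction words of `bits_per_word`
   bits each when packed back to back, rounded up to a whole byte. */
unsigned packed_bytes(unsigned count, unsigned bits_per_word)
{
    return (count * bits_per_word + 7u) / 8u;
}
```

       With 12-bit words, `packed_bytes(2, 12)` gives 3 and
       `packed_bytes(10, 12)` gives 15, matching the figures in the
       post.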

                               GREY

1999\02\15@112011 by Lawrence Lile
>> Graeme Smith wrote:

>1. WDT management
>        So.... I understand that it is considered de rigueur to leave the
>        unprogrammed memory in a PIC in the "Programmable State" which


Graeme, thanks for the Rant.  Here's what I do for unprogrammed memory.  I
NEVER leave it in the unprogrammed state.

In MPASM, I add this line to the end of my code:



;       FILL REST OF MEMORY WITH GOTO SELF COMMAND
       FILL  (GOTO $), (0X1FD - $)

This makes every unprogrammed location essentially a watchdog timer
actuator.  If the software gets to any of these locations, something has
gone very wrong (like a brownout), and you CAN'T TRUST YOUR SOFTWARE
ERROR HANDLER TO HANDLE ANY ERRORS.  At this point the microprocessor has,
essentially, gone crazy, and no software error recovery routine is reliable,
because by definition the processor has loaded an invalid address.  Reboot
it, start over, initialize all your variables, and if you want an error
checking routine, put it in your boot sequence.  Check for a WDT reset at
bootup, and if you see one, then do the error-checking thing.
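
The "check for a WDT at bootup" step might look something like the following, written as host-side C so the logic can be tested off-chip. It assumes the usual PIC STATUS layout in which the /TO flag sits in bit 4 and reads 0 after a watchdog time-out; that bit position and the `reset_was_watchdog` name are assumptions to verify against the data sheet for the actual part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed STATUS bit: /TO is cleared by a watchdog time-out
   and set again by power-up, CLRWDT, or SLEEP. */
#define STATUS_NOT_TO 0x10u

/* Pass in the STATUS value captured at the very start of the boot
   code, before any CLRWDT or SLEEP has had a chance to set /TO. */
bool reset_was_watchdog(uint8_t status_at_boot)
{
    return (status_at_boot & STATUS_NOT_TO) == 0;
}
```

The boot code would capture STATUS first, then branch to the error-checking path only when this returns true.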

1999\02\15@122553 by Graeme Smith

On Mon, 15 Feb 1999, Lawrence Lile wrote:

{Quote hidden}

       Hmmm... Kinda like hitting yourself with an axe, when trying to
       cut off a hangnail...

       A Cosmic Ray, passing through the high bit of an address stored
       in memory, can flip it from a 0 to a 1, thus creating at least
       temporarily, a jump to nowhere.

       Now, you COULD throw the baby out with the bathwater, and restart
       from scratch. Or you could try to recover, and THEN, if that fails
       restart from scratch. Of the two, I would prefer the last, because
       it reduces the amount of work my PIC will have to do again.

       But it does bring up the point, of what exactly, the unprogrammed
       code should be expected to do...

       Now, as I understand it, Unprogrammed codespace, can be used for
       patches, which are not recommended, but often required in real
       life. In this case, you want to leave some room, for reworking the
       PIC in the field, with unknown equipment, and without releasing
       the complete code set. A Patch, can be applied overlaying the
       original code, with complete security.

       Or at least that is what I figured the reason was for leaving it
       in the unprogrammed state. If you have it already preset to jump
       to the error handler, and can mask this to jump back to the main
       code, then you can use snippets of code within short nop lists
       (un-programmed state) with a jump to preknown state, to redirect
       errant programs back into the main loop, in a known safe state.

                                       GREY

1999\02\15@154335 by John Payson

;       FILL REST OF MEMORY WITH GOTO SELF COMMAND
       FILL  (GOTO $), (0X1FD - $)

|This makes every unprogrammed location essentially a watchdog timer
|actuator.  If the software gets to any of these locations, something has
|gone very wrong (like a brownout), and you CAN'T TRUST YOUR SOFTWARE
|ERROR HANDLER TO HANDLE ANY ERRORS.

Putting a [goto self] in the last memory address (or the penultimate
address on the 16C5x or 12Cxx parts) will achieve the same effect
with two extra benefits:

[1] You can use a 2 or 3 instruction sequence if needed to ensure
   that the 'goto self' actually DOES goto itself.  Otherwise, if
   PCLATH isn't pointing to the current page (*) the jump might
   end up going into some other part of the program.

   (*) it will be if the code got where it is via a jump, but not
       necessarily if it got there via an electrostatic glitch or
       a bogus 'return'.

[2] Most of the unused code-space can be left blank, allowing for
   faster programming and/or future expansion.

Note that on the 17Cxx parts, a blank program location corresponds
to a CALL instruction, but on all the 12- and 14-bit parts it will
leave the PC and related registers alone.
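
The PCLATH caveat in point [1] can be illustrated with a small model, assuming the 14-bit core's rule that a GOTO takes its low 11 address bits from the opcode and its top two bits from PCLATH<4:3>. The `goto_target` function is a hypothetical name for this sketch, not anything from Microchip's tools.

```c
#include <assert.h>
#include <stdint.h>

/* Destination of a GOTO on a 14-bit-core part (as assumed above):
   PC bits 12:11 come from PCLATH<4:3>, bits 10:0 from the opcode. */
uint16_t goto_target(uint8_t pclath, uint16_t opcode_addr11)
{
    uint16_t page = (uint16_t)((pclath >> 3) & 0x03u);
    return (uint16_t)((page << 11) | (opcode_addr11 & 0x07FFu));
}
```

A 'goto self' assembled at 0x1FFF carries 0x7FF in its opcode. If execution arrives there with PCLATH still pointing at page 0, `goto_target(0x00, 0x7FF)` yields 0x07FF rather than 0x1FFF, which is why the extra instructions to set PCLATH first are worth having.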

1999\02\16@030311 by Mark Willis

Graeme Smith wrote:
> <snipped>
>         (un-programmed state) with a jump to preknown state, to redirect
>         errant programs back into the main loop, in a known safe state.
>
>                                         GREY

 What I think everyone here's saying, is that they've looked for such a
beastie, and there isn't such a beast as a "known safe state", once you
ended up in an unknown state through some unknown means.

 If the principle of least astonishment is voided, you're best off to
trust nothing, run a (at least partial) hardware test, and RESTART
otherwise, i.e. either do a power-up restart or a Watchdog restart.
Then, you at least know that your hardware's set correctly, etc. -
because YOU JUST SET IT CORRECTLY.  (You might think of a state machine
for your project - occasionally when everything tests OK, save state
"Checkpoint dump", should you end up in psychotic code space, TRUST
NOTHING, restart, and load your last checkpoint dump and work forwards
from there.  At least that way if you crash, you don't have to duplicate
ALL your work from scratch...  Also, the checkpoint dump can give you an
idea on what's going on <G>)
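
 One way the "checkpoint dump" could be sketched in C is shown below.
Everything here is hypothetical (the `app_state` fields, the function
names, the simple additive check); the point is only that a checkpoint
carries its own check value, so the restart code can refuse to load one
that was itself corrupted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application state worth preserving across a restart. */
typedef struct {
    uint8_t  state;
    uint16_t progress;
} app_state;

typedef struct {
    app_state saved;
    uint8_t   check;
} checkpoint;

/* Additive check over the fields, inverted so that an all-zero
   (never written) checkpoint does not validate. */
static uint8_t state_check(const app_state *s)
{
    uint8_t sum = (uint8_t)(s->state + (s->progress & 0xFFu) + (s->progress >> 8));
    return (uint8_t)~sum;
}

/* Save a checkpoint when everything has just tested OK. */
void checkpoint_save(checkpoint *cp, const app_state *s)
{
    cp->saved = *s;
    cp->check = state_check(s);
}

/* After a restart: load the checkpoint only if its check matches. */
bool checkpoint_restore(const checkpoint *cp, app_state *out)
{
    if (state_check(&cp->saved) != cp->check) {
        return false;
    }
    *out = cp->saved;
    return true;
}
```

 If `checkpoint_restore` returns false, the safe fallback is a full
start from scratch.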

 When you work with embedded hardware that controls electronics that
can quite literally blow up when over-driven, ASSUMING that things are
safe just isn't a good idea at all.  (Say you were controlling a piece
of high powered pulsed RF transmitter with a PIC part, the transmitter's
turned on at super high power, and just before it is to be turned off,
the software crashes;  You then assume that the transmitter's off and go
ahead and process the next 15 minutes worth of received data in the PIC,
setting up for the next transmission, as the transmitter not-so-slowly
melts into $25,000 worth of slag, but your job's secure and the boss
will be happy - you assumed it was safe, so it was, right? <G>)  This is
different than a lamp dimmer or soundmaker where an occasional "oops"
just makes the light bulb flash a little brighter or the sound a little
different than expected;  I for one find that the way you GET good
habits, is to always be very aware of what you're doing, when you're
-developing- habits <G>

 (You can bet that any sane person who had that one, would make sure
that his power-up / watchdog software turned off the transmitter, first
thing.  THEN went on to other things...)

 Mark

1999\02\16@110122 by dave vanhorn

>  What I think everyone here's saying, is that they've looked for such a
>beastie, and there isn't such a beast as a "known safe state", once you
>ended up in an unknown state through some unknown means.
>
>  If the principle of least astonishment is voided, you're best off to
>trust nothing, run a (at least partial) hardware test, and RESTART
>otherwise, i.e. either do a power-up restart or a Watchdog restart.
>Then, you at least know that your hardware's set correctly, etc. -
>because YOU JUST SET IT CORRECTLY.  (You might think of a state machine
>for your project - occasionally when everything tests OK, save state
>"Checkpoint dump", should you end up in psychotic code space, TRUST
>NOTHING, restart, and load your last checkpoint dump and work forwards
>from there.  At least that way if you crash, you don't have to duplicate
>ALL your work from scratch...  Also, the checkpoint dump can give you an
>idea on what's going on <G>)

Here, we agree :)   The credit card terminals I worked with were all
consciously designed this way, because we were screwing around with people's
bank accounts. If the software hits an anomaly, we dump everything and
"reboot". If the ROM doesn't checksum, we don't proceed. Far better that
than put corrupt data into the banking system.  The "breadcrumbs" left in
battery-backed RAM were often the only clue we had as to how the system got
corrupted, and we were very successful in tracking down every bug.
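
As a rough illustration of the "if the ROM doesn't checksum, we don't proceed" rule, here is a C sketch. It is not the terminals' actual code: the 16-bit additive sum and the names `rom_sum` and `rom_ok` are assumptions chosen for brevity, and a real product might well use a CRC instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit additive checksum over a ROM image (wraps on overflow). */
uint16_t rom_sum(const uint8_t *image, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + image[i]);
    }
    return sum;
}

/* Startup gate: proceed only if the image matches the stored sum. */
bool rom_ok(const uint8_t *image, size_t len, uint16_t expected)
{
    return rom_sum(image, len) == expected;
}
```

The startup code would call `rom_ok` before touching any peripherals and halt if it fails.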

I take the same approach on uC software. It might take a bit more ROM space,
but I set everything in the processor, no default states, and go from
there. In between routines, I drop a jump to zero in case the processor gets lost.

Routine:
       do stuff
       do stuff
       ret

       jmp 0

Routine2:

So in normal execution the jmp 0 is never seen, and if it IS seen, we know
that something BAD has happened. The JMP 0 could be JMP here, to hang the
processor in a loop, if I thought that recovering the RAM state was more
important than the customer being able to use the equipment. Since the
startup is tightly controlled, I'm confident that after a reset, everything
will be right, at least until we hit the bug that caused the bad jump
again.. :)

It's important to note, though, that this approach works for equipment that
sits on a desk, where the worst malfunction might be to put garbage on the
display or print gibberish on a receipt printer. It's not controlling a welding
robot that could stab a person if it moves unexpectedly.  If you have
dangerous peripherals, then you need to make them safe too.

1999\02\16@163308 by Gerhard Fiedler

At 10:59 02/16/99 -0500, dave vanhorn wrote:
>The JMP 0 could be JMP here, to hang the
>processor in a loop, if I thought that recovering the ram state was more
>important than the customer being able to use the equipment. Since the
>startup is tightly controlled, I'm confident that after a reset, everything
>will be right, at least until we hit the bug that caused the bad jump
>again.. :)

isn't a watchdog reset (eg. triggered by a "JMP here") somewhat more of a
reset (the micro's internal states?) than just setting all (accessible)
registers to known values like you do it? and if so, wouldn't it be safer
than a JMP 0 in case the program gets lost somewhere?

ge

1999\02\16@164507 by dave vanhorn

>isn't a watchdog reset (eg. triggered by a "JMP here") somewhat more of a
>reset (the micro's internal states?) than just setting all (accessible)
>registers to known values like you do it? and if so, wouldn't it be safer
>than a JMP 0 in case the program gets lost somewhere?

Depends on the micro, but in all my systems, they are equivalent. My
startup code sets everything, including writing static values to all of
ram. (Registers are cleared as used)
If you had some periph that only a hard reset could fix, then that's what
you'd have to do.

1999\02\16@170609 by Gerhard Fiedler

At 16:43 02/16/99 -0500, dave vanhorn wrote:
>>isn't a watchdog reset (eg. triggered by a "JMP here") somewhat more of a
>>reset (the micro's internal states?) than just setting all (accessible)
>>registers to known values like you do it? and if so, wouldn't it be safer
>>than a JMP 0 in case the program gets lost somewhere?
>
>Depends on the micro, but in all my systems, they are equivalent. My
>startup code sets everything, including writing static values to all of
>ram. (Registers are cleared as used)
>If you had some periph that only a hard reset could fix, then that's what
>you'd have to do.

i was wondering about the pic and possibly internal registers which are not
accessible to your code, but which might get confused and reset to a
working state by a reset only.

ge

1999\02\16@171230 by dave vanhorn

>i was wondering about the pic and possibly internal registers which are not
>accessible to your code, but which might get confused and reset to a
>working state by a reset only.
>
>ge


I haven't done a lot of pic work, but I don't remember anything like that
on the F84 anyway.
You have to check your hardware in each case though, I tend to work with a
lot of off-chip peripherals too, and they all have to be re-initted.

On the AVR, it's no problem, you can write anything that's settable.

1999\02\16@234745 by Graeme Smith

On Mon, 15 Feb 1999, John Payson wrote:

{Quote hidden}

Thanks for the tip...

I haven't looked at programming 17cxx parts yet, so knowing that the
system I use for programming the 12 and 14 bit parts is different, will
at least explain why I can't achieve the same utility...

as for using the last memory location as your jump to self command...

This isn't a bad idea, unless your processing is time sensitive...

Having it run to the end of memory, doing NOPs, might take only
milliseconds, but that is still an appreciable amount of time at
computer speeds. I am planning to use a sort of "Macro-Cycle" in my
opsystem that limits the number of commands that can be put in a single
code cluster to a number small enough to take a set amount of time to
run.

If I can "Mask" the error routine to get the "Next" command, I can
essentially leave the cluster as NOPs and return to the main loop,
in sync with the next macro-cycle.

One of the problems with jump to self programs, I would think, would be
that they tend to require a hard reset, or a WDT reset to recover from.
In a time critical application, you essentially destroy the function of
the system, because it takes so long to do the reset.

Am I the only one that sees this as a problem?

                               GREY

1999\02\17@004441 by Graeme Smith

On Mon, 15 Feb 1999, Mark Willis wrote:

{Quote hidden}

Well, you seem to be assuming that a WDT gives you a "Known Safe State",
(as long as you do the fail safe restart code).

There is no reason why you can't reset some of the software during your
"Error" handler, essentially putting the system into its "fail Safe" mode.
after all, if you set it up that way, the only way your system is going to
get to the error handler, is if it jumps into the middle of one of the
code spaces you left unprogrammed.

I fail to see the difference between reacting to a jump failure, and
letting the hardware react to the same jump failure, except perhaps the
necessity of going through the "Hardware Reset" for what may be a glitch
that only affects the software one time.

{Quote hidden}

I like that idea... just not sure I really want to perform a HARDWARE
reset, every time the software glitches..... It makes sense to drop back
to a checkpoint, especially if you checkpoint after you write to an
external device, so you don't end up sending the same message twice...

{Quote hidden}

I wouldn't be having this discussion, if I didn't want to get advice on
the best way to achieve my goals... I just don't necessarily want the
quick and easy answers, like doing a hardware reset every time you glitch.

It might take a little longer, to try to resurrect the software from the
checkpoint without doing a complete reset, first, then do the reset if you
end up going through the error routine with the same checkpoint the second
time, but at least, you don't have to take that time, just because you
assumed the hardware was having problems when it was a temporary glitch.
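
The escalation I have in mind could be sketched in C like this. It is a hypothetical outline only (the `recovery_tracker` type and function names are invented for the example): the first error at a given checkpoint gets a warm restart from that checkpoint, and a second error at the same checkpoint gets the full reset.

```c
#include <assert.h>

typedef enum { ACTION_WARM_RESTART, ACTION_HARD_RESET } recovery_action;

/* Remembers which checkpoint the last warm restart was attempted from. */
typedef struct {
    int last_failed_checkpoint;  /* -1 means no pending failure */
} recovery_tracker;

void tracker_init(recovery_tracker *t)
{
    t->last_failed_checkpoint = -1;
}

/* Called from the error routine with the id of the current checkpoint. */
recovery_action on_error(recovery_tracker *t, int checkpoint_id)
{
    if (t->last_failed_checkpoint == checkpoint_id) {
        t->last_failed_checkpoint = -1;
        return ACTION_HARD_RESET;    /* same place twice: give up */
    }
    t->last_failed_checkpoint = checkpoint_id;
    return ACTION_WARM_RESTART;      /* first failure here: try again */
}

/* Called whenever a new checkpoint is reached successfully. */
void on_checkpoint_advanced(recovery_tracker *t)
{
    t->last_failed_checkpoint = -1;
}
```

The tracker itself would have to live somewhere that survives the warm restart, and it is of course exposed to the same glitches as everything else in RAM.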

>   (You can bet that any sane person who had that one, would make sure
> that his power-up / watchdog software turned off the transmitter, first
> thing.  THEN went on to other things...)
>
>   Mark
>

Being sane, your error routine, would do the same thing....

In essence, I am wondering why you need the WDT at all, if your code is
designed to be fail-safe....

Maybe there is something I don't know about PIC's, that makes them
different, but when I want to reset my computer, I usually do a WARM BOOT
FIRST rather than pushing the reset, if only because some of the drives
don't reset well from the hardware reset, but they all accept a warm boot.

                               GREY

1999\02\17@005509 by Graeme Smith

On Tue, 16 Feb 1999, dave vanhorn wrote:

{Quote hidden}

Well, a jump 0, is not a HARD RESET, it's simply a warm boot.

>
> It's important to note though, that this approach works for equipment that
> sits on a desk, and its worst malfunction might be to put garbage on the display,
> or print gibberish on a receipt printer. It's not controlling a welding
> robot that could stab a person if it moves unexpectedly.  If you have
> dangerous peripherals, then you need to make them safe too.
>

There is no question in my mind that dangerous peripherals require special
treatment. There is also no reason why the init procedure can't be used
as an error handler...

My guess is that the "Cleanup" after a glitch is the real question here,
rather than the "HARDWARE RESET". What does the hardware reset do that a
good initialization procedure couldn't?

(Other than test the EEPROM)

1999\02\18@124804 by John Payson

|There is no reason why you can't reset some of the software during your
|"Error" handler, essentially putting the system into its "fail Safe" mode.
|after all, if you set it up that way, the only way your system is going to
|get to the error handler, is if it jumps into the middle of one of the
|code spaces you left unprogrammed.

It depends on why you got into the failed state in the first
place.  Often, the only probable mechanism for reaching
unprogrammed code space is an electrical glitch; such a
glitch is probably better recovered from via reset than via a
warm start; better still would be to power cycle the chip,
but that would require external hardware.

|I fail to see the difference between reacting to a jump failure, and
|letting the hardware react to the same jump failure, except perhaps the
|necessity of going through the "Hardware Reset" for what may be a glitch
|that only affects the software one time.

The $10,000,000 question here is what caused the jump table
failure (if that's what killed the system).  If the only way
in which the failure could occur is by the CPU executing code
incorrectly (e.g. if I code:

       ; Table evaluator [starting at address $30]
Xlate:
       clrf    PCLATH
       addwf   PC
       db      1,2,4,8, 3,5,7,9, 2,4,6,8, 4,3,2,1

       ; Later on...
MainLoop:
       movf    PORTB,w
       andlw   $0F
       movwf   LatchB

       ; Do some munging...

       ; Later on
       movf    LatchB,w
       call    Xlate
       movwf   Result

       ; ...
       goto    MainLoop

If no code writes to INDF, and if the only write to LatchB is as
above, there is no way the computed jump near the beginning should
ever fail.  Nonetheless, if the chip gets glitched, it's possible
that the value in LatchB might get corrupted.  Of course, if this
DOES happen there's not much guarantee that anything else will be
as it should be either...
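
For comparison, a C rendering of the same lookup that re-applies the mask at the point of use is sketched below. The table contents are copied from the assembly above; the `xlate` function and its re-masking are an illustrative variation, not part of the original code. Masking again keeps the lookup inside the 16-entry table even if the latched value has been corrupted, though the result may then simply be the wrong entry.

```c
#include <assert.h>
#include <stdint.h>

/* Same 16 entries as the assembly table above. */
static const uint8_t xlate_table[16] = {
    1, 2, 4, 8,  3, 5, 7, 9,  2, 4, 6, 8,  4, 3, 2, 1
};

/* Look up the latched value, masking to 4 bits at the point of use
   so that a corrupted latch cannot index outside the table. */
uint8_t xlate(uint8_t latch)
{
    return xlate_table[latch & 0x0Fu];
}
```

So `xlate(0x15)` returns the same entry as `xlate(0x05)`: the access stays in bounds, but nothing here can tell that the upper bits were garbage.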


1999\02\20@082417 by paulb

Wagner Lipnharski wrote, regarding the use of non-parity checking in
PC memory:

> but then I realized that PC technology evolved and memory problems
> were almost eliminated, so rare, that it happens just once a week,
> when your computer freezes... :)

 I don't think so.  Crummy memory existed 15 years ago, and it exists
now.  Cosmic rays haven't been designed out either, as agreed in this
thread!  If you have a PC that runs for a week or longer without
failure - keep it!

>  It doesn't matter if (by safety) you use "memory cards with parity",
> it will not avoid your computer to locks down, it would happens for a
> reason, a "parity error happened".

 The real reason parity was foregone was that (it was cheaper and) it
was ignored by the operating system.  Windoze has enough difficulty
keeping *itself* running.  They didn't devise a mechanism for handling
parity errors because - what was the selling point?

 Users kept on buying buggy (alternative, similar word to that also)
software anyway and evidently tolerated occasional failure; message to
manufacturers: "It doesn't matter".  (The minority of customers who want
reliable software just migrate to Linux, don't they?)

 Why waste effort on an operating system which can survive parity
faults (retest faulty area; reload module in error; relocate if
necessary, revive process and continue) if it won't sell more copies or
more computers?

 It just didn't fit the *style*.
--
 Cheers,
       Paul B.
