PICList Thread
'Do an evil ghost live in my PIC18FxxJxx ? - using '
2007\08\27@173410 by Morgan Olsson

Hi

We are having severe trouble with a PIC18 programmed in C.

Has anybody experienced anything odd, like functions returning wrong
values, if statements evaluating erroneously, suspected unprovoked
jumps... *occasionally*, while everything works OK during most executions
of the same parts of the code, and then after a random time - bang!??

This is driving us nuts: me (hardware designer and assembler guy) and my
colleague, who does all the programming in C (I myself am not very C-literate).

To start from the beginning:
We have designed two circuit boards: one is a measurement and user I/O slave
based on a PIC16F883 with a couple of precision A/D converters and other I/O,
and the main control board uses a PIC18F66J15.

Both are programmed in C using CCS PCWH4.x compiler.

The PIC16 board works perfectly, so it seems we are not too stupid ;)

The PIC18, on the other hand, mostly works, but once in every few million
instructions or so it does something bad.
We cannot explain the cause, and we have now wasted weeks on this and are
already past the intended deadline...
When we loaded the first program version we were simply happy to see that it
worked at all: bidirectional comms time-sharing power on one line, serial
communication to a VFC motor driver, and some logic in between.
But... occasionally it just goes wrong.

For example, we caught this behaviour using the ICD2 and extra code to sample
variables:
We shut off interrupts, then execute a part of the program that calls a
function. The function always evaluates correctly, but *occasionally* the
returned value is *not the evaluated one* - in theory the function cannot
even return the value we got at the receiving end!
AND THIS IS WITH INTERRUPTS SHUT OFF!!!!
Also, we could not find an error in the generated assembly code.
At that point, we thought it must be EMI or bad hardware.

But... we eliminated all hardware and EMI problems:


o  We tried moving the PCB away from the VFC, putting it in a metal box,
shielding the processor with copper foil, supplying it from batteries instead
of the switchmode converter, and fitting toroids on the cables.  We also tried
cooling it far below freezing point and heating it with a hair dryer, and
varied the CPU voltage (core and I/O) to max and min.  All this had ZERO
impact on the error rate.

o  We also scoped VDDCORE (which is also VDD) and found it perfectly fine.
We added extra capacitors of other types and even a damping R-C just in case.
Still no difference.

o  We swapped in another individual PIC - same behaviour.

o  We changed the design to use the similar PIC18F6722 (higher voltage) on
another circuit board: same problems (plus more, as this chip has a longer
errata list...)

o  We also asked Microchip support whether any execution bugs are known in
this chip; the answer is no.

So we rule out hardware problems.

My colleague has found and corrected some errors of his own in the source
code, but the basic problem I have described is still not found.

But we also cannot understand how a problem that shows itself this randomly
can be related to our own code.
-Or the compiler, for that matter.
So if we rule out the PIC chip, surrounding hardware, compiler and source
code, what is left?
Nothing?
Still this stupid behaviour!!  AAAAAH.


We have analysed parts of what the compiler has generated.
Some parts are smart, some very clumsy, but not really wrong.
We changed between a few 4.x compiler versions and also ported the code to
the 3.x compiler, as a lot of users still prefer that and consider 4.x to
still be in beta.  We still have about the same problem.

The problem seems to wander as we insert debug code.
We have even seen simple if statements go wrong!!  Occasionally.
The error mostly happens when the system operation mode is changing, when
a lot of variables are changing - but sometimes it just sits there and
changes operation mode by itself...

It seems like there is some ghost throwing dice and rewriting some
register at random, and/or causing the program to jump and/or return to the
wrong address after a call.

Even with interrupts shut off we have observed "something" hitting us.
The spookiest thing is that in one code setup the PortB interrupt always
fired after a timer interrupt, although we could not find anything in the
hardware or code that would ever make it do that.

We are pondering a switch to the C18 compiler.  Maybe the problem is not the
CCS compiler, but the rewrite might make us find a source code problem, plus
C18 supports REAL-ICE for better debugging.
But it is time-consuming to port from CCS C and we are at the deadline
already, so a direct fix would be much better.


The errata we have found are
ww1.microchip.com/downloads/en/DeviceDoc/80246b.pdf
ww1.microchip.com/downloads/en/DeviceDoc/80315a.pdf
It was not easy to find both. Maybe we missed more errata?

Our main thread on the CCS forum about this:
www.ccsinfo.com/forum/viewtopic.php?t=31672
"zilog" is my programming colleague on this project.

We got a lot of help there, but nothing that found the problem.
Remembering the wisdom here on PICLIST, I now turn to you  ;)

--
Morgan Olsson

2007\08\27@180639 by Harold Hallikainen

Regarding functions returning values different than what was in the
function, make sure the code that calls the function has the correct
prototype for the function. If, for example, the header file is not
specified, the compiler assumes functions return an int, which is often
not the case!

Hope that is SOME help.

Harold
Been there, done that...


--
FCC Rules Updated Daily at http://www.hallikainen.com - Advertising
opportunities available!

2007\08\27@182233 by Morgan Olsson

The thing is that the code works perfectly most of the time, for thousands
of iterations.
Then suddenly *the same part of the code* does something erratic.

As if some value was overwritten by an interrupt or something.
But that also happens with interrupts globally disabled...

/Morgan


On 2007-08-28 00:06:35, Harold Hallikainen <spam_OUTharoldTakeThisOuTspamhallikainen.org> wrote:

--
Morgan Olsson

2007\08\27@190106 by Harold Hallikainen

These CAN be fun! Since function parameters and larger return values (I
think 8-bit values are returned in W) are passed on the software stack,
a return value whose type is misinterpreted MAY still come back correct
if other code happened to leave that stack memory in the right state, and
wrong when it didn't. I spent a LOT of time single-stepping
through code and watching C18 return a wrong value now and then. My
problem turned out to be a missing #include of the header for the function
in the file where I called it. This may not be your problem, though...
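
A minimal sketch of the prototype point, not taken from the project in question. The file names in the comments and the function names read_counter and counter_plus_one are hypothetical. The idea is only that the caller must see a declaration with the real return type before the call, so the compiler fetches the whole value instead of assuming an int.

```c
#include <stdint.h>

/* sensor.h (hypothetical): the prototype every caller must see. */
uint32_t read_counter(void);

/* caller.c: with the prototype visible, the compiler knows the result is
   32 bits wide and handles all four bytes of it. */
uint32_t counter_plus_one(void)
{
    return read_counter() + 1u;
}

/* sensor.c: the definition, which may live in another file. */
uint32_t read_counter(void)
{
    return 0x00012345u;
}
```

Turning on the compiler's warning for calls to undeclared functions, where it has one, is a cheap way to catch a missing #include.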

Good luck!

Harold



2007\08\28@041538 by Ruben Jönsson

Hi Morgan and welcome back,

First, I am not that familiar with the 18F (yet) but I have been through
similar cases with other micros.

1. Try to reduce the code more and more until the problem goes away and then
add things back to see where the problem is.

2. Mostly when I have had this kind of problem, it has been the growing stack
that has overwritten static structures and variables in memory. This can be
kind of fun since the problem appears to be somewhere quite different from
where it really is.

3. Uninitialized variables that mostly have the right value to start with but
sometimes, depending on prior use of memory, not. (This may be something that
differs between debug and release since debug might initialize allocated memory
but release won't or not to the same value.)

4. Can the code be simulated? Does the simulator show the same symptoms?

5. Does it show up in both debug and release? (does this even exist on 18F
compilers?)

6. If all else fails, a realtime trace would show you exactly what happens. A
real emulator could perhaps be borrowed or rented for a short time. I know, it
costs a lot and takes time to learn how to handle, but it is invaluable in
this kind of situation.

7. Could it be hardware related? Does your code perhaps manipulate hardware
in a way that could be dangerous to your micro, e.g. a momentary state that
would draw a lot of current, causing a short surge on the power supply? No
shielding would help here. Are all pins on the micro operating within
absolute maximum ratings? No current into protection diodes?

8. Overflow in intermediate variables?

9. Static buildup on moving parts, with tiny discharges causing hiccups? (I
think I have this in a propeller clock circuit.)

10. Also agree with Harold that you should make sure that all functions have
the expected signature everywhere - return type, parameters types, parameter
passing methods, big/little endian use...
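
Point 8 (overflow in intermediate variables) can be sketched as follows. This is an illustration under assumptions, not the project's code: the function names are made up, and the explicit casts model a compiler whose default integer arithmetic is 8 bits wide (check the compiler manual for the actual integer sizes and promotion rules).

```c
#include <stdint.h>

/* Scale a reading by 3/4 using only 8-bit intermediates, as a compiler
   with 8-bit arithmetic would. The cast models the truncation explicitly. */
uint8_t scale_narrow(uint8_t reading)
{
    uint8_t product = (uint8_t)(reading * 3u); /* wraps for readings above 85 */
    return (uint8_t)(product / 4u);
}

/* Same calculation with a 16-bit intermediate: no wrap-around. */
uint8_t scale_wide(uint8_t reading)
{
    uint16_t product = (uint16_t)reading * 3u;
    return (uint8_t)(product / 4u);
}
```

Both versions agree for small inputs and differ only once the product exceeds 255, which is one way a routine can look right for thousands of iterations and then return a wrong value.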

Good luck

/Ruben



2007\08\28@133651 by Morgan Olsson


On 2007-08-28 09:11:25, Ruben Jönsson <rubenspamKILLspampp.sbbs.se> wrote:

> Hi Morgan and welcome back,

Hi Ruben :)

Thank you for the ideas; I will forward them to my programmer guy.
Small notes interspersed below.
/Morgan

> First, I am not that familiar with the 18F (yet) but I have been through
> similar cases with other micros.
>
> 1. Try to reduce the code more and more until the problem goes away and
> then add things back to see where the problem is.

We try.  The problem seems to wander...

> 2. Mostly when I have had this kind of problem, it has been the growing
> stack that has overwritten static structures and variables in memory.
> This can be kind of fun since the problem appears to be somewhere quite
> different from where it really is.

My colleague (who is the programmer on this) has checked as best he can...

> 3. Uninitialized variables that mostly have the right value to start with
> but sometimes, depending on prior use of memory, not. (This may be
> something that differs between debug and release since debug might
> initialize allocated memory but release won't or not to the same value.)

I'll tell my programmer.

> 4. Can the code be simulated? Does the simulator show the same symptoms?
>
> 5. Does it show up in both debug and release? (does this even exist on
> 18F compilers?)

I am not sure what you mean by debug and release?

We have an SPI-driven UART that we push debug info out through (the real
on-chip UARTs are used in the application).
We also use pins and a 4-bit resistive D/A to track what happens,
plus the ICD2.

> 6. If all else fails, a realtime trace would show you exactly what
> happens. A real emulator could perhaps be borrowed or rented for a short
> time. I know, it costs a lot and takes time to learn how to handle, but it
> is invaluable in this kind of situation.

We need 40MHz to keep up with communication and so on.
IIRC real emulators only go to 25MHz.
I have now bought a REAL-ICE, and my programmer is porting to C18 in order
to utilize it.
AFAIK it cannot do a full trace, but it reports some kind of trace...

> 7. Could it be hardware related? Does your code perhaps manipulate
> hardware in a way that could be dangerous to your micro, e.g. a momentary
> state that would draw a lot of current, causing a short surge on the power
> supply? No shielding would help here.

All are safe.

> Are all pins on the micro operating within absolute maximum ratings?
> No current into protection diodes?

Good points.  Checked.

> 8. Overflow in intermediate variables?

Even the generated assembly looks good for a part we checked extra well.

> 9. Static buildup on moving parts, with tiny discharges causing hiccups?

No way.


2007\08\28@145329 by David VanHorn

Have you checked that your crystal is within the speed specifications
for the part, and actually running at the right amplitude?

Oscillator margin test?

2007\08\28@145626 by David VanHorn

Other thoughts:  I am a little paranoid about uninitted variables, so
I write $00 to all memory at boot, except for a couple locations that
I leave crash state variables in, which must survive a reset.

That way, even if I do have an uninitted variable, it starts off with
a known value.
I suppose if I wanted to take the other track, I could fill all of ram
with random noise, that way it would shake out any problems that
paving to $00 would otherwise hide.
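
The boot-time fill described above might look roughly like this. It is a host-side sketch: the RAM is modelled as an ordinary array, and the function name pave_ram and the idea of one contiguous "keep" window for the crash-state variables are assumptions. On a real target the fill would have to run in start-up code, before the C runtime initialises any variables.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a RAM region with a known byte, skipping a small window that holds
   crash-state variables which must survive a reset. */
void pave_ram(uint8_t *ram, size_t len,
              size_t keep_start, size_t keep_len, uint8_t fill)
{
    for (size_t i = 0; i < len; i++) {
        int in_keep = (i >= keep_start) && (i - keep_start < keep_len);
        if (!in_keep) {
            ram[i] = fill;
        }
    }
}
```

Passing 0x00 as the fill gives the "known value" behaviour; passing a different pattern on each build is a simple stand-in for the "random noise" variant that exposes code relying on zeroed memory.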

2007\08\28@163130 by Ruben Jönsson

>
> > 5. Does it show up in both debug and release? (does this even exist on
> > 18F compilers?)
>
> I am not sure what you mean by debug and release?
>

In environments with enough resources (like a PC, pocket PC or higher-end
embedded processors) the code in the libraries for standard functions is
written to behave somewhat differently depending on one or more preprocessor
symbols through the use of conditional compilation (#if, #ifdef, #else,
#endif and so on).

During development, the compiler and linker can, by declaring a preprocessor
symbol (_DEBUG for example), be set to make more debug-friendly code which
includes things like trace messages, validation of parameters, initialization
of allocated memory and so on. This is called the debug version of the program.

When development and testing are finished, the code is compiled and linked
as a release version, which excludes a lot of the code that was useful during
development and testing. The release version and the debug version should be
substantially identical regarding the intended operation of the produced
functions. However, things like initialization of allocated memory in the debug
version could make the code operate differently in the release version if the
side effects of the debug version have been taken for granted by the programmer.

I don't know if this feature even exists for the PIC 18F C compilers, but it
could be a reason why a program works during development but not standalone.
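
A small sketch of the debug/release difference described above. The pool, get_buffer and prepare_buffer are hypothetical names, and _DEBUG is simply the example symbol mentioned in the text; whether a given PIC compiler defines anything like it is not something this sketch assumes.

```c
#include <stddef.h>
#include <string.h>

/* A static pool standing in for "allocated memory" (hypothetical). */
static unsigned char pool[32];

/* In a debug build the buffer is pre-cleared as a side effect; in a
   release build it is handed out with whatever it last contained. */
unsigned char *get_buffer(void)
{
#ifdef _DEBUG
    memset(pool, 0, sizeof pool);   /* debug-only side effect */
#endif
    return pool;
}

/* Robust in both builds: the caller initialises explicitly instead of
   relying on the debug-only fill. */
void prepare_buffer(unsigned char *buf, size_t len)
{
    memset(buf, 0, len);
}
```

Code that reads the buffer straight after get_buffer() would work in the debug build and only sometimes in the release build; calling prepare_buffer() first removes the dependency.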

/Ruben


==============================
Ruben Jönsson
AB Liros Electronic
Box 9124, 200 39 Malmö, Sweden
TEL INT +46 40142078
FAX INT +46 40947388
.....rubenKILLspamspam.....pp.sbbs.se
==============================

2007\08\29@025650 by Morgan Olsson

It does not seem to be uninitialised variables; everything works for a lot
of iterations, then some variable that has already been handled OK
suddenly gets a completely wrong value.

/Morgan

On 2007-08-28 20:56:23, David VanHorn <EraseMEmicrobrixspam_OUTspamTakeThisOuTgmail.com> wrote:

--
Morgan Olsson

2007\08\29@031719 by Morgan Olsson

On 2007-08-28 20:53:28, David VanHorn <microbrixspamspam_OUTgmail.com> wrote:

> Have you checked that your crystal is within the speed specifications
> for the part, and actually running at the right amplitude?
>
> Oscillator margin test?

All done.
We experimented with series resistors and different xtal types, and borrowed
a fast oscilloscope to look.
Large margins.

We also tried going down to an 8MHz xtal (32MHz with PLL).
Below that we cannot keep up communication with the rest of the system
(without rewriting a lot of the program).

We also tried varying voltage and temperature to the recommended max and min.

Nothing we have tried in hardware changed the problem.

And we tried a LOT of measures; some are listed here:
http://www.ccsinfo.com/forum/viewtopic.php?p=84891#84891

--
Morgan Olsson

2007\08\29@032654 by Richard Prosser

Morgan,
You've probably already checked and I'm no expert on the 18 series
(never used one) but don't they have prioritised interrupts? Could you
be getting a double interrupt & the temporary storage is getting
corrupted in the handler routine? The compiler should look after this
but it may require a #pragma or setup option.

RP

On 29/08/2007, Morgan Olsson <@spam@ost011KILLspamspamosterlen.tv> wrote:

2007\08\29@034349 by Morgan Olsson


Thank you for the explanation, Ruben :)
We have not seen that debug code option in the compiler.

I built a 4-bit D/A on some pins to which we can, for example, dump a signature value for where the program is (one value for each interrupt, say) and track it with an analog or digital oscilloscope. We also use other pins directly, and the ICD2.
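
The signature-value idea can be sketched like this. It is only an illustration: trace_latch is a plain variable standing in for the port latch that drives the D/A, and the choice of the low four bits is an assumption, not the pin assignment actually used on the board.

```c
#include <stdint.h>

/* Stand-in for the port latch driving the D/A; on the target this would be
   a real latch register (name and pin assignment are assumptions). */
static volatile uint8_t trace_latch;

/* Put a 4-bit location code on the low nibble, leaving the upper four
   pins untouched. */
void trace_mark(uint8_t code)
{
    trace_latch = (uint8_t)((trace_latch & 0xF0u) | (code & 0x0Fu));
}
```

Each interrupt handler and each interesting point in the mainline would call trace_mark() with its own code, so the scope shows a distinct voltage level per location. Note that the read-modify-write here could itself be interrupted on a real chip if both mainline and interrupt code call it.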

Using the Microchip C18 compiler and REAL-ICE you can select some events and variables to track while the CPU is running at full speed, but AFAIK you cannot track everything that happens.

It is better than nothing, so I bought a REAL-ICE and we are changing from CCS PCWH to C18.
BTW, Hitech will begin to support REAL-ICE in the next version, coming in a month or two.

/Morgan

On 2007-08-28 22:31:37, Ruben Jönsson <RemoveMErubenTakeThisOuTspampp.sbbs.se> wrote:

--
Morgan Olsson

2007\08\29@040806 by Xiaofan Chen

On 8/29/07, Morgan Olsson <TakeThisOuTost011EraseMEspamspam_OUTosterlen.tv> wrote:
> Using Microchip C18 compiler and REAL-ICE you can select
> some events and variables to track while the CPU is running
> full speed, but AFAIK you cannot track everything that happens.

I have not used Real-ICE but I hear that it is better than
ICD2. I have used ICE2000 and I think it is quite good.

> Better than nothing i bought REAL-ICE and we are changing
> from CCS PCWH to C18.

Good move! What I heard from a previous colleague was that
CCS was really buggy and that the bugs keep changing across
different versions. One of my colleagues specifically advised
against switching CCS versions during code maintenance.

CCS may be very good for some people because of the
provided libraries but what I hear is that it is not really up to
the standard of Hitech PICC.

C18 has bugs as well. What I hear is that the libraries are not
good. Normally the user should write his own peripheral
libraries.

> BTW, Hitec will begin to support REAL-ICE in next version
> coming in a month or two.

I have only used Hitech PICC for the PIC16 series some years
ago. It was really good and should still be very good now.
At that time I used PICC and ICE2000 and I liked them both.
The processor module for ICE2000 is not cheap though. It is
said that PICC18 is also very good but I have never used it.

Regards,
Xiaofan

2007\08\29@043841 by Morgan Olsson

On 2007-08-29 09:26:52, Richard Prosser <RemoveMErhprosserspamTakeThisOuTgmail.com> wrote:

> Morgan,
> You've probably already checked and I'm no expert on the 18 series
> (never used one) but don't they have prioritised interrupts?

Yes, and we use that facility.

> Could you be getting a double interrupt

No, I could not find interrupts being re-enabled, and it seems high priority cannot disturb low priority.

> & the temporary storage is getting
> corrupted in the handler routine?

I have checked the generated save/restore sequences for both levels and they seem OK: they save to different areas, and the high priority interrupt - and only that - uses "RETFIE, FAST".

/Morgan


--
Morgan Olsson

2007\08\29@045559 by Morgan Olsson

On 2007-08-29 10:08:01, Xiaofan Chen <xiaofancEraseMEspam.....gmail.com> wrote:

> On 8/29/07, Morgan Olsson <EraseMEost011spamosterlen.tv> wrote:
>> Using Microchip C18 compiler and REAL-ICE you can select
>> some events and variables to track while the CPU is running
>> full speed, but AFAIK you cannot track everything that happens.
>
> I have not used Real-ICE but I hear that it is better than
> ICD2. I have used ICE2000 and I think it is quite good.
>
>> Better than nothing i bought REAL-ICE and we are changing
>> from CCS PCWH to C18.
>
> Good move! What I hear from my previous colleague was that
> CCS was really buggy and the bugs keep changing across
> different versions. Last time one of my colleagues specifically
> mentioned not to switch CCS version for code maintenance.

New versions pretty often fix bugs introduced in the previous one...
We keep install files for a few versions.
We have programmed around some compiler bugs, but the very problem I am venting about here stays the same - it even shows up when we change to the last version of the older 3.x series compiler.  That does sound like we have a problem in our code, but we have gone crazy trying to track it down.


> CCS may be very good for some people because of the
> provided libraries but what I hear is that it is not really up to
> the standard of Hitech PICC.
>
> C18 has bugs as well. What I hear is that the libraries are not
> good. Normally the user should write his own peripheral
> libraries.

I would prefer that for quality.  But time is running out on this project.
Thank you for the heads-up!

I wish the compiler makers would keep official databases of bugs, so that users need not bang into every bug without knowing about it until it shows up in a bugfix release, after having wasted lots of hours...

> I have only used Hitech PICC for the PIC16 series some years
> ago. They were really good and should still be very good now.


Yes, from what I have seen people say about it, it looks good.
If I do more programming myself I might buy the new Hitech compiler when they support REAL-ICE.
Other people find the CCS libraries handy; so did we, and at first it saved time.
Different compilers suit different needs, so I do not want to start a discussion of which is best...
Currently we think we need REAL-ICE, and changing compiler means at least changing compiler bugs - it will be interesting...


--
Morgan Olsson

2007\08\29@050940 by Dario Greggio

Morgan Olsson wrote:
> I have checked the generated save/restore sequences for both levels
> and they seem OK; saving to different areas, and high priority
> interrup - and only that - use "RETFIE, FAST".

Hi Morgan, a silly one, but have you checked that no "interrupt errata"
exists for this part, as for some 18F parts?

--
Ciao, Dario

2007\08\29@051312 by Morgan Olsson

Ah...!

I just realised I broke the rules here; I forgot to tag the subject!
So I am reposting it using [PIC]
Sorry.
/Morgan


On 2007-08-27 23:33:13, Morgan Olsson <RemoveMEost011EraseMEspamEraseMEosterlen.tv> wrote:

--
Morgan Olsson

2007\08\29@064529 by Michael Rigby-Jones



>-----Original Message-----
>From: RemoveMEpiclist-bouncesspam_OUTspamKILLspammit.edu [RemoveMEpiclist-bouncesTakeThisOuTspamspammit.edu]
>On Behalf Of Dario Greggio
>Sent: 29 August 2007 10:10
>To: Microcontroller discussion list - Public.
>Subject: Re: Do an evil ghost live in my PIC18FxxJxx ? - using
>CCS compiler
>
>
>Morgan Olsson wrote:
>> I have checked the generated save/restore sequences for both levels
>> and they seem OK; saving to different areas, and high priority
>> interrup - and only that - use "RETFIE, FAST".
>
>Hi Morgan, a silly one, but have you checked that no
>"interrupt errata"
>does exist on this part, as in some 18F parts?

Even if no errata exists for the fast interrupt, I'd personally try the standard workarounds anyway.  Going by the vast quantity of errata for 18F devices in general, it's entirely possible you are seeing a bug that Microchip hasn't discovered/documented yet.

Regards

Mike

=======================================================================
This e-mail is intended for the person it is addressed to only. The
information contained in it may be confidential and/or protected by
law. If you are not the intended recipient of this message, you must
not make any use of this information, or copy or show it to any
person. Please contact us immediately to tell us that you have
received this e-mail, and return the original to us. Any use,
forwarding, printing or copying of this message is strictly prohibited.
No part of this message can be considered a request for goods or
services.
=======================================================================

2007\08\29@064822 by Gerhard Fiedler

Xiaofan Chen wrote:

> Good move! What I hear from my previous colleague was that
> CCS was really buggy and the bugs keep changing across
> different versions. Last time one of my colleagues specifically
> mentioned not to switch CCS version for code maintenance.

That was my impression also about ten years ago when I checked it out. A
bit surprising -- and then not :) -- that this is still the case.

> I have only used Hitech PICC for the PIC16 series some years
> ago. They were really good and should still be very good now.

I can second that. I've found the occasional compiler bug, but nothing like
the CCS bug dance.

Gerhard

2007\08\29@072712 by Dave Tweed

Morgan Olsson <EraseMEost011spamspamspamBeGoneosterlen.tv> wrote:
> Has anybody experienced anything odd, like functions returning wrong
> values, if statements evaluating erroneously, suspected unprovoked
> jumps... *occasionally*, while everything works OK during most executions
> of the same parts of the code, and then after a random time - bang!??

This has all of the earmarks of a stack overflow problem. The "random"
element is a result of it only happening when interrupts and the calling
sequence of the mainline code happen to stack up in a particular way.

In this case, you have to worry about both the hardware return stack in
the CPU as well as any software data stack that the compiler sets up.

For the hardware stack, you need to do an analysis of the maximum calling
depth of your mainline code + deepest interrupt sequence. Beware of any
recursion, and high/low priority interrupt nesting. CCS may have tools to
assist with this. You can also see whether STKFUL (stack full) and/or
STKUNF (stack underflow) are set when the error occurs.

I don't know what tools CCS provides for monitoring data stack usage. One
simple approach is to fill the stack area with some easily-recognized data
value before running the program, running it for a while, and then seeing
whether all of the stack locations got overwritten with other things.
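
The fill-and-inspect approach for the data stack might be sketched as below. This is a host-side model: the stack area is an ordinary array, the 0xA5 fill value is arbitrary, and the assumption that the stack grows upward from index 0 should be checked against the compiler in use.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xA5u  /* arbitrary, easily recognised fill value */

/* Fill the whole stack area before the program proper starts. */
void stack_paint(uint8_t *stack, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        stack[i] = STACK_PAINT;
    }
}

/* Count bytes that still hold the paint value, scanning from the far end
   of the area (the end the stack reaches last, assuming it grows upward
   from index 0). Zero untouched bytes means the area was used up. */
size_t stack_untouched(const uint8_t *stack, size_t len)
{
    size_t n = 0;
    while (n < len && stack[len - 1 - n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

After a long run, a result of zero (or close to it) from stack_untouched() suggests the data stack has reached the end of its area and may be overwriting whatever lies beyond it.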

-- Dave Tweed

2007\08\29@082636 by Morgan Olsson

On 2007-08-29 11:09:36, Dario Greggio <RemoveMEadpm.toKILLspamspaminwind.it> wrote:

> Morgan Olsson wrote:
>> I have checked the generated save/restore sequences for both levels
>> and they seem OK; saving to different areas, and high priority
>> interrup - and only that - use "RETFIE, FAST".
>
> Hi Morgan, a silly one, but have you checked that no "interrupt errata"
> does exist on this part, as in some 18F parts?

I have been thinking that too.
I found two errata documents, listed at the bottom of my first post.
Nothing there that affects us.
It was not easy to find the errata, which makes me wonder if there are more of them hiding somewhere...?


--
Morgan Olsson

2007\08\29@082707 by Russell McMahon

>> Has anybody experienced anything odd, like functions returning wrong
>> values, if statements evaluating erroneously, suspected unprovoked
>> jumps... *occasionally*, while everything works OK during most
>> executions of the same parts of the code, and then after a random
>> time - bang!??

> This has all of the earmarks of a stack overflow problem. The "random"
> element is a result of it only happening when interrupts and the
> calling sequence of the mainline code happen to stack up in a
> particular way.

Watchdog can also do something like this.
Long ago I had code which seemed to be operating perfectly but I found
that it was continually being reset by the watchdog timer. The nature
of the task was such that it would do its thing for N.X cycles of the
total task and then be reset and, on average, the result was that all
worked apparently OK. It would be very easy for this sort of thing to
work often but not always.
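
One way to notice silent watchdog resets of this kind is to check the reset cause at start-up and count it. The sketch below is an assumption-laden illustration: the flag is taken to be bit 3 of the reset-status register and active low, which must be verified against the data sheet for the part, and the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of the watchdog time-out flag in the reset-status register.
   Assumed here to be bit 3 and active low; verify against the data sheet. */
#define WDT_TIMEOUT_BIT 3u

/* True if the last reset was caused by the watchdog. */
bool reset_was_watchdog(uint8_t reset_status)
{
    return (reset_status & (1u << WDT_TIMEOUT_BIT)) == 0u;
}

/* Bump a counter kept in RAM that start-up code does not clear.
   The counter saturates at 255 instead of wrapping. */
uint8_t count_watchdog_reset(uint8_t reset_status, uint8_t count)
{
    if (reset_was_watchdog(reset_status) && count < 255u) {
        count++;
    }
    return count;
}
```

Printing or displaying the counter now and then shows whether a system that "works" is in fact being restarted behind the scenes.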


       Russell


2007\08\29@084432 by Morgan Olsson

On 2007-08-29 12:40:34, Michael Rigby-Jones <Michael.Rigby-JonesSTOPspamspamspam_OUTbookham.com> wrote:

>
>
>> {Original Message removed}

2007\08\29@084539 by Morgan Olsson

On 2007-08-29 13:27:11, Dave Tweed <spamBeGonepicSTOPspamspamEraseMEdtweed.com> wrote:
> This has all of the earmarks of a stack overflow problem.

I think we tried what we could in this direction.
I am forwarding it to my programmer guy anyway.
Thanks

We are in the process of moving the project to C18 and will probably not try CCS again on this source code.

--
Morgan Olsson

2007\08\29@085856 by Michael Rigby-Jones



>-----Original Message-----
>From: KILLspampiclist-bouncesspamBeGonespammit.edu [EraseMEpiclist-bouncesspamEraseMEmit.edu]
>On Behalf Of Morgan Olsson
>Sent: 29 August 2007 13:39
>To: Microcontroller discussion list - Public.
>Subject: Re: Do an evil ghost live in my PIC18FxxJxx ? - using
>CCS compiler
>
>
>The compiler generates the save and restore sequences...
>How do we (on C18) tell it not to use the fast return stack
>for any interrupt or subroutine?

I do not use C18, so I can't give a definitive answer, but the following thread on the Microchip forum gives a neat workaround:
<http://forum.microchip.com/tm.aspx?m=266134&mpage=1&key=disable%2cglobal%2cinterrupts&#266134>


>
>How do we tell it that, when disabling interrupts while accessing
>variables, it should handle the interrupt disable like on the old PIC16
>(I guess that is what you meant): disable interrupts and, if they did
>not get disabled, loop to do it again?

I think only the early 16F parts suffered from the interrupt disabling bug, I've not seen any errata for the 18F regarding this.

Regards

Mike


2007\08\29@104427 by Dario Greggio

Morgan Olsson wrote:
> The compiler generates the save and restore sequences... How do we (on
> C18) tell it not to use the fast return stack for any interrupt or
> subroutine?

Everything is in the thread that Michael suggested.

> How do we tell it that, when disabling interrupts while accessing
> variables, it should handle the interrupt disable like on the old PIC16
> (I guess that is what you meant): disable interrupts and, if they did
> not get disabled, loop to do it again?

No, this should no longer be needed on the 18F.
As for "atomicity", you have to take care of it yourself (if I understand
your point correctly).
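
Taking care of atomicity yourself could look something like this. It is a sketch under assumptions: irq_save_and_disable and irq_restore are hypothetical stand-ins for clearing and restoring the global interrupt enable, modelled here with a plain variable so the code runs on a host, and tick_count is an invented example of a value shared with an interrupt handler.

```c
#include <stdint.h>

/* Stand-ins for the target's interrupt enable bit (hypothetical names). */
static volatile uint8_t irq_enabled = 1;

static uint8_t irq_save_and_disable(void)
{
    uint8_t was = irq_enabled;
    irq_enabled = 0;
    return was;
}

static void irq_restore(uint8_t was)
{
    irq_enabled = was;
}

/* A 16-bit value updated by an interrupt handler. */
static volatile uint16_t tick_count;

/* Take a consistent copy: an 8-bit CPU reads the two bytes in separate
   instructions, so an interrupt between them could otherwise mix an old
   high byte with a new low byte. */
uint16_t read_tick_count(void)
{
    uint8_t was = irq_save_and_disable();
    uint16_t copy = tick_count;
    irq_restore(was);
    return copy;
}
```

Restoring the previous state, rather than unconditionally re-enabling, keeps the helper safe to call from code that already runs with interrupts off.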


--
Ciao, Dario

2007\08\29@125316 by Morgan Olsson

On 2007-08-29 14:13:44, Russell McMahon <@spam@apptech@spam@spamspam_OUTparadise.net.nz> wrote:

> Watchdog can also do something like this.
> Long ago I had code which seemed to be operating perfectly but I found
> that it was continually being reset by the watchdog timer. The nature
> of the task was such that it would do its thing for N.X cycles of the
> total task and then be reset and, on average, the result was that all
> worked apparently OK. It would be very easy for this sort of thing to
> work often but not always.
>
>
>         Russell

Thanks for your input.

We do see variables passed between functions being destroyed while it is running, and it is not resetting.

And the watchdog is turned off for now, until we get the system running.



--
Morgan Olsson

2007\08\29@160756 by Barry Gershenfeld

First of all, I'd have to answer your original question..."Yes, there's a
ghost".  We've all had these.

I would have liked to mention that CCS has a forum, but I see you've
visited it already.

I would like to suggest the same thing that Harold did, about "implicit
forward referencing", but I have to say that although I've been bitten by
this several times, the effect is rather immediate; I didn't have to wait
to start seeing funny results.  I did want to mention that in the scenario
where part of a returned value is actually "random memory values", that if
you depend on that part of the data to happen to be zero, a lot of times it
will be, just by chance.  About the compiler,  I use the CCS compiler and
it is very willing to do this forward reference thing.  I've otherwise had
pretty good luck with the compiler but I have not gone to the 4.x
versions.  You may want to try a 3.x version if it supports your chip.
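
As a sketch of the forward-reference problem, assume a compiler that falls back to a default return type when a function is called before it has been declared. The caller may then pick up only part of a wider result, with the rest being whatever happens to be in memory. The function names scale_reading() and average_of_two() below are invented for the example; the point is only the prototype placed ahead of the first call.

```c
#include <stdint.h>

/* Prototype before the first use. Without this line, a compiler that
   accepts implicit declarations would have to guess the return type
   at the call sites in average_of_two(). */
uint32_t scale_reading(uint16_t raw);

/* Caller that appears in the file before the definition it uses. */
uint32_t average_of_two(uint16_t a, uint16_t b)
{
    return (scale_reading(a) + scale_reading(b)) / 2u;
}

/* Definition comes later in the file; it returns a full 32-bit value. */
uint32_t scale_reading(uint16_t raw)
{
    return (uint32_t)raw * 1000u;
}
```

Keeping all prototypes in a header that every source file includes removes the dependence on the order of definitions.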

It looks like you have done due diligence in the hardware area, so we
should look at the firmware carefully.  I do remember some errata on 18F
parts where they'd execute the wrong code under certain circumstances, at
speeds over 25MHz, and in some cases it was speeds over 4 MHz.  I would
like to say that's all fixed now, but I don't recall seeing a statement to
that effect.  But if it's not in your errata then it's probably fixed.

I had a problem not unlike yours several years ago, not with a PIC but in
an H8 system.  There was a task monitor that would periodically call
tasks.  Each task had a count-down timer and a flag that would determine
when it would run.  The problem was that the flag would get corrupted, so
that when the timer got to zero the task would never run.  I wrote all
kinds of code first to discover this condition, and then monitor it.  I
could watch the problem happen.  I even wrote code to detect it and restart
the task.  I never found the cause, though.  There was an interrupt clock
and I always suspected it was the interrupts.   By the way, one technique I
used was to snapshot some variables during the interrupt, and then when the
scheduler was idle, I would print out the values.  This allowed me to
monitor things that are normally difficult to see.  Another technique I
could use was to just shut off the interrupts and print the
information.  In this case the device in question was able to function even
with everything stalled now and then.  In this case I discovered that the
bug would occur more frequently when I was stalling it.  This is one of the
points I want to make--see if you can get it to screw up more.  It makes it
easier to find the bug.
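
The snapshot technique described above could look roughly like the following sketch. It is not the H8 code, just an illustration in portable C; the names snapshot_from_isr(), snapshot_pop(), struct snapshot and the choice of which fields to record are assumptions. The interrupt only copies values into a small ring buffer, and the idle loop drains and prints them later.

```c
#include <stdint.h>

#define SNAP_SLOTS 8u   /* power of two so the index can be masked */

struct snapshot {
    uint16_t timer;     /* e.g. a task's count-down timer */
    uint8_t  flag;      /* e.g. the task's run flag */
};

static volatile struct snapshot snaps[SNAP_SLOTS];
static volatile uint8_t snap_head;   /* advanced by the interrupt only */
static uint8_t snap_tail;            /* advanced by the idle loop only */

/* Call from the interrupt: copy the values of interest, nothing else. */
void snapshot_from_isr(uint16_t timer, uint8_t flag)
{
    uint8_t i = (uint8_t)(snap_head & (SNAP_SLOTS - 1u));
    snaps[i].timer = timer;
    snaps[i].flag  = flag;
    snap_head++;
}

/* Call when idle: copies the oldest unread snapshot to *out.
   Returns 1 if a snapshot was copied, 0 if none is pending. */
int snapshot_pop(struct snapshot *out)
{
    uint8_t head = snap_head;
    uint8_t pending = (uint8_t)(head - snap_tail);

    if (pending == 0u) {
        return 0;
    }
    if (pending > SNAP_SLOTS) {
        /* The interrupt lapped us: skip entries that were overwritten. */
        snap_tail = (uint8_t)(head - SNAP_SLOTS);
    }

    uint8_t i = (uint8_t)(snap_tail & (SNAP_SLOTS - 1u));
    out->timer = snaps[i].timer;
    out->flag  = snaps[i].flag;
    snap_tail++;
    return 1;
}
```

On a real target the interrupt can fire while snapshot_pop() is copying an entry, so the copy itself may need to be done with interrupts briefly disabled.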

The other advice I also agree with.  Remove as much of the code as you can
without disturbing the problem you are trying to find.   This narrows the
field to examine.  Try to get it to fault more often.  And throw at it any
monitoring tricks you can think of.

Oh, and it always turns out to be something stupid.  I can't reveal how I
know this.

Best of luck,
Barry

2007\08\29@205552 by Russell McMahon

A few semi-random thoughts - probably all been covered already or
irrelevant, but just may catalyse a useful thought:


- Assuming persistence of non persistent local variables?

- Non initialisation of variables (mainly only for machine language -
a good compiler won't let this happen)

- Trashing interrupt / subroutine return stacks by any mix of adding or
removing data improperly, mixing subroutine/interrupt returns, overflow
of stack into other variable space, use of stack space by other
variables, stack enters badly behaved memory space (eg some processors
corrupt the last byte at top or bottom of memory due to wrap around
bugs), ... (Compiler should stop many of these).

- Local / global confusions.

- Non-reentrant code being re-entered. Re-entry can occur when you run
out of IRQ time and an IRQ routine is called again while you are still
in it.

- Running out of IRQ time (effect varies depending on how it is
handled).

- Occasional variable overflow with truncation when it occurs.

- Occasional variable overflow with corruption of  adjacent memory
when it occurs.

- Flash memory marginally programmed - occasional flippy bit(s)

- *** Body diodes conducting, even very very very slightly *** - rapid
random anything but straight line program operation can occur often /
sometimes / very occasionally / almost never.



I've seen or heard of and/or caused most of these in one form or
another over the years.
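
One way to catch the "overflow with corruption of adjacent memory" case from the list is to bracket a suspect buffer with guard bytes and check them periodically. The sketch below is only an illustration in portable C; struct guarded_buf, the GUARD value and the function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

#define GUARD 0xA5u   /* arbitrary pattern, unlikely to be written by accident */

/* Struct members are laid out in declaration order, so the guards sit
   directly before and after the data array. */
struct guarded_buf {
    uint8_t guard_lo;
    uint8_t data[16];
    uint8_t guard_hi;
};

void guarded_init(struct guarded_buf *b)
{
    memset(b->data, 0, sizeof b->data);
    b->guard_lo = GUARD;
    b->guard_hi = GUARD;
}

/* Returns 1 while both guards are intact, 0 once either one has been
   overwritten, which suggests a write ran past an end of data[]. */
int guarded_ok(const struct guarded_buf *b)
{
    return b->guard_lo == GUARD && b->guard_hi == GUARD;
}
```

Calling guarded_ok() at several points in the main loop narrows down which stretch of code did the overwrite.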



           Russell


2007\08\31@050842 by Morgan Olsson

Thank you, Russell McMahon and Barry Gershenfeld.
All are good points and ideas.

We have already tried moving to the 3.x compiler.
Strangely, we still see a similar problem.

/Morgan

2007\08\31@051217 by Morgan Olsson




------- Forwarded message -------
From: "Russell McMahon" <spamBeGoneapptechspamKILLspamparadise.net.nz>
To: "Microcontroller discussion list - Public." <.....piclistspam_OUTspammit.edu>
Cc:
Subject: Re: Do an evil ghost live in my PIC18FxxJxx ? - using CCS  compiler
Date: Thu, 30 Aug 2007 01:39:09 +0200


--
Morgan Olsson
