pxCurrentTCB corrupted

Working with FreeRTOS v7.1 and the RX62N port. My application runs for several hours, or up to a day, and then executes the BRK exception handler. I have checked for and removed any BRK instructions, and verified that none remain in my application.

Investigating after one of these invalid BRK exceptions, I found that the pxCurrentTCB variable had somehow been set to 2 instead of the address of a valid TCB. When this bogus pointer value was used to switch contexts, the USP ended up pointing at an area of memory containing zeros, which, when executed, is the BRK instruction.

All of the task stacks and the interrupt stack appear to have plenty of room, and I have stack checking enabled.

Using my Renesas E1 emulator, I set an event breakpoint for a write of the value 2 to the pxCurrentTCB address. The event captured this bogus write at the end of vTaskSwitchContext(), called from prvYieldHandler() within vSoftwareInterruptISR(), at the MVTIPL #1 instruction:

void vSoftwareInterruptISR( void )
{
    prvYieldHandler();
     FFFE89B0 7FA8             _vSoftwa SETPSW      I
     FFFE89B2 7EAF                      PUSH.L      R15
     FFFE89B4 FD6A2F                    MVFC        USP,R15
     FFFE89B7 60CF                      SUB         #0CH,R15
     FFFE89B9 FD68F2                    MVTC        R15,USP
     FFFE89BC E00F                      MOV.L       [R0],[R15]
     FFFE89BE E50F0101                  MOV.L       04H[R0],04H[R15]
     FFFE89C2 E50F0202                  MOV.L       08H[R0],08H[R15]
     FFFE89C6 62C0                      ADD         #0CH,R0
     FFFE89C8 7FA9                      SETPSW      U
     FFFE89CA 6E1E                      PUSHM       R1-R14
     FFFE89CC FD6A3F                    MVFC        FPSW,R15
     FFFE89CF 7EAF                      PUSH.L      R15
     FFFE89D1 FD1F0F                    MVFACHI     R15
     FFFE89D4 7EAF                      PUSH.L      R15
     FFFE89D6 FD1F2F                    MVFACMI     R15
     FFFE89D9 6D0F                      SHLL        #16,R15
     FFFE89DB 7EAF                      PUSH.L      R15
     FFFE89DD FBF2740E0100              MOV.L       #00010E74H,R15
     FFFE89E3 ECFF                      MOV.L       [R15],R15
     FFFE89E5 E3F0                      MOV.L       R0,[R15]
     FFFE89E7 757004                    MVTIPL      #4H
     FFFE89EA 05BCC7FF                  BSR.A       _vTaskSwitchContext
     FFFE89EE 757001                    MVTIPL      #1H                                        <- this is where the pxCurrentTCB=2 event occurred.
     FFFE89F1 FBF2740E0100              MOV.L       #00010E74H,R15
     FFFE89F7 ECFF                      MOV.L       [R15],R15
     FFFE89F9 ECF0                      MOV.L       [R15],R0
     FFFE89FB 7EBF                      POP         R15
     FFFE89FD FD171F                    MVTACLO     R15
     FFFE8A00 7EBF                      POP         R15
     FFFE8A02 FD170F                    MVTACHI     R15
     FFFE8A05 7EBF                      POP         R15
     FFFE8A07 FD68F3                    MVTC        R15,FPSW
     FFFE8A0A 6F1F                      POPM        R1-R15
     FFFE8A0C 7F95                      RTE        
     FFFE8A0E 03                        NOP        
     FFFE8A0F 03                        NOP        
     FFFE8A10 02                        RTS        
}

It looks like perhaps listGET_OWNER_OF_NEXT_ENTRY( pxTCB, &( pxReadyTasksLists[ uxTopReadyPriority ] ) ) is returning an incorrect value? The value of uxTopReadyPriority is 2, and the contents of pxReadyTasksLists are:

ADDRESS    LABEL            +0         ASCII
0001046C   _pxReadyTasksLi  00000001   ....
00010470                    000079A4   .y..
00010474                    FFFFFFFF   ....
00010478                    000079A4   .y..
0001047C                    000079A4   .y..
00010480                    00000003   ....
00010484                    0000B2FC   ....
00010488                    FFFFFFFF   ....
0001048C                    0000980C   ....
00010490                    0000A55C   \...
00010494                    00000001   ....
00010498                    0001051C   ....
0001049C                    FFFFFFFF   ....
000104A0                    0000ACF4   ....
000104A4                    0000ACF4   ....
000104A8                    00000000   ....
000104AC                    000104B0   ....
000104B0                    FFFFFFFF   ....
000104B4                    000104B0   ....
000104B8                    000104B0   ....

Any ideas on how to track this down? I've been trying to debug this for several weeks.

Steven J. Ackerman, Consultant
ACS, Sarasota, Florida
http://www.acscontrol.com
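
One way to read the dump above is as four consecutive 20-byte list headers, one per ready priority. The sketch below is only a reading aid, not FreeRTOS code: it assumes the v7.x xList layout on a 32-bit target (uxNumberOfItems, pxIndex, then xListEnd holding xItemValue, pxNext and pxPrevious), and the type and function names (dumped_list_t, check_dumped_list) are hypothetical. It checks whether each header is self-consistent as far as the dump alone allows.

```c
#include <stdint.h>

/* One ready list header as five consecutive 32-bit words from the dump.
   ASSUMPTION: FreeRTOS v7.x xList layout with 32-bit fields. */
typedef struct {
    uint32_t count;     /* uxNumberOfItems */
    uint32_t index;     /* pxIndex */
    uint32_t end_value; /* xListEnd.xItemValue, expected to be 0xFFFFFFFF */
    uint32_t end_next;  /* xListEnd.pxNext */
    uint32_t end_prev;  /* xListEnd.pxPrevious */
} dumped_list_t;

/* Address of pxReadyTasksLists[0] and the size of one header, per the dump. */
#define READY_LISTS_BASE 0x0001046CuL
#define LIST_HEADER_SIZE 20u

/* Values copied from the dump, priorities 0 to 3. */
static const dumped_list_t ready_lists[4] = {
    { 1u, 0x000079A4uL, 0xFFFFFFFFuL, 0x000079A4uL, 0x000079A4uL },
    { 3u, 0x0000B2FCuL, 0xFFFFFFFFuL, 0x0000980CuL, 0x0000A55CuL },
    { 1u, 0x0001051CuL, 0xFFFFFFFFuL, 0x0000ACF4uL, 0x0000ACF4uL },
    { 0u, 0x000104B0uL, 0xFFFFFFFFuL, 0x000104B0uL, 0x000104B0uL },
};

/* Address of the header for a given priority. */
uint32_t ready_list_addr(unsigned priority)
{
    return (uint32_t)(READY_LISTS_BASE + priority * LIST_HEADER_SIZE);
}

/* Returns 1 if the header looks consistent, 0 if it does not, and -1 if
   the header alone is not enough to decide (three or more items). */
int check_dumped_list(uint32_t list_addr, const dumped_list_t *l)
{
    /* xListEnd sits after the two leading words of the header. */
    uint32_t end_addr = list_addr + 8u;

    if (l->end_value != 0xFFFFFFFFuL) {
        return 0;
    }
    if (l->count == 0u) {
        /* An empty list points only at its own end marker. */
        return l->index == end_addr && l->end_next == end_addr
            && l->end_prev == end_addr;
    }
    if (l->count == 1u) {
        /* One item: first and last are the same, and pxIndex must be
           either that item or the end marker. */
        return l->end_next == l->end_prev
            && (l->index == end_addr || l->index == l->end_next);
    }
    if (l->count == 2u) {
        return l->index == end_addr || l->index == l->end_next
            || l->index == l->end_prev;
    }
    return -1;
}
```

If the layout assumption holds, the priority 0 and priority 3 headers check out, the priority 1 header cannot be judged from the dump alone, and the priority 2 header is flagged: it claims a single item at 0000ACF4, yet its pxIndex holds 0001051C, which is neither that item nor the list's own end marker at 0001049C. That would fit with uxTopReadyPriority being 2 when the bad value was fetched.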

pxCurrentTCB corrupted

Here are some general comments:

+ You mention that the stacks seem to have enough space. Do you also have the stack overflow protection switched on? Be aware that the stack overflow checking only checks the task stacks, not the interrupt stack.

+ Are you 100% sure that all interrupts that make use of FreeRTOS API functions have a priority at or below configMAX_SYSCALL_INTERRUPT_PRIORITY? (Lots of people get this point wrong on Cortex-M3 chips, not surprisingly, as the settings are complex on those. It is much simpler on the RX though.) This type of corruption is a symptom of an incorrect value.

+ Have you checked against all the notes on the following page: http://www.freertos.org/FAQHelp.html

+ In your code snippet you note that pxCurrentTCB gets corrupted after the call to vTaskSwitchContext(). That is unlikely because nothing is being written to memory at that point, unless the corruption is occurring from a nested interrupt. What is much more likely is that the corruption is happening inside the vTaskSwitchContext() function, because that is where pxCurrentTCB is written. If you could catch it in there that may provide more useful information. What would be even more useful would be to catch the corruption in the list you reference when it happens, rather than when the corrupt data is used.

+ Are you able to selectively remove pieces of functionality from your code in an attempt to isolate the cause? (Sometimes that can help, but sometimes changing the execution pattern just moves the problem somewhere else.)

Regards.

pxCurrentTCB corrupted

Richard-

First, thank you for your reply.

I have stack overflow protection switched on. I can see the A5(s) in each task's stack, and there is plenty of room in each. I initialize the interrupt stack in my reset handler, before any C section initialization is done and main() is called. I've looked at the interrupt stack and there is plenty of room there as well.

I will verify that I'm not calling any FreeRTOS API functions from any interrupt handlers above configMAX_SYSCALL_INTERRUPT_PRIORITY. This is a possibility, as I do have some interrupts that run at higher priority and there may be an execution path.

I have read and just re-read the FAQHelp page again.

The pxCurrentTCB appears to get written to 2 at the line I indicated; at least, that is where the emulator stopped execution for the data access write=2 @ pxCurrentTCB event. I'm not sure if the access is pipelined or what delay the emulator introduces. I agree that the MVTIPL #1 instruction should not be the instruction that is corrupting pxCurrentTCB; it is probably in the list handling at the end of vTaskSwitchContext(). I will try to set up a combined event of execution address and data write access to see if I can better establish the place where the write of 2 is occurring.

Desperate to solve this as it is keeping me from selling this product.

Regards,

Steven J. Ackerman, Consultant
ACS, Sarasota, Florida
http://www.acscontrol.com

pxCurrentTCB corrupted

The stopping point may well be delayed due to pipelining in the processor. If one of the last things done in vTaskSwitchContext() is writing the new value and then returning, the return likely starts before the write starts, and is thus allowed to complete (landing you where you did) before the breakpoint hits.

One thing I can think of that can cause this sort of thing is a tick hook or idle hook routine that blocks. This can leave the system without ANY task to run, and then you get a crash like this.

pxCurrentTCB corrupted

Richard-

Just so I'm clear on this: would this approach to handling an interrupt from a source that is higher priority than configMAX_SYSCALL_INTERRUPT_PRIORITY be OK?

The RTC interrupt has a priority of 5 and occurs once/second. I call xTaskResumeFromISR() from the interrupt handler to wake up the Timer1S task, which reads the current time and queues a message into an event handler queue:

/////////////////////////////////////////////////////////////////////////////
// local variables
static uint32_t  time_counter32bit;
static EVENT_MSG msg;
static tm time;
static void (*RTC_Callback)(void);

/////////////////////////////////////////////////////////////////////////////
// rtc interrupt handler
#pragma interrupt Interrupt_RTC(vect=VECT_RTC_PRD)
void Interrupt_RTC(void)
{
    /* Call the user function iff defined */
    if (RTC_Callback != NULL)
    {
        RTC_Callback();
    }
}

/////////////////////////////////////////////////////////////////////////////
// initialize the RTC
void RTC_Init(void (* callback)(void))
{
    rtcStop();

    /* periodic interrupt 1Hz */
    RTC.RCR1.BIT.PES = 0x6u;

    /* reset RTC */
    RTC.RCR2.BIT.RESET = 1;

    /* setup the RTC periodic interrupt callback */
    RTC_Callback = callback;

    /* Disable interrupt requests */
    ICU.IER.BIT.IEN_RTC_ALM = 0;
    ICU.IER.BIT.IEN_RTC_PRD = 0;
    ICU.IER.BIT.IEN_RTC_CUP = 0;

    /* Enable RTC periodic interrupt requests */
    RTC.RCR1.BIT.PIE = 1;

    /* Enable RTC carry interrupt requests (so we can read it in RTC_GetTime()) */

    /* initialize periodic interrupt priority */
    ICU.IPR.BIT.IPR = RTC_INTERRUPT_PRIORITY;

    /* Enable periodic interrupt requests */
    ICU.IER.BIT.IEN_RTC_PRD = 1;

    rtcStart();
}

/* called from RTC interrupt */
void rtc_interrupt_callback(void)
{
    (void)xTaskResumeFromISR(Timer1SHandle);
}

/* task created by TimerRTC_init */
static void Timer1S(void *pvParameters)
{
    EVENT_MSG msg;

    msg.source = MS_DISPLAY;
    msg.event = ME_TIME_UPDATE;

    RTC_Init(rtc_interrupt_callback);

    for (;;)
    {
        // awoken by xTaskResumeFromISR() in rtc_interrupt_callback() above
        vTaskSuspend(NULL);

        // a 32 bit counter that is updated every tick ( 1 second)
        time_counter32bit++;

        RTC_GetTime(&time);

        timeUpdated = true;

        /* notify event manager */
        msg.param.value = time_counter32bit;
        (void)xQueueSend(EventQueue, (void *)&msg, 0UL); /* RTOS_USAGE */
    }
}

static void Timer1S(void *pvParameters);        // The task of Event Manager
xTaskHandle Timer1SHandle;     /* RTOS_USAGE */

void TimerRTC_init(void)
{
    /* create the timer read task */
    (void)xTaskCreate(Timer1S, "t:Time", configMINIMAL_STACK_SIZE, NULL, mainDEMOTASKS_PRIORITY + 1, &Timer1SHandle);        /* RTOS_USAGE */

    time.tm_sec = 00;
    time.tm_min = 45;
    time.tm_hour = 16;
    time.tm_am_pm = T_PM;
    time.tm_wday = TUE;
    time.tm_mday = 26;
    time.tm_mon = JUN;
    time.tm_year = 2012;

    RTC_SetTime(&time);

    timeUpdated = false;
}

Steven J. Ackerman, Consultant
ACS, Sarasota, Florida
http://www.acscontrol.com

pxCurrentTCB corrupted

Firstly, I hate the function xTaskResumeFromISR(). I added it because people requested it, but it can be dangerous if interrupts come in faster than the task that gets resumed can execute from the resume point back to the suspend point (a semaphore give/take latches events, so none are missed, whereas task suspend/resume does not latch events). I don't think this is the issue in this case though: if the interrupt is only coming in once a second it should not be a problem.

"The RTC interrupt has a priority of 5 and occurs once/second."

So the interrupt is calling an API function. You didn't mention what configMAX_SYSCALL_INTERRUPT_PRIORITY is set to though. If it is 5 or above you should be OK on that point.

From the code it looks like you are using the Renesas compiler. Which version?

Regards.
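
The point about latching can be illustrated with a toy model. This is not FreeRTOS code: it is a plain C simulation in which a trace string stands in for time, 'I' means the interrupt fires and 'T' means the task gets back to its wait point. The function names and the trace format are made up for illustration. In a real application the latching version would correspond to the ISR using xSemaphoreGiveFromISR() and the task using xSemaphoreTake().

```c
/* Suspend/resume model: an interrupt that arrives while the task is still
   busy (not yet suspended again) has nothing to resume and is lost. */
unsigned handled_with_resume(const char *trace)
{
    int suspended = 0;   /* task starts out running */
    unsigned handled = 0;

    for (; *trace != '\0'; ++trace) {
        if (*trace == 'T') {
            suspended = 1;            /* task suspends itself */
        } else if (*trace == 'I') {
            if (suspended) {
                suspended = 0;        /* resume: task handles this event */
                handled++;
            }
            /* else: task was not suspended, the event is dropped */
        }
    }
    return handled;
}

/* Semaphore model: a give while the task is busy is remembered in the
   count (up to max_count) and consumed at the next take. */
unsigned handled_with_semaphore(const char *trace, unsigned max_count)
{
    unsigned pending = 0;
    unsigned handled = 0;
    int waiting = 0;     /* task blocked in take */

    for (; *trace != '\0'; ++trace) {
        if (*trace == 'T') {
            if (waiting) {
                continue;             /* already blocked, nothing changes */
            }
            if (pending > 0) {
                pending--;            /* a latched event is available */
                handled++;
            } else {
                waiting = 1;          /* block until the next give */
            }
        } else if (*trace == 'I') {
            if (waiting) {
                waiting = 0;          /* give wakes the task directly */
                handled++;
            } else if (pending < max_count) {
                pending++;            /* latch the event for later */
            }
        }
    }
    return handled;
}
```

With the trace "TIIITT" (three interrupts in a burst while the task handles the first one), the resume model handles one event, a binary semaphore (max_count of 1) handles two, and a counting semaphore with room for three handles all three.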

pxCurrentTCB corrupted

Richard-

#define configMAX_SYSCALL_INTERRUPT_PRIORITY    4 // 1 (lowest) - 15 (highest)

so I guess that this is the problem... fantastic! I guess that I got confused by the _FromISR() functions being allowed from an interrupt and forgot to check the interrupt priority. The low rate of occurrence is probably due to the fact that this interrupt and API call only occur once/second.

I'm going to review all of these priorities, make changes and run another test.

I'm running:
C/C++ compiler package for RX family V.1.02 Release 01 (Update Utility) (3-27-2012 09:14:28)

Regards,
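
A check along these lines could catch the mismatch at build time. It is only a sketch: the two values are the ones quoted in this thread, while isr_may_call_freertos_api(), API_ISR_PRIORITY_OK() and ASSERT_API_ISR_PRIORITY() are hypothetical helpers, not part of FreeRTOS. It assumes the RX convention noted above, where a larger number means a higher priority.

```c
/* Values quoted in this thread. */
#define configMAX_SYSCALL_INTERRUPT_PRIORITY 4
#define RTC_INTERRUPT_PRIORITY               5

/* Hypothetical helper: an ISR may call the ...FromISR() API functions only
   if its priority is at or below the syscall ceiling. */
int isr_may_call_freertos_api(int isr_priority, int max_syscall_priority)
{
    return isr_priority <= max_syscall_priority;
}

/* Hypothetical compile-time form of the same rule. The array size becomes
   -1, and the build fails, when the priority is above the ceiling. */
#define API_ISR_PRIORITY_OK(p) ((p) <= configMAX_SYSCALL_INTERRUPT_PRIORITY)
#define ASSERT_API_ISR_PRIORITY(name, p) \
    typedef char name##_priority_check[API_ISR_PRIORITY_OK(p) ? 1 : -1]

/* A made-up ISR at priority 3 passes the check. */
ASSERT_API_ISR_PRIORITY(example_isr, 3);

/* With the values above, this line would stop the build:
   ASSERT_API_ISR_PRIORITY(rtc, RTC_INTERRUPT_PRIORITY); */
```

Placing one such assertion next to each interrupt that calls a _FromISR() function keeps the rule visible where the priority is defined.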

pxCurrentTCB corrupted

Richard-

This has indeed fixed my problem.

My apologies for not catching this whilst perusing the manuals and re-reading the FAQ. My kudos for a great project and your continued excellent support. Also, thanks for the heads-up on the semaphore vs task suspend/resume.

I hope that you enjoy a great holiday season!

Steven J. Ackerman, Consultant
ACS, Sarasota, Florida
http://www.acscontrol.com