Task stuck at taskNOTIFICATION_RECEIVED

Hello. I am experiencing lost task notifications in a communication module under heavy loads. The code affected is essentially a message transmission task that sends data to a transceiver. It gets notified either from a start transmission function or an ISR that sends a new notification when the physical transmission buffers become available. The transmission task procedure sits in an infinite loop waiting for events: ~~~ void optoProc(void *arg) { while (true) { uint32_t e = ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
    if (e & opto_Tx)
    {
        // Get message from txQueue and encode bytes for transmission
        optoTxProc(...);
    }
}
} ~~~ The ISR can wake up the task using the following code snippet: ~~~ if (bufferavailable) { BaseTypet pw; xTaskNotifyFromISR(optoTask, optoTx, eSetBits, &pw); portYIELDFROM_ISR(pw); } ~~~ And the message send function does basically the same after putting the message in a queue: ~~~ int optoTxPut(OptoDevice *d, const OptoMessage *m) { if (xQueueSend(d->txQueue, m, 0) != pdTRUE) return opto_EOverflow;
xTaskNotify(optoTask, opto_Tx, eSetBits);
return opto_Ok;
} ~~~ So everything is pretty straightforward. The interrupt priorities are configured properly and system/memory checking hooks are in place to trap any issues such as memory leaks or corruption. When running the system under heavy packet traffic, I started noticing that the optoTxPut function was returning queue overflow errors even when hardware transceiver buffers were available, and transmission stopped. I realized that under that condition, the optoProc procedure becomes stuck waiting for notifications even when the xTaskNotifyFromISR function is being called and attempting to send notifications. I tracked the underlying xTaskGenericNotifyFromISR function and I see that the decision to send a notification and wake up the task is performed there, when the task ucNotifyState state changes from taskWAITINGNOTIFICATION to taskNOTIFICATIONRECEIVED. However, when the issue happens, the task is already in the taskNOTIFICATION_RECEIVED state and therefore the task is never put in the task ready list. I can manually recover the system with the debugger modifying the ucNotifyState variable and triggering the notification actions. I see that the ucNotifyState only gets changed to taskNOTIFICATION_RECEIVED in the task notify functions (xTaskGenericNotify, xTaskGenericNotifyFromISR, vTaskNotifyGiveFromISR). Is there any other behavior I am missing that can affect the value of the notification or remove the task from the ready list so the notification gets lost or the task is not run (and explaining why the variable ucNotifyState does not revert)? I debugged the interrupt and didn’t see any improper interaction between that and the send function. In the meantime my obvious workaround is to use a time delay instead of waiting forever to periodically force the task procedure to run and check the queues, but I certainly would like to find the root of the problem. Thanks for any ideas.

Task stuck at taskNOTIFICATION_RECEIVED

Using pdTRUE as the first parameter to ulTaskNotifyTake() will clear the notify count to zero, and as the code is posted you only process one notify for each call to ulTaskNotifyTake(). That means you can loose events. See the code examples on the link below, which shows two ways you can fix that. http://www.freertos.org/RTOSTaskNotificationAsCounting_Semaphore.html

Task stuck at taskNOTIFICATION_RECEIVED

Hi Dave. Thanks for replying. Processing one notification is fine, since the optoTxProc function will exhaust the queues. It does not matter how many notification events were received. The task should wake up when the send function is called again or the interrupt realizes there are available buffers (which is what usually happens) and will send a notification. This will happen when optoTxProc has finished the byte encoding and the task is indeed waiting for a notification. I have played with the parameters and used the counter behavior, however all have the same issue. The ucNotifyState will eventually be stuck in the taskNOTIFICATIONRECEIVED state, thus I suspect on the first change the RTOS call placed the task in the ready list as when the systems runs fine. But then something happens in between and the ucNotifyState either never reverts to taskWAITINGNOTIFICATION, or it does and the task is never put in the ready list to be executed after changing the variable again back to taskNOTIFICATION_RECEIVED. I manually changed the FreeRTOS code to update the task lists regardless of the value of ucNotifyState, and as expected that will “fix” the problem. I haven’t been able to detect the exact condition when the issue will occur since most of the time the system runs as expected. I am trying to isolate if the issue is interrupt or task related.

Task stuck at taskNOTIFICATION_RECEIVED

I’ve been trying to get my head around exactly what it is you are describing here. It is hard to see how it could get into this state. First, probably not relevant to your problem, but inside the ISR pw should be initialised to 0 (pdFALSE) before it is passed to xTaskNotifyFromISR(). Question: Why are you using the eSetBits action, to set bits in the receiving task’s notification value, rather than just using eIncrement to increment the receiving task’s notification value? Does the receiving task also check for other bits that you have not shown in your post?
The interrupt priorities are configured properly and system/memory checking hooks are in place to trap any issues such as memory leaks or corruption.
Are you using a Cortex-M processor? Is it an M0(+), M3, M4 or M7? Do you have configASSERT() defined? Which version of FreeRTOS are you using? (taskWAITING_NOTIFICATION isn’t used in the latest version).

Task stuck at taskNOTIFICATION_RECEIVED

Hi rtel. Thank you for replying.
  • You are right about the pw variable. By chance it happens to be initialized to 0 anyway, but the correct initializer was missing. And the problem is not related to that or to calling or not portYIELDFROMISR based on the value of pw.
  • Normally, the task also checks for other bits. But for the purpose of debugging to isolate the issue and to post here I removed all the other actions based on other bits. From looking into the code I understand that receiving multiple notifications to update the same bit should be no problem, and on the first change from the “waiting” state to “notified” state, the task will be scheduled to be executed – which I am assuming occurs most of the time. From what it looks like as I explained above, the variable changed to notified and either the task was never scheduled, it was scheduled and for some reason it didn’t run, or it ran but the value of ucNotifyState was not properly restored. I can get into the same situation if I use the counter API. But then again, I am not using that because of the nature of how the optical transceiver is served and the way the hardware buffers become available.
  • This is indeed a Cortex-M4, configASSERT is defined and portASSERTIFINTERRUPTPRIORITYINVALID does not assert because priorities are properly configured. I also demoted the interrupt to the same priority as the RTOS to test if there were any interrupt conflicts, however everything seems to be configured fine in terms of interrupt priorities and the order of function calls. Interrupt priority was the first thing I thought was the cause of the issue and checked down to the register configuration of the NVIC.
  • I am using the trunk from SF. I thought this repo was the official one and up to date. I see the taskWAITING_NOTIFICATION macro defined and used in the tasks.c module. URL: svn://svn.code.sf.net/p/freertos/code/trunk Relative URL: ^/trunk Repository Root: svn://svn.code.sf.net/p/freertos/code Repository UUID: 1d2547de-c912-0410-9cb9-b8ca96c0e9e2 Revision: 2466 Node Kind: directory Schedule: normal Last Changed Author: rtel Last Changed Rev: 2466 Last Changed Date: 2016-06-28 06:39:25 -0400 (Tue, 28 Jun 2016)
Best.

Task stuck at taskNOTIFICATION_RECEIVED

By chance it happens to be initialized to 0 anyway,
No, it was uninitialised. It may have happened to take the value 0 some times.
I see the taskWAITING_NOTIFICATION macro defined
You are right, I had it the wrong way around. It was changed from an enum to a macro, whereas in my head it had been changed from a macro to an enum. The macro was introduced to force it to always be an 8-bit type in a portable way. So we are no closer to solving this one. Could you please cut down your project to the absolute minimum amount of code that demonstrates the issue, make sure the project does not contain any absolute paths so I am able to build it on my computer without having to first make any modifications, zip it is up, and send it over to me at r [dot] barry (at) freertos [dot] org.

Task stuck at taskNOTIFICATION_RECEIVED

Hi Pablo I am very new to freertos but I did observe the behaviour you describe- I think. And here is how I worked around it with a binary semaphore. My IRQ handler generated a TaskNotifyGiveFromISR..ok. In the task blocked waiting on this would stop receiving the noifications under ‘heavy comms load’ . I walked through the code and determined that my IRQ handler was generating another TaskNotifyFromISR before my waiting task could call the wait function. like this while(1) { if ( TaskNotifyTakeWait()==pdTrue) { process_something();
}
else {
/* timeout */
} } NOW- what was happening was that the function process_something() woudl reenable the source of the interrupts and a NEW interrupt and thus TaskNotifyFromISR would be generated BEFORE the task loop could call TaskNotifyWait…. IE if the TaskNotifyFromISR comes in but the task is not blocked already, nothing happens. I solved this using binary semaphores.- as it is value driven. that is xSemaphoreTake Maybe this will help you. I only have 1 day on freertos, but some experience elsewhere…. I am NOT sure if my observed behaviour is correct- but that is what the debugger showed- the task was not waiting… regards – Glen.