Understanding embedded microcontroller multitasking RTOS alternatives
We are SPLat Controls. We make control computers (microprocessor cards) that go into our customers' products to give them a "brain". Please visit the rest of our website if you are involved in a product that needs a PLC or embedded electronic controller.
What follows is intended as a gentle introduction to the somewhat confusing complexities of the choices facing a programmer when deciding on an underlying programming technology for a reactive realtime embedded control system. This is intended for students and others who need to gain an understanding of the subject but have perhaps been finding it difficult to get their heads wrapped around some of the ideas. I have provided a few links to other resources, usually going into greater depth on individual concepts. This is not a "how to" tutorial - I have left out a lot of subtle details in order to try and preserve the big picture.
Firstly, what do I mean by "reactive realtime embedded control system"? It's important to be clear on that, as I am addressing that one area, and no others. By realtime embedded control system I mean an electronic controller that is permanently embedded inside a product, which has to operate in real time and which reacts to things that happen outside itself. Realtime means it reacts in a timely fashion, i.e. fast enough for the task at hand. I am also addressing specifically those systems that use a microcontroller or microprocessor ("micro" for short), i.e. that are programmed, and in systems where the program is fixed during manufacture.
The multitasking challenge
In a "normal" computer program you usually only want to do one thing at a time: Read a file, then do a calculation, then print a result. In an embedded controller, all but the simplest applications need to do several things at once: Monitor a sensor input, operate a valve, watch for a push button, check for elapsed time, count bottles etc., etc. The ability of a controller to (seemingly) do several things at once is what multitasking is all about.
A simple microcontroller program can only do one thing at a time. However, because it can do things very fast (millions of operations per second), it can be made to switch between tasks so fast that it gives an illusion of doing several things concurrently. The question is, how do you program it so it will divide its attention between multiple tasks, and not get yourself or the micro confused in the process? Here follows a description of the common ways of doing it.
If a person is new to a field, they will probably only be able to handle the simplest concepts within that field. As they gain experience and confidence they start to perceive the layers of complexity that exist, and gradually expand their thinking to take on board more abstract, and usually more powerful, concepts. This is why a novice embedded controls programmer will generally write spaghetti code. I certainly did, many, many years ago.
With spaghetti code the whole control program, say for an automatic grind and brew espresso machine (my favourite appliance!), is written as a single unit that progresses through the steps of making a cup of coffee such as: Wait for user to press a button - Turn on grinder - Delay 12 seconds - Turn off grinder - Turn on water pump - etc., etc. If during any one of those operations that takes time, say a 12 second delay, the program must also monitor a temperature sensor to control the heater, then that code will have to be included within the delay. If there are several delays in the main sequence, then the temperature control code will have to be included in each one. If a new step is added to the main program, the programmer must remember to include the temperature control.
Imagine now a program with 25 steps in the main function plus 5 sub-functions (heating water, updating the display, watching 3 push buttons). Suppose some of those sub-functions also contain a number of steps with delays. Very, very quickly the complexity of the program becomes quite impossible to untangle. You have a bowl of spaghetti!
Interlude - Factoring
Factoring (in the current context) is the process of identifying the individual functional blocks that must exist in a realtime system. These might be Grinder control, Temperature control, Display management, Push buttons, Water level monitoring, Waste bin management, Clean cycle and (most importantly!) the Brewing cycle. If a way can be found to write the code for each of the functional blocks separate from each of the others, the overall program will be a lot less complicated and much easier to follow.
Factoring a program at the design stage, before any code is written, provides an opportunity to think carefully about the needs of each part of the program and decide what interactions (communication) are needed between them. It also lets you separate concerns about the mechanics of writing the program from consideration of the individual functions in the program.
Super loop programs
The simplest structure for a realtime reactive control program is a superloop (I don't count spaghetti code as a structure). In a superloop program each of the functional blocks is coded as a separate block of code with a single start point and a single end point, and the blocks are strung together in a large loop that executes endlessly. In Relay Ladder Logic, the graphical design environment will produce a statement list program that is actually a superloop, where each rung of the ladder can be considered one functional sub-entity.
A variation of superloops is to place all the code for a particular functional block into a subroutine. This tends to modularize the program further. The general layout of such a superloop, in "pseudocode" becomes:
Loop:
    Call Grinder subroutine
    Call Heater subroutine
    Call Display subroutine
    Call Push button subroutine
    Call Brew cycle subroutine
    ... etc etc ..
    GoTo Loop
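As a rough illustration, the loop above might look something like the following in C. This is only a sketch: the task names and the run counters are made up for the example, and the pass limit is there purely so the code can finish when tried on a desktop machine. On a real micro the loop would run forever.

```c
/* Hypothetical run counters so the sketch has something observable to do. */
static unsigned grinder_runs, heater_runs, display_runs;

/* Each functional block is one subroutine with a single entry and exit.
   A real one would read inputs, drive outputs, and return quickly. */
static void grinder_task(void) { grinder_runs++; }
static void heater_task(void)  { heater_runs++; }
static void display_task(void) { display_runs++; }

/* The superloop: call every block in turn, over and over.
   On real hardware this would be for (;;). The pass limit is an
   assumption added so the sketch can terminate. */
void superloop(unsigned passes)
{
    for (unsigned i = 0; i < passes; i++) {
        grinder_task();
        heater_task();
        display_task();
    }
}
```

The important design point is that no subroutine is allowed to sit and wait. Each one does whatever it can do right now and returns, so the others get their turn.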
Interlude - Now where was I?
I have said nothing so far about how such a "factored out" functional block can keep track of where it is within its own sequence. For example, the Brew cycle subroutine will encode a sequence of operations that must be cycled through, one at a time. Each time Brew cycle is called it may therefore have to do something different. One very common way of handling that is to use "finite state machines", or FSMs for short. A discussion of FSMs is a topic for another day - suffice to say they are much less scary than the name (or most online descriptions) would have you think. Clue: Just before the subroutine exits it sets a variable to a number that indicates where it should resume next time it is called. Next time it is called it checks that variable and jumps to the applicable code segment.
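To make the clue concrete, here is a minimal sketch of a state variable in C, assuming a much-simplified brew cycle with only three states. The state names, the I/O flags and the durations are all invented for the example, and time is counted in calls rather than seconds.

```c
#include <stdbool.h>

/* Hypothetical states for a much-simplified brew cycle. */
typedef enum { BREW_IDLE, BREW_GRINDING, BREW_PUMPING } brew_state_t;

static brew_state_t brew_state = BREW_IDLE; /* "where was I?" between calls */
static unsigned brew_ticks;                 /* calls spent in the current state */

/* Stand-ins for real inputs and outputs (assumptions for illustration). */
bool start_button;
bool grinder_on;
bool pump_on;

#define GRIND_TICKS 3 /* arbitrary durations, counted in calls */
#define PUMP_TICKS  2

/* Called once per pass of the superloop. Does one small step and returns. */
void brew_cycle_task(void)
{
    switch (brew_state) {
    case BREW_IDLE:
        if (start_button) {
            grinder_on = true;
            brew_ticks = 0;
            brew_state = BREW_GRINDING; /* resume in this state next call */
        }
        break;
    case BREW_GRINDING:
        if (++brew_ticks >= GRIND_TICKS) {
            grinder_on = false;
            pump_on = true;
            brew_ticks = 0;
            brew_state = BREW_PUMPING;
        }
        break;
    case BREW_PUMPING:
        if (++brew_ticks >= PUMP_TICKS) {
            pump_on = false;
            brew_state = BREW_IDLE;
        }
        break;
    }
}
```

Each call checks the variable, runs the matching code segment, and possibly updates the variable before returning. Nothing in the subroutine ever waits.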
Simple round robin schedulers
The superloop with subroutines can be taken one stage further. For a start, let's adopt a slightly better terminology: Think of each of the subroutines that encapsulates a functional block as a task. Think of the whole program as a series of tasks (Grinder, Heater etc) that are running in parallel. We know that they are actually being executed (run) one after another, but it is happening so fast that it is helpful to think of them as being concurrent.
In the superloop structure the programmer takes responsibility for scheduling the running of the tasks by stringing them together into a program loop, as shown above. What a round robin scheduler does is to take care of switching from one task to another. The programmer has to initially tell the scheduler which tasks are to run, but after that the scheduler takes the place of the superloop. The scheduler maintains a list, often called a task queue, of all active tasks. Re-writing the previous superloop example, still in pseudocode, we get:
Add Grinder task to task queue
Add Heater task to task queue
Add Display task to task queue
Add Push button task to task queue
Add Brew cycle task to task queue
... etc etc ..
Start the task queue running
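One plausible way to build such a scheduler in C is a table of function pointers. The sketch below assumes a fixed-size queue and tasks that take no arguments. The names add_task and run_task_queue are hypothetical, as is the little log used to show the order in which tasks run.

```c
#include <stddef.h>

#define MAX_TASKS 8

typedef void (*task_fn)(void);

static task_fn task_queue[MAX_TASKS]; /* the task queue */
static size_t  task_count;

/* Register a task. Returns 0 on success, -1 if the queue is full. */
int add_task(task_fn fn)
{
    if (task_count >= MAX_TASKS)
        return -1;
    task_queue[task_count++] = fn;
    return 0;
}

/* Run every queued task in turn, round after round. A real scheduler
   would loop forever; the round limit is an assumption for the demo. */
void run_task_queue(unsigned rounds)
{
    for (unsigned r = 0; r < rounds; r++)
        for (size_t i = 0; i < task_count; i++)
            task_queue[i]();
}

/* Two made-up tasks that just record that they ran. */
static char   run_log[32];
static size_t log_len;

static void log_char(char c)
{
    if (log_len < sizeof run_log - 1)
        run_log[log_len++] = c;
}

static void grinder_task(void) { log_char('G'); }
static void heater_task(void)  { log_char('H'); }
```

The scheduler part (the first two functions) knows nothing about coffee machines, which is what lets it be reused unchanged from one project to the next.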
Now, on the face of it this may look like added complexity with no apparent gain. In fact, the potential gain is huge. Up until the superloop stage of evolution the application programmer is responsible for everything, and everything must be written from the ground up for every new project. With the simple round robin task scheduler, the running of the tasks has been hived off to a separate scheduler that can be written once and used many times in different projects. The task scheduler is the beginnings of an operating system.
In this simple scheduler system each task must still return control to the scheduler and remember what it was doing next time it runs. Let's see how we can improve on that ...
Fancier round robin schedulers, or "I know exactly where I was"
The real power of a round robin scheduler becomes apparent when the scheduler provides additional services, other than just scheduling. The very first thing it can do is help the tasks remember where they are, in other words which bit of their code they are executing, between runs. This is achieved by the task executing a special instruction that lets it yield to the scheduler (so the scheduler can run the next task in the queue). When the scheduler again runs the task in question, it starts it off at exactly the place where it left off.
The general structure of a task becomes:
Do some stuff
Maybe jump elsewhere in the task
Yield control to the scheduler
Do some other stuff
Maybe jump elsewhere in the task
Yield control to the scheduler
Do even more stuff
Maybe jump elsewhere in the task
Yield control to the scheduler
Do yet more stuff
Maybe jump elsewhere in the task
Yield control to the scheduler
etc etc.
"Maybe jump elsewhere in the task" means the task code can constantly be testing for internal or external conditions, such as the water temperature, and changing course accordingly. That is after all what a reactive control system is all about!
Yielding control means essentially that the task decides it has done all it can for now, or taken up enough processor time, and it yields control (use) of the processor back to the scheduler. The task scheduler is then responsible for remembering the yielding task's whereabouts and starting up the next task in the task queue from its last known position.
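Plain C has no yield instruction, but the effect can be approximated. The sketch below uses a well known trick in the style of protothreads: the task stores a line number when it yields, and a switch statement jumps back to that line on the next call. The three macros are defined locally for this example and are not taken from any particular library. One limitation to be aware of is that ordinary local variables do not survive a yield, so anything the task must remember has to be static or kept in the context structure.

```c
/* Per-task context: just the place to resume from (0 = start). */
typedef struct { int resume_line; } task_ctx;

/* Hypothetical macros, defined only for this sketch. */
#define TASK_BEGIN(ctx) switch ((ctx)->resume_line) { case 0:
#define TASK_YIELD(ctx) \
    do { (ctx)->resume_line = __LINE__; return; case __LINE__:; } while (0)
#define TASK_END(ctx)   } (ctx)->resume_line = 0

/* A small log so the order of execution can be observed. */
static int step_log[8];
static int step_count;

static void record(int step)
{
    if (step_count < 8)
        step_log[step_count++] = step;
}

/* A task written as straight-line code with yields in between. */
void demo_task(task_ctx *ctx)
{
    TASK_BEGIN(ctx);
    record(1);           /* do some stuff */
    TASK_YIELD(ctx);     /* yield control to the scheduler */
    record(2);           /* do some other stuff */
    TASK_YIELD(ctx);
    record(3);           /* do even more stuff */
    TASK_END(ctx);       /* next call starts again from the top */
}
```

A scheduler would simply call demo_task with the same context on every round. Each call runs from the last yield to the next one, so the task reads like one continuous sequence even though it is executed in slices.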
A cooperative multitasking realtime operating system
If we take the above round robin scheduler and start providing other services in the scheduler, we can start calling it an operating system. Because these embedded systems are realtime, they can benefit immediately from having timing and input/output (I/O) related features. That will reduce the amount of ad hoc programming that must be done by the application programmer.
Typical timing related operations could be generating a time delay (which simply locks out a task from running for a set amount of time), or waiting for an input to turn on.
We call these operating systems cooperative because each task must be written to be cooperative, i.e. yield control back to the operating system at appropriate times. Usually in a reactive realtime control system that will mean when it has nothing gainful left to do because it is waiting for something to happen. All it needs is a tiny bit of processor time every so often to see if the event has occurred (unless the operating system provides that waiting function as a service).
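A time delay service of the kind just described could be sketched as follows, assuming a tick counter that a hardware timer would normally advance. All names here (os_add_task, os_delay, os_tick) are hypothetical, and for simplicity each task is a function that runs and returns rather than one that resumes mid-way.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4

typedef struct {
    void (*run)(void);
    uint32_t wake_tick; /* task is locked out until the tick count reaches this */
} os_task;

static os_task  tasks[MAX_TASKS];
static size_t   task_count;
static size_t   current; /* index of the task that is running right now */
static uint32_t now;     /* tick counter (assumed to be timer driven) */

int os_add_task(void (*run)(void))
{
    if (task_count >= MAX_TASKS)
        return -1;
    tasks[task_count].run = run;
    tasks[task_count].wake_tick = now; /* eligible to run straight away */
    task_count++;
    return 0;
}

/* Service: the calling task asks not to be run for the next `ticks` ticks.
   Only meaningful when called from inside a task. */
void os_delay(uint32_t ticks)
{
    tasks[current].wake_tick = now + ticks;
}

/* One tick: offer the processor to every task that is not delayed.
   The signed difference keeps the comparison valid across counter
   wrap-around on common two's complement targets. */
void os_tick(void)
{
    for (current = 0; current < task_count; current++)
        if ((int32_t)(now - tasks[current].wake_tick) >= 0)
            tasks[current].run();
    now++;
}

/* Two made-up tasks: one wants to run every third tick, one every tick. */
static unsigned blink_runs, poll_runs;
static void blink_task(void) { blink_runs++; os_delay(3); }
static void poll_task(void)  { poll_runs++; }
```

The delayed task consumes no processor time at all while it is locked out, because the scheduler never calls it. That is the saving the operating system offers over a task that keeps checking the clock for itself.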
Windows 3.11 was a cooperative multitasking operating system. It was very vulnerable to application programs that failed to "play fair" and yield at suitable times. The problem there was that there was inadequate control over who wrote programs for it. In a typical embedded control system the application program is usually written by a single person or a very small team, and so the required discipline is much easier to enforce.
Interlude - Getting tasks to work together
So far we have said nothing about how we get the individual tasks to work together, to coordinate their activities. Somehow information, meaning data and commands, must flow between them. The display task needs to be told the water temperature. The pump needs to know when the coffee is ground.
One very simple way of doing this is through shared variables held in RAM. For example, if the temperature control task stores the temperature in a RAM variable, the display task can fetch it at any time and display it. When the grinder has ground the coffee it simply stores a pre-agreed number in a RAM variable, which the pump can see and act upon.
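In C the shared variable approach might be sketched like this. The variable names and the value chosen for the pre-agreed number are assumptions made for the example. The sketch also assumes a cooperative system, where a task can only be switched out at a point of its own choosing, so a reader never sees a half-written value.

```c
#include <stdbool.h>

#define GRIND_DONE 1 /* the pre-agreed number (arbitrary choice) */

/* Shared RAM variables. Each has one writer and any number of readers. */
static volatile int water_temp_c;   /* written by the temperature task */
static volatile int grinder_signal; /* set by the grinder, cleared by the pump */

/* Private results, so the effect of each task can be observed. */
static int  displayed_temp;
static bool pump_running;

/* The temperature task publishes its latest reading. */
void temperature_task(int sensor_reading_c)
{
    water_temp_c = sensor_reading_c;
}

/* The display task fetches the reading whenever it likes. */
void display_task(void)
{
    displayed_temp = water_temp_c;
}

/* The grinder announces that the coffee is ground. */
void grinder_task(void)
{
    grinder_signal = GRIND_DONE;
}

/* The pump acts on the announcement and then clears it. */
void pump_task(void)
{
    if (grinder_signal == GRIND_DONE) {
        grinder_signal = 0;
        pump_running = true;
    }
}
```

Giving every shared variable exactly one writer is a simple discipline that avoids most of the confusion that shared data can otherwise cause.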
The above is not the be-all and end-all of inter-task communications, but it shows you that a solution exists, and it's not particularly complicated.
Preemptive multitasking operating systems
Preemptive multitasking operating systems work on a very different basis to cooperative multitasking systems. Preemptive systems evolved out of the computer timesharing systems of the '60s and '70s. These allowed million dollar-plus mainframe computers to be used by many users simultaneously. Each user's program was allowed to run for a set amount of time, maybe 0.01 second, before being preempted in favour of the next user. This was a technology that fitted well with the need to have many separate users' programs running in isolation from each other.
In embedded reactive controls, tasks take the place of individual users' programs in a timesharing system. In a preemptive RTOS, task switching is triggered by events, which may be externally generated by hardware inputs or internally generated by hardware timers or other tasks. Tasks are assigned priorities (high, medium, low - whatever), and register with the operating system what events they are interested in. A task is said to be ready to run when one of the events it has registered for takes place. Whenever an event makes ready a task with a higher priority than the one currently running, the running task is preempted and control is transferred to the highest priority ready task. If the running task runs out of things to do it yields voluntarily.
Events can be generated by the various tasks themselves, by I/O drivers supplied by the operating system or by Interrupt Service Routines (ISRs). ISRs can be part of the operating system or written as part of the application program. Events are stored up in first in, first out queues (nothing to do with the task queues above), so if an event is not handled immediately it will not get lost.
Because a preemptive system can preempt the running task at any time, the operating system is responsible for saving all the information pertaining to that task (its context), so that it can be restored perfectly when the task gets to run again. The "context save" and restoration adds complication and can consume quite a bit of memory and processor time. Also, the mechanisms for deciding which ready task has the highest priority add complexity, so preemptive, prioritized systems impose a larger speed and memory overhead than cooperative systems.
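The scheduling decision itself, as opposed to the context save, can be modelled in a few lines of C. The sketch below is only a simulation under stated assumptions: it shows a first in, first out event queue and the choice of the highest priority ready task, but it does not actually interrupt anything or save a context, which on a real micro needs processor-specific code. The event names, the task structure and the convention that a larger number means higher priority are all invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical events, one bit each. */
enum { EV_LIMIT_SWITCH = 1 << 0, EV_TIMER = 1 << 1 };

typedef struct {
    const char *name;
    int      priority; /* larger number = higher priority (assumed) */
    uint32_t wants;    /* bit mask of events the task registered for */
    int      ready;    /* nonzero once one of those events has occurred */
} task_t;

/* First in, first out event queue, so events are not lost. */
#define QUEUE_SIZE 8
static uint32_t ev_queue[QUEUE_SIZE];
static size_t   ev_head, ev_tail, ev_count;

/* Post an event. Returns 0 on success, -1 if the queue is full. */
int post_event(uint32_t ev)
{
    if (ev_count == QUEUE_SIZE)
        return -1;
    ev_queue[ev_tail] = ev;
    ev_tail = (ev_tail + 1) % QUEUE_SIZE;
    ev_count++;
    return 0;
}

/* Take the oldest pending event (if any), mark the interested tasks
   ready, then return the highest priority ready task, or NULL if no
   task is ready. */
task_t *schedule(task_t *tasks, size_t n)
{
    if (ev_count > 0) {
        uint32_t ev = ev_queue[ev_head];
        ev_head = (ev_head + 1) % QUEUE_SIZE;
        ev_count--;
        for (size_t i = 0; i < n; i++)
            if (tasks[i].wants & ev)
                tasks[i].ready = 1;
    }
    task_t *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (tasks[i].ready && (best == NULL || tasks[i].priority > best->priority))
            best = &tasks[i];
    return best;
}
```

If a low priority display task is running when a limit switch event arrives for a high priority motor task, the next call to schedule returns the motor task. In a real RTOS that is the moment the display task would be preempted.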
The systems described above are not an exhaustive list. RTOSs exist with almost every imaginable permutation of cooperative, preemptive, prioritized or round robin or even preemptive with fixed time slices.
In a conventional computer program it is usually important to be able to process a lot of information quickly. In a reactive realtime controller very little data is being processed, but when something happens out in the real world, like a limit switch activating, you need the reaction time of the system to be quick.
The fastest possible response to an external event is achieved via an interrupt. With an interrupt the actual hardware will detect an event of interest, say a limit switch turning on. It will stop the currently executing program, save its data and transfer control to an Interrupt Service Routine (ISR). The ISR does whatever it has to do to handle the event, say turn off a motor if the interrupt came from a limit switch, then returns control to the main program (restoring its data in the process). This can all happen in a few microseconds.
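The shape of an ISR for the limit switch example can be sketched in portable C, with one big caveat: how a function gets attached to a hardware interrupt is specific to each micro and compiler, so that part is left out and the ISR here is just an ordinary function. The names are hypothetical. The sketch also assumes that reading an unsigned variable is a single uninterruptible operation on the target.

```c
#include <stdbool.h>

/* Shared between the ISR and the main program, hence volatile. */
static volatile bool     motor_on = true;
static volatile unsigned limit_hits;

/* On a real micro this would be installed in the interrupt vector table
   by a vendor- or compiler-specific mechanism that is not shown here. */
void limit_switch_isr(void)
{
    motor_on = false; /* do the urgent thing immediately */
    limit_hits++;     /* leave a note for the main program */
}

/* Main program side: has the limit switch fired since the last check? */
static unsigned seen_hits;

bool limit_hit_since_last_check(void)
{
    unsigned hits = limit_hits; /* one read of the shared counter */
    bool changed = (hits != seen_hits);
    seen_hits = hits;
    return changed;
}
```

The ISR is kept as short as possible: it handles the urgent part and leaves everything else to the main program, which can deal with it at leisure.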
Interrupts can be very powerful. However it takes a much higher skill level to write ISRs than regular program code, because ISRs are so invasive that they can make a real mess if not done well. Many operating systems therefore limit the application programmer's access to interrupts and instead provide ISR-based canned services as part of the RTOS or the programming language.
Interlude - SPLat Language programming
The SPLat proprietary language is optimised for machine control applications, and was designed to be easy for people who are not expert programmers. It features our MultiTrack™ multitasking system, which is deeply interwoven with the language itself. MultiTrack is a cooperative multitasking system. The application programmer launches tasks (using LaunchTask instructions) then sets the task queue running. Tasks can kill themselves and also launch other tasks. Each task must yield fairly frequently. A number of instructions contain an implied yield, such as pausing, waiting for an input to turn on, etc.
The SPLat language does not allow users to write ISRs. However, a vast number of services are provided "under the hood", ranging from input contact debounce to quadrature counting, ModBus communications to management of expansion boards.
So, which is best?
Each of the schemes described has its pros and cons.
Spaghetti code has very limited value as something a novice can do, without having to absorb any advanced concepts, at a stage where they are still coming to terms with the raw basics of whatever language they are using. It will work for very elementary programs, but quickly becomes a liability if attempting anything that requires several parallel tasks.
Superloops, especially if each task is placed in a separate subroutine, are perfectly fine for multitasking applications of moderate complexity. They are not really an operating system, and lack the "hooks" that allow an OS to supply a range of built in services.
A round robin scheduler that remembers where each task is when it is not running provides a very powerful mechanism for structuring a program and providing services such as timers, input waits, etc. These are however usually not available as ready-made components, so you need to roll your own, which requires a considerable degree of understanding and technical skill.
Cooperative multitasking systems are quite simple, and hence reliable, and provide a lot of useful functionality. A number of "lightweight" commercially available RTOSs use this technique. It works very well when the application is being written by a single individual or a small team with good understanding and strict programming discipline. The downside is that a cooperative system does not scale well, because adding more tasks slows down the existing tasks.
Preemptive RTOSs are generally considered best in large teams because they are in many ways less demanding of discipline in the programmer than any of the cooperative systems. They do however have pitfalls like priority inversion (which led to problems with the Mars Pathfinder mission), resource starvation and deadlock.
The real conclusion is that a realtime multitasking operating system, be it something you do as an ad hoc thing or a software product you buy, is a tool that you can use. You can however only use it productively and safely if you understand how it works and what the limitations and pitfalls are.
This page is copyright material. You have permission to link to this page providing you use the following for your link:
<a href="http://www.splatco.com/rtos_1.htm" title="Article on embedded RTOS selection at SPLat Controls">Embedded controls multitasking RTOS overview at SPLat Controls</a>
Just copy and paste the above into your page.