Architecture Working Group: A report on Kernel FFI (Freedom From Interference) and some philosophical musings

Written by Eli Gurvitz, ELISA Project Ambassador and Functional Safety Architect at Intel (Mobileye)

In a functional safety system FFI is required when the system consists of elements of different Safety Integrity Levels (ASIL).This is to ensure that elements allocated with a lower ASIL do not interfere with elements allocated with a higher ASIL; if FFI cannot be demonstrated the lower ASIL elements must be upgraded to the higher ASIL.

The Architecture Working Group has been discussing “Freedom From Interference (FFI)” in the last several meetings and is considering two aspects:

FFI between user space processes allocated with different ASIL
FFI between Linux Kernel components/drivers/subsystems allocated with different ASIL

This blog post focuses on the second bullet.

FFI is a key goal of a possible Safety Concept for Linux because Linux is too complex and has too many features, thus considering Linux as a single element of a certain ASIL would result in a very high functional safety qualification effort. If the application runs in a single threaded process and handles interrupts synchronously, then it may be possible to avoid allocating Safety Requirements to the OS and mitigate all failures with application-level safety mechanisms. But this kind of use requires just a simple OS and Linux is an overkill. Using Linux in the way it was meant to be used means it will be the OS of a multi-core SoC that runs many processes with different requirements of different ASILs.

This mode of use is referred to in ISO 26262 part 6 section 7.4.8:

This section refers to ISO 26262 part 9 Clause 6 “Criteria for co-existence of elements”. This clause states:

The Architecture WG investigation considers the Linux kernel partitioned into sub-elements of mixed criticality, therefore the goal is to show FFI between the sub-elements. The approach to FFI that is currently being discussed in the Architecture WG was developed by ELISA Project members Mobileye and BMW.

The first step in demonstrating FFI between safety-related and non-safety-related sub-elements is to identify the sub-elments and to allocate them with an ASIL. Since we are analyzing a SW component, the sub-elements are functional areas (or features) of the kernel, e.g. memory management or file systems, and they are made of C language functions. We classify the C functions according to the allocated ASIL by using the Call Tree Tool.

The goal of Call Tree is to statically generate the tree of function calls departing from a specified input one; hence starting for example from a syscall, Call Tree would generate the tree of all invoked functions. Call Tree scans the Linux source code by using the GNU CFlow utility and generates an SQLite database that contains all functions and their calling relations – this provides an almost full call-tree for every C function.

To classify every Kernel function we allocate Kernel entrypoints (syscalls and interrupt handlers) with safety requirements and associated ASIL; hence every function falling in a certain tree inherits the ASIL associated with the top level entrypoint. If a function is present in multiple trees, it is then assigned with the highest ASIL across those allocated to the different trees.

For example, if there’s a safety requirement for “safe dynamic memory” then we consider the related system calls – mmap, sbrk – as safety related. The union of all functions in the call trees of mmap and sbrk are considered SR and inherit the ASIL allocated to mmap and sbrk.

Once we have partitioned the Kernel the next step is to consider the possible types of interference. These types are defined in Annex D or ISO 26262 part 6. There are three types of interference:

Temporal – interference related to time or scheduling. The most common case is when one kernel thread prevents other threads from getting CPU cycles, thereby causing delays. Another example is a process crashing.
Spatial – interference related to space, or memory. For example, a lower ASIL driver may corrupt a kernel data structure.
Communication – normally this type of interference relates to transfer of data between two entities over a communication channel. In our analysis we consider static and global variables and pointers as communication channels between sub-elements of the kernel.

The Architecture Working Group plans to deal with all types of interference and currently we are considering the third type – communication interference. We are looking at areas where the internal state of the kernel can be corrupted because of the interaction between NSR and SR C functions (or more generally, C functions of different ASIL ratings).

The internal state of the kernel consists of many persistent data structures. These data structures, for example linked lists, are pointed to by global and static variables and pointers. Corruption of these data structures can occur in different ways.

Data structures that are accessed via global variables can be corrupted when a lower ASIL function (for example a driver that is rated as ASIL QM) accesses the same data structure that is also used by an ASIL-B function, as depicted in the diagram below.

Corruption of data structures that are accessed via static variables can occur when a static variable is used by a higher ASIL (or SR) function but this function is used by a lower ASIL (or NSR) function. The NSR function may pass a faulty argument to the SR function and the SR function may use this argument to modify the data structure. The faulty data structure is later used in a safety-related flow. This failure mode is depicted in the diagram below.

This description is only a preliminary formulation of the concept of communication interference within the kernel. The working group is debating the correct use of terms, the concept itself, the correct use of the Call Tree Tool and the selection of ASIL rated system calls for our sample automotive use case – The Tell-Tale signal.

If you are interested in safety engineering, the Linux kernel, or both, then please join us in these discussions. The nice thing about applying the existing Functional Safety standards to the Linux kernel is that there’s plenty of space and freedom for creativity, as these standards were designed for much simpler HW and very much simpler SW. It is as if there’s a written tradition of Safety architecture – the ISO 26262 standard and an Oral interpretation of it which creates a more modern tradition of Safety. You can be a part of creating this tradition. I should also take back the word “creativity” I used four lines above because it will certainly trigger a hot debate around the question of whether Safety likes “creativity” or hates it. So I’ll clarify that we are trying to be creative in a conservative way.

Learn more about the ELISA Architecture Working Group or any of the other groups in this white paper.