The Spring ELISA Workshop, which took place virtually on April 5-7, drew more than 130 registrants from around the world, who learned more about the various working groups and hot topics related to enabling Linux in safety applications, and networked with ambassadors. If you missed the workshop, you can check out the materials here or subscribe to the new ELISA YouTube Channel and add these sessions to your watch list.
Gabriele Paoloni, ELISA Project Governing Board Chair and Senior Principal Software Engineer at Red Hat, and Daniel Bristot, Senior Principal Software Engineer at Red Hat, gave a presentation at the Spring ELISA Workshop titled, “Safety Monitors Inside the Kernel.”
The recently proposed “Runtime Verification Monitor” framework, which can be found here, can monitor kernel drivers and subsystems to check that they behave as expected, and can help protect them against interference from within the kernel itself. The video explains how the RVM framework works, with a specific focus on the Watchdog Monitor proposed in the patchset, and how it can support a functional safety claim. Watch it here:
The Spring ELISA Workshop, which took place virtually on April 5-7, drew more than 130 registrants from around the world, who learned more about the various working groups and hot topics related to enabling Linux in safety applications, and networked with ambassadors. If you missed the workshop, you can check out the materials here or subscribe to the new ELISA YouTube Channel and add these sessions to your watch list.
For the first time ever, the ELISA Project featured a keynote presentation. Robert (Bob) Martin, Senior Principal Engineer at the MITRE Corporation, presented a keynote titled, “Software Supply Chain Integrity Transparency & Trustworthiness and Related Community Efforts.” Check out the presentation materials here or watch the video:
Trust, transparency, and integrity of software supply chains are at the center of many of the global security and safety challenges confronting communities around the world, including government agencies and the industries that support them or provide our critical infrastructure. The pandemic, utility ransomware attacks, the attack on SolarWinds, and the Ever Given have brought supply chain security, resilience, integrity, transparency, and trustworthiness into sharper focus for a broader audience, and have surfaced many inadequacies in timely access to reliable suppliers, software, and stocks of fuel, personal protective equipment, microelectronics, medical devices, and food supplies, to name a few.
At the same time, the computerization of everything has given rise to pervasive cyber threats against more and more of the capabilities and infrastructure we and our organizations rely upon to function – including threats stemming from vulnerabilities inherent in repurposed software of often dubious provenance and unknown pedigree. Further complicating this landscape are the increasingly globalized nature of the technology in these systems and the lack of transparency around it. Adversaries large and small seek to inject themselves into every conceivable stage of software technology development, supply, and support, for disruptive, monetary, and intelligence goals of their own.
This video discusses the capabilities emerging across industry and government to assess and address the challenges of providing trustworthy software supplies with assurance of the integrity and transparency of their composition, source, and veracity – the building blocks of software supply chains we can gain justifiable confidence in at scale and speed.
Written by Paul Albertella, ELISA Project TSC member, Chair for Open Source Engineering Process Working Group and Consultant at Codethink
The ELISA Project hosted its annual Spring Workshop on April 5-7. It’s a combination of interesting talks and productive working sessions on Enabling Linux in Safety Applications. I’ve attended a lot of these over the past three years, but for this one there was a perceptible shift towards applying techniques and building solutions. If you couldn’t attend, here’s a quick recap of the workshop.
Day 1 opened with an interesting session from Red Hat’s Daniel Bristot and Gabriele Paoloni titled “Safety Monitors inside the Kernel” about a Runtime Verification Monitor concept, which they have been implementing for inclusion in the Linux kernel. This involves using ‘deterministic automata’ (definition here) modules within the kernel that are generated from directed graph models (defined using Graphviz/dot), which are then driven by events within the kernel using ‘instrumentation’ akin to systrace. The goal is to verify the expected behaviour of specific safety-related functionality within the kernel at runtime, and to provide a way to trigger a ‘safe state’ mechanism if a problem is detected.
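To give a flavour of the idea, here is a minimal sketch in Python of how such a deterministic automaton monitor works. This is purely illustrative: the real monitors are C code generated from the dot models and driven by kernel events, and the states and events below are hypothetical.

```python
# Minimal sketch of a deterministic-automaton monitor (hypothetical model).
# The real RV framework generates C code from a Graphviz/dot model and is
# driven by kernel events; this only illustrates the automaton concept.

# Transition table: (state, event) -> next state.
# Hypothetical watchdog-style model: the device must be opened before it
# can be pinged, and closed only after being opened.
TRANSITIONS = {
    ("closed", "open"):  "opened",
    ("opened", "ping"):  "opened",
    ("opened", "close"): "closed",
}

def monitor(events, state="closed"):
    """Replay a sequence of events; report the first illegal transition."""
    for event in events:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            # In the kernel this is where a reaction (e.g. moving the
            # system to a safe state) would be triggered.
            return f"violation: event '{event}' not allowed in state '{state}'"
        state = nxt
    return f"ok, final state: {state}"

print(monitor(["open", "ping", "ping", "close"]))  # ok
print(monitor(["open", "close", "ping"]))          # violation
```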
This session was followed by a discussion, led by Philipp Ahmann of Bosch GmbH, about whether we should add a new ELISA working group with a focus on industrial and/or IoT safety use cases. The discussion aimed to gather opinions on whether ELISA is the right place to start such a working group, or whether there are better communities to reach out to. Watch the video here:
Gabriele Paoloni then gave a talk about ISO/PAS 8926, an ISO initiative to provide more detailed guidance on applying the ISO 26262 standard to pre-existing software.
On Day 2, Red Hat’s Christoffer Hall-Frederiksen and Gabriele Paoloni were back again, talking about the work that they have done to document how Linux manages address space integrity. This was very informative, providing an accessible overview of how Linux manages processes, threads and memory.
After that, Alessandro Biasci, Raffaele Giannessi and Fabrizio Tronici from Huawei talked about their use of STPA to analyse dynamic memory functionality for Linux, and how some of the risks they identified might be addressed using memory tagging.
I then gave a talk on ‘Refining the RAFIA Approach’, which addresses the challenges of creating safety argumentation and supporting evidence for systems involving open source software. This talk provided an update on how this approach is being applied and refined, both as part of ELISA workgroup activities and in Codethink’s projects.
The last session of the day focused on the Kernel Configuration database and was presented by Elana Copperman with Mobileye and Wenhui Zhang from Bytedance. This was an attempt to gather together information available on various kernel configuration items that may be relevant when addressing particular risks, together with some notes on best practice when using these.
This was originally structured using Common Weakness Enumerations (CWEs), and there are plans to include security-related configs as well. One topic of discussion was extending it to include information about the performance impact of the configs, and building automated setups to measure this for reference ‘instances’. At present the ‘database’ itself is just a big Google spreadsheet, but there’s potential for this to become a useful resource. I have been prototyping a solution to migrate the content to a GitHub repository and render it as web pages, which may help.
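As a rough illustration of the kind of migration I have been prototyping, a script along these lines could turn a CSV export of the spreadsheet into one page per configuration item for rendering on GitHub. The column names and file layout here are assumptions, not the actual spreadsheet schema.

```python
# Hypothetical sketch: convert a CSV export of the kernel-config spreadsheet
# into per-option Markdown pages. Column names and paths are assumptions,
# not the actual spreadsheet schema.
import csv
from pathlib import Path

OUT_DIR = Path("configs")

def export(csv_path="kernel-configs.csv"):
    OUT_DIR.mkdir(exist_ok=True)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            name = row["config"].strip()   # e.g. CONFIG_STACKPROTECTOR
            page = OUT_DIR / f"{name}.md"
            page.write_text(
                f"# {name}\n\n"
                f"**Related CWE:** {row.get('cwe', 'n/a')}\n\n"
                f"**Risk addressed:** {row.get('risk', 'n/a')}\n\n"
                f"**Notes:** {row.get('notes', '')}\n"
            )

if __name__ == "__main__":
    export()
```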
Day 3 was dominated by two more working sessions. The first, on Mixed Criticality Processing, was a productive discussion which I co-chaired with Chris Temple from Arm. There was some good engagement from existing ELISA participants and new ones, and we ended with a clearer understanding of the challenges faced when safety functions co-exist on a system with non-safety functions, and with other safety functions. There was a broad consensus about how ELISA might provide useful guidance for how to tackle some of these, by describing design patterns for systems that include Linux, rather than focussing on what Linux needs in order to be ‘safe’.
The second session, which I also co-chaired with Philipp, focussed on a possible example of such a pattern, which has been developed for the Telltale use case by the Automotive Working Group using STPA. We talked through the safety concept and control structure, and how we might build on this and implement a reference version based on an existing AGL demo, perhaps using a readily-available hardware platform such as Raspberry Pi.
The last session of the conference is always a working session for next steps and goals that is led by the Linux Foundation’s Shuah Khan, Chair of the ELISA Technical Steering Committee. Shuah and the TSC are currently putting together insights from that discussion, so stay tuned to learn more about the focuses for next quarter.
If any of these pique your interest, you can go to the new ELISA Project YouTube Channel to see some of these videos or click on the schedule to check out the PPT presentations.
Written by Dan Brown, Senior Manager, Content & Social Media, Linux Foundation Training & Certification
It’s that time of year – Linux Foundation Training (LiFT) Scholarships are here! Since 2011, The Linux Foundation has awarded over 1,100 scholarships for millions of dollars in training and certification to deserving individuals around the world who would otherwise be unable to afford it. This is part of our mission to grow the open source community by lowering the barrier to entry and making quality training options accessible to those who want them.
Applications are being accepted through April 30 in 12 different categories:
Open Source Newbies
Teens-in-Training
Women in Open Source
Software Developer Do-Gooder
SysAdmin Super Star
Blockchain Blockbuster
Cloud Captain
Linux Kernel Guru
Networking Notable
Web Development Wiz
Hardware Hero – NEW
Cybersecurity Champion – NEW
Whether you are just starting in your open source career, or you are a veteran developer or sysadmin who is looking to gain new skills, if you feel you can benefit from training and/or certification but cannot afford it, you should apply.
Recipients will receive a Linux Foundation eLearning training course and certification exam. All certification exams, and most training courses, are offered remotely, meaning they can be completed from anywhere.
ELISA Project members will come together for a quarterly Spring Workshop on April 5-7 to learn about the latest developments and working group updates, share best practices, and collaborate to drive rapid innovation across the industry. Hosted online, this workshop is free and open to the public. If you haven’t yet checked out the schedule, click here.
As we prepare for the Spring Workshop, we’re taking a look at the most popular sessions from the November event. A full recap by Philipp Ahmann, ELISA Project Ambassador and TSC member, can be found here.
One of the most popular sessions was presented by Rachel Sibley, Senior Principal Software Quality Engineer at Red Hat, titled “Requirements Traceability using Code Coverage.”
In this video, Rachel talks about the existing techniques we use for kernel code coverage and how we would like to apply them to requirements traceability and verification for Red Hat’s Automotive Initiative. Embedding both code coverage analysis and targeted testing in the verification stage, using existing tooling, will enable us to improve our test coverage starting from the requirements. You can watch the full video here:
On Wednesday, March 16, the ELISA Project officially launched its Monthly Seminar Series, which focuses on hot topics related to ELISA and its mission. Presenters are members, contributors and thought leaders from the ELISA Project and surrounding communities.
The March seminar focused on the Real-time Linux Analysis Toolset. ELISA community member Daniel Bristot De Oliveira, Senior Principal Software Engineer at Red Hat, presented the tools provided by rtla.
Starting with version 5.17, Linux includes a new tool named rtla, which stands for Real-time Linux Analysis. rtla is a meta-tool consisting of a set of commands that analyze the real-time properties of Linux. Rather than testing Linux as a black box, rtla leverages kernel tracing capabilities to provide precise information about those properties and the root causes of unexpected results.
In this video, Daniel presents two tools provided by rtla: timerlat, which measures IRQ and thread latency for interrupt-driven applications (important for the PREEMPT_RT kernel), and osnoise, which evaluates the ability of Linux to isolate a workload, from the scheduling perspective, from interference caused by the rest of the system. The presentation also includes examples of using the tools to find the root cause of unexpected latencies, and of how to collect extra tracing information directly from the tool.
Written by Dan Brown, Senior Manager, Content & Social Media, Linux Foundation Training & Certification
The Linux Foundation has once again partnered with edX for the next iteration of the Open Source Jobs Report. The report examines the latest trends in open source careers, which skills are in demand, what motivates open source job seekers, and how employers can attract and retain top talent. Last year’s report can be found here. This year’s report will also examine the extent to which the “Great Resignation” has affected the technology industry.
The report is anchored by a survey exploring what hiring managers are looking for in employees, and what motivates open source professionals. All participants will receive a discount code for a Linux Foundation training course or certification exam upon survey completion.
We encourage you to share your thoughts and experiences. The survey takes around 10 minutes to complete, and all data is collected anonymously.
This blog previously ran on the Codethink website. Click here for more content like this.
Paul Albertella spoke at the ELISA November 2021 Workshop about how Codethink’s Deterministic Construction Service achieved ISO 26262 certification. In this article he explains the purpose of DCS and how it paves the way towards one of Codethink’s longer-term goals: establishing a viable approach to safety certification for Linux-based operating systems. Read more or watch the video from the ELISA Workshop below.
Background
Deterministic Construction Service (DCS) is Codethink’s design pattern for constructing critical software components. It defines a controlled process, based on an automated continuous integration (CI) workflow, for constructing and managing changes to software components, and to the tools used to build and verify them. A reference implementation of this design pattern was recently assessed by Exida and qualified using the ISO 26262 safety standard for use with automotive safety applications up to ASIL D.
DCS was made possible by many years of work on construction and integration tooling at Codethink, and builds directly on the previous efforts of open-source projects such as Baserock, BuildStream, Freedesktop SDK and Reproducible Builds. These projects have helped to establish and refine both the principles that inform DCS and the techniques used in its implementation.
As a tool, DCS is an important foundation for building a safety-certifiable Linux-based OS, but in creating and certifying it, Codethink had another goal: to validate the safety approach that we have been developing in collaboration with Exida and ELISA. This approach is called RAFIA (an acronym for Risk Analysis, Fault Injection and Automation); it was introduced in a previous article and further discussed in a second article.
Goals and principles
The goals of DCS are:
To construct software in such a way that it is consistently reproducible
To verify this property for a given set of inputs, for a given instantiation of DCS
To make use of this property to inform verification and impact analysis
To automate all of this as part of a continuous integration (CI) workflow
Reproducible, in this context, means that the outputs of the construction process (a binary fileset) for a given set of inputs (source code, dependencies, build instructions, etc) for the target software (the components that we are constructing and verifying) must be demonstrably identical. That is to say, re-running the DCS process without explicitly changing any of the inputs must produce exactly the same set of binary outputs every time.
Inputs, in this context, means everything that is required to construct and verify the software, which includes:
the target software and its dependencies,
the actions required to construct and verify these,
the tools used to perform these actions, and
the execution environments for these actions
An instantiation of DCS is an implementation of the design pattern using a specific set of tools, configuration and infrastructure. The reference implementation, for example, is based on Codethink’s managed Gitlab service and its associated servers, resources (e.g. CI runners), and configurations (e.g. access control), together with a set of hosted git repositories. These repositories contain the component tools used to realise DCS, the build and test inputs for these – including safety-related criteria and tests for the DCS design pattern – and the automation scripts that implement the overall service logic.
In order to have reproducible outputs, we must have clearly defined and consistent inputs. This means having fine-grained change and revision control over the corresponding files and their organisation into components, configurations, target systems, etc.
It is essential to track all inputs, including (but not limited to): the source code of the target software; any other build-time dependencies required to construct it; any run-time dependencies required to verify it; the configuration, calibration or test data used to inform or verify its behaviour; the tools used to perform construction and verification actions; the execution environments within which these actions are performed; the criteria that are used to evaluate verification actions; and instructions detailing the actions required to provide, build or verify all of these as part of an automated workflow.
We also need fine-grained control over our construction processes, which means that build actions must not only be consistently performed, but must be executed in a controlled environment to avoid the introduction of unspecified or unplanned inputs into the build process.
Purpose
DCS verifies that we have control over our construction process and all of its inputs by comparing the binary outputs of two completed construction pipelines (automated executions of the specified construction steps and actions). If the results are identical, then the inputs and build actions may be considered under control. If the results differ, then the cause of the difference must be investigated, to determine whether an unspecified or uncontrolled input is involved in the construction process.
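As a minimal sketch of what this comparison amounts to (an illustration of the principle, not the DCS implementation), the output filesets of two pipeline runs can be hashed and compared; the same comparison can be reused later for impact analysis, cache verification and tool-upgrade checks.

```python
# Minimal sketch of the reproducibility check, not the DCS implementation:
# hash every file in two build output trees and compare the results.
import hashlib
from pathlib import Path

def tree_digest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def compare(build_a, build_b):
    a, b = tree_digest(build_a), tree_digest(build_b)
    if a == b:
        return "identical: inputs and build actions appear to be under control"
    only_in_one = set(a) ^ set(b)                          # files missing from one build
    changed = {f for f in a.keys() & b.keys() if a[f] != b[f]}  # files with differing content
    return "differs: " + ", ".join(sorted(only_in_one | changed))

# e.g. print(compare("pipeline-run-1/output", "pipeline-run-2/output"))
```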
Once we have control over our process and inputs, we can use the same principle to inform impact analysis for the constructed software. If a change to one of the inputs has no effect on the output binaries, then we can be confident that there will be no impact on the software’s behaviour or properties, which may avoid unnecessary re-testing. Similarly, if we were expecting a change to have an effect (e.g. to fix a bug) and the binaries are unchanged, then we know that the change did not have its intended effect, without having to re-test it.
These ‘no change’ cases may seem insignificant at first sight – after all, why would we want to make a change that has no effect? – but when maintaining complex software systems, the practice of regularly and systematically applying atomic changes can be invaluable. Not all changes to an input will affect the output for a given construction, because the source code may have conditional compilation sections, which are not used for a specific build. By atomically applying individual changes over time, instead of applying a large change set in one go, we are able to determine when a specific change does have an effect and use this to guide our impact analysis.
This becomes even more valuable if we are using artifact caching as part of a construction process. By storing the artifacts (binary outputs and intermediate objects) produced by previous build actions in a shared cache, we can dramatically reduce build times for large software components. But how can we be confident that these cached artifacts directly relate to our input files? Different caching solutions approach this in different ways, but by periodically rebuilding from source (e.g. with a weekend rebuild pipeline), and comparing the result with a build that uses cached artifacts, we can independently verify the integrity of our cache, regardless of the cache indexing strategy.
We can use the same principles to show that the property of reproducibility is independent of the specific instantiation of DCS – including host hardware, operating systems, compilers and other tools. This allows us to confirm that a new instantiation of DCS meets the design pattern requirements, by comparing the binary outputs for a reference build.
This approach can be extended to verify that a change to a tool used as part of the DCS instantiation, or as an input to the construction process, has no effect on the output. For tools that we expect to have no direct impact on the outputs, this is a confirmation of our analysis. For tools that we do expect to affect outputs (e.g. compilers), this is a confirmation that an upgraded tool has not introduced an unexpected change – or if a change is detected, to drive our analysis of its potential impact.
Using RAFIA for certification
Codethink’s DCS reference implementation was certified by Exida based on the ISO 26262 tool qualification requirements. This was achieved using safety argumentation that was developed using RAFIA, and a safety lifecycle built around the DCS controlled process itself.
We used STPA to analyse the risks associated with the specific purpose of DCS and to define safety requirements in the form of constraints. These were then used to derive tests to verify that a given DCS instantiation satisfies the applicable requirements, or to specify process requirements that must be applied by the user of DCS and verified as part of a safety assessment. We also identified loss scenarios that might lead to violation of constraints and developed fault injection tests to show that our mitigations were effective.
By applying this controlled process to all inputs to the certification assessment, we were able to demonstrate that we had addressed all of the applicable requirements in the standard, and provide evidence to support this. Controlled inputs included documentation and requirements as well as the build inputs. This included the documented STPA results, for which we developed a YAML data structure and validation tools, which have been shared in a new open source repository. For the reference implementation, all inputs were stored in git repositories managed by Gitlab.
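The actual schema is defined in that repository; purely to illustrate the shape that such data might take, and how a trivial validation pass could work, here is a hypothetical sketch based on standard STPA terminology.

```python
# Hypothetical sketch of a YAML structure for STPA results and a trivial
# validation pass. Field names are assumptions based on standard STPA
# terminology, not the schema published in the open source repository.
import yaml  # pip install pyyaml

EXAMPLE = """
losses:
  - id: L-1
    description: Build output does not match its specified inputs
hazards:
  - id: H-1
    losses: [L-1]
    description: An unspecified input is used during construction
constraints:
  - id: SC-1
    hazards: [H-1]
    description: All construction inputs must be pinned to exact revisions
loss_scenarios:
  - id: LS-1
    constraints: [SC-1]
    description: A build step fetches an unpinned dependency from the network
"""

def validate(doc):
    """Check that every cross-reference points at a defined identifier."""
    ids = {item["id"] for section in doc.values() for item in section}
    problems = []
    for section, key in [("hazards", "losses"),
                         ("constraints", "hazards"),
                         ("loss_scenarios", "constraints")]:
        for item in doc.get(section, []):
            for ref in item.get(key, []):
                if ref not in ids:
                    problems.append(f"{item['id']} references unknown {ref}")
    return problems or ["ok"]

print(validate(yaml.safe_load(EXAMPLE)))
```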
Managing all inputs in this way allowed us to map evidence to individual certification criteria based on the applicable safety standard (ISO 26262 for DCS). For each of these, we documented how the criteria were satisfied and provided links to documents, source code, tests or CI-generated output that provided supporting evidence.
By tracking all certification criteria, assertions, and evidence in the same manner as the software, all potential changes could be managed by the same CI-driven change control process. We could use CI to verify that supporting evidence links are valid and up-to-date, and to trace requirements to tests and to test results. We were also able to produce human-readable reports for safety assessors directly from the stored and generated evidence for a given release.
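As a rough illustration of the kind of CI check this enables (the table format, field names and paths below are hypothetical, not the actual DCS evidence structure), a job could verify that every requirement traces to at least one test and that every linked piece of evidence exists in the repository:

```python
# Hypothetical sketch of a CI traceability check: every requirement must map
# to at least one test, and every evidence link must point at a file that
# exists in the repository. The table and paths are illustrative only.
from pathlib import Path
import sys

TRACE = [
    {"requirement": "SC-1", "tests": ["tests/test_pinned_inputs.py"],
     "evidence": ["reports/pinned-inputs.html"]},
    {"requirement": "SC-2", "tests": [],
     "evidence": ["reports/rebuild-comparison.html"]},
]

def check(repo_root="."):
    root = Path(repo_root)
    failures = []
    for entry in TRACE:
        if not entry["tests"]:
            failures.append(f"{entry['requirement']}: no tests linked")
        for link in entry["tests"] + entry["evidence"]:
            if not (root / link).is_file():
                failures.append(f"{entry['requirement']}: missing {link}")
    return failures

if __name__ == "__main__":
    problems = check()
    print("\n".join(problems) or "all traceability links valid")
    sys.exit(1 if problems else 0)
```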
Role in safety and future work
As we have seen, DCS allows us to:
Verify that we have control of all of our inputs, including dependencies
Avoid retesting or re-validating unchanged binaries
Identify and investigate differences when changes are detected
Verify that our process is isolated from environmental disturbances
Show that tool upgrades do not impact previously validated binaries
But how can DCS contribute to a broader safety process? And how will this help us to certify a Linux-based OS?
Safety standards such as ISO 26262 identify the key engineering processes that are expected to be used in the development of safety-relevant software, as well as organisational processes and controls (e.g. quality and safety management) that are required to ensure that the engineering processes are correctly and consistently followed. However, the standards provide only limited guidance on implementing such processes, as every organisation is expected to have its own particular approach and tools.
A deterministic construction process provides a foundation for many of these engineering processes, as well as a way to monitor and enforce the required organisational controls. The DCS design pattern defines a consistent, automated and verifiable foundation for implementing a safety lifecycle for large-scale and complex software components. It was developed with Linux and open source software in mind, but the principles can be applied to any software, and many aspects of the RAFIA process can be applied to hardware components as well.
DCS and RAFIA enable software components and their associated documentation, as well as the required engineering and organisational process criteria and automation tools, to be managed and maintained in close alignment with the software development process. They also support key processes as follows:
Change Management and Configuration Management are, as we have seen, fundamental parts of the DCS design pattern, and also key topics in safety standards. DCS allows us to verify that we have control over all changes to our software and its configurations. This can be especially important when components or component dependencies are provided by a supply chain.
Verification of the software (e.g. through testing and static analysis) is required to confirm that it satisfies both its functional requirements and its safety requirements. DCS gives us control over all of the inputs to verification actions, as well as the tools and execution environments that are used to perform them. Using it as part of RAFIA also allows us to identify and specify safety requirements for components and tools, and to manage these requirements in close coordination with the software.
Validation of the software as part of its target system is required to confirm that it fulfils its intended purpose in the system, including its role in fulfilling the system’s safety goals. This may require the software to be constructed for one or more system configurations (CPU architecture, hardware configurations, etc) and deployed to a specific test environment (e.g. a test rig). DCS provides a way to manage this, and RAFIA provides a way to validate safety mitigations at the system level, by defining fault injection cases for software components.
Impact analysis is required to determine whether a change to software may impact the target system’s safety goals. DCS can be used to drive this process for software components, by identifying when changes actually have an impact on the deployed binaries, and helping to identify the specific change that resulted in this impact.
Tool qualification is used to provide confidence in the use of software tools. As demonstrated by the DCS reference implementation, RAFIA can provide a basis for qualifying and validating open-source toolchains, and DCS can be used to control the specific source and configuration of tools that are used. DCS can also help to classify new tools by determining their impact on the constructed output, and to validate tool upgrades by determining their impact on a known and verified previous build.
On this basis, we believe that DCS represents a solid foundation for defining and constructing a Linux-based OS for a safety-related use case. Furthermore, the qualification of DCS demonstrates that the RAFIA approach can be used to provide the required safety argumentation and evidence to achieve a formal safety certification for tools based on open source software; we are confident that this can be extended to support the same goal for a safety-related system.
Planning is currently underway for the ELISA Project Spring Workshop, which takes place virtually on April 5-7. If you haven’t yet, you can submit a CFP here (by Friday, March 4) or register to attend here.
As we prepare for the next workshop, we’ll be taking a look at the most popular sessions from the November event. A full recap by Philipp Ahmann, ELISA Project Ambassador and TSC member can be found here.
In this video, Shuah Khan, ELISA Project Chair of the Technical Steering Committee, kicks off the November Workshop with an overview of the TSC and introductions to a few of the Working Group Chairs for updates. Watch the video to learn more about the focused working groups for Safety Architecture, Tool Investigation and Code Improvement, Medical Devices and Automotive.
Planning is currently underway for the ELISA Project Spring Workshop, which takes place virtually on April 5-7. If you haven’t yet, you can submit a CFP here (by Friday, March 4) or register to attend here.
As we prepare for the next workshop, we’ll be taking a look at the most popular sessions from the November event. A full recap by Philipp Ahmann, ELISA Project Ambassador and TSC member can be found here.
In this video, Christopher Temple, Lead Safety & Reliability Architect at Arm Germany GmbH, provides an overview of the challenges and solutions of Linux in safety systems.