This glossary explains selected terms that have specific meanings in the book and therefore on this website, meanings that may be more specialised than their natural language or generic use. These terms are introduced and given context in the first part of the book.
Some management systems, procedures, activities or their outputs may need the consent of a Responsible Authority (e.g. a manager, an approval body or regulator) before they are used. Seeking consent is commonly referred to as seeking approval and when gained the management systems, procedures, activities or their outputs are considered to be approved.
If the strategy used to argue the safety of a transitional stage (see below) is to show that its safety performance is the same as or better than that at an earlier transitional stage, that earlier transitional stage is the baseline for this comparison. The state of the functional system at that transitional stage is referred to as the 'baseline system'. The baseline could be the initial pre-change state or any suitable preceding transitional stage. The baseline system is used during Impact Analysis (see definition below) to identify what has been changed, and also to set the safety criteria to evaluate the safety performance for the transitional stage.
Such an argument will not necessarily use the immediately preceding transitional stage as the baseline. Other baselines may be used out of convenience (e.g. the assurance that is readily available) or because the preceding transitional stage is unsuitable for use as a baseline (e.g. it is insufficiently assured).
Note:
It is possible for there to be two baselines if a specific approach is adopted for safety assurance. This note explains this approach and how it can be accommodated by the format in the book.
When an argument is made that compares the safety performance of only the changed and impacted POSSs (which comprise the Internal Assurance Scope) to previous safety performance, it is usually assumed that the baseline used for the safety performance is the same baseline used to define the Internal Assurance Scope. The Safety Case Report format in the book makes this assumption because the two-baseline scenario seems unlikely, and because the assumption retains compatibility with CAP 1801.
It is in fact possible to have an assurance strategy that uses two baselines, one for the safety performance, and one to establish the Internal Assurance Scope. The most obvious example would be to use the pre-change system build state as the source of the acceptable safety performance, whereas a transitional stage could set the Internal Assurance Scope from the system build state in the previous transitional stage. This is possible because the fundamental justification of the Internal Assurance Scope approach is that everything outside its scope is considered to be assured, whether by:
a) 'grandfather rights' whereby existing systems are assumed to be acceptably safe, or
b) the existence, pre-change, of genuine assurance for those parts, or
c) the provision of genuine assurance to justify earlier transitional stages in the Safety Case Report.
From this it can be seen that the two-baseline approach relies on assurance generated for the earlier transitional stages, which could in fact be added into a larger Internal Assurance Scope such that the two baselines would then converge on the pre-change state.
Given that many organisations prefer to maintain complete assurance packages, rather than just those for changed and impacted parts, the continuation of the assumption of a single baseline seems appropriate. However, if the two-baseline approach is used, then the Safety Case Report authors will have to:
a) modify the format appropriately
b) ensure that references to baselines clearly and correctly identify which of the two baselines is being addressed at any point in the Safety Case Report
c) ensure that the approach is clearly and properly explained
d) ensure that the approach is shown to be valid.
The Service Provider’s change management procedures deal with the development, assurance and implementation of changes to the functional system, including implementing a new service(s). The procedures mandate activities intended to satisfy various business objectives during the conduct of changes, including management of the safety of the services provided by the functional system.
When the book refers to 'change management procedures', it is mainly concerned with the parts of these procedures that implement the Safety Management System (SMS). The change management procedures are viewed as forming part of the SMS, although in fact they may form part of the quality management or other business management systems documentation.
The way that an organisation views the relationship between change management procedures and the SMS may depend on which system or procedures need to be approved under applicable regulations.
The Cyber Security Maintenance System (CSMS) is a support system which addresses cyber security threats during service provision, to maintain the safety of the services provided. As the CSMS is a POSS (see definition below), changes to the CSMS will require a safety case to be produced.
The CSMS is a system comprising people, procedures and equipment. All CSMS activities must be conducted in accordance with defined procedures and ‘playbooks’ so that the response to cyber security threats and effects is known in advance and can be analysed, both for its effect on safety and during CTIBI Analysis (see definition below).
The CSMS plays a similar role to a conventional maintenance support function, in that both:
a) monitor to determine whether action is required
b) coordinate with those responsible for service provision and functional system management, to decide whether to place the functional system into fall-back modes of operation when anomalies are detected
c) restore the functional system to full operational capability when their remedial work is undertaken
d) implement changes that are within the scope of the existing safety case
e) interface with safety analysts to provide updated information based on practical in-service experience.
The CSMS functions are those cyber security functions which directly affect the safety of the provided service(s), including (as a minimum):
a) monitoring to identify changing cyber security threats, newly-identified vulnerabilities, newly-identified exploits, and new patches, and consequently determine whether a change needs to be made. This includes deciding:
i. whether the change is within the scope of the existing safety case or a new safety case is required to make the change, and
ii. whether to provide a normal service, or provide an alternative service in a fall-back mode, or even to close the service(s), until the change is in place.
b) implementing security-related changes (e.g. patches and updates) that do not require a new safety case. The criteria for this are in the CSMS procedures, which either:
i. identify relevant limits of validity in the specifications of the relevant POSSs, for which the specifications are a correct statement of behaviour, or
ii. define adequate verification procedures to establish that the change has no effect on the relevant POSS specifications.
c) detecting anomalies in functional system behaviour, supplied services, data, configuration and devices
d) responding to detected anomalies, in accordance with pre-defined procedures and playbooks (which cover any fall-back operation, remedial action, and return to full operation)
e) coordinating with those responsible for service provision and functional system management, to decide whether to place the functional system into lockdown or reduced service modes of operation when increases in cyber threat are detected.
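The decision in function b) above, on whether a security-related change can be implemented under the existing safety case, can be sketched as a small routing function. This is a minimal illustration only: the names `Route` and `triage_change` and the two boolean inputs are hypothetical, and in practice each input is the outcome of the checks defined in the CSMS procedures rather than a simple flag.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical outcomes of triaging a security-related change."""
    WITHIN_EXISTING_SAFETY_CASE = "within existing safety case"
    NEW_SAFETY_CASE_REQUIRED = "new safety case required"


def triage_change(within_limits_of_validity: bool,
                  verified_no_spec_effect: bool) -> Route:
    """Route a change according to the two criteria in b) i. and b) ii.

    within_limits_of_validity: the change stays inside the limits for which
        the relevant POSS specifications are a correct statement of behaviour.
    verified_no_spec_effect: verification has established that the change
        has no effect on the relevant POSS specifications.
    Either criterion being met is treated here as sufficient (an assumption
    drawn from the 'either ... or' wording of the list above).
    """
    if within_limits_of_validity or verified_no_spec_effect:
        return Route.WITHIN_EXISTING_SAFETY_CASE
    return Route.NEW_SAFETY_CASE_REQUIRED


# Example: a patch that falls outside the limits of validity and has not
# been verified as having no effect on the specifications.
patch_route = triage_change(within_limits_of_validity=False,
                            verified_no_spec_effect=False)
```

The sketch deliberately leaves out the separate decision on whether to continue a normal service, fall back or close the service until the change is in place, since that depends on operational judgement.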
An analysis, used to inform the safety case, that establishes the environmental cyber threat, and determines the potential effects on the functional system, in terms of potential behaviour of the POSSs. This analysis takes account of the functional system architecture, including the mitigations provided by the cyber and physical security controls and the specified activities of the CSMS. The analysis addresses threats attacking the functional system and its interfaces, and threats involving remote access and physical modification (e.g. an insider fits a USB device with malware, or enables a new communications link).
This analysis is conducted according to a procedure that must be justified, either directly or by prior approval. The CTIBI Analysis results in:
a) the potential cyber threat-induced behaviour, which is then included into the POSS specifications (the CTIBI Analysis provides the supporting evidence for these specification elements)
b) a record to demonstrate that the CTIBI Analysis has been carried out completely and correctly in accordance with the procedure, by competent personnel.
Sometimes a safety case is prepared for a system that has been providing a safety-related service(s) for some time without a safety case, or with a safety case that is no longer deemed adequate. In this scenario, the safety case produced is for the service(s)/system in its current state (represented as the only transitional stage). Such services and systems are referred to as Extant Services/Systems in the book.
Fall-back modes address the need to continue providing a service, possibly a reduced service, under abnormal conditions such as maintenance operations or failure of one or more POSSs. They are pre-planned reconfigurations of the functional system that are used if an abnormal condition is identified, usually to isolate the abnormality for rectification.
The safety arguments justifying that the fall-back modes of operation are safe may be integrated either at the top-most safety case argument or within the argument for each individual transitional stage.
A functional system is operated by a Service Provider to deliver one or more desired services, and comprises a combination of people, procedures and equipment. This includes all those things considered as being assets (e.g. buildings, expertise, information in a database) that are required for the operation of the functional system. A functional system may use services and resources provided by other functional systems, which may be operated by different Service Providers.
The book considers the functional system as comprising an operational system that delivers the desired service(s), and support systems that are present to support running the operational system.
The operational system includes all systems that directly contribute to the provision of the service(s), including systems such as telecommunications, power, cooling, and underlying IT infrastructure.
Support systems include training, data preparation, and test and development systems. The activities undertaken by these systems are referred to as support activities.
The distinction between operational system and support system is made as the degree of assurance provided in the safety case for support systems is typically lower than that for the operational system.
If the assurance strategy is to just address the change, then an Impact Analysis is used to identify all POSSs whose existing assurance (arguments and evidence that their specifications are trustworthy) will be invalidated by a change, and hence (along with new POSSs) establish the Internal Assurance Scope (see Glossary entry below). The assurance for a POSS may be impacted by:
a) a direct change made to the POSS
b) a change to the (local) environment of the POSS, such that it is outside the range for which its specification was valid, brought about by changes in communications or resources shared with other POSSs
c) a change to the safety requirements apportioned to the POSS in the changed functional system, either in terms of function, integrity or confidence required.
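The way the three impact routes combine with new POSSs to give the Internal Assurance Scope can be sketched as follows. This is an illustrative sketch, not the book's method: the `Poss` record, its field names and the example POSS names are all hypothetical, and deciding each flag is the real analytical work of an Impact Analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poss:
    """Hypothetical record of Impact Analysis findings for one POSS."""
    name: str
    directly_changed: bool = False       # route a): direct change to the POSS
    environment_changed: bool = False    # route b): environment outside valid range
    requirements_changed: bool = False   # route c): apportioned requirements changed
    is_new: bool = False                 # POSS introduced by the change


def internal_assurance_scope(posss):
    """Return the names of POSSs that are new or whose assurance is impacted."""
    return {
        p.name
        for p in posss
        if p.is_new
        or p.directly_changed
        or p.environment_changed
        or p.requirements_changed
    }


# Hypothetical functional system with five POSSs.
posss = [
    Poss("radar_processor", directly_changed=True),
    Poss("shared_network", environment_changed=True),
    Poss("display", requirements_changed=True),
    Poss("new_gateway", is_new=True),
    Poss("voice_comms"),  # neither changed nor impacted
]
scope = internal_assurance_scope(posss)
```

In this example `voice_comms` stays outside the scope, so the safety case would rely on its existing assurance.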
Although a single transition plan for a transitional stage (see below) can encompass installation, commissioning, (other) transitional activities and recovery, each of these aspects could have its own separate plan. Typical activities covered by these plans (as relevant to a safety case) include:
a) training for operators and engineers, and the facilities required for this
b) issuing of new/revised procedures and possibly removal of old ones
c) installation, commissioning and connection of new equipment and interfaces
d) changes to, and commissioning of, existing equipment
e) removal of replaced or obsolete equipment
f) initiating the services that are to be provided during the transitional stage, while the other transitional activities are undertaken
g) communication with other stakeholders concerning transitional activities, at the start of and during the transitional stage
h) internal coordination to synchronise and sequence the transitional activities
i) coordination with other changes
j) verification and assurance activities, including checking that the systems have been changed as planned.
Integrity is a complex subject in which practice and theory are slowly advancing, and it may not be stated individually, or even collectively, for behaviours. Confidence is an even more complex subject that is poorly developed, along with the related problem of defining how much confidence is required.
A safety case is concerned with demonstrating that safety criteria are set and satisfied in order that the resultant risk is acceptable. According to the safety analyses, the dependence placed on each of the individual behaviours is established, and this is reflected in the safety criteria and the safety requirements subsequently derived from them.
For example, one safety requirement for the cruise control of a car might be:
Behaviour: the car shall maintain the set cruise control speed (under specified conditions)
Functional performance: the set cruise control speed shall be maintained within +/- 2%
Integrity: the probability that the car will maintain the set cruise control speed within +/- 2%
Confidence: sufficient evidence will be produced to provide x% (or high/medium/low) confidence that the set cruise control speed will be maintained within +/- 2% with that probability.
Integrity is the required probability that the required performance is delivered whenever required, which may include during failure conditions.
The required confidence concerns the level of assurance that is required to sufficiently demonstrate that the behaviour and performance will be achieved in operation with the specified integrity. Hence the required confidence does not concern the delivery of the behaviour, but the importance of assuring its achievement. The confidence that the assurance is required to demonstrate is primarily driven by the tolerability of the various risks associated with the operational impact of the behaviour. Whilst the required confidence could be mandated quantitatively for the industry sector, it is more likely to be mandated by defining acceptable practices.
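The four elements of the cruise control example can be held together in one record, which helps to keep behaviour, functional performance, integrity and confidence distinct. The sketch below is illustrative only: the class and field names are hypothetical, and the integrity and confidence values are placeholders invented for the example, since the text above does not give figures for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyRequirement:
    """Hypothetical record of one safety requirement and its four elements."""
    behaviour: str
    tolerance_fraction: float    # functional performance, e.g. 0.02 for +/- 2%
    required_integrity: float    # placeholder probability, not from the source
    required_confidence: str     # placeholder level, e.g. "high"

    def performance_met(self, set_speed: float, measured_speed: float) -> bool:
        """Check one observation against the functional performance element."""
        return abs(measured_speed - set_speed) <= self.tolerance_fraction * set_speed


cruise_control = SafetyRequirement(
    behaviour="maintain the set cruise control speed (under specified conditions)",
    tolerance_fraction=0.02,
    required_integrity=0.999,    # assumed value, for illustration only
    required_confidence="high",  # assumed value, for illustration only
)

within = cruise_control.performance_met(set_speed=100.0, measured_speed=101.5)
outside = cruise_control.performance_met(set_speed=100.0, measured_speed=103.0)
```

A single observation only tests functional performance. Integrity concerns how often that performance is delivered, and confidence concerns the evidence that it will be, so neither can be checked by one call of this kind.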
Many sectors use the concept of Safety Integrity Levels (SILs) or Design Assurance Levels (DALs) to jointly address both integrity and confidence.
Where the safety case argument addresses only the change from a previous baseline, a key part of the argument is to establish that the correct Internal Assurance Scope, for which behavioural assurance is required, has been identified. The Internal Assurance Scope is defined in terms of the POSSs that are changed, together with those that are impacted. The validity of the safety case then relies on existing behavioural assurance outside that scope, because the behaviour outside that scope has, by definition, not been changed or impacted. If the previous assurance for parts outside the scope was no longer considered valid due to changing requirements for assurance, then they will have been included into the scope.
The way that the impact of a change can increase the scope of the functional system that needs to be assured is complex. The book provides further guidance when defining the safety case extract.
If the reader is using the book in conjunction with CAP 1801, it should be noted that the book uses the term 'The Internal Assurance Scope' in place of 'the scope of the change', as it is considered a more widely understood term.
See Functional system, operational system & support system.
'POSS' is used in the book to refer to any Part of the Operation and Support Systems. The term ‘Part’ is used to avoid the architecture level implications of terms like ‘component’, ‘subsystem’ and ‘system’.
POSSs must be uniquely identifiable and under configuration management. They can be defined at any architectural level (for example an electronic component or a data processing system), and so a POSS is part of another POSS, its parent, and so on until the whole functional system is reached. Parent POSSs are only considered to be changed if at least one of their (immediate) child POSSs has changed behaviour (which includes new and removed child POSSs).
The complete set of POSSs makes up the functional system. See glossary entry for ‘Functional system, operational system and support system’.
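The parent and child relationship between POSSs, and the rule that a parent is only considered changed if an immediate child has changed behaviour, can be sketched as a small tree. The `PossNode` class, its fields and the example identifiers are hypothetical. The sketch also only checks immediate children: whether a changed parent in turn has changed behaviour is treated as an analyst's judgement and is set by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PossNode:
    """Hypothetical POSS with a unique identifier and immediate children."""
    ident: str
    behaviour_changed: bool = False
    is_new: bool = False
    is_removed: bool = False
    children: list[PossNode] = field(default_factory=list)


def child_counts_as_changed(child: PossNode) -> bool:
    """Changed behaviour includes new and removed child POSSs."""
    return child.behaviour_changed or child.is_new or child.is_removed


def parent_is_changed(parent: PossNode) -> bool:
    """A parent is changed only if at least one immediate child counts as changed."""
    return any(child_counts_as_changed(c) for c in parent.children)


# Hypothetical example: one child of 'processing' has changed behaviour.
tracker = PossNode("tracker", behaviour_changed=True)
recorder = PossNode("recorder")
processing = PossNode("processing", children=[tracker, recorder])
power = PossNode("power", children=[PossNode("ups"), PossNode("generator")])

processing_changed = parent_is_changed(processing)
power_changed = parent_is_changed(power)
```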
The extract of the safety case contains précis that provide an overview of the material omitted from the argument in the Safety Case Report.
The intended role of a précis is to provide information that is sufficient to satisfy the reader of the Safety Case Report by describing the lower-level argument material in principle. If this does not provide sufficient assurance, the reader can then gain further assurance by accessing lower-level material in the safety case, to examine specific aspects or examples of the précised material.
Précis should be written bearing in mind that readers of the Safety Case Report are not expected to need to verify the technical arguments in the safety case. If verification is required to adequately assure the validity of any part of the safety case, the project should have identified this, instigated the assurance activities (ensuring appropriate independence, if necessary), and integrated the resulting evidence into the arguments of the safety case. This assurance would then appear in the Safety Case Report as part of the Safety Case Extract, probably appearing as part of a précis.
Précis need to contain sufficient traceability/referencing information, either to the safety case or a referenced document, to facilitate further examination.
A project is the organisation that is responsible for the implementation of a system or change. A project may also be initiated to generate a safety case for an extant system.
Applicable legislation or regulations may permit proxies to be used to derive safety criteria. A proxy is some (quantitatively or qualitatively) measurable property that can be used to represent the value of something else related to risk (e.g. crowd density might be used as a proxy for the risk of people being crushed in some circumstances).
Regression testing refers to random and/or informed testing to check for unexpected impact on behaviour. It is not intended to provide the main formal verification of specified behaviour.
Depending on what is being approved and the applicable legal framework, the Responsible Authority for any particular approval could be a manager, an approval body or a regulator.
See ‘Approval & Approved’.
These are the principles that determine whether the change (or service) is acceptably safe (e.g. ALARP or GAMAB). They usually originate from legislation or regulations applicable to the industry sector of the service(s) provided, but it is possible that additional direction may be given by the customer of the service, the Service Provider or the project's own organisation.
The risk acceptance principles are the top-most safety requirements applicable to the change.
See ‘Safety requirements’ and ‘Safety Criteria’.
The safety criteria specify acceptable ‘safety performance’ (see below) for determining whether the predicted performance of the functional system will be acceptable. The safety case defines safety criteria for each transitional stage, and possibly for the complete change.
The set of safety criteria for a transitional stage (or for the complete change) needs to address all safety-related behaviour within the Internal Assurance Scope, and so must be specified at that architectural level or above, and at a point on the accident trajectory (the causal chain leading from the behaviour to an accident) associated with that behaviour. Hence the set of safety criteria completely addresses all accident trajectories associated with the Internal Assurance Scope. The point at which a safety criterion is defined inherently also defines a scope for that safety criterion, being the extent of the functional system (and external mitigations, if appropriate) that contribute behaviour up to that point in the accident trajectory. The scope of a safety criterion may also be confined to one or more operating or fall-back modes.
See ‘Risk acceptance principles’ and ‘Safety requirements’.
‘Safety performance’ denotes the functional system or service properties that are used as the measure of the safety of the service(s) provided, both when setting the safety criteria, and when predicting the performance of the changed functional system.
Using the term ‘safety performance’ allows the safety criteria and predicted performance to be measured in terms of either risk or rates of occurrence, according to the nature of the change.
Safety requirements are generally considered to be requirements that, once met, contribute to the safety of a service.
The book uses three terms for safety requirements at three distinct levels. In sequence from top-most to most detailed, these are:
a) The top-most safety requirements that define whether the change/service is acceptably safe are termed the 'risk acceptance principles'. Common examples are ALARP and GAMAB.
b) The 'safety criteria' define the acceptable safety performance for each transitional stage, derived from the ‘risk acceptance principles’, at the boundary of the Internal Assurance Scope applicable to that transitional stage.
c) When the book refers to 'safety requirements' other than these, it means the safety requirements derived from the safety criteria to identify the required performance of POSSs within the Internal Assurance Scope.
See ‘Risk acceptance principles’, ‘Safety Criteria’ and ‘Safety performance’.
A service is an output from a functional system that is intended to be of use. The output may be either intangible (e.g. information, data or instructions) or tangible (e.g. a product, commodity or function). In addition to the primary services, the functional system may also have one or more functions that provide services to protect against localised harm.
See Functional system, operational system & support system.
See Functional system, operational system & support system.
Transitional activities are those activities concerned with effecting the change. Some are responsible for the moment of transition from one transitional stage to the next, as they effect the change in the state of the functional system. However, many other transitional activities take place during a transitional stage. They concern:
a) positioning or modifying assets that are currently non-operational, so that they are ready to be used operationally
b) clearing up, including ‘making good’
c) removing assets for disposal.
The safety of the transitional activities undertaken during each transitional stage needs to be demonstrated by establishing the risk to the service(s) being provided during the transitional stage. This needs to take account of all transitional activities occurring during that transitional stage. The safety of all these transitional activities together needs to be demonstrated in the context of the functional system/service in operation when the transitional activities are undertaken.
Whilst it is not necessary for a safety case to address the safety of the existing pre-change service(s), treating it as ‘transitional stage zero’, it may be necessary to justify the safety of any transitional activities taking place before the first transitional stage if these cannot be covered by ‘permit to work’ procedures.
A change can be implemented in one step (i.e. as a single transitional stage), or as a sequence of transitional stages by operating the functional system/service in a sequence of states. When the transition is made to the last transitional stage, it places the functional system into the final operating state i.e. the complete change has been implemented (although there may be remaining transitional activities to remove redundant assets and make good).
A safety argument needs to be provided to demonstrate the safety of the service(s) provided during each transitional stage. This is mainly accomplished by predicting whether the safety performance of the functional system during that transitional stage satisfies safety criteria that are set according to the applicable risk acceptance principles. Additionally, the potential effects of the transitional activities (see above) on the safety of the provided services need to be demonstrated.
During a transitional stage the level of the service(s) provided may be unaffected, be reduced/restricted when compared to the initial or final intended service(s), or the service(s) may be completely withdrawn.
A transitional stage starts with the transition of the functional system/service to a new state. The activities that implement the transition to the new transitional stage are among those specified as transitional activities (see above).
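The per-stage check described above, in which predicted safety performance is compared with the safety criterion for each transitional stage, can be sketched as follows. This is a simplified illustration under stated assumptions: safety performance is expressed as a single rate of occurrence per stage, the class, function and stage names are hypothetical, and the numbers are invented. A real argument would also cover the effects of the transitional activities, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionalStage:
    """Hypothetical summary of one transitional stage."""
    name: str
    predicted_rate: float   # predicted safety performance (occurrences per hour)
    criterion_rate: float   # safety criterion (maximum acceptable rate)


def unsatisfied_stages(stages):
    """Return the names of stages whose prediction exceeds their criterion."""
    return [s.name for s in stages if s.predicted_rate > s.criterion_rate]


# A change implemented as a sequence of three stages (invented figures).
stages = [
    TransitionalStage("stage 1: parallel running", 2e-7, 1e-6),
    TransitionalStage("stage 2: reduced service", 3e-6, 1e-6),
    TransitionalStage("stage 3: final state", 1e-7, 1e-6),
]
failing = unsatisfied_stages(stages)
```

Here only the second stage fails its criterion, so the argument for that stage would need further mitigation or a revised plan before the sequence as a whole could be shown to be safe.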