Fault-Tolerance in UML
Dependable Software Design
Mohammad Javad Amiri
Dr. Mohammad Abdollahi Azgomi
We suggest some UML extensions for the description of Fault-Tolerant (FT) software architectures.
The solutions are oriented to complex systems (in general, distributed systems) with high-reliability requirements.
To model software architectures that employ FT techniques, we require some constructs for the description of the specific concepts involved in these techniques.
We limit the scope of FT to safety mitigation means and concentrate our work on FT technical solutions.
FT Architectures MetaModel
Lyu & Torres classify FT techniques into two types:
i) single-version: a single piece of software includes some techniques to detect faults and handle errors;
ii) multi-version: the same piece of software can have more than one version, and the architecture provides support to avoid global failures when one version fails.
Single-version and multi-version solutions have different architectures, but both require some basic concepts.
1. Fault Detection:
There are very different types of tools to support fault detection:
• hardware detection mechanisms (such as overflow, division by zero, and others) and software checks that raise exceptions
• timing mechanisms, such as watchdog timers
• checks based on some kind of redundancy
• functions and software structures that support some properties, such as inverse computations and redundant fields
• checks based on matching of multiple outputs
The replication of software elements requires the identification of the group of elements that compose a replication block to provide the common service.
Groups of elements have associated FT policies and styles that customize FT mechanisms according to application characteristics (response time requirements, maintenance considerations, development costs, etc.).
3. Replication Styles:
The FT architectures use different policies to handle the different types of replication and the recovery information.
Some styles define active replication (all replicas remain active) and others define more passive ones (only one replica is active while the others wait for synchronization and wake-up).
Some policies require that the state of all replicas be the same, while in others replicas can have divergent behaviors.
For passive replications, there are different approaches to update the state of passive replicas. All these types of configuration parameters define the replication styles.
Four meta models of FT Profile
Fault Tolerant Core:
This package includes the basic concepts for the description of FT architectures. These basic concepts define how to apply FT policies and styles to groups of replications, how to identify these groups, and how to identify the individual replications.
Fault Detection Policies:
This package includes solutions for the detection of faults. The approaches range from the automatic generation of detectors to detection supported entirely by the application.
Object Group Properties:
The group properties include information such as the type of consistency checking, the monitoring of the different members in the group, and how to control the addition of new members.
Replication Styles:
FT techniques use different approaches to support replications with different properties. The single-version and multi-version solutions have different replication techniques. Two different styles are passive and active replication, which require different types of monitoring and state checking. Another classification depends on the type of information required to synchronize the state of replicas and their persistence.
FT Sub Profiles
Core Package for FT Architectures
FT extensions support the design of FT systems with:
• No single point of failure that may cause the loss of a critical function.
• Systematic monitoring and control of applications, nodes, external links, and networks to detect failures.
• Manual System Control capability (e.g., stop/restart of run-time applications, or transitions to a degraded mode).
• Global and redundant supervision.
• Local supervision capability on each node.
• Widely used synchronization algorithms, and tolerance to loss of external time reference.
core model for the description of FT Architectures
Fault Tolerance Domain
Many applications that need fault tolerance are quite large and complex. Managing such applications as a single entity is inappropriate. Each Fault Tolerance Domain typically contains several hosts and many object groups, and a single host may support several Fault Tolerance Domains. The Fault Tolerance Domain decides the default policies that are applied to the Server Object Groups and Replicas that it manages. The policies include the approaches to detect errors and the styles of replication management. Each Server Object Group is associated with a Fault Tolerance Domain.
Examples of policies are the type of Replication Style (e.g., passive, active), the Initial Number Replicas, and the Minimum Number Replicas. The Fault Tolerance Domain defines the default policies that apply to all object groups associated with it. It is also possible to set the properties of an individual object group.
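The relationship between domain defaults and group properties can be sketched as follows. This is only an illustration in Python, not part of the profile: the class names, the `effective_policy` helper, the example values, and the rule that a property set on a group takes precedence over the domain default are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FaultToleranceDomain:
    """Holds the default policies for every object group in the domain."""
    defaults: dict = field(default_factory=lambda: {
        "replication_style": "warm_passive",
        "initial_number_replicas": 3,
        "minimum_number_replicas": 2,
    })


@dataclass
class ServerObjectGroup:
    """An object group associated with exactly one domain."""
    name: str
    domain: FaultToleranceDomain
    properties: dict = field(default_factory=dict)  # group-level settings

    def effective_policy(self, key):
        # Assumption: a property set on the group overrides the domain default.
        if key in self.properties:
            return self.properties[key]
        return self.domain.defaults[key]


domain = FaultToleranceDomain()
billing = ServerObjectGroup("billing", domain)
radar = ServerObjectGroup("radar", domain, {"replication_style": "active"})
```

Here `billing` inherits every policy from the domain, while `radar` replaces only the replication style and keeps the domain's replica counts.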
Server Object Group
To render an object fault-tolerant, several replicas are created and managed as an object group. While each individual replica has its own identity, a reference to the entire group makes replication transparent to the clients. The clients invoke the object group, and the Group Manager decides which replicas must execute the invocation and manages the validity of the responses.
The redundancy entity in this specification is the Replica. The number of replicas or their location are basic parameters to support the failure management.
The Loggable State defines the significant state of the entire group of Replicas. This state is used for the synchronization of primary and backups.
Replica State defines the dynamic state information of a Replica, which depends on the role of the replica in the group. These roles depend on the policies used in the group but examples are Primary Replica, Backup Replica, Transient Replica. For each type of policy the information included in a Replica State is different.
Fault Detector Deployment Policy
The Fault Detector Deployment Policy describes the information that safety engineering uses to specify how software faults are monitored. We define three types of detectors:
Statically Deployed Fault Detectors:
In an operating environment with a relatively static configuration, location-specific Fault Detectors will typically be created when the FT infrastructure is installed. For example, the stand-alone Fault Detectors could be implemented as daemon processes that are installed with the FT infrastructure. These Fault Detectors could be registered in a manner internal to the FT infrastructure, allowing the infrastructure to include them in every fault-tolerant application within the fault tolerance domain in a transparent manner.
Infrastructure Created Fault Detectors:
The FT infrastructure may create instances of Fault Detectors to meet the needs of the applications. Because these Fault Detectors are created (or, at least, configured) by the FT infrastructure, only the infrastructure needs to know their identities.
Application Created Fault Detectors:
It might be necessary or advantageous for applications to create their own Fault Detectors. For example, applications might have unique knowledge of their operating environment, such as access to hardware indicators of faults within the operating environment. However, unlike the other types of Fault Detectors, the FT infrastructures do not need to know the identity of application-created Fault Detectors.
Metamodel of Fault Detection Policies
Metamodel of FT Core
FT Group Properties
FT infrastructures provide support to detect faults and activate mechanisms to handle these faults. The infrastructures can include different mechanisms to monitor the replicas to detect the failures, to check the consistency, and to handle the faults.
The metamodel for the description of FT Group Properties includes the following concepts:
• Membership Style
• Consistency Style
• Fault Monitoring Style
• Fault Monitoring Granularity
Membership Style
Describes responsibilities for replica creation. Defines whether the membership of an object group is infrastructure-controlled or application-controlled.
Application Controlled Membership:
The application may create a server object itself and then notify the Group Manager of the creation of the new replica. Alternatively, the Group Manager creates the replica when the application requests it. The application is responsible for enforcing the Initial Number Replicas and Minimum Number Replicas properties.
Infrastructure Controlled Membership:
The Group Manager decides when to create the members of the object group, both initially, to satisfy the Initial Number Replicas property, and after the loss of a member because of a fault, to satisfy the Minimum Number Replicas property. The Group Manager initiates monitoring of the members for faults, according to the Fault Monitoring Style.
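A minimal sketch of infrastructure-controlled membership, under stated assumptions: the `GroupManager` class, its method names, and the replica naming scheme are hypothetical, and "monitoring" is reduced to keeping a set of monitored members.

```python
class GroupManager:
    """Hypothetical manager for one infrastructure-controlled object group."""

    def __init__(self, initial_number_replicas, minimum_number_replicas):
        self.minimum = minimum_number_replicas
        self.next_id = 0
        self.members = set()
        self.monitored = set()
        # Satisfy the Initial Number Replicas property at group creation.
        for _ in range(initial_number_replicas):
            self._create_member()

    def _create_member(self):
        member = f"replica-{self.next_id}"
        self.next_id += 1
        self.members.add(member)
        self.monitored.add(member)  # start fault monitoring for the new member
        return member

    def on_fault(self, member):
        """Remove a faulty member, then restore the minimum if needed."""
        self.members.discard(member)
        self.monitored.discard(member)
        while len(self.members) < self.minimum:
            self._create_member()


manager = GroupManager(initial_number_replicas=3, minimum_number_replicas=2)
manager.on_fault("replica-0")  # two members left, still at the minimum
manager.on_fault("replica-1")  # drops to one, so a new member is created
```

With application-controlled membership, the loop in `on_fault` would instead be the application's responsibility.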
Consistency Style
Describes responsibilities for replica consistency management. Defines whether the consistency of the states of the members of an object group is infrastructure-controlled or application-controlled. Some components of the FT infrastructure, such as the Logging and Recovery Mechanisms, are used only for object groups that have the infrastructure-controlled Consistency Style.
Application Controlled Consistency:
The application is responsible for checkpointing, logging, activation, and recovery, and for maintaining any kind of consistency appropriate for the application.
Infrastructure Controlled Consistency:
The FT infrastructure is responsible for checkpointing, logging, activation, and recovery, and for maintaining Strong Replica Consistency, Strong Membership Consistency, and Uniqueness of the Primary for the Cold Passive and Warm Passive Replication Styles.
Fault Monitoring Style
Describes how replicas are monitored for faults. Two types of Fault Monitoring Styles are:
• Pull Monitoring Style: The Fault Monitor interrogates the monitored object periodically to determine whether it is alive.
• Push Monitoring Style: The monitored object periodically reports to the fault monitor to indicate that it is alive.
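The difference between the two styles can be sketched in Python as follows. The class names, the `is_alive` callable, and the `timeout` parameter are assumptions for illustration; times are passed in explicitly so that the example stays deterministic.

```python
class PullMonitor:
    """The monitor asks the monitored object whether it is alive."""

    def __init__(self, is_alive):
        self.is_alive = is_alive  # callable exposed by the monitored object

    def check(self):
        try:
            return bool(self.is_alive())
        except Exception:
            return False  # an object that cannot answer is treated as faulty


class PushMonitor:
    """The object reports heartbeats; silence longer than `timeout` is a fault."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_report = now

    def report(self, now):
        self.last_report = now  # called by the monitored object

    def check(self, now):
        return (now - self.last_report) <= self.timeout


pull = PullMonitor(is_alive=lambda: True)
push = PushMonitor(timeout=2.0, now=0.0)
push.report(now=1.0)
```

In the pull style the monitor drives the schedule; in the push style the monitored object does, and the monitor only has to notice missing reports.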
The Fault Monitoring Granularity determines the level of control used to detect failures. Some types require more resources than others, but can detect exceptional occurrences.
Individual Member Monitoring:
Each individual member of this object group is monitored.
Location Monitoring:
When a new replica in the group is created, and no other replica is monitored at the same location, the new replica is monitored. This replica acts as a “fault monitoring representative” for the members of the other object groups at that location. If another object at that location is already being monitored, then that object acts as the “fault monitoring representative” for the member of this object group at that location. If the “fault monitoring representative” at a particular location ceases to exist due to a fault, then the Replication Manager regards all objects at that location as having failed and performs recovery for all objects at that location. If the “fault monitoring representative” ceases to exist because the replica was removed from the group but had not actually failed, then the Replication Manager selects another object at that location as the “fault monitoring representative.”
Location And Type Monitoring:
When a new replica of a group is created at a particular location, and no other replica of the same group at that location is already being monitored, then the new replica of this object group at that location is monitored.
This member acts as a “fault monitoring representative” for the members of the other object groups of the same type at that location.
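How the granularity changes the number of monitored replicas can be sketched as below. This is a simplified illustration: the `Replica` tuple, the granularity labels `"member"`, `"location"`, and `"location_and_type"`, and the choice of the first created replica as the representative are assumptions; re-selection of a representative after a removal is not modeled.

```python
from collections import namedtuple

Replica = namedtuple("Replica", "group type_id location")


def monitored_replicas(replicas, granularity):
    """Return the replicas that get a fault monitor, in creation order."""
    if granularity == "member":
        return list(replicas)  # every individual member is monitored
    if granularity not in ("location", "location_and_type"):
        raise ValueError(f"unknown granularity: {granularity}")

    seen = set()
    chosen = []
    for r in replicas:
        # One representative per location, or per (location, type) pair.
        key = r.location if granularity == "location" else (r.location, r.type_id)
        if key not in seen:
            seen.add(key)
            chosen.append(r)
    return chosen


replicas = [
    Replica("orders", "Order", "host-a"),
    Replica("stock", "Stock", "host-a"),
    Replica("orders", "Order", "host-b"),
    Replica("audit", "Order", "host-b"),
]
```

For these four replicas, member granularity monitors all four, location granularity monitors one per host, and location-and-type granularity monitors three, because the `audit` replica shares its host and type with an already monitored replica.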
Fault Monitoring Granularity
Metamodel of FT Group Properties
FT Replication Styles
FT depends on entity redundancy, fault detection, and recovery. Replicated objects can invoke the methods of other replicated objects without regard to the physical location of those objects. Support for redundancy in time is provided by allowing clients to repeat requests on the server replicas, using the same or alternative transport paths.
The re-invocation is transparent to the client.
Transient State Replication Style
This replication style family defines styles for objects that do not have any persistent state.
Persistent State Replication Style
This replication style family defines styles for objects that have a persistent state. The infrastructure uses persistent state to re-establish some state.
Stateless Replication Style
The Stateless Replication Style is a type of Transient State Replication Style. For this style, the behavior of the object group is unaffected by its history of invocations. A typical example is a server that provides read-only access to a database.
Passive Replication Style
This replication style family defines replication styles based on the uniqueness of the object replica that is responsible for managing incoming requests (this replica is usually called master or primary).
Active Replication Style
This replication style family defines styles where several replicas of the same object are active simultaneously (i.e., they all compute incoming requests).
The Active Replication Style requires that all of the members of an object group execute each invocation independently but in the same order. They maintain exactly the same state and, when a fault in one member occurs, the application can continue with results from another member without waiting for fault detection and recovery. Even though each of the members of the object group generates each request and each reply, the Message Handling Mechanism detects and suppresses duplicate requests and replies, and delivers a single request or reply to the destination object(s).
Active replication is useful when the cost of transferring a state is larger than the cost of executing a method invocation, or when the time available for recovery after a fault is tightly constrained. Two types of Active Replication Style are:
Active Replication Style:
All of the members of an object group independently execute the methods invoked on the object. If a fault prevents one replica from operating correctly, the other replicas will produce the required results without the delay incurred by recovery.
Active With Voting Replication Style:
A form of active replication in which the requests (replies) from the members of a client (server) object group are voted on, and are delivered to the members of the server (client) object group only if a majority of the requests (replies) are identical.
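The majority rule used by the voting style can be sketched in a few lines of Python. The function name `vote` and the use of `None` for "no majority, deliver nothing" are assumptions for illustration; replies are assumed to be hashable values that can be compared for equality.

```python
from collections import Counter


def vote(replies):
    """Return the reply shared by a strict majority, or None if there is none."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    # A strict majority means more than half of all replies are identical.
    return value if count > len(replies) / 2 else None
```

With three replicas, one divergent reply is outvoted by the other two; with an even split, nothing is delivered.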
The Passive Replication Styles require that, during fault-free operation, only one member of the object group, the primary member, executes the methods invoked on the group. Periodically (if infrastructure-controlled) or on demand (if application-controlled), the state of the primary member is recorded in a log, together with the sequence of method invocations. In the presence of a fault, a backup member is promoted to be the new primary member of the group. The state of the new primary is restored to the state of the old primary by reloading its state from the log, followed by reapplying the request messages recorded in the log. Passive replication is useful when the cost of executing a method invocation is larger than the cost of transferring a state, and the time for recovery after a fault is not constrained. Two types of Passive Replication Styles are:
Warm Passive Replication Style:
A form of passive replication in which only the primary member executes the methods invoked on the object group by the client objects. Several other members operate as backups. The backups do not execute the methods invoked on the object group; rather, the state of the primary is transferred to the backups periodically.
Cold Passive Replication Style:
A form of passive replication in which only one replica, the primary replica, in the object group executes the methods invoked on the object. The state of the primary replica is extracted from the log and is loaded into the backup replica when needed for recovery.
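Recovery in the passive styles (reload the last recorded state, then reapply the logged requests) can be sketched as follows. This is a toy illustration under stated assumptions: `CounterObject`, `Log`, and `promote_backup` are hypothetical names, the replicated state is a single integer, and the log keeps only the requests executed since the last checkpoint.

```python
class CounterObject:
    """Toy replicated object whose whole state is one integer."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount

    def get_state(self):
        return self.value

    def set_state(self, state):
        self.value = state


class Log:
    def __init__(self):
        self.checkpoint = 0   # last recorded state of the primary
        self.requests = []    # invocations executed since that checkpoint

    def record_request(self, amount):
        self.requests.append(amount)

    def record_checkpoint(self, state):
        self.checkpoint = state
        self.requests.clear()  # earlier requests are covered by the checkpoint


def promote_backup(log):
    """Build a new primary from the log after the old primary has failed."""
    backup = CounterObject()
    backup.set_state(log.checkpoint)  # reload the recorded state
    for amount in log.requests:       # reapply the logged requests in order
        backup.add(amount)
    return backup


primary, log = CounterObject(), Log()
for amount in (5, 7):
    primary.add(amount)
    log.record_request(amount)
log.record_checkpoint(primary.get_state())
primary.add(3)
log.record_request(3)
new_primary = promote_backup(log)  # the old primary is assumed to have failed
```

In a warm passive configuration the checkpoint would already have been transferred to the backups, so only the replay step remains at promotion time.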
Passive Replication Styles
Metamodel of FT Replication Styles
FT Architectures Profile
The packages of the FT profile include stereotypes for the description of four main concepts:
• the FT policies for the domains (FT Fault Tolerance Domain)
• the identification of groups (FT Server Object Group)
• the state to be considered in stateful replicas (FT Loggable State and FT Has Replication State)
• the replication styles (FT Replication Style and subclasses).
Core FT Profile
Profile of FT Replication Style
Iran University of Science and Technology
School of Computer Engineering