Fault tolerance in Cloud Computing:

Karthik Rp

on 8 October 2013

Fault tolerance management in Cloud Computing:
A System Level Perspective

III. Resource Manager
Role of the Resource Manager:

> Continuously monitors the working state of physical and virtual resources, recording it in an inventory database and in log information.
> Creates a graph representing the topology and working state of the resources in the IP's system.

IV. Fault tolerance delivery Scheme
> ft_unit: the fundamental module that applies a coherent fault tolerance mechanism to a particular recurrent system failure.

> An ft_unit handles failures at the virtualization layer rather than in the application itself.

> The scheme is realized in two stages: design and runtime.
VI. Conclusion
> An approach was presented for transparently delivering fault tolerance properties.
> Fault tolerance properties were realized as independent modules, making it easier to create solutions.
> Components of this framework can be extended to improve the overall resilience of the cloud infrastructure.
IEEE Systems Journal, Vol 7, No. 2, June 2013
Seminar by,
Karthik R P
M.Tech Network Engineering

1. Introduction
2. Motivating Scenario and Basics
3. Resource Manager
4. Fault tolerance delivery scheme
5. Fault Tolerance manager:
Architecture Framework
6. Conclusion

I. Introduction
> Internet: On demand service with extensive use of Virtual Machines
> Two main concerns: Availability and Reliability
> Huge risk: Manifest resides in data centers, out of scope of user's organization.
> The traditional approach is to address these concerns at development or procurement time. But is this possible in the cloud?
Yes, but it is difficult due to:

i) High system complexity.

ii) Abstraction layers of cloud computing that release limited information about the underlying infrastructure to users.
> Hence a scheme is required which:
i) Delivers a fault tolerance solution based on user requirements.
ii) Ascertains the properties of the solution by runtime monitoring.
II. Motivating Scenario And Basic Concepts
II.A Motivating Scenario
Three stakeholders in cloud computing:

i) Infrastructure Provider (IP)
ii) Client (C)
iii) Fault tolerance service provider(SP)

Fig II.1: Banking Problem
Application Tier: uses the compute service offered by the IP to process its operations and respond to customer requests.
Data Tier: uses the storage service offered by the IP to store and retrieve data.
Failure in the IP's system: high implications for the reliability and availability of the banking service.
II.B Basic Concepts
The client engages with the SP to obtain fault tolerance properties, and the SP creates a fault tolerance solution based on the client's requirements,

while maintaining a balance between:

i) Fault Model
ii) Resource Consumption (CPU, Memory, Bandwidth, I/O)
iii) Performance (Fault detection, replica launch, failure recovery latencies)
Understanding the need.
Solution Basics
Most widely used fault tolerance mechanism:
Maintaining redundancy (active or passive; passive replicas may be cold or hot).

Fault detection protocol: heartbeat protocol.
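The heartbeat idea can be sketched as follows. This is a minimal illustration, not the scheme from the paper: the class name `HeartbeatMonitor`, the replica names, and the 3-second timeout are all assumptions made for the example. Each replica reports periodically, and any replica that stays silent longer than the timeout is suspected to have failed.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each replica (hypothetical helper)."""
    timeout: float                       # seconds of silence before suspicion
    last_seen: dict = field(default_factory=dict)

    def beat(self, replica_id: str, now: float) -> None:
        # Called whenever a heartbeat message arrives from a replica.
        self.last_seen[replica_id] = now

    def suspected(self, now: float) -> list:
        # Replicas whose last heartbeat is older than the timeout.
        return sorted(r for r, t in self.last_seen.items()
                      if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("vm-primary", now=0.0)
monitor.beat("vm-backup", now=0.0)
monitor.beat("vm-backup", now=4.0)    # only the backup keeps reporting
print(monitor.suspected(now=5.0))     # vm-primary has been silent for 5 s
```

Passing `now` explicitly keeps the sketch deterministic; a real detector would read a clock and receive heartbeats over the network.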

SP business goals:
i) Maintain a consistent view of the resources.
ii) Must develop means to:
> Realize fault tolerance mechanisms.
> Evaluate fault tolerance properties.
> Deliver: a scheme that enforces the FT mechanism on the client's application.
iii) Design a framework that integrates with the existing cloud infrastructure.
Database contents
Machine's unique serial number
Composition of the machine
> Processor speed
> Number of Hard disks
> Memory modules
Date when machine was commissioned
Location of machine in cluster
Memory used/free
Disk capacity used/free
Processor core utilization
Fig III.1: Example of a graph generated by RM
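One way to picture the database contents and the resource graph together is the sketch below. It is an assumption-laden illustration: the field names, the `ResourceGraph` class, and the sample serial numbers are invented for the example and do not come from the paper. Each machine record becomes a node, and links between machines become edges, so the working state of neighbours can be queried.

```python
from dataclasses import dataclass


@dataclass
class MachineRecord:
    """One inventory entry (field names are hypothetical)."""
    serial_number: str
    cpu_ghz: float
    hard_disks: int
    memory_gb: int
    commissioned: str        # date the machine was commissioned
    location: str            # position of the machine in the cluster
    memory_used_gb: float
    disk_used_gb: float
    core_utilization: float  # 0.0 to 1.0
    working: bool = True


class ResourceGraph:
    """Undirected graph of machines keyed by serial number."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}

    def add_machine(self, record):
        self.nodes[record.serial_number] = record
        self.edges.setdefault(record.serial_number, set())

    def connect(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def working_neighbours(self, serial):
        # Neighbours of a machine that are currently in a working state.
        return sorted(n for n in self.edges[serial] if self.nodes[n].working)


graph = ResourceGraph()
for serial, ok in [("SN-001", True), ("SN-002", True), ("SN-003", False)]:
    graph.add_machine(MachineRecord(serial, 2.4, 2, 16, "2013-01-15",
                                    "rack-1", 4.0, 120.0, 0.35, working=ok))
graph.connect("SN-001", "SN-002")
graph.connect("SN-001", "SN-003")
print(graph.working_neighbours("SN-001"))
```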
IV.A Design Stage
> The client requests fault tolerance properties from the SP.

> The SP analyzes the requirements.

> The SP matches them with the available ft_units.

> The SP formulates ft_sol using the matching ft_units.
A fault tolerance property can be specified as p = (u, A)

u -> a particular ft_unit

A -> a set of attributes

p=(u, {mechanism = active_replication, no_of_replication = 4, fault_model = node crashes})
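The design-stage matching can be sketched in a few lines. This is only an illustration under stated assumptions: the `FtUnit` fields, the `match_units` helper, and the two catalogue entries are hypothetical, while the attribute names and values are taken from the example property above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FtUnit:
    """A fault tolerance unit offered by the SP (fields are hypothetical)."""
    name: str
    mechanism: str
    fault_model: str
    max_replicas: int


def match_units(catalogue, required):
    """Return the units whose capabilities cover the client's requirements."""
    return [
        u for u in catalogue
        if u.mechanism == required["mechanism"]
        and u.fault_model == required["fault_model"]
        and u.max_replicas >= required["no_of_replication"]
    ]


catalogue = [
    FtUnit("vm_replication", "active_replication", "node crashes", 8),
    FtUnit("checkpointing", "checkpoint_restart", "node crashes", 1),
]

# Attribute set A from the example property.
A = {
    "mechanism": "active_replication",
    "no_of_replication": 4,
    "fault_model": "node crashes",
}

matches = match_units(catalogue, A)
p = (matches[0], A)   # p = (u, A)
print(p[0].name, p[1])
```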
IV.B Runtime Stage
1. Starts immediately after the SP forms ft_sol.
2. A critical stage, since requirements and constraints may change during runtime.

> Define a set R of rules over attributes a and their values v(a).
> Violation of any rule r in R implies that p is invalid.
> Check f(s, R) constantly over some time period.
> If f returns false, ft_sol is remodeled.
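A rough sketch of the runtime check follows. The slides do not define s, so it is taken here to be a snapshot of observed attribute values, which is an assumption; the two rules in `R` are likewise invented examples. The function returns false as soon as any rule is violated, which is the condition under which ft_sol would be remodeled.

```python
def f(s, R):
    """Return True only if every rule in R holds for the observed state s."""
    return all(rule(s) for rule in R)


# Hypothetical rules over attributes a and their values v(a).
R = [
    lambda s: s["no_of_replication"] >= 4,
    lambda s: s["mechanism"] == "active_replication",
]

healthy = {"mechanism": "active_replication", "no_of_replication": 4}
degraded = {"mechanism": "active_replication", "no_of_replication": 3}

for label, state in [("healthy", healthy), ("degraded", degraded)]:
    if f(state, R):
        print(label, "-> property still valid")
    else:
        print(label, "-> rule violated, ft_sol must be remodeled")
```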
An example of ft_sol:
invoke: ft_unit(VM-instance replication)
invoke: ft_unit(failure detection)
do {
    execute(failure detection ft_unit)
} while (no failures)
if (failure detected)
    invoke: ft_unit(recovery mechanism)
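The control flow of this ft_sol can be mimicked in Python as below. It is a sketch, not the paper's implementation: the three callables stand in for the replication, failure detection, and recovery ft_units, and the `max_checks` bound is an assumption added so that the detection loop always terminates in the example.

```python
def run_ft_sol(replicate, detect_failure, recover, max_checks=100):
    """Run replication, poll for failures, and recover if one is detected."""
    log = []

    replicate()                       # invoke: ft_unit(VM-instance replication)
    log.append("replicate")
    log.append("start detection")     # invoke: ft_unit(failure detection)

    failure = False
    for _ in range(max_checks):       # bounded stand-in for the detection loop
        failure = detect_failure()    # execute(failure detection ft_unit)
        if failure:
            break

    if failure:
        recover()                     # invoke: ft_unit(recovery mechanism)
        log.append("recover")
    return log


# Simulated detector: two clean checks, then a failure.
checks = iter([False, False, True])
log = run_ft_sol(
    replicate=lambda: None,
    detect_failure=lambda: next(checks),
    recover=lambda: None,
)
print(log)
```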
V. FTM: Architecture framework
V.A Client Interface
> The FTM must include a client interface component that provides a specification language, so that clients can specify their requirements.

> Input is given in a high-level format such as percentages, numbers, or ranges.
V.B FTM Kernel
> Responsible for composing a fault tolerance solution from the ft_units implemented by the SP.

> Composed of a service directory, a composition engine, and an evaluation engine.
Fig V.1: Architectural overview of FTM and various components
Fig V.2: Sequence diagram showing the interaction of all FTM components for a single client.
Thank you..!!
Any Queries??
Ravi Jhawar
Vincenzo Piuri
Marco D. Santambrogio