backup and restore estimation

by a b on 11 July 2013

Transcript of backup and restore estimation

Estimation of time/size of backup/restore operations
Grzegorz Kwasniewski
Backup and restore team
mentor: Łukasz Gaża
April - July 2013
What do we have?
IBM PureData System for Analytics
capable of storing TBs to PBs of data
What is the problem?
the user needs to know how long a backup or restore will take (it can take a few hours)
the user needs to know how much storage is needed to perform a backup or restore (backups can use TBs of disk space)

What do we want?
Make precise estimates of the size and time of backup and restore operations, both full and incremental
How to do that?
What do we know?
The database provides:
database schema
number of rows in each table
size of each table in the database
Additionally:
number and size of any UDXes on disk
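The inputs listed above could be gathered into a small data structure before any estimation runs. The sketch below is a minimal illustration; the class and field names (`TableInfo`, `UdxInfo`, `DatabaseInfo`) are hypothetical and do not come from the Netezza catalog or from the project's scripts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableInfo:
    """Per-table facts the database can report (hypothetical field names)."""
    name: str
    row_count: int
    size_bytes: int


@dataclass
class UdxInfo:
    """A user-defined extension stored on disk (hypothetical field names)."""
    name: str
    size_bytes: int


@dataclass
class DatabaseInfo:
    """Everything known up front about one database, before any backup runs."""
    name: str
    tables: List[TableInfo] = field(default_factory=list)
    udxes: List[UdxInfo] = field(default_factory=list)

    def total_rows(self) -> int:
        return sum(t.row_count for t in self.tables)

    def total_table_bytes(self) -> int:
        return sum(t.size_bytes for t in self.tables)

    def total_udx_bytes(self) -> int:
        return sum(u.size_bytes for u in self.udxes)


db = DatabaseInfo(
    "sales",
    tables=[TableInfo("orders", 1_000, 5_000), TableInfo("items", 200, 800)],
    udxes=[UdxInfo("my_udf", 100)],
)
print(db.total_rows(), db.total_table_bytes(), db.total_udx_bytes())  # 1200 5800 100
```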
What do we not know?
How data will compress in the backup
How much space metadata (users, schemas, views) will take
How current database usage affects backup and restore time
How fast data will be written to disk and transferred over the network
What else can we know?
Do we have historical information about previous backups/restores? (yes / no)
Historical data
Estimate size and time based on historical data
for backup:
size_of_backup = size_in_database / average_compression_ratio
time_of_backup = size_in_database / average_backup_speed
for restore:
size_in_database = size_of_backup * average_compression_ratio
time_of_restore = size_of_backup / average_restore_speed
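A minimal sketch of the historical method, assuming each past backup is stored as a record with the database size, the backup size and the duration (the `PastBackup` record and the function names are hypothetical). Speeds are taken in bytes per second, so a time is a size divided by a speed, and the averages are plain arithmetic means, which is only one possible choice.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence, Tuple


@dataclass
class PastBackup:
    """One row of the (hypothetical) historical database."""
    size_in_database: float  # bytes the data occupied in the database
    size_of_backup: float    # bytes the backup occupied on disk
    backup_seconds: float    # wall-clock duration of the backup


def average_compression_ratio(history: Sequence[PastBackup]) -> float:
    # database bytes per backup byte; raises StatisticsError on empty history
    return mean(h.size_in_database / h.size_of_backup for h in history)


def average_backup_speed(history: Sequence[PastBackup]) -> float:
    # database bytes processed per second, averaged over past operations
    return mean(h.size_in_database / h.backup_seconds for h in history)


def estimate_backup(size_in_database: float,
                    history: Sequence[PastBackup]) -> Tuple[float, float]:
    """Return (size_of_backup in bytes, time_of_backup in seconds)."""
    size_of_backup = size_in_database / average_compression_ratio(history)
    time_of_backup = size_in_database / average_backup_speed(history)
    return size_of_backup, time_of_backup


def estimate_restore(size_of_backup: float, compression_ratio: float,
                     restore_speed: float) -> Tuple[float, float]:
    """Return (size_in_database in bytes, time_of_restore in seconds).

    restore_speed is measured in backup bytes read per second.
    """
    return size_of_backup * compression_ratio, size_of_backup / restore_speed


history = [PastBackup(100.0, 50.0, 10.0), PastBackup(200.0, 50.0, 20.0)]
print(estimate_backup(300.0, history))    # (100.0, 30.0)
print(estimate_restore(100.0, 3.0, 5.0))  # (300.0, 20.0)
```

Using a ratio of totals (sum of database sizes over sum of backup sizes) instead of a mean of ratios would weight large databases more heavily; which works better depends on how varied the stored history is.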
Advantages:
fast
precise enough (average error is lower than 15%)
doesn't require additional operations
Disadvantages:
requires an additional historical database to be created
requires stored information about previous operations
can be inaccurate if operation speed changes over time
Historical data - results
Do we want to perform the backup operation? (yes / no)
In-progress rate sampling
We know:
size estimation
current backup size on disk
We can calculate:
current backup speed
time estimate of the operation
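The in-progress calculation can be sketched as follows, assuming the backup's size on disk can be polled periodically while it runs and that a size estimate for the finished backup already exists (for example from one of the other methods). The class and method names are hypothetical; speed is measured between the last two samples, so every new sample corrects the time estimate.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InProgressEstimator:
    """Rate sampling during a running backup (hypothetical helper)."""
    estimated_total_bytes: float  # size estimate of the finished backup
    # (seconds since start, bytes already written to disk)
    samples: List[Tuple[float, float]] = field(default_factory=list)

    def add_sample(self, elapsed_seconds: float, bytes_on_disk: float) -> None:
        self.samples.append((elapsed_seconds, bytes_on_disk))

    def current_speed(self) -> Optional[float]:
        """Bytes per second between the two most recent samples."""
        if len(self.samples) < 2:
            return None
        (t0, b0), (t1, b1) = self.samples[-2], self.samples[-1]
        if t1 <= t0:
            return None
        return (b1 - b0) / (t1 - t0)

    def progress(self) -> float:
        """Fraction of the estimated size already written, capped at 1.0."""
        if not self.samples:
            return 0.0
        return min(1.0, self.samples[-1][1] / self.estimated_total_bytes)

    def remaining_seconds(self) -> Optional[float]:
        """Time left at the current speed, or None if speed is unknown."""
        speed = self.current_speed()
        if speed is None or speed <= 0:
            return None
        remaining = max(0.0, self.estimated_total_bytes - self.samples[-1][1])
        return remaining / speed


est = InProgressEstimator(estimated_total_bytes=1000.0)
est.add_sample(0.0, 0.0)
est.add_sample(10.0, 100.0)
print(est.progress(), est.remaining_seconds())  # 0.1 90.0
est.add_sample(20.0, 300.0)  # the backup sped up; the estimate is corrected
print(est.remaining_seconds())                  # 35.0
```

Using only the last two samples reacts quickly to changes in load but is noisy; averaging over a window of recent samples would smooth the estimate at the cost of reacting more slowly.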
Advantages:
very precise (with a good size estimate, errors can be below 5%)
gives information about current progress
corrects its estimate in flight
Disadvantages:
can only be made during an actual backup
In-progress rate sampling - results
Constants based
uses a hard-coded database-size-to-backup-size ratio and hard-coded backup/restore speeds, based on tests and machine information
Advantages:
can always be performed (e.g. a new system with no historical data, or when we only want an estimate and don't want to perform the actual operation)
Disadvantages:
very inaccurate
should be used only to estimate the order of magnitude of backup size and time
Errors up to 500%
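A constants-based estimate is the historical formula with fixed values in place of measured averages. In the sketch below the numbers are placeholders invented for illustration, not the constants used in the project, and the result is reduced to its order of magnitude, since that is all this method can be trusted for. The function names are hypothetical.

```python
import math
from typing import Tuple

# Placeholder constants, NOT taken from the presentation; real values would
# come from tests on the target machine.
ASSUMED_COMPRESSION_RATIO = 3.0   # database bytes per backup byte
ASSUMED_BACKUP_SPEED = 200e6      # database bytes per second


def constants_backup_estimate(size_in_database: float,
                              ratio: float = ASSUMED_COMPRESSION_RATIO,
                              speed: float = ASSUMED_BACKUP_SPEED
                              ) -> Tuple[float, float]:
    """Return (size_of_backup in bytes, time_of_backup in seconds)."""
    return size_in_database / ratio, size_in_database / speed


def order_of_magnitude(value: float) -> float:
    """Largest power of ten not exceeding a positive value, e.g. 2500 -> 1000."""
    return 10.0 ** math.floor(math.log10(value))


size, seconds = constants_backup_estimate(6e12)  # a 6 TB database
print(order_of_magnitude(size), order_of_magnitude(seconds))  # 1000000000000.0 10000.0
```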
Constants based - conclusion
Final effect - combination
Depending on the available information and context, chooses the appropriate method
Gives the best estimate for the current problem
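The selection step can be expressed as a small dispatch function. This is a sketch under assumptions: the enum and function names are hypothetical, and the priority order (a running backup first, because in-progress sampling was the most precise method; then history; constants as the fallback that always works) is one reading of the slides rather than a documented rule.

```python
from enum import Enum


class Method(Enum):
    HISTORICAL = "historical data"
    IN_PROGRESS = "in-progress rate sampling"
    CONSTANTS = "constants based"


def choose_method(has_history: bool, backup_running: bool) -> Method:
    """Pick the most precise estimation method the current context allows."""
    if backup_running:
        # only possible during an actual backup, but the most precise
        return Method.IN_PROGRESS
    if has_history:
        # needs stored information about previous operations
        return Method.HISTORICAL
    # can always be performed, but only the magnitude is trustworthy
    return Method.CONSTANTS


print(choose_method(has_history=True, backup_running=False).value)   # historical data
print(choose_method(has_history=False, backup_running=False).value)  # constants based
```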
What have we acquired?
Historical data: average estimation error 12%
In-progress rate sampling: average estimation error 2.7%
Constants based: average estimation error 200%
Platform
Tools
nzbackup and nzrestore
work from the command line
allow backups to a file system or to third-party backup and restore systems
handle full and incremental backups
Estimated operation time of different databases compared to actual operation time (in percentages of actual time)
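The charts express each estimate as a percentage of the actual operation time. A sketch of that measure and of an average error over several databases follows; defining the error as the mean absolute deviation from the actual time is an assumption, since the slides do not spell out the formula, and the function names and sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_of_actual(estimated: float, actual: float) -> float:
    """Estimated time as a percentage of the actual time (100 = exact)."""
    return 100.0 * estimated / actual


def average_estimation_error(estimated: Sequence[float],
                             actual: Sequence[float]) -> float:
    """Mean absolute error, in percent of the actual time."""
    if not actual or len(estimated) != len(actual):
        raise ValueError("need equally long, non-empty sequences")
    return mean(100.0 * abs(e - a) / a for e, a in zip(estimated, actual))


estimated_times = [90.0, 120.0]  # seconds, illustrative values only
actual_times = [100.0, 100.0]
print([percent_of_actual(e, a) for e, a in zip(estimated_times, actual_times)])  # [90.0, 120.0]
print(average_estimation_error(estimated_times, actual_times))  # 15.0
```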
Knowledge
The size and time of a backup depend not only on the size of the database, but also on its schema and the nature of the data
How precise the different estimation methods are
What is required to make good estimates
Tools
the developed scripts can be included in nzbackup and nzrestore operations
they can be used to optimize Netezza usage through better backup planning