backup and restore estimation



on 11 July 2013


Transcript of backup and restore estimation

Estimation of time/size of backup/restore operations
Grzegorz Kwasniewski
Backup and restore team
mentor: Łukasz Gaża
April - July 2013
What do we have?
What is the problem?
IBM PureData System for Analytics
capable of storing TBs to PBs of data
the user needs to know how long a backup or restore will take (it can take several hours)
the user needs to know how much storage is needed to perform a backup or restore (backups can use TBs of disk space)

What do we want?
Make precise estimations of:
size and time
of backup and restore operations
both full and incremental
How to do that?
What do we know?
Database provides:
database schema
number of rows in each table
size of each table in the database
Additionally:
amount and size of any UDXes on disk
What do we not know?
How well the data will compress in the backup
How much space metadata (users, schemas, views) will take
How the current load on the database affects backup and restore time
How fast data will be written to disk and transferred over the network
What else can we know?
Do we have historical information about previous backups/restores?
In-progress rate sampling
We know:
Estimate size and time based on historical data
for backup:
size_of_backup = size_in_database / average_compression_ratio
time_of_backup = size_in_database / average_backup_speed
for restore:
size_in_database = size_of_backup * average_compression_ratio
time_of_restore = size_of_backup / average_restore_speed
precise enough (average error is lower than 15%)
doesn't require additional operations
requires an additional historical database to be created
requires stored information about previous operations
can become inaccurate if operation speed changes over time
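The historical-data formulas above can be sketched as a short script. This is a minimal illustration, not the actual tool's code: the record fields and function names (HistoryRecord, estimate_backup) are assumptions, and average_backup_speed is taken as database bytes processed per second, so time = size / speed.

```python
# Sketch of the historical-data estimation method; names are illustrative.
from dataclasses import dataclass

@dataclass
class HistoryRecord:
    db_size: float      # bytes in the database when the backup ran
    backup_size: float  # bytes the backup occupied on disk
    duration: float     # seconds the backup took

def averages(history):
    """Average compression ratio (db/backup) and backup speed (db bytes/s)."""
    ratios = [r.db_size / r.backup_size for r in history]
    speeds = [r.db_size / r.duration for r in history]
    return sum(ratios) / len(ratios), sum(speeds) / len(speeds)

def estimate_backup(db_size, history):
    """Return (estimated backup size in bytes, estimated time in seconds)."""
    ratio, speed = averages(history)
    return db_size / ratio, db_size / speed

# Example with made-up history: 1 TB compressed 4:1 in 1 h, 2 TB in ~2 h.
history = [HistoryRecord(1e12, 2.5e11, 3600.0),
           HistoryRecord(2e12, 5.0e11, 7000.0)]
est_size, est_time = estimate_backup(1.5e12, history)
```

The restore direction is symmetric: multiply the backup size by the average compression ratio and divide by the average restore speed.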
Historical data - results
Do we want to perform a backup operation?
We can calculate:
size estimation
current backup size on disk
current backup speed
time estimation of the operation
very precise (with a good size estimation, errors can be lower than 5%)
gives information about current progress
corrects its estimation in-flight
can only be used during an actual backup
In-progress rate sampling - results
Constants based
uses a hard-coded database-to-backup size ratio and backup/restore operation speeds based on tests and machine information
can always be performed (e.g. a new system with no historical data, or when we only want an estimate without performing the actual operation)
very inaccurate
should be used only to estimate the order of magnitude of backup size and time
errors up to 500%
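The constants-based fallback reduces to two fixed numbers. The constants below are made-up placeholders for illustration, not the values used by the actual tool:

```python
# Sketch of the constants-based fallback; constants are placeholders.
ASSUMED_COMPRESSION_RATIO = 4.0  # db size : backup size, from bench tests
ASSUMED_BACKUP_SPEED = 1.0e9     # db bytes/s, from machine information

def constants_estimate(db_size):
    """Order-of-magnitude estimate only (errors can reach 500%)."""
    return db_size / ASSUMED_COMPRESSION_RATIO, db_size / ASSUMED_BACKUP_SPEED

size, seconds = constants_estimate(4.0e12)  # a 4 TB database
```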
Constants based - conclusion
Final effect - combination
Depending on the information available and the context, it chooses the appropriate method
Gives the best estimation for the problem at hand
What have we acquired?
historical data: average estimation error 12%
in-progress rate sampling: average estimation error 2.7%
constants based: average estimation error 200%
nzbackup and nzrestore
work from the command line
allow making backups to the file system or to 3rd-party backup and restore systems
handle full and incremental backups
[Charts: estimated operation time for different databases compared to actual operation time, in percentages of actual time, one chart per estimation method]
Size and time of a backup depend not only on the size of the database, but also on its schema and the nature of the data
How precise the different methods of estimation are
What it takes to make good estimations
the developed scripts can be included in nzbackup and nzrestore operations
they can be used to optimize Netezza usage through better planning of backups