PostgreSQL Disaster Recovery with Barman

Barman, Backup and Recovery Manager, standardises backup and recovery operations, allowing database administrators and system administrators to easily integrate their PostgreSQL solutions into their disaster recovery plans.
by Gabriele Bartolini on 24 May 2016

Transcript of PostgreSQL Disaster Recovery with Barman

Some numbers

#1
most visited website in Italy in its category in 2014
#6
most visited website in Italy in 2014
5 million classified ads, differentiated by
product categories
geographic range
over 8 million unique users per month
over 30 million page views per day
It is not a matter of IF, but WHEN

DISASTER
PLAN
BE
PREPARED

REACT
+ continuous backup
Replication
P.I.T.R.
Business continuity
low RPO and RTO
PostgreSQL Disaster Recovery
Gabriele Bartolini
with
History
Part one
An introduction to
Part two
Major features of
Part three
Relevant case studies
Part four
The road ahead
2008 - 2010
Join 2ndQuadrant
Oracle = RMAN
Postgres = custom scripts
Need for DR Tool?
2011
Oracle migrations
custom scripts? no, thanks
Prototype
2ndQuadrant
Navionics
CSI Piemonte
...
2012
Version 1.0 is released
first stable release
4CaaSt research project
2015
Version 1.4.0 is released
Barman XIII
Requirements
Python 2.6+
GNU Linux
rpm and deb packages
PostgreSQL 8.3+
rsync over ssh
GNU GPL 3
Main principles
Integration
Usability
user interface, configuration
Automation
backup, monitoring, recovery
Backup types
Support for
physical hot backup
Full, incremental and continuous
No support for logical backup
pg_dump, pg_dumpall, PgBackMan
Relies on WAL archiving + base backups
See PostgreSQL documentation
Backup scope
Backup of a PostgreSQL Server
Cannot back up a single table
Cannot back up a single database
Tablespaces
transparent backup
recovery with relocation
RPO vs RTO
low RPO
high RTO
Remote backup and recovery
Multiple servers
Backup catalogue
Retention policies
WAL compression
Monitoring
Other features
Incremental backup and recovery
WHY
Backup and Recovery Manager
Successful recovery makes a backup
VALID
An architectural choice
A separate server for Barman
SPOF
Concentrate backups on a single server
How to protect?
Not a primary failure
Backup of Barman files
disk, tape, cloud
Future:
geo-redundancy
import/export
AWS support
any comments?
You can bet it's a Friday
Minimalistic DR solution
is "recovery" part of the name?
Business/law/storage needs
Regularly prune the backup catalogue
remove obsolete backups
"retention_policy" option
REDUNDANCY X
e.g. "REDUNDANCY 4"
RECOVERY WINDOW OF X <period>
period = DAY/WEEK/MONTH
e.g. "RECOVERY WINDOW OF 4 WEEKS"
Usability and automation
Globally defined
Per-server override
Easy to set up
Highly configurable
Fully automated
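The two "retention_policy" forms shown above can be illustrated with a small sketch. This is not Barman's implementation: the function names, the 31-day month, and the rule of keeping one backup older than the window (so that recovery to the start of the window stays possible) are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption for illustration: a MONTH is counted as 31 days.
PERIOD_DAYS = {"DAY": 1, "WEEK": 7, "MONTH": 31}


def parse_retention_policy(text):
    """Return ("redundancy", n) or ("window", timedelta) for a policy string."""
    text = text.strip().upper()
    match = re.fullmatch(r"REDUNDANCY (\d+)", text)
    if match:
        return ("redundancy", int(match.group(1)))
    match = re.fullmatch(r"RECOVERY WINDOW OF (\d+) (DAY|WEEK|MONTH)S?", text)
    if match:
        days = int(match.group(1)) * PERIOD_DAYS[match.group(2)]
        return ("window", timedelta(days=days))
    raise ValueError("unrecognised retention policy: %r" % text)


def obsolete_backups(backup_times, policy, now):
    """Return the backup timestamps that the policy no longer needs."""
    kind, value = parse_retention_policy(policy)
    ordered = sorted(backup_times, reverse=True)  # newest first
    if kind == "redundancy":
        # Keep the newest `value` backups, everything else is obsolete.
        return ordered[value:]
    # Recovery window: keep every backup inside the window, plus the newest
    # backup taken before the window starts (hypothetical rule, see lead-in).
    cutoff = now - value
    older = [t for t in ordered if t < cutoff]
    return older[1:]
```

With eight weekly backups, "REDUNDANCY 4" would mark the four oldest as obsolete, while "RECOVERY WINDOW OF 4 WEEKS" would mark three, because one extra backup is retained as the base for the start of the window.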
File based
Exploit rsync & hard links
Reduce space, network, time
average 50%-70%
Large portion of unchanged data between a full backup and the next
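The idea of reusing unchanged files through hard links can be sketched as follows. This is a simplified, local-only illustration and not how Barman itself is written: the function names are hypothetical, and the "unchanged" test (same size and modification time) only mimics the kind of quick check that rsync performs by default.

```python
import os
import shutil


def _unchanged(path_a, path_b):
    """Treat two files as identical when size and mtime (in seconds) match."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_size == stat_b.st_size
            and int(stat_a.st_mtime) == int(stat_b.st_mtime))


def incremental_copy(source_dir, previous_backup, new_backup):
    """Copy source_dir into new_backup, hard-linking files that are
    unchanged since previous_backup. Returns (linked, copied) counts."""
    linked = copied = 0
    for root, _dirs, files in os.walk(source_dir):
        rel = os.path.relpath(root, source_dir)
        os.makedirs(os.path.normpath(os.path.join(new_backup, rel)),
                    exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(new_backup, rel, name))
            old = None
            if previous_backup:
                old = os.path.normpath(
                    os.path.join(previous_backup, rel, name))
            if old and os.path.isfile(old) and _unchanged(src, old):
                os.link(old, dst)       # reuse the stored copy: no new space
                linked += 1
            else:
                shutil.copy2(src, dst)  # copy2 keeps the mtime for next time
                copied += 1
    return linked, copied
```

Every backup directory still looks like a full copy, but unchanged files occupy disk space only once, which is where the space, network and time savings come from.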
gzip, bzip2, custom
DETECT
NOTIFY
ACT
> barman check brian
Server brian:
    ssh: FAILED (return code: 255)
    PostgreSQL: OK
    archive_mode: OK
    archive_command: OK
    directories: OK
    retention policy settings: OK
    backup maximum age: OK
        interval provided: 7 days, latest backup age: 1 day, 8 hours, 58 minutes
    compression settings: OK
    minimum redundancy requirements: OK
        have 6 backups, expected at least 1
> barman --nagios check brian
BARMAN CRITICAL - server brian has issues * brian FAILED: ssh
> echo $?
2
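The relationship between the check output and the Nagios result above can be sketched like this. The parsing rules and function names are assumptions for illustration, and the wording of the OK message is invented. Only the CRITICAL line and the exit code 2 (the standard Nagios code for CRITICAL) come from the output shown above.

```python
def parse_check_output(text):
    """Map each check name to True (OK) or False (FAILED).

    Lines that do not end in an OK/FAILED status (the server header and
    the detail lines) are ignored.
    """
    checks = {}
    for line in text.splitlines():
        name, sep, status = line.strip().partition(": ")
        if sep and status.startswith(("OK", "FAILED")):
            checks[name] = status.startswith("OK")
    return checks


def nagios_status(server, checks):
    """Return (exit_code, message) following Nagios plugin conventions."""
    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        message = "BARMAN CRITICAL - server %s has issues * %s FAILED: %s" % (
            server, server, ", ".join(failed))
        return 2, message
    # Hypothetical wording for the success case.
    return 0, "BARMAN OK - server %s" % server
```

A monitoring system such as Nagios or Icinga only needs the exit code and the single summary line, which is why a dedicated output mode is convenient.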
Limitation of bandwidth
global/server/tablespace
Network compression
Backup from standby (pgespresso)
Hook scripts
File listing
...
Company name
Navionics
Industry sector
Electronic Charts
Market
Global
Locations
Massarosa, Italy (HQ),
USA, India and Estonia
Founded
1984
Website
www.navionics.com
Company background

Navionics is the leading company in Electronic Charts, specialised in the manufacture of electronic navigation charts and systems for marine and outdoors use.

Navionics has the world’s largest database of marine and lake charts, covering the salt waters of the entire planet as well as tens of thousands of lakes and rivers.
PostgreSQL
Barman
Server size: 10.8 TiB
Frequency: weekly
Backup time: ~20h
Deduplication ratio: 53%
5.7 TiB reused out of 10.8 TiB
WAL rate: ~90,000 WALs per week
average of ~9 per minute
WAL compression ratio: 30%
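The Navionics figures above can be cross-checked with a few lines of arithmetic. The 16 MiB segment size is PostgreSQL's default and is an assumption here, since the slides do not state the segment size in use.

```python
MINUTES_PER_WEEK = 7 * 24 * 60   # 10,080 minutes
WAL_SEGMENT_MIB = 16             # PostgreSQL default, assumed here


def wal_rate_per_minute(wal_files, minutes):
    """Average number of WAL files produced per minute."""
    return wal_files / minutes


def deduplication_ratio(reused, total):
    """Fraction of the backup that was reused from the previous one."""
    return reused / total


def wal_volume_tib(wal_files, segment_mib=WAL_SEGMENT_MIB):
    """Uncompressed WAL volume in TiB."""
    return wal_files * segment_mib / 1024 / 1024


print(round(wal_rate_per_minute(90000, MINUTES_PER_WEEK), 1))  # 8.9
print(round(deduplication_ratio(5.7, 10.8) * 100))             # 53
print(round(wal_volume_tib(90000), 2))                         # 1.37
```

The first two results agree with the "~9 per minute" and "53%" quoted above. The third suggests roughly 1.37 TiB of uncompressed WAL per week, if the default segment size applies.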
Tape copies
Semi-automated process
Post-backup hook script
tar + "barman list-files"
BACKUP_ID-standalone.tar.gz
BACKUP_ID-wal.tar
runs in ~8.5h
Manual copy on tape
Tapes shipped to Italian HQ
via express courier
Future
Add more standby servers
Evaluate syncrep
Evaluate repmgr
Logical decoding
Cascading replication
Cascading backup
2011: Oracle to Postgres
2015: Postgres 8.4 to 9.4
Ubuntu 14.04 LTS
Largest cluster 10.8 TiB
was ~14 TiB on 8.4
India
Company name
JobRapido
Industry sector
Job search engine
Market
Global
Location
Italy
Founded
2006
Website
www.jobrapido.com
Company background

Jobrapido.com is one of the biggest and fastest growing job sites in the world.

"We help job seekers search from one place through millions of jobs posted all over the web: on traditional job websites, on recruitment agency websites, on corporate career websites. And we help recruiters reach millions of job seekers."
PostgreSQL
Barman
Server size: 104 GiB (largest db)
Frequency: daily
Backup time: 2h10m
WAL rate: ~6,000/day (~4/minute)
WAL compression ratio: ~70%
Geo-redundancy
2 data centres
14 servers
64 databases
~650 GiB
Company name
Subito.it
Industry sector
Classified Media Ads
Market
Italian
Locations
Milan, Italy
Founded
2007
Website
www.subito.it
Company background

Subito.it is part of Schibsted Media Group, a Norwegian multinational company that operates successfully in 29 countries in publishing (newspapers and free press), online (classified ads and digital services) and mobile markets.

PostgreSQL
Barman
Server size: 1.38 TiB
Frequency: daily (early at night)
Backup time: 2h45m
Deduplication ratio: 58%
800 GiB reused out of 1.38 TiB
WAL rate: ~7,000 WALs per day
average of ~5 per minute
WAL compression ratio: 65%
Automated recovery
Future
Upgrade to PostgreSQL 9.4
logical decoding
materialised views
Add more standby servers
Evaluate syncrep
Evaluate repmgr
2007: Postgres 8.3
2014: "Migration" to 9.3
RHEL 6
Database size: 1.38 TiB
400 tps (peak)
at night 50 tps
since 2013
Milan 1
Milan 2
automated synchronisation
Complete regeneration of the reporting database
After the successful end of "barman backup"
Data updated to the last transaction of the backup
Used by the marketing department for BI
Totally disjointed server
Overview
Subito.it is 100% sure that their backups effectively work.
Best outcome
Post backup hook script:
perform initial checks (e.g. connection)
create restore point (‘barman_last_backup’)
stop the PostgreSQL server
recover to ‘barman_last_backup’ (remote, incremental)
customise the configuration
start the PostgreSQL server
wait for PostgreSQL to exit the recovery phase
customise the PostgreSQL instance (e.g. new users)
enjoy!

Current runtime: 6 hours
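The ordering of the steps above can be sketched as a dry-run plan that a post-backup hook script might build. This is not Subito.it's script. The host names, the `postgres` user, the data directory, and the assumption that Barman exports `BARMAN_SERVER` and `BARMAN_BACKUP_ID` to hook scripts should all be checked against your own environment and Barman version. The site-specific "customise" steps are left out.

```python
def recovery_plan(env, production_host, target_host, pgdata,
                  restore_point="barman_last_backup"):
    """Return the ordered (description, argv) steps of the automated recovery.

    Nothing is executed here; the caller decides how to run each command.
    """
    server = env["BARMAN_SERVER"]        # assumed hook-script variable
    backup_id = env["BARMAN_BACKUP_ID"]  # assumed hook-script variable
    ssh_target = "postgres@" + target_host
    return [
        ("check connection", ["ssh", ssh_target, "true"]),
        ("create restore point",
         ["psql", "-h", production_host, "-U", "postgres", "-Atc",
          "SELECT pg_create_restore_point('%s')" % restore_point]),
        ("stop PostgreSQL",
         ["ssh", ssh_target, "pg_ctl -D %s stop -m fast" % pgdata]),
        ("recover",
         ["barman", "recover",
          "--remote-ssh-command", "ssh " + ssh_target,
          "--target-name", restore_point,
          server, backup_id, pgdata]),
        # Site-specific configuration changes would go here.
        ("start PostgreSQL",
         ["ssh", ssh_target, "pg_ctl -D %s start" % pgdata]),
        # To be polled until the query returns "f" (recovery finished).
        ("wait for end of recovery",
         ["psql", "-h", target_host, "-U", "postgres", "-Atc",
          "SELECT pg_is_in_recovery()"]),
    ]
```

Keeping the plan as data makes it easy to log or review each step before running it, for example with `subprocess.check_call(argv)` inside a loop that stops at the first failure.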
Technical details
tar copy method
pg_basebackup copy method
S3 storage strategy
tar storage strategy
Base backup compression
Base backup encryption
get-wal
Geo-redundancy
... and more
Windows support
Keep/NoKeep
Backup scheduling
Grandfather/Father/Son (GFS) backup scheme
Recovery nodes
PostgreSQL on Windows
Icinga/Munin
Nagios/Munin
low RPO
low-high RTO
Basic DR solution
low RPO
low RTO
Basic DR + HA solution
barman sync-info
barman sync-backup
barman sync-wals
2. rsync + ssh
1. ssh
passive node
primary node
primary_ssh_command
asynchronous copy (barman cron)
cascading backup
restore_command = 'ssh USER@HOST barman get-wal SERVER %f > %p'
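As a small illustration, the restore_command above can be generated from its three placeholders. The function name and the example values are hypothetical. `%f` and `%p` are left untouched because PostgreSQL itself substitutes them with the requested WAL file name and the destination path.

```python
def get_wal_restore_command(user, host, server):
    """Build the recovery configuration line that fetches WAL files
    from the Barman host through 'barman get-wal'."""
    # %%f and %%p are escaped so that the literal %f and %p survive formatting.
    return "restore_command = 'ssh %s@%s barman get-wal %s %%f > %%p'" % (
        user, host, server)


print(get_wal_restore_command("barman", "backup.example.com", "main"))
```

A recovery or standby node configured this way would pull WAL files on demand from the Barman server instead of needing its own copy of the archive.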
ssh
Import/export
export
import
tar
Similar tools
walmgr
OmniPITR
WAL-E
pitrery
BART
devops
Practice is the only way to Mastery
Final thoughts
Robust, maintained and tested
Integration, usability, automation
Monitoring and alerting
Migrations from Oracle
Standardised our 24/7 support service
Tell us your Barman story
@_GBartolini_
Test and simulate
Gabriele, Marco, Giulio, Francesco, Giuseppe
Start simple, then grow
Evaluate RPO, RTO, costs and risks.
complex
simple
Have one.
Regularly update it.
Let everyone know about it.
DR plan
... and our Kanban board
www.2ndQuadrant.com
Roadmap 2015/16
subject to private funding
Thank you!
Website
http://www.pgbarman.org/

Documentation
http://docs.pgbarman.org/

Mailing list
https://groups.google.com/forum/#!forum/pgbarman

Blog
http://blog.2ndquadrant.com/tag/barman/

Professional support and turnkey solutions
http://www.2ndQuadrant.com/
Copyright (c) 2012-2015, 2ndQuadrant
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
gabriele.bartolini@2ndQuadrant.it
recovery/standby
WAL
PGConf US 2015
New York City
March 27
Some Barman commands
barman list-server

barman backup <SERVER ID>

barman list-backup <SERVER ID>

barman show-backup <SERVER ID> <BACKUP ID>

barman recover [options] <SERVER ID> <BACKUP ID> <DESTINATION DIR>
--remote-ssh-command
--tablespace NAME:LOCATION
--target-time, --target-name, --target-xid

barman delete <SERVER ID> <BACKUP ID>

barman cron

barman check <SERVER ID>

Every day.