Switching horses overnight

by Stefan Andreatta on 1 November 2013


Transcript of Switching horses overnight

Oracle
+ 10 years of experience

- high operating costs
- licensing policy:
  - no support for virtualization
  - SE vs. EE traps
non-technical migration todos
Evaluation
Management decision
Team Support
Customer involvement
Validation & regulatory clearance
Database migration
6 October 2011
4 November 2011
The challenge
Migrate databases
with lots of data (> 1 bn rows) in just one night
50 times (different customers)
reliably (the system upgrade depends on it, and the maintenance window is precious)

synedra AIM release upgrade:
prepare release directory (binaries, configs, etc.)
stop system
migrate database
copy release directory
start system
Ora2Pg
Good for first attempts, but too slow
Reads single-threaded from Oracle
Writes single-threaded to Postgres
Still very useful to get empty master schema from Oracle to Postgres
As converted from Oracle:
view_by_pid_protection smallint DEFAULT 1,
CONSTRAINT orgunit_v_b_p_p_ck CHECK (view_by_pid_protection = ANY (ARRAY[0, 1]))

Native Postgres:
view_by_pid_protection boolean DEFAULT true
Master schema adaptations
Change column name 'constraint' to 'constrain'
Use boolean datatype in Postgres
Reduce width of some integer fields
Replace Oracle functional indexes with partial indexes in Postgres
Normalize some tables and remove unused ones
Functions (ca. 20) and triggers (ca. 20) migrated manually
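The functional-to-partial index adaptation can be sketched as follows; the table and column names here are made up for illustration, not taken from the talk:

```sql
-- Oracle workaround: a function-based index that covers only active rows
-- (rows mapped to NULL are not indexed by Oracle)
CREATE INDEX document_active_idx
    ON document (CASE WHEN deleted = 0 THEN patient_fk END);

-- Postgres: the same intent, expressed directly as a partial index
CREATE INDEX document_active_idx
    ON document (patient_fk)
    WHERE NOT deleted;
```

The partial index also pairs naturally with the boolean adaptation above: the numeric 0/1 flag from Oracle becomes a boolean that the index predicate can use directly.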
Rolling our own migration tool
Written in Java, using Postgres JDBC
Migration:
disable triggers, drop constraints and indexes
copy data in parallel
recreate indexes and constraints, enable triggers
run ANALYZE to get valid statistics
fix sequences
Allows pre-migration of WORM tables
Also performs other tasks
Schema changes
Backup
Schema diagram
recreate indexes in parallel threads
recreate constraints in parallel threads
except foreign key constraints, which (in our schema) form cycles that could produce deadlocks
get sequence value from Oracle DB
ALTER SEQUENCE <name> RESTART WITH <value from Oracle>
2-8 threads run in parallel
data is split up into buckets
small tables (< 300,000 rows) form one bucket
large tables are split into several buckets so that parallelism is not limited by a few huge tables
Each thread migrates buckets until there are no more buckets left
For maximum performance, insertions are done in batches of 10,000 rows using the JDBC Copy API ("COPY <table> (<fields>) FROM STDIN")
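Per table, the steps above boil down to something like the following sketch (table and index names are illustrative; the actual tool drives this via JDBC):

```sql
-- before the bulk load: no trigger firing, no index maintenance
ALTER TABLE document DISABLE TRIGGER ALL;
DROP INDEX IF EXISTS document_description_idx;

-- bulk load, issued in batches by the migration tool
COPY document (id, description, patient_fk) FROM STDIN;

-- after the bulk load
CREATE INDEX document_description_idx ON document (description);
ALTER TABLE document ENABLE TRIGGER ALL;
ANALYZE document;
```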
Performance tests / QA
Deployment
experiences & open issues
Backup application
Product adaptions
ca. 1 million lines of code
779 files had to be changed
early decision to pursue "Postgres only" strategy
syntactic sugar #1

Oracle:
SELECT to_char(99),
       to_char(systimestamp, 'hh24miss.ff'),
       to_number('42')
FROM DUAL

Postgres:
SELECT cast(99 AS char),
       to_char(localtimestamp, 'hh24miss.us'),
       cast('42' AS bigint)

Oracle:   SELECT myseq.nextval ...
Postgres: SELECT nextval('myseq')

Oracle:
SELECT * FROM (SELECT query.*, ROWNUM p_rownum FROM (...))
WHERE p_rownum <= :limit AND p_rownum > :start

Postgres:
SELECT ... OFFSET :start LIMIT :limit

Oracle:   SELECT NVL(myrow, 'empty')
Postgres: SELECT COALESCE(myrow, 'empty')
syntactic sugar #2

Oracle:   DELETE mytable
Postgres: DELETE FROM mytable

Oracle:   UPDATE t1 SET t1.f1 = 1
Postgres: UPDATE t1 SET f1 = 1

Oracle:   SELECT COUNT(*) FROM aet ORDER BY id
Postgres: SELECT COUNT(*) FROM aet GROUP BY id ORDER BY id

Oracle:   SELECT * FROM (SELECT * FROM t1)
Postgres: SELECT * FROM (SELECT * FROM t1) AS sub

Oracle:   SELECT ... MINUS SELECT ...
Postgres: SELECT ... EXCEPT SELECT ...
conceptual differences #1

Oracle:
SELECT 3/7 FROM DUAL => 0.42

Postgres:
SELECT 3 / 7 => 0

Oracle:
SELECT id, name, level
FROM orgunit
START WITH abk = 'SYNEDRA'
CONNECT BY PRIOR orgunit.id = orgunit_fk

Postgres:
WITH RECURSIVE sub AS (
    SELECT id, name, 1 AS level
    FROM orgunit WHERE abk = 'SYNEDRA'

    UNION ALL

    SELECT child.id, child.name, parent.level + 1 AS level
    FROM orgunit AS child
    JOIN sub AS parent ON (child.orgunit_fk = parent.id)
)
SELECT * FROM sub
conceptual differences #2

Oracle:
String id = "17";
db.query(
    "SELECT * FROM user WHERE id = $id[bind]$",
    "id", id
);

Empty string is the same as NULL:
SELECT NVL('', 'really') FROM DUAL => really
SELECT 'y' FROM DUAL WHERE '' IS NULL => y
SELECT 3 + '' FROM DUAL => <null>
SELECT null || 'y' FROM DUAL => y

Postgres:
Long id = 17;
db.query(
    "SELECT * FROM user WHERE id = $id[bind]$",
    "id", id
);

Empty string is (of course) not the same as NULL:
SELECT COALESCE('', 'really') => <empty string>
SELECT 'y' WHERE '' IS NULL => <empty result>
SELECT 3 + '' => <error>
SELECT null || 'y' => <null>
conclusion
A lot of minor adaptations required

Postgres is cleaner and safer. It improved the quality of the statements in our product.
12 April 2012
6 February 2012
7 August 2012
Switching Horses Overnight
A Software Manufacturer's View on Replacing an Existing Database Management System with PostgreSQL
Extracted user queries (search for patient documents) from log files
Executed the queries against a copy of a large customer database
Compared 16,269 Postgres query times against Oracle query times (from logs)
Query times roughly the same
More than 95% of the queries faster than 1 s
99% of the queries faster than 5 s
Very few issues found in QA phase
Getting the name of a violated constraint in Java
Null handling issues
Issues with new DB tools within our administration GUI
Type issues with query variables
DB tools
Anxiety due to "loss" of Oracle Enterprise Manager
Implemented important tools within our own administration GUI
Trigram indices
Searches on document descriptions/annotations have "contains" semantics
LIKE '%pattern%' usually means sequential scan
pg_trgm module provides a huge performance gain for such searches
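A minimal sketch of such a trigram index (table and column names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX document_description_trgm_idx
    ON document USING gin (description gin_trgm_ops);

-- this LIKE '%...%' search can now use the trigram index
-- instead of falling back to a sequential scan
SELECT * FROM document WHERE description LIKE '%thorax%';
```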
Continuous Archiving and Point-in-Time Recovery
One backup per day
Configurable number of backup generations (PITR)
Creating our own tool for optimal integration into the product
Bugfix-Rollout with our software distribution server
Monitoring of last successful backup timestamp
Using WAL Archiving
Using rsync to make the base backup (a potentially inconsistent copy of the database files)
Unchanged files don't need to be copied
Before creating a new base backup, the current one is tarred (compressed) away as a backup generation
WAL archives are kept back to the point where the oldest available (tarred) generation can still be restored
The tool also provides a restore function for the current backup or any of the older generations
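The WAL-archiving setup described above corresponds to postgresql.conf settings along these lines (paths and values are illustrative, not from the talk):

```
# postgresql.conf (fragment)
wal_level = archive                               # ship enough WAL for archive recovery
archive_mode = on
archive_command = 'cp %p /backup/wal_archive/%f'  # copy each finished WAL segment away
```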
synedra
healthcare software producer
35+ employees
70+ customers
Robert Lichtenberger
Head of Software Engineering
Stefan Andreatta
CTO
synedra AIM
Fileserver
Application
Database
Fileserver
embedded database
OLTP - thousands of commits/min
billions of rows - TB size
Again: migration tests, performance tests with customers' data

Migration strategy:
set up Postgres server - stop application - upgrade OS - have both databases running - migrate - upgrade application
start and hope ...


... strangely that went according to plan
Little Query Tuning necessary
Fact Based Understanding
Random Sampling
for Clustered Distributions
Index Access
for Large Tables
would we do it again?
+ very low operating costs
+ high performance and reliability
+ enterprise features for all customers

+ non-topic for customers

+ scales from very small to very large systems
+ infrastructure serves needs not licensing peculiarities


... life is definitely better among elephants :-)
How to get PostgreSQL wishes granted?
major concern during evaluation (no tuning pack, no query hints) - few actual problems in practice - do not make DBAs stand in for developers
n_distinct gets seriously underestimated when values are clustered and tables are large
car A
car B
car C
A - wheels
A - tires
A - windows
A - roof
A - radio
B - wheels
B - tires
B - windows
B - roof
B - radio
C - wheels
C - tires
....
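When ANALYZE's sampled estimate is off like this, the distinct-value count can be pinned manually (available since PostgreSQL 9.0; names and values are illustrative):

```sql
-- pin the planner's distinct-value estimate for the clustered column
-- to a known value instead of the underestimated sample
ALTER TABLE part ALTER COLUMN car_fk SET (n_distinct = 50);
ANALYZE part;
```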
random_page_cost too low for very large tables
if you try hard (and ask questions)
you can get to the source of a problem