postgres=# EXPLAIN (COSTS off) SELECT *
FROM pgbench_accounts
WHERE filler LIKE '%a%';
QUERY PLAN
---------------------------------------------
Gather
Number of Workers: 5
-> Parallel Seq Scan on pgbench_accounts
Filter: (filler ~~ '%a%'::text)
(4 rows)
postgres=# EXPLAIN ANALYSE SELECT * FROM pgbench_branches b JOIN pgbench_accounts a ON a.bid=b.bid WHERE a.filler LIKE '%a%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..288938.25 rows=1 width=461) (actual time=7218.031..7218.031 rows=0 loops=1)
Join Filter: (b.bid = a.bid)
-> Seq Scan on pgbench_accounts a (cost=0.00..288935.00 rows=1 width=97) (actual time=7218.029..7218.029 rows=0 loops=1)
Filter: (filler ~~ '%a%'::text)
Rows Removed by Filter: 10000000
-> Seq Scan on pgbench_branches b (cost=0.00..2.00 rows=100 width=364) (never executed)
Planning time: 0.476 ms
Execution time: 7218.088 ms
(8 rows)
Time: 7219,480 ms
postgres=# SET max_parallel_degree = 5;
SET
postgres=# EXPLAIN ANALYSE SELECT * FROM pgbench_branches b JOIN pgbench_accounts a ON a.bid=b.bid WHERE a.filler LIKE '%a%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..189937.35 rows=1 width=461) (actual time=2820.992..2820.992 rows=0 loops=1)
Number of Workers: 5
-> Nested Loop (cost=0.00..188937.25 rows=1 width=461) (actual time=2806.913..2806.913 rows=0 loops=6)
Join Filter: (b.bid = a.bid)
-> Parallel Seq Scan on pgbench_accounts a (cost=0.00..188935.00 rows=0 width=97) (actual time=2806.909..2806.909 rows=0 loops=6)
Filter: (filler ~~ '%a%'::text)
Rows Removed by Filter: 1666667
-> Seq Scan on pgbench_branches b (cost=0.00..2.00 rows=100 width=364) (never executed)
Planning time: 0.416 ms
Execution time: 2823.231 ms
(10 rows)
postgres=# EXPLAIN ANALYZE SELECT sum(length(txt)) FROM test;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=165005.37..165005.38 rows=1 width=8) (actual time=730.439..730.439 rows=1 loops=1)
-> Gather (cost=165005.15..165005.36 rows=2 width=8) (actual time=730.401..730.435 rows=3 loops=1)
Number of Workers: 2
-> Partial Aggregate (cost=164005.15..164005.16 rows=1 width=8) (actual time=728.536..728.536 rows=1 loops=3)
-> Parallel Seq Scan on test (cost=0.00..143171.43 rows=4166743 width=31) (actual time=0.012..217.970 rows=3333333 loops=3)
Planning time: 0.177 ms
Execution time: 731.256 ms
(7 rows)
When can we speak of "Big Data"?
Indexes & Big Data
Indexes
[Figure: BRIN — the table's block ranges with their min/max value summaries (values a–d), and the resulting 0/1 bitmap of ranges that may match the query]
BitMapHeapScan
...WHERE col < val1 AND col > val2;
SeqScan
Block Range INdex
(PostgreSQL 9.5)
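A BRIN index can be sketched as follows; the table and column names are assumed for illustration, not taken from the benchmark above:

```sql
-- Hypothetical table, assumed for illustration only
CREATE TABLE measurements (
    ts  timestamptz,
    val numeric
);

-- A BRIN index stores only a min/max summary per block range,
-- so it stays tiny even on very large tables (PostgreSQL 9.5+)
CREATE INDEX measurements_ts_brin
    ON measurements USING brin (ts);
```

Range predicates on `ts` can then be answered by visiting only the block ranges whose summary overlaps the condition, as in the BitmapHeapScan above.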
QUERY PLAN
---------------------------------------------------------------------------
Result (cost=0.00..84.00 rows=4400 width=44)
-> Append (cost=0.00..84.00 rows=4400 width=44)
-> Seq Scan on parent (cost=0.00..21.00 rows=1100 width=44)
-> Seq Scan on partition1 (cost=0.00..21.00 rows=1100 width=44)
-> Seq Scan on partition2 (cost=0.00..21.00 rows=1100 width=44)
-> Seq Scan on partition3 (cost=0.00..21.00 rows=1100 width=44)
QUERY PLAN
---------------------------------------------------------------------------
Result (cost=0.00..84.00 rows=4400 width=44)
-> Append (cost=0.00..84.00 rows=4400 width=44)
-> Seq Scan on parent (cost=0.00..21.00 rows=1100 width=44)
-> Seq Scan on partition2 (cost=0.00..21.00 rows=1100 width=44)
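A setup that produces plans like the two above can be sketched with inheritance-based partitioning (the pre-declarative style used in 9.x); the `CHECK` ranges are assumed for illustration:

```sql
-- Inheritance-based partitioning, as in PostgreSQL 9.x;
-- the id ranges in the CHECK constraints are illustrative
CREATE TABLE parent (id int, payload text);
CREATE TABLE partition1 (CHECK (id <= 1000))               INHERITS (parent);
CREATE TABLE partition2 (CHECK (id > 1000 AND id <= 2000)) INHERITS (parent);
CREATE TABLE partition3 (CHECK (id > 2000))                INHERITS (parent);

-- With constraint_exclusion, the planner skips partitions whose
-- CHECK constraint contradicts the WHERE clause
SET constraint_exclusion = partition;
EXPLAIN SELECT * FROM parent WHERE id = 1500;  -- parent + partition2 only
```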
PRO
Big Data & storage
The ability to partition data logically...
...can it also be done physically?
dedicate specific disks to different database objects
+
newly created tables
CREATE TABLESPACE increased
LOCATION '/new/mountpoint/';
ALTER DATABASE mydb
SET TABLESPACE increased;
ALTER TABLE the_increasing_one
SET TABLESPACE increased;
Sets of fields on which to run GROUP BY, merging the results
Extensible:
Warning: queries on foreign tables can be subject to different optimization issues
Benchmark: cluster with 16 AWS i2.xlarge nodes
Management of objects external to the database:
CUBE and ROLLUP are shortcuts for particular GROUPING SETS
A problem ignored for a long time
CREATE MATERIALIZED VIEW pgmatview AS SELECT * FROM pgtab;
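Once created, the view must be refreshed to pick up changes in `pgtab`; a minimal sketch (the indexed column name is assumed):

```sql
-- Plain refresh recomputes the view, taking an exclusive lock:
REFRESH MATERIALIZED VIEW pgmatview;

-- With a unique index (column name 'id' assumed here),
-- 9.4+ can refresh without blocking concurrent readers:
CREATE UNIQUE INDEX ON pgmatview (id);
REFRESH MATERIALIZED VIEW CONCURRENTLY pgmatview;
```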
Improved plans and execution times
Data node:
PostgreSQL 9.5:
CREATE EXTENSION postgres_fdw;
CREATE SERVER pg_srv FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (connstring);
CREATE USER MAPPING FOR user SERVER pg_srv
OPTIONS (user 'username', password 'secret');
CREATE SCHEMA srv_schema;
IMPORT FOREIGN SCHEMA public FROM SERVER pg_srv INTO srv_schema;
Up to 9.4: UNION ALL
Global Transaction Manager:
CUBE: all possible combinations of the fields
Coordinators:
Numerous updates in 9.6
SELECT l.regione, p.categoria, sum(prezzo) AS totale
FROM vendite v
JOIN luoghi l ON (v.id_luogo=l.id)
JOIN prodotti p ON (v.id_prodotto=p.id)
GROUP BY GROUPING SETS ((l.regione, p.categoria), ())
ORDER BY regione, categoria;
   regione   |   categoria   | totale
-------------+---------------+---------
 Toscana     | cat1          |     104
 Toscana     | cat2          |     100
 Umbria      | cat1          |      14
 Umbria      | cat2          |      55
             |               |     273
ROLLUP: all the fields the operator was declared on, then all but the last, all but the last two...
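The expansions can be sketched against the same sales query used for GROUPING SETS above:

```sql
-- ROLLUP (a, b) expands to GROUPING SETS ((a, b), (a), ())
-- CUBE   (a, b) expands to GROUPING SETS ((a, b), (a), (b), ())
SELECT l.regione, p.categoria, sum(prezzo) AS totale
FROM vendite v
JOIN luoghi l ON (v.id_luogo = l.id)
JOIN prodotti p ON (v.id_prodotto = p.id)
GROUP BY ROLLUP (l.regione, p.categoria)
ORDER BY regione, categoria;
```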
SELECT expression
FROM table
TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ]
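Concrete uses of the two built-in sampling methods, run here against the `pgbench_accounts` table from the earlier examples:

```sql
-- BERNOULLI samples individual rows, SYSTEM samples whole blocks;
-- the argument is the approximate percentage of the table to keep
SELECT count(*) FROM pgbench_accounts TABLESAMPLE BERNOULLI (1);

-- REPEATABLE fixes the seed, so the same sample is returned each run
SELECT avg(abalance)
FROM pgbench_accounts TABLESAMPLE SYSTEM (10) REPEATABLE (42);
```

SYSTEM is much cheaper on large tables but less statistically uniform, since it keeps or discards entire blocks.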
CREATE TABLE products (
product_no integer,
name text,
price numeric
) DISTRIBUTE BY HASH(product_no);
CREATE TABLE products (
product_no integer,
name text,
price numeric
) DISTRIBUTE BY REPLICATION;
Not just syntactic sugar
9.6(?): JOIN pushdown
www.depesz.com
32 cores