Loading presentation...

Present Remotely

Send the link below via email or IM

Copy

Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

DeleteCancel

Make your likes visible on Facebook?

Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.

No, thanks

Integrating PostgreSQL with Logstash for real-time monitoring

The Logstash pipeline is a very common framework that allows system administrators to monitor their distributed infrastructures using ElasticSearch and Kibana. PostgreSQL can now be push log entries into Redis for real-time monitoring.
by

Gabriele Bartolini

on 30 October 2015

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of Integrating PostgreSQL with Logstash for real-time monitoring

redislog
Integrate
PostgreSQL
databases in large, complex and distributed systems
Effective monitoring, alerting and management are crucial
Log management
is one important aspect to be considered
A very common scenario for
devops
is to rely on the
ELK stack
for log management and real-time monitoring
ELK = Elasticsearch + Logstash + Kibana
Option 1, DBA approach
Keep PostgreSQL logs separate
Option 2, devops approach
Integrate PostgreSQL logs with every other component of your system
Integrating
PostgreSQL
with
Logstash
for
real-time monitoring

Gabriele Bartolini
2ndQuadrant Italia, Director
@_GBartolini_

Marco Nenciarini
2ndQuadrant Italia, CTO
@mnencia

stderr csvlog syslog
/var/log
grep
post-analysis
(e.g. PgBadger)
stderr csvlog syslog
/var/log
Shipper
Broker
Indexer
Storage
+
Search
Web Interface
Redis
Logstash
Elasticsearch
Indexer
Storage
+
Search
Web Interface
Logstash
Elasticsearch
Classic Postgres and ELK
Sysadmin/DBA Approach
ELK large heterogeneous systems
Tweet, tweet!
Follow us:
@_GBartolini_
@mnencia

@2ndQuad
@2ndquadrant_it

#pgconfeu

#redislog
#logstash
#postgres
#postgresql
#logging

#elasticsearch
#kibana
#elk
Open Source tool for managing events and logs, used for:
data collection
centralisation
parsing
storage and search
About Logstash
https://github.com/elastic/logstash
Storing logs from
Logstash
in
Elasticsearch
allows us to view and analyse them with
Kibana
.
Search server based on Lucene:
JSON objects
RESTful interface
Distributed
About Elasticsearch
https://github.com/elastic/elasticsearch
Analytics and visualisation platform
Built on top of Elasticsearch
About Kibana
https://github.com/elastic/kibana
Beaver
e.g. syslog
stderr csvlog syslog
/var/log
postgres.log
Shipper
Broker
Redis
Beaver
jsonlog
Elog
Extension
PROs:
Robust technologies (syslog, csvlog)
Small delay (production to ES)
Low impact on Postgres transactions
Batch processing
Classic approach
CONs:
Heavier administration/setup
Computational inefficiency
String parsing
Introducing "Redislog"
Goal
Extensibility of PostgreSQL
Emit log hooks (elog.c)
Main concepts:
Plugin library
Ship the event to Redis
Ship the event as a JSON object
Prototype
Logging plugin
Enter the
ELK
pipeline from PostgreSQL through a broker such as Redis (or AMQP)
Logging JSON into Redis
PROs:
Elegant architecture
Computational optimisation
~ Real-time logging
Structured data
Redislog
CONs:
New technology
Redis impact on Postgres transactions
Prolixity of JSON vs CSV
https://github.com/2ndquadrant-it/redislog
PGDG, The PostgreSQL Licence
Redis
Broker
Indexer
Logstash
Elasticsearch
Storage + Search
Architecture with Redislog
Postgres
Source
Redis
Redis
Hiredis library required
Based on Michael Paquier's jsonlog
Support for:
1 Redis server (list, RPUSH)
Full JSON log record (same as CSV)
Log level filtering
Redis connection timeout
Fallback option for local log
Working prototype
Version 0.1 - Sept 15
$ git clone git@github.com:2ndquadrant-it/redislog.git
$ cd redislog
$ make
$ sudo make install
Installation
redislog.host = '127.0.0.1'
redislog.port = 6379
redislog.key = 'postgres'
redislog.min_error_statement = error
redislog.min_messages = warning
redislog.connection_timeout = 1000
redislog.ship_to_redis_only = true
Example of configuration
Reuse backend connection to Redis
Check log level
Build JSON log record
Ship event to Redis (RPUSH)
Write event to local log (if applicable)
redis_log_hook
"{\"@timestamp\":\"2015-09-14T10:38:54.913+0000\",\"user_name\":\"postgres\",\"database_name\":\"postgres\",\"process_id\":21,\"remote_host\":\"172.17.0.26\",\"remote_port\":\"54482\",\"session_id\":\"55f6a495.15\",\"session_line_num\":1,\"command_tag\":\"SELECT\",\"session_start_time\":\"2015-09-14T10:42:29+0000\",\"virtual_transaction_id\":\"2/8\",\"error_severity\":\"ERROR\",\"sql_state_code\":\"42P01\",\"query\":\"select count(*) from pgbench_branches\",\"query_pos\":22,\"application_name\":\"pgbench\",\"message\":\"relation \\\"pgbench_branches\\\" does not exist\"}\n"
Peeking from Redis
shared_preload_libraries = 'redislog'
$ redis-cli
127.0.0.1:6379> lindex postgres -1
Redis
Broker
Indexer
Logstash
Elasticsearch
Storage + Search
Data quality and acceptance test
Postgres
Source
You have a prototype now.
What would your main concerns be?
Data quality
Do not lose any log entry
Global performance
Reduce computational overhead
Reduce delay on delivery to ELK
Our main concerns
Redislog:
1) does not lose any log event
2) reduces logging lag/delay
3) improves global computational efficiency
3 "myths" to be busted
csvlog
Export
=
Docker containers:
Postgres 9.4 (redislog + csvlog)
Redis
Logstash
Elasticsearch
pgbench
Export script from Elasticsearch (Python)
Comparison, with PostgreSQL
Sprint 1: Acceptance test
Lost events: 0
shared_preload_libraries = 'redislog'
redislog.host = 'redis'
redislog.min_error_statement = debug5
redislog.min_messages = debug5
log_min_error_statement = debug5
log_min_messages = debug5
log_statement = all
log_destination = 'csvlog'
logging_collector = on
postgresql.conf
autovacuum = off
fsync = off
synchronous_commit = off
wal_buffers = 16MB
shared_buffers = 512MB
checkpoint_segments = 64
Testing hours: 100+
Sprint 1 was successful.
Criticality: Redis connection.
Move on with Sprint 2: benchmarking.
Redis
Broker
Indexer
Logstash
Elasticsearch
Storage + Search
Performance benchmarking
Postgres
Source
csvlog
Redis
Broker
Indexer
Logstash
Elasticsearch
Storage + Search
Beaver
Scenario 1 - Redislog
Scenario 2 - CSVlog
Scenario 1 containers:
Postgres 9.4 (
redislog
)
Redis
Logstash
Elasticsearch
pgbench
Sprint 2: Scenarios
Scenario 2 containers:
Postgres 9.4 (
csvlog
)
Beaver
Redis
Logstash
Elasticsearch
pgbench
Sprint 2: Benchmarking
For each scenario:
Let pgbench run
Measure lag/delay:
Logstash timestamp - Postgres timestamp
Continuously collect docker stats

Analyse and compare results
Sprint 2: Server
Rental server, Hetzner.de, 44.54 €/month:
Intel Xeon E3-1245V2 (8 cores)
4x RAM 8192 MB DDR3 ECC (32 GB)
2x HDD 3,0 TB SATA Enterprise
Ubuntu 14.04.3 LTS
Docker version 1.8.2
First run
pgbench: no rate limit, 2 hours
1 logstash component
Scenario 2 runs ok
Scenario 1 explodes after <9 minutes
Redis, out of memory
Increasing lag
lag (ms)
1st run
What went wrong?
Events were accumulating in Redis
Logstash was not able to keep up
No drama
Redislog was working fine
Issues:
No Logstash optimisation
No Elasticsearch optimisation
Disclaimer
Our focus at 2ndQuadrant Italy is on PostgreSQL.
We are not (and do not intend to be, now) experts in Logstash and related technologies.

However, we believe that it is
strategic
for PostgreSQL to be able to
integrate
with external tools and technologies, in order to encourage its
adoption
.
"EL" Optimisation
Logstash
:
4 input and 4 output worker threads
8 worker threads for filtering
Redis batch processing (5k events)
Elasticsearch
:
16GB heap
1 shard
Disable merge throttling entirely
"Unbounded tps" runs
pgbench options: -c 8 -j 8 -T 3600

With CSVlog:
2,831 tps, ~1.2M events/min.
Lag time: ~14k secs (~4 hrs over 1 hr)
With Redislog (+ CSVlog **):
1,884 tps (
-33.45%
), ~800k events/min.
Lag time: ~4.8k secs (~1.5 hrs over 1hr)
Events/minute
CSV output
Static user fields
Support for channel data type
Shipping optimisation (queue)
AMQP support (or amqplog?)
Futures
Multiple Redis hosts
Open source plugin
Positive "mythbusting":
Worst case for redislog (prolixity)
Best case for CSVlog
Another blade in your Postgres Swiss Army Knife:
Not for all use cases
Happy to hear your feedback!
Conclusions
Kibana
Kibana
csvlog - events/minute
"Bounded tps" runs
Limit pgbench to 500 tps (with -R)
Average of 200k events/minute

With CSVlog:
Avg lag time: 96.509 ms
With Redislog (+ CSVlog **):
Avg lag time: 3.678 ms (
-96.19%
)
csvlog - lag analysis
redislog - events/minute
redislog - lag analysis
redislog
redislog
redislog
2nd run
redislog Authors
Gabriele Bartolini
Marco Nenciarini
Leonardo Cecchi
csvlog - lag
3rd run
Questions?
Gabriele Bartolini
2ndQuadrant Italia, Director
@_GBartolini_
Marco Nenciarini
2ndQuadrant Italia, CTO
@mnencia
#pgconfeu
#redislog
#postgres #postgresql #elk
https://github.com/2ndquadrant-it/redislog
Thank you, Leo
Leonardo Cecchi
Energy consumption
Experimentation phase
GOAL:
bail out ASAP!
Aim for success. Accept failure!
Proposed plan, 2 sprints:
1) Data quality and acceptance test
2) If ok, proceed with benchmarking
Exploration phase
Myth 1: Redislog does not lose any log event
Myth 2: Redislog reduces logging lag/delay
Myth 2: Redislog reduces logging lag/delay
Myth 2: Redislog reduces logging lag/delay
-12%
-100%
+332%
-7%
-16%
+21%
Myth 3: Redislog improves computational efficiency
JSON vs CSV prolixity
On average, data transfer over network was twice larger with JSON
redislog - lag
Out of our scope
Decision:
bind tx rate with pgbench
move on with myth busting!
Scalability issue down the processing line (Redis + ELK).
Easily fixed by adding more RELK servers.
Multiple Redis servers
Load balancing of Redis servers (shuffle)
JSON customisation (reduce prolixity):
field selection
field naming
Version 0.2
Redis
Redis
Redis
1
2
3
Fallback Redis servers (HA)
Shuffle Redis Hosts
Redis
Redis
Redis
3
1
2
Random hosts order, per backend
Load balancing
redislog.host = '127.0.0.1'
redislog.port = 6379
New configuration
redislog.hosts = 'redis_a, redis_b, redis_c, redis_d:6378'
redislog.shuffle_hosts = on
Is
redislog
just for
ELK
integration?
Redis
Broker
Consumer
Custom script
Elasticsearch
Storage + Search
Architecture without Logstash
Postgres
Source
redislog
Redis
Broker
Indexer
Logstash
Architecture with Postgres
Postgres
redislog
Source
redislog
Redis
Logstash
Elasticsearch
Horizontal scalability (out of scope)
Load balancing in action
Thanks to
#1. No data loss
#2. Computational optimisation
#3. Real-time logging
Part 1 - Log management
Part 2 - Redislog
Part 3 - Experimentation
Part 4 - Futures
www.2ndQuadrant.it - blog.2ndQuadrant.it
Copyright (c) 2015, 2ndQuadrant Italia S.r.l.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
Currently in progress (Oct 2015)
PGConf.eu 2015
29 October
Vienna
Full transcript