Operating A Highly Available Messaging System On AWS

Designing and operating a messaging system for background processing of events generated by our frontend.

Zsolt Dollenstein

on 4 September 2018


Transcript of Operating A Highly Available Messaging System On AWS

HA Messaging Systems on AWS
Our experiences

  • Logs, logs, logs
  • Franz I mean, Kafka
  • Home brew

Specs

  • message: string of bytes
  • producers: process that wants to publish stuff
  • consumers: process that is interested in stuff
  • very loose definition
  • multiple consumers, same message?
  • reprocessable messages?
  • history?
  • read-only messages?
  • parallel message streams?

Why?

  • Engineer self-satisfaction
  • Long-running batch jobs
  • Abstraction
  • Separation of concerns
  • Responsiveness
  • Replaying events
  • Async communication

Logs, logs, logs

  • message: a log entry
  • producers: write to a log file
  • consumers: parse the log and act on it
  • ez pz
  • standard logging APIs
  • message format from module
  • ez pz

Multiple machines?

  • scribe
  • syslog
  • scp + cron?
  • SSL
  • Disk space
  • routing
  • archiving
  • high availability
  • hadoop + data mining

Wait, consumerS?

  • only one log file to read
  • two reasonable solutions: Terribad, Ninja punch
  • one more time!

Terribad

  • Your messaging layer can help demultiplex messages
  • scribe: bucket store (random, hash, modulo)
  • syslog: google it
  • Each time you want to add or remove a consumer, you have to reconfigure the entire messaging layer.
  • hence the name: Terribad

Ninja punch

  • process reading the log -> Queue -> actual workers
  • Queue fills up quickly
  • lots of lost messages

Franz I mean, Kafka

  • message: string of bytes
  • producers: publishing bytes to the Kafka API
  • consumers: getting a "stream" of bytes

Kafka's producer API

  • fast but no HA
  • slow because of HA
  • usually you only get this
  • Scala API has this too

proxy producing

  • producer -> proxy (async) -> Kafka (sync + HA)
  • producer host

Kafka 101

  • Broker: id number, host:port
  • topic
  • partitions
  • this is actually a directory on the filesystem
  • it is divided into fixed size chunks
  • every message here has a unique offset starting from 0
  • files are named by the offset of their first msg

Zookeeper

  • which brokers alive?
  • broker -> topic
  • (broker, topic) -> partition
  • consumer groups

Producers and consumers

  • simple producer: produce(host, port, topic, message)
  • complex producer
  • simple consumer: subscribe(host, port, topic, partition_list)
  • complex consumer: balance partitions between other consumers
  • Entirely up to you
  • encryption, ACL? DIY
  • fully featured API client only in Scala
  • but Scala is awesome
  • Writing the consumer logic is not trivial but not that hard either

Home brew

Diagram labels: ..., ..., DynamoDB, ELB, Consumers, Producers; put stuff, poll for messages, HTTP, get stuff

  • message: list of attributes
  • producers: writing to event store (DynamoDB)
  • consumers: polls for msg batches
  • needs message store: low latency, batch queries, strongly consistent, highly available
  • DynamoDB can do these in AWS
  • Depends largely on the DB
  • DynamoDB stores attributes
  • JSON (but it shouldn't matter)
  • multiple consumers: case-by-case, time-slicing, shared database, gossip
  • batches of msgs: configurable batch size, process them in parallel
  • too big -> slow
  • too small -> sequential

Diagram labels: ..., ..., DynamoDB, ELB, Consumers, Producers; put stuff, poll for messages, HTTP, get stuff; annotations: no problem, screwed anyway, problem, no problem

Diagram labels: ..., Zookeeper, Broker, ..., producer, consumer; annotations: problem, problem, problem, problem, no problem, screwed anyway, problem

Diagram labels: ..., ..., producer, consumer, log forwarder; annotations: problem, no problem, no problem, screwed anyway; (prezi saves), (storage), (backup)

  • OSS logserver by Facebook
  • categories
  • file/network stores
  • fallback
  • reason we don't like to use queues directly
  • log forwarder
  • when it comes back, traffic is very bursty
  • adding/removing is a pain
  • solution: alternative channel (logging)
  • much more complex
  • So simple, so beautiful!
  • So many edge cases
  • already had these in place
  • order of events not preserved

Conclusion

  • fewer components
  • build on existing infrastructure
  • engineer self-satisfaction
  • ownership of shared infrastructure?

If you are trying something new and it feels very simple, get suspicious and step back to look at the bigger picture.
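The "Kafka 101" notes above say that a partition is a directory divided into fixed size chunks, with files named by the offset of their first message. A minimal sketch of how an offset could be mapped to its chunk under that description follows. The function names, the 20-digit zero padding and the `.log` suffix are assumptions made for illustration, not details taken from the talk.

```python
from bisect import bisect_right


def segment_for_offset(segment_base_offsets, offset):
    """Return the base offset of the chunk that should contain `offset`.

    `segment_base_offsets` must be sorted ascending; each entry is the
    offset of the first message stored in one chunk file.
    """
    if not segment_base_offsets or offset < segment_base_offsets[0]:
        raise ValueError("offset is before the first available chunk")
    # The right chunk is the last one whose base offset is <= offset.
    index = bisect_right(segment_base_offsets, offset) - 1
    return segment_base_offsets[index]


def segment_file_name(base_offset):
    # Hypothetical naming scheme: zero-padded base offset plus a suffix.
    return f"{base_offset:020d}.log"


base_offsets = [0, 1000, 2000]  # three chunks in one partition directory
print(segment_file_name(segment_for_offset(base_offsets, 1500)))
```

Because the file names are sorted by offset, a binary search over the directory listing is enough to find the chunk; no separate index is needed for this step.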
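The home brew notes above describe consumers that poll for message batches with a configurable batch size and process each batch in parallel. The sketch below illustrates that loop with an in-memory list standing in for the event store. `InMemoryEventStore`, `consume` and their parameters are hypothetical names; a real consumer would query DynamoDB and keep polling instead of stopping at the first empty batch.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryEventStore:
    """Hypothetical stand-in for the event store (DynamoDB in the talk)."""

    def __init__(self):
        self._events = []

    def put(self, attributes):
        # A message is a list of attributes; here, a plain dict.
        self._events.append(dict(attributes))

    def poll(self, start, batch_size):
        # Return up to `batch_size` messages starting at position `start`.
        return self._events[start:start + batch_size]


def consume(store, handler, batch_size, max_workers=4):
    """Poll batches in sequence and process each batch in parallel."""
    position = 0
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            batch = store.poll(position, batch_size)
            if not batch:
                break  # sketch only: a real consumer would wait and retry
            # Messages inside one batch run concurrently; batches do not.
            results.extend(pool.map(handler, batch))
            position += len(batch)
    return results


store = InMemoryEventStore()
for i in range(5):
    store.put({"id": i})

print(consume(store, lambda event: event["id"] * 2, batch_size=2))
```

The trade-off from the slides shows up in `batch_size`: a batch of one makes processing effectively sequential, while a very large batch delays the next poll until the slowest message in the batch finishes.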