A bird's nest: A primer on FlockDB & Gizzard
Twitter is a cross between texting, chatting, and browsing a discussion forum, and much more. Think of it as a sort of cocktail party.
(Small disclaimer: I stole that description from another Prezi :)
You don't know everyone there. But you can talk to anyone. You can listen to anyone. And you are exposed to a larger number of people than you would be if all you were doing was texting, chatting, or browsing a discussion forum.
So how is Twitter different from Facebook? Completely different.
On Facebook, in order to communicate you must be friends with the other person. This creates a very simple relationship: either you're friends with a person or you're not.
But on Twitter relationships are asymmetrical, which means there are now four different forms of communication:
- You follow a person
- That person follows you
- You both follow each other
- Neither of you follows the other
In fact, to me, that sort of describes how things are in real life. Maybe that's why Twitter has become huge.
Twitter stores many graphs of relationships between people: who you're following, who's following you, who you receive phone notifications from, and so on.
Instead of requiring each friendship to be requested and confirmed,
you can build one-way relationships by just following other people. There's also no limit to how many people
are allowed to follow you, so some people
have millions of followers (like @aplusk),
while others have only a few.
FlockDB & Gizzard
To deliver a tweet, we need to be able to look up someone's followers and page through them rapidly.
But we also need to handle heavy write traffic, as followers are added or removed, or spammers are caught and put on ice.
And for some operations, like delivering a @mention, we need to do set arithmetic like "who's following both of these users?" These features are difficult to implement in a traditional relational database.
Well, consider the following constraints:
- Write the simplest possible thing that could work.
- Use off-the-shelf MySQL as the storage engine, because Twitter's engineers understand its behavior, in normal use as well as under extreme load and unusual failure conditions.
- Give it enough memory to keep everything in cache.
- Allow for horizontal partitioning so we can add more database hardware as the corpus grows.
- Allow write operations to arrive out of order or be processed more than once. (Allow failures to result in redundant work rather than lost work.)
FlockDB is a database that stores graph data, but it isn't a database optimized for graph-traversal operations. Instead, it's optimized for very large adjacency
lists, fast reads and writes, and page-able set arithmetic queries.
It stores graphs as sets of edges between nodes identified by 64-bit integers. For a social graph, these node IDs will be user IDs, but in a graph storing "favorite" tweets, the destination may be a tweet ID. Each edge is also marked with a 64-bit position, used for sorting (such as a timestamp). The edges are stored in both a forward and backward direction, meaning that an edge can be queried based on either the source or destination ID.
So how do we deal with scale?
Even after simplifying the data structure to a single large table holding adjacency lists that describe relationships, we still need a solution for the growing number of IDs. The solution is sharding.
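To make the edge model described above a bit more concrete, here is a toy in-memory sketch in Python. It is not FlockDB's schema or API: the class `EdgeStore`, its method names, and the sample IDs are all made up for illustration, and real storage would sit in MySQL rather than in Python lists. It only shows the ideas from the text: edges with a position, a forward and a backward index, paging by position, set arithmetic, and writes that are safe to repeat.

```python
import bisect
from collections import defaultdict


class EdgeStore:
    """Toy in-memory edge store (hypothetical; not FlockDB's actual design)."""

    def __init__(self):
        # Each index maps a node ID to a list of (position, other_id),
        # kept sorted by position so it can be paged through in order.
        self.forward = defaultdict(list)   # source -> [(position, destination)]
        self.backward = defaultdict(list)  # destination -> [(position, source)]

    def add_edge(self, source, destination, position):
        # Assumption: re-adding an existing edge is a no-op, so a write
        # that is processed twice leaves the same result.
        self._insert(self.forward[source], position, destination)
        self._insert(self.backward[destination], position, source)

    @staticmethod
    def _insert(rows, position, other_id):
        if any(existing == other_id for _, existing in rows):
            return
        bisect.insort(rows, (position, other_id))

    def followers(self, user_id, after_position=-1, count=2):
        """One page of (position, follower_id), starting past a cursor."""
        rows = self.backward[user_id]
        start = bisect.bisect_right(rows, (after_position, float("inf")))
        return rows[start:start + count]

    def following_both(self, user_a, user_b):
        """Set arithmetic: who is following both of these users?"""
        a = {source for _, source in self.backward[user_a]}
        b = {source for _, source in self.backward[user_b]}
        return a & b


store = EdgeStore()
store.add_edge(source=1, destination=10, position=100)  # user 1 follows user 10
store.add_edge(2, 10, 101)
store.add_edge(3, 10, 102)
store.add_edge(2, 20, 103)
store.add_edge(2, 10, 101)  # same write delivered twice; ignored

first_page = store.followers(10)
next_page = store.followers(10, after_position=first_page[-1][0])
both = store.following_both(10, 20)
```

A single in-memory structure like this obviously does nothing about the growing number of IDs, which is exactly where sharding comes in.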
And for that Twitter developed Gizzard.
Gizzard operates as a middleware networking service. It sits “in the middle” between clients (in our case, Flapps) and the many partitions and replicas of data.
The app servers, called Flapps, are written in Scala, are stateless, and are horizontally scalable. Twitter can add more as query load increases, independent of the databases. Flapps expose a very small Thrift API to clients, though Twitter has written a Ruby client with a much richer interface.
Gizzard is designed to replicate data across any network-available data storage service. Gizzard handles partitioning by mapping ranges of data to particular shards.
We have:
- Application servers called Flapps
- A sharding framework called Gizzard
- A datastore layer on MySQL
by Gidi Morris (@chekofif)
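As a rough illustration of mapping ranges of data to particular shards, here is a minimal Python sketch. It is not Gizzard's code or API: the class `RangeShardMap`, the range boundaries, and the shard names such as `mysql-a1` are invented for the example, and it assumes each range is simply served by a fixed list of replicas.

```python
import bisect


class RangeShardMap:
    """Maps ranges of integer IDs to shards (hypothetical names throughout)."""

    def __init__(self, entries):
        # Each entry is (lower_bound, [replica names]). A shard owns IDs from
        # its lower bound up to, but not including, the next lower bound.
        ordered = sorted(entries, key=lambda entry: entry[0])
        self.lower_bounds = [bound for bound, _ in ordered]
        self.replicas = [names for _, names in ordered]

    def shard_for(self, node_id):
        # Find the last range whose lower bound is <= node_id.
        index = bisect.bisect_right(self.lower_bounds, node_id) - 1
        if index < 0:
            raise KeyError(f"no shard covers id {node_id}")
        return self.replicas[index]


shards = RangeShardMap([
    (0, ["mysql-a1", "mysql-a2"]),
    (1_000_000, ["mysql-b1", "mysql-b2"]),
    (2_000_000, ["mysql-c1", "mysql-c2"]),
])

low = shards.shard_for(42)
middle = shards.shard_for(1_500_000)
edge = shards.shard_for(2_000_000)
```

In a sketch like this, adding database hardware means adding or splitting an entry in the table, while the stateless app servers in front of it stay unchanged.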