Jedberg Prezi

For RAMP Conference
by Zane Groshelle on 11 July 2013


Transcript of Jedberg Prezi

Going from one to two is harder
1 > 2 > 3
REDDIT
INFRASTRUCTURE
ARCHITECTURE
REDDIT TIMELINE
2006
APRIL

S3 for logos
2007
SEPTEMBER
S3 for thumbnails
2008
NOVEMBER
EC2 for batch processing...
WHAT LED REDDIT
Needed an easy way to distribute and upload our logo.
New Servers
Didn’t want to rent another cabinet
Didn’t want to buy more servers
MONTHLY PAGE
Reddit Gold is Launched
Why am I here?
Is it necessary to build a scalable architecture from the beginning?
EXAMPLE 1
Why should we learn from other people’s mistakes?
EXAMPLE 2
NETFLIX
[Chart: one axis from $20,000 to $130,000, a second axis from 200M to 1,300M, plotted by month from March to the following March]
If it won’t scale, it'll fail.
The key to scaling is finding the bottlenecks before your users do
-
Paradox
- Jedberg
Way back in 2005...
Two UVA students applied for this thing called Y Combinator
They were rejected
They were called back
MONITORING
Imaging and racking servers is a (sometimes fun) chore
Reddit moved from self hosting to EC2
EC2 for Overflow
Used OpenVPN to create a secure link to our datacenter for batch processing
Started by migrating all data
Got a complete stack running on EC2
Long Friday night finishing the migration and “forklifting” the last bits
BENEFITS

                 DATA CENTER    EC2
                 (per month)    (per month)
Servers:         $6K            $13K
Cabinet (x3):    $15K           $1.5K
Bandwidth:       $2.5K          $1.1K
Support:         N/A            $1.2K
Total:           $23.5K         $16.8K

29% CHEAPER!
Based on Amazon public pricing, reddit open source code, and public configuration information
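The totals and the 29% figure can be checked from the line items on the slide. The sketch below only redoes that arithmetic; the amounts are the slide's monthly figures in thousands of dollars, and the variable names are made up for illustration.

```python
# Monthly line items from the slide, in thousands of USD.
# The data center column lists Support as N/A, so it has three items.
data_center_costs = [6.0, 15.0, 2.5]
ec2_costs = [13.0, 1.5, 1.1, 1.2]

data_center_total = round(sum(data_center_costs), 1)
ec2_total = round(sum(ec2_costs), 1)

# Relative saving of EC2 against the data center bill.
savings_pct = (data_center_total - ec2_total) / data_center_total * 100

print(f"Data center: ${data_center_total}K, EC2: ${ec2_total}K")
print(f"EC2 is about {savings_pct:.1f}% cheaper")
```

The saving works out to roughly 28.5%, which the slide rounds up to 29%.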
Motivators for moving to EC2
Cost
Outgrew data center
Unpredictable growth
Takeaways
EC2 makes things easier, but isn’t a magic bullet.
The higher network latency and noisy neighbors will be problematic -- expect to work around them.
Scaling on EC2 is a lot like anywhere else, but you need to be more
Webserver or Proxy?
What about event driven and non-blocking web servers?
Good for long connections
More complicated to start, but scales better
Protip
To prevent someone from consuming too much, all resources have per-account limits. Keep track of them and get them raised before you need them. Make sure to catch the exceptions too.
Keep track of those limits!
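A minimal sketch of tracking those limits, assuming you can already obtain current usage and the account limit for each resource from your provider. The resource names, the numbers, and the 80% threshold are hypothetical and not from the talk.

```python
def resources_near_limit(usage, limits, threshold=0.8):
    """Return the resources whose usage is at or above threshold * limit."""
    near = []
    for name, limit in limits.items():
        used = usage.get(name, 0)
        if limit > 0 and used / limit >= threshold:
            near.append(name)
    return sorted(near)


# Hypothetical account limits and current usage.
limits = {"instances": 100, "volumes": 50, "elastic_ips": 5}
usage = {"instances": 85, "volumes": 10, "elastic_ips": 5}

# Anything listed here is a candidate for a limit increase request.
print(resources_near_limit(usage, limits))
```

Running a check like this on a schedule gives you time to request an increase before a launch call starts failing.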
Mistake
Relying on a single cloud product and expecting it to work as advertised
Bleeding edge in production
Cassandra wasn’t always perfect
No data loss, but it was a pain sometimes
Automate all the things!
http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
Mistakes we've made
What is Reddit?
Reddit is an online community
REDDIT
June 23rd, 2005
Advantages to a Service Oriented Architecture
Easier auto-scaling
Easier capacity planning
Identify problematic code-paths more easily
Narrow down the effects of a change
More efficient local caching
Disadvantages to a Service Oriented Architecture
Postgres is still a good database
Offload to the client with JavaScript
What else do you need to worry about?
Queues
Locking service
Email
???
1 > 2 > 3
Mistake
Not having enough monitoring and using a system that isn’t “virtualization friendly”.
Need multiple dev teams, or need people to work on multiple services.
Need to come up with a common platform, otherwise work will be duplicated.
Too much overhead for a small team just starting out.
Don’t follow fads
Limits everywhere!
Put a limit on everything.
Make it really really high.
Lower it or raise it as needed
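One way to read this advice: every action gets a limit that can be changed at runtime, starting high. The sketch below is a simple sliding-window limiter written for illustration; the class and method names are hypothetical and this is not reddit's implementation.

```python
import time
from collections import defaultdict, deque


class LimitExceeded(Exception):
    """Raised when a user exceeds the limit for an action."""


class Limiter:
    def __init__(self, limits, window_seconds=60.0, clock=time.monotonic):
        self.limits = dict(limits)        # action -> max events per window
        self.window = window_seconds
        self.clock = clock                # injectable so tests can control time
        self.events = defaultdict(deque)  # (user, action) -> event timestamps

    def set_limit(self, action, limit):
        """Raise or lower a limit without restarting anything."""
        self.limits[action] = limit

    def check(self, user, action):
        """Record one event, or raise LimitExceeded if the user is over."""
        now = self.clock()
        stamps = self.events[(user, action)]
        # Drop events that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limits[action]:
            raise LimitExceeded(f"{user} exceeded the limit for {action}")
        stamps.append(now)
```

Starting with a very high value means the limit exists in the code path from day one, so tightening it later is a configuration change rather than a new feature.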
We used Ganglia
Backed by RRD
Makes good rollup graphs
Gives a great way to visually detect errors
Wasn’t friendly to rapidly changing
Going from two to three is hard
Going from one to two is harder
If possible, plan for 3 or more from the beginning.
Data is the most important asset your business will have.
Data Gravity
Coined by Dave McCrory
First described here:
http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
What is Data Gravity?
Data Gravity and you
The bigger your dataset, the harder it is to move from anywhere to anywhere
Also, how do you move that data without affecting your running application?
SQL or “NoSQL”?
Relational vs. non-relational
MySQL, Postgres or something else?
Data schemas
Unless you are really really sure of your business model...
The less schema the better
reddit’s database is literally just keys and values
Expire your data
It’s a lot easier to manage if your data is either gone or in static form
Users will almost never notice
Think of SSDs as cheap RAM, not expensive disk
Database Scaling with Sharding
CODE
You must construct additional Pylons
Picking a framework
What is Pylons?
routing
paste
wsgi
mako
sqlalchemy
c and g
Scaling Pylons
pylons scaling == python scaling
run lots of appservers and make them independent of each other
We built our own caching
We built our own database layer
Would I use Pylons again?
Yes (although it’s called Pyramid now)
Event or thread based?
C is faster than Python (sorry)
filters
discount (markdown)
memcache
Open Source is Good
DATA
SOCIAL ASPECTS OF GROWTH
The Worm
Or, why you should never have your entire team on one airplane.
Provide an API
The business side of things
Running a site that requires user input?
Be one of the most active users
People like to see the founders participate
Moderation, cheating, spam and fraud
If you take user input, and get popular, people will cheat and spam.
If you take money, they will scam people.
Limits will help a lot, as will pattern detection.
Hard coded rules only go so far -- you need learning algorithms.
Let your users do the work for you.
What made reddit successful?
Empowered users
Better software
Community interaction
How does reddit make money?
Sidebox ads
Self-serve ads
Merchandise
reddit gold
marketplace
Reddit Gold
Ask Me Anything
Not only did I run technology for reddit, but I was also deeply involved in the business.

Ask me anything about running a profitable social media company.

Getting in touch
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
LinkedIn: www.linkedin.com/in/jedberg

Sharding
We split our writes across four master databases
Links/Accounts/Subreddits, Comments, Votes and Misc
Each has at least one slave
We avoid reading from the master if possible
Wrote our own database access layer, called the “thing” layer
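The routing described above can be sketched as a lookup from data type to database group, with reads preferring a replica (the slide's "slave"). This is an illustration only, not the actual “thing” layer: the host names, type names, and function names are all hypothetical.

```python
import random

# Four write masters, grouped as on the slide. Host names are made up.
SHARDS = {
    "main": {"master": "db-main-master", "replicas": ["db-main-replica-1"]},
    "comments": {"master": "db-comments-master", "replicas": ["db-comments-replica-1"]},
    "votes": {"master": "db-votes-master", "replicas": ["db-votes-replica-1"]},
    "misc": {"master": "db-misc-master", "replicas": ["db-misc-replica-1"]},
}

# Links, accounts and subreddits share one master; everything unlisted is "misc".
TYPE_TO_SHARD = {
    "link": "main",
    "account": "main",
    "subreddit": "main",
    "comment": "comments",
    "vote": "votes",
}


def shard_for(thing_type):
    return TYPE_TO_SHARD.get(thing_type, "misc")


def host_for(thing_type, write=False, choose=random.choice):
    """Writes go to the master; reads go to a replica when one exists."""
    shard = SHARDS[shard_for(thing_type)]
    if write or not shard["replicas"]:
        return shard["master"]
    return choose(shard["replicas"])


print(host_for("comment", write=True))
print(host_for("comment"))
```

Keeping the mapping in one place means moving a data type to its own master later only changes the table, not the callers.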
How it works
Replication factor
Quorum reads / writes
Bloom Filter for fast negative lookups
Immutable files for fast writes
Seed nodes
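Two of the ideas above lend themselves to a short sketch: the quorum arithmetic behind consistent reads and writes, and a Bloom filter for fast negative lookups. This is a generic illustration, not Cassandra's code; the sizes, hash choice, and names are assumptions made for the example.

```python
import hashlib


def quorum(replication_factor):
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, read_count, write_count):
    """If R + W > N, every read set overlaps every write set."""
    return read_count + write_count > replication_factor


class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent", so the disk read can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))
```

With a replication factor of 3, `quorum(3)` is 2, and reading from 2 replicas after writing to 2 satisfies the overlap condition.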
Why Cassandra?
Fast writes
Fast negative lookups
Easy incremental scalability
Distributed -- No SPoF
Second class users
Logged out users always get cached content.
Akamai bears the brunt of reddit’s traffic
Logged out users are about 80% of the traffic
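A rough sketch of how a response could be marked cacheable for logged-out users only, so that a CDN can serve them without touching the application. The cookie name `session` and the 60-second lifetime are hypothetical; the talk does not say how reddit and Akamai were actually configured.

```python
def cache_control_for(cookies, ttl_seconds=60):
    """Pick a Cache-Control header value based on login state."""
    if cookies.get("session"):
        # Logged-in pages are personalized, so shared caches must not reuse them.
        return "private, no-cache"
    # Logged-out pages are identical for everyone and safe to cache.
    return f"public, max-age={ttl_seconds}"


print(cache_control_for({}))
print(cache_control_for({"session": "abc123"}))
```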
Queues are your friend
Votes
Comments
Thumbnail scraper
Precomputed queries
Spam
processing
corrections
Sometimes users notice your data inconsistency
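The queue pattern can be sketched with the standard library: the request path only enqueues a vote, and a background worker applies it later, which is also why a user may briefly see stale data. This is a single-process illustration with hypothetical names; a real deployment would use a separate message broker.

```python
import queue
import threading
from collections import Counter


def vote_worker(jobs, scores):
    """Apply queued votes until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        link_id, direction = job
        scores[link_id] += direction


def demo():
    jobs = queue.Queue()
    scores = Counter()
    worker = threading.Thread(target=vote_worker, args=(jobs, scores))
    worker.start()
    # The request path only enqueues; it never updates the scores itself.
    for job in [("link1", 1), ("link1", 1), ("link2", -1)]:
        jobs.put(job)
    jobs.put(None)  # tell the worker to stop
    worker.join()
    return dict(scores)
```

Calling `demo()` returns the tallied scores once the worker has drained the queue.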
Mistake
Not using a consistent key hashing algorithm at first.
Memcachedb
Using md5’d keys made it difficult to rebalance.
It didn’t really have a way to rebalance
Turns out it was pretty slow under high workloads
Solution
We moved to consistent key hashing for memcache
We moved to Cassandra, which follows the Dynamo model, which uses a type of consistent hashing
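To show why consistent hashing makes rebalancing easier, here is a minimal hash ring with virtual nodes. It is a generic sketch, not the memcache client or Cassandra's partitioner; the class name, the choice of SHA-1, and the 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each server owns many small arcs of the ring, which evens out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key):
        """Return the first node clockwise from the key's point."""
        if not self._ring:
            raise LookupError("ring is empty")
        index = bisect.bisect_left(self._ring, (self._point(key), ""))
        return self._ring[index % len(self._ring)][1]
```

With a plain `hash(key) % server_count` scheme, adding a server remaps most keys. On the ring, a key changes owner only if the new server's points land between the key and its previous owner, so existing servers keep most of their keys.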
Protip
The environment in a public cloud is inherently more variable (co-tenants, abusive or heavy users, etc.)
Make sure your code is written to handle this -- state should be kept somewhere shared and redundant, not on the instance.
Best practices
Keep data in multiple Availability Zones
Avoid keeping state on a single instance
Take frequent snapshots of EBS disks
No secret keys on the instance
Different functions in different Security Groups