Not a total experimental failure - an experience report on the Trove architecture
Update handling, load balancing and fault tolerance in Trove.
by Kent Fitch
on 2 December 2011
Transcript of Not a total experimental failure - an experience report on the Trove architecture
Not a total experimental failure - an experience report on the Trove architecture
Mark Triggs, Kent Fitch

Scalability
Availability
Directions: More data, More integration

Trove is big...
Not big like Google
[Chart: NLA revenue, Google revenue]
Big, like the local bully
[Chart: Trove, VuFind, LA (PB), MA (PB), PA (PB)]

Each day: 30K visits
400K pageviews
..."mostly newspapers"

Updates averaged per day:
Gale - 90k
Newspaper articles - 80k
LA - 35k
Pandora webpages - 25k
OpenLibrary - 10k
Newspaper corrections - 8k
Misc OAI sources - 7k
HathiTrust - 6k
Tags/comments/merges - 2k
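
As a rough worked figure, the per-source daily averages on this slide can be summed into an overall update rate. The sketch below is illustrative only: the function names are hypothetical, and spreading the updates evenly over 24 hours is an assumption (real update traffic is probably bursty).

```python
# Average daily update counts, in thousands, as listed on the slide.
UPDATES_PER_DAY_K = [90, 80, 35, 25, 10, 8, 7, 6, 2]

SECONDS_PER_DAY = 24 * 60 * 60


def total_updates_per_day(counts_k):
    """Sum the per-source counts and convert thousands to units."""
    return sum(counts_k) * 1000


def updates_per_second(counts_k):
    """Average rate, assuming updates are spread evenly over the day."""
    return total_updates_per_day(counts_k) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"total: {total_updates_per_day(UPDATES_PER_DAY_K)} updates/day")
    print(f"rate:  {updates_per_second(UPDATES_PER_DAY_K):.2f} updates/s")
```

That comes to about 263k updates per day, or roughly three per second on average.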
5 Lucene indices:
Books etc - 80GB
Articles - 350GB
Newspapers - 280GB
Pandora - 260GB
People - 7GB

2 mySQL dbs:
Trove - 500GB
Newspapers - 600GB
+ 60TB of newspaper image derivatives

6 servers + prod mySQL server
Servers: berry, stick, marley, threepwood, stump, largo
[Diagram labels: transactions - Trove UI; update "master"; trove mySQL; slave (x4)]

servers are pretty similar
64GB memory
12 CPUs
500GB+ ssd
8 on stick and largo
no ssd on berry
~$10-$15K each

SSD makes all the difference when querying a large index
normal disk random read takes 5-10ms
SSD random read takes 0.1ms
normal disk costs 10 cents/GB
fancy disk costs x10 more
SSD costs $1.80/GB
still, just ~15% of server cost
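
The latency and price figures on this slide can be turned into ratios with a little arithmetic. This is a minimal sketch using only the numbers quoted above; the constant and function names are hypothetical, and the reads-per-second figure assumes one outstanding random read at a time, which ignores queueing and caching.

```python
# Figures quoted on the slide.
DISK_RANDOM_READ_MS = (5.0, 10.0)  # normal disk, low and high estimate
SSD_RANDOM_READ_MS = 0.1
DISK_COST_PER_GB = 0.10            # dollars
SSD_COST_PER_GB = 1.80             # dollars


def ssd_speedup_range():
    """How many times faster an SSD random read is than a disk read."""
    return tuple(ms / SSD_RANDOM_READ_MS for ms in DISK_RANDOM_READ_MS)


def serial_reads_per_second(latency_ms):
    """Random reads per second if reads are issued strictly one at a time."""
    return 1000.0 / latency_ms


def ssd_price_ratio():
    """SSD price per GB relative to normal disk."""
    return SSD_COST_PER_GB / DISK_COST_PER_GB


if __name__ == "__main__":
    low, high = ssd_speedup_range()
    print(f"SSD random reads are {low:.0f}x to {high:.0f}x faster")
    print(f"SSD costs {ssd_price_ratio():.0f}x more per GB")
    print(f"disk: {serial_reads_per_second(10.0):.0f} to "
          f"{serial_reads_per_second(5.0):.0f} serial reads/s")
    print(f"SSD:  {serial_reads_per_second(SSD_RANDOM_READ_MS):.0f} serial reads/s")
```

On these numbers an SSD costs 18 times more per GB but serves random reads 50 to 100 times faster, which is the trade the slide is pointing at for query-heavy index servers.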
[Diagram: index search slave, local SSD, queries, responses, SAN (disk), index master, updates]

[Diagram: index search slave, local SSD, index master]
Q: got a new index for me?
A: yes! take this...
write index

2 copies of each index are distributed across the slaves (3 copies for newspapers)
A slave can fail, and the system still works
Load balancers direct requests to slaves
The Trove "UI" JVM is managed like an index slave

take it away, Mark...

Scale a bit more
Add more index copies (slaves)
Split ("shard") index, then replicate shards
Scale a lot more

House of cards?
Single points of failure..
DB servers
front end dispatcher
SAN
internal network
external network
power supply
general mayhem

a bit complicated...
...vast amounts of data compared to LA
[Diagram: NLA Harvester, CBS pusher, NCM, Trove UI, Pandas]

Growth
100s of millions of articles?
whole domain web harvest ~100TB?
mass book digitisation?
manuscripts digitisation?
ongoing newspaper & magazine digitisation?

Integration
Hathi?
ANDS?
AustLit?
! don't mention OAIster !

Hathi mirror? (8.6m volumes, 7k tons, 400TB)
OAIster?
ANDSs?
e-repository?
ABC archives?
electronic books?

http://www.austlit.edu.au:7777/mockups/trove/home.html

"We don't need no stinkin' hardware maintenance"
Overhead of "external" load balancing?

What is to become of Trove?
Can we shard and replicate forever?
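
The replication scheme described in the talk (two copies of each index spread across the slaves, three for newspapers, with load balancers directing requests to whichever slaves are up) and the "shard, then replicate" idea can be sketched in a few lines. This is an illustrative sketch, not Trove's implementation: the slave names, the round-robin placement policy and all function names are assumptions made for the example.

```python
def place_replicas(copies_per_index, slaves):
    """Assign each index's copies to distinct slaves, round-robin.

    copies_per_index maps an index (or shard) name to its copy count.
    Returns a dict mapping each slave to the names it holds.
    """
    placement = {slave: [] for slave in slaves}
    cursor = 0
    for name, copies in copies_per_index.items():
        if copies > len(slaves):
            raise ValueError(f"{name}: more copies than slaves")
        # Consecutive positions on the ring are always distinct slaves.
        for offset in range(copies):
            slave = slaves[(cursor + offset) % len(slaves)]
            placement[slave].append(name)
        cursor = (cursor + copies) % len(slaves)
    return placement


def pick_slave(name, placement, healthy, request_number):
    """Load balancer: rotate over the healthy slaves that hold `name`."""
    candidates = [s for s in placement if s in healthy and name in placement[s]]
    if not candidates:
        raise LookupError(f"no healthy slave holds {name}")
    return candidates[request_number % len(candidates)]


def survives_any_single_failure(copies_per_index, placement):
    """True if every index is still served after any one slave fails."""
    for failed in placement:
        survivors = [s for s in placement if s != failed]
        for name in copies_per_index:
            if not any(name in placement[s] for s in survivors):
                return False
    return True


def shard(index_name, shard_count, copies):
    """Split one index into named shards, each replicated `copies` times."""
    return {f"{index_name}/{i}": copies for i in range(shard_count)}


if __name__ == "__main__":
    slaves = ["slave1", "slave2", "slave3", "slave4"]  # hypothetical names
    copies = {"books": 2, "articles": 2, "newspapers": 3,
              "pandora": 2, "people": 2}
    placement = place_replicas(copies, slaves)
    for slave, held in placement.items():
        print(slave, held)
    print("survives one failure:", survives_any_single_failure(copies, placement))
    # With slave1 down, requests for "books" go to the remaining copy.
    print(pick_slave("books", placement, {"slave2", "slave3", "slave4"}, 0))
    print(place_replicas(shard("newspapers", 2, 3), slaves))
```

The sketch also hints at why "forever" is an open question: every added shard or copy multiplies the number of index pieces that have to be placed, refreshed from the master and health-checked.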





