Big Data - NoSQL - MongoDB

ted cole

on 5 June 2013

Transcript of Big Data - NoSQL - MongoDB

Big Data

90% of the data in the world today has been created in the last two years.

Data comes from sensors used to gather climate information, posts to social media sites, digital pictures and videos, and purchase transaction records.

Big data spans several dimensions: volume, velocity and variety.

Volume
Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
Convert 350 billion annual meter readings to better predict power consumption

Velocity
Sometimes 2 minutes is too late for time-sensitive processes such as catching fraud.
Scrutinize 5 million trade events created each day to identify potential fraud
Analyze 500 million daily call detail records in real time to predict customer churn faster

Variety
Big data is any type of data, structured and unstructured, such as text, sensor data, audio, video, click streams, log files and more.
Monitor hundreds of live video feeds from surveillance cameras to target points of interest
Exploit the 80% data growth in images, video and documents to improve customer satisfaction

Big Data Benefits
Smarter decisions – Leveraging new sources of data to improve the quality of decision-making.
Faster decisions – Enabling more real-time data capture and analysis to support decision-making at the "point of impact," for example, when a customer is navigating your website or on the telephone with a customer service representative.
Decisions that make a difference – Focusing big data efforts on areas that can provide true differentiation.

NoSQL (Not Only SQL)
Key-Value Stores: Amazon
Big Table: Google (each row with individual schema)
Document Databases: CouchDB, MongoDB
Graph Databases: AllegroGraph, Neo4j

What NoSQL means:
Non-relational, next-generation operational datastores and databases
Horizontally scalable
No joins
No complex transactions
No schema

MongoDB

Characteristics
JSON-style documents (BSON)
Flexible schemas (with lists of values, dates and embedded documents)
Replication, auto-sharding
Queries run in parallel on all shards
Supports indexing

Basic Components
The database server
The interactive shell
The sharding router

Performance
http://blog.michaelckennedy.net/2010/04/29/mongodb-vs-sql-server-2008-performance-showdown/

Terminology (SQL term - MongoDB term)
database - database
table - collection
row - document
column - field

Flexible Schemas

{"author": "ted",
 "text": "bla bla"}

{"author": "ted",
 "text": "blabla",
 "tags": ["pao", "basket"]}

Example: Insert

post = {"author": "ted",
        "text": "blabla",
        "tags": ["pao", "basket"]}

db.posts.insert(post)

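As a rough illustration of the flexible-schema idea, the sketch below uses a plain JavaScript array as a stand-in for a collection. It does not call MongoDB: `insertDoc` and `nextId` are hypothetical helpers invented for this example, and the integer `_id` is a simplification of the ObjectId that real drivers generate by default.

```javascript
// In-memory stand-in for a collection: an array of plain objects.
// NOT the MongoDB API; insertDoc and nextId are hypothetical helpers.
let nextId = 1;

function insertDoc(collection, doc) {
  // Copy the document so the caller's object is left untouched.
  const stored = Object.assign({}, doc);
  // Add a default _id when the caller did not supply one
  // (real drivers generate an ObjectId; an integer is used here for brevity).
  if (!("_id" in stored)) {
    stored._id = nextId++;
  }
  collection.push(stored);
  return stored._id;
}

const posts = [];
insertDoc(posts, { author: "ted", text: "bla bla" });
insertDoc(posts, { author: "ted", text: "blabla", tags: ["pao", "basket"] });

// Documents in the same collection may carry different fields:
// only the second post has a "tags" array.
const withTags = posts.filter((p) => Array.isArray(p.tags));
console.log(posts.length, withTags.length);
```

Both documents live side by side even though only one of them has `tags`, which is the property the two sample documents are meant to show.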
_id: unique key identifier (if not specified, drivers will add a default; it can be set to any type other than an array)

Examples: Dynamic Queries

db.posts.find({author: "ted"})

db.posts.find().sort({date: -1}).limit(10)

aug_1 = new Date(2010, 7, 1)

db.posts.find({date: {$gt: aug_1}})

Update

c = {author: "dinos",
     date: new Date(),
     text: "great post"}

db.posts.update({ /* query selecting the post */ }, {$push: {comments: c}})

Indexing (B-tree)

db.posts.ensureIndex({tags: 1})

db.posts.ensureIndex({"comments.author": 1})

db.posts.find({"comments.author": "dinos"})

Replica Sets

Replica Sets failover

Questions?

MongoDB in a nutshell

{
  "_id" : ObjectId("5081c97c7833857c5588f336"),
  "name" : "mongo",
  "type" : "db",
  "doc_links" : {
    "installation" : "http://docs.mongodb.org/manual/installation/",
    "tutorial" : "http://docs.mongodb.org/manual/tutorial/getting-started/",
    "reference" : "http://docs.mongodb.org/manual/reference/"
  },
  versions : [
    { "v" : "2.0.0", "released" : ISODate("2011-09-11T16:22:17Z"), "stable" : true },
    { "v" : "2.0.1", "released" : ISODate("2011-10-22T03:06:14Z"), "stable" : true },
    { "v" : "2.1.0", "released" : ISODate("2012-02-03T17:54:14Z"), "stable" : false },
    { "v" : "2.2.0", "released" : ISODate("2012-09-24T17:38:56Z"), "stable" : true }
  ],
  "features" : [ ],
  md5 : BinData(5,"nhB9nTcrtoJr2B01QqQZ1g==")
}

The name MongoDB comes from "humongous".

Production Use
Archiving - Craigslist
Content Management - MTV Networks
Ecommerce - CustomInk
Gaming - Disney
Real-time Analytics - Intuit
Social Networking - foursquare

http://www.mongodb.org/about/production-deployments/

SQL - MongoDB

"Inserts, removes and updates seem instantaneous because none of them waits for a database response" ... "If the server disappears, the client will happily send some writes to a server that isn't there, entirely unaware of its absence. For some applications, this is acceptable"

(This is a joke, move on to the next frame.)

General Use Cases
Bigness
Massive write performance
Fast key-value access
Flexible schema and flexible datatypes
Easier maintainability, administration and operations
No single point of failure

Facebook needs to store 135 billion messages a month.
At 80 MB/s, it would take Twitter a day to store 7 TB, so writes need to be distributed over a cluster.

Specific use cases
Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
Fast response times under all loads.
Real-time inserts, updates, and queries.
Caching. A high-performance caching tier for web sites and other applications. One example is a cache for the Data Aggregation System used by the Large Hadron Collider.
Real-time page view counters.
User registration, profile, and session data.
Analytics. Use MapReduce, Hive, or Pig to perform analytical queries and scale-out systems that support high write loads.
Federal law enforcement agencies tracking Americans in real-time using credit cards, loyalty cards and travel reservations.
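
The write-volume figure quoted under General Use Cases (80 MB/s against 7 TB per day) can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and an evenly loaded cluster. The one-hour target and the helper names `hoursToStore` and `nodesNeeded` are hypothetical, chosen only to show why the writes have to be spread across machines.

```javascript
// Figures quoted in the slides: 80 MB/s write rate, 7 TB of data per day.
const MB = 1e6;  // decimal megabyte (an assumption about the units)
const TB = 1e12; // decimal terabyte (an assumption about the units)
const SECONDS_PER_DAY = 24 * 60 * 60;

// Hours a single writer needs to store `bytes` at `bytesPerSecond`.
function hoursToStore(bytes, bytesPerSecond) {
  return bytes / bytesPerSecond / 3600;
}

// Smallest number of equally loaded nodes that absorb `bytes` within `seconds`.
function nodesNeeded(bytes, bytesPerSecond, seconds) {
  return Math.ceil(bytes / (bytesPerSecond * seconds));
}

// How much one node writing at 80 MB/s can store in a full day, in TB.
const dailyCapacityTB = (80 * MB * SECONDS_PER_DAY) / TB;

// Time for one node to store 7 TB, and nodes needed for a hypothetical 1-hour target.
const singleNodeHours = hoursToStore(7 * TB, 80 * MB);
const nodesForOneHour = nodesNeeded(7 * TB, 80 * MB, 3600);

console.log(dailyCapacityTB, singleNodeHours.toFixed(1), nodesForOneHour);
```

Under these assumptions a single node stores just under 7 TB in a day, so it takes slightly more than 24 hours to absorb one day of data and can never catch up. That is the reason the slide gives for distributing writes over a cluster.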