sunil kumar

by Sunil Kumar on 9 December 2015

Transcript of sunil kumar

abbas gadhia
Contents
What is MongoDB?
MongoDB vs Relational Databases
JSON Introduction
Mongo Shell
Installation
MongoDB is Schemaless
CRUD Operations (using MongoShell)
What is MongoDB?
Non-relational document store
MongoDB supports a dynamic schema, i.e. it is schemaless
JSON
{
  "firstName" : "Abbas",
  "lastName" : "Gadhia",
  "hobbies" : [ "cycling", "tennis", "photography" ],
  "address" : {
    "street" : "22 Jump Street",
    "city" : "Pune",
    "state" : "Maharashtra"
  }
}
____________________________________________________________

{
'firstName' : 'Abbas',
'lastName' : 'Gadhia',
'cityOfBirth' : 'New Delhi'
}
People
Collection
MongoDB vs Relational Databases
An object containing all of its information is stored as a single entity
{
  "firstName" : "Abbas",
  "lastName" : "Gadhia",
  "hobbies" : [ "cycling", "tennis", "photography" ],
  "address" : {
    "street" : "22 Jump Street",
    "city" : "Pune",
    "state" : "Maharashtra"
  }
}
{
'firstName' : 'Abbas',
'lastName' : 'Gadhia',
'cityOfBirth' : 'New Delhi'
}
MongoDB doesn't support JOINs!!!
Quick Question
Which of the following are true about MongoDB?
A. MongoDB is Document Oriented
B. MongoDB supports Joins
C. MongoDB has dynamic schema
D. MongoDB supports SQL

MongoDB Existence among DBs
JSON Introduction
JSON DataTypes
1. String
2. Number
3. Boolean (true/false)
4. null
5. Array [ value1, value2, value3 ]
6. Dictionary / Object { key1 : value1, key2 : value2 }
www.json.org
BSON DataTypes
1. ObjectID
2. Date/Timestamp
{
  "fullName" : "Abbas Gadhia",
  "randomNumber" : 123,
  "likesMongoDB" : true,
  "school" : null,
  "hobbies" : [ "cycling", "tennis", "photography" ],
  "address" : {
    "street" : "22 Jump Street",
    "city" : "Pune",
    "state" : "Maharashtra"
  }
}
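
A minimal sketch in plain JavaScript (it needs no MongoDB server) that parses the sample document and reports which JSON data type each top-level value has. The helper name jsonType is an illustrative assumption, not part of any library.

```javascript
// The sample document as JSON text.
const text = `{
  "fullName": "Abbas Gadhia",
  "randomNumber": 123,
  "likesMongoDB": true,
  "school": null,
  "hobbies": ["cycling", "tennis", "photography"],
  "address": { "street": "22 Jump Street", "city": "Pune", "state": "Maharashtra" }
}`;

// jsonType is a hypothetical helper: it maps a parsed value to its JSON type name.
function jsonType(value) {
  if (value === null) return "null";           // typeof null is "object", so check first
  if (Array.isArray(value)) return "array";    // arrays are also "object" for typeof
  return typeof value;                         // "string", "number", "boolean" or "object"
}

const doc = JSON.parse(text);
const types = {};
for (const [key, value] of Object.entries(doc)) {
  types[key] = jsonType(value);
}

console.log(types);
```

ObjectID and Date are not in this list because plain JSON has no syntax for them; they are BSON additions.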
MongoShell
MongoDB Installation
As a standalone daemon
As a service
MongoDB is Schemaless
Sample Blog - Post Collection
Sample Post Collection of Blog
{
"_id" : ObjectId("5143ddf3bcf1bf4ab37d9c6d"),
"body" : "We the People of the United States, in Order to form a more perfect Union, establish Justice...",
"permalink" : "cxzdzjkztkqraoqlgcru",
"author" : "machine",
"title" : "US Constitution",
"tags" : [ "january", "mine", "modem", "literature", "saudi arabia", "rate", "package", "respect", "bike", "cheetah" ],
"comments" : [
{
"body" : "Lorem ipsum dolo...",
"email" : "ZoROirXN@thUNmWmY.com",
"author" : "Gisela Levin"
},
{
"body" : "Lorem ipsum dolor sit amet, conse...",
"email" : "eAYtQPfz@kVZCJnev.com",
"author" : "Kayce Kenyon"
}
],
"date" : ISODate("2013-03-16T02:50:27.874Z")
}
Setting Path variable: PATH="C:\Abbas\other\MongoDB"

show dbs
use test
show collections


CRUD
Create
Read
Update
Delete
Create / Insert
Question:
Insert a doc into the 'fruit' collection with these attributes:

name being apple
color being red
shape being round
Read
mongo
use gradesdb
db.grades.find().pretty()
it
Find
Inequality on Strings
use test
db.people.insert( { name: "Jones", age: 35, "profession": "baker" })
db.people.insert( { name: "Smith", age: 25, "profession": "musician" })
db.people.insert( { name: "Alice" })
db.people.insert( { name: "Bob" })
db.people.insert( { name: "Fred" })
db.people.insert( { name: "Edgar" })
db.people.insert( { name: "Dave" })
db.people.insert( { name: "Charlie" })
db.people.insert( { name: 42 })

db.people.find({name: { $gt: "B", $lt : "D" } })
db.people.find({name: "Jones"}, { name: true, _id: false}) // projection
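
The string range query above can be approximated in plain JavaScript. This is only a sketch under two assumptions: the names are ASCII, so JavaScript's string comparison agrees with MongoDB's default comparison, and the numeric name 42 is skipped because a comparison against a string only matches string values.

```javascript
// Plain-JavaScript stand-in for: db.people.find({ name: { $gt: "B", $lt: "D" } })
const names = ["Jones", "Smith", "Alice", "Bob", "Fred", "Edgar", "Dave", "Charlie", 42];

const matched = names.filter(
  // Only strings are compared; "Bob" > "B" because a longer string sorts after its prefix.
  (name) => typeof name === "string" && name > "B" && name < "D"
);

console.log(matched); // [ 'Bob', 'Charlie' ]
```

"Dave" is excluded for the same prefix reason: it sorts after "D".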
$regex, $exists, $type
Question:
What does it do??

db.scores.find( { score: {$gt: 50}, score: {$lt: 60} } );

Find all docs with score between 50 & 60
Find all docs with score greater than 50
Find all docs with score less than 60 - correct
None of above
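
Why only the "less than 60" condition survives: the shell parses the query as a JavaScript object literal, and a repeated key keeps only its last value. A short plain-JavaScript sketch (no server needed) shows this, together with the form that does express "between 50 and 60".

```javascript
// The second "score" key silently replaces the first one.
const query = { score: { $gt: 50 }, score: { $lt: 60 } };
console.log(JSON.stringify(query)); // {"score":{"$lt":60}}

// Put both operators inside one sub-document to keep both conditions.
const rangeQuery = { score: { $gt: 50, $lt: 60 } };
console.log(JSON.stringify(rangeQuery)); // {"score":{"$gt":50,"$lt":60}}
```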
Querying inside Arrays
db.accounts.insert({ name: "Howard", favorites: [ "fruits", "pizza" ] })
db.accounts.insert({ name: "George", favorites: [ "ice cream", "fruits" ] })
db.accounts.insert({ name: "John", favorites: [ "pizza", "cheese" ] })
db.accounts.insert({ name: "Avril", favorites: [ "fruits", "pizza", "cheese" ] })
Question:
Which documents will be matched by the query: db.products.find({tag: "shiny"})
1. {_id:0, tag: ["awesome","shiny"]}
2. {_id:1, tag: "shiny"}
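
Both documents match, because an equality condition on a field matches either the value itself or any element of an array stored in that field. The sketch below is a simplified plain-JavaScript model of that rule, not MongoDB's actual matcher; matchesEquality and the third document are illustrative assumptions.

```javascript
// Simplified model of { field: value } matching for scalar query values.
function matchesEquality(doc, field, value) {
  const stored = doc[field];
  return Array.isArray(stored) ? stored.includes(value) : stored === value;
}

const products = [
  { _id: 0, tag: ["awesome", "shiny"] },
  { _id: 1, tag: "shiny" },
  { _id: 2, tag: "dull" } // hypothetical extra document that should not match
];

const matchedIds = products
  .filter((product) => matchesEquality(product, "tag", "shiny"))
  .map((product) => product._id);

console.log(matchedIds); // [ 0, 1 ]
```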
Cursors

cur = db.people.find(); null;
cur.hasNext()
cur.next()
while(cur.hasNext()){ printjson(cur.next()); }

cur = db.people.find(); null;
cur.limit(5); null;
cur.sort( { name: -1 } ); null;
while(cur.hasNext()){ printjson(cur.next()); }

cur.sort( { name: -1 } ).limit(3).skip(2); null;
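
Whatever order the cursor methods are chained in, the server applies sort first, then skip, then limit. A plain-JavaScript sketch of the last cursor line, assuming a collection that holds only the eight string names; runCursor is a hypothetical helper, not a shell function.

```javascript
const names = ["Jones", "Smith", "Alice", "Bob", "Fred", "Edgar", "Dave", "Charlie"];

// Models sort({ name: -1 }) followed by skip and limit, in that fixed order.
function runCursor(values, { skip = 0, limit = Infinity }) {
  const sortedDescending = [...values].sort().reverse();
  return sortedDescending.slice(skip, skip + limit);
}

// Same options as cur.sort({ name: -1 }).limit(3).skip(2)
const page = runCursor(names, { limit: 3, skip: 2 });

console.log(page); // [ 'Fred', 'Edgar', 'Dave' ]
```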
Count

db.people.count()
db.people.find().count()
db.grades.find({score:{$gt: 50}}).count()
MongoDB doesn't support multi-document TRANSACTIONS!!!
Instead store pre-joined data in Mongo Collection
Users collection
Posts collection
Sessions collection
Query Operators - $gt, $lt, $gte, $lte
db.grades.find( { score: { $lt : 99, $gte : 90 } }).count()
db.grades.find( { score: { $lt : 99, $gte : 90 }, student_id : 30 })
db.people.find({ profession: {$exists: true} })
db.people.find({ profession: {$exists: false} })

db.people.find({ name: {$type: 2} }) // matches docs whose name is a string (BSON type 2)
db.people.find({ name: {$type: 1} }) // matches docs whose name is a double (BSON type 1)

// not optimizable
db.people.find({ name: {$regex: "a" }})
db.people.find({ name: {$regex: "e$" }})

// optimized regex
db.people.find({ name: {$regex: "^A" }})
$or
db.people.find({ $or:[ {name:{$regex:"e$"}} , {age:{$exists:true}}] })

$and
db.people.find( { $and: [ { name: { $gt : "C" } }, { name : { $regex : "a" } } ] } )
db.people.find( { name: { $gt: "C", $regex: "a" } } )
db.accounts.find({ favorites: "pizza" })
db.accounts.find({ favorites: "pizza", name: {$gt : "H"} })
db.accounts.find({ "favorites.0": "pizza" })
$in & $all

Query: find accounts whose favorites include both "fruits" and "pizza"

db.accounts.find({ favorites: { $all: ["pizza", "fruits"] } })

db.accounts.find({ name: { $in : [ "John" , "Avril" ] } })
db.accounts.find({ favorites: { $in : [ "pizza" , "ice cream" ] } })
Querying inside dictionaries / JS objects
dot notation

{
name: "sunil",
email: {
work: "sunil.kumar28@searshc.com",
personal: "mail@sunilkumar.in"
}
}


Exact match
db.users.find({email: { work: "sunil.kumar28@searshc.com", personal: "mail@sunilkumar.in" } })
db.users.find({email: { personal: "mail@sunilkumar.in", work: "sunil.kumar28@searshc.com" } })
// does not match: the field order differs from the stored BSON document


Not Exact match
db.users.find({ email: { personal: "mail@sunilkumar.in" } }) // does not match: the query must equal the whole embedded document
db.users.find({ "email.personal": "mail@sunilkumar.in" })
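
The difference between the two styles can be sketched in plain JavaScript. This is a simplified model, not MongoDB's matcher: exactMatch compares serialized text so that field order matters, as it does for embedded-document equality, while dotMatch walks the path one key at a time. Both helper names are illustrative assumptions.

```javascript
const user = {
  name: "sunil",
  email: { work: "sunil.kumar28@searshc.com", personal: "mail@sunilkumar.in" }
};

// Order-sensitive comparison of a whole embedded document.
function exactMatch(stored, query) {
  return JSON.stringify(stored) === JSON.stringify(query);
}

// Dot notation: follow "email.personal" key by key, then compare the leaf value.
function dotMatch(doc, path, value) {
  const leaf = path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
  return leaf === value;
}

const sameOrder = exactMatch(user.email, { work: "sunil.kumar28@searshc.com", personal: "mail@sunilkumar.in" });
const swapped = exactMatch(user.email, { personal: "mail@sunilkumar.in", work: "sunil.kumar28@searshc.com" });
const partial = exactMatch(user.email, { personal: "mail@sunilkumar.in" });
const dotted = dotMatch(user, "email.personal", "mail@sunilkumar.in");

console.log(sameOrder, swapped, partial, dotted); // true false false true
```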
Contents Continued...
CRUD in detail (using RoboMongo)
Setup eclipse with maven & mongo driver
BasicDBObject
QueryBuilder
Importing grades data


mongoimport --db gradesdb -c grades < "C:\Sunil\training\MongoDB\grades.839101a18c6f.json"

Updating (wholesale update)

db.people.update({name: "Smith"}, {name: "Thompson", salary: 50000})
Updating (partial update)

$set, $inc, $unset
db.people.update( {name:"Alice"}, { $set: {age: 23} } )
db.people.update( {name:"Alice"}, { $inc: {age: 1} } )
db.people.update( {name: "Jones"}, { $unset:{ profession: 1 } } )
Updating arrays
$set, $push, $pop, $pushAll, $pull, $pullAll, $addToSet

db.arrays.insert( { _id: 0, a:[ 1,2,3,4 ] }); db.arrays.find();

db.arrays.update({_id:0}, { $set: { "a.2" : 5 }}); db.arrays.find();
db.arrays.update({_id:0}, { $push: { "a" : 6 }}); db.arrays.find();
db.arrays.update({_id:0}, { $pop: { "a" : 1 }}); db.arrays.find();
db.arrays.update({_id:0}, { $pop: { "a" : -1 }}); db.arrays.find();

db.arrays.update({_id:0}, {$pushAll: { a: [ 7,8,9 ] }}); db.arrays.find();
db.arrays.update({_id:0}, {$pull: { a: 5 }}); db.arrays.find();
db.arrays.update({_id:0}, {$pullAll: { a: [ 2,4,8] }}); db.arrays.find();
db.arrays.update({_id:0}, {$addToSet : { a : 5 } }); db.arrays.find();
db.arrays.update({_id:0}, {$addToSet : { a : 5 } }); db.arrays.find();
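
To follow what each operator does to the array, the same sequence can be replayed on an ordinary JavaScript array. This is a sketch of the semantics only; nothing here talks to MongoDB, and addToSet is a hypothetical helper.

```javascript
let a = [1, 2, 3, 4];

a[2] = 5;                                        // $set "a.2": 5      -> [1,2,5,4]
a.push(6);                                       // $push 6            -> [1,2,5,4,6]
a.pop();                                         // $pop 1 (last)      -> [1,2,5,4]
a.shift();                                       // $pop -1 (first)    -> [2,5,4]
a.push(...[7, 8, 9]);                            // $pushAll [7,8,9]   -> [2,5,4,7,8,9]
a = a.filter((x) => x !== 5);                    // $pull 5            -> [2,4,7,8,9]
a = a.filter((x) => ![2, 4, 8].includes(x));     // $pullAll [2,4,8]   -> [7,9]

// $addToSet appends only when the value is not already present.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}
addToSet(a, 5);                                  // -> [7,9,5]
addToSet(a, 5);                                  // no change the second time

console.log(a); // [ 7, 9, 5 ]
```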
Upsert
db.people.find()

db.people.update({name: "Alice"}, {$set: {age:26}}); db.people.find();
db.people.update({name: "George"}, {$set: {age:26}}, {upsert: true}); db.people.find();
Multi - doc updating

db.people.update( {}, { $set: { title: "Dr." } })
db.people.update( {}, { $set: { title: "Dr." }}, {multi: true})
Removing docs

db.people.remove( {name: "Alice"} ); db.people.find();
db.people.remove( {name: { $gt: "D" }} ); db.people.find();

db.people.remove({}); db.people.find(); // removes the documents one by one
db.people.drop(); db.people.find(); // drops the whole collection, including its indexes
db.getLastError()

db.people.insert( {_id: "Smith", age: 30} ); db.people.find();
> db.people.insert( {_id: "Smith", age: 30} );
E11000 duplicate key error index: test.people.$_id_ dup key: { : "Smith" }

db.getLastErrorObj()
Document Representation

import java.util.Arrays;
import java.util.Date;
import com.mongodb.BasicDBObject;

BasicDBObject doc = new BasicDBObject();
doc.put("userName", "jyemin");
doc.put("birthDate", new Date(234832423));
doc.put("programmer", true);
doc.put("age", 18);
doc.put("languages", Arrays.asList("Java", "C++"));
doc.put("address", new BasicDBObject("street", "20 Main")
        .append("town", "Westfield")
        .append("zip", "56789"));
Creating query

DBObject query = new BasicDBObject("x", 0)
        .append("y", new BasicDBObject("$gt", 10).append("$lt", 90));
// builds: { x : 0, y : { $gt : 10 , $lt : 90 } }

Query Builder

QueryBuilder builder = QueryBuilder.start("x").is(0)
        .and("y").greaterThan(10).lessThan(90);
DBObject query = builder.get();
// builds: { x : 0, y : { $gt : 10 , $lt : 90 } }
Contents Continued...
Setup eclipse, maven & mongodriver
BasicDBObject
QueryBuilder
MongoClient
Querying mongo using mongo-java-driver
MongoClient client = new MongoClient();
DB db = client.getDB("course");
DBCollection collection = db.getCollection("findTest");
Contents
Replication
Introduction to Replication
Replica set election
Rollback
Write Concerns
Read Preference
Sharding
Horizontal scalability
Implications of sharding
Choosing a shard key
Replication

Replication is a technique for increasing fault tolerance
Data is replicated among multiple nodes ASYNCHRONOUSLY

Sharding

Sharding is how we scale out in MongoDB
It allows a collection to be split across multiple shards (instances)
Introduction to Replication
availability
fault tolerance
If the primary goes down
the remaining nodes (2 nodes) will hold an election
to elect a primary you must have a strict majority of the nodes
One of the secondaries becomes primary
If the old primary node comes back up
It will rejoin as a secondary
Replica Set = One Primary + secondaries
If you have fewer than 3 nodes and one goes down, the remaining node is not a majority of the replica set, so no primary can be elected.

Without a primary node, the application is blocked from writing
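
The strict-majority rule is simple arithmetic, sketched here in plain JavaScript. The function names are illustrative assumptions, and the model counts every member as a voter.

```javascript
// Smallest number of votes that is more than half of the members.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// A primary can be elected only if the reachable members form a majority.
function canElectPrimary(memberCount, reachableCount) {
  return reachableCount >= majority(memberCount);
}

console.log(canElectPrimary(3, 2)); // true:  3-node set, one node down
console.log(canElectPrimary(2, 1)); // false: 2-node set, one node down
```

This is why three members is the usual minimum: the set survives the loss of one node and can still elect a primary.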
Replica set elections
Types of replica set nodes
Regular - has data, can be primary, can vote
Arbiter - has no data, cannot be primary, only for voting
Delayed - disaster recovery node, can vote, cannot be primary, has data
Hidden - cannot be primary, can vote, has data
Write concerns

single primary at a time
write will go to primary
reads can be configured from secondary
with reads & writes both going to the primary, data is never stale
reads from secondaries may return stale data
during failover no writes are possible because there is no primary
Oplog

special capped collection
secondary will get oplog updates from primary based on TimeStamp
RollBack
Question: What happens if a node comes back up as a secondary after a failover and the oplog on the primary has looped?

The new node stays offline ( does not rejoin the replica set )
A rollback will occur
The entire dataset will be copied from the primary
Write Concerns
getLastError defaults

w = 1: acknowledged once the write is in the primary's RAM
j = true: acknowledged once the write is in the journal
fsync = true
Question: What are the w & j settings required to guarantee that an operation (CUD) has been persisted to disk?

w = 0, j = 0
w = 1, j = 1
w = 2, j = 0
Network Errors
Read Preference
primary
secondary
secondary preferred
primary preferred
nearest
Sharding
Horizontal scaling
Vertical Scaling
Sharding Architecture

Shards
Shards + Replica sets
mongos
mongod
config servers
shard key
If the query doesn't include the shard key - scatter gather
If the query includes the shard key - routed directly to the owning shard
Question: If the shard key is not included in a find operation and there are 3 shards, each one a replica set with 3 nodes, how many nodes will the find query hit?

1
3
9
6
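
A rough way to reason about this question, sketched in plain JavaScript under two stated assumptions: reads use the default read preference (one node, the primary, answers per shard), and a query that includes the full shard key is routed to exactly one shard. nodesContacted is a hypothetical helper.

```javascript
// Number of nodes that answer a find, given the assumptions above.
function nodesContacted(shardCount, queryHasShardKey) {
  return queryHasShardKey ? 1 : shardCount; // scatter gather asks every shard once
}

const scatterGather = nodesContacted(3, false);
const targeted = nodesContacted(3, true);

console.log(scatterGather, targeted); // 3 1
```

The number of nodes inside each replica set does not change the result, because only one node per shard serves the read.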
Implications of sharding
every doc should include the shard key
shard key is immutable
index that starts with the shard key
No shard key means -> scatter gather
Choosing a shard key
there is sufficient cardinality
Hot spotting -> avoid these
is immutable
Set up temp machines before applying it to prod
Contents
Indexes
Multi Key Indexes
Unique Indexes
Duplicate Keys
Sparse Index
Explain
Index Size
Hint
Aggregation Framework
$limit
$skip
$match
$sort
$project
$group
$unwind
Case Studies
FourSquare
Real Time Pricing Application
Indexes
Multi Key Indexes
db.multikey.insert({ tags: [ "cycling" , "tennis" , "football" ] , categories: [ "sports" , "hobbies" ] })

db.multikey.ensureIndex({ tags: 1 })
db.multikey.ensureIndex({ categories: 1 })
// error: a compound index cannot cover two array fields (cannot index parallel arrays)
db.multikey.ensureIndex( { tags: 1, categories: 1 } )


/* Similarly */
db.multikey.ensureIndex({ a: 1, b: 1})
db.multikey.insert({ a: [1,2,3] , b: [ "x", "y", "z" ] })
Unique Indexes

db.unique.insert({ a: 1, b: 1 })
db.unique.ensureIndex({ a: 1}, {unique: true})
db.unique.getIndexes()

db.unique.insert({ a: 1, b: 2 })
db.unique.insert({ a: 2, b: 1 })
db.unique.insert({ a: 2, b: 2 })
Duplicate Keys

db.dups.insert({ a: 1, b: 2, c: 3 })
db.dups.insert({ a: 1, b: 5, c: 6 })
db.dups.insert({ a: 2, b: "x", c: "y" })

db.dups.find()

db.dups.ensureIndex({ a: 1 }, {unique: true, dropDups: true})

db.dups.find()
Sparse Index

db.sparse.insert({ a: 1, b: 1, c: 1 })
db.sparse.insert({ a: 2, b: 2 })
db.sparse.insert({ a: 1, b: 1 })

db.sparse.ensureIndex({ c: 1 }, {unique: true, sparse: true})
db.sparse.find()

db.sparse.insert({ a: 3, b: 5 })
db.sparse.insert({ a: 3, b: 5, c: 1 })
db.sparse.find()
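
Why the sparse option matters here can be sketched in plain JavaScript. This is a simplified model, not the real index implementation: a sparse index leaves out documents that lack the field, while a non-sparse unique index stores a missing field as null, so a second document without the field counts as a duplicate. makeUniqueIndex is a hypothetical helper.

```javascript
// Returns a function that reports whether a document is accepted by a unique index on `field`.
function makeUniqueIndex(field, sparse) {
  const keys = new Set();
  return function accept(doc) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) return true;        // sparse: document is not indexed at all
    const key = hasField ? doc[field] : null;    // non-sparse: missing field indexed as null
    if (keys.has(key)) return false;             // duplicate key
    keys.add(key);
    return true;
  };
}

const docs = [
  { a: 1, b: 1, c: 1 },
  { a: 2, b: 2 },
  { a: 1, b: 1 },
  { a: 3, b: 5 },
  { a: 3, b: 5, c: 1 }
];

const sparseResults = docs.map(makeUniqueIndex("c", true));
const nonSparseResults = docs.map(makeUniqueIndex("c", false));

console.log(sparseResults);    // [ true, true, true, true, false ]
console.log(nonSparseResults); // [ true, true, false, false, false ]
```

With the sparse index only the last insert is rejected, because c: 1 already exists.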
Explain

db.testdata.find().explain()
db.testdata.find({ x: 80 }).explain()

Index Size

db.testdata.totalIndexSize()
db.testdata.stats()

Hint

db.testdata.find({ x: { $gt: 80 } }).hint({ y: 1 }).explain()
db.testdata.find({ x: { $gt: 80 } }).hint({ x: 1 }).explain()
db.testdata.find({ x: { $gt: 80 } }).hint({ $natural: 1 }).explain()
______________________________________________________________________________

AGGREGATION FRAMEWORK
______________________________________________________________________________
$limit

db.zip.aggregate([
{ $limit : 5 }
])
$skip
db.zip.aggregate([
{ $skip : 2 }
])

$match
db.zip.aggregate([
{ $match : { state: "NY" } },
{ $limit : 3 }
])

$sort
db.zip.aggregate([
{ $match : { state: "NY" } },
{ $sort : { pop : -1 } },
{ $limit : 3 }
])
$project
db.zip.aggregate([
{ $match : { state: "NY" } },
{ $sort : { pop : -1 } },
{ $project :
{
_id : 0,
zip_code: "$_id",
city : { $toLower : "$city" },
population : "$pop"
}
},
{ $limit : 3 }
])
{
pop : 267490,
state : "NY",
loc : [ -1.64323454712, 15.67632234343 ], // [ longitude, latitude ]
_id : 411014,
city : "NEW YORK"
}
$group

/* top 3 cities in New York by population */

db.zip.aggregate([
{ $match : { state: "NY" } },
{ $project :
{
_id : 0,
zip_code: "$_id",
city : { $toLower : "$city" },
population : "$pop"
}
},
{ $group :
{
_id : "$city",
population : { $sum : "$population" }
}
},
{ $sort : { population : -1 } },
{ $limit : 3 }
])
$min, $max, $avg
db.testdata.aggregate([
{ $group:
{
_id : null,
minX : { $min : "$x" },
maxY : { $max : "$y" },
avgX : { $avg : "$x" },
avgY : { $avg : "$y" },
avgZ : { $avg : "$z" }
}
},
{ $project :
{
"_id" : 0,
minimumX : "$minX",
maximumY : "$maxY",
averageX : "$avgX",
averageY : "$avgY",
averageZ : "$avgZ"
}
}
])
{
_id : ObjectId("8921374"),
index : 52,
x : 34,
y : 87,
z : 12
}
$unwind
db.unwind.insert({ person : "A", likes: ['cycling', 'music', 'sleeping'] })
db.unwind.insert({ person : "B", likes: ['swimming', 'sleeping', 'reading'] })
db.unwind.insert({ person : "C", likes: ['cycling', 'swimming', 'sleeping'] })

db.unwind.aggregate([
{ $unwind: "$likes" }
])

db.unwind.aggregate([
{ $unwind: "$likes" },
{ $group:
{
_id: "$likes",
totalPersons: { $sum : 1 }
}
}
])
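
What the two pipelines compute can be reproduced on the same three documents in plain JavaScript. This is a sketch of the stage semantics only, with no MongoDB involved: $unwind becomes a flatMap that emits one document per array element, and $group with $sum: 1 becomes a counter per distinct value.

```javascript
const people = [
  { person: "A", likes: ["cycling", "music", "sleeping"] },
  { person: "B", likes: ["swimming", "sleeping", "reading"] },
  { person: "C", likes: ["cycling", "swimming", "sleeping"] }
];

// $unwind: "$likes" -> one output document per element of the likes array
const unwound = people.flatMap((doc) =>
  doc.likes.map((like) => ({ person: doc.person, likes: like }))
);

// $group: { _id: "$likes", totalPersons: { $sum: 1 } }
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc.likes, (counts.get(doc.likes) || 0) + 1);
}
const grouped = [...counts].map(([like, total]) => ({ _id: like, totalPersons: total }));

console.log(unwound.length); // 9
console.log(grouped);
```

Here "sleeping" ends up with a totalPersons of 3, since all three people list it.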
Check currently running operations

db.currentOp()

db.killOp(233221)
FourSquare
Mobile app that gives suggestions based on where your friends have been
5 million check-ins per day
The checkins collection had 2.5 billion documents last year
Uses short key names in the checkins collection
2 - 3 indexes on the checkins collection & fewer than 10 indexes on other collections
Uses geospatial indexing
Uses hints to make sure queries use the right index
[2009]
Database story
First built on MySQL
Then moved to Postgres to speed up
Then moved to MongoDB for sharding & speed in 2010
Moved to Scala for its features
WRITES went to both MySQL & MongoDB, while READS were slowly shifted from MySQL to MongoDB
You'll get a compile-time error in Scala if you don't use an index in your queries
Problem: in replication, MongoDB didn't know what to do if a server was too slow
Real Time Pricing Application
Questions ???