
a little information about NoSQL

injae yeo

on 5 October 2012


CAP, ACID, BASE: a little information about NoSQL

CAP theorem: consistency, availability, partition tolerance

What's CAP?
Three properties desired from a distributed system.
Consistency (also called atomic or linearizable consistency; this is not the same as Atomicity in ACID)
Each operation looks as if it were completed at a single instant.

Availability
Every request received by a non-failing node must eventually result in a response.

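Partition tolerance
The network may lose messages between nodes; no set of failures short of a total network failure may cause the system to respond incorrectly.

What's the CAP theorem?
It is about distributed systems in general, not about NoSQL specifically.
But most NoSQL systems are built for scalability across many servers, so it is frequently cited when NoSQL is discussed.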

It is impossible to guarantee all three CAP properties at once in the asynchronous network model; at most two of them can be guaranteed.
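The trade-off can be made concrete with a toy Python model. Everything in it is a hypothetical illustration, not a real system: the names `PartitionedStore` and `Mode` are invented, and "G2 ignores the request" is modeled as returning `None` instead of blocking forever. A write lands in G1, a read arrives at G2 across the partition, and each pair of CAP properties forces a different reply, mirroring the proof by contradiction later in this transcript.

```python
from enum import Enum


class Mode(Enum):
    CP = "consistency + partition tolerance"
    CA = "consistency + availability"
    AP = "availability + partition tolerance"


class PartitionedStore:
    """Toy model: replica groups G1 and G2 that cannot exchange messages."""

    def __init__(self, initial, mode):
        self.g1_value = initial
        self.g2_value = initial
        self.mode = mode

    def write_g1(self, value):
        # a1: accepted in G1, but the update can never reach G2.
        self.g1_value = value

    def read_g2(self):
        # a2: the read arrives at G2, which has not seen a1.
        if self.mode is Mode.CP:
            return None  # no response: G2 ignores the request
        if self.mode is Mode.CA:
            raise RuntimeError("G2 responds with an error")
        return self.g2_value  # AP: G2 answers with the (stale) initial value


if __name__ == "__main__":
    for mode in Mode:
        store = PartitionedStore(initial=0, mode=mode)
        store.write_g1(42)
        try:
            print(mode.name, store.read_g2())
        except RuntimeError as err:
            print(mode.name, err)
```

Only the AP store answers with a value, and that value is the stale initial one: availability was kept by giving up consistency.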
The asynchronous network model:
there is no clock;
each node makes decisions using only its local data and its communication with other nodes.

NoSQL: meaning, characteristics, categories

Why use NoSQL?
Interactive software has changed.
Application architecture has changed.
But database architecture has not kept pace,
so some tactics are needed to extend the useful scope of RDBMSs:
manual sharding
distributed cache

Manual sharding
You might already know this one: if an application's data does not fit on a single server, we can spread it across multiple servers manually.
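A minimal sketch of manual sharding, assuming hash-modulo routing (one common scheme; range-based sharding is another). The helper names `shard_for`, `place`, and `moved_fraction` are hypothetical. `moved_fraction` also hints at why re-spreading data is so painful: with modulo routing, changing the shard count relocates most keys.

```python
import zlib


def shard_for(key, num_shards):
    """Route a key to a shard: stable CRC32 hash modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def place(keys, num_shards):
    """Group keys by the shard they are routed to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


def moved_fraction(keys, old_n, new_n):
    """Fraction of keys that land on a different shard after resharding."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    print({s: len(ks) for s, ks in place(keys, 4).items()})
    print(f"moved when going from 4 to 5 shards: {moved_fraction(keys, 4, 5):.0%}")
```

Going from 4 to 5 shards, a key stays put only when its hash modulo 4 equals its hash modulo 5, so with a roughly uniform hash about four out of five keys have to move to another server.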

Drawbacks:
When a shard is full, it is very, very hard to redistribute the data.
It loses the benefits of an RDBMS (such as JOIN).
The schema must be managed on multiple servers.

Why normalize data?
To support ACID.
To improve performance on general-purpose queries.
But it limits concurrent processing.

To solve this, store data in denormalized form.

A distributed cache builds on two important transitions:
high-speed data networks
cheap RAM
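A sketch of the cache-aside pattern, one common way to put a cache tier in front of an RDBMS. A plain dict stands in for the distributed cache (for example a memcached cluster), and `SlowDatabase` and `CacheAside` are hypothetical names. It shows the read acceleration and also the limits listed next: writes still hit the database, and a cold cache sends every first read to the database.

```python
class SlowDatabase:
    """Stand-in for the RDBMS; counts every query that reaches it."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.queries += 1
        self.rows[key] = value


class CacheAside:
    """Read from the cache first; on a miss, load from the database and fill the cache."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # stand-in for a distributed cache tier
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1  # a cold cache misses on every first read
        value = self.db.get(key)
        self.cache[key] = value
        return value

    def write(self, key, value):
        # Writes still go to the database: the cache accelerates only reads.
        self.db.put(key, value)
        self.cache.pop(key, None)  # invalidate so the next read sees the new value


if __name__ == "__main__":
    db = SlowDatabase({"a": 1, "b": 2})
    cache = CacheAside(db)
    for _ in range(3):
        cache.read("a")  # 1 miss, then 2 hits
    cache.write("a", 10)
    print(cache.read("a"), cache.hits, cache.misses, db.queries)  # 10 2 2 3
```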

Drawbacks:
It accelerates only reads.
A cold cache thrashes: until it warms up, reads fall through to the database.
It is another tier to manage.

Common characteristics of NoSQL:
Distributed query support.
No schema required (schemaless).
Integrated caching.

Why use NoSQL?
(diagram: a record with fields A, B, C, D, E; aspect 1 uses A, B, C, D and aspect 2 uses B, C, E; A and E are frequently updated fields)
To support frequent changes of schema, data is simply stored in key-value form.
The value is a "blob" type and can hold any data.
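A minimal sketch of the key-value idea, assuming JSON-encoded bytes as the blob format (any serialization would do); `BlobStore` and `find_by_field` are hypothetical names. Records with different fields live side by side with no schema change, but a lookup by a field inside the value needs a full scan, because the store only understands keys.

```python
import json
from typing import Dict, List, Optional


class BlobStore:
    """Minimal key-value store: each value is an opaque blob of bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._data[key] = blob

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def items(self):
        return self._data.items()


def find_by_field(store: BlobStore, field: str, value) -> List[str]:
    """No index or query language: decode every blob and filter in the application."""
    matches = []
    for key, blob in store.items():
        record = json.loads(blob)
        if record.get(field) == value:
            matches.append(key)
    return matches


if __name__ == "__main__":
    store = BlobStore()
    # Records need not share a schema; adding field "F" requires no migration.
    store.put("row:1", json.dumps({"A": 1, "B": "x", "C": 3}).encode())
    store.put("row:2", json.dumps({"B": "y", "E": 5, "F": "new"}).encode())
    print(json.loads(store.get("row:2"))["F"])  # new
    print(find_by_field(store, "B", "x"))  # ['row:1']
```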
But this loses most RDBMS features.

Proof by contradiction
1. Assume the network is partitioned into two groups, G1 and G2.
2. A write operation (a1) occurs in G1 and changes the value.
3. Step 2 terminates (because of Availability).
4. A read operation (a2) occurs in G2.
5. Step 4 returns a value and terminates (because of Availability).
6. Because G2 has received no write, the result of step 4 cannot be the value written in step 2; it is the initial value. This violates Consistency, which is the contradiction.
(diagram: G1 and G2 separated by a network partition)
No message can be sent or received between G1 and G2. a1 writes a value not equal to the initial value, so the result of the read operation a2 must be the initial value.

C and P
a2 should not terminate:
G2 ignores all requests.
C and A
a2 should end with an error:
G2 responds with an error.
A and P
The result of a2 should be the initial value:
G2 returns the initial value.
So only two of the three CAP properties can be satisfied.

ACID vs BASE (2-phase commit, eventual consistency)
Atomicity
All of the operations in the transaction complete, or none of them do.
Consistency
The database is in a consistent state when the transaction begins and when it ends.
Isolation
The transaction behaves as if it were the only operation being performed on the database.
Durability
Once the transaction completes, its effects will not be reversed.

How can we support ACID across distributed servers?
ACID's answer is two-phase commit (2PC).
1. The coordinator sends the query to all related DBMSs and waits.
2. Each DBMS executes the query in a transaction, keeping an undo log and a redo log.
3. Each DBMS returns its response (it must be YES or NO).
4-1. If all responses are YES, the coordinator sends a commit command.
4-2. If any DBMS responds NO, the coordinator sends a rollback command.
(diagram: one coordinator connected to three DBMSs)
It guarantees ACID.
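The four steps can be simulated in a single process. This is a sketch, not a production protocol: `Participant` and `two_phase_commit` are hypothetical names, the undo/redo logs are reduced to a `pending` list, and locking, timeouts, and coordinator failure are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One DBMS in the distributed transaction (hypothetical in-memory stand-in)."""

    name: str
    will_vote_yes: bool = True
    pending: list = field(default_factory=list)  # tentative work (undo/redo logged)
    committed: list = field(default_factory=list)  # durable, committed work

    def prepare(self, query):
        # Steps 2-3: execute the query tentatively and vote YES or NO.
        if not self.will_vote_yes:
            return False
        self.pending.append(query)
        return True

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()  # discard the tentative work, as an undo log would


def two_phase_commit(participants, query):
    # Phase 1 (step 1): send the query to every DBMS and wait for all votes.
    votes = [p.prepare(query) for p in participants]
    # Phase 2 (step 4): commit everywhere only if every vote is YES.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


if __name__ == "__main__":
    dbs = [Participant("db1"), Participant("db2"), Participant("db3")]
    print(two_phase_commit(dbs, "insert x"))  # True
    dbs[1].will_vote_yes = False
    print(two_phase_commit(dbs, "insert y"))  # False
    print([p.committed for p in dbs])  # [['insert x'], ['insert x'], ['insert x']]
```

A single NO vote (or, in a real deployment, a single unreachable DBMS) aborts the work on every server, and the coordinator cannot decide until the slowest participant has answered.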
Characteristics of 2PC
Guaranteeing ACID is a very good characteristic.

It relies on locking,
which may cause performance issues:
the slowest server may determine the overall performance.

It is a pessimistic algorithm:
if any one DBMS is down, every query fails.
=> These characteristics led to the emergence of BASE as an alternative: Basically Available, Soft state, Eventual consistency.

BASE is optimistic (the opposite of ACID).

It accepts partial failure.

Eventually it reaches the same state as ACID would,
but temporarily the data may be inconsistent.

Converting ACID to BASE: an example
Step 1: to avoid 2PC, split the query into separate transactions.
tx start
update data into A in server a
tx end
tx start
insert data into B in server b
tx end
If the second transaction does not succeed, the data becomes inconsistent.

Step 2: to prevent data inconsistency, use a queue.
tx start
update data into A in server a
enqueue "insert data into B in server b"
tx end
tx start
dequeue msg
execute msg (= insert data into B in server b)
tx end
This second transaction covers both the dequeue and the insert on server b, so it may require 2PC again.

Step 3: to avoid 2PC again, "idempotence" is introduced.
An idempotent operation produces the same result regardless of how many times it is applied.
tx start
update data into A in server a
enqueue "insert data into B in server b"
tx end
pick msg
tx start
if end-process in server b doesn't contain msg
execute msg (= insert data into B in server b)
insert msg into end-process in server b
tx end
dequeue msg

Lesson of the previous example
The single ACID transaction
tx start
update data into A in server a
insert data into B in server b
tx end
becomes a series of local transactions connected by a queue, so:
the application layer must deal with temporary inconsistency;
since BASE does not use locking across servers, it achieves relatively high performance.
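The three steps can be combined into one in-memory sketch. The classes are hypothetical stand-ins: `ServerA` keeps table A and its outbound queue together so that one local transaction can cover both, `ServerB` keeps table B and the end-process set of applied message ids, and message ids come from `uuid.uuid4()` (an assumption; the slides do not say how messages are identified). Python method calls are not real transactions, so this only illustrates the ordering: pick, apply idempotently, then dequeue.

```python
import uuid
from collections import deque


class ServerA:
    """Table A plus a local outbound queue, updated in one local transaction."""

    def __init__(self):
        self.table_a = {}
        self.queue = deque()

    def update_and_enqueue(self, key, value, message):
        # tx start: update A and enqueue the follow-up insert together.
        self.table_a[key] = value
        self.queue.append(message)
        # tx end


class ServerB:
    """Table B plus the end-process set of message ids already applied."""

    def __init__(self):
        self.table_b = []
        self.end_process = set()

    def apply_once(self, message):
        # tx start on server b: skip messages that were already applied.
        if message["id"] in self.end_process:
            return False
        self.table_b.append(message["row"])
        self.end_process.add(message["id"])
        return True  # tx end


def deliver(a, b, crash_before_dequeue=False):
    """Pick the head message, apply it idempotently, then dequeue it."""
    while a.queue:
        message = a.queue[0]  # pick msg (peek without removing)
        b.apply_once(message)
        if crash_before_dequeue:
            # Simulate a crash after server b committed but before the dequeue:
            # the same message is picked again on the next pass.
            crash_before_dequeue = False
            continue
        a.queue.popleft()  # dequeue msg


if __name__ == "__main__":
    a, b = ServerA(), ServerB()
    a.update_and_enqueue("A1", 100, {"id": str(uuid.uuid4()), "row": "B1"})
    print(a.table_a, b.table_b)  # temporarily inconsistent: {'A1': 100} []
    deliver(a, b, crash_before_dequeue=True)
    print(b.table_b)  # ['B1']: applied once although it was picked twice
```

Between `update_and_enqueue` and `deliver`, table A already holds the new row while table B does not: that window is the temporary inconsistency the application layer has to tolerate.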