Titan:db

by Tomas Hruby, on 17 December 2014


Titan:db
A brief introduction
Motivation
"Titan is designed to support the processing of graphs so large that they require storage and computational capacities beyond what a single machine can provide"
The end!
Please ask questions after the practical example.
Thank you!
Titan is:
compatible (HBase, Cassandra, Elasticsearch, ...)
integrated (Rexster, Gremlin, Blueprints, ...)
easy to install (download, unzip, play)
open source

ACID (with BerkeleyDB) / BASE (otherwise)
Titan-Hadoop: why?
Graph derivation (e.g. a cousin graph derived from father and brother edges)
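The cousin-graph derivation amounts to two joins over the edge lists. Below is a minimal single-machine sketch in plain Python (not Titan-Hadoop code), assuming father edges point from child to father and brother edges may be stored in either direction; the function name and sample names are hypothetical. On a large graph these same joins would be spread across a cluster.

```python
from collections import defaultdict


def derive_cousins(father_edges, brother_edges):
    """Derive cousin edges: x and y are cousins if their fathers are brothers.

    father_edges:  iterable of (child, father) pairs
    brother_edges: iterable of (a, b) pairs, treated as symmetric
    """
    children = defaultdict(set)  # father -> set of his children
    for child, father in father_edges:
        children[father].add(child)

    brothers = defaultdict(set)  # symmetric brother relation
    for a, b in brother_edges:
        if a != b:
            brothers[a].add(b)
            brothers[b].add(a)

    # Join: x -father-> f1 -brother-> f2 <-father- y
    cousins = set()
    for f1, their_brothers in brothers.items():
        for f2 in their_brothers:
            for x in children.get(f1, ()):
                for y in children.get(f2, ()):
                    cousins.add((x, y))
    return cousins


print(sorted(derive_cousins([("ann", "bob"), ("carl", "dave")], [("bob", "dave")])))
# [('ann', 'carl'), ('carl', 'ann')]
```

Both directions of each cousin pair are emitted, so the derived graph is symmetric, like the brother relation it comes from.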
Backends
Why backends?
Titan focuses on efficient data modeling and compact graph serialization. Persistence is therefore delegated to existing storage systems through adapters.
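The adapter idea can be pictured as a narrow storage interface that the graph layer codes against, with one implementation per backend. The sketch below is hypothetical: `StorageAdapter`, `mutate` and `get_slice` are illustrative names, not Titan's actual classes. It only mimics the "rows of sorted columns" model that HBase and Cassandra share, backed here by an in-memory toy.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, insort


class StorageAdapter(ABC):
    """Hypothetical adapter: the graph layer only talks to this interface."""

    @abstractmethod
    def mutate(self, key, additions, deletions):
        """Add (column, value) pairs to row `key` and delete the given columns."""

    @abstractmethod
    def get_slice(self, key, start, end):
        """Return sorted (column, value) pairs of row `key` with start <= column < end."""


class InMemoryAdapter(StorageAdapter):
    """Toy backend keeping each row as a sorted column list plus a value map."""

    def __init__(self):
        self._rows = {}  # key -> (sorted list of columns, {column: value})

    def mutate(self, key, additions, deletions):
        cols, values = self._rows.setdefault(key, ([], {}))
        for col in deletions:
            if col in values:
                del values[col]
                cols.remove(col)
        for col, val in additions:
            if col not in values:
                insort(cols, col)  # keep columns sorted for range slices
            values[col] = val

    def get_slice(self, key, start, end):
        cols, values = self._rows.get(key, ([], {}))
        lo, hi = bisect_left(cols, start), bisect_left(cols, end)
        return [(c, values[c]) for c in cols[lo:hi]]


store = InMemoryAdapter()
store.mutate("v1", [("e:knows:v2", 1), ("p:name", "alice")], [])
print(store.get_slice("v1", "e:", "e;"))  # only the edge columns: [('e:knows:v2', 1)]
```

Supporting another backend would then mean writing another subclass with the same two methods, leaving the graph logic untouched.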
Deployment modes:
local server mode
remote server mode
remote server mode with Rexster
embedded mode
HBase
strong consistency
linear scalability
stores its data on HDFS

Cassandra
continuous availability
elastic scalability
no master/slave architecture
Other third-party adapters are also available
Supports indexing of persisted data
However:
Titan can store up to 2^60 edges and half as many vertices.
Graph indices and vertex-centric indices cannot be removed, only disabled.
Batch loading in Titan is currently slower than the batch-loading modes of single-machine databases.
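To make the capacity limit concrete: 2^60 is about 1.15 × 10^18 edges, and "half as many vertices" is 2^59. The helper below is a hypothetical illustration that checks a planned graph size against these two numbers only.

```python
MAX_EDGES = 2 ** 60            # edge limit quoted above
MAX_VERTICES = MAX_EDGES // 2  # "half as many vertices" = 2**59


def fits_in_titan(num_vertices, num_edges):
    """Check a planned graph size against the quoted limits (hypothetical helper)."""
    return num_vertices <= MAX_VERTICES and num_edges <= MAX_EDGES


print(f"{MAX_EDGES:,} edges, {MAX_VERTICES:,} vertices")
# 1,152,921,504,606,846,976 edges, 576,460,752,303,423,488 vertices
print(fits_in_titan(10**9, 10**12))  # a billion vertices, a trillion edges: True
```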
BerkeleyDB
single machine (runs in the same JVM as Titan)
Graph statistics (counting vertices, number of friends, ...)
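A small single-machine sketch of such statistics, assuming an undirected "friend" graph held as adjacency lists in which every friendship is listed at both ends. The function name and data shape are hypothetical; on a graph too large for one machine, this is the kind of job that would be handed to Titan-Hadoop.

```python
from collections import Counter


def graph_statistics(adjacency):
    """Basic statistics over an undirected 'friend' graph given as adjacency lists.

    adjacency: dict vertex -> iterable of friends; each friendship is listed at both ends.
    """
    num_vertices = len(adjacency)
    friends = {v: len(set(ns)) for v, ns in adjacency.items()}  # friends per vertex
    num_edges = sum(friends.values()) // 2  # each friendship is counted at both endpoints
    distribution = Counter(friends.values())  # k -> how many vertices have k friends
    return num_vertices, num_edges, friends, distribution


n, e, f, d = graph_statistics({"a": ["b", "c"], "b": ["a"], "c": ["a"]})
print(n, e, f)  # 3 2 {'a': 2, 'b': 1, 'c': 1}
```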
Titan-Hadoop: how?
originates from the Faunus project (part of Titan since v0.5.0)
breaks the graph down into adjacency list items
then processes these units in parallel
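The "split into adjacency list items, process in parallel" pattern can be sketched as a map/reduce job. In the sketch below, each item `(vertex, out-neighbors)` is mapped independently to partial in-degree counts, and the partial results are merged afterwards. A thread pool stands in for Hadoop here, so this shows the structure of the computation, not Titan-Hadoop's actual execution; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_item(item):
    """Map step: one adjacency-list item (vertex, out-neighbors) -> partial in-degree counts."""
    _, out_neighbors = item
    return Counter(out_neighbors)  # each out-edge adds 1 to its target's in-degree


def in_degrees(adjacency, workers=4):
    """Process every adjacency-list item independently, then merge (reduce) the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_item, adjacency.items())
        totals = reduce(lambda a, b: a + b, partials, Counter())
    # Vertices that nobody points to still get an explicit 0.
    return {v: totals.get(v, 0) for v in adjacency}


print(in_degrees({"a": ["b", "c"], "b": ["c"], "c": []}))  # {'a': 0, 'b': 1, 'c': 2}
```

Each map call reads only its own item, which is why the units can run on different machines. The reduce step is where results about the same vertex, produced in different places, are brought together.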
Limitations
Titan-Hadoop only supports integer, float, double, long, string, and boolean property values (whereas the Blueprints API supports arbitrary objects).
A vertex and its incident edges must fit into memory, so supernodes can cause an overflow.
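Both limitations can be checked before a Titan-Hadoop job runs. The sketch below assumes Python stand-ins for the listed types (`bool`; `int` limited to the signed 64-bit range of a Java long; `float`; `str`) and a per-vertex edge budget. The budget is a hypothetical threshold you would pick from your heap size, not a Titan setting.

```python
def is_supported(value):
    """True if the value falls within the property types listed above."""
    if isinstance(value, bool):      # boolean (checked first: bool subclasses int in Python)
        return True
    if isinstance(value, int):       # integer/long: must fit a signed 64-bit Java long
        return -2**63 <= value < 2**63
    return isinstance(value, (float, str))  # float/double and string


def unsupported_properties(props):
    """Keys of a vertex's properties whose values fall outside the supported types."""
    return sorted(k for k, v in props.items() if not is_supported(v))


def find_supernodes(adjacency, max_edges):
    """Vertices whose incident-edge list exceeds a per-vertex memory budget (hypothetical)."""
    return sorted(v for v, edges in adjacency.items() if len(edges) > max_edges)


print(unsupported_properties({"name": "x", "tags": ["a", "b"], "big": 2**70}))  # ['big', 'tags']
```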
Indexers
full-text search on string properties
geo-based indices (e.g. within a circle)
numeric range (e.g. age between 18 and 25)
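The three query types above can be illustrated in memory. In this sketch, full text uses an inverted index (token to vertices), the numeric range uses binary search over a sorted index, and the geo query is a plain haversine scan rather than a real spatial index. All names are hypothetical. In Titan the work is done by an external indexer such as Elasticsearch.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_text_index(docs):
    """Inverted index: lower-cased token -> set of vertices whose text contains it."""
    index = defaultdict(set)
    for v, text in docs.items():
        for token in text.lower().split():
            index[token].add(v)
    return index


def full_text(index, word):
    return index.get(word.lower(), set())


def within_circle(points, lat, lon, radius_km):
    """Vertices within radius_km of (lat, lon), by haversine great-circle distance."""
    r_earth = 6371.0
    hits = []
    for v, (plat, plon) in points.items():
        dlat = math.radians(plat - lat)
        dlon = math.radians(plon - lon)
        a = (math.sin(dlat / 2) ** 2
             + math.cos(math.radians(lat)) * math.cos(math.radians(plat)) * math.sin(dlon / 2) ** 2)
        if 2 * r_earth * math.asin(math.sqrt(a)) <= radius_km:
            hits.append(v)
    return sorted(hits)


def numeric_range(ages, low, high):
    """Vertices with low <= age <= high, via binary search over a sorted (age, vertex) index."""
    index = sorted((age, v) for v, age in ages.items())
    keys = [age for age, _ in index]
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return sorted(v for _, v in index[lo:hi])


print(numeric_range({"x": 17, "y": 18, "z": 25, "w": 30}, 18, 25))  # ['y', 'z']
```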
Blueprints: provides a standardized API for graphs.
Rexster: a server that encapsulates Titan and provides REST access to its graphs
runs embedded or external
exposes a subset of the Titan API
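A client-side sketch of Rexster's REST access, using only the Python standard library. The port 8182, the graph name `graph`, and the path layout `/graphs/<graph>/vertices/<id>` are assumptions based on Rexster's REST API documentation. Check them against the deployed server. Only the URL building runs without a server.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

REXSTER = "http://localhost:8182"  # assumed host and default Rexster port
GRAPH = "graph"                    # assumed name of the graph configured in Rexster


def vertex_url(vertex_id, base=REXSTER, graph=GRAPH):
    """URL of a single vertex resource (assumed path layout)."""
    return f"{base}/graphs/{quote(graph)}/vertices/{quote(str(vertex_id))}"


def get_vertex(vertex_id):
    """Fetch one vertex as JSON; requires a running Rexster server."""
    with urlopen(vertex_url(vertex_id), timeout=5) as resp:
        return json.load(resp)


print(vertex_url(4))  # http://localhost:8182/graphs/graph/vertices/4
```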
Gremlin: API for graph traversals
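Gremlin traversals chain steps that move a set of current vertices through the graph. In Gremlin 2 syntax this looks like `g.v(1).out('knows').name`. The Python class below is a hypothetical toy that mimics this step-chaining style over an in-memory graph. It is not Gremlin's implementation. Its `out` step scans a global edge list, which per-vertex adjacency lists are designed to avoid.

```python
class Traversal:
    """Hypothetical, tiny Gremlin-like pipeline over an in-memory property graph."""

    def __init__(self, graph, current):
        self.graph = graph      # {"vertices": {id: props}, "edges": [(out_id, label, in_id)]}
        self.current = current  # vertex ids at this step (duplicates kept, as in Gremlin)

    def out(self, label):
        """Step to the vertices reachable over outgoing edges with `label`."""
        nxt = [dst for v in self.current
               for (src, lbl, dst) in self.graph["edges"] if src == v and lbl == label]
        return Traversal(self.graph, nxt)

    def values(self, key):
        """Emit property `key` of each current vertex."""
        return [self.graph["vertices"][v][key] for v in self.current]


def v(graph, vertex_id):
    """Start a traversal at one vertex, in the spirit of g.v(id)."""
    return Traversal(graph, [vertex_id])


g = {
    "vertices": {1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "lop"}, 4: {"name": "josh"}},
    "edges": [(1, "knows", 2), (1, "knows", 4), (1, "created", 3), (4, "created", 3)],
}
print(v(g, 1).out("knows").values("name"))  # ['vadas', 'josh']
```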
DogHouse: graph visualization and web-based Gremlin shell
Installation
download from https://github.com/thinkaurelius/titan/wiki/Downloads
unzip the archive
run bin/titan.sh
this starts Cassandra, Titan and Rexster in separate processes (all contained in the downloaded archive)
launch DogHouse and play :)
http://147.251.253.233:8182/ (temporary demo server)
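Because bin/titan.sh starts several processes, Rexster may need a moment before it accepts connections. Below is a small helper, a hypothetical convenience and not part of the Titan distribution, that polls a TCP port until something listens. Port 8182 is taken from the URL above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port, or give up after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused or timed out: not up yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage after running bin/titan.sh (assumes Rexster on its default port 8182):
# if wait_for_port("localhost", 8182):
#     print("Rexster is up, open DogHouse in the browser")
```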
Data model
property graph (same as Neo4j)
adjacency list (each vertex is stored together with the list of its neighbors)
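An adjacency-list layout for a property graph can be pictured as one row per vertex that holds the vertex's properties and all of its incident edges. The sketch below is a hypothetical in-memory model, not Titan's storage format. It assumes, for illustration, that each edge is recorded at both endpoints so either side can be traversed without consulting a global edge table.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    label: str
    other: int      # id of the vertex at the other end
    direction: str  # "out" or "in", relative to the row's own vertex
    props: dict = field(default_factory=dict)


@dataclass
class VertexRow:
    """One row per vertex: its properties followed by its whole adjacency list."""
    vid: int
    props: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)


def add_edge(rows, src, label, dst, **props):
    """Record the edge in both endpoint rows so it can be traversed from either side."""
    rows[src].edges.append(Edge(label, dst, "out", dict(props)))
    rows[dst].edges.append(Edge(label, src, "in", dict(props)))


def neighbors(rows, vid, label, direction="out"):
    """Answer from a single row: no global edge table is scanned."""
    return [e.other for e in rows[vid].edges if e.label == label and e.direction == direction]


rows = {1: VertexRow(1, {"name": "alice"}), 2: VertexRow(2, {"name": "bob"})}
add_edge(rows, 1, "knows", 2, since=2010)
print(neighbors(rows, 2, "knows", "in"))  # [1]
```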
By default, random partitioning is used
leads to inefficient traversals
can be replaced with your own explicit partitioning
virtual partitions (on the same machine)
the number of partitions cannot be changed without reloading the graph
Partitioning strategies: edge cut, vertex cut
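The partitioning trade-offs can be measured on a toy graph. An edge cut places vertices and counts the edges that cross partitions. A vertex cut places edges and counts how many partitions each vertex gets replicated to. The sketch below uses hash-mod placement as a stand-in for random partitioning. That is an analogy, not Titan's ID-based scheme. It also shows why changing the partition count forces a reload: with a uniform hash, going from 4 to 5 partitions is expected to move about 80% of vertices. All function names are hypothetical.

```python
import zlib


def random_partition(vertex, n):
    """Hash-based placement: stable, but ignores graph structure."""
    return zlib.crc32(str(vertex).encode()) % n


def edge_cut(edges, placement):
    """Edge cut: vertices are placed; count edges whose endpoints sit in different partitions."""
    return sum(1 for a, b in edges if placement[a] != placement[b])


def vertex_cut_replication(edges, edge_placement):
    """Vertex cut: edges are placed; average number of partitions each vertex appears in."""
    parts = {}
    for (a, b), p in zip(edges, edge_placement):
        parts.setdefault(a, set()).add(p)
        parts.setdefault(b, set()).add(p)
    return sum(len(s) for s in parts.values()) / len(parts)


def moved_fraction(vertices, n_old, n_new):
    """Share of vertices whose hash-mod partition changes when the partition count changes."""
    vertices = list(vertices)
    moved = sum(random_partition(v, n_old) != random_partition(v, n_new) for v in vertices)
    return moved / len(vertices)


# Two small communities joined by a single edge.
edges = [("a1", "a2"), ("a2", "a3"), ("b1", "b2"), ("b2", "b3"), ("a3", "b1")]
explicit = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}
random_ = {v: random_partition(v, 2) for v in explicit}
print(edge_cut(edges, explicit), edge_cut(edges, random_))  # explicit cuts exactly 1 edge
print(moved_fraction(range(1000), 4, 5))                    # expected to be around 0.8
```

The explicit placement follows the community structure, so most traversals stay within one partition, which is the reason for replacing the random default.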