Big Data Analytics: Telco domain
In real life, most data are Big
Data Avalanche (Moore’s law of data)
We are now collecting and converting large amounts of data into digital form
90% of the data in the world today was created within the past two years.
The amount of data we have doubles very quickly
Think about a day in your life
Use case 2 : Targeted marketing
Collect data and build a model of:
What the user likes
What they have bought
What their buying power is
Give personalized deals
Where does it originate from?
Sensor data (IoT)
What is Big Data? 3V's
Velocity (how fast data flows, at high rates)
Volume (large amounts of data gathered)
Variety (various degrees of structure)
What is the best road to take?
Would there be any bad weather?
How to invest my money?
Are my servers running healthily?
Should I take that loan?
Is there a way to do this faster?
Which products should I buy?
There are many decisions that you can make better, if only you can access the data and process it.
The Web performs millions of activities per second,
and vast numbers of server logs are created.
Social networks e.g. Facebook, 800 Million active users, 40 billion photos from its user base.
There are >4 billion phones and >25% are smart phones.
Observational and sensor data – weather radars, balloons
hSenid Mobile systems
15M API calls processed per day
Internet of Things (IoT)
Currently the physical world and the software world are detached
The Internet of Things promises to bridge this gap
It is about sensors and actuators everywhere
In your fridge, in your blanket, in your chair, in your carpet.. Yes even in your socks
What can we do with Big Data?
A 1% saving in airplanes and turbines can save more than $1B each year (GE talk, Strata 2014); Sri Lanka's total exports are about $9B a year
Weather, disease identification, personalized treatment
Most high-tech work is done via simulations
Why is Big Data hard?
How to store?
Assuming 1 TB disks, it takes 1000 computers to store 1 PB
How to move?
Assuming a 1 Gbps network, it takes over 2 hours to copy 1 TB and about 90 days to copy 1 PB
How to search?
Assuming each record is 1 KB and one machine can process 1000 records per second, it takes about 12 CPU days to process 1 TB and over 30 CPU years to process 1 PB
How to process?
Convert algorithms to work at large scale
Create new algorithms
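The storage, transfer, and search figures above can be checked with simple arithmetic. This is a back-of-envelope sketch assuming decimal units, a 1 Gbps network, and one CPU processing 1000 records of 1 KB per second, as in the text:

```python
# Back-of-envelope arithmetic for the "Why is Big Data hard?" numbers.
TB = 10 ** 12               # bytes in a terabyte (decimal units)
PB = 10 ** 15               # bytes in a petabyte

# Storage: with 1 TB disks, a petabyte needs 1000 machines.
disks_for_pb = PB // TB

# Transfer: 1 Gbps = 125 MB/s, ignoring protocol overhead.
hours_to_copy_tb = TB / (125 * 10 ** 6) / 3600        # ~2.2 hours
days_to_copy_pb = PB / (125 * 10 ** 6) / 86400        # ~93 days

# Search: 1 KB records at 1000 records/s is 1 MB/s per CPU.
cpu_days_per_tb = TB / 10 ** 6 / 86400                # ~12 CPU days
cpu_years_per_pb = PB / 10 ** 6 / 86400 / 365         # ~32 CPU years

print(disks_for_pb, hours_to_copy_tb, days_to_copy_pb)
```

Every petabyte-scale figure is just the terabyte figure times 1000, which is why the problem forces distribution across many machines.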
Why is Big Data hard?
Systems that handle lots of data
Running complex logic
This pushes us to the frontier of Distributed Systems
More data does not mean there is a simple model
Some models can be as complex as the system they describe
Making Sense of Data
To know (what happened?)
Basic analytics + visualizations (min, max, average, standard deviation, distribution)
Interactive drill down
Real-time analytics to track things fast enough, plus alerts and searches (web search, graph search)
To explain why (Why it happened?)
Data mining, classification, clustering – understanding the underlying patterns in the data
To forecast (What will happen?)
Neural networks, decision models; e.g. electricity demand forecasting, targeted marketing
Use case 1: Travel
Collect traffic data + transportation data from sensors
Build a model (e.g. Car Following Models)
Predict the traffic and possible congestion
Act on the predictions, e.g. divert traffic, adjust traffic lights
Use case 3: Subscriber Churn Prediction
Use case 4: Ad Insertion Engines
We at hMS are a preferred
platform provider for telcos
The number of transactions generated by a Sri Lankan mobile operator is nearly:
Transactions per day = 50M
For a year => 50M * 365 = 18.25 billion transactions
If a single transaction is 3 KB, then
Total data size per year =
50,000,000 * 365 * 3 KB / (1024 * 1024 * 1024) ≈ 51 TB
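The yearly volume above can be computed directly (a sketch; the operator figures are the slide's, not measured here):

```python
# Yearly CDR volume for a Sri Lankan mobile operator, per the slide.
transactions_per_day = 50_000_000
bytes_per_transaction = 3 * 1024          # 3 KB per transaction

yearly_transactions = transactions_per_day * 365      # 18.25 billion
yearly_bytes = yearly_transactions * bytes_per_transaction
yearly_tb = yearly_bytes / 1024 ** 4                  # ~51 TB
print(yearly_transactions, round(yearly_tb, 1))
```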
hSenid Mobile Big Data Platform
Can receive events via multiple data sources such as files (binary, ASCII, CSV, etc.), XML, external DWH, HTTP, etc.
Can retrieve from remote locations (FTP, SFTP)
Event collection is highly optimized and scalable (500K events collected per second)
Default adapters are available, and you can write custom adapters
How to store the data?
Relational databases (Scaling is the challenge)
Block data stores -> HDFS
Column oriented -> HBase, Cassandra
Document based -> MongoDB, CouchDB
In-Memory -> VoltDB
How to analyze data?
Batch processing: schedule data processing jobs and receive the processed data later
Interactive processing: queries are executed and the results are retrieved instantly
Batch Processing with Hadoop/Hive : Scheduled campaigns
Apache Hadoop: Map/Reduce processing system and a distributed file system
Batch processing - Data Warehouse
Apache Hive - Hadoop based framework for working on large scale data stores with SQL-like queries
First introduced by Google, and used as the processing model for their architecture
Implemented by open source projects like Apache Hadoop and Spark
Users write two functions: map and reduce
The framework handles the details like distributed processing, fault tolerance, load balancing etc.
Widely used, and one of the catalysts of Big Data
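The two user-written functions can be illustrated with the classic word count. This is a single-process sketch of the programming model; frameworks like Hadoop run the same two functions distributed across many machines, handling shuffling and fault tolerance:

```python
# Word count in the map/reduce style: users supply map_fn and
# reduce_fn; the framework groups intermediate (key, value) pairs.
from collections import defaultdict

def map_fn(line):
    # Map: emit (word, 1) for every word in one input line.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Reduce: sum all counts emitted for one word.
    return word, sum(counts)

def mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:                       # map phase
        for key, value in map_fn(line):
            groups[key].append(value)        # shuffle: group by key
    return dict(reduce_fn(k, v) for k, v in groups.items())  # reduce phase

counts = mapreduce(["big data is big", "data flows fast"])
```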
Process data as it is
received, in streaming fashion
Very fast output
Lots of events (few 100k to millions)
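The contrast with batch processing can be sketched as follows: each arriving event updates the result immediately, so fresh output is available at any moment instead of after a scheduled job finishes (an illustrative sketch, not the platform's actual API):

```python
# Streaming-style processing: a running aggregate updated per event.
def running_average(events):
    count, total = 0, 0.0
    for value in events:
        count += 1
        total += value
        yield total / count      # a fresh result after every event

stream = iter([10.0, 20.0, 30.0])
averages = list(running_average(stream))   # [10.0, 15.0, 20.0]
```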
Use case : hMS LAP
Scheduled Based Analysis
Giving a reward to Gold Profile teenagers who call SAARC countries more than 5 times per month during weekdays
Real Time Analysis
Whenever a reload event is received with an amount > Rs. 100, and the user is a pizza lover, has traveled to Kandy more than 5 times during the last 3 months, and has stayed with the network for more than 1 year, they will receive a Rs. 2000 Pizza Hut voucher from Kandy City Center.
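The real-time rule above amounts to a predicate evaluated over the incoming event and the subscriber's profile. A minimal sketch (the field names are illustrative, not the actual hMS LAP schema):

```python
# Real-time campaign rule: reload > Rs.100, pizza lover, frequent
# Kandy traveler (last 3 months), and tenure over 1 year.
def pizza_voucher_eligible(event, profile):
    return (
        event["type"] == "reload"
        and event["amount"] > 100            # reload amount > Rs. 100
        and "pizza" in profile["interests"]  # pizza lover
        and profile["kandy_trips_3m"] > 5    # Kandy trips, last 3 months
        and profile["tenure_months"] >= 12   # with the network > 1 year
    )

profile = {"interests": {"pizza"}, "kandy_trips_3m": 6, "tenure_months": 20}
event = {"type": "reload", "amount": 250}
eligible = pizza_voucher_eligible(event, profile)  # issue the voucher
```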
hMS Loyalty Analytic Platform
Dashboard and Last Mile : Visualization
– To end user
– To decision makers
– To scientists
Web Visualization Libraries
e.g. D3, Flot, etc.
A clustering problem with 1 billion 2-dimensional data points, and 1000 clusters, can be processed in less than 30 seconds per iteration with a GPU, compared to about 6 minutes per iteration with our highly optimized CPU version on 8 cores.
“Clustering Billions of Data Points Using GPUs by HP Labs”
GPUs are significantly faster for some applications due to their parallel nature.
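The "per iteration" step being timed in the quote is one round of k-means: assign every point to its nearest center, then move each center to the mean of its points. A tiny pure-Python sketch of that iteration (GPUs win because the assignment step is independent per point and trivially parallel):

```python
# One k-means iteration over 2-D points: assignment + center update.
def kmeans_iteration(points, centers):
    # Assignment step: index of the nearest center for every point.
    assign = [min(range(len(centers)),
                  key=lambda c: (p[0] - centers[c][0]) ** 2
                              + (p[1] - centers[c][1]) ** 2)
              for p in points]
    # Update step: move each center to the mean of its points.
    new_centers = []
    for c in range(len(centers)):
        mine = [p for p, a in zip(points, assign) if a == c]
        if mine:
            new_centers.append((sum(p[0] for p in mine) / len(mine),
                                sum(p[1] for p in mine) / len(mine)))
        else:
            new_centers.append(centers[c])   # keep an empty center
    return assign, new_centers

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
assign, centers = kmeans_iteration(points, [(0.0, 0.0), (5.0, 5.0)])
```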
Invasion of privacy
Data might be used for unexpected things
Data is likely to be used for control (e.g. by governments)
Speed (e.g. targeted advertising, reacting to data)
Extracting semantics and handling multiple representations and formats
Security: data ownership, delegation, permissions, and privacy
Making data accessible to all intended parties, from anywhere, at any time, from any device, through any format
Integration of big data technologies into enterprise landscape
Leveraging cloud computing with
big data storage and processing.
Sample CDR file >>