by Audrius Galinskis
History - from wooden counter to Big Data
Picture of Data Scientist/Analyst
Mining, not only gold
DA in daily life
Fred brings home 100kg of potatoes, which (being purely mathematical potatoes) consist of 99 percent water. He then leaves them outside overnight so that they consist of 98 percent water. What is their new weight?
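One way to work through the potato puzzle is to notice that only the water evaporates, so the dry matter stays constant. The sketch below is an illustration under that assumption; the variable names are made up for this example.

```python
from fractions import Fraction

# Figures taken from the puzzle; Fraction keeps the arithmetic exact.
initial_weight_kg = Fraction(100)
water_share_before = Fraction(99, 100)
water_share_after = Fraction(98, 100)

# The dry matter does not evaporate: 1% of 100 kg.
dry_matter_kg = initial_weight_kg * (1 - water_share_before)

# After drying, that same dry matter makes up 2% of the total weight.
new_weight_kg = dry_matter_kg / (1 - water_share_after)

print(new_weight_kg)  # prints 50
```

A one-point drop in water share halves the weight, which is why intuition tends to get this one wrong.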
... therefore it can sometimes be worth as much as gold.
Mining is not only for gold...
Data analytics in POP style
by Audrius Galinskis
Data analytics is, in general terms, the extraction of knowledge from data.
It has many names...
Define 'Data Scientist': geeks with gigantic glasses and round bellies.
Data Scientist as Fortune-teller - The Power to predict who will click, buy, lie or die
Demystifying A Data Scientist/Analyst
...Data Analytic(s), Data Analysis, Data Science, Data Mining, Business Intelligence, Predictive Modeling, Statistical Modeling, Advanced Analytics, Big Data.
Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.
Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.
Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes.
Short History Lesson
Egypt ~3400 BCE
First abacus (wooden predecessor of the calculator), 2300 BCE
1749: start of Statistics. At that time it meant the "collection and analysis of data about the state".
1880: Census in the US. It took more than 8 years to complete and publish. The next Census was due to start in 1890.
Still no Excel, no PowerPoint
1940s, dawn of Computer Age
First ever computer with disk storage. Fifty 24-inch disks for a total capacity of 5 megabytes, weighed 1 ton, and could be leased for $3,200 per month ($27,482 in today’s dollars). IBM described the product as “a stack of disks that stores millions of facts and figures less than a second from management’s reach. Because transactions are processed as they occur, the fresh facts held in a random access memory show business as it is right now, not as it was hours or weeks ago.”
1956: IBM Launches the Disk Drive Industry
At that time Big Data fit into 5MB!
1969, Start of Internet
Digital storage becomes more cost-effective for storing data than paper according to R.J.T. Morris and B.J. Truskowski, in “The Evolution of Storage Systems,” IBM Systems Journal, July 1, 2003.
Introduced one of the first loyalty cards.
Francis X. Diebold presents to the Eighth World Congress of the Econometric Society a paper titled “’Big Data’ Dynamic Factor Models for Macroeconomic Measurement and Forecasting” in which he states “Recently, much good science, whether physical, biological, or social, has been forced to confront—and has often benefited from—the “Big Data” phenomenon. Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology.”
2000-2012: the analytical software market grows from $11 billion to $35 billion.
1.7 billion mobile devices sold and 2+ billion people on social networks add to the data explosion.
21st Century's Sexiest Job - 2011-2012 data scientist job posts jump 15,000%.
Computer processor speed
Fast processors and increased data storage capacity were the key to the rapid development of data science. Processing large quantities of data requires powerful computers.
Progress in computers enables progress in analytical software development.
Data Storage Capacity
For many years the data generated by companies wasn't valued: it was a by-product, often neither collected nor saved.
Now data needs to be collected.
Data needs to be stored and saved.
For many companies, data and information are THE product that generates revenue.
New technologies are like small explosions for data science development: the Internet, social networks, mobile devices, fitness trackers, etc.
Data is the most valuable asset. It lets you not only understand your customers, but also offer them your services before they realize they need them.
Development of Data science is driven by
Data used to be produced slowly. At first the only data generators were state censuses, later joined by financial institutions. This remained the rule until the 20th century, when data started to be generated in telecommunications, retail and the health sector, leading up to today, when data is everywhere.
An Excel spreadsheet isn't Data Analytics.
Reporting isn't Data Analytics.
Accounting isn't Data Analytics.
Data Scientists are warm, pleasant individuals like any of us, who are very adept at analyzing data, seeing non-obvious patterns in data, and creating a competitive edge for the business.
DAs serve no purpose and are a nouveau fad that will soon fade.
Machines can solve complex equations and sort huge amounts of data, but you will always need human expertise to figure out what to do with it.
DAs are statisticians who failed to make the cut.
While a lot of DAs have an academic background in statistics, many of them consciously choose to use their theoretical and analytical skills to solve business problems.
Want to be a Data Scientist? Fancy tools are all you need.
Tools are just the tip of the iceberg. All the tools in the world will not make you a good DS if you don't have sound knowledge of statistics, programming and business.
Data mining is usually the first step in data analysis or modeling. In contrast to a BI specialist or a regular analyst, a DA/DS mines the data themselves. Many problems can arise at the very beginning: incomplete data, incorrect data, or data that is difficult to find or extract.
Once the data is found, the next big step is to prepare it for analysis or modeling.
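To make the preparation step more concrete, here is a minimal sketch of filtering out incomplete and incorrect records before analysis. The records, field names and validity limits are all hypothetical, chosen only for illustration; real projects typically need far more elaborate rules.

```python
# Hypothetical raw records, as they might come out of a source system.
raw_records = [
    {"customer_id": "1", "age": "34", "spend": "120.50"},
    {"customer_id": "2", "age": "", "spend": "80.00"},    # incomplete: age missing
    {"customer_id": "3", "age": "-5", "spend": "42.00"},  # incorrect: negative age
    {"customer_id": "4", "age": "51", "spend": "n/a"},    # incorrect: spend not numeric
    {"customer_id": "5", "age": "27", "spend": "15.75"},
]


def clean(record):
    """Return a typed copy of the record, or None if it is incomplete or invalid."""
    try:
        age = int(record["age"])
        spend = float(record["spend"])
    except (KeyError, ValueError):
        return None
    # Assumed plausibility limits for this example only.
    if not 0 <= age <= 120 or spend < 0:
        return None
    return {"customer_id": record["customer_id"], "age": age, "spend": spend}


prepared = [row for row in map(clean, raw_records) if row is not None]
rejected = len(raw_records) - len(prepared)

print([row["customer_id"] for row in prepared])  # prints ['1', '5']
print(rejected)                                  # prints 3
```

Counting the rejected rows is worth doing in practice, since a high rejection rate is itself a finding about the quality of the source.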
Most popular tools:
Other fancy words that usually come along: Neural Networks, Logistic Regression, Decision Trees, propensity, likelihood, etc.
From elections to sporting events to the stock market, you can find countless opinions on what the future will bring.
Predictive analytics combines techniques from statistics, data mining and machine learning to find meaning from large amounts of data and help you foretell the future. Whether you’re in marketing, compliance, customer service, operations or any other business unit, your data can show where you are – and predict where you’re going.
Let's say we have a small group of customers who took exactly the same action and about whom we know everything: their behavior in the past, who they are, what they do.
And of course, we have a much bigger customer group, about whom we know only what they did in the past.
Using complex statistical and mathematical methods, we can identify the behavioral patterns that the small customer group has in common.
Once we have identified the behavioral pattern, we can apply it to the rest of the customers...
... to find the customers most similar to the small group.
The output of such a model is a list of customers rank-ordered from best to worst, from the one who will definitely buy to the one who will never buy.
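The idea of scoring a large group against a small, well-known group can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the method used in the presentation: it takes the average behavior of the small group as the "pattern" and ranks the other customers by straight-line distance to it. All customer IDs, features and values are hypothetical.

```python
from math import sqrt

# Hypothetical features per customer: (visits per month, average basket in EUR).
# The small group: customers known to have taken the action (e.g. bought).
known_buyers = [(8, 40.0), (10, 50.0), (12, 60.0)]

# The bigger group: only past behavior is known.
prospects = {
    "A": (10, 52.0),
    "B": (2, 5.0),
    "C": (9, 45.0),
    "D": (6, 30.0),
}


def centroid(rows):
    """Average each feature over the group: a crude 'behavioral pattern'."""
    count = len(rows)
    return tuple(sum(column) / count for column in zip(*rows))


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_lookalikes(seed_group, candidates):
    """Order candidate IDs from most to least similar to the seed group."""
    pattern = centroid(seed_group)
    return sorted(candidates, key=lambda cid: distance(candidates[cid], pattern))


ranking = rank_lookalikes(known_buyers, prospects)
print(ranking)  # prints ['A', 'C', 'D', 'B']
```

A real model would usually scale the features first and estimate a probability (for instance with logistic regression) rather than a raw distance, but the output has the same shape: customers ordered from most to least promising.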
Is Big Data really Big?
Big Data is a new era in Data Science. It gives, and will keep giving, more opportunities, tools and data!
Big Data can be defined by the 3 V's rule
It is estimated that over 90% of digital data is unstructured.
A collection of data can be big, but a collection of
Big Data is also unstructured
and not easily searchable.
New information which can be used!
We are facing it each day
And even more... You are the one who let it happen
The best tools are here