Data Mining: Introduction (Kyoto University)
I love to be interrupted.
I love getting e-mail: email@example.com
Software: http://www.ailab.si/orange (please install!)
Diagram: Evaluation and deployment, Supervised modelling, Visualization, Data
hospitals
commercial data - market chains, mobile phone operators, web sites
research (bioinformatics, physics...)
other organizations (ecological modelling etc.)
What shall we do with all these data?!
Data mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in the data (Fayyad)
Data mining is the process of extracting previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. (Zekulin)
Data mining is the process of discovering advantageous patterns in the data. (John)
Data mining is a decision support process where we look in large data bases for unknown and unexpected patterns of information. (Parsaye)
Janez Demšar (assistant: Andraž Žagar)
Faculty of Computer and Information Science
University of Ljubljana, Slovenia
Computer Science -> Artificial Intelligence -> Machine Learning
medical data mining
genetics and bioinformatics
Area: 20,000 sq. km (Japan: 378,000 sq. km, 19×Slovenia)
Population: 2 million (Japan: 127 million, 64×Slovenia)
Gapminder (http://www.gapminder.org/)
Life expectancy vs. Income (http://www.bit.ly/d4hOUy)
#Cell phones per 100 people (http://www.bit.ly/doYy6k)
See also: Hans Rosling's talk at TED (http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html)
AIDS prevalence, toy exports, military spending, alcohol consumption, housing prices
More: http://www.dailymail.co.uk/news/article-439315/How-world-really-shapes-up.html
Guys with data hire the guys with hammers ;)
A department within the company is in charge of making the collected data useful
-- or --
The company (institution, organization) hires (or asks) somebody from outside to help make sense of the data.
In either case, there are two parties, which need to understand each other.
The "miner" needs a basic understanding of the problem, the business and the data
-- and --
The data owner needs a basic understanding of the data mining process and methods.
These lectures:
a "miner" talks to the "data owners".
You will:
get a basic knowledge of data mining,
learn to do as much by yourself as you can,
understand what you can expect from a professional.
Diagram: Machine learning, Mining, Visualization
The main goal of data visualization is to communicate information clearly and effectively through graphical means (Friedman, 2008)
fun to use,
puts the patterns right in front of our eyes.
finding the right visualization
Statistics
A mathematical discipline based on probability calculus.
It features methods for:
observing properties of data (mean, variance, correlation...),
fitting models to data (esp. regression, Bayesian models) to predict values, probabilities of events...,
The emphasis of school curricula is on the latter (sadly!).
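As a minimal illustration of the two kinds of methods listed above, the following Python sketch computes a mean, a sample variance and a Pearson correlation, then fits a least-squares line y = a + b*x and uses it to predict a new value. The toy numbers and the function names are hypothetical, chosen only for the example; in practice one would use a statistics library rather than hand-written formulas.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance (divides by n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical toy data, not taken from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Observing properties of the data
print(mean(y), variance(y), round(correlation(x, y), 3))  # 4.0 1.5 0.775

# Fitting a model and using it to predict y at x = 6
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2), round(a + b * 6, 2))  # 2.2 0.6 5.8
```

The first line only describes the data we have; the second builds a model that makes a claim about data we have not seen, which is where the questions of validity discussed next come in.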
Cohen (1994): The Earth is Round (p<0.05) (http://www.ics.uci.edu/~sternh/courses/210/cohen94_pval.pdf)
Gigerenzer (2004): Mindless Statistics (http://courses.umass.edu/bioep740/yr2009/topics/Gigerenzer-jSoc-Econ-1994.pdf)
Null-hypothesis significance testing requires a strict procedure:
form the hypothesis in advance,
collect appropriate data,
try to reject the null hypothesis.
Using the data in forming the hypothesis
is strictly prohibited.
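The rule against using the data to form the hypothesis can be made concrete with a small simulation. The Python sketch below implements a two-sided permutation test for a difference in group means (our choice of test, not one named in the lecture) and then applies it to 50 random yes/no "features" against a purely random outcome. Every null hypothesis is true by construction, yet in most runs at least one feature comes out "significant" at the 0.05 level; picking that feature after looking at the data and reporting its p-value is exactly what the procedure forbids. All names, sizes and seeds are hypothetical.

```python
import random

def permutation_p_value(a, b, n_perm=2000, rng=None):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: the group labels are exchangeable, i.e. both
    groups come from the same distribution."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so results are reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)

# Hypothetical demo: the outcome is pure noise, so every null hypothesis is true.
rng = random.Random(42)
n = 20
outcome = [rng.gauss(0, 1) for _ in range(n)]
p_values = []
for _ in range(50):  # 50 random yes/no "features" tried after seeing the data
    feature = [rng.random() < 0.5 for _ in range(n)]
    a = [o for o, f in zip(outcome, feature) if f]
    b = [o for o, f in zip(outcome, feature) if not f]
    if a and b:  # skip the unlikely case of an empty group
        p_values.append(permutation_p_value(a, b, n_perm=1000, rng=rng))

print("smallest p-value:", min(p_values))
print("'significant' at 0.05:", sum(p < 0.05 for p in p_values), "of", len(p_values))
```

With about 50 independent tests at the 0.05 level, roughly 2 to 3 false positives are expected on average, which is why a pattern found by searching the data must be checked on new data rather than reported with its p-value.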
Null-hypothesis significance testing is the opposite of data mining.
So we get to do all the work?!
Machine learning: an area of artificial intelligence that tries to make computers mimic human reasoning
can handle large amounts of data
uncover patterns in the data
gives understandable (human brain compatible) patterns
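To make "understandable patterns" concrete, here is a Python sketch of OneR, one of the simplest rule learners; it is our choice of example, not a method from the lecture. For each attribute it maps every attribute value to the majority class and keeps the attribute whose rule makes the fewest errors on the training data. The patient records and the attribute names are invented for illustration.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """OneR: for each attribute, build a one-level rule (value -> majority
    class) and keep the attribute whose rule makes the fewest training errors.
    Returns (attribute, rule, errors)."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Class counts for every value of this attribute
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        # Errors = examples that do not belong to the predicted majority class
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical patient records, not from the lecture.
rows = [
    {"fever": "yes", "cough": "yes", "age": "young", "diagnosis": "flu"},
    {"fever": "yes", "cough": "no",  "age": "old",   "diagnosis": "flu"},
    {"fever": "yes", "cough": "yes", "age": "old",   "diagnosis": "flu"},
    {"fever": "no",  "cough": "yes", "age": "young", "diagnosis": "cold"},
    {"fever": "no",  "cough": "no",  "age": "old",   "diagnosis": "cold"},
    {"fever": "no",  "cough": "yes", "age": "old",   "diagnosis": "flu"},
]

attr, rule, errors = one_r(rows, "diagnosis")
for value, label in rule.items():
    print(f"IF {attr} = {value} THEN diagnosis = {label}")
print(f"training errors: {errors}/{len(rows)}")
```

On this data it prints "IF fever = yes THEN diagnosis = flu", "IF fever = no THEN diagnosis = cold" and "training errors: 1/6". The result is a rule a person can read and check, which is the kind of pattern the list above asks for; models tuned only for predictive accuracy do not give this directly.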
the holy grail of ML is to not let the computer know anything and let it discover everything,
ML is not about discovering new knowledge.
interest in models that optimize predictive accuracy, without any regard for understandability (e.g. SVM, the successor to neural networks)
More interested in imitating and following humans than in assisting and leading them
About lecturer
Data Collections
Heart rate