Transcript of Algorithm Literacy
this beautiful data...
Adjacency network of common adjectives and nouns in the novel "David Copperfield" by Charles Dickens.
And you want to find out
what is the most central word
in "David Copperfield"!
You find a class of algorithms called "centrality indices" that are supposed
to identify the most important nodes
of a network.
What you need to know
Prof. Dr. Katharina A. Zweig
You find the
"It is equal to the number of shortest paths from all vertices to all others that pass through that node.
A node with high betweenness centrality has a large influence on the transfer of items through the network."
The most central word of "David Copperfield" is:
We have been tricked
This is the formula for the betweenness centrality:
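A common way to write the betweenness centrality of a node v, assuming the standard definition is the one meant here (the symbols are this sketch's notation, not necessarily the slide's):

```latex
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

Here \sigma_{st} is the number of shortest paths from s to t, and \sigma_{st}(v) is the number of those paths that pass through v. Each pair (s, t) therefore contributes a fraction between 0 and 1, not a raw path count.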
Before you can download this talk from my hard disk, you have to accept the general terms and conditions.
This talk can pose serious hazards to your research and your life as a scientist and a citizen.
Now, let me
get to know you better!
Have you ever applied a data analysis method to data and interpreted the resulting values?
Have you ever developed a new method of data analysis?
Have you ever implemented a method yourself and made it public?
Have you ever applied a data analysis method that you learned about not from a textbook but from a paper or the documentation of a software package?
Now, let me introduce myself...
"A description of a method that contains wrong or at least incomplete instructions
on how to interpret
the results of
"Statistical rituals largely
eliminate statistical thinking
in the social sciences. Rituals are indispensable
for identification with social groups,
but they should be the subject rather than the procedure of science."
Gerd Gigerenzer: "Mindless Statistics", The Journal of Socio-Economics 33, 587-606, 2004
In network analysis
I see similar "rituals"
Measuring power-laws by plotting a distribution on a log-log plot and looking for a line
Equating power-law degree distributions with the preferential attachment model
Applying centrality indices without arguing the choice
How does it happen?
This is especially bad when software is published ready to use but the method itself is not clearly defined.
Centrality indices are tied to network flows, i.e., to something that uses the network as an infrastructure. [Borgatti2005]
There is no network flow on the abstract network of correlated important nouns and adjectives...
This method cannot be applied to this data.
Okay, let's assume we have a nationwide cellphone communication network that includes people with a known terrorist background. Can betweenness centrality identify them?
Communication takes shortest paths
All persons want to talk to all other persons in the same way (no weights)
Most important (wrong) assumption:
Terrorists use this network
like other people
Borgatti, S. P.: "Centrality and Network Flow". Social Networks 27, 55-71, 2005
Air Transportation networks
Using DB1B data, we showed that 40% of all possible pairs of airports within the USA are never requested.
The remaining pairs are requested with very different frequencies.
Dorn, I.; Lindenblatt, A. & Zweig, K. A.:
"The Trilemma of Network Analysis".
SNAM 2012, Istanbul, 2012
Don't use a method if it is not formally defined and open source
Ask your buddy data scientist questions until you are sure you understand the limits of the method and when it is really applicable
Fire your buddy data scientist if (s)he does not ask YOU enough about your data!
What is a model?
A bit of philosophy of science
Weisberg’s definition is cautious:
of target systems.”
(2013, p. 171)
"Simulation and Similarity: Using Models
to Understand the World",
Oxford University Press, 2013
Weisberg (2013) argues
that models are composed of
Their structure (e.g., a concrete model, a graph, a mathematical formula, a computer simulation, ...)
A construal containing:
an assignment of real-world elements to the structure
“To generate a target, theorists choose some phenomenon in the world that they wish to study. From the full contents of the phenomenon, they abstract, omitting all but the relevant features of this phenomenon. This process generates the target system.” (Weisberg 2013, p. 172)
The Target System is
a construction by the modeler
Many data analyses contain TWO models
As seen above, many analyses are based on modeling assumptions
as well. For example:
Centrality indices assume certain flow characteristics;
Clustering algorithms assume an underlying homophily between entities;
Network motif analysis requires the choice of a null model.
Terrorist identification and cell-phone networks
Why is it a model? For example because:
1) not all kinds of communication are observed;
2) some people may share a phone, others have multiple phones;
Implicit assumption: this communication network resembles the full one
Okay, algorithms in data analysis are important for me as a scientist.
Why are they changing my life as a citizen as well?
Software that predicts time and place of future crime.
PredPol says it reduced crime rates by 10 to 40% in many cities.
Future: predict crime rate of individuals
Such interdisciplinary, trans-institutional data lines make algorithmic folklore even more likely than in academic circles!
"This software predicts
Learning algorithms depend heavily on the data they are fed.
Statistical intuition (or the lack thereof) becomes very important for interpreting the results.
Science & Society
... needs our interdisciplinary efforts more than ever.
Become literate in the algorithms that we depend on!
Do we need algorithmic leaflets?
... to avoid serious side effects of your analysis, ask your local data scientist or your local algorithm dealer...
(who watches the watchmen?)
Tiger Mom Tax
Princeton Review charges customers based on their ZIP code
Predominantly Asian regions pay almost twice as much as other regions
New Data - New World
Algorithms not yet there
We were able to show that a platform like Facebook can infer acquaintance between non-members.
Horvát, E.-Á.; Hanselmann, M.; Hamprecht, F. A. & Zweig, K. A.: "One plus one makes three (for social networks)". PLoS ONE 7, e34740, 2012
What I call the “null ritual” consists of three steps: (1) set up a statistical null hypothesis, but do not specify your own hypothesis nor any alternative hypothesis, (2) use the 5% significance level for rejecting the null and accepting your hypothesis, and (3) always perform this procedure.
I report evidence of the resulting collective confusion and fears about sanctions
on the part of students and teachers,
researchers and editors, as well as
Please note that it is not the sum of
all shortest paths containing v!
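The difference can be made concrete with a minimal sketch, assuming an undirected, unweighted graph given as an adjacency dictionary and counting each unordered pair (s, t) once. The function names and the tiny example graph are hypothetical. This is a brute-force illustration of the definition, not an efficient implementation.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(graph, source):
    """BFS from source: distance and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_and_raw_count(graph, v):
    """Return (betweenness of v, raw number of shortest paths passing through v)."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    betweenness = 0.0
    raw_count = 0
    others = [n for n in graph if n != v]
    for s, t in combinations(others, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s or v not in dist_s or v not in dist_t:
            continue  # s, t, and v are not all connected
        # v lies on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t)
        if dist_s[v] + dist_t[v] == dist_s[t]:
            through_v = sigma_s[v] * sigma_t[v]
            raw_count += through_v                     # plain sum of paths
            betweenness += through_v / sigma_s[t]      # fraction per pair
    return betweenness, raw_count


# Hypothetical example: a square a-b-c-d-a.
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
print(betweenness_and_raw_count(square, "b"))  # (0.5, 1)
```

In the square, a and c are connected by two shortest paths, and only one of them runs through b. The pair (a, c) therefore contributes 1/2 to the betweenness of b, while a plain path count would contribute 1.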
This is the part we are most often aware of: complex networks are an abstract representation to understand a phenomenon of interest.
What about the