Algorithm Literacy

by

Katharina Zweig

on 6 March 2016


Transcript of Algorithm Literacy

You
have found
this beautiful data...
Adjacency network of common adjectives and nouns in the novel "David Copperfield" by Charles Dickens.
And you want to find out
what is the most central word
in "David Copperfield"!
You find a class of algorithms called "centrality indices" that are supposed
to identify the most important nodes
of a network.
Algorithm Literacy

What you need to know
Prof. Dr. Katharina A. Zweig
You find the
betweenness centrality
Wikipedia says:

"It is equal to the number of shortest paths from all vertices to all others that pass through that node.

A node with high betweenness centrality has a large influence on the transfer of items through the network."
The most central word of "David Copperfield" is:
LITTLE
We have been tricked
by
"Algorithmic Folklore"
This is the formula for the betweenness centrality:
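The formula shown on the slide did not survive in this transcript. As a sketch, the standard definition of betweenness centrality is given below; the symbol names are an assumption and may differ from those on the slide:

```latex
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```

Here \sigma_{st} is the number of shortest paths between s and t, and \sigma_{st}(v) is the number of those paths that pass through v. Each pair (s, t) therefore contributes a fraction between 0 and 1, not a raw path count.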
WARNING
Before you can download this talk from my hard disk you have to accept the general terms and conditions.
This talk can induce serious hazards to your research and your life as a scientist and a citizen.
Now, let me
get to know you better!
Have you already applied some data analysis method to data and interpreted the resulting values?
Yes
No
Have you already developed a new method of data analysis?
Yes
No
Yes
No
a data analysis method you learned about not in a textbook but in some paper or in the documentation of a software package?
Yes
No
Now, let me introduce myself...
https://networkdata.ics.uci.edu/data.php?id=4
"A description of a method that contains wrong or at least incomplete instructions
on how to interpret
the results of
the method."
"Statistical rituals largely
eliminate statistical thinking
in the social sciences. Rituals are indispensable
for identification with social groups,
but they should be the subject rather than the procedure of science.
Gerd Gigerenzer: "Mindless Statistics", The Journal of Socio-Economics 33, 587-606, 2004
In network analysis
I see similar "rituals"
Measuring power-laws by plotting a distribution on a log-log plot and looking for a line
Equating power-law degree distributions with the preferential attachment model
Applying centrality indices without arguing the choice
....
How does it happen?
This is especially bad when software is published ready to use but the method itself is not clearly defined.
Example: Nestedness

Centrality indices
are tied to network flows, i.e., something that uses the network as an infrastructure. [Borgatti2005]
There is no network flow on the abstract network of correlated important nouns and adjectives...
This method cannot be applied to this data.
Okay, let's assume we have a nation-wide cellphone communication network that includes persons with a known terrorist background. Can the betweenness centrality identify them?
Hidden assumptions
Communication takes shortest paths
More importantly:

All persons want to talk to all other persons in the same way (no weights)
Most important (wrong) assumption:

Terrorists use this network
like other people
Borgatti, S. P.: "Centrality and Network Flow". Social Networks, 2005, 27, 55-71
Air Transportation networks
Using DB1B data, we showed that 40% of all possible pairs of airports within the USA are never requested.
The remaining pairs are requested with very different frequencies.
Dorn, I.; Lindenblatt, A. & Zweig, K. A.:
"The Trilemma of Network Analysis".
SNAM 2012, Istanbul, 2012
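To illustrate why unequal demand between pairs matters, here is a minimal sketch in Python. It is not the method of the cited paper: the function name demand_weighted_betweenness, the toy path graph and the demand values are all hypothetical. Each pair (s, t) contributes its demand times the fraction of shortest s-t paths through v, instead of every pair counting equally.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """BFS from source: distance to and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def demand_weighted_betweenness(edges, demand=None):
    """Betweenness over unordered pairs; demand maps frozenset({s, t}) to a weight.

    With demand=None every pair has weight 1 (the classic assumption).
    With a demand dict, pairs that are missing from it have weight 0.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    info = {s: shortest_path_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(sorted(adj), 2):
        weight = 1.0 if demand is None else demand.get(frozenset((s, t)), 0.0)
        dist_s, sigma_s = info[s]
        if weight == 0 or t not in dist_s:
            continue
        for v in adj:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff the two distances add up
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += weight * sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical toy network: a path 0 - 1 - 2 - 3 - 4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
uniform = demand_weighted_betweenness(edges)
# Hypothetical demand: most pairs are never requested at all
skewed = demand_weighted_betweenness(
    edges, {frozenset((0, 2)): 10.0, frozenset((1, 4)): 1.0}
)
top_uniform = max(uniform, key=uniform.get)
top_skewed = max(skewed, key=skewed.get)
print(top_uniform, top_skewed)
```

In this toy example the most central node shifts from the middle of the path to node 1 once demand is taken into account, although the network itself is unchanged.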
Algorithm Literacy
Don't use a method if it is not formally defined and open source
Keep asking your data scientist colleague questions until you are sure you understand the limits of the method and when it really applies
What is a model?

A bit of Science Theory
Weisberg’s definition is cautious:

“potential representations
of target systems.”
(2013, p. 171)

Weisberg, M.:
"Simulation and Similarity: Using Models
to Understand the World",
Oxford University Press, 2013
Weisberg (2013) argues
that models are composed of
two things:
Their structure (e.g., a concrete model, a graph, a mathematical formula, a computer simulation, ...)
A construal containing:
an assignment of real-world elements to the structure
fidelity criteria
intended scope
"To generate a target, theorists choose some phenomenon in the world that they wish to study. From the full contents of the phenomenon, they abstract, omitting all but the relevant features of this phenomenon. This process generates the target system." (Weisberg 2013, p. 172).
The Target System is
a construction by the modeler
Many data analyses contain TWO models
As seen above, many analyses are based on modeling assumptions
as well. For example:
Centrality indices assume certain flow characteristics;
Clustering algorithms assume an underlying homophily between entities
Network motif analysis requires the choice of a null-model
Terrorist identification and cell-phone networks
Why is it a model? For example because:

1) not all kinds of communication are observed;
2) some people may share a phone, others have multiple phones;
Implicit assumption: this communication network resembles the full one
Okay, algorithms in data analysis are important for me as a scientist.
Why are they changing my life as a citizen as well?
Predictive Policing
Software that predicts time and place of future crime.
PredPol says it reduced crime rates by 10 to 40% in many cities.
Future: predict crime rate of individuals
Algorithmic Folklore
Such interdisciplinary, trans-institutional data lines make algorithmic folklore even more likely than in academic circles!
"This software predicts
crime rates"
Data Dependency
Learning algorithms depend heavily on the data they are fed.
Statistical intuition (or the lack thereof) becomes very important for interpreting the results.
Science & Society
... needs our interdisciplinary efforts more than ever.
Become literate in the algorithms that we depend on!
Strange(r) Data
Do we need algorithmic leaflets?
Quis custodiet
ipsos custodes?
(who watches the watchmen?)
Tiger Mom Tax
Princeton Review charges customers based on their ZIP code
Asian-dominated regions pay almost twice as much as other regions
New Data - New World
Algorithms not yet there
We could show that someone like Facebook can deduce acquaintanceship between non-members.
Horvát, E.-Á.; Hanselmann, M.; Hamprecht, F. A. & Zweig, K. A.: "One plus one makes three (for social networks)". PLoS ONE, 2012, 7, e34740
What I call the “null ritual” consists of three steps: (1) set up a statistical null hypothesis, but do not specify your own hypothesis nor any alternative hypothesis, (2) use the 5% significance level for rejecting the null and accepting your hypothesis, and (3) always perform this procedure.
I report evidence of the resulting collective confusion and fears about sanctions
on the part of students and teachers,
researchers and editors, as well as
textbook writers."
Please note that it is not the sum of
all shortest paths containing v!
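The difference can be checked on a small example. The following is a minimal sketch in Python, assuming an undirected, unweighted graph and unordered pairs; the function names and the toy graph are hypothetical and not taken from the talk. It computes, for each node v, both the raw number of shortest paths containing v and the betweenness centrality, where each pair only contributes the fraction of its shortest paths that pass through v.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """BFS from source: distance to and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def path_count_and_betweenness(edges):
    """Return (raw shortest-path count, betweenness) per node, unordered pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    info = {s: bfs_counts(adj, s) for s in adj}
    count = {v: 0 for v in adj}
    betw = {v: 0.0 for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff the two distances add up
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                through = sigma_s[v] * sigma_v[t]  # shortest s-t paths via v
                count[v] += through                # the "folklore" reading
                betw[v] += through / sigma_s[t]    # the actual definition
    return count, betw


# Hypothetical toy graph: a 4-cycle 0-1-2-3-0 with a tail 2-4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)]
count, betw = path_count_and_betweenness(edges)
for v in sorted(count):
    print(v, count[v], betw[v])
```

For node 2 in this toy graph, five shortest paths contain the node, but its betweenness is 3.5, because pairs with several shortest paths only contribute fractions.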
This is the part we are most often aware of: complex networks are an abstract representation to understand a phenomenon of interest.