Vanja Subotić,
Institute for Philosophy, University of Belgrade
vanja.subotic@f.bg.ac.rs
Urbino, June 2019
Approaches, problems, arguments
In cognitive science, we are looking for empirically testable answers to the following questions:
(1) What processes support the complex behavior of intelligent systems?
(2) What kinds of representations do such processes operate on?
(3) What is the core basis of such processes and representations, i.e. are they innate, or can they be learned from experience?
However, any answer presupposes theoretical commitments, which in turn have philosophical and methodological consequences for assumptions concerning the nature of human cognitive processes.
Where is the essence of intelligent behavior located?
The debate between symbolism and connectionism can also be stated as a debate between good old-fashioned AI and biologically inspired AI, since better models of the human brain can guide the building of human-like AI. The key methodological difference between these two approaches can thus be summarized as follows:
The Paradox is simple enough to identify. On the one hand, cognition is hard: characterized by the rules of logic, by the rules of language. On the other hand, cognition is soft: if you write down the rules, it seems that realizing those rules in automatic formal systems (which AI programs are) gives systems that are just not sufficiently fluid, not robust enough in performance (...) This ancient paradox has produced a deep chasm in both the philosophy and the science of mind: on one side, those placing the essence of intelligent behavior in the hardness of mental competence; on the other, those placing it in the subtle softness of human performance.
(Smolensky 1988: 21-22)
(1) Cognitive processes are similar to digital computer programs: they resemble ordered lists of explicit or implicit rules, and they are modular and sequential, which means that each process follows domain-specific rules.
(2) Representations are discrete and symbolic, and they have a combinatorial syntax and semantics.
(3) The knowledge encoded in symbolic models is largely innate: since the number of possible ordered lists of rules is virtually unbounded, the initial constraints must be prespecified rather than learned.
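The symbolic picture can be illustrated with a toy sketch (the structures and the rule below are hypothetical illustrations, not drawn from the cited authors): representations are discrete symbol structures with a combinatorial syntax, and processes are explicit rules defined over that syntax.

```python
# Toy illustration of a symbolic architecture: discrete representations
# with combinatorial syntax, and a process as an explicit rule over them.

# A "thought" is a structured tuple of symbols: (agent, relation, patient).
thought = ("JOHN", "LOVES", "MARY")

def swap_arguments(t):
    """An explicit rule operating purely on symbolic structure."""
    agent, relation, patient = t
    return (patient, relation, agent)

# Systematicity: any system that can represent aRb can represent bRa,
# because the same combinatorial rule applies to every such structure.
print(swap_arguments(thought))          # ('MARY', 'LOVES', 'JOHN')
print(swap_arguments(("A", "R", "B")))  # ('B', 'R', 'A')
```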
Proponents of the symbolic approach, such as Fodor & Pylyshyn (1988) or Pinker & Prince (1988), argued that this approach is superior when it comes to accounting for the productivity and systematicity of thought and language.
(1) Cognitive processes are like analogue computer programs, because their aim is to find the most highly associated output corresponding to an arbitrary input within the connectionist neural network. The weights of the connections between input units and output units are adjusted via learning algorithms until the statistical properties of environmental events are recapitulated in the network. The detection of statistical patterns is carried out by hidden units, which are abstract and not directly connected to the environment in the way input and output units are.
(2) Representations are distributed within a neural network in such a manner that states of unit activation correspond to patterns of statistical and neural activity.
(3) Knowledge is largely inferred from experience by using various learning procedures, such as backpropagation.
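A minimal sketch of this style of learning, far simpler than the models discussed here (a single unit trained with the delta rule on the OR function, chosen purely for illustration): weights are adjusted on each mismatch until the statistics of the "environment" are recapitulated in the network.

```python
# Minimal connectionist sketch: a single unit learns the OR function by
# error-driven weight adjustment (delta rule). No rule for OR is ever
# hand-wired; the mapping is inferred from repeated exposure to the data.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w = [0.0, 0.0]   # connection weights, initially uncommitted
b = 0.0          # bias
lr = 0.1         # learning rate

def predict(x):
    s = w[0] * x[0] + w[1] * x[1] + b
    return 1 if s > 0 else 0

for epoch in range(50):                 # repeated exposure to the environment
    for x, target in data:
        error = target - predict(x)     # mismatch drives weight change
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

print([predict(x) for x, _ in data])    # learned mapping: [0, 1, 1, 1]
```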
Proponents of the symbolic approach argue that, since connectionism does not guarantee systematicity, it does not explain why systematicity is found so pervasively in human cognition. Systematicity may exist in such architectures, but where it exists, it is no more than a lucky accident. Therefore, connectionist models can be useful for cognitive research and accepted into mainstream cognitive science only as long as they are implementations of symbolic architectures with a biological flavor.
Can connectionism save constructivism? Can connectionism be saved by generative grammar?
Symbolic models of language processing were heavily influenced by Noam Chomsky (e.g. 1965), who made a sharp distinction between linguistic competence and linguistic performance, arguing that certain sets of structural linguistic rules are innate, which allows speakers to acquire their native languages quickly and rather accurately; i.e. each speaker is endowed with a “universal grammar”. Chomsky's ideas constitute a highly influential theory in linguistics: generative grammar.
Models of language processing, on the other hand, were an insurmountable obstacle for connectionist researchers during the 1980s, because it seemed that every model had to include at least some “hand-wired” rule in the neural network, which ran counter to the ambition of giving a fully empiricist account of language processing. Nevertheless, by using dynamical systems theory and novel recurrent neural networks, Elman (1991, 1996) and Tabor & Tanenhaus (1999) obtained promising results. Such connectionist models of language processing were quite different from symbolic ones: instead of focusing on abstract competence, the aim was to model the performance of actual language users, i.e. to articulate the computational principles that account for linguistic usage.
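The key move in Elman-style recurrent networks is feeding the previous hidden state back in as extra input, so the network carries sequential context. A minimal forward-pass sketch (the weights are arbitrary illustrative values, not a trained model of Elman's):

```python
import math

# Minimal Elman-style recurrent step: the hidden state at time t depends
# on the current input AND the previous hidden state (the "context"),
# which is what lets such networks track sequential structure in language.
W_in = 0.8    # input -> hidden weight (illustrative value)
W_rec = 0.5   # previous-hidden -> hidden weight (illustrative value)

def step(x, h_prev):
    return math.tanh(W_in * x + W_rec * h_prev)

h = 0.0
for x in [1.0, 0.0, 0.0]:   # a short input sequence: a pulse, then silence
    h = step(x, h)
    print(round(h, 3))

# The later hidden states remain nonzero even though the later inputs are
# zero: the network "remembers" the earlier input through its context.
```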
Smolensky (1999) argues that generative grammar and connectionist approaches to language research are not incompatible paradigms; rather, they are constituted by opposing hypotheses about model building. The core commitments of these paradigms can be reconciled and used to advance research.
Because the basic connectionist principles are too general to have definitive consequences for key theoretical issues, less vague connectionist proposals are needed. Therefore, Smolensky proposes the (generative) grammar-based strategy, which seeks to pursue the explanatory goals of one popular linguistic theory while incorporating computational insights from connectionist theory concerning cognitive representation, cognitive processing, and learning. Smolensky basically claims that eliminative connectionism, which aims to put symbolism and nativism ad acta, is an impractical delusion.
The main point of Marcus's criticism is almost the same as the arguments raised by Fodor & Pylyshyn (1988) or Fodor & McLaughlin (1990): connectionist researchers use a deeply flawed methodology, since their models are unable to generalize, in the ways that humans do, to items with properties that did not appear in the training set or corpus. This is also known as the still-not-systematic-after-all-these-years argument.
A fortiori, connectionist models provide neither support for Piagetian constructivism nor a valid objection to Chomskyan nativism; but the fact that the authors were not able to deliver connectionist models consistent with their eliminative goal does not mean that the constructivist endeavour is doomed. Again, Marcus accepts connectionism only to the extent that it represents an implementation of a symbolic architecture.
Arguments in favor of usage-based theory as a key theoretical commitment of connectionism
However, neither Marcus nor Smolensky left open the possibility that eliminative connectionism may in time be methodologically enhanced and theoretically refined.
Karpathy & Fei-Fei (2015) and Karpathy, Johnson & Fei-Fei (2016) started from the idea that a computationally boosted connectionist architecture may well be indicative of how children acquire knowledge of the objects of reference that surround them. In their models, convolutional neural networks (CNNs) are combined with bi-directional and multimodal recurrent neural networks (RNNs) in such a way that the CNN is used for image classification and object detection, the bi-directional RNN for determining the sequence of the words in the sentences of the corpora, and the multimodal RNN for generating novel descriptions of image regions by using inferred alignments of the two modalities. The authors summarize their attempt in the following way (2016: 3128):
The primary challenge towards this goal is in the design of a model that is rich enough to simultaneously reason about contents of images [as input units] and their representation in the domain of natural language [as output units]. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data.
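The core of the alignment idea can be sketched in miniature: image regions and words are embedded in a common vector space, and a word is aligned to the region with which it scores highest. All vectors below are hand-picked illustrative values standing in for learned CNN and RNN embeddings, not anything from the cited papers.

```python
# Toy sketch of region-word alignment: embeddings in a shared space,
# aligned by inner-product similarity. No hard-coded category links the
# word "dog" to its region; the match falls out of the vector geometry.

regions = {                      # stand-ins for CNN region embeddings
    "dog_region":   [0.9, 0.1],
    "grass_region": [0.1, 0.9],
}
words = {                        # stand-ins for RNN word embeddings
    "dog":   [1.0, 0.0],
    "grass": [0.0, 1.0],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def align(word_vec):
    """Return the region whose embedding best matches the word."""
    return max(regions, key=lambda r: dot(regions[r], word_vec))

print(align(words["dog"]))    # dog_region
print(align(words["grass"]))  # grass_region
```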
What we additionally need to give this sort of eliminative connectionism a shot is a theory of language acquisition and processing that specifies a priori only a single set of general learning processes with which it is possible to learn everything about a language, including these correspondences between visual input and verbal output. Such a theory may be, pace Smolensky, usage-based theory (UBT) instead of generative grammar.
According to the proponents of UBT, children come to the process of language acquisition equipped with two sets of cognitive skills, both of which evolved for other, more general functions before linguistic communication emerged in the human species: intention-reading and pattern-finding.
For general mechanisms such as categorization, analogy, and category-based induction, pattern-finding is the central cognitive construct.
Tomasello has remarked that even though connectionism accords far better with UBT than symbolism does, it nevertheless has its own limitations, viz. its ignorance of communicative intentions and of visual information. Yet the models proposed by Karpathy & Fei-Fei seem to be going in the direction of providing a computational framework for at least one essential part of UBT, i.e. pattern-finding.
As for the other essential part of UBT, i.e. intention-reading, Colombo (2015) offers interesting theoretical possibilities which have only just started to be explored empirically: emerging research on hierarchical Bayesian algorithms may offer an interesting, biologically plausible model of collective learning and agency that can be used to flesh out UBT proposals. For example, conventions would be devices whose role is to reduce uncertainty in order to create patterns or schemas of behavior, represented as probability distributions.
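The uncertainty-reduction idea can be sketched with a flat (non-hierarchical) Bayesian update; the scenario and all numbers below are illustrative, not from Colombo (2015).

```python
# Toy Bayesian sketch: a convention as a device that reduces uncertainty
# about others' behavior. An agent holds a probability distribution over
# two candidate conventions and updates it as observations come in.

# Prior: equal credence in convention A ("greet with a handshake")
# and convention B ("greet with a bow").
posterior = {"A": 0.5, "B": 0.5}

# Likelihood of observing a handshake under each convention (illustrative).
likelihood_handshake = {"A": 0.9, "B": 0.2}

for _ in range(3):   # observe three handshakes; Bayes' rule after each
    unnorm = {c: posterior[c] * likelihood_handshake[c] for c in posterior}
    total = sum(unnorm.values())
    posterior = {c: unnorm[c] / total for c in unnorm}

# Credence concentrates on A: behavior becomes predictable (a "schema").
print(round(posterior["A"], 3))
```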