A historical overview of Symbolic vs Connectionist Machine Learning techniques
Battles have been fought, won and lost over sufficiently tangible implementations of representational theories of the mind. The two key contenders for naturalising the mind, and then modelling it artificially, are symbolic processing with explicit representations and the more neurobiologically inspired connectionist systems (Bechtel, 1993).
This article will contrast the composition of connectionist and traditional models of cognition, evaluate the merits and weaknesses of both systems in the quest for naturalisation, and conclude that there is a paradigm in which both systems can be combined to form a comprehensive model of the mind.
Initial collaborations between psychology and linguistics, beginning with Chomsky's transformational generative grammar revolution (Chomsky, 1957) and Chomsky and Miller's finite state languages (Chomsky and Miller, 1958), gave cognitive science a framework whereby propositional attitudes could be naturalised using traditional programming languages with explicit representations in a symbolic system (Miller, Galanter and Pribram, 1960).
Techniques used in early attempts at modelling cognition through reasoning and heuristic procedures, such as the General Problem Solver (Newell and Simon, 1961) and chess-playing programs (Baylor and Simon, 1966), demonstrated the viability of the syntactical approach. The symbolic approach was championed as the framework for modelling thoughts and propositions in Fodor's language of thought (Fodor, 1975) and by other computational functionalists, who proposed that fully conceptual propositional systems could account for cognition (Hook et al., 1961).
Connectionist networks, meanwhile, were initially heavily criticised because of scalability issues and an inability to learn certain tasks, such as the exclusive-or (XOR) problem (Minsky and Papert, 1969), leading to a deficit of research funding (Bechtel, 1993).
As the limitations and rigidity of the symbolic approach became apparent, connectionist networks were re-evaluated, aided by the development of new learning algorithms and increases in the availability of big data and computational processing power (Bechtel, 1993).
Connectionist networks focus on the inter-connectivity of individual units: representations exist implicitly in the strengths of the connections between units, rather than explicitly as in symbolic systems (Bechtel, 1993).
Early connectionist networks, loosely inspired by the composition of individual neurons in the brain, were developed using McCulloch and Pitts' (1943) simplified model of a neuron. The simplified model sums a set of binary inputs and produces a binary output according to whether the sum meets a constant threshold value (McCulloch and Pitts, 1943).
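The simplified neuron can be sketched in a few lines of Python; this is an illustrative toy, not McCulloch and Pitts' original formalism, and the function name and threshold value are assumptions:

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts-style unit: fires (outputs 1) when the sum of its
    binary inputs meets or exceeds a fixed threshold, else outputs 0."""
    return 1 if sum(inputs) >= threshold else 0

# With a threshold of 2 and two inputs, the unit behaves as an AND gate:
# both inputs must be active for the sum to reach the threshold.
print(mp_neuron([1, 1], threshold=2))  # → 1
print(mp_neuron([1, 0], threshold=2))  # → 0
```

Varying the threshold changes the logical function the unit computes: with a threshold of 1, the same unit behaves as an OR gate.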
Rosenblatt's (1962) interconnected perceptrons, an implementation based on this simplified neuron, showed that it was possible to build electronic computational systems whose structure was closer to the neurophysiological organisation of our brains (Rosenblatt, 1962).
This is a clear advantage of connectionist systems: although our brains and computers may share similar capabilities, such as logical reasoning and memory storage, they are very different in their composition.
However, as discussed earlier, Minsky and Papert (1969) were highly critical of connectionist systems, arguing that it was not possible to develop a generalised rule to solve all types of problems. They also conjectured that adding a hidden layer of units between the input and output units would not overcome these limitations (Minsky and Papert, 1969).
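The XOR criticism can be made concrete with a small brute-force demonstration. The sketch below (an illustration of the linear-separability argument, not Minsky and Papert's proof; the grid of candidate weights is an assumption) searches for a single threshold unit that computes XOR and finds none, because the XOR classes are not linearly separable:

```python
import itertools

def unit(x1, x2, w1, w2, theta):
    # A single threshold unit: fires when the weighted sum reaches theta.
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Brute-force search over a grid of weights and thresholds. No single
# unit reproduces XOR: it would need theta > 0 (for input 0,0), each
# weight >= theta (for inputs 0,1 and 1,0), yet w1 + w2 < theta (for
# input 1,1), which is contradictory.
grid = [x / 2 for x in range(-6, 7)]  # -3.0 .. 3.0 in steps of 0.5
solutions = [
    (w1, w2, theta)
    for w1, w2, theta in itertools.product(grid, repeat=3)
    if all(unit(x1, x2, w1, w2, theta) == y for (x1, x2), y in XOR.items())
]
print(solutions)  # → []
```

The same search run against AND or OR would succeed immediately, which is exactly the asymmetry Minsky and Papert exploited.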
These claims were later challenged by further research into more advanced versions of connectionist systems using feed-forward networks, such as the investigations by Rumelhart, Hinton and Williams (1986).
Feed-forward networks employ architectures similar to those of early connectionist systems, with the addition of the hidden layer of units discussed by Minsky and Papert: each input unit is connected to every unit in the hidden layer, and each hidden unit is connected to every output unit (Rumelhart, Hinton and Williams, 1986).
Rumelhart, Hinton and Williams (1986) tested backpropagation-of-error techniques on a variety of problems and proposed that a generalised learning rule may be possible for neural networks. During training, the network compares its actual output with the expected output; if a difference is found, the values of the weights are updated to reduce that difference, incrementally improving the system's output (Rumelhart, Hinton and Williams, 1986).
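The compare-and-adjust cycle can be sketched as a toy 2-2-1 sigmoid network trained on the XOR task. This is a minimal illustration of the backpropagation idea, not Rumelhart, Hinton and Williams' implementation; the network size, learning rate and epoch count are assumptions:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights and biases for a 2-input, 2-hidden-unit, 1-output network.
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # input -> hidden
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(2)]                      # hidden -> output
b2 = 0.0

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
lr = 0.5

def forward(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    return h, y

def total_error():
    # Sum of squared differences between expected and actual outputs.
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

initial = total_error()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        # Output-layer delta: derivative of squared error through the sigmoid.
        dy = (y - t) * y * (1 - y)
        # Hidden-layer deltas: the error propagated back through W2.
        dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Gradient-descent updates nudge each weight to shrink the error.
        for j in range(2):
            W2[j] -= lr * dy * h[j]
            for i in range(2):
                W1[j][i] -= lr * dh[j] * x[i]
            b1[j] -= lr * dh[j]
        b2 -= lr * dy

print(total_error() < initial)  # the error shrinks as weights are adjusted
```

With a hidden layer, the very problem Minsky and Papert used against single-layer perceptrons becomes learnable from examples alone: no XOR rule is ever written into the program.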
This suggests that connectionist systems are more robust and flexible than laboriously programmed rule-based systems: incremental learning is possible without the need to explicitly define new representations or functionality in the system manually; it just needs to be 'taught' how to do it.
Fodor and Pylyshyn (1988) returned to the debate to argue that connectionism is, at best, a description of the lower level at which classical theories of cognition, such as the language of thought and propositional attitudes, might be implemented. At worst, they argue, it adds nothing to the debate and cannot account for the productivity and systematicity of thought (Fodor and Pylyshyn, 1988).
Nevertheless, feed-forward neural networks like those investigated by Rumelhart, Hinton and Williams (1986) have shown similarities to learning in children: Plunkett and Marchman (1991) used incremental increases in training data to teach such a network English verb morphology, and their analysis displayed an error rate comparable to that of children learning past-tense verbs, suggesting a transfer from rote repetition to a more systematic manipulation of words (Plunkett and Marchman, 1991).
It seems we are now collectively progressing towards a common ground through complementary models of connectionist and symbolic systems, such as the Integrated Connectionist/Symbolic (ICS) architecture, which embodies both the sub-symbolic nature of connectionist systems at the lower level and the systematic, productive aspects of thought at the higher level. A key middle computational layer of abstraction derives higher-level representations, or symbols, from the many activations and connection weights of the lower-level connectionist networks (Smolensky and Legendre, 2006).
This framework enables us to model from the lower neurophysiological level, where connectionism is used as a basis for neural computation, up to higher-level functionality, where more explicit concepts can account for higher-level cognitive tasks and the productivity of thought.
References:
Baylor, G. and Simon, H. (1966). A chess mating combinations program. Proceedings of the April 26–28, 1966, Spring Joint Computer Conference (AFIPS '66 Spring).
Bechtel, W. (1993). The case for connectionism. Philosophical Studies, 71(2), pp.119–154.
Chomsky, N. (1957). Syntactic Structures. 1st ed. The Hague: Mouton & Co.
Chomsky, N. and Miller, G. (1958). Finite state languages. Information and Control, 1(2), pp.91–112.
Fodor, J. (1975). The language of thought. 1st ed. Cambridge, MA: Harvard University Press.
Fodor, J. and Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), pp.3–71.
Hook, S., Köhler, W., Feigl, H. and Putman, H. (1961). Dimensions of mind. 1st ed. New York: Collier Books.
McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), pp.115–133.
Miller, G., Galanter, E. and Pribram, K. (1960). Plans and the structure of behavior. New York: Holt.
Minsky, M., (1991). Logical vs Analogical or Symbolic vs Connectionist or Neat vs. Scruffy. AI Magazine, 12(2), p.47.
Minsky, M. and Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
Newell, A. and Simon, H. (1961). GPS, a program that simulates human thought. Santa Monica, Calif.: Rand Corp.
Plunkett, K. and Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38(1), pp.43–102.
Rosenblatt, F. (1962). Principles of neurodynamics. 1st ed. Washington D.C.: Spartan Books.
Rumelhart, D., Hinton, G. and Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323(6088), pp.533–536.
Smolensky, P. and Legendre, G. (2006). The harmonic mind. Cambridge, Mass.: MIT Press.