Deep Systematicity and Connectionist Representation

David J. Chalmers

Commentary on Andy Clark, "Theoretical Spaces"

1. I think that by emphasizing theoretical spaces of representations, Andy has put his finger on an issue that is key to connectionism's success, and whose investigation will be a key determinant of the field's further progress. I also think that if we look at representational spaces in the right way, we can see that they are deeply related to the classical phenomenon of systematicity in representation. I want to argue that the key to understanding representational spaces, and in particular their ability to capture the deep organization underlying various problems, lies in the idea of what I will call deep systematicity. This is closely related to classical systematicity, but I think it may prove much more far-reaching.

So I'll spend a few minutes developing this point, starting with a look at classical systematicity.

If we look at the cluster of phenomena that Fodor and Pylyshyn emphasize in their discussion of the Classical approach - compositionality, systematicity of cognitive capacities, productivity - we can see that all these derive from one key property of Classical representation: that the semantic structure of what is represented is systematically reflected in the formal structure of the corresponding representation. This is the general property that I'll call Classical systematicity.

e.g. look at [example and explanation]. We can see how this systematicity of representation immediately buys the phenomena we were talking about, and most importantly of all, it buys a systematicity of functional role. Representations that have a relevantly similar semantic structure will, by virtue of their systematic form, play a relevantly similar functional role. So the propositions "John loves Bill" and "John loves Mary" will play similar functional roles within the system, with any differences being a systematic reflection of the difference in one constituent; and these two propositions will play an entirely different functional role from that played by a proposition such as "The quarterback fainted last Saturday."

This is guaranteed precisely by the complex internal structure in each representation. If the representations were structurally atomic, any such systematicity of functional role would have to be arbitrarily imposed by external rules.
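To make the point concrete, here is a minimal sketch, entirely my own invention rather than anything from the talk: propositions are encoded as predicate-argument tuples, and a single hypothetical rule (swap_arguments) applies uniformly to every representation with the right internal structure, whereas an atomic token offers the rule nothing to latch onto.

```python
# Illustrative only: the tuple encoding and the swap_arguments rule are
# invented to show how formal structure can carry functional role.

def swap_arguments(proposition):
    """One systematic inference: from 'x loves y' to 'y is loved by x'."""
    predicate, subj, obj = proposition
    if predicate == "loves":
        return ("is-loved-by", obj, subj)
    return None

structured = [("loves", "John", "Bill"), ("loves", "John", "Mary")]
for p in structured:
    # The same rule handles both, and the difference in output is a
    # systematic reflection of the difference in one constituent.
    print(p, "->", swap_arguments(p))

# A structurally atomic representation has no parts for the rule to exploit;
# any systematic treatment of it would have to be stipulated case by case.
atomic = "JOHN-LOVES-BILL"
```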

It's difficult to overemphasize the power of this idea, and it's been at the root of much of the last century's work in formal logic and AI.

BUT...

Are there semantic primitives?

Classical atomic representations lack any internal structure,

so there is no reflection of semantic similarity among such representations,

and inferences performed on such atomic representations are therefore quite unsystematic.

Consequences include the well-known brittleness of symbolic AI models.

The reason: the internal structure of a Classical representation reflects at best the compositional properties of what is represented, but there are a lot more semantic properties out there.

2. This is where connectionism comes in, and connectionist representational spaces in particular. I think what ought to be regarded as the key philosophical commitment underlying connectionism is the following:

No structurally atomic representations.

Instead, in connectionist systems, every representation has complex internal form. This is made possible by the universal use of distributed representation, where the representation of any concept is spread over a number of separate computational units, none of which is a representation in its own right.

Once we allow every representation to have complex internal structure, then we have opened the door to what I will call deep systematicity, that is, the reflection of semantic properties of what is represented in the formal structure of the corresponding representation, all the way down.

In connectionist representational spaces, that's just what we find. After a period of learning, representations fall into a multi-dimensional vector space, such as the one depicted here (in a diagram lifted from PC). What we find is that the structure of this space is in no way arbitrary. Instead, objects that are semantically similar, in a way that is relevant to a given problem, have corresponding representations that are formally similar - lying close together in this vector space, for instance. In a connectionist network that classifies animals, the representations of "cat" and "dog" will be much closer together than those of "cat" and "elephant". This is a direct consequence of the fact that relevant semantic properties of the "cat" concept are reflected directly in the structure of the vector that represents "cat".
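Just to fix intuitions, here is a toy sketch of the kind of structure being claimed. The vectors below are invented, not read off any actual trained network, and Euclidean distance is just one convenient measure of formal similarity; the only point is that semantic similarity shows up as proximity in the space.

```python
import numpy as np

# Invented hidden-layer vectors standing in for learned representations;
# the numbers are made up purely for illustration.
reps = {
    "cat":      np.array([0.9, 0.8, 0.1, 0.2]),
    "dog":      np.array([0.8, 0.9, 0.2, 0.1]),
    "elephant": np.array([0.1, 0.2, 0.9, 0.9]),
}

def distance(a, b):
    """Euclidean distance between two representation vectors."""
    return np.linalg.norm(reps[a] - reps[b])

print(distance("cat", "dog"))        # small: semantically similar animals
print(distance("cat", "elephant"))   # large: semantically dissimilar animals
```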

Of course, this space here is just one of Andy's theoretical spaces - one that, assuming it is successful, has developed in a way that captures the deep organizational structure of a given problem space. So theoretical spaces and deep systematicity are two sides of the same coin.

Now, what does deep systematicity buy you?

(1) Firstly and most importantly, it buys the key property of automatic generalization that Andy has been stressing. If a representational space has been built up that handles cats and mice and elephants, and all of a sudden a hamster comes along, then deep systematicity causes the hamster to be slotted into a vectorial representation that is likely to be closer to the representation of "mouse" than to that of "cat", and certainly closer than to that of "elephant", though also different in relevant ways. Any actions to be taken will depend directly on this representational structure, so it is likely that if we've got the representation right, we should get the action about right too - and that's what we find in practice. (A toy sketch of this appears after these points.)

Other properties that come along with deep systematicity include the ability of context to subtly influence the form of a representation, and therefore the consequent actions - something that is more difficult to achieve when objects are represented by simple primitives - and also the ability to interpolate between two different representations when appropriate.
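Continuing the toy example above (again with invented vectors, and an arbitrary random linear readout standing in for whatever process consumes the representations), one can see how a novel item that lands near "mouse" in the space automatically inherits roughly mouse-appropriate treatment, and how an interpolation between two representations is itself a usable point in the space.

```python
import numpy as np

# Invented vectors again; the readout matrix is random and merely stands in
# for the downstream process that maps representations to actions.
reps = {
    "cat":      np.array([0.9, 0.8, 0.1, 0.2]),
    "mouse":    np.array([0.2, 0.3, 0.8, 0.7]),
    "elephant": np.array([0.1, 0.2, 0.9, 0.9]),
}
hamster = np.array([0.25, 0.35, 0.75, 0.65])    # a novel item, never seen before

rng = np.random.default_rng(0)
readout = rng.normal(size=(4, 3))               # stand-in "action" layer

for name, vec in reps.items():
    print(name, np.round(vec @ readout, 2))
# Because "hamster" lies close to "mouse", any mapping that varies smoothly
# over the space will treat it much as it treats "mouse".
print("hamster", np.round(hamster @ readout, 2))

# Interpolating between two representations just gives another point in the
# space, which the same readout can handle.
midpoint = 0.5 * (reps["cat"] + reps["mouse"])
print("cat/mouse midpoint", np.round(midpoint @ readout, 2))
```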

Now, of course the existence of a deeply systematic representational space doesn't do all the work for you - you still need processes that will operate on the representations, allowing them to play appropriate functional roles. But the point is that (just as in the case of Classical systematicity) the systematicity in representational form provides a strong assurance of systematicity of functional role - i.e., a functional role that will cohere with the semantic properties of what is represented. In effect, deep systematicity moves as much of the responsibility for reflecting semantic regularities as possible out of the rules and into the representations.

Words on classical systematicity (tensor products, RAAM).

Now, to get back to the problems that Andy was worrying about, it seems to me that the central problem is just this: How can deep systematicity be achieved? Connectionism per se, through the introduction of distributed representation, allows the possibility of deep systematicity. But that potential needs to be backed up by ways of getting there - by actual mechanisms that develop deeply systematic representational spaces. So far, connectionism has developed just a few algorithms that demonstrate the potential, but these are far from exhausting the possibilities.

In connectionism's most frequent incarnation to date, deep systematicity of representation is achieved through the use of the backpropagation algorithm. Now backpropagation is terrifically powerful, but Andy has criticized it, rightly I think, for being too tightly tied to first-order properties of the input and output spaces. If the relevant organizational properties of the representational space can be straightforwardly extracted from the input and output spaces, then it will do fine; otherwise it tends to have trouble. The problem seems to me to lie in the fundamentally associationist nature of the algorithm. What is being learned is an input-output association, and the hidden representation is constrained to be a simple way station on the path from input to output. There is little room for the complex processing that might exploit more abstract properties of the input and output spaces when necessary.
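For concreteness, here is a minimal backpropagation network in the spirit of the above, though the task (XOR), the architecture, and the learning rate are arbitrary choices of mine rather than anything in the talk. The structural point is visible in the code: the hidden layer is computed on the way from input to output, and its representation is shaped entirely by the pressure to reproduce the input-output association.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs
y = np.array([[0.], [1.], [1.], [0.]])                   # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def with_bias(a):
    """Append a constant unit so each layer has bias weights."""
    return np.hstack([a, np.ones((a.shape[0], 1))])

W1 = rng.normal(scale=0.5, size=(3, 4))   # (input + bias) -> hidden
W2 = rng.normal(scale=0.5, size=(5, 1))   # (hidden + bias) -> output

for step in range(10000):
    # Forward pass: the hidden layer is a way station from input to output.
    h = sigmoid(with_bias(X) @ W1)
    out = sigmoid(with_bias(h) @ W2)

    # Backward pass: the output error is pushed back through the weights,
    # so the hidden representation is driven purely by the association
    # between input and output.
    err_out = out - y                                   # cross-entropy gradient
    err_hid = (err_out @ W2[:-1].T) * h * (1 - h)
    W2 -= 0.5 * with_bias(h).T @ err_out
    W1 -= 0.5 * with_bias(X).T @ err_hid

print(np.round(out, 2))   # close to the XOR targets after training
```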

Andy's response is to lean on the possibility of exploiting the sequential order in what is learnt. I think this is an intriguing possibility, but that he may be overestimating it. For even incremental learning shares with traditional backpropagation a strong associationism - indeed backpropagation is the algorithm that Andy supposes is used throughout. So it seems to me that the associated problems will not so easily be avoided. It may well be that the right kind of incremental learning will provide a slight improvement over vanilla backpropagation - to speak somewhat figuratively, an improvement of 20 or 30 percent - but it seems to me that it's not the kind of thing that will lead to a vast qualitative difference, on the order of 200 percent. Incremental learning or not, backpropagation remains strongly driven by the first-order properties of input and output spaces. The empirical evidence is not yet in, but I've heard of as many failures with incremental learning as successes, so I think there are grounds to be skeptical.
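The structural point can be made with a trivial sketch: "incremental" learning, as I understand the proposal, just means presenting the training set in stages (the staging, data, and model below are invented for illustration), while the update inside the loop is the same gradient-descent step throughout, so it inherits the same associationist character.

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(size=2)   # weights of a single logistic unit (toy stand-in)

def grad_step(w, X, y, lr=0.1):
    """One ordinary gradient step; incremental learning changes only which
    data this step sees, not the step itself."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return w - lr * X.T @ (p - y)

# Stage 1: "easy" cases first; Stage 2: the full, harder distribution.
stages = [
    (np.array([[2., 0.], [-2., 0.]]), np.array([1., 0.])),
    (np.array([[2., 1.], [-2., -1.], [0.5, 2.], [-0.5, -2.]]),
     np.array([1., 0., 1., 0.])),
]
for X, y in stages:
    for _ in range(200):
        w = grad_step(w, X, y)
print(np.round(w, 2))
```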

Andy's proposal shares with traditional connectionism the tendency to lean as strongly as possible on learning. There seems to be a deep suspicion in these circles of positing too many innate mechanisms. To some connectionists, to posit that a problem is solved by the use of innate mechanisms is something of a cheat, if it could be solved by learning instead. You might say that on this view, an organism has to earn its abilities.

I don't entirely agree with this attitude, but neither do I agree with the attitude that the real explanation of systematicities must lie in innate architectural constraints, an attitude that is suggested by the work of Fodor and Pylyshyn. This view goes to the other extreme, arguing that innate mechanisms carry the real burden, and that so-called "learning" is just a matter of the triggering of the right innate mechanisms.

Instead I think there is a middle ground that allows the admission of strong innate mechanisms while retaining the connectionist emphasis on the importance of learning. This view, which I've objectively labeled "enlightened connectionism", sees our cognitive capacities as the result of the continuous adaptive processes of evolution and learning.

Of course there's nothing at all radical about this - it's just a way of looking at the development of cognitive capacity that causes all of the nativist vs. empiricist problems to seem relatively unimportant. It involves the recognition that evolution and learning are processes of adaptation that are relevantly similar in kind: both are processes whereby organisms can climb up adaptive landscapes, improving their ability to deal with an environment. Evolution takes you so far, and learning gets you the rest of the way. The point that is so important to nativists and empiricists alike - that is, the question of what is built in at birth - is reduced to the status of a relatively unimportant border where these two processes of adaptation meet. Furthermore, this line between evolution and learning is actually quite arbitrary. There are studies showing that what was once achieved by learning, some time in a species' past, can later be achieved by evolution. It's just a matter of which is more efficient.

With this view in hand, we can construct a view of cognitive development that is quite compatible with the spirit of connectionism, but which eschews any simple form of empiricism. For a start, the only tabula rasa is in the primordial soup. Everything since then has been the result of an adaptive process, and so is very much a product of its evolutionary environment, with mechanisms developed that are adapted to that environment. On the other hand, we can't throw around innate mechanisms willy-nilly - on this methodology, there at least has to be some reasonable evolutionary explanation of how an innate mechanism might have come about.

As a very rough rule of thumb, we can say that in the development of cognitive capacities, evolution is likely to build mechanisms that exploit regularities that are robust across a number of different environments, to deal with problems that any member of a species is likely to have to handle; whereas learning is likely to handle regularities that are specific to the environment that an individual organism finds itself in. If we apply this to the case of the classical systematicity of cognitive capacities, as Andy considered at the end of his talk, we can see that it is quite plausible that the mechanisms responsible for this systematicity are a product of evolution, precisely because it seems to reflect a similarly robust systematicity in most environments. One doesn't just find trees to the left of lakes; one also finds lakes to the left of trees, and so on for a very large number of relations and objects, though by no means all of them. So it may be more reasonable to suppose that classical systematicity would be sufficiently adaptive that it might be built in by an evolutionary process. Of course, as we've seen, the line is somewhat arbitrary - it is also quite possible that some of the job might be left to learning, as long as it was guaranteed that learning could do the job relatively quickly, and reliably across different environments. The latter suggestion might also help explain the gaps in systematicity that are purported to exist in at least some animals. But it does not seem to me that systematicity should be entirely a product of a difficult process of knowledge acquisition, as Clark suggests - that would seem to be a maladaptive way to deal with such a vital capacity.

All of this would come to very little if it were not for the fact that there exists a perfect computational tool to go along with this variety of connectionism, and that is the genetic algorithm.

[explanation goes here]

Now this can be combined very directly with the use of connectionist networks, e.g. to specify the modularity of a network, or the topology, as an initial state. Of course we only specify the state of an organism at the beginning of its life; we then allow learning to take over. The fittest organisms are those that are doing best after a period of learning.

What we find is that individual organisms are selected that have a complex initial structure that best allows them to learn problems within given domains.
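A toy version of this scheme, with all specifics (task, population size, mutation scale, network size) invented for illustration: a simple evolutionary loop (selection plus mutation, no crossover, for brevity) searches over the initial weights of a small network, and each "organism" is scored only after a short lifetime of learning, so evolution selects initial structures that learn well rather than structures that already solve the task.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])                   # an arbitrary toy task

def learn(genome, steps=50, lr=0.5):
    """A lifetime of learning: gradient descent starting from the inherited
    initial weights of a tiny one-hidden-layer network."""
    W1 = genome[:8].reshape(2, 4).copy()
    W2 = genome[8:].copy()
    for _ in range(steps):
        h = np.tanh(X @ W1)
        out = 1.0 / (1.0 + np.exp(-(h @ W2)))
        err = out - y
        grad_W1 = X.T @ (np.outer(err, W2) * (1 - h ** 2))
        W2 -= lr * h.T @ err
        W1 -= lr * grad_W1
    return out

def fitness(genome):
    """Organisms are scored on performance *after* learning."""
    return -np.mean((learn(genome) - y) ** 2)

population = [rng.normal(scale=0.5, size=12) for _ in range(20)]
for generation in range(30):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:5]                                  # selection
    population = [p + rng.normal(scale=0.1, size=12)      # mutation
                  for p in parents for _ in range(4)]
print(round(float(fitness(scored[0])), 3))   # fitness of the best initial state
```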

How does this relate to deep systematicity? Well, the explanatory burden need not fall entirely on learning. Instead, we can find - and experiments have been done along each of these lines - that through an evolutionary process we can develop innately:

There have been various studies showing how connectionist learning and generalization can be improved by each of these techniques. These might seem to be "innate" mechanisms that a connectionist might like to rule out - Andy talks of building in "task-specific information" - but in fact nowhere are we buying something for nothing. We need only suppose that a given task, or more likely a given class of tasks, has been around in an organism's evolutionary environment, and it is quite natural that mechanisms might develop specifically to handle certain classes of tasks, or even more likely, to enable the right organizing structure of given task domains to be picked up quickly and easily.

In this way we can step back from the empiricist/nativist debate, and allow a strong role for evolution in the development of such cognitive capacities as deep systematicity, while still admitting the centrality of learning in cognitive explanation.