Connectionism and the Language of Thought

Murat Aydede

  • The University of Chicago
  • Department of Philosophy
  • 1050 East 59th Street
  • Chicago, ILLINOIS 60637
  • (maydede@midway.uchicago.edu)
  • June, 1995


    1. Introduction

    Fodor and Pylyshyn's (F&P) critique of connectionism has posed a challenge to connectionists: Adequately explain such nomological regularities as systematicity and productivity without postulating a "language of thought" (LOT). Some connectionists declined to meet the challenge on the basis that the alleged regularities are somehow spurious. Some, like Smolensky, however, took the challenge very seriously, and attempted to meet it by developing models that are supposed to be non-classical.

    My aim in this paper is twofold: First, I will provide a critical/reconstructive historical survey of the debate between connectionists and classicists. Second, within this framework, I will offer an analysis of what the minimal truth-conditions of the LOT architecture essentially are. In this vein, I will reconstruct F&P's original argument against connectionism and locate the responses and opposing positions according to possible/actual reactions to its premises. Then, I will take up the claim of those connectionists who took the challenge seriously, and look into some of their models in order to see what it is about them that, allegedly, enables them to adequately explain the cognitive regularities without becoming classical. Given my analysis of what the notion of LOT involves, I will argue that to the extent to which they can explain the law-like cognitive regularities, a certain class of connectionist models proposed as radical alternatives to the classical LOT paradigm will in fact turn out to be LOT models, even though new and potentially very exciting ones. I think that connectionists have contributed to a proper philosophical understanding of the LOT Hypothesis. It is time to see what exactly this contribution involves.

    2. Fodor and Pylyshyn's Argument

    The following reconstruction is what I take to be the canonical formulation of F&P's argument in their (1988) article against connectionism:

    i. Cognition essentially involves representational states and causal operations whose domain and range are these states; consequently, any scientifically adequate account of cognition should acknowledge such states and processes.

    ii. Human cognition, conceived in this way, has certain scientifically interesting properties: in particular, it is a law of nature that cognitive capacities (more specifically, propositional attitudes) are productive, systematic, and inferentially coherent.

    iii. Accordingly, the architecture of any proposed cognitive model is scientifically adequate only if it guarantees that cognitive capacities are productive, systematic, etc. This would amount to explaining, in the scientifically relevant and required sense, how it could be a law that cognition has these properties.

    iv. The only way for a cognitive architecture to guarantee systematicity (etc.) is for it to satisfy a certain description P. (Classical architectures necessarily satisfy P.)

    v. Either the architecture of connectionist models does satisfy P, or it does not.

    vi. If it does, then connectionist models are implementations of the classical LOT architecture and have little new to offer (i.e., they fail to compete with classicism, and thus connectionism does not constitute a radically new way of modeling cognition).

    vii. If it does not, then (since connectionism does not then guarantee systematicity, etc., in the required sense) connectionism is empirically false as a theory of the cognitive architecture.

    viii. Therefore, connectionism is either true as an implementation theory, or empirically false as a theory of the cognitive architecture.

    According to F&P, the description P is any description according to which:

    a. representations of a system have a combinatorial syntax and semantics such that structurally complex (molecular) representations are systematically built up out of structurally simple (atomic) constituents, and the semantic content of a molecular representation is a function of the semantic content of its atomic constituents together with its syntactic/formal structure, and

    b. the operations on representations are (causally) sensitive to the syntactic/formal structure of representations defined by this combinatorial syntax.
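    The two clauses of P can be made concrete with a toy example. What follows is only a minimal sketch, assuming a tiny propositional language; the names Atom, Not, And, content, and drop_double_negation are illustrative inventions, not anything proposed by F&P. The function content illustrates clause (a): the content of a molecular representation is computed from the contents of its atoms together with its structure. The function drop_double_negation illustrates clause (b): it applies to a representation in virtue of its form alone.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Atom:
    """Structurally simple (atomic) representation."""
    name: str


@dataclass(frozen=True)
class Not:
    """Molecular representation with one constituent."""
    body: "Expr"


@dataclass(frozen=True)
class And:
    """Molecular representation with two ordered constituents."""
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, Not, And]


def content(expr: Expr, valuation: Dict[str, bool]) -> bool:
    """Clause (a): the content of a whole is fixed by its atoms plus its structure."""
    if isinstance(expr, Atom):
        return valuation[expr.name]
    if isinstance(expr, Not):
        return not content(expr.body, valuation)
    return content(expr.left, valuation) and content(expr.right, valuation)


def drop_double_negation(expr: Expr) -> Optional[Expr]:
    """Clause (b): applies to anything of the form Not(Not(X)), whatever X means."""
    if isinstance(expr, Not) and isinstance(expr.body, Not):
        return expr.body.body
    return None  # the operation is simply not defined for other forms
```

    Note that drop_double_negation never consults a valuation: it inspects only the shape of the expression, which is the sense of "syntactic/formal structure" at issue in clause (b).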

    F&P seem to take P as the defining characteristic of what makes a cognitive architecture classical; P, it appears, defines the sense in which classical cognitive systems are all symbol manipulating systems, or, as they are sometimes called, formal symbol manipulators. In effect, Fodor's Language of Thought Hypothesis (LOTH) is just the empirical hypothesis that there is a system of mental representations (symbols) physically realized in the brain of cognitive organisms and that P is true of this system. The LOTH is the essential backbone of the Computational/Representational Theory of Mind (CRTM), of which Fodor has also been a champion: common sense propositional attitudes like beliefs and desires are to be scientifically explicated in terms of computational/functional relations organisms bear to the symbols in their language of thought.

    According to F&P, P defines two kinds of relations: constituency (P-a) and (structure sensitive) causal relations (P-b) among mental representations. As we will see in a moment, F&P's criticism of connectionism is that connectionist systems can acknowledge, at the cognitive level, at most causal, but not constituency, relations, and it is precisely because they cannot acknowledge constituency relations that causal processes defined over representations cannot be structure sensitive. This is the basis of F&P's charge that connectionist systems in some sense are all bound to be associationist systems as long as they process representations according to their statistical rather than structural properties. But, as such, they have long been shown to be inadequate, on F&P's view, as models of cognitive processes.

    3. Classical-Cognitive Architecture

    Let me expand a bit on the notion of a classical cognitive architecture that is involved in the debate. F&P's characterization of the notion of a cognitive architecture goes as follows:

    The architecture of the cognitive system consists of the set of basic operations, resources, functions, principles, etc. (generally the sorts of properties that would be described in a "user's manual" for that architecture if it were available on a computer) whose domain and range are the representational states of the organism. (1988, p.10)

    Their emphasis here is on what makes an architecture a cognitive one. But let us first focus on what an architecture is.

    As suggested by the parenthetical remark, what F&P seem to have in mind here is whatever notion of architecture is involved when we consider current high-level computer programming languages like BASIC, PASCAL, PROLOG, LISP, etc. These languages have different architectures in that their syntax and organization (e.g., some may require ample use of the "GO TO" statement, whereas others may not, thus forcing the programmer to write highly "structured" programs, etc.), primitive operations (e.g., the square root function might be primitive in one but not in others, etc.), use of computational resources (e.g., memory, processor time, etc.) and the like are different. In this sense, indeed, the architecture of these universal languages is what is being described in their "user's manual" (e.g., when you buy an over-the-counter compiler for one of these languages).[1]

    So, if the notion of a (computational) architecture is to be understood in this way, i.e. in analogy to what is described in the "user's manual" of programming languages, what makes it cognitive? According to F&P, when we talk about the cognitive architecture of the (computational) mind/brain, we are talking about a computational level whose primitive operations, functions, etc. have, as their domain and range, representational states, i.e., data structures (symbols) that, at a minimum, represent the states of affairs in the world. So, an architecture is cognitive if, and only if, what is being processed in this architecture has such representational content.

    F&P want to say, then, of any such cognitive architecture that it is classical if, and only if, P-a is true of what is being thus processed (i.e., representations) and the processing architecture does actually exploit the (syntactic/formal) structural features of the representations in processing them (hence, P-b).[2]

    It should be emphasized that what F&P define in terms of P is what it is to be classical for a cognitive architecture. Or, we may put the point by saying that what they define is 'classical-cognitive'. This is important to keep in mind. For in citing P, they are not concerned with defining the predicate 'x is classical' tout court. That this is so is apparent from the fact that we may have an (intuitively classical) computational architecture with universal computational power that is not classical-cognitive in the sense defined but that, nevertheless, may be used to implement any classical-cognitive architecture. For instance, simple universal Turing machines or von Neumann machines are just like that. Their basic architecture in many cases cannot be "cognitive," but nevertheless they can be used to implement any computational processes defined over representations that satisfy P-a.[3] Similarly, F&P allow that there may be connectionist architectures that are not classical-cognitive but may nevertheless be used to implement architectures that are classical-cognitive. This is the gist of premise (vi) of their argument.

    Perhaps a better and more transparent illustration of P as defining features of a classical-cognitive architecture can be found in the notion of an interpreted formal system. The proof theoretic notion of a formal system consists of, first, constructing a formal language by means of an alphabet and a finite set of formation rules, and, then, of adding to this language a deductive apparatus (a set of derivation or transformation rules) that would define the rules of transforming the well-formed formulas of this language. The paradigmatic examples can be found in different formulations of propositional and first-order predicate logic. In fact, it is no accident that there are strong parallels between P-a and formation rules given for formal languages on the one hand, and between P-b and the derivation rules given for formal systems on the other. To say that P is true of mental representations is just to say that they constitute (or, are characterizable as) an (interpreted) formal system in the logicians' or mathematicians' sense of the phrase.

    So, to recap, P defines what makes a cognitive architecture classical only by putting conditions on the nature of what is being processed and on the character of the processing in that architecture. To say, then, that any cognitive architecture that satisfies P is classical is just to say that the architecture processes representations with combinatorial syntax and semantics (P-a), and that the architecture is so designed that it processes the representations by (causally) responding to their formal/syntactic features defined by this combinatorial syntax (P-b).

    Finally, it should be noted that P-a and P-b are abstract meta-architectural properties in that they are themselves conditions upon any proposed specific architecture's being classical. There are indefinitely many possible classical architectures. To illustrate the point, consider, for instance, different formulations of sentential logic: in one, the only formally complex sentences may be negations and conditionals in which case the transformation rules that are appropriate for these would define the primitive processing operations; in others, all the five standard logical forms of sentences and different sets of primitive rules for transforming them might be given. But, P would come out to be true of any different formulation of sentential logic if considered as a representational system run in a computational architecture. Similarly, any architecture (LISP, PROLOG, etc.) that would process such representations in a structure sensitive way would count as a classical one. This is the sense in which P-a and P-b are abstract meta-architectural properties. They define classicism per se, but not any particular way of being classical. Classicism as such, then, is not committed to any particular architecture or to any particular P-like representational system in advance. It simply claims that whatever the particular cognitive architecture of the brain might turn out to be, P must be true of it. To be sure, this is an inference to the best explanation for certain cognitive regularities like systematicity that cognitive organisms are claimed to nomologically exhibit. F&P claim that connectionism, if it is to be adequate, should do at least just as well.

    Let us now briefly see how they elaborate and defend the premises of their argument against connectionism.

    4. The Law-Like Cognitive Regularities and their Classical Explanation

    In their article, F&P do not discuss premise (vi) at all. They seem to regard it as obvious: if the architecture of connectionist models satisfies P, then they are simply implementation models. Instead, they focus their attention on making the antecedent of premise (vii) stick: the connectionist cognitive architecture does not satisfy P. They spend a good deal of their time on arguing that connectionist representations are all atomic and do not have constituent structure in the sense P-a requires. For that reason, connectionist representational processes cannot be sensitive to the structure of the representations either. Hence, connectionism also violates P-b. F&P's discussion contains certain strange elements that are in fact symptomatic of some larger difficulties. I want, however, to spare the reader the details and peculiarities of that discussion.[4]

    After F&P argue for premise (vii), they return to their basic line of argument and introduce in great detail the reasons why P is necessary in any adequate explanation of cognitive phenomena. This amounts to the elaboration and defense of premises (ii) to (iv). Their focus, however, is on premises (ii) and (iv). Premise (iii) is intimately connected to (ii) and (iv). In particular, they describe in some detail those law-like regularities of cognition, and explain why their adequate explanation requires the postulation of a representational system that satisfies P. The intended job of premise (iii) is in fact to prevent certain ad hoc solutions on the part of connectionists in the explanation of these regularities. The idea is that since these regularities hold by nomological necessity, the proposed explanation must reflect this fact in order to be adequate. Then, the claim is that only a P-like structure can guarantee the required nomological necessity. Hence, premise (iv). (The root of this point about nomologicality can, I think, be traced back to Pylyshyn's earlier work: see his (1984, pp.35-8)).

    Here I would like to briefly describe only systematicity and inferential coherence.[5]

    Systematicity. According to F&P, systematicity is the fact that the capacity to entertain certain thoughts is intrinsically connected to the capacity to entertain certain other thoughts. Well, which thoughts? Thoughts that are related to each other in a certain way. In what way? There is a certain difficulty in answering such questions, which gets reflected in clearly describing what systematicity essentially consists in. This is a problem, and the usual way to get around that problem is to appeal to examples.[6]

    Consider the capacity to think that John loves Mary. It is claimed that any mind that is systematic and has that capacity has by nomological necessity also the capacity to think that Mary loves John, and vice versa. Or, consider the capacity to think John loves Mary and Bill hates John. Any systematic mind with that capacity will have the capacity to think thoughts that can be described by taking the permutations on the subject/object positions of the description of the original thought. Hence, for example, it will have the capacity to think that Bill loves Mary and John hates Bill, etc. Roughly, it seems that for any asymmetric relation R, if one has the capacity to think aRb then by nomological necessity one also has the capacity to think that bRa for any a and b. Intuitively, according to the classicist, it is a law of nature that (at least some) minds are not punctate: capacities to entertain thoughts necessarily come in natural clusters. For the classicist, of course, once you view the capacity to have thoughts as a capacity to token structurally complex (mental) representations that belong to a system that at least satisfies P-a, it is not surprising that this is so. So, once you can token (in the computationally appropriate way) the mental representation aRb, you can surely token bRa, since they are made up, so to speak, out of identical parts and, it is postulated by the CRTM that you have the necessary mechanisms to bring them together according to the rules of combinatorial syntax. This is the classical explanation of what I will call formative systematicity.
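    The contrast between a systematic and a punctate mind can be illustrated with a small sketch. It rests on simplifying assumptions: thoughts are modeled as (subject, relation, object) triples, and the function names are hypothetical. A system that builds its thoughts from a stock of atoms by a combinatorial rule gets bRa whenever it gets aRb, whereas a system whose capacities are merely listed one by one carries no such guarantee.

```python
from itertools import permutations
from typing import Iterable, Set, Tuple

# A thought is modeled as (subject, relation, object), e.g. John loves Mary.
Thought = Tuple[str, str, str]


def combinatorial_capacities(names: Iterable[str],
                             relations: Iterable[str]) -> Set[Thought]:
    """Every thought aRb that can be assembled from the given atoms.

    Reflexive thoughts (aRa) are left out only to keep the example small.
    """
    names = list(names)
    return {(a, r, b) for r in relations for a, b in permutations(names, 2)}


def is_formatively_systematic(capacities: Set[Thought]) -> bool:
    """True if having aRb always comes with having bRa."""
    return all((b, r, a) in capacities for (a, r, b) in capacities)


# A mind whose capacities are generated from parts by a combinatorial rule.
classical = combinatorial_capacities(["John", "Mary", "Bill"], ["loves", "hates"])

# A punctate mind: a single listed capacity, with no structure to recombine.
punctate = {("John", "loves", "Mary")}
```

    The point of the sketch is that systematicity of the first set is not an extra stipulation: it falls out of how the set is generated, which mirrors the classicist's claim that the regularity follows from the architecture.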

    Inferential Coherence. Systematicity of thought is not restricted to the systematic ability to entertain certain thoughts. If the system of mental representations does have a combinatorial syntax, then there is a set of rules, syntactic formation rules, that govern the construction of well-formed expressions in the system. It is this fact that guarantees formative systematicity. But inferential thought processes are systematic too: the ability to make certain inferences is intrinsically connected to the ability to make many others. For instance, according to the classicist, you do not find minds that can infer 'A' from 'A&B' but cannot infer 'C' from 'A&B&C'. Again, it is a nomological psychological fact that inferential capacities come in clusters that are homogeneous in certain aspects. How is this fact, which I will call inferential (or, sometimes, transformational) systematicity, to be explained?

    The classical explanation depends on the exploitation of the notion of logical form or syntactic structure determined by the combinatorial syntax postulated for the representational system. The combinatorial syntax does not only give us a criterion of well-formedness for mental expressions, but it also defines their logical form or syntactic structure. In other words, once the expressions can have a formal structure, the operations can apply to them in virtue of their form. This is the sense in which the explanation of inferential systematicity requires P-b. Since from a syntactic view-point, similarly formed expressions will have similar forms, it is possible to define a single operation which will apply to only those expressions that have a certain similar form. This allows the classicist to give homogeneous explanations of what appear to be homogeneous classes of inferential capacities. I will have, later on, plenty to say about this point and how it ought to be understood for a correct conception of LOTH, but just to anticipate, let me make one quick point about the role a combinatorial syntax plays.
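    The idea of a single operation defined over form can be sketched as follows. This is a toy illustration only, assuming conjunctions are built as nested binary structures; the names Atom, And, and conjuncts are hypothetical. One rule, keyed to the form X & Y, yields both the inference from 'A&B' to 'A' and the inference from 'A&B&C' to 'C'.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Atom:
    """Atomic formula."""
    name: str


@dataclass(frozen=True)
class And:
    """Conjunction with two ordered constituents."""
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, And]


def conjuncts(formula: Formula) -> List[Formula]:
    """Conjunction elimination: from X & Y derive X and derive Y, applied recursively.

    The rule looks only at whether the formula has the form And(X, Y);
    it never asks what X or Y are about.
    """
    if isinstance(formula, And):
        return conjuncts(formula.left) + conjuncts(formula.right)
    return [formula]


A, B, C = Atom("A"), Atom("B"), Atom("C")
two_place = And(A, B)            # A & B
three_place = And(And(A, B), C)  # (A & B) & C
```

    Because the same rule covers both cases, a system equipped with it cannot have the first inferential capacity while lacking the second, which is the homogeneity the classical explanation is meant to capture.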

    Syntax is what mediates the semantic properties of representations with their causal properties: on the one hand, with the principle of compositionality, it determines in a systematic way the assignment of semantic content to representations, while, on the other hand, determining which mental operations will apply to them. How it is supposed to do this, causally, will be of great importance to us in a moment. However, according to Fodor, this is the greatest virtue of classical architecture: it gives an explanation of how the causal processes can be made to mirror their semantic coherence. In fact, the classical paradigm, for the first time in the history of human thought, gives a clear idea of how human rationality can be mechanized.[7]

    According to F&P, on the other hand, connectionism pitched at the cognitive level with its atomic and structureless representations is unable to even begin to address the issues that the classical paradigm is so good at explaining. On a connectionist view, they claim, systematicity of cognitive capacities is a mystery. But on the classical view, it is what you would predict. So, they conclude, connectionism, when pitched at the representational level, must be false as a theory of human cognitive architecture.

    4. Connectionist Rebuttals

    It is possible to classify the kinds of connectionist rebuttals on the basis of their different responses to the premises of F&P's argument.

    The acceptance of premise (i) of their argument, as F&P point out, draws a general line between two radically different traditions in the philosophy of mind, namely, between eliminativism and representationalism (intentional realism), and places the connectionists within the representationalist camp. However, I presume, not everyone who views connectionism as a radically different and promising story would like to see herself placed in this camp. Indeed, there has been a considerable controversy as to whether connectionism is a new approach with the necessary resources to constitute a serious challenge to the fundamental tenets of folk psychology.[8] I think it is still too early to assess the potential connectionism has in the support it allegedly gives to the elimination of folk psychology.[9] On the other hand, many connectionists do in fact advance their models as having causally efficacious representational states, and more often than not, explicitly endorse F&P's first premise.

    The conclusion of F&P's argument posed a dilemma for connectionists, and with it, a challenge for at least those connectionists who were eager to see their approach on a par with classicism and in competition with it. The challenge was: explain the law-like cognitive regularities such as systematicity without becoming classical.

    Some have declined to meet the challenge. They are the ones who reject either premise (ii) or premise (iv), or both.[10] I tend to side with F&P about premises (i)-(v). But, as I have said, my concern in this paper is not this group, but rather the other group that has attempted to meet the challenge.

    What unites this group of connectionists who usually accept premises (i) through (v) is the rejection of premise (vi). In what follows, I will examine some fundamental issues raised in the context of this rejection.

    5. Attempts to Meet F&P's Challenge

    The connectionists who accept F&P's challenge all seem to accept that the law-like cognitive regularities like systematicity need an explanation and that the only adequate explanation is the one that draws on an architecture that satisfies something like P. However, they seem to think that not every way of satisfying P would amount to a LOT architecture. So they reject premise (vi). Smolensky is very explicit about the rejection:

    ...distributed connectionist architectures, without implementing the Classical architecture, can nonetheless provide structured mental representations and mental processes sensitive to that structure. (1990a, p.215)

    Smolensky and some others have developed some very interesting connectionist models that all utilize the so-called distributed representations, and they all seem to support combinatorial procedures which form structurally complex representations.[11] I will return to some of these models below. The existence of such models immediately raises the following obvious question, since, as I have said, F&P seem to take premise (vi) as a definitional truth:

    (Q1) If there are connectionist models that genuinely satisfy P, why do they not count as instances of the LOT paradigm? Or, to put it differently, if they are not instances of the LOT architecture, in exactly what sense do they satisfy P?

    The answer given by connectionists is not very clear. But they are only partly to blame. For, the defenders of the LOT hypothesis were not clear either on the answer they provided to the following question.

    (Q2) What needs to be minimally true of a physically realized system in order for it to count as an instance of a cognitive architecture that satisfies P? In other words, what, minimally, does it take for a physical system to satisfy P and thus to count as a LOT system?

    Connectionists' answer to (Q1) roughly comes down to this: When you devise a representational system whose satisfaction of P relies on a non-concatenative realization of structural/syntactic complexity of representations, you have a non-classical system, i.e., a system that is in no way a realization of the LOT architecture. Strangely enough, the classicists', or at least Fodor and McLaughlin's, answer to the second question seems to converge on connectionists' answer to the first question. Fodor and McLaughlin (F&M) wrote a paper (1990) that was meant to be a reply to Smolensky's Tensor Product Representations System. In that paper, they stipulate that you have a classical system or a LOT architecture only if the syntactic complexity of representations is realized "concatenatively":

    We... stipulate that for a pair of expression types E1, E2, the first is a Classical constituent of the second only if the first is tokened whenever the second is tokened. (1990, p.186, emphasis in the original)

    I will have more to say on what concatenation is later on. The important point for the moment is to see what F&M seem to commit themselves to: given that you have a formally specified representational system, one that apparently satisfies P, how it is to be realized (concatenatively or not) makes a difference to whether or not the system does have a LOT architecture.
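    The difference between the two kinds of realization can be given a rough illustration. The sketch below is loosely modeled on the general idea behind tensor product representations (binding a filler vector to a role vector by an outer product and superposing the results); it is not Smolensky's actual model, and the particular vectors and function names are assumptions chosen only for the example. In the concatenative case the token of the whole literally contains a token of each part. In the tensor case the parts are summed into the same units, yet each filler can still be recovered exactly, here because the role vectors are orthogonal.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def bind(filler: Vector, role: Vector) -> Matrix:
    """Outer product: one filler bound to one role."""
    return [[f * r for r in role] for f in filler]


def superpose(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum: both bindings end up sharing the same units."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def unbind(whole: Matrix, role: Vector) -> Vector:
    """Recover the filler bound to `role`; exact only for orthogonal roles."""
    norm_sq = sum(r * r for r in role)
    return [sum(x * r for x, r in zip(row, role)) / norm_sq for row in whole]


# Concatenative: the token of the whole literally contains tokens of its parts.
concatenative = ("John", "loves", "Mary")

# Non-concatenative: hypothetical filler and role vectors, for illustration only.
john, mary = [1, 2, 0], [0, 1, 3]
subject_role, object_role = [1, 1], [1, -1]  # orthogonal: their dot product is 0

whole = superpose(bind(john, subject_role), bind(mary, object_role))
```

    Whether recoverability of this kind amounts to the constituent's being "tokened whenever the second is tokened" is exactly what is in dispute; the sketch only shows that the two ways of carrying structure are distinguishable.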

    Connectionists want to reject premise (vi) which is a conditional claim: if the cognitive architecture of connectionist models does satisfy P then they are LOT models. But, given that F&M take P to define classicism, it is reasonable to see the conditional claim as tautological. So, according to F&M, anyone who attempts to deny it must be confused about the dialectical situation in the debate. For F&M, the real issue would have to be whether any proposed model does indeed genuinely satisfy P, and not whether, if it genuinely does, it would be classical or not. Questioning the latter must not be open to connectionists and therefore not allowed. In short, the truth of premise (vi) must be non-negotiable for F&M.

    On the other hand, in the quotation I have just given, since they explicitly incorporate the concatenation condition into the conditions of being classical, they ought to think that if there can indeed be models that can satisfy P without concatenation, then they should count as non-classical. This is important because in a sense the real issue is the adequate explanation of the cognitive regularities and if they can be explained with non-concatenative, hence, non-classical, architectures that satisfy P, then connectionists have all they want, namely, to successfully meet F&P's challenge.

    I am, however, utterly unhappy with the dialectics of the situation in the debate I have just reconstructed. I will argue that the very notion of LOT should not conceptually be tied to concatenation as F&M explicitly do. I think that the issues about concatenative realization are irrelevant to a proper minimalist understanding of LOT. In this vein, I will argue that both parties are wrong.

    Let me state in more explicit terms the picture I want to argue for. Here is how I take the LOT architecture to be:

    LOT

    Language of Thought. Any physically realized system of representations that satisfies P.

    The LOT Hypothesis (LOTH) then is the existential claim that there is such a system of representations realized in our brain. As just stated, I claim, the definition should be taken to be essentially indifferent to how P is satisfied. In particular, whether P is realized concatenatively or not is irrelevant to the truth or falsity of LOTH. So this definition of LOT architecture is a more abstract and general definition than the following two more specific ones:

    C-LOT

    Concatenative Language of Thought. Any physically realized system of representations that satisfies P concatenatively.

    NC-LOT

    Non-Concatenative Language of Thought. Any physically realized system of representations that satisfies P non-concatenatively.

    Accordingly, you can formulate the two corresponding more specific empirical hypotheses (i.e. C-LOTH and NC-LOTH) by existentially quantifying over such representational systems as realized in the brain. The logical and epistemic relations, then, among the three corresponding hypotheses should be obvious.

    Given these definitions, the picture I want to argue for can be put pictorially thus:

    [Figure 1: the set P (= LOT), comprising the two subsets C-LOT and NC-LOT.]

    The connectionists who took F&P's challenge seriously claim to have produced non-classical models that can explain the cognitive regularities. Their models, they claim, are non-classical because they satisfy P non-concatenatively. This means that they regard C-LOT as the "real" LOT: on their view, classical architecture essentially consists in satisfying P concatenatively. Therefore, to the extent that their models can satisfy P non-concatenatively, they meet the challenge and refute F&P's argument.

    On the other hand, F&M also seem to think that the "real" LOT should be defined as C-LOT. So they seem to think that if there are connectionist models that satisfy P non-concatenatively then they must be non-classical. However, they are not much worried about the possible consequences of this view because they think that not only is there no actual non-concatenative realization of P, but there cannot be any.

    The reason they give has to do with structure sensitive causal processing, which relates therefore to the satisfaction of P-b. They think that causal structure sensitivity for representational processes can be obtained only with concatenative realization of the syntactic constituents of structurally complex representations. In short, they think that connectionists fail to satisfy the antecedent of premise (vi): their models do not in fact satisfy P. I will come back to this issue later.

    For the moment, notice the modal force of their claim. The claim is not just that the extension of 'NC-LOT' happens to be actually empty. Rather, the claim is that it is necessarily empty: it is impossible that any object can be in its extension. Given the observation that the extensions of 'NC-LOT' and 'C-LOT' are mutually exclusive and jointly exhaustive as far as satisfaction of P is concerned, the modal claim about NC-LOT follows from another modal claim, namely, that the extensions of 'LOT' and 'C-LOT' are necessarily identical. In a nutshell, then, F&M's conviction is that concatenation is necessary to satisfy P (especially P-b).
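    The step from one modal claim to the other can be spelled out schematically. In the following, LOT, C-LOT, and NC-LOT stand for the extensions of the corresponding terms, and the box stands for whatever kind of necessity is at issue; the numbering and notation are added here purely for illustration.

```latex
\begin{align*}
&\text{(1)}\quad \mathrm{LOT} = \text{C-LOT} \cup \text{NC-LOT},
   \qquad \text{C-LOT} \cap \text{NC-LOT} = \varnothing \\
&\text{(2)}\quad \text{NC-LOT} = \varnothing \iff \mathrm{LOT} = \text{C-LOT}
   \qquad \text{(from (1))} \\
&\text{(3)}\quad \Box(\text{NC-LOT} = \varnothing) \iff \Box(\mathrm{LOT} = \text{C-LOT})
   \qquad \text{(given that (1) holds necessarily)}
\end{align*}
```

    Line (1) records that the two extensions are jointly exhaustive and mutually exclusive with respect to satisfying P; lines (2) and (3) then show why the claim that 'NC-LOT' is necessarily empty stands or falls with the claim that 'LOT' and 'C-LOT' are necessarily coextensive.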

    What is the nature of F&M's modal claim? I don't think that F&M take it as a logical necessity, i.e., that satisfying P logically necessitates concatenation. Rather it seems that they take the modality in question as a nomological or empirical necessity. In other words, they seem to think that it is an empirical truth that satisfying P concatenatively is the only way to satisfy it at all (i.e., that concatenation is necessary).

    It is this conviction that seems to force classicists like F&M to make concatenation an essential part of the notion of LOT. What underlies this conviction is the idea that causal structure sensitivity nomologically requires the actual presence of syntactic constituents in the representations themselves. I will come to that below at greater length. I will argue that however useful, fruitful, and illuminating it may be to ponder the modal claim itself, it should not per se play any role in conceptually setting up what the notion of LOT essentially and minimally is. I claim that satisfaction of P in itself is enough for LOT no matter how it would turn out to be satisfiable.

    On the other hand, connectionists do, of course, reject the modal claim. In fact, they claim to have empirically refuted it by developing models that they claim fall within the extension of 'NC-LOT,' only that they call them non-classical (non-LOT) models of course.

    Here is my strategy in what follows. To the extent to which I will be arguing against classicists, I will not commit myself to any specific claim about whether the particular connectionist models we will see below actually satisfy P non-concatenatively or otherwise. My experience with classicists is that they tend to be very stubborn in their modal intuitions and don't want to see the proposed models as satisfying P. So, understandably enough, they don't want to give the antecedent of premise (vi) away to connectionists. I officially want to be agnostic about whether the set, NC-LOT, in Figure 1, is empty while obviously, I believe, along with everyone else, that C-LOT, hence P (=LOT), is not empty. The reason I don't want to tackle this issue is that I believe I can make my point against classicists without tackling it.[12]

    All I need to have against the classicists is the epistemic claim that for all we know the modal claim could turn out to be false. This is different from saying that it is false. Again, for all we know, there may well be some deep metaphysical truths about our world that would physically necessitate concatenative realization of constituency relations among symbols in order to adequately satisfy P in its entirety. What I need is the claim that we don't know that, not yet anyway. And this is easier to get. For the modal claim is usually not argued for: it is question-beggingly taken for granted as self-evident by classicists. But, if anything, connectionists can at least be taken to have shown that we don't know whether the modal claim is true. So my interest is in showing that even if the modal claim turns out to be false, that would in no way show that the connectionist systems under consideration are not LOT systems. I want to free the notion of LOT from the notion of concatenation.[13]

    On the other hand, while arguing against connectionists, I will assume that they are right in that their models genuinely satisfy P. My aim is to show that they are still LOT models. This is the gist of the diagram in Figure 1.

    Let us first see, however, how the connectionists who attempt to meet the F&P challenge propose to satisfy P. This requires us to look into the nature of their models. It is impossible for me to do justice to the complexities of these proposals here. But fortunately we don't need to look at them in any careful detail since I will not be discussing whether they in particular are adequate to do the job. All I want to do therefore is simply to convey the flavor of their proposals, which will do for my purposes in this paper.

    Classicists who tend to view connectionist models as overly simplistic toy models compared to traditional AI models usually just refuse to take them seriously as such and refuse to speculate on them. As McLaughlin (1993a) puts it, in science "you have to deliver'': waving hands in a conceptual space or even vector space does not cut any ice. As far as the ultimate purposes of this paper are concerned, I have nothing against this kind of attitude as long as classicists grant me the more general point those "toy models'' raise. For the general point is a conceptual one about the proper understanding of LOTH. LOTH itself may be an empirical/scientific claim, but surely we have to know what exactly it says in the first place. To repeat, I intend to spell out the minimal truth conditions of LOTH. What follows is an attempt to unpack this intention as clearly as possible. For this, just conveying the spirit of the connectionist proposals will do, even at the cost of simplifying them still further.

    6. The Connectionist Proposals

    Since P consists of two parts, whether P can be non-concatenatively satisfied may be discussed separately in relation to each part. As I said, the real motivation that underlies the classicist insistence on concatenation has to do with structure sensitivity, hence P-b. I will eventually come to that. For now, however, let us see how it might be possible to have non-concatenative satisfaction of P-a. This is, at any rate, necessary in order to evaluate the claims made about P-b. I will look into two proposals.

    6.1. Pollack's Recursive Auto-Associative Memory (RAAM)

    Pollack (1990) has developed a connectionist architecture for a class of networks that can recursively encode tree structures with a fixed valence to an almost arbitrary depth. Since tree structures can be used to describe syntactic or formal constituents of expressions, any complex representation whose constituent structure can be analyzed by tree structures can be encoded in such networks. Since my aim is just to convey the idea, let me describe how the RAAM works on an example.

    Suppose we want to produce connectionist representations of conjunctions. We can do so by using a RAAM network. RAAM networks use distributed representations, so we will need a set of vectors representing the atomic sentences which of course will correspond to activation patterns of some set of connectionist units in the network. In conjunctions, it is natural (but not necessary) to use 3-valence tree structures. So the basic architecture of the RAAM network we need will look like this.

    This is a feed-forward network: the activation spreads from the input units to the output units through the hidden units. Let us use 'A,' 'B,' 'C,' etc., to denote the vectors standing for atomic sentences. These will be fed to the units of Pool-1 and Pool-3, each vector to a single pool. These are the inputs to the network. The units of Pool-2, in each input cycle, will be fed by a constant vector, call it &-vector. This vector is what makes the network encode conjunctions: it is a conjunction-marker. When the vectors for A/&/B are fed to the input units, the activation will spread to the hidden units creating a distinct activation pattern that can be treated again as a vector, which will in turn activate the output units. The aim is that the network should produce exactly the same pattern in the output units as that of the input units. This can easily be done by using simple learning techniques. Since this is an auto-associator, an unsupervised back-propagation learning method is natural here. When the network in this way learns to auto-associate the vectors A/&/B in the correct order in its output units given the same input, we will have a distinct activation pattern in the hidden units: the vector corresponding to this is the distributed connectionist representation of 'A&B' compressed to one third of the entire original input and is non-concatenative, hence non-classical.

    Notice two points. First, since this representation is produced by a process of auto-association, when it is supplied to the hidden units of the network, it will uniquely decompose to its original constituents in the output units. Second, the compressed representation can be re-supplied as a conjunct to the input units of the network to produce yet another compressed, more complex conjunctive representation. This is the recursive aspect of the RAAM architecture. When the network is suitably organized and large enough, the same network can produce, decompose, and store a very large number of conjunctions of almost arbitrary complexity. Furthermore, and this is the crucial point, the same network can be used to compose and decompose in the same manner many other complex representations whose "logical forms'' are different. This can be done by replacing &-vector with other theoretically relevant vectors, for instance, with a suitably chosen v-vector, or even with a not-vector in which case one of the input pools will be supplied by a "nil-vector,'' and so on.
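
    The data flow just described can be sketched in a few lines. This is only an illustration under loud assumptions: the pool width, the vectors, and the weights are arbitrary placeholders, and the weights are random and untrained, so the decoder here does not actually reproduce its inputs (in a real RAAM that is what auto-associative training achieves). What the sketch does show is the compression from three pools to one and the recursive re-use of a compressed vector as a conjunct.

```python
import math
import random

N_UNITS = 4  # assumed width of a single pool (hypothetical choice)

def make_layer(n_in, n_out, rng):
    """Random, UNTRAINED weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def forward(weights, vec):
    """One feed-forward step with a tanh squashing function."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]

rng = random.Random(0)
W_ENC = make_layer(3 * N_UNITS, N_UNITS, rng)  # input pools -> hidden units
W_DEC = make_layer(N_UNITS, 3 * N_UNITS, rng)  # hidden units -> output pools

def encode(left, marker, right):
    """Pool-1 / Pool-2 / Pool-3 in, one compressed hidden vector out."""
    return forward(W_ENC, left + marker + right)

def decode(hidden):
    """Hidden vector in, three output pools out."""
    out = forward(W_DEC, hidden)
    return out[:N_UNITS], out[N_UNITS:2 * N_UNITS], out[2 * N_UNITS:]

# Placeholder vectors for atomic sentences and the conjunction marker.
A = [1.0, 0.0, 0.0, 0.0]
B = [0.0, 1.0, 0.0, 0.0]
C_VEC = [0.0, 0.0, 1.0, 0.0]
AND = [0.0, 0.0, 0.0, 1.0]

ab = encode(A, AND, B)         # stands in for 'A&B', one pool wide
abc = encode(ab, AND, C_VEC)   # recursion: 'A&B' re-used as a conjunct
```

    Note that `abc` is no wider than `A`: however deep the tree, the representation stays one pool wide, which is the sense in which it is compressed rather than concatenated.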

    The RAAM architecture is still under development. Its prospects especially for natural language sentence parsing seem promising, hence it is natural to suppose that its impact on natural language processing as well as on any sort of analysis that requires variable binding in general will be significant.

    6.2. Smolensky's Tensor Product Representations

    In recent years, Smolensky[14] has developed a powerful connectionist technique for binding values to variables, all represented again by activity pattern vectors, hence using distributed representations. Although the technique, called 'tensor product variable binding,' is quite complicated, the idea behind it is simple. Let us use an example again.

    Suppose we want to produce a tensor product representation corresponding to the sentence 'John loves Mary.' Since this sentence can syntactically be decomposed into its constituents, we can work on its syntactic structure:

    {(John)NP[(loves)VP(Mary)NP]P}S

    The sentence (S) is first decomposed to a noun phrase (NP) and a predicate (P), then the predicate is decomposed to a verb phrase (VP) and a noun phrase. These can be taken to be syntactic "roles'' or "positions'' (variables) that need to be filled by particular lexical items: in our case the "fillers'' (values) are 'John,' 'loves' and 'Mary'.

    Smolensky postulates a set of particular filler-vectors (these are the connectionist representations corresponding to lexical items), and a set of particular role-vectors for syntactic positions (for instance, an NP-vector, a P-vector, a VP-vector, and so on). If we want to bind a filler vector to a role-vector, say, the vector representing Mary to the NP-vector, we multiply the two vectors to get their tensor product; the result is the tensor product vector representation for 'Mary' in the NP-position. We then perform the same operation for 'loves' in the VP-position. We then superimpose the resulting two vectors (i.e., add the two vectors by simple vector addition) to get a new filler vector to be bound to the P-vector. When we do this we now have a tensor product vector for the predicate bound to particular values. After similarly binding (by tensor product operation) the vector representing John to the NP-vector, we can now get a single vector corresponding to the whole sentence by simply superimposing the two vectors, namely the NP-vector bound to "John'' and the P-vector bound to, as it were, "loves Mary.'' This is the compressed distributed connectionist vector representing the state of affairs [John loves Mary]. Also, under certain conditions, there is, Smolensky claims, a connectionist network which will uniquely decompose it back to its constituents.[15] This connectionist representation does seem to have constituent structure. But, again, it is non-concatenative. Notice also the recursive aspect of this technique: the tensor product vectors (vectors standing for roles bound to fillers) can be re-used as fillers to be bound to further roles, as we have just done in binding the P-vector. By using the same technique, we can still, of course, bind the whole vector corresponding to the sentence in question, for instance, to a left-hand-conjunct-vector (a role vector) in order to get a new vector representing the state of affairs, say, [John loves Mary and Mike hates John] and so on.
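
    The binding and superposition steps can be illustrated for the predicate level alone. The sketch below is not Smolensky's system: the numerical values of the filler and role vectors are invented for illustration, only one level of binding is shown (the further binding to the P-vector is omitted), and the role vectors are assumed to be mutually orthogonal, which is one condition under which unbinding is exact.

```python
def bind(filler, role):
    """Tensor (outer) product of a filler vector and a role vector."""
    return [[f * r for r in role] for f in filler]

def superimpose(t1, t2):
    """Element-wise addition of two tensors of the same shape."""
    return [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(t1, t2)]

def unbind(tensor, role):
    """Recover the filler bound to `role`; exact only for orthogonal roles."""
    norm_sq = sum(r * r for r in role)
    return [sum(t * r for t, r in zip(row, role)) / norm_sq for row in tensor]

# Hypothetical filler and role vectors (values chosen arbitrarily).
FILLERS = {"loves": [1, 0, 2], "Mary": [0, 3, 1]}
ROLES = {"VP": [1, 1], "NP": [1, -1]}  # orthogonal: 1*1 + 1*(-1) == 0

# 'loves' in the VP-position superimposed with 'Mary' in the NP-position.
predicate = superimpose(
    bind(FILLERS["loves"], ROLES["VP"]),
    bind(FILLERS["Mary"], ROLES["NP"]),
)

print(predicate)                       # [[1, 1], [3, -3], [3, 1]]
print(unbind(predicate, ROLES["VP"]))  # [1.0, 0.0, 2.0], i.e. 'loves'
print(unbind(predicate, ROLES["NP"]))  # [0.0, 3.0, 1.0], i.e. 'Mary'
```

    No row or column of `predicate` is a copy of either filler vector, yet both fillers are recoverable from it. That is the sense in which the representation has constituents without literally containing tokens of them.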

    There are other attempts to develop techniques for incorporating complex distributed representations into connectionist models.[16] All of them use a non-concatenative scheme to capture syntactically structured representations in the form of compressed vectors. And all of them are committed to distributed representations. In a sense, this is no surprise. For resources, especially the number of processing units, in connectionist networks are limited. For this reason, connectionists had to find out ways of using finite resources over and over again in a recursive fashion in order to handle, to a psychologically respectable degree, the problems posed by what prima facie seems to be recursive cognitive capacities.

    7. Formal Systems and their Instantiations

    F&M's (1990) article was a response to Smolensky's Tensor Product System. They accuse Smolensky of confusing two issues that need to be clearly distinguished. Namely, Smolensky, they claim, confuses the issue of a representation's actually having syntactic structure with the issue of a representation's representing syntactic structure. F&M claim that Smolensky's tensor product representations do only the latter; such representations do not themselves have any actual syntactic structure. This issue relates to whether there can be non-concatenative satisfaction of P-a. Even with respect to P-a, then, F&M seem to think that for genuine syntactic structure concatenation is necessary.[17] I believe that this claim (with respect to P-a) is false.

    I will argue against it by considering the minimal conditions that need to hold in order for a formal system to have a notation. Since there is a clear parallel between providing a notation and a physical instantiation in a machine (or, organism), my discussion will equally apply to effecting a physical instantiation mapping from an abstractly characterized formal system onto the states of a physically realized (computational) machine.

    What I've just said may at first sound strange but we need to remind ourselves that formal systems are abstract entities. By this, I simply mean that for their existence no particular notation is necessary. There is something about formal systems, in other words, that in some interesting sense transcends their notational realizations. There are many quite different kinds of formal systems. But the ones we are interested in are the ones whose structure conforms to P. Sentential Logic (SL) is a prime example of such a formal system. Let us work on the example it provides. Here is an abstract characterization of SL with only three logical forms.

    ABSTRACT CHARACTERIZATION OF SL

    I. There is a set of distinct atomic sentences in the language of SL.

    II. Formation Rules for sentences of SL:

    1. Each atomic sentence is a sentence;

    2. For any x and y, if x and y are sentences, then there are three (formative) operations N, C, D, such that N(x), C(x,y), and D(x,y) are (non-atomic) sentences;

    3. Nothing else is a sentence in SL.

    Remark 1: (Terminology) N(x) is the negation of x. C(x,y) is the conjunction of x and y. D(x,y) is the disjunction of x and y. x and y are called conjuncts in C(x,y) and disjuncts in D(x,y). Any output of any operation is a complex sentence. Any sentence that is an argument to any operation is a constituent of the output complex sentence. The sentences mentioned in I and II are sentence types.

    Remark 2: (Conditions on Formation Rules) The formative operations in (II.2) are such that: for any x and y, if x and y are sentences or 2-tuples of sentences,[18] then for any operation Q and R,

    (1) x not= Q(x);

    (2) x = y if and only if Q(x) = Q(y);

    (3) Q = R if and only if Q(x) = R(x);

    (4) Q not= R if and only if Q(x) not= R(y);

    (5) Q is an effectively computable function such that there is an "inverse'' operation S such that S effectively computes [x, Q] given Q(x).

    III. Transformation Rules:

    [DN1] Given the negation of the negation of any sentence, derive the sentence.

    [DN2] Given any sentence, derive the negation of the negation of the sentence.

    [CON] Given any conjunction, derive any one of its conjuncts.

    [ADJ] Given any two sentences, derive their conjunction.

    [ADD] Given any one sentence, derive any disjunction one of whose disjuncts is the given sentence.

    [DS] Given a disjunction and the negation of one of its disjuncts, derive the other disjunct.

    Etc.

    This characterization of SL is abstract in the sense that it is notation-free.[19] An indefinite number of notational schemes can satisfy this abstract characterization. Put differently, what makes indefinitely many notations equivalent (hence, notations of SL) is the existence of systematic ways of satisfying the above abstract characterization. So, let us first begin by characterizing what it takes to provide a notation for SL.

    Providing a specific notation for formal systems generally proceeds through two major phases. The initial phase is to concretely specify the atomic symbols, in the case of SL, the atomic sentences. How is this done? It is done by producing the identity criteria for the atomic sentence types. And this, in turn, is done by supplying a token for each type with the hope that the tokens will give enough idea of what the types are. Tokens are physical entities, as such they have certain physical properties. By providing tokens for each type, we in fact try to indicate that certain physical features of the tokens are what makes the tokens tokens of a certain type. Thus we use tokens as identifying examples of their types. In other words, we identify the primitive types by ostension. Here is how it would go:

    Atomic sentences of SL: 'A,' 'B,' 'C,' 'D,' ...

    The point to emphasize here is that in providing a notation the atomic symbol types are individuated by certain sets of (quasi-)physical properties of their tokens. Any token produced to satisfy a certain set of physical properties, say, a certain shape, is a token of a particular atomic symbol type. This is what it is to provide identity criteria for atomic symbol types.

    The second phase in providing a notation is to specify the formative operations concretely. Since the formative operations are what define the syntactic constituency relations among symbols, what needs to be specified concretely is, as van Gelder (1990) names it, "a mode of combination'' for symbols.[20] This mode of combination must not only satisfy the conditions in Remark 2 but also reflect their recursive character. Here is a standard example that does both in the case of SL:

    Operation N: N(x) = '~' ^ 'x'

    Operation C: C(x,y) = '(' ^ 'x' ^ '&' ^ 'y' ^ ')'

    Operation D: D(x,y) = '(' ^ 'x' ^ 'v' ^ 'y' ^ ')'

    where x and y are any (atomic or molecular) sentence and '^' is meant to be the concatenation symbol. Now that the atomic symbols are concretely specified in the way indicated above, any substitution instance of the formation operations so specified will now give us a (syntactically) complex sentence. Also, notice that since we now have the concretely specified modes of combination, we have two kinds of individuation criteria: one for the particular sentence types with which we can distinguish, for instance, between '(A&B)' and '(C&D),' and one for the logical type (form) of sentences with which we can distinguish between negations, conjunctions, and disjunctions, e.g., between '(A&B)' and '(AvB)'.
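
    As a rough illustration of this standard scheme, the three operations can be written as string concatenation, together with an inverse procedure of the kind condition (5) of Remark 2 asks for. The function names and the restriction of atomic sentences to single capital letters are assumptions made for the sketch only.

```python
def N(x):
    """Operation N: '~' ^ x"""
    return "~" + x

def C(x, y):
    """Operation C: '(' ^ x ^ '&' ^ y ^ ')'"""
    return "(" + x + "&" + y + ")"

def D(x, y):
    """Operation D: '(' ^ x ^ 'v' ^ y ^ ')'"""
    return "(" + x + "v" + y + ")"

def decompose(s):
    """Inverse operation: return (operation, constituents), or None if atomic.

    Assumes s was built by N, C and D from single capital-letter atoms.
    """
    if s.startswith("~"):
        return ("N", (s[1:],))
    if s.startswith("("):
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch in "&v" and depth == 1:
                # The connective directly inside the outermost parentheses
                # is the main connective of the sentence.
                op = "C" if ch == "&" else "D"
                return (op, (s[1:i], s[i + 1:-1]))
    return None

sentence = C("A", N(D("B", "C")))
print(sentence)             # (A&~(BvC))
print(decompose(sentence))  # ('C', ('A', '~(BvC)'))
```

    The token `'(A&~(BvC))'` literally contains the tokens `'A'` and `'~(BvC)'` as substrings, so decomposition here amounts to locating parts that are already present.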

    Clearly, this standard scheme just indicated does satisfy the conditions specified in Remark 2. The significance of these conditions, among other things, is that they ensure the uniqueness of the output of operations given distinct input and the constancy (or the sameness) of the output given the same input. Compliance with condition (5) guarantees two things: the procedures for forming a complex sentence and then decomposing it back to its constituents (thereby making its logical form explicit) are reliable and mechanically realizable. In short, what these conditions together guarantee is that the mode of combination for symbols is effective, general, and reliable.[21] They are effective in the sense that they can be mechanically implemented. Furthermore, for each complex sentence there is an effective procedure that decomposes it into its original constituents and specifies the logical form of the complex sentence. Both the operations themselves and the decomposing procedure for each of them are general in the sense that they apply in composing and decomposing arbitrarily complex sentences. They are reliable in the sense that they always produce the same result. This is in general then what it means to specify concretely the mode of combination.

    The notational scheme I have just provided is more or less the standard one. But in fact there are indefinitely many others. Almost all the familiar notational schemes use what is called a concatenative mode of combination in their concrete specification of the formative operations. Let me be more precise:[22]

    A mode of combination is concatenative if, and only if, when a syntactically complex symbol is tokened, some aspects or features of it satisfy the individuation criteria for typing all the token syntactic constituents of it.

    Intuitively, in concatenative schemes, any token of any complex symbol type contains, literally and explicitly, the tokens of its proper constituents, such that when a token of the complex symbol is produced, the tokens of its constituents are produced too. As we have seen, defined this way, concatenation is what F&M mean by "classical constituent'' (see the quotation above). For instance, certain (spatial) parts of the token '(A&B)' satisfy the individuation criteria for its constituents, namely 'A' and 'B,' which are given in the first phase while concretely specifying the atomic sentences of SL.[23]

    Effecting a concatenative instantiation of SL is only one way and not necessarily the only way of satisfying conditions in Remark 2. True enough, it is the most practical one. But, in principle, there is no theoretical difference between a concatenative and a non-concatenative instantiation scheme, in so far as the scheme satisfies the conditions in Remark 2. These conditions put no requirements on whether the instantiation be a concatenative, or for that matter, non-concatenative one. We may think of the operations as simple input/output devices or little black boxes, so that when you supply the inputs they output further complex sentences. The only constraint on the devices is that they should comply with the conditions of Remark 2.

    As noted by many people, one good example of a non-concatenative instantiation scheme is the Goedel numbering procedure used in encoding the expressions of a formal language. This procedure uses a quite effective method to assign to each well-formed expression of a formal language a unique natural number. And it does this in a recursive manner. First, a distinct natural number is assigned to each of the primitive expressions of the language. Then the formative operations are specified by a distinct set of prime numbers. When the numbers standing for expressions are supplied, the operations produce (by using certain simple mathematical operations on both the supplied numbers and the prime numbers characteristic of each formative operation) a unique natural number standing for a complex expression whose constituents are the initially supplied number-coded expressions. (We may think of the "boxes'' or I/O devices, which concretely specify the formative operations of SL, as embodying the necessary operations over numerals.) Using terms like 'expressions encoded in numbers' might make the Goedel numbering scheme appear as somehow parasitic upon concatenative schemes. But this is not necessary. You can think of the symbols of the language as consisting solely of numerals, and the operations of the formal language as operations over these numerals. What is truly remarkable about "Goedelese'' is that thanks to the theorem of prime decomposition there is an effective decomposition procedure by means of which we can uniquely recover the constituents of any given complex Goedelese expression and also identify its logical form. Goedelese is not a concatenative scheme: complex Goedelese expressions, when tokened, do not literally contain the tokens of their constituents.
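
    A toy version of such a numbering can be sketched as follows. The particular coding is an assumption chosen for brevity, not Goedel's own: atomic sentences are odd numbers, the exponent of 2 marks the formative operation, and the exponents of 3 and 5 carry the codes of the constituents.

```python
ATOMS = {"A": 1, "B": 3, "C": 5}  # hypothetical codes: atoms are odd numbers

def negation(x):
    return 2 ** 1 * 3 ** x

def conjunction(x, y):
    return 2 ** 2 * 3 ** x * 5 ** y

def disjunction(x, y):
    return 2 ** 3 * 3 ** x * 5 ** y

def exponent(n, p):
    """How many times the prime p divides n."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

def decompose(g):
    """Recover (operation, constituent codes), or None for an atomic code."""
    if g % 2 == 1:
        return None
    op = {1: "N", 2: "C", 3: "D"}[exponent(g, 2)]
    x = exponent(g, 3)
    if op == "N":
        return (op, (x,))
    return (op, (x, exponent(g, 5)))

not_b = negation(ATOMS["B"])             # 2 * 3**3 == 54
a_and_not_b = conjunction(ATOMS["A"], not_b)

print(decompose(a_and_not_b))  # ('C', (1, 54))
print(decompose(not_b))        # ('N', (3,))
```

    The number standing for the conjunction is a single integer; recovering its constituents takes an arithmetical computation (repeated division), not the inspection of parts.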

    Other examples of non-concatenative instantiation schemes seem to be provided by the kind of connectionist representational schemes we have seen above. Let us take up Pollack's RAAM, and think of the concrete specification of formative operations C, D, and N as the specification of little boxes or I/O devices. When we specify concretely the mode of combination, we in fact provide a specification of the internal workings of these boxes. In this vein, we may think of the RAAM network as the concrete embodiment of these devices. You supply connectionist distributed representations as input, the device outputs a complex representation. For instance, think of Pollack's RAAM architecture as a concrete specification of Operation C when we supply &-vector to Pool-2, or Operation D when we supply v-vector, or Operation N when we supply ~-vector (with the nil-vector). So the question becomes: Do the suggested modes of combination satisfy the conditions in Remark 2? The answer to that question seems to be: yes, they do. Pollack, it appears, does exactly what is needed to be done in satisfying the abstract characterization of SL. He first concretely specifies the atomic sentences by providing individuation criteria for them in just the required sense. He indicates what count as the atomic sentences. They are concretely specified vectors. The second phase is completed by concretely specifying the mode of combination recursively defined over these. And this is the RAAM network itself, or the complete mathematical description thereof in terms of vector algebra.

    As we have seen, what is essential in the concrete specification of the formative operations is that the conditions in Remark 2 be satisfied. Pollack's way of specifying the mode of combination concretely seems to satisfy these conditions in a non-concatenative way: Given the input, the production of the output is general, reliable, effective and unique. Furthermore, the output is uniquely decomposable into its constituents again in a general, reliable and effective way. In fact, the case here is parallel to the Goedel numbering scheme. If the latter satisfies P-a, so do, it appears, the connectionist techniques we have seen.

    Let me briefly recapitulate. Formal systems are abstract systems. There are certain conditions that need to be met by any concrete instantiation (notation or physical realization) of a formal system. I spelled out what those conditions are by using the example of SL. These conditions do not differentiate between concatenative and non-concatenative instantiations of formal systems. Therefore, if one scheme is a genuine instantiation of a formal system, so is the other. A fortiori, if a complex representation belonging to one scheme does genuinely have a syntactic structure and constituents --as opposed to representing the structure-- so does the one belonging to the other scheme.

    In other words, there is no theoretical basis for making the distinction and then, as F&M do, accusing the connectionists of confusing the two. The only basis for the accusation is a question-begging one: Simply assume that all and only genuine concrete instantiations of formal systems are concatenative ones. It follows from my analysis, however, that whatever grounds we have for viewing a particular concatenative instantiation as a genuine instantiation of an abstract formal system, the very same grounds equally hold for the non-concatenative schemes. Hence, as far as P-a is concerned, F&P cannot have any good argument for pressing that concatenation is necessary for genuine concrete instantiations of formal systems. Of course, as I said, their real reason for insisting on that is structure sensitive processing. They think that structure sensitive processing requires concatenation, to the discussion of which I am about to return.

    One final point. If what I have said is right, then connectionists are in a position to explain formative systematicity quite adequately, since, as I pointed out, explanation of formative systematicity requires only P-a.[24]

    8. Connectionists on Structure Sensitivity

    Can connectionists explain inferential systematicity with the kind of models we have seen? The answer to this question depends on the answer to the following question: How can non-concatenatively structured connectionist representations engage (causal) structure sensitive processes? In other words, can connectionist models genuinely satisfy P-b?

    The general consensus seems to be that if connectionist models using non-concatenative compositionality have to first decompose the compressed complex representations back to their constituents, thereby making their logical form available, in order for the structure-sensitive processes to operate on them, then the models are rightly to be called LOT models.

    Many connectionists,[25] however, have proposed that connectionist models using some non-concatenative composition technique can directly process structurally complex representations in a structure-sensitive way without first decomposing them into their constituents, i.e., they can operate on non-concatenatively compositional representations holistically, as it is called. And, it is claimed, it is this feature of connectionist models that makes them at bottom truly and radically non-classical.

    In their reply to Smolensky, F&M seem pretty confident that structure sensitive processing, hence inferential systematicity, can only be guaranteed in a concatenatively realized scheme. Here is their "argument'':

    The relevant question is ... whether [tensor product representations] have the kind of constituent structure to which an explanation of [inferential] systematicity might appeal. But we have already seen the answer to this question: the constituents of complex activity vectors aren't "there,'' so if the causal consequences of tokening a complex vector are sensitive to its constituent structure, that's a miracle. (1990, p.200)

    As I said, it is not clear what the nature of the claim is. It seems that F&M take it to be a self-evident empirical truth. Here is how it could be false.

    8.1. Chalmers' Experiment

    Chalmers (1990a) has conducted a toy experiment which shows in a nice and compact way how connectionist models might be able to handle syntactic transformations by operating holistically on connectionist complex representations. First, by using a RAAM architecture just like the one I described above, he encoded 125 active English sentences that are permutations on 5 proper names and 5 transitive verbs, and their passive forms, totaling 250 sentences. Chalmers then trained a simple three-layered feed-forward network to associate 70 active sentences with their passive forms (see Figure 3). Then, when he supplied the remaining 55 active sentences to the network one by one, the network produced their proper passive forms. Since these were in compressed form like the active ones used as inputs, he then supplied the outputs to the decomposing network. All of them correctly decomposed to their constituents in the right order. The success of the generalization of the network was 100%. He experimented also with first supplying the compressed passive sentences into the transformation network to get their active form. The results were equally successful. Chalmers claims that these results experimentally refute the arguments given by F&P&M.[26]
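
    The counts in this design are easy to check. The sketch below only reconstructs the shape of the corpus and of the training/test split; the names and verbs are placeholders rather than Chalmers' actual vocabulary, the random split is an assumption, and no network is trained.

```python
import itertools
import random

# Placeholder vocabulary (hypothetical): 5 proper names, 5 transitive verbs
# mapped to the participles used in their passive forms.
NAMES = ["John", "Mary", "Mike", "Helen", "Chris"]
VERBS = {"loves": "loved", "hates": "hated", "helps": "helped",
         "sees": "seen", "knows": "known"}

# Every (subject, verb, object) combination, with its active and passive form.
pairs = [
    (f"{subj} {verb} {obj}", f"{obj} is {VERBS[verb]} by {subj}")
    for subj, verb, obj in itertools.product(NAMES, VERBS, NAMES)
]

# Assumed random split into 70 training pairs and 55 held-out pairs.
rng = random.Random(0)
shuffled = pairs[:]
rng.shuffle(shuffled)
train, held_out = shuffled[:70], shuffled[70:]

print(len(pairs), 2 * len(pairs))  # 125 250
print(len(train), len(held_out))   # 70 55
```

    The point of the held-out set is that the transformation network never sees those 55 pairs during training, so success on them cannot be a matter of stored input-output associations.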

    There are various important points about holistic connectionist processing that come out nicely in Chalmers' experiment. The most important one is the astonishing success rate of generalization of the transforming network. Let me emphasize what we have here. We have a bunch of compressed vectors that are the connectionist representations. These representations are furthermore non-concatenatively complex. They have syntactic structure, as we have seen, in one well-defined sense. The transforming network is trained to process these in a certain way that is determined by an abstract (interpreted) formal/syntactic system. When the training is complete, the network acquires a general capacity to transform similarly structured representations in the appropriate way. The success of the network's generalization over vectors for which the network is not trained is clearly not accidental. This quite robust generalization rate of the network seems to make a strong case for the claim that structurally similar connectionist representations are processed in similar ways, i.e., as their logical form requires.[27] The generalization success of the network makes it quite clear that structure sensitive processing is non-accidentally obtained for all the structurally similar complex representations: i.e., nothing similar to look-up tables or brute force storing exists. Clearly the network somehow learns to detect the form of the complex representations supplied and process them as their form requires. What does this mean? Well, it just means that non-concatenatively complex representations can be processed in a structure sensitive way just as P-b requires. The reason I am using 'can' is that in Chalmers' experiment we have only a fragment of a possibly integrated connectionist system. It is reasonable to expect that more serious models will have more integrated and complex architectures consisting of many subnetworks. Chalmers' experiment shows some of the basic principles about how some connectionists propose to handle structure sensitive processes.

    If we take 'structure sensitivity' in some pretheoretic (uncorrupted, natural) sense, we really do seem to have genuine structure sensitivity here. How is this structure sensitivity obtained, if not by miracle? Anyone with a bit of knowledge of matrix algebra can guess how it works. I cannot go into a detailed mathematical analysis of the network here, but I can convey the idea, which is in fact quite simple. What Pollack's network does is to locate all the vectors with identical "logical form'' in a more or less homogeneous subspace of the multidimensional vector space defined for the network. In other words, the encoding of structurally similar representations proceeds by grouping them in one region of the high-dimensional vector space. That is the point of training the RAAM network. It is trained to locate, for instance, all the conjunctions in a particular subspace. That a certain vector is located in that subspace is in a certain sense the determinant of its form. And Chalmers' transformation network learns to treat vectors located in that subspace all in a similar fashion. The hidden units of the transforming network learn to detect the "shape'' of the complex input representation as located in the multi-dimensional subspace reserved for, say, passive sentences, or conjunctions, etc., and treat them accordingly, as the network is taught to do. That is how it succeeds in generalizing over vectors on which it was not trained. Of course, this is no surprise, since what connectionist networks are particularly good at is exactly to map one vector onto another in any way you like. When this process is regimented through training according to whatever transformational regularities are to be obeyed, what you get is the holistic processing of complex representations according to their non-concatenatively realized syntactic structure. In fact, when a cluster analysis is performed on the hidden units of the transforming network, it can be seen that they divide their space and group the incoming patterns exactly according to the subdivisions of the encoding network, i.e. according to the distinct logical forms of representations.
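    The idea that a single vector-to-vector mapping can respect structure without ever pulling the constituents apart can be illustrated with a very small sketch. What follows is not Pollack's RAAM and not Chalmers' trained network: it swaps in a hand-built tensor product encoding (in the spirit of Smolensky's proposal) because that can be written down exactly in a few lines. The vocabulary, the role vectors, and every function name are assumptions made for illustration only. Each sentence is the sum of filler-times-role outer products, so every column of the resulting matrix carries a trace of all three words; one fixed linear map then exchanges agent and patient for every one of the 125 sentences at once.

```python
from fractions import Fraction as F

# Illustrative vocabulary (an assumption of this sketch, not data from the paper).
NAMES = ["john", "michael", "helen", "diane", "chris"]
VERBS = ["love", "hit", "betray", "kill", "hug"]
FILLERS = NAMES + VERBS

# One-hot filler vectors, one per word.
FILLER_VEC = {w: [F(int(i == k)) for i in range(len(FILLERS))]
              for k, w in enumerate(FILLERS)}

# Orthonormal role vectors with no zero components, so that no single
# column of the encoding is reserved for one constituent.
ROLE_ORDER = ("agent", "verb", "patient")
ROLE_VEC = {
    "agent":   [F(-1, 3), F(2, 3), F(2, 3)],
    "verb":    [F(2, 3), F(-1, 3), F(2, 3)],
    "patient": [F(2, 3), F(2, 3), F(-1, 3)],
}

def encode(agent, verb, patient):
    """Superpose the three filler (x) role outer products into one 10x3 matrix."""
    T = [[F(0)] * 3 for _ in FILLERS]
    for word, role in zip((agent, verb, patient), ROLE_ORDER):
        f, r = FILLER_VEC[word], ROLE_VEC[role]
        for i in range(len(FILLERS)):
            for j in range(3):
                T[i][j] += f[i] * r[j]
    return T

def unbind(T, role):
    """Recover the word bound to a role by projecting onto its role vector."""
    r = ROLE_VEC[role]
    v = [sum(row[j] * r[j] for j in range(3)) for row in T]
    return FILLERS[v.index(max(v))]

def decode(T):
    return tuple(unbind(T, role) for role in ROLE_ORDER)

# A fixed linear map on role space that exchanges agent and patient
# and leaves the verb role alone.
r_a, r_v, r_p = (ROLE_VEC[r] for r in ROLE_ORDER)
SWAP = [[r_p[j] * r_a[k] + r_a[j] * r_p[k] + r_v[j] * r_v[k]
         for k in range(3)] for j in range(3)]

def swap_roles(T):
    """Apply SWAP to the whole encoding; no constituent is extracted first."""
    return [[sum(row[k] * SWAP[j][k] for k in range(3)) for j in range(3)]
            for row in T]

if __name__ == "__main__":
    print(decode(swap_roles(encode("john", "love", "helen"))))
    # prints ('helen', 'love', 'john')
```

    The map SWAP here is written down by hand rather than learned, which is the main respect in which the sketch differs from the experiment described above; what it shares with that experiment is only that the transformation is defined over the whole vector and succeeds uniformly for all inputs of the same form.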

    The point I want to emphasize here is that the transformational ("computational'') profile of a complex connectionist representation is determined by its location in the vector space reserved for those kinds of representations (e.g., conjunctions or active sentences, etc.). And this in turn is determined (within the context of an already trained network) by the specific numerical values of the vectorial representations at specific positions. There is a clear sense in which this is the "shape'' of representations of this kind made computationally relevant, i.e., their particular shape determines their processing profile, and determines it causally -- if the network is physically realized. It is important to keep this in mind because this will ultimately constitute the sense in which they are instances of LOT schemes at a suitably abstract level of understanding of what the LOTH minimally requires for its physically realized specific architectures.

    9. Structure Sensitivity and LOT: The Connectionist Contribution

    Chalmers claims to have empirically refuted F&M's modal claim that structure sensitivity can only be obtained with concatenation. Hence he claims to have provided a radically different alternative to the LOT paradigm. Can he be right? Hard to say.

    Part of the reason why we do not know how to settle the epistemic issue of whether the modal claim is true is that we do not know what exactly the truth-conditions of the claim are in the first place. Let us then get clearer about what structure sensitivity comes to.

    We have to be very careful about distinguishing between two levels at which we may understand 'syntactic structure,' because it is in fact precisely in virtue of this two-level picture that formal systems are so important in the study of cognition.

    At one level, we may conceive the syntactic/formal structure qua physically realized in representation tokens. P-b requires syntactic structure at this level. In other words, P-b requires causal, and not "logical,'' structure sensitivity. This is the concrete sense of syntactic/formal structure. On the other hand, we may understand 'structure' more abstractly as, for instance, required by the abstract characterization of SL; i.e. at a level where no commitment to how it is to be concretely realized has yet been made.

    This distinction is important, since it is precisely because syntactic structure abstractly understood can be exhibited or realized in concrete physical structure that we can bring abstract logical/semantic relations down to earth and make them subject to causal/physical processes. In other words, it is only to the extent to which we have a formally/syntactically regimented semantic domain that we can see how semantically coherent behavior can be obtained in a thoroughly physical/mechanical medium. The key to this feat is the two-level picture of syntax.[28]

    The essential trick is to get causal or nomic properties of certain physical states correlated in a systematic way with the syntactic/formal properties postulated in the formalization of a semantic domain. If you choose the physical medium and its causal properties rightly, then you can make the causal processes of the physical system mirror the application of formal transformation rules to physical states now individuated as symbols. Put differently, certain causal state transitions of the system will occur only if certain rightly chosen physical conditions hold within the system, such that these causal transitions will be described as following (or, obeying, governed by, whatever) those formal rules specified in the formalization. In such a scheme, state transitions of the system (now interpretable as transformations on representations) will not be a miracle, however complexly structured the representations are. In a nutshell, the point that lies behind the LOT paradigm is to design a physical device such that some of the causal or nomic properties of physical symbol tokens become the Computationally Relevant Properties (CRPs).

    I believe that ignoring (or at least not being clear about) this two-level picture of syntax has been at the heart of a lot of confusion about the nature of syntactic properties and the role they are supposed to play in the Computational Theory of Mind (CTM). Consider the following two claims often made in the computationalist literature in the same breath without any warning as if they can both be true at the same level:

    (S1)

    The syntactic properties (or, form) of a complex symbol are (metaphysically) determined by its computational (causal/functional) profile.

    (S2)

    The computational (causal/functional) profile of a complex symbol is (metaphysically) determined by its syntactic properties or form.

    Clearly these two claims can't both be true in any interesting sense at the same level. However, they are both true and often claimed to be so without any clear indication about their status. How is this possible? The answer would remain a mystery without the two-level picture of syntax I described.

    The sense in which (S2) is true is the sense in which syntactic properties are conceived qua realized or implemented in a physical/computational medium. In other words, when we talk about syntactic properties of symbols as determining their causal/functional role we are talking of them under a hypothesized instantiation mapping, i.e., qua mapped onto some physical state whose quasi-physical features are what make a state token count as a symbol token of a particular type. The syntactic properties in this sense are the computationally relevant physical properties that drive the state transitions of the physical system. Indeed this is the sense in which we connect the causal properties of symbols with their semantic properties, i.e., through syntax, so that when viewed from a certain perspective the state transitions of a system are revealed as semantically coherent. Exactly this point is what is supposed to be philosophically so exciting about the Computational Theory of Mind.

    Here is how Fodor appears to make much the same point:

    You connect the causal properties of a symbol with its semantic properties via its syntax. The syntax of a symbol is one of its higher-order physical properties. To a metaphorical first approximation, we can think of the syntactic structure of a symbol as an abstract feature of its [geometric or acoustic] shape. Because, to all intents and purposes, syntax reduces to shape, and because the shape of a symbol is a potential determinant of its causal role, it is fairly easy to see how there could be environments in which the causal role of a symbol correlates with its syntax. It's easy, that's to say, to imagine symbol tokens interacting causally in virtue of their syntactic structures. The syntax of a symbol might determine the causes and effects of its tokenings in much the same way that the geometry of a key determines which locks it will open. (1987, pp.18-9)

    Accordingly, that the brain is such a physical environment is, roughly, the LOT hypothesis. Fodor is not very clear in this paragraph about the two-level picture of syntax. It is, however, clear that he is hesitant in identifying syntax directly with physical properties tout court even though what he says in the last two sentences indicates that it is syntax in the (S2) sense that he has in mind.[29]

    The sense in which (S1) is true is the sense in which syntactic properties are multiply realizable, i.e., qua conceived at an abstract level from which there are indefinitely many mappings onto the states of physical systems. What ultimately guides this mapping, of course, is in some loose sense whatever is captured in the formalization of a semantic domain. In other words, since to regiment the semantic coherence of representational processes in terms of syntax is just to try to capture in non-semantic terms the role that representations play in the economy of thought processes, the syntactic properties postulated ipso facto mimic the semantic properties of representations. But this is to say that the syntactic properties are those properties that make the representations play a certain role, however they are realized. But once the semantic domain is formalized, this role is captured and type-individuated by the syntactic transformational rules like the ones specified in SL above. Hence the sense in which (S1) is true is given by the fact that syntactic properties are said to be whatever properties make the physical symbol tokens behave in the system the way they do.

    What is absolutely crucial is to notice that neither Fodor in the quotation above nor I in the previous several paragraphs have said anything at all about concatenation, i.e. about how exactly the syntactic properties need to be physically exhibited or realized in order to achieve causal structure sensitivity. But this is not surprising. For, it should by now be obvious that all that is essential for obtaining causal structure sensitivity is to design systems with their appropriate CRPs, whether or not they are the properties arising out of a concatenative realization of syntactic structure.

    Viewed this way, there is a clear sense in which whether or not the CRPs of physical symbol tokens can be obtained through concatenation is an engineering problem, much below the level at which the truth of LOTH must be evaluated. Now, there is nothing inappropriate in someone who is unconvinced by the prospects of non-concatenative connectionist models having a very strong hunch about how things at this engineering level will empirically turn out. F&M's modal claim could then be taken to be an expression of this hunch: only the properties of concatenatively realized constituent symbolic structure can be the CRPs in any physical system that exhibits the cognitive law-like regularities.

    But this is quite different from making an essentialist claim about how the LOTH itself should be understood. F&M make such a low-level engineering requirement as concatenation a defining feature of the very notion of LOT. This is unnecessary and unmotivated in so far as we want to be very clear about the minimal truth-conditions of LOTH. And, I believe, it also puts the LOTH dangerously close to being empirically refuted, as is claimed by Chalmers (1990a, 1990b), especially given that connectionist research seems to have shown that concatenation may not be empirically necessary to satisfy P in its entirety.

    On the other hand, given the foregoing discussion, let me elaborate a bit more on why the connectionist models under consideration here count as belonging to the LOT paradigm to the extent to which they can support structure sensitive processing.

    Complex connectionist representations carry the information of their own syntactic structure, but differently from the way their concatenatively realized counterparts carry it. For holistic operations on such compressed representations to be general and reliable, the compressed (implicit) syntactic structure of the representations should be available to the processing network which does the holistic transformations. The only way this can be done is by picking out some CRPs of the compressed representational vectors that are to be fed into the network. What are they? As we have seen, these CRPs are given in the specific patterns of activation values of the units of distributed representations that determine their location in the vectorial space according to their logical form. Whatever the specific values of such CRPs are, it should be clear that all that is needed is some such features of the vectors that will --if the network is physically realized-- causally affect the processing of the network in a systematic and desired way. These properties constitute, in some well defined sense, as I have argued above, the "shape'' of the connectionist symbols that would causally determine their computational profile, just as Fodor himself says (see the quotation above). Their shape is indeed radically different at some level of analysis from the "shape'' of concatenatively realized symbols of conventional von Neumann style computers. But from the perspective of a properly understood LOT, they should all count as symbols in LOT, and the processes are properly called symbolic processes, because what counts is the reliable transformation of representations themselves: as long as representations are reliably handled in the desired way, any physical medium with its appropriate CRPs would in principle do from the formalist perspective.[30]

    Now, in the light of this, consider the following "argument''. If there is no CRP involved in the actual process of holistic connectionist transformations, then the reliability of a network with which it systematically generalizes for structurally similar new inputs is a miracle. If there is such a property, however, then holistic transformations on compressed representations are simply a new and, I submit, a very exciting way of realizing formal transformations, since the behavior of the network can then be described as obeying some abstractly specified formal rules.

    But, of course, successful holistic processing, as I tried to briefly and informally describe above, is not a miracle. In fact, all the heavy mathematical wizardry of connectionists is in the process of finding such properties that are increasingly more powerful. The analyses are at the level of what is sometimes called "subsymbolic'' processing (this is, in fact, also true in designing concatenative machines), but the explicit aim in such analyses is to secure powerful and adequate symbolic processing capable of explaining exhibited cognitive regularities like systematicity at the cognitive or representational level, and explaining them essentially by satisfying P.[31]

    I do not mean to downplay the importance of non-concatenative connectionist models by saying that they are ultimately LOT models. On the contrary, I want to view them as very important and in many ways quite exciting contributions to the LOT paradigm. True enough, so far LOT models have always been identified with computational architectures that use concatenative representational schemes. But, if connectionists are right about the possibility of satisfying P non-concatenatively, then we should treat this finding as a significant contribution to the proper understanding of what the LOT architecture essentially involves: concatenation is not necessary to satisfy P. In other words, if connectionists are right, then what we have is not a radically different paradigm threatening to overthrow the LOT paradigm, but rather a radically different way of being a LOT model.[32] Why is this important in a way that goes beyond a verbal point? Let me try to explain.

    Fodor had once identified "three great metaphysical puzzles about the mind'':

    How could anything material have conscious states? How could anything material have semantical properties? How could anything material be rational? (where this means something like: how could the state transitions of a physical system preserve semantical properties?). (1991, Reply to Devitt, p.284)

    He took the LOT story to solve the third one. The Computational Theory of Mind offers a naturalist solution to the problem of explaining how thinking understood dynamically as thought processes is possible assuming that thoughts are already intentional states.[33]

    Fodor's reasoning was something like this. Modern logic has taught us that the behavior of semantic properties can be studied non-semantically, i.e. proof-theoretically, where this means, roughly, syntactically. And the rise of modern computers has shown that whenever the behavior of any semantic domain can be formalized, i.e., syntactically captured, we can build physical devices, called computers, which would exhibit the same behavior, i.e. devices whose state transitions would mimic the behavior of the semantic domain.

    Since Turing, it so happened that all the interesting physical computers we have actually built or designed happened to use concatenative symbolic schemes. Connectionists' contribution then might be seen to lie in the fact --if it is a fact-- that this was a historical accident and there was nothing metaphysically necessary about it. This could hardly be a trivial result.[34]

    What I am suggesting is that the historical association of the LOT architecture with the kind of concatenative machines traditionally used in AI may have conditioned people to think of the LOT paradigm always in these terms, namely as essentially requiring a concatenatively realized symbolic language. What I am urging therefore is that this is not essential to LOT.

    Perhaps the best way to see that the very notion of LOT (hence the Computational/Representational Theory of Mind) cannot conceptually be (and should not have been) tied to concatenation is to consider the arguments historically offered for LOTH. They all abductively justify the postulation of a representational system that essentially satisfies P. They don't justify any further claim about how exactly P must be satisfied.

    It was the need for an adequate explanation of a certain set of empirical phenomena, namely the law-like cognitive regularities like systematicity, that motivated the postulation of a LOT in the first place. But when we see that the explanation essentially draws only on satisfying P and not on any particular way (like concatenation) of satisfying it, insisting that the notion of LOT should essentially involve concatenative realization of P becomes unmotivated, because the very reasons that have historically prompted the postulation of a LOT do not in themselves justify any further and stronger claim about how to physically realize it.

    That is why connectionists could claim to be able to explain systematicity: they claim to have satisfied P in their non-concatenative models. In other words, when it comes to the explanation of the cognitive regularities, what is doing the work is solely the satisfaction of P, and not any particular way of satisfying it. I am claiming that this is the reason why tying the notion of LOT essentially to the concatenative satisfaction of P would be unjustified: no arguments that have prompted the postulation of a LOT in the first place could underwrite any further claim about how to satisfy P, and if so, no further and stronger claim should be made about the essential nature of LOT.

    We have to be very clear about this because there are all sorts of people out there who are insistent that connectionism is about to overthrow the classical symbolic paradigm. For instance, van Gelder writes:

    It is possible to demonstrate that the classical paradigm is deeply committed to representations having concatenative compositionality. In fact, this is typically just what a classicist means when she claims that a representation is "syntactically structured'' ... This commitment to concatenative structure can be demonstrated in many ways: by appeal to authority (i.e., by reference to the canonical statements of the classical approach); in theory (i.e., by showing the crucial theoretical role that concrete constituency relations play in the Classical explanation of cognitive processes); and by simply pointing to Classical practice, which has always been such as to utilize concatenatively structured representations. (1991b, p.365)

    The point of his article is that connectionists with their new models pose a real threat to classicism: they are in a position to explain the cognitive regularities without becoming "implementational'' since they don't utilize concatenative complex representations.[35] He even goes so far as to claim that we are possibly witnessing a paradigm shift in cognitive science.[36] As to his "demonstration'' that the symbolic LOT paradigm is essentially tied to concatenation, I have almost nothing to say: Arguments from authority and simply pointing to the classical practice, as I have been arguing, don't demonstrate anything; they simply beg the question against me. Similarly for the "classical theory'': if "Classical explanation'' is defined as one in which concatenatively realized constituency relations are implicated in the explanation of cognitive processes, then, of course, you can show the "crucial theoretical role'' they have in classicism. But, again, I don't think that concatenatively realized constituency relations play any "crucial role,'' let alone a "theoretical'' one, if we understand classicism as not committed to concatenation; i.e. once you reject the definition. Whatever "crucial'' role concatenative implementation has so far played in actually building machines and in the imagination of actual AI researchers, concatenative implementation is a matter at the level of computer engineering, and as such it is not, and ought not to be, a theoretical necessity for classicism as long as we have the explanation of cognitive regularities that essentially draws only on satisfying P, and on nothing more concrete and specific.

    Consider again:

    ...talk of "sentences'' in the brain mustn't be taken on the model of sentences as they are inscribed on pages of books. One natural objection to the proposal --"Sentences in the head??!''-- is due to an overly concrete conception people often have of sentences. Sentences, it must be remembered, are highly abstract objects that can be entokened in an endless variety of ways: as waveforms (in speech), as sequences of dots and dashes (Morse code), as sequences of electrically-charged particles (on recording tape). It is presumably in something like the latter form that sentences would be entokened in the head ... Indeed, [the Computational/Representational Theory of Thought] is best viewed as simply the claim that the brain has logically structured, causally efficacious states. Surely, that isn't patently absurd. (Loewer & Rey, 1991, p.xxxiii)

    Of course, it is not patently absurd. But, after making this point, what would be absurd is to go on and insist that sentences must be entokened in the brain concatenatively, or else there is simply no LOT, especially if non-concatenative compositionality has promising resources to explain those cognitive regularities that, let me say again, had prompted the postulation of LOT in the first place. It is of crucial importance to be clear about this point. For, again, there are people out there who are even ready to quickly infer the demise of folk psychology from the falsity of LOTH:

    [Dennett and I] both accept the premise that neuroscience is unlikely to find "sentences in the head,'' or anything else that answers to the structure of individual beliefs and desires. On the strength of this assumption, I am willing to infer that folk psychology is false, and that its ontology is chimerical. Beliefs and desires are of a piece with phlogiston, caloric, and the alchemical essences. We therefore need an entirely new kinematics and dynamics with which to comprehend human cognitive activity, one drawn, perhaps, from computational neuroscience and connectionist AI. (P.M. Churchland, 1990, p.125)

    The Churchlands, of course, are the extreme case. But there are many other people with, more often than not, eliminativist tendencies who have an overly "concrete'' conception of what needs to be the case for the LOTH to be false.[37] They simply refuse to take seriously the model of the brain as a concatenative von Neumann style computer,[38] and tend to embrace connectionism as a softer, biologically more realistic, "radical'' alternative to LOTH. From a biological and evolutionary point of view, there may well be something to this intuition, if the alleged biological realism of connectionist models is even slightly true, but to infer from this that LOTH is bankrupt is too extreme a view, as long as we have the phenomena of the cognitive regularities to explain and explain them essentially by drawing on P.

    Here is still another way of seeing the rationale that underlies my insistence. From a cognitive psychologist's view-point, what is important in this context is the formal specifiability of proposed cognitive models. The reason for this is, I take it, the main rationale behind classicism: to evade homunculi needing cognitive capacities and processes. For we know that if there is a formalization of a capacity or a complex set of processes, then there is a physically realizable mechanism that has the same capacity or carries out the same processes. That is the way to exorcise homunculi. This puts the necessary theoretical pressure on the working psychologist that is much needed from a scientific methodological perspective. Such psychological models usually take the form of competency theories that postulate a recursively specifiable system of representations (P-a), and then define the cognitive processes over these representations in a way that, if the theory is true, would ideally explain the exhibited cognitive regularities in its domain.[39] But from the perspective of the competency theorist, what is theoretically important is the existence of at least one abstract formal structure that could capture what seems to be a competence capacity. Now, of course, this is not enough; the proposed model should also be psychologically realistic. But let us leave this aside for the moment.

    The theorist can use any notation or (programming) language that she thinks adequate for the description of the abstract formal system she postulates in the explanation of the cognitive competency in question. What she is not theoretically required to do is specify how the postulated structure of representations will map onto brain structures, and how the processes defined over them will specifically accomplish their structure sensitivity. Within current cognitive psychology, there is a clear sense in which this is the job of the implementation theorist. The classical cognitive scientist sort of lives with the hope that neuroscientists will eventually vindicate the picture of the cognitive mind as (interpreted) formal symbol manipulator by showing how the brain does manage to implement formal systems. Now it should be clear that if the brain turns out to implement, if at all, formally specified competency systems non-concatenatively, this would in no way show that there is no LOT. All it would show is the non-concatenative realizability of P. But LOTH per se is in principle neutral between concatenative and non-concatenative realizations. My claim is that this is indeed as it should be: to make an engineering feature like concatenative realization definitive of LOTH, as F&M do, would be to miss the essential idea behind the multiple realizability of abstract formal systems.

    Given what we currently know about the brain, we simply do not know whether NC-LOTH or C-LOTH is true; my claim is that we do not have to know in order to agree on what is minimally required for LOTH to be true.

    It is surely true that LOTH is empirically weaker than NC-LOTH and C-LOTH. But LOTH still has plenty of empirical content, especially when considered in its historical context, i.e. vis-a-vis its theoretical rivals like mentalistic associationism and eliminativist behaviorism, or even vis-a-vis any theory that is not committed to there being any syntactically complex representational brain states but that nevertheless aims to explain the same range of empirical cognitive/behavioral phenomena. The Representational/Computational Theory of Mind is what has been at the foundational core of the so-called Cognitive Revolution in psychology. It is therefore absolutely essential to be clear about what it is and is not committed to.

    I conclude that to the extent to which they can satisfy P, the models that are being developed by connectionists who took F&P's challenge seriously are still LOT models, however new and potentially exciting ones they might be at that, when the notion of LOT is rightly understood. Hence, their rejection of premise (vi) of F&P's argument fails. However, F&M are mistaken too in their insistence on the alleged necessary connection between concatenation and LOT. Defending LOTH does not require and ought not to be tied to such a strong and unnecessary feature like concatenative realization of P. Reminding us of this, if nothing else, is the connectionist contribution as far as the proper understanding of the very idea of LOT is concerned.

    11. A Curious Objection

    F&M, towards the end of their paper (1990), take up one issue, apparently brought out by a reviewer, that is directly relevant to our discussion so far. The reviewer asks:

    ...couldn't Smolensky easily build in mechanisms to accomplish the matrix algebra operations that would make the necessary vector explicit (or better yet, from his point of view, ...mechanisms that are sensitive to the imaginary components without literally making them explicit in some string of units)? (F&M, 1990, pp.201-2)

    To which F&M respond in the following way:

    But this misses the point of the problem that systematicity poses for connectionists, which is not to show that systematic cognitive capacities are possible given the assumptions of a connectionist architecture, but to explain how systematicity could be necessary --how it could be a law that cognitive capacities are systematic-- given those assumptions.

    No doubt, it is possible for Smolensky to wire a network so that it supports a vector that represents aRb if and only if it supports a vector that represents bRa; and perhaps it is possible for him to do that without making imaginary units explicit ... The trouble is that, although the architecture permits this, it equally permits Smolensky to wire a network so that it supports a vector that represents aRb if and only if it supports a vector that represents zSq ... The architecture would appear to be absolutely indifferent as among these options. (1990, p.202)

    F&M's point here is related to premise (iii) of F&P's argument against connectionists. The first thing to notice about this argument is that it proves too much. F&M grant that there exist theoretically non-problematic connectionist implementations of (concatenative) classical architectures. Now, any such connectionist implementation has to be wired up in some specific way in order to be an implementation. But given any such implementation, we may always say with respect to it: it could have been wired up in a different way such that it could no longer support the classical architecture, and therefore, it could no longer explain how systematicity can be nomological. Hence, we could conclude, connectionist wiring up is absolutely indifferent as among architectures that guarantee nomological systematicity and the ones that do not.

    This shows that we need to be very careful about which counterfactuals (nomological necessities) need to be explained in a principled way. F&M's question is: How could systematicity be necessary? This question is ambiguous. It may be demanding an architectural (synchronic) explanation, or an evolutionary (diachronic) explanation. There are plenty of signs that F&M intend the question in the former sense. What kind of mechanism (cognitive architecture) could make systematicity necessary? Their answer is: only those mechanisms that enforce concatenative compositionality. But we have seen that those connectionist models that enforce non-concatenative compositionality would also guarantee systematicity in the required sense. Connectionists offer a mechanism that, if wired up in the proper way, guarantees that if the organism can represent aRb, it can also represent bRa. That is what non-concatenative modes of combination of atomic symbols (like the Gödel numbering system, Tensor Product Representations, the RAAM Architecture, and others) promise to offer.
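
    The way a Gödel numbering combines atomic symbols without concatenating them can be illustrated with a small sketch. It is only a toy under stated assumptions: the three-symbol alphabet, the integer codes in CODES, and the helper names encode and decode are all hypothetical and come from no model discussed here. What it shows is that the single integer standing for aRb contains no token of a, R, or b as a literal part, yet the constituents and their positions remain recoverable, and any system that can build the number for aRb by this scheme can also build the one for bRa.

```python
# Hypothetical toy alphabet: each atomic symbol gets a positive integer code.
CODES = {"a": 1, "b": 2, "R": 3}
SYMBOLS = {code: sym for sym, code in CODES.items()}
PRIMES = [2, 3, 5]  # one prime per position in a three-symbol expression


def encode(expr):
    """Map a three-symbol expression to a single integer (its Goedel number)."""
    n = 1
    for prime, sym in zip(PRIMES, expr):
        n *= prime ** CODES[sym]
    return n


def decode(n):
    """Recover the expression by reading off the exponent of each prime."""
    expr = []
    for prime in PRIMES:
        exponent = 0
        while n % prime == 0:
            n //= prime
            exponent += 1
        expr.append(SYMBOLS[exponent])
    return "".join(expr)


aRb = encode("aRb")  # 2**1 * 3**3 * 5**2 = 1350
bRa = encode("bRa")  # 2**2 * 3**3 * 5**1 = 540
```

    Here decode(aRb) gives back "aRb" and decode(bRa) gives back "bRa", and the two expressions receive different numbers although they are built from the same atoms by the same mode of combination.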

    Similarly for inferential systematicity: given the existence of a proper transformation network, it will by nomological necessity transform similarly (non-concatenatively) structured representations in formally similar ways. If this is right, then the question "how have they come to exist in cognitive organisms?'' (or, "how has the brain come to be wired up to nomologically exhibit these cognitive regularities?'') is a different one. It is, I take it, the business of evolutionary theory, or perhaps developmental psychology, to answer this kind of diachronic question. In the second paragraph of their response, F&M seem to slip from the synchronic to the diachronic sense of the question. It is of course possible to wire up connectionist networks quite differently. But given that there exists a class of connectionist models that obviously have the potential to guarantee systematicity, saying that they could always be wired up differently does F&M's argument no good, because the same point applies to concatenative models: their setup could always be changed so that they can represent aRb if and only if they can represent zSq, or for that matter if and only if they can represent "The Last of The Mohicans''. This kind of tinkering with the architecture does not count and is outside the rules of the game.
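
    The opening claim of this paragraph, that a suitable transformation network treats similarly structured representations in formally similar ways, can be sketched for tensor product representations. The following is a minimal illustration and not Smolensky's or Chalmers' actual model: the role vectors ARG1, REL, and ARG2, the filler vectors in FILLERS, and the functions represent, unbind, and transform are hypothetical choices made for the example. A single fixed linear map, applied to the whole representation, turns the representation of xRy into that of yRx for any fillers, without first extracting the constituents.

```python
# Hypothetical role vectors (mutually orthogonal, squared length 4).
ARG1 = (1, 1, 1, 1)
REL = (1, -1, 1, -1)
ARG2 = (1, 1, -1, -1)
# Hypothetical filler vectors for the atomic symbols.
FILLERS = {"a": (1, 0, 2), "b": (0, 3, 1), "R": (2, 1, 0), "z": (1, 1, 1)}


def represent(x, rel, y):
    """Tensor product representation: sum of filler-role outer products."""
    bindings = [(FILLERS[x], ARG1), (FILLERS[rel], REL), (FILLERS[y], ARG2)]
    return [
        [sum(f[i] * r[j] for f, r in bindings) for j in range(4)]
        for i in range(3)
    ]


def unbind(rep, role):
    """Recover the filler bound to `role` by projecting onto that role."""
    return tuple(sum(row[j] * role[j] for j in range(4)) / 4 for row in rep)


# One fixed linear map on role space: exchanges ARG1 and ARG2, keeps REL.
SWAP = [
    [(ARG2[j] * ARG1[k] + REL[j] * REL[k] + ARG1[j] * ARG2[k]) / 4
     for k in range(4)]
    for j in range(4)
]


def transform(rep):
    """Apply SWAP to the whole representation; no constituent is extracted."""
    return [
        [sum(row[k] * SWAP[j][k] for k in range(4)) for j in range(4)]
        for row in rep
    ]
```

    With these definitions, transform(represent("a", "R", "b")) equals represent("b", "R", "a"), and the very same map sends represent("z", "R", "a") to represent("a", "R", "z"). Whether trained networks of realistic size behave this cleanly is a separate, empirical matter.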

    12. How Adequate Is the Computational Power of Connectionist Models?

    Let me take up one point that may have been nagging the reader so far, as F&M seem to bring it out in a few places in their article: it is only when the complex representations and structure sensitive processes defined over them are concatenatively realized that they provide adequate explanation of the cognitive regularities like systematicity; i.e., they can only be explained adequately by C-LOTH. Anything less will not do. This can be put in the following manner.

    One might say that all I have shown is that connectionists have found some way to satisfy P non-concatenatively to some extent, but it does not follow that they are thereby in a position to adequately explain systematicity of human cognitive capacities: what also needs to be shown is that non-concatenative satisfaction of P has enough computational power to provide the adequate explanation.

    The reason behind this is that non-concatenatively compositional connectionist models and holistic structure sensitive processes have often been alleged to have some computational limitations that may make them inadequate for the explanation of systematicity. For instance, Smolensky lists four such limitations:

    a. Uniqueness with respect to roles or fillers. If we're not careful ... we can end up with P&Q having the same representations as Q&P, or other more subtle ambiguities about what fills various roles in the structure.

    b. Unbounded depth. We may avoid the first problem for sufficiently small structures, but when representing sufficiently large or deep structures, these problems may appear. Unless the vector space in which we do our representation is infinite-dimensional ..., we cannot solve [a] for unbounded depth.

    c. Nonconfusability in memory. Even when problem [a] is avoided, when we have representations with uniquely determined filler/role bindings, it can easily happen that we cannot simultaneously store many such structures in a connectionist memory without getting intrusions of undesired memories during the retrieval of a given memory.

    d. Processing independence. ... [W]e may find that we can associate two vectors representing symbolic structures with what we like, but then find ourselves unable to associate the representation of a third structure with what we like, because its associate is constrained by the other two. (1990a, pp.224-5)

    Smolensky does not say that these properties or some of them do in fact fail to hold in any particular application of the tensor product variable binding technique. He simply points to the potentiality of their failure.[40]
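
    Limitation (a) can be made concrete with a toy case. The sketch below assumes hypothetical two-dimensional filler vectors P and Q and hypothetical role vectors for the left and right conjunct; none of these values comes from Smolensky. When the two roles are carelessly given the same vector, P&Q and Q&P collapse into one representation; with linearly independent role vectors they stay distinct. Since an n-dimensional role space holds at most n linearly independent vectors, the same sketch also suggests why limitation (b) arises for structures of unbounded depth.

```python
def bind_all(bindings):
    """Sum of filler-role outer products, returned as a nested list."""
    rows = len(bindings[0][0])
    cols = len(bindings[0][1])
    return [
        [sum(f[i] * r[j] for f, r in bindings) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical filler vectors for the two conjuncts.
P = (1, 0)
Q = (0, 1)

# Careless choice: both conjunct roles get the same vector.
LEFT_BAD = RIGHT_BAD = (1, 1)
p_and_q_bad = bind_all([(P, LEFT_BAD), (Q, RIGHT_BAD)])
q_and_p_bad = bind_all([(Q, LEFT_BAD), (P, RIGHT_BAD)])

# Careful choice: linearly independent role vectors.
LEFT, RIGHT = (1, 1), (1, -1)
p_and_q = bind_all([(P, LEFT), (Q, RIGHT)])
q_and_p = bind_all([(Q, LEFT), (P, RIGHT)])
```

    In the careless case p_and_q_bad and q_and_p_bad are the same matrix, so nothing downstream could tell the two conjunctions apart; in the careful case p_and_q and q_and_p differ.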

    A similar point can be made with regard to transformation networks that holistically process complex connectionist representations: to what extent can they preserve their structure sensitivity when the compressed vectors, under the pressure of encoding an indefinite amount of similarly structured items, are allowed to be arbitrarily located in a multi-dimensional vector space?

    Supposing for the moment that these limitations are intrinsic to non-concatenatively compositional connectionist models and therefore cannot be overcome,[41] we may ask: in precisely what way do these limitations show that such connectionist models cannot adequately explain the cognitive regularities in question?

    Let me, however, emphasize one point again. The issue here as I set it up is not whether a cognitive architecture that satisfies P per se is sufficient and/or necessary for an adequate explanation of cognitive regularities. Both connectionists and classicists under discussion here agree that P is indispensable in this regard. We have seen that P, it seems, can in principle be realized concatenatively and non-concatenatively. The issue here is rather what happens when P is realized non-concatenatively by using connectionist techniques. Would the resulting models have enough computational power to secure systematicity (etc.) to a psychologically respectable (acceptable) degree?

    The right reply to the charge would certainly require us to examine more carefully what the exhibited cognitive regularities empirically necessitate. For instance, if systematicity is something that can come in degrees,[42] then we need to see to what extent human cognitive capacities are systematic. Supposing we have an answer to that, we then have to try to find out whether the explanation requires appeal to universal computational powers.

    These are empirical issues. Once the empirical facts are in, they need to be matched with the results of connectionist research, and this is also an empirical issue. In short, connectionist models may be computationally limited, but it is not obvious that they are limited in such a way that would make their explanations automatically inadequate. Perhaps their limitations are just the kind of limitations that the brain would exhibit. But what would obviously be wrong, in this context, is to try to decide the issue a priori. Empirical facts are needed if connectionism is to be adequately criticized, and they are simply not available at the moment.

    Furthermore, connectionism is literally our best current story about how the brain could turn out to be a symbol manipulating system in the broad sense, whatever its problems are otherwise. This is so even if there are serious problems with its biological realism. And there are indeed serious problems about whether connectionist networks are biologically realistic. However, there is also no question that neural network research is the only one of its kind in terms of offering some hope of understanding how the brain realizes higher level cognitive phenomena. In fact, connectionist research has already started the so far almost non-existent "functional'' analysis of neural structures (as opposed to traditional "anatomic'' analyses), and the literature is growing at an astonishingly rapid speed.

    So if connectionism has to use non-concatenative compositionality for various reasons, then it is reasonable to believe that a similar thing might somehow turn out to be true of the cognitive brain. This is an inference to the best explanation. But given the Fodorian premise that remotely plausible theories are better than no theories at all, and given that connectionism is our only remotely plausible theory of how the brain does it, the prima facie case for non-concatenatively compositional connectionist models over a concatenative understanding of LOT (C-LOT) should be granted, if they otherwise turn out to be explanatorily adequate.

    There are other considerations of course. The most important one, perhaps, is the problem of solving how the (Quinean/isotropic) central cognitive processes can be mechanized within a LOT paradigm. Fodor's pessimism (his "Wagnerian mood'') is well known in this respect.[43] Many people have pointed out that connectionism may well be the solution to the problem. In fact, some have even argued that concatenative schemes might be what produces the problem in the first place. Whatever the case may be, there is every reason to believe that connectionism is here to stay, only that we have to be really clear about its status vis-a-vis the LOTH.[44]

    REFERENCES:

    Aydede, Murat (1993). Syntax, Functionalism, Connectionism and the Language of Thought, Ph.D. Dissertation, University of Maryland, College Park.

    Butler, Keith (1991). "Towards a Connectionist Cognitive Architecture,'' Mind and Language, Vol.6, No.3, 252-72.

    Butler, Keith (1993). "On Clark on Systematicity and Connectionism,'' British Journal for the Philosophy of Science 44:37-44.

    Chalmers, David (1990a). "Syntactic Transformations on Distributed Representations,'' Connection Science, Vol.2.

    Chalmers, David (1990b). "Why Fodor and Pylyshyn Were Wrong: The Simplest Refutation,'' Draft, Indiana University, Bloomington.

    Churchland, Patricia Smith (1986). Neurophilosophy: Toward a Unified Science of Mind-Brain, The MIT Press.

    Churchland, Patricia Smith (1987). "Epistemology in the Age of Neuroscience,'' Journal of Philosophy, Vol.84, No.10, 544-553.

    Churchland, Paul M. (1990). A Neurocomputational Perspective: The Nature of Mind and the Structure of Science, The MIT Press.

    Churchland, Paul M. and P.S. Churchland (1990). "Could a Machine Think?'', Scientific American, Vol.262, No.1, 32-37.

    Clark, Andy (1988). "Thoughts, Sentences and Cognitive Science,'' Philosophical Psychology, Vol.1, No.3, 263-278.

    Clark, Andy (1989a). "Beyond Eliminativism,'' Mind and Language, Vol.4, No.4, Winter 1989, 251-279.

    Clark, Andy (1989b). Microcognition: Philosophy, Cognitive Science, and Parallel Distributed Processing, The MIT Press.

    Clark, Andy (1990). "Connectionism, Competence, and Explanation,'' British Journal for Philosophy of Science, 41, 195-222.

    Cummins, Robert (1989). Meaning and Mental Representation, The MIT Press.

    Cummins, Robert and Georg Schwarz (1987). "Radical Connectionism,'' The Southern Journal of Philosophy, Vol.XXVI, Supplement.

    Cummins, Robert and Georg Schwarz (1991). "Connectionism, Computation, and Cognition'' in Connectionism and the Philosophy of Mind, Terence Horgan and John Tienson (Eds.), Studies in Cognitive Systems (Volume 9), Kluwer Academic Publishers, 1991.

    Dennett, Daniel C. (1986). "The Logical Geography of Computational Approaches: A View from the East Pole'' in The Representation of Knowledge and Belief, Myles Brand and Robert M. Harnish (Eds.), The University of Arizona Press: Tucson, 1986.

    Dennett, Daniel C. (1989). "Two Contrasts: Folk Craft versus Folk Science and Belief versus Opinion'' presented at the NC Conference on the Future of Folk Psychology, January 10, 1989.

    Dennett, Daniel C. (1991a). "Real Patterns,'' Journal of Philosophy, Vol.LXXXVIII, No.1, 27-51.

    Devitt, Michael (1990). "A Narrow Representational Theory of the Mind'' in Mind and Cognition, W.G. Lycan (Ed.), Basil Blackwell, 1990.

    Dreyfus, Hubert L. (1979). What Computers Can't Do, New York: Harper and Row.

    Dreyfus, Hubert L. and Stuart E. Dreyfus (1986). Mind over Machine, The Free Press, New York.

    Elman, Jeffrey L. (1989). "Structured Representations and Connectionist Models,'' Proceedings of the Eleventh Annual Meeting of the Cognitive Science Society, Ann Arbor, Michigan, 17-23.

    Fodor, Jerry A. (1983). The Modularity of Mind, The MIT Press.

    Fodor, Jerry A. (1987). Psychosemantics: The Problem of Meaning in the Philosophy of Mind, The MIT Press.

    Fodor, Jerry A. (1991). "Replies'' (Ch.15) in Meaning in Mind: Fodor and his Critics, B. Loewer and G. Rey (Eds.), Basil Blackwell, 1991.

    Fodor, Jerry A. and B. McLaughlin (1990). "Connectionism and the Problem of Systematicity: Why Smolensky's Solution Doesn't Work,'' Cognition 35, 183-204.

    Fodor, Jerry A. and Zenon W. Pylyshyn (1988). "Connectionism and Cognitive Architecture: A Critical Analysis'' in Connections and Symbols, Pinker, Steven and Jacques Mehler (Eds.), The MIT Press (A Cognition Special Issue), 1988.

    Hinton, Geoffrey (1990). "Mapping Part-Whole Hierarchies into Connectionist Networks,'' Artificial Intelligence, Vol.46, Nos.1-2, (Special Issue on Connectionist Symbol Processing), November 1990.

    Hinton, G.E., J.L. McClelland and D.E. Rumelhart (1986). "Distributed Representations,'' in Parallel Distributed Processing (PDP, Vol.1) edited by D.E. Rumelhart, J.L. McClelland and the PDP Research Group, The MIT Press, 1986.

    Horgan, Terence and George Graham (1990). "In Defense of Southern Fundamentalism,'' Philosophical Studies 62, 107-134.

    Horgan, Terence and John Tienson (1987). "Settling into a New Paradigm,'' The Southern Journal of Philosophy, Vol.XXVI, Supplement.

    St. John, M.F. and J.L. McClelland (1990). "Learning and Applying Contextual Constraints in Sentence Comprehension,'' Artificial Intelligence, Vol.46, Nos.1-2, (Special Issue on Connectionist Symbol Processing).

    Loewer, Barry and Georges Rey (Eds.), (1990). Meaning in Mind: Fodor and his Critics, Oxford: Basil Blackwell.

    McLaughlin, B.P. (1993a). "The Connectionism/Classicism Battle to Win Souls,'' Philosophical Studies 71:163-90.

    McLaughlin, B.P. (1993b). "Systematicity, Conceptual Truth, and Evolution'' in Philosophy and Cognitive Science edited by C. Hookway and D. Peterson, Royal Institute of Philosophy, Supplement No.34.

    Minsky, M. (1967). Computation: Finite and Infinite Machines, Englewood Cliffs, NJ: Prentice Hall.

    Pollack, J.B. (1990). "Recursive Distributed Representations,'' Artificial Intelligence, Vol.46, Nos.1-2, (Special Issue on Connectionist Symbol Processing), November 1990.

    Pylyshyn, Zenon W. (1984). Computation and Cognition: Toward a Foundation for Cognitive Science, The MIT Press.

    Ramsey, William, Stephen Stich and Joseph Garon (1990). "Connectionism, Eliminativism and the Future of Folk Psychology,'' Philosophical Perspectives: Action Theory and Philosophy of Mind 4, edited by J.E. Tomberlin, Ridgeview.

    Schneider, W. (1987). "Connectionism: Is It a Paradigm Shift for Psychology?,'' Behavior Research Methods, Instruments, and Computers 19, 73-83.

    Searle, John R. (1990). "Is the Brain a Digital Computer?,'' Proceedings and Addresses of the APA, Vol.64, No.3, November 1990.

    Smolensky, Paul (1988). "On the Proper Treatment of Connectionism,'' Behavioral and Brain Sciences 11, 1-23.

    Smolensky, Paul (1990a). "Connectionism, Constituency, and the Language of Thought'' in Meaning in Mind: Fodor and His Critics, B. Loewer and G. Rey (Eds.), Basil Blackwell, 1991.

    Smolensky, Paul (1990b). "Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems,'' Artificial Intelligence, Vol.46, Nos.1-2, (Special Issue on Connectionist Symbol Processing), November 1990.

    Smolensky, P., G. Legendre and Yoshiro Miyata (1992). "Principles for an Integrated Connectionist/Symbolic Theory of Higher Cognition,'' Report CU-CS-600-92, Computer Science Department, University of Colorado at Boulder.

    Stich, Stephen P. (1983). From Folk Psychology to Cognitive Science: The Case Against Belief, The MIT Press.

    Thomason, R.H. (1969). Symbolic Logic, New York: Macmillan.

    Tienson, John (1987). "Introduction to Connectionism,'' The Southern Journal of Philosophy, Vol.XXVI, Supplement.

    van Gelder, Timothy (1990). "Compositionality: A Connectionist Variation on a Classical Theme,'' Cognitive Science, Vol.14.

    van Gelder, Timothy (1991). "What is the "D'' in "PDP''? A Survey of the Concept of Distribution'' in Philosophy and Connectionist Theory, W. Ramsey, S.P. Stich and D.E. Rumelhart (Eds.), Lawrence Erlbaum Associates.

    van Gelder, Timothy (1991b). "Classical Questions, Radical Answers: Connectionism and the Structure of Mental Representations'' in Connectionism and the Philosophy of Mind, Terence Horgan and John Tienson (Eds.), Studies in Cognitive Systems (Volume 9), Kluwer Academic Publishers, 1991.

    1. Robert Cummins has criticized Pylyshyn's (1984) notion of functional architecture and proposed a more specific notion of cognitive architecture: "Pylyshyn often makes it sound as if the primitive operations of a programming language define a functional architecture, but this cannot be right. The functional [cognitive] architecture of the mind is supposed to be that aspect of the mind's structure that remains fixed across data structures (i.e., in what is represented). This is the [hardwired] program itself, including its control structure, not the primitive operations of a language we might write in'' (1989, pp.165-6). I think that Cummins is right about this. However, the difference between these two conceptions will not be important for the purposes of this paper.

    2. Notice that it is not enough to say that the architecture is capable of exploiting the structural features of representations, but does not actually exploit them while processing them. This wouldn't do for classicism because the satisfaction of P-b requires actual structure sensitivity, i.e., actual exploitation of formal properties of representations in their transformation. This point raises some murky issues to some of which I will return later.

    3. Consider, for instance, the basic architecture of Marvin Minsky's (Minsky, 1967) simplest universal Turing machine with only four symbols and seven intrinsic states. A first-order theorem prover can in principle be implemented in it. But in such a case, the primitive operations (there are only twenty-eight of them!) cannot be defined over "representational'' states (i.e., over the interpreted well-formed formulas as such), since the four kinds of symbols cannot individually be used representationally; rather, their combinations would have to serve as "representational'' states of the virtual theorem prover.

    4. At the time they wrote their article, almost none of the connectionist models that were later explicitly developed to deal with cognitive phenomena like systematicity existed. In fact, many of them were developed as a response to F&P's challenge. For that reason, F&P were not much worried about their attack's being a strawman: almost all of the connectionist models were indeed using atomistic representations. See below.

    5. F&P themselves do not want to dwell too much on productivity for their argument since they think that the point about productivity cannot be made without risking the danger of begging the question against some connectionists. See pp.34-6 of their (1988). The point about compositionality, similarly, had better be dropped, since its contribution to the debate is obscure. Fodor himself seems to vacillate not only about its proper description but also about its exact import. But, anyway, Fodor and McLaughlin (1990), and McLaughlin (1993a, 1993b) themselves have dropped them, and instead, focused only on systematicity.

    6. I think that it is no accident that all attempts to describe systematicity at some point appeal to examples. But it is not clear what exactly, in the end, examples succeed in conveying about what is alleged to be a thoroughly pervasive law-like regularity in the cognitive economy of certain organisms. If systematicity is to be used as an argument for LOTH, it must be describable as an empirical phenomenon without any implicit or explicit appeal to a P-like structure. Otherwise the argument for LOTH and against connectionism would be circular. The problem is how to do this without using any examples. Now, of course, the use of examples may be kosher at some stage, but then, if it can't be eliminated without risking circularity in the end, it is not clear what facts might constitute empirical counter-examples to systematicity.

    7. Indeed Fodor's enthusiasm is worthy of mentioning in this respect: "The real achievement is that we are (maybe) on the verge of solving a great mystery about the mind: How could its mental processes be semantically coherent? Or, if you like yours with drums and trumpets: How is rationality mechanically possible? Notice that this sort of problem can't even be stated, let alone be solved, unless we suppose ... that there are mental states with both semantic contents and causal roles'' (Fodor, 1987, p.20). Fodor's point is that syntactically structured symbols physically realized in the brain can be the only things that can fill those roles.

    8. For example, the Churchlands, who are presently the champions of eliminativism, hope that connectionism is the long-awaited story which will provide the scientific foundations of the elimination of folk psychological constructs in psychology (P.S. Churchland 1986, 1987; P.M. Churchland 1990; P.S. Churchland and P.M. Churchland 1990). Ramsey, Stich and Garon (1990) have recently argued that if certain sorts of connectionist models turn out to be right then the elimination of folk psychology will be inevitable. Dennett (1986), and Cummins and Schwarz (1987) have also pointed out the potential of connectionism in the elimination of at least certain aspects of folk psychology.

    9. In fact, it is not at all clear how connectionism can genuinely give support to eliminativism regarding representational constructs, at least as far as the connectionist processing units are treated as representing. If they are not treated as such, it is hard to see how they could be models of cognitive phenomena. Two vague strands can, I think, be discerned among eliminativists in this regard. One stems from the intuition that it is unlikely that there are really any concrete, isolable, and modularly identifiable symbol structures realized in the brain that would correspond to what Stich has called (1983, p.237ff) functionally discrete beliefs and desires of folk psychology. It is thought that connectionism will vindicate this intuition. For somewhat similar remarks, among others, see Dennett (1986, 1991), Clark (1988, 1989b). The second strand seems to be that connectionism will vindicate the idea that the explanation of mental phenomena does not require a full-blown semantics for such high-order states as propositional attitudes. Rather, all that is needed is an account of some form of information processing at a much lower level, which, it is hoped, will be sufficient for the whole range of cognitive phenomena. But, again, it is not clear what the proposals are. See, however, P.M. Churchland (1990).

    10. Since premise (vii) in certain ways draws on the acceptance of (ii)-(iv), those who reject them will most likely reject (vii) too. Also, as I said, premise (iii) is connected to (ii) and (iii); its rejection all by itself does not mean much. So, the main critical focus for those who decline to meet the challenge is on premises (ii) and (iv). For a more elaborate discussion of various connectionist attempts to rebut F&P's argument, see my (1993). For what comes to the same categorization of connectionist reactions to F&P's argument and an effective criticism of the various ways in which connectionists decline to meet the challenge, see McLaughlin (1993a, 1993b). Also, see Butler (1993) for a criticism of Clark (1989) who rejects premise (iv). For what appears to be an attempt to reject premise (v), see Butler (1991) and van Gelder (1991b).

    11. Smolensky (1990a, 1990b) explicitly addresses the challenge put forward by F&P, and he advances his tensor product model explicitly as a counterexample to F&P's premise (vi). Other connectionists like Elman (1989), Hinton (1990), Pollack (1990) have developed techniques to accommodate combinatorial procedures in their models, but they themselves seem not so willing, at least not explicitly, to advance their models as direct responses to F&P's challenge. Some philosophers such as van Gelder (1990), Chalmers (1990a, 1990b), for their part, use such models as counterexamples to F&P's criticism. For a general discussion of the significance of "distributed representations,'' see van Gelder (1991) and Hinton et al. (1986).

    12. However, as the reader will no doubt realize as we go along, I think that connectionists have indeed shown the non-necessity of concatenation. In the discussion that follows, I won't therefore strictly adhere to my official position. But nothing should hang on this, since the point about my official position is to remind the classicist that I always have a fall-back position if any of my claims about the connectionist models is challenged or turns out to be false.

    13. In fact, I can make do with an even weaker claim. Since the modal claim is proposed only as a nomological necessity claim, it can at most be empirically true, which means that the classicist accepts that there are logically possible worlds in which non-concatenative structure sensitivity obtains. This is in fact enough ground for me to make my point against the classicist: In those worlds, the representational systems that satisfy P non-concatenatively do still count as LOT systems, which goes to show that the proper understanding of the LOT architecture does not involve such a low-level requirement as concatenative realization.

    14. A general and technically elaborate description of the basic architecture of tensor product systems can be found in Smolensky (1990b). See Smolensky (1990a) for an informal and easily accessible discussion of the same issues. For a truly impressive application of the tensor product technique to higher cognitive processes, see Smolensky et al. (1992).

    15. This claim is problematic. However, it is generally assumed to be true in the literature. I will continue to pretend that it is true, since, in a certain sense, my aim is to work out its philosophical consequences if it were true.

    16. See, for instance, Hinton (1990), Elman (1989), St. John and McClelland (1990) among others.

    17. In fact, it is not at all clear whether F&M really do require concatenation vis-a-vis P-a. They seem ambivalent: "We have agreed that [tensor product representations] may be said to have constituent structure "in an extended sense'' '' (1990, p.200).

    18. This is to accommodate in an informal way the extra complication created by the difference between operations that accept one and two arguments.

    19. For an interesting attempt to characterize first order predicate logic without a commitment to a particular notation, see Thomason (1969). In my abstract characterization of SL I do not claim to have captured every aspect of SL that a logician might want to be very curious or scrupulous about. My aim, again, is just to convey the basic idea.

    20. van Gelder (1990) also contains a very helpful discussion of formal systems parallel to mine here. Although he argues that concatenation is not necessary for instantiation of formal systems, he nevertheless thinks it is this fact that makes them radically different from classical LOT systems. So he thinks that the very idea of a LOT involves concatenative realizations against which I am arguing in this paper. See below.

    21. Cf. van Gelder (1990) who also talks about the digital character of modes of combinations.

    22. Cf. van Gelder (1990).

    23. Note that many actual physical realizations of abstract formal systems like von Neumann computers are also concatenative just in this sense: when such a conventional computer stores, for instance, a token of a well-formed complex expression of its machine language in many of its registers equipped with a pointer system, the registers literally contain tokens of its constituents, albeit in a spatially distributed fashion.

    24. Smolensky et al. (1992) claim that the Tensor Product System can even explain productivity, which is, again, a matter of adequately satisfying only P-a.

    25. See especially Chalmers (1990a) and van Gelder (1990). Butler (1991) appeals approvingly to Chalmers (1990a).

    26. In his experiment, Chalmers made no attempt to capture tense and noun-verb agreement in active-passive transformations. McLaughlin (1993a) attacks Chalmers by rightly claiming that it was precisely these difficulties that led Chomsky to postulate a "deep structure'' from which active and passive forms can be obtained. So he accuses Chalmers of false advertisement. Chalmers' model is not a successful connectionist model that can adequately explain English active-passive transformations. I think that this criticism is right but not quite relevant. In fact, it is unfortunate that Chalmers chose to model this particular phenomenon in order to illustrate how connectionist models can handle structure sensitive operations holistically, and thus explain inferential systematicity. All McLaughlin shows is that the very structure Chalmers chose in order to illustrate how it could be causally used in structure sensitive processing happened to be the wrong kind of structure. But nothing really hangs on this. He could have illustrated holistic processing on the transformation rules of SL for instance. The point is whether structure sensitivity can be achieved by holistic transformations.

    27. Compare the following remark by F&P: "If you hold the kind of theory that acknowledges structured representations, it must perforce acknowledge representations with similar or identical structure... So, if your theory also acknowledges mental processes that are structure sensitive, then it will predict that similarly structured representations will generally play similar roles in thought'' (1988, p.48). [return to main text]

    28. Devitt (1990) contains a very helpful discussion of syntax and draws a similar distinction between different levels of syntactic specification. [return to main text]

    29. One reason why he is hesitant may have something to do with the peculiar nature of "shapes'': shapes, it seems, can be multiply realized in a number of different media with very different physical properties without necessarily having a functional nature. Shapes of letters, for instance, can be realized in a variety of physical media. Just think of the letter 'A' inscribed in sand, wax, etc. In this sense, they still seem to be abstract entities. If this is right, we should perhaps call syntactic properties in the sense of (S2) not quite physical but quasi-physical properties, as I have indeed done a few times so far. [return to main text]

    30. F&M sometimes put their criticism in the following way. They say that since constituent structure is not literally in the complex connectionist representations it can't be causally efficacious: what doesn't exist can't cause anything! But this is question begging. It assumes that syntactic structure sensitivity requires concatenative realization of constituents. The question is not whether the constituents themselves should be causally efficacious in the processing (this is only one way of obtaining structure sensitivity), but rather, whether you can obtain structure sensitivity without explicitly tokening the constituents. And we have already seen the answer to that: it seems you can. The "miracle argument" here is no good. There is no reason to think that the issue can be decided in this a priori way. [return to main text]

    31. See Cummins (1989), and Cummins and Schwartz (1991) for a somewhat parallel discussion and conclusion. [return to main text]

    32. In a way, we may even classify the historically traditional concatenative LOT models as Classical-LOT (C-LOT) models and the non-concatenative connectionist ones as NonClassical-LOT (NC-LOT) models. But both kinds would still be LOT models. [return to main text]

    33. Contrary to the supposition of some, the CTM does not offer a solution to the problem of how it is possible to have intentional states --Fodor's first question. [return to main text]

    34. Cf. Smolensky (1988) who writes: "There is a reasonable chance that connectionist models will lead to the development of new somewhat-general-purpose self-programming, massively parallel analog computers, and a new theory of analog parallel computation: They may possibly even challenge the strong construal of Church's Thesis as the claim that the class of well-defined computations is exhausted by those of Turing Machines'' (p.3). This I find hard to believe, but the claim is at least not obviously false. The proper epistemic attitude should be: let us wait and see. Nothing should hang on this as far as LOTH is concerned. [return to main text]

    35. There is a trivial sense in which I agree with this claim when we talk about implementation. As the attentive reader might have noticed, in arguing that non-concatenative connectionist models are still within the classical paradigm of LOT, I haven't claimed that they are implementation models. Instead I've used words like 'realization' and 'instantiation'. There is a technical sense of 'implementation' in computer science according to which a program written, say, in PASCAL is implemented, say, in the machine code of the particular computer that happens to run it. I think it would be ridiculous to confuse this sense of 'implementation' with the claim of a connectionist model's 'instantiating' the LOT architecture where this simply means satisfying P. I think the careless use of 'implementation' in computational contexts has created needless confusion in the debate. [return to main text]

    36. For similar claims, see Tienson (1987), Horgan and Tienson (1987), Schneider (1987). [return to main text]

    37. Ramsey, Stich and Garon (1990) are a good example: they seem to take the concatenative model as essential not only to LOTH but to folk psychology as well; they then claim that if certain connectionist models prove to be explanatorily adequate, folk psychology is bound to be eliminated. Dennett (1989, 1991a) and Clark (1988, 1989a, 1989b, 1990) seem to think that showing that there are no concatenatively realized entities corresponding to beliefs and desires is enough for proving that LOTH is false, even though they are not themselves eliminativist. Horgan and Graham (1990) argue that if LOTH turns out to be false, this does not show the demise of folk psychology. I agree with that in principle, but again they appear to have C-LOT in mind. [return to main text]

    38. Searle (1990), H. Dreyfus (1979), and Dreyfus & Dreyfus (1986) are particularly striking in this respect. [return to main text]

    39. Performance factors, of course, need also to be taken into account in this explanation. But they do not constitute the idealized core of the theory according to the classicist. [return to main text]

    40. Smolensky, strangely enough, sometimes gives the impression that he takes these potential computational limitations to be the properties that make connectionist models non-implementations of any particular classical architecture. At some point, he even rejects the idea that any such connectionist model is an implementation but a lousy one. He rightly remarks that for the term 'implementation' to be applicable at all in computational approaches, there must be a strictly defined mapping from one level onto the other, each of which must again be described completely and precisely. What he seems to have in mind is the relation that holds between a program written in a high-level programming language and, for instance, its machine language translation, or its implementation in the hardware. Now this is a very curious way of making connectionism a radically new and threatening approach, when the issue is exactly whether or to what extent systematicity and the like can be explained adequately by models that have these weaknesses! [return to main text]

    41. It is in no way obvious that they are intrinsic. However, I just want to operate under that supposition for present purposes. [return to main text]

    42. Fodor (1987) is explicit that systematicity is a matter of degree; see pp.150-1. But in his later writings he seems less willing to make the same claim. [return to main text]

    43. See Fodor (1983, pp.101-19). This is in fact more than a bit puzzling especially given his enthusiasm about the prospects of a scientific vindication of folk psychology (see, for instance, the first chapter of (1987)), which is, to all intents and purposes, a psychology of central cognitive processes if it is of anything. [return to main text]

    44. I would like to thank many people for their support, encouragement and help while I was writing this paper. I am especially grateful for their insightful comments and criticisms to Michael Devitt, Kenneth Taylor, Guven Guzeldere, Brian Cantwell Smith, Georges Rey, David Chalmers, Jesse Prinz, Ken Aizawa. I would also like to thank John Perry and the CSLI crowd in Stanford for their warm hospitality and never ending help during my stay there as a visiting scholar.

    [return to main text]