Constituent tests

The replacement/substitution test

Here is an illustration of the replacement test. If did so can replace a set of words, that set of words is a constituent. The following sentence is ambiguous:

Jane saw the boy with a telescope

but the following one is not:

Jane did so with a telescope

The pro-form replaces [ saw the boy ], so

A: Jane [ saw the boy ] with a telescope

is not ambiguous: it is one of the interpretations of the sentence.

The other interpretation is

B: Jane saw [ the boy with a telescope ]

Note that the sentence

Jane saw him with a telescope

is also not ambiguous, which means that [ the boy ], which him replaces, is a constituent only in interpretation A, not in interpretation B.

So we have

A: Jane [ saw [ the boy ] ] with a telescope

B: Jane saw [ the [ boy [ with a telescope ] ] ]
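Bracketings like these can be checked mechanically. The following is only a minimal Python sketch, not part of the analysis itself: it assumes that brackets and words are separated by spaces, as in the examples above, and the function name constituents is made up for this illustration. It lists the word string of every bracketed constituent, so we can verify that the boy is a constituent in A but not in B.

```python
def constituents(bracketed: str) -> list[str]:
    """Return the word string of every bracketed constituent,
    in the order in which the brackets close."""
    stack: list[list[str]] = []  # one word list per currently open bracket
    found: list[str] = []
    for token in bracketed.split():
        if token == "[":
            stack.append([])
        elif token == "]":
            words = stack.pop()
            found.append(" ".join(words))
            if stack:
                # the words also belong to the enclosing bracket
                stack[-1].extend(words)
        elif stack:
            stack[-1].append(token)
        # words outside every bracket are simply skipped
    return found


reading_a = constituents("Jane [ saw [ the boy ] ] with a telescope")
reading_b = constituents("Jane saw [ the [ boy [ with a telescope ] ] ]")

print(reading_a)  # ['the boy', 'saw the boy']
print(reading_b)  # ['with a telescope', 'boy with a telescope', 'the boy with a telescope']
print("the boy" in reading_a, "the boy" in reading_b)  # True False
```

The sketch only reads off what the brackets already say; deciding where the brackets go is still the job of the constituent tests.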

The movement test

Let us use the same ambiguous sentence, Jane saw the boy with a telescope.

In the following versions different portions of it were moved:

A: it was the boy Jane saw with a telescope

B: it was the boy with a telescope Jane saw

Movement disambiguates the sentence. In A, [ the boy ] is a constituent, so Jane had the telescope; in B, [ the boy with a telescope ] is a constituent, so the boy had the telescope. Only constituents can be moved, and one movement can move only a single constituent. Therefore the strings of words moved in the two cases are both constituents.

Maximum binarity

Without going into the details of why, let’s accept the axiom that any constituent contains at most two things, never three. So a phrase containing three words, like with a telescope, must be structured either as [ [ with a ] telescope ] or as [ with [ a telescope ] ]. I hope you would all go for the latter structure.

Similarly, the shy boy is either [ [ the shy ] boy ] or [ the [ shy boy ] ]. The three words cannot be on a par: two of them must belong together more closely than the third. And these two must be adjacent, since, recall, we prefer not to have discontinuous constituents (something we discussed in connection with “infixes”). So the middle word cannot fail to be part of any constituent that the first and the last one are both part of. (Please take a minute to understand the previous sentence.)
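The claim that three words allow exactly two binary structures can be verified by brute force. Here is a small Python sketch under the assumptions made above: every constituent has exactly two parts, and the parts are adjacent (no discontinuous constituents). The function name binary_bracketings is invented for this illustration.

```python
def binary_bracketings(words: list[str]) -> list[str]:
    """List every way of grouping adjacent words into binary constituents."""
    if len(words) == 1:
        return [words[0]]  # a single word needs no brackets
    results = []
    # choose where the two parts of the constituent meet
    for split in range(1, len(words)):
        for left in binary_bracketings(words[:split]):
            for right in binary_bracketings(words[split:]):
                results.append(f"[ {left} {right} ]")
    return results


for structure in binary_bracketings(["with", "a", "telescope"]):
    print(structure)
# [ with [ a telescope ] ]
# [ [ with a ] telescope ]
```

The sketch only enumerates the candidates; choosing between them still requires a constituent test.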

Noun phrases

The word him replaces [ the boy ], which is not a noun but a noun phrase (NP). So “pronoun” is not a good name for this kind of pro-form: him (and he, she, them, etc.) is not a pronoun, but a pro-noun-phrase (pro-NP).

The fact that Jane saw him with a telescope cannot mean B shows that the and boy here do not belong together. Boy and with a telescope, on the other hand, do belong together: this constituent can be replaced by the pro-form one:

We will call this constituent an N′ (pronounced en-bar). It is larger than a noun (N), but smaller than an NP. So we conclude that an NP is made up of a determiner (D), like the, my, some, every, and an N′. An N′ in turn consists of a noun and optional further stuff, like an adjective, usually before the noun, and/or a prepositional phrase (PP) after it:

In each of these NPs the head noun is emboldened; there is an adjective before it and a PP after it.

Him cannot replace an N′, only an NP: *the him. What is quite intriguing is that in the boy with a telescope not only boy with a telescope, but also boy may be replaced by one:

This suggests that here we have an N′ (boy) inside another N′ (boy with a telescope). A structure where a constituent of some type contains another constituent of the same type is a RECURSIVE structure.

Recursion

Indeed, we see that the noun in an NP is really an N′:

We see here that this structure is recursive: it can be expanded, in theory, endlessly. This is because an N′ can always be replaced by an adjective and an N′, and this second N′ can be replaced in the same way, and so on. And indeed each string of words we claim is an N′ can be replaced by one:

The adjective (or PP) in an N′ is optional, so an N′ can contain only a noun. This is crucial, since otherwise an NP could not be finite: the recursion finishes when N′ is only a noun.
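The recursive rule and its stopping point can be sketched in a few lines of Python. The sketch covers only two of the options described above, an N′ made of an adjective plus an N′, and an N′ that is only a noun; PPs are left out to keep it short. The function name n_bars and the adjective little are made up for the illustration.

```python
def n_bars(adjectives: list[str], noun: str) -> list[str]:
    """List every N′ inside 'adjectives + noun', largest first."""
    if not adjectives:
        # base case: the N′ is only a noun, so the recursion finishes here
        return [noun]
    # recursive case: an N′ is an adjective followed by a smaller N′
    inner = n_bars(adjectives[1:], noun)
    return [f"{adjectives[0]} {inner[0]}"] + inner


print(n_bars(["shy", "little"], "boy"))
# ['shy little boy', 'little boy', 'boy']
```

Without the base case the function would call itself forever, which is the programming counterpart of an NP that could never be finite.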

There are other recursive structures. For example, some verbs are complemented by a sentence (S). Since a sentence always contains a verb, we have a sentence inside a sentence:

This is relevant because it explains how a sentence can potentially be infinitely long (only potentially, since humans are limited: the speaker will die before finishing an infinitely long sentence, and listeners will leave long before the speaker has had the chance to finish it). If there is no upper limit on the length of a sentence, then the number of possible sentences is also infinite. So we can all produce new sentences nobody has ever used before. And we are doing that all the time!
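Sentence embedding can be sketched the same way. In the Python snippet below, the verb thinks and the frame Amy thinks that are assumptions chosen for the illustration; any verb that takes a sentence as its complement would do. Each extra level of embedding gives a new, longer sentence, so there is no longest one.

```python
def embed(depth: int) -> str:
    """Build a sentence with `depth` sentences wrapped around a simple one."""
    if depth == 0:
        return "Jane saw the boy"  # a sentence with no sentence inside it
    # a verb complemented by a sentence: S contains another S
    return f"Amy thinks that {embed(depth - 1)}"


for depth in range(3):
    print(embed(depth))
# Jane saw the boy
# Amy thinks that Jane saw the boy
# Amy thinks that Amy thinks that Jane saw the boy
```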

N′ vs N

Consider the following two noun phrases:

We can replace teacher with one in the first one, but not in the second one:

This means that teacher is an N′ in the blonde teacher, but not in the physics teacher. This is consistent with the following facts:

Since physics must be followed by a noun, not an N′, it can be followed by teacher, but not by blonde teacher, which is an N′.

Also we can have

and

This last noun phrase makes sense because the two Englishes in it are different: the first one means ‘from England’, the second one ‘teaches English’.

Another constituent test: the standalone test

It is often said that “sentences are made up of words”. Strictly speaking, this is not true: it’s not words that make up sentences, it is phrases: noun phrases (NP), verb phrases (VP), prepositional phrases (PP), etc. A word cannot be part of a sentence unless it is first part of a phrase. So although we seem to have a word as the subject of the sentence Amy sings, Amy here is an NP: it can be replaced by she, which is a pro-NP. It also follows that there can be no determiner or adjective before Amy and no prepositional phrase after it: *The Amy sings, *The yellow Amy in the garden sings.

The standalone test states that an utterance consists of nothing smaller than a phrase. Consider the following Q&As:

B’s answer is a full phrase in each case: first an NP, then a PP, finally a VP. It cannot be anything “smaller”. This is the STANDALONE test.

And another one: the integrity test

Recall that constituents are maximally binary. So if we have a sentence like this one:

the girl fed cats and dogs

the part cats and dogs must be parsed either as [ cats [ and dogs ] ] or as [ [ cats and ] dogs ]. How do we decide?

The INTEGRITY test states that if extra words cannot be inserted between two words, those two words form a constituent. (Extra words here means words that do not relate to their neighbours.) So let us try to insert yesterday in the above sentence:

  1. yesterday the girl fed cats and dogs
  2. *the yesterday girl fed cats and dogs
  3. the girl yesterday fed cats and dogs
  4. ?the girl fed yesterday cats and dogs
  5. the girl fed cats yesterday and dogs
  6. *the girl fed cats and yesterday dogs
  7. the girl fed cats and dogs yesterday

The impossibility of sentence 2 shows that the girl is a constituent, but we knew that anyway. I’m not sure of the grammaticality of sentence 5. The impossibility of sentence 6, however, shows that the right structure of cats and dogs is [ cats [ and dogs ] ], that is, and goes with the last member of the conjunction. Actually, this is why we have A, B, and C, not *A and, B, C.
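The candidate sentences for the integrity test can be generated mechanically. The Python sketch below simply inserts the extra word at every possible position; the function name insertions is made up for this illustration. Note that the code cannot judge grammaticality: the stars and question marks still have to come from a speaker.

```python
def insertions(sentence: str, extra: str) -> list[str]:
    """Insert `extra` at every position in `sentence`, from the front to the end."""
    words = sentence.split()
    return [
        " ".join(words[:position] + [extra] + words[position:])
        for position in range(len(words) + 1)
    ]


candidates = insertions("the girl fed cats and dogs", "yesterday")
for number, candidate in enumerate(candidates, start=1):
    print(number, candidate)
```

A sentence of six words has seven insertion points, which is why the test above has exactly seven candidates.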