Constituent tests

The replacement/substitution test

Here is an illustration of the replacement test. If a string of words can be replaced by did so, that string of words is a constituent. Take the following sentence, which is ambiguous:

Jane saw the boy with a telescope

but the following one is not:

Jane did so with a telescope

Here the pro-form did so replaces [ saw the boy ], so this sentence can only correspond to the structure

A: Jane [ saw the boy ] with a telescope

This is one of the two interpretations of the first sentence: Jane used the telescope to see the boy.

The other interpretation, in which the boy has the telescope, is

B: Jane saw [ the boy with a telescope ]

Note that the sentence

Jane saw him with a telescope

is also unambiguous: it can only mean that Jane used the telescope. This means that [ the boy ], which him replaces, is a constituent only in interpretation A, not in interpretation B.

So we have

A: Jane [ saw [ the boy ] ] with a telescope

B: Jane saw [ the [ boy [ with a telescope ] ] ]

The movement test

Let us use the same ambiguous sentence

Jane saw the boy with a telescope

In the following versions, different portions of the sentence have been moved:

A: it was the boy Jane saw with a telescope

B: it was the boy with a telescope Jane saw

Movement disambiguates the sentence. In A, [ the boy ] is a constituent, so Jane had the telescope; in B, [ the boy with a telescope ] is a constituent, so the boy had the telescope. Only constituents can be moved, and a single movement can move only one constituent. Therefore the two strings of words moved in the two cases are both constituents.

Maximum binarity

Without going into details of why, let’s accept the axiom that any constituent contains a maximum of two things, not three. So a phrase containing three words, like with a telescope, must be structured either as [ [ with a ] telescope ] or as [ with [ a telescope ] ]. I hope you would all go for the latter structure.

Similarly, the shy boy is either [ [ the shy ] boy ] or [ the [ shy boy ] ]. The three words cannot be on a par: two of them must belong together more closely than the third. And these two must be adjacent since, recall, we prefer not to have discontinuous constituents (something we discussed in connection with “infixes”). So the first and the last word cannot form a constituent that leaves out the middle one. (Please take a minute to understand the previous sentence.)
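Binarity plus adjacency can be made concrete with a short recursive sketch (Python, for illustration only; the function name bracketings is hypothetical). It splits a string of words at every position and brackets each half in the same way, so only adjacent words are ever grouped together.

```python
def bracketings(words):
    """Return every binary bracketing of a list of words, as strings."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Split at each position; both halves are bracketed recursively,
    # so every constituent is a contiguous, at most binary grouping.
    for i in range(1, len(words)):
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append(f"[ {left} {right} ]")
    return results

for b in bracketings(["the", "shy", "boy"]):
    print(b)
```

For the shy boy it returns exactly the two structures above, [ the [ shy boy ] ] and [ [ the shy ] boy ]; no output groups the with boy while leaving out shy. For four words there are already five possible structures.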

Noun phrases

The word him replaces [ the boy ], which is not a noun but a noun phrase (NP). So “pronoun” is not a good name for this kind of pro-form: him (like he, she, them, etc.) is not a pronoun but a pro-noun-phrase (pro-NP).

The fact that Jane saw him with a telescope cannot mean B shows that in B, the and boy do not belong together. Boy and with a telescope, on the other hand, do belong together: this constituent can be replaced by the pro-form one:

Jane saw the one (= boy with a telescope)

We will call this constituent an N′ (pronounced en-bar). It is larger than a noun (N) but smaller than a NP. So we conclude that a NP is made up of a determiner (D), like the, my, some or every, and an N′. An N′ in turn consists of a noun and optional further material: an adjective, usually before the noun, and/or a prepositional phrase (PP) after it:

the yellow submarine in the song
every fierce dog in the garden
my beloved wife for thirty years

In each of these NPs, the head noun (submarine, dog, wife) is preceded by an adjective and followed by a PP.

Him cannot replace an N′, only a NP: *the him. What is quite intriguing is that in the boy with a telescope, not only boy with a telescope but also boy alone may be replaced by one:

the one with a telescope

This suggests that here we have an N′ (boy) inside another N′ (boy with a telescope). A structure where a constituent of some type contains another constituent of the same type is a RECURSIVE structure.

Recursion

Indeed, we see that the noun in a NP is really an N′:

the unsinkable submarine
the unsinkable cute submarine
the unsinkable cute yellow submarine
…

This structure is recursive: in theory, it can be expanded endlessly. This is because an N′ can always be expanded into an adjective and an N′, and this second N′ can be expanded in the same way, and so on. And indeed, each string of words we claim is an N′ can be replaced by one:

the one
the unsinkable one
the unsinkable cute one
…

The adjective (or PP) in an N′ is optional, so an N′ can consist of just a noun. This is crucial, since otherwise no NP could be finite: the recursion stops when an N′ is only a noun.
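The recursive rule and its stopping point can be sketched in Python. This is an illustration under our own assumptions: an N′ is encoded either as a bare noun (the base case) or as a pair of an adjective and a smaller N′, and PPs are left out for simplicity; the function names are hypothetical. one_variants replaces each N′ in turn by one.

```python
def n_bar(adjectives, noun):
    """Assumed rule: N′ -> Adj N′, or N′ -> N once no adjectives are left."""
    if not adjectives:
        return noun                                     # base case: the N′ is just a noun
    return (adjectives[0], n_bar(adjectives[1:], noun))  # Adj + a smaller N′

def flatten(node):
    """Return the words of an N′ in order."""
    if isinstance(node, str):
        return [node]
    return [node[0]] + flatten(node[1])

def one_variants(node, prefix=()):
    """Replace each N′ in turn by 'one', from the largest to the smallest."""
    yield list(prefix) + ["one"]
    if not isinstance(node, str):
        # node is (adjective, inner N′): keep the adjective, recurse into the N′
        yield from one_variants(node[1], prefix + (node[0],))

nbar = n_bar(["unsinkable", "cute", "yellow"], "submarine")
print("the", " ".join(flatten(nbar)))
for v in one_variants(nbar):
    print("the", " ".join(v))
```

For the unsinkable cute yellow submarine it yields the one, the unsinkable one, the unsinkable cute one and the unsinkable cute yellow one: one variant for each N′, ending when the N′ is only a noun.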

There are other recursive structures. For example, some verbs are complemented by a sentence (S). Since a sentence always contains a verb, we have a sentence inside a sentence:

Joe said Amy knew Jack believed Kate thought…

This is relevant because it explains how a sentence can be potentially infinitely long (only potentially, since humans are limited: the speaker will die before finishing an infinitely long sentence, and listeners will leave long before the speaker has had the chance to finish it). If there is no upper limit on the length of a sentence, then the number of possible sentences is also infinite. So we can all produce new sentences nobody has ever used before. And we are doing that all the time!
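Sentence embedding can be sketched the same way. In this Python illustration each subject-verb pair takes a whole sentence as its complement (a simplified rule we assume here, roughly S → NP V S), and the innermost clause it was raining is a hypothetical filler, since the example above trails off. Every pair of brackets in the output marks one sentence.

```python
def embed(pairs, innermost):
    """Nest sentences: each (subject, verb) pair takes a whole sentence as complement.

    Assumed simplified rule: S -> NP V S, with a plain clause as the base case.
    """
    if not pairs:
        return f"[ {innermost} ]"          # base case: no further embedding
    subject, verb = pairs[0]
    inner = embed(pairs[1:], innermost)    # the complement is itself a sentence
    return f"[ {subject} {verb} {inner} ]"

pairs = [("Joe", "said"), ("Amy", "knew"), ("Jack", "believed"), ("Kate", "thought")]
sentence = embed(pairs, "it was raining")  # "it was raining" is a hypothetical filler
print(sentence)
print(sentence.count("["), "sentences")
```

With the four subject-verb pairs from the example, the result is five sentences nested inside each other. Nothing in the function limits the length of the list of pairs.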

N′ vs N

Consider the following two noun phrases:

the blonde teacher
the physics teacher

We can replace teacher with one in the first NP, but not in the second:

the blonde one
*the physics one

This means that teacher is an N′ in the blonde teacher, but not in the physics teacher. This is consistent with the following facts:

the blonde physics teacher
*the physics blonde teacher

Since physics must be followed by a noun, not an N′, it can be followed by teacher, but not by blonde teacher, which is an N′.

Also we can have

the English teacher ~ the English one (= from England)
the English teacher ~ *the English one (= teaches English)

and

the English English teacher

This last noun phrase makes sense because the two Englishes in it are different: the first means ‘from England’, the second ‘teaches English’.

Another constituent test: the standalone test

It is often said that “sentences are made up of words”. Strictly speaking, this is not true. It is not words that make up sentences but phrases: noun phrases (NP), verb phrases (VP), prepositional phrases (PP), etc. A word cannot be part of a sentence unless it is first part of a phrase. So although the subject of the sentence Amy sings seems to be a single word, Amy here is a NP: it can be replaced by she, which is a pro-NP. Since Amy is already a full NP, it also follows that it cannot be preceded by a determiner or an adjective, or followed by a prepositional phrase: *The Amy sings, *The yellow Amy in the garden sings.

The standalone test states that an utterance consists of nothing smaller than a phrase. Consider the following Q&As:

A: what’s this?
B: a (yellow) cat/*(yellow) cat
A: where is it?
B: in the garden
A: what is it doing?
B: chasing mice

B’s answer is a full phrase in each case: first a NP, then a PP, finally a VP. It cannot be anything “smaller”. This is the STANDALONE test.

And another one: the integrity test

Recall that constituents are maximally binary. So if we have a sentence like this one:

the girl fed cats and dogs

the part cats and dogs must be parsed either as [ cats [ and dogs ] ] or as [ [ cats and ] dogs ]. How do we decide?

The INTEGRITY test states that if extra words cannot be inserted between two words, those two words form a constituent. (Extra words here are words that are not related to their neighbours.) So let us try to insert yesterday at each position in the above sentence:

a. yesterday the girl fed cats and dogs
b. *the yesterday girl fed cats and dogs
c. the girl yesterday fed cats and dogs
d. ?the girl fed yesterday cats and dogs
e. the girl fed cats yesterday and dogs
f. *the girl fed cats and yesterday dogs
g. the girl fed cats and dogs yesterday

The impossibility of sentence b shows that the girl is a constituent, but we knew that anyway. I’m not sure of the grammaticality of sentence e. The impossibility of sentence f, however, shows that the right structure of cats and dogs is [ cats [ and dogs ] ], that is, and goes with the last member of the conjunction. Actually, this is why we have A, B, and C, not *A and, B, C.
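The reasoning about sentence f can be written out as a small Python check. It is a deliberately narrow sketch under our own assumptions: the two candidate structures use a nested-list encoding (a list is a constituent, a string is a word), and the check only asks whether the gap where yesterday was blocked lies inside a two-word constituent.

```python
def words(node):
    """Return the words dominated by a node, in order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node for w in words(child)]

def two_word_constituents(node):
    """Collect the pairs of adjacent words that form a constituent by themselves."""
    if isinstance(node, str):
        return set()
    found = set()
    if len(words(node)) == 2:
        found.add(tuple(words(node)))
    for child in node:
        found |= two_word_constituents(child)
    return found

candidates = {
    "[ cats [ and dogs ] ]": ["cats", ["and", "dogs"]],
    "[ [ cats and ] dogs ]": [["cats", "and"], "dogs"],
}
# Sentence f (*the girl fed cats and yesterday dogs): no insertion between these two
blocked_gap = ("and", "dogs")

for label, tree in candidates.items():
    # A structure fits the data if the blocked gap lies inside a two-word constituent
    print(label, blocked_gap in two_word_constituents(tree))
```

It prints True for [ cats [ and dogs ] ] and False for [ [ cats and ] dogs ], so only the first structure accounts for the ungrammaticality of sentence f.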