advanced searching in CUBE

This page covers the following topics:

» full word
» watch stress
» minimal pairs
» negate
» search in results
» maximum output
» web frequency
» syllable count
» grammar
» systems
» symbols
» analyses

full word

Ticking this box will return only results that match your spelling input as a full word. While bat normally returns battery and acrobat besides bat, with ‘full word’ only bat is matched. So bat with the ‘full word’ option is equivalent to #bat#.

The authors are notified of ‘full word’ searches that give no result, ie of words that are missing from the database. We will add the new items, if legitimate, normally within 24 hours.

If the spelling input contains a space, that is, if it consists of more than one word, each of these words is searched for as a full word. So if you were looking for Gwyneth Paltrow, CUBE would only find Gwyneth (at the time of writing this sentence), and it would send a report to the authors about the absence of the surname.
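The full-word behaviour can be sketched in Python as follows. This is a minimal illustration, not CUBE's code: the function names, the toy word list and the case-insensitive matching are assumptions. A leading or trailing # becomes a regex anchor, and with full_word set each space-separated word is looked up on its own, so that words with no match can be reported.

```python
import re


def to_regex(query: str) -> re.Pattern:
    """Turn a spelling query into a regex; a leading/trailing '#' marks a word edge."""
    starts_at_edge = query.startswith("#")
    ends_at_edge = len(query) > 1 and query.endswith("#")
    core = re.escape(query.strip("#"))
    return re.compile(("^" if starts_at_edge else "") + core + ("$" if ends_at_edge else ""),
                      re.IGNORECASE)


def spelling_search(query: str, lexicon: list[str], full_word: bool = False) -> dict[str, list[str]]:
    """Return the matches for each searched word (hypothetical helper).

    With full_word, every space-separated word is searched for separately as '#word#'.
    """
    words = query.split() if full_word else [query]
    results = {}
    for word in words:
        rx = to_regex(f"#{word.strip('#')}#" if full_word else word)
        results[word] = [entry for entry in lexicon if rx.search(entry)]
    return results


lexicon = ["acrobat", "bat", "battery", "Gwyneth"]       # toy word list
print(spelling_search("bat", lexicon))                   # {'bat': ['acrobat', 'bat', 'battery']}
print(spelling_search("bat", lexicon, full_word=True))   # {'bat': ['bat']}
hits = spelling_search("Gwyneth Paltrow", lexicon, full_word=True)
missing = [word for word, found in hits.items() if not found]
print(missing)                                           # ['Paltrow'], ie what would be reported
```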

watch stress

This is useful for matching only unstressed vowels (read more…).

minimal pair searches

CUBE can search for minimal pairs of a given string. For this, you must enter a string in the transcription box. This string will be considered a full word. No sound classes (introduced by !) or regular expressions are allowed in this string. Stress is ignored in minimal pair searches, so the verb record rəkóːd and the noun record rɛ́koːd are considered a minimal pair. To search for minimal pairs of your input string, tick the ‘minimal pairs’ box.

All vowels are interchangeable with each other. Thus fit fɪ́t forms a minimal pair with feet fɪ́jt or fought fóːt, ie the short vowel ɪ can stand against the diphthong ɪj or the long monophthong oː.

Alternatively, you may enter two strings separated by an equals sign (=) or a slash (/) in the transcription box. If you use the equals sign, remember also to tick the ‘minimal pairs’ box; with the slash this is not necessary. This finds word pairs that differ only in the strings specified. The first string may be ‘0’ (zero), in which case you get word pairs where one word is longer than the other by the second segment. Currently, e/a only finds cases where these two vowels do not bear a stress mark; to find minimal pairs in which they do (like bet vs bat), you have to enter E/A.
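A minimal sketch of the basic minimal-pair test, assuming transcriptions that are already split into segments, with each vowel, long vowel or diphthong counting as one segment. The segmentation and the function names are hypothetical; like CUBE, the sketch ignores stress, but it does not cover the = , / or 0 options.

```python
import unicodedata

STRESS_MARK = "\u0301"  # combining acute accent, as in fɪ́t


def strip_stress(segment: str) -> str:
    """Minimal-pair searches ignore stress, so drop any acute accent."""
    return unicodedata.normalize("NFD", segment).replace(STRESS_MARK, "")


def is_minimal_pair(a: list[str], b: list[str]) -> bool:
    """True if two segmented transcriptions differ in exactly one segment.

    Each vowel (short, long or diphthong) is a single segment, so any vowel
    can stand against any other vowel.
    """
    if len(a) != len(b):
        return False
    differences = sum(strip_stress(x) != strip_stress(y) for x, y in zip(a, b))
    return differences == 1


fit = ["f", "ɪ́", "t"]
feet = ["f", "ɪ́j", "t"]
fought = ["f", "óː", "t"]
record_verb = ["r", "ə", "k", "óː", "d"]
record_noun = ["r", "ɛ́", "k", "oː", "d"]
print(is_minimal_pair(fit, feet))                 # True
print(is_minimal_pair(fit, fought))               # True
print(is_minimal_pair(record_verb, record_noun))  # True: only ə/ɛ differ once stress is ignored
print(is_minimal_pair(fit, fit))                  # False
```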

negate

With the negate box ticked, CUBE finds items that do not match the sound pattern you enter. For example, !c normally finds entries containing any consonant; but !c with negate ticked will find entries that do not contain any consonants. The negate button is most useful with the ‘search in results’ option described below.

Note that the negate box is different from entering a negative expression (using ~) in your search, which applies to only one position in the string you are searching for. Thus negating !n matches only entries that do not contain a nasal consonant at all, while !~n matches any single nonnasal consonant.

search	finds	does not find
`#!~nap` ☐ negate	cap, chap, gap, lap…	map, nap
`!n` ☑ negate	cap, bag, fast, lab…	can, camp, conquer, nag…
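The difference between the negate box and ~ can be illustrated with regular expressions over a toy lexicon. The sound classes below are a simplified, hypothetical stand-in for CUBE's own !c and !n, and stress marks are left out of the transcriptions.

```python
import re

# Hypothetical, simplified sound classes for illustration only.
CONSONANTS = "pbtdkgfvθðszʃʒhʧʤmnŋlɹjw"
NASALS = "mnŋ"
NON_NASALS = "".join(c for c in CONSONANTS if c not in NASALS)


def search(regex: str, entries: dict[str, str], negate: bool = False) -> list[str]:
    """Return spellings whose transcription matches (or, with negate, does not match)."""
    rx = re.compile(regex)
    return [spelling for spelling, ipa in entries.items()
            if bool(rx.search(ipa)) != negate]


# Toy lexicon: spelling -> transcription.
entries = {"cap": "kap", "chap": "ʧap", "gap": "gap", "lap": "lap",
           "map": "map", "nap": "nap", "can": "kan", "bag": "bag"}

# '#!~nap': one word-initial non-nasal consonant, then 'ap'
print(search("^[" + NON_NASALS + "]ap", entries))        # ['cap', 'chap', 'gap', 'lap']
# '!n' with negate ticked: no nasal consonant anywhere in the word
print(search("[" + NASALS + "]", entries, negate=True))  # ['cap', 'chap', 'gap', 'lap', 'bag']
```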

search in results

Ticking this button means that a new search will look only at the results of your last search. This is useful for narrowing down your results with successive searches.

It can be useful to combine this function with the ‘negate’ filter mentioned above.

maximum output

If you do not restrict your search sufficiently, there may be a very large number of matches, and displaying them all could slow down some machines. By default, therefore, CUBE limits results to 300 entries. By entering a number in the ‘maximum output’ box, you can specify a smaller or greater number.

Note that this does not affect the ‘search in results’ option in any way: all the results are searched for in the next run, irrespective of how many of them are actually shown to you.
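The interaction between ‘search in results’ and ‘maximum output’ amounts to keeping the full result list and truncating only what is displayed. A hypothetical sketch; the function name and the stand-in entries are assumptions:

```python
from typing import Callable, Optional


def run_search(predicate: Callable[[str], bool], entries: list[str],
               previous: Optional[list[str]] = None,
               max_output: int = 300) -> tuple[list[str], list[str]]:
    """Return (all matches, matches actually displayed).

    For 'search in results', pass the full result list of the last search as
    `previous`; it is filtered in full, whatever max_output was.
    """
    pool = entries if previous is None else previous
    matches = [entry for entry in pool if predicate(entry)]
    return matches, matches[:max_output]


entries = [f"w{i}" for i in range(1000)]                 # stand-in entries
first, shown = run_search(lambda e: int(e[1:]) % 2 == 0, entries)
print(len(first), len(shown))                            # 500 300
second, _ = run_search(lambda e: int(e[1:]) >= 900, entries, previous=first)
print(len(second))                                       # 50, none of which were displayed before
```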

web frequency

Each entry in CUBE is marked with its frequency of occurrence, as acquired from web searches (limited to the top-level domain ‘.uk’). These counts therefore reflect up-to-date usage, though they obviously exhibit some biases (eg towards popular culture and information technology) and are to some degree ephemeral. An additional limitation is that any items with the same spelling, regardless of upper/lower case, are treated as the ‘same word’: for example, the same frequency counts are given for the noun bit and the past-tense verb bit, the name Bill and the noun bill, the verb rip and the abbreviation RIP, the French capital and the Trojan prince Paris, etc. Despite these shortcomings we believe that these figures do provide some guidance regarding word frequency in current BrE.

CUBE groups the frequency counts into ten searchable categories as shown in the following table.

category	occurrence in web search	number of words	% of all words
10	1,000,000,000–1,760,000,000	50	0.05%
9	100,000,000–999,999,999	1,247	1.20%
8	10,000,000–99,999,999	6,737	6.47%
7	1,000,000–9,999,999	12,922	12.40%
6	100,000–999,999	25,610	24.58%
5	10,000–99,999	34,891	33.49%
4	1,000–9,999	16,795	16.12%
3	100–999	4,919	4.72%
2	10–99	849	0.81%
1	1–9	162	0.16%

You can search for items of a given frequency by entering a number from 1 to 10 in the ‘web frequency’ box, or a range, eg 8–10 for the most frequent 8% of words. If you enter a frequency value, the approximate figures for each entry in your results will be shown. These figures link to the Google Books Ngram Viewer, which shows a frequency timeline for the entry in English-language books stored by that service. If you want to display frequency without limiting your search, enter all in the ‘web frequency’ box.
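The category boundaries in the table follow the number of digits in the hit count, so a category can be derived, and the table's percentages reproduced, with a few lines of Python. The digit-count rule is inferred from the table rather than taken from CUBE's implementation; the word counts are copied from the table.

```python
# Word counts per category, copied from the table above.
WORDS_PER_CATEGORY = {10: 50, 9: 1_247, 8: 6_737, 7: 12_922, 6: 25_610,
                      5: 34_891, 4: 16_795, 3: 4_919, 2: 849, 1: 162}


def frequency_category(hits: int) -> int:
    """Category = number of digits in the hit count (1-9 -> 1, 10-99 -> 2, ...), capped at 10."""
    if hits < 1:
        raise ValueError("hit counts start at 1")
    return min(len(str(hits)), 10)


def share_of_words(low: int, high: int) -> float:
    """Percentage of all entries whose category lies in low..high (eg 8-10)."""
    total = sum(WORDS_PER_CATEGORY.values())
    selected = sum(n for cat, n in WORDS_PER_CATEGORY.items() if low <= cat <= high)
    return 100 * selected / total


print(frequency_category(1_760_000_000))  # 10
print(frequency_category(42))             # 2
print(round(share_of_words(8, 10), 2))    # 7.71, ie roughly the most frequent 8%
```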

If you tick the ‘sort by’ button too, the results will be ordered from most to least frequent, instead of the default alphabetical order. When frequency counts are displayed, all these numbers are summed up at the top for your convenience.

syllable count

You can put numbers in this box to filter words by length. 1 lists only monosyllabic (one-syllable) words; 1,3 lists one- and three-syllable words, but not two-syllable ones. -3 lists words that contain up to three syllables, 5-6 lists words of five or six syllables, and 8- lists words of eight syllables or longer. Even 1,3-5,7 is meaningful, though it is hard to see why one would want to use it. Values that cannot be interpreted by CUBE are ignored.
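One way to interpret syllable-count filters is sketched below. The parsing rules are inferred from the examples above and are an assumption: a bare number is a single value, a-b is a range, -b an upper bound and a- a lower bound, and unparseable parts are skipped. If nothing parseable remains, this sketch applies no length filter at all.

```python
import re
from typing import Optional

_RANGE = re.compile(r"([0-9]*)-([0-9]*)")


def parse_syllable_filter(spec: str) -> list[tuple[int, Optional[int]]]:
    """Turn eg '1,3-5,7' or '8-' into (low, high) pairs; high=None means no upper limit."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if re.fullmatch(r"[0-9]+", part):
            ranges.append((int(part), int(part)))
            continue
        m = _RANGE.fullmatch(part)
        if m and (m.group(1) or m.group(2)):
            low = int(m.group(1)) if m.group(1) else 1
            high = int(m.group(2)) if m.group(2) else None
            ranges.append((low, high))
        # anything else cannot be interpreted and is ignored
    return ranges


def syllable_count_ok(spec: str, count: int) -> bool:
    ranges = parse_syllable_filter(spec)
    if not ranges:
        return True  # nothing usable in the box: no length filter
    return any(low <= count and (high is None or count <= high) for low, high in ranges)


print([n for n in range(1, 10) if syllable_count_ok("1,3-5,7", n)])  # [1, 3, 4, 5, 7]
print([n for n in range(1, 10) if syllable_count_ok("8-", n)])       # [8, 9]
```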

stress patterns

Each item in the dictionary is associated with a stress pattern: a string of the letters w, s and S, one for each vowel (=syllable) of the word. w represents unstressed (ie reduced) vowels, s represents stressed vowels, and S represents accented vowels, the last of which is the tonic. If your input contains only lower case s and w, but no upper case S, s will match both stress and accent. As elsewhere, # anchors the pattern to word edges (note that multiword items are represented as a single stress pattern without internal word boundaries).

The stress pattern of parrot párət is Sw, that of robot rə́wbɔt is Ss. Unknot ə́nnɔ́t is SS. The hyphen can be used to represent ‘anything in between’, so #S-S# will find words of two or more syllables that have an accent on both their first and last vowels.
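A stress-pattern query can be read as a small regular expression. The translation below is a hypothetical sketch based on the description above: # becomes an anchor, the hyphen stands for any (possibly empty) sequence of syllables, and lower case s also matches S when the query contains no S.

```python
import re


def stress_pattern_regex(pattern: str) -> re.Pattern:
    """Translate a stress query made of w, s, S, '#' and '-' into a regex."""
    accent_given = "S" in pattern
    parts = []
    for i, ch in enumerate(pattern):
        if ch == "#":
            parts.append("^" if i == 0 else "$")  # word edge
        elif ch == "-":
            parts.append(".*")                    # 'anything in between'
        elif ch == "s" and not accent_given:
            parts.append("[sS]")                  # s also matches an accented vowel
        elif ch in "wsS":
            parts.append(ch)
        else:
            raise ValueError(f"unexpected symbol {ch!r} in stress pattern")
    return re.compile("".join(parts))


both_ends = stress_pattern_regex("#S-S#")
print([p for p in ["Sw", "Ss", "SS", "SwS", "S"] if both_ends.search(p)])  # ['SS', 'SwS']
```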

Stress patterns are calculated automatically. The stress of ɪj’s and ʉw’s without a stress mark is calculated as follows: word-final ɪj spelled y/ie/i and ʉw spelled u/ue are w, and prevocalic ɪj is w unless spelled ee. As a result, some of these vowels are taken to be stressed (s), although they are unstressed. For example, volume and copied are taken to be Ss, although both are Sw. Other ɪ’s are all w, while other ʉ’s are all s.
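The rules for unmarked ɪj and ʉw can be written out as a function, which also shows why copied and volume come out as Ss. This is a sketch of the rules as stated here, reading ‘word-final’ as applying to both vowels; the function name and its inputs (the spelling of the vowel and whether it is word-final or prevocalic) are hypothetical, and every case the rules do not mention is treated as s.

```python
def unmarked_vowel_stress(vowel: str, spelling: str, word_final: bool, prevocalic: bool) -> str:
    """Return 'w' or 's' for an ɪj or ʉw that carries no stress mark (hypothetical helper)."""
    if vowel == "ɪj":
        if word_final and spelling in {"y", "ie", "i"}:
            return "w"
        if prevocalic and spelling != "ee":
            return "w"
        return "s"
    if vowel == "ʉw":
        if word_final and spelling in {"u", "ue"}:
            return "w"
        return "s"
    raise ValueError(f"these rules only cover ɪj and ʉw, not {vowel!r}")


# copied: the ɪj spelled 'ie' is followed by d, so neither rule applies -> s
print(unmarked_vowel_stress("ɪj", "ie", word_final=False, prevocalic=False))  # s
# volume: the ʉw spelled 'u' is not word-final -> s, hence Ss rather than Sw
print(unmarked_vowel_stress("ʉw", "u", word_final=False, prevocalic=False))   # s
```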

the grammar pane

This section is out of date.

If you open the grammar pane (by clicking on ‘show grammar’), something like the flight deck of a stealth bomber opens up. Don’t be alarmed: you can’t do any harm in CUBE. These grammatical categories, which can be included/excluded in searches, have been adapted from Roger Mitton’s CUVOALD; we plan to simplify them in future versions of CUBE. The categories have short descriptive labels and letter-codes, eg ‘transitive verb (H)’, ‘definite article (R)’. The letter-codes are shown in the rightmost column of search results. Probably the best way to familiarize yourself with these categories is to experiment with the buttons and cross-check the letter-codes in your results. Hovering over a letter-code shows its short description.

There are various categories of names (proper nouns), eg personal forenames and names of towns and cities. The catch-all category No includes many surnames, eg Jones, Zuckerberg. There is no separate surname category since many words used as surnames are common nouns (eg smith and brown) or place names (eg Holland and Lincoln). The No category also contains many brand names (eg Nintendo) and some full names (eg Mao Zedong, Darth Vader, Ralph Fiennes), which are commonly used as wholes and/or have pronunciations differing from the most common form of their components (eg Ralph).

There are three buttons before each category. If you tick the first one (☑ ☐ ☐), you get words of that category. If you tick the last button (☐ ☐ ☑), you get words that are not of that category. You can cancel either of these choices by ticking the middle button (☐ ☑ ☐).

You can limit your search to certain grammatical categories by ticking the categories you need. If you enter the spelling thin and use the ‘verb’ filter, you get:

search	finds	does not find
☑ ☐ ☐ verb (G-J)	bathing, bethink…	airworthiness, anything…
☐ ☐ ☑ verb (G-J)	airworthiness, anything…	bathing, bethink…
☐ ☑ ☐ verb (G-J)	airworthiness, anything, bathing, bethink…

If you tick two categories, you get items that match either of them. That is, if you tick the ‘verb’ and the ‘adjective’ filters, you get both verbs and adjectives in the output.

search	finds	does not find
☑ ☐ ☐ verb (G-J) ☑ ☐ ☐ adjective (O)	bathing, labyrinthine…	airworthiness, anything…
☐ ☐ ☑ verb (G-J) ☐ ☐ ☑ adjective (O)	airworthiness, anything…	bathing, labyrinthine…

If you need only words that may be used both as a verb and as an adjective, you have to do two searches: first search for verbs, then tick the ‘search in results’ button and search again for adjectives.
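The tick-box logic above can be sketched as follows. The enum, the function and the category assignments in the toy lexicon are illustrative assumptions, and how CUBE combines included and excluded categories within a single search is not described on this page. Here ticked categories are combined with ‘or’, excluded ones rule an item out, and ‘both verb and adjective’ is obtained by filtering the previous results again.

```python
from enum import Enum


class Tick(Enum):
    INCLUDE = 1  # ☑ ☐ ☐
    IGNORE = 2   # ☐ ☑ ☐
    EXCLUDE = 3  # ☐ ☐ ☑


def grammar_filter(categories: set[str], settings: dict[str, Tick]) -> bool:
    included = {c for c, t in settings.items() if t is Tick.INCLUDE}
    excluded = {c for c, t in settings.items() if t is Tick.EXCLUDE}
    if included and not categories & included:
        return False                    # must belong to at least one ticked category
    return not categories & excluded    # and to none of the excluded ones


# Illustrative category assignments only.
lexicon = {"bathing": {"verb"}, "labyrinthine": {"adjective"},
           "thin": {"adjective", "verb"}, "anything": {"pronoun"}}

either = [w for w, cats in lexicon.items()
          if grammar_filter(cats, {"verb": Tick.INCLUDE, "adjective": Tick.INCLUDE})]
print(either)  # ['bathing', 'labyrinthine', 'thin']

# Both verb and adjective: search for verbs, then search again in those results.
verbs = [w for w, cats in lexicon.items() if grammar_filter(cats, {"verb": Tick.INCLUDE})]
both = [w for w in verbs if grammar_filter(lexicon[w], {"adjective": Tick.INCLUDE})]
print(both)    # ['thin']
```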

If you close the grammar box, all grammar filters are turned off, and no grammar codes are shown in your results.

customizing the transcriptions

A unique feature of CUBE is that it allows some freedom to customize the IPA transcriptions in search results. These options are explained below.

Any options that you select will be retained for further searches until cancelled. If your browser allows cookies to be set, the options of your last search will be active at the beginning of your next session too. (See also our privacy policy.)

types of transcription

CUBE always returns the search results in IPA transcription (in dark blue). Search strings are matched against this transcription. Several other types of transcription may also be selected; these are shown if you hover over or click the link labelled systems on the right.

simp: simple

By ticking this button the results section will include an IPA transcription (in light blue) with the vowel symbols simplified: both a and ɑ are shown as a, both o and ɔ are shown as o, ɪ is shown as i, ɛ as e, and ʉ/ɵ as u. This gives the impression that the vowel inventory of English is quite simple.
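The simplification is a one-to-one character mapping, sketched here with Python's str.translate (the function name is hypothetical). Combining stress accents are separate characters, so they pass through unchanged.

```python
SIMPLE_VOWELS = str.maketrans({"ɑ": "a", "ɔ": "o", "ɪ": "i", "ɛ": "e", "ʉ": "u", "ɵ": "u"})


def simplify(ipa: str) -> str:
    """Apply the 'simp' vowel mergers; all other characters are left as they are."""
    return ipa.translate(SIMPLE_VOWELS)


print(simplify("lɑ́jnəwtɑjp"))  # lájnəwtajp
print(simplify("gʉws fɵt"))    # guws fut
```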

trad: traditional (Gimsonian)

By ticking this button the results section will include a ‘Gimsonian’ IPA transcription (in black). This kind of transcription is not exactly what you would find in Gimson’s EPD or Wells’ LPD, because it is produced simply by converting CUBE’s symbols. As a result, mergers are not undone (eg both agree and angry end in iː).

respell: BBC-style respelling

This is based on the re-spelling system used by the BBC Pronunciation Unit for guiding BBC staff. Similar re-spelling systems are used by some dictionaries, Wikipedia, etc. They are based on the conventions of English spelling and don’t require knowledge of special phonetic symbols. This is given in green.

The re-spelled transcriptions in CUBE are converted from those in the CUBE database, and so will not always be identical to those recommended by the BBC Pronunciation Unit.

Note that English re-spelling systems generally use many digraphs (double letters), such as uu and sh for the middle and last sounds in the word push. Hyphens are therefore necessary to prevent the misinterpretation of letter sequences (eg goshawk gos-hawk v Toshack tosh-ak). These hyphens are not intended, either in CUBE or according to the BBC’s own re-spelling guide, to represent English syllable boundaries, which are a matter of controversy.

hu: Hungarian respelling

While BBC-style respelling uses the conventions of English spelling, this transcription (shown in khaki) uses those of Hungarian spelling. It is useful for those familiar with Hungarian spelling: by reading out this transcription, they produce a British English-like accent. There are four symbols not present in standard Hungarian spelling: ȧ=the short version of á, ē=the long version of e, ŋ=ng without the g, and w.

ASCII: plain ASCII

By ticking this button the results section will include, in beige, the ASCII transcription used by the database. This might help you in designing your transcription search strings.

symbol choices

fɑɑ (geminate vowels)

There are two ways to transcribe long monophthongs: by using the IPA length mark, ː, and by doubling the vowel symbol. By default CUBE uses the length mark, but the double-vowel alternative can be selected with this option.

option	result
☐ fɑɑ	airport ɛ́ːpoːt EHpoHt
☑ fɑɑ	airport ɛ́ɛpoot EHpoHt
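Converting length marks to geminates can be sketched with a regular expression. This is an illustration, not CUBE's converter: the string is decomposed (NFD) first so that a stress accent stays on the first copy of the vowel only, as in the table.

```python
import re
import unicodedata

LENGTH_MARK = "\u02d0"  # ː
# a base character, any combining diacritics on it, then the length mark
_LONG_VOWEL = re.compile("([^\u0300-\u036f" + LENGTH_MARK + "])([\u0300-\u036f]*)" + LENGTH_MARK)


def geminate(ipa: str) -> str:
    """Replace 'Vː' by 'VV', keeping diacritics (eg the stress accent) on the first copy."""
    decomposed = unicodedata.normalize("NFD", ipa)
    doubled = _LONG_VOWEL.sub(r"\1\2\1", decomposed)
    return unicodedata.normalize("NFC", doubled)


print(geminate("ɛ́ːpoːt"))  # ɛ́ɛpoot
print(geminate("fɑː"))     # fɑɑ
```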

gəu (vowel offglides)

English diphthongs are falling, gliding to weaker endpoints. The offglides can be represented either by the consonantal symbols j and w, or by nonsyllabic vowel symbols. Tick this button to select the latter option.

option	result
☐ gəu	linotype lɑ́jnəwtɑjp lAJnowtaJp
☑ gəu	linotype lɑ́i̯nəu̯tɑi̯p lAJnowtaJp
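A sketch of the offglide conversion, assuming pre-segmented transcriptions in which each diphthong is a single segment (a hypothetical input format). Working on segments rather than raw strings keeps an onset j or w, as in the illustrative segmentation of away below, from being mistaken for an offglide.

```python
NONSYLLABIC = "\u032f"  # combining inverted breve below, as in i̯ and u̯
GLIDE_VOWEL = {"j": "i", "w": "u"}


def offglide_segment(segment: str) -> str:
    """Rewrite the final j/w of a diphthong segment as nonsyllabic i̯/u̯.

    Single-character segments, including consonant j and w, are left alone.
    """
    if len(segment) > 1 and segment[-1] in GLIDE_VOWEL:
        return segment[:-1] + GLIDE_VOWEL[segment[-1]] + NONSYLLABIC
    return segment


def vowel_offglides(segments: list[str]) -> str:
    return "".join(offglide_segment(s) for s in segments)


linotype = ["l", "ɑ́j", "n", "əw", "t", "ɑj", "p"]
print(vowel_offglides(linotype))  # lɑ́i̯nəu̯tɑi̯p
away = ["ə", "w", "ɛ́j"]          # illustrative segmentation
print(vowel_offglides(away))      # əwɛ́i̯: the onset w is kept
```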

fɪš (háček sibilants)

For the palatal sibilants, IPA uses the special symbols ʃ, ʒ, ʧ, and ʤ. The convention of APA (American Phonetic Alphabet) is to put a háček (alias caron) on Latin letters (as in the orthography of some Slavic languages): š, ž, č, and ǰ, and to use y instead of j. This option lets you apply this convention.

option	result
☐ fɪš	short-change ʃoːtʧɛ́jnʤ SOHtCeJnG
☑ fɪš	short-change šoːtčɛ́ynǰ SOHtCeJnG
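The háček option is again a character mapping. The sketch below (function name hypothetical) assumes the single-character affricate symbols ʧ and ʤ that CUBE displays; a two-character dʒ would not be converted as a unit.

```python
HACEK = str.maketrans({
    "\u0283": "\u0161",  # ʃ -> š
    "\u0292": "\u017e",  # ʒ -> ž
    "\u02a7": "\u010d",  # ʧ -> č
    "\u02a4": "\u01f0",  # ʤ -> ǰ
    "j": "y",            # APA writes IPA j as y
})


def hacek_sibilants(ipa: str) -> str:
    return ipa.translate(HACEK)


print(hacek_sibilants("ʃoːtʧɛ́jnʤ"))  # šoːtčɛ́ynǰ
```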

fʉt (uniform ʉ), gɵws (uniform ɵ)

The vowels of FOOT and jury tend to have a slightly opener quality than the beginning of the GOOSE diphthong. By default, CUBE shows this by using distinct IPA symbols: fɵt, ʤɵːɹɪj vs gʉws. However, it is not clear that the distinction between ɵ and ʉ is used contrastively by any language. By ticking one of these buttons you can choose the more economical option of using only ʉ (gʉws, fʉt, ʤʉːɹɪj) or only ɵ (gɵws, fɵt, ʤɵːɹɪj).

If you tick both ‘fʉt’ and ‘gɵws’, the latter option is ignored.

option	result
☐ fʉt ☐ gɵws	cuckoo kɵ́kʉw kUkuW Euronews jɵ́ːrənjʉwz jUHrxnjuWz
☑ fʉt ☐ gɵws	cuckoo kʉ́kʉw kUkuW Euronews jʉ́ːrənjʉwz jUHrxnjuWz
☐ fʉt ☑ gɵws	cuckoo kɵ́kɵw kUkuW Euronews jɵ́ːrənjɵwz jUHrxnjuWz
☑ fʉt ☑ gɵws	cuckoo kʉ́kʉw kUkuW Euronews jʉ́ːrənjʉwz jUHrxnjuWz
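The precedence rule for the two options, with ‘fʉt’ winning when both are ticked, can be expressed as follows. This is a hypothetical sketch; the function and parameter names are assumptions. Its outputs match the rows of the table above.

```python
def uniform_u_options(ipa: str, fut: bool = False, gows: bool = False) -> str:
    """Apply 'fʉt' (only ʉ) or 'gɵws' (only ɵ); if both are ticked, 'gɵws' is ignored."""
    if fut:
        return ipa.replace("ɵ", "ʉ")
    if gows:
        return ipa.replace("ʉ", "ɵ")
    return ipa


for fut, gows in [(False, False), (True, False), (False, True), (True, True)]:
    print(fut, gows, uniform_u_options("kɵ́kʉw", fut, gows))
```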

r (upright r)

By default ray is transcribed as ɹɛj. If you tick this button, the lower case Roman letter r (which in strict IPA represents a trill) will be used instead: rɛj.

option	result
☐ r	retrograde ɹɛ́tɹəgɹɛjd rEtrxgrejd
☑ r	retrograde rɛ́trəgrɛjd rEtrxgrejd

analyses

ʌ=ə (strut–comma)

It is customary in the British transcribing tradition to use different symbols for the strut vowel (ʌ) and the last vowel of comma (ə). However, many speakers pronounce them with similar qualities, and they are not strictly contrastive, the former having by definition a higher level of stress or prominence. The CUBE database retains the distinction: in the ASCII representations, commA is x and the STRUT vowel is y. CUBE’s IPA transcriptions follow the traditional distinction of ʌ and ə by default, but by selecting the ‘ʌ=ə’ option you can display the STRUT vowel as ə too, with the difference represented (as in the Merriam-Webster dictionary of American English) in terms of stress.

option	result
☐ ʌ=ə	unburden ʌ́nbə́ːdən YnbYHdxn
☑ ʌ=ə	unburden ə́nbə́ːdən YnbYHdxn

mɪsdɛjk

In English, fortis (p t k) and lenis (b d g) plosives are in many contexts distinguished not by voicing but by other means (aspiration, preglottalization). Thus the initial consonant of time tɑ́jm is aspirated, while that of dime dɑ́jm is not. Ticking this button makes CUBE’s transcription reflect this after voiceless fricatives too, writing an unaspirated fortis plosive with the lenis symbol: mistime would be transcribed as mɪ́stɑ́jm (because the plosive in this word is aspirated), but mistake would be transcribed as mɪsdɛ́jk (since the first plosive in this word is not aspirated).

option	result
☐ mɪsdɛjk	space-time spɛ́jstɑjm sbEJstajm
☑ mɪsdɛjk	space-time sbɛ́jstɑjm sbEJstajm

tie diphthongs

By default CUBE shows the unitary nature of diphthongs (and affricates) by placing the pairs of characters closer to each other than other symbols, which are separated by narrow spaces. This option dispenses with the narrow spaces, showing the connection of diphthongal components with an underarch. Nonsyllabic subscripts are turned off when underarches are selected. (Please note that these underarches are misplaced in some browsers. This is a font rendering problem beyond our control.)

option	result
☑ vowel offglides ☐ tie diphthongs	linotype l ɑ́i̯ n əu̯ t ɑi̯ p lAJnowtaJp
☑ vowel offglides ☑ tie diphthongs	linotype lɑ́͜inə͜utɑ͜ip lAJnowtaJp

tʰɔ́p (show aspiration)

…