

  1. Lexicon:

    Predicates that must be supplied in a lexicon.

  2. Lexical Features:

    Obligatory and optional lexical features.

  3. Parse PF:

    The ParsePF system and predicates for taking apart and combining words.




[Diagram: ParsePF stages: Input Words -> Expand Contractions -> Lookup Words]

Example: John can't sleep

[n John][v$[neg] can][v sleep]


ParsePF represents the first stage of parsing in the PAPPI system. It receives as input the sentence to be parsed, i.e. a sequence of words, and performs lexical analysis to produce as its output a corresponding sequence of base-level or zero-level constituents (ZLSS).

These base-level constituents are in turn fed as input to the second stage of parsing, ParseSS, which performs standard phrase structure analysis to construct the corresponding phrase structure trees at S-structure.

For example, ParsePF transforms a sentence like John can't sleep into the following sequence of zero-level constituents: [N John][V$[neg] can][V sleep]. Within ParsePF, the transformation is broken down into a series of smaller stages, as shown in the diagram above.

Input Words

PAPPI can accept input sentences typed at the PAPPI input window (or any other window). Once the input sentence has been selected, the Run button will send the sentence to ParsePF.

Basically, the input consists of just words separated by spaces. However, there are a few embellishments:

References: Implementation Notes

Expand Contractions

Expand Contractions uses simple contraction rules defined in the lexicon to "chop" (split) words into their components. For example, the following two rules can be used to derive can not and would not from can't and wouldn't, respectively.
Similarly, the following two rules simply stipulate that 'd and 've should be expanded into would and have, respectively.
Both contraction/3 and contraction/4 are examples of simple string manipulation rules. That is, Expand Contractions does not produce lexical categories, nor does it take them as input. Note that this means such rules cannot make use of general lexical features except under a very limited set of circumstances concerned with stem recognition.
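
As a rough illustration of what "simple string manipulation" means here, the sketch below rewrites a word purely by its stem and suffix, without consulting categories or lexical features. It assumes a standard Prolog such as SWI-Prolog. The names demo_rule/3 and demo_expand/2 are hypothetical and do not reproduce the actual format of contraction/3 or contraction/4.

```prolog
% Hypothetical rules: demo_rule(Suffix, Stem, Words).
% A word spelled Stem+Suffix expands into the word list Words.
demo_rule('n''t', ca,   [can, not]).            % irregular stem: can't
demo_rule('n''t', Stem, [Stem, not]) :- Stem \== ca.
demo_rule('''d',  Stem, [Stem, would]).
demo_rule('''ve', Stem, [Stem, have]).

% demo_expand(+Word, -Words)
% Tries every split of the atom Word into a non-empty stem and a suffix,
% and succeeds for each split that is licensed by a rule.
demo_expand(Word, Words) :-
    atom_concat(Stem, Suffix, Word),
    Stem \== '',
    demo_rule(Suffix, Stem, Words).

% ?- demo_expand('can''t', Ws).      % Ws = [can, not]
% ?- demo_expand('wouldn''t', Ws).   % Ws = [would, not]
```

Because the rules only inspect spelling, a word such as sleep that ends in no listed suffix simply fails to expand.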

In general, contraction rules apply optionally. That is, PAPPI tries both possibilities: applying a contraction rule, and letting the input word go through unchanged. A special blockContraction declaration can be used on a word-by-word basis to inhibit contraction processing. For more details on the format of the rules, see the documentation on the predicate contraction.
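
The optional behaviour described above can be pictured as a nondeterministic choice per word. The following is a minimal sketch under stated assumptions: demo_contract/2 and demo_blocked/1 are hypothetical stand-ins for the lexicon's contraction rules and blockContraction declarations, whose real formats are not reproduced here, and the blocked word is chosen purely for illustration.

```prolog
:- use_module(library(lists)).

% Hypothetical stand-ins for contraction rules and blockContraction.
demo_contract('can''t',    [can, not]).
demo_contract('wouldn''t', [would, not]).
demo_blocked('wouldn''t').      % illustration only: never split this word

% demo_candidate(+Word, -Words)
% On backtracking, yields the contracted reading (when one exists and the
% word is not blocked) and then the unchanged word.
demo_candidate(Word, Words) :-
    \+ demo_blocked(Word),
    demo_contract(Word, Words).
demo_candidate(Word, [Word]).

% demo_expand_sentence(+Words, -Expanded)
% Enumerates every combination of per-word choices.
demo_expand_sentence([], []).
demo_expand_sentence([W|Ws], Expanded) :-
    demo_candidate(W, Front),
    demo_expand_sentence(Ws, Rest),
    append(Front, Rest, Expanded).

% ?- findall(E, demo_expand_sentence([john, 'can''t', sleep], E), Es).
% Es = [[john, can, not, sleep], [john, 'can''t', sleep]]
```

Each candidate string would then be passed on to the later stages, and only those that lead to a successful parse survive.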


The contraction mechanism can be traced by turning on the Trace Contractions flag under the Tracing Options control panel:

(a) Turning on Trace Contractions

(Alternatively: it can be activated using the setup option tracingContractions.)

For example, here is the contraction mechanism at work on the Japanese sentence Taroo-ga hihansareta:

(a) Tracing Expand Contractions

Here, Expand Contractions produces two candidate strings. In the first one, prefixed by Exit Contraction: (1), the verb complex hihansareta is broken down into the suru-taking verb stem (hihan), followed by the passive form of suru (sare) to be modified by the past tense morpheme (-ta). This analysis subsequently leads to a successful parse. The second candidate string is the one where no contraction rule has applied.

References: contraction / blockContraction

Expand Contractions Hook

In later versions (PAPPI 3.x only), a hook predicate is provided after Expand Contractions but before Lookup Words. Initially, expandContractionsHook/2 is defined to be a null stage as follows:
See the Turkish implementation for an example of how to extend expandContractionsHook/2 to handle noun-noun compounding.


Note that the two previous stages operate on simple strings. They perform simple word transformation rules. In particular, lexical items have not yet acquired part-of-speech labels or feature bundles. That task is reserved for the next stage, Lookup Words.

Example constraints, as described above in the Input Words section, generally pass through Expand Contractions unchanged. Hence, the format of the words presented as input to Lookup Words is largely the same as the original input. However, there is a special provision for compound words formed either in Expand Contractions or expandContractionsHook.

The following three compound forms are also valid:

The behaviour of these merge structures during lexical item formation is described in the next section.

Lookup Words

The task of producing labelled lexical items falls to the third stage, namely Lookup Words. Each word is processed as follows:
  1. If the word is a non-compound structure, it is matched against the lexicon using:
    C is the category label and Fs is the list of lexical features associated with Word.

    The behaviour of Lookup Words is determined by the outcome of lexical lookup:


    If the word is a marker, i.e. C = mrkr, no category [mrkr Word] is formed by Lookup Words.

    In PAPPI, markers are special elements appearing in the input that do not project structure like regular categories. Instead, markers are realizations of feature elements that are attached to regular categories.

    For example, in the English implementation, of as in:

    (a) his picture of Mary
    (b) proud of Mary
    is encoded as the realization of genitive Case. Following Knowledge of Language [Chomsky,86], we assume that the heads picture and proud assign genitive Case to the complement NP Mary. In other words, rather than positing a complement PP headed by of in both cases, we get the following two (simpler) parse fragments instead:

    (a) his pictures of Mary (b) proud of Mary

    The lexical entry for of is as follows:

    lex(of,mrkr,[right(np,case(gen),[])]).		  % object genitive Case
    Here, the marker feature right/3 states that of attaches as a feature case(gen) to the NP on the right.

    In general, the possible elements to which a marker can attach must be declared using the predicate relevant(C), where C is a category label. For example, for English, we have:

    % relevant for marker constraints
    Non-relevant categories are ignored or skipped-over for the purposes of feature attachment.
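
The interaction between right/3 and relevant/1 can be sketched as follows. This is an illustration only, not PAPPI's implementation. It assumes that items are represented as item(Category, Word, Features), that the nearest relevant item must itself be of category C, and that F is either [] or a single feature. The predicates demo_relevant/1, attach_right/3, matches/2 and add_feature/3, and the exact shape of the morph feature on sleep, are hypothetical.

```prolog
:- use_module(library(lists)).

% Hypothetical declaration of the categories that markers may attach to.
demo_relevant(n).
demo_relevant(np).
demo_relevant(v).

% attach_right(+Marker, +ItemsToTheRight, -NewItems)
% Non-relevant items are skipped. The first relevant item must be of
% category C and satisfy F; otherwise the attachment fails.
attach_right(right(C, F, A), [item(Cat, W, Fs)|Rest], Out) :-
    (   demo_relevant(Cat)
    ->  Cat == C,
        matches(F, Fs),
        add_feature(A, Fs, Fs1),
        Out = [item(Cat, W, Fs1)|Rest]
    ;   Out = [item(Cat, W, Fs)|Rest1],
        attach_right(right(C, F, A), Rest, Rest1)
    ).

% [] matches anything; otherwise F must unify with some feature.
matches([], _) :- !.
matches(F, Fs) :- memberchk(F, Fs).

% [] adds nothing; otherwise A is put at the front of the feature list.
add_feature([], Fs, Fs) :- !.
add_feature(A, Fs, [A|Fs]).

% Using the marker feature from the infinitival entry for 'to':
% ?- attach_right(right(v, morph(_, []), inf([])),
%                 [item(adv, quietly, []), item(v, sleep, [morph(sleep, [])])],
%                 Out).
% Out = [item(adv, quietly, []), item(v, sleep, [inf([]), morph(sleep, [])])]
```

Since matching is done by unification, a constraint such as case(gen) would also instantiate an open case(_) slot on the item, which is one way to read the entry for of above.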

    The following table specifies the possible marker features. Each lexical entry for a mrkr item must contain a marker feature. Generally, for each marker feature, C will be a category label, F a feature to be matched and A a feature to be added to the matching relevant item. (The possible forms for F and A are defined later in separate tables):

    Marker Feature    Description
    left(C,F,A) Marks X, the relevant element of category C to its left. X must satisfy feature constraints F. If the match is successful, the features given in A are added to X. If the match is unsuccessful, Lookup Words fails locally.


    The English possessive marker 's is defined as follows:

    The possessive marker is the realization of genitive Case for those nouns not already morphologically marked as genitive (e.g. possessive personal pronouns like his).
    right(C,F,A) Marks X, the relevant element of category C to its right. X must satisfy feature constraints F. If the match is successful, the features given in A are added to X. If the match is unsuccessful, Lookup Words fails locally.


    The infinitival marker to matches with the (base form) verb to its right, and adds the feature inf([]) to the verb:

    lex(to,mrkr,[right(v,morph(_,[]),inf([]))]).	  % infinitival marker
    See also the of-insertion example described earlier.
    leftec(C,F,A,G,_) As for left(C,F,A), except when the relevant element immediately to its left is not of category C. In that case, a new empty category with label C is created and inserted immediately to its left. The features of this new category are given by the goal G, which must be of the form goal(Goal,Fs), where Goal is a call to a user-specified predicate that computes a list of features Fs.


    In the Turkish implementation, the plural marker normally marks a singular noun to its left. In the case of a plural-marked adjective like küçükler (small-plr), the following rule produces küçük+[N], where [N] is a plural-marked empty noun:

    emptyNFs/1 is defined by:
    emptyNFs(Fs) :- mkFs([ec(_),a(-),p(-),grid([],[]),agr(_),noECP(lf),
    Note: the agr(_) feature will be instantiated by override(agr([3,pl,[]])).
    rightec(C,F,A,G,_) As for right(C,F,A), except when the element immediately to its right is not of category C. In that case, a new empty category with label C is created and inserted immediately to its right. The features of this new category are given by the goal G, which must be of the form goal(Goal,Fs), where Goal is a call to a user-specified predicate that computes a list of features Fs.


    From the Turkish implementation, the noun relativizer -ki marks the noun to its right and allows it to take a locative complement:

    Note: the definition of emptyNFs/1 is given in the example for leftec.

    Feature Constraints

    In general, F specifies the pre-conditions for marker attachment in left/right/leftec/rightec. The elements in F are unified with the features of the candidate item according to the following rules:

    []             Matches any item.
    F              F a feature. Item must contain a feature unifiable with F.
    not(F)         F a feature. Item must not contain a feature unifiable with F.
    [F1,..,Fn]     Item must satisfy constraints F1 through Fn.
    if(F)          F a constraint. If F matches, the add-feature portion A is carried out. If there is no match, A is skipped.
    eval(F,G)      F a feature and G a goal. Item must contain a feature unifiable with F, and goal G must hold.
    eval(if(F),G)  F a feature and G a goal. If item contains a feature unifiable with F, goal G must hold. Otherwise, G is skipped.


    Here, the Turkish genitive Case marker instantiates the morphC(gen) feature of the item immediately to its left. However, if the feature morphC is already present on the item, only nominative Case can be overridden by genitive Case.
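
The constraint forms listed above lend themselves to a small recursive matcher. The sketch below is one possible reading, not PAPPI's own code: satisfies/2 and plain_feature/1 are hypothetical names, the item is reduced to its feature list, and the if(F) form is left out because it controls the add-feature step rather than the match itself.

```prolog
:- use_module(library(lists)).

% satisfies(+Constraint, +Fs): Fs is the candidate item's feature list.
satisfies([], _).                               % [] matches any item
satisfies([C|Cs], Fs) :-                        % [F1,..,Fn]
    satisfies(C, Fs),
    satisfies(Cs, Fs).
satisfies(not(F), Fs) :-                        % not(F)
    \+ member(F, Fs).
satisfies(eval(if(F), G), Fs) :- !,             % eval(if(F),G)
    (   member(F, Fs) -> call(G) ; true ).
satisfies(eval(F, G), Fs) :-                    % eval(F,G)
    member(F, Fs),
    call(G).
satisfies(F, Fs) :-                             % plain feature F
    plain_feature(F),
    member(F, Fs).

% A plain feature is anything that is not one of the special forms.
plain_feature(F) :-
    F \= [], F \= [_|_], F \= not(_), F \= eval(_, _), F \= if(_).

% ?- satisfies([morph(_, []), not(inf(_))], [morph(sleep, []), agr(_)]).
% true.
```

Matching by member/2 unifies the constraint with the item's feature, so open slots can become instantiated during the match, and the goal G in eval/2 sees those bindings.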

    [A note on leftec/rightec: Consecutive leftec markers will not introduce multiple empty categories. For instance, in the Turkish implementation, siyahları (black-pl-acc), to be interpreted as "the black ones", is defined as being an adjective followed by the plural and accusative Case markers. Both of these are leftecs. However, only one empty noun marked for plural number and accusative Case should be generated. Hence, for empty noun generation, PAPPI automatically batches up consecutive leftec and rightec markers.]

    Add Features

    In general, A in left/right/leftec/rightec specifies the features to be added to the matching item. Note that feature instantiation can be carried out during the matching phase. The following table gives the possible ranges of values for the add features:

    Add Feature    Description
    [] do nothing.
    A A a feature. A is added to the feature list for the matching item if there is no feature already in item that unifies with A.

    Note: compare with add features new(A) and override(A).

    new(A) A a feature. A is added to the feature list for the matching item provided the feature is not already present.

    Note: in cases where feature A contains slots, e.g. as in f(V), f/1 is considered to be already present if a feature with the same functor and arity already exists in the matching item. In these cases, f(V) is not added and add feature succeeds quietly.

    override(A) A a feature. A is added to the feature list for the matching item.

    Note: in contrast with new(A) and A, override(A) applies whether or not the feature already exists.

    [A1,..,An] Features A1 through An are added to the matching item.
    modify(F,A) F and A features. Matching item must have a feature unifiable with F. If so, A is added to the item.


    In the Japanese implementation, the past tense marker modifies the morph feature of the verb to its left, namely X:

    modify(F,A,G) G a goal, and F and A features. Matching item must have a feature unifiable with F. If so, goal G must hold and A is added to the item.
    suffix(S) S a simple atom or a concatenation expression of the form X+Y where X and Y are simple atoms. The word for the matching item is suffixed with S. If S is not simple, X and Y are first concatenated to form a simple atom.
    suffix(S,K) As for suffix(S) except that K is concatenated onto the end of the value of the k(_) feature for the matching item. As in the case of S, K may be simple or a concatenation expression.


    In the Japanese implementation, the nominative Case marker ga marks the noun to its left by adding -ga to the word and the EUC code a4ac to its k(_) feature:

    See the earlier example of the past tense marker for an example where S and K are non-simple.
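
The difference between the add features A, new(A) and override(A) can be made concrete with a short sketch. It is an interpretation under stated assumptions rather than PAPPI's definition: add_feature/3 and its helpers are hypothetical, a plain A that unifies with an existing feature leaves the list untouched, and override(A) is assumed to replace any existing feature with the same functor and arity (the table only says that A is added regardless). The modify and suffix forms are omitted.

```prolog
:- use_module(library(lists)).

% add_feature(+Add, +Fs0, -Fs)
add_feature([], Fs, Fs).                        % [] : do nothing
add_feature([A|As], Fs0, Fs) :-                 % [A1,..,An]
    add_feature(A, Fs0, Fs1),
    add_feature(As, Fs1, Fs).
add_feature(new(A), Fs0, Fs) :- !,              % same functor/arity blocks it
    functor(A, Name, Arity),
    (   has_functor(Name, Arity, Fs0) -> Fs = Fs0 ; Fs = [A|Fs0] ).
add_feature(override(A), Fs0, [A|Fs1]) :- !,    % assumption: replaces
    functor(A, Name, Arity),
    drop_functor(Name, Arity, Fs0, Fs1).
add_feature(A, Fs0, Fs) :-                      % plain A
    A \= [], A \= [_|_],
    (   \+ member(A, Fs0) -> Fs = [A|Fs0] ; Fs = Fs0 ).

% True if some feature in the list has the given name and arity.
has_functor(Name, Arity, Fs) :-
    member(F, Fs), functor(F, Name, Arity), !.

% Removes every feature with the given name and arity.
drop_functor(_, _, [], []).
drop_functor(Name, Arity, [F|Fs], Out) :-
    (   functor(F, Name, Arity) -> Out = Rest ; Out = [F|Rest] ),
    drop_functor(Name, Arity, Fs, Rest).

% ?- add_feature(new(agr(pl)), [agr(_)], Fs).
% Fs = [agr(_)]                 (agr/1 already present, nothing added)
% ?- add_feature(override(agr([3,pl,[]])), [agr(_), ec(_)], Fs).
% Fs = [agr([3,pl,[]]), ec(_)]
```

Under this reading, new(A) is the most conservative form, plain A adds unless something unifiable is already there, and override(A) always leaves A on the item.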

    In summary, markers are general devices that may be used wherever words or morphemes can simply be reduced to features attached to lexical items. See the Hungarian and Turkish lexicons for more examples of how morpheme markers are used in conjunction with the contraction mechanism.

  2. If the word is a compound structure consisting of words W1..Wn, n>1, each component Wi is matched against the lexicon using:
    lexicon(Wi,C,Fs)     for 1<=i<=n
    Each Wi must share the same category label C. A compound zero-level category is formed:
    [C Word]     where Word is W1,..,Wn concatenated.
    Each compound structure must also obey the following rules:

References: lexicon / k(_)

ParsePF Hook

Finally, ParsePF Hook is initially defined to be a null stage. It may be overridden as needed by individual lexicons to perform additional transformations on the input. Its initial definition is as follows:
For instance, see the Hungarian or Japanese lexicon for examples of how ParsePF Hook can be redefined to fill in default lexical feature values.


The output of ParsePF is a sequence of zero-level categories with markers resolved into features that attach to relevant lexical or specially-introduced empty categories.
