Interactive Search
Motivation
For ease of interactive exploration and comparison between our three target representations, there is an experimental search interface (hosted at the University of Oslo; see Kouylekov & Oepen, 2014). Using a query-by-example approach, it is possible to retrieve instances of specific semantic phenomena, across different annotations, and inspect matching semantic dependency graphs graphically.
Query Language
The SDP search interface interprets a simple set of search operators, collectively dubbed the WeSearch Query Language (WQL). By way of informal introduction, consider the following example query:
/v*[ARG* x]
quarterly[ARG1 x]
x:+result
The query above comprises three predications, conventionally shown one per line. In this example, the following characters have operator status: ‘/’ (slash), ‘*’ (asterisk), ‘[’ and ‘]’ (left and right square brackets), ‘:’ (colon), and ‘+’ (plus sign). These cover many, though not all, of the operator characters in WQL (see the full list below). Each predication can be composed of (i) an identifier, followed by a colon, if present; (ii) a form pattern, if present; (iii) a lemma pattern, prefixed by a plus sign, if present; (iv) a part-of-speech (PoS) pattern, prefixed by a slash, if present; (v) a sense pattern, prefixed by an equal sign, if present; and (vi) a list of arguments, enclosed in square brackets, if present. Patterns can make use of Lucene-style wildcards, with the asterisk matching any number of characters and the question mark (‘?’) matching a single character.
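To make the wildcard semantics concrete, here is a minimal Python sketch that compiles a Lucene-style pattern into a regular expression, with ‘*’ matching any (possibly empty) substring and ‘?’ exactly one character, and requires the whole value to match. The function names are hypothetical; the optional case folding mirrors WQL's case-insensitive treatment of lemma, PoS, and role-label patterns, and letting a backslash make the next character literal is an assumption based on the escape operator in the full operator list. This is an illustration, not the WeSearch implementation.

```python
import re


def wildcard_to_regex(pattern: str, ignore_case: bool = False) -> re.Pattern:
    """Compile a Lucene-style wildcard pattern: '*' = any substring, '?' = one character."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Assumed escape behaviour: match the next character literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE if ignore_case else 0)


def wildcard_match(pattern: str, value: str, ignore_case: bool = False) -> bool:
    """True if the whole value (not just a substring) matches the pattern."""
    return wildcard_to_regex(pattern, ignore_case).fullmatch(value) is not None
```

For instance, ‘v*’ matches the tag ‘VBD’ only when case folding is on, and ‘ARG?’ matches ‘ARG1’ but not ‘ARG10’.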
Argument specifications in WQL take the form of role–value pairs, where roles draw from a fixed inventory of pre-defined argument labels (specific to each target representation), and values are predication identifiers defined in other parts of the query. The role label and value are separated by whitespace, and multiple arguments can be specified within the list by using a comma (‘,’) as the separator. In role labels, wildcards can be used just like in other query fields.
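The predication structure and argument lists described above can be illustrated with a small parser. The sketch below, using the hypothetical names ‘Predication’ and ‘parse_predication’, assumes that the fields appear in the order (i) to (vi), splits the argument list on commas and each role–value pair on whitespace, and records a missing role value as None (as in top-node queries). Boolean operators and backslash escapes are not handled; it only shows how a single predication decomposes, not how WeSearch itself parses queries.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Predication:
    top: bool = False             # leading '^'
    ident: Optional[str] = None   # e.g. 'x' in 'x:+result'
    form: Optional[str] = None
    lemma: Optional[str] = None   # after '+'
    pos: Optional[str] = None     # after '/'
    sense: Optional[str] = None   # after '='
    args: List[Tuple[str, Optional[str]]] = field(default_factory=list)  # (role, value)


# Assumption: fields occur in the order listed in the text.
_PREDICATION = re.compile(
    r"(?P<top>\^)?"
    r"(?:(?P<ident>[^:\s\[\]+/=]+):)?"
    r"(?P<form>[^:\s\[\]+/=]*)"
    r"(?:\+(?P<lemma>[^\s\[\]/=]+))?"
    r"(?:/(?P<pos>[^\s\[\]=]+))?"
    r"(?:=(?P<sense>[^\s\[\]]+))?"
    r"(?:\[(?P<args>[^\]]*)\])?"
)


def parse_predication(text: str) -> Predication:
    m = _PREDICATION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a well-formed predication: {text!r}")
    args = []
    for pair in (m.group("args") or "").split(","):   # pairs are comma-separated
        parts = pair.split()                          # role and value: whitespace
        if parts:
            args.append((parts[0], parts[1] if len(parts) > 1 else None))
    return Predication(
        top=m.group("top") is not None,
        ident=m.group("ident"),
        form=m.group("form") or None,
        lemma=m.group("lemma"),
        pos=m.group("pos"),
        sense=m.group("sense"),
        args=args,
    )
```

Applied to the example query, ‘/v*[ARG* x]’ yields a PoS pattern ‘v*’ with one argument (‘ARG*’, ‘x’), and ‘x:+result’ yields identifier ‘x’ with lemma pattern ‘result’.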
Our example query above thus searches for a verbal predicate (any PoS tag starting with ‘v’) that takes as one of its arguments any form of the lemma ‘result’, where that same node is also the ARG1 of ‘quarterly’ (this query is designed for the DM representation, where regular argument relations are labeled ARG1 ... ARGn). The query processor ensures a one-to-one correspondence between query elements and matching graph elements, i.e. multiple distinct query components cannot match the same target graph component, or vice versa. Lemma and PoS patterns, as well as role labels, are not case-sensitive.
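The one-to-one matching requirement can be illustrated with a brute-force matcher. In this sketch, the query and the target graph use a hypothetical in-memory format (dicts of node properties and (source, role, target) arc triples); ‘_v’ and ‘_q’ are made-up identifiers for the anonymous predications of the example query; and the graph is a hand-built toy analysis of "The company reported quarterly results.", not an actual SDP item. Query nodes are mapped injectively onto graph nodes, and each query arc must claim a distinct graph arc. Treating forms as case-sensitive is an assumption, since the text lists only lemmas, PoS tags, and role labels as case-insensitive.

```python
import re
from itertools import permutations
from typing import Dict, Iterator, List, Optional, Tuple

Arc = Tuple[int, str, int]  # (source node, role label, target node)


def wild(pattern: Optional[str], value: Optional[str], ignore_case: bool = False) -> bool:
    """Lucene-style wildcard test; a missing (None) pattern matches anything."""
    if pattern is None:
        return True
    if value is None:
        return False
    regex = ".*".join(".".join(map(re.escape, chunk.split("?")))
                      for chunk in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE if ignore_case else 0) is not None


def node_ok(query_node: dict, graph_node: dict) -> bool:
    # Lemma and PoS patterns are case-insensitive; forms are assumed case-sensitive.
    return (wild(query_node.get("form"), graph_node.get("form"))
            and wild(query_node.get("lemma"), graph_node.get("lemma"), ignore_case=True)
            and wild(query_node.get("pos"), graph_node.get("pos"), ignore_case=True))


def arcs_ok(wanted: List[Tuple[int, str, Optional[int]]], arcs: List[Arc],
            used: frozenset = frozenset()) -> bool:
    """Backtracking: give every query arc its own, distinct graph arc."""
    if not wanted:
        return True
    (src, role, tgt), rest = wanted[0], wanted[1:]
    for i, (s, r, t) in enumerate(arcs):
        if (i not in used and s == src and (tgt is None or t == tgt)
                and wild(role, r, ignore_case=True)
                and arcs_ok(rest, arcs, used | {i})):
            return True
    return False


def matches(query: Dict[str, dict], nodes: Dict[int, dict],
            arcs: List[Arc]) -> Iterator[Dict[str, int]]:
    """Yield each injective mapping {query identifier: graph node} satisfying the query."""
    idents = list(query)
    for image in permutations(nodes, len(idents)):  # distinct graph nodes only
        binding = dict(zip(idents, image))
        if not all(node_ok(query[q], nodes[binding[q]]) for q in idents):
            continue
        wanted = [(binding[q], role, binding[value] if value is not None else None)
                  for q in idents for role, value in query[q].get("args", [])]
        if arcs_ok(wanted, arcs):
            yield binding


# The example query; '_v' and '_q' are made-up names for anonymous predications.
QUERY = {
    "_v": {"pos": "v*", "args": [("ARG*", "x")]},
    "_q": {"form": "quarterly", "args": [("ARG1", "x")]},
    "x": {"lemma": "result"},
}
# Hand-built toy graph (not real SDP data).
NODES = {
    1: {"form": "The", "lemma": "the", "pos": "DT"},
    2: {"form": "company", "lemma": "company", "pos": "NN"},
    3: {"form": "reported", "lemma": "report", "pos": "VBD"},
    4: {"form": "quarterly", "lemma": "quarterly", "pos": "JJ"},
    5: {"form": "results", "lemma": "result", "pos": "NNS"},
}
ARCS = [(1, "BV", 2), (3, "ARG1", 2), (3, "ARG2", 5), (4, "ARG1", 5)]
```

Enumerating permutations is exponential and only suitable for toy graphs; it is used here because it makes the injectivity constraint explicit. Two query nodes that both ask for ‘v*’, for example, find no match in this graph, because it contains only one verb.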
In addition to the query proper, the search interface provides a set of buttons to select which of the three target representations to query; this selection can affect the matching of representation-specific properties (e.g. lemmas and predicate senses) and the interpretation of underspecified role labels (see below). It is possible to search multiple representations in parallel (all three are active by default), and independently of which representations are active for the search, the items that matched the query are always presented for inspection in all target representations.
The result page uses a tabbed display, making it easy to switch between target representations and between graphical and tabular views of the matching items. Color highlighting indicates which parts of each result structure were matched by corresponding components of the query; as a single result can contain more than one match, the interface allows ‘cycling through’ individual matches, one by one.
Boolean Connectives
In our example query above, the individual predications are implicitly conjoined, i.e. all three need to be matched against a candidate result graph for the query to be satisfied (formally, the whitespace separating predications serves as a conjunction operator). Albeit with somewhat mixed feelings, we are further experimenting with additional boolean connectives in WQL, viz. negation (‘!’, exclamation mark) and disjunction (‘|’, vertical bar); to complement these logical operators, parentheses (‘(’ and ‘)’) can be used to group expressions, making explicit or overriding the scoping of logical operators. By default, negation and conjunction bind more strongly (i.e. scope more narrowly) than disjunction, which scopes widely, i.e. at the top level or within an enclosing logical group.
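These precedence rules can be sketched as a small recursive-descent parser: disjunction at the outermost level, conjunction (juxtaposition) below it, and prefix negation innermost, with parentheses restarting at the disjunction level. The functions ‘tokenize’ and ‘parse’ and the tuple-based tree are hypothetical; predications are kept as opaque strings (whitespace inside square brackets stays part of the predication), and backslash escapes are not handled.

```python
from typing import List, Tuple, Union

Expr = Union[str, Tuple]  # a predication string, or ("and"|"or"|"not", ...)


def tokenize(query: str) -> List[str]:
    """Split a query into '(', ')', '|', '!' and whole predications."""
    tokens: List[str] = []
    current, depth = "", 0
    for ch in query:
        if depth == 0 and (ch.isspace() or ch in "()|!"):
            if current:
                tokens.append(current)
                current = ""
            if not ch.isspace():
                tokens.append(ch)
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        current += ch
    if current:
        tokens.append(current)
    return tokens


def parse(tokens: List[str]) -> Expr:
    """'!' and implicit conjunction bind more strongly than '|'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def disjunction() -> Expr:
        items = [conjunction()]
        while peek() == "|":
            advance()
            items.append(conjunction())
        return items[0] if len(items) == 1 else ("or", *items)

    def conjunction() -> Expr:
        items = []
        while peek() not in (None, "|", ")"):
            items.append(unary())
        if not items:
            raise SyntaxError("expected a predication or group")
        return items[0] if len(items) == 1 else ("and", *items)

    def unary() -> Expr:
        tok = peek()
        if tok in (None, "|", ")"):
            raise SyntaxError("expected a predication or group")
        advance()
        if tok == "!":
            return ("not", unary())
        if tok == "(":
            inner = disjunction()
            if peek() != ")":
                raise SyntaxError("missing ')'")
            advance()
            return inner
        return tok

    expr = disjunction()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return expr
```

Under this reading, ‘a b | c’ parses as (a AND b) OR c. Without parentheses, the disjunctive object-equi query in the next section would attach ‘e:/v*[ARG1 x]’ only to the second disjunct, which is why that query groups the two disjuncts explicitly.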
More Examples
Following is a more complex example, searching for object equi verbs and taking advantage of an underspecified role label:
[ARG2 x, ARG* e]
e:/v*[ARG1 x]
A similar effect, requiring the ‘downstairs’ predicate to be an argument of the ‘upstairs’ one (assuming that ARG3 and ARG4 are the only role labels applicable here), could instead be achieved using a disjunctive statement; note that the two disjuncts must be grouped in parentheses, since conjunction binds more strongly than disjunction:
( [ARG2 x, ARG3 e] | [ARG2 x, ARG4 e] )
e:/v*[ARG1 x]
The following query demonstrates the use of the top operator (‘^’) to retrieve graphs rooted in a coordinate structure, i.e. where the top node has an outgoing dependency whose label matches the pattern ‘_*_c’ (again assuming the DM representation); here, the role value can be omitted, as there is no predication constraining the argument node:
^[_*_c]
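The semantics of this top-node query can be sketched in a few lines of Python: a graph matches if some top node has an outgoing arc whose label matches the role pattern, and because no role value is given, the arc's target is left unconstrained. The function name ‘top_has_arc’, the arc-triple representation, and the toy graphs in the usage note are hypothetical and do not claim to reproduce actual DM analyses.

```python
import re
from typing import Iterable, Set, Tuple

Arc = Tuple[int, str, int]  # (source node, role label, target node)


def wildcard(pattern: str, value: str) -> bool:
    """Lucene-style '*'/'?' matching, case-insensitive as for WQL role labels."""
    regex = ".*".join(".".join(map(re.escape, chunk.split("?")))
                      for chunk in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE) is not None


def top_has_arc(tops: Set[int], arcs: Iterable[Arc], role_pattern: str) -> bool:
    """Sketch of ^[ROLE]: some top node has an outgoing arc whose label matches ROLE.

    The arc's target is ignored, since the query gives no role value."""
    return any(src in tops and wildcard(role_pattern, role) for src, role, _tgt in arcs)
```

With tops {3} and arcs [(3, "_and_c", 5)], the pattern ‘_*_c’ matches; with tops {2} and a ‘_or_c’ arc leaving node 1 instead, it does not, because only arcs leaving a top node count.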
As an example of the (experimental) use of negation to filter candidate results, the following query matches occurrences of verbal nodes that have no outgoing or incoming argument links:
x:/v*
!x:[* y]
![* x]
However, as of early August 2014, the definition and implementation of boolean operators in WQL are, to some degree, still work in progress.
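Because negation is still being defined, the following Python sketch encodes just one plausible reading of the negated query above: keep the nodes whose PoS matches ‘v*’ and that have neither an outgoing arc with any role label (‘!x:[* y]’) nor an incoming one (‘![* x]’). The function name ‘unattached_nodes’, the PoS-only node representation, and the toy graph are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple


def wildcard(pattern: str, value: str) -> bool:
    """Lucene-style '*'/'?' matching, case-insensitive as WQL PoS patterns are."""
    regex = ".*".join(".".join(map(re.escape, chunk.split("?")))
                      for chunk in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE) is not None


def unattached_nodes(pos_of: Dict[int, str], arcs: List[Tuple[int, str, int]],
                     pos_pattern: str = "v*") -> List[int]:
    """Nodes matching pos_pattern that have no outgoing and no incoming arcs."""
    hits = []
    for node, pos in pos_of.items():
        if not wildcard(pos_pattern, pos):  # x:/v*
            continue
        # The role pattern '*' accepts any label, so any arc at all counts.
        has_outgoing = any(src == node for src, _role, _tgt in arcs)  # ruled out by !x:[* y]
        has_incoming = any(tgt == node for _src, _role, tgt in arcs)  # ruled out by ![* x]
        if not (has_outgoing or has_incoming):
            hits.append(node)
    return hits


# Hypothetical toy graph: node id -> PoS tag, plus (source, role, target) arcs.
print(unattached_nodes({1: "DT", 2: "NN", 3: "VBD", 4: "VBN"},
                       [(1, "BV", 2), (3, "ARG1", 2)]))  # prints [4]
```

Node 3 is verbal but has an outgoing ARG1 arc, so only the unattached verbal node 4 survives the filter.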
Full List of Operators
- ^ (caret), constrains the node to be a top node (must be predication-initial);
- : (colon), separates optional node identifier from node content;
- [ and ] (left and right square brackets), enclose the list of outgoing arcs;
- (whitespace), separates role labels and values in list of arcs;
- , (comma), separates role–value pairs within list of outgoing arcs;
- = (equal sign), indicates (optional) sense object property;
- + (plus sign), indicates (optional) lemma object property;
- / (slash), indicates (optional) PoS property;
- ? (question mark), Lucene-style single-character wildcard;
- * (asterisk), Lucene-style arbitrary sub-string wildcard;
- ( and ) (left and right parentheses), group sub-expressions (see Boolean Connectives);
- | (vertical bar), logical disjunction of predications or groups;
- ! (exclamation mark), reserved for negation (must precede a predication or logical group);
- \ (backslash), escape character, suppresses operator status for any of the above.
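As a rough illustration of how the operator inventory above might be tokenized, the following Python sketch splits a query into operator, whitespace, and literal tokens, honoring the backslash escape. The function ‘lex’ and its token kinds (‘OP’, ‘WS’, ‘LIT’) are hypothetical, not part of WeSearch; it treats every listed operator character as a separate token, collapses runs of whitespace, and leaves the grouping of tokens into predications to a later stage.

```python
from typing import List, Tuple

# Operator characters from the list above; whitespace and '\' are handled separately.
OPERATORS = frozenset("^:[],=+/?*()|!")


def lex(query: str) -> List[Tuple[str, str]]:
    """Split a WQL query into ("OP", char), ("WS", " ") and ("LIT", text) tokens."""
    tokens: List[Tuple[str, str]] = []
    literal: List[str] = []

    def flush() -> None:
        if literal:
            tokens.append(("LIT", "".join(literal)))
            literal.clear()

    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            literal.append(query[i + 1])  # escaped character loses operator status
            i += 2
            continue
        if ch.isspace():
            flush()
            if not tokens or tokens[-1][0] != "WS":
                tokens.append(("WS", " "))  # runs of whitespace collapse to one token
        elif ch in OPERATORS:
            flush()
            tokens.append(("OP", ch))
        else:
            literal.append(ch)
        i += 1
    flush()
    return tokens
```

For example, escaping the slash in ‘x:+3\/4’ keeps ‘3/4’ together as a single literal lemma pattern instead of starting a PoS pattern.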