UWM: Applying an Existing Trainable Semantic Parser to Parse Robotic Spatial Commands

This paper describes Team UWM's system for Task 6 of SemEval 2014, supervised semantic parsing of robotic spatial commands. An existing semantic parser, KRISP, was trained using the provided training data of natural language robotic spatial commands paired with their meaning representations in the formal robot command language. The entire process required very little manual effort. Without using the additional annotations of word-aligned semantic trees, the trained parser was able to exactly parse new commands into their meaning representations with a best F-measure of 51.18% at 72.67% precision and 39.49% recall. Results show that the parser was particularly accurate for short sentences.


Introduction
Semantic parsing is the task of converting natural language utterances into complete formal meaning representations that are executable by some application. Example applications of semantic parsing include giving natural language commands to robots and querying databases in natural language. Some early semantic parsers were developed manually for specific applications (Woods, 1977; Warren and Pereira, 1982). However, such semantic parsers were generally brittle, and building them required a lot of manual effort. In addition, these parsers could not be ported to another application without again investing significant manual effort.
More recently, several semantic parsers have been developed using machine learning (Zelle and Mooney, 1996; Ge and Mooney, 2005; Zettlemoyer and Collins, 2005; Wong and Mooney, 2006; Kate and Mooney, 2006; Lu et al., 2008; Kwiatkowski et al., 2011). In this approach, training data is first created for the domain of interest. Then using one of the many machine learning methods and semantic parsing frameworks, a semantic parser is automatically learned from the training data (Mooney, 2007). The trained semantic parser is then capable of parsing new natural language utterances into their meaning representations. Semantic parsers built using machine learning tend to be more robust and can be easily ported to other application domains with appropriate domain-specific training data.
This work is licensed under a Creative Commons Attribution 4.0 International Licence. Page numbers and proceedings footer are added by the organisers. Licence details: http://creativecommons.org/licenses/by/4.0/
Task 6 of SemEval 2014 provided a new application domain for semantic parsing along with training and test data. The domain involved giving natural language commands to a robotic arm which would then move blocks on a board (Dukes, 2013). The domain was inspired by the classic AI system SHRDLU (Winograd, 1972). The training data contained 2500 examples of sentences paired with their meaning representations in the Robot Command Language (RCL), which was designed for this domain (Dukes, 2013). The test data contained 909 such example pairs.
We trained an existing and freely available semantic parser, KRISP (Kate and  using the training data for this domain. Apart from changing the format of the data for running KRISP and writing a context-free grammar for the meaning representation language RCL, the entire process required minimal manual effort. The author spent less than a week participating in Task 6, and most of that time was spent running the experiments. This demonstrates that trainable semantic parsers like KRISP can be rapidly adapted to new domains. In the Results section we show the precision and recall that KRISP obtained at different confidence levels in the form of a precision-recall curve. The results also show that the parser was particularly accurate on shorter sentences. Two major reasons prevented KRISP from performing better on this domain: its high memory demand, which prevented it from being trained on more than 1500 training examples, and some variability in the meaning representation language RCL that negatively affected training as well as evaluation.

Background: KRISP Semantic Parser
KRISP (Kernel-based Robust Interpretation for Semantic Parsing) is a trainable semantic parser (Kate and ) that uses Support Vector Machines (SVMs) (Cristianini and Shawe-Taylor, 2000) as the machine learning method with a string subsequence kernel (Lodhi et al., 2002). It takes natural language utterances and their corresponding formal meaning representations as the training data, along with the context-free grammar of the meaning representation language (MRL). The key idea in KRISP is that every production of the MRL is treated as a semantic concept. For every MRL production, an SVM classifier is trained so that, for any input natural language substring of words, it can give the probability that the substring expresses the corresponding semantic concept.
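To give a feel for how a subsequence-based kernel compares two word strings, the following is a minimal sketch. It is a simplification and not KRISP's actual kernel: it counts matching pairs of non-empty word subsequences without the gap weighting used in the string subsequence kernel of Lodhi et al. (2002), and the function name `common_subsequence_count` is hypothetical.

```python
def common_subsequence_count(s, t):
    """Count pairs of matching non-empty word subsequences of s and t.

    Simplified, unweighted stand-in for a gap-weighted subsequence kernel
    (assumption of this sketch: every match has weight 1).
    """
    rows, cols = len(s) + 1, len(t) + 1
    # k[i][j] = number of matching subsequence pairs in s[:i] and t[:j]
    k = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            # Inclusion-exclusion over pairs that do not use both s[i-1] and t[j-1]
            k[i][j] = k[i - 1][j] + k[i][j - 1] - k[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # Extend every earlier pair with this match, or start a new one
                k[i][j] += k[i - 1][j - 1] + 1
    return k[-1][-1]


a = "pick up the turquoise pyramid".split()
b = "pick up the red pyramid".split()
c = "move the red cube".split()
print(common_subsequence_count(a, b))  # shares "pick", "up", "the", "pyramid"
print(common_subsequence_count(a, c))  # shares only "the"
```

Phrases that share more words in the same order receive a higher score, which is the kind of similarity signal a kernel-based classifier can use to decide whether a substring expresses a given production.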
Once these classifiers are trained, parsing a sentence reduces to finding the most probable semantic derivation of the sentence, in which different productions cover different parts of the sentence and together form a complete meaning representation. Figure 1 shows an example semantic derivation of a robotic spatial command. Productions of the RCL grammar (Table 1) are shown at the tree nodes, together with the parts of the sentence they cover.
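The notion of a scored semantic derivation can be sketched as a small tree structure. This is an illustration only: the `Node` class, the production strings, the word spans, and the probabilities are all hypothetical, and taking the probability of a derivation to be the product of the probabilities at its nodes is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    production: str          # MRL production placed at this node (hypothetical strings)
    span: tuple[int, int]    # inclusive word indices covered in the sentence
    prob: float              # classifier probability for this production on this span
    children: list["Node"] = field(default_factory=list)


def derivation_probability(node: Node) -> float:
    """Multiply the probabilities of all nodes in the derivation tree."""
    return node.prob * prod(derivation_probability(c) for c in node.children)


# Toy derivation for "pick up the turquoise pyramid" (all numbers are made up)
derivation = Node("*event: -> ( event: *action: *entity: )", (0, 4), 0.90, [
    Node("*action: -> ( action: take )", (0, 1), 0.95),
    Node("*entity: -> ( entity: *color: *type: )", (2, 4), 0.80, [
        Node("*color: -> ( color: cyan )", (3, 3), 0.90),
        Node("*type: -> ( type: prism )", (4, 4), 0.85),
    ]),
])
print(derivation_probability(derivation))
```

A parser in this style would search over many such candidate trees and return the one with the highest score.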
Since the training data is not in the form of such semantic derivations, an EM-like iterative algorithm is used to collect appropriate positive and negative examples in order to train the classifiers (Kate and . Positive examples are collected from correct semantic derivations derived by the parser learned in the previous iteration, and negative examples are collected from the incorrect semantic derivations. KRISP was shown to work well on the US geography database query domain (Tang and Mooney, 2001) as well as on the RoboCup Coach Language (CLang) domain (Kate et al., 2005). It was also shown to be particularly robust to noise in the natural language utterances (Kate and . KRISP was later extended to do semi-supervised semantic parsing (Kate and Mooney, 2007b), to learn from ambiguous supervision in which multiple sentences could be paired with a single meaning representation in the training data (Kate and Mooney, 2007a), and to transform the MRL grammar to improve semantic parsing (Kate, 2008).
Figure 1: Semantic derivation of the robotic spatial command "pick up the turquoise pyramid" obtained by KRISP during testing which gives the correct RCL representation (event: (action: take) (entity: (color: cyan) (type: prism))).

Methods
In order to apply KRISP to Task 6 of SemEval 2014, the format of the provided data was first changed to the XML-type format that KRISP accepts. The data contained several instances of co-references, which are also part of RCL, but KRISP was not designed to handle co-references and expects them to be pre-resolved. We observed that almost all co-references in the meaning representations, indicated by the "reference-id" token, resolved to the first occurrence of an "entity" element in the meaning representation. This was found to be true in more than 99% of the cases. We used this observation to resolve co-references during semantic parsing in the following way. As a pre-processing step, we first remove from the meaning representations all the "id:" tokens (these resolve the references) but keep the "reference-id:" tokens (these encode the presence of co-references). The natural language sentences are not modified in any way, and the parser learns from the training data to relate words like "it" and "one" to the RCL token "reference-id". After KRISP generates a meaning representation during testing, as a post-processing step, "id: 1" is added to the first "entity" element in the meaning representation if it contains the "reference-id:" token.
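The pre- and post-processing steps described above can be sketched as two small string transformations. This is a minimal illustration, not the code used in the experiments: the function names are hypothetical, and the exact token layout (an "(id: N)" element placed directly after "(entity:") is an assumption about how RCL is written out.

```python
import re

# Matches "(id: N)" plus trailing whitespace; it does not match "(reference-id: N)"
# because the opening parenthesis must be immediately followed by "id:".
ID_TOKEN = re.compile(r"\(id:\s*\d+\)\s*")


def strip_ids(mr: str) -> str:
    """Pre-processing: remove all "id:" tokens, keep "reference-id:" tokens."""
    return ID_TOKEN.sub("", mr)


def restore_first_id(mr: str) -> str:
    """Post-processing: add "id: 1" to the first entity if a co-reference is present."""
    if "reference-id:" not in mr:
        return mr
    return mr.replace("(entity:", "(entity: (id: 1)", 1)


# Hypothetical RCL-style meaning representation for a command such as
# "pick up the red cube and put it on the blue cube"
gold = ("(sequence: (event: (action: take) (entity: (id: 1) (color: red) (type: cube))) "
        "(event: (action: drop) (entity: (type: reference) (reference-id: 1)) "
        "(destination: (spatial-relation: (relation: above) "
        "(entity: (color: blue) (type: cube))))))")

stripped = strip_ids(gold)
print(stripped)
print(restore_first_id(stripped) == gold)
```

Because the heuristic always attaches "id: 1" to the first entity, it would give a wrong result for the small fraction of cases where the reference resolves to a later entity.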
The context-free grammar for RCL was not provided by the Task organizers. There are multiple ways to write a context-free grammar for a meaning representation language, and those that conform better to natural language work better for semantic parsing (Kate, 2008). We manually wrote a grammar for RCL which mostly followed the structure of the meaning representations; as these already conformed closely to the natural language commands, writing the grammar was straightforward. KRISP runs faster if there are fewer non-terminals on the right-hand side (RHS) of the grammar productions because that makes the search for the most probable semantic derivation faster. Hence we kept the non-terminals on the RHS as few as possible while writing the grammar. Table 1 shows the entire grammar for RCL that we wrote and gave to KRISP. The non-terminals are indicated with a leading "*". We point out that KRISP needs a grammar only for the meaning representation language (an application will need it anyway if the statements are to be executed) and not for the natural language.
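As a small illustration of the "*" convention and of counting non-terminals on the right-hand side, consider the sketch below. The productions listed are hypothetical examples in the style described here and are not taken from Table 1; only "*color: → ( color: red )" appears in this paper's text. The helper name `rhs_nonterminals` is also hypothetical.

```python
def rhs_nonterminals(production: str) -> list[str]:
    """Return the non-terminals (tokens starting with "*") on the right-hand side."""
    _lhs, rhs = production.split("→")
    return [tok for tok in rhs.split() if tok.startswith("*")]


# Hypothetical productions written with the "*" non-terminal convention
productions = [
    "*event: → ( event: *action: *entity: )",
    "*entity: → ( entity: *color: *type: )",
    "*color: → ( color: red )",
]

for p in productions:
    print(len(rhs_nonterminals(p)), p)
```

A check like this makes it easy to spot productions with many right-hand-side non-terminals, which are the ones that would slow down the search for the most probable derivation.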
KRISP's training algorithm can be aided by providing it with information about which natural language words are usually used to express the concept of a production. For example, the word "red" usually expresses "*color: → ( color: red )". The data provided with Task 6 came with word-aligned semantic trees which indicated which natural language words corresponded to which meaning representation components. This information could have been used to aid KRISP; however, we found many inconsistencies and errors in the provided word-aligned semantic trees and chose not to use them. In addition, KRISP seemed to learn most of that information on its own anyway.
We further analyzed the results according to the lengths of the sentences and found that KRISP was very accurate with shorter sentences and became progressively less accurate as the lengths of the sentences increased. Table 2 shows these results. This could simply be because the longer the sentence, the greater the likelihood of making an error somewhere in it, and since no partial credit is given, the entire output meaning representation is then deemed incorrect.
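An analysis of this kind can be sketched as grouping test examples by sentence length and computing exact-match accuracy per group. The sketch below is an assumption-laden illustration: the function name, the bucket width of five words, and the toy examples are hypothetical and do not correspond to the buckets or numbers of Table 2.

```python
from collections import defaultdict


def accuracy_by_length(examples, bucket_size=5):
    """Exact-match accuracy per sentence-length bucket.

    examples: iterable of (sentence, predicted_mr, gold_mr); a prediction counts
    as correct only if it equals the gold meaning representation exactly.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for sentence, predicted, gold in examples:
        n_words = len(sentence.split())
        bucket = (n_words - 1) // bucket_size  # 0 for 1-5 words, 1 for 6-10, ...
        totals[bucket] += 1
        correct[bucket] += int(predicted == gold)
    return {
        (b * bucket_size + 1, (b + 1) * bucket_size): correct[b] / totals[b]
        for b in sorted(totals)
    }


# Toy data with placeholder meaning representations
toy = [
    ("pick up the red cube", "A", "A"),
    ("take the blue prism", "B", "B"),
    ("move the red cube on top of the blue cube", "C", "D"),
]
print(accuracy_by_length(toy))
```

Since a single wrong component makes the whole output count as incorrect, the accuracy in the longer buckets can be expected to fall even if the per-component error rate stays the same.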
On further error analysis we observed that there was some variability in the meaning representations. The "move" and "drop" actions seemed to mean the same thing and were used interchangeably. For example, in the training data the utterance "place the red block on single blue block" had "(action: drop)" in the corresponding meaning representation, while "place red cube on grey cube" had "(action: move)", but there is no apparent difference between the two cases. There were many such instances. This confused KRISP's training algorithm because it would collect the same phrase sometimes as a positive example and sometimes as a negative example. It also affected the evaluation, because KRISP would generate "move" where the gold representation had "drop", or vice versa, and the evaluator would count it as an error.

Conclusions
We participated in SemEval 2014 Task 6 on supervised semantic parsing of robotic spatial commands. We used an existing semantic parser learner, KRISP, and trained it on this domain, which required minimal time and effort on our part. The trained parser was able to map natural language robotic spatial commands into their formal robot command language representations with good accuracy, particularly for shorter sentences.