Document-level Discourse Parser

About

This package includes:

  • A discourse segmenter
  • A discourse parser

Related publications

  • Shafiq Joty, Giuseppe Carenini, and Raymond Ng. 2015. CODRA: A Novel Discriminative Framework for Rhetorical Analysis. Computational Linguistics, 41(3), MIT Press.
    @article{jotycodra, title={CODRA: A Novel Discriminative Framework for Rhetorical Analysis}, author={Joty, Shafiq and Carenini, Giuseppe and Ng, Raymond T}, journal={Computational Linguistics}, volume={41}, number={3}, publisher={MIT Press}, year={2015} }
  • Shafiq Joty, Giuseppe Carenini, Raymond Ng, and Yashar Mehdad. 2013. Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria.
    @inproceedings{joty2013combining, title={Combining Intra-and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis.}, author={Joty, Shafiq R and Carenini, Giuseppe and Ng, Raymond T and Mehdad, Yashar}, booktitle={Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics}, pages={486--496}, year={2013} }
  • Shafiq Joty, Giuseppe Carenini, and Raymond Ng. 2012. A Novel Discriminative Framework for Sentence-Level Discourse Analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the Conference on Natural Language Learning (EMNLP-CoNLL 2012), Jeju, Korea.
    @inproceedings{joty2012novel, title={A novel discriminative framework for sentence-level discourse analysis}, author={Joty, Shafiq and Carenini, Giuseppe and Ng, Raymond T}, booktitle={Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, pages={904--915}, year={2012}, organization={Association for Computational Linguistics} }

Download

Demo

Installation

Required for the discourse segmenter:

  1. Charniak's reranking parser. Put it in Tools/CharniakParserRerank and install it there.
  2. The UIUC taggers. Download the POS tagger and the shallow chunker (LBJPOS.jar, LBJChunk.jar, LBJ2.jar, LBJ2Library.jar) and put them in Tools/UIUC_TOOLs/.
  3. scikit-learn and SciPy. Install both.
  4. Java. Install it if it is not already installed.
  5. Make sure that Tools/SPADE_UTILS/bin/edubreak is executable (for example, with chmod +x).
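
The checks above can be sketched as a small script. This is only an illustration, not part of the package: the function name and the `root` argument (the directory where the package was unpacked) are assumptions, and the script verifies only that files and commands are present, not that the Charniak parser was built correctly.

```python
import importlib.util
import os
import shutil

# Jar names and directory names are taken from the installation steps above.
UIUC_JARS = ["LBJPOS.jar", "LBJChunk.jar", "LBJ2.jar", "LBJ2Library.jar"]


def missing_segmenter_prereqs(root="."):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []

    # Step 1: the Charniak parser directory must exist.
    charniak = os.path.join(root, "Tools", "CharniakParserRerank")
    if not os.path.isdir(charniak):
        problems.append("missing directory: " + charniak)

    # Step 2: all four UIUC jars must be in Tools/UIUC_TOOLs/.
    for jar in UIUC_JARS:
        path = os.path.join(root, "Tools", "UIUC_TOOLs", jar)
        if not os.path.isfile(path):
            problems.append("missing jar: " + path)

    # Step 3: scikit-learn and SciPy must be importable.
    for module in ("sklearn", "scipy"):
        if importlib.util.find_spec(module) is None:
            problems.append("Python module not importable: " + module)

    # Step 4: a java executable must be on PATH.
    if shutil.which("java") is None:
        problems.append("java not found on PATH")

    # Step 5: edubreak must exist and carry the executable bit.
    edubreak = os.path.join(root, "Tools", "SPADE_UTILS", "bin", "edubreak")
    if not os.access(edubreak, os.X_OK):
        problems.append("not executable (or missing): " + edubreak)

    return problems
```

Calling missing_segmenter_prereqs() from the package directory and printing the returned list shows which steps still need attention.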

Required for the discourse parser:

  1. Install WordNet (for example, on Ubuntu you can run: apt-get install science-linguistics) and set the WNHOME environment variable to the WordNet directory. WNHOME should contain the dictionary files.
  2. Install WordNet::QueryData (http://search.cpan.org/dist/WordNet-QueryData/QueryData.pm; a copy is also provided with this package). To install it properly, you may need to set the $wnHomeUnix and $wnPrefixUnix variables to the appropriate directories.
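
Before running the parser, it can help to confirm that WNHOME is set. The sketch below is a hypothetical helper, not part of the package. Because the names of the dictionary files are not listed here, it checks only that WNHOME points to a non-empty directory.

```python
import os


def wnhome_problem(environ=None):
    """Return a description of what is wrong with WNHOME, or None if it looks usable."""
    environ = os.environ if environ is None else environ
    wnhome = environ.get("WNHOME")
    if not wnhome:
        return "WNHOME is not set"
    if not os.path.isdir(wnhome):
        return "WNHOME is not a directory: " + wnhome
    # Assumption: an empty directory cannot contain the dictionary files.
    if not os.listdir(wnhome):
        return "WNHOME is empty: " + wnhome
    return None
```

A return value of None means only that the directory exists and has content; it does not prove that WordNet::QueryData can read it.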

Usage

To parse raw text, run the discourse segmenter first, then run the discourse parser on the segmented file.

Running the discourse segmenter:

$ python Discourse_Segmenter.py <infile>

If the segmenter reports an error in the apply_model method while loading the model, the cause is a difference between the version of scikit-learn that saved the logistic regression model and the version installed on your machine. To work around it, uncomment the "train_model" call in the do_segment method and run the segmenter once. This retrains the model and saves it. After one successful run you do not need train_model again, so comment the call out to save time on later runs.
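
The workaround above follows a general load-or-retrain pattern, sketched below. This is not the package's code: load_or_train, model_path, and train_fn are hypothetical names, and storing the model with pickle is an assumption made for the illustration.

```python
import pickle


def load_or_train(model_path, train_fn):
    """Load a saved model; if loading fails, retrain once and overwrite the saved file."""
    try:
        with open(model_path, "rb") as handle:
            return pickle.load(handle)
    except Exception:
        # Covers a missing file as well as a file written by an incompatible library version.
        model = train_fn()
        with open(model_path, "wb") as handle:
            pickle.dump(model, handle)
        return model
```

After the first successful retraining, later calls load the saved file directly, which matches the advice to run train_model only once.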

Running the discourse parser:

$ python Discourse_Parser.py <discourse segmented file>
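
The two steps can be chained from Python, as in the sketch below. The function names are hypothetical. The location of the segmenter's output is not stated here, so the segmented file path is an argument that the caller must supply.

```python
import subprocess


def build_commands(infile, segmented_file, python="python"):
    """Return the two command lines from the usage section, in the order they must run."""
    return [
        [python, "Discourse_Segmenter.py", infile],
        [python, "Discourse_Parser.py", segmented_file],
    ]


def run_pipeline(infile, segmented_file, python="python"):
    """Run the segmenter, then the parser; stop at the first command that fails."""
    for command in build_commands(infile, segmented_file, python):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

run_pipeline should be called from the directory that contains both scripts, because the commands use relative script paths.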

License

The Discourse Parser is open-source software released under the Common Public License. You are welcome to use the code under the terms of the license for research purposes ONLY. Please acknowledge its use with a citation.