SIGLEX has so far identified 59 lexical resources of special interest to SIGLEX members. You can search this list by resource name, type, language, or keywords. You can also suggest a lexical resource to be added to this list, or consult the ACL List of Resources by Language. (A prior set of links to SIGLEX Online Resources is currently being integrated into the SIGLEX Lexical Resources database, but some older links may still be of interest.)

Google Web 1T 5-Gram Database

Primary resource type: Tools and software: Concordancers; Other resource tags: None; Resource language: Not language-specific; Availability: Public; Sponsor: Computational Linguistics group at the Institute of Cognitive Science, University of Osnabrück

The Google Web 1T 5-Gram Database is a collection of frequent 5-grams extracted from approximately 1 trillion words of Web text collected by Google Research. This Web interface lets you run interactive queries on an indexed version of the database to (1) display the most frequent N-grams matching a specified search pattern, (2) search for collocational patterns such as "carrying * to *" (where * marks collocate positions) and rank the results by association strength, using one of four standard association measures, or (3) determine pseudo-collocations of a given node word, ranked according to one of five standard association measures.
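
As a rough illustration of the kind of query described above, the following Python sketch matches a wildcard pattern against a handful of 5-grams and ranks the words filling the * positions. Everything in it is an assumption made for illustration: the counts are invented rather than taken from the Web 1T data, the function names (match, collocate_counts, rank_by_pmi) are hypothetical, and pointwise mutual information is used only as one common association measure, since the description does not say which measures the interface offers.

```python
import math
from collections import Counter

# Toy 5-gram counts, invented for illustration (not real Web 1T figures).
NGRAMS = {
    ("carrying", "water", "to", "the", "village"): 120,
    ("carrying", "coals", "to", "Newcastle", "is"): 300,
    ("carrying", "water", "to", "the", "fields"): 80,
    ("bringing", "water", "to", "the", "village"): 60,
}


def match(pattern, ngrams):
    """Return (ngram, count) pairs whose leading tokens match the pattern,
    most frequent first. '*' matches any single token."""
    tokens = pattern.split()
    hits = [
        (gram, count)
        for gram, count in ngrams.items()
        if len(gram) >= len(tokens)
        and all(p == "*" or p == g for p, g in zip(tokens, gram))
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def collocate_counts(pattern, ngrams):
    """Sum the counts for each tuple of words that fills the '*' slots."""
    tokens = pattern.split()
    slots = [i for i, p in enumerate(tokens) if p == "*"]
    totals = Counter()
    for gram, count in match(pattern, ngrams):
        totals[tuple(gram[i] for i in slots)] += count
    return totals


def pmi(joint, left, right, total):
    """Pointwise mutual information (base 2) from raw frequencies."""
    return math.log2(joint * total / (left * right))


def rank_by_pmi(pattern, ngrams):
    """Rank collocate pairs by PMI. Assumes the pattern has exactly two '*'
    slots; marginals are computed over the matched n-grams only."""
    pairs = collocate_counts(pattern, ngrams)
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (first, second), count in pairs.items():
        left[first] += count
        right[second] += count
    scored = {
        pair: pmi(count, left[pair[0]], right[pair[1]], total)
        for pair, count in pairs.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for gram, count in match("carrying * to *", NGRAMS):
        print(count, " ".join(gram))
    for pair, score in rank_by_pmi("carrying * to *", NGRAMS):
        print(pair, round(score, 3))
```

Ranking by raw frequency and ranking by an association score can disagree, which is why the two are kept as separate functions in this sketch; the actual interface may compute its marginal frequencies and scores differently.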