Recent bugfixes

Version 2.1 (6 Aug 2004)

- includes the new MontyNLGenerator component, which generates sentences and summaries

Version 2.0.1

- fixes an API bug in version 2.0 that prevented the Java API from being callable

What is MontyLingua?

MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information. MontyLingua makes traditionally difficult language processing tasks trivial!

Version 2.0 is substantially FASTER, MORE ACCURATE, and MORE RELIABLE than version 1.3.1. It has now been tested on Windows, many flavors of UNIX, Mac OS X, and several flavors of Java, and is in use by several university research projects and in several commercial settings.

MontyLingua differs from other natural language processing tools because:

- it is complete end-to-end: input raw text; output semantic interpretation
- it is not many dated tools and implementations sewn together; it is one well-integrated implementation
- it does not require "training" or other fiddling, and works right out of the box
- it is enriched with "common sense" knowledge about the everyday world, allowing it to avoid many stupid interpretive mistakes, e.g.: "(NX the/DT mosquito/NN bit/NN NX) (NX the/DT boy/NN NX)" ==corrected==> "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"
- it is lightweight and portable across platforms, written in portable Python and also available as a compiled Java library
- it is easy to customize by means of a user lexicon
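
The bracketed notation in the mosquito example above (chunks labelled NX or VX, each holding word/TAG tokens) is easy to read programmatically. The following is a minimal sketch in Python 3, not part of MontyLingua itself; the function name parse_chunks is hypothetical, and the sketch assumes every chunk is written as "(LABEL word/TAG ... LABEL)" exactly as in that example.

```python
import re

# Matches one chunk such as "(NX the/DT boy/NN NX)": an opening label,
# the chunk body, and the same label repeated before the closing paren.
CHUNK_RE = re.compile(r"\((\w+)\s+(.*?)\s+\1\)")


def parse_chunks(chunked):
    """Return a list of (label, [(word, tag), ...]) pairs (hypothetical helper)."""
    result = []
    for label, body in CHUNK_RE.findall(chunked):
        # Split each token on its last "/" so a word containing "/" still parses.
        tokens = [tuple(tok.rsplit("/", 1)) for tok in body.split()]
        result.append((label, tokens))
    return result


corrected = "(NX the/DT mosquito/NN NX) (VX bit/VBD VX) (NX the/DT boy/NN NX)"
for label, tokens in parse_chunks(corrected):
    print(label, tokens)
```

Read this way, the uncorrected interpretation yields two noun chunks and no verb chunk, while the corrected one yields a noun chunk, a verb chunk, and a noun chunk.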

MontyLingua performs the following tasks over text:

- MontyTokenizer - Tokenizes raw English text (sensitive to abbreviations) and resolves contractions, e.g. "you're" ==> "you are"
- MontyTagger - Part-of-speech tagging based on Brill94, enriched with common sense
- MontyChunker - Lightning-fast regular expression chunker
- MontyExtractor - Extracts phrases and subject/verb/object triplets from sentences
- MontyLemmatiser - Strips inflectional morphology, i.e. changes verbs to infinitive form and nouns to singular form
- MontyNLGenerator - Uses MontyLingua's concise predicate-arg representation to generate naturalistic English sentences and text summaries
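
To give a feel for what regular expression chunking means, here is a minimal sketch in Python 3. It is not MontyChunker's implementation: the two patterns and the function name chunk are assumptions made for illustration, and the input is assumed to be well-formed word/TAG tokens separated by spaces. The idea is to match regular expressions against the sequence of part-of-speech tags and wrap each match in a labelled chunk.

```python
import re

# Hypothetical chunk patterns over the tag sequence (not MontyLingua's rules).
# NX: optional determiner, any adjectives, one or more nouns.
# VX: optional modal, one or more verbs.
PATTERNS = [
    ("NX", re.compile(r"(?:DT )?(?:JJ\w* )*(?:NN\w* )+")),
    ("VX", re.compile(r"(?:MD )?(?:VB\w* )+")),
]


def chunk(tagged):
    """Group word/TAG tokens into (LABEL ... LABEL) chunks."""
    tokens = tagged.split()
    tags = [tok.rsplit("/", 1)[1] for tok in tokens]
    out = []
    i = 0
    while i < len(tokens):
        # Render the remaining tags as "DT NN VBD " so patterns consume whole tags.
        rest = "".join(tag + " " for tag in tags[i:])
        for label, pattern in PATTERNS:
            m = pattern.match(rest)
            if m:
                n = m.group(0).count(" ")  # number of tags the pattern consumed
                out.append("(%s %s %s)" % (label, " ".join(tokens[i:i + n]), label))
                i += n
                break
        else:
            # No pattern applies here: leave the token outside any chunk.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(chunk("the/DT mosquito/NN bit/VBD the/DT boy/NN"))
```

Because the patterns only see tags, a tagging mistake propagates directly into the chunks: if "bit" is tagged NN instead of VBD, this sketch folds it into the first noun chunk.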

* Free for non-commercial use. Please see the MontyLingua Version 2.0 License.

Terms of Use

Author: Hugo Liu <hugo@media.mit.edu>

Project Page: <http://web.media.mit.edu/~hugo/montylingua/>

Copyright (c) 2002-2004 by Hugo Liu, MIT Media Lab

All rights reserved.



Non-commercial use is free, as provided in the MontyLingua version 2.0 License. By downloading and using MontyLingua, you agree to abide by the additional copyright and licensing information in "license.txt", included in this distribution.



If you use this software in your research, please acknowledge MontyLingua and its author, and link back to the project page http://web.media.mit.edu/~hugo/montylingua. Please cite MontyLingua in academic publications as:

Liu, Hugo (2004). MontyLingua: An end-to-end natural language processor with common sense. Available at: web.media.mit.edu/~hugo/montylingua.

Documentation

Documentation and License

- Python documentation and API (html) [.html]
- Java documentation and API [.html]
- MontyLingua license [.txt]

By downloading and using MontyLingua, you agree to these terms.

New in version 2.0 (29 Jul 2004)

- 2.5X speed enhancement for whole system, 2X speed enhancement for tagger component
- rule-based chunker replaced with much faster and more accurate regular expression chunker
- common sense added to MontyTagger component improves word-level tagger accuracy to 97%
- updated and expanded lexicon for English
- added a user-customizable lexicon CUSTOMLEXICON.MDF
- improvements to MontyLemmatiser incorporating exception cases
- HTML documentation added
- speed optimizations to all code
- improvements made to semantic extraction
- expanded Java API

Download MontyLingua

Please read the following information to proceed to the download of Version 2.1 for Java and Python.

MontyLingua version 2.1 is covered by the Terms of Use given above. If you have read and agree to the terms of use, click below to continue to the download (your IP address will also be recorded). The download is a 12 MB zip file.

READ THIS if you are running MontyLingua on Mac OS X or Unix

The distribution ZIP includes datafiles built for Windows. If you are running MontyLingua on Unix or Mac OS X, and the phrase "I love you" is tagged incorrectly, then the datafiles need to be rebuilt. This is simple:

- Delete all files of the form FASTLEXICON_n.MDF, where n is a number.
- Re-run the MontyLingua program, from either Python or Java, and the correct datafiles will be rebuilt.
- If you are running Java and run out of memory during the rebuild process, use the -MX or -Xmx option in Java to increase the memory size.

You will only need to rebuild these datafiles once.
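
The deletion step can also be scripted. The following is a minimal sketch in Python 3, not something shipped with MontyLingua; the function name remove_fastlexicon_files is hypothetical, and the sketch assumes you pass it the directory that holds your MontyLingua datafiles. By default it only lists the matching files, so you can check the result before deleting anything.

```python
import re
from pathlib import Path

# File names of the form FASTLEXICON_<number>.MDF, as described above.
FASTLEXICON_RE = re.compile(r"FASTLEXICON_\d+\.MDF")


def remove_fastlexicon_files(directory, dry_run=True):
    """List the FASTLEXICON_n.MDF files in `directory`; delete them unless dry_run."""
    matched = []
    for path in sorted(Path(directory).iterdir()):
        # fullmatch ensures other datafiles such as CUSTOMLEXICON.MDF are left alone.
        if path.is_file() and FASTLEXICON_RE.fullmatch(path.name):
            if not dry_run:
                path.unlink()
            matched.append(path.name)
    return matched


if __name__ == "__main__":
    # Dry run against the current directory: prints the names, deletes nothing.
    print(remove_fastlexicon_files("."))
```

Call it with dry_run=False once the listed names look right, then re-run MontyLingua so the datafiles are rebuilt.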

Research and Industry Applications which use MontyLingua

These are some of the research and industry projects which use MontyLingua and MontyTagger. To submit your project, email a web URL and a short description to the author.

William W. Cohen (2004) Minorthird: Methods for Identifying Names and Ontological Relations in Text using Heuristics for Inducing Regularities from Data, http://minorthird.sourceforge.net (website)

Jacob Eisenstein and Randall Davis. Visual and Linguistic Information in Gesture Classification. Accepted to International Conference on Multimodal Interfaces (ICMI'04) (paper)

L. Xie, L. Kennedy, S.-F. Chang, A. Divakaran, H. Sun, C.-Y. Lin (2004). "Discovering Meaningful Multimedia Patterns with Audio-visual Concepts and Associated Text." IEEE International Conference on Image Processing (ICIP 2004), Singapore, October 2004. (paper)

Ashwani Kumar, Sharad C. Sundararajan, Henry Lieberman (2004). Common Sense Investing: Bridging the Gap Between Expert and Novice. Conference on Human Factors in Computing Systems (CHI 04), Vienna, Austria. (paper) (website)