Natural Language Processing (NLP) is the automatic analysis of human languages, such as English or Korean, by computer algorithms. Unlike artificially created programming languages, whose structure and meaning are easy to encode, human languages pose an interesting challenge, both in their analysis and in learning language from observations.
Success in NLP promises great benefits to society. Imagine a world where you can pick up a phone and talk in English, while at the other end of the line your words are spoken in Chinese. Imagine a computer-animated representation of yourself fluently speaking aloud what you have written in an email. Imagine medical experts automatically uncovering protein/drug interactions in gigabytes of medical abstracts, a quantity of text no human could possibly read and summarize. Imagine feeding a computer an ancient script that no living person can read, then listening as the computer reads aloud in this dead language.
NLP can be used to transduce one linguistic form into another or to parse language into a structured form. Transduction includes summarizing, paraphrasing, and translating. Parsing converts unstructured data into a structured form, for example turning speech into text or labeling large text collections such as the web with informative annotations. Examples of parsing include identifying a group of words as a person's name or recovering the recursive grammatical structure of a sentence.
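As a toy illustration of parsing unstructured text into labels, the sketch below marks runs of capitalized words as candidate person names. This is only a hypothetical regex heuristic for exposition; real systems use statistical models rather than a hand-written pattern.

```python
import re

def find_person_candidates(text):
    """Toy 'parser': label runs of capitalized words as candidate names.

    A rough sketch of what it means to turn raw text into structured
    labels; a real named-entity recognizer is learned from data.
    """
    # Match two or more adjacent capitalized words, e.g. "Ada Lovelace".
    pattern = r"\b(?:[A-Z][a-z]+\s)+[A-Z][a-z]+\b"
    return [m.group(0) for m in re.finditer(pattern, text)]

sentence = "Researchers say Ada Lovelace met Charles Babbage in London."
print(find_person_candidates(sentence))  # ['Ada Lovelace', 'Charles Babbage']
```

The heuristic obviously overgenerates (any capitalized phrase matches), which is precisely why the knowledge behind such decisions is usually learned rather than hand-coded.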
Like other artificial intelligence sub-fields, NLP faces the question of what knowledge the computer needs in order to process human language, and how that knowledge is obtained. For example, if the computer is to "learn" this knowledge, what can be learned automatically, and what requires human supervision?
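One concrete example of knowledge that can be learned automatically, with no human supervision, is word co-occurrence statistics gathered from raw text. The sketch below (a minimal illustration, with an invented two-sentence corpus) counts adjacent word pairs, the kind of statistic underlying n-gram language models:

```python
from collections import Counter

def bigram_counts(corpus):
    """Count adjacent word pairs, a statistic learnable from raw text alone."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

# A tiny unlabeled corpus, purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
]
counts = bigram_counts(corpus)
print(counts[("the", "cat")])  # 2: "the cat" occurs once per sentence
```

By contrast, knowledge such as "this phrase is a person's name" typically requires examples labeled by humans, which is what the distinction between automatic learning and human supervision amounts to in practice.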
Our lab has a particular focus on research into statistical machine translation and the visual and textual summarization of information contained in natural language.