What is Natural Language Processing?
Natural Language Processing (NLP) aims to enable computers to acquire, understand, and generate human languages such as English, French, Tamil, and Hindi. A language is a system: a set of symbols and a set of rules (a grammar) for combining them.
NLP is a convenient description for all attempts to use computers to process natural language. It is also an area of artificial intelligence research that attempts to reproduce the human interpretation of language in computer systems. The ultimate goal of NLP is to determine a system of language, words, relations, and conceptual information that computer logic can use to implement artificial language interpretation.
NLP includes everything a computer needs to understand natural language (written or spoken) and to generate it. To build computational natural language systems, we need Natural Language Understanding (NLU) and Natural Language Generation (NLG). NLG systems convert information from computer databases into normal-sounding human language. NLU systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.
Components of NLP
NLP has two components, corresponding to the two directions in which a computer must handle natural language (typed or spoken): understanding it and generating it.
Natural language understanding (NLU)
NLU is the task of understanding and reasoning about input given in natural language. It is concerned only with interpretation, so the issues of natural language generation are ignored here.
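To make the idea concrete, here is a minimal sketch of NLU as "text in, structured representation out". It assumes a single hypothetical request type (booking a flight) and recognizes it with a regular expression. The pattern, the `understand` function, and the `book_flight` intent name are all illustrative. Real NLU systems are far more general and are usually learned from data rather than hand-written.

```python
import re
from typing import Optional

# Hypothetical pattern for one kind of request, used only to illustrate mapping
# free text onto a structured frame; it is not a general NLU technique.
FLIGHT_REQUEST = re.compile(
    r"book a flight from (?P<origin>\w+) to (?P<destination>\w+)",
    re.IGNORECASE,
)


def understand(utterance: str) -> Optional[dict]:
    """Convert an utterance into an intent frame, or return None if not understood."""
    match = FLIGHT_REQUEST.search(utterance)
    if match is None:
        return None
    # Each named group in the pattern becomes a slot in the frame.
    return {"intent": "book_flight", **match.groupdict()}
```

For example, `understand("Please book a flight from Chennai to Delhi")` yields a frame with the intent `book_flight`, origin `Chennai`, and destination `Delhi`. A program can act on that frame directly, which it could not easily do with the raw sentence.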
Natural Language Generation (NLG)
NLG is the subfield of natural language processing (NLP) concerned with producing natural language from non-linguistic data. NLG is also referred to as text generation.
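The simplest form of NLG turns a database record into a sentence with a template. The sketch below assumes a hypothetical flight record with the fields `flight_no`, `origin`, `destination`, `departure`, and an optional `delay_min`. Even this toy version has to make small generation decisions: whether to mention a delay at all, and whether to say "minute" or "minutes".

```python
def describe(record: dict) -> str:
    """Render one (hypothetical) database row as normal-sounding English."""
    sentence = (
        f"Flight {record['flight_no']} from {record['origin']} to "
        f"{record['destination']} departs at {record['departure']}."
    )
    # Content selection: only mention a delay when there is one to report.
    delay = record.get("delay_min", 0)
    if delay > 0:
        # Surface realization: choose the correct singular/plural form.
        unit = "minute" if delay == 1 else "minutes"
        sentence += f" It is running {delay} {unit} late."
    return sentence
```

Template systems like this are predictable and easy to control. The trade-off is that every new kind of message needs its own hand-written template.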
Major Applications of Natural Language Processing
NLP has an important place in day-to-day life because of its many applications. Through these applications, users can interact with computers in their own mother tongue using a keyboard and a screen. A few NLP applications are:
- Part-of-speech tagging
- Information retrieval
- Machine translation
- Question answering
- Spoken dialogue system
- Speech recognition etc.
Steps of Natural Language Processing (NLP)
Natural Language Processing is done at different levels.
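Phonological Analysis: Phonology is the study of the sound system of a language. The minimal unit of a sound system is the phoneme, the smallest unit of sound that can distinguish the meanings of words. For example, /p/ and /b/ are distinct phonemes in English because they alone distinguish "pat" from "bat".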
Phonemes combine to form a higher-level unit called the syllable, and syllables combine to form words. This layered organization of sounds poses linguistic as well as computational challenges for analysis.
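A classic way to show that two sounds are distinct phonemes is to find a minimal pair: two words that differ in exactly one sound. The sketch below searches a small lexicon for such pairs. The lexicon and its ARPAbet-style transcriptions are hypothetical and hand-written for illustration. A real system would take transcriptions from a pronunciation dictionary or a speech recognizer.

```python
from itertools import combinations

# Hypothetical phonemic transcriptions (ARPAbet-style symbols), illustration only.
LEXICON = {
    "pat": ("P", "AE", "T"),
    "bat": ("B", "AE", "T"),
    "pit": ("P", "IH", "T"),
    "pad": ("P", "AE", "D"),
    "tap": ("T", "AE", "P"),
}


def minimal_pairs(lexicon):
    """Return (word1, word2, (phoneme1, phoneme2)) for words differing in one position."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if len(p1) != len(p2):
            continue  # only compare transcriptions of equal length
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0]))
    return pairs
```

On this lexicon, "pat" pairs with "bat", "pit", and "pad". "Tap" does not pair with "pat", even though it contains the same phonemes, because the order of sounds also matters.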
Morphological Analysis: This level deals with the componential nature of words, which are composed of morphemes, the smallest units of meaning. For example, the word "preregistration" can be morphologically analyzed into three separate morphemes: the prefix 'pre-', the root 'registra', and the suffix '-tion'. Since the meaning of each morpheme remains the same across words, humans can break an unknown word down into its constituent morphemes to understand its meaning. Similarly, an NLP system can recognize the meaning conveyed by each morpheme in order to gain and represent meaning.
For example, adding the suffix '-ed' to a verb conveys that the action of the verb took place in the past. This is a key piece of meaning, and in fact it is frequently evidenced in a text only by the '-ed' morpheme.
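The decomposition described above can be sketched as greedy affix stripping. The code removes at most one known prefix and one known suffix, keeps at least three letters for the root, and labels each piece with its meaning. The affix tables and their glosses are hypothetical and tiny. Real morphological analyzers use large morpheme lexicons and also handle spelling changes (such as "stopped" becoming "stop"), which this sketch does not.

```python
# Hypothetical affix tables mapping each affix to the meaning it contributes.
PREFIXES = {"pre": "before", "re": "again", "un": "not"}
SUFFIXES = {"tion": "action/result (noun)", "ed": "past tense", "s": "plural"}


def segment(word: str) -> list[tuple[str, str]]:
    """Split a word into (morpheme, meaning) pairs by greedy affix stripping."""
    morphemes = []
    # Strip at most one prefix, trying longer prefixes first; keep >= 3 root letters.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            morphemes.append((prefix + "-", PREFIXES[prefix]))
            word = word[len(prefix):]
            break
    # Strip at most one suffix, again preferring the longest match.
    suffix_part = None
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            suffix_part = ("-" + suffix, SUFFIXES[suffix])
            word = word[: -len(suffix)]
            break
    morphemes.append((word, "root"))
    if suffix_part:
        morphemes.append(suffix_part)
    return morphemes
```

With these tables, `segment("preregistration")` reproduces the analysis in the text: 'pre-', 'registra', '-tion'. `segment("walked")` yields the root 'walk' plus '-ed', tagged as past tense.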
Lexical Analysis: At this level, humans as well as NLP systems interpret the meaning of individual words. Several types of processing contribute to word-level understanding, the first of which is the assignment of a single part-of-speech (POS) tag to each word.
In this processing, words that can function as more than one part of speech are assigned the most probable part-of-speech tag based on the context in which they occur. The lexical level may require a lexicon. The particular approach taken by an NLP system determines whether a lexicon is used, and also the nature and extent of the information encoded in it.
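The context-based choice of the most probable tag can be sketched with two probability tables. A lexicon gives P(tag | word), and a transition table gives P(tag | previous tag). Each word receives the tag that maximizes their product. All probabilities here are hypothetical numbers chosen for illustration, not corpus estimates. The tagger also decodes greedily from left to right, which is a simplification: a full HMM tagger would use the Viterbi algorithm to find the best tag sequence overall.

```python
# Hypothetical lexicon: P(tag | word). "book" is ambiguous between NOUN and VERB.
LEXICON = {
    "the": {"DET": 1.0},
    "a": {"DET": 1.0},
    "i": {"PRON": 1.0},
    "book": {"NOUN": 0.7, "VERB": 0.3},
    "flight": {"NOUN": 1.0},
    "read": {"VERB": 1.0},
}
# Hypothetical transitions: P(tag | previous tag); "<s>" marks the sentence start.
TRANSITIONS = {
    "<s>": {"DET": 0.4, "PRON": 0.2, "VERB": 0.3, "NOUN": 0.1},
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "PRON": {"VERB": 0.9, "NOUN": 0.1},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "PRON": 0.1},
    "NOUN": {"VERB": 0.5, "NOUN": 0.3, "DET": 0.2},
}


def tag_sentence(sentence):
    """Greedily assign each word its most probable tag given the previous tag."""
    tagged = []
    prev = "<s>"
    for word in sentence.lower().split():
        candidates = LEXICON.get(word, {"NOUN": 1.0})  # unknown words default to NOUN
        best = max(candidates, key=lambda t: candidates[t] * TRANSITIONS[prev].get(t, 0.0))
        tagged.append((word, best))
        prev = best
    return tagged
```

The same word receives different tags in different contexts. In "Book the flight", 'book' is tagged VERB because verbs are likelier than nouns at the start of a sentence in this table. In "I read the book", it is tagged NOUN because it follows a determiner.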
Syntactic Analysis: Syntactic analysis uses the results of morphological analysis and lexical analysis to build a structural description of the sentence. The goal of this process, called parsing, is to convert the flat list of words that form the sentence into a structure that defines the units that are represented by that flat list.
What matters here is that the flat list of words is converted into a hierarchical structure, and that the units of this structure correspond to meaning units when semantic analysis is performed.
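Parsing can be sketched as a recursive-descent parser. It takes the flat list of (word, tag) pairs produced at the lexical level and builds a nested tree. The three-rule grammar over POS tags is hypothetical and covers only simple transitive sentences. Real grammars are much larger and usually need parsers that can handle ambiguity, such as chart parsers.

```python
class ParseError(Exception):
    """Raised when the tagged words do not fit the toy grammar."""


def parse(tagged):
    """Parse a list of (word, tag) pairs into a nested-tuple tree.

    Hypothetical toy grammar over POS tags:
        S  -> NP VP
        NP -> DET NOUN | PRON
        VP -> VERB NP
    """
    pos = 0

    def expect(tag):
        # Consume the next word if it carries the required tag, otherwise fail.
        nonlocal pos
        if pos < len(tagged) and tagged[pos][1] == tag:
            word = tagged[pos][0]
            pos += 1
            return (tag, word)
        raise ParseError(f"expected {tag} at position {pos}")

    def noun_phrase():
        if pos < len(tagged) and tagged[pos][1] == "PRON":
            return ("NP", expect("PRON"))
        return ("NP", expect("DET"), expect("NOUN"))

    def verb_phrase():
        return ("VP", expect("VERB"), noun_phrase())

    tree = ("S", noun_phrase(), verb_phrase())
    if pos != len(tagged):
        raise ParseError("words left over after a complete sentence")
    return tree
```

For "the cat chased a mouse", the five flat words become an S node with two children. One is the subject NP (the cat). The other is a VP that contains the verb and the object NP (a mouse). These are the units that semantic analysis then assigns meaning to.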
Semantic Analysis: It derives the absolute (dictionary-definition) meaning of a sentence and determines its possible meanings in context. Meaning is assigned to the structures created by the syntactic analyzer.
Individual words are mapped to appropriate objects in the knowledge base or database. The analyzer must create structures that correspond to the way the meanings of the individual words combine with each other. Structures for which no such mapping is possible are rejected.
Example: the sentence "colorless green ideas…" would be rejected because it has no such semantic mapping: something cannot be both colorless and green.
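The mapping-and-rejection step can be sketched with a tiny knowledge base. Nouns map to objects with a type. Adjectives carry a constraint on what they can modify and a feature they assign. A phrase is rejected if a modifier cannot apply to its noun, or if two modifiers assign conflicting values. The knowledge base, its feature names, and the `interpret` function are hypothetical. This is a simple selectional-restriction check, not a full semantic analyzer.

```python
# Hypothetical mini knowledge base: nouns map to objects, adjectives to constraints.
KB = {
    "ideas": {"type": "abstract"},
    "apples": {"type": "physical"},
    "green": {"applies_to": "physical", "sets": {"color": "green"}},
    "colorless": {"applies_to": "physical", "sets": {"color": None}},
}


def interpret(adjectives, noun):
    """Build a feature structure for 'adjectives + noun', or return None to reject it."""
    entity = KB.get(noun)
    if entity is None or "type" not in entity:
        return None  # the noun has no object in the knowledge base
    features = {"type": entity["type"]}
    for adj in adjectives:
        meaning = KB.get(adj)
        if meaning is None or meaning.get("applies_to") != entity["type"]:
            return None  # e.g. a color cannot apply to an abstract idea
        for key, value in meaning["sets"].items():
            if key in features and features[key] != value:
                return None  # contradictory modifiers, e.g. colorless and green
            features[key] = value
    return features
```

"green apples" maps to a physical object whose color is green. "colorless green ideas" has no mapping in this knowledge base, and is rejected for two separate reasons. Color cannot apply to an abstract object, and even "colorless green apples" is rejected because the two adjectives conflict.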
Discourse Integration: The meaning of an individual sentence may depend on the sentences that precede it and may influence the meaning of the sentences that follow it.
Example: the meaning of the word "it" in the sentence "you wanted it" depends on the preceding discourse context.
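A very simple approach to resolving "it" is a recency heuristic: link the pronoun to the most recent preceding noun that "it" could plausibly refer to. The sketch below assumes sentences arrive as lists of (word, tag) pairs. It uses a hypothetical, hand-written set of person nouns to skip antecedents that "it" normally does not refer to. Practical coreference resolution uses far richer cues, such as agreement, syntax, and learned models, so treat this only as an illustration of why earlier sentences matter.

```python
# Hypothetical set of nouns denoting people; "it" normally does not refer to these.
PERSON_NOUNS = {"teacher", "doctor", "friend"}


def resolve_it(discourse):
    """Link each 'it' to the most recent preceding non-person noun.

    `discourse` is a list of sentences, each a list of (word, tag) pairs.
    Returns one antecedent per occurrence of 'it' (None if none was found).
    """
    antecedents = []
    last_noun = None
    for sentence in discourse:
        for word, tag in sentence:
            if word.lower() == "it":
                antecedents.append(last_noun)
            elif tag == "NOUN" and word.lower() not in PERSON_NOUNS:
                last_noun = word
    return antecedents
```

After "I lent the book to a friend.", the sentence "You wanted it" resolves "it" to "book". The same sentence on its own resolves "it" to nothing, because the meaning comes entirely from the preceding discourse.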
Pragmatic Analysis: It derives knowledge from external, commonsense information. It is concerned with the purposeful use of language in situations, particularly those aspects of language that require world knowledge.
Example: if someone says "the door is open," it is necessary to know which door "the door" refers to. It is also necessary to know the speaker's intention: the utterance could be a plain statement of fact, an explanation of how the cat got in, or a request to the person addressed to close the door.
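The point of the door example is that one sentence can carry different intentions depending on the situation. The sketch below makes this explicit by choosing among the three readings in the text based on context. The context flags (`listener_asked_how_cat_got_in`, `speaker_feels_cold`, `listener_near_door`) are hypothetical stand-ins for world knowledge that a real system would have to infer. The fixed rule order is an assumption made for illustration.

```python
def interpret_door_is_open(context: dict) -> str:
    """Choose a likely intention behind "the door is open" from situational context.

    The context keys are hypothetical flags standing in for world knowledge that a
    real system would have to infer from the situation.
    """
    if context.get("listener_asked_how_cat_got_in"):
        # Answers the question: the cat came in through the open door.
        return "explanation"
    if context.get("speaker_feels_cold") and context.get("listener_near_door"):
        # Indirect request: the speaker wants the listener to close the door.
        return "request"
    # With no further cues, take the utterance literally.
    return "statement"
```

The words are identical in every case. Only the situation changes which interpretation is chosen, and that situational knowledge is exactly what pragmatic analysis has to supply.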