There are several fundamentally different approaches to building machine translation systems: rule-based, statistical, and neural machine translation (NMT). The rule-based approach is the traditional one and is used by most developers of machine translation systems (PROMT in Russia, SYSTRAN in France, Linguatec in Germany, etc.).
The statistical approach is used by the popular Yandex.Translate service, Google Translate, and a newer service from ABBYY. Today most systems are hybrid, combining rules, statistics and neural networks.
Statistical machine translation
Statistical machine translation is a type of machine translation based on comparing large volumes of language pairs.
Language pairs are texts that contain sentences in one language and the corresponding sentences in a second language. They can be either two versions of a sentence written by a person who is a native speaker of both languages, or a set of sentences together with their human-made translations.
Thus, statistical machine translation has the property of “self-learning”: the more language pairs are available and the more closely they correspond to each other, the better the result.
The term “statistical machine translation” refers to a general approach to the translation problem that is based on finding the most probable translation of a sentence using data obtained from a bilingual corpus.
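The idea of choosing the most probable translation can be illustrated with a minimal sketch. It assumes a hypothetical toy corpus in which source and target words have already been aligned, which is a strong simplification: real systems have to learn alignments and work with phrases and whole sentences. The data and the names `ALIGNED_PAIRS`, `build_translation_table` and `most_probable_translation` are invented for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source word, target word) pairs, as if they had
# already been aligned in a German-English bilingual corpus.
ALIGNED_PAIRS = [
    ("haus", "house"), ("haus", "house"), ("haus", "home"),
    ("klein", "small"), ("klein", "small"), ("klein", "little"),
]


def build_translation_table(pairs):
    """Estimate P(target | source) by relative frequency in the corpus."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        table[source] = {t: c / total for t, c in target_counts.items()}
    return table


def most_probable_translation(table, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = table.get(source_word)
    if not candidates:
        return source_word  # unknown words are passed through unchanged
    return max(candidates, key=candidates.get)


table = build_translation_table(ALIGNED_PAIRS)
print(most_probable_translation(table, "haus"))  # prints "house"
```

Adding more pairs to the corpus changes the estimated probabilities without any change to the code, which is the sense in which such a system "learns" from data.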
Rule-based machine translation
Rule-based machine translation is a general term for machine translation systems based on linguistic information about the source and target languages. Such systems consist of bilingual dictionaries and grammars covering the main semantic, morphological and syntactic patterns of each language.
This approach to machine translation is also called classical. Based on these data, the source text is converted into the target text sequentially, sentence by sentence. Rule-based systems are contrasted with machine translation systems that are based on examples.
Such systems, often called transfer-based, work by mapping the structure of the input sentence onto the structure of the output sentence. Their main advantage is broad text coverage at an acceptable level of translation quality, as well as low costs for initial development and subsequent updates.
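A rule-based pipeline of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: a hypothetical four-word English-Spanish lexicon, and a single structural rule that moves an adjective after the noun it precedes. The stage names `analyze`, `transfer` and `generate` are chosen here for clarity and do not come from any particular system.

```python
# Hypothetical toy English-to-Spanish lexicon: word -> (translation, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "blue": ("azul", "ADJ"),
    "car": ("coche", "NOUN"),
}


def analyze(sentence):
    """Analysis: look up each source word and attach its part of speech."""
    tokens = []
    for word in sentence.lower().split():
        translation, pos = LEXICON.get(word, (word, "UNK"))
        tokens.append({"source": word, "target": translation, "pos": pos})
    return tokens


def transfer(tokens):
    """Transfer: apply one structural rule, ADJ NOUN -> NOUN ADJ."""
    result = list(tokens)
    i = 0
    while i < len(result) - 1:
        if result[i]["pos"] == "ADJ" and result[i + 1]["pos"] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return result


def generate(tokens):
    """Generation: emit the target-language words in their new order."""
    return " ".join(token["target"] for token in tokens)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("the red car"))  # prints "el coche rojo"
```

Every improvement to such a system means adding dictionary entries or writing further rules by hand (agreement, articles, word order), in contrast to the statistical approach, where behavior changes with the training data.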
Example-based machine translation
Example-based machine translation is a method of machine translation characterized by its use of a bilingual corpus of parallel texts as the main knowledge base during translation. Essentially, it is translation by analogy, which can be thought of as applying case-based reasoning to machine learning.
Applied to human translation, the idea of translation by analogy rejects the view that people translate sentences by performing deep linguistic analysis.
Instead, it rests on the belief that people translate by first breaking a sentence into phrases, then translating those phrases, and finally combining the translated pieces correctly into one sentence.
Each phrase is translated by analogy with previous translations. The principle of translation by analogy is encoded in example-based machine translation through the translation examples used to train such a system. Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the translation process.
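The three steps of translation by analogy (split into phrases, translate each phrase from a similar stored example, recombine) can be sketched as follows. This is a minimal illustration, not a real system: the example base `EXAMPLES` is hypothetical, splitting at commas is a crude stand-in for phrase segmentation, and string similarity from Python's standard `difflib` module stands in for a proper measure of linguistic similarity.

```python
import difflib

# Hypothetical example base of previously translated phrases (English -> German).
EXAMPLES = {
    "good morning": "guten Morgen",
    "thank you very much": "vielen Dank",
    "see you tomorrow": "bis morgen",
}


def translate_phrase(phrase, examples, cutoff=0.6):
    """Translate a phrase by reusing the most similar stored example."""
    matches = difflib.get_close_matches(phrase, list(examples), n=1, cutoff=cutoff)
    if not matches:
        return phrase  # no sufficiently similar example: leave untranslated
    return examples[matches[0]]


def translate_by_analogy(sentence, examples):
    """Split at commas, translate each phrase by analogy, then recombine."""
    phrases = [p.strip().lower() for p in sentence.split(",") if p.strip()]
    return ", ".join(translate_phrase(p, examples) for p in phrases)


print(translate_by_analogy("Good morning, thank you so much", EXAMPLES))
# prints "guten Morgen, vielen Dank"
```

Note that "thank you so much" is not in the example base; it is translated because it is close enough to the stored phrase "thank you very much". No grammatical analysis of either language takes place.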
Example-based machine translation was first proposed by Makoto Nagao in 1984. Nagao pointed out that this type of translation is especially well suited to pairs of very different languages, such as English and Japanese.
In this case, one sentence can be translated into several well-structured sentences in the other language, so it makes no sense to do the deep linguistic analysis that is typical of rule-based machine translation.