In 2014, a paper was published briefly describing the idea of applying deep learning neural networks to machine translation. The wider Internet didn't notice it at all, but Google's labs started digging in earnest. Two years later, in November 2016, an announcement appeared on the Google blog that changed the game.
The idea resembled style transfer between photos. Remember apps like Prisma that redrew your photos in the style of a famous artist?
There was no special magic there: a neural network was trained to recognize the artist's paintings, and then the final layers, where it makes its decision, were torn off. What remained, the network's guts, essentially its intermediate representation, was that very stylized picture. That's simply how the network sees it, and to us it looks beautiful.
If a neural network can transfer a style onto a photo, what if we try to impose a different language onto a text in a similar way? Treat the language of the text as that same "artist's style" and transfer it while preserving the essence of the image, that is, the essence of the text.
Now, what if we represent the source text as a set of those same characteristic features? In essence, encode it so that another neural network, the decoder, can then decode them back into text, but in a different language.
We deliberately train the decoder to know only its own language. It has no idea where the characteristics came from, but it knows how to express them in, say, Spanish.
Continuing the analogy: what difference does it make to you how to draw the dog I just described – with pencils, watercolors, or a finger in the mud? You draw it as best you can.
Once again: the first neural network can only encode a sentence into a set of numeric characteristics, and the second can only decode them back into text. Neither knows anything about the other; each knows only its own language.
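The encode-then-decode flow above can be sketched in a few lines. This is a hypothetical toy, not a real translator: the vocabularies, the embedding table, and the decoder weights are all made up and untrained, and averaging word embeddings stands in for a real encoder network. It only shows the interface, where the fixed-size vector of characteristics is the single thing the two halves share.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabularies: the encoder knows only English words,
# the decoder knows only Spanish words.
EN_VOCAB = {"i": 0, "see": 1, "a": 2, "dog": 3}
ES_WORDS = ["veo", "un", "perro"]

EMB_DIM = 8
embeddings = rng.normal(size=(len(EN_VOCAB), EMB_DIM))   # encoder's embeddings
decoder_W = rng.normal(size=(EMB_DIM, len(ES_WORDS)))    # decoder's own weights

def encode(sentence: str) -> np.ndarray:
    """Encoder: compress an English sentence into one fixed-size vector
    of 'characteristics'. (Real systems use an RNN; averaging word
    embeddings is the simplest possible stand-in.)"""
    ids = [EN_VOCAB[w] for w in sentence.lower().split()]
    return embeddings[ids].mean(axis=0)

def decode(vector: np.ndarray, length: int = 3) -> str:
    """Decoder: express the vector in Spanish. It never sees the English
    input -- the vector is all it gets."""
    scores = vector @ decoder_W
    picked = np.argsort(scores)[::-1][:length]
    return " ".join(ES_WORDS[i] for i in picked)

features = encode("I see a dog")   # the intermediate representation
print(features.shape)
print(decode(features))            # untrained weights, so the output is noise
```

With untrained random weights the "translation" is gibberish, of course; training would tune the embeddings and decoder weights so that the vector actually carries the sentence's meaning.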
But how to find these characteristics?
With the dog everything is clear: it has paws and other body parts. But what about texts? Thirty years ago, scientists had already tried to hand-craft a universal language code, and it ended in complete failure.
But now we have deep learning, which does exactly that! The main difference between deep learning and classical neural networks is precisely that it trains networks to find the characteristic features of objects without understanding their nature.
If you have a sufficiently large neural network and a couple of thousand video cards in your stash, you can try to find such characteristics in the text!
Theoretically, the characteristics found by the neural networks could then be handed over to linguists, who would discover a lot of new things for themselves.
The only question is what type of neural network to use in the encoder and decoder. For pictures, convolutional neural networks (CNNs) are great because they work with independent blocks of pixels.
But there are no independent blocks in text: each word depends on the previous ones, and even on the ones that follow. Text, speech, and music are always sequential.
Recurrent neural networks (RNNs) are better suited for processing them, because they remember their previous result – in our case, the previous words in the sentence.
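That "memory" is just a hidden state carried from step to step. A minimal sketch of one recurrent step, with made-up toy sizes and random untrained weights:

```python
import numpy as np

rng = np.random.default_rng(1)
IN_DIM, HID_DIM = 4, 6  # hypothetical sizes for the toy

# Random, untrained weights of a single recurrent cell.
W_x = rng.normal(scale=0.5, size=(IN_DIM, HID_DIM))
W_h = rng.normal(scale=0.5, size=(HID_DIM, HID_DIM))

def rnn_step(x, h_prev):
    """One time step: the new hidden state mixes the current input with
    the previous hidden state. That carried-over state is the network's
    'memory' of the earlier words."""
    return np.tanh(x @ W_x + h_prev @ W_h)

# Feed five 'word vectors' one at a time, carrying the state along.
h = np.zeros(HID_DIM)
for word_vector in rng.normal(size=(5, IN_DIM)):
    h = rnn_step(word_vector, h)

print(h.shape)  # the final state summarizes the whole sequence so far
```

Every prediction the network makes (the next sound, the next word, the next note) can read this state, which is why the same trick works for Siri, keyboards, and chatbots alike.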
RNNs are now used all over the place: speech recognition in Siri (parsing a sequence of sounds, where each depends on the previous one), word suggestions on the keyboard (remember the previous words, guess the next one), music generation, even chatbots.
The architectures of neural translators vary greatly.
At first, the researchers used ordinary RNNs, then switched to bidirectional ones: the translator considered not only the words before the target word, but also the words after it. That was much more effective. Then they went full hardcore, using multilayer RNNs with LSTM cells for long-term storage of the translation context.
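The bidirectional trick is just two recurrent passes glued together. A minimal sketch under toy assumptions: plain tanh cells stand in for the LSTM cells the real translators used, and all sizes and weights are made up, but the wiring (one pass left-to-right, one right-to-left, states concatenated) is the same.

```python
import numpy as np

rng = np.random.default_rng(2)
IN_DIM, HID = 4, 3  # hypothetical toy sizes

def make_cell():
    """A toy plain-RNN cell; real translators used LSTM cells here,
    but the bidirectional wiring is identical."""
    W_x = rng.normal(scale=0.5, size=(IN_DIM, HID))
    W_h = rng.normal(scale=0.5, size=(HID, HID))
    return lambda x, h: np.tanh(x @ W_x + h @ W_h)

forward_cell, backward_cell = make_cell(), make_cell()

def bidirectional(sequence):
    """Run one cell left-to-right and another right-to-left, then
    concatenate the two states: each word's representation now 'sees'
    both the words before it and the words after it."""
    h, fwd = np.zeros(HID), []
    for x in sequence:
        h = forward_cell(x, h)
        fwd.append(h)
    h, bwd = np.zeros(HID), []
    for x in sequence[::-1]:
        h = backward_cell(x, h)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

sentence = rng.normal(size=(5, IN_DIM))  # five 'word vectors'
states = bidirectional(sentence)
print(len(states), states[0].shape)      # one state per word, size 2 * HID
```

Stacking several such layers, with the outputs of one layer feeding the next, gives the multilayer architecture mentioned above.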
Within two years, neural networks surpassed everything invented in translation over the previous 20 years. Neural translation made 50% fewer word order errors, 17% fewer lexical errors, and 19% fewer grammatical errors. The networks even learned to agree gender and case across languages on their own; nobody taught them that.
The most notable improvements came where direct translation had never existed. Statistical translation methods always worked through English: to translate, say, from Spanish into German, the machine first converted the text into English and only then translated it into German.
A double loss in quality. Neural translation doesn't need this: plug in any decoder and off you go. For the first time, direct translation became possible between languages that shared no common dictionaries.