Let's look at the Transformer model as a single black box. In a machine translation application, it takes a sentence in one language and outputs its translation in another. See Fig 2: the Transformer model as a black box, sentence in, translation out. Google Translate works in a similar style; here is an example from Google Translate. To reveal more details, in Fig 3 we open up the Transformer model and see an encoding component, a decoding component, and connections between them. Moreover, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolution.
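As a rough sketch of that black box, PyTorch's built-in `nn.Transformer` already wires an encoder and a decoder together; token IDs go in, per-position vocabulary scores come out. All sizes below are illustrative assumptions, not the values used later in this post:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions for this sketch, not the post's real config).
VOCAB_SIZE, D_MODEL = 1000, 32

embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
transformer = nn.Transformer(
    d_model=D_MODEL, nhead=4,
    num_encoder_layers=2, num_decoder_layers=2,
    dim_feedforward=64, batch_first=True,
)
generator = nn.Linear(D_MODEL, VOCAB_SIZE)  # project back to vocabulary logits

src = torch.randint(0, VOCAB_SIZE, (1, 7))  # e.g. a 7-token English sentence
tgt = torch.randint(0, VOCAB_SIZE, (1, 5))  # the partial Chinese output so far

out = transformer(embed(src), embed(tgt))   # encoder + decoder in one call
logits = generator(out)
print(logits.shape)  # torch.Size([1, 5, 1000]): one score per target position per word
```

The annotated implementation later in the post builds these encoder and decoder components by hand instead of using `nn.Transformer`, which is exactly what makes opening up the black box worthwhile.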
Understand the Transformer Model

First of all, the Transformer is a neural network (deep network) model with an encoder-decoder structure, a special layer design (i.e., the multi-head self-attention layer), and connections between the components. The English-to-Chinese translator project is based on my NLP training camp from the GreedyAI course. The original Harvard NLP blog applied the English-to-German translation task, but I adapted it to the English-to-Chinese translation task in this work. In this post, we will build a Transformer model to translate English sentences into Chinese sentences with PyTorch, instead of applying the popular Hugging Face packages. The data set is a small one, around 10k sentence pairs for training, which is sufficient for learning and gets the basic job done. If you are looking for better performance, a larger data set is required and you will need GPU support to reduce the training time. GitHub repository: Annotated-Transformer-English-to-Chinese-Translator. Note: before reading the code, please review the PyTorch nn.Module framework in WHAT IS TORCH.NN REALLY.
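The multi-head self-attention layer mentioned above is available as a ready-made PyTorch module, which gives a quick feel for its interface before we build it by hand; the sizes here are assumptions for demonstration only:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions for this sketch).
d_model, n_heads, seq_len = 32, 4, 10

attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)
x = torch.randn(1, seq_len, d_model)  # a batch of one 10-token sequence

# Self-attention: query, key, and value are all the same sequence.
out, weights = attn(x, x, x)
print(out.shape)      # torch.Size([1, 10, 32]): one updated vector per token
print(weights.shape)  # torch.Size([1, 10, 10]): attention weights, averaged over heads
```

Each row of `weights` is a softmax distribution over the input positions, i.e. it sums to 1; this is the mechanism that lets every token look at every other token without any recurrence.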
English to Chinese Translator Code
The code here is based heavily on Harvard NLP's The Annotated Transformer.
In this post I present an "annotated" version of the paper in the form of a line-by-line implementation to build an English-to-Chinese translator. I have reordered and deleted some sections from the original paper, and added comments, figures, and explanations throughout the post to keep the model structure clear and the code blocks more meaningful. To follow along, you are required to know neural networks, word embeddings, and Python, and to be familiar with the PyTorch package and NVIDIA CUDA drivers. The document images and styles are inspired by The Illustrated Transformer from Jay Alammar.