XLNet Transformer
An extension of Transformer-XL
- XLNet is a state-of-the-art language representation model developed by researchers at Google Brain and Carnegie Mellon University in 2019
- It is a Transformer-based model that utilizes permutation-based training, unlike BERT which uses a masked language modeling objective
- XLNet outperforms BERT on a variety of NLP tasks, including sentiment analysis, question answering, and document classification
What is Masked Language Modeling (MLM)?
- In order to achieve a bidirectional representation, 15% of the tokens in the input sentence are masked at random, and the Transformer is trained to predict the masked words. For example, consider the sentence "The cat sat on the wall". The input to BERT might be "The cat [MASK] on the [MASK]".
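The masking step above can be sketched in plain Python. This is an illustrative simplification, not BERT's actual preprocessing (which works on subword tokens and also sometimes keeps or replaces the selected token); the function name and 15% rate follow the description above.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=42):
    """Mask ~15% of tokens (at least one) at random positions.

    Returns the masked sequence and a dict mapping each masked position
    to the original token the model is trained to predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = ["[MASK]" if i in positions else tok
              for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets

sentence = "The cat sat on the wall".split()
masked, targets = mask_tokens(sentence)
```

With 6 tokens and a 15% rate, exactly one position is masked here; the `targets` dict plays the role of the training labels.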
Permutation Language Modeling (PLM)
- PLM is the idea of capturing bidirectional context by training an autoregressive model over all possible permutations of the factorization order of the words in a sentence, instead of a fixed left-to-right or right-to-left order
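The intuition can be made concrete with a toy enumeration: for each factorization order, a token may only attend to the tokens that precede it in that order, yet across all orders every token eventually sees context from both sides. This is a conceptual sketch (real XLNet samples orders and uses two-stream attention rather than enumerating all permutations):

```python
import itertools

def plm_contexts(tokens):
    """For each token, collect every other token it can condition on
    across all factorization orders: within one order, a position only
    sees the positions that come before it in that order."""
    contexts = {tok: set() for tok in tokens}
    for order in itertools.permutations(range(len(tokens))):
        for step, pos in enumerate(order):
            seen = {tokens[p] for p in order[:step]}
            contexts[tokens[pos]].update(seen)
    return contexts

ctx = plm_contexts(["The", "cat", "sat"])
```

Even though each individual order is strictly autoregressive, the union of contexts for "cat" includes both its left neighbour ("The") and its right neighbour ("sat"), which is exactly the bidirectional signal PLM recovers without any [MASK] tokens.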
Problem Statement And Solution
Despite the great success of Transformer-based models in NLP, they still have limitations when it comes to capturing dependencies in long sequences. BERT, for example, uses a masked language modeling objective that can only capture dependencies within a limited context. XLNet overcomes the limitations of BERT by utilizing a permutation-based training approach: it processes all the tokens in a sequence at once and uses the permutation objective to capture dependencies between all tokens, rather than just the ones in a limited context.
XLNET INPUT RANGE
The maximum length for input sequences in XLNet is configurable and is bounded mainly by the memory available on your system; the practical maximum depends on the specific XLNet architecture and the resources available during training and inference. By default the input sequence length in XLNet is set to 512 tokens, but we can increase this by passing the max_length argument when tokenizing the input (max_length=maxlen). A powerful advantage that XLNet has over BERT (and other Transformer-based models) is that, thanks to the relative positional encodings it inherits from Transformer-XL, it has no hard 512-token input limit the way BERT does.
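A minimal sketch of what a configurable max_length means in practice when batching inputs: sequences are truncated or padded to a fixed length, with an attention mask marking the real tokens. This mirrors the max_length tokenizer argument mentioned above but is a generic illustration, not XLNet's exact preprocessing; the pad id of 0 is an assumption.

```python
def prepare_input(token_ids, max_length=512, pad_id=0):
    """Truncate or right-pad a token-id sequence to max_length.

    Returns the fixed-length ids plus an attention mask with 1 for real
    tokens and 0 for padding. max_length is configurable; 512 is a
    common default, not a hard architectural limit for XLNet.
    """
    ids = token_ids[:max_length]
    attention_mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    return ids, attention_mask

ids, mask = prepare_input(list(range(10)), max_length=16)
```

Raising max_length here only increases memory and compute cost, which is why the real ceiling is hardware rather than the model definition.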
IMPLEMENTATION
I used the Mental Health Corpus dataset from Kaggle. It is a binary classification dataset with labels 0 and 1:
- 1: unhealthy
- 0: healthy
Dataset link: https://www.kaggle.com/datasets/reihanenamdari/mental-health-corpus
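Loading such a labeled corpus can be sketched with the standard library. The column names "text" and "label" are assumptions about the CSV layout, and the two sample rows below are invented placeholders standing in for the downloaded file:

```python
import csv
import io

def load_corpus(fileobj):
    """Read (text, label) pairs from a CSV with 'text' and 'label'
    columns; labels are 0 (healthy) or 1 (unhealthy)."""
    rows = list(csv.DictReader(fileobj))
    texts = [r["text"] for r in rows]
    labels = [int(r["label"]) for r in rows]
    return texts, labels

# Stand-in for the downloaded Kaggle CSV (contents invented for the sketch).
sample_csv = "text,label\nfeeling fine today,0\neverything is hopeless,1\n"
texts, labels = load_corpus(io.StringIO(sample_csv))
```

From here the texts would be tokenized (with the max_length setting discussed earlier) and the labels used as targets for fine-tuning the classifier head.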