XLNet Transformer
An extension of Transformer-XL

NamelessFather
3 min read · Mar 2, 2023


  • XLNet is a state-of-the-art language representation model developed by researchers at Google and Carnegie Mellon University in 2019.
  • It is a Transformer-based model that uses permutation-based training, unlike BERT, which uses a masked language modeling objective.
  • XLNet outperformed BERT on a variety of NLP tasks, including sentiment analysis, question answering, and document classification.

What is Masked Language Modeling (MLM)?

  • In order to achieve a bidirectional representation, 15% of the tokens in the input sentence are masked at random, and the Transformer is trained to predict the masked words. For example, consider the sentence "The cat sat on the wall": the input to BERT might be "The cat [MASK] on the [MASK]".
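The masking step above can be sketched in a few lines of Python. This is a simplified illustration of the idea, not BERT's actual preprocessing (which also sometimes keeps or swaps the selected tokens instead of masking them); the function name and 15% rate follow the text.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace ~15% of tokens with [MASK], simplified BERT-style sketch."""
    rng = random.Random(seed)
    masked = []
    targets = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

sentence = "The cat sat on the wall".split()
masked, targets = mask_tokens(sentence)
print(masked)  # some tokens replaced with [MASK]
```

The model is then trained to recover the entries of `targets` from the masked sequence.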

Permutation Language Modeling (PLM)

  • PLM captures bidirectional context by training an autoregressive model over all possible permutations of the factorization order of the words in a sentence, instead of a fixed left-to-right or right-to-left order.
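The key point is that the sentence itself is not shuffled; only the order in which tokens are *predicted* changes, so each token is conditioned on whichever tokens precede it in the sampled factorization order. A minimal sketch of which positions are visible to each prediction step:

```python
# In permutation language modeling, the model predicts tokens
# autoregressively, but along a randomly sampled factorization order.
# For a given order, each position is predicted from the positions
# that come before it in that order (not in the original sentence).
tokens = ["The", "cat", "sat", "on", "the", "wall"]

def contexts_for(order):
    """For one factorization order, map each position to its visible positions."""
    return {pos: set(order[:i]) for i, pos in enumerate(order)}

order = (3, 0, 5, 1, 4, 2)  # one of the 6! possible factorization orders
ctx = contexts_for(order)
# Position 2 ("sat") is predicted last here, so it sees every other position:
print(sorted(ctx[2]))  # [0, 1, 3, 4, 5]
```

Averaged over many sampled orders, every token ends up being predicted from contexts on both its left and its right, which is how PLM achieves bidirectionality without `[MASK]` tokens.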

Problem Statement And Solution

Despite the great success of Transformer-based models in NLP, they still have limitations when it comes to capturing dependencies in long sequences. BERT, for example, uses a masked language modeling objective that can only capture dependencies within a limited context. XLNet overcomes this limitation by using a permutation-based training approach: it processes all the tokens in a sequence and uses the permuted objective to capture dependencies between all tokens, rather than just those in a limited context.

XLNET INPUT RANGE

The maximum input sequence length in XLNet is configurable and is limited mainly by the memory available during training and inference. By default, the input sequence length is set to 512 tokens, but it can be increased by passing the max_length argument when instantiating the model (max_length=maxlen). A powerful advantage XLNet has over BERT (and other Transformer-based models) is that, thanks to the Transformer-XL recurrence mechanism and relative positional encodings it inherits, XLNet is not architecturally tied to a fixed 512-token limit the way BERT is.
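In practice, sequences longer than the configured max_length still have to be truncated or split into windows before being fed to the model. A minimal pure-Python sketch of that windowing step (real pipelines would use the tokenizer's built-in truncation options instead):

```python
def chunk_to_max_length(token_ids, max_length=512, stride=0):
    """Split a long token-id sequence into windows of at most max_length tokens.

    stride > 0 makes consecutive windows overlap, which helps tasks like
    question answering where an answer may straddle a window boundary.
    """
    step = max_length - stride
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), step)]

ids = list(range(1200))          # stand-in for a long tokenized document
chunks = chunk_to_max_length(ids, max_length=512)
print([len(c) for c in chunks])  # [512, 512, 176]
```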

IMPLEMENTATION

I used the Mental Health dataset from Kaggle. It is a binary-class dataset with labels 0 and 1:

1: unhealthy

0: healthy

Dataset link: https://www.kaggle.com/datasets/reihanenamdari/mental-health-corpus
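Loading the corpus is a standard CSV read. The sketch below uses a tiny in-memory sample with assumed column names ("text", "label"); check the actual Kaggle CSV's header before reusing it.

```python
import csv
import io

# In-memory stand-in for the Kaggle CSV; column names are assumptions.
sample_csv = io.StringIO(
    "text,label\n"
    "I went for a nice walk today,0\n"
    "I cannot cope with anything anymore,1\n"
)

rows = list(csv.DictReader(sample_csv))
texts = [r["text"] for r in rows]
labels = [int(r["label"]) for r in rows]  # 0 = healthy, 1 = unhealthy
print(labels)  # [1 example of each class]
```

From here, `texts` and `labels` can be tokenized and fed to an XLNet fine-tuning loop.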
