Fine-tune Transformers in PyTorch using Hugging Face Transformers
Complete tutorial on how to fine-tune 73 transformer models for text classification — no code changes necessary!
Info
This notebook is designed to use a pretrained transformers model and fine-tune it on a classification task. The focus of this tutorial will be on the code itself and how to adjust it to your needs.
This notebook uses the AutoClasses functionality from the Hugging Face transformers library. AutoClasses can infer a model's configuration, tokenizer and architecture just from the model's name. This allows the same code to be reused across a large number of transformers models!
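For example, here is a minimal sketch of the AutoClasses idea (the checkpoint name is only an example; any supported model name can be dropped in):
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

# Any supported checkpoint name can be substituted here, e.g. 'distilbert-base-uncased'.
checkpoint = 'bert-base-cased'
config = AutoConfig.from_pretrained(checkpoint, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)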
What should I know for this notebook?
I provide enough instructions and comments to follow along with minimal Python coding knowledge.
Since I am using PyTorch to fine-tune the transformers models, any knowledge of PyTorch is very useful. Knowing a little bit about the transformers library helps too.
How to use this notebook?
I built this notebook with reusability in mind. The way I load the dataset into the PyTorch Dataset class is pretty standard and can be easily reused for any other dataset.
The only modification needed to use your own dataset is in how the data is read inside the MovieReviewsDataset class, which uses the PyTorch Dataset. The DataLoader will return batches in a dictionary-of-inputs format so that they can be fed straight to the model using the statement outputs = model(**batch). As long as this statement holds, the rest of the code will work!
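As a quick illustration, here is a hypothetical batch in that dictionary format (the keys need to match the keyword arguments of the model's forward call):
import torch

# Hypothetical batch - the keys must match the model's forward() arguments.
batch = {'input_ids': torch.tensor([[101, 146, 1567, 1142, 2523, 102]]),
         'attention_mask': torch.tensor([[1, 1, 1, 1, 1, 1]]),
         'labels': torch.tensor([1])}
# As long as this unpacking works, the rest of the notebook needs no changes:
# outputs = model(**batch)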
What transformers models work with this notebook?
It's rare that I use a model other than BERT when dealing with classification on text data, but when there is a need to run a different transformer model architecture, which ones would work with this code?
Since the name of the notebook is finetune_transformers, it should work with more than one type of transformer.
I ran this notebook across all the pretrained models found on Hugging Face Transformers. This way you know ahead of time if the model you plan to use works with this code without any modifications.
The list of pretrained transformers models that work with this notebook can be found here. There are 73 models that worked and 33 models that failed to work with this notebook.
Dataset
This notebook covers fine-tuning transformers for a binary classification task. I will use the well-known Large Movie Review Dataset of positive and negative labeled movie reviews.
The description provided on the Stanford website:
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more details.
Why this dataset? I believe it is an easy-to-understand and easy-to-use dataset for classification, and sentiment data is always fun to work with.
Coding
Now let's do some coding! We will go through each code cell in the notebook, describe what it does, show the code, and, where relevant, show the output.
I made this format easy to follow if you decide to run each code cell in your own Python notebook.
When I learn from a tutorial I always try to replicate the results. I believe it's easy to follow along if you have the code next to the explanations.
Downloads
Download the Large Movie Review Dataset and unzip it locally.
Code Cell:
# download the dataset
!wget -q -nc http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
# unzip it
!tar -zxf /content/aclImdb_v1.tar.gz
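A quick, optional sanity check that the archive extracted as expected (the paths assume the default Colab /content location used above):
import os

# Count the files in each partition/label folder of the extracted dataset.
for split in ('train', 'test'):
    for label in ('pos', 'neg'):
        folder = os.path.join('/content/aclImdb', split, label)
        print('%s: %d files' % (folder, len(os.listdir(folder))))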
Installs
- transformers library needs to be installed to use all the awesome code from Hugging Face. To get the latest version I will install it straight from GitHub.
- ml_things library used for various machine learning related tasks. I created this library to reduce the amount of code I need to write for each machine learning project. Give it a try!
# Install transformers library.
!pip install -q git+https://github.com/huggingface/transformers.git
# Install helper functions.
!pip install -q git+https://github.com/gmihaila/ml_things.git
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
|████████████████████████████████| 2.9MB 6.7MB/s
|████████████████████████████████| 890kB 48.9MB/s
|████████████████████████████████| 1.1MB 49.0MB/s
Building wheel for transformers (PEP 517) ... done
Building wheel for sacremoses (setup.py) ... done
|████████████████████████████████| 71kB 5.2MB/s
Building wheel for ml-things (setup.py) ... done
Building wheel for ftfy (setup.py) ... done
Imports
Import all needed libraries for this notebook.
Declare parameters used for this notebook:
- set_seed(123) - Always good to set a fixed seed for reproducibility.
- epochs - Number of training epochs (authors recommend between 2 and 4).
- batch_size - Number of examples per batch - depends on the max sequence length and GPU memory. For a 512 sequence length a batch of 10 usually works without CUDA memory issues. For small sequence lengths you can try a batch of 32 or higher.
- max_length - Pad or truncate text sequences to a specific length. I will set it to 60 to speed up training.
- device - Look for a GPU to use. Will use cpu by default if no gpu is found.
- model_name_or_path - Name of a transformers model (will use an already pretrained model) or path to a transformer model (will load your own model from local disk). I always like to start off with bert-base-cased: 12-layer, 768-hidden, 12-heads, 109M parameters, trained on cased English text.
- labels_ids - Dictionary of labels and their ids - this will be used to convert string labels to numbers.
- n_labels - How many labels we are using in this dataset. This is used to decide the size of the classification head.
import io
import os
import torch
from tqdm.notebook import tqdm
from torch.utils.data import Dataset, DataLoader
from ml_things import plot_dict, plot_confusion_matrix, fix_text
from sklearn.metrics import classification_report, accuracy_score
from transformers import (AutoConfig,
AutoModelForSequenceClassification,
AutoTokenizer, AdamW,
get_linear_schedule_with_warmup,
set_seed,
)
# Set seed for reproducibility.
set_seed(123)
# Number of training epochs (authors recommend between 2 and 4)
epochs = 4
# Number of examples per batch - depends on the max sequence length and GPU memory.
# For 512 sequence length a batch of 10 works without CUDA memory issues.
# For small sequence lengths you can try a batch of 32 or higher.
batch_size = 32
# Pad or truncate text sequences to a specific length
# if `None` it will use maximum sequence of word piece tokens allowed by model.
max_length = 60
# Look for gpu to use. Will use `cpu` by default if no gpu found.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Name of transformers model - will use already pretrained model.
# Path of transformer model - will load your own model from local disk.
model_name_or_path = 'bert-base-cased'
# Dictionary of labels and their id - this will be used to convert
# string labels to number ids.
labels_ids = {'neg': 0, 'pos': 1}
# How many labels are we using in training.
# This is used to decide size of classification head.
n_labels = len(labels_ids)
Helper Functions
I like to keep all Classes and functions that will be used in this notebook under this section to help maintain a clean look of the notebook:
MovieReviewsDataset(Dataset)
If you've worked with PyTorch before, this is pretty standard. We need this class to read in our dataset, parse it, use the tokenizer that transforms text into numbers, and get it into a nice format to be fed to the model.
Lucky for us, Hugging Face thought of everything and made the tokenizer do all the heavy lifting (split text into tokens, pad, truncate, encode text into numbers), and it is very easy to use!
In this class I only need to read in the content of each file, use fix_text to fix any Unicode problems, and keep track of positive and negative sentiments.
I will append all texts and labels to lists that I will later feed to the tokenizer and to the label ids to transform everything into numbers.
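As a small illustration of that heavy lifting, here is roughly what a tokenizer call looks like on its own (a sketch using the notebook's default bert-base-cased checkpoint):
from transformers import AutoTokenizer

# Load the same tokenizer used later in the notebook.
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
# Split into tokens, truncate/pad and encode to ids in a single call.
encoded = tokenizer(['I loved this movie!', 'Worst film ever.'],
                    truncation=True, padding=True, max_length=60, return_tensors='pt')
print(encoded['input_ids'].shape)       # (2, padded sequence length)
print(encoded['attention_mask'].shape)  # 1 for real tokens, 0 for padding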
There are three main parts of this PyTorch Dataset class:
- __init__(), where we read in the dataset and transform text and labels into numbers.
- __len__(), where we return the number of examples we read in. This is used when calling len(MovieReviewsDataset()).
- __getitem__(), which always takes as input an int value representing which example to return from our dataset. If a value of 3 is passed, we will return the example from our dataset at position 3. It needs to return an object in a format that can be fed to our model. Luckily our tokenizer does that for us and returns a dictionary of variables ready to be fed to the model in this way: model(**inputs).
class MovieReviewsDataset(Dataset):
    r"""PyTorch Dataset class for loading data.

    This is where the data parsing happens and where the text gets encoded using
    the loaded tokenizer.

    This class is built with reusability in mind: it can be used as is as long
    as the `dataloader` outputs a batch in dictionary format that can be passed
    straight into the model - `model(**batch)`.

    Arguments:
      path (:obj:`str`):
          Path to the data partition.
      use_tokenizer (:obj:`transformers.tokenization_?`):
          Transformer type tokenizer used to process raw text into numbers.
      labels_ids (:obj:`dict`):
          Dictionary to encode any label names into numbers. Keys map to
          label names and values map to the number associated with those labels.
      max_sequence_len (:obj:`int`, `optional`):
          Value to indicate the maximum desired sequence length to truncate or pad
          text sequences. If no value is passed it will use the maximum sequence
          size supported by the tokenizer and model.
    """

    def __init__(self, path, use_tokenizer, labels_ids, max_sequence_len=None):
        # Check if path exists.
        if not os.path.isdir(path):
            # Raise error if path is invalid.
            raise ValueError('Invalid `path` variable! Needs to be a directory')
        # Check max sequence length.
        max_sequence_len = use_tokenizer.model_max_length if max_sequence_len is None else max_sequence_len
        texts = []
        labels = []
        print('Reading partitions...')
        # Since the labels are defined by folders with data we loop
        # through each label.
        for label, label_id in tqdm(labels_ids.items()):
            sentiment_path = os.path.join(path, label)
            # Get all files from path.
            files_names = os.listdir(sentiment_path)  # [:10] # Sample for debugging.
            print('Reading %s files...' % label)
            # Go through each file and read its content.
            for file_name in tqdm(files_names):
                file_path = os.path.join(sentiment_path, file_name)
                # Read content.
                content = io.open(file_path, mode='r', encoding='utf-8').read()
                # Fix any unicode issues.
                content = fix_text(content)
                # Save content.
                texts.append(content)
                # Save encoded labels.
                labels.append(label_id)
        # Number of examples.
        self.n_examples = len(labels)
        # Use tokenizer on texts. This can take a while.
        print('Using tokenizer on all texts. This can take a while...')
        self.inputs = use_tokenizer(texts, add_special_tokens=True, truncation=True, padding=True, return_tensors='pt', max_length=max_sequence_len)
        # Get maximum sequence length.
        self.sequence_len = self.inputs['input_ids'].shape[-1]
        print('Texts padded or truncated to %d length!' % self.sequence_len)
        # Add labels.
        self.inputs.update({'labels': torch.tensor(labels)})
        print('Finished!\n')
        return

    def __len__(self):
        r"""When used with `len` return the number of examples."""
        return self.n_examples

    def __getitem__(self, item):
        r"""Given an index return an example from that position.

        Arguments:
          item (:obj:`int`):
              Index position to pick an example to return.

        Returns:
          :obj:`Dict[str, object]`: Dictionary of inputs that feed into the model.
          It satisfies the statement `model(**Returned Dictionary)`.
        """
        return {key: self.inputs[key][item] for key in self.inputs.keys()}
train(dataloader, optimizer_, scheduler_, device_)
I created this function to perform a full pass through the DataLoader object (the DataLoader object is created from our MovieReviewsDataset class, which is a PyTorch Dataset type object). This is basically one training epoch through the entire dataset.
The dataloader is created with the PyTorch DataLoader, which takes the object created from the MovieReviewsDataset class and puts each example in batches. This way we can feed our model batches of data!
The optimizer_ and scheduler_ are very common in PyTorch. They are required to update the parameters of our model and update our learning rate during training. There is a lot more to them than that, but I won't go into details. This can actually be a huge rabbit hole, since A LOT happens behind these functions that we don't need to worry about. Thank you PyTorch!
In the process we keep track of the actual labels and the predicted labels along with the loss.
def train(dataloader, optimizer_, scheduler_, device_):
    r"""
    Train pytorch model on a single pass through the data loader.

    It will use the global variable `model` which is the transformer model
    loaded on `device_` that we want to train on.

    This function is built with reusability in mind: it can be used as is as long
    as the `dataloader` outputs a batch in dictionary format that can be passed
    straight into the model - `model(**batch)`.

    Arguments:
      dataloader (:obj:`torch.utils.data.dataloader.DataLoader`):
          Parsed data into batches of tensors.
      optimizer_ (:obj:`transformers.optimization.AdamW`):
          Optimizer used for training.
      scheduler_ (:obj:`torch.optim.lr_scheduler.LambdaLR`):
          PyTorch scheduler.
      device_ (:obj:`torch.device`):
          Device used to load tensors before feeding to model.

    Returns:
      :obj:`List[List[int], List[int], float]`: List of [True Labels, Predicted
        Labels, Train Average Loss].
    """
    # Use global variable for model.
    global model
    # Tracking variables.
    predictions_labels = []
    true_labels = []
    # Total loss for this epoch.
    total_loss = 0
    # Put the model into training mode.
    model.train()
    # For each batch of training data...
    for batch in tqdm(dataloader, total=len(dataloader)):
        # Add original labels - use later for evaluation.
        true_labels += batch['labels'].numpy().flatten().tolist()
        # Move batch to device.
        batch = {k: v.type(torch.long).to(device_) for k, v in batch.items()}
        # Always clear any previously calculated gradients before performing a
        # backward pass.
        model.zero_grad()
        # Perform a forward pass (evaluate the model on this training batch).
        # This will return the loss (rather than just the model output) because we
        # have provided the `labels`.
        # The documentation for the BERT version of this model function is here:
        # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
        outputs = model(**batch)
        # The call to `model` always returns a tuple, so we need to pull the
        # loss value out of the tuple along with the logits. We will use logits
        # later to calculate training accuracy.
        loss, logits = outputs[:2]
        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a Tensor containing a
        # single value; the `.item()` function just returns the Python value
        # from the tensor.
        total_loss += loss.item()
        # Perform a backward pass to calculate the gradients.
        loss.backward()
        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # Update parameters and take a step using the computed gradient.
        # The optimizer dictates the "update rule"--how the parameters are
        # modified based on their gradients, the learning rate, etc.
        optimizer_.step()
        # Update the learning rate.
        scheduler_.step()
        # Move logits and labels to CPU.
        logits = logits.detach().cpu().numpy()
        # Convert these logits to a list of predicted label values.
        predictions_labels += logits.argmax(axis=-1).flatten().tolist()
    # Calculate the average loss over the training data.
    avg_epoch_loss = total_loss / len(dataloader)
    # Return all true labels and predictions for future evaluations.
    return true_labels, predictions_labels, avg_epoch_loss
validation(dataloader, device_)
I implemented this function in a very similar way to train, but without the parameter updates, backward pass and gradient descent part. We don't need to do all of those VERY computationally intensive tasks because we only care about our model's predictions.
I use the DataLoader in a similar way as in train to get batches out to feed to our model.
In the process I keep track of the actual labels and the predicted labels along with the loss.
def validation(dataloader, device_):
    r"""Validation function to evaluate model performance on a
    separate set of data.

    This function will return the true and predicted labels so we can use them
    later to evaluate the model's performance.

    This function is built with reusability in mind: it can be used as is as long
    as the `dataloader` outputs a batch in dictionary format that can be passed
    straight into the model - `model(**batch)`.

    Arguments:
      dataloader (:obj:`torch.utils.data.dataloader.DataLoader`):
          Parsed data into batches of tensors.
      device_ (:obj:`torch.device`):
          Device used to load tensors before feeding to model.

    Returns:
      :obj:`List[List[int], List[int], float]`: List of [True Labels, Predicted
        Labels, Validation Average Loss].
    """
    # Use global variable for model.
    global model
    # Tracking variables.
    predictions_labels = []
    true_labels = []
    # Total loss for this epoch.
    total_loss = 0
    # Put the model in evaluation mode--the dropout layers behave differently
    # during evaluation.
    model.eval()
    # Evaluate data for one epoch.
    for batch in tqdm(dataloader, total=len(dataloader)):
        # Add original labels - use later for evaluation.
        true_labels += batch['labels'].numpy().flatten().tolist()
        # Move batch to device.
        batch = {k: v.type(torch.long).to(device_) for k, v in batch.items()}
        # Telling the model not to compute or store gradients, saving memory and
        # speeding up validation.
        with torch.no_grad():
            # Forward pass, calculate logit predictions.
            # Because we provide the `labels`, the model also returns the loss
            # along with the logits.
            # token_type_ids is the same as the "segment ids", which
            # differentiates sentence 1 and 2 in 2-sentence tasks.
            # The documentation for the BERT version of this model function is here:
            # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
            outputs = model(**batch)
            # The call to `model` always returns a tuple, so we need to pull the
            # loss value out of the tuple along with the logits. We will use logits
            # later to calculate validation accuracy.
            loss, logits = outputs[:2]
            # Move logits and labels to CPU.
            logits = logits.detach().cpu().numpy()
            # Accumulate the validation loss over all of the batches so that we can
            # calculate the average loss at the end. `loss` is a Tensor containing a
            # single value; the `.item()` function just returns the Python value
            # from the tensor.
            total_loss += loss.item()
            # Get predictions as a list.
            predict_content = logits.argmax(axis=-1).flatten().tolist()
            # Update list.
            predictions_labels += predict_content
    # Calculate the average loss over the validation data.
    avg_epoch_loss = total_loss / len(dataloader)
    # Return all true labels and predictions for future evaluations.
    return true_labels, predictions_labels, avg_epoch_loss
Load Model and Tokenizer
Loading the three essential parts of the pretrained transformers: configuration, tokenizer and model. I also need to load the model on the device I'm planning to use (GPU / CPU).
Since I use the AutoClass functionality from Hugging Face I only need to worry about the model's name as input and the rest is handled by the transformers library.
# Get model configuration.
print('Loading configuration...')
model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_name_or_path,
num_labels=n_labels)
# Get model's tokenizer.
print('Loading tokenizer...')
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name_or_path)
# Get the actual model.
print('Loading model...')
model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=model_name_or_path,
config=model_config)
# Load model to defined device.
model.to(device)
print('Model loaded to `%s`'%device)
Loading configuration...
Loading tokenizer...
Loading model...
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Model loaded to `cuda`
Dataset and DataLoader
This is where I create the PyTorch Dataset and DataLoader objects that will be used to feed data into our model.
This is where I use the MovieReviewsDataset class to create the dataset variables. Since the data is partitioned into train and test, I will create a PyTorch Dataset and a PyTorch DataLoader object for each of them. ONLY for simplicity I will use the test partition as validation. In practice, NEVER USE THE TEST DATA FOR VALIDATION! (See the optional split sketch after the outputs below.)
print('Dealing with Train...')
# Create pytorch dataset.
train_dataset = MovieReviewsDataset(path='/content/aclImdb/train',
use_tokenizer=tokenizer,
labels_ids=labels_ids,
max_sequence_len=max_length)
print('Created `train_dataset` with %d examples!'%len(train_dataset))
# Move pytorch dataset into dataloader.
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
print('Created `train_dataloader` with %d batches!'%len(train_dataloader))
print()
print('Dealing with Validation...')
# Create pytorch dataset.
valid_dataset = MovieReviewsDataset(path='/content/aclImdb/test',
use_tokenizer=tokenizer,
labels_ids=labels_ids,
max_sequence_len=max_length)
print('Created `valid_dataset` with %d examples!'%len(valid_dataset))
# Move pytorch dataset into dataloader.
valid_dataloader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)
print('Created `valid_dataloader` with %d batches!'%len(valid_dataloader))
Dealing with Train...
Reading partitions...
100%|████████████████████████████████|2/2 [00:34<00:00, 17.28s/it]
Reading neg files...
100%|████████████████████████████████|12500/12500 [00:34<00:00, 362.01it/s]
Reading pos files...
100%|████████████████████████████████|12500/12500 [00:23<00:00, 534.34it/s]
Using tokenizer on all texts. This can take a while...
Texts padded or truncated to 40 length!
Finished!
Created `train_dataset` with 25000 examples!
Created `train_dataloader` with 782 batches!
Dealing with Validation...
Reading partitions...
100%|████████████████████████████████|2/2 [01:28<00:00, 44.13s/it]
Reading neg files...
100%|████████████████████████████████|12500/12500 [01:28<00:00, 141.71it/s]
Reading pos files...
100%|████████████████████████████████|12500/12500 [01:17<00:00, 161.60it/s]
Using tokenizer on all texts. This can take a while...
Texts padded or truncated to 40 length!
Finished!
Created `valid_dataset` with 25000 examples!
Created `valid_dataloader` with 782 batches!
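As noted above, the test partition is used as validation here only for simplicity. If you want a proper validation split, a minimal sketch (not used in the rest of this notebook) could carve one out of the training data with torch.utils.data.random_split:
from torch.utils.data import random_split, DataLoader

# Hypothetical 90/10 split of the already-created `train_dataset` into train and validation parts.
n_valid = int(0.1 * len(train_dataset))
train_part, valid_part = random_split(train_dataset, [len(train_dataset) - n_valid, n_valid])
# These subsets would then replace the dataloaders built above, e.g.:
# train_dataloader = DataLoader(train_part, batch_size=batch_size, shuffle=True)
# valid_dataloader = DataLoader(valid_part, batch_size=batch_size, shuffle=False)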
Train
I create an optimizer and scheduler that will be used by PyTorch in training.
I loop through the number of defined epochs and call the train and validation functions.
After each epoch I will output info in a format similar to Keras: train_loss: - val_loss: - train_acc: - valid_acc.
After training, I plot the train and validation loss and accuracy curves to check how the training went.
# Note: AdamW is a class from the huggingface library (as opposed to pytorch)
# I believe the 'W' stands for 'Weight Decay fix'.
optimizer = AdamW(model.parameters(),
lr = 2e-5, # args.learning_rate - default is 5e-5, our notebook had 2e-5
eps = 1e-8 # args.adam_epsilon - default is 1e-8.
)
# Total number of training steps is number of batches * number of epochs.
# `train_dataloader` contains batched data so `len(train_dataloader)` gives
# us the number of batches.
total_steps = len(train_dataloader) * epochs
# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer,
num_warmup_steps = 0, # Default value in run_glue.py
num_training_steps = total_steps)
# Store the average loss after each epoch so we can plot them.
all_loss = {'train_loss':[], 'val_loss':[]}
all_acc = {'train_acc':[], 'val_acc':[]}
# Loop through each epoch.
print('Epoch')
for epoch in tqdm(range(epochs)):
    print()
    print('Training on batches...')
    # Perform one full pass over the training set.
    train_labels, train_predict, train_loss = train(train_dataloader, optimizer, scheduler, device)
    train_acc = accuracy_score(train_labels, train_predict)
    # Get predictions from model on validation data.
    print('Validation on batches...')
    valid_labels, valid_predict, val_loss = validation(valid_dataloader, device)
    val_acc = accuracy_score(valid_labels, valid_predict)
    # Print loss and accuracy values to see how training evolves.
    print("  train_loss: %.5f - val_loss: %.5f - train_acc: %.5f - valid_acc: %.5f" % (train_loss, val_loss, train_acc, val_acc))
    print()
    # Store the loss value for plotting the learning curve.
    all_loss['train_loss'].append(train_loss)
    all_loss['val_loss'].append(val_loss)
    all_acc['train_acc'].append(train_acc)
    all_acc['val_acc'].append(val_acc)
# Plot loss curves.
plot_dict(all_loss, use_xlabel='Epochs', use_ylabel='Value', use_linestyles=['-', '--'])
# Plot accuracy curves.
plot_dict(all_acc, use_xlabel='Epochs', use_ylabel='Value', use_linestyles=['-', '--'])
Epoch
100%|████████████████████████████████|4/4 [13:49<00:00, 207.37s/it]
Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.86it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [00:46<00:00, 16.80it/s]
train_loss: 0.44816 - val_loss: 0.38655 - train_acc: 0.78372 - valid_acc: 0.81892
Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.86it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [02:13<00:00, 5.88it/s]
train_loss: 0.29504 - val_loss: 0.43493 - train_acc: 0.87352 - valid_acc: 0.82360
Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.87it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [01:43<00:00, 7.58it/s]
train_loss: 0.16901 - val_loss: 0.48433 - train_acc: 0.93544 - valid_acc: 0.82624
Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.87it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [00:46<00:00, 16.79it/s]
train_loss: 0.09816 - val_loss: 0.73001 - train_acc: 0.96936 - valid_acc: 0.82144
It looks like a little over one epoch is enough training for this model and dataset.
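Since the validation loss starts climbing after the first epoch, one option (not part of the original notebook) is to keep the checkpoint from the epoch with the best validation loss. A minimal sketch of what could go inside the epoch loop, using the save_pretrained method that transformers models and tokenizers provide (the output path is just an example):
# Initialize once, before the epoch loop.
best_val_loss = float('inf')

# Inside the epoch loop, right after `val_loss` is computed:
if val_loss < best_val_loss:
    best_val_loss = val_loss
    # Save the current best model and tokenizer to disk.
    model.save_pretrained('/content/best_model')
    tokenizer.save_pretrained('/content/best_model')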
Evaluate
When dealing with classification it's useful to look at precision, recall and F1 score. Another good thing to look at when evaluating the model is the confusion matrix.
# Get predictions from model on validation data. This is where you should use
# your test data.
true_labels, predictions_labels, avg_epoch_loss = validation(valid_dataloader, device)
# Create the evaluation report.
evaluation_report = classification_report(true_labels, predictions_labels, labels=list(labels_ids.values()), target_names=list(labels_ids.keys()))
# Show the evaluation report.
print(evaluation_report)
# Plot confusion matrix.
plot_confusion_matrix(y_true=true_labels, y_pred=predictions_labels,
classes=list(labels_ids.keys()), normalize=True,
magnify=3,
);
100%|████████████████████████████████|782/782 [00:46<00:00, 16.77it/s]
precision recall f1-score support
neg 0.83 0.81 0.82 12500
pos 0.81 0.83 0.82 12500
accuracy 0.82 25000
macro avg 0.82 0.82 0.82 25000
weighted avg 0.82 0.82 0.82 25000
Normalized confusion matrix
Results are not great, but for this tutorial we are not interested in performance.
Final Note
If you made it this far: congrats, and thank you for your interest in my tutorial!
I've been using this code for a while now and I feel it has gotten to a point where it is nicely documented and easy to follow.
Of course it is easy for me to follow because I built it. That is why any feedback is welcome, and it helps me improve my future tutorials!
If you have a minute, please give me feedback in the comments.
If you see something wrong please let me know by opening an issue on my ml_things GitHub repository!
A lot of tutorials out there are mostly a one-time thing and are not being maintained. I plan on keeping my tutorials up to date as much as I can.
Contact 🎣
🦊 GitHub: gmihaila
🌐 Website: gmihaila.github.io
👔 LinkedIn: mihailageorge
📬 Email: georgemihaila@my.unt.edu.com