Discuss NLP & πŸ€— Library



George Mihaila

PhD Candidate Computer Science
University of North Texas


2021 George Mihaila

Disclaimer

  • This is not a thorough tutorial on NLP.
  • This is not a thorough tutorial on the Hugging Face Library.
  • This presentation is to encourage curiosity and further exploration of NLP and the Hugging Face Library.

Agenda

  • About me πŸ’¬
  • Natural Language Processing πŸ“–
  • Chatbots with Attitude πŸ—£οΈ
  • Topic Shift Detection πŸ•΅οΈ
  • Identify Innovation πŸ’‘
  • πŸ€” πŸ€— βš™οΈ
  • My Tutorials πŸ“š
  • Conclusions πŸ€”
  • Questions πŸ–
  • Contact 🎣


About me πŸ’¬

  • PhD candidate in computer science at University of North Texas (UNT).
  • Research area: Natural Language Processing (NLP), with a focus on dialogue generation with persona.
  • Four years of combined research and industry experience on various Artificial Intelligence (AI) and Machine Learning (ML) projects. Check my resume here.
  • Interest areas: Neural Networks, Deep Learning, Natural Language Processing, Reinforcement Learning, Computer Vision,
    Scaling Machine Learning.

About me πŸ’¬

How did I get here?

  • Neural networks are the reason I started my PhD in computer science.
  • The professor I talked with asked me if I wanted to work on natural language processing and neural networks.
  • The notion of neural networks sounded very exciting to me.
  • That’s when I knew this was what I wanted to do for my career.

About me πŸ’¬

In my free time

  • I like to share my knowledge of NLP: I wrote tutorials from scratch on state-of-the-art language models like BERT and GPT-2, with over 10k views. Check them out here.
  • Contribute to open-source – Hugging Face Transformers and Datasets.
  • Technical reviewer for one of the first books published on transformer models for NLP: Transformers for NLP in Python by Denis Rothman.
  • Technical reviewer for the next edition of the Transformers for NLP in Python book.
  • I delivered a webinar on my tutorial GPT2 for Text Classification with 300+ participants.
  • My personal project ML Things collects things I find useful that speed up my work with ML.

Natural Language Processing πŸ“–

Wikipedia

Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language ...

... The result is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them.


Chatbots with Attitude πŸ—£οΈ

Research

  • My main research area is in NLP with focus on dialogue generation with persona.
  • I use the Friends TV corpus to train language models that can capture each of the six main characters' personas.
  • Imagine a chatbot that sounds like Joey or Chandler.
  • This will make chatbot systems more engaging and allow shifting personas depending on a customer's mood.
  • It can significantly improve the customer experience.

Chatbots with Attitude πŸ—£οΈ

Data

  • Create a dataset for each of the six main characters, in the form of context–response pairs from all ten seasons.
  • Context: the dialogue history from the target character and other characters.
  • Response: the sentence with which the target character responds.
  • Train data: Seasons 1–8
  • Validation data: Season 9
  • Test data: Season 10
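As a rough sketch, the pairing step above might look like the following. The `build_pairs` helper, the context window size, and the sample lines are all illustrative, not the actual research code:

```python
def build_pairs(utterances, target, context_size=3):
    """Pair each target-character line with the preceding dialogue.

    utterances: list of (speaker, line) tuples in script order.
    Returns (context, response) pairs, where context is the last
    `context_size` lines spoken before the target's response.
    """
    pairs = []
    for i, (speaker, line) in enumerate(utterances):
        if speaker == target and i > 0:
            history = [l for _, l in utterances[max(0, i - context_size):i]]
            pairs.append((" ".join(history), line))
    return pairs

# Toy example with made-up lines:
script = [
    ("Chandler", "Could I BE any more tired?"),
    ("Joey", "How you doin'?"),
    ("Chandler", "Seriously, Joey."),
]
pairs = build_pairs(script, target="Chandler")
```

Splitting the resulting pairs by season then yields the train/validation/test partitions above.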

Chatbots with Attitude πŸ—£οΈ

Model

  • Use GPT-2 to generate responses as the baseline model.
  • Fine-tune GPT-2 on each of the six main characters' datasets (six different GPT-2 models).
  • Use the special token <SPEAKER> as a separator between context and response.
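A minimal sketch of that training-example format, assuming the `<SPEAKER>` separator from the slide and GPT-2's standard `<|endoftext|>` end-of-text token (the `format_example` helper is hypothetical):

```python
SEP = "<SPEAKER>"

def format_example(context, response, eos="<|endoftext|>"):
    """Serialize one context-response pair into a single GPT-2 training string."""
    return f"{context} {SEP} {response}{eos}"

text = format_example("How you doin'?", "I'm doing great!")

# With Hugging Face Transformers, the new separator would typically be
# registered before fine-tuning via
#   tokenizer.add_special_tokens({"additional_special_tokens": ["<SPEAKER>"]})
# followed by model.resize_token_embeddings(len(tokenizer)).
```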

Chatbots with Attitude πŸ—£οΈ

Evaluation

  • Use Bilingual Evaluation Understudy (BLEU) score to evaluate model performance (higher is better).
  • Evaluate each of the six models on each of the six characters' validation data.
  • Each of the six models should have the highest BLEU score on its own character's validation data and lower scores on the other characters' validation data.
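For intuition, a toy single-reference BLEU with uniform n-gram weights can be sketched in a few lines. This is a simplification for illustration only; real evaluations should use an established implementation such as NLTK's `corpus_bleu` or sacrebleu:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy BLEU: clipped n-gram precision (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0; a candidate sharing no n-grams with the reference scores 0.0.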

Topic Shift Detection πŸ•΅οΈ

  • Detect sudden topic shift in dialogue conversation.
  • Help determine when a conversation changes topic and find out which topics a speaker is most interested in.
  • Use data in a similar context–response format to classify whether the response belongs to a different topic.
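As a toy illustration of the classification idea (not the actual model, which would fine-tune a transformer on the context–response pairs), a crude baseline could flag a shift when the response shares no content words with the context:

```python
# Small illustrative stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "is", "are", "i", "you", "to", "of", "and"}

def content_words(text):
    """Lowercase, strip trailing punctuation, and drop stopwords."""
    return {w.lower().strip(".,!?") for w in text.split()} - STOPWORDS

def is_topic_shift(context, response):
    """Return True when the response shares no content words with the context."""
    return not (content_words(context) & content_words(response))
```

A learned classifier would, of course, catch topically related responses that use entirely different vocabulary, which this overlap heuristic cannot.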

Identify Innovation πŸ’‘

  • Use patent data to automatically detect fintech innovations.
  • Identifying FinTech Innovations Using BERT full paper published in IEEE Big Data 2020.
  • Classify a patent abstract into one of six FinTech categories. Check more here.
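The final step of such a six-way classifier reduces to an argmax over the model's six output scores. A sketch with illustrative category names (not the taxonomy from the paper):

```python
# Hypothetical label set for illustration only.
CATEGORIES = ["payments", "lending", "insurance",
              "wealth management", "blockchain", "regtech"]

def predict_category(logits):
    """Map six classification scores to a category label via argmax."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return CATEGORIES[best]
```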

πŸ€” πŸ€— βš™οΈ

About

  • Hugging Face is a company that offers state-of-the-art models and solutions in NLP.
  • Its work is built on open source.
  • It was founded in 2016 by Clement Delangue, Julien Chaumond, and Thomas Wolf.
  • Contains thousands of language models on 22 tasks.
  • Most model implementations come in TensorFlow 2 and PyTorch.
  • Contains 449+ datasets on various NLP tasks.
  • Let's check it out huggingface.co.

πŸ€” πŸ€— βš™οΈ

Transformers


πŸ€” πŸ€— βš™οΈ

Datasets

  • Library dedicated to popular NLP datasets for various tasks: summarization, sentiment, dialogue, etc.
  • Provides efficient loading and formatting of large datasets.
  • Let's check out their documentation: huggingface.co/docs/datasets/

My Tutorials πŸ“š

Name | Description
πŸ‡ Better Batches with PyTorchText BucketIterator | How to use PyTorchText BucketIterator to sort text data for better batching.
🐢 Pretrain Transformers Models in PyTorch using Hugging Face Transformers | Pretrain 67 transformer models on your custom dataset.
🎻 Fine-tune Transformers in PyTorch using Hugging Face Transformers | Complete tutorial on how to fine-tune 73 transformer models for text classification, with no code changes necessary!
βš™οΈ Bert Inner Workings in PyTorch using Hugging Face Transformers | Complete tutorial on how an input flows through Bert.
🎱 GPT2 For Text Classification using Hugging Face πŸ€— Transformers | Complete tutorial on how to use GPT2 for text classification.

Conclusions πŸ€”

  • You learned a little bit about me.
  • Learned more about Hugging Face.
  • Learned how to navigate the Transformers library.
  • Know about the Datasets library.

Questions πŸ–

  • What did you learn today?
  • What motivated you in this presentation?
  • Do you have any questions?

Contact 🎣

Let's stay in touch!

🦊 GitHub: gmihaila

🌐 Website: gmihaila.github.io

πŸ‘” LinkedIn: mihailageorge

πŸ““ Medium: @gmihaila

πŸ“¬ Email: georgemihaila@my.unt.edu

πŸ‘€ Schedule meeting: calendly.com/georgemihaila
