[go: up one dir, main page]

Skip to content

NLP basic and advance text preprocessing concepts and techniques

Notifications You must be signed in to change notification settings

gulabpatel/NLP_Basics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

97 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NLP_Basics

In the "Deep_learning_for_NLP.ipynb" file, I have tried to cover basics of NLP and followed the book titled "Deep Learning for Natural Language Processing". I will keep updating the current repo.....

Basic NLP models like Count Vectorizer, TF-IDF, Word2Vec, Embedding, Sentiment Analysis, Text Classification, LSTM/BiLSTM, new nlp library basics, Topic Modeling etc... Seq2seq Modeling

Multi-Class Text Classification Model Comparison and Selection

[https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568]

About

Natural Language Processing Performance Metrics [ppt]

NLP Metrics Timeline

Evaluation Metrics: Quick Notes

Average precision

  • Macro: average of sentence scores
  • Micro: corpus (sums numerators and denominators for each hypothesis-reference(s) pairs before division)

Machine Translation

  1. BLEU (BiLingual Evaluation Understudy)
    • Papineni 2002
    • 'Measures how many words overlap in a given translation when compared to a reference translation, giving higher scores to sequential words.' (recall)
    • Limitation:
      • Doesn't consider different types of errors (insertions, substitutions, synonyms, paraphrase, stems)
      • Designed to be a corpus measure, so it has undesirable properties when used for single sentences.
  2. GLEU (Google-BLEU)
    • Wu et al. 2016
    • Minimum of BLEU recall and precision applied to 1, 2, 3 and 4grams
      • Recall: (number of matching n-grams) / (number of total n-grams in the target)
      • Precision: (number of matching n-grams) / (number of total n-grams in generated sequence)
    • Correlates well with BLEU metric on a corpus metric but does not have its drawbacks for per sentence reward objective.
    • Not to be confused with Generalized Language Evaluation Understanding or Generalized BLEU, also known as GLEU
      • Napoles et al. 2015's ACL paper: Ground Truth for Grammatical Error Correction Metrics
      • Napoles et al. 2016: GLEU Without Tuning
        • Minor adjustment required as the number of references increases.
      • Simple variant of BLEU, it hews much more closely to human judgements.
      • "In MT, an untranslated word or phrase is almost always an error, but in GEC, this is not the case."
        • GLEU: "computes n-gram precisions over the reference but assigns more weight to n-grams that have been correctly changed from the source."
      • Python code
  3. WER (Word Error Rate)
    • Levenshtein distance (edit distance) for words: minimum number of edits (insertion, deletions or substitutions) required to change the hypotheses sentence into the reference.
    • Range: greater than 0 (ref = hyp), no max range as Automatic Speech Recognizer (ASR) can insert an arbitrary number of words
    • $ WER = \frac{S+D+I}{N} = \frac{S+D+I}{S+D+C} $
      • S: number of substitutions, D: number of deletions, I: number of insertions, C: number of the corrects, N: number of words in the reference ($N=S+D+C$)
    • WAcc (Word Accuracy) or Word Recognition Rate (WRR): $1 - WER$
    • Limitation: provides no details on the nature of translation errors
      • Different errors are treated equally, even thought they might influence the outcome differently (being more disruptive or more difficult/easier to be corrected).
      • If you look at the formula, there's no distinction between a substitution error and a deletion followed by an insertion error.
    • Possible solution proposed by Hunt (1990):
      • Use of a weighted measure
      • $ WER = \frac{S+0.5D+0.5I}{N} $
      • Problem:
        • Metric is used to compare systems, so it's unclear whether Hunt's formula could be used to assess the performance of a single system
        • How effective this measure is in helping a user with error correction
    • See more info
  4. METEOR (Metric for Evaluation of Translation with Explicit ORdering):
  5. TER (Translation Edit Rate)
    • Snover et al. 2006's paper: A study of translation edit rate with targeted human annotation
    • Number of edits (words deletion, addition and substitution) required to make a machine translation match exactly to the closest reference translation in fluency and semantics
    • TER = $\frac{E}{R}$ = (minimum number of edits) / (average length of reference text)
    • It is generally preferred to BLEU for estimation of sentence post-editing effort. Source.
    • PyTER
    • char-TER: character level TER

Summarization

  1. ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

Image Caption Quality

  1. CIDEr (Consensus-based Image Description Evaluation)