Naver sentiment movie corpus classification

kim10481/nsmc_study

 
 

Naver Movie Sentiment Classification

Data

All data are from https://github.com/e9t/nsmc
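
As a rough illustration of reading that corpus, the sketch below parses NSMC-style tab-separated lines (header `id`, `document`, `label`, where label 0 is negative and 1 is positive) into (text, label) pairs. The function name `read_nsmc` is hypothetical and is not taken from this repository's notebooks.

```python
import csv
from typing import Iterable, List, Tuple


def read_nsmc(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse NSMC-style TSV lines into (review_text, label) pairs."""
    # QUOTE_NONE keeps quote characters inside reviews as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        text = (row["document"] or "").strip()
        if not text:
            continue  # skip rows whose review text is empty
        pairs.append((text, int(row["label"])))
    return pairs


# Usage, assuming ratings_train.txt was downloaded from the NSMC repository:
# with open("ratings_train.txt", encoding="utf-8") as f:
#     train = read_nsmc(f)
```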

Goal

Sentiment classification: classify movie reviews as good or bad

  • Task: Many to One
  • Use: bidirectional LSTM + Self Attention + Fully Connected MLP

Paper: A Structured Self-Attentive Sentence Embedding: https://arxiv.org/pdf/1703.03130.pdf
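
The attention step from the cited paper can be sketched in a few lines of NumPy. This is only an illustration of the paper's equations, A = softmax(W_s2 tanh(W_s1 H^T)) and M = A H, and not this repository's model code: the random matrices stand in for the bidirectional LSTM outputs and the learned weights, and the sizes are arbitrary.

```python
import numpy as np


def structured_self_attention(H, W_s1, W_s2):
    """H: (n, 2u) BiLSTM states; W_s1: (d_a, 2u); W_s2: (r, d_a).

    Returns A with shape (r, n), one attention distribution per hop,
    and M with shape (r, 2u), the sentence embedding matrix.
    """
    scores = W_s2 @ np.tanh(W_s1 @ H.T)                   # (r, n)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A = A / A.sum(axis=1, keepdims=True)                  # softmax over the n tokens
    M = A @ H                                             # (r, 2u)
    return A, M


rng = np.random.default_rng(0)
n, two_u, d_a, r = 7, 8, 6, 5   # illustrative sizes; r=5 matches the r5 models
H = rng.normal(size=(n, two_u))
A, M = structured_self_attention(
    H, rng.normal(size=(d_a, two_u)), rng.normal(size=(r, d_a))
)
print(A.shape, M.shape)  # (5, 7) (5, 8)
```

In a many-to-one setup, M would then be flattened and passed to the fully connected layers to produce a single label per review.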

Blog Explanation in Korean

Link: simonjisu.github.io

Notebooks

Date   | Model            | Accuracy | Link
-------|------------------|----------|-----
180402 | self_attn_1H_r5  | 0.7208   | link
180402 | self_attn_1H_r20 | 0.8460   | link
180402 | self_attn_3H_r5  | 0.8498   | link

  • r: number of attention hops (n_hops) in the self-attention layer
  • H: number of hidden layers
  • model3 (self_attn_3H_r5) does not explain its predictions well. It may be overfitted, for the following reasons:
    1. It classifies a review using only the first and last parts of the sentence.
    2. As the layers go deeper, it learns from the previous hidden layers just to guess the right label and ignores whatever the words are.
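
One way to check the first observation on a given review is to measure how much attention mass falls on the first and last few tokens. The helper below is a hypothetical diagnostic, not part of this repository; it assumes one non-negative attention weight per token (for example, the weights of a single hop), and the example weights are made up.

```python
from typing import Sequence


def edge_attention_mass(weights: Sequence[float], k: int = 2) -> float:
    """Fraction of total attention mass on the first k and last k tokens."""
    total = sum(weights)
    if k <= 0 or total == 0:
        return 0.0
    if len(weights) <= 2 * k:
        return 1.0  # the edges cover the whole sentence
    edge = sum(weights[:k]) + sum(weights[-k:])
    return edge / total


# Made-up weights for a 7-token review, concentrated at both ends.
weights = [0.30, 0.15, 0.02, 0.03, 0.05, 0.15, 0.30]
print(round(edge_attention_mass(weights, k=2), 2))  # 0.9
```

A value close to 1.0 across many samples would be consistent with the model looking mostly at the beginning and end of each review.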

Get a Review Visualization

  1. Red Blocks

Red blocks show which words the embedding takes into account heavily and which ones it skips: the redder the block, the more weight the word receives.
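
A minimal sketch of how such red blocks could be rendered as HTML is shown below. It is an assumption about the general approach, not the code in visualize.py: the function name `render_red_blocks` is hypothetical, and it expects a single weight per token (for instance, attention summed over hops).

```python
import html
from typing import Sequence


def render_red_blocks(tokens: Sequence[str], weights: Sequence[float]) -> str:
    """Return an HTML snippet whose red background per token scales with its weight."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    peak = max(weights, default=0.0)
    spans = []
    for token, w in zip(tokens, weights):
        # Normalize by the largest weight so the most attended word is fully red.
        alpha = w / peak if peak > 0 else 0.0
        spans.append(
            f'<span style="background-color: rgba(255, 0, 0, {alpha:.2f})">'
            f"{html.escape(token)}</span>"
        )
    return " ".join(spans)


snippet = render_red_blocks(["정말", "재밌는", "영화"], [0.1, 0.6, 0.3])
print(snippet)
```

Writing the returned string into an .html file and opening it in a browser shows each word shaded by its weight.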

  2. Help

Type "-h" or "-help" after "visualize.py" to see the help message.

Insert the arguments after "visualize.py":

  • First
    • [-1] model1: 1 hidden layer, r=5
    • [-2] model2: 1 hidden layer, r=20
    • [-3] model3: 3 hidden layers, r=5
  • Second
    • [-sample_idx] number from 0 to 781
  • Third
    • [-(file_path.html)] file path; optional, the default is "./figures/(file_name)[sample_idx].html"
  3. Example
python3 visualize.py -2 715

Example: Sample number 715
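
The argument order described above (model flag, sample index, optional file path) could be validated along the following lines. This is a hypothetical sketch and not the parsing code in visualize.py; the flag-to-model mapping follows the Notebooks table, and a missing path is returned as None so the caller can apply the default.

```python
from typing import List, Optional, Tuple

# Mapping assumed from the model list: -1/-2/-3 select model1/model2/model3.
MODELS = {"-1": "self_attn_1H_r5", "-2": "self_attn_1H_r20", "-3": "self_attn_3H_r5"}
MAX_SAMPLE_IDX = 781


def parse_visualize_args(argv: List[str]) -> Tuple[str, int, Optional[str]]:
    """Parse [model_flag, sample_idx, optional_html_path] into validated values."""
    if len(argv) not in (2, 3):
        raise ValueError("usage: visualize.py {-1|-2|-3} sample_idx [file_path.html]")
    flag, idx_text = argv[0], argv[1]
    if flag not in MODELS:
        raise ValueError(f"unknown model flag: {flag}")
    sample_idx = int(idx_text)
    if not 0 <= sample_idx <= MAX_SAMPLE_IDX:
        raise ValueError(f"sample_idx must be between 0 and {MAX_SAMPLE_IDX}")
    out_path = argv[2] if len(argv) == 3 else None  # None means "use the default path"
    return MODELS[flag], sample_idx, out_path


print(parse_visualize_args(["-2", "715"]))  # ('self_attn_1H_r20', 715, None)
```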

