A curated list of resources dedicated to Natural Language Processing and related topics (papers, blogs, and notes). Note: some materials, such as Python skills, are not directly related to NLP.
- https://github.com/keon/awesome-nlp
- https://github.com/papower1/Awesome-Korean-NLP-Papers
- https://github.com/Kyubyong/nlp_tasks
- Blogs
- Github
- Research Summaries and Trends
- Environment
- NLP in Korean
- Tutorials
- Libraries
- Annotation Tools
- Pingpong BERT
- Pingpong word spacing
- dsindex's blog
- Kangwon University's NLP course in Korean
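- Python keyword arguments *
- Deep learning glossary
- How to write for arXiv
- Kyubyong Park's Deep Learning Career FAQ
- Algorithm YouTube Channel
- Structuring your first NLP project
- Pypapago development notes
- Copying an Anaconda environment
- Things a startup developer always does when logging in to a Linux server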
- Chatbot convai2 (with retrieval via elastic)
- DL dev to production
- NL to SQL by BERT
- Jeju language translation and speech synthesis (Kyubyong Park)
- beam search + nlp_made_easy (Kyubyong Park)
- pypapago nmt lib
- makcedward/nlpaug(NLP & Signal augmentation)
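- lovit's Fast Campus course, Machine Learning for Natural Language Processing (GitHub)
- Korean document -> sentence classifier (important)
- Chatspace, a word spacing model made by Pingpong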
- Chatbot with Crawler
- NLP RedditSota
- Yandex course
- Hangul jamo decomposition toolkit
- RasaHQ, an open-source Python chatbot
- Customized KoNLPy
- Yongrae's PyTorch Transformer
- Korean NER Dataset Github
- Youngsook Song's Korean Chitchat Dataset with Sentiment
- Chatbot API open source example
- Awesome Python
- Yunjey's PyTorch Tutorial
- Developer technical interview notes
- NER_TensorFlow_2017_HCLT
- Gichang Lee's GitHub blog source
- GitHub blog source I currently use
- PyTorch Wrapper, pytorch-lightning
- PyCon 2019 GluonNLP tutorial
- matplotlib + Hangul
- API-based chatbot example
- NLP tutorial by lyeoni
- tmux settings
- CRF!!! harvardnlp/pytorch-struct
- RL Chatbot1
- RL Chatbot2
- Evaluation Sentence Embedding (SentEval)
- python-mecab-ko
- NLP-Overview is an up-to-date overview of deep learning techniques applied to NLP, including theory, implementations, applications, and state-of-the-art results. This is a great Deep NLP Introduction for researchers.
- NLP-Progress tracks the progress in Natural Language Processing, including the datasets and the current state-of-the-art for the most common NLP tasks
- NLP's ImageNet moment has arrived
- ACL 2018 Highlights: Understanding Representation and Evaluation in More Challenging Settings
- Four deep learning trends from ACL 2017. Part One: Linguistic Structure and Word Embeddings
- Four deep learning trends from ACL 2017. Part Two: Interpretability and Attention
- Highlights of EMNLP 2017: Exciting Datasets, Return of the Clusters, and More!
- Deep Learning for Natural Language Processing (NLP): Advancements & Trends
- Survey of the State of the Art in Natural Language Generation
- KoNLPy - Python package for Korean natural language processing.
- Mecab (Korean) - C++ library for Korean NLP
- KoalaNLP - Scala library for Korean Natural Language Processing.
- Korean WordNet
- KAIST Corpus - A corpus from the Korea Advanced Institute of Science and Technology in Korean.
- Naver Sentiment Movie Corpus in Korean
- Chosun Ilbo archive - dataset in Korean from one of the major newspapers in South Korea, the Chosun Ilbo.
- NER dataset from the Korea Maritime and Ocean University Natural Language Processing Lab
- PAWS and PAWS-X: Two New Datasets to Improve Natural Language Understanding Models (Paraphrase Adversaries from Word Scrambling)
- conversational-AI-datasets (English conversation datasets)
- Intro to Artificial Intelligence - Udacity course which touches upon NLP as well
- Deep Natural Language Processing - Lectures series from Oxford
- Deep Learning for Natural Language Processing (cs224-n) - Richard Socher and Christopher Manning's Stanford Course
- Neural Networks for NLP - Course from the Carnegie Mellon Language Technologies Institute
- Deep NLP Course by Yandex Data School, covering important ideas from text embedding to machine translation including sequence modeling, language models and so on.
- Python - Python NLP Libraries
- TextBlob - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of Natural Language Toolkit (NLTK) and Pattern, and plays nicely with both 👍
- spaCy - Industrial strength NLP with Python and Cython 👍
- textacy - Higher level NLP built on spaCy
- gensim - Python library to conduct unsupervised semantic modelling from plain text 👍
- scattertext - Python library to produce d3 visualizations of how language differs between corpora
- GluonNLP - A deep learning toolkit for NLP, built on MXNet/Gluon, for research prototyping and industrial deployment of state-of-the-art models on a wide range of NLP tasks.
- AllenNLP - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
- PyTorch-NLP - NLP research toolkit designed to support rapid prototyping with better data loaders, word vector loaders, neural network layer representations, common NLP metrics such as BLEU
- Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)
- PyNLPl - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for FoLiA, but also ARPA language models, Moses phrasetables, GIZA++ alignments.
- jPTDP - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
- BigARTM - a fast library for topic modelling
- Snips NLU - A production ready library for intent parsing
- Chazutsu - A library for downloading and parsing standard NLP research datasets
- Word Forms - Word forms can accurately generate all possible forms of an English word
- Multilingual Latent Dirichlet Allocation (LDA) - A multilingual and extensible document clustering pipeline
- NLP Architect - A library for exploring the state-of-the-art deep learning topologies and techniques for NLP and NLU
- Flair - A very simple framework for state-of-the-art multilingual NLP built on PyTorch. Includes BERT, ELMo and Flair embeddings.
- Kashgari - Simple, Keras-powered multilingual NLP framework, allows you to build your models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT and word2vec embedding.
- Label Studio is an open-source, configurable data annotation tool. Its purpose is to enable you to label different types of data using the most convenient interface with a standardized output format.
- brat - brat rapid annotation tool is an online environment for collaborative text annotation
- LIDA: Lightweight Interactive Dialogue Annotator (in EMNLP 2019) - LIDA is an open source dialogue annotation system which supports the full pipeline of dialogue annotation from dialogue / turn segmentation from raw text
- GATE - General Architecture for Text Engineering is 15+ years old, free and open source
- Anafora is a free and open-source, web-based raw text annotation tool
- doccano - doccano is free, open-source, and provides annotation features for text classification, sequence labeling and sequence to sequence
- tagtog, costs $
- prodigy is an annotation tool powered by active learning, costs $
- LightTag - Hosted and managed text annotation tool for teams, costs $
- rstWeb - open source local or online tool for discourse tree annotations
- GitDox - open source server annotation tool with GitHub version control and validation for XML data and collaborative spreadsheet grids