Curated libraries for a faster workflow
- Image annotation: makesense.ai
- Text annotation: doccano, prodigy
- Text datasets: nlp-datasets, curse-words, badwords, LDNOOBW, english-words (a text file containing over 466k English words), 10K most common words
- Image datasets: 1 million fake faces
- Dataset search engine: datasetlist, UCI Machine Learning Datasets
- Audio: pydub
- Video: pytube (download YouTube videos)
- Image: py-image-dataset-generator (automatically fetch images from the web for a given search query)
- News: news-please
- PDF: camelot, tabula-py
- Remote file: smart_open
- Crawling: pyppeteer (Chrome automation), MechanicalSoup, libextract
- Google Sheets: gspread
- Google Drive: gdown
- Python API for datasets: pydataset
- Google Maps location data: geo-heatmap
- Text augmentation: nlpaug
- Image augmentation: imgaug, albumentations, augmentor
- Missing values: missingno
- Split images into train/validation/test: split-folders
- Class Imbalance: imblearn
- View Jupyter notebooks through CLI: nbdime
- Parametrize notebooks: papermill
- Access notebooks programmatically: nbformat
- Convert notebooks to other formats: nbconvert
- Extra utilities not present in frameworks: mlxtend
- Maps in notebooks: ipyleaflet
- Automatic feature engineering: featuretools, autopandas
- Custom distance metric learning: metric-learn
- List of holidays: python-holidays
- Brute-force search over scikit-learn models and parameters: auto-sklearn, tpot
- Autogenerate ML code: automl-gs, mindsdb
- Pretrained models: modeldepot, pytorch-hub, papers-with-code
- Find SOTA models: sotawhat
- PyTorch: Keras-like summary for PyTorch, skorch (wrap PyTorch in a scikit-learn compatible API)
- Einstein notation: einops
- Scikit-learn: scikit-lego
- Gradient Boosting: catboost
- Hidden Markov Models: hmmlearn
- Genetic Programming: gplearn
- Active Learning: modAL
- Rule based classifier: sklearn-expertsys
- Preprocessing: textacy
- Text Extraction from Image, Audio, PDF: textract
- Text generation: gpt2client, textgenrnn, gpt-2-simple
- Text summarization: textrank, pytldr
- Spelling correction: JamSpell, pyhunspell, pyspellchecker, cython_hunspell, hunspell-dictionaries, autocorrect (can add more languages)
- Keyword extraction: rake, pke
- Multiple Choice Question Answering: mcQA
- Sequence to sequence models: headliner
- Transfer learning: finetune
- Translation: googletrans
- Embeddings: pymagnitude (manage vector embeddings easily), chakin (download pre-trained word vectors), sentence-transformers, InferSent, bert-as-service, sent2vec
- Multilingual support: polyglot, inltk (indic languages), indic_nlp
- NLU: snips-nlu
- Semantic parsing: quepy
- Inflections: inflect
- Contractions: pycontractions
- Coreference Resolution: neuralcoref
- Readability: homer
- Grammar checking: language-check
- Topic Modeling: guidedlda, enstop
- Clustering: spherecluster (kmeans with cosine distance), kneed (automatically find number of clusters from elbow curve), kmodes
- Metrics: seqeval (NER, POS tagging)
- String match: jellyfish (perform string and phonetic comparison), flashtext (superfast extract and replace keywords), pythonverbalexpressions (verbally describe regex), commonregex (ready-made regex for email, phone, etc.)
- Sentiment: vaderSentiment (rule based)
- Text distances: textdistance, editdistance
- PII removal: scrubadub
- Profanity detection: profanity-check
- Word clouds: stylecloud
- Speech recognition: speech_recognition
- Factorization machines (FM), and field-aware factorization machines (FFM): xlearn
- Recommender systems with a scikit-learn like API: surprise
- Image processing: scikit-image, imutils
- Segmentation Models in Keras: segmentation_models
- Face recognition: face_recognition, face-alignment (find facial landmarks)
- Face swapping: faceit
- Video summarization: videodigest
- Semantic search over videos: scoper
- OCR: keras-ocr, pytesseract
- Learning curve: lrcurve (plot real-time learning curves in Keras), livelossplot
- Notifications: knockknock (get notified by Slack/email), jupyter-notify (notify when a task is completed in Jupyter)
- Hyperparameter tuning for Keras: keras-tuner
- Hyperparameter tuning for scikit-learn: sklearn-deap (evolutionary algorithm for hyperparameter search)
- General hyperparameter optimization: hyperopt
- Visualize Keras models: keras-vis
- Interpret models: eli5, lime, shap, alibi, tf-explain, treeinterpreter
- Interpret BERT: exbert
- Interpret word2vec: word2viz
- Draw CNN figures: nn-svg
- Visualization for scikit-learn: yellowbrick, scikit-plot
- XKCD like charts: chart.xkcd
- Convert matplotlib charts to D3 charts: mpld3
- Generate graphs using markdown: mermaid
- Visualize topic models: pyLDAvis
- High dimensional visualization: umap
- Parallelize .apply in Pandas: pandarallel, swifter
- Profile PyTorch layers: torchprof
- Read config files: config, python-decouple
- Data Validation: schema, jsonschema, cerberus, pydantic, marshmallow
- Enable CORS in Flask: flask-cors
- Cache results of functions: cachetools, cachew (cache to local sqlite)
- Authentication: pyjwt (JWT)
- Task Queue: rq
- Generate frontend with Python: streamlit
- Datetime compatible API for Bikram Sambat: nepali-date
- Bloom filter: python-bloomfilter
- Run Python applications in isolated environments: pipx
- Pretty print tables in CLI: tabulate
- Leaflet maps from Python: folium
- Debugging: PySnooper
- Pickling extended: cloudpickle
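
Most entries above name a tool without showing the technique behind it. As one illustration, here is a minimal sketch of what a Bloom filter (the python-bloomfilter entry) does, written with only the standard library. It is a toy under stated assumptions and not the python-bloomfilter API: the class name `TinyBloomFilter`, the bit-array size, and the number of hashes are all hypothetical choices made for readability.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter: a hypothetical illustration, not the python-bloomfilter API."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


seen = TinyBloomFilter()
for name in ["catboost", "hyperopt", "streamlit"]:
    seen.add(name)

print("hyperopt" in seen)  # added items are always reported as present
```

The trade-off is fixed, small memory in exchange for occasional false positives on items that were never added, which is why a dedicated library is preferable once the set grows large.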