toolbox

Curated libraries for a faster workflow

Phase: Data

Annotation

Image: makesense.ai
Text: doccano, prodigy

Dataset

Text: nlp-datasets, curse-words, badwords, LDNOOBW, english-words (a text file containing over 466k English words), 10K most common words
Image: 1 million fake faces
Dataset search engine: datasetlist, UCI Machine Learning Datasets

Fetch data

Audio: pydub
Video: pytube (download YouTube videos)
Image: py-image-dataset-generator (automatically fetch images from the web for a given search query)
News: news-please
PDF: camelot, tabula-py
Remote file: smart_open
Crawling: pyppeteer (chrome automation), MechanicalSoup, libextract
Google Sheets: gspread
Google Drive: gdown
Python API for datasets: pydataset
Google Maps location data: geo-heatmap

Data Augmentation

Text: nlpaug
Image: imgaug, albumentations, augmentor
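Text augmenters such as nlpaug offer simple word-level operations alongside model-based ones. Below is a minimal sketch of one of them, random word swap, using only the standard library. The function name `random_swap` and its parameters are hypothetical and are not nlpaug's API:

```python
import random
from typing import Optional


def random_swap(text: str, n_swaps: int = 1, seed: Optional[int] = None) -> str:
    """Return a copy of `text` with `n_swaps` random pairs of words swapped."""
    rng = random.Random(seed)  # a local RNG keeps results reproducible per seed
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


print(random_swap("the quick brown fox jumps over the lazy dog", n_swaps=2, seed=0))
```

Fixing `seed` makes the augmentation reproducible. Each swap keeps the same bag of words, so only word order changes.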

Phase: Exploration

Data Preparation

Missing values: missingno
Split images into train/validation/test: split-folders
Class Imbalance: imblearn
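split-folders divides image folders into train/validation/test subsets by ratio. The core idea can be sketched with the standard library as a seeded shuffle followed by slicing. `split_files`, the default ratio (0.8, 0.1, 0.1), and the seed are assumptions for illustration, and the sketch handles one class's file list at a time:

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_files(
    files: Sequence[str],
    ratio: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 1337,
) -> Dict[str, List[str]]:
    """Shuffle `files` deterministically and slice them into train/val/test."""
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratio[0])
    n_val = int(len(shuffled) * ratio[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder (incl. rounding) goes to test
    }


splits = split_files([f"cat_{i}.jpg" for i in range(100)])
print({name: len(paths) for name, paths in splits.items()})
```

To keep class proportions, as a per-class folder split does, call it once per class and merge the results.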

Experiment in notebooks

View, diff, and merge Jupyter notebooks from the CLI: nbdime
Parametrize notebooks: papermill
Access notebooks programmatically: nbformat
Convert notebooks to other formats: nbconvert
Extra utilities not present in frameworks: mlxtend
Maps in notebooks: ipyleaflet
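A notebook file (.ipynb) is JSON in the nbformat v4 schema: a top-level `cells` list in which each cell has a `cell_type` and a `source` that may be either a string or a list of lines. nbformat adds reading, writing, and validation on top of that. The standard-library sketch below only extracts code cells; `code_cells` and the example notebook are hypothetical:

```python
import json
from typing import List


def code_cells(notebook_json: str) -> List[str]:
    """Return the source of every code cell in an nbformat v4 notebook."""
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # v4 allows `source` as a single string or a list of lines
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
}
print(code_cells(json.dumps(example)))  # → ['x = 1\nprint(x)']
```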

Phase: Feature Engineering

Generate Features

Automatic feature engineering: featuretools, autopandas
Custom distance metric learning: metric-learn
List of holidays: python-holidays
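python-holidays supplies country-specific holiday dates, which are often turned into calendar features for a model. A minimal standard-library sketch follows; the two-date `HOLIDAYS` set is a hypothetical placeholder for what a holiday library would return, and `date_features` is likewise a hypothetical helper:

```python
from datetime import date
from typing import Dict, FrozenSet

# Hypothetical placeholder: a holiday library would supply the real dates.
HOLIDAYS: FrozenSet[date] = frozenset({date(2024, 1, 1), date(2024, 12, 25)})


def date_features(d: date, holidays: FrozenSet[date] = HOLIDAYS) -> Dict[str, int]:
    """Turn a date into simple integer calendar features."""
    weekday = d.weekday()  # Monday=0 ... Sunday=6
    return {
        "day_of_week": weekday,
        "is_weekend": int(weekday >= 5),
        "is_holiday": int(d in holidays),
        "month": d.month,
    }


print(date_features(date(2024, 12, 25)))
# → {'day_of_week': 2, 'is_weekend': 0, 'is_holiday': 1, 'month': 12}
```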

Phase: Modeling

Model Selection

Automated search over scikit-learn models and hyperparameters: auto-sklearn, tpot
Autogenerate ML code: automl-gs, mindsdb
Pretrained models: modeldepot, pytorch-hub, papers-with-code
Find SOTA models: sotawhat

Framework extensions

PyTorch: Keras-like summary for PyTorch, skorch (wraps PyTorch in a scikit-learn-compatible API)
Einstein notation: einops
Scikit-learn: scikit-lego

Algorithms

Gradient Boosting: catboost
Hidden Markov Models: hmmlearn
Genetic Programming: gplearn
Active Learning: modAL
Rule based classifier: sklearn-expertsys
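Active-learning libraries such as modAL repeatedly request labels for the samples the model is least sure about. Below is a minimal sketch of least-confidence uncertainty sampling over predicted class probabilities; `least_confident` and the probability values are hypothetical:

```python
from typing import List, Sequence


def least_confident(probas: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k pool samples whose top predicted probability is lowest."""
    confidence = [max(p) for p in probas]  # probability of the predicted class
    return sorted(range(len(probas)), key=lambda i: confidence[i])[:k]


# Hypothetical class probabilities for four unlabeled samples.
pool_probas = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(least_confident(pool_probas, k=2))  # → [3, 1]
```

The selected indices would then be sent to an annotator, added to the training set, and the model retrained before the next query.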

NLP

Preprocessing: textacy
Text Extraction from Image, Audio, PDF: textract
Text generation: gpt2-client, textgenrnn, gpt-2-simple
Text summarization: textrank, pytldr
Spelling correction: JamSpell, pyhunspell, pyspellchecker, cython_hunspell, hunspell-dictionaries, autocorrect (can add more languages)
Keyword extraction: rake, pke
Multiple-Choice Question Answering: mcQA
Sequence to sequence models: headliner
Transfer learning: finetune
Translation: googletrans
Embeddings: pymagnitude (manage vector embeddings easily), chakin (download pre-trained word vectors), sentence-transformers, InferSent, bert-as-service, sent2vec
Multilingual support: polyglot, inltk (Indic languages), indic_nlp
NLU: snips-nlu
Semantic parsing: quepy
Inflections: inflect
Contractions: pycontractions
Coreference Resolution: neuralcoref
Readability: homer
Grammar checking: language-check
Topic Modeling: guidedlda, enstop
Clustering: spherecluster (kmeans with cosine distance), kneed (automatically find number of clusters from elbow curve), kmodes
Metrics: seqeval (NER, POS tagging)
String match: jellyfish (string and phonetic comparison), flashtext (fast keyword extraction and replacement), pythonverbalexpressions (describe regexes in plain words), commonregex (ready-made regexes for emails, phone numbers, etc.)
Sentiment: vaderSentiment (rule based)
Text distances: textdistance, editdistance
PII removal: scrubadub
Profanity detection: profanity-check
Word clouds: stylecloud
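textdistance and editdistance compute metrics such as Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a standard-library dynamic-programming sketch that keeps only two rows of the table; `levenshtein` is a hypothetical name, not either library's API:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between `a` and `b` with unit costs."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```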

Speech Recognition

Library: speech_recognition

RecSys

Factorization machines (FM), and field-aware factorization machines (FFM): xlearn
Scikit-learn-like API: surprise
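A factorization machine (as implemented in xlearn) scores a feature vector x as ŷ(x) = w0 + Σ_i w_i·x_i + Σ_{i<j} ⟨v_i, v_j⟩·x_i·x_j, where each feature i has a k-dimensional latent vector v_i. The pairwise term can be rewritten as ½·Σ_f [(Σ_i v_if·x_i)² − Σ_i v_if²·x_i²], which costs O(k·n) instead of O(k·n²). Below is a standard-library sketch of both forms for checking that they agree; the function names and toy weights are hypothetical, and training is not shown:

```python
from itertools import combinations
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """FM score in O(k * n) using the reformulated pairwise term."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_if * x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_if^2 * x_i^2
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: Sequence[Sequence[float]]) -> float:
    """Same score straight from the definition, O(k * n^2)."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(vi * vj for vi, vj in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(n), 2)
    )
    return linear + pairwise


# Hypothetical toy model: 3 features, 2 latent factors.
x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, -0.3, 0.5]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```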

Computer Vision

Image processing: scikit-image, imutils
Segmentation Models in Keras: segmentation_models
Face recognition: face_recognition, face-alignment (find facial landmarks)
Face swapping: faceit
Video summarization: videodigest
Semantic search over videos: scoper
OCR: keras-ocr, pytesseract

Timeseries

Predict Time Series: prophet
Scikit-learn-like API: sktime

Phase: Monitoring

Monitor training process

Learning curve: lrcurve (plot real-time learning curves in Keras), livelossplot
Notifications: knockknock (get notified via Slack or email), jupyter-notify (browser notification when a Jupyter cell finishes)
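knockknock works as a decorator around a training function and sends a message when the function finishes or crashes. Below is a standard-library sketch of that pattern. `notify_when_done` is hypothetical, and `send` is any callable you supply: here `print`, in practice a Slack or email sender.

```python
import functools
import time
from typing import Any, Callable


def notify_when_done(send: Callable[[str], Any]):
    """Decorator factory: call `send` with a status message after the wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                send(f"{func.__name__} failed after {time.perf_counter() - start:.1f}s: {exc!r}")
                raise  # still propagate the error to the caller
            send(f"{func.__name__} finished in {time.perf_counter() - start:.1f}s")
            return result
        return wrapper
    return decorator


@notify_when_done(send=print)  # swap print for a real notifier
def train_model():
    return "model"


train_model()
```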

Phase: Optimization

Hyperparameter Optimization

Keras: keras-tuner
Scikit-learn: sklearn-deap (evolutionary algorithm for hyperparameter search)
General: hyperopt
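hyperopt minimizes an objective over a search space and offers samplers such as random search and TPE. Below is a standard-library sketch of plain random search over a discrete space, the baseline that smarter samplers improve on. `random_search`, the space, and the toy objective are hypothetical:

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

Params = Dict[str, Any]


def random_search(objective: Callable[[Params], float],
                  space: Dict[str, List[Any]],
                  n_trials: int = 20,
                  seed: int = 0) -> Tuple[Optional[Params], float]:
    """Sample `n_trials` configurations uniformly and keep the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Hypothetical search space and objective (lower is better).
space = {"lr": [0.1, 0.01, 0.001], "depth": [1, 2, 3]}


def toy_objective(p: Params) -> float:
    return (p["lr"] - 0.01) ** 2 + p["depth"]


print(random_search(toy_objective, space, n_trials=50))
```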

Interpretability

Visualize keras models: keras-vis
Interpret models: eli5, lime, shap, alibi, tf-explain, treeinterpreter
Interpret BERT: exbert
Interpret word2vec: word2viz

Visualization

Draw CNN figures: nn-svg
Visualization for scikit-learn: yellowbrick, scikit-plot
XKCD like charts: chart.xkcd
Convert matplotlib charts to D3 charts: mpld3
Generate graphs using markdown: mermaid
Visualize topic models: pyldavis
High-dimensional visualization: umap

Phase: Production

Scalability

Parallelize .apply in Pandas: pandarallel, swifter
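pandarallel and swifter speed up `.apply` by splitting the data into chunks and processing them in parallel. Below is a minimal standard-library sketch of that chunk-and-map pattern over a plain list. `parallel_apply` is a hypothetical helper, and because it uses threads it only helps when `func` releases the GIL (I/O or native code):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_apply(func: Callable[[T], R], items: Iterable[T], n_workers: int = 4) -> List[R]:
    """Apply `func` to every item, one chunk per worker, preserving input order."""
    data = list(items)
    if not data:
        return []
    chunk_size = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the same order as `chunks`
        chunk_results = list(pool.map(lambda chunk: [func(x) for x in chunk], chunks))
    return [result for chunk in chunk_results for result in chunk]


print(parallel_apply(lambda s: s.upper(), ["a", "b", "c", "d", "e"], n_workers=2))
# → ['A', 'B', 'C', 'D', 'E']
```

For CPU-bound pure-Python functions, swapping in `ProcessPoolExecutor` with module-level, picklable functions instead of the lambdas is closer to what multiprocessing-based tools do.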

Benchmarking

Profile pytorch layers: torchprof
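torchprof reports how long each layer of a PyTorch model takes. The same idea works without a framework: time each stage of a pipeline with `time.perf_counter` and average over several runs. `profile_layers` and the three-stage `pipeline` are hypothetical stand-ins for model layers:

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Layer = Tuple[str, Callable[[Any], Any]]


def profile_layers(layers: List[Layer], x: Any, repeats: int = 10) -> Dict[str, float]:
    """Return the mean wall-clock seconds spent in each named layer."""
    totals: Dict[str, float] = {name: 0.0 for name, _ in layers}
    for _ in range(repeats):
        out = x
        for name, layer in layers:
            start = time.perf_counter()
            out = layer(out)  # feed each layer's output into the next
            totals[name] += time.perf_counter() - start
    return {name: total / repeats for name, total in totals.items()}


# Hypothetical three-stage "model" on a plain list of floats.
pipeline: List[Layer] = [
    ("scale", lambda v: [2.0 * a for a in v]),
    ("relu", lambda v: [max(0.0, a) for a in v]),
    ("sum", sum),
]
for name, secs in profile_layers(pipeline, [float(i - 500) for i in range(1000)]).items():
    print(f"{name:>6}: {secs * 1e6:.1f} µs")
```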

API

Read config files: config, python-decouple
Data Validation: schema, jsonschema, cerberus, pydantic, marshmallow
Enable CORS in Flask: flask-cors
Cache results of functions: cachetools, cachew (cache to local sqlite)
Authentication: pyjwt (JWT)
Task Queue: rq
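A JWT signed with HS256 consists of three base64url segments (header, payload, signature), where the signature is HMAC-SHA256 over `header.payload` (RFC 7519 and RFC 7515). Below is a standard-library sketch of signing and verifying one, to show what pyjwt handles for you. The helper names are hypothetical, and unlike pyjwt the sketch does not validate claims such as `exp`:

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without '=' padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _json_segment(obj: Dict[str, Any]) -> str:
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def jwt_encode_hs256(payload: Dict[str, Any], secret: bytes) -> str:
    signing_input = _json_segment({"alg": "HS256", "typ": "JWT"}) + "." + _json_segment(payload)
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def jwt_decode_hs256(token: str, secret: bytes) -> Dict[str, Any]:
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # constant-time comparison avoids leaking timing information
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    return json.loads(_b64url_decode(payload_b64))


token = jwt_encode_hs256({"sub": "alice"}, b"change-me")
print(jwt_decode_hs256(token, b"change-me"))  # → {'sub': 'alice'}
```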

Serialization

Transpiling: sklearn-porter (transpile sklearn model to C, Java, JavaScript and others), m2cgen

Dashboards

Build web frontends in Python: streamlit

Adversarial testing

Generate images to fool model: foolbox
Generate phrases to fool NLP models: triggers

Python libraries

Datetime compatible API for Bikram Sambat: nepali-date
Bloom filter: python-bloomfilter
Install and run Python applications in isolated environments: pipx
Pretty print tables in CLI: tabulate
Leaflet maps from python: folium
Debugging: PySnooper
Extended pickling (e.g. lambdas and closures): cloudpickle
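A Bloom filter answers "possibly in the set" or "definitely not in the set" using a bit array and k hash positions per item. Below is a standard-library sketch sized with the usual formulas, m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. The class is hypothetical and not python-bloomfilter's API, and it derives the k positions from one SHA-256 digest by double hashing:

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Optimal bit count and number of hash functions for n items at rate p
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size  # double hashing: h1 + i*h2

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(capacity=1000, error_rate=0.01)
bf.add("apple")
print("apple" in bf, "banana" in bf)  # "banana" is almost certainly reported False
```

Membership checks never give false negatives; false positives occur at roughly `error_rate` once the filter holds `capacity` items.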

PoeBlu/toolbox-1
