Implementing the text recognizer project from the Full Stack Deep Learning (FSDL) course in PyTorch in order to learn best practices when building a deep learning project. I have expanded on this project with additional features and ideas from Claudio Jolowicz's "Hypermodern Python".
Prerequisites:

- pyenv (or similar) and Python 3.9.* installed.
- nox for linting, formatting, and testing.
- Poetry, a dependency and project manager for Python.
Install Poetry and pyenv, then set the local Python version and install the dependencies:

`pyenv local 3.9.*`

`make install`
Download and generate the datasets by running:

`make download`

`make generate`
Use, modify, or create a new experiment found at `training/conf/experiment/`.
To run an experiment, we first need to enter the virtual environment by running:

`poetry shell`
Then we can train a new model by running:
`python main.py +experiment=conv_transformer_paragraphs`
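The `+experiment=` override syntax suggests that `main.py` is a Hydra application. As a hypothetical sketch only (the actual entry point, config path, and config name in this repo may differ), a minimal Hydra entry point looks something like this:

```python
# Hypothetical sketch of a Hydra entry point; the real main.py, config
# path, and config name in this repo may differ.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="training/conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Print the fully resolved config, including the selected experiment.
    print(OmegaConf.to_yaml(cfg))
    # ...build the datamodule, network, and trainer from cfg here...


if __name__ == "__main__":
    main()
```

With a layout like this, `+experiment=conv_transformer_paragraphs` would compose `training/conf/experiment/conv_transformer_paragraphs.yaml` on top of the base config.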
Create a picture of the network and place it here
Ideas of mine that unfortunately did not work:
- EfficientNet was apparently a terrible choice of encoder (a ConvNext-style block is sketched after this list).
  - A ConvNext module, heavily copied from lucidrains' x-unet, was much better at encoding the images into a useful representation.
- Use a VQVAE to pre-train a good latent representation.
  - Tests with various compression rates showed no performance increase compared to training directly end-to-end; if anything, a decrease, to be honest.
  - This is very unfortunate, as I really hoped this idea would work :(
  - I still really like this idea, and I might not have given up just yet...
  - I have now given up... :( ConvNext ftw
- Axial Transformer encoder
  - Added a lot of extra parameters with no gain in performance.
  - Cool idea, but not practical on a single GPU.
- Word Pieces
  - Might have worked better, but I liked the idea of single-character recognition more.
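Since a ConvNext encoder is what ended up working, here is a minimal sketch of a ConvNext-style block in PyTorch. It follows the commonly used design (depthwise 7x7 convolution, normalization, inverted-bottleneck MLP, residual connection); the block in lucidrains' x-unet, and the one used in this project, may differ in details:

```python
# Hedged sketch of a ConvNext-style block; details may differ from the
# x-unet variant actually used in this project.
import torch
from torch import nn


class ConvNextBlock(nn.Module):
    def __init__(self, dim: int, expansion: int = 4) -> None:
        super().__init__()
        # Depthwise 7x7 convolution mixes spatial information per channel.
        self.depthwise = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        # GroupNorm with a single group acts as LayerNorm over channels.
        self.norm = nn.GroupNorm(1, dim)
        # Inverted bottleneck: expand channels, apply nonlinearity, project back.
        self.pointwise = nn.Sequential(
            nn.Conv2d(dim, dim * expansion, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(dim * expansion, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.depthwise(x)
        x = self.norm(x)
        x = self.pointwise(x)
        return x + residual  # residual connection


if __name__ == "__main__":
    block = ConvNextBlock(dim=64)
    print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```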
Todo:

- Remove einops (try)
- Tests
- Evaluation
- Wandb artifact fetcher
- Fix linting
- Modularize the decoder
- Add KV cache
- Train with LaProp
- Fix stems
- Residual attention
- Single KV head
- Fix rotary embedding
- Simplify attention with norm
- Tie embeddings
- CNN -> transformer encoder -> transformer decoder (a sketch of this pipeline follows below)
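To illustrate the last item, here is a hedged sketch of a cnn -> transformer encoder -> transformer decoder network in PyTorch. Every module choice and hyperparameter below is an illustrative assumption, not the configuration used in this repo:

```python
# Illustrative sketch of the cnn -> tf encoder -> tf decoder pipeline;
# the actual network in this repo will differ.
import torch
from torch import nn


class ConvTransformer(nn.Module):
    def __init__(self, num_classes: int, dim: int = 256) -> None:
        super().__init__()
        # CNN stem: downsample the image and project it to the model dimension.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, dim // 2, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim // 2, dim, kernel_size=3, stride=2, padding=1),
        )
        # Transformer encoder over the flattened image features.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Autoregressive transformer decoder over the character tokens.
        self.embedding = nn.Embedding(num_classes, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        features = self.cnn(images)                     # (B, dim, H', W')
        features = features.flatten(2).transpose(1, 2)  # (B, H'*W', dim)
        memory = self.encoder(features)
        # Causal mask so each position only attends to earlier tokens.
        t = tokens.size(1)
        mask = torch.triu(
            torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1
        )
        out = self.decoder(self.embedding(tokens), memory, tgt_mask=mask)
        return self.head(out)                           # (B, T, num_classes)
```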