OLLA (Optimizing the Lifetime and Location of Arrays) enables training larger deep neural networks on existing hardware. It accomplishes this with a few techniques:
- Operator order optimization: reordering tensor operators to reduce peak memory usage.
- Fragmentation reduction: dynamic memory profiling and scheduling to make better use of memory.
Our approach is described in detail in the OLLA arXiv paper. See the citation section below to attribute the work.
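To make the first technique concrete, here is a toy sketch of how the execution order of operators changes peak memory. The graph, the tensor sizes, and the helper names (`GRAPH`, `peak_memory`, `is_valid`, `best_order`) are hypothetical, and the brute-force search is only a stand-in for illustration; it is not OLLA's API or algorithm.

```python
from itertools import permutations

# Hypothetical toy graph: op name -> (output size in MB, list of input ops).
GRAPH = {
    "a": (10, []),
    "b": (10, []),
    "c": (1, ["a"]),
    "d": (1, ["b"]),
    "e": (1, ["c", "d"]),
}


def is_valid(order, graph):
    """Return True if every op runs after all of its inputs."""
    seen = set()
    for op in order:
        if any(src not in seen for src in graph[op][1]):
            return False
        seen.add(op)
    return True


def peak_memory(order, graph):
    """Peak total size of live tensors while running ops in `order`."""
    position = {op: i for i, op in enumerate(order)}
    # A tensor stays live from the step that produces it until its last
    # consumer has run (an unused tensor is live only at its own step).
    last_use = dict(position)
    for op, (_, inputs) in graph.items():
        for src in inputs:
            last_use[src] = max(last_use[src], position[op])
    peak = 0
    for step in range(len(order)):
        live = sum(
            graph[op][0]
            for op in order
            if position[op] <= step <= last_use[op]
        )
        peak = max(peak, live)
    return peak


def best_order(graph):
    """Exhaustively search valid orders; only feasible for tiny graphs."""
    valid = (p for p in permutations(graph) if is_valid(p, graph))
    return min(valid, key=lambda p: peak_memory(p, graph))


naive = ["a", "b", "c", "d", "e"]
print(peak_memory(naive, GRAPH))              # both large tensors live at once
print(peak_memory(best_order(GRAPH), GRAPH))  # "a" is freed before "b" is produced
```

In this toy graph, consuming `a` before producing `b` means the two large tensors are never live at the same time, which is the kind of saving operator reordering aims for.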
Installing OLLA in your Python environment is simple:
git clone https://github.com/facebookresearch/olla
cd olla
pip install . [--extra-index-url <url>]
Note:
- The above install will attempt to install torch, torchaudio, torchvision, and torchtext based on default distributions. To install for your CUDA version/OS, see the PyTorch Getting Started documentation, appending the --extra-index-url flag and value to the above install command as needed.
- OLLA is tested with Gurobi 9.1.1; use your own license or version as needed.
To run benchmarks:
python benchmarks.py
Run all unit tests with:
python -m unittest discover -s tests --pattern "*_test.py"
Run the unit tests that are otherwise skipped by setting RUN_SKIPPED=1 in the environment before the command.
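As a rough illustration of how an environment variable can gate tests, the sketch below uses the standard `unittest.skipUnless` decorator. The `ExampleTest` class and the `run_example` helper are hypothetical, and the assumption that OLLA's tests check the variable in exactly this way is not confirmed by this README.

```python
import os
import unittest

# Assumption: gating is a simple check of the RUN_SKIPPED variable.
RUN_SKIPPED = os.environ.get("RUN_SKIPPED") == "1"


class ExampleTest(unittest.TestCase):
    def test_fast(self):
        # Always runs.
        self.assertEqual(1 + 1, 2)

    @unittest.skipUnless(RUN_SKIPPED, "set RUN_SKIPPED=1 to run this test")
    def test_slow(self):
        # Runs only when RUN_SKIPPED=1 is set in the environment.
        self.assertEqual(sum(range(10)), 45)


def run_example():
    """Run the example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_example()
    print(f"skipped: {len(result.skipped)}")
```

In a POSIX shell, the variable can be set for a single invocation by prefixing the command, for example `RUN_SKIPPED=1 python -m unittest discover -s tests --pattern "*_test.py"`.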
If you use OLLA, please cite it using the BibTeX entry below:
@article{steiner2022olla,
title={OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks},
author={Steiner, Benoit and Elhoushi, Mostafa and Kahn, Jacob and Hegarty, James},
doi = {10.48550/arXiv.2210.12924},
year={2022},
}