#### TL;DR

Neural networks are both computationally and memory intensive, which makes them difficult to deploy on mobile phones and embedded systems with limited hardware resources. To address this limitation, "Deep Compression" compresses deep neural networks by 10x-49x without loss of prediction accuracy. The authors claim that the "Dense-Sparse-Dense" training method regularizes CNNs/RNNs/LSTMs and improves the prediction accuracy of a wide range of neural networks at the same model size. The "Efficient Inference Engine" (EIE) works directly on the deep-compressed DNN model and accelerates inference by exploiting weight sparsity, activation sparsity, and weight sharing; it is 13x faster and 3000x more energy efficient than a TitanX GPU.
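The two heaviest stages of the Deep Compression pipeline, magnitude-based pruning and trained quantization via weight sharing, are easy to picture with a small sketch. The NumPy code below is a minimal illustration on a single weight matrix; the 90% sparsity level, the 16-entry (4-bit) codebook, the fixed k-means iteration count, and the function names are illustrative assumptions, not the paper's tuned settings, and the retraining/fine-tuning steps the paper relies on are omitted.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights (illustrative sparsity level)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def share_weights_kmeans(weights, mask, n_clusters=16, n_iter=20):
    """Cluster surviving weights into n_clusters shared values (a 4-bit codebook here)."""
    vals = weights[mask]
    # Linear initialization of centroids over the weight range.
    centroids = np.linspace(vals.min(), vals.max(), n_clusters)
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid.
        assign = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        # Recompute each centroid as the mean of its assigned weights.
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = vals[assign == k].mean()
    quantized = weights.copy()
    quantized[mask] = centroids[assign]
    return quantized, centroids

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(256, 256))        # stand-in for one dense layer
W_pruned, mask = prune_by_magnitude(W, sparsity=0.9)
W_shared, codebook = share_weights_kmeans(W_pruned, mask)
print(f"nonzero weights: {mask.mean():.1%}, codebook size: {len(codebook)}")
```

In the full pipeline, the pruned network is retrained with the sparsity mask fixed, the shared centroids are fine-tuned by accumulating gradients per cluster, and Huffman coding of the weight indices provides the final stage of compression.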

#### Slides I understood (or hope to understand):

(Slide images omitted; they covered: bits per weight, the Deep Compression pipeline, dense-sparse-dense training, fine-tuning centroids, model compression, network pruning, pruning for neural machine translation, pruning the NeuralTalk LSTM, pruning with trained quantization, pruning results, pruning RNNs/LSTMs, retraining to recover accuracy, smaller DNNs, speed and energy efficiency, SRAM vs. DRAM access for DNNs, weight distribution, and weight sharing.)

#### References: