- Author
- Cecilia Eugenia De la Parra Aparicio
- Title
- Post-Training Optimization of Cross-layer Approximate Computing for Edge Inference of Deep Learning Applications
- Citable URL
- https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-889784
- First publication
- 2024
- Date of submission
- 19.07.2023
- Date of defense
- 13.12.2023
- Abstract (EN)
- Over the past decade, the rapid development of deep learning (DL) algorithms has enabled extraordinary advances in perception tasks throughout different fields, from computer vision to audio signal processing. Additionally, the increasing computational resources available in supercomputers and graphics processor clusters have provided a suitable environment to train larger and deeper deep neural network (DNN) models for improved performance. However, the resulting memory bandwidth and computational requirements of such DNN models restrict their deployment in embedded systems with constrained hardware resources. To overcome this challenge, it is important to establish new paradigms that reduce the computational workload of such DL algorithms while maintaining their original accuracy. A key observation of previous research is that DL models are resilient to input noise and computational errors; therefore, a reasonable approach to decreasing such hardware requirements is to embrace DNN resiliency and utilize approximate computing techniques at different system design layers. This approach requires, however, constant monitoring as well as a careful combination of approximation techniques to avoid performance degradation while maximizing computational savings. Within this context, the focus of this thesis is the simulation of cross-layer approximate computing (AC) methods for DNN computation and the development of optimization methods to compensate for AC errors in approximated DNNs.
The first part of this thesis proposes the simulation framework ProxSim. This framework enables accelerated approximate computational unit (ACU) simulation for the evaluation and training of approximated DNNs. ProxSim supports quantization and approximation of common neural layers such as fully connected (FC), convolutional, and recurrent layers. A performance evaluation using a variety of DNN architectures, as well as a comparison with the state of the art, is also presented. The author used ProxSim to implement and evaluate the following methods presented in this work.
The second part of this thesis introduces an approach to model the approximation error in DNN computation. First, the author thoroughly analyzes the error caused by using approximate multipliers to compute the multiply-accumulate (MAC) operations in DNN models. From this analysis, a statistical model of the approximation error is obtained. Through various experiments with DNNs for image classification, the proposed model is verified and compared with other methods from the literature. The results demonstrate the validity of the approximation error model and reinforce a general understanding of approximate computing in DNNs.
In the third part of this thesis, the author presents a methodology for uniform systematic approximation of DNNs. This methodology focuses on optimizing full DNN approximation with a single type of ACU to minimize power consumption without accuracy loss. Its backbone is the custom fine-tuning methods the author proposes to compensate for the approximation error. These methods enable the use of ACUs with large approximation errors, which results in significant power savings and negligible accuracy losses. This process is corroborated by extensive experiments, where the estimated savings and the accuracy achieved after approximation are thoroughly examined using ProxSim.
In the last part of this thesis, the author proposes two methodologies to further boost energy savings after applying uniform approximation. This increase in energy savings is achieved by computing more resilient DNN elements (neurons or layers) at higher approximation levels. The first methodology focuses on iterative kernel-wise approximation and quantization enabled by a custom approximate MAC unit. The second is based on flexible layer-wise approximation and is applied to bit-decomposed in-memory computing (IMC) architectures as a case study to demonstrate the effectiveness of the proposed approach.
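To make the error-modeling idea from the abstract concrete, the following minimal sketch simulates an approximate signed 8-bit multiplier through an exhaustive lookup table and measures the error it introduces into a single MAC-based dot product. The names (`build_approx_mult_lut`, `approx_mac`) and the LSB-perturbation error model are illustrative assumptions of this sketch, not the thesis' actual ACUs or the ProxSim API.

```python
import numpy as np

def build_approx_mult_lut(flip_prob=0.01, seed=0):
    """Toy approximate signed 8-bit multiplier as a 256x256 lookup table:
    exact products, with the least significant bit flipped on a random
    fraction of entries (an assumed, illustrative error model)."""
    rng = np.random.default_rng(seed)
    operands = np.arange(-128, 128, dtype=np.int32)
    exact = np.outer(operands, operands)          # all 256x256 exact products
    flips = rng.random(exact.shape) < flip_prob   # which entries get perturbed
    return exact ^ flips.astype(np.int32)         # XOR with 1 flips the LSB

def approx_mac(x, w, lut):
    """Multiply-accumulate using the approximate multiplier LUT.
    x, w: equally sized int8 arrays (quantized activations and weights)."""
    products = lut[x.astype(np.int32) + 128, w.astype(np.int32) + 128]
    return products.sum(dtype=np.int64)

# Compare the approximate MAC against exact computation to observe the
# approximation error that a statistical model would characterize.
rng = np.random.default_rng(1)
x = rng.integers(-128, 128, size=1024, dtype=np.int8)
w = rng.integers(-128, 128, size=1024, dtype=np.int8)
lut = build_approx_mult_lut()
error = approx_mac(x, w, lut) - int(x.astype(np.int64) @ w.astype(np.int64))
print(f"MAC error for one dot product: {error}")
```

Repeating this measurement over many random operand vectors yields empirical error statistics (mean, variance) of the kind a statistical approximation-error model would aim to predict.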
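The fine-tuning methods described in the third part retrain a network with the approximation error present in the forward pass. As a hedged illustration of that general idea, and not the author's specific method, the sketch below injects a stand-in additive error into a linear layer while letting gradients flow through the exact computation, in the style of a straight-through estimator; `ApproxLinear` and the Gaussian error model are assumptions of this sketch.

```python
import torch

class ApproxLinear(torch.nn.Linear):
    """Linear layer whose forward pass adds an assumed, illustrative
    error term standing in for ACU behavior; the backward pass sees
    only the exact computation (straight-through estimator style)."""
    def __init__(self, in_features, out_features, err_std=0.05):
        super().__init__(in_features, out_features)
        self.err_std = err_std  # stand-in for measured ACU error statistics

    def forward(self, x):
        exact = super().forward(x)
        # Additive noise scaled to the output magnitude, fully detached so
        # gradients are computed as if the layer were exact.
        err = self.err_std * exact.std().detach() * torch.randn_like(exact)
        return exact + err.detach()

# Usage: train with an ordinary loop; the injected error lets the weights
# adapt to (this sketch's model of) the approximate hardware's behavior.
layer = ApproxLinear(64, 10)
out = layer(torch.randn(8, 64))
out.sum().backward()  # gradients flow through the exact forward path
```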
- Free keywords (EN)
- approximate computing, deep learning, neural networks
- Classification (DDC)
- 006
- Classification (RVK)
- ST 301
- Reviewers
- Prof. Dr. Akash Kumar
- Prof. Dr. Lukáš Sekanina
- Degree-granting / examining institution
- Technische Universität Dresden, Dresden
- Funding / project information
- European Union Horizon 2020 (TEMPO), ID: 826655
- Version / review status
- published version / publisher's version
- URN Qucosa
- urn:nbn:de:bsz:14-qucosa2-889784
- Qucosa publication date
- 07.02.2024
- Document type
- Dissertation
- Language of the document
- English
- License / rights notice
- CC BY 4.0