ACM Transactions on Architecture and Code Optimization, Volume 20
Volume 20, Number 1, March 2023
- Thomas Luinaud, J. M. Pierre Langlois, Yvon Savaria:
  Symbolic Analysis for Data Plane Programs Specialization. 1:1-1:21
- Nilesh Rajendra Shah, Ashitabh Misra, Antoine Miné, Rakesh Venkat, Ramakrishna Upadrasta:
  BullsEye: Scalable and Accurate Approximation Framework for Cache Miss Calculation. 2:1-2:28
- Mitali Soni, Asmita Pal, Joshua San Miguel:
  As-Is Approximate Computing. 3:1-3:26
- Parth Shah, Ranjal Gautham Shenoy, Vaidyanathan Srinivasan, Pradip Bose, Alper Buyuktosunoglu:
  TokenSmart: Distributed, Scalable Power Management in the Many-core Era. 4:1-4:26
- Zhangyu Chen, Yu Hua, Luochangqi Ding, Bo Ding, Pengfei Zuo, Xue Liu:
  Lock-Free High-performance Hashing for Persistent Memory via PM-aware Holistic Optimization. 5:1-5:26
- Aristeidis Mastoras, Sotiris Anagnostidis, Albert-Jan Nicholas Yzelman:
  Design and Implementation for Nonblocking Execution in GraphBLAS: Tradeoffs and Performance. 6:1-6:23
- Yemao Xu, Dezun Dong, Dongsheng Wang, Shi Xu, Enda Yu, Weixia Xu, Xiangke Liao:
  SSD-SGD: Communication Sparsification for Distributed Deep Learning Training. 7:1-7:25
- Ataberk Olgun, Juan Gómez-Luna, Konstantinos Kanellopoulos, Behzad Salami, Hasan Hassan, Oguz Ergin, Onur Mutlu:
  PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM. 8:1-8:31
- Christos Sakalis, Stefanos Kaxiras, Magnus Själander:
  Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks. 9:1-9:24
- Yi Liang, Shaokang Zeng, Lei Wang:
  Quantifying Resource Contention of Co-located Workloads with the System-level Entropy. 10:1-10:25
- Suyeon Hur, Seongmin Na, Dongup Kwon, Joonsung Kim, Andrew Boutros, Eriko Nurvitadhi, Jangwoo Kim:
  A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks. 11:1-11:24
- Ashish Gondimalla, Jianqiao Liu, Mithuna Thottethodi, T. N. Vijaykumar:
  Occam: Optimal Data Reuse for Convolutional Neural Networks. 12:1-12:25
- Bo Peng, Yaozu Dong, Jianguo Yao, Fengguang Wu, Haibing Guan:
  FlexHM: A Practical System for Heterogeneous Memory with Flexible and Efficient Performance Optimizations. 13:1-13:26
- Qiang Zhang, Lei Xu, Baowen Xu:
  RegCPython: A Register-based Python Interpreter for Better Performance. 14:1-14:25
- Hai Jin, Zhuo He, Weizhong Qiang:
  SpecTerminator: Blocking Speculative Side Channels Based on Instruction Classes on RISC-V. 15:1-15:26
- Tuowen Zhao, Tobi Popoola, Mary W. Hall, Catherine Olschanowsky, Michelle Strout:
  Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration. 16:1-16:26
- Manuela Schuler, Richard Membarth, Philipp Slusallek:
  XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments. 17:1-17:25
- Ivan Korostelev, Joao P. L. de Carvalho, José E. Moreira, José Nelson Amaral:
  YaConv: Convolution with Low Cache Footprint. 18:1-18:18
- Furkan Eris, Marcia S. Louis, Kubra Eris, José Luis Abellán Miguel, Ajay Joshi:
  Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy. 19:1-19:25
Volume 20, Number 2, June 2023
- Nicolas Tollenaere, Guillaume Iooss, Stéphane Pouget, Hugo Brunie, Christophe Guillon, Albert Cohen, P. Sadayappan, Fabrice Rastello:
  Autotuning Convolutions Is Easier Than You Think. 20:1-20:24
- Victor Perez, Lukas Sommer, Victor Lomüller, Kumudha Narasimhan, Mehdi Goli:
  User-driven Online Kernel Fusion for SYCL. 21:1-21:25
- Vinicius Espindola, Luciano G. Zago, Hervé Yviquel, Guido Araujo:
  Source Matching and Rewriting for MLIR Using String-Based Automata. 22:1-22:26
- Wenjing Ma, Fangfang Liu, Daokun Chen, Qinglin Lu, Yi Hu, Hongsen Wang, Xinhui Yuan:
  An Optimized Framework for Matrix Factorization on the New Sunway Many-core Platform. 23:1-23:24
- Sarabjeet Singh, Neelam Surana, Kailash Prasad, Pranjali Jain, Joycee Mekie, Manu Awasthi:
  HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy. 24:1-24:20
- Chandra Sekhar Mummidi, Sandip Kundu:
  ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures. 25:1-25:19
- Qiaoyi Liu, Jeff Setter, Dillon Huff, Maxwell Strange, Kathleen Feng, Mark Horowitz, Priyanka Raina, Fredrik Kjolstad:
  Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators. 26:1-26:26
- Ahmet Caner Yüzügüler, Canberk Sönmez, Mario Drumond, Yunho Oh, Babak Falsafi, Pascal Frossard:
  Scale-out Systolic Arrays. 27:1-27:25
- Francesco Minervini, Oscar Palomar, Osman S. Unsal, Enrico Reggiani, Josue V. Quiroga, Joan Marimon, Carlos Rojas, Roger Figueras, Abraham Ruiz, Alberto González, Jonnatan Mendoza, Iván Vargas, César Hernández, Joan Cabre, Lina Khoirunisya, Mustapha Bouhali, Julian Pavon, Francesc Moll, Mauro Olivieri, Mario Kovac, Mate Kovac, Leon Dragic, Mateo Valero, Adrián Cristal:
  Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications. 28:1-28:25
- Hadjer Benmeziane, Hamza Ouarnoughi, Kaoutar El Maghraoui, Smaïl Niar:
  Multi-objective Hardware-aware Neural Architecture Search with Pareto Rank-preserving Surrogate Models. 29:1-29:21
- Dongwei Chen, Dong Tong, Chun Yang, Jiangfang Yi, Xu Cheng:
  FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers. 30:1-30:24
- Jingwen Du, Fang Wang, Dan Feng, Changchen Gan, Yuchao Cao, Xiaomin Zou, Fan Li:
  Fast One-Sided RDMA-Based State Machine Replication for Disaggregated Memory. 31:1-31:25
Volume 20, Number 3, September 2023
- Abdul Rasheed Sahni, Hamza Omar, Usman Ali, Omer Khan:
  ASM: An Adaptive Secure Multicore for Co-located Mutually Distrusting Processes. 32:1-32:24
- Sooraj Puthoor, Mikko H. Lipasti:
  Turn-based Spatiotemporal Coherence for GPUs. 33:1-33:27
- Ruobing Chen, Haosen Shi, Jinping Wu, Yusen Li, Xiaoguang Liu, Gang Wang:
  Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters. 34:1-34:24
- Gokul Subramanian Ravi, Tushar Krishna, Mikko H. Lipasti:
  TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency. 35:1-35:25
- Weizhi Xu, Yintai Sun, Shengyu Fan, Hui Yu, Xin Fu:
  Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs. 36:1-36:26
- Jin Zhao, Yu Zhang, Ligang He, Qikun Li, Xiang Zhang, Xinyu Jiang, Hui Yu, Xiaofei Liao, Hai Jin, Lin Gu, Haikun Liu, Bingsheng He, Ji Zhang, Xianzheng Song, Lin Wang, Jun Zhou:
  GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing. 37:1-37:24
- Yufeng Zhou, Alan L. Cox, Sandhya Dwarkadas, Xiaowan Dong:
  The Impact of Page Size and Microarchitecture on Instruction Address Translation Overhead. 38:1-38:25
- Benjamin Reber, Matthew Gould, Alexander H. Kneipp, Fangzhou Liu, Ian Prechtl, Chen Ding, Linlin Chen, Dorin Patru:
  Cache Programming for Scientific Loops Using Leases. 39:1-39:25
- Xinfeng Xie, Peng Gu, Yufei Ding, Dimin Niu, Hongzhong Zheng, Yuan Xie:
  MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing. 40:1-40:26
- Alexander Krolik, Clark Verbrugge, Laurie J. Hendren:
  rNdN: Fast Query Compilation for NVIDIA GPUs. 41:1-41:25
- Jiazhi Jiang, Zijiang Huang, Dan Huang, Jiangsu Du, Lin Chen, Ziguang Chen, Yutong Lu:
  Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure. 42:1-42:21
- Yuwen Zhao, Fangfang Liu, Wenjing Ma, Huiyuan Li, Yuanchi Peng, Cui Wang:
  MFFT: A GPU Accelerated Highly Efficient Mixed-Precision Large-Scale FFT Framework. 43:1-43:23
- Muhammad Waqar Azhar, Madhavan Manivannan, Per Stenström:
  Approx-RM: Reducing Energy on Heterogeneous Multicore Processors under Accuracy and Timing Constraints. 44:1-44:25
- Dong Huang, Dan Feng, Qiankun Liu, Bo Ding, Wei Zhao, Xueliang Wei, Wei Tong:
  SplitZNS: Towards an Efficient LSM-Tree on Zoned Namespace SSDs. 45:1-45:26
Volume 20, Number 4, December 2023
- Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu:
  Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs. 46:1-46:22
- Hai Jin, Bo Lei, Haikun Liu, Xiaofei Liao, Zhuohui Duan, Chencheng Ye, Yu Zhang:
  A Compilation Tool for Computation Offloading in ReRAM-based CIM Architectures. 47:1-47:25
- Christian Menard, Marten Lohstroh, Soroush Bateni, Matthew Chorlian, Arthur Deng, Peter Donovan, Clément Fournier, Shaokai Lin, Felix Suchert, Tassilo Tanneberger, Hokeun Kim, Jerónimo Castrillón, Edward A. Lee:
  High-performance Deterministic Concurrency Using Lingua Franca. 48:1-48:29
- Donglei Wu, Weihao Yang, Xiangyu Zou, Wen Xia, Shiyi Li, Zhenbo Hu, Weizhe Zhang, Binxing Fang:
  Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference. 49:1-49:24
- Syed Salauddin Mohammad Tariq, Lance Menard, Pengfei Su, Probir Roy:
  MicroProf: Code-level Attribution of Unnecessary Data Transfer in Microservice Applications. 50:1-50:26
- Shiyi Li, Qiang Cao, Shenggang Wan, Wen Xia, Changsheng Xie:
  gPPM: A Generalized Matrix Operation and Parallel Algorithm to Accelerate the Encoding/Decoding Process of Erasure Codes. 51:1-51:25
- Petros Anastasiadis, Nikela Papadopoulou, Georgios I. Goumas, Nectarios Koziris, Dennis Hoppe, Li Zhong:
  PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems. 52:1-52:25
- Hui Yu, Yu Zhang, Jin Zhao, Yujian Liao, Zhiying Huang, Donghao He, Lin Gu, Hai Jin, Xiaofei Liao, Haikun Liu, Bingsheng He, Jianhui Yue:
  RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural Network. 53:1-53:26
- Victor Ferrari, Rafael Cardoso Fernandes Sousa, Márcio Machado Pereira, Joao P. L. de Carvalho, José Nelson Amaral, José E. Moreira, Guido Araujo:
  Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions. 54:1-54:26
- Bowen He, Xiao Zheng, Yuan Chen, Weinan Li, Yajin Zhou, Xin Long, Pengcheng Zhang, Xiaowei Lu, Linquan Jiang, Qiang Liu, Dennis Cai, Xiantao Zhang:
  DxPU: Large-scale Disaggregated GPU Pools in the Datacenter. 55:1-55:23
- Shiqing Zhang, Mahmood Naderan-Tahan, Magnus Jahre, Lieven Eeckhout:
  Characterizing Multi-Chip GPU Data Sharing. 56:1-56:24
- Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka:
  At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads. 57:1-57:26
- Satya Jaswanth Badri, Mukesh Saini, Neeraj Goel:
  Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing. 58:1-58:25
- Miao Yu, Tingting Xiang, Venkata Pavan Kumar Miriyala, Trevor E. Carlson:
  Multiply-and-Fire: An Event-Driven Sparse Neural Network Accelerator. 59:1-59:26
- Ziaul Choudhury, Anish Gulati, Suresh Purini:
  FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler. 60:1-60:25
- Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. França, Maurício Breternitz, Lizy K. John:
  ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks. 61:1-61:24
- Jia Wei, Xingjun Zhang, Longxiang Wang, Zheng Wei:
  Fastensor: Optimise the Tensor I/O Path from SSD to GPU for Deep Learning Training. 62:1-62:25