ACM Transactions on Architecture and Code Optimization, Volume 16
Volume 16, Number 1, March 2019
- Ghassan Shobaki, Austin Kerbow, Christopher Pulido, William Dobson:
Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling. 1:1-1:30
- Yu-Ping Liu, Ding-Yong Hong, Jan-Jan Wu, Sheng-Yu Fu, Wei-Chung Hsu:
Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation. 2:1-2:24
- Mohammad Sadrosadati, Seyed Borna Ehsani, Hajar Falahati, Rachata Ausavarungnirun, Arash Tavakkol, Mojtaba Abaee, Lois Orosa, Yaohua Wang, Hamid Sarbazi-Azad, Onur Mutlu:
ITAP: Idle-Time-Aware Power Management for GPU Execution Units. 3:1-3:26
- Halit Dogan, Masab Ahmad, Brian Kahne, Omer Khan:
Accelerating Synchronization Using Moving Compute to Data Model at 1,000-core Multicore Scale. 4:1-4:27
- Leonid Azriel, Lukas Humbel, Reto Achermann, Alex Richardson, Moritz Hoffmann, Avi Mendelson, Timothy Roscoe, Robert N. M. Watson, Paolo Faraboschi, Dejan S. Milojicic:
Memory-Side Protection With a Capability Enforcement Co-Processor. 5:1-5:26
- Aamer Jaleel, Eiman Ebrahimi, Sam Duncan:
DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems. 6:1-6:24
Volume 16, Number 2, May 2019
- Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao:
SketchDLC: A Sketch on Distributed Deep Learning Communication via Trace Capturing. 7:1-7:26
- Aristeidis Mastoras, Thomas R. Gross:
Efficient and Scalable Execution of Fine-Grained Dynamic Linear Pipelines. 8:1-8:26
- Tae Jun Ham, Juan L. Aragón, Margaret Martonosi:
Efficient Data Supply for Parallel Heterogeneous Architectures. 9:1-9:23
- Savvas Sioutas, Sander Stuijk, Luc Waeijen, Twan Basten, Henk Corporaal, Lou J. Somers:
Schedule Synthesis for Halide Pipelines through Reuse Analysis. 10:1-10:22
- Xiaoyuan Wang, Haikun Liu, Xiaofei Liao, Ji Chen, Hai Jin, Yu Zhang, Long Zheng, Bingsheng He, Song Jiang:
Supporting Superpages and Lightweight Page Migration in Hybrid Memory Systems. 11:1-11:26
- Sahar Sargaran, Naser Mohammadzadeh:
SAQIP: A Scalable Architecture for Quantum Information Processors. 12:1-12:21
- Prerna Budhkar, Ildar Absalyamov, Vasileios Zois, Skyler Windh, Walid A. Najjar, Vassilis J. Tsotras:
Accelerating In-Memory Database Selections Using Latency Masking Hardware Threads. 13:1-13:28
- Heinrich Riebler, Gavin Vaz, Tobias Kenter, Christian Plessl:
Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL. 14:1-14:26
- Xun Gong, Xiang Gong, Leiming Yu, David R. Kaeli:
HAWS: Accelerating GPU Wavefront Execution through Selective Out-of-order Execution. 15:1-15:22
- Yang Song, Olivier Alavoine, Bill Lin:
A Self-aware Resource Management Framework for Heterogeneous Multicore SoCs with Diverse QoS Targets. 16:1-16:23
- Pedro Yébenes, Jose Rocher-Gonzalez, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro, Francisco J. Quiles, Crispín Gómez Requena, José Duato:
Combining Source-adaptive and Oblivious Routing with Congestion Control in High-performance Interconnects using Hybrid and Direct Topologies. 17:1-17:26
- Mohammad A. Alshboul, Hussein Elnawawy, Reem Elkhouly, Keiji Kimura, James Tuck, Yan Solihin:
Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory. 18:1-18:27
- Zacharias Hadjilambrou, Marios Kleanthous, Georgia Antoniou, Antoni Portero, Yiannakis Sazeides:
Comprehensive Characterization of an Open Source Document Search Engine. 19:1-19:21
Volume 16, Number 3, August 2019
- Bingchao Li, Jizeng Wei, Jizhou Sun, Murali Annavaram, Nam Sung Kim:
An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns. 20:1-20:24
- Stephen I. Roberts, Steven A. Wright, Suhaib A. Fahmy, Stephen A. Jarvis:
The Power-optimised Software Envelope. 21:1-21:27
- Ram Srivatsa Kannan, Michael Laurenzano, Jeongseob Ahn, Jason Mars, Lingjia Tang:
Caliper: Interference Estimator for Multi-tenant Environments Sharing Architectural Resources. 22:1-22:25
- Zhen Lin, Hongwen Dai, Michael Mantor, Huiyang Zhou:
Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution. 23:1-23:27
- Keryan Didier, Dumitru Potop-Butucaru, Guillaume Iooss, Albert Cohen, Jean Souyris, Philippe Baufreton, Amaury Graillat:
Correct-by-Construction Parallelization of Hard Real-Time Avionics Applications on Off-the-Shelf Predictable Hardware. 24:1-24:27
- Pantea Zardoshti, Tingzhe Zhou, Pavithra Balaji, Michael L. Scott, Michael F. Spear:
Simplifying Transactional Memory Support in C++. 25:1-25:24
- Jungwoo Park, Myoungjun Lee, Soontae Kim, Minho Ju, Jeongkyu Hong:
MH Cache: A Multi-retention STT-RAM-based Low-power Last-level Cache for Mobile Hardware Rendering Systems. 26:1-26:26
- Jakob Leben, George Tzanetakis:
Polyhedral Compilation for Multi-dimensional Stream Processing. 27:1-27:26
- Mohammad Sadegh Sadeghi, Siavash Bayat Sarmadi, Shaahin Hessabi:
Toward On-chip Network Security Using Runtime Isolation Mapping. 28:1-28:25
- Stéphane Louise:
A First Step Toward Using Quantum Computing for Low-level WCETs Estimations. 29:1-29:22
- Artem Chikin, Taylor Lloyd, José Nelson Amaral, Ettore Tiotto, Muhammad Usman:
Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops. 30:1-30:26
- Sanghoon Cha, Bokyeong Kim, Chang Hyun Park, Jaehyuk Huh:
Morphable DRAM Cache Design for Hybrid Memory Systems. 31:1-31:24
- Chao Luo, Yunsi Fei, David R. Kaeli:
Side-channel Timing Attack of RSA on a GPU. 32:1-32:18
- Liang Yuan, Chen Ding, Wesley Smith, Peter J. Denning, Yunquan Zhang:
A Relational Theory of Locality. 33:1-33:26
Volume 16, Number 4, January 2020
- Arun Thangamani, V. Krishna Nandivada:
Optimizing Remote Communication in X10. 34:1-34:26
- Sriseshan Srikanth, Anirudh Jain, Joseph M. Lennon, Thomas M. Conte, Erik DeBenedictis, Jeanine E. Cook:
MetaStrider: Architectures for Scalable Memory-centric Reduction of Sparse Data Streams. 35:1-35:26
- Mostafa Koraei, Omid Fatemi, Magnus Jahre:
DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs. 36:1-36:24
- Leeor Peled, Uri C. Weiser, Yoav Etsion:
A Neural Network Prefetcher for Arbitrary Memory Access Patterns. 37:1-37:27
- Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen:
The Next 700 Accelerated Layers: From Mathematical Expressions of Network Computation Graphs to Accelerated GPU Kernels, Automatically. 38:1-38:26
- Wenbin Jiang, Yang Ma, Bo Liu, Haikun Liu, Bing Bing Zhou, Jian Zhu, Song Wu, Hai Jin:
Layup: Layer-adaptive and Multi-type Intermediate-oriented Memory Optimization for GPU-based CNNs. 39:1-39:23
- Sergi Siso, Wes Armour, Jeyarajan Thiyagalingam:
Evaluating Auto-Vectorizing Compilers through Objective Withdrawal of Useful Information. 40:1-40:23
- Salonik Resch, S. Karen Khatamifard, Zamshed Iqbal Chowdhury, Masoud Zabihi, Zhengyang Zhao, Jianping Wang, Sachin S. Sapatnekar, Ulya R. Karpuzcu:
PIMBALL: Binary Neural Networks in Spintronic Memory. 41:1-41:26
- Zhen Hang Jiang, Yunsi Fei, David R. Kaeli:
Exploiting Bank Conflict-based Side-channel Timing Leakage of GPUs. 42:1-42:24
- Kyle Daruwalla, Heng Zhuo, Rohit Shukla, Mikko H. Lipasti:
BitSAD v2: Compiler Optimization and Analysis for Bitstream Computing. 43:1-43:25
- Aristeidis Mastoras, Thomas R. Gross:
Chunking for Dynamic Linear Pipelines. 44:1-44:25
- Manuel Selva, Fabian Gruber, Diogo Sampaio, Christophe Guillon, Louis-Noël Pouchet, Fabrice Rastello:
Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable. 45:1-45:26
- Ahmad Yasin, Jawad Haj-Yahya, Yosi Ben-Asher, Avi Mendelson:
A Metric-Guided Method for Discovering Impactful Features and Architectural Insights for Skylake-Based Processors. 46:1-46:25
- Jie Zhao, Albert Cohen:
Flextended Tiles: A Flexible Extension of Overlapped Tiles for Polyhedral Compilation. 47:1-47:25
- Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, Donald Yeung:
Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors. 48:1-48:27
- Chunwei Xia, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue:
DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing. 49:1-49:26
- Ian Briggs, Arnab Das, Mark Baranowski, Vishal Chandra Sharma, Sriram Krishnamoorthy, Zvonimir Rakamaric, Ganesh Gopalakrishnan:
FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation. 50:1-50:21
- Khalid Ahmad, Hari Sundar, Mary W. Hall:
Data-driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs. 51:1-51:24
- Larisa Stoltzfus, Bastian Hagedorn, Michel Steuwer, Sergei Gorlatch, Christophe Dubach:
Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift. 52:1-52:25
- Michiel A. van der Vlag, Georgios Smaragdos, Zaid Al-Ars, Christos Strydis:
Exploring Complex Brain-Simulation Workloads on Multi-GPU Deployments. 53:1-53:25
- Reem Elkhouly, Mohammad A. Alshboul, Akihiro Hayashi, Yan Solihin, Keiji Kimura:
Compiler-support for Critical Data Persistence in NVM. 54:1-54:25
- Lorenzo Chelini, Oleksandr Zinenko, Tobias Grosser, Henk Corporaal:
Declarative Loop Tactics for Domain-specific Optimization. 55:1-55:25
- Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart S. P. Parkin, Jerónimo Castrillón:
ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0. 56:1-56:23