29th HPCA 2023: Montreal, QC, Canada
- IEEE International Symposium on High-Performance Computer Architecture, HPCA 2023, Montreal, QC, Canada, February 25 - March 1, 2023. IEEE 2023, ISBN 978-1-6654-7652-2
Session 1A: Neural Networks and Accelerators 1
- Mingi Yoo, Jaeyong Song, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee:
SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators. 1-14
- Shurui Li, Hangbo Yang, Chee Wei Wong, Volker J. Sorger, Puneet Gupta:
PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator. 15-28
- Bokyung Kim, Shiyu Li, Hai Li:
INCA: Input-stationary Dataflow at Outside-the-box Thinking about Deep Learning Accelerators. 29-41
- Ranggi Hwang, Minhoo Kang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu:
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks. 42-55
- Jo Sanghoon, Hyojun Son, John Kim:
Logical/Physical Topology-Aware Collective Communication in Deep Learning Training. 56-68
- Dongseok Im, Gwangtae Park, Zhiyong Li, Junha Ryu, Hoi-Jun Yoo:
Sibia: Signed Bit-slice Architecture for Dense DNN Acceleration with Slice-level Sparsity Exploitation. 69-80
Session 1B: NVRAM and Hybrid Memory
- Siddharth Gupta, Yunho Oh, Lei Yan, Mark Sutherland, Abhishek Bhattacharjee, Babak Falsafi, Peter Hsu:
AstriFlash: A Flash-Based System for Online Services. 81-93
- Xijing Han, James Tuck, Amro Awad:
Thoth: Bridging the Gap Between Persistently Secure Memories and Memory Interfaces of Emerging NVMs. 94-107
- Hongchao Du, Qiao Li, Riwei Pan, Tei-Wei Kuo, Chun Jason Xue:
Multi-Granularity Shadow Paging with NVM Write Optimization for Crash-Consistent Memory-Mapped I/O. 108-121
- Yina Lv, Liang Shi, Qiao Li, Congming Gao, Yunpeng Song, Longfei Luo, Youtao Zhang:
MGC: Multiple-Gray-Code for 3D NAND Flash based High-Density SSDs. 122-136
- Yiwei Li, Mingyu Gao:
Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 137-151
- Jianming Huang, Yu Hua:
Root Crash Consistency of SGX-style Integrity Trees in Secure Non-Volatile Memory Systems. 152-164
Session 1C: Caching and Memory Management
- Yunjin Wang, Chia-Hao Chang, Anand Sivasubramaniam, Niranjan Soundararajan:
ACIC: Admission-Controlled Instruction Cache. 165-178
- Carlos Escuin, Asif Ali Khan, Pablo Ibáñez, Teresa Monreal, Jerónimo Castrillón, Víctor Viñals:
Compression-Aware and Performance-Efficient Insertion Policies for Long-Lasting Hybrid LLCs. 179-192
- Youngin Kim, Hyeonjin Kim, William J. Song:
NOMAD: Enabling Non-blocking OS-managed DRAM Cache via Tag-Data Decoupling. 193-205
- Anirudh Jain, Divya Kiran Kadiyala, Alexandros Daglis:
Safety Hints for HTM Capacity Abort Mitigation. 206-219
- Weijian Chen, Shuibing He, Yaowen Xu, Xuechen Zhang, Siling Yang, Shuang Hu, Xian-He Sun, Gang Chen:
iCache: An Importance-Sampling-Informed Cache for Accelerating I/O-Bound DNN Model Training. 220-232
- Anirban Chakraborty, Sarani Bhattacharya, Sayandeep Saha, Debdeep Mukhopadhyay:
Are Randomized Caches Truly Random? Formal Analysis of Randomized-Partitioned Caches. 233-246
Session 2A: Accelerators
- Hesam Shabani, Abhishek Singh, Bishoy Youhana, Xiaochen Guo:
HIRAC: A Hierarchical Accelerator with Sorting-based Packing for SpGEMMs in DNN Applications. 247-258
- Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna:
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs. 259-272
- Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li, Yingyan Lin:
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design. 273-286
- Chirag Sakhuja, Zhan Shi, Calvin Lin:
Leveraging Domain Information for the Efficient Automated Design of Deep Learning Accelerators. 287-301
- Zhe Zhou, Cong Li, Fan Yang, Guangyu Sun:
DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing. 302-316
Session 2B: Security
- Mulong Luo, Wenjie Xiong, Geunbae Lee, Yueying Li, Xiaomeng Yang, Amy Zhang, Yuandong Tian, Hsien-Hsin S. Lee, G. Edward Suh:
AutoCAT: Reinforcement Learning for Automated Exploration of Cache-Timing Attacks. 317-332
- Minbok Wi, Jaehyun Park, Seoyoung Ko, Michael Jaemin Kim, Nam Sung Kim, Eojin Lee, Jung Ho Ahn:
SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling. 333-346
- Erhu Feng, Dong Du, Yubin Xia, Haibo Chen:
Efficient Distributed Secure Memory with Migratable Merkle Tree. 347-360
- Mehrnoosh Raoufi, Jun Yang, Xulong Tang, Youtao Zhang:
AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM. 361-373
- Jeonghyun Woo, Gururaj Saileshwar, Prashant J. Nair:
Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems. 374-389
Session 2C: Applications 1
- Yu Wen, Chenhao Xie, Shuaiwen Leon Song, Xin Fu:
Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing. 390-402
- Faquan Chen, Rendong Ying, Jianwei Xue, Fei Wen, Peilin Liu:
ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds. 403-414
- Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin:
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention. 415-428
- Haoran Wang, Haobo Xu, Ying Wang, Yinhe Han:
CTA: Hardware-Software Co-design for Compressed Token Attention Mechanism. 429-441
- Peiyan Dong, Mengshu Sun, Alec Lu, Yanyue Xie, Kenneth Liu, Zhenglun Kong, Xin Meng, Zhengang Li, Xue Lin, Zhenman Fang, Yanzhi Wang:
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers. 442-455
Session 3B: Datacenters and HPC
- Bingyao Li, Jieming Yin, Anup Holey, Youtao Zhang, Jun Yang, Xulong Tang:
Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding. 456-470
- Yuhang Liu, Xin Deng, Jiapeng Zhou, Mingyu Chen, Yungang Bao:
Ah-Q: Quantifying and Handling the Interference within a Datacenter from a System Perspective. 471-484
- Md Rajib Hossen, Kishwar Ahmed, Mohammad A. Islam:
Market Mechanism-Based User-in-the-Loop Scalable Power Oversubscription for HPC Systems. 485-498
- Yifan Yuan, Jinghan Huang, Yan Sun, Tianchen Wang, Jacob Nelson, Dan R. K. Ports, Yipeng Wang, Ren Wang, Charlie Tai, Nam Sung Kim:
Rambda: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications. 499-515
Session 3C: GPUs
- Harini Muthukrishnan, Daniel Lustig, Oreste Villa, Thomas F. Wenisch, David W. Nellans:
FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems. 516-529
- Aaron Barnes, Fangjia Shen, Timothy G. Rogers:
Mitigating GPU Core Partitioning Performance Effects. 530-542
- Rahaf Abdullah, Huiyang Zhou, Amro Awad:
Plutus: Bandwidth-Efficient Memory Security for GPUs. 543-555
- Quan Zhou, Haiquan Wang, Xiaoyan Yu, Cheng Li, Youhui Bai, Feng Yan, Yinlong Xu:
MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism. 556-569
Session 4A: Neural Networks and Accelerators 2
- Linyan Mei, Koen Goetschalckx, Arne Symons, Marian Verhelst:
DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling. 570-583
- Yue Dai, Youtao Zhang, Xulong Tang:
CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks. 584-597
- Yifan Yang, Joel S. Emer, Daniel Sánchez:
ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining. 598-610
- Junkyum Kim, Myeonggu Kang, Yunki Han, Yanggon Kim, Lee-Sup Kim:
OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing. 611-623
- Marcus Chow, Ali Jahanshahi, Daniel Wong:
KRISP: Enabling Kernel-wise RIght-sizing for Spatial Partitioned GPU Inference Servers. 624-637
- Vahid Janfaza, Kevin Weston, Moein Razavi, Shantanu Mandal, Farabi Mahmud, Alex Hilty, Abdullah Muzahid:
MERCURY: Accelerating DNN Training By Exploiting Input Similarity. 638-650
Session 4B: PIMs and Persistent Memory
- Ming Zhang, Yu Hua:
Silo: Speculative Hardware Logging for Atomic Durability in Persistent Memory. 651-663
- Chencheng Ye, Yuanchao Xu, Xipeng Shen, Yan Sha, Xiaofei Liao, Hai Jin, Yan Solihin:
Reconciling Selective Logging and Hardware Persistent Memory Transaction. 664-676
- Alexander Freij, Huiyang Zhou, Yan Solihin:
SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers. 677-690
- Khalid Al-Hawaj, Tuan Ta, Nick Cebry, Shady Agwa, Olalekan Afuye, Eric Hall, Courtney Golden, Alyssa B. Apsel, Christopher Batten:
EVE: Ephemeral Vector Engines. 691-704
- Ben Perach, Ronny Ronen, Shahar Kvatinsky:
On Consistency for Bulk-Bitwise Processing-in-Memory. 705-717
- Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi:
Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications. 718-730
Session 4C: Quantum and FPGAs
- Siwei Tan, Mingqian Yu, Andre Python, Yongheng Shang, Tingting Li, Liqiang Lu, Jianwei Yin:
HyQSAT: A Hybrid Approach for 3-SAT Problems by Integrating Quantum Annealer with CDCL. 731-744
- Ang Li, August Ning, David Wentzlaff:
Duet: Creating Harmony between Processors and Embedded FPGAs. 745-758
- Evan McKinney, Mingkang Xia, Chao Zhou, Pinlei Lu, Michael Hatridge, Alex K. Jones:
Co-Designed Architectures for Modular Superconducting Quantum Computers. 759-772
- Yan-Hao Chen, Yuwei Jin, Fei Hua, Ari B. Hayes, Ang Li, Yunong Shi, Eddy Z. Zhang:
A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis. 773-786
- Poulami Das, Eric Kessler, Yunong Shi:
The Imitation Game: Leveraging CopyCats for Robust Native Gate Selection in NISQ Programs. 787-801
Session 5A: Cloud and Edge Computing
- Junkang Zhu, Yaoyu Tao, Zhengya Zhang:
eNODE: Energy-Efficient and Low-Latency Edge Inference and Training of Neural ODEs. 802-813
- Jovan Stojkovic, Tianyin Xu, Hubertus Franke, Josep Torrellas:
SpecFaaS: Accelerating Serverless Applications with Speculative Function Execution. 814-827
- Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao:
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks. 828-841
- Junyeol Yu, Jongseok Kim, Euiseong Seo:
Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving. 842-854
- Dimosthenis Masouros, Christian Pinto, Michele Gazzetti, Sotirios Xydis, Dimitrios Soudris:
Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 855-869
Session 5B: Encryption and SGX
- Yinghao Yang, Huaizhi Zhang, Shengyu Fan, Hang Lu, Mingzhe Zhang, Xiaowei Li:
Poseidon: Practical Homomorphic Encryption Accelerator. 870-881
- Rashmi Agrawal, Leo de Castro, Guowei Yang, Chiraag Juvekar, Rabia Tugce Yazicigil, Anantha P. Chandrakasan, Vinod Vaikuntanathan, Ajay Joshi:
FAB: An FPGA-based Accelerator for Bootstrappable Fully Homomorphic Encryption. 882-895
- Yilan Zhu, Xinyao Wang, Lei Ju, Shanqing Guo:
FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference. 896-907
- Md Hafizul Islam Chowdhuryy, Myoungsoo Jung, Fan Yao, Amro Awad:
D-Shield: Enabling Processor-side Encryption and Integrity Verification for Secure NVMe Drives. 908-921
- Shengyu Fan, Zhiwei Wang, Weizhi Xu, Rui Hou, Dan Meng, Mingzhe Zhang:
TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU. 922-934
Session 5C: Reliability
- George Papadimitriou, Dimitris Gizopoulos:
AVGI: Microarchitecture-Driven, Fast and Accurate Vulnerability Assessment. 935-948
- Jiangwei Zhang, Chong Wang, Zhenhua Zhu, Donald Kline, Alex K. Jones, Huazhong Yang, Yu Wang:
Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance. 964-976
- Chunfeng Du, Suzhen Wu, Jiapeng Wu, Bo Mao, Shengzhe Wang:
ESD: An ECC-assisted and Selective Deduplication for Encrypted Non-Volatile Main Memory. 977-990
Session 6A: Industry Track Session
- Majed Valad Beigi, Yi Cao, Sudhanva Gurumurthi, Charles Recchia, Andrew C. Walton, Vilas Sridharan:
A Systematic Study of DDR4 DRAM Faults in the Field. 991-1002
- Jianguo Yao, Hao Zhou, Yalin Zhang, Ying Li, Chuang Feng, Shi Chen, Jiaoyan Chen, Yongdong Wang, Qiaojuan Hu:
High Performance and Power Efficient Accelerator for Cloud Inference. 1003-1016
- Sungyeob Yoo, Hyunsung Kim, Jinseok Kim, Sunghyun Park, Joo-Young Kim, Jinwook Oh:
LightTrader: A Standalone High-Frequency Trading System with Deep Learning Inference Accelerators and Proactive Scheduler. 1017-1030
- Yiquan Chen, Jiexiong Xu, Chengkun Wei, Yijing Wang, Xin Yuan, Yangming Zhang, Xulin Yu, Yi Chen, Zeke Wang, Shuibing He, Wenzhi Chen:
BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment. 1031-1044
Session 6B: NICs and Networks
- Hamed Seyedroudbari, Srikar Vanavasam, Alexandros Daglis:
Turbo: SmartNIC-enabled Dynamic Load Balancing of µs-scale RPCs. 1045-1058
- Yinxiao Feng, Dong Xiang, Kaisheng Ma:
A Scalable Methodology for Designing Efficient Interconnection Network of Chiplets. 1059-1071
- Hans Kasan, John Kim:
VVQ: Virtualizing Virtual Channel for Cost-Efficient Protocol Deadlock Avoidance. 1072-1084
Session 7A: Neural Network and Accelerators 3
- Enrico Reggiani, Alessandro Pappalardo, Max Doblas, Miquel Moretó, Mauro Olivieri, Osman Sabri Unsal, Adrián Cristal:
Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices. 1085-1098
- Rishov Sarkar, Stefan Abi-Karam, Yuqi He, Lakshmi Sathidevi, Cong Hao:
FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference. 1099-1112
- Size Zheng, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, Yun Liang:
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion. 1113-1126
- Nivedita Shrivastava, Smruti Ranjan Sarangi:
Securator: A Fast and Secure Neural Processing Unit. 1127-1139
- Shao-Fu Lin, Yi-Jung Chen, Hsiang-Yun Cheng, Chia-Lin Yang:
Tensor Movement Orchestration in Multi-GPU Training Systems. 1140-1152
Session 7B: Microarchitecture and Memory Systems
- Truls Asheim, Boris Grot, Rakesh Kumar:
A Storage-Effective BTB Organization for Servers. 1153-1167
- Haifeng Li, Ke Liu, Ting Liang, Zuojun Li, Tianyue Lu, Hui Yuan, Yinben Xia, Yungang Bao, Mingyu Chen, Yizhou Shan:
HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory. 1168-1181
- Sanyam Mehta:
Speculative Register Reclamation. 1182-1194
- Jiwon Lee, Ju Min Lee, Yunho Oh, William J. Song, Won Woo Ro:
SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPUs. 1195-1207
- Xiaoyang Lu, Rujia Wang, Xian-He Sun:
CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework. 1208-1220
- Jovan Stojkovic, Namrata Mantri, Dimitrios Skarlatos, Tianyin Xu, Josep Torrellas:
Memory-Efficient Hashed Page Tables. 1221-1235
Session 7C: Applications 2 & Potpourri
- Yewen Li, Xueqi Li, Ruihao Gao, Wanqi Liu, Guangming Tan:
NvWa: Enhancing Sequence Alignment Accelerator Throughput via Hardware Scheduling. 1236-1248
- Ying Xu, Long Cheng, Xuyi Cai, Xiaohan Ma, Weiwei Chen, Lei Zhang, Ying Wang:
Efficient Supernet Training Using Path Parallelism. 1249-1261
- Quan M. Nguyen, Daniel Sánchez:
Phloem: Automatic Acceleration of Irregular Applications with Fine-Grain Pipeline Parallelism. 1262-1274
- Xiangjun Peng, Yaohua Wang, Ming-Chang Yang:
CHOPPER: A Compiler Infrastructure for Programmable Bit-serial SIMD Processing Using Memory in DRAM. 1275-1288
- Julián Pavón, Iván Vargas Valdivieso, Joan Marimon, Roger Figueras, Francesc Moll, Osman S. Unsal, Mateo Valero, Adrián Cristal:
VAQUERO: A Scratchpad-based Vector Accelerator for Query Processing. 1289-1302