31st PACT 2022: Chicago, IL, USA
- Andreas Klöckner, José Moreira:
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, PACT 2022, Chicago, Illinois, October 8-12, 2022. ACM 2022, ISBN 978-1-4503-9868-8
Compilers for ever
- Tong Zhou, Ruiqin Tian, Rizwan A. Ashraf, Roberto Gioiosa, Gokcen Kestor, Vivek Sarkar:
ReACT: Redundancy-Aware Code Generation for Tensor Expressions. 1-13
- Bodhisatwa Chatterjee, Sharjeel Khan, Santosh Pande:
Com-CAS: Effective Cache Apportioning under Compiler Guidance. 14-27
- Perry Gibson, José Cano:
Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation. 28-39
Optimizing the execution of GNNs
- Mingi Yoo, Jaeyong Song, Hyeyoon Lee, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee:
Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators. 40-53
- Zhe Zhou, Cong Li, Xuechao Wei, Xiaoyang Wang, Guangyu Sun:
GNNear: Accelerating Full-Batch Training of Graph Neural Networks with near-Memory Processing. 54-68
- Chengying Huan, Shuaiwen Leon Song, Yongchao Liu, Heng Zhang, Hang Liu, Charles He, Kang Chen, Jinlei Jiang, Yongwei Wu:
T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture. 69-82
- Zhuoran Ji, Cho-Li Wang:
Optimizing Aggregate Computation of Graph Neural Networks with on-GPU Interpreter-Style Programming. 83-95
Getting more out of your memory
- Albin Eldstål-Ahrens, Angelos Arelakis, Ioannis Sourdis:
FlatPack: Flexible Compaction of Compressed Memory. 96-108
- Han Jie Qiu, Sihang Liu, Xinyang Song, Samira Manabi Khan, Gennady Pekhimenko:
Pavise: Integrating Fault Tolerance Support for Persistent Memory Applications. 109-123
- Taiyu Zhou, Yajuan Du, Fan Yang, Xiaojian Liao, Youyou Lu:
Efficient Atomic Durability on eADR-Enabled Persistent Memory. 124-134
Sparse matrix computations
- Roberto L. Castro, Diego Andrade, Basilio B. Fraguela:
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM Routine on Ampere GPUs. 135-147
- Xin He, Kuan-Yu Chen, Siying Feng, Hun-Seok Kim, David T. Blaauw, Ronald G. Dreslinski, Trevor N. Mudge:
Squaring the circle: Executing Sparse Matrix Computations on FlexTPU - A TPU-Like Processor. 148-159
- Marcos Horro, Louis-Noël Pouchet, Gabriel Rodríguez, Juan Touriño:
Custom High-Performance Vector Code Generation for Data-Specific Sparse Computations. 160-171
Graph processing
- Han-Yi Chou, Sayan Ghosh:
Batched Graph Community Detection on GPUs. 172-184
- Peng Jiang, Yihua Wei, Jiya Su, Rujia Wang, Bo Wu:
SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation. 185-197
- Shinnung Jeong, Yongwoo Lee, Jaeho Lee, Heelim Choi, Seungbin Song, Jinho Lee, Youngsok Kim, Hanjun Kim:
Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing. 198-210
Miscellaneous
- Jian Zhou, Jianfeng Wu, Weizhou Huang, You Zhou, Fei Wu, Liu Shi, Xiaoyi Zhang, Kun Wang, Feng Zhu, Shu Li:
Tiered Hashing: Revamping Hash Indexing under a Unified Memory-Storage Hierarchy. 211-222
- Qi Zhao, Zhengyi Qiu, Shudi Shao, Xinning Hui, Hassan Ali Khan, Guoliang Jin:
Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization Determinism. 223-238
- Sankeerth Durvasula, Raymond Kiguru, Samarth Mathur, Jenny Xu, Jimmy Lin, Nandita Vijaykumar:
VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks. 239-251
Better neural networks
- Yufan Xu, Qiwei Yuan, Erik Curtis Barton, Rui Li, P. Sadayappan, Aravind Sukumaran-Rajam:
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs. 252-264
- Lizhi Xiang, P. Sadayappan, Aravind Sukumaran-Rajam:
High-Performance Architecture Aware Sparse Convolutional Neural Networks for GPUs. 265-278
- Zachary Susskind, Aman Arora, Igor D. S. Miranda, Luis Armando Quintanilla Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego Leonel Cadette Dutra, Priscila M. V. Lima, Felipe M. G. França, Maurício Breternitz, Lizy K. John:
Weightless Neural Networks for Efficient Edge Inference. 279-290
- Cheng Fu, Hanxian Huang, Bram Wasti, Chris Cummins, Riyadh Baghdadi, Kim M. Hazelwood, Yuandong Tian, Jishen Zhao, Hugh Leather:
Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition. 291-303
Getting more out of your GPU
- Leul Belayneh, Haojie Ye, Kuan-Yu Chen, David T. Blaauw, Trevor N. Mudge, Ronald G. Dreslinski, Nishil Talati:
Locality-Aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems. 304-316
- Xiaodan Serina Tan, Pavel Golikov, Nandita Vijaykumar, Gennady Pekhimenko:
GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud. 317-332
- Yuhui Bao, Yifan Sun, Zlatan Feric, Michael Tian Shen, Micah Weston, José L. Abellán, Trinayan Baruah, John Kim, Ajay Joshi, David R. Kaeli:
NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs. 333-345
Better hardware
- Parmida Vahdatniya, Amirali Sharifian, Reza Hojabr, Arrvindh Shriraman:
mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL. 346-358
- Seyed Armin Vakil-Ghahani, Soheil Khadirsharbiyani, Jagadish B. Kotra, Mahmut T. Kandemir:
Athena: An Early-Fetch Architecture to Reduce on-Chip Page Walk Latencies. 359-371
- Mingjian He, Hua Wang, Ke Zhou, Kaichao Cui, Huabing Yan, Chang Guo, Rongfeng He:
DSDP: Dual Stream Data Prefetcher. 372-383
Task parallelism
- Oh-Kyoung Kwon, Ji Hoon Kang, Seungchul Lee, Wonjung Kim, Junehwa Song:
Efficient Task-Mapping of Parallel Applications Using a Space-Filling Curve. 384-397
- Mahyar Emami, Endri Bezati, Jörn W. Janneck, James R. Larus:
Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks. 398-411
Optimization
- Xinyu Chen, Marco Minutoli, Jiannan Tian, Mahantesh Halappanavar, Ananth Kalyanaraman, Dingwen Tao:
HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures. 412-425
- Jedidiah McClurg, Miles Claver, Jackson Garner, Jake Vossen, Jordan Schmerge, Mehmet E. Belviranli:
Optimizing Regular Expressions via Rewrite-Guided Synthesis. 426-438
- Bangtian Liu, Avery Laird, Wai Hung Tsang, Bardia Mahjour, Maryam Mehri Dehnavi:
Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization. 439-450
GPU algorithms
- Jie Zhao, Cédric Bastoul, Yanzhi Yi, Jiahui Hu, Wang Nie, Renwei Zhang, Zhen Geng, Chong Li, Thibaut Tachon, Zhiliang Gan:
Parallelizing Neural Network Models Effectively on GPU by Implementing Reductions Atomically. 451-466
- Haoyuan Xing, Gagan Agrawal, Rajiv Ramnath:
GPU Adaptive In-situ Parallel Analytics (GAP). 467-480
- Muhammad A. Awad, Serban D. Porumbescu, John D. Owens:
A GPU Multiversion B-Tree. 481-493
Portable performance
- Johannes Doerfert, Marc Jasper, Joseph Huber, Khaled Abdelaal, Giorgis Georgakoudis, Thomas Scogland, Konstantinos Parasyris:
Breaking the Vendor Lock: Performance Portable Programming through OpenMP as Target Independent Runtime Layer. 494-504
- Foivos Tsimpourlas, Pavlos Petoumenos, Min Xu, Chris Cummins, Kim M. Hazelwood, Ajitha Rajan, Hugh Leather:
BenchPress: A Deep Active Benchmark Generator. 505-516
- Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia:
Collage: Seamless Integration of Deep Learning Backends with Automatic Placement. 517-529
Posters
- Anjia Wang, Xinyao Yi, Yonghong Yan:
UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models. 530-531
- Dongwei Chen, Dong Tong, Chun Yang, Jiangfang Yi, Xu Cheng:
FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers. 532-533
- Michail Boulasikis, Flavius Gruian, Gareth Callanan, Jörn W. Janneck:
Analysing Dataflow Programs with Causation Traces. 534-535
- Jaeyoung Kang, Weihong Xu, Wout Bittremieux, Tajana Rosing:
Massively Parallel Open Modification Spectral Library Searching with Hyperdimensional Computing. 536-537
- Victor Ferrari, Rafael C. F. Sousa, Márcio Machado Pereira, João P. L. de Carvalho, José Nelson Amaral, Guido Araujo:
Improving Convolution via Cache Hierarchy Tiling and Reduced Packing. 538-539
- Jie Li, Yuhui Deng, Zhaorui Wu, Shujie Pang:
A Thermal-Aware Data Replica Placement Strategy for Data-Intensive Data Centers. 540-541
- Luanzheng Guo, Rizwan A. Ashraf, Ryan D. Friese, Gokcen Kestor:
Towards Supporting Semiring in MLIR-Based COMET Compiler. 542-543
- Serena Curzel, Sofija Jovic, Michele Fiorito, Antonino Tumeo, Fabrizio Ferrandi:
MLIR Loop Optimizations for High-Level Synthesis: A Case Study. 544-545
- Jeongeun Kim, Young Woo Jeong, Su-Yeon Jang, Seung Eun Lee:
An Architecture for Resilient Federated Learning through Parallel Recognition. 546-547
- Truls Asheim, Boris Grot, Rakesh Kumar:
A Specialized BTB Organization for Servers. 548-549