ICPP 2022: Bordeaux, France
Proceedings of the 51st International Conference on Parallel Processing, ICPP 2022, Bordeaux, France, 29 August 2022 - 1 September 2022. ACM 2022, ISBN 978-1-4503-9733-9
Distributing Learning Algorithms
- Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu:
  Aperiodic Local SGD: Beyond Local SGD. 1:1-1:10
- Yijun Li, Jiawei Huang, Zhaoyi Li, Shengwen Zhou, Wanchun Jiang, Jianxin Wang:
  HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning. 2:1-2:11
- Refael Cohen, Ido Hakimi, Assaf Schuster:
  SMEGA2: Distributed Asynchronous Deep Neural Network Training With a Single Momentum Buffer. 3:1-3:10
- Milan Shah, Reece Neff, Hancheng Wu, Marco Minutoli, Antonino Tumeo, Michela Becchi:
  Accelerating Random Forest Classification on GPU and FPGA. 4:1-4:11
System Optimizations Through Deep Learning
- Liu Liu, Jian Yu, Zhijun Ding:
  Adaptive and Efficient GPU Time Sharing for Hyperparameter Tuning in Cloud. 5:1-5:11
- Boqian Fu, Fahao Chen, Peng Li, Deze Zeng:
  TCB: Accelerating Transformer Inference Services with Request Concatenation. 6:1-6:11
- Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye, Yabo Duan:
  EmbRace: Accelerating Sparse Communication for Distributed Training of Deep Neural Networks. 7:1-7:11
- Guanghao Li, Yue Hu, Miao Zhang, Ji Liu, Quanjun Yin, Yong Peng, Dejing Dou:
  FedHiSyn: A Hierarchical Synchronous Federated Learning Framework for Resource and Data Heterogeneity. 8:1-8:11
Parallel Algorithms
- Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin, Guangming Tan, Weifeng Liu:
  TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs. 9:1-9:11
- Srdan Milakovic, Oguz Selvitopi, Israt Nisa, Zoran Budimlic, Aydin Buluç:
  Parallel Algorithms for Masked Sparse Matrix-Matrix Products. 10:1-10:11
- Francisco López, Lars Karlsson, Paolo Bientinesi:
  FLOPs as a Discriminant for Dense Linear Algebra Algorithms. 11:1-11:10
- Boxiang Wang, Qifan Xu, Zhengda Bian, Yang You:
  Tesseract: Parallelize the Tensor Parallelism Efficiently. 12:1-12:11
- Jan Hückelheim, Laurent Hascoët:
  Automatic Differentiation of Parallel Loops with Formal Methods. 13:1-13:11
- Andrey Prokopenko, Piyush Sao, Damien Lebrun-Grandié:
  A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs. 14:1-14:10
- Haidong Lan, Wenxi Zhu, Du Wu, Qian Qiu, Honglin Zhu, Jingjing Zhao, Xinghui Fu, Liu Wei, Jintao Meng, Minwen Deng:
  Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach. 15:1-15:9
- Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, Ariel Shtul, Yaofeng Tu:
  SHE: A Generic Framework for Data Stream Mining over Sliding Windows. 16:1-16:12
Architectural Support for Learning
- Zhengbo Chen, Qi Yu, Fang Zheng, Feng Guo, Zuoning Chen:
  DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training. 17:1-17:10
- Minjin Tang, Mei Wen, Yasong Cao, Junzhong Shen, Jianchao Yang, Jiawei Fei, Yang Guo, Sheng Liu:
  Mentha: Enabling Sparse-Packing Computation on Systolic Arrays. 18:1-18:11
- Moiz Arif, Kevin Assogba, M. Mustafa Rafique, Sudharshan Vazhkudai:
  Exploiting CXL-based Memory for Distributed Deep Learning. 19:1-19:11
- Jiazhi Jiang, Jiangsu Du, Dan Huang, Dongsheng Li, Jiang Zheng, Yutong Lu:
  Characterizing and Optimizing Transformer Inference on ARM Many-core Processor. 20:1-20:11
Storage Recovery and Repair
- Lin Wang, Yuchong Hu, Qian Du, Dan Feng, Ray Wu, Ingo He, Kevin Zhang:
  Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server. 21:1-21:11
- Hai Zhou, Dan Feng:
  Boosting Cross-rack Multi-stripe Repair in Heterogeneous Erasure-coded Clusters. 22:1-22:11
- Shuang Ma, Si Wu, Cheng Li, Yinlong Xu:
  Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance. 23:1-23:11
- Shucheng Wang, Qiang Cao, Ziyi Lu, Jie Yao:
  Mlog: Multi-log Write Buffer upon Ultra-fast SSD RAID. 24:1-24:11
Data Systems, Storage, I/O
- Kai Lu, Guokuan Li, Jiguang Wan, Ruixiang Ma, Wei Zhao:
  ADSTS: Automatic Distributed Storage Tuning System Using Deep Reinforcement Learning. 25:1-25:13
- Jie Liu, Bogdan Nicolae, Dong Li:
  Lobster: Load Balance-Aware I/O for Distributed DNN Training. 26:1-26:11
- Yuanzhang Wang, Fengkui Yang, Ji Zhang, Chunhua Li, Ke Zhou, Chong Liu, Zhuo Cheng, Wei Fang, Jinhu Liu:
  LDPP: A Learned Directory Placement Policy in Distributed File Systems. 27:1-27:11
- Li Liu, Chunhua Li, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang:
  A Data-aware Learned Index Scheme for Efficient Writes. 28:1-28:11
Memory Systems and I/O
- Haodong Lin, Zhibing Sha, Jun Li, Zhigang Cai, Balazs Gerofi, Yuanquan Shi, Jianwei Liao:
  DRAM Cache Management with Request Granularity for NAND-based SSDs. 29:1-29:10
- Xiaomin Zou, Fang Wang, Dan Feng, Tianjin Guan, Nan Su:
  ROWE-tree: A Read-Optimized and Write-Efficient B+-tree for Persistent Memory. 30:1-30:11
- Christopher Stewart, Nathaniel Morris, Lydia Y. Chen, Robert Birke:
  Performance Modeling for Short-Term Cache Allocation. 31:1-31:11
- Kai Zhang, Zhiqi Wang, Zili Shao:
  BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems. 32:1-32:10
- Lucia Pons, Julio Sahuquillo, Salvador Petit, Julio Pons:
  Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning. 33:1-33:11
- Mengya Lei, Fang Wang, Dan Feng, Xiaoyu Shuai, Yuchao Cao:
  A Dynamic and Recoverable BMT Scheme for Secure Non-Volatile Memory. 34:1-34:11
Graph Algorithms
- Christoph Klein, Robert Strzodka:
  Highly Parallel Linear Forest Extraction from a Weighted Graph on GPUs. 35:1-35:11
- Jason Niu, Jaroslaw Zola, Ahmet Erdem Sariyüce:
  Counting Induced 6-Cycles in Bipartite Graphs. 36:1-36:10
- Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C. S. Lui, Fei Chen, Pengcheng Wang, Lei Han:
  Towards Fast Large-scale Graph Analysis via Two-dimensional Balanced Partitioning. 37:1-37:11
- Anwesh Panda, Sathish Vadhiyar:
  Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings. 38:1-38:10
- Xianghao Xu, Hong Jiang, Fang Wang, Yongli Cheng, Peng Fang:
  GraphSD: A State and Dependency aware Out-of-Core Graph Processing System. 39:1-39:11
Resource Management and Scheduling
- Taylan Özden, Tim Beringer, Arya Mazaheri, Hamid Mohammadi Fard, Felix Wolf:
  ElastiSim: A Batch-System Simulator for Malleable Workloads. 40:1-40:11
- Huanle Xu, Yang Liu, Wing Cheong Lau:
  Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters. 41:1-41:11
- Anam Tahir, Kai Cui, Heinz Koeppl:
  Learning Mean-Field Control for Delayed Information Load Balancing in Large Queuing Systems. 42:1-42:11
- Tapan Srivastava, Huazhe Zhang, Henry Hoffmann:
  Penelope: Peer-to-peer Power Management. 43:1-43:11
- Avinash Kumar Chaurasia, Anshuj Garg, Bhaskaran Raman, Uday Kurkure, Hari Sivaraman, Lan Vu, Sairam Veeraswamy:
  Simmer: Rate proportional scheduling to reduce packet drops in vGPU based NF chains. 44:1-44:11
- Md. Maruf Hossain, Erik Saule:
  Postmortem Computation of Pagerank on Temporal Graphs. 45:1-45:11
- Yang Liu, Huanle Xu, Wing Cheong Lau:
  Online Resource Optimization for Elastic Stream Processing with Regret Guarantee. 46:1-46:11
- Kangjin Wang, Ying Li, Cheng Wang, Tong Jia, Kingsum Chow, Yang Wen, Yaoyong Dou, Guoyao Xu, Chuanjia Hou, Jie Yao, Liping Zhang:
  Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis. 47:1-47:11
- Huijun Wang, Oliver Sinnen:
  Scheduling Fork-Join Task Graphs with Communication Delays and Equal Processing Times. 48:1-48:9
- Wenda Tang, Senbo Fu, Yutao Ke, Qian Peng, Feng Gao:
  Themis: Fair Memory Subsystem Resource Sharing with Differentiated QoS in Public Clouds. 49:1-49:12
- Yuxin Chen, Benjamin Brock, Serban D. Porumbescu, Aydin Buluç, Katherine A. Yelick, John D. Owens:
  Atos: A Task-Parallel GPU Scheduler for Graph Analytics. 50:1-50:11
- Anne Benoit, Lucas Perotin, Yves Robert, Hongyang Sun:
  Online Scheduling of Moldable Task Graphs under Common Speedup Models. 51:1-51:11
Programming Systems, Runtime Systems and Compilers
- Xiaohan Tao, Yu Zhu, Boyang Wang, Jinlong Xu, Jianmin Pang, Jie Zhao:
  Automatically Generating High-performance Matrix Multiplication Kernels on the Latest Sunway Processor. 52:1-52:12
- Xin You, Changxi Liu, Hailong Yang, Pengbo Wang, Zhongzhi Luan, Depei Qian:
  Vectorizing SpMV by Exploiting Dynamic Regular Patterns. 53:1-53:12
- Lijuan Jiang, Ping Xu, Qianchao Zhu, Xiuhong Li, Shengen Yan, Xingcheng Zhang, Dahua Lin, Wenjing Ma, Zhouyang Li, Jun Liu, Jinming Ma, Minxi Jin, Chao Yang:
  EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. 54:1-54:11
- Jimmy Aguilar Mena, Omar Shaaban, Victor Lopez, Marta Garcia, Paul M. Carpenter, Eduard Ayguadé, Jesús Labarta:
  Transparent load balancing of MPI programs using OmpSs-2@Cluster and DLB. 55:1-55:11
Networks and Communication
- Rongxin Han, Dezhi Chen, Song Guo, Xiaoyuan Fu, Jingyu Wang, Qi Qi, Jianxin Liao:
  Parallel Network Slicing for Multi-SP Services. 56:1-56:11
- Jin Ye, Lin Li, Wenlu Zhang, Guihao Chen, Yuanchao Shan, Yijun Li, Weihe Li, Jiawei Huang:
  UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival. 57:1-57:11
- Qinzhe Wu, Ashen Ekanayake, Ruihao Li, Jonathan Beard, Lizy Kurian John:
  SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems. 58:1-58:12
- Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory V. Chockler, Alexey Gotsman:
  Acuerdo: Fast Atomic Broadcast over RDMA. 59:1-59:11
- Yuan Liu, Wenxin Li, Wenyu Qu, Heng Qi:
  BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks. 60:1-60:11
- Shan Huang, Dezun Dong, Lingbin Zeng, Zejia Zhou, Yukun Zhou, Xiangke Liao:
  DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers. 61:1-61:11
- Haoyu Wang, Kevin Zheng, Charles Reiss, Haiying Shen:
  NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks. 62:1-62:10
- Mikhail Isaev, Nic McDonald, Jeffrey Young, Richard W. Vuduc:
  ParaGraph: An application-simulator interface and toolkit for hardware-software co-design. 63:1-63:13
Performance Benchmarking and Auto-tuning
- Yiltan Hassan Temuçin, Ryan E. Grant, Ahmad Afsahi:
  Micro-Benchmarking MPI Partitioned Point-to-Point Communication. 64:1-64:12
- Kohei Yoshida, Rio Sageyama, Shinobu Miwa, Hayato Yamaki, Hiroki Honda:
  Analyzing Performance and Power-Efficiency Variations among NVIDIA GPUs. 65:1-65:12
- Cunyang Wei, Haipeng Jia, Yunquan Zhang, Liusha Xu, Ji Qi:
  IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs. 66:1-66:11
- Hui Dou, Yilun Wang, Yiwen Zhang, Pengfei Chen:
  DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks. 67:1-67:11
Edge and Cloud Computing
- Xiaoyu Xia, Feifei Chen, Qiang He, Guangming Cui, John C. Grundy, Mohamed Almorsy Abdelrazek, Fang Dong:
  Formulating Interference-aware Data Delivery Strategies in Edge Storage Systems. 68:1-68:11
- Guangming Cui, Qiang He, Xiaoyu Xia, Feifei Chen, Yun Yang:
  Energy-efficient Edge Server Management for Edge Computing: A Game-theoretical Approach. 69:1-69:11
- Liming Ge, Zizhao Wang, Wei Bao, Dong Yuan, Nguyen Hoang Tran, Bing Bing Zhou, Albert Y. Zomaya:
  Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems. 70:1-70:13
- Zhaowu Huang, Fang Dong, Dian Shen, Huitian Wang, Xiaolin Guo, Shucun Fu:
  Enabling Latency-Sensitive DNN Inference via Joint Optimization of Model Surgery and Resource Allocation in Heterogeneous Edge. 71:1-71:11
Optimization of Federated Learning
- Lina Su, Ruiting Zhou, Ne Wang, Guang Fang, Zongpeng Li:
  An Online Learning Approach for Client Selection in Federated Edge Learning under Budget Constraint. 72:1-72:11
- Nang Hung Nguyen, Phi Le Nguyen, Thuy Dung Nguyen, Trung Thanh Nguyen, Duc Long Nguyen, Thanh Hung Nguyen, Huy Hieu Pham, Truong Thao Nguyen:
  FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning. 73:1-73:11
- Shengyuan Ye, Liekang Zeng, Qiong Wu, Ke Luo, Qingze Fang, Xu Chen:
  Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training. 74:1-74:11
- Chuang Hu, Huanghuang Liang, Xiao Ming Han, Boan Liu, Dazhao Cheng, Dan Wang:
  Spread: Decentralized Model Aggregation for Scalable Federated Learning. 75:1-75:12
- Jaehee Jang, Heonseok Ha, Dahuin Jung, Sungroh Yoon:
  FedClassAvg: Local Representation Learning for Personalized Federated Learning on Heterogeneous Neural Networks. 76:1-76:10
Performance of Machine Learning
- Zining Zhang, Bingsheng He, Zhenjie Zhang:
  HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks. 77:1-77:13
- Liang Liu, Mingzhu Shen, Ruihao Gong, Fengwei Yu, Hailong Yang:
  NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database. 78:1-78:14
- Muhammed Fatih Balin, Kaan Sancak, Ümit V. Çatalyürek:
  MG-GCN: A Scalable multi-GPU GCN Training Framework. 79:1-79:11
- Rongxin Xu, Shiva Raj Pokhrel, Qiujun Lan, Gang Li:
  FAIR-BFL: Flexible and Incentive Redesign for Blockchain-based Federated Learning. 80:1-80:11
Optimization of Applications
- Yuhao Liu, Xin Du, Zhihui Lu, Qiang Duan, Jianfeng Feng, Minglong Wang, Jie Wu:
  Regularizing Sparse and Imbalanced Communications for Voxel-based Brain Simulations on Supercomputers. 81:1-81:11
- Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine A. Yelick, Aydin Buluç:
  Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly. 82:1-82:11
- Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han, Jiwon Seo:
  BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers. 83:1-83:12
- Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa:
  Tensor-Accelerated Fourth-Order Epistasis Detection on GPUs. 84:1-84:11
- Qingcai Jiang, Jielan Li, Junshi Chen, Xinming Qin, Lingyun Wan, Jinlong Yang, Jie Liu, Wei Hu, Hong An:
  Accelerating Parallel First-Principles Excited-State Calculation by Low-Rank Approximation with K-Means Clustering. 85:1-85:11
- Sifan Long, Xiaowei Guo, Xiaokang Fan, Chao Li, Kelvin Wong, Ran Zhao, Yi Liu, Sen Zhang, Canqun Yang:
  ParallelDualSPHysics: supporting efficient parallel fluid simulations through MPI-enabled SPH method. 86:1-86:11
- Frank Wanye, Vitaliy Gleyzer, Edward K. Kao, Wu-chun Feng:
  On the Parallelization of MCMC for Community Detection. 87:1-87:13
- Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany, Tsung-Wei Huang:
  From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus. 88:1-88:12