Paper-Conference

Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization

jianbo-dong

• Jan 1, 2025 • 1 min read

Salus: A Practical Trusted Execution Environment for CPU-FPGA Heterogeneous Cloud Platforms

yu-zou

• Jan 1, 2024 • 1 min read

EVT: Accelerating deep learning training with epilogue visitor tree

zhaodong-chen

• Jan 1, 2024 • 1 min read

TT-GNN: Efficient on-chip graph neural network training via embedding reformation and hardware optimization

zheng-qu

• Jan 1, 2023 • 1 min read

Spada: Accelerating sparse matrix multiplication with adaptive dataflow

zhiyao-li

• Jan 1, 2023 • 1 min read

RM-STC: Row-merge dataflow inspired GPU sparse tensor core for energy-efficient sparse acceleration

guyue-huang

• Jan 1, 2023 • 1 min read

Predicting the output structure of sparse matrix multiplication with sampled compression ratio

zhaoyang-du

• Jan 1, 2023 • 1 min read

Klotski: DNN model orchestration framework for dataflow architecture accelerators

Chen Bai

• Jan 1, 2023 • 1 min read

HBP: Hierarchically balanced pruning and accelerator co-design for efficient DNN inference

ao-ren

• Jan 1, 2023 • 1 min read

Gamora: Graph learning based symbolic reasoning for large-scale Boolean networks

nan-wu

• Jan 1, 2023 • 1 min read