Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization

Jan 1, 2025 · Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, and others

Type: Conference paper
Publication: 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Last updated on Jan 1, 2025