Distributed Large Models Training Optimization With Real-Time Wireless Channel Feedback
Document Type
Article
Source of Publication
IEEE Journal on Selected Areas in Communications
Publication Date
12-4-2025
Abstract
Distributed training over wireless networks is essential to meeting the immense computational and data demands of large-scale deep learning models. However, the stochastic nature of wireless environments introduces significant challenges, including variable delays, noise interference, and packet loss, which degrade gradient synchronization and hinder model convergence. In this work, we propose a novel communication-aware distributed training (CADT) framework that integrates real-time channel state information (CSI) feedback into the gradient aggregation process. Unlike conventional methods that assume static or ideal communication conditions, CADT dynamically reweights the gradient contribution of each node based on instantaneous channel quality, compensating for wireless impairments and enabling robust aggregation under adverse conditions, which in turn improves both convergence speed and final model accuracy. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet, and SVHN using Vision Transformer and ResNet-50 demonstrate that CADT outperforms baseline methods in convergence, accuracy, and communication efficiency. In addition, we provide a rigorous theoretical analysis that establishes convergence guarantees under realistic wireless conditions, advancing the theoretical foundation of distributed optimization in non-ideal communication environments. Our framework offers a practical solution for real-world scenarios such as edge computing, where communication constraints and environmental variability are dominant factors.
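The abstract does not give the paper's exact reweighting rule, but the core idea of CSI-weighted gradient aggregation can be illustrated with a minimal sketch. The snippet below assumes a simple SNR-proportional weighting (a hypothetical choice for illustration, not necessarily the rule used in the paper): each node's gradient is scaled by its normalized linear SNR, so nodes on poor channels contribute less to the global update.

```python
import numpy as np

def csi_weighted_aggregate(gradients, snr_db):
    """Aggregate per-node gradients, weighting each node by its
    instantaneous channel quality (here: linear SNR, a simplifying
    assumption), so nodes on degraded channels are down-weighted."""
    snr_linear = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
    weights = snr_linear / snr_linear.sum()        # normalize weights to sum to 1
    stacked = np.stack([np.asarray(g, dtype=float) for g in gradients])
    return np.tensordot(weights, stacked, axes=1)  # weighted sum over the node axis

# Example: three nodes; the third sits in a deep fade (-10 dB), so its
# outlier gradient is almost entirely suppressed in the aggregate.
grads = [np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.array([9.0, -7.0])]
agg = csi_weighted_aggregate(grads, snr_db=[20.0, 20.0, -10.0])
```

With these example values, `agg` stays close to the [1.0, 1.0] consensus of the two well-connected nodes, whereas a plain unweighted average would be pulled far off by the faded node's gradient.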
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Volume
44
First Page
2231
Last Page
2243
Disciplines
Computer Sciences
Keywords
communication-aware methods, distributed optimization, gradient aggregation, wireless distributed training
Recommended Citation
Pei, Jiaming; Frascolla, Valerio; Al-Dulaimi, Anwer; Liu, Wei; Aldhyani, Theyazn H.H.; Bashir, Ali Kashif; and Mumtaz, Shahid, "Distributed Large Models Training Optimization With Real-Time Wireless Channel Feedback" (2025). All Works. 7877.
https://zuscholars.zu.ac.ae/works/7877
Indexed in Scopus
yes
Open Access
no