Distributed Large Models Training Optimization With Real-Time Wireless Channel Feedback
Document Type
Article
Source of Publication
IEEE Journal on Selected Areas in Communications
Publication Date
12-4-2025
Abstract
Distributed training over wireless networks is essential to meeting the immense computational and data demands of large-scale deep learning models. However, the stochastic nature of wireless environments introduces significant challenges, including variable delays, noise interference, and packet loss, which degrade gradient synchronization and hinder model convergence. In this work, we propose a novel communication-aware distributed training (CADT) framework that integrates real-time channel state information (CSI) feedback into the gradient aggregation process. Unlike conventional methods that assume static or ideal communication conditions, CADT dynamically reweights the gradient contribution of each node based on instantaneous channel quality, compensating for wireless impairments and enabling robust aggregation under adverse conditions, which in turn improves both convergence speed and final model accuracy. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet, and SVHN using Vision Transformer and ResNet-50 demonstrate that CADT outperforms baseline methods in convergence, accuracy, and communication efficiency. In addition, we provide a rigorous theoretical analysis that establishes convergence guarantees under realistic wireless conditions, advancing the theoretical foundation of distributed optimization in non-ideal communication environments. Our framework offers a practical solution for real-world scenarios such as edge computing, where communication constraints and environmental variability are dominant factors.
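The abstract does not give the paper's exact reweighting rule, but the core idea of CSI-weighted gradient aggregation can be illustrated with a minimal sketch. The snippet below assumes a simple SNR-proportional weighting (a hypothetical choice for illustration, not necessarily the rule used in the paper): each node's gradient is scaled by its normalized linear SNR, so nodes on poor channels contribute less to the global update.

```python
import numpy as np

def csi_weighted_aggregate(gradients, snr_db):
    """Aggregate per-node gradients, weighting each node by its
    instantaneous channel quality (here: linear SNR, a simplifying
    assumption), so nodes on degraded channels are down-weighted."""
    snr_linear = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
    weights = snr_linear / snr_linear.sum()        # normalize weights to sum to 1
    stacked = np.stack([np.asarray(g, dtype=float) for g in gradients])
    return np.tensordot(weights, stacked, axes=1)  # weighted sum over the node axis

# Example: three nodes; the third sits in a deep fade (-10 dB), so its
# outlier gradient is almost entirely suppressed in the aggregate.
grads = [np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.array([9.0, -7.0])]
agg = csi_weighted_aggregate(grads, snr_db=[20.0, 20.0, -10.0])
```

With these example values, `agg` stays close to the [1.0, 1.0] consensus of the two well-connected nodes, whereas a plain unweighted average would be pulled far off by the faded node's gradient.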
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Volume
44
First Page
2231
Last Page
2243
Disciplines
Computer Sciences
Keywords
communication-aware methods, distributed optimization, gradient aggregation, wireless distributed training
Recommended Citation
Pei, Jiaming; Frascolla, Valerio; Al-Dulaimi, Anwer; Liu, Wei; Aldhyani, Theyazn H.H.; Bashir, Ali Kashif; and Mumtaz, Shahid, "Distributed Large Models Training Optimization With Real-Time Wireless Channel Feedback" (2025). All Works. 7877.
https://zuscholars.zu.ac.ae/works/7877
Indexed in Scopus
yes
Open Access
no