
High Efficiency Inference Accelerating Algorithm for NOMA-Based Edge Intelligence

  • Additional Information
    • Publication Information:
      Institute of Electrical and Electronics Engineers (IEEE), 2024.
    • Publication Date:
      2024
    • Abstract:
      Although artificial intelligence (AI) is widely used and has significantly changed our lives, deploying large AI models directly on resource-limited edge devices is impractical. Model split inference has therefore been proposed to improve the performance of edge intelligence (EI): the AI model is divided into sub-models, and the resource-intensive sub-model is offloaded wirelessly to an edge server to reduce resource requirements and inference latency. Unfortunately, with the sharp increase in the number of edge devices, the shortage of spectrum resources in edge networks has become severe in recent years, which limits further performance gains in EI. Following the example of NOMA-based edge computing (EC), integrating non-orthogonal multiple access (NOMA) technology with split inference in EI is attractive. However, previous works on model split inference in EI have not properly considered the NOMA-based communication aspect or the influence of intermediate data transmission, and the resource-allocation complexity introduced by the NOMA scheme makes the problem even harder. This paper therefore proposes the Effective Communication and Computing resource allocation algorithm, abbreviated ECC, to accelerate split inference in NOMA-based EI. Specifically, ECC accounts for both energy consumption and inference latency to find the optimal model split strategy and resource allocation strategy (subchannel, transmission power, and computing resource). Since minimum inference delay and minimum energy consumption cannot be achieved simultaneously, a gradient descent (GD) based algorithm is adopted to find the optimal tradeoff between them. Moreover, a loop-iteration GD approach (Li-GD) is developed to reduce the complexity that parameter discretization adds to the GD algorithm. The key idea of Li-GD is that the initial value of the i-th layer's GD procedure is selected from the optimal results of the previous (i-1) layers' GD procedures, choosing the one whose intermediate data size is closest to that of the i-th layer (a minimal sketch of this warm-start idea follows this record). Additionally, the properties of the proposed algorithms, including convergence, complexity, and approximation error, are investigated. Experimental results demonstrate that ECC performs considerably better than previous approaches.
    • File Description:
      application/pdf
    • ISSN:
      1558-2248
      1536-1276
    • DOI:
      10.1109/twc.2024.3454086
    • Rights:
      CC BY
    • Accession Number:
      edsair.doi.dedup.....5419daa90162cfba8b9af535b008808f
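  • Li-GD warm start, illustrated
    A minimal sketch of the warm-start idea described in the abstract, assuming a toy weighted latency/energy cost and placeholder intermediate data sizes; the function names (objective, grad_descent) and all constants here are illustrative assumptions, not the paper's model or code. For each candidate split layer, a plain gradient descent over transmit power and computing share is initialized from the stored optimum of the already-solved layer whose intermediate data size is closest.

```python
# Toy illustration of the Li-GD warm-start idea: each layer's gradient-descent
# search is initialized from the optimum of the already-solved layer whose
# intermediate data size is closest. Objective, sizes, and constants are
# placeholders, not the paper's system model.
import numpy as np

# Illustrative intermediate data sizes (e.g., MB) at each candidate split layer.
layer_data_sizes = [4.0, 2.5, 1.2, 1.0, 0.3]

def objective(x, data_size, alpha=0.5):
    """Toy weighted latency/energy cost for transmit power x[0] and CPU share x[1]."""
    power, cpu = np.clip(x, 1e-3, None)
    latency = data_size / np.log2(1.0 + power) + 1.0 / cpu          # upload + compute delay (toy)
    energy = power * (data_size / np.log2(1.0 + power)) + cpu**2    # transmit + compute energy (toy)
    return alpha * latency + (1.0 - alpha) * energy

def grad_descent(x0, data_size, lr=0.01, steps=500):
    """Plain finite-difference gradient descent from a given starting point."""
    x = np.array(x0, dtype=float)
    eps = 1e-5
    for _ in range(steps):
        g = np.zeros_like(x)
        for k in range(len(x)):
            d = np.zeros_like(x)
            d[k] = eps
            g[k] = (objective(x + d, data_size) - objective(x - d, data_size)) / (2 * eps)
        x = np.clip(x - lr * g, 1e-3, None)
    return x, objective(x, data_size)

solved = []   # (data_size, optimal resource vector) for layers already processed
best = None
for i, size in enumerate(layer_data_sizes):
    if solved:
        # Li-GD warm start: reuse the optimum of the solved layer whose
        # intermediate data size is closest to this layer's.
        x0 = min(solved, key=lambda s: abs(s[0] - size))[1]
    else:
        x0 = [1.0, 1.0]  # arbitrary starting point for the first layer
    x_opt, cost = grad_descent(x0, size)
    solved.append((size, x_opt))
    if best is None or cost < best[1]:
        best = (i, cost, x_opt)

print(f"best split layer {best[0]}, cost {best[1]:.3f}, resources {best[2]}")
```

    The point of the sketch is that reusing a nearby layer's optimum gives each GD run a good starting point, which is how Li-GD is said to reduce the complexity introduced by discretizing the split-point decision.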