POWER-AWARE DISTRIBUTED INFERENCE: A VIEW ON NEXT-WORD PREDICTION
Authors:
Mrs. A Josh Mary, Bhairathi Naga Satish, Chippada Dinesh, Bollipo Tejas
Page No: 389-403
Abstract:
Demand for high-quality AI-generated content (AIGC) with fast response times has driven the development of natural language processing (NLP) services, particularly at the edge. Next-word prediction for mobile keyboards on user devices is a common edge NLP function, and we study distributed inference for it. We jointly optimize coupled metrics: maximizing the predicted click-through rate (CTR) for better quality of service (QoS), minimizing user impatience for better quality of experience (QoE), and keeping energy consumption within budget for sustainability. We also consider the practical setting in which the prediction accuracy of the different NLP models is not known in advance. To learn the models' prediction accuracy and balance the trade-offs among these metrics, we propose a distributed inference scheme for online next-word prediction with user impatience (DONUT) that combines online learning with online control. Our theoretical analysis shows that DONUT achieves sub-linear regret (loss of CTR), guarantees bounded user impatience, and keeps energy consumption within budget. Numerical simulations demonstrate DONUT's adaptability to different environments and its advantage over baseline techniques.
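The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of how online learning and online control can be combined in general, not the DONUT method itself. It assumes a UCB-style estimate of each model's unknown CTR and a virtual queue that penalizes energy use above the per-round budget. User impatience is not modeled. All names (`run_sketch`, `V`, `ctr`, `energy`, `budget`) and all numbers are hypothetical.

```python
import math
import random

def run_sketch(ctr, energy, budget, rounds=2000, V=10.0, seed=0):
    """Toy loop: UCB estimates of unknown CTR plus a virtual queue for energy.

    ctr[k]    -- true click probability of model k (hidden from the learner)
    energy[k] -- energy cost of running model k for one request (assumed known)
    budget    -- target average energy per round
    V         -- weight trading CTR against energy-budget violation
    """
    rng = random.Random(seed)
    n = len(ctr)
    pulls = [0] * n       # times each model was selected
    mean = [0.0] * n      # empirical CTR per model
    queue = 0.0           # virtual queue: backlog of energy overshoot
    total_energy = 0.0
    clicks = 0

    for t in range(1, rounds + 1):
        def score(k):
            if pulls[k] == 0:
                return float("inf")  # try every model at least once
            bonus = math.sqrt(2 * math.log(t) / pulls[k])
            ucb = min(1.0, mean[k] + bonus)       # optimistic CTR estimate
            return V * ucb - queue * energy[k]    # reward minus queue-weighted cost

        k = max(range(n), key=score)
        reward = 1 if rng.random() < ctr[k] else 0

        # Online learning: incremental update of the empirical CTR.
        pulls[k] += 1
        mean[k] += (reward - mean[k]) / pulls[k]

        # Online control: the queue grows when energy use exceeds the budget.
        queue = max(0.0, queue + energy[k] - budget)

        total_energy += energy[k]
        clicks += reward

    return clicks / rounds, total_energy / rounds, pulls

# Hypothetical setup: three models, more accurate ones cost more energy.
rate, avg_energy, pulls = run_sketch([0.2, 0.5, 0.8], [1.0, 2.0, 4.0], budget=2.5)
```

In this sketch a larger `V` favors CTR and lets the queue grow further before the cheapest model is forced, so the average energy stays only approximately within budget over a finite horizon.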
Volume & Issue
Volume-14, Issue-4