TY - JOUR
T1 - Q-LAtte
T2 - An Efficient and Versatile LSTM Model for Quantized Attention-Based Time Series Forecasting in Building Energy Applications
AU - Kang, Jieui
AU - Park, Jihye
AU - Choi, Soeun
AU - Sim, Jaehyeong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Long Short-Term Memory (LSTM) networks coupled with attention mechanisms have demonstrated proficiency in handling time-series data, particularly in building energy prediction. However, their high computational complexity and resource demands pose significant challenges for real-time applications and deployment on edge devices. Traditional mitigation methods, such as quantization, often compromise model performance due to the approximation errors they introduce. In this paper, we propose Q-LAtte, a novel, quantization-friendly attention-based LSTM model, as a solution to these challenges. Q-LAtte incorporates a quantization approach that preserves the efficiency benefits while significantly reducing the performance degradation typically associated with standard quantization techniques. The key to its superior performance is its distribution-aware quantization process: by preserving the output distribution of the model parameters before and after quantization, Q-LAtte retains the subtle but significant variations that are integral to decision-making tasks such as prediction and classification. Compared with traditional quantized models, Q-LAtte exhibits a notable improvement in performance. Specifically, our method reduces the Mean Absolute Percentage Error (MAPE) from 17.56 to 8.48 and the Mean Absolute Scaled Error (MASE) by 48%, while minimizing the time cost. These results highlight the efficacy of Q-LAtte in striking a balance between efficiency and accuracy, significantly enhancing the feasibility of deploying attention-LSTM networks on resource-constrained devices for real-time, on-site data analysis and decision-making.
KW - artificial intelligence
KW - building energy
KW - deep learning acceleration
KW - optimization
KW - quantization
UR - http://www.scopus.com/inward/record.url?scp=85193293459&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3400588
DO - 10.1109/ACCESS.2024.3400588
M3 - Article
AN - SCOPUS:85193293459
SN - 2169-3536
VL - 12
SP - 69325
EP - 69341
JO - IEEE Access
JF - IEEE Access
ER -