TY - JOUR
T1 - Dynamic motion estimation and evolution video prediction network
AU - Kim, Nayoung
AU - Kang, Je Won
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Future video prediction provides valuable information that helps a machine understand its surrounding environment and make critical decisions in real time. However, long-term video prediction remains a challenging problem due to the complicated spatiotemporal dynamics in a video. In this paper, we propose a dynamic motion estimation and evolution (DMEE) network model to generate unseen future videos from videos observed in the past. Our primary contribution is to use trained kernels in convolutional neural network (CNN) and long short-term memory (LSTM) architectures, adapted to each time step and sample position, to efficiently manage spatiotemporal dynamics. DMEE uses motion estimation (ME) and motion update (MU) kernels to predict future video through an end-to-end prediction-update process. In the prediction step, the ME kernel estimates the temporal changes. In the update step, the MU kernel combines the estimates with previously generated frames, used as reference frames, via a weighted average. The kernels are not only applied to the current frame but also evolve to generate successive frames, enabling temporally specific filtering. We perform qualitative and quantitative performance analyses based on the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and a video classification score developed to examine the visual quality of the generated video. Experiments demonstrate that our algorithm provides qualitative and quantitative performance superior to current state-of-the-art algorithms. Our source code is available at https://github.com/Nayoung-Kim-ICP/Video-Generation.
AB - Future video prediction provides valuable information that helps a machine understand its surrounding environment and make critical decisions in real time. However, long-term video prediction remains a challenging problem due to the complicated spatiotemporal dynamics in a video. In this paper, we propose a dynamic motion estimation and evolution (DMEE) network model to generate unseen future videos from videos observed in the past. Our primary contribution is to use trained kernels in convolutional neural network (CNN) and long short-term memory (LSTM) architectures, adapted to each time step and sample position, to efficiently manage spatiotemporal dynamics. DMEE uses motion estimation (ME) and motion update (MU) kernels to predict future video through an end-to-end prediction-update process. In the prediction step, the ME kernel estimates the temporal changes. In the update step, the MU kernel combines the estimates with previously generated frames, used as reference frames, via a weighted average. The kernels are not only applied to the current frame but also evolve to generate successive frames, enabling temporally specific filtering. We perform qualitative and quantitative performance analyses based on the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and a video classification score developed to examine the visual quality of the generated video. Experiments demonstrate that our algorithm provides qualitative and quantitative performance superior to current state-of-the-art algorithms. Our source code is available at https://github.com/Nayoung-Kim-ICP/Video-Generation.
KW - Convolutional Neural Network
KW - Deep learning
KW - Long Short-term Memory
KW - Long-term video generation and prediction
KW - Video understanding and analysis
UR - http://www.scopus.com/inward/record.url?scp=85120340538&partnerID=8YFLogxK
U2 - 10.1109/TMM.2020.3035281
DO - 10.1109/TMM.2020.3035281
M3 - Article
AN - SCOPUS:85120340538
SN - 1520-9210
VL - 23
SP - 3986
EP - 3998
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -
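
The abstract above outlines DMEE's prediction-update loop: an ME kernel first extrapolates the temporal change, and an MU kernel then blends that estimate with previously generated reference frames via a weighted average, repeated autoregressively for long-term prediction. Below is a minimal, hypothetical PyTorch sketch of such a loop; the module names (MotionEstimator, MotionUpdater, predict_rollout), layer sizes, and the simple convolutional gating are illustrative assumptions, not the authors' implementation, and the sketch omits the paper's per-position adaptive kernels and their LSTM-driven evolution across time steps.

# Hypothetical sketch of a prediction-update rollout in the spirit of the
# abstract; names and hyperparameters are assumptions, not the DMEE code.
import torch
import torch.nn as nn

class MotionEstimator(nn.Module):
    """Prediction step: estimate the temporal change from two recent frames."""
    def __init__(self, channels=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, prev_frame, cur_frame):
        # Predict a per-pixel residual that extrapolates motion from the current frame.
        delta = self.net(torch.cat([prev_frame, cur_frame], dim=1))
        return cur_frame + delta

class MotionUpdater(nn.Module):
    """Update step: blend the motion estimate with a reference frame by a learned weighted average."""
    def __init__(self, channels=3, hidden=32):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel blending weights in [0, 1]
        )

    def forward(self, estimate, reference):
        w = self.gate(torch.cat([estimate, reference], dim=1))
        return w * estimate + (1.0 - w) * reference

def predict_rollout(frames, me, mu, horizon=5):
    """Autoregressively generate `horizon` future frames from the observed `frames`."""
    prev_frame, cur_frame = frames[-2], frames[-1]
    outputs = []
    for _ in range(horizon):
        estimate = me(prev_frame, cur_frame)           # prediction (ME kernel)
        next_frame = mu(estimate, cur_frame)           # update (MU kernel, weighted average)
        outputs.append(next_frame)
        prev_frame, cur_frame = cur_frame, next_frame  # feed the prediction back in
    return torch.stack(outputs, dim=1)

if __name__ == "__main__":
    me, mu = MotionEstimator(), MotionUpdater()
    observed = [torch.rand(1, 3, 64, 64) for _ in range(2)]  # two observed frames
    future = predict_rollout(observed, me, mu, horizon=3)
    print(future.shape)  # torch.Size([1, 3, 3, 64, 64])

The feedback of each generated frame into the next prediction is what makes long-horizon quality sensitive to error accumulation, which is why the paper evaluates with PSNR, SSIM, and a video classification score over extended rollouts.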