Abstract
Video prediction, forecasting future frames from a sequence of input frames, is a challenging task, since frame-to-frame changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics. In this paper, we propose a new framework that integrates these complementary attributes to predict complex pixel dynamics through deep networks. To capture the local motion patterns of objects, we devise local filter memory networks that generate adaptive filter kernels by storing the prototypical motions of moving objects in memory. We further present global context propagation networks that iteratively aggregate non-local neighboring representations to preserve contextual information over the past frames. The proposed framework, utilizing the outputs of both networks, can address blurry predictions and color distortion. We conduct experiments on the Caltech pedestrian and UCF101 datasets and demonstrate state-of-the-art results. Especially for multi-step prediction, we obtain outstanding performance in both quantitative and qualitative evaluation.
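The two components described above can be illustrated at a very high level: a soft-attention read over a memory of stored kernel prototypes (the filter-memory idea), and a similarity-weighted average over all spatial positions (a generic non-local aggregation). This is a minimal NumPy sketch, not the paper's actual architecture; all names, shapes, and sizes here are hypothetical.

```python
import numpy as np

def read_filter_from_memory(query, keys, kernels):
    """Soft-attention read over a memory of kernel prototypes:
    the query's similarity to each key weights the stored kernels.
    (Illustrative only; the paper's memory design is not reproduced here.)"""
    sim = keys @ query                               # (M,) affinities to slots
    w = np.exp(sim - sim.max())
    w /= w.sum()                                     # softmax attention weights
    return (w[:, None] * kernels).sum(axis=0)        # blended adaptive kernel

def nonlocal_aggregate(feats):
    """Generic non-local block: each position's feature becomes a
    similarity-weighted average over all positions, propagating
    context across the whole feature map."""
    sim = feats @ feats.T                            # (N, N) pairwise affinities
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # row-wise softmax
    return w @ feats                                 # (N, C) aggregated context

rng = np.random.default_rng(0)
keys = rng.standard_normal((8, 16))    # 8 memory slots, 16-d keys (hypothetical)
kernels = rng.standard_normal((8, 9))  # each slot stores a flattened 3x3 kernel
kernel = read_filter_from_memory(rng.standard_normal(16), keys, kernels)

feats = rng.random((6, 4))             # 6 spatial positions, 4 channels
context = nonlocal_aggregate(feats)
```

Because the aggregation weights form a convex combination, each output feature stays within the range of the input features, which is one way such non-local averaging helps stabilize predictions.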
| Original language | English |
|---|---|
| State | Published - 2021 |
| Event | 32nd British Machine Vision Conference, BMVC 2021 - Virtual, Online; Duration: 22 Nov 2021 → 25 Nov 2021 |
Conference
| Conference | 32nd British Machine Vision Conference, BMVC 2021 |
|---|---|
| City | Virtual, Online |
| Period | 22/11/21 → 25/11/21 |
Bibliographical note
Publisher Copyright: © 2021. The copyright of this document resides with its authors.