Transmitting the massive amounts of image and audio data acquired by Internet-of-Everything (IoE) devices to data-center servers for intelligent recognition is impractical for energy reasons, so such data must be processed in situ. However, the algorithms accelerated by previous recognition processors [1, 2] are limited to specific applications; therefore, each IoE device may require its own application-specific accelerator. Deep convolutional neural networks (CNNs), on the other hand, are a promising machine-learning approach, showing state-of-the-art recognition accuracy in a wide variety of applications, including both image and audio recognition. This makes CNNs a suitable candidate for a universal recognition platform for IoE devices, as illustrated in Fig. 14.6.1. Due to the computational complexity and significant memory requirements of CNNs, a microcontroller unit (MCU) of the kind typically used in IoE devices cannot produce a meaningful recognition result in an energy-efficient way. Hence, an energy-efficient CNN processor is needed to realize intelligent IoE systems.