This article presents an ultra-low-power parallel computing platform and its system-on-chip (SoC) embodiment, targeting a wide range of emerging near-sensor processing tasks for Internet of Things (IoT) applications. The proposed SoC achieves an energy efficiency of 193 million operations per second (MOPS) per mW at a throughput of 162 MOPS (32 bits), improving on the first-generation Parallel Ultra-Low-Power (PULP) architecture by factors of 6.4 in performance and 3.2 in energy efficiency.
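
To put the headline figures in context, the following minimal sketch derives two quantities that follow arithmetically from them: the power implied at the reported operating point and the energy per operation. It assumes that the 193 MOPS/mW and 162 MOPS figures refer to the same operating point; the function names are illustrative and do not come from the article.

```python
# Headline figures quoted in the abstract.
EFFICIENCY_MOPS_PER_MW = 193.0  # energy efficiency at the reported operating point
THROUGHPUT_MOPS = 162.0         # throughput (32 bits) at that operating point


def implied_power_mw(throughput_mops: float, efficiency_mops_per_mw: float) -> float:
    """Power in mW implied by a throughput and an efficiency figure.

    Assumption: both figures describe the same operating point.
    """
    return throughput_mops / efficiency_mops_per_mw


def energy_per_op_pj(efficiency_mops_per_mw: float) -> float:
    """Energy per operation in picojoules.

    1 MOPS/mW = 1e6 ops/s / 1e-3 W = 1e9 ops/J,
    so energy per op = 1 / (eff * 1e9) J = 1000 / eff pJ.
    """
    return 1000.0 / efficiency_mops_per_mw


if __name__ == "__main__":
    power = implied_power_mw(THROUGHPUT_MOPS, EFFICIENCY_MOPS_PER_MW)
    energy = energy_per_op_pj(EFFICIENCY_MOPS_PER_MW)
    print(f"Implied power:        {power:.2f} mW")
    print(f"Energy per operation: {energy:.1f} pJ")
```

Under that assumption, the figures correspond to roughly 0.84 mW of power and about 5.2 pJ per operation.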