In recent years, remote health monitoring has become an essential branch of health care, driven by the rapid development of wearable sensor technology. To meet the demands of new, more complex applications while ensuring adequate battery lifetime, wearable sensors have evolved into multi-core systems with advanced power-saving capabilities and additional heterogeneous components. In this paper, we present an approach that applies, in the software (SW) layers, the optimization and parallelization techniques exposed by modern ultra-low-power platforms. Its goal is to improve the mapping of biomedical applications and reduce their energy consumption. Additionally, we investigate the benefit of integrating domain-specific accelerators to further reduce the energy consumption of the most computationally expensive kernels. Using 30-second signal excerpts from two public databases, we apply the proposed optimization techniques to well-known modules of state-of-the-art biomedical benchmarks and to two complete applications. When running a standard 12-lead ECG analysis, the multi-core implementation on a cluster of 8 cores achieves a speed-up of 5.17x and energy savings of 41.6% with respect to single-core wearable sensor designs. We also find that the minimum workload required to benefit from parallelization in a heartbeat classifier corresponds to processing 3-lead ECG signals, which yields a speed-up of 2.96x and energy savings of 19.3%. Moreover, applying power management and memory scaling to the multi-core implementations of the 3-lead beat classifier and the 12-lead ECG analysis yields additional energy savings of up to 7.75% and 16.8%, respectively. Finally, integrating hardware (HW) acceleration raises the overall energy savings of the 12-lead ECG analysis to as much as 51.3%.