According to the World Health Organization, lifestyle-related diseases, such as cardiovascular diseases, are the major cause of mortality worldwide. Accurate and continuous medical supervision is required for the diagnosis and treatment of such diseases. Traditional healthcare delivery systems, however, cannot cope with the resulting increase in healthcare costs and medical management needs. Personal health monitoring systems are poised to offer large-scale and cost-effective solutions to this problem. Wearable, miniaturized, and autonomous wireless sensor nodes that continuously analyze biosignals on the node can provide the ambulatory, long-term, and real-time monitoring that patients require, and enable faster coordination with medical personnel. Because such autonomous nodes have very limited energy resources and wireless transmission is costly, an ultra-low-power (ULP) on-node processing platform for advanced biosignal analysis is crucial. In this thesis, I explore ULP processing architectures for on-node biosignal analysis applications, which commonly carry out moderately complex arithmetic manipulations on single- or multiple-input signals. To achieve energy efficiency while providing sufficient processing capability for advanced biosignal analysis, this thesis exploits near-threshold (near-Vth) computing. Hence, the severe performance degradation and reliability issues that occur at deeply scaled voltages can be avoided. In Chapter 3, I introduce a near-Vth computing single-core architecture consisting of a ULP core, an instruction memory (IM), and a data memory (DM). The ULP core features an instruction set architecture (ISA) customized for biosignal applications. I show that an ISA with a minimal instruction set achieves considerable energy savings compared to state-of-the-art cores when executing biosignal applications (up to 54% compared to an established ISA).
The proposed single-core architecture achieves high energy efficiency for most single-input biosignal analysis applications, since it fully exploits near-Vth computing. However, the single-core architecture achieves only limited voltage scaling, and hence reduced energy efficiency, for most multiple-input biosignal analysis applications, where the computational workload is so high that the single-core architecture cannot attain the required throughput in the near-Vth regime. To alleviate the performance degradation that prevents the single-core architecture from exploiting near-Vth computing, typically for multiple-input biosignal analysis, I propose parallel processing of biosignals on multi-core architectures. To this end, in Chapter 4, a multiple instruction, multiple data (MIMD) multi-core architecture is introduced. The MIMD architecture comprises several ULP cores, individual IMs, and a multi-bank DM shared through a lightweight interconnect between the cores and the DM. I show that parallel processing of multiple-input biosignals leads to better energy efficiency than sequential processing (i.e., on a single core) for moderate and high biosignal workloads. In particular, the MIMD architecture achieves up to 62% power savings with respect to the single-core architecture for high biosignal workloads (i.e., 167 MOps/s). On the other hand, parallel processing of multiple-input biosignals can be penalized at low workloads due to the high leakage power dissipation in multi-core architectures. In particular, the MIMD architecture is less energy efficient than the single-core architecture for workloads lighter than 1.7 MOps/s. One of the major contributors to power dissipation in MIMD architectures is the costly fetching of multiple instructions. To mitigate this issue, I propose data-level parallelism through the single instruction, multiple data (SIMD) paradigm.
To this end, in Chapter 4 a novel hybrid multi-core architecture that supports both SIMD and MIMD operations is introduced. The SIMD operations, coupled with data and instruction broadcasting, enable coordinated multiple accesses to the memories and hence reduce instruction fetch power. Additionally, the hybrid multi-core architecture features partial power gating of memories to achieve leakage power savings, which are vital at low workloads (a few 100 kOps/s). I show that SIMD processing of multiple-input biosignals leads to better energy efficiency than MIMD processing. In particular, when SIMD operations are exploited, the hybrid multi-core architecture achieves up to 45.7% power savings compared to the MIMD architecture for moderate biosignal workloads. I also show that partial power gating of memories is an effective technique to alleviate the leakage issue in multi-core architectures. More specifically, partial power gating of the IM in the hybrid multi-core architecture leads to 38.8% power savings at low workloads. Finally, to handle applications containing program parts that limit SIMD execution (i.e., conditional program parts), I propose to resynchronize the cores whenever synchronization is lost, so that stable lockstep code execution is restored. Hence, SIMD operations can be exploited even for applications with conditional program parts. To this end, in Chapter 4 a lightweight software-directed hardware synchronizer is introduced. I show that for applications with conditional program parts, lockstep SIMD execution achieves up to 64% power savings with respect to elementary SIMD execution at moderate workloads (i.e., 89 MOps/s).