This paper presents the results of design explorations for implementing the Smith-Waterman (S-W) algorithm for DNA and protein sequence alignment. Both the design-exploration studies and the corresponding FPGA implementations are obtained by writing a dynamic dataflow program that implements the algorithm and synthesizing it directly to FPGA HDL through high-level synthesis (HLS). The main feature of the resulting implementation is a low-latency, pipelinable, multistage processing element (PE) that substantially reduces resource utilization and increases computation throughput compared with state-of-the-art solutions. The implementation is also fully scalable and can be efficiently reconfigured according to the DNA sequence sizes and the system performance requirements. The FPGA design presented in the paper runs at up to 250 MHz, achieving 14746 alignments/s with a single S-W core containing 4 PEs, and up to 31.8 mega-alignments/min with 36 S-W cores on the same FPGA, for sequences of 160×100 nucleotides.
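For readers unfamiliar with the S-W recurrence that each PE evaluates, the following is a minimal software sketch of the local-alignment score computation. It is not the dataflow program or the PE design described in the paper. The function name `smith_waterman_score`, the linear gap model, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration only.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two sequences.

    Illustrative sketch only: linear gap penalty and scoring values
    are assumptions, not parameters taken from the paper.
    """
    # Keep only the previous row of the score matrix; row 0 is all zeros.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            # Clamping at zero is what makes the alignment local.
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. This is the property that hardware implementations typically exploit when distributing the computation across several PEs.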