Ultrasound imaging is a reference medical diagnostic technique, thanks to its blend of versatility, effectiveness, and moderate cost. The core computation of all ultrasound imaging methods is based on simple formulae, except for those required to calculate acoustic propagation delays with high precision and throughput. Unfortunately, advanced three-dimensional (3-D) systems require the calculation or storage of billions of such delay values per frame, which is a challenge. In 2-D systems, this requirement can be four orders of magnitude lower, but efficient computation is still crucial in view of low-power implementations that can be battery-operated, enabling usage in numerous additional scenarios. In this paper, we explore two smart designs of the delay generation function. To quantify their hardware cost, we implement them on FPGA and study their footprint and performance. We evaluate how these architectures scale to different ultrasound applications, from a low-power 2-D system to a next-generation 3-D machine. When using numerical approximations, we demonstrate the ability to generate delay values with sufficient throughput to support 10 000-channel 3-D imaging at up to 30 fps while using 63% of a Virtex 7 FPGA, requiring 24 MB of external memory accessed at about 32 GB/s bandwidth. Alternatively, with similar FPGA occupation, we show an exact calculation method that reaches 24 fps on 1225-channel 3-D imaging and does not require external memory at all. Both designs can be scaled to use a negligible amount of resources for 2-D imaging in low-power applications and for ultrafast 2-D imaging at hundreds of frames per second.