3D ultrasound imaging is quickly becoming a reference technique for high-quality, accurate, expressive diagnostic medical imaging. Unfortunately, its computational requirements are huge and today demand expensive, power-hungry, bulky processing resources. A key bottleneck is the receive beamforming operation, which requires applying many permutations of fine-grained delays to the digitized received echoes. Applying these delays in the digital domain in principle requires large tables (billions of coefficients), and the access bandwidth to these tables can reach multiple TB/s, which makes storing them either on-chip or off-chip impractical. However, smarter implementations of the delay generation function are possible, including forgoing the tables altogether. In this paper, we explore efficient strategies to compute the delay function that controls the reconstruction of the image, and we present a feasibility analysis for an FPGA platform.
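
To make the trade-off between a stored table and on-the-fly computation concrete, the following minimal sketch evaluates one receive delay directly from geometry and counts the entries a full lookup table would need. It is an illustration under stated assumptions, not the method developed in this paper: the two-way path model (emission origin to focal point, then focal point to receiving element), the speed of sound of 1540 m/s, the function names, and all numeric parameters in the usage lines are hypothetical.

```python
import math

# Assumed speed of sound in soft tissue (m/s); a common textbook value,
# not a parameter taken from this paper.
SPEED_OF_SOUND = 1540.0


def two_way_delay_samples(focal_point, element, fs_hz,
                          origin=(0.0, 0.0, 0.0), c=SPEED_OF_SOUND):
    """Delay, in samples, for one (focal point, element) pair.

    Assumes a simple geometric model: the pulse travels from `origin`
    to `focal_point`, and the echo travels back to `element`.
    Coordinates are in meters, `fs_hz` is the sampling frequency in Hz.
    """
    tx_path = math.dist(origin, focal_point)   # transmit leg
    rx_path = math.dist(focal_point, element)  # receive leg
    return (tx_path + rx_path) / c * fs_hz


def table_entries(n_focal_points, n_elements):
    """Number of coefficients in a full precomputed delay table:
    one entry per (focal point, element) pair."""
    return n_focal_points * n_elements


# Hypothetical example: focal point 15.4 mm deep on the probe axis,
# receiving element at the origin, 32 MHz sampling.
delay = two_way_delay_samples((0.0, 0.0, 0.0154), (0.0, 0.0, 0.0), 32e6)

# Hypothetical volume of 128 x 128 lines x 1000 depths and a
# 100 x 100 element probe.
entries = table_entries(128 * 128 * 1000, 100 * 100)
```

With these assumed sizes the table would hold on the order of 10^11 entries, which illustrates why computing each delay when it is needed is attractive compared with storing all of them.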