In medical diagnosis, ultrasound (US) imaging is one of the most common, safe, and powerful techniques. Volumetric (3D) US imaging, an emerging technique, is even more attractive than standard 2D imaging, as it allows imaging without a trained sonographer being present to finely position the probe. This would be particularly useful in rescue operations, remote areas, and developing countries. Unfortunately, present-day 3D imagers are expensive, bulky, and power-hungry, which confines them to hospitals. There is therefore a strong motivation to develop efficient electronics for a portable US platform that is small, cheap, and battery-operated.

Beamforming (BF) is the most computationally expensive stage of 3D imaging. Both commercial [1] and research [2] imagers have addressed this challenge by reducing the number of receive channels, which simplifies the computation because far fewer elements must be processed. This comes at the cost of image quality, and the resulting machines are nonetheless still non-portable and expensive. Within BF, the bottleneck is the calculation of acoustic delays, which requires up to trillions of square roots per second.

We propose a drastically more efficient architecture [3]. Using geometric considerations, each delay is calculated from a small set of square roots (mapped onto CORDICs) plus two additions. In this demo, we will show the reconstruction of a 2.5M-voxel volume for a transducer with 32×32 receive channels. We have fitted the architecture into a single Kintex UltraScale KU040 [4], which is unprecedented. We have also extrapolated the resource utilization of an 80×80 instance on a Virtex UltraScale XCVU190 [4]. Table I shows the implementation results. Fig. 1 shows our custom beamformer block connected to the other FPGA subsystems, and Fig. 2 shows the delay calculation architecture. The demo setup is presented in Fig. 3: the 3D beamformer is implemented on the FPGA, while the pre- and post-processing stages are currently performed in MATLAB.