The problem of approximately computing the $k$ dominant Fourier coefficients of a vector $X$ quickly, and using few samples in time domain, is known as the Sparse Fourier Transform (sparse FFT) problem. A long line of work on the sparse FFT has resulted in algorithms with $O(k\log n\log (n/k))$ runtime [Hassanieh \emph{et al.}, STOC'12] and $O(k\log n)$ sample complexity [Indyk \emph{et al.}, FOCS'14]. These results are proved using non-adaptive algorithms, and the latter $O(k\log n)$ sample complexity result is essentially the best possible under the sparsity assumption alone: It is known that even adaptive algorithms must use $\Omega((k\log(n/k))/\log\log n)$ samples [Hassanieh \emph{et al.}, STOC'12]. By {\em adaptive}, we mean being able to exploit previous samples in guiding the selection of further samples. This paper revisits the sparse FFT problem with the added twist that the sparse coefficients approximately obey a $(k_0,k_1)$-block sparse model. In this model, signal frequencies are clustered in $k_0$ intervals with width $k_1$ in Fourier space, and $k= k_0k_1$ is the total sparsity. Signals arising in applications are often well approximated by this model with $k_0\ll k$. Our main result is the first sparse FFT algorithm for $(k_0, k_1)$-block sparse signals with a sample complexity of $O^*(k_0k_1 + k_0\log(1+ k_0)\log n)$ at constant signal-to-noise ratios, and sublinear runtime. A similar sample complexity was previously achieved in the works on {\em model-based compressive sensing} using random Gaussian measurements, but used $\Omega(n)$ runtime. To the best of our knowledge, our result is the first sublinear-time algorithm for model based compressed sensing, and the first sparse FFT result that goes below the $O(k\log n)$ sample complexity bound. Interestingly, the aforementioned model-based compressive sensing result that relies on Gaussian measurements is non-adaptive, whereas our algorithm crucially uses {\em adaptivity} to achieve the improved sample complexity bound. We prove that adaptivity is in fact necessary in the Fourier setting: Any {\em non-adaptive} algorithm must use $\Omega(k_0k_1\log \frac{n}{k_0k_1})$ samples for the $(k_0,k_1$)-block sparse model, ruling out improvements over the vanilla sparsity assumption. Our main technical innovation for adaptivity is a new randomized energy-based importance sampling technique that may be of independent interest.