Interactive free viewpoint video gives each user the possibility to independently choose the views of a 3D scene to be displayed at the decoder. The visual content is commonly represented by N texture and depth map pairs that capture different viewpoints. A server selects an appropriate subset of M ≤ N views for transmission, so that the user can freely navigate in the corresponding window of viewpoints without being affected by network delay. During navigation, a user can synthesize any intermediate virtual view image in the navigation window via depth-image-based rendering (DIBR) using two nearby camera views as references. When the available bandwidth is too small to transmit all the camera views needed to synthesize views in the navigation window, we propose to synthesize intermediate virtual views as new references for transmission (a re-sampling of viewpoints for the 3D scene), so that the synthesized view distortion within the navigation window is minimized. We formulate a combinatorial optimization to find the best set of M virtual views to synthesize as new references, and show that the problem is NP-hard. We approximate the original problem with a new reference view equivalence model and derive, for this approximation, an optimal dynamic programming algorithm to determine the best set of M views to be transmitted to each user. Experimental results show that synthesizing virtual views as new references for client-side view synthesis can outperform simple selection from camera views by up to 0.73 dB in synthesized view quality.
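To illustrate the kind of dynamic programming recursion that becomes possible once the problem is approximated, the following Python sketch selects M references from K candidate viewpoints on a line. It is not the authors' algorithm. It assumes, hypothetically, that candidate positions are discretized and indexed 0 to K-1, that both ends of the navigation window must be references, and that the distortion of the virtual views lying between two consecutive references depends only on that pair, through a user-supplied function seg_cost(i, j). The names select_references and seg_cost, and the toy quadratic cost in the usage example, are illustrative only.

```python
import math


def select_references(num_candidates, m, seg_cost):
    """Pick m reference positions out of num_candidates (indexed 0..K-1).

    Hypothetical model: positions 0 and K-1 (the navigation window ends)
    are always references, and the distortion of views between two
    consecutive references i < j is seg_cost(i, j), independent of the rest.
    Returns (total_distortion, sorted list of reference indices).
    """
    k = num_candidates
    if m < 2 or m > k:
        raise ValueError("need 2 <= m <= num_candidates")
    inf = math.inf
    # best[c][j]: min distortion covering positions 0..j with c references,
    # the last of which sits at position j.
    best = [[inf] * k for _ in range(m + 1)]
    parent = [[-1] * k for _ in range(m + 1)]
    best[1][0] = 0.0  # the left window end is the first reference
    for c in range(2, m + 1):
        for j in range(1, k):
            for i in range(j):
                if best[c - 1][i] == inf:
                    continue
                cand = best[c - 1][i] + seg_cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    parent[c][j] = i
    # Backtrack from the right window end to recover the chosen references.
    refs, j, c = [], k - 1, m
    while j != -1:
        refs.append(j)
        j = parent[c][j]
        c -= 1
    refs.reverse()
    return best[m][k - 1], refs


if __name__ == "__main__":
    # Toy cost (assumption): distortion grows with the squared baseline.
    cost, refs = select_references(7, 3, lambda i, j: (j - i) ** 2)
    print(cost, refs)
```

With the toy cost above, the middle reference lands at the center of the window. The table best[c][j] is filled with O(M·K²) evaluations of seg_cost, so the search stays polynomial once distortion decomposes over pairs of consecutive references.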