In this paper, we present a particle filter that exploits multimodal information for robust target tracking. We demonstrate a Bayesian framework for combining acoustic and video information using a state-space approach. We give a proposal strategy for joint acoustic and video state-space tracking with particle filters that places the random support of the joint filter where the final posterior is likely to lie. Using the Kullback-Leibler divergence, we show that the posterior estimate of the joint filter decreases the worst-case divergence of the individual modalities. Hence, the joint tracking filter is robust against video and acoustic occlusions. We also introduce a time-delay variable into the joint state space to handle the acoustic-video data synchronization problem caused by acoustic propagation delay. Computer simulations with field and synthetic data demonstrate the filter’s performance.