Traditional media such as text, images, audio, and video have long been the main media resources and are fully supported by standard desktop tools and applications. Interactive rich multimedia documents, which add resources such as video or synthetic animations and rely on complex synchronization among objects, are now appearing as new multimedia formats emerge. In this context, the Synchronized Multimedia Integration Language (SMIL) is receiving increasing attention from content authors because of its strong support for multimedia synchronization and interactive authoring in content production. At the same time, MPEG-4 is designed to address the requirements of a new generation of highly interactive multimedia applications while maintaining support for traditional applications. MPEG-4 provides facilities (XMT and BIFS) to integrate and synchronize, spatially and temporally, many different media objects. However, these facilities lack appropriate authoring tools, which narrows their audience and in turn limits their application. In this paper, we present a comparative analysis of SMIL and XMT, the textual description format of MPEG-4, to illustrate the pros and cons of these two major interactive media formats. We then propose a conversion scheme from SMIL to the Binary Format for Scenes (BIFS) of MPEG-4 to take advantage of both formats. Based on this scheme, we design a practical implementation using currently available tools and discuss the purpose and significance of such a conversion.