Abstract

The development and progress of high-throughput sequencing technologies have transformed the sequencing of DNA from a scientific research challenge to practice. With the release of the latest generation of sequencing machines, the cost of sequencing a whole human genome has dropped to less than $600. Such achievements open the door to personalized medicine, where it is expected that genomic information of patients will be analyzed as a standard practice. However, the associated costs, related to storing, transmitting, and processing the large volumes of data, are already comparable to the costs of sequencing. To support the design of new and interoperable solutions for the representation, compression, and management of genomic sequencing data, the Moving Picture Experts Group (MPEG) jointly with working group 5 of ISO/TC276 "Biotechnology" has started to produce the ISO/IEC 23092 series, known as MPEG-G. MPEG-G does not only offer higher levels of compression compared with the state of the art but it also provides new functionalities, such as built-in support for random access in the compressed domain, support for data protection mechanisms, flexible storage, and streaming capabilities. MPEG-G only specifies the decoding syntax of compressed bitstreams, as well as a file format and a transport format. This allows for the development of new encoding solutions with higher degrees of optimization while maintaining compatibility with any existing MPEG-G decoder.

Details