A Gaussian Source Coding Perspective on Caching and Total Correlation

Communication technology has advanced to the point where children are becoming unfamiliar with the most iconic symbol in IT: the loading icon. We no longer wait for something to come on TV, nor for a download to complete. All the content we desire is available in instantaneous and personalized streams. Whereas users benefit tremendously from this increased freedom, the network suffers. Not only do personalized data streams increase the overall load, the instantaneous aspect concentrates traffic around peak hours; the heaviest (mostly video) applications are used predominantly during the evening. Caching is a tool to balance traffic without compromising the ‘on-demand’ aspect of content delivery: by sending data in advance, a server can avoid peak traffic. The challenge, of course, is that in advance the server has no clue what data the user might be interested in. We study this problem in a lossy source coding setting with Gaussian sources specifically, using a model based on the Gray–Wyner network. Ultimately, caching is a trade-off between anticipating the precise demand through user habits and getting ‘more bang for the buck’ by exploiting correlation among the files in the database. For two Gaussian sources and using Gaussian codebooks we derive this trade-off completely. Particularly interesting is the case where the user has no a priori preference for any content; caching then becomes an application of Wyner’s common information and Watanabe’s total correlation. We study these concepts in databases of more than two sources, where we show that caching all of the information shared by multiple Gaussians is easy, whereas caching only some of it is hard. We characterize the former, provide an inner bound for the latter, and conjecture for which class of Gaussians it is tight. We also study how to most efficiently capture the total correlation that exists between two sets of Gaussians.
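As a concrete illustration of the quantities named above (a numeric sketch, not part of the results themselves): for a bivariate Gaussian pair with correlation coefficient ρ, Wyner’s common information has the closed form ½ log((1+ρ)/(1−ρ)), and the total correlation, which for two variables is just the mutual information, is −½ log(1−ρ²). More generally, the total correlation of a unit-variance Gaussian vector with correlation matrix R is −½ log det R. A minimal computation, with function names chosen here for illustration:

```python
import numpy as np

def wyner_common_information(rho):
    """Wyner's common information (in nats) of a bivariate Gaussian
    pair with correlation coefficient rho (closed-form expression)."""
    return 0.5 * np.log((1 + rho) / (1 - rho))

def total_correlation(R):
    """Total correlation (in nats) of a zero-mean Gaussian vector with
    unit variances and correlation matrix R:
    TC = sum_i h(X_i) - h(X) = -1/2 * log det(R)."""
    return -0.5 * np.log(np.linalg.det(np.asarray(R, dtype=float)))

rho = 0.5
print(wyner_common_information(rho))            # 0.5 * log(3)  ≈ 0.549 nats
print(total_correlation([[1, rho], [rho, 1]]))  # -0.5 * log(0.75) ≈ 0.144 nats
```

Note that the common information (what must be cached to capture everything the two sources share) strictly exceeds the mutual information (how much they actually share), which is one way to see why caching all of the shared information behaves differently from caching some of it.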
As a final chapter, we study the applicability of caching to discrete information sources by actually building such algorithms, using convolutional codes to ‘cache and compress’. We provide a proof of concept of their practicality for doubly symmetric and circularly symmetric binary sources. Lastly, we discuss the challenges to be overcome in generalizing such algorithms.

Related material