E-Scan: Consuming Contextual Data with Model Plugins

Ailamaki, Anastasia

conference paper

Sanca, Viktor

•

Ailamaki, Anastasia

August 1, 2023

Joint Workshops at 49th International Conference on Very Large Data Bases (VLDBW’23)

Second International Workshop on Composable Data Management Systems (CDMS’23)

Extracting value and insights from increasingly heterogeneous data sources involves multiple systems combining and consuming the data. With multi-modal and context-rich data such as strings, text, videos, or images, the problem of standardizing the data model and format for interchangeable use is further exacerbated by a non-uniform way of processing, extracting, and preserving content and context from the data. This makes the data movement, reuse, and exchange between different systems a non-composable, manual process. On the other hand, increasingly powerful and popular machine learning-driven data representation models map the input data into uniform high-dimensional vector embeddings for further processing, informed by particular models. However, using models is expensive, and the manual integration effort might exacerbate unnecessary costs. Thus, we propose E-Scan, a contextual data exchange plugin for using, exchanging, and caching context-rich data. We outline the need for a common interface that separates the concerns and allows smooth and cost-effective data exchange. First, while vector embeddings are context-less, the model information is saved to preserve the context and preprocessing steps. Next, a lightweight vector engine caches and stores the uniform intermediate data representation in a lazy way to lower the transformation and data access, exchange, and retrieval cost. Finally, a pull-based interface allows uniform data consumption between components under a common plugin interface. This way, various context-rich data types are stored, processed, and exchanged in a standardized way while allowing plugin-based customization for subsequent context interpretation.

Name

E_Scan_Consuming_Contextual_Data_with_Model_Plugins_CR.pdf

Type

Postprint

Version

http://purl.org/coar/version/c_ab4af688f83e57aa

Access type

openaccess

License Condition

CC BY

Size

457.8 KB

Format

Adobe PDF

Checksum (MD5)

416acf0c82e531d70c50c10d8ce31dd9