Infoscience

conference paper

Efficient Lineage Tracking for Scientific Workflows

Heinis, Thomas • Alonso, Gustavo
2008
Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD'08)
SIGMOD '08

Data lineage and data provenance are key to the management of scientific data. Not knowing the exact provenance and processing pipeline used to produce a derived data set often renders the data set useless from a scientific point of view. On the positive side, capturing provenance information is facilitated by the widespread use of workflow tools for processing scientific data. The workflow process describes all the steps involved in producing a given data set and, hence, captures its lineage. On the negative side, efficiently storing and querying workflow-based data lineage is not trivial. All existing solutions use recursive queries and even recursive tables to represent the workflows. Such solutions do not scale and are rather inefficient.

In this paper we propose an alternative approach to storing lineage information captured as a workflow process. We use a space- and query-efficient interval representation for dependency graphs and show how to transform arbitrary workflow processes into graphs that can be stored using such a representation. We also characterize the problem in terms of its overall complexity and provide a comprehensive performance evaluation of the approach.
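
To make the idea of an interval representation concrete, the following is a minimal sketch of generic interval labelling on a tree-shaped dependency graph. It is not the algorithm from the paper: the abstract does not give those details, and the step that transforms arbitrary workflows into storable graphs is not shown here. The function names `label_intervals`, `is_ancestor` and `lineage`, and the example data set names, are hypothetical. The point is only that once every node carries an interval enclosing the intervals of everything it depends on, a lineage query becomes a containment test rather than a recursive traversal.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def label_intervals(deps: Dict[str, List[str]], root: str) -> Dict[str, Interval]:
    """Assign each node a (start, end) interval via depth-first traversal.

    Assumption: `deps` describes a tree (each node is reached exactly once).
    A node's interval strictly encloses the intervals of all nodes below it.
    """
    intervals: Dict[str, Interval] = {}
    start: Dict[str, int] = {}
    counter = 0
    # Iterative DFS; the boolean marks whether the node's subtree is finished.
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            intervals[node] = (start[node], counter)
        else:
            start[node] = counter
            stack.append((node, True))
            for child in reversed(deps.get(node, [])):
                stack.append((child, False))
        counter += 1
    return intervals


def is_ancestor(intervals: Dict[str, Interval], a: str, d: str) -> bool:
    """True if d lies below a, decided by interval containment alone."""
    a_start, a_end = intervals[a]
    d_start, d_end = intervals[d]
    return a_start < d_start and d_end < a_end


def lineage(intervals: Dict[str, Interval], item: str) -> List[str]:
    """All nodes that `item` depends on, found without recursion."""
    return sorted(n for n in intervals if is_ancestor(intervals, item, n))


# Hypothetical example: each data set maps to the data sets it was derived from.
deps = {
    "result": ["filtered", "calibration"],
    "filtered": ["raw"],
}
intervals = label_intervals(deps, "result")
print(lineage(intervals, "result"))
print(lineage(intervals, "filtered"))
```

In a relational setting, the same containment test could be written as a range predicate over two integer columns, which is what would let it avoid recursive queries. That mapping is an illustration of the general technique, not a description of the authors' schema.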

Type
conference paper
DOI
10.1145/1376616.1376716
Author(s)
Heinis, Thomas  
Alonso, Gustavo
Date Issued
2008
Publisher
ACM
Published in
Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD'08)
ISBN of the book
978-1-60558-102-6
Start page
1007
End page
1018
Editorial or Peer reviewed
NON-REVIEWED
Written at
OTHER
EPFL units
DIAS  
Event name
SIGMOD '08
Event place
Vancouver, Canada
Event date
June
Available on Infoscience
September 7, 2009
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/42476
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved