A Provenance Framework to Capture, Store, Query, and Browse Data Lineage in Kepler
Manish Anand

Citation
Manish Anand. "A Provenance Framework to Capture, Store, Query, and Browse Data Lineage in Kepler". Talk or presentation, 16, April, 2009; Presented at the 8th Biennial Ptolemy Miniconference.

Abstract
Many scientific workflow systems can record provenance (i.e., the processing history) of workflow runs. An important use of provenance information is to support reproducibility and verification of scientific results. These goals require that data lineage information be captured as part of workflow provenance. However, for complex workflows and large data sets, provenance information, including lineage, can grow quickly, and its storage can be prohibitively expensive. Thus, it is important to provide efficient techniques for storing as well as querying provenance information. We describe a provenance framework implemented within Kepler that can be used to record and store provenance of workflow runs. In addition, our framework provides techniques that significantly reduce the storage size of the provenance store, a declarative language to query provenance and access data lineage information, and a visualization tool to display and browse relevant provenance views, including data and process dependency graphs. We demonstrate our approach over real-world bioinformatics workflows.
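
To make the notion of data lineage in the abstract concrete, the following is a minimal sketch, not the framework presented in the talk. It assumes a toy in-memory store in which each workflow step reports the data items it read and wrote; the class `LineageStore`, its methods, and the step and data names are all hypothetical and do not correspond to Kepler's API or to the talk's query language. A lineage query is then a transitive traversal of the recorded dependencies.

```python
from collections import defaultdict


class LineageStore:
    """Toy in-memory provenance store (hypothetical; not Kepler's API)."""

    def __init__(self):
        # data item -> set of data items it was directly derived from
        self.parents = defaultdict(set)
        # data item -> name of the workflow step that produced it
        self.producer = {}

    def record(self, step, inputs, outputs):
        """Record that one invocation of `step` read `inputs` and wrote `outputs`."""
        for out in outputs:
            self.producer[out] = step
            self.parents[out].update(inputs)

    def lineage(self, item):
        """Return every data item that `item` transitively depends on."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            # .get avoids creating empty entries for source data items
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative three-step run; step and data names are made up.
store = LineageStore()
store.record("align", inputs=["seq1", "seq2"], outputs=["alignment"])
store.record("build_tree", inputs=["alignment"], outputs=["tree"])
store.record("render", inputs=["tree", "style"], outputs=["figure"])

print(sorted(store.lineage("figure")))
```

Storing only direct dependencies and computing the transitive closure at query time keeps the store small at the cost of slower queries; this is one simple point in the storage-versus-query trade-off the abstract alludes to, not necessarily the one the framework adopts.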

Citation formats  
  • HTML
    Manish Anand. <a
    href="http://chess.eecs.berkeley.edu/pubs/557.html"
    ><i>A Provenance Framework to Capture, Store,
    Query, and Browse Data Lineage in
    Kepler</i></a>, Talk or presentation,  16,
    April, 2009; Presented at the 8th Biennial Ptolemy
    Miniconference.
  • Plain text
    Manish Anand. "A Provenance Framework to Capture,
    Store, Query, and Browse Data Lineage in Kepler". Talk
    or presentation,  16, April, 2009; Presented at the 8th
    Biennial Ptolemy Miniconference.
  • BibTeX
    @presentation{Anand09_ProvenanceFrameworkToCaptureStoreQueryBrowseDataLineage,
        author = {Manish Anand},
        title = {A Provenance Framework to Capture, Store, Query,
                  and Browse Data Lineage in Kepler},
        day = {16},
        month = {April},
        year = {2009},
        note = {Presented at the 8th Biennial Ptolemy
                  Miniconference},
        abstract = {Many scientific workflow systems can record
                  provenance (i.e., the processing history) of
                  workflow runs. An important use of provenance
                  information is to support reproducibility and
                  verification of scientific results. These goals
                  require that data lineage information be captured
                  as part of workflow provenance. However, for
                  complex workflows and large data sets, provenance
                  information including lineage can grow quickly and
                  its storage can be prohibitively expensive. Thus,
                  it is important to provide efficient techniques
                  for storing as well as querying provenance
                  information. We describe a provenance framework
                  implemented within Kepler that can be used to
                  record and store provenance of workflow runs. In
                  addition, our framework provides techniques that
                  significantly reduce the storage size of the
                  provenance store, a declarative language to query
                  provenance and access data lineage information,
                  and a visualization tool to display and browse
                  relevant provenance views including data and
                  process dependency graphs. We demonstrate our
                  approach over real-world bioinformatics workflows. },
        URL = {http://chess.eecs.berkeley.edu/pubs/557.html}
    }
    

Posted by Christopher Brooks on 17 Apr 2009.
