 

Optimizing Comad for growing input data sets
Sean Riddle, Amber Hartman, Timothy McPhillips, Shawn Bowers, David Welker, Bertram Ludaescher

Citation
Sean Riddle, Amber Hartman, Timothy McPhillips, Shawn Bowers, David Welker, Bertram Ludaescher. "Optimizing Comad for growing input data sets". Talk or presentation, 16, April, 2009; Poster presented at the 8th Biennial Ptolemy Miniconference.

Abstract
I demonstrate performance gains that can be realized in real-world usage of the collection-oriented modeling and design (Comad) paradigm of workflow design in the microbial ecology domain. Comad is a PN-based workflow modeling domain in which actors work on a stream of nested data collections (akin to XML), reading and updating the data stream only at declaratively specified locations (we use XPath-like expressions to define actor scopes). The resulting modeling approach often leads to conceptually simpler process pipelines. The user runs the microbial ecology workflow multiple times over the course of several months on an increasingly large dataset that includes the data input to previous runs, and intermediate results that have already been computed are intelligently reused. If the final dataset were run with no previously computed intermediate results, the run would take slightly over one month. For efficient scalability, token data are stored in a database and only references are passed. Previous Comad workflows were characterizable as directed acyclic graphs; once they branched, the branches could not merge again. In certain instances, parallel processing can be achieved only by splitting the single pipeline into multiple branches and merging after the explicitly parallelized section. In addition to support for branch merging, alterations were made to the Comad variant of the PN director to ensure that these branches execute independently. These optimizations result in conservative use of memory and computational resources, allowing the use of the workflow at real-world scales.
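Two of the mechanisms described above, actors that touch the token stream only at declaratively specified XPath-like scopes and the reuse of previously computed intermediate results, can be illustrated with a small sketch. The Python below is purely hypothetical and is not the Kepler/Comad implementation: the Collection, ScopedActor, and matches names are invented for illustration, and a real deployment would store token data in a database and cache references to stored values rather than in-memory results.

    # Hypothetical sketch (not the Kepler/Comad API): an actor that operates on a
    # nested collection stream only at an XPath-like scope, with a cache that lets
    # repeated runs over a growing dataset reuse earlier intermediate results.

    import hashlib
    import json


    class Collection:
        """A nested data collection, loosely analogous to an XML element."""
        def __init__(self, tag, children=None, data=None):
            self.tag = tag
            self.children = children or []
            self.data = data

        def walk(self, path=()):
            """Yield (path, node) pairs for this node and all descendants."""
            here = path + (self.tag,)
            yield here, self
            for child in self.children:
                yield from child.walk(here)


    def matches(scope, path):
        """Minimal stand-in for an XPath-like scope test, e.g. '/Project/Sample'."""
        return "/" + "/".join(path) == scope


    class ScopedActor:
        """Applies `func` only to nodes inside its declared scope; results are
        cached by a content hash so data already processed in a previous run is
        not recomputed (a real system would cache references to stored tokens)."""
        def __init__(self, scope, func, cache=None):
            self.scope = scope
            self.func = func
            self.cache = cache if cache is not None else {}

        def process(self, root):
            for path, node in root.walk():
                if matches(self.scope, path):
                    key = hashlib.sha1(
                        json.dumps(node.data, sort_keys=True).encode()).hexdigest()
                    if key not in self.cache:      # reuse intermediate results
                        self.cache[key] = self.func(node.data)
                    node.data = self.cache[key]
            return root


    # Example: process only /Project/Sample nodes, leaving the rest of the stream untouched.
    tree = Collection("Project", [
        Collection("Sample", data=["ACGT", "ACGA"]),
        Collection("Metadata", data={"site": "soil-1"}),
    ])
    actor = ScopedActor("/Project/Sample", lambda seqs: sorted(seqs))
    actor.process(tree)

Passing the same cache object to the actor across runs mimics, at a toy scale, the behavior described in the abstract: when the input dataset grows but still contains previously processed data, only the new portions incur computation.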

Citation formats  
  • HTML
    Sean Riddle, Amber Hartman, Timothy McPhillips, Shawn
    Bowers, David Welker, Bertram Ludaescher. <a
    href="http://chess.eecs.berkeley.edu/pubs/566.html"
    ><i>Optimizing Comad for growing input data
    sets</i></a>, Talk or presentation,  16, April,
    2009; Poster presented at the 8th Biennial Ptolemy
    Miniconference.
  • Plain text
    Sean Riddle, Amber Hartman, Timothy McPhillips, Shawn
    Bowers, David Welker, Bertram Ludaescher. "Optimizing
    Comad for growing input data sets". Talk or
    presentation,  16, April, 2009; Poster presented at the 8th
    Biennial Ptolemy Miniconference.
  • BibTeX
    @presentation{RiddleHartmanMcPhillipsBowersWelkerLudaescher09_OptimizingComadForGrowingInputDataSets,
        author = {Sean Riddle and Amber Hartman and Timothy
                  McPhillips and Shawn Bowers and David Welker and
                  Bertram Ludaescher},
        title = {Optimizing Comad for growing input data sets},
        day = {16},
        month = {April},
        year = {2009},
        note = {Poster presented at the 8th Biennial Ptolemy
                  Miniconference.},
        abstract = {I demonstrate performance gains that can be
                  realized in the real-world usage of the
                  collection-oriented modeling and design (Comad)
                  paradigm of workflow design in the microbial
                  ecology domain. Comad is a PN-based workflow
                  modeling domain in which actors work on a stream
                  of nested data collections (akin to XML), reading
                  and updating the data stream only at declaratively
                  specified locations (we use XPath-like expressions
                  to define actor scopes). The resulting modeling
                  approach often leads to conceptually simpler
                  process pipelines. The user runs the microbial
                  ecology workflow multiple times over the course of
                  several months over an increasingly large dataset
                  that includes data input to previous runs, and
                  intermediate results that have already been
                  computed will be intelligently reused. If the
                  final dataset were run with no previously computed
                  intermediate results, it would take slightly over
                  one month. For efficient scalability, token data
                  are stored in a database and only references are
                  passed. Previous Comad workflows were
                  characterizable as directed acyclic graphs; if
                  they branched, they could not come together again.
                  In certain instances, parallel processing can be
                  achieved only by splitting the single pipeline
                  into multiple branches and merging after the
                  explicitly parallelized section. In addition to
                  support for branch merging, alterations were made
                  to the Comad variant of the PN director to ensure
                  that these branches execute independently. These
                  optimizations result in conservative use of memory
                  and computational resources, allowing the use of
                  the workflow at real-world scales. },
        URL = {http://chess.eecs.berkeley.edu/pubs/566.html}
    }
    

Posted by Christopher Brooks on 17 Apr 2009.
Groups: ptolemy
For additional information, see the Publications FAQ or contact webmaster at chess eecs berkeley edu.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

©2002-2018 Chess