Kepler/G-Pack: A Kepler Package Using the Google Cloud for Interactive Scientific Workflows
Gongjing Cao, Lei Dou, Quinn Hart, Bertram Ludaescher

Citation
Gongjing Cao, Lei Dou, Quinn Hart, Bertram Ludaescher. "Kepler/G-Pack: A Kepler Package Using the Google Cloud for Interactive Scientific Workflows". Talk or presentation, 16 February 2011; presented at the Ninth Biennial Ptolemy Miniconference, Berkeley, CA.

Abstract
Scientific workflows often aim to fully automate complex data analysis pipelines. However, depending on the nature of the scientific workflow, interactive steps, i.e., steps that involve human analysis and decision making within the workflow, are also common. So far, Kepler has had limited capabilities for letting users become "actors" in a scientific workflow. The Kepler/G-Pack (Google package) is a set of actors that leverage a simple form of cloud computing via Google-Apps and Google-Docs, effectively "outsourcing" Kepler tasks to Google resources. With this package, many tasks and steps for which users would use the Google document and computational cloud can now be orchestrated programmatically by Kepler. For example, via certain actors in this package, Google spreadsheets can be used as data sources, sinks, or computational steps ("transformers") in a Kepler workflow. Additional functionality includes creating a new copy of a spreadsheet from a template, sharing a spreadsheet with another user, emailing that user a notification of the sharing, providing feedback to the workflow that a human interactive session has finished ("committed") so that the workflow instance may proceed, and visualizing analysis outputs via Google charts. We will demonstrate the capabilities of the Kepler/G-Pack using different workflows, e.g., (1) a "Data Curation Workflow" in which biological specimen records are semi-automatically curated: the upstream part of the workflow groups records, identifying "curation work packages" that curators can validate or manually revise (this workflow shows the use of spreadsheets for data fusion and manual data cleaning within the workflow); and (2) an "Evapotranspiration Workflow", which shows how Google spreadsheets can be used for remote computation and visualization.
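
The source, transformer, and sink roles described in the abstract, together with the "committed" signal that lets a workflow instance proceed, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Sheet` class is an in-memory stand-in for a shared spreadsheet, and the function names are hypothetical. None of it is the Kepler/G-Pack actor API or any Google API.

```python
from dataclasses import dataclass, field


@dataclass
class Sheet:
    """In-memory stand-in for a shared spreadsheet (hypothetical, not a Google API)."""
    rows: list = field(default_factory=list)
    committed: bool = False  # set once the human curator finishes editing


def source(sheet):
    """Source role: emit each row of the sheet as a token."""
    yield from sheet.rows


def transformer(rows):
    """Transformer role: a computational step, here tidying species names."""
    for row in rows:
        yield {**row, "species": row["species"].strip().capitalize()}


def sink(sheet, rows):
    """Sink role: write the resulting rows into a sheet."""
    sheet.rows = list(rows)


def run_workflow(work_package, result):
    """Proceed only after the curator has committed the work package."""
    if not work_package.committed:
        return False  # the workflow instance keeps waiting for the human step
    sink(result, transformer(source(work_package)))
    return True


work_package = Sheet(rows=[{"id": 1, "species": "  quercus alba "}])
result = Sheet()

first_attempt = run_workflow(work_package, result)   # curator has not committed yet
work_package.committed = True                        # curator signals "committed"
second_attempt = run_workflow(work_package, result)  # workflow can now proceed
```

In this sketch the first call returns without touching `result` because the curator has not yet committed; only the second call runs the source, transformer, and sink chain. A real deployment would presumably poll or be notified by the remote spreadsheet rather than check a local flag.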

Citation formats  
  • HTML
    Gongjing Cao, Lei Dou, Quinn Hart, Bertram Ludaescher. <a
    href="http://chess.eecs.berkeley.edu/pubs/814.html"><i>Kepler/G-Pack:
    A Kepler Package Using the Google Cloud for Interactive
    Scientific Workflows</i></a>, Talk or
    presentation, 16 February 2011; presented at the <a
    href="http://ptolemy.eecs.berkeley.edu/conferences/11"
    >Ninth Biennial Ptolemy Miniconference</a>,
    Berkeley, CA.
  • Plain text
    Gongjing Cao, Lei Dou, Quinn Hart, Bertram Ludaescher.
    "Kepler/G-Pack: A Kepler Package Using the Google Cloud
    for Interactive Scientific Workflows". Talk or
    presentation, 16 February 2011; presented at the Ninth
    Biennial Ptolemy Miniconference, Berkeley, CA.
  • BibTeX
    @presentation{CaoDouHartLudaescher11_KeplerGPackKeplerPackageUsingGoogleCloudForInteractive,
        author = {Gongjing Cao and Lei Dou and Quinn Hart and
                  Bertram Ludaescher},
        title = {Kepler/G-Pack: A Kepler Package Using the Google
                  Cloud for Interactive Scientific Workflows},
        day = {16},
        month = {February},
        year = {2011},
        note = {Presented at the Ninth Biennial Ptolemy
                  Miniconference, Berkeley, CA.},
        abstract = {Scientific workflows often aim at fully automating
                  complex data analysis pipelines. However,
                  depending on the nature of the scientific
                  workflow, interactive steps, i.e., steps that
                  involve human analysis and decision making within
                  the workflow, are also common. So far, Kepler has had
                  limited capabilities for letting users become
                  "actors" in a scientific workflow. The
                  Kepler/G-Pack (Google package) is a set of actors
                  that leverage a simple form of cloud computing via
                  Google-Apps and Google-Docs, effectively
                  "outsourcing" Kepler tasks to Google resources.
                  With this package, many tasks and steps that users
                  would use the Google document and computational
                  cloud for can now be orchestrated by Kepler
                  programmatically. For example, via certain actors
                  in this package, Google spreadsheets can be used
                  as data sources, sinks, or computational steps
                  ("transformers") in a Kepler workflow. Additional
                  functionality includes creating a new copy of a
                  spreadsheet from a template, sharing a spreadsheet
                  with another user, emailing that user a notification
                  of the sharing, providing feedback to the workflow
                  that a human interactive session has finished
                  ("committed") so that the workflow instance may
                  proceed, and visualizing analysis outputs via Google
                  charts. We will demonstrate the capabilities
                  of the Kepler/G-Pack using different workflows,
                  e.g., (1) a "Data Curation Workflow" in which
                  biological specimen records are semi-automatically
                  curated: The upstream part of the workflow groups
                  records, identifying "curation work packages" that
                  curators can validate or manually revise (this
                  workflow shows the use of spreadsheets for data
                  fusion and manual data cleaning within the
                  workflow); (2) an "Evapotranspiration Workflow"
                  which shows how Google spreadsheets can be used
                  for remote computation and visualization.},
        URL = {http://chess.eecs.berkeley.edu/pubs/814.html}
    }
    

Posted by Christopher Brooks on 18 Feb 2011.
Groups: ptolemy

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.