Data Flow in the Data Center

Adam Cataldo

Citation
Adam Cataldo. "Data Flow in the Data Center". Talk or presentation, 7 November 2013; Presented at the 10th Biennial Ptolemy Miniconference, Berkeley.

Abstract
In this talk, I will give an overview of Cascading, a data-flow abstraction used to generate MapReduce jobs on Hadoop. I’ll explain how we’re using Cascading to drive analytics at Wealthfront, the world’s largest software-based financial advisor. Then I'll dive into some of the technical problems we had to solve to map Cascading into our workflows. In particular, I'll describe how we move data back and forth between our Cascading cluster and other production systems, how we test our Cascading jobs for correctness, and how we tune for performance to meet real-time constraints.

Electronic downloads

Cataldo_DataFlowInTheDataCenter_PtolemyMConf2013.pdf · application/pdf · 3018 kbytes

Citation formats

HTML

Adam Cataldo. <a
href="http://chess.eecs.berkeley.edu/pubs/1029.html"><i>Data
Flow in the Data Center</i></a>, Talk or
presentation, 7 November 2013; Presented at the <a
href="http://ptolemy.org/conferences/13">10th
Biennial Ptolemy Miniconference</a>, Berkeley.

Plain text

Adam Cataldo. "Data Flow in the Data Center". Talk
or presentation, 7 November 2013; Presented at the 10th
Biennial Ptolemy Miniconference, Berkeley.

BibTeX

@presentation{Cataldo13_DataFlowInDataCenter,
    author = {Adam Cataldo},
    title = {Data Flow in the Data Center},
    day = {7},
    month = {November},
    year = {2013},
    note = {Presented at the 10th Biennial Ptolemy
              Miniconference, Berkeley
              (http://ptolemy.org/conferences/13).},
    abstract = {In this talk, I will give an overview of
              Cascading, a data-flow abstraction used to
              generate MapReduce jobs on Hadoop. I'll explain
              how we're using Cascading to drive analytics at
              Wealthfront, the world's largest software-based
              financial advisor. Then I'll dive into some of the
              technical problems we had to solve to map
              Cascading into our workflows. In particular, I'll
              describe how we move data back and forth between
              our Cascading cluster and other production
              systems, how we test our Cascading jobs for
              correctness, and how we tune for performance to
              meet real-time constraints.},
    URL = {http://chess.eecs.berkeley.edu/pubs/1029.html}
}

Posted by Barb Hoversten on 16 Nov 2013.
Groups: chess

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.