
Data Flow in the Data Center
Adam Cataldo

Citation
Adam Cataldo. "Data Flow in the Data Center". Talk or presentation, 7 November 2013; presented at the 10th Biennial Ptolemy Miniconference, Berkeley.

Abstract
In this talk, I will give an overview of Cascading, a data-flow abstraction used to generate MapReduce jobs on Hadoop. I’ll explain how we’re using Cascading to drive analytics at Wealthfront, the world’s largest software-based financial advisor. Then I'll dive into some of the technical problems we had to solve to map Cascading into our workflows. In particular, I'll describe how we move data back and forth between our Cascading cluster and other production systems, how we test our Cascading jobs for correctness, and how we tune for performance to meet real-time constraints.

Citation formats  
  • HTML
    Adam Cataldo. <a
    href="http://chess.eecs.berkeley.edu/pubs/1029.html"><i>Data
    Flow in the Data Center</i></a>. Talk or
    presentation, 7 November 2013; presented at the <a
    href="http://ptolemy.org/conferences/13">10th
    Biennial Ptolemy Miniconference</a>, Berkeley.
  • Plain text
    Adam Cataldo. "Data Flow in the Data Center". Talk
    or presentation, 7 November 2013; presented at the 10th
    Biennial Ptolemy Miniconference, Berkeley.
  • BibTeX
    @presentation{Cataldo13_DataFlowInDataCenter,
        author = {Adam Cataldo},
        title = {Data Flow in the Data Center},
        day = {7},
        month = {November},
        year = {2013},
        note = {Presented at the 10th Biennial Ptolemy Miniconference, Berkeley.},
        abstract = {In this talk, I will give an overview of
                  Cascading, a data-flow abstraction used to
                  generate MapReduce jobs on Hadoop. I’ll explain
                  how we’re using Cascading to drive analytics at
                  Wealthfront, the world’s largest software-based
                  financial advisor. Then I'll dive into some of the
                  technical problems we had to solve to map
                  Cascading into our workflows. In particular, I'll
                  describe how we move data back and forth between
                  our Cascading cluster and other production
                  systems, how we test our Cascading jobs for
                  correctness, and how we tune for performance to
                  meet real-time constraints.},
        URL = {http://chess.eecs.berkeley.edu/pubs/1029.html}
    }
    

Posted by Barb Hoversten on 16 Nov 2013.
Groups: chess

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.
