Talks by Cloudera Engineers at SouJava

On April 12, 2014, we were joined by Cloudera developers and Apache committers Aaron Myers and Todd Lipcon.

The following talks were given:

  • Talk: Introduction to Apache Hadoop – HDFS and Map/Reduce Fundamentals
  • Description: This talk will introduce the concept of Map/Reduce, a programming paradigm that enables the parallel processing of extremely large data sets. We’ll also introduce Hadoop’s implementation of Map/Reduce, and HDFS, the distributed file system that’s built into Hadoop to enable Map/Reduce. Nearly all of Hadoop is implemented in Java, and this talk will cover some of the details of writing a Map/Reduce job in Java.
  • Speaker: Aaron Myers
  • Short bio: Aaron T. Myers (aka ATM) is a Platform Software Engineer at Cloudera and an Apache Hadoop Committer and PMC Member. Aaron’s work is primarily focused on HDFS, High Availability, and Hadoop Security. Prior to joining Cloudera, Aaron was a Software Engineer and VP of Engineering at Amie Street, where he worked on all components of the software stack, including operations, infrastructure, and customer-facing feature development. Aaron holds both an Sc.B. and an Sc.M. in Computer Science from Brown University.
  • Talk: Beyond Map/Reduce: Introduction to Apache Crunch and Apache Spark
  • Description: Following Aaron’s talk, Todd will introduce Apache Crunch and Apache Spark. These two projects are higher-level frameworks that allow the programmer to express complex distributed data processing tasks on Hadoop more concisely and simply than by writing raw MapReduce jobs. Additionally, Todd will introduce Spark Streaming, a processing system that can run data flows on real-time data as it arrives. He will cover some example use cases that show how Hadoop can be used in applications such as real-time streaming data processing, machine learning, and model building.
  • Speaker: Todd Lipcon
  • Short bio: Todd Lipcon is an engineer at Cloudera who works on Core Hadoop as well as the Cloudera Distribution for Hadoop. Todd is also active in other Apache projects and is always excited to hear about the interesting ways in which people are using Hadoop for large-scale data analysis. Todd came to Cloudera from Amie Street, where he worked on infrastructure, operations, data mining, and product development. Prior to that, he interned at Google, developing machine learning methods to detect credit-card fraud on AdWords and Google Checkout. Todd holds a BSc in Computer Science from Brown University, where he completed an honors thesis developing a new collaborative filtering algorithm for the Netflix Prize Competition.

Video and presentations at the links below:

About Marcio Junior Vieira

Currently works as a Data Scientist at Ambiente Livre. Evangelist for Open Source and Free Software technologies since 1999. Data Scientist, Data Engineer, and Big Data Expert. Certified Pentaho Solutions Consultant. Alfresco ECM, Activiti BPM, and Camunda BPM Expert. Scala, Java, PHP, Python, and JavaScript Programmer.