General purpose resource management on Hadoop with Akka Cluster
Is there life beyond Apache Spark in the Hadoop ecosystem? What do you do when your scalability requirements are driven by computational load rather than by the size of your data?
In this talk we’ll uncover the fundamental techniques behind our Scala-based enterprise predictive modeling system. We’ll demonstrate how, using YARN and Akka, one can build and execute custom distributed workflows with strict requirements for scalability and high availability. We’ll discuss the tools that can be used for this purpose and walk through specific use cases where this approach is especially useful.
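To give a flavor of the Akka Cluster approach mentioned above (this sketch is illustrative and not taken from the talk itself), a minimal cluster node configuration in Akka's HOCON format might look like the following; the actor-system name, host, and port are hypothetical placeholders, and the exact keys depend on the Akka version in use (classic remoting shown here):

```hocon
akka {
  # Use the cluster-aware actor provider
  actor.provider = "cluster"

  # Network binding for this node (hypothetical host/port)
  remote.netty.tcp {
    hostname = "127.0.0.1"
    port = 2551
  }

  # Seed nodes that new members contact to join the cluster;
  # "ClusterSystem" is a hypothetical actor-system name
  cluster.seed-nodes = [
    "akka.tcp://ClusterSystem@127.0.0.1:2551"
  ]
}
```

On YARN, the host and port would typically be determined at container launch time rather than hard-coded as above.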
Session length
40 minutes
Language of the presentation
English
Target audience
Intermediate: Requires a basic knowledge of the area
Who is your session intended for
Especially useful for people who are constrained to a pure Hadoop infrastructure and have found Spark insufficient or inconvenient for some distributed workflows.
People who already use or are interested in Hadoop/Spark/Akka.
Speaker
Iaroslav Zeigerman
(dotData Inc. - Senior Software Engineer)