This is a candidate session. ScalaMatsuri will select sessions later, using participants' votes as a reference.

Scaling TB's of data with Apache Spark and Scala DSL at Production

Apache HBase is a column-oriented NoSQL store, widely adopted in industry and research as a data lake for scalable data processing platforms. HBase runs on the JVM and Apache Spark is written primarily in Scala, so it is a real pleasure to explore the beauty of a functional Scala DSL on top of Spark and HBase. We will also present a case study: scaling 30 TB of data and 4.6 billion events per day with Apache HBase as the base data lake solution, integrated with Apache Kafka and Apache Spark / Spark Streaming.

Session length

40 minutes

Language of the presentation

English

Target audience

Intermediate: Requires a basic knowledge of the area

Who is your session intended for

1. Those who understand basic functional programming with Scala or have an understanding of Java.
2. Those who understand concurrent programming or multithreading in Java / Scala.
3. Those who are interested in distributed data processing and keen on data scaling optimization.
4. Those who have worked with Big Data or Fast Data, or have a keen interest in them.

Speaker

CHETAN KHATRI (Accionlabs Inc.)

TransmogrifAI - Automate Machine Learning Workflow with the power of Scala and Spark at massive scale. - Scala.IO 2018, Lyon, France.
Scaling 30 TB's of Data lake with Apache HBase and Scala DSL at Production. - HBaseConAsia 2018, Beijing, China.
Scaling TB's of data with Apache Spark and Scala DSL at Production. - HKOSCon 2018.

Contributes

Elixir
scalaz
apache-spark
