
Scaling TB's of data with Apache Spark and Scala DSL at Production

Apache HBase is a column-oriented NoSQL store, widely adopted in industry and research as a data lake for scalable data processing platforms. HBase runs on the JVM and Apache Spark is itself written in Scala, so it is a real pleasure to explore the beauty of a functional Scala DSL with Spark and HBase. We will also present a case study: scaling 30 TB of data and 4.6 billion events per day with Apache HBase as the base data lake solution, integrated with Apache Kafka and Apache Spark / Spark Streaming.

Session length
40 minutes
Language of the presentation
English
Target audience
Intermediate: Requires a basic knowledge of the area
Who is your session intended for
1. People who understand basic functional programming with Scala or have an understanding of Java.
2. People who understand concurrent programming or multithreading in Java / Scala.
3. People who are interested in distributed data processing and keen on data scaling optimization.
4. People who have worked with Big Data or Fast Data, or have a keen interest in them.
Speaker
CHETAN KHATRI (Accionlabs Inc.)
  • TransmogrifAI - Automate Machine Learning Workflow with the power of Scala and Spark at massive scale. - Scala.IO 2018 Lyon, France.
  • Scaling 30 TB's of Data lake with Apache HBase and Scala DSL at Production. - HBaseConAsia 2018, Beijing - China.
  • Scaling TB's of data with Apache Spark and Scala DSL at Production - HKOSCon 2018
Contributes to
  • Elixir
  • scalaz
  • apache-spark
