Scala Tutorial - Part I

July 7, 2018


This is the first part of the Scala and Data Science series. The articles in this series are meant for aspiring data scientists (like me) who wish to learn the full-stack Big Data and Machine Learning pipeline. I generally work with Python for data science, but to build a product pipeline with big data tools like Kafka, ZooKeeper and Spark, I felt learning Scala would be a better option. Hence, I started a course with Cognitive Class. This series will cover the code and resources from that course, plus a few others that I will refer to when learning a particular topic.

Basics of Scala

Scala at a glance:

  • Statically typed. The compiler can infer a variable's type from its value, but it is still good practice to declare the type explicitly.
  • Modular. There is no single global namespace for all the classes involved; code is organized into packages.
  • Compiled to bytecode, which is executed by the JVM on various platforms.
  • Type errors are caught at compile time, before deployment, which makes it well suited to large jobs.
  • Parallel processing ability
  • Lightweight
  • Low boilerplate
  • Stable, scalable and innovative.
  • Mutable variable - var
  • Immutable variable - val
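
To make the points above about type inference, val and var concrete, here is a minimal sketch. The object and variable names (BasicsDemo, inferred, declared, counter) are my own placeholders for illustration and are not taken from the course.

```scala
object BasicsDemo {
  // val: immutable binding; the compiler infers the type Int from the value
  val inferred = 42

  // val with an explicit type annotation
  val declared: Double = 3.14

  def main(args: Array[String]): Unit = {
    // var: mutable variable, can be reassigned to another Int
    var counter: Int = 0
    counter += 1

    // declared = 2.71   // would not compile: reassignment to val
    // counter = "one"   // would not compile: type mismatch, caught at compile time

    println(s"inferred=$inferred declared=$declared counter=$counter")
  }
}
```

Running it should print inferred=42 declared=3.14 counter=1. The two commented-out lines show the kind of mistake the compiler rejects before the program ever runs.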

Why Scala for data science

  • Centrality and dispersion measures
  • ROC (receiver operating characteristic) curves
  • Feature engineering
  • Support vector machines
  • Big data support and tools
  • Several major big data tools, such as Spark and Kafka, are written largely in Scala and have strong support for Scala APIs
  • Parallel processing
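
As a small taste of the first item in the list, here is a rough sketch of centrality and dispersion measures using only the Scala standard library. The object name SummaryStats, the helper names and the sample data are my own assumptions for illustration; a real pipeline would more likely use a library such as Spark for this.

```scala
object SummaryStats {
  // All helpers assume a non-empty input sequence.

  // Arithmetic mean: a measure of centrality
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Median: middle value of the sorted data
  // (average of the two middle values when the size is even)
  def median(xs: Seq[Double]): Double = {
    val sorted = xs.sorted
    val n = sorted.size
    if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  // Population variance: a measure of dispersion
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }

  // Population standard deviation: square root of the variance
  def stdDev(xs: Seq[Double]): Double = math.sqrt(variance(xs))

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only
    val data = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
    println(s"mean=${mean(data)} median=${median(data)} stdDev=${stdDev(data)}")
  }
}
```

For this sample data it should print mean=5.0 median=4.5 stdDev=2.0. Note that this computes the population variance (dividing by n); dividing by n - 1 instead would give the sample variance.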

Website -

The next post will be about creating a Scala project.

To learn more about Scala, stay tuned.

Hope this helps! Stay tuned for more posts from the ML series.

Happy Learning!

Rajiv Jha :)

My name is Rajiv Jha. I am a senior engineering student at Guru Gobind Singh Indraprastha University.