Spark Study Note II: Data Exploration
In my last post, I briefly described how I set up Apache Spark on my own laptop. In this post, I will start using Spark and Python to do some data exploration.

Resilient Distributed Datasets

Let's first get familiar with how Spark stores data. The Resilient Distributed Datasets...