Zhou (Joe) Li


Data Scientist at Apple

Spark Study Note II: Data Exploration

In my last post, I briefly introduced how I set up Apache Spark on my own laptop. In this post, I will start using Spark and Python to do some data exploration. Resilient Distributed Datasets: Let’s first get familiar with how Spark stores data. The Resilient Distributed Datasets... [Read More]

Spark Study Note I: Setting Up

It is always fun to learn new things. In this and the following posts, I will write down my study notes on playing around with Apache Spark on my own laptop. Many of the resources come from the book Machine Learning with Spark. Apache Spark is... [Read More]

Rcpp example for parallelly calculating a kernel matrix

“Sometimes R code just isn’t fast enough.” With the help of profiling tools such as the profvis package, it is possible to figure out the bottlenecks of your code. However, some of them (unavoidable loops, recursive functions, etc.) cannot be sped up in R no matter what you... [Read More]

Hello World!

千里之行,始於足下 “A journey of a thousand miles begins with a single step.” This is the first post of my personal website. Cheers! [Read More]