Why Spark?
I started using Spark only recently, when I realized that the tools I had been using could not analyze very large volumes of data efficiently.
In this section I would like to gather short notes about Spark: installation, PySpark or Scala code, and solutions I found during my tests, which I hope will be useful.
Posts Collection

Alternating Least Squares (ALS) Spark ML
Alternating Least Squares (ALS) for the Santander Kaggle competition. The Kaggle Santander competition just concluded. For this competition I decided to ...
Read More

Install JVM xgboost package
Install the JVM XGBoost package to interface with Apache Spark. For a complete guide and documentation, please refer to the official ...
Read More

Kaggle Bosch competition using Apache Spark
Apache Spark for Kaggle competitions. I competed in the Kaggle Bosch competition to predict failures on production lines. As ...
Read More

Build and install Spark on a Linux platform
Build and install Spark on a Linux platform. Here is a short guide to build, install and configure Apache Spark on ...
Read More

Scala project under IntelliJ IDEA
Setup of a Scala project using IntelliJ IDEA. I assume you have already downloaded and installed the community edition of ...
Read More

Spark code snippets
A growing post that gathers short pieces of code. Let's assume spark is an open Spark session. # Open ...
Read More

Spark and XGBoost using Scala
Spark and XGBoost using the Scala language. The XGBoost project recently released a package on GitHub that includes an interface to ...
Read More