Data never lie

Spark pills

Why Spark?

I decided to use Spark very recently, when I realized that the tools I was using were not efficient for analyzing very large volumes of data.

In this section I would like to gather small pills about Spark: installation, PySpark or Scala code, and solutions I found during my tests, which I hope can be useful.

Posts Collection

Alternating Least Squares (ALS) Spark ML

Alternating Least Squares (ALS) for Santander Kaggle competition. The Kaggle Santander competition just concluded. I decided for this competition to ...

Install JVM xgboost package

Install JVM xgboost package to interface to Apache Spark. For a complete guide and documentation, please refer to the official ...

Kaggle Bosch competition using Apache Spark

Apache Spark for Kaggle competitions. I competed in the Kaggle Bosch competition to predict failures on production lines. As ...

Build and install Spark on a Linux platform

Build and install Spark on a Linux platform. Here is a short guide to build, install and configure Apache Spark on ...

Scala project under IntelliJ IDEA

Setup of a Scala project using IntelliJ IDEA. I suppose you have already downloaded and installed the community edition of ...

Spark code snippets

A growing post which gathers short pieces of code. Let's suppose spark is an open Spark session. # Open ...

Spark and XGBoost using Scala

Spark and XGBoost using the Scala language. Recently the XGBoost project released a package on GitHub which includes an interface to ...
