I started using Spark very recently, when I realized that the tools I had been using were not efficient for analyzing very large volumes of data.
In this section I would like to gather short notes about Spark: installation, PySpark or Scala code, and solutions I found during my tests, which I hope will be useful.
Alternating Least Squares (ALS) for the Santander Kaggle competition: The Kaggle Santander competition just concluded. I decided for this competition to ...
Install the JVM XGBoost package to interface with Apache Spark: For a complete guide and documentation, please refer to the official ...
Apache Spark for Kaggle competitions: I competed in the Kaggle Bosch competition to predict failures on production lines. As ...
Build and install Spark on a Linux platform: Here is a short guide to build, install and configure Apache Spark on ...
Setup of a Scala project using IntelliJ IDEA: I suppose you have already downloaded and installed the community edition of ...
A growing post which gathers short pieces of code: Let’s suppose spark to be an open Spark session. # Open ...
Spark and XGBoost using the Scala language: Recently the XGBoost project released a package on GitHub that includes an interface to ...