
Build and install Spark on a Linux platform

Here is a short guide to building, installing and configuring Apache Spark on a Linux platform.

You can decide which Spark release to install in your environment. I started using Spark from version 2.0, and I did not use the pre-compiled releases: I compiled and configured it myself.

Besides that, I preferred to use the version from the GitHub master development branch, but you can use any branch from GitHub.

So, choose the main path where you want your SPARK_HOME to live and clone your preferred release there.
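As a sketch, assuming you want your installation under /opt (the path and the branch name below are only examples), the clone step could look like:

```shell
# Parent directory for the Spark installation (example path).
cd /opt

# Clone the official Apache Spark repository; by default this
# checks out the master development branch.
git clone https://github.com/apache/spark.git
cd spark

# Optionally switch to a specific release branch instead of master
# (branch-2.0 is just an example).
git checkout branch-2.0
```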

All the information needed to build Spark from scratch is available at this link.

Once you have cloned your Spark release, go into the Spark directory and run the build to compile and install it.
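The exact build command is not reproduced here; a common invocation, using the Maven wrapper that ships with the Spark sources, is:

```shell
# Build Spark with the bundled Maven wrapper, skipping the test
# suite to speed up compilation.
./build/mvn -DskipTests clean package
```

Depending on the release, the build can take a while and needs a working JDK on the machine.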

First of all, you can configure environment variables in your .bashrc that point to the Java, Scala and Python APIs of Spark.
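A minimal sketch of the .bashrc entries, assuming Spark was cloned to /opt/spark (the paths and the py4j version are examples to adapt to your installation):

```shell
# Root of the Spark installation (example path).
export SPARK_HOME=/opt/spark

# Make spark-shell, spark-submit and pyspark available on the PATH.
export PATH="$SPARK_HOME/bin:$PATH"

# Expose the Python API (PySpark) to the system Python; the py4j
# version in the zip name depends on your Spark release.
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH"
```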

Specific configuration for the Spark environment

All the Spark configuration can be set up under the spark/conf directory of your installation.

You can adapt these two files, spark-defaults.conf and spark-env.sh, starting from the .template versions you will find in that directory.

For example, in spark-defaults.conf you can set some of the configuration properties you will use in your Spark Session.
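For instance, a spark-defaults.conf might contain entries like the following (the values are only examples to adapt to your machine):

```
# Memory for the driver JVM (example value).
spark.driver.memory      4g

# Memory per executor (example value).
spark.executor.memory    4g

# Use Kryo serialization, usually faster than the Java default.
spark.serializer         org.apache.spark.serializer.KryoSerializer
```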

In spark-env.sh, instead, you can set all the environment variables you need for your specific installation and its parameters.
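A sketch of possible spark-env.sh entries (the values are examples; the .template file documents the full list of supported variables):

```shell
# Hostname the Spark master binds to (example value).
export SPARK_MASTER_HOST=localhost

# Total memory a worker can hand out to executors (example value).
export SPARK_WORKER_MEMORY=8g

# Python interpreter used by PySpark (example path).
export PYSPARK_PYTHON=/usr/bin/python3
```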

If needed, do not forget to source this file after editing it.

If you want to try my xgboost and Spark code, refer to the xgboost installation.

Happy sparking!
