Antonio Gulli's coding playground: Hands on big data - Crash Course on Spark Cache & Master

Wednesday, September 17, 2014

Hands on big data - Crash Course on Spark Cache & Master - lesson 6

Two very important aspects of Spark are in-memory caching, which avoids re-computation, and the ability to connect to a master from the spark-shell.

Caching
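Caching is pretty simple. Just call .persist() on the result of the desired computation (.cache() is shorthand for the default, memory-only storage level). Persisting is lazy: the data is actually stored the first time an action computes it. There are also options to persist a computation on disk or to replicate it across multiple nodes. There is also the option to persist objects in an in-memory shared file system such as Tachyon.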
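As a minimal sketch of the idea above, the following standalone program persists an RDD and then runs two actions on it. The object name CacheSketch, the application name, and the local[2] master are assumptions chosen for illustration; only persist, unpersist, and the StorageLevel constants come from Spark itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; the names here are illustrative only.
object CacheSketch {
  // Computes the sum and the count of the squares of 1..1000,
  // persisting the intermediate RDD so it is computed only once.
  def run(): (Long, Long) = {
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val squares = sc.parallelize(1 to 1000).map(x => x.toLong * x)

      // MEMORY_ONLY is the level used by a bare .persist() or .cache().
      // Other levels include MEMORY_AND_DISK, DISK_ONLY (on disk) and
      // MEMORY_ONLY_2 (replicated on two nodes).
      squares.persist(StorageLevel.MEMORY_ONLY)

      val total = squares.reduce(_ + _) // first action: computes and caches
      val count = squares.count()       // second action: reads the cached partitions

      squares.unpersist()               // release the cached blocks
      (total, count)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (total, count) = run()
    println(s"sum=$total count=$count")
  }
}
```

Without the persist call, the map over the input would be recomputed for each of the two actions. With it, the second action is served from memory.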

Connect a master
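There are two ways to connect to a master. The first is programmatic, by setting the master URL on the SparkConf used to create the SparkContext: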

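import org.apache.spark.{SparkConf, SparkContext}
// appName and master are placeholders, e.g. "MyApp" and "local[8]" or a cluster URL
val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)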

The other way is from the command line: passing --master local[8] to spark-shell runs Spark locally with 8 worker threads, ideally one per core.

$ ./bin/spark-shell --master local[8]
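To check which master a context actually ended up with, one option is to read it back from the SparkContext. The sketch below is an assumption-laden illustration: the object name MasterSketch and the helper describe are hypothetical, while sc.master and sc.defaultParallelism are existing SparkContext members. In local mode, and with spark.default.parallelism left unset, the default parallelism should match the number of threads requested in local[N].

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names here are illustrative only.
object MasterSketch {
  // Creates a context for the given master URL and reports the master
  // string and the default parallelism that Spark resolved for it.
  def describe(master: String): (String, Int) = {
    val conf = new SparkConf().setAppName("master-sketch").setMaster(master)
    val sc = new SparkContext(conf)
    try {
      (sc.master, sc.defaultParallelism)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    // Same master as the spark-shell command above.
    val (master, parallelism) = describe("local[8]")
    println(s"master=$master defaultParallelism=$parallelism")
  }
}
```

Inside spark-shell itself the context already exists as sc, so evaluating sc.master at the prompt gives the same information without creating a new context.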
