Thursday, September 11, 2014

Hands-on big data - fire up a Spark cluster on Amazon AWS in 7 minutes - lesson 1

After logging in, launch the management console

And select EC2

Then, launch an instance

To begin, a good option is to get a free machine

Select the first one

Select all the default options and launch the machine. Remember to create the key pair

Download the keys

Get PuTTY and PuTTYgen. Launch PuTTYgen, load spark-test.pem, and save the private key as a .ppk file (it's OK to leave it without a passphrase for now)

Check the launched instance

Get the public IP

Fire up PuTTY; the login is typically ec2-user@IP

And add the ppk file like this

Log in to the new machine


Enter the directory

Get private and public keys

And download them

Modify your ~/.bashrc
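The change to ~/.bashrc is presumably where the two keys downloaded above end up: spark-ec2 reads the AWS credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. A minimal sketch, where both values are placeholders (assumptions, not real keys) to be replaced with your own:

```shell
# Lines to append to ~/.bashrc.
# Both values below are placeholders: paste the keys you downloaded from AWS.
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
```

After saving the file, run `source ~/.bashrc` (or log in again) so that the current shell picks up the variables.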

You need to copy spark-test.pem into the ~/.ssh directory - WinSCP is your friend
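Once the file is uploaded, it is worth tightening its permissions, because ssh refuses private keys that other users can read. A small sketch, assuming the key sits at ~/.ssh/spark-test.pem as in this post:

```shell
# Assumes the key was uploaded as ~/.ssh/spark-test.pem (the name used in this post).
KEY="$HOME/.ssh/spark-test.pem"

# Make sure the directory exists and is private to the current user.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

if [ -f "$KEY" ]; then
    # Owner read/write only, otherwise ssh complains about an unprotected key file.
    chmod 600 "$KEY"
    ls -l "$KEY"
else
    echo "Key not found at $KEY - upload it first"
fi
```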


./spark-ec2 -k spark-test -i ~/.ssh/spark-test.pem --hadoop-major-version=2 --spark-version=1.1.0 launch -s 1 ag-spark

If it does not work, make sure that the region (-r) is the same as the region of the key pair used by your running machine
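As an illustration, here is the same launch command with the region passed explicitly. The region name eu-west-1 is only an assumed example; use the region in which the spark-test key pair was actually created.

```shell
# REGION is an assumed example value: replace it with the region of your key pair.
REGION="eu-west-1"

# Run this from the directory that contains the spark-ec2 script.
if [ -x ./spark-ec2 ]; then
    ./spark-ec2 -k spark-test -i "$HOME/.ssh/spark-test.pem" -r "$REGION" \
        --hadoop-major-version=2 --spark-version=1.1.0 launch -s 1 ag-spark
else
    echo "spark-ec2 not found here - change to the directory that contains it"
fi
```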

GO AND TAKE A COFFEE, the cluster is booting – it needs time.

Fire up a browser and check the status

Log in to the cluster
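One way to do this is the login action of the same spark-ec2 script, which opens an ssh session on the master node. A sketch reusing the key pair name, key file and cluster name from the launch command above; if you launched with -r, the same option is probably needed here too.

```shell
# Cluster name as used in the launch command of this post.
CLUSTER="ag-spark"

# Run this from the directory that contains the spark-ec2 script.
if [ -x ./spark-ec2 ]; then
    ./spark-ec2 -k spark-test -i "$HOME/.ssh/spark-test.pem" login "$CLUSTER"
else
    echo "spark-ec2 not found here - change to the directory that contains it"
fi
```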

