Antonio Gulli's coding playground: Hands on big data - Crash Course on Spark - Word count

Tuesday, September 16, 2014

Hands on big data - Crash Course on Spark - Word count - lesson 5

Let's do a simple word count exercise. Spark will automatically parallelize the code.

1. Load the text of the Bible
2. Apply a flatMap so that every line is split into its words
3. Map each word to a tuple (word, counter), where the counter simply starts at 1
4. Reduce by key, where the reduce operation is just +
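The four steps above could be sketched in PySpark roughly as follows. This is only a minimal sketch, not necessarily the code used for this lesson: the file name bible.txt, the local[*] master setting, and the helper names word_count and run are assumptions made for illustration.

```python
from operator import add


def word_count(lines):
    """Take an RDD of text lines and return an RDD of (word, count) pairs."""
    # Step 2: split every line on whitespace and flatten into one RDD of words
    words = lines.flatMap(lambda line: line.split())
    # Step 3: pair each word with an initial counter of 1
    pairs = words.map(lambda word: (word, 1))
    # Step 4: sum the counters of identical words; the reduce operation is +
    return pairs.reduceByKey(add)


def run(path="bible.txt"):
    """Hypothetical driver; 'bible.txt' is an assumed local file name."""
    # Imported here so word_count itself only relies on the RDD methods
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "WordCount")
    try:
        # Step 1: load the text file as an RDD with one element per line
        lines = sc.textFile(path)
        return word_count(lines).collect()
    finally:
        sc.stop()
```

Calling run() returns a list of (word, count) tuples. Nothing is computed until collect() is called, and at that point Spark distributes the work across the available cores without any explicit threading in the code.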

Pretty simple, isn't it?
