Let's do a simple word count exercise. Spark will automatically parallelize the code.
1. Load the Bible
2. Create a flatMap that maps every line to the words it contains
3. Map each word to a tuple (word, counter), where the counter simply starts at 1
4. Reduce by key, where the reduce operation is just +
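
To make the four steps concrete, here is a minimal sketch in plain Python that mimics the pipeline on a single machine. It is not Spark code: the function name `word_count` and the two sample lines are assumptions made for illustration, and the sample lines stand in for the Bible text file. In PySpark the same chain would typically be written with the RDD methods `textFile`, `flatMap`, `map` and `reduceByKey`, as noted in the comments.

```python
from functools import reduce
from itertools import chain


def word_count(lines):
    """Local, single-machine emulation of the word count steps (hypothetical helper)."""
    # Step 2 (flatMap): split every line into words and flatten into one sequence.
    # PySpark equivalent: rdd.flatMap(lambda line: line.split())
    words = chain.from_iterable(line.split() for line in lines)

    # Step 3 (map): pair each word with a counter that starts at 1.
    # PySpark equivalent: rdd.map(lambda word: (word, 1))
    pairs = ((word, 1) for word in words)

    # Step 4 (reduceByKey): add up the counters of identical words.
    # PySpark equivalent: rdd.reduceByKey(lambda a, b: a + b)
    def merge(counts, pair):
        word, n = pair
        counts[word] = counts.get(word, 0) + n
        return counts

    return reduce(merge, pairs, {})


# Step 1 stand-in: two sample lines instead of the full text file.
# PySpark equivalent: sc.textFile(path), where sc and path are assumed to exist.
sample = [
    "In the beginning God created the heaven and the earth",
    "And the earth was without form",
]
counts = word_count(sample)
print(counts["the"], counts["earth"])
```

Splitting on whitespace is case-sensitive and keeps punctuation attached to words, so "And" and "and" are counted separately here. On a real text you would probably want to lowercase and strip punctuation before counting.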
Pretty simple, isn't it?