TOC

1. Why is Cross Validation important?

Solution

Code

2. Why is Grid Search important?

Solution

Code

3. What are the new Spark DataFrame and the Spark Pipeline? And how can we use the new ML library for Grid Search?

Solution

Code

4. How to deal with categorical features? And what is one-hot encoding?

Solution

Code

5. What are generalized linear models and what is an R Formula?

Solution

Code

6. What is the Word2Vec distributed representation?

Solution

Code

7. What are Decision Trees?

Solution

Code

8. What are Ensembles?

Solution

9. What is a Gradient Boosted Tree?

Solution

10. What is a Gradient Boosted Trees Regressor?

Solution

Code

11. Gradient Boosted Trees Classification

Solution

Code

12. What is a Random Forest?

Solution

Code

13. What is the AdaBoost classification algorithm?

Solution

14. What is a recommender system?

Solution

15. What is the ALS collaborative filtering algorithm?

Solution

Code

16. What is the DBSCAN clustering algorithm?

Solution

Code

17. What is Streaming K-Means?

Solution

Code

18. What is the PCA dimensionality reduction technique?

Solution

Code

19. What is the SVD dimensionality reduction technique?

Solution

Code

20. What is Parquet?

Solution

Code

21. What is Isotonic Regression?

Solution

Code

22. What is SVM with soft margins?

Solution

23. What is the Expectation Maximization Clustering algorithm?

Solution

24. What is a Gaussian Mixture?

Solution

Code

25. What is the Latent Dirichlet Allocation topic model?

Solution

Code

26. What is Association Rule Learning?

Solution

27. What is FP-growth?

Solution

Code

28. How to use the GraphX Library?

Solution

29. What is PageRank? And how to compute it with GraphX?

Solution

Code

30. What is Power Iteration Clustering?

Solution

Code

31. What is a Perceptron?

Solution

32. What is an ANN (Artificial Neural Network)?

Solution

33. What are activation functions?

Solution

34. How many types of Neural Networks are known?

35. How can you train a Neural Network?

Solution

36. What applications do ANNs have?

Solution

37. Can you code a simple ANN in Python?

Solution

Code

38. What support does Spark have for Neural Networks?

Solution

Code

39. What is Deep Learning?

Solution

40. What are autoencoders and stacked autoencoders?

Solution

41. What are convolutional neural networks?

Solution

42. What are Restricted Boltzmann Machines, Deep Belief Networks and Recurrent networks?

Solution

43. Neural Network – Deep Learning – Theano

Solution

Code

Complexity

44. Neural Network – Deep Learning – Theano

Solution

Code

Complexity

45. Neural Network – Deep Learning – Lasagne

Solution

Code

Complexity

46. Splines

Solution

Code

Complexity

47. Search – Hill Climbing, Simulated Annealing, Greedy

Solution

Code

Complexity

48. Monte Carlo

Solution

Code

Complexity

49. Sampling (Gibbs)

Solution

Code

Complexity

50. Hypothesis Testing

Solution

Code

Complexity

51. Text Mining

Solution

Code

Complexity

52. NLP tagging

Solution

Code

Complexity

53. Bloom Filters

Solution

Code

Complexity

54. MinHash

Solution

Code

Complexity

55. LSH

Solution

Code

Complexity

56. Count-Min Sketches

Solution

Code
