Saturday, November 7, 2015

TOC for my new book: A collection of Advanced Data Science and Machine Learning Interview Questions Solved in Python and Spark

Table of Contents
1. Why is Cross Validation important? 11
Solution 11
Code 11
2. Why is Grid Search important? 12
Solution 12
Code 12
3. What are the new Spark DataFrame and the Spark Pipeline? And how can we use the new ML library for Grid Search? 13
Solution 13
Code 14
4. How to deal with categorical features? And what is one-hot encoding? 16
Solution 16
Code 17
5. What are generalized linear models and what is an R Formula? 18
Solution 18
Code 18
6. What are Decision Trees? 19
Solution 19
Code 21
7. What are Ensembles? 22
Solution 22
8. What is a Gradient Boosted Tree? 22
Solution 22
9. What is a Gradient Boosted Trees Regressor? 23
Solution 23
Code 23
10. Gradient Boosted Trees Classification 24
Solution 24
Code 25
11. What is a Random Forest? 26
Solution 26
Code 26
12. What is an AdaBoost classification algorithm? 27
Solution 27
13. What is a recommender system? 28
Solution 28
14. What is a collaborative filtering ALS algorithm? 29
Solution 29
Code 30
15. What is the DBSCAN clustering algorithm? 32
Solution 32
Code 32
16. What is a Streaming K-Means? 33
Solution 33
Code 34
17. What is Canopy Clustering? 34
Solution 34
18. What is Bisecting K-Means? 35
Solution 35
19. What is the PCA dimensionality reduction technique? 36
Solution 36
Code 37
20. What is the SVD dimensionality reduction technique? 38
Solution 38
Code 38
21. What is Latent Semantic Analysis (LSA)? 39
Solution 39
22. What is Parquet? 39
Solution 39
Code 39
23. What is Isotonic Regression? 40
Solution 40
Code 40
24. What is LARS? 41
Solution 41
25. What is GLMNET? 42
Solution 42
26. What is SVM with soft margins? 43
Solution 43
27. What is the Expectation Maximization Clustering algorithm? 44
Solution 44
28. What is a Gaussian Mixture? 45
Solution 45
Code 45
29. What is the Latent Dirichlet Allocation topic model? 46
Solution 46
Code 47
30. What is Association Rule Learning? 48
Solution 48
31. What is FP-growth? 50
Solution 50
Code 50
32. How to use the GraphX Library? 50
Solution 50
33. What is PageRank? And how to compute it with GraphX? 51
Solution 51
Code 52
Code 52
34. What is Power Iteration Clustering? 54
Solution 54
Code 54
35. What is a Perceptron? 55
Solution 55
36. What is an ANN (Artificial Neural Network)? 56
Solution 56
37. What are the activation functions? 57
Solution 57
38. How many types of Neural Networks are known? 58
39. How can you train a Neural Network? 59
Solution 59
40. What applications do ANNs have? 59
Solution 59
41. Can you code a simple ANN in Python? 60
Solution 60
Code 60
42. What support does Spark have for Neural Networks? 61
Solution 61
Code 62
43. What is Deep Learning? 63
Solution 63
44. What are autoencoders and stacked autoencoders? 68
Solution 68
45. What are convolutional neural networks? 69
Solution 69
46. What are Restricted Boltzmann Machines, Deep Belief Networks and Recurrent networks? 70
Solution 70
47. What is pre-training? 71
Solution 71
48. An example of Deep Learning with the nolearn and Lasagne packages 72
Solution 72
Code 73
Outcome 73
Code 74
49. Can you compute an embedding with Word2Vec? 75
Solution 75
Code 76
Code 77
50. What are Radial Basis Networks? 77
Solution 77
Code 78
51. What are Splines? 78
Solution 78
Code 78
52. What are Self-Organizing Maps (SOMs)? 78
Solution 78
Code 79
53. What is Conjugate Gradient? 79
Solution 79
54. What is exploitation-exploration? And what is the multi-armed bandit method? 80
Solution 80
55. What is Simulated Annealing? 81
Solution 81
Code 81
56. What is a Monte Carlo experiment? 81
Solution 81
Code 82
57. What is a Markov Chain? 83
Solution 83
58. What is Gibbs sampling? 83
Solution 83
Code 84
59. What is Locality Sensitive Hashing (LSH)? 84
Solution 84
Code 85
60. What is minHash? 85
Solution 85
Code 86
61. What are Bloom Filters? 86
Solution 86
Code 87
62. What are Count-Min Sketches? 87
Solution 87
Code 87
63. How to build a news clustering system? 88
Solution 88
64. What is A/B testing? 89
Solution 89
65. What is Natural Language Processing? 90
Solution 90
Code 90
Outcome 92
66. Where to go from here 92
Appendix A 95
67. Ultra-Quick introduction to Python 95
68. Ultra-Quick introduction to Probabilities 96
69. Ultra-Quick introduction to Matrices and Vectors 97
70. Ultra-Quick summary of metrics 98
Classification Metrics 98
Clustering Metrics 99
Scoring Metrics 99
Rank Correlation Metrics 99
Probability Metrics 100
Ranking Models 100
71. Comparison of different machine learning techniques 101
Linear regression 101
Logistic regression 101
Support Vector Machines 101
Clustering 102
Decision Trees, Random Forests, and GBTs 102
Association Rules 102
Neural Networks and Deep Learning 103

