Here are some hints for running Mahout on top of Amazon EC2. Below is a collection of the algorithms implemented so far. They include:
Classification
Logistic Regression (SGD implementation in progress: MAHOUT-228)
Support Vector Machines (SVM) (open: MAHOUT-14, MAHOUT-232 and MAHOUT-334)
Perceptron and Winnow (open: MAHOUT-85)
Neural Network (open, but MAHOUT-228 might help)
Random Forests (integrated - MAHOUT-122, MAHOUT-140, MAHOUT-145)
Restricted Boltzmann Machines (open, MAHOUT-375, GSoC 2010)
Clustering
Canopy Clustering (MAHOUT-3 - integrated)
K-Means Clustering (MAHOUT-5 - integrated)
Fuzzy K-Means (MAHOUT-74 - integrated)
Expectation Maximization (EM) (MAHOUT-28)
Mean Shift Clustering (MAHOUT-15 - integrated)
Hierarchical Clustering (MAHOUT-19)
Dirichlet Process Clustering (MAHOUT-30 - integrated)
Latent Dirichlet Allocation (MAHOUT-123 - integrated)
Spectral Clustering (open, MAHOUT-363, GSoC 2010)
Pattern Mining
Parallel FP Growth Algorithm (also known as frequent itemset mining)
Regression
Locally Weighted Linear Regression (open)
Dimension Reduction
Singular Value Decomposition and other Dimension Reduction Techniques (available since 0.3)
Principal Components Analysis (PCA) (open)
Independent Component Analysis (open)
Gaussian Discriminant Analysis (GDA) (open)
Evolutionary Algorithms
see also: MAHOUT-56 (integrated)
That is pretty cool. Too bad Amazon doesn't support Mahout as part of Elastic MapReduce:
http://aws.amazon.com/elasticmapreduce/
On a related note, have you looked at Google's Prediction API or BigQuery? They appear to be invite-only at the moment, but could be quite interesting as well.