Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Sunday, January 24, 2010
Kernel PCA
PCA is a dimensionality reduction technique for linearly separable data. In order to deal with data that cannot be separated linearly, one can adopt the kernel trick: the data is implicitly mapped into a higher-dimensional feature space where linear PCA becomes applicable. This yields the so-called Kernel PCA.
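To make the trick concrete: instead of mapping each point into feature space explicitly, one only needs the pairwise kernel values k(x_i, x_j), collected in an n x n kernel matrix. Below is a minimal sketch, using Eigen, of how a Gaussian (RBF) kernel matrix could be materialized; the function name rbf_kernel and the bandwidth parameter gamma are illustrative assumptions, not part of any library or of the code discussed below.

#include <cmath>
#include <Eigen/Dense>
using namespace Eigen;

// Gaussian (RBF) kernel matrix: K(i, j) = exp(-gamma * ||x_i - x_j||^2),
// with one data point stored per column of dataPoints.
MatrixXd rbf_kernel(const MatrixXd & dataPoints, double gamma)
{
    int n = dataPoints.cols();
    MatrixXd K(n, n);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            K(i, j) = std::exp(-gamma *
                (dataPoints.col(i) - dataPoints.col(j)).squaredNorm());
    return K;
}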
Works only for a square matrix as the dataset :(
Thanks a lot for the code! Indeed it's a bit buggy, but it's still a very nice starting point. Below is a little fixup that should make it work with any dimensionality.
A quick question: you say "in this toy example we materialize the kernel, which is not necessary". What do you mean exactly? How could we not do that?
Cheers,
Basilio
void PCA::kernel_pca(MatrixXd & dataPoints, unsigned int dimSpace)
{
    int n = dataPoints.cols();   // number of data points, one per column

    Kernel *k = new Kernel(dataPoints);   // 'new' returns a pointer
    MatrixXd K = k->get();

    // ''centralize'' the kernel matrix in feature space:
    // K' = K - 1n*K - K*1n + 1n*K*1n,
    // where 1n is the n x n matrix with every entry equal to 1/n
    MatrixXd oneN = MatrixXd::Ones(n, n) / double(n);
    MatrixXd K_Centralized = K
        - oneN * K
        - K * oneN
        + oneN * K * oneN;
    std::cout << "Centralized" << std::endl;

    // compute the eigenvalues/eigenvectors of the centralized kernel matrix
    EigenSolver<MatrixXd> m_solve(K_Centralized);
    std::cout << "got the eigenvalues, eigenvectors" << std::endl;
    VectorXd eigenvalues = m_solve.eigenvalues().real();
    MatrixXd eigenVectors = m_solve.eigenvectors().real();

    // sort by decreasing eigenvalue and keep the permutation indices
    // (PermutationIndices is assumed to be std::vector<std::pair<double, int> >)
    PermutationIndices pi;
    for (int i = 0; i < n; i++)
        pi.push_back(std::make_pair(-eigenvalues(i), i));
    std::sort(pi.begin(), pi.end());

    // get the top dimSpace eigenvectors
    _result = MatrixXd::Zero(n, dimSpace);
    for (unsigned int i = 0; i < dimSpace; i++)
        _result.col(i) = eigenVectors.col(pi[i].second); // permutation indices

    // scale each component by sqrt(lambda_i); clamp tiny negative
    // eigenvalues produced by numerical noise before taking the root
    MatrixXd sqrtE = MatrixXd::Zero(dimSpace, dimSpace);
    for (unsigned int i = 0; i < dimSpace; i++)
        sqrtE(i, i) = std::sqrt(std::max(-pi[i].first, 0.0));

    // get the final data projection: row j holds the dimSpace
    // coordinates of data point j
    _result = (sqrtE * _result.transpose()).transpose();
    delete k;
}
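For completeness, here is a hedged usage sketch of the routine above. The kernel_pca signature and the _result member come from the snippet itself; everything else (the random toy data, PCA being default-constructible) is an assumption for illustration only.

#include <Eigen/Dense>
using namespace Eigen;

int main()
{
    // toy dataset: 100 random 5-dimensional points, one per column
    MatrixXd dataPoints = MatrixXd::Random(5, 100);

    PCA pca;                        // assumed default-constructible
    pca.kernel_pca(dataPoints, 2);  // project onto the top 2 kernel components
    // the 100 x 2 projection now sits in the (private) _result member
    return 0;
}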