Sunday, January 24, 2010

Kernel PCA

PCA is a dimension reduction technique for linearly separeted data. In order to deal with data that cannot be linearly separated, one can adopt the kernel trick. This generates the so called Kernel-PCA.

Here you have the code in C++ and Eigen.


  1. Works only for a Square matrix as the dataset :(

  2. Thanks a lot for the code! Indeed it's a bit buggy, but it's still a very nice starting point. Below is a little fixup that should make it work with any dimensionality.

    A quick question: You say "in this toy example we materialize the kernel, which is not necessary". What do you mean exactly? how could we not do that?


    void PCA::kernel_pca(MatrixXd & dataPoints, unsigned int dimSpace)
    int m = dataPoints.rows();
    int n = dataPoints.cols();

    Kernel k = new Kernel(dataPoints);

    // ''centralize''
    MatrixXd K_Centralized = k->get()
    - MatrixXd::Ones(n, n)*k->get()
    - k->get()*MatrixXd::Ones(n, n)
    + MatrixXd::Ones(n, n)*k->get()*MatrixXd::Ones(n, n);
    std::cout << "Centralized" << std::endl;

    // compute the eigenvalue on the K_Centralized Matrix
    EigenSolver m_solve(K_Centralized);
    std::cout << "got the eigenvalues, eigenvectors" << std::endl;
    VectorXd eigenvalues = m_solve.eigenvalues().real();
    MatrixXd eigenVectors = m_solve.eigenvectors().real();

    // sort and get the permutation indices
    PermutationIndices pi;
    for (int i = 0 ; i < n; i++)
    pi.push_back(std::make_pair(-eigenvalues(i), i));

    std::sort(pi.begin(), pi.end());

    // get top eigenvectors
    _result = MatrixXd::Zero(n, dimSpace);
    for (unsigned int i = 0; i < dimSpace; i++)
    _result.col(i) = eigenVectors.col(pi[i].second); // permutation indices

    MatrixXd sqrtE = MatrixXd::Zero(dimSpace, dimSpace);
    for (unsigned int i = 0; i < dimSpace; i++)
    sqrtE(i, i) = sqrt(-pi[i].first);

    // get the final data projection
    _result = (sqrtE * _result.transpose()).transpose();
    delete k;