Wednesday, March 4, 2009

K-means in C++

K-means is a classical clustering algorithm..
Here you have a C++ code for K-means clustering.

(Edit: 12/05/013)
See also my more recent posting A new lighter implementation of K-means


  1. The k_means() implementation here does what the literature describes as "batch update", ie centroids are updated only after a full scan of all the points is done, as against an "online" approach (possibly less efficient) where both, the 'from' cluster and 'to' cluster's centroids are updated as soon as the 'movement' is established. The batch mode can be thought of as providing a fast but potentially only approximate solution as a starting point for the second phase.
    % phase

  2. Thanks for this Antonio - any details on how to run it though? I just want to cluster a 1D vector of integers.Thanks,


  3. Il 23/07/2012 11:53, Neil D'Souza ha scritto:
    > Dear Antonio,
    > I downloaded your code for k-means (in your blogspot dated:Wednesday, March 4, 2009)
    > and plan to use in an Open source (GPL) product I am developing.
    > I just wanted to confirm the license you are releasing the code under, hopefully, GPL or freeBSD or public domain (since your file does not have a copyright notice).
    > Many thanks for your help in advance.
    > Kind Regards,
    > Neil
    Please feel free to use it but quote the origin.

  4. Hello.

    At first, sorry for my english. I'm using your code (and I will refere you in my document) but I have some problems using the function. I hace one vector with 3 componentes (r, g, b) and the size of the vector is variable. I wrote this in my code:
    ClusterId num_clusters = 3;
    PointId num_points = puntos;
    Dimensions num_dimensions = 3;

    PointsSpace ps(num_points, num_dimensions);

    Clusters clusters(num_clusters, ps);


    I suposse that num_clusters is k, num_points the total size (is a variable call puntos for me) and num_dimensions is the dimensions of the gaussian (3, because I have r, g and b). But I don't know how to introduce my vector for do the cluster, and how to save the 3 parameters of a gaussian.

    Thanks :)

  5. Well! Nice post and i appreciate the the question which clear the process. Thanks
    hard drive recovery

  6. It is just a warning for users. The implemented initialization of centroids (see in function Clusters::initial_partition_points()) may cause problems.
    E.g., for 2D data
    {{0, 0},{0.1, -0.1},{-0.1, 0},{-0.1, -0.1},{1, 1},{0.9, 1.1},{0.9, 1},{0, 1},{0, 0.9},{1, 0},{1, 0.1},{1.1, 0},{1.1, 0.1}}
    there are 4 natural clusters. The initial centroids are found to be
    {0.525,0.5}, {0.666667,0.333333}, {0.6,0.366667}, and {0.333333,0.3}.
    After one iteration of the algorithm they become
    {0.665,1.1}, {1.21667,0.133333}, {inf,inf}, and {0.0583333,0.025}.
    Thus, one cluster is lost ({inf,inf}). Please, look for a clever centroid initialization, for instance, in wikipedia.

  7. can u tell me how can i cluster an image using this code? i want to provide the values of r, g, b of a point for clustering

  8. They, too believed that we were largely biased; in this case it was a perceived bias against advocacy and confrontation as well as against legitimate representative practices.SEO Course KArachi