The k_means() implementation here does what the literature describes as "batch update", ie centroids are updated only after a full scan of all the points is done, as against an "online" approach (possibly less efficient) where both, the 'from' cluster and 'to' cluster's centroids are updated as soon as the 'movement' is established. The batch mode can be thought of as providing a fast but potentially only approximate solution as a starting point for the second phase. % phase

Il 23/07/2012 11:53, Neil D'Souza ha scritto: > Dear Antonio, > I downloaded your code for k-means (in your blogspot dated:Wednesday, March 4, 2009) > and plan to use in an Open source (GPL) product I am developing. > I just wanted to confirm the license you are releasing the code under, hopefully, GPL or freeBSD or public domain (since your file does not have a copyright notice). > Many thanks for your help in advance. > Kind Regards, > Neil Please feel free to use it but quote the origin. Cheers Antonio

At first, sorry for my english. I'm using your code (and I will refere you in my document) but I have some problems using the function. I hace one vector with 3 componentes (r, g, b) and the size of the vector is variable. I wrote this in my code: ClusterId num_clusters = 3; PointId num_points = puntos; Dimensions num_dimensions = 3;

PointsSpace ps(num_points, num_dimensions);

Clusters clusters(num_clusters, ps);

clusters.k_means();

I suposse that num_clusters is k, num_points the total size (is a variable call puntos for me) and num_dimensions is the dimensions of the gaussian (3, because I have r, g and b). But I don't know how to introduce my vector for do the cluster, and how to save the 3 parameters of a gaussian.

It is just a warning for users. The implemented initialization of centroids (see in function Clusters::initial_partition_points()) may cause problems. E.g., for 2D data {{0, 0},{0.1, -0.1},{-0.1, 0},{-0.1, -0.1},{1, 1},{0.9, 1.1},{0.9, 1},{0, 1},{0, 0.9},{1, 0},{1, 0.1},{1.1, 0},{1.1, 0.1}} there are 4 natural clusters. The initial centroids are found to be {0.525,0.5}, {0.666667,0.333333}, {0.6,0.366667}, and {0.333333,0.3}. After one iteration of the algorithm they become {0.665,1.1}, {1.21667,0.133333}, {inf,inf}, and {0.0583333,0.025}. Thus, one cluster is lost ({inf,inf}). Please, look for a clever centroid initialization, for instance, in wikipedia.

They, too believed that we were largely biased; in this case it was a perceived bias against advocacy and confrontation as well as against legitimate representative practices.SEO Course KArachi

Thanks

ReplyDeleteThe k_means() implementation here does what the literature describes as "batch update", ie centroids are updated only after a full scan of all the points is done, as against an "online" approach (possibly less efficient) where both, the 'from' cluster and 'to' cluster's centroids are updated as soon as the 'movement' is established. The batch mode can be thought of as providing a fast but potentially only approximate solution as a starting point for the second phase.

ReplyDelete% phase

Thanks for this Antonio - any details on how to run it though? I just want to cluster a 1D vector of integers.Thanks,

ReplyDeleteTim

Il 23/07/2012 11:53, Neil D'Souza ha scritto:

ReplyDelete> Dear Antonio,

> I downloaded your code for k-means (in your blogspot dated:Wednesday, March 4, 2009)

> and plan to use in an Open source (GPL) product I am developing.

> I just wanted to confirm the license you are releasing the code under, hopefully, GPL or freeBSD or public domain (since your file does not have a copyright notice).

> Many thanks for your help in advance.

> Kind Regards,

> Neil

Please feel free to use it but quote the origin.

Cheers

Antonio

Hello.

ReplyDeleteAt first, sorry for my english. I'm using your code (and I will refere you in my document) but I have some problems using the function. I hace one vector with 3 componentes (r, g, b) and the size of the vector is variable. I wrote this in my code:

ClusterId num_clusters = 3;

PointId num_points = puntos;

Dimensions num_dimensions = 3;

PointsSpace ps(num_points, num_dimensions);

Clusters clusters(num_clusters, ps);

clusters.k_means();

I suposse that num_clusters is k, num_points the total size (is a variable call puntos for me) and num_dimensions is the dimensions of the gaussian (3, because I have r, g and b). But I don't know how to introduce my vector for do the cluster, and how to save the 3 parameters of a gaussian.

Thanks :)

Well! Nice post and i appreciate the the question which clear the process. Thanks

ReplyDeletehard drive recovery

It is just a warning for users. The implemented initialization of centroids (see in function Clusters::initial_partition_points()) may cause problems.

ReplyDeleteE.g., for 2D data

{{0, 0},{0.1, -0.1},{-0.1, 0},{-0.1, -0.1},{1, 1},{0.9, 1.1},{0.9, 1},{0, 1},{0, 0.9},{1, 0},{1, 0.1},{1.1, 0},{1.1, 0.1}}

there are 4 natural clusters. The initial centroids are found to be

{0.525,0.5}, {0.666667,0.333333}, {0.6,0.366667}, and {0.333333,0.3}.

After one iteration of the algorithm they become

{0.665,1.1}, {1.21667,0.133333}, {inf,inf}, and {0.0583333,0.025}.

Thus, one cluster is lost ({inf,inf}). Please, look for a clever centroid initialization, for instance, in wikipedia.

can u tell me how can i cluster an image using this code? i want to provide the values of r, g, b of a point for clustering

ReplyDeleteThey, too believed that we were largely biased; in this case it was a perceived bias against advocacy and confrontation as well as against legitimate representative practices.SEO Course KArachi

ReplyDelete