Wednesday, January 14, 2009

a rant to the academic community

I like serving conferences as a PC member and reviewing journal papers. Still, I keep seeing papers with elegant theoretical solutions and very poor experimental settings. For instance, you might see a new, elegant text clustering algorithm tested in an unrealistic environment such as: "Let's suppose we take this collection of 4000 articles and cluster them into just 5 clusters; well, then my algorithm beats the state of the art by about 3% in precision".

Please do not accept this type of paper. Web reality is different: you have hundreds of thousands, millions, or even more documents, and you certainly don't have just 5 clusters. Plus, the data keeps evolving, so you may want to consider temporal constraints as well.

