Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search

Thursday, July 28, 2011

Learn how to select display ads

Quite an interesting talk by Andrei, by the way. I was particularly impressed by how well linear regression works for this problem (well, linear regression always works well, and it's so simple).

Big data here though? Simple algorithms often work well when given enough data, as I am sure you know, Antonio.

If you haven't seen the classic Banko and Brill paper showing this, by the way, it's a great one:

http://dl.acm.org/citation.cfm?id=1073017

Figure 1 is the key. The best algorithm becomes the worst with more data; all the other algorithms, even the more complicated ones, work essentially equally well given a lot of data.

Oversimplifying a bit, the lesson is that algorithms that can use all the data will tend to work almost equally well when given a lot of it, so you should focus on pitting the simplest and most efficient ones against the others to see whether the extra complexity is worth the computational cost.
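To make that comparison concrete, here is a minimal sketch of the kind of experiment I mean: train a simple model and a more expensive one on growing amounts of data and record both the test error and the time spent. Everything in it is an assumption for illustration, not anything from the talk or the paper: the synthetic linear data, the choice of one-feature least squares versus brute-force k-nearest neighbours, and the helper names (`make_data`, `fit_linear`, `fit_knn`, `learning_curve`). Since the data is linear by construction, it favours the simple model; the point is the harness, not the numbers.

```python
import random
import time


def make_data(n, rng):
    """Synthetic data (an assumption): y = 2x + 1 plus unit Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Ordinary least squares with one feature; returns a predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression; all the work happens at prediction time."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on held-out data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(sizes, seed=0):
    """For each training size, return test error and elapsed time per model."""
    rng = random.Random(seed)
    test_x, test_y = make_data(200, rng)
    rows = []
    for n in sizes:
        train_x, train_y = make_data(n, rng)
        row = {"n": n}
        for name, fit in (("linear", fit_linear), ("knn", fit_knn)):
            start = time.perf_counter()
            model = fit(train_x, train_y)
            error = mse(model, test_x, test_y)
            row[name] = (error, time.perf_counter() - start)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in learning_curve([20, 200, 2000]):
        lin_err, lin_sec = row["linear"]
        knn_err, knn_sec = row["knn"]
        print(f"n={row['n']:5d}  linear: mse={lin_err:.3f} t={lin_sec:.4f}s"
              f"  knn: mse={knn_err:.3f} t={knn_sec:.4f}s")
```

If the error columns end up close together at the larger sizes while the time columns keep diverging, that is the signal to stay with the simpler model.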

It's always like this ;-)
