Antonio Gulli's coding playground: April 2009

Thursday, April 30, 2009

A ten digit number

Write a ten digits number, such that position i indicates the number of i units in the number (e.g. position 3 indicates the number of 3 in the number, and so on).

Wednesday, April 29, 2009

Two missiles are directed one against the other. The first one has a speed of 9,000 Km/h. The secon one has a speed of 21,000 Km/h. They start 1234 Km apart. What is their distance one minute before they collide?

Tuesday, April 28, 2009

Fastest runner is not the winner

In a 100 meters race, the fastest runner is not the winner. His run is a valid one. How is this possible?

Monday, April 27, 2009

Fast Dynamic Reranking in Large Graphs

Fast Dynamic Reranking in Large Graphs is a google paper for learning a ranking function of a graph based on positive and negative votes given to a subset of nodes. A node is ranked high if in
a truncate random walk starting from it the probability of hitting a relevant node before an irrelevant one is large. Here below an example.
.

Sunday, April 26, 2009

Efficient Overlap and Content Reuse Detection in Blogs and Online News Articles

Efficient Overlap and Content Reuse Detection in Blogs and Online News Articles describes a word based signature duplicate detection for news and blogs. good paper, but lacks of comparison with previous state-of-the-art.

Saturday, April 25, 2009

FPGA Acceleration of RankBoost in Web Search Engines

FPGA Acceleration of RankBoost in Web Search Engines is a Microsoft paper about accellerating RankBoost learning algorithm. RankBoost is a variant of AdaBoost, where the weak learners are based on features extracted from the web documents. Microsoft claims of using more that 646 features. Each feature acts as a binary selector and gives a positive vote, if its value is above a threshold. Two techiques are used for speeding up the learning computation:

a binning and histogram based approximation, for the sequential version of the algorithm.
a FGPA-based implementation, for the parallel version of the algorithm.

I enjoyed reading the paper a lot. Just a question, why the authors are not leveraging off-the-shelf GPU processors, such as CUDA nvidia or PSP3?

Friday, April 24, 2009

Find 4 primes such that their sum is 50

Thursday, April 23, 2009

Find a perfect square, cube and fifth power

Find a number that is a perfect square, cube and fifth power and is less than 2*10^14

Wednesday, April 22, 2009

100 random digits

100 digits are chosen at random. What is the probability to find two numbers, a and b, such that a^2=b and each digit is used just one time in forming the two numbers?

Tuesday, April 21, 2009

Two search engines recently launched ( I like them )

SearchMe just expanded its index to anything related to entertainment/media, song, artists, news, video, images, web, whatever. I like the quality offered by the search ranking, the freshness, and the variety or results. The interface is very innovative. My suggestion is to experiment a little more by adopting the different layouts seen in FoxTab.

DuckDuckGo is a simple combination of Yahoo's Boss and Wikipedia. Anyway, I like the clean interface and the concepts extracted from Wikipedia.

Friday, April 17, 2009

A great review on our freshness job

Off-topic, but just want to say thank you to anyone who worked hard on it. These are nice words.

http://www.seobythesea.com/?p=1333

For example, a search for Bruce Springsteen is presently showing a rich mix of images, scheduled events, web pages, and news results for the performer, and if you hover your mouse over the image next to the news results for Bruce Springsteen at Ask.com, the news video starts playing.

They sound interesting to me, too. I do like their focus on providing timely information.

I looked at search results for “Bruce Springsteen” in Google, Yahoo, and Ask, to see how current the information was from each. All three had fairly up-to-date news information. Ask actually had a richer results than the other two in terms of the images that they displayed, and their listing of future “events” or concerts for Springsteen.

Wednesday, April 15, 2009

Bayesan Classifiers

Bayesan Classifiers are one of the most powerful data mining tool. I use them everyday with very good results. I found out that having an accurate and rich training dataset is much more important than having a ultra-sophisticate classifier (such as SVM). Bayesan classifiers have a very simple math, are easy to implement, and provide very good results in almost all the situations.

Here you have a Naive implementation of Naive Bayesan Classifier.

Friday, April 10, 2009

VC does not look good

One Image is worth...

Wednesday, April 8, 2009

Two dice

Can you design two dice so that their sum behave just like a pair of ordinary dice?

E.g. 2 ways of getting 3 (2+1, 1+2). 1 way of getting 12 (6+6), etc

PS: this puzzle is quite interesting. You need a way to encode dice. It turns out this can leverage the representation of polynomial

Tuesday, April 7, 2009

Dropping K Swarovsky balls

Someone asked for a variant of the Dropping Swarovski Crystals problem where you have m balls available and n floors. In this case, what is the number of drops needed?

Let's see: we can formulate the problem with 2 balls as a recursion problem. Let f(k) the maximum number of floors we can test with k drops and using 2 balls. If a ball breaks at a floor k, we need to start from the very ground up to the k floor and test all the k floors. If the ball is not breaking we need to test f(k-1). Therefore, the recursion is f(k) = f(k-1) + k. In other words f(k) = sum_i=1,..k i = k (k+1) / 2. We need to find the minimum k such that k(k+1)/2 > 100. This can be solved by direct test and we find that 14 (15/2) = 105. The number of test we obtain is 14. If we want to know what are the floor to test we can "unroll" the recursion and find the actual floors.

Now suppose you have 3 balls, can you see the problem as a recursion one? And what about m balls?

Monday, April 6, 2009

Sorted Matrix for rows and columns

Given a random matrix, prove that if you sort by rows and then by columns the matrix is still sorted by rows

Sunday, April 5, 2009

15 bags

You have 15 bags. How many crystal balls do you need to have a different number of balls in each bag?

PS: try to solve with less than 30 balls

Saturday, April 4, 2009

Linear Solvers (Successive Over Relaxation)

Successive over-relaxation (SOR) is a linear solver which speed up convergence of the Gauss–Seidel method. Implementation of the method is quite easy and convergence is fast.

SOR has been frequently used for computing PageRank (see Adaptive Methods for the Computation of PageRank , A Survey on PageRank Computing , Comparison of Krylov subspace methods on the PageRank problem ).

Here you have the code for SOR solver in C++

PS: if you are interested in Linear Solvers with C++, I suggest using Dolfin, which provides high-performance linear algebra through uBLAS, PETSc, Trilinos and MTL4 (experimental) with simple C++ and Python wrappers.

Friday, April 3, 2009

Decision Tree (part II)

Added the classification code to the Decision Tree implementation. Classification is a visit of the tree. If the new observation matches the value contained in the internal node, then the true branch if followed. Otherwise, the false branch is followed. Nothing can simpler than this.

Please note that a complete Decision Tree implementation should also incorporate some pruning strategy, but I will leave this to the interested reader ;-)

Thursday, April 2, 2009

Splay Trees in C++

I was looking for a generic implementation of Splay Tree in C++. Splay Tree is a self-balancing binary search tree with the additional property that recently accessed elements are quick to access again. This is useful for competitive analysis and online algorithms. Splay Trees operations such as insertion, look-up and removal are performed in O(log(n)) amortized time.

I found no generic implementation, but the original C implementation of Daniel Sleator and Robert Tarjan. Therefore, I decided to transform this implementation into generic C++ with templates.

Here you have the C++ code for Splay Trees (adapted by the original C version)

CGAL : Computational Geometry Algorithms Library in C++

I was looking for a library implementing Range Trees and Interval Trees, two less known but very useful data structures for retrieving overlapping segments and points. Those data structures are used in databases and in many other contexts.

Well, what I found is a larger golden mine. The Computational Geometry Algorithms Library (CGAL) is a software library that aims to provide easy access to efficient and reliable algorithms in computational geometry. While primarily written in C++, Python bindings are also available.

The table of content shows an impressive amount of already cooked algorithms and data structures, ranging from basic 2D and 3D Geometry Kernel, to Convex Hulls and Delaunay Triangulations, to Voronoi Diagrams.

I found the Search Structures part quite interesting with implementations of 2D Range and Neighbor Search , Interval Skip List, Spatial Searching ( Neighbor Searching , Range Searching ,

d tree ), Range and Segment Trees, Intersecting Sequences of dD Boxes.

The library has additional tools for Linear and Quadratic Programming Solver and Spatial Sorting and support for third party software such as the GUI libraries Qt, Geomview, and the Boost Graph Library. (Gosh all these are together?)

A huge amount of software all using Template C++ with an approach similar to Boost.

I take what I need for my coding task, but this is bookmarked and I will discuss about it more intensively in the future.

Wednesday, April 1, 2009

Proof of the Riemann's hypothesis

Finally they found the proof for Riemann's hypothesis

Fuzzy Clustering

Fuzzy clustering is an interesting data mining process where every item belongs to multiple clusters with different membership values in [0, 1]. As a consequence clusters are not crips but fuzzy (guess what? this is a typical situation in fuzzy logic). Fuzzy c-means clustering is the most famous fuzzy clustering algorithm and can be seen as a variant of the well-known k-means.

Here you have the C++ for Fuzzy Clustering (it uses Boost::uBlas for matrix operations)