Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Wednesday, December 25, 2013
Egg drops
Given k eggs and n floors in a building, find the minimum number of trials required to determine the lowest floor from which when we drop an egg it does not break. Provide a Dynamic Programming solution.
Tuesday, December 24, 2013
Bridge Matching
n cities are sitting on the northern bank of a river and n cities are sitting on the southern bank. You can build a bridge only between cities with the same number. Provide a dynamic programming solution.
Monday, December 23, 2013
LBS
Given an array of integers find the longest bitonic sequence. Provide a dynamic programming solution
Sunday, December 22, 2013
Box Stacking
Given a set of 3d boxes compute the largest stack of boxes. A box can be stacked only on the top of another box with lager base. Provide a dynamic programming solution for stacking the boxes
Saturday, December 21, 2013
Boolean Parentization
given a Boolean expression of and, or, xor, true, false, find the number of ways to parenthesize and evaluate to true. Provide a dynamic programming solution
Friday, December 20, 2013
Box Stacking
Given a set of 3d boxes compute the largest stack of boxes. A box can be stacked only on the top of another box with lager base.
Thursday, December 19, 2013
Maximum value contiguous subsequence
given a real numbers’ array find a contiguous subsequence with max sum. Provide a dynamic programming solution
Wednesday, December 18, 2013
Balanced partition
given an array of integers between 0 and M divide the integers into two set such that difference of sums is minimized. Provide a dynamic programming solution
Tuesday, December 17, 2013
Scheduling
given n jobs each of which with a processing time, a profit and a deadline maximize the profit. Provide a dynamic programming solution
Monday, December 16, 2013
Print the binary representation of an unsigned
Sunday, December 15, 2013
compute whether or not an unsigned number is a power of two
Saturday, December 14, 2013
Find the position with only one bit set in an unsigned
Friday, December 13, 2013
given an unsigned int swap odds and even bits
Thursday, December 12, 2013
Wednesday, December 11, 2013
Add two numbers without using arithmetic operators
Tuesday, December 10, 2013
Saturday, November 9, 2013
Chapter 3 is done
This chapter is about Graph computation and there are classical algorithms (BFS, DFS, MST). MST is implementing the classical Prim's algo but using Fibonacci heap available in Boost. Here you find the first version of the graph code
Wednesday, October 30, 2013
Writing a programming / algorithmic book
I wanted to take on a new challenge and start writing a book about algorithms and design. The intent is to have Simplicity by Design (SD): every Chapter is a collection of 15-30 questions, followed by the solution, followed by a C++ code snippet which can be easily re-used.
So far, I have drafts for 3 chapters: bit programming, dynamic programming, and graphs. Here an example of graph code with very classical questions: implement a graph, implement bfs, implement dfs, implement topological sort, I already have code for other graph problems. Dynamic programming has more mathematical content and I will publish an excerpt next weeks.
A number of questions:
So far, I have drafts for 3 chapters: bit programming, dynamic programming, and graphs. Here an example of graph code with very classical questions: implement a graph, implement bfs, implement dfs, implement topological sort, I already have code for other graph problems. Dynamic programming has more mathematical content and I will publish an excerpt next weeks.
A number of questions:
- Should I publish code for public review? If so, can i still use it for the book?
- Should I publish questions for public review?
I think that this will be one of my side projects off work for the next year or so.
Sunday, September 29, 2013
Create your Team
you have a pool of players. Each player ha a set of skills. For creating a team you need to hire the minimum number of players who have together a predetermined set of skills
Wednesday, September 25, 2013
P0 - what people are saying
Here some previous postings
Find i = n^3 + m^3
how can a number be expressed as sum of two cubes?
52 cards
You have 52 cards half of them are red and half of them are black. You have a given amount of money M and every time you can bet k < M money. If you predict the right colour you will win 2k money, if not you will loose k. What is the best strategy?
Sort lexicographically an array of strings in C++ and in C
Tuesday, September 24, 2013
Monday, September 23, 2013
Conflits in k keywords
A keyword is made up of n bits. Given k keywords generated with uniform probability what is the probability to have collision? Provide an evaluation and write the code.
Sunday, September 22, 2013
Radix Sort -
Write a radix sort implementation in C++
Saturday, September 21, 2013
PageZero and Ted
I enjoyed attending Ted and give a talk about PageZero. While I was listening a number of very inspiring talks, I asked to myself would this PageZero really help users?
So I started to write {Ted} and immediately PageZero suggested {Ted Talks} as first suggestions and {Speakers} Speakers as su-bintent. That's exactly what I was looking for!!! I am impressed.
Friday, September 20, 2013
Buying with constrains
You have a bag with capacity C and can insert items with value wi which will occupy space si. You cannot exceed the capacity. Write an algorithm in C++
Thursday, September 19, 2013
Project PageZero
SearchBox never changed since 1993, when the first search engine was created. Bing is disrupting the box by bridging the gap with the World of Entities. Names, Web Sites, and Generic Entities are directly accessible and understood as soon as you type.
http://www.slideshare.net/antoniogulli/project-page-zero-bi
http://www.slideshare.net/antoniogulli/project-page-zero-bi
Wednesday, September 18, 2013
Tuesday, September 17, 2013
k balls, n floors
There is a building with n floors and you have k balls. Balls are identical and when you drop them they can break or not. You need to figure out the highest floor of the building an egg can be dropped without breaking.
Monday, September 16, 2013
In a network of n nodes elect a leader
what is the algorithm you would suggest?
Sunday, September 15, 2013
In a stream of integers find the ones that appears at least k times
Optimal in space and time
Saturday, September 14, 2013
Friday, September 13, 2013
Design Patterns : C++ full collection of Gamma's patterns
Design Patterns : C++ full collection of Gamma's patterns
Full collection of Gamma's patterns in Full collection of Gamma's patterns in c++:
- Creational: Abstract Factory, Builder, Factory, Prototype, Object Pool, Singleton,
- Structural: Adapter, Bridge, Composite, Decorator, Facade, Flyweight, Proxy patterns,
- Behavioral: Chain of Responsibility, Command, Iterator, Mediator, Memento, Observer, State, Strategy, Template, Visitor, and Interpreter
Thursday, September 12, 2013
Capped payrolls
In a company there are different payrolls pi, and we want to find a cap C such that the total amount of money paid is not above a maxim M. How can you determine C?
Wednesday, September 11, 2013
Int 16 bit - compute the sqrt
Given an integer in 16 bits compute the sqrt
Tuesday, September 10, 2013
Given an array of sorted lists merge them
in optimal time and space
Monday, September 9, 2013
Sunday, September 8, 2013
Count all the safe configurations of queens on a chessboard
Saturday, September 7, 2013
Given a BST and two nodes find the common ancestor
Friday, September 6, 2013
How can you detect the endianess of your computer?
Given an array of integers longest non decreasing
Compute and print the longest non decreasing sequence of integers
Thursday, September 5, 2013
Given a list of words, detect the longest word
made by concatenating other words in the list
Wednesday, September 4, 2013
Given a stream of integers select k random entries
Tuesday, September 3, 2013
Given two integers select the maximum with no comparison
Monday, September 2, 2013
Sunday, September 1, 2013
Given two numbers a and b
Swap them with no temp variable. How many way do you know for achieving that?
Saturday, August 31, 2013
Friday, August 30, 2013
What is the probability of dropping m balls in the same bin?
Suppose there are n bins
Thursday, August 29, 2013
Load balancing
We have a set of task i={1, ..., n} and each one has a cost b_i, how can we assigned to m servers so that the load is balanced?
Wednesday, August 28, 2013
Tuesday, August 27, 2013
Given an array of 1 million entries
Pick the smallest int.
Monday, August 26, 2013
Sunday, August 25, 2013
Triangle and Ants
On a triangle, there are 3 ants what is the probability that they collide when they move
Saturday, August 24, 2013
In a number of 64 bits
Swap odd and even bits.
Sunday, August 18, 2013
What is wrong with this code?
string FindAddr( listemps, string name ) { for( list ::iterator i = emps.begin(); i != emps.end(); i++ ) { if( *i == name ) { return i->addr; } } return ""; }
Friday, August 16, 2013
generate strings
Write a program that generates from the string "abcdefghijklmnopqrstuvwxyz{" the following:
a
bcb
cdedc
defgfed
efghihgfe
fghijkjihgf
ghijklmlkjihg
hijklmnonmlkjih
ijklmnopqponmlkji
jklmnopqrsrqponmlkj
klmnopqrstutsrqponmlk
lmnopqrstuvwvutsrqponml
mnopqrstuvwxyxwvutsrqponm
nopqrstuvwxyz{zyxwvutsrqpon
Thursday, August 15, 2013
Add two integers without using '+'
How can you add two integers without using + operator
Wednesday, August 7, 2013
Swap two integers
with no temp. How many ways do you know?
how many ways do you know to swap variables?
without using temp vars?
Sunday, August 4, 2013
Use a dictionary
Given a string like "thisisalongstringwithmisakenoten" and a dictionary of words, split the string so that the number of non matched words are minimized. Try to optimize the code
Saturday, August 3, 2013
Implement a smartpointer class
using templates in C++
Smartpointer s1(T * ptr)
Smartpointer s1(Smartpointer & s2)
and the assigment
s1=s2
Smartpointer s1(T * ptr)
Smartpointer s1(Smartpointer & s2)
and the assigment
s1=s2
Friday, August 2, 2013
Allocate a matrix of ints reducing the fragmentation
Wednesday, July 31, 2013
Shuffle an array of n elements
Suppose to have a uniform random generator in [0, 1)
Tuesday, July 30, 2013
Intersect two list of integers
what is the complexity
Monday, July 29, 2013
Monday, July 22, 2013
Bing vs Google suggestions
Spot the difference on {royal baby} - which on has better suggestions the unbranded search engine or the other one?
Friday, July 19, 2013
What they've said about new Knowledge Suggestions @ Bing
"This is one area where Bing has a clearly better user experience than Google, if you’re looking to get actionable results from the actual search box. While Google will put a Pitbull Google+ profile in its autosuggest feature, it does little with suggestions for any of the examples Bing provides"
That’s not to say Google’s Knowledge Graph results don’t provide ways for users to distinguish what they’re looking for, but you have to get through the search query before you can tell Google what you’re actually looking for (like the Harry Potter books vs. movies, for example). Of course, you could always specify in your actual query."
Bing Autosuggest Expands to Take on Instant Search, Knowledge Graph In One Fell Swoop - http://t.co/nPS4Z6qufz
Bing has taken search suggestions, autocomplete, and semantic search to a whole new level. -- http://t.co/nPS4Z6qufz
I find the experience pretty darn pleasant. Controversial article but our share is increasing. http://t.co/8SryXw61Eb via @MarketingPilgrm
PCMag talks about the new searchbox with Knowledge understanding -http://t.co/pTf0J2xpk6
Techcrunch talks about Bing Autosuggest - http://t.co/0DgvxFq8OU
According to webpronews we clearly beats Google - http://t.co/v5HIA9jE1Q
According to webpronews we clearly beats Google - http://t.co/v5HIA9jE1Q
Searchengineland talks about the searchbox disruption - http://t.co/vLbOjnoKSs
Microsoft is pushing a bit further by surfacing these differences directly below the search box. -- http://t.co/LcQRmNU7BR
Help vague Bing users get off of Microsoft's landing page and on with their lives.http://t.co/FeC6kax9EJ via @engadget
Other articles:
Bing Autosuggest Categories Revamped
UberGizmo (blog)
Microsoft expands Bing’s autosuggest for people to include animals, brands, movies, albums, places, and more
thenextweb.com
Marketing
Pilgrim
MediaPost
IN
Wednesday, July 17, 2013
Disrupting the searchbox
Today,
I published a blog on Microsoft search and it has been quoted by many
I published a blog on Microsoft search and it has been quoted by many
“This
is one area where Bing has a clearly better user experience than Google, if
you’re looking to get actionable results from the actual search box. While
Google will put a Pitbull Google+ profile in its autosuggest feature, it does
little with suggestions for any of the examples Bing provides.
That’s
not to say Google’s Knowledge Graph results don’t provide ways for users to
distinguish what they’re looking for, but you have to get through the search
query before you can tell Google what you’re actually looking for (like the
Harry Potter books vs. movies, for example). Of course, you could always
specify in your actual query.”
Sunday, July 14, 2013
SQL-like engines on the top of hadoop map/reduce
Apache Kafka
A nice system for distributed logging from multiple platform - apache kafka
Thursday, July 11, 2013
Building large scale systems at Twitter
Very interesting article for building large systems at Twitter. Worth reading it, use of sharding on the top of standard mysql makes the trick (gizzard - for sharding , flockdb - for graph db)
https://twitter.com/zipkinproject
http://www.youtube.com/watch?v=5cKTP36HVgI
http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html
http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html
https://twitter.com/zipkinproject
http://www.youtube.com/watch?v=5cKTP36HVgI
http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html
http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html
Tuesday, July 9, 2013
Monday, July 8, 2013
Othello
Othello is a game played on a square board, with black and white pieces. If a white/black piece is surrounded by a black/white one (either left/right or up/down) then the piece will change color. Players play in turn until they have valid moves left. Implement Othello following Object Oriented criteria (e.g. what is a piece, what is a board, what is a game, who keeps the score, what is a valid move).
represent a double in [0,1) as binary
First need to understand what is the binary representation for a double in [0,1). Let's see an analogy. If the number is an integer such as 123 I would probably make something like this
2 ^ 7 = 128 > 123 too large
123 - 2 ^ 6 = 59 -> set bit 1
59 - 2 ^ 5 = 27 -> set bit 1
27 - 2 ^ 4 = 11 -> set bit 1
11 - 2 ^ 3 = 3 -> set bit 1
3 - 2 ^ 2 < 0 -> set bit 0
3 - 2 ^ 1 = 1 -> set bit 1
1 1 1 1 0 1 1
SearchCharm - Unified Search Experience in Windows 8.1
I am using windows 8.1 preview since many weeks without any problem. Search experience brings together the online and the offline worlds and it allows users to look for information virtually everywhere.
Traditional Web Search
Do I want to access cnn? Easy. I start to insert the query and I get both the web site and some relevant suggestions including cnn news and cnn money. Suggestions are generated by my group in London.
Navigating windows
Do I want to access windows control panel? No problem, I just use the same interface and search paradigm.
Cloud search
Local search
Traditional Web Search
Do I want to access cnn? Easy. I start to insert the query and I get both the web site and some relevant suggestions including cnn news and cnn money. Suggestions are generated by my group in London.
Do I want to access windows control panel? No problem, I just use the same interface and search paradigm.
Cloud search
What if I want to access my files stored in the cloud? Here I have my cv saved on Microsoft Skydrive and accessing it is as easy as searching via the charm
What if I want to access my files stored in my local computer? Here I have some videos saved on my local computer and accessing them is as easy as searching via the charm
Sunday, July 7, 2013
Given a very large binary tree T1 and a small binary T2
check whether T2 is contained in T1. Discuss your solution
Saturday, July 6, 2013
Given a binary tree, create a list of lists containing all the nodes in one level
write some c++ code / STL. Optimal in space and time
Friday, July 5, 2013
Wednesday, July 3, 2013
Search as you type with snapshots and disambiguation
Exciting day today! Bing is shipping an innovative search feature aiming at helping users to complete their tasks, as soon as they start to type their queries.
Another example is the query {pluto}. Is the user interested in the planet or is she interested in the cartoon? Those are very different kinds of entities which happens to have the same name but Bing understands the difference and gives to the user the opportunity to select the desired choice.
Now let's suppose that the intent is not ambiguous. For instance consider the keyword {michele obama}, we believe that it is very valuable to have a rich suggestion like the following one.
And what if there are two persons with the same name? In this case, we offer the opportunity to select the desired person as in this example for the query prefix {rachel nich}
One more example. Let's consider the query {u2}. Here, we offer the opportunity to explore all the entities / artists that are playing with the pop band
Hope you enjoy it - this is the beginning.
Consider the keyword "pitbull". This is a very interesting query because the user can be interested in getting either results about the famous singer (a person) or about the dog (an animal). For this example, we believe no instantaneous decision can be taken on the user's behalf and that only the user has a complete understanding of the intent expressed with a query.
However, we believe that the search engine can help the user to express very quickly this intent. How? In Bing we use the power of Satori (our Microsoft Knowledge Graph) to enrich our suggestions with additional information useful for disambiguating different meanings associated with the query. In this way, the users can select their own most appropriate choice.
However, we believe that the search engine can help the user to express very quickly this intent. How? In Bing we use the power of Satori (our Microsoft Knowledge Graph) to enrich our suggestions with additional information useful for disambiguating different meanings associated with the query. In this way, the users can select their own most appropriate choice.
Another example is the query {pluto}. Is the user interested in the planet or is she interested in the cartoon? Those are very different kinds of entities which happens to have the same name but Bing understands the difference and gives to the user the opportunity to select the desired choice.
Now let's suppose that the intent is not ambiguous. For instance consider the keyword {michele obama}, we believe that it is very valuable to have a rich suggestion like the following one.
And what if there are two persons with the same name? In this case, we offer the opportunity to select the desired person as in this example for the query prefix {rachel nich}
One more example. Let's consider the query {u2}. Here, we offer the opportunity to explore all the entities / artists that are playing with the pop band
And what if you search for {harry potter}? In this case Bing provides a snapshot of the filmography, a reference to the book and a reference to the character.
Hope you enjoy it - this is the beginning.
Sunday, May 12, 2013
Friday, May 10, 2013
Stocastic Gradient Descent
Added a stocastic gradient descent to the linear regression code
Thursday, May 9, 2013
Learning linear regression with gradient descend
Last week I restarted an old and good behavior (see A collection of algos and data structures published here). Every day, I take an well known algorithm and code it in boost and C++. Nothing else, just pure training and geeky fun. The only constrain is the time limit of 45mins, after a running session in St. James park.
Here you have the code for linear regression with gradient descent in C++, boost, and ublas. Linear regression is an approach to modeling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X. Ublas is a powerful set of c routines for efficient matrix and vector computations, Boost:: numeric provides an elegant C++ way of using ublas based on templates.
Here you have the code
Here you have the code for linear regression with gradient descent in C++, boost, and ublas. Linear regression is an approach to modeling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X. Ublas is a powerful set of c routines for efficient matrix and vector computations, Boost:: numeric provides an elegant C++ way of using ublas based on templates.
Here you have the code
Wednesday, April 3, 2013
Scaling requests with MemCached
A nice paper from Facebook about the use of memcached for scaling requests. Classical UDP connection-less techniques, are mixed with TCP coalescing via mcrouter. Also, they implement flow control, new mechanism for cache miss, and excess of requests on a particular key. The concept of pool is quite interesting.
Monday, April 1, 2013
Implement a binary search
with minimum number of comparisons
Saturday, March 30, 2013
Friday, March 29, 2013
Find largest sum in a subarray
Given an array that may contain both positive and negative integers, find the sum of contiguous subarray of numbers which has the largest sum.
Hint: there is a divide & conquer solution in O(nlogn) and one cool solution in O(n)
Thursday, March 28, 2013
Given a sorted array which is rotated N times(N is unknown), search a number in it
Hint: binary search with modification
Wednesday, March 27, 2013
Find the maxium integer in an array of size N, in O(N) and additional space O(1)
It's easy to make this in O(N) and O(N), or in O(N) and O(K) where K is the maximum int value contained in the array. However, this is requiring O(N) and O(1).
Hint: one way of doing this is to modify the original array.
Hint: one way of doing this is to modify the original array.
Tuesday, March 26, 2013
My talk at ECIR2013
Gave an invited talk @ ECIR2013 and covered:
- Bing, as a platform, new trends for Microsoft search
- Bing suggestion technologies, an overview
- Bing & Cosmos, massive data mining models compared to Hadoop, Map & Reduce, Hive, Pig
Monday, March 25, 2013
Posted a blog about autosuggest for Bing
Last week, I posted a blog about autosuggest for Bing
Sunday, March 24, 2013
Given n sets, intersect them. preserve the duplication
Saturday, March 23, 2013
Find the row with max num of 1s
Given a matrix nxm where all the rows are 0/1 and sorted. Find the row with max num of 1s.
Friday, March 22, 2013
Thursday, March 21, 2013
Wednesday, March 20, 2013
Monday, March 11, 2013
Sorted array and rotated
Given a sorted array of distinct elements, and the array is rotated at an unknown position. Find minimum element in the array.
Sunday, March 10, 2013
Given an array of positive integers, select two sub-sequences having minimum difference
Hint: one element can be either in one sub-sequence or in the other, so this is a typical recursive problem
Saturday, March 9, 2013
A sorted array with duplicates. Count #key
In O(logN)
Friday, March 8, 2013
Thursday, March 7, 2013
Design a data structure where min, max are O(1) , delete and insert are O(logN)
Hint: min is O(1) for an min-heap, delete and insert are O(log N). Same for max. Question is how to make it for both min and max
Wednesday, March 6, 2013
Circular pumps
Given a circular trace with pumps and a car running on a trace. Each pump has a well known distance from the origin, and the amount of kilometers you can run if you stop and refuel your car. Find a suitable starting point so that the car will not run out of fuel and the number of stops are minimal.
Tuesday, March 5, 2013
Special sort
Given an array of numbers, arrange them in a way that generate the largest value. For instance, if the given numbers are {34, 345, 348, 50}, the arrangement 5034834534 gives the largest value.
Monday, March 4, 2013
Visit a BST in level order
Visit a BST in level order.
Visit a BST in reverse level order
Visit a BST in reverse level order
Sunday, March 3, 2013
Saturday, March 2, 2013
Friday, March 1, 2013
Find a pair in a BST with sum 0
Given a BST of integers find a pair with sum 0.
Tuesday, February 12, 2013
Class of problems with Facebook Graph search
Several classes of problems with Facebook graph search have been reported. Here there is a list:
Plus, they frequently give no search results for suggested queries
Plus, they frequently give no search results for suggested queries
Thursday, February 7, 2013
Given a set of time intervals in any order, merge all overlapping intervals into one and output the result which should have only mutually exclusive intervals
Tuesday, February 5, 2013
Facebook naive graph search and the need of proper query understanding algos
Search engines use multiple data sources to help users in their query formulation. Among those sources three are very important:
- The web graph, which includes all the web pages published on the net. Mining the content of those pages and their connections can be very useful for detecting topics/entities and for understanding the correlation among those topics/entities;
- The query graph, which includes all the queries submitted to a search engine in the past. Mining this graph can be useful for detecting what are the needs expressed by past users and how those needs change over the time;
- The blog and news graph, which are useful for detecting fresh content;
Facebook is very interesting for search because they have access to other sources of data including:
Freshness
Today, the most important financial news is Dell's acquisition and every single newspaper or financial web site report the news. Bing (and Google) both have this information in their suggestion, while Facebook is completely missing it. It seems like they are not analyzing the news graph (3rd source in the above list)
In the below example, the last two suggestions are provided by Bing. Facebook just tries to complete the prefix {microsoft and} with stuff such as {London} and {Pisa}, which are popular in my social graph but have NO correlation with Microsoft and with today's news.
Query understanding
So, If you spend sometime analyzing Facebook suggestion it's clear that they create synthetic suggestions trying to complete every query prefix with stuffs popular in my social graph. This approach is naive because they DON'T seem to understand the meaning of what the user is looking for. Here there are some more examples:
Again, "Happy Diwali", "Spotify", "Microsoft" and "London" are popular in my social graph but they have NO correlation with Tom Cruise. Still, Facebook is naively generating those suggestions.
A proper search engine understands the right correlation by analyzing the Web graph, the query graph and the news graph something that Facebook is missing at the moment.
- The friend graph, which keeps track of all the friendship relations for each user;
- The content graph, which includes all the postings and the content produced by your friends.
Freshness
Today, the most important financial news is Dell's acquisition and every single newspaper or financial web site report the news. Bing (and Google) both have this information in their suggestion, while Facebook is completely missing it. It seems like they are not analyzing the news graph (3rd source in the above list)
In the below example, the last two suggestions are provided by Bing. Facebook just tries to complete the prefix {microsoft and} with stuff such as {London} and {Pisa}, which are popular in my social graph but have NO correlation with Microsoft and with today's news.
Query understanding
So, If you spend sometime analyzing Facebook suggestion it's clear that they create synthetic suggestions trying to complete every query prefix with stuffs popular in my social graph. This approach is naive because they DON'T seem to understand the meaning of what the user is looking for. Here there are some more examples:
Again, "Happy Diwali", "Spotify", "Microsoft" and "London" are popular in my social graph but they have NO correlation with Tom Cruise. Still, Facebook is naively generating those suggestions.
A proper search engine understands the right correlation by analyzing the Web graph, the query graph and the news graph something that Facebook is missing at the moment.
There are many web sites out there which clearly say that tom cruise is correlated with {scientology}, {katie holmes} (his wife), {suri} (his daughter), and {cameron diaz} (since they performed in many films together), and {nicole kidman} (his past wife). However, Facebook does not capture the intent of user query and does not understand the correlation among those entities. Let's see another example:
Last suggestion is from Bing, while Facebook keeps suggesting the same {Happy Diwali} stuff which is popular in my social graph. Again, they are missing a quality access to the 2nd and 3rd sources of information in my above list.
Bing (and Google) analyse the Web graph, the query graph, the news graph and many other sources and therefore they detect the right correlations among entities (strangely enough Google misses the correlation between {Obama} and {Michelle}).
Please note that if you try the above examples you may get different suggestions from Facebook because those are personalized with stuff popular in your social graph. However, it's easy to reproduce similar naive suggestions when you understand the pattern.
Friday, February 1, 2013
Can you search a list with length n in less than O(n)?
Thursday, January 24, 2013
Facebook Graph Search and problems with synthetic suggestions
I spent sometime playing with the suggestions mined by Facebook for their recent Graph Search. It seems to me that many of them are synthetic e.g. they are generated by mining the friends's graph and not the queries past queries submitted by users.
This approach seems to create many nonsensical suggestions. In this example, I was searching {hotels in new york city close} and the top suggestions are "hotels in new york city close to London", which is clearly a no sense. It seems to me that they are using some kind of frequent itemset algorithm and they notice that I have many friends in London, hence the synthetically generated nonsensical suggestion. As you see, Bing suggestions are not suffering this problem and correct locations are shown.
Another example: I am an Italian guy and for me Milan is in Italy (btw, this Milan is way more popular on the Web that the one in Tennessee). Plus, none of those Milan cities are near to London. Web suggestions are nailing the correct location "il duomo" is central area in Milan, Italy.
Here there is another example of nonsensical suggestion. I assume that there are no Google employees who work for Microsoft at the same time. However, both the companies are quite popular in my social network hence the synthetic suggestion.
Now, let's consider another class of defects. There are queries which are not directly answered and the proposed suggestions are quite unrelated. In this example, I was searching for {actors who won the oscar} and Facebook graph search suggestions are clearly unrelated since I am not interested in "actors who like Oscar wilde"
Here there is another example of unrelated suggestion. I am searching {visit the taja} and this is not related to "People named 'taja' who work at Visit Holland". In this case the semantic of the query is quite not understood.
If you try to replicate my examples, please note that you could get a different set of suggestions because those are generated by using your personal friend graph. However, it would be easy to generate similar situations once you understand the pattern.
See also my past postings:
This approach seems to create many nonsensical suggestions. In this example, I was searching {hotels in new york city close} and the top suggestions are "hotels in new york city close to London", which is clearly a no sense. It seems to me that they are using some kind of frequent itemset algorithm and they notice that I have many friends in London, hence the synthetically generated nonsensical suggestion. As you see, Bing suggestions are not suffering this problem and correct locations are shown.
Another example: I am an Italian guy and for me Milan is in Italy (btw, this Milan is way more popular on the Web that the one in Tennessee). Plus, none of those Milan cities are near to London. Web suggestions are nailing the correct location "il duomo" is central area in Milan, Italy.
Here there is another example of nonsensical suggestion. I assume that there are no Google employees who work for Microsoft at the same time. However, both the companies are quite popular in my social network hence the synthetic suggestion.
Now, let's consider another class of defects. There are queries which are not directly answered and the proposed suggestions are quite unrelated. In this example, I was searching for {actors who won the oscar} and Facebook graph search suggestions are clearly unrelated since I am not interested in "actors who like Oscar wilde"
Here there is another example of unrelated suggestion. I am searching {visit the taja} and this is not related to "People named 'taja' who work at Visit Holland". In this case the semantic of the query is quite not understood.
If you try to replicate my examples, please note that you could get a different set of suggestions because those are generated by using your personal friend graph. However, it would be easy to generate similar situations once you understand the pattern.
See also my past postings:
Friday, January 18, 2013
Spelling, Spelling, Spelling. Facebook Graph Search and Bing Web Search
Continuing with my personal analysis on the Facebook Graph Search and Bing Web Search's integration, here is another example of why you can get the best of two worlds. In this case, I will discuss misspelled words, a common problem for English (see for instance this funny article: "Why misspelled names are so common and what journalists are doing to prevent them").
Let's suppose that users start to type the word {swarzin}: their intent is to get information about the actor, but they are unsure of the correct spelling. As you see in the below image, Bing’s Speller processes tens of millions of data points mined from searches, web pages, clicks and user actions to help you find the right “Schwarzenegger".
Another example is {terramis} a delicious Italina cake of which the correct spelling is {tiramisu} - technically, {terramis} is a mispelled query prefix for the full query{tiramisu}. Other mispelled prefixes are {bufet warr} and one of the myriad variants of {britnay s}. In all of these cases, Bing nails the correct suggestions and are integrated into Facebook typeahead search.
Bing has a significant investment in building a state-of-art speller and those technologies are also leveraged by our team in London in securing the right suggestions. In this context, the additional problem is dealing with the mispelled fragments represented by the query prefix, correct the errors and, at the same time, predict which the more relevant suggestions are, and we need to do this every time new characters are typed in real time!
Can you imagine how fast the predictions and corrections need to be?
Let's suppose that users start to type the word {swarzin}: their intent is to get information about the actor, but they are unsure of the correct spelling. As you see in the below image, Bing’s Speller processes tens of millions of data points mined from searches, web pages, clicks and user actions to help you find the right “Schwarzenegger".
Another example is {terramis} a delicious Italina cake of which the correct spelling is {tiramisu} - technically, {terramis} is a mispelled query prefix for the full query{tiramisu}. Other mispelled prefixes are {bufet warr} and one of the myriad variants of {britnay s}. In all of these cases, Bing nails the correct suggestions and are integrated into Facebook typeahead search.
Bing has a significant investment in building a state-of-art speller and those technologies are also leveraged by our team in London in securing the right suggestions. In this context, the additional problem is dealing with the mispelled fragments represented by the query prefix, correct the errors and, at the same time, predict which the more relevant suggestions are, and we need to do this every time new characters are typed in real time!
Can you imagine how fast the predictions and corrections need to be?
Thursday, January 17, 2013
Facebook Graph Search, with Bing Web Search
Many people asked what are the benefits of merging Facebook internal search (aka Facebook Graph Search), with Bing Web Search results. This is my poster query {Conrad Bain} just died, and Facebook knows many people with that name from their internal data, while Bing nails the name of the actor performing 'Different Strokes' series.
Try your own queries and give us feedback
Try your own queries and give us feedback
Subscribe to:
Posts (Atom)