"I have a suggestion for our President on how to boost economic growth without spending a penny: Free the H-1B's."
I agree.
Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Monday, August 31, 2009
Sunday, August 30, 2009
Saturday, August 29, 2009
I joined Microsoft's new Bing Search Technology Centre (STC) in Europe
It’s official: I have joined Microsoft's new Search Technology Centre (STC) in Europe. I work on Bing search technology and will be leading all the engineering development for UX and verticals in Europe.
My office is at the London site of STC Europe, located close to Carnaby Street, right in the centre of Soho, an area full of artists, music, and creativity. The environment and the people around me are having a galvanizing effect. I can feel the energy and the new ideas flowing.
STC Europe has three sites: London, Munich and Paris. In addition, it has a strong connection with STC Asia in Beijing, the India Development Center, the new STC under development in Silicon Valley, and, obviously, the headquarters in Redmond. All in all, this gives me more opportunities to travel, work with smart people, and improve my skills.
After a week here, I like how open the environment is to experimenting with new stuff. You simply say, "Hey, I have this new idea for search," and you get the resources to experiment with it. If it works, it goes online. Search is all about continuous improvement and evolution, isn’t it?
Friday, August 28, 2009
The Daily Beast -- Five who are changing the face of the Internet.
The Daily Beast received a nomination from Newsweek for "Five who are changing the face of the Internet."
I am proud of the group I led when I was at Ask.com. Together with the other R&D center in NJ, they contributed to delivering the algorithmic News Search experience for the Daily Beast (if you search on the Daily Beast, you are using a service powered by Ask.com).
Wednesday, August 26, 2009
Pointers and Smart Pointers
I love it when people impose the use of smart pointers. Well, if you come from C, you know that a pointer is nothing but a memory address. If you come from Java, it's another religion. I like you, but I am on the wild and spicy side.
If you are into C++, then you must use smart pointers. Smart pointers are all about resource management. You want to be on the wild but safe side of life. So when you allocate something, you want to be sure that it will be deallocated at the right time. Talk about sustainability.
It's easy: a smart pointer's destructor takes responsibility for freeing the memory. Since the language automatically calls the destructor when the object goes out of scope, you are on the wild but safe side of life. It's all about RAII.
There are a bunch of smart pointers and you should know them all.
- std::auto_ptr and boost::scoped_ptr. Here the destructor will actually free the memory for you.
- boost::shared_ptr. Here the destructor decrements a reference count and frees the memory when the count reaches zero. Very useful if you have a shared resource and you are on the wild and open side.
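The two ownership styles above can be illustrated with a minimal sketch. It assumes a C++11 (or later) compiler and uses std::unique_ptr and std::shared_ptr, the standard-library counterparts of the scoped and shared pointers listed above (std::auto_ptr was deprecated in C++11 and removed in C++17). The Resource type and the two demo functions are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource that counts how many instances are currently alive.
struct Resource {
    static int alive;
    Resource() { ++alive; }
    ~Resource() { --alive; }
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
};
int Resource::alive = 0;

// Exclusive ownership: the destructor of the smart pointer frees the object
// as soon as the owner goes out of scope (RAII).
int exclusive_ownership_demo() {
    {
        std::unique_ptr<Resource> owner(new Resource());
        assert(Resource::alive == 1);
    }  // ~unique_ptr runs here and deletes the Resource
    return Resource::alive;  // number of Resources still alive
}

// Shared ownership: each copy bumps a reference count, and the object is
// freed only when the last owner is destroyed.
long shared_ownership_demo() {
    std::shared_ptr<Resource> first = std::make_shared<Resource>();
    long count_inside = 0;
    {
        std::shared_ptr<Resource> second = first;  // reference count goes up
        count_inside = first.use_count();
    }  // second is destroyed, the count goes back down, the object survives
    assert(first.use_count() == 1);
    return count_inside;  // count observed while both owners existed
}
```

Calling exclusive_ownership_demo() reports how many resources remain after the scope ends, while shared_ownership_demo() reports the reference count seen while two owners were alive.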
Tuesday, August 25, 2009
Monday, August 24, 2009
Book review: Large-Scale C++ Software Design
Large-Scale C++ Software Design is a must-read book if you are in the software industry. Sometimes you may want to move down from design patterns to the low-level physical organization of C++ projects. I believe this is the very first book dealing with this important aspect of software, which is too frequently ignored in favour of more "abstract" aspects.
On the negative side, the book is rather redundant and could have been shorter. The most interesting chapter is number 5. Go directly there and start reading from that point.
Sunday, August 23, 2009
Saturday, August 22, 2009
Remember Cuil? Now It’s a Real-Time Search Engine
Cuil does not look good. There was a lot of hype when they launched, and now they have moved to real-time news search, but it does not seem better than OneRiot.
"Okay, so is this a competitor to Twitter Search? Maybe a little, but really it’s more like OneRiot in terms of real-time search. And to be honest, OneRiot blows Cuil out of the water in this vertical."
Thursday, August 20, 2009
I no longer work for Ask.com
It's official. After a long period in Ask.com, it's now time to work on something different.
More than four years ago, I started with a small team, which soon expanded into the first Ask.com European R&D Center. Pisa was selected as the location for its excellent quality of life and its good concentration of software engineers and academic researchers. Our office was magnificent, as you can see from this collection of pictures (1, 2, 3).
Different Pisa teams worked on various projects:
- Image Search, co-lead -- "Ask new image search is a step ahead in a notoriously tricky area. With the quality of its image search results, combined with the new Zoom query refinement feature, I'll be using it as my default image search service going forward", SearchEngineWatch
- News and Blog search, co-lead -- "Ask.com has a pretty original approach to the old-time, old-school, traditional maybe, view of news.", Techcrunch
- Video News Search, lead -- "it's interesting that they've managed to integrate the video playing right into the main page since I doubt all the source videos are the same format (Flash, aspx, etc.)", Niraj user comment
- DailyBeast, Tina Brown's news site, co-lead -- "How did IAC/Tina Brown's new Daily Beast do in its first month? Pretty well: The company says it attracted 2.3 million unique monthly visitors and served up 11.4 million page views. A great start for any publishing startup", Alleyinsider
- Core Web Search Infrastructure; Pisa was involved in the design and implementation of the middleware software providing the base of all the Ask.com search products. A number of people in Pisa worked on this project.
- RealTime Fresh Web Ranking, lead; injecting and ranking fresh news, video and blogs into Web search results in real time.
- Frontend Platform for UK; Pisa was involved in the Jeeves rebranding in the UK. A number of people in Pisa worked on this project.
- A bunch of Search Patents, "Ask.com has been working hard since then at making itself a more useful resource for timely news information, and has started incorporating multimedia into that mix.", Seobythesea
I posted different blogs pointing out some differentiating aspects of our technology:
- Real Time Query Expansion -- Query Logs vs News
- Fresh Correlations, a valid data source
- What is the status of swine flu?
- What is happening in Washington ? real time text a...
- Fresh Correlations: Valentino Rossi and Jorge Lore...
- Fiat , Chrysler deal done: news, video, blog and i...
- Real Time Gossip: Susan Boyle
- Fresh answers and old ones
- Real time Semantic Search
- American Idol: Freshness, Variety and UI
I want to thank the Pisa team for the impressive work we carried out together. I also want to thank all the people from the other offices worldwide (Edison, NJ; Oakland & Campbell, CA; London, UK; Dublin, Ireland; Hangzhou, China). In no particular order: Kelly, Jim, Apostolos, Tomasz, Yufan, Yihan, Doug, Rona, Navid, Chuck, Tuoc, Nitin, Alex, Eric, Andy, Erik, Miguel, Dominic, Steve, Padriac, Peter, James, Michael, Juanita, John, Mary, Kurt, Brendan, Michelle, Danica, Amy, Cassie and so many other people that it is difficult to mention them all in a short blog post.
I am fortunate to have worked with many bright, talented teams. I learned a lot from them.
Wednesday, August 19, 2009
Useful Twitter hack commands
A nice collection of text commands, if you like Twitter.
Tuesday, August 18, 2009
Off-Topic: Ryanair business model
Ryanair: How a Small Irish Airline Conquered Europe. I definitely recommend reading this book. If you are a manager, you will find a lot of insights here on running a company when the market is already under a monopoly. Ryanair took the Southwest business model of low-cost, no-frills flights and adapted it to the European market. This market was much more protective of larger companies, with different cultures and ways of doing business in different countries. There is no free lunch here: all the aspects of Ryanair's growth are discussed. They were about to run out of money many times, and they behaved very negatively towards trade unions. Still, they were able to make low-cost flights a commodity, like taking a train or a car, and even better than that. If you plan to run a company, or if you want to have a look at the cost-saving kingdom, or if you plan to be an asshole once in your life, then this is your book.
Monday, August 17, 2009
List of TR35
Here is the list of young innovators.
Sunday, August 16, 2009
What is Apple doing with this big-ass data center?
Going search, Going social, Going Apps, Going something completely new?
Saturday, August 15, 2009
Google losing share
Quoting Mashable:
"Numbers released by Nielsen tell a similar story: while Google grew from June to July, it still lost market share to its competitors – from 66.1% in June to 64.8% in July, a 1.3 percentage point drop. However, a closer look at the numbers reveals that Bing wasn’t the primary culprit – it was Yahoo which stole Google’s market share."
Friday, August 14, 2009
Latent Space Domain Transfer between High Dimensional Overlapping Distributions
This paper combines different techniques for learning across different knowledge domains. SVM regression is used to fill in missing values, and SVD is used to reduce dimensionality. A theoretical bound for the two combined techniques is provided.
Tuesday, August 11, 2009
Knol: is not looking goog(d)
Hmm, I had a similar idea, and Google released it before I did. I guess I was wrong.
Quoting marketpilgrim: "It’s been a little over a year since Google launched Google Knol. Now it appears the service may not make it to its 2nd birthday."
Monday, August 10, 2009
How do you define a query session?
How can you identify a query session? Smart Miner: A New Framework for Mining Large Scale Web Usage Data suggests using three major components: (1) temporal visit constraints; (2) the links among pages; and (3) maximal visit paths, computed using an Apriori-like algorithm. I suggest reading the paper if you want to see reasonable ideas for identifying query sessions.
What I don't like is the experimental part. A site with 1.5K unique users and 5K pages cannot be considered a large Web site...
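To make the first component concrete, here is a minimal sketch of a purely temporal session splitter. It is not the Smart Miner algorithm (link structure and maximal visit paths are left out); it only assumes that a gap longer than some threshold between two consecutive requests of the same user starts a new session. The function name split_sessions and the threshold parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits the time-ordered request timestamps (in seconds) of ONE user into
// sessions. A new session starts whenever the gap from the previous request
// exceeds max_gap_seconds. The threshold is an assumption of this sketch.
std::vector<std::vector<long>> split_sessions(const std::vector<long>& timestamps,
                                              long max_gap_seconds) {
    assert(max_gap_seconds > 0);
    std::vector<std::vector<long>> sessions;
    for (std::size_t i = 0; i < timestamps.size(); ++i) {
        // The first request always opens a session; later ones open a new
        // session only when the temporal constraint is violated.
        const bool starts_new =
            sessions.empty() || timestamps[i] - timestamps[i - 1] > max_gap_seconds;
        if (starts_new) {
            sessions.emplace_back();
        }
        sessions.back().push_back(timestamps[i]);
    }
    return sessions;
}
```

With a 30-minute threshold (1800 seconds), requests at 0, 60 and 120 seconds fall into one session, while requests at 4000 and 4100 seconds form a second one.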
Sunday, August 9, 2009
Book Review: Building Search Applications
Building Search Applications: Lucene, LingPipe, and Gate is a pretty good introduction to Information Retrieval, with a lot of pragmatic examples based on Lucene, Gate and LingPipe. I recommend adding it to your library if you like Lucene and Nutch, or if you need to maintain or create a medium-scale search application.
Saturday, August 8, 2009
Summer Time: have fun
A few random parodies from the past.
Friday, August 7, 2009
Real Time Query Expansion -- Query Logs vs News (update)
After about 12 hours, I checked two real-time search engines (namely Twitter and OneRiot). Neither of them offers a real-time query expansion service (yet).
Both of them have a "trending topics" feature, but it is not related to the particular query submitted by the user.
Thursday, August 6, 2009
Real Time Query Expansion -- Query Logs vs News
Many search engines offer a related query suggestion service. For instance, when you search for "Obama", the search engine can suggest the query "Is Obama Muslim?". This happens because both queries have been submitted very frequently by different users in the same search session. In information retrieval this process is called Query Expansion. A common approach is to extract correlations between query terms by analyzing user logs.
The query-log-based approach shows its limits when you deal with real-time events. In this case, there might be no time to accumulate past queries, since the events are happening right now. To handle query expansion for real-time search, a new idea is to extract fresh correlations from news events.
For instance, Sonia Sotomayor has just been confirmed to the high court.
Judd Gregg is one of her supporters.
And the algorithm nailed the correlation.
Now compare the query suggestions provided by Google, where no correlations are provided since the event is too recent.
And compare with the query suggestions provided by Bing, where related searches based on query logs are shown.
I believe that leveraging both past query logs and real-time news events can provide a more complete and up-to-date query expansion service, since you get the best of both worlds.
(PS: In addition, please note that both Bing and Ask are showing a related fresh video, while Google is not)
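The co-occurrence idea behind both approaches can be sketched in a few lines. This is only an illustration under simple assumptions, not the algorithm running in any of the engines mentioned: each "document" is the set of terms or entities found in one news item (or in one search session, for the query log variant), and the suggestions for a query term are the terms that most often appear together with it. The names Document and related_terms are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Distinct terms/entities seen in one news item or in one search session.
using Document = std::set<std::string>;

// Returns up to max_suggestions terms that co-occur with query_term,
// most frequent first (ties keep alphabetical order).
std::vector<std::string> related_terms(const std::vector<Document>& docs,
                                       const std::string& query_term,
                                       std::size_t max_suggestions) {
    assert(!query_term.empty());
    std::map<std::string, int> counts;
    for (const Document& doc : docs) {
        if (doc.count(query_term) == 0) continue;  // query term not mentioned
        for (const std::string& term : doc) {
            if (term != query_term) ++counts[term];
        }
    }
    // Rank the candidates by co-occurrence count, highest first.
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;
                     });
    std::vector<std::string> suggestions;
    for (std::size_t i = 0; i < ranked.size() && i < max_suggestions; ++i) {
        suggestions.push_back(ranked[i].first);
    }
    return suggestions;
}
```

Feeding the same function with fresh news items and with past sessions, and then merging the two rankings, would be one simple way to combine the two data sources.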
Wednesday, August 5, 2009
Tagommenders: Connecting Users to Items through Tags
Can tags be used to improve the performance of recommendation systems? This paper investigates the idea, comparing different signals derived from tags and from implicit or explicit ratings.
A bunch of interesting metrics, but the MovieLens dataset is too small, and it's not easy to understand how they would behave at large scale.
Tuesday, August 4, 2009
Bing's market share
"Bing's share of the search market grew another percentage point in July, indicating that some of those initial users may be sticking around for the long haul. Google, on the other hand, fell by nearly the same amount, and now faces the combined forces of Microsoft and Yahoo in the race for search market share." arstechnica
Monday, August 3, 2009
Learning to Rank (a good tutorial)
A very good and up-to-date tutorial on "Learning to rank", from Microsoft Research @ WWW09.
This extends my previous favourite tutorial on the subject [pt1][pt2]
Sunday, August 2, 2009
WebIR: a classic tutorial
A complete WebIR tutorial from WebBar 2004, a bit outdated but still valid. The major topic missing there is Learning to Rank -- which was introduced in 2005.
Saturday, August 1, 2009
How fast do Flickr photos propagate in the social network?
The paper "A Measurement-driven Analysis of Information Propagation in the Flickr Social Network" provides an answer: the propagation is quite slow, and each published photo "jumps" no more than 1-2 hops in the Flickr social network.
I wonder what the result would be for a similar study applied to a more newsworthy social network such as Facebook or Twitter.