Monday, January 30, 2012

Search is so boring, what if we stop sending queries?

The first search engine I ever saw was Lycos Pursuit, back in 1994 (wow, a long time ago!). The key idea in Pursuit was to have a crawler collecting documents and a search form where you could submit queries or keywords to retrieve relevant results. Now, after 17 years, the search paradigm is... still the same! Results are more relevant and the scale is larger, but the paradigm has not changed. Really?

So that's boring. I don't like being bored.

After leaving and before joining Microsoft, I had this idea for a new startup. One of those ideas I have every 4-5 weeks. I never built it, so now I am putting it here for free, and if you want you can pick it up and make money out of it. After all, ideas are free but execution is not.

Imagine you visit a place and prepare a video guide about it. You go to your favorite restaurant and record a video with your impressions and comments. You go to a hotel and record another video. You visit a shopping area and record another one. You record your own videos in pretty much every location, and your mobile uploads them to my server.

Then, every time someone comes close to one of those locations, my service pushes your content to them.

No queries, just the location. Searches will happen on your behalf.

  • Relevance can leverage a ranking of the video contributors, their social network, how much their contributions were appreciated in the past, and many other factors that I will not discuss here.
  • Monetization can leverage the location, which opens up several opportunities.
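
To make the "no queries, just the location" idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the `Video` record, the `contributor_score` field, the 200 meter radius, and the function names are all hypothetical, and a single score stands in for the richer relevance signals listed above.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Video:
    title: str
    lat: float                # latitude in degrees
    lon: float                # longitude in degrees
    contributor_score: float  # hypothetical: higher means a better-rated contributor


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def videos_to_push(videos, user_lat, user_lon, radius_m=200.0):
    """Return the videos recorded within radius_m of the user, best contributors first."""
    nearby = [
        v for v in videos
        if haversine_m(user_lat, user_lon, v.lat, v.lon) <= radius_m
    ]
    return sorted(nearby, key=lambda v: v.contributor_score, reverse=True)
```

The user never types anything: the service would call something like `videos_to_push` whenever the phone reports a new position. A real system would need a spatial index rather than a linear scan over every video.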

So this is an idea for you to pick and execute.

Search, with no queries.

Tuesday, January 24, 2012

Books: "Cracking the coding interview"

Cracking the coding interview is a book you should definitely have on your bookshelf, and you should reread it every now and then.

I have a passion for reading and solving coding interview questions and never found such a detailed source of information.

The book starts with several suggestions on how to prepare yourself for an interview. This is an aspect that many people underestimate, yet having a well-written CV, a personal blog, and possibly a number of open source projects is definitely important. The book gives you a number of good suggestions here.

Then there is a long part discussing interview questions, with broad coverage of basic data structures, algorithms, programming languages, databases, and threads, plus some advanced coding questions.

The style is concise and you can read each chapter in isolation. Gayle did an amazing job of illustrating not just the solutions, but several techniques that you can use for solving new problems. Plus, those interview questions, solutions, and techniques are not just hypothetical: they are very useful in your day-to-day life as a dev or researcher.

I would suggest that the author split Chapter 7, "Mathematics and Probability", into two separate parts and expand both of them, because these topics are very important during interviews and the current treatment is probably too terse. Also, a chapter on string algorithms and another one on parallel programming would probably be useful, because otherwise people will look for them elsewhere.

Having said that, this is definitely a must-have book, and the money you spend will generate a great return on investment.

Thanks Gayle for writing it.

Saturday, January 7, 2012

Friday, January 6, 2012

Thursday, January 5, 2012

Wednesday, January 4, 2012

Sphinx: open source search

Sphinxsearch is a scalable open source search server that integrates seamlessly with MySQL. Documents are stored in MySQL and the server creates its own external indices. Indexing and search performance are quite good (60+ MB/sec per server, 500+ queries/sec), and the server makes it easy to plug in your own ranking functions. An SQL-like language is supported for searching. Apparently, the largest cluster of Sphinx servers currently indexes 3 billion documents and serves 50M queries per day. An interesting feature I am going to use is the ability to search for all the geo-points within a fixed radius.
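
As a rough sketch of what a radius search could look like, the Python helper below builds a query string for the SQL-like language. It assumes the `GEODIST(lat1, long1, lat2, long2)` function as I understand it from the Sphinx documentation of the time (arguments in radians, result in meters). The index name and the `latitude` / `longitude` attribute names are hypothetical, and the sketch only builds the string: it does not connect to a server.

```python
import math


def geo_radius_query(index, lat_deg, lon_deg, radius_m,
                     lat_attr="latitude", lon_attr="longitude"):
    """Build a query selecting documents within radius_m meters of a point.

    Assumptions: lat_attr and lon_attr are float attributes stored in
    radians, and index / attribute names come from trusted configuration,
    not from user input, since they are interpolated directly.
    """
    lat = math.radians(lat_deg)  # convert the query point to radians
    lon = math.radians(lon_deg)
    return (
        f"SELECT *, GEODIST({lat_attr}, {lon_attr}, {lat:.6f}, {lon:.6f}) AS dist "
        f"FROM {index} WHERE dist < {float(radius_m):.1f} ORDER BY dist ASC"
    )
```

The resulting string would then be sent to the server with any MySQL-protocol client. Check the documentation of your Sphinx version before relying on the units.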

Tuesday, January 3, 2012

News dataset

Republishing my link to the open collection of news articles.