Saturday, January 30, 2010

Add two numbers representened in lists (with carry)

You are given two numbers with a strange list representation such as the below. Add them with carry. Find the optimal solution in space and time.

Num 1 = 123456
Num 2= 1234

Friday, January 29, 2010

Thursday, January 28, 2010

Ubuntu Firefox shuns Google for Yahoo! search

"The next release of Ubuntu will scrap Google as the default search engine on its Firefox browser in favor of Yahoo!, thanks to a new revenue-sharing deal between Yahoo! and commercial Ubuntu backer Canonical."

source: the register

Wednesday, January 27, 2010

Google is going social

When you search, you get what your friends are posting. In this case, the social graph is defined by all the Google services (Youtube, Blogger, etc) + Twitter. It’s not clear if Facebook is included, it seems to me that is not. What is your opinion?

Tuesday, January 26, 2010

Eleborate large dataset

I believe that if you were around in the 90-ties you may have heard about NoW or Skeleton programming. Isn't map & reduce just a sub-set of skeleton programming?

Monday, January 25, 2010

Where I'd like to invest: 3D UI

We live in a 3D world (no actualy more, but that is a different story), but so far our interactive experience were 2D. Now things are chaning and we start to see real 3D movies (such as Avatar).

What would be next?

I want to have a real 3D UI for my operative system and I don't just mean simulate it in a 2D screen, I mean project it and than interact with the users, like it is already happening to some extent for video games in Project Natal Then I want a 3D Search experience, which goes behind the 2D flat search. And I want a 3D social experience.

Any company out of there where I can put my money?

Sunday, January 24, 2010

Kernel PCA

PCA is a dimension reduction technique for linearly separeted data. In order to deal with data that cannot be linearly separated, one can adopt the kernel trick. This generates the so called Kernel-PCA.

Here you have the code in C++ and Eigen.

Saturday, January 23, 2010

Google founders to loose control

Google founders will sell around 5-6% of the company stocks on the market, for a current value of about 5.5 billion dollars. Woah! This way they will go under 50% of the company. Will they start something new ? I hope so.

Friday, January 22, 2010

Searches Soar in 2009; Google Tops List

The U.S. represented the largest individual search market in the world with 22.7 billion searches, comScore found, or or approximately 17 percent of searches conducted globally. China ranked second with 13.3 billion searches, followed by Japan with 9.2 billion and the U.K. with 6.2 billion.

About 87.8 million searches originated on the Google Sites network, a 58 percent increase. Just 9.44 million were made via Yahoo, which saw its own total jump by 14 percent. Chinese search engine finished third. But it was Microsoft, spurred by Bing, which showed the second-strongest growth out of the sites comScore measured, jumping 70 percent to 4.09 million searches. Russian search engine Yandex achieved the most considerable gains, growing 91 percent to 1.9 billion searches.

ource PC Mag

Thursday, January 21, 2010

Longest non decreasing sequence.

Given a sequence of N numbers. Find the length of the longest non-decreasing sequence.

Tuesday, January 19, 2010

theft of search intellectual property: operation Aurora

I was in China while this was happening.

Google reported an
highly sophisticated and targeted attack on our corporate infrastructure originating from China that resulted in the theft of intellectual property from Google
Wired is giving some technical details. Apparentl,y is a problem with Adobe Reader and Microsoft IE (even if other programs can be affected)

Although the initial attack occurred when company employees visited a malicious website, Alperovitch said researchers are still trying to determine if this occurred through a URL sent to employees by e-mail or instant messaging or through some other method, such as Facebook or other social networking sites.Once the user visited the malicious site, their Internet Explorer browser was exploited to download an array of malware to their computer automatically and transparently. The programs unloaded seamlessly and silently onto the system, like Russian nesting dolls, flowing one after the other.

“The initial piece of code was shell code encrypted three times and that activated the exploit,” Alperovitch said. “Then it executed downloads from an external machine that dropped the first piece of binary on the host. That download was also encrypted. The encrypted binary packed itself into a couple of executables that were also encrypted.”

One of the malicious programs opened a remote backdoor to the computer, establishing an encrypted covert channel that masqueraded as an SSL connection to avoid detection. This allowed the attackers ongoing access to the computer and to use it as a “beachhead” into other parts of the network, Alperovitch said, to search for login credentials, intellectual property and whatever else they were seeking.

McAfee obtained copies of malware used in the attack, and quietly added protection to its products a number of days ago, Alperovitch said, after its researchers were first brought in by hacked companies to help investigate the breaches.

Although security firm iDefense told Threat Level on Tuesday that the Trojan used in some of the attacks was the Trojan.Hydraq, Alperovitch says the malware he examined was not previously known by any anti-virus vendors.

iDefense also said that a vulnerability in Adobe’s Reader and Acrobat applications was used to gain access to some of the 34 breached companies. The hackers sent e-mail to targets that carried malicious PDF attachments.

Alperovitch said that none of the companies he examined were breached with a malicious PDF, but he said there were likely many methods used to attack the various companies, not just the IE vulnerability.

Once the hackers were in systems, they siphoned off data to command-and-control servers in Illinois, Texas and Taiwan. Alperovitch wouldn’t identify the systems in the United States that were involved in the attack, though reports indicate that Rackspace, a hosting firm in Texas, was used by the hackers. Rackspace disclosed on its blog this week that it inadvertently played “a very small part” in the hack.

The company wrote that “a server at Rackspace was compromised, disabled, and we actively assisted in the investigation of the cyber attack, fully cooperating with all affected parties.”

Alperovitch wouldn’t say what the attackers might have found once they were on company networks, other than to indicate that the high-value targets that were hit “were places of important intellectual property.”

iDefense, however, told Threat Level that the attackers were targeting source-code repositories of many of the companies and succeeded in reaching their target in many cases.

Alperovitch says the attacks appeared to have begun Dec. 15, but may have started earlier. They appear to have ceased on Jan. 4, when command-and-control servers that were being used to communicate with the malware and siphon data shut down.

Microsoft issued a security warning

Microsoft is investigating reports of limited, targeted attacks against customers of Internet Explorer 6, using a vulnerability in Internet Explorer. This advisory contains information about which versions of Internet Explorer are vulnerable as well as workarounds and mitigations for this issue.

Our investigation so far has shown that Internet Explorer 5.01 Service Pack 4 on Microsoft Windows 2000 Service Pack 4 is not affected, and that Internet Explorer 6 Service Pack 1 on Microsoft Windows 2000 Service Pack 4, and Internet Explorer 6, Internet Explorer 7 and Internet Explorer 8 on supported editions of Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 are vulnerable.

The vulnerability exists as an invalid pointer reference within Internet Explorer. It is possible under certain conditions for the invalid pointer to be accessed after an object is deleted. In a specially-crafted attack, in attempting to access a freed object, Internet Explorer can be caused to allow remote code execution.

Microsoft also issued a risk assesment that I encorauge you to visit
As you can see, the client configuration currently at risk is Windows XP running IE6. We recommend users of IE6 on Windows XP upgrade to a new version of Internet Explorer and/or enable DEP. Users of other platforms are at reduced risk. We also recommend users of Windows XP upgrade to newer versions of Windows.

The vulnerability is present in Internet Explorer 6, Internet Explorer 7, and Internet Explorer 8. All versions may crash after opening the attack code. However, there are a number of ways to limit the attack to an IE crash and prevent attacker code execution.

  • Disable code executing from random locations of freed memory. Data Execution Prevention (DEP) prevents the execution of code from pages of memory that are not explicitly marked as executable. DEP is a supported feature on Windows XP Service Pack 2 and higher, Windows Server 2003 Service Pack 2 and higher, and all versions of Windows Vista, Windows Server 2008, and Windows 7. Some platforms enable DEP by default (see below). You can read more about DEP in this blog here and here. You can enable DEP on Windows XP and Windows Vista by clicking the Microsoft Fix It button below. (DEP is enabled by default for Internet Explorer 8 running on XP Service Pack 3, Windows Vista Service Pack 1 and higher, and Windows 7, so you do not need to use the "Microsoft Fix It" for those configurations.)
Mcafee released some advices for security here

Make sure that you have the latest version of McAfee security software on your computer with the latest signature files to protect yourself against the malware exploits within Operation Aurora. If you don't have security software, you can download a McAfee free trial.

Also, Microsoft recommends users change their browser security setting to HIGH and McAfee recommends that you restrict browsing to known sites until Microsoft provides a patch for the Internet Explorer exploit. To change your security settings to HIGH, open the browser, click on Tools>Options>Security and slide your setting up.

Adobe released a critical patch here
Adobe Reader 9.3 was released today, right on schedule, to address this issue. In the meantime, the company is realizing the changing nature of the platform business, and how Reader/Acrobat and Flash are now just as susceptible to potential attacks as any other platform, including Windows. Interestingly, the cross-platform nature of the Acrobat platform means that Mac users were just as susceptible to this exploit as Windows users.

Monday, January 18, 2010

Baidu CTO leaving

I was in Beijing these days. In my hotel I saw the 10 birthday anniversay for Baidu, the search market leader in China. Today, I saw the news about their CTO change. Something is happening in China.

Sunday, January 17, 2010

Bing got +2.7% market share, since May 2009

"What is even more interesting is if you look at year-over-year query growth rates for each search engine. Bing’s growth is actually accelerating. Its growth rate in query volume was 49.4 percent in December, compared to 20.6 percent growth for Google (which was also above the average), and a 1.9 percent decline for Yahoo. Here are the year-over-year query growth rates for Bing for the past few months:"

source: techcrunch

Saturday, January 16, 2010

PCA: Dimensional Reduction in Eigen

PCA (Principal Component Analisys) is a classical machine learning method to reduce the dimensionality of a problem. PCA involves the calculation of the eigenvalue decomposition of a data covariance matrix or singular value decomposition of a data matrix, usually after mean centering the data for each attribute. Playing with Eigen library, I started to implement PCA in C++.

Thursday, January 14, 2010

Wednesday, January 13, 2010

Google enters the real estate business + maps

I was wondering when this was happening. Alessio told me about this a couple of years ago. He was right and I knew it.

Tuesday, January 12, 2010

Som clustering algorithm

Som (Self Organizing Map) is an interesting algorithm for mapping vectors into a lattice of nodes. Typically this is a 2D array of nodes or a toroid. This is an implementation with 2D array and boost::matrix

Store phone numbers

I used it my interviews. Now not any longer: Store 100 million phone numbers, with minimal memory space. Search must be efficient.

Monday, January 11, 2010

Getting eigenvalues solver in Windows

After testing Boost::numeric::bindings (with Atlas) on Windows, I definetively abandoning this solution. Anyone has any positive experience with them?

I am moving to Eingen library, which is a native template library ready to use. Just download it. Then in Visual Studio edit the proprierties of your project and in C++ add the appropriate include path. Then compile it, without any problem.

Eingensolver is defined in this class. Here a toy code example of use.


// import most common Eigen types USING_PART_OF_NAMESPACE_EIGEN
int main(int, char *[])
using namespace Eigen;

Matrix3f m3;
m3 << 1, 2, 3, 4, 5, 6, 7, 8, 9;
Matrix4f m4 = Matrix4f::Identity();
Vector4i v4(1, 2, 3, 4);

EigenSolver m_solve(m3);
Vector3f m_solved_val = m_solve.eigenvalues().real();

std::cout << "m3\n" << m3 << "\nm4:\n"
<< m4 << "\nv4:\n" << v4
<< "\neigen:\n" << m_solved_val << "\n" << std::endl;

std::string a;
std::cin >> a;

Sunday, January 10, 2010

Yahoo CEO Gives Herself a B-Minus

Carol Bartz said her first year was harder than she expected it would be "It was a little tougher internally than I think I had anticipated," Bloomberg quoted Bartz, 61, as saying. "I did move fast, but this is a big job."

Saturday, January 9, 2010

Friday, January 8, 2010

Getting Blas and Lapack and Blas libraries in Windows without compiling them

Getting Lapack and Blas libraries in windows is quite annoying since there is not any prebuilt package that I am aware of. You need to compile a distribution of ATLAS, but this is quite demanding since there are a lot of dependencies for windows and current version appears like

One alternative solution, that I am exploring is based on extraction of symbols from R. You install it and then from Visual Studio command prompt perform a:

dumpbin /EXPORTS Rlapack.dll > Rlapack.def
dumpbin /EXPORTS Rblas.dll > Rblas.def

Next step is to use the calling standard described here. I will discuss about it in a follow-up posting.

Thursday, January 7, 2010

Define a functor

In C you have the classical pointer to function which allows you to store a pointer to a parametric function f. Then you can pass the pointer to another function g, which will use f. A typical example is the cmp function you use to sort a generic array of type T elements. The problem with pointer to function is that you don't have a strong type system to check the signature of the function and this can generate potential horrid bugs.

In C++ and STL you typically define a functor object which had the type system check but is limited to unary, binary or ternary functions.

In Boost you resolve this problem by using the Boost Function library, which allows you to define a type checked function object with unlimited number of parameters.

Wednesday, January 6, 2010

Nearest Neighbour on KD-Tree in C++ and Boost

Wikipedia describes the pseudo-code for computing the nearest neighbour (nn) on an already built KDtree. Here you have a boost implementation of Nearest Neighbour with Kd-tree in boost.

Tuesday, January 5, 2010

Very curious about voice search on Nexus one

Google released a Phone, aiming at competing with Iphone, windows mobile, symbian, etc. I am quite interested in their voice search features. Have you used it?

Monday, January 4, 2010

In the end the love you receive is more than the love you give

You know that old song from Beatles. This is not the case for my life. The love I receive from Francesca, my wife, Lorenzo and Leonardo, my two kids is more than the love I give to them. Thanks for being here, my love.

Sunday, January 3, 2010

Do you need to return more than a value

Typically you do

return_value function(parameters)


with boost::tuple you can return tuples of values. here is an example of code.

Friday, January 1, 2010

In the end the love you give is equal to the love you receive

New Year and off-topic. I had 3 teachers in my life.
  • Antonio C, who taught how to use the power of imagination and think about a product before anyone else.
  • Apostolos G, who taught how to realize world class industrial products
  • Paolo F, who taught how important is theory and the beauty of algorithms.
I never had a chance to say thank you to them directly. I believe that the best way to thank them is to create new opportunities for other people, as they created opportunities for me.

PS: A 'teacher' is someone who is senior to me and I no longer work for him