Friday, July 24, 2009

StatSnowball: a Statistical Approach to Extracting Entity Relationships

Can you extract relations among entities at Web scale? This is a very hard problem to address. For instance, these days "Neil Armstrong" is related to "Buzz Aldrin" and to "Apollo". That one is easy, but he is also related to "Barack Obama", and that is a bit more difficult because the relation depends on this particular moment in time.

A common approach is to provide a set of NLP rules and learn (1) what the entities are and (2) what the relations among those entities are.

The paper "StatSnowball: a Statistical Approach to Extracting Entity Relationships" introduces an interesting methodology in which new rules can be derived from a small set of examples. To achieve this goal, a bootstrapping technique is adopted, as shown in the figure.
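
To give a feel for what bootstrapping from a few examples means, here is a minimal sketch of a generic Snowball-style loop in Python. It is not the paper's method: StatSnowball selects patterns statistically, while this toy version keeps every exact string pattern it sees. The corpus, the seed pair, the capitalized-words entity matcher, and the function names (`extract_patterns`, `apply_patterns`, `bootstrap`) are all assumptions made up for illustration.

```python
import re

# Toy corpus and seed pair: illustrative assumptions, not data from the paper.
SENTENCES = [
    "Neil Armstrong flew with Buzz Aldrin",
    "Michael Collins flew with Neil Armstrong",
    "Michael Collins trained alongside Neil Armstrong",
    "Buzz Aldrin trained alongside Michael Collins",
]
SEEDS = {("Neil Armstrong", "Buzz Aldrin")}

# Naive entity matcher (assumption): one or more capitalized words.
ENTITY = r"([A-Z]\w+(?: [A-Z]\w+)*)"


def extract_patterns(sentences, pairs):
    """Collect the text found between two entities already known to be related."""
    patterns = set()
    for sentence in sentences:
        for left, right in pairs:
            match = re.search(re.escape(left) + r"(.+?)" + re.escape(right), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(sentences, patterns):
    """Use each pattern as a rule to find new entity pairs."""
    pairs = set()
    for sentence in sentences:
        for pattern in patterns:
            match = re.search(ENTITY + re.escape(pattern) + ENTITY, sentence)
            if match:
                pairs.add((match.group(1), match.group(2)))
    return pairs


def bootstrap(sentences, seeds, rounds=5):
    """Alternate between learning patterns from pairs and pairs from patterns."""
    pairs, patterns = set(seeds), set()
    for _ in range(rounds):
        new_patterns = extract_patterns(sentences, pairs)
        new_pairs = apply_patterns(sentences, new_patterns)
        if new_patterns <= patterns and new_pairs <= pairs:
            break  # nothing new was learned, so stop early
        patterns |= new_patterns
        pairs |= new_pairs
    return pairs, patterns


if __name__ == "__main__":
    found_pairs, found_patterns = bootstrap(SENTENCES, SEEDS)
    print(sorted(found_pairs))
    print(sorted(found_patterns))
```

Starting from a single seed pair, the first round learns the "flew with" pattern, which surfaces a second pair, which in turn exposes the "trained alongside" pattern in the next round. A real system needs some way to score and discard bad patterns, otherwise errors snowball too; that filtering step is exactly what this sketch leaves out.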

