23 February 2007

Link of the day: Colloquium talk on community-based data management

I learned about a really interesting talk via Dave Bacon's blog. In it, Raghu Ramakrishnan talks about the emergence of social websites (e.g. flickr, del.icio.us) and how the future of the web may lie in leveraging community input to improve data extraction. In the beginning, the web was just static text and pictures. Then AltaVista and other search engines used anchor text to improve searching. After that, Google came up with the concept of PageRank: by following links across many levels and giving links from more reputable sources a higher weight, you can rank pages so that the most relevant results come first. The next step in the evolution of the web may be using the direct input of users (e.g. tags) to improve data extraction.
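To make the link-weighting idea a bit more concrete, here is a minimal sketch of a PageRank-style calculation in Python. This is not code from the talk and certainly not Google's implementation; the function name, the damping value of 0.85, the fixed iteration count, and the three-page toy web are all my own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing each page's score along its links.

    links: dict mapping a page name to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally reputable.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its links,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "a" and "b" both link to "c", and "c" links to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, "c" ends up on top because two pages link to it, and "a" beats "b" because its single inbound link comes from the highly ranked "c", while nothing links to "b" at all.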

Ramakrishnan has been doing research on a very small-scale example called DBLife, a site that tries to extract data about academic researchers in the field of databases. If you search for a particular researcher, DBLife will attempt to pull together pictures of the researcher, his or her list of publications and talks, and other information. You (the user) can then submit feedback on whether you think the information is accurate. Ramakrishnan's great hope is that we can develop software that is easily maintained by a community of users, and that it will be easy to abstract the underlying structure of DBLife and apply it to other communities such as Hollywood (e.g. a HollywoodLife). In other words, one could hand the software to another community without its members having to look at any code.

The video of the talk is available here.

