Please forward this announcement to anyone who might be interested.

UCLA Information Studies Seminar

Managing Information on the Web

John Cho
UCLA Department of Computer Science

Thursday, February 27th, 2003, 3pm-5pm
GSE&IS Building, Room 111
(just west of the Research Library)

Abstract: The advent of the World Wide Web has made a wealth of information available to the user. However, because of the sheer volume of this information, the user often feels disoriented and wastes significant time and effort "searching" the Web. In this talk, I will briefly describe two major approaches that address this "information-overload" problem, and I will discuss some of my research results in this area.

The first approach, "indexing," which is taken by Web search engines, builds a central index so that the user's queries can be answered by a lookup in the index. In this case, the major challenges are 1) how to build and update the index efficiently, 2) how to process the user's query efficiently, and 3) how to rank the pages.

The second approach, "integration," which is taken by comparison shopping services, builds a "mediator" that translates the user's queries into queries native to the underlying sources, so that those sources can process them and return the results. In this case, the major challenges are 1) how to automatically translate the user's queries into native source queries and 2) how to handle the different "capabilities" of the underlying sources.

Of these challenges, I will focus on how to keep a central index up to date. To address this problem, I will first present the results of an experiment in which I traced the change history of half a million Web pages over 4 months. Based on these results, I will develop a Web change model and design an index refresh policy that selectively and incrementally updates the index to maximize its "freshness".


Junghoo (John) Cho is an assistant professor in the Department of Computer Science at the University of California, Los Angeles. He received his PhD and MS in Computer Science from Stanford University and a BS in Physics from Seoul National University in Korea. His research interests are in databases and Web technologies, particularly information discovery, integration, and search on the Web. He has published many technical papers on databases and Web technologies in leading conferences and journals.

Everyone is invited.