Information Studies 277 -- Information Retrieval Systems: User-Centered Designs

Phil Agre
Office: 229 GSE&IS Building
Phone: (310) 825-7154

Fall 2007
Wednesdays from 9am to 12:30pm, GSE&IS room 245


This is a course on the semantic web, an important new document-centered computing technology in which web pages and other online resources are provided with metadata that can be automatically processed by computers.

The course prerequisites are IS 245 (Information Access) and IS 260 (Information Structures). The course complements several other courses in the program, including IS 240 (Management of Digital Records), IS 270 (Introduction to Information Technology), IS 272 (Human/Computer Interaction), IS 274 (Database Management Systems), IS 276 (Information Retrieval Systems: Structures and Algorithms), and IS 464 (Metadata).

The main idea of the semantic web is machine-readable ontology standards. An "ontology" is a theory of the categories of things that comprise a given domain. Familiar examples of ontologies include taxonomies and controlled vocabularies. Every computer system uses an ontology. Computer systems that are built independently of one another, however, often cannot interoperate because they use different ontologies. Now that computer systems are heavily networked, numerous user groups have begun standardizing their ontologies. The semantic web consists of mechanisms for "marking up" ontologies and then processing them.
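As an illustration of what "machine-readable" means here, the following is a minimal sketch that assumes a toy taxonomy. Its category names are hypothetical and are not taken from any published standard. Once the broader/narrower relationships are written down in a form a program can process, a computer can answer a question such as "is a sedan a kind of vehicle?" without human help.

```python
# Hypothetical toy taxonomy: each narrower term points to its broader term.
BROADER = {
    "sedan": "automobile",
    "pickup truck": "automobile",
    "automobile": "vehicle",
    "bicycle": "vehicle",
}


def ancestors(term):
    """Return every broader category of `term`, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def is_a(term, category):
    """True if `term` is `category` itself or falls anywhere beneath it."""
    return term == category or category in ancestors(term)


print(ancestors("sedan"))             # ['automobile', 'vehicle']
print(is_a("bicycle", "vehicle"))     # True
print(is_a("bicycle", "automobile"))  # False
```

Suppose a second system filed "bicycle" under a category it called "conveyance". That system could not answer the same questions against this table. This is the interoperability problem that shared ontology standards are meant to solve.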

There are roughly four kinds of ontologies: document ontologies (e.g., the chapters of a book or the footnotes of a paper), metadata ontologies (e.g., the format of a file or the copyright status of a document), domain ontologies (e.g., the components of an automobile or the entries of a schedule), and service ontologies (e.g., the inputs of a software module, the steps of a transaction, or the formats of messages that are passed back and forth between a client and server). Because its topic is information retrieval and not web services generally, this course is mainly about document and metadata ontologies. In practice, however, document and metadata ontologies often include elements of domain ontologies, and many services use documents and metadata. For completeness, therefore, weeks 9 and 10 respectively will be on domain and service ontologies.
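The following minimal sketch shows how a metadata ontology, such as one recording the format or copyright status of a document, can be processed automatically. It uses Python's standard-library ElementTree parser to read a small record written in the Dublin Core element vocabulary, a real and widely used metadata standard. The enclosing record element and all of the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core metadata element set.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical record: the <record> wrapper and the values are invented.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Annual Report</dc:title>
  <dc:format>application/pdf</dc:format>
  <dc:rights>Copyright 2007, Example Corp.</dc:rights>
</record>"""

root = ET.fromstring(RECORD)

# Because the element names come from a shared vocabulary, a program can
# look up the file format and copyright status without guessing.
file_format = root.findtext("dc:format", namespaces=NS)
rights = root.findtext("dc:rights", namespaces=NS)

print(file_format)  # application/pdf
print(rights)       # Copyright 2007, Example Corp.
```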

The semantic web includes a layered series of markup languages starting with XML (itself derived from SGML). The most distinctive aspect of XML is that user groups can use it to define specialized sublanguages to mark up the ontologies that are meaningful for their own work. "User-centered design" in a semantic web context means precisely the codification of a user group's ontologies. This is important because document collections that have been marked up within a standard XML-based language can be stored and retrieved in much more sophisticated ways than unstructured plain text documents, or documents whose structures have not been marked up in a standardized way. Although it is too soon to be certain, the result may be a revolution in the technology of information retrieval. In the past, this course has generally applied ideas from user interface design to more traditional information retrieval technologies. Here, for example, is the IS 277 syllabus from Winter 2002 (in Word format):

Unfortunately, serious user interfaces for semantic web technologies hardly exist. Indeed, it is not yet clear what such interfaces would do. Nor has much research been done on the practical uses of large collections of structured documents that use the four kinds of machine-readable ontologies. This is truly an opportunity to work on the ground floor of an important new field. We will analyze how these new technologies apply to the reinvention of information retrieval in particular domains. And, both in class and in the course assignment, we will attempt to anticipate what kinds of interfaces will best integrate the new technologies into the work practices of those domains.
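The claim that marked-up document collections can be retrieved in more sophisticated ways than plain text can be sketched with Python's standard-library ElementTree parser. The book markup below is a hypothetical sublanguage invented for illustration; its element names (book, chapter, p, footnote) belong to no published standard. The sketch runs a query that only structured markup can answer.

```python
import xml.etree.ElementTree as ET

# A document marked up in a hypothetical XML sublanguage for books.
BOOK = """<book>
  <chapter n="1">
    <p>An ontology is a theory of categories.</p>
    <footnote>See the introduction.</footnote>
  </chapter>
  <chapter n="2">
    <p>Taxonomies are familiar ontologies.</p>
    <footnote>On ontology standards, see week 1.</footnote>
    <footnote>On XML, see week 2.</footnote>
  </chapter>
</book>"""

root = ET.fromstring(BOOK)

# Structured retrieval: only footnotes, only in chapter 2, mentioning "ontology".
hits = [fn.text for fn in root.findall("chapter[@n='2']/footnote")
        if "ontology" in fn.text.lower()]
print(hits)  # ['On ontology standards, see week 1.']

# Plain-text search over the same words sees no structure at all.
plain = "".join(root.itertext())
print(plain.lower().count("ontology"))  # 2
```

A plain-text search for "ontology" finds two matches but cannot tell that only one of them is a footnote, or which chapter each belongs to.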

Although the semantic web is highly technical in nature, this course does not presuppose any technical background beyond that of the program in general and the course prerequisites. In general, the course will emphasize ontology markup more than the programs that use it, ontologies for documents more than for services, reading the marked-up documents more than writing them, and real examples of semantic web documents more than synthetic examples. Students will be required, each week, to discover and bring in a particular real-life example of a web document that uses the markup technology of the week, and much class time will be spent reading, line by line, the marked-up documents that students have brought in.

Unfortunately, these markup languages were never meant for human beings to read, and even their inventors regard them as obnoxiously obscure. However, the user-friendly software tools that are supposed to stand between human beings and marked-up documents largely do not exist, and will not for the time being substitute for an ability to read the markup. Nor do there exist useful textbooks or manuals of semantic web technologies that are written for anyone except computer scientists. Accordingly, large amounts of computer science will be explained in plain English as we read the markup.

The most familiar markup language is HTML, a simple language that provides features for common document formatting conventions and for hyperlinks to other documents on the web. Although it is not a prerequisite for the course, students who take a few days to teach themselves HTML before the course begins will be happier than those who do not. Several introductions to HTML are freely available on the Web, for example:

Burke's How to Write HTML

Introduction to HTML

HTML Code Tutorial

Students will also be happier if they have used the "Source" menu entry in Internet Explorer (or "Page Source" in Netscape) to retrieve and read the marked-up document source for several simple HTML pages. HTML, however, is being replaced by a very similar XML-based markup language called XHTML. Students in this course will learn to write simple XHTML web pages, and will use XHTML format to write a heavily hyperlinked online paper that describes how the semantic web is being used, and can be used, in a particular industry (e.g., finance or media), profession (e.g., architecture or engineering), or academic field (e.g., classics or geography). The assignment for the paper is here:

The assignment for bringing in some marked-up pages each week is here:

The online paper will be 75% of the grade and the weekly collection of real-life examples of marked-up web documents will be 25%.
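Because XHTML is an XML-based language, a page written in it can be read by any XML parser, which is not true of ordinary HTML. The following minimal sketch uses Python's standard-library ElementTree parser to list the hyperlinks in a small XHTML page of the kind a student might use to link to weekly assignments. The page content and file names are hypothetical. The sketch also shows that an HTML habit such as an unclosed <br> tag is rejected as malformed XML, while XHTML's <br /> is accepted.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; the "h" prefix is our own choice.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical student page; titles and file names are invented.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>IS 277 assignments</title></head>
  <body>
    <p><a href="week1.html">Week 1 examples</a></p>
    <p><a href="proposal.html">Paper proposal</a></p>
  </body>
</html>"""


def list_links(markup):
    """Return (href, link text) pairs for every <a> element in an XHTML page."""
    root = ET.fromstring(markup)
    return [(a.get("href"), a.text) for a in root.findall(".//h:a", NS)]


def is_well_formed(markup):
    """True if `markup` parses as XML, as every XHTML document must."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(list_links(PAGE))
print(is_well_formed("<p>Line one<br>Line two</p>"))    # False: <br> never closed
print(is_well_formed("<p>Line one<br />Line two</p>"))  # True
```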

In general, this will be a paperless course. Students will "hand in" all of the weekly assignments, the paper proposal, and the paper itself by linking to them from a Web page that they maintain themselves. Here is an example of what such a Web page might be like:

Once you create your page, please send Phil the URL for it. Here is the directory of IS 277 students' Web pages:

All of the course readings will be on the web. Students who wish to purchase a (relatively) introductory book on the semantic web might use an online bookseller to buy a copy of Grigoris Antoniou and Frank van Harmelen, A Semantic Web Primer, MIT Press, 2004. This book is also online:

Each class session will include a document reading session and a lecture. Each lecture will introduce the week's material using theory and simple examples, and the corresponding readings should be done after the lecture. The readings will typically be too technical, but students should read them as best they can. Students should use the readings in an attempt to read the real-life marked-up documents that they collect from the web, and should come to class prepared to explain their documents line by line, again as best they can. We will have an Internet connection and projector in class to read online materials.

Week 1. Ontology standards

slides for this week's lecture (in PowerPoint)

Spinning the Semantic Web (introduction)

Business to Consumer Markets on the Semantic Web

Working towards MetaUtopia: A Survey of Current Metadata Research

Using Ontologies: Enabling Knowledge Sharing and Reuse on the Semantic Web

Semantic Web Portals
(read through page 17)

Sorting Things Out: Classification and Its Consequences
(read the introduction)

The Cascade of Interactions in the Digital Library Interface

Recommended reading:

Anatomy of the Grid

Physiology of the Grid

Information Technology and the Transformation of Research

A Global Grid-Enabled Collaboratory for Scientific Research

Social Theoretical Issues in the Design of Collaboratories

Towards Institutional Infrastructures for E-Science

A Practical Guide to Federal Enterprise Architecture

Week 2. XML

A Gentle Introduction to XML
(read through section 2.3)

a TEI manual including examples of XML markup

introduction to XML syntax

some examples of XML pages
(use "Page Source")

an example of an XML application

Understanding ebXML

ebXML: A Critical Analysis


the Protege ontology editor

Piggy Bank semantic web extension for Firefox

Also, read several pages on these sites to get the general idea:

World Wide Web Consortium

Organization for the Advancement of Structured Information Standards


An act to add Section 11541.1 to the Government Code, relating to information technology

Asynchronous JavaScript and XML

The Cover Pages


European survey of semantic Web applications

"Thinking XML" column at the IBM developers' site

IBM XML Technical Library

Week 3. XML document types

A Gentle Introduction to XML
(read section 2.4 onward)

some examples of XML pages with document type definitions

some more DTDs

an example of a large-scale XML document type

Comparative Analysis of Standardization of Vertical Industry Languages
(scroll down to page 210)

Standards Fragmentation in Electronic Markets

if you want to learn XML Schema, an alternative to DTDs:

XML Schema examples and tutorial

Open Archives Initiative

Recommended reading:

Standardization of XML-Based e-Business Frameworks

A Web-Based Negotiation System

A System for the Mediated Sharing of Sensitive Data

Porting a Rich-Media Collection to a Mobile Platform

On Engineering Design Generation with XML-Based Knowledge-Enhanced Grammars

XML-Based Modeling of Corporate Memory

Third Workshop on Legislative XML

Week 4. XHTML

XHTML 1.0: The Extensible HyperText Markup Language

XHTML W3C Recommendation Summary

Index of HTML Elements

an example of an XHTML page
(use "Page Source")

Week 5. RDF

An Introduction to the Resource Description Framework

RDF Primer
(skip section 5 on RDF Schema)

Collaborative Mapping with RDF

conference proceedings with extensive RDF markup

an example of an RDF database application
(click on "data" for the RDF files)

An RDF Model for Multi-Level Hypertext in Digital Libraries

Barriers to Real World Adoption of Semantic Web Technologies

some examples of RDF files

Recommended reading:

Introducing SPARQL (RDF query language)

Week 6. RDF Schema

RDF Primer
(read section 5 on RDF Schema)

RDF Vocabulary Description Language 1.0: RDF Schema

RDF Schema Directory

an RDF Schema markup language and some instances of it

more examples of RDF Schema pages

Recommended reading:

papers from Semantics 2006

Week 7. RSS 1.0

An Introduction to RSS for Educational Designers

RDF Site Summary (RSS) 1.0

RDF Schema for RSS 1.0
(use "Page Source" and scroll down)

Semantic Blogging

an example of an RSS interface

some examples of RSS 1.0 blog markup
(use "Page Source")

Recommended reading:

Why Choose RSS 1.0?

"a universal publishing standard for personal content and weblogs"

Week 8. Learning Object Metadata

Instructional Planning with Learning Objects
(scroll down to page 52)

Semantic Web Meta-data for e-Learning

Interoperability between Library Information Services and Learning Environments

How RDF Will Change Learning Technology Standards

The Next Wave: CETIS Interviews Mikael Nilsson about the Edutella Project

EDUTELLA: A P2P Networking Infrastructure Based on RDF

Learning Object Metadata

XML Knowledge Management Flourishes in Learning Technology Initiatives

DTD for learning object metadata

an example of learning object metadata
(scroll down below the tables)

more examples
(use "Page Source")

another example
(scroll down)

IEEE Learning Object Metadata RDF Binding

some of the RDF files

an example

IMS Resource Description Framework RDF Bindings

Recommended reading:

Business Process Management Technology in e-Learning Systems

For the really dedicated:

IMS Global Learning Consortium

SCORM (Sharable Content Object Reference Model)

Week 9. OWL

Web Ontology Language Guide

Web Ontology Language: OWL

OWL Use Cases and Requirements

Semantic Web in a Pervasive Context-Aware Architecture

some examples of OWL

Recommended reading:

eClassOWL: A Fully-Fledged Products and Services Ontology in OWL

Standard Ontology for Ubiquitous and Pervasive Applications

Semantic Web Technologies for Context-Aware Museum Tour Guide Applications

Semantic Web for Research Communities

syllabus for a course about OWL (with numerous readings)

Week 10. OWL-S

Semantic Web Services: A Communication Infrastructure for eWork and eCommerce

The Semantic Grid: A Future e-Science Infrastructure

OWL-S: Semantic Markup for Web Services

some examples of OWL-S services

Ontology-Enabled Pervasive Computing Applications

Recommended reading:

"web services" versions of established distributed computing ideas

Service-Oriented Computing - ICSOC 2005

Customized Delivery of E-Government Web Services

Planning for Semantic Web Services

Interleaving Semantic Web Reasoning and Service Discovery

A Framework for Dynamic Semantic Web Services Management

A System for Dynamically Composing and Intelligently Executing Web Services

Pitfalls of OWL-S

Conflicts in the Internet Standards Process