Information Studies 277 -- Information Retrieval Systems: User-Centered Designs
Phil Agre
Office: 229 GSE&IS Building
Phone: (310) 825-7154
Email: pagre@ucla.edu
Home: http://polaris.gseis.ucla.edu/pagre/
Fall 2007
Wednesdays from 9am to 12:30pm, GSE&IS room 245
DRAFT
This is a course on the semantic web, an important new document-centered computing technology in which web pages and other online resources are provided with metadata that can be automatically processed by computers.
The course prerequisites are IS 245 (Information Access) and IS 260 (Information Structures). The course complements several other courses in the program, including IS 240 (Management of Digital Records), IS 270 (Introduction to Information Technology), IS 272 (Human/Computer Interaction), IS 274 (Database Management Systems), IS 276 (Information Retrieval Systems: Structures and Algorithms), and IS 464 (Metadata).
The main idea of the semantic web is machine-readable ontology standards. An "ontology" is a theory of the categories of things that comprise a given domain. Familiar examples of ontologies include taxonomies and controlled vocabularies. Every computer system uses an ontology. Computer systems that are built independently of one another, however, often cannot interoperate because they use different ontologies. Now that computer systems are heavily networked, numerous user groups have begun standardizing their ontologies. The semantic web consists of mechanisms for "marking up" ontologies and then processing them.
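The interoperation problem described above can be sketched in a few lines of Python. This is only an illustration: the two catalog systems, their field names, the sample record, and the shared vocabulary are all hypothetical, and real ontology standards are far richer than a simple renaming of fields.

```python
# Two hypothetical systems describe the same book using different categories.
library_record = {"creator": "Doe, Jane", "heading": "Classification"}
bookstore_record = {"author": "Doe, Jane", "topic": "Classification"}

# A shared (hypothetical) standard vocabulary, and each system's mapping to it.
LIBRARY_TO_STANDARD = {"creator": "author", "heading": "subject"}
BOOKSTORE_TO_STANDARD = {"author": "author", "topic": "subject"}


def to_standard(record, mapping):
    """Rename a record's fields into the shared vocabulary."""
    return {mapping[field]: value for field, value in record.items()}


# Once both records use the standard terms, they can be compared directly.
a = to_standard(library_record, LIBRARY_TO_STANDARD)
b = to_standard(bookstore_record, BOOKSTORE_TO_STANDARD)
print(a == b)
```

Without the mappings, neither system could tell that "creator" and "author" name the same category; agreeing on the standard terms is what lets the two records be matched.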
There are roughly four kinds of ontologies: document ontologies (e.g., the chapters of a book or the footnotes of a paper), metadata ontologies (e.g., the format of a file or the copyright status of a document), domain ontologies (e.g., the components of an automobile or the entries of a schedule), and service ontologies (e.g., the inputs of a software module, the steps of a transaction, or the formats of messages that are passed back and forth between a client and server). Because its topic is information retrieval and not web services generally, this course is mainly about document and metadata ontologies. In practice, however, document and metadata ontologies often include elements of domain ontologies, and many services use documents and metadata. For completeness, therefore, weeks 9 and 10 will cover domain and service ontologies, respectively.
The semantic web includes a layered series of markup languages starting with XML (itself derived from SGML). The most distinctive aspect of XML is that user groups can use it to define specialized sublanguages to mark up the ontologies that are meaningful for their own work. "User-centered design" in a semantic web context means precisely the codification of a user group's ontologies. This is important because document collections that have been marked up within a standard XML-based language can be stored and retrieved in much more sophisticated ways than unstructured plain text documents, or documents whose structures have not been marked up in a standardized way. Although it is too soon to be certain, the result may be a revolution in the technology of information retrieval. In the past, this course has generally applied ideas from user interface design to more traditional information retrieval technologies. Here, for example, is the IS 277 syllabus from Winter 2002 (in Word format):
http://polaris.gseis.ucla.edu/pagre/is277-winter-2002.doc
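To make the earlier point about XML sublanguages and retrieval concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The recipe sublanguage and its element names (cookbook, recipe, title, ingredient) are invented for illustration and do not correspond to any published standard.

```python
import xml.etree.ElementTree as ET

# A tiny document in a hypothetical recipe sublanguage of XML.
DOCUMENT = """
<cookbook>
  <recipe>
    <title>Apple pie</title>
    <ingredient>apple</ingredient>
    <ingredient>flour</ingredient>
  </recipe>
  <recipe>
    <title>Flourless chocolate cake</title>
    <ingredient>chocolate</ingredient>
    <ingredient>egg</ingredient>
  </recipe>
</cookbook>
"""

root = ET.fromstring(DOCUMENT)


def recipes_using(ingredient):
    """Return titles of recipes whose <ingredient> elements include the term."""
    return [
        recipe.findtext("title")
        for recipe in root.findall("recipe")
        if ingredient in [item.text for item in recipe.findall("ingredient")]
    ]


# A plain-text search matches any line that mentions the word at all,
# including the title of the cake that contains no flour.
plain_text_hits = [
    line.strip() for line in DOCUMENT.splitlines() if "flour" in line.lower()
]

print(recipes_using("flour"))
print(plain_text_hits)
```

The structured query returns only the recipe that actually lists flour as an ingredient, whereas the plain-text search also picks up the flourless cake. That difference is the kind of retrieval improvement that standardized markup makes possible.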
Unfortunately, serious user interfaces for semantic web technologies hardly exist. Indeed, it is not even clear what semantic web technologies would do. Nor has much research been done on the uses in practice of large collections of structured documents that use the four kinds of machine-readable ontologies. This is truly an opportunity to work on the ground floor of an important new field. We will analyze how these new technologies apply to the reinvention of information retrieval in particular domains. And, both in class and in the course assignment, we will attempt to anticipate what kinds of interfaces will best integrate the new technologies into the work practices of those domains.
Although the semantic web is highly technical in nature, this course does not presuppose any technical background beyond that of the program in general and the course prerequisites. Overall, the course will emphasize ontology markup more than the programs that use it, ontologies for documents more than for services, reading the marked-up documents more than writing them, and real examples of semantic web documents more than synthetic examples. Each week, students will be required to find and bring in a real-life example of a web document that uses the markup technology of the week, and much class time will be spent reading, line by line, the marked-up documents that students have brought in.
Unfortunately, these markup languages were never meant for human beings to read, and even their inventors regard them as obnoxiously obscure. However, the user-friendly software tools that are supposed to stand between human beings and marked-up documents largely do not exist, and will not for the time being substitute for an ability to read the markup. Nor do there exist useful textbooks or manuals of semantic web technologies that are written for anyone except computer scientists. Accordingly, large amounts of computer science will be explained in plain English as we read the markup.
The most familiar markup language is HTML, a simple language that provides features for common document formatting conventions and for hyperlinks to other documents on the web. Although it is not a prerequisite for the course, students who take a few days to teach themselves HTML before the course begins will be happier than ones who do not. Several introductions to HTML are freely available on the Web, for example:
Burke's How to Write HTML
http://www.speech.cs.cmu.edu/~sburke/html/
Introduction to HTML
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/htmlindex.html
HTML Code Tutorial
http://www.htmlcodetutorial.com/
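For students who want a first look at what HTML markup contains, the following Python sketch reads a small page and lists its hyperlinks using the standard library's html.parser module. The page itself is a made-up example, and the LinkCollector class is a name chosen here for illustration.

```python
from html.parser import HTMLParser

# A small hypothetical HTML page with formatting markup and two hyperlinks.
PAGE = """
<html>
  <head><title>Course links</title></head>
  <body>
    <h1>Readings</h1>
    <p>See the <a href="http://www.w3.org/">W3C site</a> and
       <a href="http://www.xml.com/">xml.com</a>.</p>
  </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```

The tags such as h1 and p describe formatting, while the a tags carry the hyperlinks; the program simply walks through the tags in order, much as we will do by eye in class.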
Students will also be happier if they have used the "Source" menu entry in Internet Explorer (or "Page Source" in Netscape) to retrieve and read the marked-up document source for several simple HTML pages. HTML, however, is being replaced by a very similar XML-based markup language called XHTML. Students in this course will learn to write simple XHTML web pages, and will use XHTML format to write a heavily-hyperlinked online paper that describes how the semantic web is being used, and can be used, in a particular industry (e.g., finance or media), profession (e.g., architecture or engineering), or academic field (e.g., classics or geography). The assignment for the paper is here:
http://polaris.gseis.ucla.edu/pagre/is277-assignment.html
The assignment for bringing in some marked-up pages each week is here:
http://polaris.gseis.ucla.edu/pagre/is277-markup.html
The online paper will be 75% of the grade and the weekly collection of real-life examples of marked-up web documents will be 25%.
In general, this will be a paperless course. Students will "hand in" all of the weekly assignments, the paper proposal, and the paper itself by linking to them from a Web page that they maintain themselves. Here is an example of what such a Web page might be like:
http://polaris.gseis.ucla.edu/pagre/is277-example.html
Once you create your page, please send Phil the URL for it. Here is the directory of IS 277 students' Web pages:
http://polaris.gseis.ucla.edu/pagre/is277-pages.html
All of the course readings will be on the web. Students who wish to purchase a (relatively) introductory book on the semantic web might use an online bookseller to buy a copy of Grigoris Antoniou and Frank van Harmelen, A Semantic Web Primer, MIT Press, 2004. This book is also online:
http://thoth.ilit.umbc.edu/CMSC-771/semantic%20web%20primer.pdf
Each class session will include a document reading session and a lecture. Each lecture will introduce the week's material using theory and simple examples, and the corresponding readings should be done after the lecture. The readings will typically be too technical, but students should read them as best they can. Students should use the readings in an attempt to read the real-life marked-up documents that they collect from the web, and should come to class prepared to explain their documents line by line, again as best they can. We will have an Internet connection and projector in class to read online materials.
Week 1. Ontology standards
slides for this week's lecture (in PowerPoint)
http://polaris.gseis.ucla.edu/pagre/ontologies.ppt
Spinning the Semantic Web (introduction)
http://w5.cs.uni-sb.de/ss03/SemanticWebHTML/Vorlesung%20SemanticWebSS03/Introduction.pdf
Business to Consumer Markets on the Semantic Web
http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/otm2003_Semmarkets.pdf
Working towards MetaUtopia: A Survey of Current Metadata Research
http://archive.dstc.edu.au/RDU/staff/jane-hunter/LibTrends_paper.pdf
Using Ontologies: Enabling Knowledge Sharing and Reuse on the Semantic Web
http://www.deri.org/fileadmin/documents/DERI-TR-2003-10-29.pdf
Semantic Web Portals
(read through page 17)
http://sw-portal.deri.org/papers/publications/SemanticWebPortalSurvey.pdf
Sorting Things Out: Classification and Its Consequences
(read the introduction)
http://epl.scu.edu:16080/~gbowker/classification/
The Cascade of Interactions in the Digital Library Interface
http://www.gseis.ucla.edu/faculty/bates/articles/cascade.html
Recommended reading:
Anatomy of the Grid
http://hpc.sagepub.com/cgi/content/short/15/3/200
Physiology of the Grid
http://www.globus.org/alliance/publications/papers/ogsa.pdf
Information Technology and the Transformation of Research
http://www7.nationalacademies.org/itru/Transforming%20Research.pdf
A Global Grid-Enabled Collaboratory for Scientific Research
http://pcbunn.cithep.caltech.edu/GECSR_Final.pdf
Social Theoretical Issues in the Design of Collaboratories
http://epl.scu.edu:16080/~gbowker/collab.pdf
Towards Institutional Infrastructures for E-Science
http://www.oii.ox.ac.uk/resources/publications/OIIRR2_200309.pdf
A Practical Guide to Federal Enterprise Architecture
http://www.cio.gov/archive/bpeaguide.pdf
Week 2. XML
A Gentle Introduction to XML
(read through section 2.3)
http://www.tei-c.org/P4X/SG.html
a TEI manual including examples of XML markup
http://etext.lib.virginia.edu/tei/uvatei.html
introduction to XML syntax
http://www.zvon.org/xxl/XMLTutorial/General/book.html
some examples of XML pages
(use "Page Source")
http://www.w3schools.com/xml/plant_catalog.xml
http://www.ibiblio.org/xml/examples/shakespeare/win_tale.xml
http://www.ibiblio.org/xml/examples/4-2.xml
http://www.scc.rutgers.edu/ceth/intromat/xml/samples3/poem/Converted_WordsworthPoem_xml.htm
http://www.brics.dk/~amoeller/XML/xml/recipes2.xml
http://www.ise.gmu.edu/faculty/ofut/classes/642/Examples/XML/stamps.xml
http://clerk.house.gov/evs/2006/roll004.xml
an example of an XML application
http://www.recordare.com/xml.html
Understanding ebXML
http://www-106.ibm.com/developerworks/xml/library/x-ebxml/
ebXML: A Critical Analysis
http://www.rawlinsecconsulting.com/ebXML/
Swoogle
http://swoogle.umbc.edu/
the Protege ontology editor
http://protege.stanford.edu/
Piggy Bank semantic web extension for Firefox
http://simile.mit.edu/piggy-bank/
Also, read several pages on these sites to get the general idea:
World Wide Web Consortium
http://www.w3.org/
Organization for the Advancement of Structured Information Standards
http://www.oasis-open.org/
OpenDocument
http://www.google.com/search?hl=en&q=%22Open+Document+Format%22&btnG=Google+Search
An act to add Section 11541.1 to the Government Code, relating to information technology
http://www.leginfo.ca.gov/pub/07-08/bill/asm/ab_1651-1700/ab_1668_bill_20070223_introduced.html
Asynchronous JavaScript and XML
http://www.ajaxmatters.com/
The Cover Pages
http://xml.coverpages.org/
O'Reilly xml.com
http://www.xml.com/
xml.gov
http://www.xml.gov/index.asp
European survey of semantic Web applications
http://www.w3.org/2001/sw/Europe/reports/chosen_demos_rationale_report/hp-applications-survey.html
"Thinking XML" column at the IBM developers' site
http://www.ibm.com/developerworks/views/xml/libraryview.jsp?search_by=thinking+xml:
IBM XML Technical Library
http://www-128.ibm.com/developerworks/views/xml/library.jsp
Week 3. XML document types
A Gentle Introduction to XML
(read section 2.4 onward)
http://www.tei-c.org/P4X/SG.html
some examples of XML pages with document type definitions
http://www.w3schools.com/xml/node_in_dtd.xml
http://www.npac.syr.edu/projects/tutorials/XML/example_files/booksIntSub.xml
http://www.cs.rpi.edu/~puninj/XMLJ/classes/class3/slide11-0.html
some more DTDs
http://www.brics.dk/~amoeller/XML/schemas/dtd-example.html
http://www.fly.faa.gov/AirportStatus.dtd
http://support.sciencedirect.com/xml/sd_holdings_01.dtd
an example of a large-scale XML document type
http://www.oreilly.com/catalog/docbook/chapter/book/docbook.html
http://www.docbook.org/
Comparative Analysis of Standardization of Vertical Industry Languages
(scroll down to page 210)
http://www.si.umich.edu/misq-stds/proceedings/ICIS2003-misq-stds.pdf
Standards Fragmentation in Electronic Markets
http://wareham.eci.gsu.edu/Resume/Papers/WarehamRaiXML.pdf
if you want to learn XML Schema, an alternative to DTDs:
XML Schema examples and tutorial
http://www.xfront.com/
Open Archives Initiative
http://www.openarchives.org/
Recommended reading:
Standardization of XML-Based e-Business Frameworks
http://www.si.umich.edu/misq-stds/proceedings/137_135-146.pdf
A Web-Based Negotiation System
http://www.ists.dartmouth.edu/library/odi1103.pdf
A System for the Mediated Sharing of Sensitive Data
http://www.ists.dartmouth.edu/library/sce0503.pdf
Porting a Rich-Media Collection to a Mobile Platform
http://www.mlearn.org.za/CD/papers/arias-%20reichenbach-pasch.pdf
On Engineering Design Generation with XML-Based Knowledge-Enhanced Grammars
http://www.ifi.unizh.ch/~noser/BIBLIO/rudolph00.pdf
XML-Based Modeling of Corporate Memory
http://www.icaen.uiowa.edu/~ankusiak/Journal-papers/Bill_IEEE.pdf
Third Workshop on Legislative XML
http://www.cnipa.gov.it/site/_files/Quaderno%2018.pdf
Week 4. XHTML
XHTML 1.0: The Extensible HyperText Markup Language
http://www.w3.org/TR/xhtml1/
XHTML W3C Recommendation Summary
http://train.msu.edu/classinfo/downloads/xhtml.pdf
Index of HTML Elements
http://www.w3.org/TR/html401/index/elements.html
an example of an XHTML page
(use "Page Source")
http://www.w3.org/
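As a small illustration of what it means for XHTML to be XML-based, the following Python sketch feeds two versions of the same page to an ordinary XML parser from the standard library. Both pages are made-up examples, and is_well_formed is a helper name chosen here; the sketch checks only well-formedness, not validity against the XHTML document type.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Well-formed XHTML: every element is explicitly closed.
XHTML_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>First paragraph.<br /></p>
    <p>Second paragraph.</p>
  </body>
</html>"""

# Loose HTML of a common kind: the <br> and the first <p> are never closed.
LOOSE_HTML_PAGE = """<html>
  <head><title>Example</title></head>
  <body>
    <p>First paragraph.<br>
    <p>Second paragraph.</p>
  </body>
</html>"""


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(XHTML_PAGE))
print(is_well_formed(LOOSE_HTML_PAGE))

# Because the XHTML page is XML, generic XML tools can query it directly.
root = ET.fromstring(XHTML_PAGE)
paragraphs = root.findall(f".//{{{XHTML_NS}}}p")
print(len(paragraphs))
```

Web browsers tolerate the loose version, but generic XML software does not, which is why XHTML pages can be processed with the same tools as any other XML document.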
Week 5. RDF
An Introduction to the Resource Description Framework
http://www.dlib.org/dlib/may98/miller/05miller.html
RDF Primer
(skip section 5 on RDF Schema)
http://www.w3.org/TR/rdf-primer/
Collaborative Mapping with RDF
http://www.idealliance.org/papers/dx_xmle03/papers/03-03-03/03-03-03.pdf
conference proceedings with extensive RDF markup
http://dc2003.ischool.washington.edu/
an example of an RDF database application
(click on "data" for the RDF files)
http://chefmoz.org/
An RDF Model for Multi-Level Hypertext in Digital Libraries
http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fischer_Fuhr:02.pdf
Barriers to Real World Adoption of Semantic Web Technologies
http://www.cems.uwe.ac.uk/~mhbutler/papers/barriersToRealWorldAdoptRDF.pdf
some examples of RDF files
http://www.zvon.org/xxl/RDFTutorial/Examples/example1.html
http://www.ontoknowledge.org/oil/case-studies/KA-facts.rdf
http://www.ukoln.ac.uk/metadata/resources/rdf/examples/2/
Recommended reading:
Introducing SPARQL (RDF database language)
http://www.xml.com/lpt/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html
Week 6. RDF Schema
RDF Primer
(read section 5 on RDF Schema)
http://www.w3.org/TR/rdf-primer/
RDF Vocabulary Description Language 1.0: RDF Schema
http://www.w3.org/TR/rdf-schema/
RDF Schema Directory
http://139.91.183.30:9090/RDF/Examples.html
an RDF Schema markup language and some instances of it
http://139.91.183.30:9090/RDF/VRP/Examples/gml.rdfs
http://139.91.183.30:9090/RDF/VRP/Examples/example_profile3.rdf
more examples of RDF Schema pages
http://www.csd.abdn.ac.uk/~yzhang/test.rdfs
http://swws.semanticweb.org/data/swws_web_site_kb.rdfs
http://www.csc.fi/kielipankki/puhe/schemas/official/recording.rdfs
http://www.ilrt.bris.ac.uk/discovery/2001/09/rdf-schema-tests/rdf-schema.rdfs
http://lsdis.cs.uga.edu/~farshad/events/EventSchema.rdf
http://www.metadata.net/harmony/ABCSchemaV5Commented.rdf
papers from Semantics 2006
http://www.semantics2006.net/
Week 7. RSS 1.0
An Introduction to RSS for Educational Designers
http://www.downes.ca/files/RSS_Educ.htm
RDF Site Summary (RSS) 1.0
http://web.resource.org/rss/1.0/spec
RDF Schema for RSS 1.0
(use "Page Source" and scroll down)
http://web.resource.org/rss/1.0/
Semantic Blogging
http://dijest.com/aka/2003/08/23.html#a2584
an example of an RSS interface
http://bloglines.com/
some examples of RSS 1.0 blog markup
(use "Page Source")
http://www.pocketsoap.com/weblog/rss.xml
http://www.techbargains.com/rss.xml
http://boingboing.net/rss.xml
http://www.ilrt.bris.ac.uk/discovery/rdf/resources/rss.rdf
Recommended reading:
Why Choose RSS 1.0?
http://www.xml.com/lpt/a/2003/07/23/rssone.html
"a universal publishing standard for personal content and weblogs"
http://www.atomenabled.org/
Week 8. Learning Object Metadata
Instructional Planning with Learning Objects
(scroll down to page 52)
http://www.uni-koblenz.de/fb4/publikationen/gelbereihe/RR-16-2003.pdf
Semantic Web Meta-data for e-Learning
http://kmr.nada.kth.se/papers/SemanticWeb/p744-nilsson.pdf
Interoperability between Library Information Services and Learning Environments
http://www.imsproject.org/digitalrepositories/CNIandIMS_2004.pdf
How RDF Will Change Learning Technology Standards
http://www.cetis.ac.uk/content/20010927172953/viewArticle
The Next Wave: CETIS Interviews Mikael Nilsson about the Edutella Project
http://www.cetis.ac.uk/content/20010927163232
EDUTELLA: A P2P Networking Infrastructure Based on RDF
http://edutella.jxta.org/reports/edutella-whitepaper.pdf
Learning Object Metadata
http://ltsc.ieee.org/wg12/
XML Knowledge Management Flourishes in Learning Technology Initiatives
http://www-106.ibm.com/developerworks/xml/library/x-think21.html
DTD for learning object metadata
http://www.ema.fr/~mcrampes/Cours_%20semantic_web/TPXML02/LOM%20DTD%20imsmd_rootv1p2.dtd
an example of learning object metadata
(scroll down below the tables)
http://www.rdn.ac.uk/publications/rdn-ltsn/ap/
more examples
(use "Page Source")
http://www.imsglobal.org/metadata/mdv1p2p2/samples/merlot/MERLOTexample1_schema.xml
http://www.imsglobal.org/metadata/mdv1p2p2/samples/ims/imsmdexample_schema.xml
http://www.imsglobal.org/metadata/mdv1p3pd/xslt/samples-LOM/test_schema_LOM.xml
another example
(scroll down)
http://math.unipa.it/~grim/SiDonley.PDF
IEEE Learning Object Metadata RDF Binding
http://kmr.nada.kth.se/el/ims/md-lomrdf.html
some of the RDF files
http://kmr.nada.kth.se/el/ims/schemas/lom-general
http://kmr.nada.kth.se/el/ims/schemas/lom-educational
http://kmr.nada.kth.se/el/ims/schemas/lom-lifecycle
http://kmr.nada.kth.se/el/ims/schemas/lom-rights
http://kmr.nada.kth.se/el/ims/schemas/lom-metametadata
http://kmr.nada.kth.se/el/ims/schemas/lom-classification
an example
http://kmr.nada.kth.se/el/ims/examples/lom-rdf1.rdf
IMS Resource Description Framework RDF Bindings
http://www.imsproject.org/rdf/
Recommended reading:
Business Process Management Technology in e-Learning Systems
http://coronet.iicm.edu/denis/pubs/elearn2005a.pdf
For the really dedicated:
IMS Global Learning Consortium
http://www.imsglobal.org/
SCORM (Sharable Content Object Reference Model)
http://www.adlnet.gov/scorm/index.cfm
Week 9. OWL
Web Ontology Language Guide
http://www.w3.org/TR/owl-guide/
Web Ontology Language: OWL
http://www.cs.vu.nl/~frankh/postscript/OntoHandbook03OWL.pdf
OWL Use Cases and Requirements
http://www.w3.org/TR/2004/REC-webont-req-20040210/
Semantic Web in a Pervasive Context-Aware Architecture
http://w5.cs.uni-sb.de/~krueger/aims2003/camera-ready/chen-8.pdf
some examples of OWL
http://www.cs.vu.nl/~frankh/spool/wildlife.owl
http://www.w3.org/TR/2002/WD-owl-guide-20021104/food.owl
http://www.aiai.ed.ac.uk/resources/go/obo.owl
http://osm.cs.byu.edu/CS652s04/ontologies/OWL/carads.owl
Recommended reading:
eClassOWL: A Fully-Fledged Products and Services Ontology in OWL
http://www.heppnetz.de/files/eclassOWL-finalPoster-shortA4.pdf
Standard Ontology for Ubiquitous and Pervasive Applications
http://ebiquity.umbc.edu/_file_directory_/papers/105.pdf
Semantic Web Technologies for Context-Aware Museum Tour Guide Applications
http://www.cs.cmu.edu/~sadeh/Publications/MCommerce/WAMIS05%20Submission_Final.pdf
Semantic Web for Research Communities
http://www.aifb.uni-karlsruhe.de/WBS/ysu/publications/2005_swrc_baosw.pdf
syllabus for a course about OWL (with numerous readings)
http://www.cse.lehigh.edu/~heflin/courses/sw-2006/
Week 10. OWL-S
Semantic Web Services: A Communication Infrastructure for eWork and eCommerce
http://www.springerlink.com/index/CJHJQLML7JKQ8RVE.pdf
The Semantic Grid: A Future e-Science Infrastructure
http://www.semanticgrid.org/documents/semgrid-journal/semgrid-journal.pdf
OWL-S: Semantic Markup for Web Services
http://www.daml.org/services/owl-s/1.0/owl-s.html
some examples of OWL-S services
http://www.mindswap.org/2004/owl-s/services.shtml
Ontology-Enabled Pervasive Computing Applications
http://www.flacp.fujitsulabs.com/~rmasuoka/papers/20030915-Task-Computing-IEEE-Intelligent-Systems-September-October-2003.pdf
Recommended reading:
"web services" versions of established distributed computing ideas
http://www-128.ibm.com/developerworks/webservices/library/ws-comproto/
Service-Oriented Computing - ICSOC 2005
http://www.springer.com/west/home?SGWID=4-102-22-107952204-0
Customized Delivery of E-Government Web Services
http://www-personal.engin.umd.umich.edu/~brahim/mypublications/medjahed-IS.pdf
Planning for Semantic Web Services
http://www.ai.sri.com/SWS2004/final-versions/SWS2004-Sirin-Final.pdf
Interleaving Semantic Web Reasoning and Service Discovery
http://www.cs.cmu.edu/~sadeh/Publications/More%20Complete%20List/techreport%20%20july%2027%202005.pdf
A Framework for Dynamic Semantic Web Services Management
http://eceb.gmu.edu/pubs/IJCIS_Howard_Kerschberg.pdf
A System for Dynamically Composing and Intelligently Executing Web Services
http://dblab.usc.edu/Users/shkim/papers/proteus.pdf
Pitfalls of OWL-S
http://www.informatik.uni-ulm.de/ki/Liebig/papers/icsoc04.html
Conflicts in the Internet Standards Process
http://www.stevens-tech.edu/jnickerson/SpiritOfTheWeb.pdf