Ease of Implementing the Semantic Web

by Zach Tomaszewski

for ICS 691, Fall 2000, taught by Dr. Luz Quiroga



Introduction

The Semantic Web is one of the latest Internet buzzwords. It concerns web pages classified by metadata--meaning and subject matter--rather than sought by keywords. This metadata will be machine readable, so users will not have to learn the classification process themselves.

Some day soon, it will be possible to say, "Computer, find the full moon cycle for the last 18 years and compare that with Norwegian lemming migrations. Is there any correlation?" And the computer will be able to determine whether a document concerns lunar cycles or lemming migrations. It will also be able to recognize that numbers within those documents, whether labeled as "date" or "day-month," are related and, using linked schemata and namespaces, be able to compare the two as related data.

But this is one small part. The metadata that will be the basis of the Semantic Web will allow:

Much of this is vague or conceptual, or is already being done in a format other than the developing RDF and XML formats commonly associated with the Semantic Web idea. This makes it hard to grasp the significance of it all.

I am reminded of two stories. The first is a perhaps apocryphal story about an employee at a large electronics company such as Hewlett-Packard or IBM. The employee went to his supervisor with the idea of a personal computer, and the supervisor asked, "What would people do with their own computers?" At the time, computers were used mainly for mathematical processing and scientific research. The best the employee could come up with was that users could keep their recipes on a computer. The company correctly decided people were not going to pay thousands of dollars in order to store their recipes electronically, and so it did not start manufacturing PCs. Someone else did, though. Now, with word processors, spreadsheets, games, desktop publishing, and so much more, it's hard to think of not using a computer.

A second story is about Tim Berners-Lee, the inventor of HTML, HTTP, and the World Wide Web. In the face of many other existing systems of hypertext (most also based on SGML, as HTML is) and many other agendas, it was difficult to convince people of the future of the World Wide Web. The idea of two documents on machines a world apart, connected by a simple selectable link, didn't naturally lead to a conceptualization of the sprawling, all-encompassing Web of business and information that we know today.

The Semantic Web is very similar to the computer and the World Wide Web. It is currently in its conceptual infancy. With so many applications and possibilities and yet few concrete implementation examples, it is hard to appreciate, or even imagine, the countless possibilities. But, if nothing else, remember that the Semantic Web is based on concepts, meaning, and relationships between documents. That alone will provide a difference like that between browsing a controlled vocabulary versus searching in natural language. And that's just the beginning.

Semantic Web

The Semantic Web, like the World Wide Web, is primarily the idea of Tim Berners-Lee. He now heads the World Wide Web Consortium, and so there are many people working and commenting on the actual implementation. This is a good thing, since, in order to work in reality, the specifications will have to accommodate a broad base of goals and agendas. The Semantic Web is based on XML and RDF, though it could be implemented through other media.

Another related idea of Tim Berners-Lee is the Web of Trust, which involves being able to identify different people online and knowing whether they are honest, authoritative, and so on. This is indeed becoming more important. The basics can be seen in the feedback structure used in eBay, Yahoo Auctions, and the many web sites that compile user comments about certain online companies. Gradually this will become more structured through the use of digital signatures and security certificates, aided by or encoded in metadata.

XML

As with the new technologies in the examples above, the implications of XML were, until very recently, hard to appreciate. Now, as it is being used in a great number of different applications, its benefits are becoming obvious. Like HTML, XML is based on markup tags. However, unlike HTML, the tags are independently definable for different applications. Data saved as XML is text based, so it can easily be transported to any system with no problems, unlike many traditional file types. As text, it is also human-readable, which can be helpful for debugging and spotting file errors.

Currently, databases, spreadsheets, word processors, and more could save files in an XML format. To publish these files to the web, it is easy to write a script or program to convert these files into HTML. For example, a program that records recipes may have <ingredient> and <mixing-step> tags. These could be converted into HTML <li> tags, which are "list items" in an ordered or unordered list. The downside of this conversion to HTML is that, though humans will be able to differentiate the two lists--ingredients and the steps in mixing them--by the contents, a computer could not as both are now marked with the same <li> tag.
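The loss of meaning in this conversion can be sketched in a few lines of Python. This is only an illustration under assumptions: the recipe document and the function name recipe_to_html are hypothetical, and only the <ingredient> and <mixing-step> tag names come from the example above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical recipe document; only the tag names follow the example in the text.
RECIPE_XML = """<recipe>
  <ingredient>2 cups flour</ingredient>
  <ingredient>1 egg</ingredient>
  <mixing-step>Sift the flour.</mixing-step>
  <mixing-step>Beat in the egg.</mixing-step>
</recipe>"""


def recipe_to_html(xml_text):
    """Convert the recipe to two HTML lists; every item becomes a plain <li>."""
    root = ET.fromstring(xml_text)
    ingredients = [el.text for el in root.findall("ingredient")]
    steps = [el.text for el in root.findall("mixing-step")]
    lines = ["<ul>"]
    lines += ["  <li>%s</li>" % escape(text) for text in ingredients]
    lines += ["</ul>", "<ol>"]
    lines += ["  <li>%s</li>" % escape(text) for text in steps]
    lines.append("</ol>")
    return "\n".join(lines)


html_output = recipe_to_html(RECIPE_XML)
print(html_output)
```

The output contains four <li> items and no trace of the original tag names, which is exactly the information a machine would have needed to tell ingredients from steps.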

This conversion to HTML will likely become unnecessary soon, as XSL (Extensible Stylesheet Language) becomes more developed and supported. Then the original <ingredient> and <mixing-step> tags can be maintained, differentiating them for a machine, while the XSL instructions will tell a web browser how to display the information--in this case, as a bulleted or numbered list of items. This is a large part of the Semantic Web. The pages will have links to XML schemata that define the different tags used. There will also be resources that allow conversion between different XML structures. For example, there may be another recipe program that uses <ingred> and <mix> tags. Through the Semantic Web, a computer will be able to determine that the information contained in these different tags and published by different program vendors is actually equivalent.
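A rough Python sketch of that equivalence follows. The mapping table TAG_EQUIVALENTS is a hypothetical stand-in: on the Semantic Web the equivalences would be discovered through linked schemata rather than written into the program, and the two sample recipes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical equivalence table mapping one vendor's tags onto the other's.
# In the Semantic Web vision this would come from linked schemata.
TAG_EQUIVALENTS = {"ingred": "ingredient", "mix": "mixing-step"}


def normalize(xml_text):
    """Parse a recipe and rename any tag that has a known equivalent."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = TAG_EQUIVALENTS.get(el.tag, el.tag)
    return root


def as_pairs(root):
    """Return the recipe's children as (tag, text) pairs for comparison."""
    return [(el.tag, el.text) for el in root]


vendor_a = normalize(
    "<recipe><ingredient>1 egg</ingredient><mixing-step>Beat.</mixing-step></recipe>"
)
vendor_b = normalize("<recipe><ingred>1 egg</ingred><mix>Beat.</mix></recipe>")

print(as_pairs(vendor_a) == as_pairs(vendor_b))
```

Once both documents are expressed in the same vocabulary, comparing them is a plain equality test.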

RDF

The other big language to be used in the Semantic Web is the Resource Description Framework (RDF). It will describe information about documents and the relationships between them. While XML is already in use and so is easier to understand, RDF is still quite conceptual and under-implemented.

RDF will mainly contain metadata -- information about the creator or publisher of the document, the rating, the relationship to other pages in the site, etc. Like XML, it will be linked to schemata that explain the tags used. As only a framework, RDF does not itself specify the elements or properties to be used in describing documents. These are determined in the schemata, which may be written by a single person or standardized by a large organization. PICS and Dublin Core are two of the best-known existing schemata. Again, it will be possible to translate between different schemata. For example, the Library of Congress uses the term "author"; the British Library uses the term "creator." If these two different terms are used in metadata, a computer will still be able to equate them through the web of linked RDF schemata.

RDF is written in XML. The added layer is used, rather than simply stating metadata in XML alone, because the structure dictated by RDF means that complex relationships and statements about statements can be recorded.
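The idea of a statement about a statement can be sketched with a small Python data model. This is not RDF syntax, only an illustration of the structure; the page URL, the names, and the assertedBy property are hypothetical.

```python
from collections import namedtuple

# A statement is a (subject, predicate, object) triple. The subject may itself
# be a Statement, which is what lets one statement describe another.
Statement = namedtuple("Statement", ["subject", "predicate", "obj"])

# Hypothetical page URL and names, for illustration only.
claim = Statement("http://example.org/page.html", "Creator", "A. Author")
endorsement = Statement(claim, "assertedBy", "A. Cataloger")


def render(value):
    """Render a statement as nested text; plain strings are returned unchanged."""
    if isinstance(value, Statement):
        return "[%s %s %s]" % (
            render(value.subject),
            value.predicate,
            render(value.obj),
        )
    return value


print(render(endorsement))
```

Flat name and value pairs, such as HTML <meta> tags, have no place to put the outer statement, which is the gap the added RDF layer is meant to fill.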

Dublin Core

Dublin Core was developed in Dublin, Ohio (hence the name) in 1995 by OCLC, a big name in the library cataloging world and an organization closely involved with the development of RDF. Dublin Core is a set of 15 property elements used to describe resources. Though intended primarily for cataloging digital resources, the properties apply to a wide variety of information sources. Qualifiers can be used to increase property specificity, but this degrades interoperability.

Metadata Examples

Below, I have shown three different ways of describing metadata in a webpage. The resource described is another page in this site, Bibliography Plan: User-Centered Design for Web Developers.

Free-form HTML <meta> Tags

HTML already includes the <meta> tag. It goes in the head of the document and its contents are not displayed to a user. There is no defined set of values to be used for the name attribute. In other words, the metadata elements are completely up to the web page creator. However, the three most commonly used properties are author, description, and keywords. The content of description is often used in search engine results as a summary for the page. The content of keywords is used by many search engines to determine relevancy to a user's query. The following is an example of this HTML-coded metadata.

<META name="author" content="Z. Tomaszewski">
<META name="generator" content="Arachnophilia 4.0">
<META name="description" content="An analysis of search terms, a documentation of search strategy, and a few sample annotated entries.">
<META name="keywords" content="user-centered design, web usability, web design, World Wide Web, web sites, information architecture, accessibility, human-computer interaction, webliography, bibliography">

Dublin Core Stated in HTML

The following is metadata using the restricted list of 15 elements stated by Dublin Core. Each property (i.e., creator, language, etc.) has a defined meaning and sometimes a restricted list of allowed values. Note the link to the defining schema at http://purl.org/DC/elements/1.0/. This Dublin Core information is still stated in regular HTML:

<LINK rel="schema.DC" href="http://purl.org/DC/elements/1.0/">
<META name="DC.Title" content="Bibliography Plan: User-Centered Design for Web Developers">
<META name="DC.Creator" content="Tomaszewski, Zach">
<META name="DC.Subject" content="user-centered design, web usability, web design, World Wide Web, web sites, information architecture, accessibility">
<META name="DC.Subject" scheme="LCSH" content="Human-computer interaction, Web sites -- Design, Web sites -- Psychological aspects">
<META name="DC.Description" content="An analysis of search terms, a documentation of search strategy, and a few sample annotated entries.">
<META name="DC.Date" content="2000-12-03">
<META name="DC.Language" content="en">
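Because this metadata is ordinary HTML, a program can already pick it out with a standard parser. The following Python sketch, using the standard library's html.parser module, collects only the elements whose names carry the DC. prefix. The class name MetaCollector and the shortened sample head are assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (element, scheme, content) for every Dublin Core <meta> tag."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META> arrives as "meta".
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name and name.startswith("DC."):
            self.records.append((name[3:], attr.get("scheme"), attr.get("content")))


# Shortened sample head, adapted from the examples above.
SAMPLE_HEAD = """
<LINK rel="schema.DC" href="http://purl.org/DC/elements/1.0/">
<META name="author" content="Z. Tomaszewski">
<META name="DC.Creator" content="Tomaszewski, Zach">
<META name="DC.Subject" scheme="LCSH" content="Human-computer interaction">
<META name="DC.Date" content="2000-12-03">
"""

collector = MetaCollector()
collector.feed(SAMPLE_HEAD)
for element, scheme, content in collector.records:
    print(element, scheme, content)
```

The free-form author tag is skipped, while the three Dublin Core elements are kept along with any scheme attribute that qualifies them.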

Dublin Core Stated in RDF

The following is the same Dublin Core information as the previous example, only this time described in RDF. Again, note the link to the Dublin Core schema. Also, note the about attribute of Description, which identifies which document is being discussed. With RDF it is possible to describe or make comments about other, separate resources. It is even possible to make statements about other people's statements, even if those statements do not refer to existing resources. This is the power of RDF: it is capable of complex statements and logical levels. Human meaning and thought are not simple things!

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
    <rdf:Description about="http://www2.hawaii.edu/~ztomasze/biblio.html"
    dc:Title="Bibliography Plan: User-Centered Design for Web Developers"
    dc:Creator="Tomaszewski, Zach"
    dc:Description="An analysis of search terms, a documentation of search strategy, and a few sample annotated entries."
    dc:Date="2000-12-03"
    dc:Language="en">
        <!-- Repeated properties are child elements; XML forbids duplicate attributes. -->
        <dc:Subject>user-centered design, web usability, web design, World Wide Web, web sites, information architecture, accessibility</dc:Subject>
        <dc:Subject>Human-computer interaction, Web sites -- Design, Web sites -- Psychological aspects</dc:Subject>
    </rdf:Description>
</rdf:RDF>
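To suggest how a machine might read such a description, here is a minimal Python sketch that turns a shortened RDF description into (resource, property, value) triples using the standard library's ElementTree. It is not a real RDF parser. The function name triples is hypothetical, the sample uses the rdf namespace URI from the W3C RDF Recommendation, and repeated Subject values are written as child elements because XML does not allow the same attribute twice on one element.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/metadata/dublin_core#"

# Shortened sample description, adapted from the example above.
SAMPLE_RDF = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="http://www2.hawaii.edu/~ztomasze/biblio.html"
      dc:Creator="Tomaszewski, Zach"
      dc:Date="2000-12-03">
    <dc:Subject>web usability</dc:Subject>
    <dc:Subject>Human-computer interaction</dc:Subject>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_text):
    """Return (resource, property, value) for each Dublin Core property found."""
    root = ET.fromstring(rdf_text)
    dc_prefix = "{%s}" % DC_NS
    found = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("about")
        # Properties written as attributes of the Description element.
        for name, value in desc.attrib.items():
            if name.startswith(dc_prefix):
                found.append((subject, name[len(dc_prefix):], value))
        # Properties written as child elements.
        for child in desc:
            if child.tag.startswith(dc_prefix):
                found.append((subject, child.tag[len(dc_prefix):], child.text))
    return found


result = triples(SAMPLE_RDF)
for triple in result:
    print(triple)
```

Each triple names the resource being described, which is what the about attribute contributes and what plain <meta> tags cannot express about any page other than their own.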

Ease of Implementation

It is easy to see from the above three examples how web metadata is becoming more structured and complex. For writing HTML, there are browsers, editors, validators, and specifications that can be used as aids. Few such tools are available for RDF, as it is still developing. As an added hindrance, RDF and metadata in general do not display in normal web browsers, which makes it hard to detect errors. RDF is very abstract, and so it may be difficult for busy web developers to write correctly. This will of course slow its implementation.
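Even without dedicated RDF tools, a general XML parser can catch some mistakes that a browser would pass over silently. The Python sketch below checks only XML well-formedness, not whether the RDF is meaningful. The function name check_well_formed and the sample fragments, including the example.org namespace, are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical fragments; the namespace URI is a placeholder.
GOOD = '<item xmlns:dc="http://example.org/dc#" dc:Date="2000-12-03" />'
DUPLICATE = '<item xmlns:dc="http://example.org/dc#" dc:Subject="a" dc:Subject="b" />'
UNBOUND = '<item xmlns:dc="http://example.org/dc#" dcx:Date="2000-12-03" />'

for label, text in [("good", GOOD), ("duplicate", DUPLICATE), ("unbound", UNBOUND)]:
    print(label, "->", check_well_formed(text) or "well-formed")
```

A repeated attribute and an undeclared namespace prefix are both rejected, but a check like this says nothing about whether the properties used are actually defined by the schema.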

Yet, as with the proliferation of tools for creating HTML documents, I think RDF tools will be developed and built into HTML editors. XML is already gaining support. As developers and programmers get used to the idea of schemata, and as the need grows for making statements about documents and resources (especially in a machine-readable way), I think RDF will also gain in popularity.

Conclusion

The Semantic Web holds great promise for harnessing the information available on the World Wide Web. Though the idea is still in its infancy, the related XML and RDF technologies are developing steadily. As the Semantic Web gains more support, more aids will be created and using these technologies will become easier. Concrete, existing examples will greatly aid people's understanding. In the same way that the World Wide Web developed, the Semantic Web will develop, and it is likely to have the same deep impact on our lives.




Resources

Lassila, Ora. "Introduction to RDF Metadata." <http://www.w3.org/TR/NOTE-rdf-simple-intro> Accessed: 5 Nov 2000.

Berners-Lee, Tim. "Semantic Web roadmap." <http://www.w3.org/DesignIssues/Semantic.html> Accessed: 12 Dec 2000.

"Resource Description Framework (RDF) Model and Syntax Specification." <http://www.w3.org/TR/REC-rdf-syntax/> 22 Feb 1999.

"RDF: Resource Description Framework." <http://www.w3.org/RDF/> Accessed: 5 Nov 2000.

"Dublin Core Metadata Initiative / Questions and Answers / FAQs." <http://purl.org/dc/education/index.htm> Accessed: 08 Dec 2000.

"Dublin Core Metadata Initiative / Documents / Recommendations / Dublin Core Element Set, Version 1.1." <http://purl.org/dc/documents/rec-dces-19990702.htm> Accessed: 08 Dec 2000.