What is the Semantic Web?

The Semantic Web is the vision of the web's inventor, Tim Berners-Lee. The motivation behind it is that while online information is accessible, its meaning remains incomprehensible to search engines. Computers can easily locate search words, but they do not understand the context in which those words appear. For example, a search for the word "turkey" will retrieve documents in which the word appears with different meanings: "turkey" the poultry and "Turkey" the country. Most search engines cannot directly answer the question "How many world leaders are under the age of sixty?", even though the information is available on the web. The vision of the Semantic Web is that the web will transform from a collection of documents comprehensible only to humans into a database that computers can understand as well.

Ontologies form the basis of the Semantic Web's realization. An ontology is a formal vocabulary, a rich semantic model of shared knowledge, comprising a set of concepts, their definitions and the semantic inter-relationships among them (Gruber, 1993; Uschold & Gruninger, 1996). Using an ontology, software applications and automatic agents can communicate with each other to share and exchange information and to perform complex tasks together without human intervention. The W3C developed standards for the formal definition of ontologies, such as the XML-based languages RDF/RDFS and OWL. The building blocks of these languages are statements, or facts, about the domain of knowledge. Every statement is a triple of the form subject – predicate – object, which expresses the semantic relationship (the predicate) between two concepts (the subject and the object). Both the concepts and the relationships among them are defined and uniquely identified by URIs (namespaces). The concepts can be abstract classes or specific objects, and logical inference rules and constraints can then be applied to them to derive new relationships that are not explicitly encoded. Furthermore, ontological concepts and relationships (called properties) can be added as components of meaning to web documents, making them comprehensible both to computer programs and to human users. For example, it is possible to identify the author of a web document or whether the document refers to a geographic location. Ontologies are usually built by experts in the specific fields, who might make use of corpus-based, automated statistical aids that propose words similar in meaning (Geffet and Dagan, 2009). These candidate words are then examined by the experts, who arrange them in a precise fashion and build the ontology.
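The triple model and the idea of deriving implicit facts can be sketched in a few lines of plain Python (no RDF library). The URIs and class names below are invented placeholders, and the inference rule shown is the familiar RDFS subclass rule, not a full reasoner:

```python
# A minimal sketch of how ontology statements are held as
# subject-predicate-object triples, and how a simple inference rule can
# derive facts that were never explicitly encoded.
# All "ex:" terms are illustrative placeholders, not real ontology URIs.

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Explicitly encoded statements (triples) -- note the two senses of "turkey".
triples = {
    ("ex:Turkey", TYPE, "ex:Country"),
    ("ex:Country", SUBCLASS, "ex:GeographicLocation"),
    ("ex:turkey", TYPE, "ex:Poultry"),
}

def infer_types(triples):
    """Apply the RDFS rule: if X type C and C subClassOf D, then X type D."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = {
            (s, TYPE, d)
            for (s, p, c) in inferred if p == TYPE
            for (c2, p2, d) in inferred if p2 == SUBCLASS and c2 == c
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closed = infer_types(triples)
# The fact below was never stated explicitly; it follows from the two
# triples about ex:Turkey and ex:Country.
print(("ex:Turkey", TYPE, "ex:GeographicLocation") in closed)  # True
```

Because the country "Turkey" and the poultry "turkey" are distinct concepts with distinct identifiers, the inference applies only to the country, which is exactly the disambiguation that plain keyword search lacks.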

In recent years, ontologies in various fields of knowledge have been developed and published on the web by groups such as universities (the Protégé Ontology Library, http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library), W3C social initiatives (FOAF – http://xmlns.com/foaf/0.1/, SIOC – http://www.w3.org/Submission/sioc-spec/), public projects (DBpedia – an ontology derived from Wikipedia data, http://dbpedia.org/About) and government ministries (SemanticGov, data.gov/semantic). The concepts and relationships of these ontologies are interlinked and thus constitute linked data (http://www.w3.org/standards/semanticweb/data). The linked data of RDF or OWL triples (statements) across ontologies can be retrieved effectively by semantic servers and search engines (e.g. Virtuoso Universal Server – http://virtuoso.openlinksw.com/, Sesame – http://www.openrdf.org/documentation.jsp) by means of SPARQL queries (http://www.w3.org/TR/rdf-sparql-query/). Hence, the Semantic Web is the next step in the advancement of the organization, management and retrieval of the enormous amount of information on the internet.
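The heart of SPARQL is matching triple patterns in which some positions are variables. The toy matcher below mimics, in plain Python, a single-pattern query such as SELECT ?s WHERE { ?s rdf:type ex:Country }; the data and URIs are invented for illustration, and a real engine would of course handle joins, filters and distributed sources:

```python
# Toy illustration of SPARQL-style triple-pattern matching over a set of
# subject-predicate-object triples. Positions starting with "?" are
# variables; everything else must match exactly.
# Data and "ex:" URIs are invented for illustration.

triples = [
    ("ex:Turkey", "rdf:type",     "ex:Country"),
    ("ex:France", "rdf:type",     "ex:Country"),
    ("ex:Ankara", "ex:capitalOf", "ex:Turkey"),
]

def match(pattern, triples):
    """Return one variable-binding dict per triple satisfying the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val   # bind the variable to this value
            elif pat != val:
                break                # constant position failed to match
        else:
            results.append(binding)
    return results

# Analogue of: SELECT ?s WHERE { ?s rdf:type ex:Country }
countries = match(("?s", "rdf:type", "ex:Country"), triples)
print([b["?s"] for b in countries])  # ['ex:Turkey', 'ex:France']
```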

Humanities and the Semantic Web

Nowadays, libraries are among the main institutions producing digital information, including bibliographic records, authority files and concept schemes. This information is currently stored in databases that, for the most part, have a web interface; however, these databases are not deeply integrated with other web sources. Library standards such as MARC, and the information retrieval protocol Z39.50, were designed for use by librarians, and the library community and the Semantic Web community use different terminologies for the same information concepts. To bridge this gap, several libraries have recently taken the initiative of converting their catalogs to RDF triples and linked data. For example, the British National Bibliography has been published as linked data by the British Library (http://www.bl.uk/bibliographic/datafree.html). The Library of Congress (http://www.loc.gov/bibframe/) and the Stanford University Libraries have also announced that linked data is included in their roadmaps (http://dataliberate.com//wp-content/uploads/2012/01/LDWTechDraft_ver1.0final_111230.pdf). Additional linked-data initiatives by libraries include the Spanish project (http://datos.bne.es) and that of Europeana, the European digital library (http://www.europeana.eu/, http://pro.europeana.eu/edm-documentation). In addition, many domain-specific vocabularies and authority files have been published by libraries and other organizations, such as GeoNames (http://www.geonames.org/ontology), GND (http://en.wikipedia.org/wiki/Integrated_Authority_File), VIAF (http://viaf.org/), the Library of Congress Subject Headings (http://id.loc.gov/) and the AAT (a Getty thesaurus – http://www.getty.edu/research/tools/vocabularies/aat/).
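What "converting a catalog to triples" amounts to can be sketched very simply: each record becomes a set of statements about one URI. The sketch below uses a plain dict as a stand-in for a catalog record; the record, its identifiers and the Dublin Core-style predicate names are invented for illustration and do not reflect any particular library's actual mapping:

```python
# Sketch: turning a catalog record (a plain dict standing in for, say, a
# MARC record) into linked-data triples about one resource URI.
# The record values and predicate prefixes are invented for illustration;
# the VIAF and id.loc.gov identifiers below are hypothetical examples of
# linking out to shared authority files.

record = {
    "id": "http://example.org/book/123",
    "title": "Weaving the Web",
    "creator": "http://viaf.org/viaf/0000000",                      # hypothetical VIAF URI
    "subject": "http://id.loc.gov/authorities/subjects/sh0000000",  # hypothetical LCSH URI
}

def record_to_triples(record):
    """Map catalog fields to Dublin Core-style predicates."""
    subject_uri = record["id"]
    mapping = {            # catalog field -> predicate
        "title": "dc:title",
        "creator": "dc:creator",
        "subject": "dc:subject",
    }
    return [
        (subject_uri, predicate, record[field])
        for field, predicate in mapping.items()
        if field in record
    ]

for triple in record_to_triples(record):
    print(triple)
```

The gain over a closed database is that the creator and subject values are URIs shared across institutions, so independently published catalogs become interlinked through the same authority files.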

In the field of cultural heritage, two fundamental ontologies have recently been developed by two large research groups: CIDOC-CRM, the result of a ten-year project (http://www.cidoc-crm.org/), and the Europeana Data Model.

W3Schools, Introduction to OWL

W3Schools, Introduction to RDF

W3Schools, RDF Schema (RDFS)

Gruber, T. R. (1993). A translation approach to portable ontology specifications.
Knowledge Acquisition, 5(2), 199-220.

Uschold, M., & Gruninger, M. (1996). Ontologies: Principles, methods and
applications. Knowledge Engineering Review, 11(2), 93-136.

Geffet, M., & Dagan, I. (2009). Bootstrapping distributional feature vector quality.
Computational Linguistics, 35(3), 435-461. doi:10.1162/coli.08-032-R1-06-96