ReLoad (Repository for Linked Open Archival Data) is a project of the Central State Archive, the Cultural Heritage Institute of Emilia-Romagna, and Regesta.exe. Its goal is to experiment with Semantic Web standards and methods for linked open data (LOD) in order to further data sharing among a broad range of archives.
ReLoad aims to become the central point for storing and accessing distributed archival resources, using LOD as its underlying technology. This initial phase of the project focuses on developing a shared space for archival description metadata; it does not address the creation of a “portal” for access to archival materials.
Instead, the project is designed to verify whether a “web of archival data” can be created. It explores in detail how using Semantic Web technologies to link to common resources (places, persons, organizations, themes, and so on) would facilitate the integration of diverse archival collections into a single web of data.
The Work Plan
The first step in the work plan was to define an ontology for the description of archival data (OAD, “Ontology of Archival Description”) using the Web Ontology Language (OWL). This ontology represents the classes and properties needed to expose the archival resources as linked data.
The OAD is based on an analysis of the metadata elements commonly employed in archival description and is a synthesis of them. The analysis first identified the “things” of archival description, so that they could be defined as classes in the standard ontology language. It was then extended to the descriptive properties that belong to those classes. As the standards require, each class and property was then assigned a Uniform Resource Identifier (URI).
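As a rough illustration of what such an ontology contains, the following Python sketch emits OWL class and property declarations in Turtle and gives each one a URI in a single namespace. The namespace `http://example.org/oad#` and every class and property name (`Fonds`, `Series`, `File`, `Agent`, `title`, `hasPart`, `createdBy`) are hypothetical placeholders. They are not the actual OAD vocabulary.

```python
# Sketch: declaring the classes and properties of an archival ontology in OWL (Turtle).
# ASSUMPTION: the namespace and all class/property names are hypothetical placeholders,
# not the real OAD terms.

OAD_NS = "http://example.org/oad#"

# The "things" of archival description, modelled as OWL classes.
CLASSES = ["Fonds", "Series", "File", "Agent"]

# Descriptive properties as (local name, OWL property type, domain class, range).
# A range containing ":" is a prefixed datatype; otherwise it names another class.
PROPERTIES = [
    ("title", "owl:DatatypeProperty", "Fonds", "xsd:string"),
    ("hasPart", "owl:ObjectProperty", "Fonds", "Series"),
    ("createdBy", "owl:ObjectProperty", "Fonds", "Agent"),
]

PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def uri(local_name: str) -> str:
    """Mint the full URI of a class or property in the ontology namespace."""
    return OAD_NS + local_name


def ontology_turtle() -> str:
    """Serialize every class and property declaration as a Turtle document."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    for cls in CLASSES:
        lines.append(f"<{uri(cls)}> a owl:Class .")
    for name, kind, domain, rng in PROPERTIES:
        range_term = rng if ":" in rng else f"<{uri(rng)}>"
        lines.append(
            f"<{uri(name)}> a {kind} ; "
            f"rdfs:domain <{uri(domain)}> ; rdfs:range {range_term} ."
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ontology_turtle())
```

Plain string building keeps the sketch dependency-free. In practice an RDF library would normally handle serialization.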
To address the knowledge-organization dimension of the archival resources under investigation, we analysed several subject classification schemes and lists used in the databases selected for this test:

- the scheme of the Office of Agriculture, in use in the province of Piacenza since 1960;
- the Astengo scheme of 1897, used to classify the records of the city administration;
- several schemes similar to the latter.

These subject lists have now been encoded in the Simple Knowledge Organization System (SKOS), the standard Semantic Web language for thesauri and subject lists. The SKOS-encoded terms were then used to extract key concepts automatically from the available databases. This made it possible to discover topics and themes common to the archives.
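The SKOS encoding and the extraction step could look roughly like the following Python sketch. It does two things:

- It serializes a small concept scheme as SKOS in Turtle.
- It matches the scheme's preferred and alternative labels against free-text descriptions, so that the topics of two archives can be compared.

The scheme URI, concept identifiers, labels, and sample descriptions are all hypothetical. Simple whole-word label matching stands in for whatever extraction technique the project actually used.

```python
# Sketch: a subject list encoded in SKOS, then used to tag archival descriptions.
# ASSUMPTION: the scheme URI, concept ids, and labels are hypothetical placeholders,
# not entries from the Office of Agriculture or Astengo schemes.
import re

SCHEME_URI = "http://example.org/scheme/subjects"

# concept id -> (skos:prefLabel, list of skos:altLabel)
CONCEPTS = {
    "c1": ("Agriculture", ["farming", "crops"]),
    "c2": ("Public health", ["hygiene", "sanitation"]),
    "c3": ("Public works", ["roads", "bridges"]),
}


def skos_turtle() -> str:
    """Serialize the concept scheme as SKOS in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{SCHEME_URI}> a skos:ConceptScheme .",
    ]
    for cid, (pref, alts) in CONCEPTS.items():
        lines.append(f"<{SCHEME_URI}/{cid}> a skos:Concept ;")
        lines.append(f"    skos:inScheme <{SCHEME_URI}> ;")
        for alt in alts:
            lines.append(f'    skos:altLabel "{alt}"@en ;')
        lines.append(f'    skos:prefLabel "{pref}"@en .')
    return "\n".join(lines) + "\n"


def extract_concepts(text: str) -> set[str]:
    """Return the ids of concepts whose preferred or alternative label occurs in text."""
    found = set()
    for cid, (pref, alts) in CONCEPTS.items():
        for label in [pref, *alts]:
            # Whole-word, case-insensitive match on the label.
            if re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE):
                found.add(cid)
                break
    return found


def topics(descriptions: list[str]) -> set[str]:
    """Union of the concepts found across all descriptions of one archive."""
    result: set[str] = set()
    for text in descriptions:
        result |= extract_concepts(text)
    return result


if __name__ == "__main__":
    archive_a = ["Reports on farming subsidies", "Contracts for roads and bridges"]
    archive_b = ["Sanitation inspections of the market", "Bridges over the river"]
    # Topics shared by the two archives.
    print(sorted(topics(archive_a) & topics(archive_b)))  # ['c3']
```

Matching on alternative labels as well as preferred ones is what lets differently worded descriptions meet on the same concept URI.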
In defining the OAD, we followed linked data principles. Common metadata such as name, title, and date were duplicated using terms from widely used RDF vocabularies (Dublin Core, SKOS, FOAF) to foster interoperability between similar resources. Furthermore, we wanted to link the archival data to external resources and bring it into the Linked Open Data cloud. To do so, we added links to other international datasets (GeoNames, DBpedia, VIAF) using the owl:sameAs property.
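A minimal Python sketch of both ideas follows. Each common field is written once with a local ontology property and once with a widely used vocabulary term, and owl:sameAs links point to an external dataset. The `oad:` namespace, its property names, and the local URI are hypothetical. Only Dublin Core and FOAF equivalents are shown, and the DBpedia URI serves purely as an example target. GeoNames or VIAF links would be added the same way.

```python
# Sketch: exposing common metadata through both the local ontology and widely used
# vocabularies, and linking to external datasets with owl:sameAs.
# ASSUMPTION: the oad: namespace, property names, and local URIs are hypothetical.

PREFIXES = {
    "oad": "http://example.org/oad#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Local field -> equivalent term in a widely used vocabulary.
EQUIVALENTS = {
    "title": "dcterms:title",
    "date": "dcterms:date",
    "name": "foaf:name",
}


def literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def describe(subject: str, fields: dict[str, str], same_as: list[str]) -> list[str]:
    """Emit one Turtle statement per triple for a single resource."""
    lines = []
    for name, value in fields.items():
        lines.append(f"<{subject}> oad:{name} {literal(value)} .")
        # Duplicate the value under the shared vocabulary term, if one is mapped.
        if name in EQUIVALENTS:
            lines.append(f"<{subject}> {EQUIVALENTS[name]} {literal(value)} .")
    for target in same_as:
        lines.append(f"<{subject}> owl:sameAs <{target}> .")
    return lines


def to_turtle(statements: list[str]) -> str:
    """Prepend the prefix declarations to a list of statements."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    return "\n".join(header + [""] + statements) + "\n"


if __name__ == "__main__":
    place = "http://example.org/reload/place/piacenza"
    print(to_turtle(describe(place, {"name": "Piacenza"},
                             ["http://dbpedia.org/resource/Piacenza"])))
```

Note that owl:sameAs asserts that both URIs denote the same individual, so every statement about one applies to the other. Such links are therefore worth adding only when the identity is well established.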
The Experimentation: Data Production and Triplestore Ingestion
To generate archival LOD, we used XSL stylesheets to transform the EAD file of each finding aid in the experiment into an RDF/XML file conforming to the OAD data model.
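The project performed this mapping with XSL stylesheets. The Python sketch below uses the standard library's ElementTree instead of XSLT and only illustrates the kind of mapping involved:

- each archival level becomes a typed resource;
- unit titles and dates become properties;
- the nesting of components becomes `oad:hasPart` links.

It emits triples with prefixed names rather than RDF/XML. The base URI, the class and property names, and the level-to-class table are hypothetical. It also assumes a stripped-down EAD document without an XML namespace.

```python
# Sketch: mapping EAD components to RDF triples according to an OAD-like model.
# ASSUMPTIONS: the project used XSLT; this ElementTree version only illustrates the
# same kind of mapping. Base URI, oad: names, and LEVEL_TO_CLASS are hypothetical.
# EAD without an XML namespace (DTD-style) is assumed.
import itertools
import re
import xml.etree.ElementTree as ET

BASE = "http://example.org/reload/unit/"
LEVEL_TO_CLASS = {"fonds": "Fonds", "series": "Series", "file": "File"}
# Unnumbered <c> or numbered <c01>..<c12> components.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")


def child_components(elem):
    """Yield the components nested directly in elem or in its <dsc>."""
    for container in [elem, *elem.findall("dsc")]:
        for child in container:
            if COMPONENT.fullmatch(child.tag):
                yield child


def convert(ead_xml: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples for archdesc and its components."""
    archdesc = ET.fromstring(ead_xml).find("archdesc")
    counter = itertools.count(1)
    triples = []

    def visit(elem, parent_uri):
        # Prefer the EAD id attribute; otherwise mint a sequential local id.
        subject = BASE + (elem.get("id") or f"unit{next(counter)}")
        cls = LEVEL_TO_CLASS.get(elem.get("level", ""), "Unit")
        triples.append((subject, "rdf:type", "oad:" + cls))
        for tag, prop in (("unittitle", "oad:title"), ("unitdate", "oad:date")):
            value = elem.findtext(f"did/{tag}")
            if value:
                triples.append((subject, prop, value.strip()))
        if parent_uri:
            triples.append((parent_uri, "oad:hasPart", subject))
        for child in child_components(elem):
            visit(child, subject)

    visit(archdesc, None)
    return triples


SAMPLE = """<ead><archdesc level="fonds">
  <did><unittitle>Sample fonds</unittitle><unitdate>1960-1990</unitdate></did>
  <dsc>
    <c level="series" id="s1">
      <did><unittitle>Correspondence</unittitle></did>
      <c level="file"><did><unittitle>Letters, 1962</unittitle></did></c>
    </c>
  </dsc>
</archdesc></ead>"""

if __name__ == "__main__":
    for triple in convert(SAMPLE):
        print(triple)
```

Walking the component tree recursively mirrors how an XSLT template would typically call `apply-templates` on nested components.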
Most of the finding aids we transformed lack indexes of places and names. We therefore used a semi-automated entity-recognition process to parse the text, geolocate places, and identify the authorized form of names. This work is based on the open-source Apache Stanbol project, which aims to recognize entities in text using vocabularies and RDF sources.
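The dictionary-based idea behind this step can be sketched in Python:

- vocabulary labels are matched in the text, and longer labels take precedence over shorter overlapping ones;
- places carry coordinates and persons carry an authorized name form;
- ambiguous matches are left for an archivist to confirm.

This sketch neither uses nor reproduces Apache Stanbol's API. The vocabulary entries, URIs, and the review rule are hypothetical, and the coordinates given for Piacenza are approximate.

```python
# Sketch: dictionary-based entity recognition over finding-aid text, with ambiguous
# matches flagged for manual review (the "semi-automated" part).
# ASSUMPTION: this is not Apache Stanbol's API; vocabulary entries and URIs are
# hypothetical, and the Piacenza coordinates are approximate.
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Entry:
    uri: str
    kind: str                    # "place" or "person"
    authorized: str              # authorized form of the name
    lat: Optional[float] = None  # geolocation, for places only
    lon: Optional[float] = None


MARIO = Entry("http://example.org/agent/rossi-mario", "person", "Rossi, Mario")
GIULIA = Entry("http://example.org/agent/rossi-giulia", "person", "Rossi, Giulia")

# Surface label -> candidate entries (more than one candidate means ambiguity).
VOCAB = {
    "Piacenza": [Entry("http://example.org/place/piacenza", "place", "Piacenza", 45.05, 9.69)],
    "Mario Rossi": [MARIO],
    "Rossi": [MARIO, GIULIA],
}


@dataclass
class Mention:
    surface: str
    start: int
    candidates: list = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Only unambiguous matches are accepted automatically.
        return len(self.candidates) != 1


def recognize(text: str) -> list:
    """Find vocabulary labels in text, preferring longer labels over overlapping ones."""
    taken = []
    mentions = []
    for label in sorted(VOCAB, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(label) + r"\b", text):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps a longer match already accepted
            taken.append((m.start(), m.end()))
            mentions.append(Mention(m.group(0), m.start(), VOCAB[label]))
    return sorted(mentions, key=lambda mention: mention.start)


if __name__ == "__main__":
    text = "Letter from Mario Rossi to the prefect of Piacenza; reply by Rossi."
    for m in recognize(text):
        status = "review" if m.needs_review else m.candidates[0].authorized
        print(m.surface, "->", status)
```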
For the EAC-CPF records of persons, corporate bodies, and families related to the archival materials, we used the EAC-CPF ontology developed in 2010 by IBC Archivi and Regesta.exe. We then used the Silk link discovery framework to align these records semantically with DBpedia and GeoNames.
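The core of such an alignment can be sketched in Python. Labels are normalized, each local entity is compared with external candidates, and the best match at or above a similarity threshold becomes an owl:sameAs link. This is not Silk's API: Silk declares such rules in a link specification rather than in code. Here `difflib.SequenceMatcher` stands in for Silk's similarity measures, and the threshold and all URIs are hypothetical. The `EXTERNAL` labels stand in for labels that would be retrieved from DBpedia or GeoNames.

```python
# Sketch: label-similarity link discovery producing owl:sameAs candidates.
# ASSUMPTION: not Silk's API; difflib's ratio stands in for Silk's similarity
# measures, and the threshold and all URIs are hypothetical.
import re
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase and turn punctuation into spaces ("Castell'Arquato" -> "castell arquato")."""
    return " ".join(re.sub(r"\W+", " ", label.lower()).split())


def align(local: dict[str, str], external: dict[str, str],
          threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """For each local entity, keep the best-scoring external entity at or above threshold."""
    links = []
    for local_uri, local_label in local.items():
        best_uri, best_score = None, 0.0
        for ext_uri, ext_label in external.items():
            score = SequenceMatcher(None, normalize(local_label), normalize(ext_label)).ratio()
            if score > best_score:
                best_uri, best_score = ext_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((local_uri, best_uri, best_score))
    return links


def sameas_turtle(links: list[tuple[str, str, float]]) -> str:
    """Serialize accepted links as owl:sameAs statements."""
    return "\n".join(
        f"<{a}> <http://www.w3.org/2002/07/owl#sameAs> <{b}> ." for a, b, _ in links
    )


LOCAL = {
    "http://example.org/reload/place/1": "Piacenza",
    "http://example.org/reload/place/2": "Castell'Arquato",
    "http://example.org/reload/place/3": "Bobbio",
}
# Stand-ins for labels retrieved from DBpedia or GeoNames.
EXTERNAL = {
    "http://example.org/external/a": "Piacenza",
    "http://example.org/external/b": "Castell Arquato",
    "http://example.org/external/c": "Parma",
}

if __name__ == "__main__":
    print(sameas_turtle(align(LOCAL, EXTERNAL)))
```

A threshold on a single string measure is crude: homonymous places would be linked wrongly. A real link specification would therefore typically combine several comparisons, such as name similarity plus geographic distance, before asserting owl:sameAs.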