The amount of digital data being produced across various disciplines is increasing at an exponential rate, but this information may not be around for future generations because the data are often incompatible with rapidly changing technologies and become unreadable. To address this risk, ESA is assisting a European Union-backed project for the preservation of fragile digital information.

The large-scale project called CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval) will build a pioneering framework to support the end-to-end preservation lifecycle for digital information based on existing and emerging standards.

Project Co-ordinator Dr David Giaretta explains: “It is widely recognised that the digital information on which we all rely is actually remarkably fragile. Society needs to ensure that digitally encoded information can still be understood and used in the future when the software, systems and everyday knowledge will have changed. Things we take for granted now would otherwise be completely unfamiliar, something to be guessed at, even if we preserve the bits and bytes.

“Moreover, in many currently planned and future experiments, several orders of magnitude more data will be generated than has been collected in the whole of human history.”

Of particular importance is the huge breadth of users and types of digital information against which CASPAR will be tested: science (using ESA satellite data and a variety of science data from the Central Laboratory of the Research Councils (CCLRC)), cultural heritage (using data from UNESCO – the United Nations’ cultural agency) and performing arts (including data from the Institut National de l’Audiovisuel, Groupe de Recherches Musicales (Ina-GRM) and the Institut de Recherche et de Coordination Acoustique-Musique (IRCAM) – French institutions that fostered the development of electronic music).

Satellite data

Protecting data acquired by satellites for future generations is of utmost importance because it allows for the continuity of datasets. For instance, scientists accessing today’s climate change data in 50 years will be able to better understand and detect trends in global warming and apply this knowledge to ongoing natural phenomena.

The volume of data generated in environmental science is projected to increase radically over the next few years. ESA satellites, such as Envisat, ERS-2 and Meteosat Second Generation, are currently generating around one Petabyte of data per day. With the upcoming launch of the new MetOp satellites, the daily data volume generated by ESA will increase at an even faster rate. ESA’s mandate is to maintain archives of data gathered from satellites for 10 years after the end of the mission. Currently ESA is using funds from various ongoing programmes to maintain these historical bit streams in accessible archives.

Sustainable preservation of this information in the long term will require the logical integration of many more pieces of data and objects, such as the conditions under which the instruments were operated, the system and software environment used to gather the signal and the algorithms used for manipulating the acquisition bit stream. This information must be collected systematically for all instruments and missions, as part of a dedicated programme.
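As a rough illustration of what such integration could look like in practice, the sketch below bundles a bit stream with the three kinds of context named above and reports which of them are still missing. It is not CASPAR's design: the class name `PreservationRecord`, its field names, the use of a SHA-256 checksum and all sample values are assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PreservationRecord:
    """Hypothetical bundle of a bit stream and the context needed to interpret it."""

    mission: str
    instrument: str
    bit_stream: bytes
    # Conditions under which the instrument was operated.
    operating_conditions: dict[str, str] = field(default_factory=dict)
    # System and software environment used to gather the signal.
    software_environment: dict[str, str] = field(default_factory=dict)
    # Algorithms applied to the acquisition bit stream.
    processing_algorithms: list[str] = field(default_factory=list)

    def checksum(self) -> str:
        """Return a SHA-256 digest so later copies of the bits can be verified."""
        return hashlib.sha256(self.bit_stream).hexdigest()

    def missing_context(self) -> list[str]:
        """List the context categories that have not been filled in yet."""
        context = {
            "operating_conditions": self.operating_conditions,
            "software_environment": self.software_environment,
            "processing_algorithms": self.processing_algorithms,
        }
        return [name for name, value in context.items() if not value]


if __name__ == "__main__":
    # Placeholder values only; they do not describe real ESA products.
    record = PreservationRecord(
        mission="ERS-2",
        instrument="GOME",
        bit_stream=b"\x00\x01\x02",
        operating_conditions={"mode": "placeholder"},
    )
    print(record.checksum())
    print(record.missing_context())
```

A check of this kind, run for every instrument and mission, would flag archives that hold the bits but lack the information needed to understand them.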

Within the CASPAR project, selected ESA satellite data streams will be the first objects to demonstrate how the proposed preservation platform architecture can be applied to handle complex digital objects. ESA will not only provide the necessary satellite data and associated information but also the operational experience and demonstration infrastructure.

The Global Ozone Monitoring Experiment (GOME), launched onboard ERS-2 in April 1995, is set to be the first candidate. Since 1996, ESA has been delivering GOME global observations of total ozone, nitrogen dioxide and related cloud information to users via CD-ROM and the Internet.

Cultural and artistic data

Data from UNESCO will be centred on World Heritage sites and will include: legal text, site description, historic documents, books, paper photos, slides, satellite images, maps, virtual tours and virtual reconstruction. For example, information will be provided on the two Buddhas carved into the cliffs of Bamiyan in Afghanistan around the third century A.D., which were destroyed in 2001.

Artistic data from Ina-GRM, which holds archives of French public radio and TV, and from IRCAM will focus on electronic music. The aim is to preserve components of scores, pieces of computer code, instructions and documents indicating the author’s motivations, so that a work remains intelligible, or at least understood well enough to be performed again.

Development and implementation

CASPAR’s work will include the development of key components and a framework providing characterisation (including Representation Information and Preservation Description Information), virtual storage (using advanced storage technologies) and access services (including intuitive query and browsing mechanisms), while exploiting the potential of the semantic web throughout.

Throughout the project, attention will be paid to related issues such as standardisation, authentication, accreditation and digital rights management, which are seen as critical for the operational implementation of digital preservation services. The achievements of the CASPAR project will be disseminated and promoted among communities worldwide that are interested in digital archiving and preservation.