Over the last 50 years, the role of an archivist has changed significantly to include the collecting and processing of digital records. This page explains what is involved in working with digital records and why we do it.
There are two types of digital record: born digital and digitised. Born-digital records are records which are created in a digital format, for example a Word document, a spreadsheet, an audio file or a video file. These files are accessed via digital hardware such as laptops or desktop computers and, unless they are in a format that can be printed, they cannot be accessed outside of this environment.
Digitised records begin life in a physical format, such as a book, a leaflet, a VHS tape or a cassette tape. A series of processes is applied to these items to create a digital version which can then be stored, preserved and accessed via digital hardware. We digitise these records for many reasons. One main reason is access. People all over the world can view a digital file online, whereas far fewer would be able to travel to an archive to view the original. Another is protection: the originals may be very fragile or damaged, and digitising them creates a copy which can be accessed in place of the original. In rare cases, records are digitised for storage reasons, where the physical items take up too much space but there is a desire to keep the information they contain.
Just as with physical collections, the most important thing when you start working on a digital collection is to know what you have. The UK National Archives has created DROID, a free, easy-to-use tool which runs checks on a specified batch of files and produces a list of all the file formats contained in that batch. If you can carry out this process, you have completed not only the first but also one of the most important steps in any digital preservation workflow. DROID and a user guide can be accessed and downloaded below.
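To give a feel for the kind of summary that "knowing what you have" starts with, here is a minimal Python sketch that counts the files in a folder by file extension. It is a rough first look only and is not how DROID works: DROID identifies formats by looking inside the files, which is more reliable than trusting extensions. The function name `tally_extensions` is a made-up name for this illustration.

```python
from collections import Counter
from pathlib import Path


def tally_extensions(folder):
    """Count files under `folder` by lower-cased extension (hypothetical helper)."""
    counts = Counter()
    # rglob("*") walks the folder and all of its sub-folders.
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Files with no extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling `tally_extensions("my_collection").most_common()` (where `my_collection` is a placeholder folder name) returns the extensions ordered from most to least frequent. Any surprises in that list, such as files with no extension at all, are good candidates for a closer look with a proper identification tool.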
DROID identifies your files using PRONOM, a file format registry. PRONOM can also be accessed and searched directly if you come across a file format you haven’t seen before.
PRONOM is a file format registry or database in which we store information about different file formats, such as Word documents, PDF or MP3. In PRONOM we try to record key information about each file format, such as the typical extensions used, and, most importantly, we look inside the files to find patterns and information that can be used to identify file formats. In the same way that a signature can be used to identify an individual, we call the information that identifies file formats ‘signatures’.
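The idea of a signature can be sketched in a few lines of Python. The example below checks the first bytes of a file against a handful of widely known byte sequences. It is a simplified illustration under stated assumptions, not PRONOM's signature format or DROID's implementation: real signatures can also specify offsets, variable content and patterns at the end of a file. The names `SIGNATURES`, `identify` and `identify_file` are made up for this sketch.

```python
# Illustrative signatures only: a few well-known leading byte sequences.
# Real PRONOM signatures are richer than a simple "starts with" check.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"ID3", "Audio with ID3v2 tag (commonly MP3)"),
    (b"PK\x03\x04", "ZIP container (also used by .docx and .xlsx)"),
]


def identify(data):
    """Return a format label for the bytes `data`, or None if nothing matches."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def identify_file(path):
    """Read the first few bytes of a file and identify it (hypothetical helper)."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Because the check looks at the content rather than the name, a PDF that has been renamed to end in `.txt` is still reported as a PDF, which is exactly why signature-based identification is more trustworthy than file extensions alone.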
Why is this work important? For the same reason that it is essential for a conservator to know whether a physical record is vellum or paper, or whether the materials used for writing are watercolour or iron gall ink. Digital preservation experts have to know what file formats are in their collection, and what software can open them, in order to preserve them effectively for future generations. PRONOM is used all over the world, mostly in the fields of digital preservation, information management and digital forensics.
The information the PRONOM team and its contributors record is used by file format identification tools you may have used, such as DROID, Fido or Siegfried. It may even come built into your preservation system.
There are now over 2000 file formats recorded in PRONOM but there is still a long way to go. We have a lot of help from contributors all around the world and have collaborated with around 100 institutions, ranging from archives and libraries, through to museums and educational institutions, and even private companies who are interested in the long-term preservation of digital records.
Learn more about PRONOM here.
If you would like to test your file format knowledge, our game, ‘File Format or Fake’, can be accessed below.
For further help and support, see the resources provided by the Digital Preservation Coalition, particularly the free-to-access Digital Preservation Handbook, which provides a step-by-step guide for those new to working with digital records, or for anyone keen to manage their own personal collections better.