LibGuides: Researcher Skills Toolkit: Store and archive

Store and archive

Data archiving is the long-term storage and preservation of research data, together with the supporting information that keeps the data accessible and usable.

Once the research is completed, data should be preserved and archived. These steps help ensure the data remains usable in the future, even when software or protocol versions change.

It is important to think about the storage requirements for the whole research project/group, over the life of the project, before requesting storage. Scope out what you will need beforehand.

Much research data today is created in digital format (‘born digital’). Born digital items are distinct from analogue items that are subsequently digitised, such as paper manuscripts or photographs, and are at risk of digital obsolescence due to:

the physical carrier of the data, e.g. USB drives, becoming obsolete or 'dying'
the hardware needed to access the data becoming obsolete, e.g. contemporary desktop computers are no longer produced with disk drives – what future technology will replace USB?
the software needed to access the data becoming unavailable, e.g. the data was saved in a closed or proprietary file format that can no longer be opened.

Consider the following when selecting suitable storage options for your research data:

Does the data contain any sensitive information?
How often is the data backed up? How many copies are retained, where are they retained and by whom?
Who else may need to access the data and are they able to do this?
Are there any legislative or ethical requirements that restrict where the data will be stored?
Does the storage provider claim ownership or usage rights over the data stored on their systems?
Do extra controls like access restrictions or encryption need to be applied to the data?

University Digital Technology Solutions can provide advice on access to secure storage, including:

University-supported storage platforms, such as OneDrive, Teams, SharePoint, Share Drive, and TRIM
Other storage options, such as local computer drives, portable drives, OwnCloud, and shared drives
Non-standard storage options for data which is extensive, used for complex simulations, or which needs to be stored in a structured, searchable database format. Request advice and assistance via the Research Data Storage Request Form.

Consider the following when storing non-digital (physical) research data:

Ensure that the storage conditions will not impact the research data’s durability
Is the storage area climate-stable, structurally sound, and physically appropriate? For example, is it free from mould and damp conditions, is there risk of fire or damage from insects or other pests?
Devise, document and communicate a system for securing and accessing the material by those with authorisation and include matters of security (e.g. keys, passwords); rules surrounding removal of materials; and check-out/check-in procedures
Ensure the Data Management Plan documents who is responsible for documenting, organising, labelling, storing, maintaining, and checking the non-digital data.

Backing up your research data will ensure that it can be retrieved in case disaster strikes due to hardware or software failure, human error, theft, fire or other damage, or degradation of storage media.

10 steps to a successful backup plan

1. Check the University's backup strategy

The University provides advice regarding backup strategies via ServiceUON.

2. Determine what you want to back up

The three common options for backups are:

Full backup of the entire system and files;
Differential backups, where everything is recorded that was changed since the last full backup. To restore your data and/or system, you will require the last full backup and the last differential backup;
Incremental backups, where only changes since the last backup of any type are recorded. To restore your data and/or system, the last full backup and the entire series of incremental backups made since then are required.

Differential and incremental backups are also called 'intelligent' backups. If only a small percentage of your data changes on a daily basis, it may be a waste of time and disk space to run a full backup every day.
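The difference between the three backup types comes down to which files are selected for copying. The following is a minimal sketch of that selection, assuming that file modification times are a reliable indicator of change. The function name and its parameters are hypothetical and are not part of any University tool or backup product.

```python
from pathlib import Path


def files_to_back_up(source, mode, last_full=0.0, last_backup=0.0):
    """Return the files under `source` that a backup of the given mode would copy.

    `last_full` and `last_backup` are timestamps (seconds since the epoch) of
    the most recent full backup and the most recent backup of any kind.
    """
    if mode == "full":
        cutoff = None            # copy everything
    elif mode == "differential":
        cutoff = last_full       # changes since the last full backup
    elif mode == "incremental":
        cutoff = last_backup     # changes since the last backup of any type
    else:
        raise ValueError(f"unknown backup mode: {mode}")

    selected = []
    for path in sorted(Path(source).rglob("*")):
        if path.is_file() and (cutoff is None or path.stat().st_mtime > cutoff):
            selected.append(path)
    return selected
```

Because a differential backup always measures change from the last full backup, it grows over time but needs only two sets to restore. An incremental backup stays small but depends on every earlier incremental set being intact.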

3. Decide how many backups you will need and how frequently to back up

It is recommended that you make three backup copies. This will minimise the risk of data loss, even in the case that one of the backups is damaged or lost. However, if storage capacity is an issue and/or if sensitive data is involved, it may be necessary to work with fewer copies.

You should clearly state in your backup strategy how often backups will be made. The frequency of backups will depend on the frequency and amount of change to your data and documents.

4. Decide where backups will be stored

It is recommended that you store at least some of the backups in (physically) separate places. For example, backing up to two servers standing in the same room or building may cause you to lose both backups in case of a fire. Having an offsite copy of your backup mitigates this risk.

Backups can be made to networked drives, cloud storage, and to local or portable devices (see 'Storage'). What works best for your project will depend on the amount of data that needs to be backed up, the required frequency of backups, the level of automation, and the sensitivity of the data.

5. Determine how much storage capacity will be needed

Estimate which data and documentation you will collect and create in your project. Then determine the corresponding approximate amount of storage capacity needed for backups. Talk to University IT Services for help determining your requirements.
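A rough capacity estimate can be worked out with simple arithmetic. The sketch below assumes that each backup copy holds one full backup plus a number of retained incremental backups, and that the daily volume of change is roughly constant. The function name, parameters, and figures are illustrative assumptions only.

```python
def estimate_backup_storage_gb(dataset_gb, daily_change_gb, retained_incrementals, copies=3):
    """Rough capacity estimate in gigabytes.

    Assumes each copy holds one full backup plus the retained incremental
    backups, and that about `daily_change_gb` of data changes per backup.
    """
    per_copy = dataset_gb + daily_change_gb * retained_incrementals
    return per_copy * copies
```

For example, a 200 GB dataset that changes by about 2 GB per day, with 30 incremental backups retained and three copies kept, would need roughly (200 + 2 × 30) × 3 = 780 GB.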

6. Determine whether there are tools you could use to automate backup

Automating backups can help to ensure that they are created at the correct time and that they are saved to the correct location, reducing the risk of human errors. Both Microsoft and Apple operating systems have software to support automatic backups. Cloud storage solutions may also have a backup functionality. However, you should check frequently that functional backups were indeed created.

OS X - Watch the video tutorial on creating backups for your Mac using Time Machine (UK Data Service, 2016).

Windows 10 - Windows 10 includes two different backup programs:

File History - The File History tool automatically saves multiple versions of a given file, so you can 'go back in time' and restore a file before it was changed or deleted. This is useful for files that change frequently.
Windows Backup and Restore - This tool creates a single backup of the latest version of your files on a schedule.

Note that you should still employ an off-site backup as well.
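Where the built-in tools do not fit a project, a small script can perform the copy step. The following is a minimal sketch, assuming a full copy of a project folder into a new timestamped folder each time it runs. The function name and folder layout are hypothetical. Running it on a schedule (for example with cron or Windows Task Scheduler) is left as an assumption about your environment.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_timestamped_backup(source, backup_root):
    """Copy the `source` directory into a new timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_root) / f"{Path(source).name}-{stamp}"
    # copytree raises an error if the destination already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, destination)
    return destination
```

Writing each backup to its own folder means earlier backups are not overwritten, but storage use grows with every run.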

7. Determine how long backups will be kept and how they will be destroyed

It is recommended that you do not overwrite one backup with another. However, if you must back up large amounts of data frequently it may not be feasible to retain all backups for the entire duration of the project.

If sensitive data is involved, make sure that any deleted data is truly gone and cannot be recovered in any way.
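A retention rule such as 'keep the most recent backups and remove the rest' can be sketched as follows. This assumes that each backup is stored in its own folder and that folder names sort chronologically (for example, because they end in a timestamp). The function name and the default of five retained backups are hypothetical. Note that this is ordinary deletion: it does not securely erase data, so it is not sufficient on its own for sensitive data.

```python
import shutil
from pathlib import Path


def prune_old_backups(backup_root, keep=5):
    """Delete all but the `keep` most recent backup folders under `backup_root`.

    Assumes folder names sort chronologically. Returns the names of the
    folders that were removed. This is ordinary deletion, not secure erasure.
    """
    folders = sorted(p for p in Path(backup_root).iterdir() if p.is_dir())
    to_delete = folders[:-keep] if keep > 0 else folders
    for folder in to_delete:
        shutil.rmtree(folder)
    return [p.name for p in to_delete]
```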

8. Determine how personal data will be protected

Make sure that backups of data containing sensitive information are protected against unauthorised access in the same manner as the original files.

9. Devise a disaster recovery plan

A disaster recovery plan defines the steps to take if a data loss occurs, and helps you to restore data as quickly as possible. The plan should also assign responsibilities for data recovery tasks and list people and functions to contact when a data loss occurs.

To ensure that data recovery will run as smoothly as possible in the event of an actual data loss, make sure to test regularly whether restoring lost files from your backups is possible.
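One way to test a restore is to recover the files to a separate location and compare them with the originals. The following is a minimal sketch of that comparison, assuming both sets of files are available as ordinary folders. The function name is hypothetical.

```python
import filecmp
from pathlib import Path


def find_restore_problems(original_dir, restored_dir):
    """Return relative paths of files that are missing from, or differ in, the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        restored = restored_dir / relative
        # shallow=False compares file contents, not just size and timestamps
        if not restored.is_file() or not filecmp.cmp(original, restored, shallow=False):
            problems.append(relative.as_posix())
    return problems
```

An empty result means every original file was found in the restored copy with identical contents. The check does not report extra files that exist only in the restored copy.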

10. Assign responsibilities

Never assume that someone will take care of backups and data recovery. Assign responsibilities for making manual backups, for checking whether automatic backups took place, for testing data recovery, and for restoring any lost data.

The above information is adapted from:

CESSDA Training Team (2022). CESSDA data management expert guide. Bergen, Norway: CESSDA ERIC. Retrieved from https://dmeg.cessda.eu/

Errors can happen when backups are written or copied. You should check the integrity of backed up files frequently. This can be done with ‘checksum tools’ such as MD5summer.

Checksums can be compared to digital fingerprints. A checksum tool uses an algorithm to compute the fingerprint, a short string of letters and numbers, from the bit values (the ones and zeros) of a file. Monitoring whether the fingerprint of a given file changes allows you to detect whether the file was changed in any way, intentionally or unintentionally.
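The fingerprint idea can be illustrated with a short script. This is a minimal sketch that uses Python's standard hashlib module with the SHA-256 algorithm rather than a dedicated tool such as MD5summer. The function names are hypothetical, and where the recorded checksums are kept is left to you.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="sha256"):
    """Compare a file's current checksum with one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum
```

Record the checksum when a backup is written, then recompute it later: if the two values differ, the file has changed or been corrupted since the checksum was recorded.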

Features of data archives and repositories can include:

A focus on preservation and ensuring permanency
The ability to curate the data and assume custodianship of the data
Unique and permanent location and assignation of unique identifiers
Greater discoverability: data deposited in established repositories and archives tends to rank higher in search engine results, making it easier for the wider research community to find
Rights, licensing, and access management including managing access to restricted or sensitive data
Data is backed up and managed by the service, leaving you free to focus on current research projects
Impact metrics – often these services allow you to track the interest your data receives by viewing the number of downloads and/or views.

Check the Research Data and Primary Materials Management Procedure (Section 5 (26-32)) for advice regarding the archiving and retention of research data.

Who needs to access the data?

Will the data be open, or need restrictions to be placed on access?
Who requires access to the data, e.g., project team members, colleagues, collaborators?
Is the required access internal or external to the University?
Are there requirements from any funding agreements?
What type of access is required (read/write or just read-only)?
Are there any access limitations that need to be managed, e.g., legislative, agreements, location- or role-based?

What storage capacity do you need?

How much data do you currently have?
How much data do you expect to generate or collect over the life of the project? Check the University's Research Solutions pages for guidance.
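To find out how much data you currently have, the total size of a project folder can be measured directly. The following is a minimal sketch, assuming the data sits in an ordinary folder on a local or mounted drive. The function name is hypothetical.

```python
from pathlib import Path


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under `root`."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())
```

Dividing the result by 1024 ** 3 gives the size in gibibytes, which can feed into a storage request.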

Is the data in an open file format?

Regarding the current format of the data, is conversion to another file format more appropriate for archival access?

How safe is your data?

Who has access to the data?
Does everyone who has access to the data need write access?
Are there certain directories of data that need to be locked down to limit users?

Does a retention period apply to the data?

How long will the data be kept? A retention period can be nominated as part of the Data Management Plan.

Does the data include sensitive or confidential elements?

Do you need to consider de-identification of any sensitive data or elements that may identify research participants?
De-identification of sensitive data allows the data to be used by others without individuals being identifiable. It is mostly undertaken to protect the privacy of individuals, organisations or businesses.
Check the Publishing sensitive data guide
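One common building block of de-identification is replacing direct identifiers with pseudonyms. The following is a minimal sketch, assuming records are held as Python dictionaries and that a secret key is stored separately from the data. The function names, field names, and the choice of a keyed hash (HMAC-SHA256) are illustrative assumptions, not a University-endorsed method. Pseudonymising one field does not by itself make data de-identified, because combinations of other fields may still identify a participant.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256), truncated for readability."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_records(records, id_field, secret_key, drop_fields=()):
    """Return copies of `records` with `id_field` pseudonymised and `drop_fields` removed."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in drop_fields}
        row[id_field] = pseudonymise(str(record[id_field]), secret_key)
        cleaned.append(row)
    return cleaned
```

The same identifier always maps to the same pseudonym under a given key, so records for one participant can still be linked without exposing the original identifier.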

Cleaning up - what can be thrown out?

Check before disposing of any files or items associated with the research, as there may be a legal requirement to retain documentation in hard copy or digital format. Check your contract and the University's Research Data and Primary Materials Management Procedure.