Data archiving is the long-term storage and preservation of research data, together with the supporting information that keeps the data accessible and usable. Once the research is completed, the data should be preserved and archived so that it can still be used in the future, even when software programs or protocol versions change. Before requesting storage, scope out the storage requirements for the whole research project or group over the life of the project.
Much research data today is created in digital format (‘born digital’). Born digital items, as distinct from analogue items such as paper manuscripts or photographs that are later digitised, are at risk of digital obsolescence due to:
Consider the following when selecting suitable storage options for your research data:
University Digital Technology Solutions can provide advice on access to secure storage, including:
When storing non-digital research data, consider the following:
Ensure that storage conditions will not compromise the durability of the research data
Is the storage area climate-stable, structurally sound, and physically appropriate? For example, is it free from mould and damp, and is it protected from fire and from damage by insects or other pests?
Devise, document and communicate a system for securing the material and for access by authorised people. Include security arrangements (e.g. keys, passwords), rules for removing materials, and check-out/check-in procedures
Ensure the Data Management Plan documents who is responsible for documenting, organising, labelling, storing, maintaining, and checking the non-digital data.
Backing up your research data ensures that it can be retrieved if disaster strikes, whether through hardware or software failure, human error, theft, fire or other damage, or degradation of storage media.
The University provides advice regarding backup strategies via the following ServiceUON posts:
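The three common options for backups are:
Full backup: a complete copy of all selected data, made every time
Differential backup: a copy of all data that has changed since the last full backup
Incremental backup: a copy of only the data that has changed since the last backup of any type (full or incremental)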
Differential and incremental backups are also called 'intelligent' backups. If only a small percentage of your data changes on a daily basis, it may be a waste of time and disk space to run a full backup every day.
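To make the difference concrete, the sketch below works out which files each backup type would copy. It assumes, for illustration, that changes are detected by comparing each file's last-modified time with the time of an earlier backup; the function `files_to_back_up` and the example file listing are hypothetical, and real backup software may track changes in other ways.

```python
from datetime import datetime
from typing import Dict, List, Optional


def files_to_back_up(
    modified: Dict[str, datetime],
    mode: str,
    last_full: Optional[datetime] = None,
    last_any: Optional[datetime] = None,
) -> List[str]:
    """Return the files a backup of the given type would copy (hypothetical helper).

    modified:  maps each file path to its last-modified time
    last_full: time of the most recent full backup
    last_any:  time of the most recent backup of any type
    """
    if mode == "full":
        cutoff = None                # a full backup copies everything
    elif mode == "differential":
        cutoff = last_full           # changes since the last FULL backup
    elif mode == "incremental":
        cutoff = last_any            # changes since the last backup of ANY type
    else:
        raise ValueError(f"unknown backup type: {mode!r}")
    if cutoff is None:
        # No earlier backup to compare against, so every file is copied.
        return sorted(modified)
    return sorted(path for path, mtime in modified.items() if mtime > cutoff)


files = {
    "survey.csv": datetime(2024, 3, 1),
    "notes.docx": datetime(2024, 3, 3),
    "analysis.R": datetime(2024, 3, 5),
}
last_full = datetime(2024, 3, 2)  # full backup on 2 March
last_any = datetime(2024, 3, 4)   # incremental backup on 4 March

print(files_to_back_up(files, "full"))                               # ['analysis.R', 'notes.docx', 'survey.csv']
print(files_to_back_up(files, "differential", last_full, last_any))  # ['analysis.R', 'notes.docx']
print(files_to_back_up(files, "incremental", last_full, last_any))   # ['analysis.R']
```

A differential backup grows until the next full backup, while each incremental backup stays small but a restore needs the full backup plus every incremental backup since.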
It is recommended that you make three backup copies. This minimises the risk of data loss even if one of the backups is damaged or lost. However, if storage capacity is limited and/or sensitive data is involved, it may be necessary to work with fewer copies.
You should clearly state in your backup strategy how often backups will be made. The frequency of backups will depend on the frequency and amount of change to your data and documents.
It is recommended that you store at least some of the backups in physically separate locations. For example, if you back up to two servers in the same room or building, a single fire could destroy both backups. Keeping an offsite copy of your backup mitigates this risk.
Backups can be made to networked drives, cloud storage, and to local or portable devices (see 'Storage'). What works best for your project will depend on the amount of data that needs to be backed up, the required frequency of backups, the level of automation, and the sensitivity of the data.
Estimate which data and documentation you will collect and create in your project, then determine the approximate storage capacity needed to back them up. Talk to University IT Services for help determining your requirements.
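As a rough worked example, backup capacity can be estimated from the size of the data, the number of backup copies, and how many past backups are retained. The formula below is an illustrative assumption rather than a University method: it treats every retained backup as a full copy, so it overestimates the space needed when differential or incremental backups are used. The function name and the example figures are hypothetical.

```python
def estimate_backup_storage_gb(
    data_gb: float,
    copies: int = 3,
    retained_versions: int = 1,
    growth_factor: float = 1.0,
) -> float:
    """Rough upper-bound estimate of backup capacity in GB (hypothetical helper).

    data_gb:           current size of data plus documentation
    copies:            backup copies kept in separate places
    retained_versions: past backups kept per copy, each treated as a full copy
    growth_factor:     expected growth by the end of the project (2.0 = doubles)
    """
    if data_gb < 0 or copies < 1 or retained_versions < 1 or growth_factor <= 0:
        raise ValueError("sizes must be non-negative and counts at least 1")
    final_size_gb = data_gb * growth_factor
    return final_size_gb * copies * retained_versions


# Example: 50 GB now, expected to double, three copies, four versions each.
print(estimate_backup_storage_gb(50, copies=3, retained_versions=4, growth_factor=2.0))  # 1200.0
```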
Automating backups helps ensure that they are created at the right time and saved to the right location, reducing the risk of human error. Both Microsoft and Apple operating systems include software that supports automatic backups, and cloud storage solutions may also offer backup functionality. However, you should still check regularly that working backups have actually been created.
OS X - Watch the video tutorial on creating backups for your Mac using Time Machine (UK Data Service, 2016).
Windows 10 - Windows 10 includes two different backup programs:
File History
Backup and Restore (Windows 7)
Even with automatic backups in place, you should still keep an off-site backup.
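Beyond the built-in tools, a scheduled script can also make backups. The sketch below is a minimal illustration in Python, assuming your data sits in a single project folder: it copies that folder into a new, date-stamped folder under a backup location. The function name and folder paths are hypothetical; you would still need to schedule it (for example with Windows Task Scheduler, or launchd or cron on macOS) and point the backup location at a separate or off-site drive.

```python
import shutil
from datetime import datetime
from pathlib import Path


def timestamped_backup(source: str, backup_root: str) -> Path:
    """Copy the folder `source` into a new date-stamped folder under `backup_root`.

    Hypothetical helper for illustration. Folder names look like
    2024-03-05_143000, so they sort in chronological order.
    """
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = Path(backup_root) / stamp
    # copytree creates `target` (and any missing parent folders) and raises
    # FileExistsError if it already exists, so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target
```

Because each run writes to a new folder, storage use grows with every backup, which is why a retention rule is usually needed.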
It is recommended that you do not overwrite one backup with another. However, if you must back up large amounts of data frequently, it may not be feasible to retain every backup for the entire duration of the project.
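When not every backup can be kept, one common approach is to retain a fixed number of the most recent backups and remove only the oldest. The sketch below assumes each backup is stored in its own folder named with an ISO date (e.g. `2024-03-05`), so sorting the names sorts them by date; the folder naming, the `keep` value and the function name are hypothetical. It only reports which backups would be removed and deletes nothing itself.

```python
from typing import List


def backups_to_prune(backup_names: List[str], keep: int) -> List[str]:
    """Return the backups older than the newest `keep` (hypothetical helper).

    Assumes names start with an ISO date (YYYY-MM-DD), so alphabetical
    order is chronological order. Nothing is deleted here.
    """
    if keep < 1:
        raise ValueError("keep at least one backup")
    ordered = sorted(backup_names)  # oldest first
    # Everything except the last `keep` entries; empty if there are too few backups.
    return ordered[:-keep]


print(backups_to_prune(["2024-03-01", "2024-03-08", "2024-02-23", "2024-03-15"], keep=2))
# ['2024-02-23', '2024-03-01']
```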
If sensitive data is involved, make sure that any deleted data are truly gone and cannot be recovered in any way.
Make sure that backups of data containing sensitive information are protected against unauthorised access in the same manner as the original files.
A disaster recovery plan defines the steps to take if data is lost and helps you restore it as quickly as possible. The plan should also assign responsibilities for data recovery tasks and list the people and roles to contact when a data loss occurs.
To make data recovery as smooth as possible after an actual data loss, regularly test that lost files can be restored from your backups.
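One way to run such a test is to restore a backup into a scratch folder and compare it, file by file, with the originals. This sketch uses Python's standard `filecmp` module with `shallow=False`, so file contents are compared rather than just sizes and timestamps. It assumes the originals have not changed since the backup was made; the function name is hypothetical.

```python
import filecmp
import os
from typing import List


def restore_mismatches(original_dir: str, restored_dir: str) -> List[str]:
    """List files that are missing from, or differ in, a test restore (hypothetical helper)."""
    for folder in (original_dir, restored_dir):
        if not os.path.isdir(folder):
            raise FileNotFoundError(f"not a folder: {folder}")
    problems = []
    for root, _dirs, files in os.walk(original_dir):
        for name in files:
            original = os.path.join(root, name)
            rel = os.path.relpath(original, original_dir)
            restored = os.path.join(restored_dir, rel)
            if not os.path.isfile(restored):
                problems.append(f"missing: {rel}")
            elif not filecmp.cmp(original, restored, shallow=False):
                # shallow=False forces a byte-by-byte comparison of contents.
                problems.append(f"differs: {rel}")
    return sorted(problems)
```

An empty result means every original file was restored intact; any entries point to files to investigate before you rely on that backup.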
Never assume that someone else will take care of backups and data recovery. Assign responsibilities for making manual backups, checking that automatic backups have run, testing data recovery, and restoring any lost data.
The above information is adapted from:
Errors can occur when backups are written or copied, so check the integrity of backed-up files frequently. This can be done with checksum tools such as MD5summer. A checksum works like a digital fingerprint: the tool applies an algorithm to the bit values (the ones and zeros) of a file to compute a short string of characters. If a file changes in any way, intentionally or unintentionally, its checksum will in practice change too, so monitoring a file's checksum lets you detect that change.
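The same check can be scripted with Python's standard `hashlib` module, as an alternative to a dedicated tool like MD5summer. The sketch below records a checksum for every file in a folder (a "manifest") and later reports files whose checksum has changed or that have disappeared. It uses SHA-256 rather than MD5: MD5 is still adequate for spotting accidental corruption, but SHA-256 is preferred where deliberate tampering is a concern. The function names and the dictionary manifest format are hypothetical.

```python
import hashlib
import os
from typing import Dict, List


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Compute a file's checksum, reading 1 MiB at a time so large files fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(folder: str) -> Dict[str, str]:
    """Map each file's path (relative to `folder`) to its checksum (hypothetical helper)."""
    manifest = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            manifest[os.path.relpath(path, folder)] = file_checksum(path)
    return manifest


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Files whose checksum differs from the saved one, or that are now missing."""
    return sorted(path for path, digest in old.items() if new.get(path) != digest)
```

In practice you would save the manifest (for example as a JSON file) when a backup is made, then rebuild it from the backup later and pass both to `changed_files`.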
Features of data archives and repositories can include:
A focus on preservation and ensuring permanency
The ability to curate the data and assume custodianship of the data
A unique, permanent location and the assignment of persistent unique identifiers (e.g. DOIs)
Data deposited in established repositories and archives tends to rank higher in search engine results, making it more discoverable to the wider research community
Rights, licensing, and access management including managing access to restricted or sensitive data
Data is backed up and managed by the service, leaving you free to focus on current research projects
Impact metrics – often these services allow you to track the interest your data receives by viewing the number of downloads and/or views.
Check the Research Data and Primary Materials Management Procedure (Section 5 (26-32)) for advice regarding the archiving and retention of research data.
How long will the data be kept? A retention period can be nominated as part of the Data Management Plan.
Before disposing of any files or items associated with the research, check whether there is a legal requirement to retain documentation in hard or soft copy. Check your contract and the University's Research Data and Primary Materials Management Procedure.