Data is fundamental to all research, and all research produces data in some form. Data does not only mean huge datasets: it can be Excel files, diaries, lab notes, photographs, interviews and a myriad of other formats that are the outputs of the research process. Open research data refers to the data underpinning research results that has no restrictions on its access, enabling anyone to use and reuse it. For the Sciences the expectation is that the data can be reused in order to validate the research findings. For the Arts, Humanities and Social Sciences it is more about using the data to evidence the production of new knowledge (this may be more subjective, ephemeral and tacit). An article can be seen as the promotional tool for the research, but without the data how can you be sure that the conclusions arrived at are valid?
There are many good reasons to share data: funders’ mandates require it, it reduces the risk of data loss, it increases efficiency in the research process by avoiding duplication, and making the underlying data available is good for reproducibility. Data loss is a huge problem in research. Without proper curation and preservation, data is left lying around on USB keys, on hard drives, and in hard copies that either get lost or become obsolete. There is also an increasing demand from funders, who acknowledge that they are spending taxpayers’ money, that resources be shared so that researchers do not keep reinventing the wheel.
There is a recognition that not all data can be made open, but it should be as open as possible and as closed as necessary. This means that the default position is open, and if access to the data is restricted there must be justifiable reasons why this is so. There can be many genuine reasons for restricting access to the data. Human data needs to be completely anonymised before it can be made open, rights to privacy must be observed, and other rights under GDPR (the General Data Protection Regulation) must be maintained. Always remember that if you are dealing with human subjects you must have informed consent to use their data.
Tim Berners-Lee devised a five-star scheme for open data that demonstrates how easy making data open can be.
- One Star: Data is made available on the web in whatever format under an open licence.
- Two Star: Make the data available as structured data i.e. use Excel instead of a scan of a table.
- Three Star: Use non-proprietary formats (CSV instead of Excel).
- Four Star: Use identifiers to denote things so that people can point at your stuff.
- Five Star: Link your data to other data to provide context.
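As a rough illustration of the move from two stars to three, the sketch below writes a small table as CSV text using only Python's standard library. The function name `rows_to_csv` and the sample values are invented for this example; reading an existing Excel workbook would need a third-party library and is not shown.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialise a table to CSV text, a non-proprietary (three-star) format."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example data, invented for illustration only.
header = ["sample_id", "temperature_c"]
rows = [["S1", 21.5], ["S2", 19.0]]

csv_text = rows_to_csv(header, rows)
print(csv_text)
```

The resulting text can be opened by any spreadsheet or statistics package, which is the point of choosing a non-proprietary format.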
Arrow@TU Dublin has a Data Portal where you can upload your dataset. It is important that you add any contextual information that makes the data more comprehensible. You should also explain how the data should work so that a user knows what results to expect, as this is the only way they have of knowing if the results are valid. This information should be contained in a readme file which accompanies the data. View the following guide on creating a readme file.
Remember you can be cited for your datasets just like your articles so always include a recommended citation. A data citation should contain the following:
- DOI (digital object identifier)
- date published and a version number if appropriate
- date and time it was accessed
- name of the distributor (if you were citing something accessed on Arrow, TU Dublin would be the distributor)
A correct citation makes your dataset discoverable and allows citations of it to be tracked; the DOI is invaluable for both.
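The elements listed above could be assembled into a recommended citation along the following lines. This is only a sketch: the class name, the field order and the punctuation are assumptions rather than a prescribed TU Dublin style, the `creator` and `title` fields are added here because a citation needs to name the work, and every value shown is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    creator: str      # assumption: not in the list above, added to name the author
    title: str        # assumption: not in the list above, added to name the dataset
    published: str    # date published
    distributor: str  # e.g. the repository that hosts the dataset
    doi: str          # digital object identifier, without the resolver prefix
    accessed: str     # date and time the dataset was accessed
    version: Optional[str] = None  # version number, if appropriate

    def as_text(self) -> str:
        """Return a one-line citation string (layout is illustrative only)."""
        version = f" (Version {self.version})" if self.version else ""
        return (
            f"{self.creator} ({self.published}). {self.title}{version}. "
            f"{self.distributor}. https://doi.org/{self.doi} "
            f"(accessed {self.accessed})."
        )


# Placeholder values, invented for illustration; the DOI is not a real one.
citation = DatasetCitation(
    creator="Murphy, A.",
    title="Example survey responses",
    published="2021",
    distributor="TU Dublin",
    doi="10.1234/example",
    accessed="2024-01-15 10:30",
    version="1.0",
)
print(citation.as_text())
```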
FAIR data means data that is findable, accessible, interoperable and reusable. The article "The FAIR Guiding Principles for scientific data management and stewardship" was published in Scientific Data in 2016, and you can view a summary of the FAIR Data Principles. When people talk about data, they also talk about metadata, which is structured information about data. It summarises basic information about the data, which makes finding and using the data easier. A good example of metadata is a library catalogue entry for a book: it gives you the author, title and publication details, which together are structured information about the publication.
Findable: Help other people and machines find your data. Rich metadata should be available online in a searchable resource and the data should be assigned a persistent identifier (all done by submitting to the Arrow Data Portal).
- Assign a persistent identifier to your data
- Describe your data in detail
- Put it online in a searchable resource like a data repository
- The metadata record includes the persistent identifier
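To make the idea of a metadata record concrete, here is a minimal sketch of one as a Python dictionary, together with a small check that the descriptive fields are filled in. The field names are assumptions chosen for readability and do not follow any particular metadata standard or the Arrow Data Portal's own form; all values are placeholders.

```python
import json

# Hypothetical record; field names and values are invented for illustration.
metadata_record = {
    "identifier": "https://doi.org/10.1234/example",  # persistent identifier (placeholder)
    "title": "Example survey responses",
    "creator": "Murphy, A.",
    "description": "Anonymised responses to a 2021 questionnaire.",
    "keywords": ["survey", "questionnaire"],
    "date_published": "2021-06-01",
}


def missing_fields(record, required=("identifier", "title", "creator", "description")):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]


print(json.dumps(metadata_record, indent=2))
print("Missing fields:", missing_fields(metadata_record))
```

A record like this is what a repository indexes, so the richer the description and keywords, the easier the data is to find.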
Accessible: It should be possible for people and machines to access your data, under specific conditions and restrictions where appropriate. FAIR data does not mean that the data must be open, but there should at least be a metadata record.
- Clicking on the persistent identifier brings you to the data or associated metadata
- The protocol by which data can be retrieved follows recognised standards, e.g. HTTPS
- If necessary, authorisation and authentication steps are included.
- The metadata is accessible even if the data is not.
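When the persistent identifier is a DOI, "clicking on the identifier" in practice means following an HTTPS link through the public resolver at doi.org. The sketch below only builds that link from the different ways a DOI is commonly written; the function name `doi_to_url` is made up for this example, the DOI used is a placeholder, and no network request is made.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI written in a common form into an HTTPS resolver link."""
    doi = doi.strip()
    # Strip a leading resolver address or "doi:" label if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"Not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash between prefix and suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


# Placeholder DOI, invented for illustration.
print(doi_to_url("doi:10.1234/example"))
```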
Interoperable: Data and metadata should conform to recognised formats and standards that allow them to be combined and exchanged.
- Data conforms to commonly understood and preferably open standards
- Metadata provided follows relevant standards
- Controlled vocabularies, keywords, thesauri or ontologies are used where possible
- Qualified references and links are provided to other related data
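One simple way to see what a controlled vocabulary buys you is to check a dataset's keywords against an agreed list. In the sketch below the vocabulary is a tiny invented set, not a real thesaurus, and `split_keywords` is a hypothetical helper; a real project would draw the terms from the vocabulary used in its own discipline.

```python
# Hypothetical controlled vocabulary, invented for illustration only.
CONTROLLED_VOCABULARY = {"air quality", "climate", "hydrology"}


def split_keywords(keywords, vocabulary=CONTROLLED_VOCABULARY):
    """Separate keywords into those found in the vocabulary and those that are not."""
    normalised = [keyword.strip().lower() for keyword in keywords]
    controlled = [k for k in normalised if k in vocabulary]
    uncontrolled = [k for k in normalised if k not in vocabulary]
    return controlled, uncontrolled


controlled, uncontrolled = split_keywords(["Climate", " rainfall ", "Hydrology"])
print("From the vocabulary:", controlled)
print("Free-text keywords:", uncontrolled)
```

Keywords that fall outside the vocabulary are not wrong, but they are harder for other systems to match against related data.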
Reusable: Lots of documentation is needed to support data interpretation and reuse. The data should conform to community norms and be clearly licensed so others know what kind of reuse is permitted.
- The data is accurate and well described with many relevant attributes
- Data has a clear and accessible data usage licence
- It is evident how, why and by whom the data has been created and processed
- The data and metadata meet the relevant standard
Making data FAIR can be a complicated process, but anyone can make their data findable and accessible. TU Dublin has a Data Portal where you can create a metadata (catalogue) record for your data, which makes it findable, and a "contact the owner" button means anyone interested can get in touch with you, which makes it accessible. You are obliged to do this under the TU Dublin Open Access Policy.
- Go Fair Initiative
- OpenAire: How to make your data fair
- UCD Library: Addressing the FAIR Data Principles in a Data Management Plan
- Australian Research Data Commons: FAIR data self-assessment tool