The benefits of Open Access

Astrid Wissenburg, Deputy Chair of the RCUK Impact Group and RCUK representative on the Finch Group, and Mark Thorley, Chair of the RCUK Research Outputs Network, explain why open access is so high on the agenda for the Research Councils.

Just over a month ago, Research Councils UK launched a new Open Access policy. One of the key drivers for making published journal articles freely available through open access mechanisms is the potential it offers the research community (and beyond) to mash, mine and mix information and knowledge. This provides real opportunities to substantially further the progress of research and innovation.

Professor Douglas Kell, RCUK Champion for Research and Information Management and CEO of the BBSRC, is well known for arguing that open access is essential for undertaking exciting and ground-breaking research through text and data mining. His blog gives many examples, such as genome-based metabolic network reconstruction, text mining for systems biology, and pulling together disparate literatures and synthesising inductive knowledge in pharmacokinetics, medicine and toxicology.

Beyond the Research Councils, Professor Peter Murray-Rust, in his manifesto on Open Mining of Scholarship, notes that the lack of support for text mining stifles the imagination of the wider community and, because the scientific literature is not used to the full, can lead to bad policy decisions. The Value and Benefits of Text Mining report, commissioned by JISC, highlights that providing unrestricted access to information sources is one of the barriers to be overcome.

It is this need for unrestricted access, allowing full use and re-use, that is one of the reasons why the Research Councils, along with the Wellcome Trust, are advocating the use of the Creative Commons ‘Attribution’ licence (CC-BY). The CC-BY licence allows others to modify, build upon and/or distribute the licensed work, including for commercial purposes, as long as the original author is credited. Crucially, CC-BY licensed works can be deposited in repositories with no further restrictions on access or re-use. Combine this with requiring immediate access where this is possible, if necessary by paying an open access fee, and we have some of the critical building blocks to fundamentally speed up the scientific and research process.

Murray-Rust also notes that text mining is a major tool in data review and plays an important role in validating science. A key requirement of the new RCUK policy is that peer-reviewed research papers resulting from Council-funded research must include a statement on how the underlying research materials, such as data, samples or models, can be accessed. This requirement has been included with the specific aim of making the work funded by the Research Councils more open, and so more accountable, both to other scientists and to the wider public. This supports recommendations made in the recent Royal Society report Science as an Open Enterprise to improve the conduct of science, respond to changing public expectations and political culture, and enable researchers to maximise the impact of their research.

Whilst the requirement for a statement does not imply that the supporting data etc. must always be Open Access, researchers must be clear about what supporting information can be made available, and how it can be accessed. Researchers will also need to be equally clear about what cannot be made available, and why. For example, it is often not possible to make data relating to human subjects openly available because of issues of consent and confidentiality.

Implementing this requirement will be the responsibility of both researchers and their host institutions.  Researchers will need to think about openness as they plan and undertake research.  Institutions will need to develop an open data culture, and the necessary infrastructure and skills to support this. 

Institutional and subject repositories are expected to form a key element of that infrastructure by providing a secure and accessible home for the data, models and other information underlying a research paper. They will not be suitable for all material (physical samples, for example); however, they can provide a primary repository for much of the material and, by holding copies of the associated papers, provide the linkages between the papers and the underlying materials. This is also one of the recommendations of the Finch report, Accessibility, sustainability, excellence: how to expand access to research publications. By doing so, institutional and subject repositories containing ‘green’ and ‘gold’ materials can be essential facilitators of text and data mining. By supporting both gold and green open access, the Research Councils create further opportunities for repositories to develop this role.

Launching the new policy does not mark the end of the work that the Research Councils have been engaged in since issuing their first joint statement on open access in 2005. In conversation with researchers and institutions, we are developing the operational details of the policy and will share them as soon as they become available. This is a fast-moving area of research policy and, as major funders of research in the UK, we have a duty to ensure that it provides the best possible opportunities for the UK research base.

JISC and Research Councils UK work to reduce reporting burden on universities

Matt Jukes, MRC Digital Communication Manager –

JISC have just published a blog post outlining joint activity between them and us here at the Research Councils that is of particular interest to anyone following the Gateway to Research project. Rather than reproduce it here, I recommend you check it out over on the JISC blog. There is a great deal of interesting work being undertaken in this area.

The GtR technical approach

Paul Chitson, BBSRC Head of Information Services –

The GtR technical team have started the process of pulling together data from various Research Council funded projects and building a technology platform through which to make it available to a wide variety of audiences.

The project is being managed and delivered within an Agile framework, perhaps the first cross-council project in the Research Councils to adopt this approach. This means we will not only be trying to hit key milestones but also aiming to release as much information and as many early previews as we can. This blog post is the first from the technical team and will, we hope, set a precedent for the way we wish to continue.

The process began with the gathering of user stories from potential audiences which have since enabled us to understand the content that is of interest, the ways in which people may want to interface with the data and, of equal importance, what people do not want.

The requirements gathering phase has informed our initial thoughts on the solution architecture and how the data will be stored.

The data will be extracted from various sources, transformed and then stored in two forms. The first will be a relational database using the CERIF data model. Initially, the intention was to provide an interface that would output data in CERIF format, but following a workshop with Keith Jeffery (President of euroCRIS) we felt that we could use CERIF as the internal storage model. However, we also have a commitment to provide data in a linked data format (RDF) via a SPARQL interface, so we have chosen to store the data in two forms (rather than build an RDF wrapper over the relational store) and benefit from a SPARQL interface out of the box. We will, wherever possible, use existing open source technologies and integrate rather than build from scratch. Any code or customisations we make will be made freely available.
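To make the "store it twice" design more concrete, here is a minimal Python sketch of one record passing through extract, transform and load into both forms. It is an illustration under stated assumptions, not the GtR team's code: the source field names, the base URI `http://example.org/gtr/`, the `principalInvestigator` predicate and the `PI` role label are all hypothetical, and the tables only borrow CERIF's `cf` naming convention with heavily simplified columns (full CERIF keeps titles and names in separate multilingual entities and classifies link roles through its semantic layer). SQLite stands in for the relational store and N-Triples text for the triple store.

```python
import sqlite3
from urllib.parse import quote

# Hypothetical namespace; not a real GtR URI scheme.
BASE = "http://example.org/gtr/"
DCT_TITLE = "<http://purl.org/dc/terms/title>"      # Dublin Core Terms: title
FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"      # FOAF: name
HAS_PI = f"<{BASE}ontology#principalInvestigator>"  # hypothetical predicate

# Simplified, CERIF-inspired tables: two base entities plus a link entity
# recording the role a person plays on a project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cfProj (
    cfProjId TEXT PRIMARY KEY, cfTitle TEXT, cfStartDate TEXT, cfEndDate TEXT);
CREATE TABLE IF NOT EXISTS cfPers (cfPersId TEXT PRIMARY KEY, cfName TEXT);
CREATE TABLE IF NOT EXISTS cfProj_Pers (
    cfProjId TEXT, cfPersId TEXT, cfClassId TEXT,
    PRIMARY KEY (cfProjId, cfPersId, cfClassId));
"""


def transform(raw):
    """Transform: map a hypothetical source-system record onto common fields."""
    return {
        "proj_id": raw["grant_ref"].strip(),
        "title": raw["title"].strip(),
        "start": raw["start"],
        "end": raw["end"],
        "pers_id": raw["pi_id"].strip(),
        "pers_name": raw["pi_name"].strip(),
    }


def load_relational(conn, rec):
    """Load form 1: relational rows. Re-running with the same record is idempotent."""
    conn.executescript(SCHEMA)
    conn.execute("INSERT OR REPLACE INTO cfProj VALUES (?, ?, ?, ?)",
                 (rec["proj_id"], rec["title"], rec["start"], rec["end"]))
    conn.execute("INSERT OR REPLACE INTO cfPers VALUES (?, ?)",
                 (rec["pers_id"], rec["pers_name"]))
    # "PI" is a readable stand-in; in full CERIF, cfClassId references a classification.
    conn.execute("INSERT OR IGNORE INTO cfProj_Pers VALUES (?, ?, ?)",
                 (rec["proj_id"], rec["pers_id"], "PI"))
    conn.commit()


def literal(text):
    """Quote a string as an N-Triples literal (backslash escaped first)."""
    for ch, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(ch, esc)
    return f'"{text}"'


def to_ntriples(rec):
    """Load form 2: the same record as RDF triples in N-Triples syntax."""
    proj = f"<{BASE}project/{quote(rec['proj_id'], safe='')}>"
    pers = f"<{BASE}person/{quote(rec['pers_id'], safe='')}>"
    return [
        f"{proj} {DCT_TITLE} {literal(rec['title'])} .",
        f"{proj} {HAS_PI} {pers} .",
        f"{pers} {FOAF_NAME} {literal(rec['pers_name'])} .",
    ]


if __name__ == "__main__":
    raw = {"grant_ref": "XX/A000001/1", "title": "An example project",
           "start": "2012-04-01", "end": "2015-03-31",
           "pi_id": "P1", "pi_name": "A. Researcher"}
    rec = transform(raw)
    conn = sqlite3.connect(":memory:")
    load_relational(conn, rec)
    for line in to_ntriples(rec):
        print(line)
```

Feeding both loaders from one transformed record is one simple way to keep the relational and RDF copies consistent with each other.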

GtR technology at a glance:

  • The data storage model will be CERIF, ensuring compatibility with the systems used by many universities, research organisations and consumers of research data;
  • A user portal with full-text and faceted search (Solr);
  • A REST-based interface providing rich information in both human- and machine-readable forms;
  • A SPARQL interface for querying a triple store (initially using the Jena software stack);
  • Web services providing data in OAI-PMH and CERIF formats.
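As a rough illustration of how a client might use three of the interfaces listed above, the sketch below builds requests with only Python's standard library. The request parameters follow the published SPARQL 1.1 Protocol (`query`), the OAI-PMH specification (`verb`, `metadataPrefix`, `from`, `resumptionToken`) and Solr's standard query parameters (`q`, `fq`, `facet`, `facet.field`, `wt`), but the endpoint URLs and the Solr field names (`council`, `year`) are hypothetical placeholders, not GtR's actual endpoints or schema.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder endpoints, not real GtR URLs.
SPARQL_ENDPOINT = "http://example.org/gtr/sparql"
OAI_ENDPOINT = "http://example.org/gtr/oai"
SOLR_SELECT = "http://example.org/gtr/solr/select"


def sparql_get_url(query):
    """SPARQL 1.1 Protocol: send the query via GET in the 'query' parameter."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"


def oai_list_records_url(from_date=None, resumption_token=None):
    """OAI-PMH ListRecords. oai_dc must be supported by every repository;
    resumptionToken is an exclusive argument, sent alone with the verb."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date
    return f"{OAI_ENDPOINT}?{urlencode(params)}"


def solr_faceted_url(text, facet_fields, filters=()):
    """Solr select: full-text query, optional filter queries, facet counts."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    params += [("fq", f) for f in filters]
    return f"{SOLR_SELECT}?{urlencode(params)}"


def sparql_json_rows(payload):
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts.
    Variables left unbound in a solution (e.g. via OPTIONAL) are omitted."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in doc["results"]["bindings"]]


if __name__ == "__main__":
    q = ("SELECT ?project ?title WHERE "
         "{ ?project <http://purl.org/dc/terms/title> ?title } LIMIT 10")
    print(sparql_get_url(q))
    print(oai_list_records_url(from_date="2012-01-01"))
    url = solr_faceted_url("text mining", ["council", "year"])
    print(parse_qs(urlsplit(url).query))
```

Building requests from plain, standards-defined parameters like this is what lets generic tools (SPARQL clients, OAI-PMH harvesters) work against such a service without bespoke code.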

We are at the start of an exciting journey and invite you to add your input along the way.

Introducing the Gateway to Research

Catherine Coates, Director of Business Innovation at EPSRC and the Senior Responsible Owner (SRO) for the Gateway to Research project, introduces the project and explains why the Gateway is important –

The UK’s Research Councils host a significant quantity of data which provides information on the research and training that they support, as well as the outcomes of that research. This is of huge potential interest and value to business and many other organisations, particularly universities that already make similar data publicly available. The Research Councils together are determined to play their part in making the data we hold freely and easily available for others to use as they see fit, including seeding collaborations and helping interested parties find out who holds what knowledge, and where, so that they can make contact with people who can help them.

With the Gateway to Research project, we envisage an integrated Research Council data set that enables data sharing across the government, private and university sectors.

For example, although currently our data is in the public domain, accessible through our websites, it isn’t easy to navigate what seven Councils hold when using seven different websites!

So we want to create a smart way to make that easy, with common data standards and interoperability, so anyone can access it and use it as they see fit. This is the Gateway to Research concept. Not a controlling gateway but an open door!

We aim to produce an integrated data depositing and harvesting experience for universities and other stakeholders.

We need your help in making this happen. We will use this blog to engage with interested parties regarding the platforms, technologies and data formats that we will be delivering. Help us to deliver the functionality and user experience that will enable you to use our data.

This blog will be updated as often as is practical when there is new information to share. Realistically this will be once a week at most. We will, however, endeavour to engage with questions on a more frequent basis.