Reinvesting for Growth

Outcomes of past research surround us everywhere. Whether it is the technology that runs our cars, the pain-killers and other simple treatments for our ailments, or the scholarship that underpins that fascinating history documentary we watch on TV, it’s hard to imagine a world without the products of research.

Another example is the smartphone technology that many of us rely on, not just for communication but also for access to information. Smartphones represent an extraordinary convergence of research into a single device, but underpinning it all is the wireless network. And that aspect of the technology is about to go through the next revolution as we move from 3G to 4G networks, which promise to provide connection speeds on smartphones to rival the best available from wired broadband.

As well as a huge opportunity for new innovation, the 4G revolution will also provide revenue for the Government through the auction of 4G spectrum that is planned for next year. A new report from the Campaign for Science and Engineering and NESTA makes the case that this windfall is part of the return on our past investment in research. It also argues that the best use of the money is to reinvest it in research, technology and innovation: a boost to drive sustainable economic growth.

The report proposes a programme to spend the projected £4 billion of additional income. The suggestions are well designed to cover both investments that will bring benefits in the longer term and those that will deliver on shorter time-scales. I think these proposals merit serious consideration and represent an exciting opportunity to make a step change in innovation-led growth.

From an RCUK perspective, it is especially pleasing to see such a strong emphasis on spending on research infrastructure. Innovation is driven by world-leading research, which needs access to world-leading research infrastructure. Government has made significant investment in research capital over the last two years, but there are still major opportunities available that we need to seize to keep our research at the highest level. In the coming weeks we will be publishing the RCUK Capital Investment Framework, which sets out these opportunities. Allocation of further funding would allow us to turn opportunities into realities, with major benefits for the economic performance of the Nation in the long term.

RCUK Open Access Policy – Our Preference for Gold

In this blog post I want to outline the reasons why ‘Gold’ is the preferred route for Open Access for the Research Councils.

Our overall philosophy on Open Access is underpinned by four key principles, first detailed in our 2005 position statement on Access to Research Outputs.  These principles are around Accessibility, Quality, Efficiency and Cost Effectiveness, and Long-term Preservation.

The first of our four key principles is that the ideas and knowledge derived from publicly-funded research must be made available and accessible for public use, interrogation and scrutiny, as widely, rapidly and effectively as practicable.  It is this principle which is at the heart of our preference for Gold.

The scope of this principle is around ‘public use’, which covers many more potential users than just those within the research community.  Many of these users, and I am aware that there will be exceptions to this, are not familiar with how research papers are produced and distributed, and will not understand the subtle differences between pre-prints, post-prints and the publisher’s version of a paper.  Our concern is to ensure that all users have access to the highest quality version of a paper, and for us the most effective way of doing that is for a user to have access to the published version on a journal web site.  If a user wants to read a paper from Nature, the best way to ensure they are reading the definitive version is to read the version available from the Nature web site.  Gold delivers this universal access to the published version of the paper.

For us ‘use’ means much more than just being able to read research papers – it means having the ability to re-use and exploit research papers in the widest possible sense – be that text and data mining to advance new areas of research, re-presenting collections of research papers in particular areas, or mashing together elements of research papers with other information to create new information products.  With maximal openness and accessibility comes maximal opportunity to exploit, and thus maximal opportunity for innovation.  And from innovation comes growth, and benefit to the UK as a whole.  Gold delivers this maximal openness and opportunity for innovation through the CC-BY licence which we require where we pay an APC.

‘Widely’ means that access to the research outputs we fund must not be limited to those who can afford to pay for subscriptions, or for copies of articles from a journal’s web site.  Hence they should be available without cost.  Gold delivers free access for all users.

‘Rapidly’ means that articles should be available as soon as they are published, or with a minimum delay.  Gold delivers immediate access on publication with no embargo period.

‘Effectively’ means that the systems used to provide access must be straightforward for all users, and should be scalable and sustainable.  In the long term, Gold with payment of APCs will provide a scalable and sustainable solution to cover the costs of publishing, especially for the learned societies who are key members of the UK research community.  Gold is also straightforward for users – if you want a copy of a paper you go to a journal web site, rather than having to search a repository, and then possibly wait whilst you contact the author to request a copy.  It is also not clear to me how scalable a ‘request a copy’ or ‘Almost-OA’ function is for papers in high public demand.  An author might be happy to email copies to a few researchers, but what happens when they get tens, hundreds or thousands of requests from interested members of the public?

Basically, our preference for Gold can be summarised as follows: we want to make the outputs of the research we fund accessible at the highest quality to the widest number of people, to do the widest range of things with, under the fewest restrictions.  We consider that, at the current time, Gold with CC-BY direct from a journal’s web site provides the route for ensuring that the papers arising from the research we fund are accessible to the widest number of users to meet this preference.

GtR – Dealing with the Data Challenges

At the outset of the GtR project, one of the major challenges we identified was establishing the scope of the data and the semantics of the terms associated with it.

A little background to the challenge

In 2011, the seven Research Councils finished consolidating 5 grants transaction processing systems into a single shared solution. This has helped to reduce some of the differences between the format of council data, but not always the meaning of values held. Whilst many of the semantic differences are purely a labelling issue, some are more deeply entrenched in the ways councils work with their respective research communities. To this end, we are taking time to unpick and agree a common set of terms under which we can publish data.

Alongside these semantic equivalence challenges come issues over whether we are permitted to publish some of the data we hold. The GtR portal will be publishing data from various Research Council systems, and these systems are often the result of previous projects to migrate legacy data. The terms under which the information was originally gathered vary on whether the data can be made public, although they generally permit the Research Councils to publish information for the furtherance of their missions. We are working to ensure that information gathered for one purpose is suitable and fit for release and does not break any expectations of privacy or confidentiality.

The form and content of the data gathered for one purpose (e.g. to provide the basis of evidence of the impact of research) may not be suitable or indeed easily consumable for another purpose, or may be missing some linkages that would have been made if the aim was publication – e.g. we may not be able to link a publication to more than one person on a research project as we gather this as evidence of outcomes against a project. Over time, as the GtR develops, we would expect it to inform the way we gather data so that we can ensure it delivers value as both evidence of what we have funded and as a resource for those wanting to exploit it in either a public or commercial interest.

Why did we choose CERIF?

CERIF was chosen as the storage model for GtR data for two main reasons:

  • The common challenges of storing research and related data have already been discussed and documented by a wide group of skilled and knowledgeable people. Developing our own bespoke model to deal with the same challenges seemed like wasted effort.
  • CERIF is highly regarded within the research community and we felt it was important to store the data in a consistent way to facilitate exploitation and information exchange with many universities, research organisations and consumers of research data.

Following our decision to adopt CERIF we invested time in understanding the model and also seeking the advice of Brigitte Joerg. Brigitte has been instrumental in helping us understand the CERIF model and how to populate it with data from our staging area.

The Cost of Adoption

Adopting CERIF brings a steep learning curve and also adds time. Most of this additional time has been associated with documenting and defining the semantics of every attribute and how it relates to others. This in-depth understanding, notwithstanding the challenge of getting 7 bodies to agree, is critical to identifying where each attribute belongs in the CERIF model. If your data is already well defined then it will put you in a strong position. If, like most systems, it is not, then you will need to agree a set of terms and definitions with the business for every attribute that you want to map to CERIF before proceeding.

This issue most clearly manifests itself in populating the CERIF semantic layer with your vocabularies. Only when the vocabulary data has been added can you begin loading your data sets. For example:

Project A has a current status of ‘Authorised’. To be able to represent that information against a project in CERIF you need to populate the CERIF semantic layer with two key pieces of data: (a) the definition of ‘project status’, and (b) the meaning of the term ‘Authorised’. With this semantic reference data in place you can relate the project to the ‘Authorised’ status. This has a lot of similarities with the open data RDF world, but CERIF contains its semantic information both in the table structure (the attributes of the tables) and in the contents of the semantic layer.
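
To make the ordering constraint concrete, here is a minimal sketch of the Project A example using Python's built-in sqlite3 module. The table and column names (class_scheme, class_term, project, project_class) are simplified stand-ins that we have made up for illustration. They are not the real CERIF tables, and the definition strings are placeholders rather than agreed GtR definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request
conn.executescript("""
    -- Hypothetical, simplified stand-ins for a semantic layer.
    CREATE TABLE class_scheme (
        scheme_id  TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        definition TEXT NOT NULL
    );
    CREATE TABLE class_term (
        term_id    TEXT PRIMARY KEY,
        scheme_id  TEXT NOT NULL REFERENCES class_scheme(scheme_id),
        term       TEXT NOT NULL,
        definition TEXT NOT NULL
    );
    CREATE TABLE project (
        project_id TEXT PRIMARY KEY,
        title      TEXT NOT NULL
    );
    -- Link table relating a project to a vocabulary term.
    CREATE TABLE project_class (
        project_id TEXT NOT NULL REFERENCES project(project_id),
        term_id    TEXT NOT NULL REFERENCES class_term(term_id),
        PRIMARY KEY (project_id, term_id)
    );
""")

conn.execute("INSERT INTO project VALUES ('proj-A', 'Project A')")

# Trying to record the status before the vocabulary exists is rejected.
try:
    conn.execute(
        "INSERT INTO project_class VALUES ('proj-A', 'status-authorised')"
    )
    loaded_early = True
except sqlite3.IntegrityError:
    loaded_early = False

# (a) Define the scheme, (b) define the term, then load the link.
conn.execute(
    "INSERT INTO class_scheme VALUES "
    "('scheme-status', 'Project Status', 'placeholder definition of project status')"
)
conn.execute(
    "INSERT INTO class_term VALUES "
    "('status-authorised', 'scheme-status', 'Authorised', "
    "'placeholder definition of Authorised')"
)
conn.execute("INSERT INTO project_class VALUES ('proj-A', 'status-authorised')")

# Resolve the project's status back through the semantic layer.
row = conn.execute("""
    SELECT p.title, s.name, t.term
    FROM project p
    JOIN project_class pc ON pc.project_id = p.project_id
    JOIN class_term t     ON t.term_id = pc.term_id
    JOIN class_scheme s   ON s.scheme_id = t.scheme_id
    WHERE p.project_id = 'proj-A'
""").fetchone()
print(loaded_early, row)
```

The point of the sketch is only the order of operations: the link row cannot be loaded until both the scheme and the term it refers to are present.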

Another key point is understanding what to do with free text and date fields that the CERIF model hasn’t catered for. There are a couple of options for storing them. The simplest is to extend the existing tables by adding attributes, but this is definitely not supported by the CERIF task group. The more appropriate route is to add additional tables containing the attributes. Of course, if you want the additional fields to have language variations, then you face a decision: either adopt the same model used in CERIF, and so maintain some logical consistency across the model and the extension, or adopt your own model. However you choose to resolve the issue of unsupported fields, you should try to maintain clean semantics, as it is tempting to put something in that solves a problem now and live with the consequences of this decision going forward.
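
As a rough illustration of the ‘additional tables’ route, the sketch below keeps a core table untouched and puts an unsupported free-text field in a separate extension table, with one row per language. The names project, ext_project_summary and lang_code are hypothetical, and the one-row-per-language shape is our assumption of a reasonable layout rather than a description of how CERIF itself stores language variations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Stand-in for a core model table; it is left unmodified.
    CREATE TABLE project (
        project_id TEXT PRIMARY KEY
    );
    -- Hypothetical extension table for a free-text field the core
    -- model does not cater for: one row per project and language.
    CREATE TABLE ext_project_summary (
        project_id TEXT NOT NULL REFERENCES project(project_id),
        lang_code  TEXT NOT NULL,
        summary    TEXT NOT NULL,
        PRIMARY KEY (project_id, lang_code)
    );
""")

conn.execute("INSERT INTO project VALUES ('proj-A')")
conn.executemany(
    "INSERT INTO ext_project_summary VALUES (?, ?, ?)",
    [
        ("proj-A", "en", "Example summary"),
        ("proj-A", "fr", "Exemple de résumé"),
    ],
)

# The core table still has only its original column.
core_columns = [r[1] for r in conn.execute("PRAGMA table_info(project)")]

# Language variants are read from the extension table.
summaries = dict(
    conn.execute(
        "SELECT lang_code, summary FROM ext_project_summary "
        "WHERE project_id = 'proj-A'"
    )
)
print(core_columns, summaries)
```

Keeping the extra field out of the core table means the extension can be dropped or reshaped later without touching the tables that other CERIF users expect to find.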

We have made good progress on the subset of data that we hope to launch in November and have managed to gain agreement for the vocabulary (although some definitions are still being hammered out). The use of CERIF has definitely been a journey of discovery so far and we hope to keep you updated with progress and more detail on how we have mapped the data into CERIF.

Note: If an attribute is added and it could be of general interest to the community, we will notify the CERIF task group and they will consider it as an addition to the model.