Introducing the beta Gateway to Research

Today (12 December 2012), Research Councils UK (RCUK) release the first phase of the Gateway to Research (GtR) portal and dataset –  a beta release. http://gtr.rcuk.ac.uk

In January, we agreed with the Department for Business, Innovation and Skills (BIS) to deliver a “proof of concept” by the end of the year.

We hope we have delivered more than a proof of concept. The portal is live, the data is real. This is the first time that it has been possible to use one location to explore the entire breadth of the RCUK portfolio that results from the investment of around £3 billion of public money annually in research and innovation.

The beta is an early release which will enable users to try out the system under real conditions. It has gone through robust testing in-house and is close in look, feel and function to how we envisage the final product. We want to engage with users to ensure that the functionality and data we are delivering meets their needs.

A few points about the system and data:

  • The dataset is currently a static dataset (i.e. at this stage it will not be routinely updated);
  • A public interface (API) is available that will enable external users to use the data. This will initially be a simple CERIF (XML) API, based on an international research information standard, but others will follow (REST, OAI and SPARQL) to maximise the number of potential users. Data that is visible on all the detailed screens will be viewable in XML;
  • We have used Open Source, Open Standards and adopted an Open Government Licence.

We intend to make user engagement a central part of the project’s development. Complementing this, there are a number of activities that we have identified which will enhance the user experience, enrich the information available, and help the Research Councils meet their obligations to make research information more open and better aligned with users’ needs. Some highlights of the next 12 months include: 

  • Expansion of the GtR dataset to include further Research Council information, for example studentships, all intra-Council grants, and links to research datasets and publication repositories;
  • Making the GtR dataset dynamic, reflecting changes in source systems rapidly;
  • Further iterations of the User Interface based on feedback on the beta-system and changes prompted by the expanding dataset;
  • Working with JISC to enhance the experience of HEIs and other data users in depositing and harvesting data from the Research Councils.

The next year should provide exciting opportunities to demonstrate the value of RCUK research information in diverse settings. Hack days are being planned for the spring, with at least two different providers to ensure a range of approaches. This project has involved the Research Councils working cooperatively on a complex project, and rapid agreement has been reached on the key decision points to date. The final phase of the project will build on this.

New RCUK study on the impact of doctoral training

By Dr Iain Cameron, Head of RCUK Research Careers and Diversity.

There are existing studies that look at the career pathways and impact of doctoral training – after six months from graduating and then after three years. But what difference do PhD graduates make in the longer term?

As part of a larger programme of longitudinal tracking of doctoral graduates, RCUK has launched a new study, with the UK funding councils, to examine the economic impact of doctoral training with PhD graduates from 2004-05. We want to gain a deeper and more evidence-based understanding of their impact and how their skills contribute to innovation and the competitiveness of the UK. The study is being carried out by the research consultancy CFE in partnership with the Higher Education Careers Services Unit (HECSU) and Sheffield University.

As well as examining their impact on the economy and in the workplace, we will be looking at the vibrancy of the research base and the contribution of doctoral graduates across the economy, not just in academic fields. We want to find out the range and scope of career destinations and how their careers are developing, which will help to inform guidance for researchers on the career choices available to them.

We know that, logistically, this is going to be challenging. Getting back in touch with people after a number of years is not straightforward. People move, change jobs, change emails, change phone numbers. Unusual names might pop up in searches, but while we might find lots of people called Jane Smith, knowing which is the right one is much harder.

To identify typical career pathways and routes for innovation for different subjects or groups of people, we need enough replies to be confident their paths really are representative. It’s not going to be easy, and getting enough replies to analyse even broad subject groupings may be ambitious, so we may also include graduates from 2003-04 and 2005-06.

A first stage is inviting anyone who completed their doctorate between 2003 and 2006 to get in touch with CFE so we can work out whether we will be able to reach enough people for a survey to work. We need the help of supervisors, research centres, alumni offices and other networks of doctoral graduates to help us to identify and make contact with graduates. Providing we have enough contact information, the research will begin with a short online survey and, for some, more in-depth interviews.

We are confident that if we can reach doctoral graduates, they will be pretty good at responding and many will feel compelled to be involved for the ‘greater good’. By telling us about what they are doing, they will provide more of the evidence that is vital to inform decisions on future investment in doctoral training.

The other big challenge is, of course, about what we’ll ask them. There are reasonably established methods for looking at the impact of research outcomes. When it comes to individual impacts and careers, there are quite a few career stories, case studies and in-depth work. For example, the Economic and Social Research Council (ESRC) work on the contribution of social science to government policy is at the cutting edge of such comprehensive work. We know there is a large survey of doctoral graduates in Germany. There are also studies around salaries, but we wish to look beyond private returns to establish whether the UK economy and society benefit from investments in doctoral research. This is where we don’t yet have established methods and indicators, but if we want evidence-based decision-making, we have to continue to work on it.

We expect to publish the findings of the study by the end of 2013 and this will develop a sustainable research tool that could be used again to help fill gaps in evidence. The data will be made available to others through the UK Data Archive.

We would like to encourage graduates, and anyone who can put us in touch with the right doctoral graduates, to get in touch. Further information is available at http://www.cfe.org.uk/doctoralimpactstudy

Reinvesting for Growth

Outcomes of past research surround us everywhere. The technology that runs our cars, pain-killers and other simple treatments for our ailments or the scholarship that underpins that fascinating history documentary we watch on TV; it’s hard to imagine a world without the products of research.

Another example is the smartphone technology that many of us rely on, not just for communication but also for access to information. Smartphones represent an extraordinary convergence of research into a single device, but underpinning it all is the wireless network. And that aspect of the technology is about to go through the next revolution as we move from 3G to 4G networks, which promise to provide connection speeds on smartphones to rival the best available from wired broadband.

As well as a huge opportunity for new innovation, the 4G revolution will also provide revenue for the Government through the auction of 4G spectrum that is planned for next year. A new report from the Campaign for Science and Engineering and NESTA makes the case that this windfall is part of the return on our past investment in research. It also argues that the best use of the money is to reinvest it in research, technology and innovation; a boost to drive sustainable economic growth.

The report proposes a programme to spend the projected £4 billion of additional income. The suggestions are well designed to cover both investments that will bring benefits in the longer term, and those that will deliver on shorter time-scales. I think these proposals merit serious consideration and represent an exciting opportunity to make a step change in innovation-led growth.

From an RCUK perspective, it is especially pleasing to see such a strong emphasis on spending on research infrastructure. Innovation is driven by world-leading research which needs access to world-leading research infrastructure. Government has made significant investment in research capital over the last two years, but there are still major opportunities available that we need to seize to keep our research at the highest level. In the coming weeks we will be publishing the RCUK Capital Investment Framework which sets out these opportunities. Allocation of further funding would allow us to turn opportunities into realities, with major benefits for the economic performance of the Nation in the long-term.

RCUK Open Access Policy – Our Preference for Gold

In this blog post I want to outline the reasons why ‘Gold’ is the preferred route for Open Access for the Research Councils.

Our overall philosophy on Open Access is underpinned by four key principles, first detailed in our 2005 position statement on Access to Research Outputs.  These principles are around Accessibility, Quality, Efficiency and Cost Effectiveness, and Long-term Preservation.

The first of our four key principles is that the ideas and knowledge derived from publicly-funded research must be made available and accessible for public use, interrogation and scrutiny, as widely, rapidly and effectively as practicable.  It is this principle which is at the heart of our preference for Gold.

The scope of this principle is around ‘public use’, which covers many more potential users than just those within the research community.  Many of these users, and I am aware that there will be exceptions to this, are not familiar with how research papers are produced and distributed, and will not understand the subtle differences between pre-prints, post-prints and the publisher’s version of a paper.  Our concern is to ensure that all users have access to the highest quality version of a paper, and for us the most effective way of doing that is for a user to have access to the published version on a journal web site.  If a user wants to read a paper from Nature, the best way to ensure they are reading the definitive version is to read the version available from the Nature web site.  Gold delivers this universal access to the published version of the paper.

For us ‘use’ means much more than just being able to read research papers – it means having the ability to re-use and exploit research papers in the widest possible sense – from text and data mining to advance new areas of research, to re-presenting collections of research papers in particular areas, to mashing together elements of research papers with other information to create new information products.  With maximal openness and accessibility comes maximal opportunity to exploit, and thus maximal opportunity for innovation.  And from innovation comes growth, and benefit to the UK as a whole.  Gold delivers this maximal openness and opportunity for innovation through the CC-BY licence which we require where we pay an APC.

‘Widely’ means that access to the research outputs we fund must not be limited to those who can afford to pay for subscriptions, or for copies of articles from a journal’s web site.  Hence they should be available without cost.  Gold delivers free access for all users.

‘Rapidly’ means that articles should be available as soon as they are published, or with a minimum delay.  Gold delivers immediate access on publication with no embargo period.

‘Effectively’ means that the systems used to provide access must be straightforward for all users, and should be scalable and sustainable.  In the long-term, Gold with payment of APCs will provide a scalable and sustainable solution to cover the costs of publishing, especially for the learned societies who are key members of the UK research community.  Gold is also straightforward for users – if you want a copy of a paper you go to a journal web site, rather than having to search a repository, and then possibly wait whilst you contact the author to request a copy.  It is also not clear to me how scalable a ‘request a copy’ or ‘Almost-OA’ function is for papers in high public demand.  An author might be happy to email copies to a few researchers, but what happens when they get tens, hundreds or thousands of requests from interested members of the public?

Basically, our preference for Gold can be summarised as follows: we want to make the outputs of the research we fund accessible at the highest quality to the widest number of people, to do the widest range of stuff with, with the fewest restrictions.  We consider that, at the current time, Gold with CC-BY direct from a journal’s web site provides the route for ensuring that the papers arising from the research we fund are accessible to the widest number of users to meet this preference.

Questions about our Open Access policy?  Please email openaccess@rcuk.ac.uk.

GtR – Dealing with the Data Challenges

At the outset of the GtR project, one of the major challenges we identified was establishing the scope of the data and the semantics of terms associated with it.

A little background to the challenge

In 2011, the seven Research Councils finished consolidating 5 grants transaction processing systems into a single shared solution. This has helped to reduce some of the differences between the format of council data, but not always the meaning of values held. Whilst many of the semantic differences are purely a labelling issue, some are more deeply entrenched in the ways councils work with their respective research communities. To this end, we are taking time to unpick and agree a common set of terms under which we can publish data.

Alongside these semantic equivalence challenges come issues over whether we are permitted to publish some of the data we hold. The GtR portal will be publishing data from various Research Council systems and these systems are often a result of previous projects to migrate legacy data. The terms under which the information was originally gathered vary on whether the data can be made public, although they generally permit the Research Councils to publish information for the furtherance of their missions. We are working to ensure that the information gathered for one purpose is suitable and fit for release and does not break any expectations of privacy or confidentiality.

The form and content of the data gathered for one purpose (e.g. to provide the basis of evidence of the impact of research) may not be suitable or indeed easily consumable for another purpose, or may be missing some linkages that would have been made if the aim was publication – e.g. we may not be able to link a publication to more than one person on a research project as we gather this as evidence of outcomes against a project. Over time, as the GtR develops, we would expect it to inform the way we gather data so that we can ensure it delivers value as both evidence of what we have funded and as a resource for those wanting to exploit it in either a public or commercial interest.

Why did we choose CERIF?

CERIF was chosen as the storage model for GtR data for two main reasons:

  • The common challenges of storing research and related data have already been discussed and documented by a wide group of skilled and knowledgeable people. Developing our own bespoke model to deal with the same challenges seemed wasted effort;
  • CERIF is highly regarded within the research community and we felt it was important to store the data in a consistent way to facilitate exploitation and information exchange with many universities, research organisations and consumers of research data.

Following our decision to adopt CERIF we invested time in understanding the model and also seeking the advice of Brigitte Joerg. Brigitte has been instrumental in helping us understand the CERIF model and how to populate it with data from our staging area.

The Cost of Adoption

Adopting CERIF brings a steep learning curve and also adds time. Most of this additional time has been associated with documenting and defining the semantics of every attribute and how it relates to others. This in-depth understanding, notwithstanding the challenge of getting seven bodies to agree, is critical to identifying where each attribute belongs in the CERIF model. If your data is already well defined then it will put you in a strong position. If, like most systems, it is not, then you will need to agree a set of terms and definitions with the business for every attribute that you want to map to CERIF before proceeding.

This issue most clearly manifests itself in populating the CERIF semantic layer with your vocabularies. Only when the vocabulary data has been added can you begin loading your data sets. For example:

Project A has a current status of Authorised. To be able to represent that information against a project in CERIF you need to populate the CERIF semantic layer with two key pieces of data: (a) the definition of project status, and (b) what the term ‘Authorised’ means. With this semantic reference data in place you can relate the project to the “Authorised” status. This has a lot of similarities with the open data RDF world, but CERIF contains its semantic information both in the table structure (the attributes of the tables) and in the contents of the semantic layer.
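The ordering constraint in this example (vocabulary first, data second) can be illustrated with a small sketch. It uses an in-memory SQLite database, and the table and column names (`class_scheme`, `class_term`, `project`, `project_class`) are hypothetical simplifications chosen for readability; they are not the actual CERIF schema or the GtR implementation. The definitions are placeholders for whatever wording is agreed with the business.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the references
conn.executescript("""
    -- Hypothetical, simplified stand-in for a semantic layer
    CREATE TABLE class_scheme (
        scheme_id  TEXT PRIMARY KEY,
        definition TEXT NOT NULL          -- (a) what "project status" means
    );
    CREATE TABLE class_term (
        class_id   TEXT PRIMARY KEY,
        scheme_id  TEXT NOT NULL REFERENCES class_scheme(scheme_id),
        term       TEXT NOT NULL,
        definition TEXT NOT NULL          -- (b) what "Authorised" means
    );
    CREATE TABLE project (
        project_id TEXT PRIMARY KEY,
        title      TEXT NOT NULL
    );
    -- Link table relating a project to a term from the vocabulary
    CREATE TABLE project_class (
        project_id TEXT NOT NULL REFERENCES project(project_id),
        class_id   TEXT NOT NULL REFERENCES class_term(class_id)
    );
""")

conn.execute("INSERT INTO project VALUES ('A', 'Project A')")

def link_status(project_id: str, class_id: str) -> None:
    """Relate a project to a status term that must already exist."""
    conn.execute("INSERT INTO project_class VALUES (?, ?)", (project_id, class_id))

# Attempting to load data before the vocabulary exists is rejected.
try:
    link_status("A", "status-authorised")
    linked_before_vocabulary = True
except sqlite3.IntegrityError:
    linked_before_vocabulary = False

# Populate the semantic layer: the scheme, then the term within it.
conn.execute(
    "INSERT INTO class_scheme VALUES (?, ?)",
    ("project-status", "(agreed definition of project status goes here)"),
)
conn.execute(
    "INSERT INTO class_term VALUES (?, ?, ?, ?)",
    ("status-authorised", "project-status", "Authorised",
     "(agreed definition of 'Authorised' goes here)"),
)

# With the reference data in place, the link succeeds.
link_status("A", "status-authorised")

status = conn.execute(
    """SELECT t.term FROM project_class pc
       JOIN class_term t ON t.class_id = pc.class_id
       WHERE pc.project_id = ?""",
    ("A",),
).fetchone()[0]
```

In this sketch the foreign keys do the policing: the first attempt to attach a status fails because no such term exists yet, which mirrors why vocabulary loading has to come before data loading.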

Another key point is understanding what to do with free text and date fields that the CERIF model hasn’t catered for. There are a couple of options for storing them. The simplest is to extend the existing tables by adding attributes, but this is definitely not supported by the CERIF task group. The more appropriate route would be to add additional tables containing the attributes. Of course, if you want the additional fields to have language variations, then you face a decision: adopt the same model used in CERIF, maintaining some logical consistency across the model and the extension, or adopt your own model. However you choose to resolve the issue of unsupported fields, you should try to maintain clean semantics, as it is tempting to put something in that solves a problem now and live with the consequences of this decision going forward.
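As a rough illustration of the "additional tables" route, the sketch below leaves a core table untouched and hangs the extra date and free-text fields off it in separate extension tables, with the free text keyed by language code. All names here (`project`, `project_ext`, `project_ext_note`, `review_date`, `note`) are invented for the example and are assumptions, not part of CERIF or of the GtR database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Core table: left exactly as the model defines it (simplified here)
    CREATE TABLE project (
        project_id TEXT PRIMARY KEY,
        title      TEXT NOT NULL
    );
    -- Extension table for a date field the core model does not cater for
    CREATE TABLE project_ext (
        project_id  TEXT PRIMARY KEY REFERENCES project(project_id),
        review_date TEXT                  -- hypothetical extra date, ISO 8601
    );
    -- Extension table for free text, one row per language variation
    CREATE TABLE project_ext_note (
        project_id TEXT NOT NULL REFERENCES project(project_id),
        lang_code  TEXT NOT NULL,
        note       TEXT NOT NULL,
        PRIMARY KEY (project_id, lang_code)
    );
""")

conn.execute("INSERT INTO project VALUES ('A', 'Project A')")
conn.execute("INSERT INTO project_ext VALUES ('A', '2012-11-01')")
conn.executemany(
    "INSERT INTO project_ext_note VALUES (?, ?, ?)",
    [("A", "en", "Example note"), ("A", "fr", "Note d'exemple")],
)

# The core table's columns are unchanged by the extension.
core_columns = [row[1] for row in conn.execute("PRAGMA table_info(project)")]

# Extra fields are reached with a join, selecting the language wanted.
note_en = conn.execute(
    """SELECT n.note FROM project p
       JOIN project_ext_note n ON n.project_id = p.project_id
       WHERE p.project_id = ? AND n.lang_code = ?""",
    ("A", "en"),
).fetchone()[0]
```

The trade-off is an extra join on every read in exchange for keeping the core tables identical to the published model, which makes later exchange with other CERIF users less error-prone.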

We have made good progress on the subset of data that we hope to launch in November and have managed to gain agreement for the vocabulary (although some definitions are still being hammered out). The use of CERIF has definitely been a journey of discovery so far and we hope to keep you updated with progress and more detail on how we have mapped the data into CERIF.

Note: If an attribute is added that could be of general interest to the community, we will notify the CERIF task group and they will consider it as an addition to the model.

RCUK Open Access Policy – When to go Green and When to go Gold

Yesterday I took part in the Imperial College Science Communication Forum event ‘discussing’ the new RCUK Policy on Access to Research Outputs with Stephen Curry (Imperial College) and Richard Van Noorden (Nature News) – though after two hours under the spotlight, for me it felt a little more like a ‘grilling’ than a discussion 😉  However, many thanks to the SciCommForum team for the invitation to present our policy in more detail and to have the opportunity to discuss issues around the interpretation and implementation of the policy.  One of the things I committed to do was to update the guidance to the policy to be very clear about the choices RCUK funded authors can make in terms of which routes they must use to make their research papers open access.  I want to use this blog post to reiterate the policy clarifications I gave at the SciCommForum event, and previously at the Open Access Publishers Association Meeting.

Our policy requires that peer reviewed research papers which result from research that is wholly or partially funded by the Research Councils must be published in journals which are compliant with Research Council policy on Open Access.

A journal is compliant with our policy if it provides Gold OA using the CC-BY licence, and RCUK will provide funds to institutions to cover payment of APCs.  However, if a journal is not prepared to offer a Gold CC-BY option, it can achieve compliance by offering a specific Green option which must meet the following requirements.  It must allow, at a minimum, the accepted manuscript with all changes resulting from peer-review, to be deposited in a repository without restrictions on non-commercial re-use and with a maximum embargo period of 6 months.  For a limited transition period the maximum embargo period is extended to 12 months for papers arising from research funded by the AHRC and the ESRC.  This is in recognition that journals in these areas are not yet as well placed to move to an OA model.

So what does this mean for authors?  If the journal they want to publish in only offers policy compliance through a Gold route, they must use that journal’s Gold option.  If the journal only offers compliance through the Green route, the author must ensure that a copy of the post-print is deposited in an appropriate repository – for example, UKPMC for papers arising from MRC funded research.  If the journal offers both a Gold and a Green route to compliance (and some journals already do this), it is up to the author and their institution to decide on the most appropriate route to use.  And, if a journal offers neither a Green nor a Gold compliant route, it is not eligible to take RCUK funded work, and the author must use a different, compliant, journal.
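The compliance rules and author choices described above can be summarised as a small decision function. This is only a sketch of the logic as stated in this post, not an official compliance checker; the class, field and function names are invented for illustration, and the 12-month limit applies only during the limited transition period mentioned above.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Councils with the extended (transitional) embargo limit, per the text above
EXTENDED_EMBARGO_FUNDERS = {"AHRC", "ESRC"}


@dataclass
class Journal:
    """Hypothetical description of what a journal offers."""
    offers_gold_cc_by: bool
    # Embargo on depositing the accepted manuscript; None = no Green option
    green_embargo_months: Optional[int] = None
    # Deposit must carry no restrictions on non-commercial re-use
    green_allows_noncommercial_reuse: bool = False


def max_embargo_months(funder: str) -> int:
    """Maximum permitted Green embargo for a given Research Council."""
    return 12 if funder in EXTENDED_EMBARGO_FUNDERS else 6


def compliant_routes(journal: Journal, funder: str) -> Set[str]:
    """Return the policy-compliant routes this journal offers."""
    routes = set()
    if journal.offers_gold_cc_by:
        routes.add("gold")
    if (
        journal.green_embargo_months is not None
        and journal.green_allows_noncommercial_reuse
        and journal.green_embargo_months <= max_embargo_months(funder)
    ):
        routes.add("green")
    return routes


def author_action(journal: Journal, funder: str) -> str:
    """Map the available routes to what the author must do."""
    routes = compliant_routes(journal, funder)
    if routes == {"gold"}:
        return "use the journal's Gold option"
    if routes == {"green"}:
        return "deposit the post-print in an appropriate repository"
    if routes == {"gold", "green"}:
        return "author and institution choose between Gold and Green"
    return "use a different, compliant journal"


# A journal with a 12-month embargo and no Gold option
slow_green = Journal(offers_gold_cc_by=False,
                     green_embargo_months=12,
                     green_allows_noncommercial_reuse=True)
mrc_result = author_action(slow_green, "MRC")
esrc_result = author_action(slow_green, "ESRC")
```

In the usage example, the same journal is not compliant for MRC-funded work (the 6-month limit applies) but is compliant via Green for ESRC-funded work during the transition period.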

The Research Councils are not anti-Green and support a dual approach for delivering OA.  However, we do have a strong preference for Gold, and I will explain why in my next blog post.  And, where there is a choice between compliant-Green and compliant-Gold – either through a journal offering both routes to compliance, or through using different journals offering different compliance routes – it is up to authors and their institutions to work together to make the choice as to which option to use.

Questions about our Open Access policy?  Please email openaccess@rcuk.ac.uk.

An audio recording of the Imperial College discussion is available on FigShare.

RCUK celebrated five years of UK-China research success

By Dr Alicia Greated, Director RCUK China.

This week over 100 UK and Chinese delegates attended the event ‘RCUK China – Five Years and Beyond’ in Beijing, to celebrate the fifth anniversary of RCUK China. Delegates heard about current successful RCUK-China collaborations and contributed to discussions to scope future areas for collaborative RCUK-China activity. A UK delegation of 23 key research figures, including representatives from all seven UK Research Councils and leading academics, travelled to China to take part in the event.  Representatives from a range of Chinese funding organisations, leading Chinese research institutions, and UK organisations based in China also actively contributed to the day’s discussions.

Major new investments were announced which will build on RCUK China’s success and ensure that the UK and China remain at the cutting edge of science and innovation.

The British Ambassador to China, Sebastian Wood, announced that leading UK energy scientists have received £4million in funding from the Engineering and Physical Sciences Research Council (EPSRC), with matched resources from the NSFC, to work in partnership with researchers in China to develop better smart grid technology and to help both countries reduce their carbon footprint. In addition, a new multi-million pound joint collaboration between RCUK and the NSFC in smart grids and the integration of electric vehicles was announced by Professor Rick Rylance (Chair of RCUK) and Professor Ding (NSFC).

This brings the RCUK commitment to UK-China energy research alone to over £24million, with matched funding from Chinese partners.

David Willetts, Minister of State for Universities and Science, said: “The UK’s relationship with China is extremely valuable in driving research and innovation. By co-operating in this way, both countries can enjoy more of the benefits that high-quality scientific research brings, including economic growth and a better quality of life. The new investments announced today will help to ensure that the partnership between our two countries goes from strength to strength.”

Since its launch in 2007, RCUK China has supported a range of activities to promote UK-China research collaborations and the team has also helped develop a significant multi-million pound programme of joint funding activities with key research funders in China in areas including healthcare, social sciences, food security and energy. Professor Paul Boyle, RCUK’s International Champion, said: “International cooperation is a fundamental part of enhancing and stimulating the research we support in the UK, and China is a highly valued and important partner for us. RCUK China was the first overseas RCUK office and we are very excited about the great opportunities for building partnerships with our Chinese colleagues over the next five years.”

The benefits of Open Access

Astrid Wissenburg, Deputy Chair of RCUK Impact Group and RCUK representative on the Finch Group, and Mark Thorley, Chair of the RCUK Research Outputs Network, explain why open access is so high up the agenda for Research Councils.

Just over a month ago Research Councils UK launched a new Open Access policy. One of the key drivers for making published journal articles freely available through open access mechanisms is the potential it offers to the research community (and beyond) to mash, mine and mix information and knowledge.  This provides real opportunities to substantially further the progress of research and innovation.

Professor Douglas Kell, RCUK Champion for Research and Information Management and CEO of the BBSRC, is well known for arguing the importance of open access to undertake exciting and groundbreaking research through text and data mining. His blog gives many examples such as genome-based metabolic network reconstruction, text mining for systems biology, and pulling together disparate literatures and synthesising inductive knowledge in pharmacokinetics, medicine and toxicology.

Beyond the Research Councils, Professor Peter Murray-Rust, in his manifesto on Open Mining of Scholarship, notes that the lack of support for text mining stifles the imagination of the wider community and can lead to bad policy decisions through the lack of full use of scientific literature. The Value and Benefits of Text Mining report, commissioned by JISC, highlights that one of the barriers to overcome is providing unrestricted access to information sources.

It is this need for unrestricted access, allowing full use and re-use, which is one of the reasons why the Research Councils, along with the Wellcome Trust, are advocating the use of a Creative Commons ‘Attribution’ licence (CC-BY). The CC-BY licence allows others to modify, build upon and/or distribute the licensed work, including for commercial purposes, as long as the original author is credited.  Crucially, CC-BY licensed works can be deposited in repositories with no further restrictions on access or re-use. Combine this with requiring immediate access where this is possible, if necessary through paying an open access fee, and we have some of the critical building blocks to fundamentally speed up the scientific and research process.

Murray-Rust also notes that text mining is a major tool in data review, and the important role it plays in validating science. A key requirement of the new RCUK policy is that peer reviewed research papers resulting from Council funded research must include a statement on how the underlying research materials such as data, samples or models can be accessed. This requirement has been included with the specific aim of making the work funded by the Research Councils more open, and so more accountable, both to other scientists and to the wider public.  This supports recommendations made in the recent Royal Society report on Science as an Open Enterprise to improve the conduct of science, respond to changing public expectations and political culture and to enable researchers to maximise the impact of their research.

Whilst the requirement for a statement does not imply that the supporting data etc must always be Open Access, researchers must be clear about what supporting information can be made available, and how this can be accessed.  Researchers will also need to be equally clear about what it is not possible to make available including the reasons why.  For example, it is often not possible to make data relating to human subjects openly available because of issues relating to consent and confidentiality.

Implementing this requirement will be the responsibility of both researchers and their host institutions.  Researchers will need to think about openness as they plan and undertake research.  Institutions will need to develop an open data culture, and the necessary infrastructure and skills to support this. 

Institutional and subject repositories are expected to form a key element of that infrastructure by providing a secure and accessible home for the data, models and other information underlying a research paper.  They will not be suitable for all material, for example physical samples; however, they can provide a primary repository for a lot of the material and, by holding copies of the associated papers, provide the linkages between the paper and the underlying materials. This is also one of the recommendations of the Finch report Accessibility, sustainability, excellence: how to expand access to research publications.  By doing so, institutional and subject repositories containing ‘green’ and ‘gold’ materials can be an essential facilitator of text and data mining.  By supporting both gold and green open access, the Research Councils ensure further opportunities for repositories to develop this role.

Launching the new policy is not an end to the work that the Research Councils have been engaged in since launching their first joint statement on open access in 2005.  We are, in conversation with researchers and institutions, in the process of developing the operational details of the policy and will share the details as quickly as they become available. This is a fast moving area of research policy which, as major funders of research in the UK, we have a duty to ensure provides the best possible opportunity to the UK research base.