The GtR technical approach

Paul Chitson, BBSRC Head of Information Services

The GtR technical team have started the process of pulling together data from various Research Council funded projects and building a technology platform through which to make it available to a wide variety of audiences.

The project is being managed and delivered within an Agile framework, and is perhaps the first cross-council project in the Research Councils to adopt this approach. This means we will not only be trying to hit key milestones but also aiming to release as much information and as many early previews as we can. This blog post is the first from the technical team and, we hope, sets a precedent for how we wish to continue.

The process began with gathering user stories from potential audiences. These have since helped us understand the content that is of interest, the ways in which people may want to interact with the data and, of equal importance, what people do not want.

The requirements gathering phase has informed our initial thoughts on the solution architecture and how the data will be stored.

The data will be extracted from various sources, transformed and then stored in two forms. The first will be a relational database using the CERIF data model. Initially, the intention was only to provide an interface that would output data in a CERIF format, but following a workshop with Keith Jeffery (President of euroCRIS) we felt that we could also use CERIF as the internal storage model. The second will be a triple store: we also have a commitment to provide data in a linked data format (RDF) via a SPARQL interface, so we have chosen to store the data in both forms (rather than using an RDF wrapper) and benefit from a SPARQL interface out of the box. We will, wherever possible, use existing open source technologies and integrate rather than build from scratch. Any code or customisations we make will be made freely available.
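As a rough illustration of the "extract, transform, store in two forms" idea, here is a minimal Python sketch. It is not the GtR implementation: the source records, the `project` table (a simplified stand-in, not the real CERIF schema), the `example.org` namespace and all function names are assumptions made for the example. It writes the same cleaned records to a relational table and renders them as N-Triples lines that could be loaded into a triple store.

```python
import sqlite3

# Made-up source records; a real extract would read from council systems.
SOURCE_RECORDS = [
    {"id": "P001", "title": "  Wheat genomics ", "funder": "BBSRC"},
    {"id": "P002", "title": "Ocean modelling", "funder": "NERC"},
]

BASE_URI = "http://example.org/project/"  # placeholder namespace (assumption)
TITLE_PREDICATE = "http://purl.org/dc/terms/title"  # Dublin Core "title" term


def transform(record):
    """Normalise one raw record (here: just trim surrounding whitespace)."""
    return {key: value.strip() for key, value in record.items()}


def store_relational(conn, records):
    """Write records to a simplified table (NOT the real CERIF schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS project "
        "(id TEXT PRIMARY KEY, title TEXT, funder TEXT)"
    )
    conn.executemany(
        "INSERT INTO project VALUES (:id, :title, :funder)", records
    )
    conn.commit()


def to_ntriples(records):
    """Render the same records as N-Triples lines for a triple store."""
    lines = []
    for record in records:
        # Escape backslashes and double quotes inside the literal.
        title = record["title"].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(
            f'<{BASE_URI}{record["id"]}> <{TITLE_PREDICATE}> "{title}" .'
        )
    return lines


def run_pipeline(conn, source=SOURCE_RECORDS):
    """Extract (from the in-memory list), transform, then store both ways."""
    records = [transform(record) for record in source]
    store_relational(conn, records)
    return to_ntriples(records)
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` fills the table and returns one triple per project. Keeping the transform step separate from both writers means the relational and RDF copies are produced from identical cleaned records.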

GtR technology at a glance:

  • Data storage model will be CERIF. This ensures compatibility with many universities, research organisations and consumers of research data;
  • A user portal with full-text and faceted search (Solr);
  • REST-based interface providing rich information in both human- and machine-readable forms;
  • SPARQL interface for querying a triple store (initially using the Jena software stack);
  • Web services providing data in OAI-PMH and CERIF formats.
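To give a feel for how a client might talk to two of these interfaces, here is a small Python sketch that only builds request URLs; it does not contact any server. The endpoint addresses and the `funder` facet field are placeholders, since the real GtR URLs and index fields are not given here. The query parameters themselves (`q`, `facet`, `facet.field` and `wt` for Solr, and `query` for a SPARQL endpoint) are the standard ones.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoints (assumptions); the real GtR URLs are not stated.
SOLR_SELECT_URL = "http://example.org/solr/projects/select"
SPARQL_URL = "http://example.org/sparql"

# A simple SPARQL query listing up to ten resources with a title.
EXAMPLE_QUERY = """
SELECT ?project ?title WHERE {
  ?project <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""


def solr_faceted_search_url(text, facet_field):
    """Build a Solr URL for a full-text search with one facet field."""
    params = {
        "q": text,                   # full-text query
        "facet": "true",             # switch faceting on
        "facet.field": facet_field,  # field to count values for
        "wt": "json",                # ask for a JSON response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def sparql_query_url(query):
    """Build a SPARQL protocol GET URL carrying the query text."""
    return SPARQL_URL + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(solr_faceted_search_url("wheat genomics", "funder"))
    print(sparql_query_url(EXAMPLE_QUERY))
```

Building the URLs with `urlencode` rather than by string concatenation keeps spaces, braces and angle brackets in the query correctly escaped.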

We are at the start of an exciting journey and invite you to add your input along the way.