CD2H Phase 2 Proposal
Project Title: A computable representation of contributions
Kristi Holmes, firstname.lastname@example.org, NU
Dave Eichmann, email@example.com, Iowa
Representing individual contributions to translational science is important for providing credit, as well as understanding the expertise of our workforce.
This project builds on Phase 1 attribution work, which focused on inventorying contribution roles and building the first version of the contribution ontology, and on previous work at Iowa on extracting semantics from the acknowledgements sections of papers from PubMed Central Open Access.
There has been a fundamental shift that recognizes both the interdisciplinary, team-based approach to science and the more fine-grained characterization and contextualization of the hundreds and thousands of contributions of varying types and intensities that are necessary to move science forward. Unfortunately, little infrastructure exists to identify, aggregate, present, and (ultimately) assess the impact of these contributions. These significant problems are technical as well as social and require an approach that assimilates cultural and social aspects of these problems in an open and community-driven manner. Here we are developing a contribution role ontology to support modeling of the multiple additional ways in which the translational workforce contributes to research. This effort also includes mining the acknowledgements sections of publications to harvest existing contributor roles, both to serve as “ground truth” and to demonstrate that populating the ontology with actual data is successful and drives additional development.
Work in support of this problem statement will produce a Contribution Role Ontology to represent the types of contributions that a person makes, whether at a micro or macro scale, and will investigate the use of annotation files to better represent the types of digital products created in the research workflow. The two components together will support “roll up” or transitivity of contributions.
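As a minimal sketch of what “roll up” could mean in practice, the following assumes that each contribution is a (person, role, product) record and that digital products are linked by a part-of relation. All names, roles, and identifiers below are hypothetical illustrations, not drawn from the ontology itself.

```python
from collections import defaultdict

# Hypothetical contribution records: (person, role, product).
contributions = [
    ("alice", "data curation", "dataset-1"),
    ("bob", "software", "script-1"),
    ("carol", "writing", "paper-1"),
]

# Hypothetical part-of links between products: child -> list of parents.
part_of = {
    "script-1": ["dataset-1"],
    "dataset-1": ["paper-1"],
}


def ancestors(product, part_of):
    """Return every product reachable from `product` by following part-of links."""
    seen = set()
    stack = [product]
    while stack:
        current = stack.pop()
        for parent in part_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def roll_up(contributions, part_of):
    """Credit each contribution to its product and, transitively, to every containing product."""
    rolled = defaultdict(set)
    for person, role, product in contributions:
        rolled[product].add((person, role))
        for parent in ancestors(product, part_of):
            rolled[parent].add((person, role))
    return dict(rolled)


rolled = roll_up(contributions, part_of)
```

Under these assumptions, the software contribution to `script-1` also appears on `dataset-1` and `paper-1`, while `script-1` itself carries only its direct contribution.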
Work completed to date includes piloting an early version of a contribution ontology in the OpenVIVO platform and hosting preliminary community engagement and stakeholder relationship-building events to define requirements. We have released an ontology of the existing CRediT taxonomy to support extension and interoperability, started recruiting collaboration partners and stakeholders, explored the concept of annotation files to support machine-operable data, and cross-mapped several digital object type schemas from research information systems. We have also established our teams and project workflows.
In parallel, we will adapt the existing relational database model for acknowledgements to the current version of the ontology. This will include reconfiguring the extraction engine to store ontology-compliant triples, followed by iterative cycles of:
- Consulting the community on the utility of extracted data and where next to focus attention
- Extending the set of extraction rules
- Mapping new data into the ontology & triplestore
- Assessing the ontology for gaps
- Extending the ontology as necessary to provide modeling coverage
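The mapping and gap-assessment steps of this cycle can be sketched as follows. This is an illustration under stated assumptions only: the row layout, the `http://example.org/` namespace, the predicate names, and the placeholder identifier are all hypothetical and do not reflect the actual acknowledgements schema or the ontology's IRIs.

```python
# Hypothetical namespace; the real ontology IRIs are not given in this proposal.
EX = "http://example.org/"


def slug(text):
    """Turn a free-text label into a simple IRI-safe fragment."""
    return text.strip().lower().replace(" ", "-")


def row_to_triples(row):
    """Map one extracted acknowledgement row (a dict) to subject-predicate-object triples."""
    ack = EX + "ack/" + str(row["ack_id"])
    return [
        (ack, EX + "contributor", EX + "person/" + slug(row["person"])),
        (ack, EX + "hasRole", EX + "role/" + slug(row["role"])),
        (ack, EX + "contributionTo", EX + "paper/" + row["paper_id"]),
    ]


def unmapped_roles(rows, known_roles):
    """Gap assessment: roles seen in the extracted data that the ontology does not yet cover."""
    known = {slug(r) for r in known_roles}
    return sorted({slug(r["role"]) for r in rows} - known)


# Hypothetical extracted rows and a hypothetical set of roles already modeled.
rows = [
    {"ack_id": 1, "person": "Jane Doe", "role": "Statistical Analysis", "paper_id": "PAPER-0001"},
    {"ack_id": 2, "person": "John Roe", "role": "Specimen Collection", "paper_id": "PAPER-0001"},
]
known_roles = ["Statistical Analysis"]

triples = [t for row in rows for t in row_to_triples(row)]
gaps = unmapped_roles(rows, known_roles)
```

In this sketch, `gaps` lists the roles that would prompt an ontology extension in the next iteration, while `triples` holds the statements that would be loaded into the triplestore.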
Improved representation of roles and outputs in systems will enable better recognition and crediting of work and improve our ability to make more meaningful connections between people, their roles and work, the outputs, and the resulting outcomes and impacts. The combined work of the two projects will greatly enhance the CTSA program’s ability to recognize a broad spectrum of contributions and provide the program with concrete data regarding those contributions in a form directly usable by the hubs.
Expected outputs (6 months):
- Contribution Role Ontology released & enhanced
- CRO ready to pilot in research information systems
- Better understanding of how to address research output types and versioning of objects in the context of unique identifiers
- Attribution workshop and community building
- Local guide to support attribution in CTS at the hub level
- A large knowledge base of contribution / attribution data available for use by CTSA hubs