Project 8: Characterising and classifying the images contained within case reports

Description

This project proposes to explore the classification of images and figures in the biomedical literature, focussing on those contained within case reports. Case reports are a type of publication prevalent in the medical and veterinary literature that describe one, or sometimes two, cases, giving a description of the patient(s), symptoms, diagnoses, treatments, follow-up and outcomes. Case reports may contain a variety of different kinds of figure, including photographs of patient presentation, medical imaging, micrographs, plots, and diagrams. Part of the project will involve broadly analysing and categorising the figures present in this type of literature. We will use the corpus of case reports and figures available in Europe PMC, an ELIXIR core resource. This corpus includes the figures (in image formats), their captions, and the article text (in XML/JSON format), allowing the context around in-text mentions of each figure to be used as part of classification.
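
As a rough illustration of how caption and mention context might be gathered for classification, the sketch below parses one article's full-text XML with Python's standard `xml.etree.ElementTree`. It assumes JATS-style markup, with each figure as a `<fig>` element (carrying an `id` attribute and a `<caption>`) and each in-text citation as an `<xref ref-type="fig">` whose `rid` attribute points at that id. The `FigureContext` class and `figure_contexts` function are hypothetical names, and the inline `SAMPLE` article is invented for demonstration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class FigureContext:
    """Caption text plus the body paragraphs that cite one figure."""
    caption: str
    mentions: list[str] = field(default_factory=list)


def _text(elem: ET.Element) -> str:
    # Flatten all nested text and collapse runs of whitespace.
    return " ".join("".join(elem.itertext()).split())


def figure_contexts(jats_xml: str) -> dict[str, FigureContext]:
    """Map each <fig> id to its caption and citing paragraphs (JATS-style XML assumed)."""
    root = ET.fromstring(jats_xml)
    figures: dict[str, FigureContext] = {}
    caption_paras: set[int] = set()
    for fig in root.iter("fig"):
        # Remember paragraphs inside figures so captions are not counted as mentions.
        caption_paras.update(id(p) for p in fig.iter("p"))
        fig_id = fig.get("id")
        if fig_id is None:
            continue
        caption = fig.find("caption")
        figures[fig_id] = FigureContext(_text(caption) if caption is not None else "")
    for para in root.iter("p"):
        if id(para) in caption_paras:
            continue
        text = _text(para)
        for xref in para.iter("xref"):
            if xref.get("ref-type") != "fig":
                continue
            # rid may list several space-separated targets, e.g. "f1 f2".
            for rid in (xref.get("rid") or "").split():
                ctx = figures.get(rid)
                if ctx is not None and text not in ctx.mentions:
                    ctx.mentions.append(text)
    return figures


# Invented example article, for demonstration only.
SAMPLE = """<article><body><sec>
<p>An MRI was performed (<xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
<fig id="f1"><label>Figure 1</label><caption><p>Sagittal T2-weighted MRI of the cervical spine.</p></caption></fig>
</sec></body></article>"""

if __name__ == "__main__":
    for fig_id, ctx in figure_contexts(SAMPLE).items():
        print(fig_id, ctx.caption, ctx.mentions)
```

Each figure's caption and citing paragraphs could then be passed, alongside the image itself, to whichever classifier the project settles on. One known simplification: a citation inside a nested paragraph (for example, a list item within a `<p>`) is recorded for both the inner and the enclosing paragraph.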

Europe PMC is a digital archive for life sciences and biomedical research literature, providing access to an extensive collection of approximately 43 million scientific articles, of which 9.2 million are full-text. Europe PMC serves a broad spectrum of community groups, including researchers, biological database curators, funders, developers, data scientists, text miners and the interested public. It facilitates efficient searching through content enrichments powered by text mining and data analysis tools. All article metadata and added-value information are made available via open APIs, providing flexibility in accessing data for large-scale analyses.
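
To illustrate how the open APIs could be used to assemble the corpus, here is a minimal sketch against the Europe PMC RESTful web service using only the Python standard library. The `search` and `fullTextXML` endpoints and the `query`, `format`, `pageSize` and `cursorMark` parameters belong to that service, but the query string in `CASE_REPORT_QUERY` (in particular the `PUB_TYPE` value used to select case reports) is an assumption to check against Europe PMC's search syntax reference, the JSON response shape is assumed, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import quote, urlencode

BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest"

# Assumption: verify the PUB_TYPE value for case reports in Europe PMC's
# search syntax documentation before relying on this query.
CASE_REPORT_QUERY = 'PUB_TYPE:"Case Reports" AND OPEN_ACCESS:Y AND HAS_FT:Y'


def search_url(query: str = CASE_REPORT_QUERY, cursor_mark: str = "*",
               page_size: int = 100) -> str:
    """Build the URL for one page of a cursor-paginated JSON search."""
    params = {"query": query, "format": "json",
              "pageSize": page_size, "cursorMark": cursor_mark}
    return f"{BASE}/search?{urlencode(params)}"


def full_text_xml_url(pmcid: str) -> str:
    """URL of the full-text XML (including figure captions) for one PMCID."""
    return f"{BASE}/{quote(pmcid)}/fullTextXML"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    # Network call; not exercised when a stub fetch function is supplied.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def iter_pmcids(query: str = CASE_REPORT_QUERY, max_pages: int = 10,
                fetch: Callable[[str], dict] = fetch_json) -> Iterator[str]:
    """Yield PMCIDs of matching articles, following nextCursorMark between pages."""
    cursor = "*"
    for _ in range(max_pages):
        page = fetch(search_url(query, cursor))
        # Assumed response shape: {"nextCursorMark": ..., "resultList": {"result": [...]}}
        for hit in page.get("resultList", {}).get("result", []):
            if "pmcid" in hit:
                yield hit["pmcid"]
        next_cursor = page.get("nextCursorMark")
        # Stop when the cursor stops advancing (the last page has been read).
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

For example, `for pmcid in iter_pmcids(max_pages=1): print(full_text_xml_url(pmcid))` would list full-text XML URLs for the first page of results (this needs network access). Taking `fetch` as a parameter keeps the pagination logic testable without calling the live service.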

PubMed and PubMed Central are widely used in the biomedical literature domain, but they focus primarily on text and do not currently support searching for images or related content. Open access publishers, such as PLOS, Frontiers and eLife, have not yet explored this area either. Google Scholar also favours text over figures or tables. Preprint platforms offer fast PDF uploads of early research findings, but only bioRxiv and medRxiv generate full-text XML; the XML they produce is limited, and their search tools lack image-based search features. Furthermore, existing image classification resources do not meet the specific demands of the biomedical community. Despite its vast collection of annotated images, ImageNet (https://www.image-net.org/) centres mainly on generic image classification, whereas scientific research demands more nuanced classifications that its current format does not cater to.

As an outcome of this project, we would like to develop a proof-of-concept search engine for case report figures. This will allow users to perform queries such as finding MRIs for patients with cervical subluxation, or micrographs of Reed-Sternberg cells. If successful, this could form the basis for a future feature of Europe PMC, thereby increasing its utility to the medical community.
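
A proof of concept could start far simpler than a production search engine. The toy sketch below assumes each figure already carries a category label from some classifier (hypothetical here) and indexes caption words in an inverted index, so a query like the MRI example becomes a category filter (`"medical imaging"`) plus caption terms. The `Figure` and `FigureIndex` names and the category labels are invented for illustration; real queries would also need stemming, synonym handling (for example, MRI versus magnetic resonance imaging) and relevance ranking.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Figure:
    fig_id: str    # hypothetical identifier, e.g. "PMCID/figure id"
    category: str  # label assumed to come from a figure classifier
    caption: str


def tokenize(text: str) -> list[str]:
    # Lower-case alphanumeric runs; "Reed-Sternberg" -> ["reed", "sternberg"].
    return re.findall(r"[a-z0-9]+", text.lower())


class FigureIndex:
    """Inverted index from caption tokens to figure ids."""

    def __init__(self, figures: Iterable[Figure]):
        figures = list(figures)
        self._by_id = {f.fig_id: f for f in figures}
        self._postings: dict[str, set[str]] = defaultdict(set)
        for f in figures:
            for token in tokenize(f.caption):
                self._postings[token].add(f.fig_id)

    def search(self, text: str, category: Optional[str] = None) -> list[str]:
        """Ids of figures whose captions contain every query token (AND semantics)."""
        terms = tokenize(text)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        # No relevance ranking: matches are simply returned in sorted id order.
        return sorted(fid for fid in hits
                      if category is None or self._by_id[fid].category == category)
```

With figures already labelled, `index.search("Reed-Sternberg cells", category="micrograph")` would return the ids of micrographs whose captions mention those cells.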

Leads

Matt Jeffryes, Tim Beck