FAST DATA SCIENCE LTD

Example of how the named entity recognition software can recognise and resolve molecule names.

Identifying molecules in clinical literature

Aspirin is still a Bayer trademark in some countries. In a scientific paper the same drug could appear as acetylsalicylic acid, 2-acetoxybenzenecarboxylic acid, C9H8O4, or under one of a number of database identifiers such as DB00945. Some identifiers could also refer to other molecules, and some refer to only one version of a molecule.

Because of these ambiguities, identifying the names of proteins, genes and molecules in scientific literature is fraught with difficulty.
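
The many-names-for-one-molecule problem described above can be illustrated with a minimal dictionary lookup in Python. This is only a sketch under stated assumptions, not Fast Data Science's software: the table layout and the function name find_molecules are hypothetical, and only the aspirin names and the identifier DB00945 come from the text above. Mapping the formula C9H8O4 to a single identifier is itself a simplification, since a molecular formula can be shared by different compounds.

```python
import re

# Hypothetical synonym table: each lower-cased surface form maps to one
# canonical identifier. Only the aspirin entries come from the text above.
SYNONYMS = {
    "aspirin": "DB00945",
    "acetylsalicylic acid": "DB00945",
    "2-acetoxybenzenecarboxylic acid": "DB00945",
    "c9h8o4": "DB00945",  # simplification: a formula is not unique to one compound
    "db00945": "DB00945",
}

# Try longer names first, and refuse matches that sit inside a longer
# word or hyphenated chemical name.
_NAMES = sorted(SYNONYMS, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(n) for n in _NAMES) + r")(?![\w-])",
    re.IGNORECASE,
)


def find_molecules(text):
    """Return (matched_text, start, end, canonical_id) for each known name."""
    return [
        (m.group(1), m.start(1), m.end(1), SYNONYMS[m.group(1).lower()])
        for m in _PATTERN.finditer(text)
    ]


sentence = "Patients received acetylsalicylic acid (C9H8O4); aspirin was well tolerated."
for surface, start, end, canonical in find_molecules(sentence):
    print(f"{surface!r} at {start}-{end} resolves to {canonical}")
```

A fixed table like this only finds the names it already contains and cannot tell apart two meanings of the same string, which is where the difficulty lies.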

Fast Data Science has developed several tried-and-tested techniques to disambiguate these terms. We usually start with a number of annotated examples, then train a machine learning model on them so that it can annotate new publications as they come in.
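
The train-then-annotate workflow described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the text does not say which model is used, so a small naive Bayes classifier stands in for it; the six training sentences are invented toy data; and the abbreviation "ASA" (which can mean acetylsalicylic acid or something unrelated, such as the ASA physical status score) was chosen as the ambiguous term.

```python
import math
from collections import Counter, defaultdict

# Invented toy annotations: a sentence containing "ASA" and a label saying
# whether it refers to the molecule or to something else.
TRAIN = [
    ("patients took ASA 100 mg daily to prevent stroke", "molecule"),
    ("low dose ASA reduced platelet aggregation", "molecule"),
    ("ASA 75 mg was given with clopidogrel", "molecule"),
    ("ASA physical status class II before surgery", "other"),
    ("anesthesia risk was graded ASA class III", "other"),
    ("the ASA score was recorded before surgery", "other"),
]


def context_words(sentence):
    """Lower-cased words around the ambiguous term (the term itself is dropped)."""
    return [w.lower() for w in sentence.split() if w.lower() != "asa"]


def train(examples):
    """Count context words per label; this is the whole 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for sentence, label in examples:
        label_counts[label] += 1
        word_counts[label].update(context_words(sentence))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, sentence):
    """Return the label with the highest naive Bayes log-score (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / total)
        for w in context_words(sentence):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "ASA 100 mg daily after stroke"))
print(predict(model, "ASA class II before surgery"))
```

A production system would need far more annotated data and a stronger model, but the shape is the same: annotated examples go in, and a function that labels unseen text comes out.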

https://fastdatascience.com/finding-molecules-and-proteins-in-scientific-literature/

Company details

Industry:
Healthcare and medical