Pfizer and SciBite collaborate to pioneer a new machine learning approach to document classification
For large pharmaceutical enterprises, knowledge transfer is crucial to successfully integrating external research projects and commercial acquisitions. However, cataloging the myriad free-text documents these bring in and integrating them with internal data management systems is a challenge. To help pharmaceutical enterprises overcome it, Joe Mullen, Senior Informatics Scientist at SciBite, and Steve Pen, Medicinal Sciences Information Strategy Lead at Pfizer, have jointly pioneered a new approach that uses advanced machine learning and natural language processing techniques to integrate these documents with internal systems. This will help enterprises derive maximum benefit from external research projects and strategic acquisitions.
Conventionally, integrating regulatory documents into an enterprise is a two-step process: (1) extracting the key metadata (such as title, study compound, and study species) and (2) aligning each document to a hierarchy (such as the eCTD M4 hierarchy). Doing this manually is cumbersome and time-consuming, and because multiple full-time employees are assigned to the task, the results can be inconsistent. The task is further complicated because the format of these regulatory documents varies from one enterprise to another.
To address this problem, Pfizer and SciBite collaborated to develop ClassifR, a web-based tool that combines SciBite's named entity recognition (NER) platform with novel machine learning approaches to automate both metadata extraction and the alignment of incoming documents to a user-defined hierarchy, such as the M4 hierarchy. ClassifR also provides a web user interface (WUI) and a RESTful API for systematic access and integration with other tools.
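To make the two-step process concrete, here is a minimal sketch of metadata extraction followed by hierarchy alignment. It is not ClassifR's implementation, and the article does not describe which models ClassifR uses. A small species dictionary and a regular expression for compound codes stand in for an NER platform, and a nearest-centroid classifier over word counts stands in for the machine learning step. The species dictionary, the compound-code pattern, the example training texts, and all function names are hypothetical. The three node labels are an illustrative subset of eCTD Module 4 headings.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical species dictionary standing in for an NER vocabulary.
SPECIES_TERMS = {"rat": "rat", "rats": "rat", "mouse": "mouse", "mice": "mouse",
                 "dog": "dog", "dogs": "dog", "monkey": "monkey", "monkeys": "monkey"}

# Hypothetical compound-code pattern (e.g. "ABC-1234"); real codes vary by company.
COMPOUND_PATTERN = re.compile(r"\b[A-Z]{2,4}-\d{3,8}\b")

def tokens(text):
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())

def extract_metadata(text):
    """Step 1: pull title, study compound and study species from free text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0] if lines else ""  # assume the first non-empty line is the title
    compounds = sorted(set(COMPOUND_PATTERN.findall(text)))
    species = sorted({SPECIES_TERMS[t] for t in tokens(text) if t in SPECIES_TERMS})
    return {"title": title, "compound": compounds, "species": species}

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(examples):
    """Sum word counts of the labelled examples for each hierarchy node.

    Cosine similarity ignores vector length, so a sum points the same way as a mean.
    """
    return {node: sum((Counter(tokens(doc)) for doc in docs), Counter())
            for node, docs in examples.items()}

def classify(text, centroids):
    """Step 2: align a document to the most similar hierarchy node."""
    vec = Counter(tokens(text))
    return max(centroids, key=lambda node: cosine(vec, centroids[node]))

# Hypothetical labelled examples for three Module 4 nodes.
EXAMPLES = {
    "4.2.1 Pharmacology": ["receptor binding assay efficacy pharmacodynamic response",
                           "in vitro potency target engagement pharmacodynamics"],
    "4.2.2 Pharmacokinetics": ["plasma exposure absorption clearance half life",
                               "metabolism distribution excretion bioavailability plasma"],
    "4.2.3 Toxicology": ["repeat dose toxicity study adverse findings histopathology",
                         "genotoxicity carcinogenicity safety margin toxicity"],
}

doc = """A 28-Day Repeat Dose Toxicity Study of ABC-1234 in Rats
Male and female rats received ABC-1234 daily. Histopathology revealed no adverse findings."""

centroids = train_centroids(EXAMPLES)
print(extract_metadata(doc))
print(classify(doc, centroids))  # → 4.2.3 Toxicology
```

A nearest-centroid model is chosen here only because it fits in a few lines of standard-library Python. A production system would more likely use features derived from recognized entities and a trained classifier, with enough labelled documents per hierarchy node.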