Rule-based and machine learning (ML) models are widely utilized in pharmacovigilance (PV) to process safety report data.
Both approaches have advantages and disadvantages, and each contributes differently, in terms of time, when applied to the PV process. Many life sciences organizations are exploring artificial intelligence (AI)-driven PV platforms to automate the processing of safety data reports.
A well-designed AI program strategy is crucial for developing a PV business case. Shifting from a traditional pharma process to an AI-led approach invites disruption and requires leadership to understand its business value. Before implementing an AI-driven PV platform, an organization's business leadership must set clear goals, define a robust business case for investment and financial commitment, and adopt an agile approach to designing and scaling up pilots. To succeed, organizations must commit to long-term strategic investments while remaining agile enough to adapt their plans as the technology landscape evolves.
In PV, safety report data comes from structured formats like XML and from unstructured formats such as emails. Extracting information from these formats is a challenging activity. As mentioned above, there are primarily two methodologies for achieving efficiency in case processing:
Rule-based approach
Machine learning approach
An AI-driven PV platform may use both approaches during automated case processing. Given the variety of automation techniques available, it is important to find the right technology combination to support an effective PV automation program. It is also imperative to analyze the cost of implementation in terms of time and resources while finalizing an automation strategy.
In this article, we discuss how these approaches are used in tandem to enhance PV efficiencies in processing safety data.
“Efficient automation” uses an appropriate, calibrated combination of different technologies to deliver a significant return on investment (RoI) from PV process automation.
The following are the critical areas of case processing where automation is potentially able to show a significant impact:
Data intake: Converting source documents (PDF, MS Excel, MS Word) into formats that can be processed by any safety database
Data extraction: Extracting data from source documents and mapping them to the relevant case attributes in the corresponding final report
Data enrichment: Determining case attributes and performing the following activities:
Duplicate check
Identification of case validity
Case priority and seriousness
Submission of cases to respective partners
Automated narrative generation
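The validity and seriousness determinations listed under data enrichment are natural candidates for deterministic rules. The following is a minimal sketch, assuming two widely used conventions: a case is valid only when it has the four minimum elements (an identifiable patient, an identifiable reporter, a suspect product, and an adverse event), and it is serious when any standard seriousness criterion applies. The `Case` fields, the criterion names, and the priority labels are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical criterion names, modelled on the standard seriousness categories.
SERIOUSNESS_CRITERIA = {
    "death",
    "life_threatening",
    "hospitalization",
    "disability",
    "congenital_anomaly",
    "medically_important",
}


@dataclass
class Case:
    # Hypothetical, simplified case record; real safety databases hold far more fields.
    patient_id: Optional[str] = None
    reporter: Optional[str] = None
    suspect_products: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)
    criteria: set[str] = field(default_factory=set)


def is_valid(case: Case) -> bool:
    """Rule: a case is valid only if all four minimum elements are present."""
    return bool(case.patient_id and case.reporter and case.suspect_products and case.events)


def is_serious(case: Case) -> bool:
    """Rule: a case is serious if any recognized seriousness criterion is ticked."""
    return bool(case.criteria & SERIOUSNESS_CRITERIA)


def priority(case: Case) -> str:
    """Hypothetical routing rule: invalid cases need follow-up; serious ones are expedited."""
    if not is_valid(case):
        return "follow-up"
    return "expedited" if is_serious(case) else "routine"
```

Because every outcome is fully determined by the inputs, these rules behave identically from the first case processed, which is what makes them suitable for out-of-the-box deployment.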
Benefits of efficient automation:
Efficient automation, by its very nature, provides many benefits over the traditional process. A few of them include:
Production-ready: Availability of out-of-the-box automation results in efficiency and efficacy improvement from Day 1 of deployment.
Ease of implementation: A short implementation cycle requires limited human and technical resources.
Continuous improvement: Automation algorithms get better trained with every case processed, resulting in enhanced accuracy.
Scalability and flexibility: Case processing continues with minimal changes to the PV operational landscape under all circumstances, including:
Increase in the number of safety cases
New report formats
New products in the market/pipeline
Ultimately, implementing a cognitive automation case processing platform provides PV teams with a powerful tool to assess and evaluate potential safety signals in a timely manner, and results in a more proactive and efficient approach to signal detection and effective risk management.
Automation methods
Let’s look at the two automation approaches widely used in pharmacovigilance.
1) Rule-based approach
A rule-based system produces predefined outcomes and determinations from a set of deterministic rules encoded from predefined conventions. Such a system applies these rules to store and handle data.
Based on the input conditions and rules, the output is determined. In other words, a rule-based system is a logical program that uses predefined rules to perform automated actions, as shown in Table 1 below.
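To make the "input conditions plus rules determine the output" idea concrete, here is a minimal sketch of the deterministic rule given in Table 1: calculating age from the date of birth. It assumes, for illustration, that age is computed in completed years at the event onset date; the function names are hypothetical.

```python
from datetime import date
from typing import Optional


def age_in_years(dob: date, reference: date) -> int:
    """Deterministic rule: completed years between date of birth and a reference date."""
    if reference < dob:
        raise ValueError("reference date precedes date of birth")
    years = reference.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (dob.month, dob.day):
        years -= 1
    return years


def derive_patient_age(dob: Optional[date], onset: Optional[date]) -> Optional[int]:
    """The same inputs always give the same output; missing inputs yield no value."""
    if dob is None or onset is None:
        return None
    return age_in_years(dob, onset)
```

For example, `derive_patient_age(date(1980, 6, 15), date(2020, 6, 14))` gives 39, because the birthday has not yet been reached on the onset date.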
Rule-based systems perform high-volume activities, thereby freeing up human resources for handling more complex tasks, and include the creation and usage of the following:
Deterministic rules based on a predefined convention
Optical character recognition (OCR)
And techniques such as:
Named entity recognition (NER) patterns
Dictionary lookups
Regular expressions
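Dictionary lookups can be sketched as a normalization step: a verbatim term is lower-cased and whitespace-collapsed, looked up first in a customer-specific synonym dictionary and then in a global one, and mapped to a preferred term. The dictionaries below are hypothetical toy entries, not extracts from MedDRA or any product dictionary.

```python
from typing import Optional

# Hypothetical toy dictionaries; real systems would load curated, versioned terminologies.
GLOBAL_SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "high blood pressure": "hypertension",
}

# Hypothetical customer-specific entries, checked before the global dictionary.
CUSTOMER_SYNONYMS = {
    "drug x 10mg": "drug X",
}


def normalize(term: str) -> str:
    """Collapse whitespace and case so lookups ignore formatting differences."""
    return " ".join(term.lower().split())


def lookup(term: str) -> Optional[str]:
    """Return the preferred term, giving customer entries precedence over global ones."""
    key = normalize(term)
    if key in CUSTOMER_SYNONYMS:
        return CUSTOMER_SYNONYMS[key]
    return GLOBAL_SYNONYMS.get(key)
```

Returning `None` for unknown terms, rather than guessing, keeps the rule deterministic and lets unmatched terms be routed to a human or to an ML model.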
| Rule-based approach | Example |
|---|---|
| Deterministic rules based on a predefined convention | Age calculated from the patient’s date of birth (DOB). |
| Optical character recognition (OCR) | OCR is a technology that extracts data from a scanned document or image file and converts the text into a machine-readable format. It is used to extract data from PDF forms such as MedWatch 3500, CIOMS, or other sponsor-specific documents. |
| Named entity recognition (NER) patterns | A pattern is an arrangement of identifiable text elements in a particular sequence. Sample input text: “The patient was treated with drug X for lung cancer.” The pattern recognizes the trigger word “treated”, “drug X” as a drug, and “lung cancer” as a medical term. Based on the NER pattern, the application concludes that the medical term “lung cancer” is an indication. Output: Lung cancer is an indication. |
| Dictionary lookups | Synonym dictionaries for medical terms, products, etc.; global and customer-specific dictionaries. |
| Regular expressions | A regular expression is a pattern that describes a set of strings. Example: regular expressions are used to identify dates in text; a string that follows the DD-MON-YYYY pattern is identified as a date. |
Table 1 – Rule-based approach
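The regular-expression and NER-pattern rows of Table 1 can be sketched with Python's standard `re` module. The date pattern assumes the DD-MON-YYYY convention with three-letter English month abbreviations. The “treated with <drug> for <indication>” pattern is a hypothetical, deliberately narrow rule that mirrors the sample sentence in the table; it is not a general-purpose NER model.

```python
import re
from datetime import datetime
from typing import Optional

# DD-MON-YYYY, e.g. 05-JAN-2023; month abbreviations are matched case-insensitively.
DATE_RE = re.compile(
    r"\b(\d{2})-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-(\d{4})\b",
    re.IGNORECASE,
)

# Hypothetical pattern: "treated with <drug> for <indication>".
INDICATION_RE = re.compile(
    r"treated with (?P<drug>[\w\s-]+?) for (?P<indication>[\w\s-]+?)(?:[.,;]|$)",
    re.IGNORECASE,
)


def find_dates(text: str) -> list[str]:
    """Return ISO dates for DD-MON-YYYY strings that are also real calendar dates."""
    found = []
    for match in DATE_RE.finditer(text):
        try:
            parsed = datetime.strptime("-".join(match.groups()), "%d-%b-%Y")
        except ValueError:
            continue  # e.g. 31-FEB-2023 fits the pattern but is not a valid date
        found.append(parsed.date().isoformat())
    return found


def extract_indication(text: str) -> Optional[dict]:
    """Apply the pattern; the term after 'for' is concluded to be the indication."""
    match = INDICATION_RE.search(text)
    if match is None:
        return None
    return {"drug": match["drug"].strip(), "indication": match["indication"].strip()}
```

The regular expression only checks the shape of the string, so the sketch adds a calendar check with `strptime`; a pattern alone would accept impossible dates.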
2) Machine learning approach
A machine learning system uses statistical techniques that enable applications to learn from data without being explicitly programmed.
It derives its own set of rules from the data it is trained on.
ML systems follow a probabilistic approach, as opposed to the deterministic approach of rule-based algorithms.
A modern, open, flexible, and well-designed AI-driven PV platform should include several models that emulate how humans perform case-processing activities. These models include various ontologies comprising patterns and phrases analogous to human reasoning.
The ML approach has two phases: (a) the modeling phase and (b) the operational phase.
a) Modeling phase: In this phase, the AI models are trained using “labeled” data, that is, input data tagged with the correct output.
The modeling phase, as shown in Fig. 1, consists of the following steps:
Obtaining a labeled data set
Training the models on the labeled data set
Validation of the models
Model changes based on validation results
Model training, validation, and updating form a cyclical process that is repeated until the model’s performance is optimized.
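The modeling cycle (train on labeled data, validate, update, repeat) can be sketched with a tiny multinomial naive Bayes text classifier written from scratch. This is an assumption for illustration, not the model an AI-driven PV platform would actually use: here the “update” step simply tries a different smoothing value `alpha`, and the loop stops once validation accuracy reaches a hypothetical `target`. Unlike a deterministic rule, the classifier returns a probability for each label.

```python
import math
from collections import Counter


def train(examples: list[tuple[str, str]], alpha: float) -> dict:
    """Fit per-label word counts from labeled (text, label) pairs."""
    word_counts: dict[str, Counter] = {}
    label_counts: Counter = Counter()
    vocab: set[str] = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return {"alpha": alpha, "words": word_counts, "labels": label_counts, "vocab": vocab}


def predict_proba(model: dict, text: str) -> dict[str, float]:
    """Return P(label | text): a probabilistic, not deterministic, output."""
    total = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label, n in model["labels"].items():
        counts = model["words"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n / total)
        for w in text.lower().split():
            score += math.log((counts[w] + model["alpha"]) / denom)
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    exp = {k: math.exp(s - top) for k, s in log_scores.items()}
    z = sum(exp.values())
    return {k: e / z for k, e in exp.items()}


def accuracy(model: dict, examples: list[tuple[str, str]]) -> float:
    """Share of examples whose most probable label equals the true label."""
    hits = sum(
        max(predict_proba(model, t).items(), key=lambda kv: kv[1])[0] == y
        for t, y in examples
    )
    return hits / len(examples)


def modeling_cycle(train_set, val_set, target=0.9, alphas=(1.0, 0.5, 0.1)):
    """Train, validate, and update until the target is met or the options run out."""
    best = None
    for alpha in alphas:
        model = train(train_set, alpha)
        score = accuracy(model, val_set)
        if best is None or score > best[1]:
            best = (model, score)
        if score >= target:
            break
    return best
```

In practice the update step could also mean adding labeled cases or changing the model itself; the loop structure stays the same.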
b) Operational phase: After the models are built and evaluated, they enter the operational phase, whose steps are shown in Fig. 2.
Continuous learning is an essential feature of the operational phase. Once the validated system is deployed, the performance is monitored and the models are re-trained as required.
As part of performance evaluation, monitoring is done by reconciling the information in safety cases processed by the machine with the same cases after human users have amended or changed them. The output of the automated application is compared with the cases after quality and medical review. This reconciliation of pre- and post-review cases provides the inputs for retraining the models.
Thus, the ML models continue to get trained with more cases being processed, resulting in enhanced accuracy.
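The reconciliation step can be sketched as a field-by-field comparison between each machine-processed case and the same case after quality and medical review. Every field the reviewer changed lowers that field's accuracy and becomes a corrected, labeled example for retraining. The case IDs, field names, and flat dictionary representation are hypothetical simplifications.

```python
from collections import defaultdict


def reconcile(machine_cases: dict, reviewed_cases: dict):
    """Compare pre- and post-review versions of the same cases, keyed by case ID.

    Returns per-field accuracy and the corrections to feed back into retraining.
    """
    matches: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    corrections = []  # (case_id, field, machine_value, reviewed_value)
    for case_id, reviewed in reviewed_cases.items():
        machine = machine_cases.get(case_id, {})
        for field_name, final_value in reviewed.items():
            totals[field_name] += 1
            predicted = machine.get(field_name)
            if predicted == final_value:
                matches[field_name] += 1
            else:
                corrections.append((case_id, field_name, predicted, final_value))
    accuracy = {f: matches[f] / totals[f] for f in totals}
    return accuracy, corrections
```

Tracking accuracy per field, rather than per case, shows which models need retraining first.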
The use of both rule-based and ML approaches is essential for efficient automation.
Case processing is divided into different activities, and automation is applied to each activity. The activities and type of automation applied (rule-based and ML-based) are given below in Table 2.
| Case processing activity | Rule-based approach | Machine learning approach |
|---|---|---|
| Data intake | Digitization through optical character recognition (OCR) tools; case triage | AI/ML-based OCR tools that extract data from documents of different formats |
| Data extraction | Extraction of different fields, including but not limited to the patient tab, laboratory data, event description, product tab, causality and relationship assessment, and listedness, using various rule-based techniques | AI/ML models that identify and map various field attributes |
| Duplicate check | Incoming case flagged as initial, follow-up, duplicate, or new based on predefined rules | Trained AI/ML models update the key parameters used for duplicate search and the weights associated with those parameters |
| Auto narrative generation | Narrative generated automatically from predefined templates | Trained AI/ML models select the correct narrative template based on the case data; the narrative is then generated automatically from the selected template |
| Submission & reporting | Ad hoc and customized reports generated and disseminated to predefined users based on predefined rules | |
Table 2 - Efficient automation with rule-based and ML approaches
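The duplicate-check row of Table 2 pairs predefined rules with weighted search parameters. A minimal rule-based sketch, assuming hypothetical key fields, integer weights, a hypothetical match threshold, and a receipt-date rule for follow-ups, scores an incoming report against existing cases and flags it as new, follow-up, or duplicate (a simplification of the four flags in the table). In the ML variant the table describes, a trained model would adjust `WEIGHTS` instead of a person fixing them.

```python
from typing import Optional

# Hypothetical key parameters and weights (points out of 100).
WEIGHTS = {
    "patient_initials": 20,
    "patient_dob": 30,
    "suspect_product": 25,
    "event": 25,
}
MATCH_THRESHOLD = 75  # hypothetical cut-off for "same case"


def similarity(incoming: dict, existing: dict) -> int:
    """Sum the weights of key parameters that are present and equal in both cases."""
    return sum(
        weight
        for key, weight in WEIGHTS.items()
        if incoming.get(key) is not None and incoming.get(key) == existing.get(key)
    )


def classify(incoming: dict, existing_cases: list[dict]) -> tuple[str, Optional[str]]:
    """Flag the incoming report and return the ID of the matched case, if any."""
    best, best_score = None, 0
    for case in existing_cases:
        score = similarity(incoming, case)
        if score > best_score:
            best, best_score = case, score
    if best is None or best_score < MATCH_THRESHOLD:
        return "new", None
    # Hypothetical rule: a later receipt date means the report carries follow-up information.
    if incoming["received"] > best["received"]:
        return "follow-up", best["case_id"]
    return "duplicate", best["case_id"]
```

Integer weights keep the threshold comparison exact; floating-point weights can make a score that should equal the threshold fall just below it.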
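Template-based narrative generation, also listed in Table 2, can be sketched with Python's standard `string.Template`. The template wording and case fields are hypothetical, and a simple rule (serious versus non-serious) stands in for the trained model that, in the ML variant, would select the template from the case data.

```python
from string import Template

# Hypothetical narrative templates keyed by case type.
TEMPLATES = {
    "serious": Template(
        "This serious case concerns a $age-year-old $sex patient who received $product "
        "and experienced $event on $onset, resulting in $criterion."
    ),
    "non_serious": Template(
        "This non-serious case concerns a $age-year-old $sex patient who received "
        "$product and experienced $event on $onset."
    ),
}


def select_template(case: dict) -> str:
    """Rule-based stand-in for a trained template-selection model."""
    return "serious" if case.get("criterion") else "non_serious"


def generate_narrative(case: dict) -> str:
    """Fill the selected template; substitute() raises KeyError if a field is missing."""
    return TEMPLATES[select_template(case)].substitute(case)
```

Using `substitute` rather than `safe_substitute` makes a missing field fail loudly instead of leaving a placeholder in a narrative that may be submitted.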
Improved PV data processing and analysis, using methods such as the rule-based approach and ML models, eliminates much of the manual burden and delivers greater efficiency than the traditional PV model. The ever-increasing volume of safety reports can lead to adverse events not being handled in a timely manner, compromising patient and consumer safety and health. During the last five years, we have seen rapid advancement and adoption of cognitive technologies. However, concerns remain about the quality of information, how accurately AI/ML platforms can interpret safety report data, how to justify decisions made by AI, and the investment required to procure these new systems. In this article, we have argued that, despite these challenges, the benefits of automation are significantly higher when rule-based and ML approaches are applied together to the same case-processing steps.