Transforming PV Efficiencies with ML and Rule-based Automation

Dr. Alejandra Guerchicoff

Head Industry Leader, TCS ADD™ Safety, TCS

Tejas Almelkar

Product Manager, TCS ADD™ Safety, TCS, Life Sciences

Narayanan R.

Chief Architect and Head of Clinical & Safety Platforms, TCS ADD™, TCS

Industry

Life Sciences

Solution

TCS ADD™

Highlights

The use of rule-based and machine learning approaches is transforming the pharmacovigilance (PV) landscape, providing high efficiencies in safety case processing.

Organizations need a deep and clear understanding of rule-based and ML-based approaches to PV automation.

Implementing a cognitive automation platform provides a powerful tool for a proactive PV approach and effective risk management.

Introduction

Rule-based and machine learning (ML) models are widely utilized in pharmacovigilance (PV) to process safety report data.

Both models have advantages and disadvantages, and their contributions differ over time when applied to the PV process. Many life sciences organizations are exploring artificial intelligence (AI)-driven PV platforms to automate the processing of safety data reports.

A well-designed AI program strategy is crucial for developing a PV business case. Shifting from a traditional pharma process to an AI-led approach invites disruption and requires the leadership to understand its business value. Before implementing an AI-driven PV platform in any organization, its business leadership must set clear goals, define a robust business case for investment and financial commitment, and adopt an agile approach to design and scale up pilots. To succeed, it is essential to commit to long-term strategic investments while remaining agile enough to adapt plans as the technology landscape evolves.

In PV, safety report data comes from structured formats like XML and from unstructured formats such as emails. Extracting information from these formats is a challenging activity. As mentioned above, there are primarily two methodologies for achieving efficiency in case processing:

Rule-based approach
Machine learning approach

An AI-driven PV platform may use both approaches during automated case processing. Given the variety of automation techniques available, it is important to find the right technology combination to support an effective PV automation program. It is also imperative to analyze the cost of implementation in terms of time and resources while finalizing an automation strategy.

In this article, we discuss how these approaches are used in tandem to enhance PV efficiencies in processing safety data.

ROI in PV automation

“Efficient automation” uses an appropriate, calibrated combination of different technologies to provide a significant return on investment (ROI) from PV process automation.

The following are the critical areas of case processing where automation can have a significant impact:

Data intake: Converting source documents (PDF, MS Excel, MS Word) into formats that can be processed by any safety database
Data extraction: Extracting data from source documents and mapping them to the relevant case attributes in the corresponding final report
Data enrichment: Determining attributes and performing the following activities:
- Duplicate check
- Identification of case validity
- Case priority and seriousness
- Submission of cases to respective partners
- Automated narrative generation

Benefits of efficient automation:

Efficient automation, by its very nature, provides many benefits over the traditional process. A few of them include:

Production-ready: Availability of out-of-the-box automation results in efficiency and efficacy improvement from Day 1 of deployment.

Ease of implementation: A short implementation cycle requires limited human and technical resources.

Continuous improvement: Automation algorithms get better trained with every case processed, resulting in enhanced accuracy.

Scalability & Flexibility: Minimal changes in the PV operational landscape and case processing under all circumstances, including:

Increase in the number of safety cases
New report formats
New products in the market/pipeline

Ultimately, implementing a cognitive automation case processing platform provides PV teams with a powerful tool to assess and evaluate potential safety signals in a timely manner, and results in a more proactive and efficient approach to signal detection and effective risk management.

Automation methods

Let’s look at the two automation approaches widely used in pharmacovigilance.

1) Rule-based approach

A rule-based system produces predefined outcomes and determinations from a set of deterministic rules that encode predefined conventions. Such a system applies these rules to store and handle data.

Based on the input conditions and rules, the output is determined. In other words, a rule-based system is a logical program that uses predefined rules to perform automated actions, as shown in Table 1 below.
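As a minimal sketch of such a deterministic rule, the snippet below derives a patient's age in completed years from the date of birth. The function name and the choice of a reference date (for example, the event onset date) are assumptions made for illustration; a real safety database would follow its own coding conventions.

```python
from datetime import date

def age_in_years(date_of_birth: date, reference_date: date) -> int:
    """Deterministic rule: completed years between DOB and a reference date."""
    if reference_date < date_of_birth:
        raise ValueError("reference date precedes date of birth")
    years = reference_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_in_years(date(1980, 6, 15), date(2023, 6, 14)))  # 42
print(age_in_years(date(1980, 6, 15), date(2023, 6, 15)))  # 43
```

The same inputs always produce the same output, which is what makes the rule easy to validate and audit.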

Rule-based systems perform high-volume activities, thereby freeing up human resources for more complex tasks. They involve creating and using the following:

Deterministic rules based on a predefined convention
Optical character recognition (OCR)

And techniques such as:

Named entity recognition (NER) patterns
Dictionary lookups
Regular expressions
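A dictionary lookup, one of the techniques listed above, might look like the sketch below. The dictionary contents, the function name, and the rule that a customer-specific entry takes precedence over a global one are illustrative assumptions, not a description of any particular product.

```python
# Hypothetical synonym dictionaries; a real deployment would load coded terminologies.
GLOBAL_SYNONYMS = {
    "heart attack": "Myocardial infarction",
    "high blood pressure": "Hypertension",
}
CUSTOMER_SYNONYMS = {
    "drug x 10mg tabs": "Drug X",
}

def normalize_term(raw, customer=CUSTOMER_SYNONYMS, global_terms=GLOBAL_SYNONYMS):
    """Map a verbatim term to its standard form, or return None if unknown."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    if key in customer:  # assumption: customer-specific entries win
        return customer[key]
    return global_terms.get(key)

print(normalize_term("Heart  Attack"))     # Myocardial infarction
print(normalize_term("Drug X 10mg tabs"))  # Drug X
print(normalize_term("unknown term"))      # None
```

Terms that return None would typically be routed to a human coder rather than guessed.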

Rule-based approach	Example
Deterministic rules based on a predefined convention	Age calculated from the patient’s date of birth (DOB).
Optical character recognition (OCR)	OCR is a technology for extracting data from a scanned document or image file and converting the text into a machine-readable format. It is used to extract data from PDF forms like MedWatch 3500, CIOMS, or any other sponsor-specific documents.
Named entity recognition (NER) patterns	A pattern is an arrangement of identifiable text elements in a particular sequence. Sample input text: “The patient was treated with drug X for lung cancer”. The pattern identifies the word “treated”, “drug X” as a drug, and “lung cancer” as a medical term. Based on the NER pattern, the application concludes that the medical term “lung cancer” is an indication. Output: Lung cancer is an indication.
Dictionary lookups	Synonym dictionaries for medical terms, products, etc. Global and customer-specific dictionaries.
Regular expressions	A regular expression is a pattern that describes a set of matching strings. Example: Regular expressions are used to identify dates in text. A string in the DD-MON-YEAR pattern is identified as a date using a regular expression.

Table 1 – Rule-based approach
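Two of the examples in Table 1 can be sketched with Python's standard `re` module. The date expression follows the DD-MON-YEAR convention described in the table. The indication pattern is a deliberately simplified surface pattern standing in for a full NER pipeline, and it assumes the phrasing "treated with ... for ...". All names below are hypothetical.

```python
import re
from datetime import date

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# DD-MON-YEAR, e.g. 12-MAR-2021 (month matched case-insensitively).
DATE_RE = re.compile(r"\b(\d{1,2})-(" + "|".join(MONTHS) + r")-(\d{4})\b", re.IGNORECASE)

# Simplified stand-in for an NER pattern: "treated with <drug> for <indication>".
INDICATION_RE = re.compile(
    r"treated with (?P<drug>.+?) for (?P<indication>[^.,;]+)", re.IGNORECASE)

def find_dates(text):
    """Return every valid DD-MON-YEAR date found in the text."""
    found = []
    for day, month, year in DATE_RE.findall(text):
        try:
            found.append(date(int(year), MONTHS[month.upper()], int(day)))
        except ValueError:
            continue  # matches the pattern but is not a real date, e.g. 31-FEB-2021
    return found

def find_indication(sentence):
    """Return the drug and indication if the sentence matches the pattern."""
    match = INDICATION_RE.search(sentence)
    if match is None:
        return None
    return {"drug": match.group("drug").strip(),
            "indication": match.group("indication").strip()}

print(find_dates("Onset on 12-MAR-2021, resolved 31-FEB-2021."))
print(find_indication("The patient was treated with drug X for lung cancer"))
```

The date check shows why a regular expression alone is not enough: the pattern accepts 31-FEB-2021, so a second validation step is needed before the value is stored.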

2) Machine learning approach

A machine learning system uses statistical techniques that enable applications to learn from data without being explicitly programmed.

It creates its own set of rules based on the data it gets trained on.

ML systems are based on a probabilistic approach (as opposed to the deterministic approach of rule-based algorithms).
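The contrast can be illustrated with a small sketch. The rule returns a hard yes/no answer, while the model-style function returns a probability that can be compared against a confidence threshold. The keywords, weights, bias, and threshold-based routing are hypothetical values chosen for illustration; in a real ML system the weights would be learned from labeled data.

```python
import math

def rule_is_serious(text):
    """Rule-based: a hard yes/no answer from fixed keywords."""
    keywords = ("hospitalized", "died", "life-threatening")
    return any(keyword in text.lower() for keyword in keywords)

# Hypothetical weights; a trained model would learn these from labeled cases.
WEIGHTS = {"hospitalized": 2.5, "severe": 1.2, "mild": -1.5}
BIAS = -1.0

def probability_serious(text):
    """Model-style: a logistic score between 0 and 1."""
    tokens = [token.strip(".,") for token in text.lower().split()]
    score = BIAS + sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def route(text, threshold=0.8):
    """Assumed policy: act automatically only when the model is confident."""
    p = probability_serious(text)
    if p >= threshold:
        return "auto-serious"
    if p <= 1.0 - threshold:
        return "auto-non-serious"
    return "human-review"

print(route("Patient was hospitalized with severe rash."))  # auto-serious
print(route("Patient reported mild headache."))             # auto-non-serious
print(route("Patient reported severe headache."))           # human-review
```

The middle band is the practical consequence of a probabilistic output: uncertain cases can be sent to a human reviewer instead of being forced into one class.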

A modern, open, flexible, and well-designed AI-driven PV platform should include several models that perform case-processing activities the way humans do. These models include various ontologies comprising patterns and phrases that mirror human reasoning.

The ML approach has two phases: (a) the Modeling phase, and (b) the Operational phase.

a) Modeling phase: In this phase, the AI models are trained using “labeled” data. Labeled data is input data tagged with the correct output.

The modeling phase, as shown in Fig. 1, consists of the following steps:

Obtaining a labeled data set
Training the models on the labeled data set
Validation of the models
Model changes based on validation results

Model training, validation, and update form a cyclical process that is repeated until the model reaches its optimized performance.

Figure 1 - The modeling phase of an ML-based approach in pharmacovigilance.

The modeling phase of an ML-based approach in pharmacovigilance is a cyclical process. It starts with obtaining a labeled dataset. The models are then trained on the labeled dataset and validated. Lastly, the models are changed based on the validation results. This process is repeated until the model reaches its optimized performance.
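The train, validate, and update cycle can be sketched in plain Python. The example below uses a small multinomial naive Bayes text classifier as a stand-in model; that choice, the toy labeled cases, the candidate smoothing values, and the target accuracy are all assumptions made for illustration, and "model changes" are represented simply by trying a different smoothing value on each pass.

```python
import math
from collections import Counter, defaultdict

def train(examples, alpha):
    """Fit a multinomial naive Bayes model; alpha is the Laplace smoothing value."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return {"words": word_counts, "labels": label_counts, "vocab": vocab, "alpha": alpha}

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total_cases = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_cases in model["labels"].items():
        denom = sum(model["words"][label].values()) + model["alpha"] * vocab_size
        score = math.log(n_cases / total_cases)
        for word in text.lower().split():
            if word in model["vocab"]:  # words never seen in training are ignored
                score += math.log((model["words"][label][word] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    return sum(predict(model, text) == label for text, label in examples) / len(examples)

def modeling_phase(train_set, validation_set, alphas=(1.0, 0.5, 0.1), target=0.9):
    """Repeat train -> validate -> change until the target accuracy is reached."""
    best_model, best_accuracy = None, -1.0
    for alpha in alphas:
        model = train(train_set, alpha)
        score = accuracy(model, validation_set)
        if score > best_accuracy:
            best_model, best_accuracy = model, score
        if best_accuracy >= target:
            break
    return best_model, best_accuracy

# Toy labeled data: input text tagged with the correct output.
TRAIN_SET = [
    ("patient hospitalized after overdose", "serious"),
    ("patient died following cardiac arrest", "serious"),
    ("life threatening anaphylaxis reported", "serious"),
    ("mild headache resolved", "non-serious"),
    ("transient nausea resolved without treatment", "non-serious"),
    ("mild rash reported", "non-serious"),
]
VALIDATION_SET = [
    ("patient hospitalized with anaphylaxis", "serious"),
    ("mild nausea resolved", "non-serious"),
]

model, validation_accuracy = modeling_phase(TRAIN_SET, VALIDATION_SET)
print(validation_accuracy)
print(predict(model, "patient hospitalized with anaphylaxis"))
```

A production model would be trained on far larger datasets and validated against held-out cases under a documented validation plan; the loop structure is the point of the sketch.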

b) Operational phase: After the models are built and evaluated, they enter the operational phase, which consists of the steps shown in Fig. 2.

Figure 2 - The operational phase of an ML-based approach in pharmacovigilance.

Continuous learning is an essential feature of the operational phase. Once the validated system is deployed, the performance is monitored and the models are re-trained as required.

As part of the performance evaluation, monitoring takes place by reconciling the information from safety cases processed by a machine with the cases amended or changed by the human user. The output from the automated application is compared with the cases that have undergone quality and medical review. The reconciliation of pre- and post-review cases provides inputs to re-train the models.
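The reconciliation step can be sketched as a field-by-field comparison between the machine output and the reviewed case. The field names and the dictionary representation of a case are hypothetical; the corrections collected this way are what would feed re-training.

```python
from collections import Counter

def reconcile(machine_case, reviewed_case):
    """Return the fields where the reviewed value differs from the machine output."""
    corrections = {}
    for field in machine_case.keys() | reviewed_case.keys():
        machine_value = machine_case.get(field)
        reviewed_value = reviewed_case.get(field)
        if machine_value != reviewed_value:
            corrections[field] = {"machine": machine_value, "reviewed": reviewed_value}
    return corrections

def field_accuracy(case_pairs):
    """Per-field agreement rate across (machine, reviewed) case pairs."""
    agreed, total = Counter(), Counter()
    for machine_case, reviewed_case in case_pairs:
        for field, reviewed_value in reviewed_case.items():
            total[field] += 1
            if machine_case.get(field) == reviewed_value:
                agreed[field] += 1
    return {field: agreed[field] / total[field] for field in total}

# Hypothetical pre-review (machine) and post-review (human) versions of two cases.
pairs = [
    ({"patient_age": "42", "event_term": "Headache", "seriousness": "non-serious"},
     {"patient_age": "42", "event_term": "Migraine", "seriousness": "non-serious"}),
    ({"patient_age": "61", "event_term": "Rash", "seriousness": "non-serious"},
     {"patient_age": "61", "event_term": "Rash", "seriousness": "serious"}),
]

print(reconcile(*pairs[0]))
print(field_accuracy(pairs))
```

Tracking agreement per field, rather than per case, shows which models are drifting and therefore which ones to re-train first.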

Thus, the ML models continue to get trained with more cases being processed, resulting in enhanced accuracy.

Step-up automation

The use of both rule-based and ML approaches is essential for efficient automation.

Case processing is divided into different activities, and automation is applied to each activity. The activities and type of automation applied (rule-based and ML-based) are given below in Table 2.

Case processing activity	Rule-based approach	Machine learning approach
Data intake	Digitization through optical character recognition (OCR) tools. Business rules for case validity, case priority, and case seriousness. Case triage.	AI/ML-based OCR tools that extract data from documents of different formats.
Data extraction	Data extraction of different fields, including but not limited to patient tab, laboratory data, event description, product tab, causality and relationship, assessment, and listedness based on various rule-based techniques.	Usage of AI/ML models to identify and map various field attributes. The trained AI/ML models are strengthened as part of continuous learning, resulting in increased accuracy of extraction.
Duplicate check	Incoming case to be flagged as initial/follow-up/duplicate/new based on predefined rules.	Using trained AI/ML models, the application updates the key parameters used for duplicate search and the weightages associated with those parameters.
Auto narrative generation	Narrative is generated automatically based on predefined templates.	Uses trained AI/ML models to select the correct narrative template based on the case data. The narrative is then automatically generated based on the selected template.
Submission & reporting	Generates ad hoc and customized reports and disseminates output to predefined users based on predefined rules.

Table 2 - Efficient automation with rule-based and ML approaches
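Two rows of Table 2 lend themselves to a short sketch: a weighted duplicate check and template-based narrative generation. The matching parameters, their weights, the narrative wording, and the case field names are all hypothetical. In the ML variant described in the table, the parameters and weights would be updated by trained models rather than fixed by hand as they are here.

```python
from string import Template

# Hypothetical search parameters and weights for the duplicate check.
DUPLICATE_WEIGHTS = {
    "patient_initials": 0.2,
    "date_of_birth": 0.3,
    "suspect_drug": 0.25,
    "event_term": 0.25,
}

def duplicate_score(incoming, existing, weights=DUPLICATE_WEIGHTS):
    """Weighted share of parameters on which two cases agree (0.0 to 1.0)."""
    matched = sum(
        weight for field, weight in weights.items()
        if incoming.get(field) is not None and incoming.get(field) == existing.get(field)
    )
    return matched / sum(weights.values())

# Hypothetical predefined narrative template.
NARRATIVE_TEMPLATE = Template(
    "A ${age}-year-old ${sex} patient received ${drug} for ${indication} "
    "and experienced ${event}."
)

def render_narrative(case):
    """Fill the template; raises KeyError if a required field is missing."""
    return NARRATIVE_TEMPLATE.substitute(case)

incoming = {"patient_initials": "AB", "date_of_birth": "1980-06-15",
            "suspect_drug": "Drug X", "event_term": "Rash"}
existing = {"patient_initials": "A.B.", "date_of_birth": "1980-06-15",
            "suspect_drug": "Drug X", "event_term": "Rash"}
print(round(duplicate_score(incoming, existing), 2))  # 0.8

print(render_narrative({"age": 42, "sex": "female", "drug": "Drug X",
                        "indication": "lung cancer", "event": "rash"}))
```

A score near 1.0 would flag the incoming case as a likely duplicate or follow-up; where to set the cut-off is a business decision that the sketch does not attempt to answer.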

Conclusion

Improved PV data processing and analysis with powerful methods and algorithms, like the rule-based approach and ML models, eliminate manual burdens and provide enhanced efficiencies compared to the traditional PV model. The ever-increasing volume of safety reports can result in a failure to handle adverse events in a timely manner, thus compromising patient and consumer safety and health. During the last five years, we have seen rapid advancement and adoption of cognitive technologies. However, there are concerns about the quality of information, how accurately AI/ML platforms can interpret safety report data, how to justify decisions made by AI, and the investment required for procuring these new systems. In this article, we discussed that, despite the challenges, the benefits of automation are significantly higher when rule-based and ML approaches are applied together to the same case processing steps.

About the authors

Dr. Alejandra Guerchicoff

Dr. Alejandra Guerchicoff holds a Ph.D. in Molecular Genetics with postdoctoral training in Molecular Cardiology and Genetics of Cardiac Arrhythmias. She works as an Industry Advisor for TCS ADD™ at TCS. Dr. Guerchicoff has more than 20 years of experience in clinical research and post-marketing pharmacovigilance.


Tejas Almelkar

Tejas Almelkar currently works as a Product Manager for TCS ADD™ Safety at TCS. He has extensive experience in the pharmacovigilance, banking, and healthcare domains. In his current role, Tejas is working on leveraging artificial intelligence and machine learning to create applications that could benefit society.


Narayanan R.

Narayanan heads the Clinical and Safety products of TCS ADD™. He works closely with multiple pharma and biotech companies to enable the adoption of digital products and solutions and transform their drug development processes and systems.

