Sci-fi movies were her gateway into the world of AI and deep learning.
“I was always intrigued by the sci-fi genre – from superheroes to robots, and futuristic tech to time travel. It made me curious about what technology might be able to bring to the real world,” says TCS researcher Monika Sharma. The scientist currently holds a total of 35 granted patents, no mean feat and a number that’s higher than her age.
Technologies like autonomous vehicles and biometric verification pulled Monika into exploring her interest in software programming and game development, and eventually led her to graduate from Punjab Engineering College with a Bachelor of Engineering in computer science.
With a total of 35 granted patents to her name, TCS scientist Monika Sharma holds more patents than her age. Visual intelligence and deep learning are her core current areas of focus.
Her internship projects in college included building digital sudoku puzzles and 3D mobile racing games, and were what eventually led Monika to a research role within a computer-vision team at TCS in 2012.
Today, she leads a team that works in the field of deep visual intelligence, using computer vision, machine learning, and deep learning technologies to push the envelope in innovation. Her pioneering work resulted in her most recent patents, which include a tool to extract details from scanned document images and convert them into readable formats; the use of AI to detect lesions in organs of the human body by piecing together the many minuscule frames that make up a CT scan; COVID-19 diagnosis using X-ray images; and a robotic process automation system to navigate screen-based workflows.
Monika has not only 35 granted patents but also 36 that are pending approval.
Her quest to bring technology as an answer to resolve complex business problems has led to at least 27 of her research papers being published. She has a total of 71 patents filed across geographies.
Her patents are primarily in the fields of image analysis and visual intelligence, two concepts that are related, she says. The former lays the groundwork by processing visual data, while the latter takes it a step further, involving a more holistic understanding and interpretation of visual information.
Monika’s team is credited with significant contributions in developing use cases, winning hackathons, and winning contests in which teams compete to build the best proof of concept (PoC) for a given problem statement.
In one such hackathon, for instance, her team developed a solution to identify empty shelves in a retail store. The task was to pinpoint vacant spaces on shelves from a given image, using visual intelligence, to replenish inventory in a time-bound manner. In another tech challenge that involved building a PoC, her team developed a visual intelligence solution to authenticate signatures on documents from a reference set of images.
Continued pioneering work in visual intelligence, impacting business, industry, and life.
At TCS Research, a team headed by Monika has built an AI-led platform to extract details from scanned document images and convert them into user-friendly formats such as CSV files.
The platform uses deep learning-based vision algorithms to preprocess (remove inconsistencies from) image documents to help identify varied elements such as tables and text. It features several integrated modules, including a document cleaning suite, handwritten and printed text recognition, and the capability to extract data from tabular structures.
A path-breaking exploration by Monika Sharma’s team led to the development of a universal lesion detection system that uses AI to identify lesions across human organs by analyzing within seconds the multiple tiny frames in a single CT scan.
Another stride her team has made in visual intelligence is in developing learning-based techniques for medical image analysis. Here, the AI platform identifies diseases from skin-lesion images captured by a camera; what sets it apart is that fewer labeled images (images on which specific details have been marked) are required to train the deep learning model.
In a related effort, their exploration led to the development of a universal lesion detection system that uses AI to identify lesions across human organs by analyzing the multiple tiny frames of a single CT scan. The intelligence works thus: AI reads the images, identifies the lesion, and detects patterns, if any. As a result, a task that would otherwise take a human radiologist hours to complete can now yield findings in minutes.
The same team has also developed an automated X-ray image-based COVID-19 diagnostic tool. The prototype was designed to help with early diagnosis, isolate high-risk patients, and prevent spread. The tool was proposed as an alternative and precautionary measure should standard COVID-19 tests not be available.
Building smarter, more creative applications with large language models and generative AI.
Monika’s team has also devised tools to automate business processes, such as form filling, customer service, invoice processing, and back-office operations. As part of this technology enablement, a robotic process automation system automates screen-based workflows. It utilizes deep vision analysis (visual analysis using deep learning technology) to identify varied field types on a digital screen. Large language models then use this information to generate the navigation workflow to complete the user request.
The team has also worked on generative AI, using diffusion models to build creative images from a text prompt. Existing text-to-image generation tools can render only one or two personalized elements in an image. In contrast, the tool the team developed accepts inputs for multiple personalized subjects, such as humans, pets, and objects. This novel approach generates multiple customized concepts in a single image and can be used for personalized advertisements, marketing, or creative content such as comic strips, across digital media.
Talking about the future of AI-powered systems, Monika says, “Platforms can be tailored to individual preferences, potentially revolutionizing things like personalized content recommendation and healthcare diagnostics. While responsible development and ethical considerations are needed, systems will become faster and more capable of real-time analyses, benefiting applications like robotics, augmented reality, and real-time surgery assistance.”
Monika’s work in the world of visual intelligence continues to push the boundaries of technological research. It tells us that the world of AI, fascinating as it is, is fashioned by the mind of a human. Monika Sharma’s is one such.