A mechanism to standardize testing and measure the reliability of AI models.
AI helps businesses achieve greater efficiency and overall competency in their applications. With AI adoption, however, come challenges around establishing trust. There is a need for a research-backed AI testing framework that uses methods and processes to improve the security, privacy, explainability, bias, and calibration of AI models, that is, a mechanism that evaluates the non-functional aspects of testing. Such a framework must combine state-of-the-art and existing methods and tools to measure the reliability of AI models.
Currently, the framework’s scope is limited to testing image recognition and object detection models in computer vision, and text classification and sequence-to-sequence models in natural language processing. The aim is to extend it to tabular data systems and help standardize the testing and governance of AI models at all levels.
Software testing methodologies and standards used for AI systems remain inadequate.
State-of-the-art (SOTA) software testing methodologies are still evolving and remain restricted to certain data types. The prominent methodologies include waterfall, verification and validation, incremental, spiral, extreme programming, and agile.
While methodologies are approached from a product development or management perspective, software testing is classified into two types based on functionality:
Functional testing: It tests the behavior and usability of the software, verifying the input and output data. There are many established standards for functional testing; however, they do not cover all the requirements for testing AI models.
Non-functional testing: This testing assesses the application’s compliance with non-functional requirements; namely, security, performance, reliability, scalability, etc. It covers areas that functional testing does not address.
At present, there are no pre-defined tools or procedures to test AI models for non-functional aspects such as trust, robustness, privacy, explainability, and resilience. Moreover, the methods and value ranges available for trust-related aspects do not always fit the parameters of AI testing.
As a result, there is a widespread need to quantify trust aspects, define their ranges of validity, and assess models for compliance. A comprehensive mechanism that tests AI models against these aspects is imperative.
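To make this concrete, the sketch below shows how trust-related aspects could be quantified and checked against ranges of validity. The metric names and thresholds are hypothetical illustrations, not values prescribed by the framework.

```python
# Illustrative sketch only: hypothetical metric names and thresholds showing how
# non-functional requirements could be quantified and assessed for compliance.

ACCEPTABLE_RANGES = {
    "expected_calibration_error": (0.0, 0.05),   # lower is better
    "adversarial_accuracy_drop":  (0.0, 0.10),   # accuracy lost under attack
    "demographic_parity_gap":     (0.0, 0.08),   # fairness gap between groups
}

def check_compliance(measured: dict) -> dict:
    """Flag each measured metric as PASS or FAIL against its validity range."""
    report = {}
    for metric, value in measured.items():
        low, high = ACCEPTABLE_RANGES[metric]
        report[metric] = "PASS" if low <= value <= high else "FAIL"
    return report

print(check_compliance({
    "expected_calibration_error": 0.031,
    "adversarial_accuracy_drop": 0.14,
    "demographic_parity_gap": 0.05,
}))
```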
Established standards have also evolved for non-functional software testing. The latest, ISO 25010, covers eight quality characteristics: functional suitability, reliability, performance efficiency, usability, security, compatibility, maintainability, and portability. Despite these standards, non-functional testing requirements still fall short of expected norms, as the quality model does not cover some of the unique characteristics that AI systems possess.
Use of AI models has raised ethical and legal concerns.
Most AI models are black boxes whose inner workings are impenetrable. They pose inherent testing challenges, such as data inconsistencies, vulnerability to manipulation, insufficient training data, inexplicable predictions and outcomes, and imprecise calibration. The use of black-box models has raised ethical and legal concerns, including inadequate transparency, privacy risks, and biases in the data used for training.
A mechanism that aims to standardize the testing and governance of AI models.
The AI testing framework is built around a set of processes, exposed through APIs, that test the non-functional aspects of AI systems.
The framework evaluates models for their non-functional requirements based on the following aspects:
Compliance: This covers industrial and regulatory governance aspects. It includes the policies, processes, procedures, checklists, tools, and guidelines used to check the inner workings of the framework.
Ethical AI: A reliable model should be ethical. Privacy, societal and environmental concerns, and AI governance are key aspects that are assessed.
An AI model needs to respect user privacy. It should not collect data or use sensitive personal information for training without the user’s consent. Systems are also prone to model inversion attacks; private inference is one method that can prevent sensitive data from being recovered.
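As an illustration of the privacy point, one widely discussed mitigation against model inversion is to limit how much information a model exposes at inference time, since such attacks typically exploit full, high-precision confidence vectors. The sketch below shows this output-coarsening idea with hypothetical names; it is not the framework’s prescribed defense, and stronger options such as private inference exist.

```python
import numpy as np

def coarsen_prediction(probabilities: np.ndarray, top_k: int = 1, decimals: int = 1):
    """Return only the top-k classes with rounded confidences.

    Model inversion attacks often rely on full, high-precision confidence
    vectors; exposing less information at the API boundary reduces that risk.
    (Illustrative mitigation only, not the framework's prescribed method.)
    """
    order = np.argsort(probabilities)[::-1][:top_k]
    return [(int(cls), round(float(probabilities[cls]), decimals)) for cls in order]

# Example: a 4-class softmax output
probs = np.array([0.02, 0.91, 0.05, 0.02])
print(coarsen_prediction(probs))  # [(1, 0.9)]
```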
AI systems can also help address critical environmental issues. From this perspective, a model’s entire life cycle should be evaluated for its environmental impact. To build a sustainable model, training should consume as little energy as possible, and societal and social concerns should be considered while developing new methods for equitable and fair practices.
Moreover, data collection and processing should be fair and free of bias, as they influence outcomes. While there is no straightforward solution, there are ways to measure, detect, and mitigate bias. Calibration is equally critical, especially in clinical and financial use cases, as it measures the gap between a model’s predicted probabilities and its actual accuracy.
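For instance, calibration can be quantified with the expected calibration error (ECE), which compares predicted confidence with observed accuracy across confidence bins. A minimal sketch, assuming NumPy arrays of top-class confidences, predicted labels, and true labels:

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """ECE: average |accuracy - confidence| per bin, weighted by the
    fraction of samples falling in each confidence bin."""
    confidences = np.asarray(confidences)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for low, high in zip(bins[:-1], bins[1:]):
        mask = (confidences > low) & (confidences <= high)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example: three predictions with their top-class confidences
print(expected_calibration_error(
    confidences=[0.95, 0.70, 0.60],
    predictions=[1, 0, 1],
    labels=[1, 1, 1],
))
```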
Trustworthy AI: The trustworthiness of AI models rests on the credibility and accuracy of their predictions, especially in industries such as healthcare, banking, and insurance, where decisions based on analytical projections can have significant implications.
Let’s take an example to understand the process involved in structuring the framework. Consider an application that takes image data, processes it, and outputs a classification of the image. It is essential to verify that the model is well calibrated, preserves privacy, and is resilient. Metrics that quantify each of these requirements must first be identified, and non-functional testing procedures applied to improve the model. An API should then analyze the model and generate a report on its non-functional aspects.
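A hypothetical sketch of that final, APIfied step is shown below. The endpoint, payload fields, and report structure are illustrative assumptions, not the framework’s actual interface.

```python
# Hypothetical sketch of the "APIfied" evaluation step described above; the
# endpoint, payload fields, and metric names are illustrative assumptions.
import json
import requests

payload = {
    "model_uri": "s3://models/image-classifier-v3",        # hypothetical location
    "test_data_uri": "s3://datasets/validation-images",    # hypothetical location
    "aspects": ["calibration", "privacy", "resilience"],   # non-functional aspects to assess
}

response = requests.post("https://ai-testing.example.com/v1/reports", json=payload)
report = response.json()

# The returned report could summarize each aspect with a metric and a verdict, e.g.:
# {"calibration": {"ece": 0.03, "verdict": "PASS"},
#  "privacy":     {"inversion_risk": "low", "verdict": "PASS"},
#  "resilience":  {"adversarial_accuracy_drop": 0.12, "verdict": "FAIL"}}
print(json.dumps(report, indent=2))
```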
Trust encompasses factors such as human-centricity and explainability. Human-centric values, principles, and perspectives are central to building AI systems, and safeguarding these is key to determining the trustworthiness of a model.
With advances in deep learning and artificial neural networks, data privacy and fundamental rights have become critical considerations when collecting data, training models, and interpreting sensitive information.
Explainability is the degree to which humans can understand the predictions and decisions made by AI models. The model should provide valid reasons for its automated decisions, as newer regulations expect service providers to ensure this. Explainability helps uncover biases and promote fairness. It also brings more transparency, helps identify weak spots, enables debugging, and increases consumer trust in the system.
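One simple, widely used explainability check is permutation feature importance, which measures how much a model’s score degrades when each input feature is shuffled. The sketch below uses scikit-learn on a public dataset purely as an illustration; SHAP or LIME are comparable options, and none of these is claimed to be the framework’s chosen explainer.

```python
# Illustrative explainability check: permutation feature importance.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature and measure how much test accuracy degrades; larger
# drops indicate features the model relies on more heavily.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda pair: pair[1], reverse=True)[:5]:
    print(f"{name}: {score:.3f}")
```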
Robust AI: This forms the framework’s core and is critical to every organization. Models must be robust across factors such as resilience, calibration, bias, performance, and fairness. While adversarial training can help withstand adversarial attacks, it can also degrade model performance, and adversarial defenses do not stop attackers from trying new methods.
Raw data is susceptible to attack from insiders and outsiders. An attacker who gains access to an AI model may be able to reconstruct its training data; these are known as model inversion attacks. A robust model should resist both intentional and unintended manipulation, maintaining a high level of security and accuracy even in the face of threats or changes in input. Robustness should also be evaluated with several metrics rather than a single one.
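As an illustration of robustness evaluation, the sketch below measures accuracy under a simple Fast Gradient Sign Method (FGSM) perturbation in PyTorch. It is one example attack, not the framework’s full robustness suite, and the model and image tensors are assumed to be supplied by the caller.

```python
# Sketch of one resilience check: accuracy under a simple FGSM perturbation.
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, images, labels, epsilon=0.03):
    """Accuracy on inputs perturbed by the Fast Gradient Sign Method."""
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step each pixel in the direction that increases the loss most.
    perturbed = (images + epsilon * images.grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        preds = model(perturbed).argmax(dim=1)
    return (preds == labels).float().mean().item()

# Usage with any classifier and a batch of image tensors scaled to [0, 1]:
# clean_acc = (model(images).argmax(1) == labels).float().mean().item()
# robust_acc = fgsm_accuracy(model, images, labels, epsilon=0.03)
# print(f"accuracy drop under FGSM: {clean_acc - robust_acc:.3f}")
```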
A governing body will help institute common standards and processes.
AI models need a governance panel comprising AI developers, government bodies, industry and consumer group representatives, and legal experts. The panel can be tasked with reviewing datasets to ensure they deliver the requisite model outcomes.
Developing clear standards for the application of AI is vital for an inside-out view of how models are designed and deployed in different contexts. The panel will also help uphold the ethical principles governing AI systems. Overall, the idea is to reduce inequities and provide a lever to drive sustainable, ethical, and robust AI practices across industries.
Additional references:
Johner Institute, Do You Need to Be Aware of ISO/IEC TR 29119-11 on Testing AI/ML Software?, published May 6, 2021. https://www.johner-institute.com/articles/software-iec-62304/and-more/isoiec-tr-29119-11-on-testing-aiml-software/
SoftwareTestingHelp, Software Testing Methodologies For Robust Software Delivery, published December 5, 2022. https://www.softwaretestinghelp.com/types-of-software-testing/
European Commission, Ethics Guidelines For Trustworthy AI: High-Level Expert Group on Artificial Intelligence, published April 8, 2019. https://ec.europa.eu/newsroom/dae/document.cfm?doc_id=60419