Understanding and Analyzing Machine Learning Models with Google’s What-If Tool: Functionality, Operation, and Fairness Assessment

You can click on the Spotify podcast for a TL;DR. The sources for this post were generated using Gemini 2.5 Pro; the podcast was produced with NotebookLM.

1. Introduction

1.1. The Imperative for Transparency and Fairness in Machine Learning

Machine learning (ML) systems are increasingly integrated into the fabric of modern society, driving decisions in critical domains such as finance, healthcare, hiring, and content moderation.1 While these systems offer unprecedented capabilities for automation and prediction, their growing influence raises significant concerns regarding transparency, accountability, and fairness. Opaque “black-box” models can make it difficult to understand why a particular decision was made, hindering debugging, trust, and regulatory compliance.3 More critically, ML models trained on historical data can inadvertently learn and perpetuate existing societal biases, leading to discriminatory outcomes against certain demographic groups.1 There are documented instances where AI systems have exhibited bias related to gender, race, or age in areas like recruitment, credit scoring, and facial recognition, potentially causing significant harm and reinforcing inequalities.6

Building effective and responsible ML systems, therefore, necessitates more than simply optimizing for predictive accuracy. It demands a rigorous process of examination and interrogation, where practitioners act as “detectives” to probe model behavior across diverse inputs and subgroups.4 Understanding how models respond to changes in data, identifying performance disparities across different populations, and assessing alignment with fairness principles are crucial steps.2 The growing focus on AI ethics and regulations, such as the EU AI Act, further underscores the need for tools and methodologies that support the development of fair, transparent, and accountable AI.2

1.2. Introducing the What-If Tool (WIT): A Solution for Interactive Model Probing

Addressing the challenge of understanding ML model performance and behavior, particularly concerning fairness, requires accessible and effective tools. The What-If Tool (WIT) emerged as a significant contribution in this space, developed as part of Google’s People + AI Research (PAIR) initiative.4 PAIR focuses on advancing the research and design of human-centered AI systems, emphasizing interpretability, fairness, and collaboration.3 WIT embodies these principles by providing an open-source, interactive visual interface designed to help practitioners probe, visualize, and analyze trained ML models with minimal or no coding required.4

The core aim of WIT is to offer a simple, intuitive, yet powerful means to explore model behavior.12 It allows users to test model performance in hypothetical scenarios, analyze the influence of different data features, visualize model outputs across various subsets of data, compare multiple models, and, critically, evaluate model performance according to different fairness metrics.11 This focus on code-free analysis represents a deliberate effort to democratize model understanding. By lowering the technical barrier to entry compared to writing custom analysis scripts, WIT enables a broader set of stakeholders—including data scientists, developers, researchers, product managers, and potentially non-programmers—to participate in the crucial process of examining, evaluating, and debugging ML systems.4

1.3. Report Objectives and Scope

This report provides an expert-level, comprehensive analysis of the What-If Tool. It aims to elucidate WIT’s architecture, core functionalities, and operational mechanisms. Particular emphasis is placed on detailing how WIT can be leveraged for assessing fairness concerns and exploring potential biases within machine learning models. The analysis synthesizes information drawn from official documentation, research publications, tutorials, technical guides, and usage examples associated with the tool. The objective is to equip readers with a deep understanding of WIT’s capabilities and limitations, enabling informed decisions about its potential application in responsible AI development workflows.

2. The What-If Tool: An Overview

2.1. Purpose and Core Goals

The What-If Tool (WIT) is fundamentally designed as a visual interface for the interactive probing and understanding of trained machine learning models.10 Its primary purpose is to move beyond aggregate performance metrics and allow users to delve into how and why a model makes specific predictions under various conditions. The core goals driving WIT’s functionality include:

  • Understanding Performance Across Diverse Inputs: Facilitating the examination of model behavior not just on average, but across a wide range of specific inputs and scenarios.4
  • Hypothetical Scenario Testing: Enabling users to ask “what-if” questions by manipulating input data points and observing the resulting changes in model predictions.4
  • Feature Importance Analysis: Helping users analyze the impact and importance of different data features on model outcomes, often through techniques like partial dependence plots.11
  • Comparative Visualization: Allowing visualization of model behavior across different subsets of input data (e.g., demographic groups) and supporting the direct comparison of multiple models trained on the same dataset.11
  • Fairness Assessment: Providing tools to measure and visualize model performance according to various machine learning fairness metrics, particularly for classification tasks.11

2.2. Target Users and Accessibility

WIT is designed for a broad audience involved in the machine learning lifecycle. This includes data scientists who build and train models, ML engineers deploying systems, researchers investigating model properties, and product managers or business stakeholders who need to understand model behavior and its implications.4 A key design principle is accessibility, achieved through its interactive visual interface that minimizes the need for coding during the analysis phase.4 By integrating into common data science environments like notebooks, WIT aims to fit naturally into existing workflows.12

2.3. Integration Platforms and Environment

WIT offers flexibility in its deployment environment. It is primarily designed for use within interactive notebook environments:

  • Jupyter Notebooks 12
  • Google Colaboratory (Colab) Notebooks 12
  • Google Cloud AI Platform Notebooks 12

While initially accessible as a plugin within TensorBoard 12, this integration path has evolved. The core WIT code was moved to its own repository 20, and the TensorBoard plugin is no longer actively maintained as of TensorBoard 2.11, with notebook-based usage or the newer Language Interpretability Tool (LIT) often recommended instead.20 However, using WIT within TensorBoard remains possible if models are served via TensorFlow Serving.12 The shift towards notebook environments aligns WIT more closely with typical interactive data analysis workflows and cloud-based ML platforms. WIT can also be integrated as a plugin in other platforms, such as Hopsworks (Briefing Doc I).

2.4. Supported Models and Data Types

WIT demonstrates significant versatility in the types of models and data it can analyze:

  • Models:
    • TensorFlow Estimators (Briefing Doc I).
    • Models served via TensorFlow Serving (using Classify, Regress, or Predict APIs).12
    • Models deployed on Google Cloud AI Platform, including those built with TensorFlow, XGBoost, and Scikit-learn.16
    • Effectively, any classification or regression model that can be wrapped in a custom Python prediction function.14 This abstraction is key to WIT’s broad applicability beyond the TensorFlow ecosystem. It interacts with models via standardized inputs and prediction outputs, rather than relying on specific internal model structures.
  • Tasks:
    • Binary Classification 4
    • Multi-class Classification 8
    • Regression 4
  • Data:
    • Tabular data.4
    • Image data (images are typically expected as encoded bytes in a feature named, e.g., “image/encoded”).14
    • Text data.16

A core requirement, particularly for non-Cloud AI Platform or non-custom function usage, is that the input data must be formatted as TensorFlow Example protocol buffers, often stored in TFRecord files.12 While wrappers can handle conversion 19, this dependency on the tf.Example format represents an additional integration step for users whose data pipelines use different formats. For Cloud AI Platform integration or custom prediction functions, lists of JSON dictionaries or JSON lists may also be accepted.18
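For teams starting from tabular data, a minimal sketch of this conversion step is shown below: each record becomes a tf.Example proto whose features use the appropriate bytes, float, or int64 list type. The feature names (including the income_label ground-truth feature) are hypothetical.

```python
import tensorflow as tf

def row_to_tf_example(row):
    """Convert one record (a dict of hypothetical feature values) into a tf.Example proto."""
    feature = {}
    for name, value in row.items():
        if isinstance(value, str):
            feature[name] = tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[value.encode("utf-8")]))
        elif isinstance(value, float):
            feature[name] = tf.train.Feature(
                float_list=tf.train.FloatList(value=[value]))
        else:  # remaining values are treated as integers
            feature[name] = tf.train.Feature(
                int64_list=tf.train.Int64List(value=[int(value)]))
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Hypothetical records; the ground-truth label is included so WIT can compute metrics.
examples = [
    row_to_tf_example({"age": 39, "hours_per_week": 40.0, "sex": "Female", "income_label": 1}),
    row_to_tf_example({"age": 52, "hours_per_week": 30.0, "sex": "Male", "income_label": 0}),
]
```

The resulting list of protos can be handed to the notebook configuration described in Section 5, or serialized into TFRecord files for TensorBoard/TF Serving setups.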

3. Navigating the What-If Tool: Interface and Core Functionalities

3.1. Interface Structure: Workspaces, Modules, and Playgrounds

The What-If Tool presents users with a structured interface designed for interactive exploration. Typically, the layout consists of a left-hand panel containing configuration options and interactive “modules,” and a larger right-hand panel serving as a “playground” for visualizing data points, performance metrics, or feature distributions. Navigation between different modes of analysis is primarily handled through three main tabs, each opening a distinct workspace: “Datapoint Editor,” “Performance” (or “Performance and Fairness”), and “Features Overview”.

3.2. The “Datapoint Editor” Workspace

This workspace serves as the default view and is centered around the inspection and manipulation of individual data points. Its key functionalities allow for granular model understanding:

  • Prediction Inspection: Users can see the model’s prediction (e.g., classification label, confidence score, regression value) for every data point loaded into the tool.14 Selecting a point reveals its detailed feature values in the side panel.14
  • Datapoint Editing and Re-inference: This is a cornerstone of WIT’s “what-if” capability. Users can manually edit any feature value for a selected data point directly in the interface, add or remove features, or even upload a new image if working with image data. Crucially, they can then re-run inference on this modified point with a single click and immediately see how the model’s prediction changes.4 A history of edits and resulting inferences for a point is maintained.14 Users can also clone data points for comparative editing.14
  • Counterfactual Exploration: WIT facilitates the discovery of counterfactual examples – data points that are minimally different from a selected point but yield a different prediction from the model.4 It can automatically find the “nearest” counterfactual based on L1 or L2 distance (normalized across features, customizable via code; see the distance sketch at the end of this subsection).14 Comparing a point to its nearest counterfactual side-by-side can illuminate the model’s decision boundaries and feature sensitivities near those boundaries.4
  • Partial Dependence Plots (PDPs): For a selected data point, WIT can automatically generate PDPs.4 These plots visualize how the model’s prediction for that specific point changes as the value of a single feature is systematically varied across its possible range, holding other features constant.14 This provides insight into the local sensitivity of the model to individual features.12
  • Custom Visualizations (Facets Dive Integration): The main panel leverages Facets Dive 4 to display the dataset visually. Users can arrange data points into various custom layouts, such as scatter plots or histograms, using different features for axes, coloring, or faceting.14 Points can be organized based on their feature values or inference results (e.g., predicted label, confidence, correctness, error magnitude), enabling visual discovery of patterns and correlations.14

The combination of direct editing, counterfactual generation, and PDPs within this workspace provides a powerful interactive loop. Users can form hypotheses about model behavior based on aggregate views or specific examples, then immediately test those hypotheses by manipulating individual data points and observing the consequences, fostering a deeper, more intuitive understanding of local model mechanics.
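The counterfactual search mentioned above relies on a simple distance over feature values. The sketch below illustrates a normalized L1 distance between two data points represented as dicts of numeric features; it is an illustration of the idea rather than WIT's internal implementation, and if a custom metric is needed, the config builder's custom-distance hook (whose availability and exact signature should be verified against the installed witwidget version) is where one would be plugged in. All names and values are hypothetical.

```python
def normalized_l1_distance(a, b, feature_ranges):
    """Illustrative normalized L1 distance between two datapoints (not WIT's internal code).

    `a` and `b` are dicts of numeric feature values; `feature_ranges` maps each feature
    name to the (min, max) observed in the dataset so that differently scaled features
    contribute comparably.
    """
    total = 0.0
    for name, (lo, hi) in feature_ranges.items():
        span = (hi - lo) or 1.0  # guard against constant features
        total += abs(a[name] - b[name]) / span
    return total

ranges = {"age": (17, 90), "hours_per_week": (1, 99)}
print(normalized_l1_distance({"age": 39, "hours_per_week": 40},
                             {"age": 41, "hours_per_week": 45}, ranges))
```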

3.3. The “Performance” / “Performance and Fairness” Workspace

This workspace shifts the focus from individual data points to aggregate model performance evaluation across the entire dataset or specific subsets. For binary classification models, this tab is enhanced and labeled “Performance and Fairness” to include specific fairness-related analyses.

  • Performance Slicing: A key capability is the ability to slice the dataset based on the values of one or more features and evaluate model performance metrics separately for each slice.4 This is crucial for understanding if the model performs differently for distinct subgroups within the data (e.g., different demographic groups, different ranges of a numerical feature).8 Performance between slices can be directly compared.12
  • Metric Visualization: The tool displays standard evaluation metrics relevant to the task type. For classification, this can include accuracy, confusion matrices, and precision-recall curves.14 For regression, it might show error distributions or scores.14
  • Threshold Adjustment (Binary Classification): For binary classifiers, users can interactively adjust the classification threshold that determines the boundary between predicting the positive and negative class.4 This adjustment can be applied globally or, significantly, independently for different data slices. As the threshold changes, users can observe the real-time impact on metrics like precision, recall, false positive rates, false negative rates, and derived fairness metrics. This allows for exploration of the trade-offs inherent in choosing an operating point for the classifier.
  • Fairness Optimization Strategies (Binary Classification): Building on threshold adjustment, the “Fairness” module allows users to apply specific fairness optimization strategies.17 This typically involves adjusting thresholds per slice to try to satisfy certain fairness constraints, such as achieving demographic parity or equal opportunity (based on the concepts discussed in Section 4). Users can configure parameters like cost ratios to guide these optimizations. This feature makes the abstract concepts of fairness metrics and post-processing adjustments tangible and explorable.
  • Model Comparison: If two models were provided during setup, WIT enables a side-by-side comparison of their performance and fairness characteristics within this workspace.14 Users can directly see which model performs better overall, or how they differ in their treatment of various data slices or according to fairness metrics.

This workspace bridges the gap between individual point analysis and overall model assessment. By enabling slicing and threshold manipulation, particularly for fairness analysis, it allows users to move beyond single performance numbers and investigate potential disparities and the effects of simple post-hoc interventions.

3.4. The “Features Overview” Workspace

This workspace is dedicated to understanding the characteristics of the input dataset itself.

  • Feature Distribution Visualization: It provides visualizations showing the distribution of values for each feature in the loaded dataset. This often utilizes Facets Overview 14 to display histograms for numerical features and bar charts for categorical features, along with summary statistics (e.g., count, missing values, min/max, mean/median).14
  • Dataset Understanding: The primary purpose is to help users quickly grasp the properties of their data, such as the range and distribution of numerical values, the frequency of different categories, the presence of missing data, and the overall balance of the dataset across different features.4 This can be valuable for identifying potential data quality issues or imbalances before extensive model analysis, or even before model training.16
  • Simplicity: Unlike the other workspaces, the Features Overview typically does not have associated interactive modules for configuration; its main function is displaying the pre-computed feature statistics and distributions in the playground area.

By leveraging the capabilities of Facets Overview, this workspace provides an essential data exploration component within WIT, complementing the model-centric analyses of the other tabs.

4. Leveraging WIT for Fairness Analysis and Bias Mitigation

4.1. The Imperative of Fairness in Machine Learning

The concept of fairness in machine learning centers on the principle that AI systems should treat individuals and groups equitably, avoiding the creation or amplification of discriminatory biases.1 As AI influences critical decisions affecting people’s lives—such as loan approvals, job applications, medical diagnoses, and criminal justice assessments—ensuring fairness is not merely an ethical consideration but often a legal and societal necessity.2 Biased AI systems can perpetuate historical inequalities, lead to disparate impacts on vulnerable populations, erode trust, and incur significant regulatory and reputational risks.1

However, defining and measuring “fairness” is complex and highly context-dependent.22 There is often no single definition that satisfies all perspectives or situations. Several quantitative fairness metrics have been proposed, each capturing a different notion of equity 2:

  • Group Unaware: This approach advocates for ignoring sensitive attributes (like race or gender) entirely during model prediction, aiming for decisions based purely on other “merit” features.22 However, this can be problematic if historical biases are encoded in the seemingly neutral features, leading to indirect discrimination.
  • Statistical Parity (Demographic Parity): Requires that the probability of receiving a positive outcome is the same across different protected groups.2 For example, loan approval rates should be equal for different racial groups. Mathematically, P(Outcome=1∣Group=A)=P(Outcome=1∣Group=B).2 While intuitive, achieving this might require accepting qualified candidates from one group at a lower rate than another if base rates differ, and can sometimes conflict with maximizing overall accuracy.
  • Equal Opportunity: Focuses on fairness for individuals who genuinely qualify for the positive outcome. It requires that the True Positive Rate (TPR, or recall) is equal across groups.2 In other words, among all individuals who should receive a loan (i.e., would repay it), the model approves an equal percentage from each group.22
  • Equalized Odds: A stricter criterion demanding equality of both True Positive Rates (TPR) and False Positive Rates (FPR) across groups.6 This aims to ensure the model makes errors (both false positives and false negatives) at the same rate for all groups.
  • Equal Accuracy: Simply requires that the overall classification accuracy is the same for all groups.22 This might mask underlying issues, such as different types of errors being more prevalent in different groups.

Other metrics like Predictive Parity (equal precision across groups) and Treatment Equality (balancing error rate ratios) also exist.2 Critically, applying and interpreting these metrics requires careful consideration of the specific context, potential trade-offs with accuracy, the limitations of relying on a single metric, and the dynamic nature of fairness over time.2
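To make these definitions concrete, the sketch below computes the per-group quantities that the first three criteria compare: the positive-prediction rate (statistical parity), the TPR (equal opportunity), and the TPR/FPR pair (equalized odds). This is generic illustrative code, not part of WIT, and the arrays are placeholder data.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group rates underlying common fairness criteria (illustrative, not WIT code)."""
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        rates[g] = {
            "positive_rate": yp.mean(),                                      # statistical parity
            "tpr": yp[yt == 1].mean() if (yt == 1).any() else float("nan"),  # equal opportunity
            "fpr": yp[yt == 0].mean() if (yt == 0).any() else float("nan"),  # equalized odds (with TPR)
        }
    return rates

# Placeholder binary labels, predictions, and group membership.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(group_rates(y_true, y_pred, groups))
```

Comparing the per-group values side by side is essentially what WIT's performance slicing surfaces interactively.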

4.2. WIT’s Role in Identifying Potential Bias

The What-If Tool provides several features that directly aid in the diagnosis and identification of potential fairness issues and biases within trained models:

  • Performance Slicing by Sensitive Attributes: This is arguably WIT’s most direct feature for fairness assessment. By using the Performance (& Fairness) workspace, users can segment the dataset based on sensitive features (e.g., race, gender, age group) provided in the data and compare standard performance metrics (accuracy, precision, recall, FPR, confusion matrices) across these slices.4 Observing significant disparities in these metrics between groups is a primary indicator of potential bias and allows for quantitative assessment relative to metrics like Statistical Parity, Equal Opportunity, or Equalized Odds.
  • Visualizing Outcome Distributions: In the Datapoint Editor, the Facets Dive integration allows users to create visualizations (e.g., scatter plots, histograms) where data points are colored by prediction outcome (e.g., approved/denied) and faceted or positioned by sensitive attributes.14 This can provide an intuitive visual representation of whether predictions are disproportionately distributed across different groups.
  • Counterfactual Analysis for Bias Probing: The counterfactual feature offers a powerful micro-level probe for bias.4 If the nearest counterfactual example (one with a different prediction) differs from the original point primarily by a sensitive attribute or a known proxy, it raises a red flag about the model’s reliance on that attribute.4 Similarly, manually editing only a sensitive attribute in the Datapoint Editor and observing whether the prediction flips provides direct evidence of the model’s sensitivity to that feature, which can be indicative of bias.4 This aligns with the concept of counterfactual fairness, which posits that a prediction should ideally remain stable if only a sensitive attribute were changed.24
  • Feature Importance Analysis (via PDPs): While Partial Dependence Plots show local feature sensitivity 4, examining them for sensitive attributes or their proxies can reveal if the model heavily relies on potentially problematic features to make its predictions. High sensitivity might suggest a pathway through which bias influences outcomes.

4.3. WIT’s Role in Exploring Fairness Interventions (Post-Hoc Analysis)

Beyond diagnosis, WIT enables users to interactively explore the potential impact of certain post-processing fairness interventions, primarily through threshold adjustments:

  • Threshold Adjustment for Fairness Optimization: The ability to modify the classification threshold independently for different data slices in the Performance & Fairness tab is a key mechanism for exploring fairness-accuracy trade-offs.4 By observing how metrics change as thresholds are adjusted per group, users can simulate strategies aimed at achieving specific fairness goals (e.g., equalizing TPRs for Equal Opportunity, or balancing TPRs and FPRs for Equalized Odds).4 WIT makes the consequences of these adjustments immediately visible, allowing users to understand the often-complex interplay between different fairness criteria and overall model performance.22 A simplified sketch of this kind of per-group threshold selection appears after this list.
  • Comparing Fair vs. Unfair Models: WIT’s model comparison feature allows users to load and directly compare a baseline model with a model that has undergone some form of bias mitigation (e.g., trained using pre-processing techniques like reweighting, or in-processing methods with fairness constraints).14 This enables a visual and quantitative assessment of the effectiveness of the mitigation strategy in terms of both fairness metrics and overall performance changes across different data slices.
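As a rough illustration of per-slice threshold adjustment, the sketch below picks, for each group, the highest score threshold that still classifies a target fraction of that group's true positives as positive, an equal-opportunity-style post-processing step. It is a simplified illustration, not WIT's internal optimization; the scores, labels, and groups are placeholder data.

```python
import numpy as np

def thresholds_for_target_tpr(scores, y_true, groups, target_tpr=0.8):
    """For each group, choose the highest threshold that still reaches the target TPR.

    Thresholds are applied as `score >= threshold => positive`. Illustrative only.
    """
    thresholds = {}
    for g in np.unique(groups):
        pos_scores = np.sort(scores[(groups == g) & (y_true == 1)])[::-1]  # descending
        k = int(np.ceil(target_tpr * len(pos_scores)))  # positives that must be caught
        thresholds[g] = float(pos_scores[k - 1]) if k > 0 else 1.0
    return thresholds

# Placeholder scores, labels, and group membership.
scores = np.array([0.9, 0.4, 0.7, 0.2, 0.6, 0.3, 0.8, 0.5])
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(thresholds_for_target_tpr(scores, y_true, groups))  # e.g. {'A': 0.7, 'B': 0.6}
```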

4.4. Connecting WIT to Broader Bias Mitigation Strategies

It is important to situate WIT within the broader landscape of bias mitigation techniques. These strategies are often categorized into three main types 2:

  1. Pre-processing: Modifying the training data before model training to remove or reduce bias. Examples include reweighting samples to balance group representation, resampling (oversampling minority groups or undersampling majority groups), or relabeling biased outcomes in the training data.7 A sketch of the reweighting approach follows this list.
  2. In-processing: Incorporating fairness considerations directly into the model training process. This might involve adding fairness constraints to the optimization objective, using regularization terms to penalize unfairness, or employing techniques like adversarial debiasing where a secondary network tries to predict the sensitive attribute from the model’s representations, encouraging the primary model to become invariant to it.2
  3. Post-processing: Adjusting the model’s outputs after training to improve fairness. The most common method is adjusting classification thresholds differently for different groups, as explored in WIT.2 Other methods might involve calibrating scores or modifying predictions directly.
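As an illustration of the first category, the sketch below computes reweighing-style sample weights, w(group, label) = P(group) · P(label) / P(group, label), which make group membership and the label statistically independent in the weighted training data. This illustrates the general reweighting technique; it is not something WIT performs, and the data are placeholders.

```python
import numpy as np

def reweighing_weights(groups, labels):
    """Sample weights that decorrelate group membership and label (illustrative pre-processing)."""
    weights = np.empty(len(labels), dtype=float)
    for g in np.unique(groups):
        for y in np.unique(labels):
            mask = (groups == g) & (labels == y)
            p_joint = mask.mean()
            if p_joint > 0:
                # w(g, y) = P(group=g) * P(label=y) / P(group=g, label=y)
                weights[mask] = ((groups == g).mean() * (labels == y).mean()) / p_joint
    return weights

# Placeholder data: group A is mostly labeled 1, group B mostly 0.
groups = np.array(["A", "A", "A", "B", "B", "B"])
labels = np.array([1, 1, 0, 0, 0, 1])
print(reweighing_weights(groups, labels))  # under-represented (group, label) pairs get weight > 1
```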

WIT primarily functions as a tool for diagnosis (identifying bias through slicing, visualization, and counterfactuals) and for exploring and evaluating post-processing interventions (specifically threshold adjustments). It does not perform pre-processing or in-processing mitigation itself. However, it is invaluable for assessing the results of such interventions by comparing the fairness and performance characteristics of models trained or modified using those techniques against baseline models.

The following table summarizes how specific WIT features facilitate the assessment of key fairness concepts:

Table 1: Key Fairness Concepts and How WIT Facilitates Their Assessment

| Fairness Concept | Brief Definition | How WIT Helps Assess/Explore | Relevant WIT Features | Supporting Sources |
| --- | --- | --- | --- | --- |
| Statistical Parity | Equal probability of positive outcome across groups. | Compare positive prediction rates across slices defined by sensitive attributes. Visualize outcome distributions faceted by group. Adjust thresholds per slice to approach parity. | Performance Slicing, Datapoint Visualization, Threshold Adjustment (Performance/Fairness Tab) | 12 |
| Equal Opportunity | Equal True Positive Rate (TPR) across groups. | Compare TPR (recall) across slices in confusion matrices or performance metrics. Adjust thresholds per slice to equalize TPR. | Performance Slicing (Metrics View), Threshold Adjustment (Performance/Fairness Tab) | 4 |
| Equalized Odds | Equal TPR and equal False Positive Rate (FPR) across groups. | Compare both TPR and FPR across slices. Use threshold adjustment per slice to explore trade-offs in balancing both rates simultaneously. | Performance Slicing (Metrics View), Threshold Adjustment (Performance/Fairness Tab) | 6 |
| Counterfactual Fairness | Model prediction should ideally not change if the sensitive attribute were different. | Generate nearest counterfactuals; check whether the difference lies primarily in a sensitive attribute. Manually edit sensitive attributes and observe prediction stability. | Counterfactual Exploration, Datapoint Editing (Datapoint Editor) | 4 |
| Group Performance Disparities | General differences in accuracy, error rates, or confidence across groups. | Slice data by sensitive attributes and compare overall accuracy, error distribution (e.g., using scatter plots in the Datapoint Editor), and average prediction confidence. Visualize feature distributions. | Performance Slicing, Datapoint Visualization, Features Overview | 8 |

5. Practical Implementation Considerations

5.1. Setting Up WIT in Notebook Environments

Utilizing WIT within notebook environments like Jupyter, Colab, or AI Platform Notebooks involves a relatively standard workflow:

  1. Installation: The witwidget package needs to be installed (e.g., via pip).27 In some environments like AI Platform Notebooks with TensorFlow instances, it might come pre-installed.18
  2. Configuration: The WitConfigBuilder class is the primary tool for configuring the WIT instance.14 This involves providing the dataset to be analyzed and specifying how WIT should obtain predictions from the model(s).
  3. Data Provision: The dataset is typically provided as a Python list of examples. These examples should ideally be tf.Example protocol buffers 14, although for Cloud AI Platform integration or custom prediction functions, lists of JSON dictionaries or lists might also work.18 It is crucial to include the ground truth labels within these examples for performance and fairness analysis.12 The feature name corresponding to the ground truth label should be specified in the configuration.12
  4. Model Connection:
    • For models deployed on Google Cloud AI Platform, use the set_ai_platform_model method, providing the Google Cloud project ID, model name, and model version.16
    • For custom models (e.g., Keras, Scikit-learn running locally or not on AI Platform), use the set_custom_predict_fn method, passing a reference to a Python function that WIT can call to get predictions.19
    • For comparing two models, use methods like set_compare_ai_platform_model or provide a second prediction function/endpoint during configuration.16
  5. Instantiation: Finally, create an instance of the WitWidget class, passing the configured WitConfigBuilder object to it. Displaying this widget object in a notebook cell will render the interactive WIT interface.18

While the interaction within the WIT interface requires no coding, the initial setup, particularly data preparation and model connection for custom models, necessitates some programming effort.
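Pulling these steps together, a minimal notebook setup might look like the sketch below. It assumes a list of tf.Example protos named examples (such as the one built in Section 2.4's sketch, including a ground-truth feature) and a hypothetical prediction function predict_fn; the method names follow the witwidget documentation, but exact signatures should be verified against the installed package version.

```python
# pip install witwidget  (pre-installed on some AI Platform Notebook instances)
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# `examples`: a list of tf.Example protos that include the ground-truth label feature.
# `predict_fn`: a hypothetical function returning per-class scores for each example.
config_builder = (
    WitConfigBuilder(examples)
    .set_custom_predict_fn(predict_fn)
    .set_target_feature("income_label")      # hypothetical ground-truth feature name
    .set_label_vocab(["<=50K", ">50K"])      # display names for the two classes
)

# For a model deployed on Cloud AI Platform, the builder is instead configured with
# set_ai_platform_model(project_id, model_name, version); a second model can be added
# for comparison via the corresponding set_compare_* methods.

WitWidget(config_builder, height=800)  # renders the interactive tool in the notebook cell
```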

5.2. Using WIT with Different Frameworks

WIT’s flexibility allows it to work with models beyond the core TensorFlow ecosystem:

  • TensorFlow: Native integration is smoothest, especially when models are already structured as TF Estimators or served via TensorFlow Serving.12
  • Keras: Keras models can be used within notebooks by writing a wrapper function (a sketch follows this list).19 This function must:
    1. Accept a list of input examples (typically serialized tf.Example protos provided by WIT).
    2. Deserialize or convert these examples into the format expected by the Keras model’s predict method (e.g., NumPy arrays).
    3. Call the Keras model’s predict method.
    4. Format the returned predictions into the structure WIT expects (e.g., lists of scores or probabilities).19
  • XGBoost / Scikit-learn: Similar to Keras, these models can be integrated either by deploying them on Cloud AI Platform (where WIT has direct integration) 16 or by creating a custom Python prediction function wrapper for use in notebooks.19 The wrapper handles the data format conversion and prediction call.
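A minimal sketch of such a wrapper for a Keras binary classifier is shown below, following the four steps listed above. The feature names, their ordering, and the stand-in model are hypothetical, and the expected output structure (a list of per-class score lists for classification) should be confirmed against the witwidget documentation.

```python
import numpy as np
import tensorflow as tf

FEATURE_NAMES = ["age", "hours_per_week"]  # hypothetical numeric features, in model input order

# Stand-in for a trained Keras model; in practice, load or reuse your own.
keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(len(FEATURE_NAMES),)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def custom_predict_fn(examples):
    """Wrapper bridging WIT-provided examples and a Keras binary classifier."""
    rows = []
    for ex in examples:
        # Steps 1-2: accept tf.Example protos (deserializing first if handed over as bytes)
        # and pull out the numeric features in the order the model expects.
        if isinstance(ex, bytes):
            ex = tf.train.Example.FromString(ex)
        feats = ex.features.feature
        rows.append([
            feats[name].float_list.value[0] if feats[name].float_list.value
            else float(feats[name].int64_list.value[0])
            for name in FEATURE_NAMES
        ])
    inputs = np.array(rows, dtype=np.float32)
    # Step 3: run the model; a sigmoid output gives shape (n_examples, 1).
    probs = keras_model.predict(inputs)
    # Step 4: return per-class scores in the list-of-lists structure WIT expects.
    return np.hstack([1.0 - probs, probs]).tolist()
```

The same pattern applies to XGBoost or Scikit-learn models; only the feature conversion and the predict call change.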

The key enabler for this cross-framework compatibility is the abstraction layer provided by custom prediction functions and the Cloud AI Platform integration, which decouples WIT’s frontend analysis capabilities from the specific underlying model implementation.

5.3. Custom Prediction Functions

Custom prediction functions are essential for integrating WIT with models not hosted on TF Serving or Cloud AI Platform, or when custom inference logic is needed.19 They act as a bridge, translating between WIT’s data format (often TF Examples) and the model’s specific input/output requirements.19 These functions can be any standard Python function, offering immense flexibility.14 For instance, they could load a model file, perform pre-processing specific to the model, run inference, and even calculate additional per-datapoint metrics to be displayed alongside predictions within WIT.21

5.4. Data Requirements

Successful use of WIT hinges on providing data in the correct format:

  • Format: As mentioned, tf.Example protocol buffers are the standard format, especially for TensorBoard/TF Serving usage (often stored in TFRecord files) 12 and frequently used in notebook setups.14 Lists of dictionaries or lists might suffice for Cloud AI Platform or custom functions.18
  • Ground Truth: Including the true target labels (ground truth) for each example is essential for calculating performance metrics and enabling fairness analysis features.12
  • Attributions (Optional): WIT can visualize pre-computed feature attributions (e.g., SHAP values). To enable this, the attribution scores for each input feature should be included as separate features within the tf.Example protos, using the naming convention attributions__<original_feature_name>.14 This integration allows WIT to serve as an interface for exploring local model explanations alongside predictions and fairness assessments, enhancing its utility for eXplainable AI (XAI).17
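To illustrate the attribution naming convention, the sketch below copies pre-computed attribution scores (e.g., obtained separately from a library such as SHAP) into a tf.Example alongside the original features; the feature names and scores are placeholders.

```python
import tensorflow as tf

def add_attributions(example, attributions):
    """Attach pre-computed attribution scores using the attributions__<feature> convention.

    `attributions` maps an original feature name to its (placeholder) attribution score.
    """
    for name, score in attributions.items():
        example.features.feature["attributions__" + name].float_list.value.append(float(score))
    return example

example = tf.train.Example()
example.features.feature["age"].int64_list.value.append(39)
example.features.feature["hours_per_week"].float_list.value.append(40.0)
add_attributions(example, {"age": 0.12, "hours_per_week": -0.03})  # placeholder scores
```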

6. Limitations and Future Directions

6.1. Scope of Functionality

It is crucial to understand that the What-If Tool is primarily designed for post-hoc analysis, visualization, and interactive exploration of trained models.13 Its strengths lie in surfacing patterns, enabling “what-if” experimentation, comparing subgroups, and exploring threshold-based fairness adjustments. However, WIT does not perform automated bias detection – it requires users to identify disparities through exploration. Furthermore, it does not implement bias mitigation techniques directly within the model training process (in-processing) or by modifying the training data (pre-processing) [Insight 4.1]. While it allows exploration of post-processing threshold adjustments, the implementation of more complex mitigation strategies occurs outside the tool itself. WIT helps evaluate the results of mitigation but is not the mitigation tool itself.

6.2. The Importance of Context and Human Judgment

WIT provides powerful quantitative lenses through which to examine model behavior and fairness, but it does not eliminate the need for human judgment and contextual understanding.4 Fairness is inherently context-specific, and metrics alone cannot capture the full ethical or societal implications of a model’s deployment.22 Interpreting whether observed statistical disparities constitute actionable unfairness requires domain expertise, consideration of the application’s impact, and ethical reasoning.4 The tool effectively visualizes trade-offs—for example, between different fairness metrics or between fairness and accuracy—but the user must ultimately decide on the appropriate balance based on the specific goals and constraints of their project.2 WIT surfaces evidence and facilitates exploration, but the responsibility for interpretation and decision-making remains firmly with the human user.

6.3. Technical Limitations and Dependencies

While powerful, WIT has practical limitations. The strong reliance on the tf.Example data format, particularly in non-cloud setups, can present an initial hurdle for teams whose data pipelines use different formats [Insight 2.3]. Setting up WIT for custom models requires writing wrapper functions, meaning the “no code” benefit primarily applies to the analysis phase, not necessarily the integration phase for all users [Insight 5.1]. Furthermore, while WIT can handle inference on large datasets, the interactive visualization component, being browser-based, might face performance challenges or rely on sampling when dealing with extremely large numbers of data points displayed simultaneously.

6.4. Evolution and Relation to Other Tools

The field of Responsible AI, including tools for interpretability and fairness, is rapidly evolving.3 WIT, launched around 2018-2019 8, represents a significant contribution but should be viewed within this dynamic landscape. Google itself has introduced newer tools like the Language Interpretability Tool (LIT) 10, which may offer overlapping or complementary functionalities, particularly for NLP models. The ecosystem also includes numerous other open-source libraries and platforms focused on fairness and explainability, such as Fairlearn, AI Fairness 360, Fairness Indicators, SHAP, and others.3 The strategic shift of WIT away from active TensorBoard integration towards notebook and cloud environments also signals its ongoing evolution [Insight 2.1]. Practitioners should consider WIT as one valuable component within a potentially larger toolkit for building and evaluating responsible AI systems.

7. Conclusion

7.1. Summary of WIT’s Contributions

The What-If Tool stands out as a powerful and intuitive visual interface designed to demystify the behavior of trained machine learning models.4 Its core strength lies in enabling interactive exploration with minimal coding during the analysis phase. Key capabilities include detailed dataset visualization and feature analysis using Facets integration 4, granular probing of individual data points through editing, counterfactual generation, and partial dependence plots 4, robust performance analysis across user-defined data slices 12, direct comparison of multiple models 14, and dedicated features for exploring fairness considerations, most notably through interactive threshold adjustment for binary classifiers.4

7.2. WIT’s Role in Responsible AI Development

WIT makes significant contributions to the practice of responsible AI development. By providing accessible tools to visualize performance disparities across subgroups and probe model sensitivity to specific features (including sensitive attributes), it directly facilitates the identification of potential biases and fairness issues.6 Its interactive nature allows practitioners to move beyond static metrics and gain a more intuitive understanding of how their models operate. Furthermore, by lowering the barrier to entry for model analysis, WIT fosters greater transparency and enables more informed, data-driven conversations among diverse stakeholders about model behavior, limitations, and alignment with fairness goals.4 It empowers teams to ask critical “what-if” questions about their models’ impact.

7.3. Final Thoughts

The What-If Tool is a valuable asset for any practitioner committed to developing machine learning systems responsibly. It provides essential capabilities for understanding model behavior in depth, identifying potential fairness concerns, and exploring the trade-offs involved in deploying AI in real-world contexts. While it is not a complete solution for bias mitigation—requiring careful human interpretation, contextual understanding, and integration within broader ethical frameworks and potentially other specialized tools—WIT significantly enhances the ability of developers and researchers to build, evaluate, and debug models with a greater focus on transparency and equity. Its emphasis on visual interaction makes the complex task of model interrogation more accessible and effective, ultimately contributing to the development of more trustworthy and beneficial AI systems.

8. References

(Note: Sources are cited inline by snippet number, along with the accompanying briefing document and FAQs. APA 7th Edition style is approximated for web content and reports; URLs are included where available.)

© 2025 SSR Research and Development. This article is protected by copyright. Proper citation according to APA style guidelines is required for academic and research purposes.