
Captum · Model Interpretability for PyTorch

Captum · Model Interpretability for PyTorch is an open-source, generic library for interpretability research that enables researchers and engineers to understand and benchmark neural network predictions across modalities such as vision and text. Built on PyTorch, Captum supports most PyTorch models with minimal modification and provides a flexible API to implement and evaluate attribution algorithms. The project emphasizes extensibility, reproducibility, and ease of integration into existing PyTorch workflows.


Overview

  • Multi-Modal interpretability: supports interpretability across different data modalities (e.g., vision, text).
  • PyTorch-based: designed to work seamlessly with PyTorch models and workflows.
  • Extensible and open source: generic library that enables researchers to implement and benchmark new attribution algorithms.
  • Reproducible examples: includes tutorials and runnable code snippets to help users get started quickly.

How to Get Started

  1. Install Captum (recommended via conda):
  • conda install captum -c pytorch
  • or via pip: pip install captum
  2. Create a model in PyTorch and switch it to evaluation mode; a small feed-forward network or any custom model works.
  3. Define input and baseline tensors; the baseline (e.g., all zeros) is the reference point that attributions are computed against.
  4. Choose and apply an attribution algorithm (e.g., Integrated Gradients) to compute attributions.
  5. Inspect the results to see which features contributed most to the prediction and to analyze convergence behavior (a runnable sketch appears under Example below).
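
After step 1, a quick import check confirms the installation; a minimal sketch (the printed version strings will vary by environment):

```python
# Sanity check after installation: both packages should import cleanly.
import captum
import torch

print("Captum:", captum.__version__)   # exact version depends on your install
print("PyTorch:", torch.__version__)
```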

Example (Integrated Gradients)

  • Create a toy model and set it to eval mode.
  • Fix randomness for deterministic results.
  • Define input and baseline tensors.
  • Instantiate the attribution algorithm (e.g., IntegratedGradients).
  • Compute attributions and convergence delta.
  • Print or visualize the results.
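
Putting those bullets together, here is a minimal runnable sketch. The toy architecture, tensor shapes, and target class are illustrative assumptions; IntegratedGradients and its attribute(..., return_convergence_delta=True) call are part of Captum's public API.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

torch.manual_seed(123)  # fix randomness for deterministic results

# Toy two-layer network; any PyTorch model can be used the same way.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(3, 3)
        self.relu = nn.ReLU()
        self.lin2 = nn.Linear(3, 2)

    def forward(self, x):
        return self.lin2(self.relu(self.lin1(x)))

model = ToyModel()
model.eval()  # evaluation mode before computing attributions

inputs = torch.rand(2, 3)        # batch of 2 samples, 3 features each
baseline = torch.zeros(2, 3)     # all-zeros reference input

ig = IntegratedGradients(model)
# Attribute the class-0 output score back to the input features.
attributions, delta = ig.attribute(
    inputs, baseline, target=0, return_convergence_delta=True
)
print("IG attributions:", attributions)
print("Convergence delta:", delta)
```

The convergence delta measures how far the summed attributions are from the difference between the model's output at the input and at the baseline; values near zero indicate the integral approximation is accurate.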

Tutorials and Docs

  • Introduction
  • Getting Started
  • Tutorials
  • API Reference

Core Features

  • PyTorch-based interpretability: integrates with existing PyTorch models and training code
  • Supports multiple attribution algorithms (e.g., Integrated Gradients, Saliency, DeepLift) to attribute predictions to input features
  • Works with various modalities (vision, text, etc.)
  • Deterministic workflows with seed control for reproducible results
  • Lightweight API designed for easy experimentation and benchmarking of new methods
  • Comprehensive tutorials and API reference for quick onboarding
  • Open-source and extensible: easily implement and benchmark new attribution methods
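
As a sketch of that shared interface (the model and data below are illustrative, with Saliency standing in for the other algorithms), different attribution methods can be swapped behind the same constructor-plus-attribute pattern:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients, Saliency

torch.manual_seed(0)  # seed control for reproducible results
model = nn.Sequential(nn.Linear(3, 3), nn.ReLU(), nn.Linear(3, 2)).eval()
inputs = torch.rand(2, 3)

# Every algorithm follows the same pattern: construct with the model,
# then call attribute() on the inputs. Swapping methods leaves the
# surrounding benchmarking code unchanged.
for algo in (IntegratedGradients(model), Saliency(model)):
    attrs = algo.attribute(inputs, target=0)
    print(type(algo).__name__, attrs.abs().sum(dim=1))
```

This uniformity is what makes benchmarking a new attribution method straightforward: implementing the same attribute interface drops it into existing evaluation code.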