Interpretability

Subhadip Mitra

Dec 28 '25

I Trained Probes to Catch AI Models Sandbagging

#llm #interpretability #agents #machinelearning

6 min read

Arvind SundaraRajan

Oct 18 '25

Peeking Under the Hood: Unlock AI Secrets Beyond Activations by Arvind Sundararajan

#machinelearning #ai #xai #interpretability

2 min read

Ali Khan

Aug 11 '25

AI Frontiers: Advances in Efficient, Robust, and Universal Machine Learning – Synthesizing Key Themes from August 2025 a

#machinelearning #efficiency #robustness #interpretability

8 min read

Ali Khan

May 13 '25

Frontiers in Computer Vision: Interpretability, Efficiency, Robustness, and Unified Learning in the Era of Deep AI Advan

#computervision #interpretability #neurosymbolicai #multimodallearning

8 min read

Giovanna

Feb 4 '25

Klarity – Open-source tool to analyze uncertainty/entropy in LLM output (github.com/klara-research)

#opensource #ai #deepseek #interpretability

1 min read

Ahsan Mangal 👨🏻‍💻

Apr 22 '23

Choosing a Suitable Model for Our Data within the Machine Learning Development Process

#machinelearning #interpretability #modelselecting #dataanalysis

3 min read

Andreas Messalas for Code4Thought

Oct 3 '19

Using explanations for finding bias in black-box models

#machinelearning #interpretability #fairness #bias

6 min read

DEV Community
