Index of /images
| Name | Last modified | Size |
|---|---|---|
| A-generative-benchmark-creation-framework-for-detecting-common-data-table-versions.png | 2026-06-11 10:43 | 14K |
| A-Generative-Benchmark-Creation-Framework.png | 2025-05-16 22:08 | 79K |
| A-Primer-on-the-Inner-Workings-of-Transformer-based-Language-Models.png | 2026-06-11 10:43 | 95K |
| A-survey-on-mechanistic-interpretability-for-multi-modal-foundation-models.png | 2026-06-11 10:43 | 149K |
| Abhinav-Bhatele.jpg | 2025-11-07 21:01 | 348K |
| Activation-space-interventions-can-be-transferred-between-large-language-models.png | 2026-06-11 10:43 | 35K |
| Activation-Steering-via-Generative-Causal-Mediation.png | 2026-06-11 10:43 | 232K |
| ADAG-Automatically-Describing-Attribution-Graphs.png | 2026-06-11 10:43 | 90K |
| adam.jpg | 2025-09-29 23:45 | 119K |
| Alexander-Rush.jpg.jpeg | 2025-05-16 22:08 | 22K |
| Annotating-the-Chain-of-Thought-A-Behavior-Labeled-Dataset-for-AI-Safety.png | 2026-06-11 10:43 | 58K |
| apple-touch-icon.png | 2025-05-16 22:08 | 38K |
| arjun.jpeg | 2025-05-16 22:08 | 33K |
| Aurojit-Panda.png | 2025-05-16 22:08 | 638K |
| Back-Attention-Understanding-and-Enhancing-Multi-Hop-Reasoning-in-Large-Language-Models.png | 2026-06-11 10:43 | 6.0K |
| Benchmarking-Mental-State-Representations-in-Language-Models.png | 2025-05-16 22:08 | 198K |
| Black-Box-Access-is-Insufficient-for-Rigorous-AI-Audits.png | 2026-06-11 10:43 | 69K |
| BlueGlass-A-Framework-for-Composite-AI-Safety.png | 2026-06-11 10:43 | 112K |
| Brett-Bode-Crop.png | 2025-05-16 22:08 | 2.0M |
| Brett-Bode.jpg | 2025-05-16 22:08 | 7.0M |
| byron.jpeg | 2025-05-16 22:08 | 24K |
| Can-SAEs-reveal-and-mitigate-racial-biases-of-LLMs-in-healthcare.png | 2026-06-11 10:43 | 66K |
| Can-you-map-it-to-English-The-Role-of-Cross-Lingual-Alignment-in-the-Multilingual-Performance-of-LLMs.png | 2026-06-11 10:43 | 50K |
| carla.jpeg | 2025-05-16 22:08 | 23K |
| Circuit-Tracer-A-New-Library-for-Finding-Feature-Circuits.png | 2026-06-11 10:43 | 242K |
| Comgra-A-Tool-for-Analyzing-and-Debugging-Neural-Networks.png | 2026-06-11 10:43 | 83K |
| Compassionate-AI-Design-Governance-and-Use.png | 2026-06-11 10:43 | 183K |
| Competition-dynamics-shape-algorithmic-phases-of-in-context-learning.png | 2026-06-11 10:43 | 161K |
| Constructive-Circuit-Amplification-Improving-Math-Reasoning-in-LLMs-via-Targeted-Sub-Network-Updates.png | 2026-06-11 10:43 | 70K |
| Counting-Hypothesis-Potential-Mechanism-of-In-Context-Learning.png | 2026-06-11 10:43 | 761K |
| david.jpeg | 2025-05-16 22:08 | 27K |
| Decomposing-Theory-of-Mind-How-Emotional-Processing-Mediates-ToM-Abilities-in-LLMs.png | 2026-06-11 10:43 | 83K |
| DeltaProduct-Improving-State-Tracking-in-Linear-RNNs-via-Householder-Products.png | 2026-06-11 10:43 | 30K |
| DFWe-Efficient-Knowledge-Distillation-of-Fine-tuned-Whisper-Encoder-for-Speech-Emotion-Recognition.png | 2026-06-11 10:43 | 117K |
| Discovering-Forbidden-Topics-in-Language-Models.png | 2026-06-11 10:43 | 166K |
| Disentangling-meaning-from-language-in-LLM-based-machine-translation.png | 2026-06-11 10:43 | 26K |
| Disentangling-Recall-and-Reasoning-in-Transformer-Models-through-Layer-wise-Attention-and-Activation-Analysis.png | 2026-06-11 10:43 | 72K |
| Do-Language-Models-Use-Their-Depth-Efficiently.png | 2026-06-11 10:43 | 35K |
| Do-Natural-Language-Descriptions-of-Model-Activations-Convey-Privileged-Information.png | 2026-06-11 10:43 | 25K |
| Do-Transformers-Use-their-Depth-Adaptively-Evidence-from-a-Relational-Reasoning-Task.png | 2026-06-11 10:43 | 164K |
| DreamReader-An-Interpretability-Toolkit-for-Text-to-Image-Models.png | 2026-06-11 10:43 | 108K |
| eDIF-A-European-Deep-Inference-Fabric-for-Remote-Interpretability-of-LLM.png | 2026-06-11 10:43 | 189K |
| Elucidating-Mechanisms-of-Demographic-Bias-in-LLMs-for-Healthcare.png | 2026-06-11 10:43 | 64K |
| Emergence-of-Hierarchical-Emotion-Organization-in-Large-Language-Models.png | 2026-06-11 10:43 | 67K |
| Emergence-of-Hierarchical-Emotion-Representations.png | 2025-05-16 22:08 | 314K |
| emma.jpg | 2025-05-16 22:08 | 458K |
| Evaluating-Open-Source-Sparse-Autoencoders-on-Disentangling-Factual-Knowledge-in-GPT-2-Small.png | 2026-06-11 10:43 | 89K |
| Even-Heads-Fix-Odd-Errors-Mechanistic-Discovery-and-Surgical-Repair-in-Transformer-Attention.png | 2026-06-11 10:43 | 130K |
| Evidence-of-Learned-Look-Ahead-in-a-Chess-Playing-Neural-Network.png | 2026-06-11 10:43 | 169K |
| Explaining-Neural-Networks-with-Reasons.png | 2026-06-11 10:43 | 28K |
| Explaining-the-Explainer-Understanding-the-Inner-Workings-of-Transformer-based-Symbolic-Regression-Models.png | 2026-06-11 10:43 | 57K |
| Exploring-the-Limits-of-Probes-for-Latent-Representation-Edits-in-GPT-Models.png | 2026-06-11 10:43 | 98K |
| Fine-Grained-Analysis-of-Shared-Syntactic-Mechanisms-in-Language-Models.png | 2026-06-11 10:43 | 80K |
| Fluid-Representations-in-Reasoning-Models.png | 2026-06-11 10:43 | 76K |
| Friends-and-Grandmothers-in-Silico-Localizing-Entity-Cells-in-Language-Models.png | 2026-06-11 10:43 | 316K |
| From-Directions-to-Cones-Exploring-Multidimensional-Representations-of-Propositional-Facts-in-LLMs.png | 2026-06-11 10:43 | 135K |
| From-Prompts-to-Patches-A-Vocabulary-for-Bridging-Interpretability-and-Interaction.png | 2026-06-11 10:43 | 9.8K |
| Gabriele-Sarti.jpg | 2026-06-11 10:43 | 16K |
| Heman-Shakeri.png | 2025-11-07 21:01 | 439K |
| Hidden-Pieces-An-Analysis-of-Linear-Probes-for-GPT-Representation-Edits.png | 2026-06-11 10:43 | 76K |
| Hidden-Pieces-An-Analysis-of-Linear-Probes.png | 2025-05-16 22:08 | 332K |
| Hierarchical-Latent-Structures-in-Data-Generation-Process-Unify-Mechanistic-Phenomena-across-Scale.png | 2026-06-11 10:43 | 32K |
| How-do-Llamas-process-multilingual-text-A-latent-exploration-through-activation-patching.png | 2026-06-11 10:43 | 34K |
| How-do-llms-persuade-linear-probes-can-uncover-persuasion-dynamics-in-multi-turn-conversations.png | 2026-06-11 10:43 | 60K |
| How-Open-Must-Language-Models-be-to-Enable-Reliable-Scientific-Inference.png | 2026-06-11 10:43 | 116K |
| ICLR-In-Context-Learning-of-Representations.png | 2026-06-11 10:43 | 97K |
| If-open-source-is-to-win-it-must-go-public.png | 2026-06-11 10:43 | 120K |
| In-Context-Algebra.png | 2026-06-11 10:43 | 96K |
| In-Context-Learning-Without-Copying.png | 2026-06-11 10:43 | 20K |
| In-Which-Areas-of-Technical-AI-Safety-Could-Geopolitical-Rivals-Cooperate.png | 2026-06-11 10:43 | 34K |
| Incremental-Sentence-Processing-Mechanisms-in-Autoregressive-Transformer-Language-Models.png | 2026-06-11 10:43 | 89K |
| Incremental-Sentence-Processing-Mechanisms.png | 2025-05-16 22:08 | 395K |
| Inference-Time-Decomposition-of-Activations-ITDA-A-Scalable-Approach-to-Interpreting-Large-Language-Models.png | 2026-06-11 10:43 | 139K |
| Insights-into-a-radiology-specialised-multimodal-large-language-model-with-sparse-autoencoders.png | 2026-06-11 10:43 | 293K |
| InterPLM-Discovering-Interpretable-Features-in-Protein-Language-Models-via-Sparse-Autoencoders.png | 2026-06-11 10:43 | 259K |
| Interplm-Discovering-Interpretable-Features-in-Protein-LMs.png | 2025-05-16 22:08 | 1.7M |
| Interpreto-An-Explainability-Library-for-Transformers.png | 2026-06-11 10:43 | 129K |
| jaden.jpeg | 2025-05-16 22:08 | 34K |
| Jailbreak-Strength-and-Model-Similarity-Predict-Transferability.png | 2026-06-11 10:43 | 89K |
| Jailbreak-transferability-emerges-from-shared-representations.png | 2026-06-11 10:43 | 89K |
| jon.jpeg | 2025-05-16 22:08 | 29K |
| Jonelle-Bradshaw.jpg.jpeg | 2025-11-07 21:01 | 1.2M |
| Katie-Cumiskey.jpg | 2025-05-16 22:08 | 3.2M |
| Katina-Michael.jpg | 2025-05-16 22:08 | 33K |
| Kelsey-Badger.jpg | 2025-05-16 22:08 | 71K |
| LangFIR-Discovering-Sparse-Language-Specific-Features-from-Monolingual-Data-for-Language-Steering.png | 2026-06-11 10:43 | 60K |
| Language-Models-Represent-Beliefs-of-Self-and-Others.png | 2026-06-11 10:43 | 94K |
| Language-Models-use-Lookbacks-to-Track-Beliefs.png | 2026-06-11 10:43 | 35K |
| Language-Models-Use-Trigonometry-to-Do-Addition.png | 2026-06-11 10:43 | 20K |
| Language Models Use Trigonometry to Do Addition.png | 2025-05-16 22:08 | 305K |
| Large-Language-Models-Share-Representations-Latent.png | 2025-05-16 22:08 | 191K |
| Large-Language-Models-Share-Representations-of-Latent-Grammatical-Concepts-Across-Typologically-Diverse-Languages.png | 2026-06-11 10:43 | 45K |
| Learning-a-Generative-Meta-Model-of-LLM-Activations.png | 2026-06-11 10:43 | 99K |
| Learning-State-Tracking-from-Code-Using-Linear-RNNs.png | 2026-06-11 10:43 | 41K |
| LLMs-Process-Lists-With-General-Filter-Heads.png | 2026-06-11 10:43 | 103K |
| Localized-Cultural-Knowledge-is-Conserved-and-Controllable-in-Large-Language-Models.png | 2026-06-11 10:43 | 55K |
| Locating-and-Editing-Factual-Associations-in-Mamba.png | 2026-06-11 10:43 | 61K |
| Mathematical-Modeling-of-Common-Pool-Resources-A-Comprehensive-Review-of-Bioeconomics-Strategic-Interaction-and-Complex-Adaptive-Systems.png | 2026-06-11 10:43 | 186K |
| Measuring-Mechanistic-Independence-Can-Bias-Be-Removed-Without-Erasing-Demographics.png | 2026-06-11 10:43 | 102K |
| Michael-Simeone.png | 2025-05-16 22:08 | 152K |
| michael.jpg | 2025-09-29 23:45 | 168K |
| Model-Medicine-A-Clinical-Framework-for-Understanding-Diagnosing-and-Treating-AI-Models.png | 2026-06-11 10:43 | 402K |
| Multi-property-Steering-of-Large-Language-Models-with-Dynamic-Activation-Composition.png | 2026-06-11 10:43 | 69K |
| nairr-pilot-logo.svg | 2026-06-11 10:43 | 1.5K |
| ncsa.png | 2025-05-16 22:08 | 20K |
| ndif-fellowship.jpg | 2025-05-16 22:08 | 161K |
| ndif-png.png | 2026-06-11 10:43 | 22K |
| ndif-workshop-1.jpg | 2025-05-16 22:08 | 1.8M |
| NDIF_Acr_color.png | 2025-05-16 22:08 | 43K |
| NDIF_color.png | 2025-05-16 22:08 | 64K |
| NDIF_system.png | 2025-05-16 22:08 | 831K |
| newamerica.png | 2025-05-16 22:08 | 33K |
| New_Venture_Fund.png | 2025-11-07 21:01 | 72K |
| nnsight-png.png | 2026-06-11 10:43 | 7.6K |
| nnterp-A-Standardized-Interface-for-Mechanistic-Interpretability-of-Transformers.png | 2026-06-11 10:43 | 106K |
| northeastern-red-square.png | 2025-05-16 22:08 | 22K |
| northeastern.svg | 2025-05-16 22:08 | 4.3K |
| nsf.png | 2025-05-16 22:08 | 47K |
| NSF_NDIF_color.png | 2025-05-16 22:08 | 151K |
| Overcoming-Sparsity-Artifacts-in-Crosscoders-to-Interpret-Chat-Tuning.png | 2026-06-11 10:43 | 53K |
| Patch-Explorer-Interpreting-Diffusion-Models-through-Interaction.png | 2026-06-11 10:43 | 293K |
| Patches-of-Nonlinearity-Instruction-Vectors-in-Large-Language-Models.png | 2026-06-11 10:43 | 79K |
| Penzai-Treescope-A-Toolkit-for-Interpreting-Visualizing-and-Editing-Models-As-Data.png | 2026-06-11 10:43 | 193K |
| pit.png | 2025-05-16 22:08 | 14K |
| pitun.png | 2025-05-16 22:08 | 15K |
| Polo-Chau.jpg | 2025-11-07 21:01 | 86K |
| Prem-Trivedi.jpg | 2025-05-16 22:08 | 85K |
| Prisma-An-Open-Source-Toolkit-for-Mechanistic-Interpretability-in-Vision-and-Video.png | 2026-06-11 10:43 | 41K |
| Provable-Low-Frequency-Bias-of-In-Context-Learning-of-Representations.png | 2026-06-11 10:43 | 110K |
| Punctuation-and-Predicates-in-Language-Models.png | 2026-06-11 10:43 | 98K |
| PyHealth-20-A-Comprehensive-Open-Source-Toolkit-for-Accessible-and-Reproducible-Clinical-Deep-Learning.png | 2026-06-11 10:43 | 406K |
| pyvene-A-Library-for-Understanding-and-Improving-PyTorch-Models-via-Interventions.png | 2026-06-11 10:43 | 125K |
| Representation-Shattering-in-Transformers-A-Synthetic-Study-with-Knowledge-Editing.png | 2026-06-11 10:43 | 108K |
| Representation-Shattering-in-Transformers.png | 2025-05-16 22:08 | 151K |
| reward-lens-A-Mechanistic-Interpretability-Library-for-Reward-Models.png | 2026-06-11 10:43 | 49K |
| Robustly-identifying-concepts-introduced-during-chat-fine-tuning-using-crosscoders.png | 2026-06-11 10:43 | 62K |
| Sarah-Wiegreffe.jpeg | 2025-11-07 21:01 | 1.3M |
| Securing-External-Deeper-than-black-box-GPAI-Evaluations.png | 2026-06-11 10:43 | 83K |
| Separating-Tongue-from-Thought-Activation-Patching-Reveals-Language-Agnostic-Concept-Representations-in-Transformers.png | 2026-06-11 10:43 | 273K |
| Separating-Tongue-From-Thought-Activation-Patching.png | 2025-05-16 22:08 | 212K |
| Signatures-of-human-like-processing-in-Transformer-forward-passes.png | 2026-06-11 10:43 | 32K |
| Sparse-Autoencoders-for-Sequential-Recommendation-Models-Interpretation-and-Flexible-Control.png | 2026-06-11 10:43 | 375K |
| Sparse-Autoencoders-Reveal-Temporal-Difference-Learning-in-Large-Language-Models.png | 2026-06-11 10:43 | 103K |
| Steering-Fine-Tuning-Generalization-with-Targeted-Concept-Ablation.png | 2025-05-16 22:08 | 261K |
| Steering-Large-Language-Models-for-Machine-Translation-Personalization.png | 2026-06-11 10:43 | 129K |
| Steven-Piantadosi.jpg.JPG | 2025-05-16 22:08 | 8.6M |
| Steven-Piantadoso.png | 2025-05-16 22:08 | 2.3M |
| Structured-In-Context-Task-Representations.png | 2026-06-11 10:43 | 124K |
| Superposition-as-Lossy-Compression-Measure-with-Sparse-Autoencoders-and-Connect-to-Adversarial-Vulnerability.png | 2026-06-11 10:43 | 66K |
| SymTorch-A-Framework-for-Symbolic-Distillation-of-Deep-Neural-Networks.png | 2026-06-11 10:43 | 111K |
| TDHook-A-Lightweight-Framework-for-Interpretability.png | 2026-06-11 10:43 | 36K |
| The-Curious-Case-of-Factual-Mis-Alignment-between-LLMs-Short-and-Long-Form-Answers.png | 2026-06-11 10:43 | 130K |
| The-Dual-Route-Model-of-Induction.png | 2026-06-11 10:43 | 18K |
| The-Geometry-of-Refusal-in-Large-Language-Models-Concept-Cones-and-Representational-Independence.png | 2026-06-11 10:43 | 178K |
| The-Quest-for-the-Right-Mediator-Surveying-Mechanistic-Interpretability-for-NLP-Through-the-Lens-of-Causal-Mediation-Analysis.png | 2026-06-11 10:43 | 191K |
| The-Truthfulness-Spectrum-Hypothesis.png | 2026-06-11 10:43 | 113K |
| Thomas-Dietterich.jpg | 2025-11-07 21:01 | 1.0M |
| Timothy-Beal.jpg | 2025-11-07 21:01 | 736K |
| Token-Erasure-as-a-Footprint-of-Implicit-Vocabulary-Items-in-LLMs.png | 2026-06-11 10:43 | 9.1K |
| Transformer-See-Transformer-Do-Copying-as-an-Intermediate-Step-in-Learning-Analogical-Reasoning.png | 2026-06-11 10:43 | 244K |
| Triggers-Hijack-Language-Circuits-A-Mechanistic-Analysis-of-Backdoor-Behaviors-in-Large-Language-Models.png | 2026-06-11 10:43 | 16K |
| Understanding-How-CodeLLMs-MisPredict-Types-with-Activation-Steering.png | 2026-06-11 10:43 | 145K |
| What-needs-to-go-right-for-an-induction-head-A-mechanistic-study-of-in-context-learning-circuits-and-their-formation.png | 2026-06-11 10:43 | 90K |
| When-and-How-Does-CLIP-Enable-Domain-and-Compositional-Generalization.png | 2026-06-11 10:43 | 96K |
| When-Meanings-Meet-Investigating-the-Emergence-and-Quality-of-Shared-Concept-Spaces-during-Multilingual-Language-Model-Training.png | 2026-06-11 10:43 | 72K |
| Youngbok-Hong.jpg | 2025-11-07 21:01 | 61K |