Ivan Ternovtsii, Yurii Bilak
Create an interpretability toolkit that probes Mixture-of-Experts (MoE) expert activations for causal semantic meaning, so that researchers can identify which tokens 'trigger' which experts.
Suggested repo: ExpertVision
"Look under the hood of your Mixture-of-Experts: which expert does what?"
Estimated effort: 60h
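
As a starting point, the "which tokens trigger which experts" question can be sketched as a routing tally. The snippet below is a minimal illustration only, not the proposed toolkit: the function names (`top_k_experts`, `expert_token_counts`) and the toy router logits are hypothetical, and in practice the logits would be captured from a real model's router layer. Note that such a tally is purely observational; supporting a causal claim would additionally require an intervention, such as ablating an expert and measuring the change in output.

```python
from collections import Counter, defaultdict


def top_k_experts(logits, k=1):
    """Return the indices of the k largest router logits (ties go to the lower index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


def expert_token_counts(tokens, router_logits, k=1):
    """Tally, per expert, how often each token was routed to it."""
    if len(tokens) != len(router_logits):
        raise ValueError("need one row of router logits per token")
    counts = defaultdict(Counter)
    for token, logits in zip(tokens, router_logits):
        for expert in top_k_experts(logits, k):
            counts[expert][token] += 1
    return counts


# Toy data (hypothetical): 5 tokens routed among 3 experts.
tokens = ["the", "cat", "the", "def", "return"]
router_logits = [
    [2.0, 0.1, -1.0],
    [0.3, 1.5, 0.2],
    [1.8, 0.0, 0.4],
    [-0.5, 0.2, 2.2],
    [0.0, 0.1, 1.9],
]

counts = expert_token_counts(tokens, router_logits, k=1)
for expert in sorted(counts):
    # Show the tokens most frequently routed to this expert.
    print(expert, counts[expert].most_common(3))
```

Keeping the tally keyed by expert makes it straightforward to list the most frequent tokens for each expert; raising `k` covers models that route every token to more than one expert.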