Implement a counterfactual routing mechanism in a sparse mixture-of-experts (MoE) model. This lets researchers test whether dormant experts, meaning experts the router rarely or never selects, can be usefully activated for rare tasks.
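
One possible reading of the task, sketched in plain Python: run the usual top-k router, then re-run the same input with a chosen expert forced into the selected set, and compare the two outputs. Everything named here (`top_k_route`, `counterfactual_route`, `moe_forward`, `forced_expert`, `forced_gate`) is a hypothetical name introduced for illustration, and the eviction and gate-weighting rules are assumptions rather than a prescribed method. Experts are toy scalar functions so the sketch stays self-contained.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Factual routing: pick the k highest-logit experts, renormalize their gates."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return idx, softmax([logits[i] for i in idx])


def counterfactual_route(
    logits: Sequence[float],
    k: int,
    forced_expert: int,
    forced_gate: Optional[float] = None,
) -> Tuple[List[int], List[float]]:
    """Counterfactual routing: guarantee `forced_expert` is among the k selected.

    Assumption: if the expert was not chosen, it replaces the lowest-ranked
    factual pick. If `forced_gate` is given, the forced expert receives exactly
    that gate weight and the remaining experts share the rest in proportion to
    their softmax weights; otherwise gates come from the original logits.
    """
    idx, _ = top_k_route(logits, k)
    if forced_expert not in idx:
        idx = idx[:-1] + [forced_expert]  # evict the weakest factual pick
    if forced_gate is None:
        return idx, softmax([logits[i] for i in idx])
    if not 0.0 <= forced_gate <= 1.0:
        raise ValueError("forced_gate must lie in [0, 1]")
    others = [i for i in idx if i != forced_expert]
    if not others:  # k == 1: the forced expert takes the full gate
        return [forced_expert], [1.0]
    other_w = softmax([logits[i] for i in others])
    weights = [(1.0 - forced_gate) * w for w in other_w]
    return others + [forced_expert], weights + [forced_gate]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    logits: Sequence[float],
    k: int,
    forced_expert: Optional[int] = None,
    forced_gate: Optional[float] = None,
) -> float:
    """Gate-weighted sum of the selected experts' outputs."""
    if forced_expert is None:
        idx, w = top_k_route(logits, k)
    else:
        idx, w = counterfactual_route(logits, k, forced_expert, forced_gate)
    return sum(wi * experts[i](x) for i, wi in zip(idx, w))


# Toy setup: expert 2 has a very low router logit, so it is "dormant".
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_logits = [2.0, 1.0, -3.0, 0.5]

factual = moe_forward(3.0, experts, router_logits, k=2)
counterfactual = moe_forward(
    3.0, experts, router_logits, k=2, forced_expert=2, forced_gate=0.5
)  # 0.5 * (3 + 1) + 0.5 * (3 * 3) = 6.5
```

Comparing `factual` with `counterfactual` on examples from a rare task (for instance, by task loss) is one way to probe whether the dormant expert helps. The `forced_gate` option is there because a dormant expert's own logit would otherwise give it a near-zero gate weight, which would make the intervention nearly invisible in the output.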