arXiv · 1d ago

When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling

Yuanhang Li


Analysis

Viral velocity: low

Implementation gap: YES

Novelty: 7/10

Category: paper

Topics

rl, inference, optimization

Opportunity Brief

Build a testbed that compares the stability of deep reinforcement learning (DRL) agents trained under dynamic versus static reward regimes. This lets developers check whether their reward design, rather than the agent itself, is the source of instability.
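A minimal sketch of what such a comparison could look like, under heavy assumptions: a toy tabular value learner stands in for a DRL agent, the three "arms" and their objective values are hypothetical, and stability is measured as the number of times the greedy action changes. None of the names or numbers come from the paper.

```python
# Toy sketch: a tabular value learner stands in for a DRL agent (assumption).
# Each arm yields a fixed pair of objective values; all numbers are hypothetical.
ARMS = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]


def static_weights(step):
    """Static regime: the same objective weights at every step."""
    return (0.5, 0.5)


def switching_weights(step, period=200):
    """Dynamic regime: the weights swap every `period` steps."""
    return (0.9, 0.1) if (step // period) % 2 == 0 else (0.1, 0.9)


def count_policy_flips(weight_fn, steps=2000, lr=0.1):
    """Train value estimates and count how often the greedy arm changes."""
    q = [0.0] * len(ARMS)
    prev_greedy = None
    flips = 0
    for t in range(steps):
        w = weight_fn(t)
        # Simplification: every arm's estimate is updated each step, which
        # keeps the run deterministic. A real agent sees only its own action.
        for a, (obj_a, obj_b) in enumerate(ARMS):
            reward = w[0] * obj_a + w[1] * obj_b
            q[a] += lr * (reward - q[a])
        greedy = max(range(len(ARMS)), key=q.__getitem__)
        if prev_greedy is not None and greedy != prev_greedy:
            flips += 1
        prev_greedy = greedy
    return flips


if __name__ == "__main__":
    print("static flips:   ", count_policy_flips(static_weights))
    print("switching flips:", count_policy_flips(switching_weights))
```

In this toy setting the static regime settles on one arm and stays there, while the switching regime changes its greedy arm after every weight swap. A real testbed would replace the learner with an actual DRL agent and the flip count with whatever stability metric fits the task.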

Suggested repo: rl-stable

"Does your reward function cause instability? Find out with this testbed."

Estimated effort: 45h