I've always found curriculum learning incredibly hard to tune and calibrate reliably (even more so than many other RL approaches!). Reward scales and horizon lengths can vary across tasks of different difficulty; it's hard to explore policy space effectively (keeping a multimodal distribution over strategies before the policy overfits to the small problems); and catastrophic forgetting shows up when mixing curriculum levels or when introducing new ones too late.
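On the forgetting point specifically, one heuristic I've used (my own sketch, not something from the post — the function name and the `rehearsal_prob` value are my assumptions) is to keep replaying earlier levels with some fixed probability instead of training only on the hardest unlocked level:

```python
import random

def sample_level(max_unlocked, rehearsal_prob=0.3, rng=random):
    """Train mostly on the hardest unlocked level, but revisit earlier
    levels with probability `rehearsal_prob` to limit forgetting."""
    if max_unlocked > 0 and rng.random() < rehearsal_prob:
        # Rehearsal: pick a uniformly random earlier level (0..max_unlocked-1).
        return rng.randrange(max_unlocked)
    return max_unlocked

# With 5 levels unlocked (indices 0..4), roughly 70% of draws hit level 4
# and the remaining ~30% are spread over levels 0..3.
random.seed(0)
counts = [0] * 5
for _ in range(10_000):
    counts[sample_level(4)] += 1
```

It doesn't address the reward-scale mismatch at all, but in my experience it's a cheap first thing to try before reaching for a full hyperparameter sweep.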
Does any reader (or the author) have good heuristics for these? Or is it still so problem-dependent that hyperparameter search, hoping to find something that works in spite of these challenges, remains the go-to?