Intrinsic Reward Policy Optimization for Sparse-Reward Environments
TL;DR: We propose Intrinsic Reward Policy Optimization (IRPO), a novel framework that uses a surrogate policy gradient to address the credit assignment problem and sample inefficiency in sparse-reward environments.






