5. Optimizing reward distribution through long action chains

5.1. Diminishing reward
    - Experiment 1 - Straight Corridor
    - Experiment 2 - Deceptive Corridor
