Here is the code:

You should be able to use it on a single .bk2 or a whole directory of them:

python3 ./scripts/map-paths.py ./results/jerk-agentv11/bk2/
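Accepting either a single file or a directory comes down to a small path check. Here is a minimal sketch of that idea (not the actual script); the function name collect_bk2_files is hypothetical and only the standard library is used:

```python
from pathlib import Path


def collect_bk2_files(target):
    """Return a sorted list of .bk2 paths for a single file or a directory.

    Hypothetical helper for illustration; not taken from map-paths.py.
    """
    path = Path(target)
    if path.is_dir():
        # Only look at the top level of the directory, not subdirectories.
        return sorted(path.glob("*.bk2"))
    if path.is_file() and path.suffix == ".bk2":
        return [path]
    # Anything else (missing path, wrong extension) yields nothing to replay.
    return []
```

Each returned path can then be handed to whatever replay routine draws the map.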

This script has gone through a couple of iterations but still performs pretty poorly (blame my limited Python knowledge), so please let me know if you have suggestions and I will update it. I also didn’t see a way to get the scenario while replaying, so for now the level map is hardcoded. Ideally the script would have all of the level maps and pick the appropriate one for each file it reads.

Other Tooling Improvements

local_evaluation.sh

While on the topic of tooling I also updated my local_evaluation.sh script:

There are a couple minor improvements to note here:

I use a variable for the tag name, so I only have to specify it once, when I call the script from the command line.

There is now a million timestep limit, just like the real contest execution, so my program will stop at the correct time.

All of my results are stored in directories per their version tag.

The agent that the evaluation uses is copied over to the results directory and committed to git with the tag it was used on. No more running an agent and forgetting what the code looked like!

Reading the logs

I didn’t have a good idea from running locally if I was making improvements, other than watching the replays and seeing if Sonic mostly made it to the end or not. While running the contest locally, two CSVs get created as the agent runs: a log.csv and a monitor.csv. log.csv looks like this:

1000,4.046873092651367

2000,7.72090220451355

3000,11.31653642654419

4000,14.871755599975586

5000,18.917137384414673

6000,23.270551919937134

7000,26.892014980316162

8000,30.42851948738098

9000,33.96245861053467

It only records the timestep count and how many seconds the agent has been running. It can be useful for figuring out how long your agent has been running, or how long it has to go, but not much else. monitor.csv, on the other hand, records the reward and number of timesteps that each episode achieved, along with the wall clock time at which it finished. This seemed really useful for seeing if agents were improving over time, or if they were not getting better at all. Armed with almost no Python knowledge, I made this plot script to visualize the data. Suggestions welcome!
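For anyone who wants to try something similar, here is a minimal sketch of the data preparation behind such a plot (not the actual plot script). It assumes monitor.csv has a header row naming the columns r (reward), l (episode length) and t (wall clock time), possibly preceded by a comment line starting with "#"; check your own file, since that layout is an assumption. The function names are hypothetical.

```python
import csv


def read_monitor(lines):
    """Parse monitor.csv lines into (reward, length, time) tuples.

    Assumes a header row with columns r, l, t and skips '#' comment lines.
    """
    data_lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(data_lines)
    return [
        (float(row["r"]), int(float(row["l"])), float(row["t"]))
        for row in reader
    ]


def rolling_mean(values, window):
    """Mean of the last `window` values at each index (fewer at the start)."""
    means = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            # Drop the value that just fell out of the window.
            total -= values[i - window]
        means.append(total / min(i + 1, window))
    return means
```

Passing an open file to read_monitor and plotting the rolling mean of the rewards against the episode index smooths out the noise between individual episodes, which makes a trend easier to see.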

I found this pretty interesting. It was easy to tell when I was exploiting too much, because the reward would stay pretty low while most episodes timed out:

Here is a better balance, where the pattern of exploiting more over time becomes more apparent, with rewards that generally get higher and episodes that get shorter:

Making those plots was super helpful for understanding how the agent was behaving, but not as useful in helping me decide which agents were performing the best and might be scored higher in the contest. To find the total reward for an agent, I made another script that averages all of the runs:

This one takes in the tag name I want to see the reward for and prints out how close the agent is to completion as well as the current average score:

$ python3 ./scripts/calc_reward.py jerk-agentv12

99.900000% done, reward: 6970.131268
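The calculation behind that output is short. Here is a minimal sketch of it (not the actual calc_reward.py), assuming progress is measured as total timesteps played against the one million timestep limit and the score is the mean episode reward. The function name summarize and the episode numbers are made up for illustration.

```python
def summarize(episodes, timestep_limit=1_000_000):
    """Return (percent_done, mean_reward) for (reward, length) pairs."""
    if not episodes:
        return 0.0, 0.0
    total_steps = sum(length for _, length in episodes)
    mean_reward = sum(reward for reward, _ in episodes) / len(episodes)
    percent_done = 100.0 * total_steps / timestep_limit
    return percent_done, mean_reward


# Made-up episodes: (reward, timesteps used). Real values would come
# from the reward and length columns of monitor.csv.
episodes = [(7000.0, 400_000), (6900.0, 350_000), (7100.0, 249_000)]
percent_done, mean_reward = summarize(episodes)
print("%f%% done, reward: %f" % (percent_done, mean_reward))
```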

While Sonic runs

I also changed up the logging output of my agent. Instead of logging when a solution is replayed or when backtracking occurs, I now print the percentage complete and my run’s score at the beginning of each episode.

I do that with an addition of the following lines in my main while loop:
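As a rough sketch of where that logging sits in a loop like this (not the agent’s actual code): run_episode below is a hypothetical stand-in that returns random rewards and episode lengths, and the score is assumed to be the mean episode reward so far.

```python
import random

TIMESTEP_LIMIT = 1_000_000  # same limit as the contest evaluation


def run_episode(rng):
    """Hypothetical stand-in for playing one episode: (reward, steps)."""
    return rng.uniform(0, 9000), rng.randint(500, 4500)


def main(seed=0, limit=TIMESTEP_LIMIT):
    rng = random.Random(seed)
    total_steps = 0
    rewards = []
    lines = []
    while total_steps < limit:
        # Report progress before each episode starts.
        mean = sum(rewards) / len(rewards) if rewards else 0.0
        lines.append(
            "%f%% done, reward: %f" % (100.0 * total_steps / limit, mean)
        )
        print(lines[-1])
        reward, steps = run_episode(rng)
        rewards.append(reward)
        total_steps += steps
    return lines
```

Calling main(limit=20_000) gives a short run for trying it out. Because the line is printed at the top of the loop, the first one always reads 0% with a reward of 0.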