Buffer for Prioritized Experience Replay

Prioritized experience replay samples important transitions more frequently. Transitions are prioritized by the magnitude of their temporal difference (TD) error.

We sample transition $i$ with probability

$$P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha}$$

where $\alpha$ is a hyper-parameter that determines how much prioritization is used, with $\alpha = 0$ corresponding to the uniform case.

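We use proportional prioritization $p_i = |\delta_i| + \epsilon$, where $\delta_i$ is the temporal difference error for transition $i$ and $\epsilon$ is a small positive constant that keeps every transition's sampling probability above zero.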
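As a minimal sketch of proportional prioritization, the snippet below turns a batch of TD errors into sampling probabilities proportional to $p_i^\alpha$. The function name `sampling_probabilities` and the default values `alpha=0.6` and `eps=1e-6` are assumptions chosen for illustration, not values taken from this text.

```python
import numpy as np


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Return probabilities proportional to (|delta_i| + eps) ** alpha.

    `alpha` and `eps` defaults are illustrative assumptions.
    """
    # Proportional priority: p_i = |delta_i| + eps
    priorities = np.abs(np.asarray(td_errors, dtype=np.float64)) + eps
    # Raise to alpha; alpha = 0 makes every entry 1, i.e. uniform sampling
    scaled = priorities ** alpha
    # Normalize so the probabilities sum to one
    return scaled / scaled.sum()


td_errors = [0.5, -2.0, 0.0, 1.0]
probs = sampling_probabilities(td_errors)
uniform = sampling_probabilities(td_errors, alpha=0.0)
```

Here the transition with the largest absolute TD error receives the highest probability, the transition with zero error still has a small non-zero chance of being drawn, and `alpha=0.0` gives every transition the same probability.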

We correct the bias introduced by prioritized replay with importance-sampling (IS) weights

$$w_i = \bigg(\frac{1}{N} \frac{1}{P(i)}\bigg)^\beta$$

which fully compensate for the non-uniform sampling when $\beta = 1$. We normalize the weights by $1/\max_i w_i$ for stability. Unbiased updates matter most for convergence near the end of training, so we increase $\beta$ towards 1 as training progresses.
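The weight computation can be sketched as follows, assuming the usual form of the IS weight, $(N \cdot P(i))^{-\beta}$, divided by the largest weight. The helper names `importance_weights` and `beta_schedule`, the linear shape of the schedule, and the starting value `beta_start=0.4` are assumptions made for this example.

```python
import numpy as np


def importance_weights(probs, beta):
    """Return IS weights (N * P(i)) ** -beta, normalized by the largest weight."""
    probs = np.asarray(probs, dtype=np.float64)
    n = len(probs)
    weights = (n * probs) ** (-beta)
    # Dividing by the maximum keeps every weight in (0, 1]
    return weights / weights.max()


def beta_schedule(step, total_steps, beta_start=0.4):
    """Linearly anneal beta from beta_start to 1 (an assumed schedule)."""
    fraction = min(1.0, step / total_steps)
    return beta_start + fraction * (1.0 - beta_start)


weights = importance_weights([0.1, 0.2, 0.7], beta=1.0)
```

With $\beta = 1$ the rarely sampled transition (probability 0.1) gets weight 1, and the frequently sampled one (probability 0.7) is scaled down to 1/7, which offsets how much more often it is drawn.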

Binary Segment Trees

We use a binary segment tree to efficiently calculate $\sum_{k=1}^{i} p_k^\alpha$, the cumulative sum of priorities, which is needed to sample. We use a second binary segment tree to find $\min_i p_i^\alpha$, which is needed for $1/\max_i w_i$, since the largest weight belongs to the transition with the smallest priority. A min-heap could also be used for the minimum.
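To make the role of these two quantities concrete, here is a plain NumPy sketch that computes them in linear time; the segment trees provide the same answers in logarithmic time per query. The function names `sample_index` and `max_weight` are hypothetical, and the weight formula $(N \cdot P(i))^{-\beta}$ is assumed.

```python
import numpy as np


def sample_index(scaled_priorities, u):
    """Return the smallest i whose cumulative sum exceeds u, for 0 <= u < total."""
    cumulative = np.cumsum(scaled_priorities)
    # side="right" maps u equal to a boundary onto the next transition
    return int(np.searchsorted(cumulative, u, side="right"))


def max_weight(scaled_priorities, beta):
    """Largest IS weight, obtained from the smallest scaled priority."""
    scaled = np.asarray(scaled_priorities, dtype=np.float64)
    p_min = scaled.min() / scaled.sum()
    return (len(scaled) * p_min) ** (-beta)


scaled = [1.0, 2.0, 3.0, 4.0]  # stands in for p_i ** alpha
picked = [sample_index(scaled, u) for u in (0.0, 1.0, 5.9, 9.99)]
```

Drawing `u` uniformly from `[0, total)` and calling `sample_index` selects transition $i$ with probability proportional to its scaled priority, because each transition owns an interval of that length on the cumulative axis.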

This is how a binary segment tree works for sum; it is similar for minimum. Let $x_i$ be the list of $N$ values we want to represent. Let $b_{i,j}$ be the $j^{\text{th}}$ node of the $i^{\text{th}}$ row in the binary tree, with $j$ counted from 0 within each row. That is, the two children of node $b_{i,j}$ are $b_{i+1,2j}$ and $b_{i+1,2j + 1}$.

The leaf nodes on row $D = \left\lceil {1 + \log_2 N} \right\rceil$ hold the values of $x$. Every other node keeps the sum of its two child nodes. So the root node keeps the sum of the entire array of values. The two children of the root node keep the sum of the first half of the array and the sum of the second half of the array, and so on.

The number of nodes in row $i$ is

$$N_i = 2^{i-1}$$

This is one more than the total number of nodes in all rows above $i$. So we can use a single array $a$ to store the tree, where

$$b_{i,j} \rightarrow a_{N_i + j}$$

Then the child nodes of $a_i$ are $a_{2i}$ and $a_{2i + 1}$. That is,

$$a_i = a_{2i} + a_{2i + 1}$$

This way of maintaining binary trees is very easy to program. Note that the array $a$ is indexed from 1, so the root is $a_1$.
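The array layout described above can be sketched as a small sum tree. This is an illustrative version, not the buffer's actual code: the class name `SumTree`, its method names, and the restriction to a power-of-two capacity are assumptions made to keep the example short.

```python
class SumTree:
    """Sum segment tree stored in one array, with the root at index 1."""

    def __init__(self, capacity):
        # Assumption: capacity is a power of two, so the leaves fill one row
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Index 0 is unused; leaves live at capacity .. 2 * capacity - 1
        self.nodes = [0.0] * (2 * capacity)

    def update(self, idx, value):
        """Set leaf idx to value and refresh the sums on the path to the root."""
        pos = idx + self.capacity
        self.nodes[pos] = value
        pos //= 2
        while pos >= 1:
            # Each parent is the sum of its two children
            self.nodes[pos] = self.nodes[2 * pos] + self.nodes[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all leaves, kept at the root."""
        return self.nodes[1]

    def find_prefix_sum_idx(self, prefix_sum):
        """Return the smallest leaf index whose cumulative sum exceeds prefix_sum."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if self.nodes[left] > prefix_sum:
                # The target lies inside the left subtree
                pos = left
            else:
                # Skip the left subtree and search the right one
                prefix_sum -= self.nodes[left]
                pos = left + 1
        return pos - self.capacity


tree = SumTree(4)
for i, value in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, value)
```

Both `update` and `find_prefix_sum_idx` visit one node per row, so each costs time logarithmic in the capacity. A minimum tree follows the same layout, with `min` in place of the addition and unused leaves initialized to infinity rather than zero.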