Stage 1: Defining the Environment

The Task

Very simply, I want to know the best action in order to get a piece of paper into a bin (trash can) from any position in a room. I can throw the paper in any direction or move one step at a time.

Although this is simple for a human, who can judge the location of the bin by eyesight and has a huge amount of prior knowledge about distance, a robot has to learn from nothing.

This defines the environment, where the probability of a successful throw is calculated from the direction in which the paper is thrown and the current distance from the bin.

For example, in the image below we have three people labelled A, B and C. A and B both throw in the correct direction but person A is closer than B and so will have a higher probability of landing the shot.

Person C is closer than person B but throws in completely the wrong direction and so will have a very low probability of hitting the bin. It may seem illogical that person C would throw in this direction but, as we will show later, an algorithm has to try a range of directions first to figure out where the successes are, and it has no visual guide as to where the bin is.

Task Environment Example

To create the environment in Python, we map the diagram onto two dimensions, x and y, and use bearing mathematics to calculate the angles thrown. We use normalised integer x and y values, each bounded by -10 and 10.
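As a rough illustration of this state space, the sketch below enumerates every integer position in the room. The names `GRID_MIN`, `GRID_MAX` and `states` are my own for this example and are not taken from the notebook.

```python
from itertools import product

GRID_MIN, GRID_MAX = -10, 10  # bounds on x and y stated in the text

# Every integer (x, y) position in the room is one possible state.
states = list(product(range(GRID_MIN, GRID_MAX + 1), repeat=2))

print(len(states))          # 441 positions (21 x 21)
print((-5, -5) in states)   # True
```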

Environment Mapped to 2-d Space

Environment Probabilities

The probability of a successful throw is relative to the distance and direction in which it is thrown. Therefore, we need to calculate two measures:

The distance the current position is from the bin

The difference between the angle at which the paper was thrown and the true direction to the bin

Distance Measure

As shown in the plot above, the position of person A is set to be (-5,-5). This is their current state, and their distance from the bin can be calculated using the Euclidean distance measure:

For the final calculations, we normalise this and reverse the value so that a high score indicates that the person is closer to the target bin:

Because both dimensions are bounded between -10 and 10, the maximum possible distance the person could be from the bin is sqrt(100 + 100) = sqrt(200). Therefore our distance score for person A is:
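The distance calculation can be sketched in a few lines of Python. I am assuming here that the bin sits at the origin (0, 0), which is consistent with the maximum distance of sqrt(200) quoted above; the variable names are only illustrative.

```python
import math

bin_pos = (0, 0)       # assumption: bin at the origin
person_a = (-5, -5)    # position of person A from the text

# Euclidean distance between person A and the bin: sqrt(50), roughly 7.07
distance = math.hypot(bin_pos[0] - person_a[0], bin_pos[1] - person_a[1])

# Farthest possible distance inside the -10..10 square: sqrt(200)
max_distance = math.hypot(10, 10)

# Normalise and reverse, so a higher score means closer to the bin
distance_score = 1 - distance / max_distance

print(round(distance, 2))        # 7.07
print(round(distance_score, 3))  # 0.5
```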

Direction Measure

Person A then has a decision to make: do they move, or do they throw in a chosen direction? For now, let's imagine they choose to throw the paper; their first throw is at 50 degrees and their second at 60 degrees from due north. The direction of the bin from person A can be calculated by simple trigonometry:

Therefore, the first throw is 5 degrees off the true direction and the second is 15 degrees.
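One way to reproduce these numbers is with `math.atan2`, which I use here instead of a hand-written arctangent because it copes with all four quadrants. This is a sketch that again assumes the bin is at the origin; `true_bearing` and `offsets` are names chosen for the example.

```python
import math

bin_x, bin_y = 0, 0    # assumption: bin at the origin
x, y = -5, -5          # person A

# atan2(east offset, north offset) gives the angle clockwise from due north,
# which is the bearing convention used in the text.
true_bearing = math.degrees(math.atan2(bin_x - x, bin_y - y)) % 360

# How far each throw is from the true direction to the bin
offsets = {throw: abs(throw - true_bearing) for throw in (50, 60)}

print(round(true_bearing, 1))                        # 45.0
print({t: round(d, 1) for t, d in offsets.items()})  # {50: 5.0, 60: 15.0}
```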

If we consider that good throws are bounded by 45 degrees either side of the actual direction (i.e. not throwing the wrong way), then we can use the following to calculate how good the chosen direction is. Any direction beyond the 45 degree bounds will produce a negative value and be mapped to a probability of 0:

Both are fairly close but their first throw is more likely to hit the bin.
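A linear score that matches this description (1 for a perfect throw, 0 at the edge of the window, negative beyond it) can be sketched as below. The exact formula is my reading of the text, and `direction_score` is a hypothetical helper name.

```python
WINDOW_DEG = 45  # good throws lie within 45 degrees either side of the bin


def direction_score(degrees_off):
    """Score 1 for a perfect throw, falling linearly to 0 at the window edge."""
    score = (WINDOW_DEG - abs(degrees_off)) / WINDOW_DEG
    # Beyond the window the raw score is negative; clipping it here has the
    # same effect as mapping the final probability to 0.
    return max(0.0, score)


print(round(direction_score(5), 3))    # 0.889 (first throw)
print(round(direction_score(15), 3))   # 0.667 (second throw)
print(direction_score(100))            # 0.0   (thrown the wrong way)
```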

Probability Calculation

We therefore calculate our probability of a successful throw to be relative to both these measures:
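As a sketch, assuming the two measures are combined by simply multiplying them together (the text only says the probability is relative to both), the two throws of person A work out as follows. The scores are the ones derived above for a distance score of 0.5 and throws that are 5 and 15 degrees off.

```python
distance_score = 0.5                          # person A at (-5, -5)
direction_scores = {50: 40 / 45, 60: 30 / 45}  # throws 5 and 15 degrees off

# Assumption: probability = distance score x direction score, floored at 0
probabilities = {
    throw_deg: max(0.0, distance_score * score)
    for throw_deg, score in direction_scores.items()
}

for throw_deg, p in probabilities.items():
    print(throw_deg, round(p, 3))
# 50 0.444
# 60 0.333
```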

Creating a Generalised Probability Function

Although the previous calculations were fairly simple, some considerations need to be taken into account when we generalise these and begin to consider that the bin or current position are not fixed.

In our previous example, person A is south-west of the bin and therefore the angle was a simple calculation, but if we applied the same calculation to, say, a person placed north-east of the bin, the result would be incorrect. Furthermore, because the bin can be placed anywhere, we first need to find where the person is relative to the bin, not just the origin, and then use this to establish the angle calculation required.

This is summarised in the diagram below where we have generalised each of the trigonometric calculations based on the person’s relative position to the bin:

Angle Calculation Rules

With this diagram in mind, we create a function that calculates the probability of a throw's success given only the current position relative to the bin and the direction thrown.

We then calculate the bearing from the person to the bin following the previous figure and calculate the score bounded within a +/- 45 degree window. Throws that are closest to the true bearing score higher whilst those further away score less; anything more than 45 degrees (or less than -45 degrees) off is negative and is then set to a zero probability.

Lastly, the overall probability is related to both the distance and direction given the current position as shown before.

Note: I have chosen 45 degrees as the boundary but you may choose to change this window, or you could manually scale the probability calculation to weight the distance or direction measure differently.

We re-calculate the previous examples and find the same results as expected.
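A minimal sketch of such a function is shown below. It is not the notebook's code: I use `math.atan2` in place of the per-quadrant rules from the diagram, normalise the distance by the farthest room corner from the bin (which equals sqrt(200) when the bin is at the origin), multiply the two scores, and treat standing on the bin as a certain success. All of these are assumptions, and `throw_probability` and `bearing_to_bin` are hypothetical names.

```python
import math

GRID_MIN, GRID_MAX = -10, 10
WINDOW_DEG = 45.0  # half-width of the "good throw" window; change to taste


def bearing_to_bin(x, y, bin_x, bin_y):
    """Compass bearing in degrees (0 = north, clockwise) from (x, y) to the bin."""
    return math.degrees(math.atan2(bin_x - x, bin_y - y)) % 360


def throw_probability(x, y, throw_deg, bin_x=0, bin_y=0):
    """Estimated success probability of a throw from (x, y) at throw_deg."""
    if (x, y) == (bin_x, bin_y):
        return 1.0  # assumption: standing on the bin always succeeds

    # Distance score: normalise by the farthest room corner from the bin.
    corners = [(cx, cy) for cx in (GRID_MIN, GRID_MAX) for cy in (GRID_MIN, GRID_MAX)]
    max_dist = max(math.hypot(cx - bin_x, cy - bin_y) for cx, cy in corners)
    distance_score = 1 - math.hypot(bin_x - x, bin_y - y) / max_dist

    # Smallest angular difference, so 350 and 10 degrees count as 20 apart.
    diff = abs((throw_deg - bearing_to_bin(x, y, bin_x, bin_y) + 180) % 360 - 180)
    direction_score = (WINDOW_DEG - diff) / WINDOW_DEG

    # Negative values (throws outside the window) are set to zero.
    return max(0.0, distance_score * direction_score)


print(round(throw_probability(-5, -5, 50), 3))  # 0.444
print(round(throw_probability(-5, -5, 60), 3))  # 0.333
print(throw_probability(5, 5, 45))              # 0.0, thrown away from the bin
```

Because the bearing comes from `atan2`, the same function works for a person north-east of the bin or for a bin that is not at the origin, for example `throw_probability(5, 5, 225)` or `throw_probability(0, 0, 90, bin_x=3, bin_y=0)`.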

Plotting Probabilities for Each State

Now that we have this as a function, we can easily calculate and plot the probabilities of all points in our 2-d grid for a fixed throwing direction.

The probabilities are defined by the angle window we set in the previous function. Currently this is 45 degrees, but it can be reduced or increased if desired and the results will change accordingly. We may also want to scale the probability differently for distances.

For example, the probability when the paper is thrown at a 180 degree bearing (due South) for each x/y position is shown below.
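The values behind such a plot can be produced with a short loop over the grid. The sketch below uses a compact stand-in for the probability function with the bin fixed at the origin; the multiplication of the two scores and the rule for standing on the bin are my assumptions, and `probability_grid` is a hypothetical helper.

```python
import math

BIN = (0, 0)  # assumption: bin fixed at the origin for this plot


def throw_probability(x, y, throw_deg):
    """Compact stand-in for the probability function described earlier."""
    if (x, y) == BIN:
        return 1.0  # assumption: standing on the bin always succeeds
    dx, dy = BIN[0] - x, BIN[1] - y
    distance_score = 1 - math.hypot(dx, dy) / math.hypot(10, 10)
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    diff = abs((throw_deg - bearing + 180) % 360 - 180)
    return max(0.0, distance_score * (45 - diff) / 45)


def probability_grid(throw_deg):
    """Probability for every integer (x, y) state at one throwing direction."""
    return {
        (x, y): throw_probability(x, y, throw_deg)
        for x in range(-10, 11)
        for y in range(-10, 11)
    }


grid = probability_grid(180)     # throw due south
print(round(grid[(0, 5)], 3))    # 0.646, directly north of the bin
print(grid[(0, -5)])             # 0.0, directly south of the bin
```

Calling `probability_grid` for a range of directions gives one grid per direction, and each grid can be passed to whichever plotting library you prefer to draw a heat map.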

Animated Plot for All Throwing Directions

To demonstrate this further, we can iterate through a number of throwing directions and create an interactive animation. The code becomes a little complex, and you can always simply use the previous code chunk and change the “throw_direction” parameter manually to explore different directions. However, the animation helps to explore the probabilities and can be found in the Kaggle notebook.