
So after reading and thinking about it more, this is my explanation (thanks to Software Foundations):

The key confusion for me was the meaning of $P[e/x]$, which replaces every free occurrence of $x$ in $P$ with $e$. Wherever you see the symbol $x$, you literally remove it and put $e$ in its place. For example, if $P = (x+y+1)$, then $P[e/x] = (e+y+1)$; notice how $x$ has literally disappeared from $P[e/x]$. What we want is that, once we perform the assignment

$$ x := e $$

the statement holds with $x$ in place of $e$. So if the rule is:

$$ \{ P[e/x] \} x:= e \{ P \}$$

then what we want is for the statement under consideration to be true when we plug in $x$ for $e$. Before the code runs, we have $P[e/x]$. Then we run the assignment, and the occurrences of $e$ that the substitution put there can be read as $x$ again. That must be true when the code that ran was an assignment: $x$ now holds the value of $e$ (as evaluated before the assignment), so you can remove those $e$'s and put $x$ back.
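To make "plug in $x$ for $e$" concrete, here is a minimal Python sketch, assuming a hypothetical tuple representation of expressions (nothing here comes from Software Foundations). `subst` computes $P[e/x]$, and `valid_triple` brute-forces $\{pre\}\ x := e\ \{post\}$ over all small integer states, which is evidence rather than a proof. The example uses $e = x + 1$ to show why $e$ must be evaluated in the state *before* the assignment.

```python
from itertools import product

# Hypothetical representation for this sketch:
#   ("var", name) | ("const", n) | (op, lhs, rhs) with op a key of OPS.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "=": lambda a, b: a == b,
    "and": lambda a, b: a and b,
}

def subst(p, x, e):
    """Return p[e/x]: every occurrence of variable x in p replaced by e.
    There are no binders here, so every occurrence is free."""
    if p == ("var", x):
        return e
    if p[0] in ("var", "const"):
        return p
    op, lhs, rhs = p
    return (op, subst(lhs, x, e), subst(rhs, x, e))

def evaluate(p, state):
    """Evaluate expression/predicate p in state (a dict: name -> int)."""
    if p[0] == "var":
        return state[p[1]]
    if p[0] == "const":
        return p[1]
    op, lhs, rhs = p
    return OPS[op](evaluate(lhs, state), evaluate(rhs, state))

def valid_triple(pre, x, e, post, names, values=range(-3, 4)):
    """Brute-force test of {pre} x := e {post} on every state that gives
    the listed variables values from `values` (evidence, not a proof)."""
    for vals in product(values, repeat=len(names)):
        state = dict(zip(names, vals))
        if evaluate(pre, state):
            # Run x := e; e is evaluated in the OLD state.
            new_state = {**state, x: evaluate(e, state)}
            if not evaluate(post, new_state):
                return False
    return True

# P is  x + y = 1,  and the assignment is  x := x + 1.
X, Y = ("var", "x"), ("var", "y")
P = ("=", ("+", X, Y), ("const", 1))
e = ("+", X, ("const", 1))

pre = subst(P, "x", e)                           # (x + 1) + y = 1
print(valid_triple(pre, "x", e, P, ["x", "y"]))  # True
print(valid_triple(P, "x", e, P, ["x", "y"]))    # False
```

The second check fails because reusing $P$ unchanged as the precondition ignores that $x$ now holds $x + 1$; only the substituted precondition $P[e/x]$ works.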

That's the explanation of the abstract concept. Let's (shamelessly) borrow the Software Foundations (SF) example:

> `{{ Y = 1 }} X ::= Y {{ X = 1 }}`
>
> In English: if we start out in a state where the value of Y is 1 and we assign Y to X, then we'll finish in a state where X is 1. That is, the property of being equal to 1 gets transferred from Y to X.

Another useful paragraph from SF:

> Similarly, in `{{ Y + Z = 1 }} X ::= Y + Z {{ X = 1 }}` the same property (being equal to one) gets transferred to X from the expression Y + Z on the right-hand side of the assignment. More generally, if a is any arithmetic expression, then `{{ a = 1 }} X ::= a {{ X = 1 }}` is a valid Hoare triple.
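Read from right to left, the rule *computes* the precondition from the postcondition by substitution. A minimal sketch with Python's standard `ast` module (Python 3.9+ for `ast.unparse`) reproduces the SF triples above; `Subst` and `wp_assign` are hypothetical names, and Python's `==` stands in for the `=` of the assertions.

```python
import ast
import copy

class Subst(ast.NodeTransformer):
    """Replace every occurrence of the variable `var` with the expression `expr`."""

    def __init__(self, var, expr):
        self.var = var
        self.expr = expr

    def visit_Name(self, node):
        if node.id == self.var:
            # Fresh copy per occurrence; the copy is not visited again,
            # so an assignment like X := X + 1 does not loop.
            return copy.deepcopy(self.expr)
        return node

def wp_assign(post, var, expr):
    """Precondition that the assignment rule gives for  var := expr {post},
    i.e. post[expr/var], written in Python syntax."""
    tree = ast.parse(post, mode="eval")
    replacement = ast.parse(expr, mode="eval").body
    return ast.unparse(Subst(var, replacement).visit(tree).body)

print(wp_assign("X == 1", "X", "Y"))          # Y == 1
print(wp_assign("X == 1", "X", "Y + Z"))      # Y + Z == 1
print(wp_assign("X == 1", "X", "a"))          # a == 1
print(wp_assign("2 * X == 2", "X", "Y + Z"))  # 2 * (Y + Z) == 2
```

Working on a syntax tree rather than on strings is what keeps the substitution honest: the last call shows that `ast.unparse` inserts the parentheses needed to preserve the structure of $e$.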

Addendum, based on a great observation in the comments:

> It is important to recognize that the rule $\{P[e/x]\}\ x := e\ \{P\}$ allows both the expression $e$ and the variable $x$ to occur in the postcondition. In other words, the inference rule allows you to deduce a postcondition in which any subset of the occurrences of $e$ in the precondition have been replaced with $x$, including none of them. Moreover, it allows you to deduce all of these different postconditions simultaneously.
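The comment's point can be checked mechanically. The sketch below (hypothetical tuple representation and helper names `abstract`, `postconditions`, `show`) generates every way of replacing a subset of the occurrences of $e$ in a precondition $Q$ by $x$, and keeps exactly those candidates $P$ with $P[e/x] = Q$, i.e. the postconditions the rule yields directly. It assumes nothing beyond syntactic equality; if $x$ itself occurs in $Q$, the filter discards the candidates.

```python
from itertools import product

# Hypothetical representation: ("var", name) | ("const", n) | (op, lhs, rhs)
def subst(p, x, e):
    """p[e/x]: replace every occurrence of variable x in p by e."""
    if p == ("var", x):
        return e
    if p[0] in ("var", "const"):
        return p
    op, lhs, rhs = p
    return (op, subst(lhs, x, e), subst(rhs, x, e))

def abstract(q, x, e):
    """Every term obtained from q by replacing some subset of the
    occurrences of e (possibly none, possibly all) with the variable x."""
    if q[0] in ("var", "const"):
        candidates = [q]
    else:
        op, lhs, rhs = q
        candidates = [(op, l, r)
                      for l, r in product(abstract(lhs, x, e), abstract(rhs, x, e))]
    if q == e:
        candidates.append(("var", x))
    return candidates

def postconditions(q, x, e):
    """Postconditions P that {P[e/x]} x := e {P} yields for precondition q."""
    return [p for p in abstract(q, x, e) if subst(p, x, e) == q]

def show(p):
    """Fully parenthesized rendering of a term."""
    if p[0] == "var":
        return p[1]
    if p[0] == "const":
        return str(p[1])
    op, lhs, rhs = p
    return f"({show(lhs)} {op} {show(rhs)})"

E = ("+", ("var", "y"), ("var", "z"))  # e is  y + z
Q = ("=", ("+", E, E), ("const", 2))   # precondition  (y + z) + (y + z) = 2
for p in postconditions(Q, "x", E):
    print(show(p))
# (((y + z) + (y + z)) = 2)
# (((y + z) + x) = 2)
# ((x + (y + z)) = 2)
# ((x + x) = 2)
```

All four postconditions follow from the same precondition after `x := y + z`, from "none replaced" to "all replaced", exactly as the comment describes.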

Reply to the bonus question: why is

1) $\{P[e/x]\}\ x := e\ \{P\}$

better than

2) $\{Q\}\ x := e\ \{Q[x/e]\}$?

Besides the subtle problem that replacing an expression with $x$ is ill-defined when that expression can be "invisible" in $Q$ (e.g. with $e = 0$: should we replace $0$ with an infinite number of $x$'s, since $0 = 0 + 0 = 0 + 0 + 0 = \dots$? And what if no $0$ appears in $Q$ at all? The rules are supposed to be purely syntactic, but I think it's better to avoid such confusion altogether), I think this is the reason: