So let’s continue with more algorithms, starting with the AdamOptimizer:

optimize_op = tf.train.AdamOptimizer(learning_rate=0.0005, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False)...
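For intuition about what beta1, beta2 and epsilon control, here is a simplified single-variable sketch of the Adam update rule in plain NumPy (an illustrative toy minimizing x², not the article’s TensorFlow graph):

```python
import numpy as np

def adam_minimize(grad_fn, x0, learning_rate=0.0005, beta1=0.9,
                  beta2=0.999, epsilon=1e-08, steps=10000):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # 1st moment: momentum
        v = beta2 * v + (1 - beta2) * g * g    # 2nd moment: gradient scale
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return x

# minimize f(x) = x**2 (gradient is 2x) from x0 = 1.0
x_final = adam_minimize(lambda x: 2 * x, x0=1.0)
```

The per-step move is roughly learning_rate times a unit-scaled gradient, which is why the learning rate dominates how fast (and how noisily) the search proceeds.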

AdamOptimizer with a learning rate of 0.0005 for 10 000 iterations

Slightly better at the beginning, but essentially the same afterwards. After an initial high, the Sharpe Ratio drops slightly, which suggests we are overtraining. Let’s experiment with the learning rate anyway:

AdamOptimizer with a learning rate of 0.05 for 10 000 iterations

Clearly too fast. A higher learning rate finds a local high at the beginning, then stagnates and refuses to explore new variants, so the final result is worse. Also note that the Sharpe Ratio keeps oscillating within a broad interval.

Let’s try with a much slower rate:

AdamOptimizer with a learning rate of 0.00005 for 10 000 iterations

Nothing new, really. Let’s see what we can do with another optimizer algorithm.

Time for the GradientDescentOptimizer:

GradientDescentOptimizer with a learning rate of 0.0005 for 10 000 iterations

10% better than our best approach so far. It hits its peak very quickly, so maybe we should tweak the learning rate and see whether a slower one opens the door to new approaches:

GradientDescentOptimizer with a learning rate of 0.000005 for 10 000 iterations

Well, we achieved the same result, just more gradually.

So far, you get the idea. Feel free to try any of the available optimizers, see which ones work well for your problem and tweak the learning rates and all their specific parameters to your advantage.
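The learning-rate trade-off we just observed can be reproduced on a toy problem. This is a minimal sketch, assuming plain gradient descent on f(x) = x², where the rate is hand-picked to show the three regimes: too fast oscillates or diverges, a balanced rate converges, and too slow barely moves:

```python
def gradient_descent(lr, x0=1.0, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x          # gradient of x**2 is 2x
    return x

too_fast = gradient_descent(1.1)    # overshoots each step and diverges
balanced = gradient_descent(0.1)    # converges toward the minimum at 0
too_slow = gradient_descent(1e-5)   # barely moves in 100 steps
```

The same qualitative behavior is what the Sharpe Ratio plots above show, just on a much harder, non-convex surface.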

Constraints

In the first article, we defined a group of three constraints so that negative values became 0, values greater than 1 became 1, and the weights summed to 1.

However, the first two constraints basically clip the weights that our optimizer is exploring. If two coins had weights of 1.2 and 1.5 at a given point, our constraints would set them both to 1 and potentially discard a good region of the search space. To avoid adulterating the weights, we can instead take their absolute values and divide by the sum of those absolute values, so the result is non-negative and still sums to 1:

weights_sum = tf.reduce_sum(tf.abs(coin_weights))
constraints_op = coin_weights.assign(tf.divide(tf.abs(coin_weights), weights_sum))
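We can check this normalization in plain NumPy, interpreting the constraint as dividing the absolute values by the sum of absolute values (so the result is guaranteed to be non-negative and sum to 1); the weights here are made up for illustration:

```python
import numpy as np

coin_weights = np.array([1.2, 1.5, -0.3])

# abs values: [1.2, 1.5, 0.3], their sum: 3.0
constrained = np.abs(coin_weights) / np.sum(np.abs(coin_weights))
# → [0.4, 0.5, 0.1]
```

No weight is ever clipped to 0 or 1; the relative proportions the optimizer found are preserved.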

But now a new question arises: why don’t we allow negative values if we can short sell crypto assets?

Portfolio optimization is generally suitable for mid-term investments, in which you open long positions. Short selling normally implies paying broker fees, since we are selling something we don’t yet own.

Short selling may be quite profitable in bearish markets over a short period of time, but keep in mind that holding a short position for longer may cost more money than you might eventually earn.

Short selling

For educational purposes, we will explore the scenario of short positions. So the only constraint we need now is that the sum of the weights’ absolute values is always 1.

weights_sum = tf.reduce_sum(tf.abs(coin_weights))
constraints_op = coin_weights.assign(tf.divide(coin_weights, weights_sum))
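Again, a quick NumPy check of this constraint (with made-up weights): dividing the signed weights by the sum of their absolute values preserves the signs, so short positions survive, while the absolute values sum to 1:

```python
import numpy as np

coin_weights = np.array([1.2, -0.8])

# sum of absolute values: 2.0
constrained = coin_weights / np.sum(np.abs(coin_weights))
# → [0.6, -0.4]
```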

How does it do?

It turns out that we achieve half as much return with half as much volatility, so the Sharpe Ratio stays about the same.
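That outcome is what the arithmetic predicts: the Sharpe Ratio is return divided by volatility (risk-free rate omitted for simplicity), so halving both leaves the ratio unchanged. The numbers below are illustrative, not the article’s actual results:

```python
ret, vol = 0.30, 0.20

sharpe_full = ret / vol                # 0.30 / 0.20
sharpe_half = (ret / 2) / (vol / 2)    # half the return, half the volatility
```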

But as we said earlier, this method is not intended for this kind of problem, so you should probably keep the constraints that ensure the coin weights range from 0 to 1.

Alternative minimization

If you recall from the previous article, the optimizers at our disposal did not provide a function to maximize an output value. Instead, we chose to minimize the negative value of the Sharpe Ratio.

That is, instead of driving the value toward 0, the trick was to aim for -∞. However, what would happen if we attempted to minimize a value that approaches 0 instead, mimicking an error function?
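Before looking at the plots, note that the two losses feed the optimizer gradients of very different scale. By the chain rule, the gradient of a loss L(s) with respect to the weights is dL/ds times ds/dw. With L = -s the factor dL/ds is -1 everywhere, but with L = 1/s it is -1/s², which shrinks as the Sharpe Ratio s improves, so the same optimizer takes ever smaller steps. A minimal sketch of those two factors:

```python
def loss_scale_neg(s):
    return -1.0                 # d(-s)/ds is constant

def loss_scale_inv(s):
    return -1.0 / s ** 2        # d(1/s)/ds shrinks as s grows

scale_at_low = abs(loss_scale_inv(0.5))   # |-1/0.25| = 4.0
scale_at_high = abs(loss_scale_inv(2.0))  # |-1/4|    = 0.25
```

This is why, as we will see, 1/sharpe_ratio needs more iterations or a faster learning rate to stabilize.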

First, we need to compare apples with apples. 10 000 iterations with the GradientDescentOptimizer at a learning rate of 0.0001:

GradientDescentOptimizer with a learning rate of 0.0001 for 10 000 iterations (-sharpe_ratio)

Ok, let’s see what happens when we attempt to minimize 1 / sharpe_ratio instead of -sharpe_ratio:

GradientDescentOptimizer with a learning rate of 0.0001 for 10 000 iterations (1 / sharpe_ratio)

At first glance it seems that the Sharpe Ratio is not as good, but the plot tells us that the output has not stabilized yet. What if we allow a faster learning rate? Say, 0.002:

Ok, we get about the same results as before the change… but is there really no difference? What if we plot a pie chart of both scenarios?