19 Pages. Posted: 14 Aug 2017. Last revised: 4 Dec 2017.

New York University (NYU) - Courant Institute of Mathematical Sciences; City University of New York - Weissman School of Arts and Sciences; Rutgers, The State University of New Jersey - Financial Statistics & Risk Management; New York University (NYU) - NYU Tandon School of Engineering

Date Written: August 8, 2017

Abstract

In multi-period trading with realistic market impact, determining the dynamic trading strategy that optimizes the expected utility of final wealth is a hard problem. In this paper we show that, with an appropriate choice of the reward function, reinforcement learning techniques (specifically, Q-learning) can successfully handle the risk-averse case. We provide a proof of concept in the form of a simulated market that permits statistical arbitrage even after trading costs. The Q-learning agent finds and exploits this arbitrage.
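To make the idea concrete, the following is a minimal sketch of tabular Q-learning with a risk-adjusted reward. It is not the paper's implementation. The abstract does not specify the reward function, the market simulator, or the cost model, so everything here is an assumption chosen for illustration: the mean-variance style reward `dw - 0.5 * kappa * dw**2`, the mean-reverting tick-grid price in `step_price`, the spread-plus-quadratic-impact cost in `trading_cost`, and all parameter names and values.

```python
import random


def risk_adjusted_reward(dw, kappa):
    """Assumed reward: one-period wealth change minus a penalty on its square."""
    return dw - 0.5 * kappa * dw * dw


def step_price(p, rng, p_eq=10, lam=0.3, sigma=1.0, p_max=20):
    """Hypothetical mean-reverting price on an integer tick grid [0, p_max]."""
    nxt = p + lam * (p_eq - p) + sigma * rng.gauss(0.0, 1.0)
    return min(p_max, max(0, round(nxt)))


def trading_cost(dn, spread=0.1, impact=0.05):
    """Assumed cost model: linear spread cost plus quadratic market impact."""
    return spread * abs(dn) + impact * dn * dn


def train(steps=20000, kappa=0.1, alpha=0.1, gamma=0.99, eps=0.1,
          max_pos=2, p_max=20, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    actions = [-1, 0, 1]          # sell one lot, hold, buy one lot
    q = {}                        # (price, holding) -> list of action values
    p, h = p_max // 2, 0
    for _ in range(steps):
        qs = q.setdefault((p, h), [0.0] * len(actions))
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = rng.randrange(len(actions))
        else:
            a = max(range(len(actions)), key=qs.__getitem__)
        # Trades that would breach the position limit are cut back.
        h_new = min(max_pos, max(-max_pos, h + actions[a]))
        dn = h_new - h
        p_new = step_price(p, rng, p_eq=p_max // 2, p_max=p_max)
        # Wealth change: P&L on the new holding minus trading costs.
        dw = h_new * (p_new - p) - trading_cost(dn)
        r = risk_adjusted_reward(dw, kappa)
        # Standard Q-learning update toward the bootstrapped target.
        q_next = q.setdefault((p_new, h_new), [0.0] * len(actions))
        qs[a] += alpha * (r + gamma * max(q_next) - qs[a])
        p, h = p_new, h_new
    return q
```

Calling `train()` returns a dictionary keyed by `(price, holding)` states. In this sketch, setting `kappa=0` gives a risk-neutral reward, while larger values penalize volatile wealth changes more heavily, which is one simple way a reward function can encode risk aversion.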