Hi All,

As some of you may be aware, I have been working on either a workaround or a fix for the AMD Vega reset bug. Last week I posted a cry for help on AMD's subreddit, hoping to show AMD how much demand there is for a fix. As a result, an AMD engineer got in touch and has guided me toward a possible solution to the problem.

Over the weekend I spent considerable time implementing what appears to be a working reset for Vega 10 and Vega 12. Initial testing by a few people confirms that it works on Vega 10, but it still needs further testing.

You must apply this patch to your kernel to prevent vfio-pci from attempting to reset the GPU incorrectly.

Please note that this application is intended as an interim workaround while I work on implementing this in the kernel for vfio.

Download reset-test.tar.gz

Usage is simple. Obviously the GPU must not be in use at the time, and it should be bound to vfio-pci.
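Before running the tool, one way to confirm the binding is to look at the device's driver symlink in sysfs. The following is a minimal Python sketch, assuming the standard Linux sysfs layout in which /sys/bus/pci/devices/<address>/driver is a symlink into the directory of the bound driver. The helpers bound_driver and require_vfio are hypothetical and not part of reset-test; the tool may or may not perform an equivalent check itself.

```python
import os
import re
from typing import Optional

# Full PCI address, domain:bus:device.function, e.g. "0000:24:00.0".
# Device numbers run 0x00-0x1f and function numbers 0-7.
PCI_ADDR_RE = re.compile(r"[0-9a-fA-F]{4,8}:[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]")

SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def bound_driver(pci_addr: str, sysfs_root: str = SYSFS_PCI_DEVICES) -> Optional[str]:
    """Return the bound driver's name, or None if no driver is bound
    (or the device does not exist under sysfs_root)."""
    if not PCI_ADDR_RE.fullmatch(pci_addr):
        raise ValueError(f"not a full PCI address: {pci_addr!r}")
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound to this device
    # 'driver' points into .../drivers/<name>; the last path component is the name.
    return os.path.basename(os.readlink(link))


def require_vfio(pci_addr: str, sysfs_root: str = SYSFS_PCI_DEVICES) -> None:
    """Raise RuntimeError unless the device is currently bound to vfio-pci."""
    drv = bound_driver(pci_addr, sysfs_root)
    if drv != "vfio-pci":
        raise RuntimeError(f"{pci_addr} is bound to {drv or 'no driver'}, not vfio-pci")
```

For example, calling require_vfio("0000:24:00.0") before invoking ./reset-test 0000:24:00.0 would stop early if the card is still bound to another driver such as amdgpu, or to nothing at all. The sysfs_root parameter exists only so the check can be exercised against a fake directory tree.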

./reset-test 0000:24:00.0

The expected output is:

============================================================================
AMD Vega 10/12 Reset Application (Version: 1.0)
Copyright (c) 2019 Geoffrey McRae <[email protected]>
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
This tool is intended as an interim workaround while I port this into the kernel driver.
If you like my work and want to support it you can contribute using the following methods:
* Ko-Fi - https://ko-fi.com/lookingglass
* Patreon - https://www.patreon.com/gnif
* BTC - 14ZFcYjsKPiVreHqcaekvHGL846u3ZuT13
============================================================================

Attempting Vega 10 reset
CMD_READMODIFYWRITE 0x00000e1c
CMD_WRITE 0x00000e1f
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_WAITFOR 0x0001667c
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x00000e2b
CMD_DELAY_MS
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_WAITFOR 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_DELAY_MS
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_WAITFOR 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_WAITFOR 0x00000e2b
CMD_WRITE 0x00000052
CMD_WRITE 0x00000053

At this point the GPU should successfully post inside a VM, even after a dirty shutdown or VM crash.
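The CMD_* lines in the expected output suggest that the tool replays a scripted sequence of register operations: plain writes, read-modify-writes, polls that wait for a register to reach a value, and fixed delays. The post does not show the tool's source, so the following Python sketch only illustrates that general pattern under stated assumptions. The Cmd encoding (mask, value and ms fields), the exact meaning of each opcode, and the FakeMMIO class are all hypothetical. The register file is simulated in memory so that nothing touches real hardware, and the offsets and values in the demo are placeholders, not the actual Vega reset sequence.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Cmd:
    # Hypothetical encoding; the real tool's command format is not shown in the post.
    op: str               # "WRITE", "READMODIFYWRITE", "WAITFOR" or "DELAY_MS"
    reg: int = 0          # register offset, e.g. 0x0001667c
    mask: int = 0xFFFFFFFF
    value: int = 0
    ms: int = 0           # delay for DELAY_MS, timeout for WAITFOR


class FakeMMIO:
    """Dict-backed stand-in for the GPU's MMIO registers (simulation only)."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}

    def read(self, reg: int) -> int:
        return self.regs.get(reg, 0)

    def write(self, reg: int, value: int) -> None:
        self.regs[reg] = value & 0xFFFFFFFF  # registers are 32 bits wide


def run_sequence(cmds: List[Cmd], mmio: FakeMMIO,
                 log: Callable[[str], None] = print) -> None:
    """Execute each command in order, logging it in the same style as the tool."""
    for c in cmds:
        if c.op == "WRITE":
            log(f"CMD_WRITE 0x{c.reg:08x}")
            mmio.write(c.reg, c.value)
        elif c.op == "READMODIFYWRITE":
            log(f"CMD_READMODIFYWRITE 0x{c.reg:08x}")
            old = mmio.read(c.reg)
            # Keep the bits outside the mask, replace the bits inside it.
            mmio.write(c.reg, (old & ~c.mask) | (c.value & c.mask))
        elif c.op == "WAITFOR":
            log(f"CMD_WAITFOR 0x{c.reg:08x}")
            deadline = time.monotonic() + c.ms / 1000
            # Poll until the masked bits match, or give up at the timeout.
            while (mmio.read(c.reg) & c.mask) != (c.value & c.mask):
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"register 0x{c.reg:08x} did not settle")
                time.sleep(0.001)
        elif c.op == "DELAY_MS":
            log("CMD_DELAY_MS")
            time.sleep(c.ms / 1000)
        else:
            raise ValueError(f"unknown command {c.op!r}")


if __name__ == "__main__":
    # Placeholder values only; this is not the real reset sequence.
    demo = [
        Cmd("WRITE", 0x52, value=0x1),
        Cmd("READMODIFYWRITE", 0x1667C, mask=0x0F, value=0x05),
        Cmd("DELAY_MS", ms=1),
        Cmd("WAITFOR", 0x1667C, mask=0x0F, value=0x05, ms=10),
    ]
    run_sequence(demo, FakeMMIO())
```

Keeping the sequence as data rather than hard-coded calls makes it easy to log every step, as the tool's output does, and to review a sequence before it is ever pointed at real registers, which matters given how dangerous blind register writes can be.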

A reset for Vega 20 and Navi is possible, but as I do not have these devices to develop against, I cannot safely implement it. Poking blindly at the wrong registers is dangerous and can destroy the GPU.

If you would like to see Navi supported as well, you can contribute toward the cost of purchasing a suitable card.

Edit: Funding is complete! Thank you everyone for your support!