hdcmeta





Junior Member Posts: 28

Threads: 1

Joined: Nov 2015
Hi all, I've been experimenting with adding a DirectX 12 backend to Dolphin, and I finally have something to release! Depending on the game, system, and settings, it can be decently faster (up to 50%); binaries and source are below. It was a good way to get to know Dolphin's architecture better, and I hope it might be interesting for others to try out.

Performance

Generally, graphics-intensive games get a nice win, while games bound by the emulated GameCube CPU (Zelda OoT from the 'bonus disc' is a good example) run the same, because graphics wasn't on the critical path there. At higher resolutions graphics becomes more important, so the relative improvement can increase. In general, CPU usage is now much lower for the same workload relative to DX11/OpenGL.

Results below are from a few games at 2.5x native resolution on NVIDIA and AMD hardware (raw FPS data attached):

Requirements

- Windows 10
- Latest graphics driver, and an AMD 7000-series, Intel HD 4400, or NVIDIA 600-series GPU or newer
- VS 2015 Redistributable

Note: This doesn't specifically improve the shader-compilation stutter that happens the first time a shader is seen; it's only faster in the 'steady state'. That could definitely be improved with the extra CPU cycles now available.

This is obviously 'as is', but please reply with any bugs/issues you see. For more details, please see the GitHub readme. I hope to continue to improve the code, and pull requests would definitely be welcome.

Source: https://github.com/hdcmeta/dolphin

Download here: https://www.dropbox.com/s/gac7jufr9iob8tc/dolphin_dx12_v0.98.zip?dl=0

I've tried to make the code follow the contribution guidelines, and it should be a pretty conformant port of the DX11 backend, so I hope it can eventually end up in the main Dolphin branch. I'm open to any feedback on the initial code, and I'll try to submit a pull request in the next couple of weeks if the code looks OK.

Changelog

v0.98 (1/24/2016)
- Fixed an issue on certain systems where the frame rate was not properly uncapped (when vsync is disabled and CPU is set to > 100%)
- Integrated upstream changes

v0.97 (1/19/2016)
- Better tracking of CPU/GPU interactions; should resolve race-condition-induced corruption
- Small fix for shader-cache corruption after a 'dirty' shutdown (which could cause a crash on the next start)
- Includes all current upstream changes
- Misc behind-the-scenes refactoring/cleanup/fixes

v0.96 (1/5/2016)
- Fixed very large texture uploads (e.g. 4096x4096 custom textures)
- Misc behind-the-scenes refactoring/cleanup

v0.95 (1/3/2016)
- Prevented the backend from showing up on systems without D3D12 support

v0.94 (1/2/2016)
- Fixed a bug in EFB depth buffer readback that could cause misc corruption issues

v0.93 (1/1/2016)
- Fixed an error in texture readback that was causing some misc corruption issues
- Further refactored the shader cache, and fixed some issues that prevented shaders from being cached (causing constant regeneration)

v0.92 (12/30/2015)
- Fixed an issue when a game sets a viewport whose depth range isn't MIN_DEPTH/MAX_DEPTH. This caused incorrect results in MadWorld, possibly others.
- Lots of refactoring behind the scenes, based on pull request feedback. Nothing should have regressed (verified in local testing).

v0.91 (12/22/2015)
- Fixed full-screen operation when starting in full-screen mode (thanks rlaugh0095)

v0.90 (12/21/2015)
- Several rendering correctness bugs fixed. If you were seeing incorrect rendering before, there's a decent chance it has been fixed.
- Fixed full-screen operation
- Added a clamp to texture copies into too-small destinations. A 'real' fix needs to happen above the VideoBackend layer, and is in progress here: https://github.com/dolphin-emu/dolphin/pull/3355
- Moved to a new versioning scheme

12/21/2015
- Further fix to multisampling. Not claiming anything this time :-). Fixes a crash when the (multisampled) color EFB is accessed (fixes crash in SMG).
- Fixed a possible corner-case crash when a game presents frames without first uploading any texture data.
- Fixed a small bug that could cause unnecessary stalls/performance loss in certain cases.

12/20/2015
- Multisampling 'really' fixed. Resolves an issue in titles that sample from the depth buffer.
- Fixed a texture upload race condition.

12/18/2015
- Multisampling fixed (though it appears buggy on AMD hardware, YMMV)
- Per-pixel lighting fixed
- Fixed an issue where the CPU could get too far ahead of the GPU, causing corruption.

12/17/2015
- Initial release

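The first post's reasoning (graphics-heavy games speed up, while games bound by the emulated GameCube CPU, like OoT, stay the same because graphics wasn't on the critical path) can be illustrated with a deliberately simplified C++ model. It assumes, hypothetically, that the emulated-CPU work and the graphics work for a frame overlap on separate threads, so the slower of the two sets the frame time. `FramesPerSecond` and every millisecond figure here are illustrative assumptions, not measurements from PerformanceNumbers.txt.

```cpp
#include <algorithm>

// Hypothetical model, not measured data: the emulated-CPU work and the graphics
// work (backend CPU cost and GPU time lumped together) overlap on separate
// threads, so the slower of the two determines the frame time.
double FramesPerSecond(double cpu_ms_per_frame, double graphics_ms_per_frame)
{
  const double frame_ms = std::max(cpu_ms_per_frame, graphics_ms_per_frame);
  return 1000.0 / frame_ms;
}
```

With these made-up numbers, cutting graphics cost from 20 ms to 12.5 ms per frame takes a graphics-bound game (10 ms of CPU work) from 50 to 80 FPS, while cutting it from 10 ms to 6 ms leaves a CPU-bound game (16 ms of CPU work) at 62.5 FPS either way. Raising the internal resolution increases the graphics term, which is why the relative gain can grow there.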


Attached Files

PerformanceNumbers.txt (Size: 328 bytes / Downloads: 696)

DrHouse64





A woman yet a man, a man yet a woman Posts: 340

Threads: 17

Joined: Jun 2013

Great news and nice work! Performance enhancement sounds sexy, especially for Mario Galaxy.

Well, if someday Dolphin officially has a DX12 backend, I guess I will consider upgrading to Windows 10.

From France with love.

Desktop: W10 / Core i5 4670K OC @ 4.2 GHz / Radeon RX 570 8 GB / RAM 12 GB DDR3

Laptop ASUS: W10 / Core i5 5200U / GeForce 940M 2 GB / RAM 12 GB DDR3

JMC47





Content Producer Posts: 6,463

Threads: 28

Joined: Feb 2013

If you're interested in this becoming official, maybe open a pull request tagged RFC or WIP. I don't have Windows 10 to test this, but the gains seem non-trivial and make sense based on your graphs: OoT has zero GFX overhead, Crazy Taxi has very little, and Twilight Princess and Super Mario Galaxy are very heavy.

delroth





Making the world a better place through reverse engineered DSP firmwares Posts: 1,356

Threads: 63

Joined: Aug 2011

How useful are performance comparisons when your code is littered with TODOs that might make things slower once implemented? I would take your graphs more seriously if they were done on games that never hit any of your TODO paths that have an equivalent implementation in our other backends.



<@neobrain> that looks sophisticated enough to not be a totally dumb thing to do

Pierre "delroth" Bourdon - @delroth_

delroth





Making the world a better place through reverse engineered DSP firmwares Posts: 1,356

Threads: 63

Joined: Aug 2011

Also, looking at the code, my bet is that a lot of the benefit comes from the queued command list implementation. I'm curious how D3D12 compares if you make the D3D code run in the same thread as the GPU emulation code like other backends do.



Not that it's a bad thing -- but if threading the GPU backend code has a big impact with D3D12, I wouldn't be surprised if it also had the same impact on GL/D3D11.



degasus





Developer Posts: 1,828

Threads: 10

Joined: May 2012

> Open to any feedback on the initial code, and I'll try to submit a pull request in the next couple weeks if the code looks ok.

Please open the PR soon, so you'll get more early feedback.

*very* *very* nice work

EDIT: Also, please visit us on #dolphin-dev @ freenode on IRC. Most development talk happens there, and currently we're only talking about D3D12.

Zee530





Above and Beyond Posts: 1,749

Threads: 12

Joined: Jan 2011

Wow, amazing! I hope we can find a way to make it work on most games.

hdcmeta





Junior Member Posts: 28

Threads: 1

Joined: Nov 2015

(12-18-2015, 07:35 PM) JMC47 Wrote: IF you're interested in this not being unofficial, maybe open a Pull Request tagged RFC or WIP. I don't have Windows 10 to test this, but the gains seem non-trivial and make sense based on your graphs. OoT has zero GFX overhead, Crazy Taxi has very little, Twilight Princess and Super Mario Galaxy are very heavy.

Ok, I'll try to do that this weekend then.



(12-18-2015, 07:43 PM) delroth Wrote: How useful are performance comparisons when your code is littered with TODOs that might make things slower once implemented? I would take your graphs more seriously if they were done on games that never hit one of your TODO paths that has an equivalent implementation in our other backends.

Good question; honestly, most of the TODOs are there to make things faster :-). The only TODO I can think of that will decrease performance is implementing performance queries, and the impact from that should be pretty small. Are there specific TODOs that you can see being hit in the above titles? I can hopefully fix those up pretty quickly.



(12-18-2015, 07:48 PM) delroth Wrote: Also, looking at the code, my bet is that a lot of the benefits come from the queued command list implementation. I'm curious how D3D12 compares if you make the D3D code run in the same thread as the GPU emulation code like other backends do.



Not that it's a bad thing -- but if threading the GPU backend code has a big impact with D3D12 I wouldn't be surprised if it also did have the same impact on GL/D3D11.

Yeah, it was really important to have this threaded, but I found that this is actually exactly what D3D11 and OpenGL already do automatically. For those APIs, the graphics driver creates its own background thread and does a similar sort of thing (the main thread queues the work, and the background thread processes it). It's just that in D3D12, the app has to do this itself.
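The queue-and-worker pattern described above (the main thread queues work, a background thread processes it) can be sketched generically in C++. This is a hedged illustration, not Dolphin's D3D12 backend or any driver's actual code: `WorkQueueThread`, `Push`, and `Flush` are hypothetical names, and the queued closures stand in for whatever API calls a real backend would record and submit.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical sketch: a producer (e.g. the emulation thread) queues work and
// returns immediately; a background thread executes the work in FIFO order.
class WorkQueueThread
{
public:
  WorkQueueThread() : m_worker([this] { Run(); }) {}

  ~WorkQueueThread()
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_shutdown = true;
    }
    m_cv.notify_one();
    m_worker.join();  // the worker drains any remaining work before exiting
  }

  // Called from the producer thread; does not wait for the work to run.
  void Push(std::function<void()> work)
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_queue.push(std::move(work));
    }
    m_cv.notify_one();
  }

  // Blocks until everything queued so far has finished executing.
  void Flush()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_idle_cv.wait(lock, [this] { return m_queue.empty() && !m_busy; });
  }

private:
  void Run()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    while (true)
    {
      m_cv.wait(lock, [this] { return m_shutdown || !m_queue.empty(); });
      if (m_queue.empty())
        return;  // shutdown requested and nothing left to do
      std::function<void()> work = std::move(m_queue.front());
      m_queue.pop();
      m_busy = true;
      lock.unlock();
      work();  // runs off the producer thread, without holding the lock
      lock.lock();
      m_busy = false;
      if (m_queue.empty())
        m_idle_cv.notify_all();
    }
  }

  std::mutex m_mutex;
  std::condition_variable m_cv;       // signals "work available or shutting down"
  std::condition_variable m_idle_cv;  // signals "queue drained"
  std::queue<std::function<void()>> m_queue;
  bool m_busy = false;
  bool m_shutdown = false;
  std::thread m_worker;  // declared last so all other members exist before it starts
};
```

One design point worth noting: `Flush()` represents any place where the producer must wait for all queued work to complete (a readback of GPU results would be one plausible case). The more often that happens, the less the background thread can overlap with emulation.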



If you want to experiment with turning off the automatic threading on D3D11, you can pass in the "D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS" flag at device creation time.



(12-18-2015, 08:07 PM) degasus Wrote: > Open to any feedback on the initial code, and I'll try to submit a pull request in the next couple weeks if the code looks ok.

Please open the PR soon, so you'll get more early feedback



*very* *very* nice work



EDIT: Also, please visite us on #dolphin-dev @freenode on IRC. Most development talk is there, and currently we're only talking about d3d12

Will do!

Tino





Above and Beyond Posts: 2,256

Threads: 1

Joined: Oct 2013

Nice work, looking really good.

Helios





Stellaaaaaaa Posts: 4,402

Threads: 15

Joined: May 2012



Excellent work! Open a PR please! I'll be testing against my GTX 770 this weekend.