D3D command stream patches for testing

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

In the past months I have been working on a command stream / worker
thread for wined3d. It moves most OpenGL calls into a separate thread to
improve performance (bug 11674) and to fix some bugs that are otherwise
hard to fix (bug 24684).

You can test the attached patches by applying them
(git am /path/to/patches/*) and setting
HKCU/Software/Wine/Direct3D/CSMT = "enabled". Make sure to disable
StrictDrawOrdering. It is no longer required with these patches and will
destroy any performance gains (it might be useful for debugging, though).
The patches apply on top of Wine 1.7.1.

Please test these patches with your games. I'm interested in any
successes or failures and in performance differences. Performance numbers
with plain Wine 1.7.1, this patchset with CSMT off and on, and
Wine 1.7.1 + bugzilla attachment 44420 and __GL_THREADED_OPTIMIZATIONS
would be greatly appreciated.

A few notes for non-developers:

*) GPU-limited games don't see any improvement. Whether you are GPU
   limited depends heavily on your hardware.
*) So far I have not tested anything but Nvidia hardware. It should work
   on all GPUs and drivers, though.
*) Yes, this is essentially the same as Nvidia's
   __GL_THREADED_OPTIMIZATIONS. It is just driver independent and under
   our control, which makes bugs easier to fix.
*) A lot of games see 50%-100% performance improvements and now run as
   fast as on Windows or even faster. Examples are Source-Engine based
   games, StarCraft 2 and 3DMark 2001.
*) Call of Duty Modern Warfare 2 is improved a lot because you no longer
   need StrictDrawOrdering. It's still not as good as it could be,
   because it uses dynamic surfaces, which aren't properly implemented in
   the patchset yet.
*) Some games have CPU-side bottlenecks outside d3d. Mass Effect 2 seems
   to be one of those.
*) Some games have CPU-side bottlenecks in the GL driver and
   comparatively little game logic of their own. I think this applies to
   Civ V, which doesn't see much improvement with these patches.
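
For readers who want a concrete picture of what "moving GL calls into a
separate thread" means, here is a minimal sketch of a command queue with
one worker thread. It is not code from the patchset: the names (struct cs,
cs_submit, cs_wait_idle, cs_op_increment) are made up for illustration, it
uses POSIX threads rather than wined3d's own primitives, and the "commands"
are plain function pointers standing in for GL calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define CS_QUEUE_SIZE 64

/* Hypothetical command: a handler plus one opaque argument. */
struct cs_op
{
    void (*handler)(void *arg);
    void *arg;
};

struct cs
{
    struct cs_op queue[CS_QUEUE_SIZE];
    size_t head;    /* index of the next command to execute */
    size_t tail;    /* index of the next free slot */
    int stop;
    pthread_mutex_t lock;
    pthread_cond_t changed;
    pthread_t worker;
};

/* Worker thread: the only place where queued commands are executed. */
static void *cs_run(void *ctx)
{
    struct cs *cs = ctx;

    pthread_mutex_lock(&cs->lock);
    for (;;)
    {
        while (cs->head == cs->tail && !cs->stop)
            pthread_cond_wait(&cs->changed, &cs->lock);
        if (cs->head == cs->tail)
            break; /* stop requested and nothing left to execute */

        struct cs_op op = cs->queue[cs->head % CS_QUEUE_SIZE];
        pthread_mutex_unlock(&cs->lock);
        op.handler(op.arg); /* run the command without holding the lock */
        pthread_mutex_lock(&cs->lock);

        cs->head++; /* the slot is only freed once the command is done */
        pthread_cond_broadcast(&cs->changed);
    }
    pthread_mutex_unlock(&cs->lock);
    return NULL;
}

static int cs_init(struct cs *cs)
{
    memset(cs, 0, sizeof(*cs));
    pthread_mutex_init(&cs->lock, NULL);
    pthread_cond_init(&cs->changed, NULL);
    return pthread_create(&cs->worker, NULL, cs_run, cs);
}

/* Called from the application thread; blocks while the queue is full. */
static void cs_submit(struct cs *cs, void (*handler)(void *), void *arg)
{
    assert(handler != NULL);
    pthread_mutex_lock(&cs->lock);
    while (cs->tail - cs->head == CS_QUEUE_SIZE)
        pthread_cond_wait(&cs->changed, &cs->lock);
    cs->queue[cs->tail % CS_QUEUE_SIZE].handler = handler;
    cs->queue[cs->tail % CS_QUEUE_SIZE].arg = arg;
    cs->tail++;
    pthread_cond_broadcast(&cs->changed);
    pthread_mutex_unlock(&cs->lock);
}

/* Block until every submitted command has finished executing. */
static void cs_wait_idle(struct cs *cs)
{
    pthread_mutex_lock(&cs->lock);
    while (cs->head != cs->tail)
        pthread_cond_wait(&cs->changed, &cs->lock);
    pthread_mutex_unlock(&cs->lock);
}

/* Ask the worker to exit after draining the queue, then clean up. */
static void cs_destroy(struct cs *cs)
{
    pthread_mutex_lock(&cs->lock);
    cs->stop = 1;
    pthread_cond_broadcast(&cs->changed);
    pthread_mutex_unlock(&cs->lock);
    pthread_join(cs->worker, NULL);
    pthread_cond_destroy(&cs->changed);
    pthread_mutex_destroy(&cs->lock);
}

/* Example command standing in for a GL call: increments an int. */
static void cs_op_increment(void *arg)
{
    *(int *)arg += 1;
}
```

The application thread would call cs_init() once, then cs_submit() for
each operation, and cs_wait_idle() whenever it needs the results of
everything queued so far. The performance gain in a CPU-limited game comes
from the application thread returning from cs_submit() right away while
the worker does the expensive work in parallel.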

Some implementation notes:

*) One of the big design decisions is to do all OpenGL calls from one
   thread, including resource creation and buffer maps. This is faster
   than using glFlush calls to synchronize anything we do from the main
   thread, and easier than trying to sync everything in a performant
   fashion with ARB_sync. This means I need a priority command queue.
   This is not yet fully implemented, though, so you will still see GL
   calls from the main thread as well.
*) There seem to be driver bugs when calling into GL from two threads,
   even though those are two different contexts. Remember, we don't have
   the GL lock any longer.
*) The other controversial design decision is that the command stream
   does not hold any references to objects stored in pending commands or
   in its own state structure. This prevents the client libraries and
   applications from "seeing" the CS via delayed destruction of objects
   and delayed freeing of application private data.
*) Currently resource destruction waits for the CS to execute all pending
   commands. The goal is to handle private resources and removal from the
   device's resource list in the main thread, and to do the freeing of GL
   resources, of resource->heap_memory and of the main structure in the
   worker thread.
*) A big issue that needs fixing is that there isn't a clear separation
   between functions that are called from the main thread and functions
   that are called from the worker thread. The plan is to introduce
   comments similar to those that clarify who is responsible for context
   activation.
*) Buffers are double-buffered and use glBufferSubData when the
   multithreaded CS is in use. This is necessary because we can't draw
   from a mapped buffer. In the long run GL_ARB_buffer_storage should be
   able to fix this.
*) You can roughly see how surface and volume handling is going to work
   in the volume code. I am not entirely happy with the code yet, I
   hacked it together in the past few days...
*) The plan behind wined3d_device_get_bo and wined3d_device_release_bo is
   to cache created GL BOs. Before I do that I have to write a benchmark
   for dynamic volumes to verify that this really is a performance
   improvement.
*) Before this can be merged, surfaces need a cleanup similar to volumes.
   It's going to be a lot trickier, though.
*) The tests should run with both the single-threaded and the
   multi-threaded command stream.
*) There should not be any temporary regressions with the single-threaded
   CS. If something's broken, git bisect should work with CSMT off.
*) With CSMT on, there are a few known regressions and test failures. The
   d3d9 and ddraw tests fail between patch 18 and 71. Occlusion queries
   are broken between 22 and 108. In general nothing's working right
   between 80 and 99. Some of those problems can be fixed or their impact
   reduced, but I will not be able to avoid them completely. The ddraw
   test failure is a driver bug, and GL occlusion queries break by design
   when used from a different thread. So if you try to bisect a
   regression in this patch series with CSMT on, YMMV.
*) This work was originally started by Henri. Some patches in the series
   are from him and are either unmodified or have minor adjustments. Some
   patches are based on his work, but with heavy modifications.
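
To illustrate the BO caching idea mentioned in the notes above, here is a
rough sketch of what a get/release pair with a small free list could look
like. This is an assumption about the general technique, not the actual
wined3d_device_get_bo / wined3d_device_release_bo code: struct bo, struct
bo_cache and the bo_cache_* functions are hypothetical, and GL buffer
creation is replaced by a counter so the example runs without a GL context.
Whether such a cache is really faster is exactly what the benchmark would
have to show.

```c
#include <assert.h>
#include <stddef.h>

#define BO_CACHE_SLOTS 8

/* Hypothetical buffer object handle. */
struct bo
{
    unsigned int id;    /* stand-in for a GL buffer name */
    size_t size;
};

struct bo_cache
{
    struct bo free_list[BO_CACHE_SLOTS];
    size_t free_count;
    unsigned int next_id;   /* stand-in for names from glGenBuffers() */
    unsigned int created;   /* how many BOs were actually created */
    unsigned int destroyed; /* how many BOs were actually destroyed */
};

static void bo_cache_init(struct bo_cache *cache)
{
    cache->free_count = 0;
    cache->next_id = 1;
    cache->created = 0;
    cache->destroyed = 0;
}

/* Return a cached BO of exactly the requested size, or create a new one. */
static struct bo bo_cache_get(struct bo_cache *cache, size_t size)
{
    size_t i;
    struct bo bo;

    for (i = 0; i < cache->free_count; ++i)
    {
        if (cache->free_list[i].size == size)
        {
            bo = cache->free_list[i];
            /* Fill the hole with the last entry of the free list. */
            cache->free_count--;
            cache->free_list[i] = cache->free_list[cache->free_count];
            return bo;
        }
    }

    /* Cache miss: real code would call glGenBuffers()/glBufferData(). */
    bo.id = cache->next_id++;
    bo.size = size;
    cache->created++;
    return bo;
}

/* Hand a BO back; keep it for reuse if there is room in the cache. */
static void bo_cache_release(struct bo_cache *cache, struct bo bo)
{
    if (cache->free_count < BO_CACHE_SLOTS)
    {
        cache->free_list[cache->free_count++] = bo;
        return;
    }

    /* Cache full: real code would call glDeleteBuffers() here. */
    cache->destroyed++;
}
```

With this sketch, releasing a 4096-byte BO and then requesting another
4096-byte BO returns the same id without bumping the created counter, while
a request for a different size falls through to creation.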

Cheers,
Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSJJjDAAoJEN0/YqbEcdMw1VMP/i4OvTg/HT+jJhZFDzbmtB8o
aHIFslhkH9CJDCTGzZExGnPoM+0gLokzzk6ppodQiyC0bJ4jLzeulIhfaUs0sgxy
srWpryXYfpeoG/A/MJQlbUQEiMpqdKLvM1j3UupjMY2IsGbECronAYdUi0PTh0Oo
RqlWQ64bTARpFXtywO22goNfi57e3UKf2r8Q3Q4f20eH0CwvvVC1YU/sAZ2iyvbR
c0fcfergDsPSRQsw+EgEwD4N11/Z+At2XfTeXMC9MyHlZR7bfJfOYMiu6AXfy9gB
uFKLgNrXXJho5UsPLjcsCw+UczR98dsyX7B5BZOfG9eyn1Du5YAAVEZ7QCHvUbLj
0WMR86q4GiCZVd5q7GZ7YH3wdNE3R1kEMQA3JRHTP7jLPpSfODUlTxP5NDk220OI
+fiPSr5a0BpLqua+dNRjAQvW1Qhrk+7EgLqfwrq8632x/6sInSTnlE84UQ4FJiH6
Wql7PxPPfOJv2qG6l1nqffT7fCbiEHnemoF/4GEvG22MZbMXYTnpkLC6Y1aLUJN5
WXxAICXcy7S2dk5Io/t1IKUm4fwtWnI59MX5rPbfDpY3GGePnlo40pNLlGVv8lc8
ktDS4dvf1Lx/JKrLzu1G/1TYBePFHEVQRvJjzCw0YhNOW7sUOLbf5rgHmAeyOxvD
Zdq+l22uJ1Se8m9iAbtF
=lm4e
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs.tar.bz2
Type: application/x-bzip
Size: 88654 bytes
Desc: not available
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20130902/62c705fb/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs.tar.bz2.sig
Type: application/pgp-signature
Size: 543 bytes
Desc: not available
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20130902/62c705fb/attachment-0001.pgp>