[Mesa-dev] [RFC PATCH 00/65] ARB_bindless_texture for RadeonSI

Hi,

This series implements ARB_bindless_texture for RadeonSI. Reminder: the GLSL compiler part is already upstream.

This series has been mainly tested with Feral games. Here's the list of existing games that use ARB_bindless_texture (though not by default):
- DXMD
- Hitman
- Dirt Rally
- Mad Max

Today, Feral announced "Warhammer 40,000: Dawn of War III" (called DOW3), which is going to be released next month. This game *requires* ARB_bindless_texture, which explains why I did all this work. :-)

So, we have ~3 weeks to merge this whole series. It would be very nice to have DOW3 support on day one!

=== Tracking bindless problems ===

The following games have been successfully tested:
- Dirt Rally
- Hitman
- Mad Max
- DOW3

For these:
- No rendering issues
- No VM faults (checked with amdgpu.vm_debug=1)

However, DXMD is currently broken because the bindless_sampler layout qualifier is missing, which ends up reporting a ton of INVALID_OPERATION errors. Note that Feral implemented bindless support against NV_bindless_texture and not ARB_bindless_texture. The main difference is that bindless_sampler is implicit for NV_* while it's required for ARB_*. Feral plans to fix this soon.

All ARB_bindless_texture piglit tests pass with this series.

=== Tracking regressions/changes ===

- No regressions with the Intel CI system
- One piglit regression that needs to be fixed (arb_texture_multisample-sample-position)
- No shader-db changes
- No CPU overhead (measured with glxgears and Heaven on low settings)

=== Performance results for DOW3 ===

DOW3 exposes two bindless texture modes:
- mode 1: all bindless (ie. no bound samplers)
- mode 2: bound/bindless (ie. only bindless when the limit is reached)

CPU: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz
NVIDIA blob: 381.22

== GTX 1060 ==

LOW:
- mode 1: 89 FPS
- mode 2: 51 FPS
MEDIUM:
- mode 1: 49 FPS
- mode 2: 28 FPS
HIGH:
- mode 1: 32 FPS
- mode 2: 19 FPS

The GTX 1060 performs very well with the all bindless mode (default), while the bound/bindless mode is not good at all.

== RX480 ==

(difference vs. the GTX 1060 in parentheses)

LOW:
- mode 1: 67 FPS (-32%)
- mode 2: 75 FPS (+32%)
MEDIUM:
- mode 1: 38 FPS (-28%)
- mode 2: 44 FPS (+57%)
HIGH:
- mode 1: 26 FPS (-23%)
- mode 2: 29 FPS (+52%)

The RX 480 performs very well with the bound/bindless mode (default), while the all bindless mode still has to be improved.

The most important bottleneck with the all bindless mode is the number of buffers that have to be added for every command stream. The overhead in the winsys and in the kernel (amdgpu_cs_ioctl) becomes significant in this situation. This mode is still clearly CPU bound and should be improved (see the "Future work" section).

Btw, without any optimisations, it was around 35 FPS in low (mode 1).

=== Performance results for other Feral titles ===

I didn't record any numbers because these games were initially developed and tested against the NVIDIA blob, which is unaffected by a very large number of resident handles, whereas the AMD stack is really slow in this situation.

That said, as I mentioned, all Feral games that use bindless work fine; we just need to improve performance on both sides.

=== Future work ===

I have some ideas to try in order to improve performance with RadeonSI. I will work on this once this series is upstream.

Please review,
Thanks!
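The all bindless bottleneck on the RX 480 can be pictured with the rough model below. This is a hypothetical sketch, not the radeonsi or amdgpu winsys code: `struct cs_buffer_list`, `cs_add_buffer()`, `cs_begin()` and `CS_MAX_BUFFERS` are made-up names, and the linear duplicate check stands in for whatever lookup the real winsys uses. It assumes that, because the driver cannot tell which handles a shader will dereference, every buffer behind a resident handle has to be referenced by each command stream, so the list handed to the kernel grows with the number of resident handles rather than with the number of textures the draws actually sample.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on buffers referenced by one command stream. */
#define CS_MAX_BUFFERS 8192

/* Hypothetical buffer list for one command stream (CS). */
struct cs_buffer_list {
    unsigned buffers[CS_MAX_BUFFERS]; /* kernel buffer handles */
    size_t count;
};

/* Adds a buffer once; returns false only if the list is full. */
static bool cs_add_buffer(struct cs_buffer_list *cs, unsigned bo)
{
    for (size_t i = 0; i < cs->count; i++)
        if (cs->buffers[i] == bo)
            return true; /* already referenced by this CS */
    if (cs->count == CS_MAX_BUFFERS)
        return false;
    cs->buffers[cs->count++] = bo;
    return true;
}

/* Builds the buffer list for a new CS and returns its size. Bound textures
 * contribute only what the current draws use, but every resident buffer is
 * added, since any of them may be reached through a bindless handle. */
static size_t cs_begin(struct cs_buffer_list *cs,
                       const unsigned *bound, size_t num_bound,
                       const unsigned *resident, size_t num_resident)
{
    cs->count = 0;
    for (size_t i = 0; i < num_bound; i++)
        (void)cs_add_buffer(cs, bound[i]);
    for (size_t i = 0; i < num_resident; i++)
        (void)cs_add_buffer(cs, resident[i]);
    return cs->count;
}
```

With 3 bound textures and no resident handles, `cs_begin()` returns 3; with 1000 distinct resident buffers added, it returns 1003 for every command stream, even if the draws only sample a few of them. That per-CS cost is what the mode 1 numbers above reflect.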
Samuel Pitoiset (65):
  mapi: add GL_ARB_bindless_texture entry points
  mesa: implement ARB_bindless_texture
  mesa: add support for unsigned 64-bit vertex attributes
  mesa: add support for glUniformHandleui64*ARB()
  mesa: refuse to update sampler parameters when a handle is allocated
  mesa: refuse to update tex parameters when a handle is allocated
  mesa: refuse to change textures when a handle is allocated
  mesa: refuse to change tex buffers when a handle is allocated
  mesa: keep track of the current variable in add_uniform_to_shader
  mesa: store bindless samplers as PROGRAM_UNIFORM
  mesa: add infrastructure for bindless samplers/images bound to units
  glsl: process uniform samplers declared bindless
  glsl: process uniform images declared bindless
  glsl: pass the ir_variable object to set_opaque_binding()
  glsl: set the explicit binding value for bindless samplers/images
  glsl: add ir_variable::is_bindless()
  mesa: add update_single_shader_texture_used() helper
  mesa: add update_single_program_texture_state() helper
  mesa: update textures for bindless samplers bound to texture units
  mesa: pass gl_program to _mesa_associate_uniform_storage()
  mesa: associate uniform storage to bindless samplers/images
  mesa: handle bindless uniforms bound to texture/image units
  mesa: get rid of a workaround for bindless in _mesa_get_uniform()
  gallium: add PIPE_CAP_BINDLESS_TEXTURE
  gallium: add ARB_bindless_texture interface
  ddebug: add ARB_bindless_texture support
  trace: add ARB_bindless_texture support
  tc: add ARB_bindless_texture support
  tgsi: add new Bindless flag to tgsi_instruction_texture
  tgsi: add new Bindless flag to tgsi_instruction_memory
  tgsi/ureg: accept TGSI_FILE_{CONSTANT,INPUT} for dst registers
  st/glsl_to_tgsi: add support for bindless samplers
  st/glsl_to_tgsi: add support for bindless images
  st/glsl_to_tgsi: add support for bindless pack/unpack operations
  st/glsl_to_tgsi: teach the DCE pass about bindless samplers/images
  st/glsl_to_tgsi: teach rename_temp_registers() about bindless samplers
  tgsi/scan: record bindless samplers/images usage
  st/mesa: implement ARB_bindless_texture
  st/mesa: make update_single_texture() non-static
  st/mesa: make convert_sampler_from_unit() non-static
  st/mesa: add st_convert_image_from_unit() helper
  st/mesa: add st_create_{texture,image}_handle_from_unit() helper
  st/mesa: add infrastructure for storing bound texture/image handles
  st/mesa: make bindless samplers/images bound to units resident
  st/mesa: do not release sampler views for resident textures
  st/mesa: disable per-context seamless cubemap when using texture handles
  st/mesa: enable ARB_bindless_texture
  radeonsi: add a slab allocator for resident descriptors
  radeonsi: add si_init_descriptor_list() helper
  radeonsi: add si_set_sampler_view_desc() helper
  radeonsi: add si_set_shader_image_desc() helper
  radeonsi: implement ARB_bindless_texture
  radeonsi: add all resident buffers to the current CS
  radeonsi: only add descriptors in presence of resident handles
  radeonsi: add si_update_check_render_feedback() helper
  radeonsi: decompress DCC for resident textures/images
  radeonsi: decompress resident textures/images before graphics/compute
  radeonsi: isolate real framebuffer changes from the decompression passes
  radeonsi: track use of bindless samplers/images from tgsi_shader_info
  radeonsi: only decompress resident textures/images when used
  radeonsi: upload new descriptors when resident buffers are invalidated
  radeonsi: invalidate buffers which are made resident if needed
  radeonsi: add support for loading bindless samplers
  radeonsi: add support for loading bindless images
  radeonsi: enable ARB_bindless_texture

 docs/features.txt | 2 +-
 docs/relnotes/17.2.0.html | 1 +
 src/compiler/glsl/ir.h | 11 +
 src/compiler/glsl/ir_uniform.h | 12 +
 src/compiler/glsl/link_uniform_initializers.cpp | 42 +-
 src/compiler/glsl/link_uniforms.cpp | 156 +++-
 src/compiler/glsl/shader_cache.cpp | 47 +
 src/gallium/auxiliary/tgsi/tgsi_build.c | 8 +
 src/gallium/auxiliary/tgsi/tgsi_scan.c | 37 +
 src/gallium/auxiliary/tgsi/tgsi_scan.h | 2 +
 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 21 +-
 src/gallium/auxiliary/tgsi/tgsi_ureg.h | 16 +-
 src/gallium/auxiliary/util/u_threaded_context.c | 147 ++++
 .../auxiliary/util/u_threaded_context_calls.h | 4 +
 src/gallium/docs/source/screen.rst | 2 +
 src/gallium/drivers/ddebug/dd_context.c | 61 ++
 src/gallium/drivers/etnaviv/etnaviv_screen.c | 1 +
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
 src/gallium/drivers/r300/r300_screen.c | 1 +
 src/gallium/drivers/r600/r600_pipe.c | 1 +
 src/gallium/drivers/radeon/r600_pipe_common.h | 4 +
 src/gallium/drivers/radeonsi/si_blit.c | 131 ++-
 src/gallium/drivers/radeonsi/si_compute.c | 2 +
 src/gallium/drivers/radeonsi/si_compute.h | 14 +
 src/gallium/drivers/radeonsi/si_descriptors.c | 943 +++++++++++++++++++--
 src/gallium/drivers/radeonsi/si_hw_context.c | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c | 25 +
 src/gallium/drivers/radeonsi/si_pipe.h | 68 ++
 src/gallium/drivers/radeonsi/si_shader.h | 12 +
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 48 +-
 src/gallium/drivers/radeonsi/si_state.c | 10 +-
 src/gallium/drivers/radeonsi/si_state.h | 9 +
 src/gallium/drivers/softpipe/sp_screen.c | 1 +
 src/gallium/drivers/svga/svga_screen.c | 1 +
 src/gallium/drivers/swr/swr_screen.cpp | 1 +
 src/gallium/drivers/trace/tr_context.c | 114 +++
 src/gallium/drivers/vc4/vc4_screen.c | 1 +
 src/gallium/drivers/virgl/virgl_screen.c | 1 +
 src/gallium/include/pipe/p_context.h | 16 +
 src/gallium/include/pipe/p_defines.h | 1 +
 src/gallium/include/pipe/p_shader_tokens.h | 6 +-
 src/mapi/glapi/gen/ARB_bindless_texture.xml | 100 +++
 src/mapi/glapi/gen/Makefile.am | 1 +
 src/mapi/glapi/gen/apiexec.py | 3 +
 src/mapi/glapi/gen/gl_API.xml | 4 +-
 src/mapi/glapi/gen/gl_genexec.py | 1 +
 src/mesa/Makefile.sources | 2 +
 src/mesa/main/api_loopback.c | 18 +
 src/mesa/main/api_loopback.h | 6 +
 src/mesa/main/bufferobj.c | 4 +-
 src/mesa/main/context.c | 3 +
 src/mesa/main/dd.h | 19 +
 src/mesa/main/mtypes.h | 86 ++
 src/mesa/main/samplerobj.c | 48 ++
 src/mesa/main/shared.c | 12 +
 src/mesa/main/tests/dispatch_sanity.cpp | 18 +
 src/mesa/main/teximage.c | 25 +-
 src/mesa/main/texobj.c | 12 +
 src/mesa/main/texparam.c | 61 ++
 src/mesa/main/texstate.c | 52 +-
 src/mesa/main/texturebindless.c | 902 ++++++++++++++++++++
 src/mesa/main/texturebindless.h | 96 +++
 src/mesa/main/uniform_query.cpp | 208 ++++-
 src/mesa/main/uniforms.c | 119 ++-
 src/mesa/main/uniforms.h | 16 +
 src/mesa/main/varray.c | 23 +
 src/mesa/main/varray.h | 3 +
 src/mesa/main/vtxfmt.c | 4 +
 src/mesa/program/ir_to_mesa.cpp | 36 +-
 src/mesa/program/ir_to_mesa.h | 4 +-
 src/mesa/program/program.c | 8 +
 src/mesa/state_tracker/st_atifs_to_tgsi.c | 2 +-
 src/mesa/state_tracker/st_atom_constbuf.c | 6 +
 src/mesa/state_tracker/st_atom_image.c | 33 +-
 src/mesa/state_tracker/st_atom_sampler.c | 32 +-
 src/mesa/state_tracker/st_atom_texture.c | 15 +-
 src/mesa/state_tracker/st_cb_texture.c | 84 ++
 src/mesa/state_tracker/st_context.c | 2 +
 src/mesa/state_tracker/st_context.h | 11 +
 src/mesa/state_tracker/st_extensions.c | 1 +
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 3 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 138 ++-
 src/mesa/state_tracker/st_mesa_to_tgsi.c | 2 +-
 src/mesa/state_tracker/st_pbo.c | 2 +-
 src/mesa/state_tracker/st_sampler_view.c | 6 +
 src/mesa/state_tracker/st_shader_cache.c | 3 +-
 src/mesa/state_tracker/st_texture.c | 213 +++++
 src/mesa/state_tracker/st_texture.h | 28 +
 src/mesa/vbo/vbo_attrib_tmp.h | 28 +
 src/mesa/vbo/vbo_context.h | 2 +
 src/mesa/vbo/vbo_exec_api.c | 15 +-
 src/mesa/vbo/vbo_save_api.c | 3 +
 97 files changed, 4250 insertions(+), 260 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_bindless_texture.xml
 create mode 100644 src/mesa/main/texturebindless.c
 create mode 100644 src/mesa/main/texturebindless.h

-- 
2.13.0