[Beignet] [ANNOUNCE] Beignet 1.0.0 (2014-11-14)

Beignet 1.0.0 (2014-11-14)
==========================

The Beignet development team is proud to announce the release of Beignet 1.0.0. This is an important milestone after about two years of development. Thanks to everyone who helped us improve it to a relatively mature state.

Beignet now supports 3rd through 5th generation Intel Core Processors. Besides Broadwell support, this release also brings major performance improvements for many workloads and fixes a number of bugs. We observed performance gains ranging from 10% to more than 4x on some OpenCV 3.0 benchmarks.

The highlights are as follows:

1. Added 5th generation Intel Core Processor (BDW) support.
2. Optimized constant buffer loads.
3. Implemented a basic transformation from unstructured control flow to structured control flow to improve performance.
4. Fixed some memory leaks.
5. Implemented missing constant expression handling.
6. Added Clang/ICC compiler support for building Beignet.
7. Optimized unaligned char/short vector loads.
8. Sped up kernel compilation by moving built-in function support from a header file into a linked library.
9. Implemented some missing LLVM intrinsics.
10. Optimized the loop unrolling pass, boosting some OpenCV benchmarks.
11. Several other bug fixes since the last release.

Beignet's pass rates on the OpenCV 3.0, OpenCV 2.4 and piglit test suites are all above 99%.
Git tag: Release_v1.0.0
Gitweb URL: http://cgit.freedesktop.org/beignet
https://01.org/sites/default/files/beignet-1.0.0-source.tar.gz

md5sum: bfd755904c332cdd285d6058f5f3de8c Beignet-1.0.0-Source.tar.gz
sha1sum: a2b0eb53e5f9a6055cd656531532a4c6ae03fbb0 Beignet-1.0.0-Source.tar.gz
sha256sum: e30c4d0f4c8917fa0df2467b2d70a4ee524f28d54c42c582262d5f08928ea543 Beignet-1.0.0-Source.tar.gz

-----------------------------------------------------------------
Changes since 0.9.3:

Andreas Beckmann (2):
      fix some typos
      use env to set environment variables for GBE_BIN_GENERATER

Chuanbo Weng (1):
      utest: add new test that trigger an assignment operation bug in if.

Guo Yejun (18):
      remove requirment as drm master in non-x environment
      remove requirment as drm master in non-x environment
      free build_log when the cl program is released
      free build_log when the cl program is released
      fix three memory leaks
      clean llvm resource in compiler (libgbe.so)
      fix three memory leaks
      clean llvm resource in compiler (libgbe.so)
      delete GEPInst when it is no longer used
      delete GEPInst when it is no longer used
      remove dependency for non-X runtime environment
      remove dependency for non-X runtime environment
      support CL_MEM_USE_HOST_PTR with userptr for cl buffer
      enable CL_DEVICE_HOST_UNIFIED_MEMORY when userptr is supported
      add test for cl buffer created with CL_MEM_USE_HOST_PTR
      fix issue to create cl image from libva with non-zero offset
      add test for clCreateImageFromLibvaIntel
      use posix_memalign instead of aligned_alloc to be more compatible

Junyan He (54):
      Fix the global string bug for printf.
      Fix a bug for runtime_barrier_list.cpp, event array out of bound
      Fix a bug for runtime_barrier_list.cpp, event array out of bound
      Fix the global string bug for printf.
      Add common define header files to initialize the libocl
      Add the async module into the libocl
      Add the atomic module into the libocl
      Add the geometric module into the libocl
      Add the image module into the libocl
      Add the misc module into the libocl
      Add the sync module into the libocl
      Add printf module into libocl
      Add vload module into the libocl
      Add thw workitem module into the libocl
      Add the convert and as modules into the libocl
      Add the gen_vector script into the libocl
      Add the common module into the libocl as template
      Add the integer module into libocl as template
      Add the math function into libocl as template
      Add the relational module into libocl as template
      Add the ocl_defines header file into libocl
      Add memcpy, memset and barrier bitcode files into libocl
      Add the bit code linker into the module pass.
      Enable libocl and disable the usage of the old huge header.
      Use the PCH to accelerate the parsing speed of the ocl.h
      Delete all the unused files of old huge header.
      Add the missing function prototypes of any() and atom_add()
      Add uncompatible PCH Options to avoid compiling failure.
      Fix the global string bug for printf.
      Add copyright header for all libocl files.
      Fix the issue of -cl-std=CLX.X option.
      Fix the issue of -cl-std=CLX.X option.
      Add the switch logic for math conformance fast path
      Modify the CMakeList to use the internal PCH first.
      Fix the bug of LLVM_LFLAGS fail to set
      Add long support for printf
      BDW: Add gen8 surface state struct.
      BDW: refine the gen8_surface_state_t.
      BDW: Add function intel_gpgpu_setup_bti for gen8.
      BDW: Correct surface base address set in setup bti.
      BDW: Add function intel_gpgpu_bind_buf for gen8.
      Add sampler state and tile define for gen8.
      Modify the bind sampler logic for gen8
      BDW: Add gen8 into intel_driver_init
      Refine the shared function ID define.
      Add the libdrm version check.
      Let the failure of intel_drm lib's check as a FATAL_ERROR
      Fit the printf bug in loop
      Fix the bug of 1D array slice pitch
      Add the test case for image 1d array fill
      Add the test case for image 2d array fill
      Add the disasm support for Gen8
      Fix the compare_image_2d_and_1d_array test case bug
      Fix the bug of multi-thread crash

Luo (5):
      remove lspci, gbe_bin_genenrater would generator llvm binary by default.
      remove lspci, gbe_bin_genenrater would generator llvm binary by default.
      fix piglit get kernel info FUNCTION ATTRIBUTE fail.
      fix piglit get kernel info FUNCTION ATTRIBUTE fail.
      add opencl-1.2 builtin function popcount.

Luo Xionghu (28):
      fix the relational built-in vector function regression.
      fix opencv_test_imgproc subcase OCL_ImgProc/Accumulate.Mask regression.
      fix piglit cl-api-get-program-info fail.
      fix piglit cl-api-get-program-info fail.
      fix clGetKernelWorkGroupInfo built-in kernel fail.
      fix piglit cl-api-set-kernel-arg fail.
      fix clGetKernelWorkGroupInfo built-in kernel fail.
      fix piglit cl-api-set-kernel-arg fail.
      fix bin/cl-program-tester tests/cl/program/execute/attributes.cl regression.
      fix bin/cl-program-tester tests/cl/program/execute/attributes.cl regression.
      remove the LinkOnceAnyLinkage since the libocl is introduced.
      improve the build performance of vector type built-in function.
      fix one bug at cl_get_kernel_workgroup_info.
      fix utest memory leak.
      Add Gen IR WHILE.
      add handleSelfLoopNode to insert while instruction on Gen IR level.
      Use instruction WHILE to manipulate structure.
      add utest popcount for all types.
      use global flag 0.0 to control unstructured simple block.
      add llvm Intrinsic call support.
      add utest compiler_overflow for llvm intrinsic function.
      enable llvm intrinsic call usub_with_overflow funtion.
      add utest for llvm intrinsic call usub_with_overflow funtion.
      enable llvm intrinsic call bswap function.
      add utest function bswap.
      fix bswap kernel function type issue.
      fix piglit clCreateProgramWithBinary fail.
      fix a bug in clCompileProgram().

LuoXionghu (5):
      add platform info in the gen binary code.
      add utest load_program_from_gen_bin.
      add platform info in the gen binary code.
      add utest load_program_from_gen_bin.
      improve the build performance of vector type built-in function.

Lv Meng (6):
      improve the clEnqueueCopyBufferRect performance in some cases
      Fix compile error for ICC compiler
      Fix compile errors for CLANG compiler
      Fix compile warnings for ICC compiler
      Fix compile warnings for CLANG compiler
      Enable ICC and CLANG compiler for beignet

Meng Mengmeng (3):
      add beignet GIT_HAL1 if there is .git directory
      create GIT_SHA1 without any dependency
      add building dependency GIT_SHA1

Rebecca Palmer (7):
      Fail gracefully on unsupported hardware
      Fail gracefully on unsupported hardware
      GBE: fix bug in pow()/pown().
      GBE: fix bug in erf()/erfc().
      GBE: fix bug in tgamma().
      utests: fix bugs in builtin_pow().
      utests: fix bugs in builtin_tgamma().

Ruiling Song (43):
      GBE: Fix builtin tanpi.
      GBE: Fix builtin tanpi.
      GBE: Use varying register to save one instruction
      GBE: Optimize constant load with sampler.
      GBE: align the fields in union ImageInfoKey.
      utests: Fix a bug in image_1D_buffer.
      GBE: align the fields in union ImageInfoKey.
      utests: Fix a bug in image_1D_buffer.
      runtime: set correct state for constant buffer on hsw.
      runtime: set correct state for constant buffer on hsw.
      GBE: Refine bti usage in backend & runtime.
      GBE: Handle bti allocation for internal buffer used by printf.
      GBE: remove some useless code for getting printf buffer address.
      GBE: Fix a warning in getConstantPointerRegister.
      GBE: Fix type size for vector3
      GBE: initialize BTI structure to zero.
      GBE: Fix a bug in gatherBTI.
      cmake: Fix a license issue.
      GBE: clear deadprintfs when current function is done.
      GBE: refine the llvm multi-thread related code.
      GBE: Fix type size for vector3
      cmake: Fix a license issue.
      GBE: clear deadprintfs when current function is done.
      GBE: refine the llvm multi-thread related code.
      GBE: Optimize constant load with sampler.
      GBE: Refine bti usage in backend & runtime.
      GBE: Handle bti allocation for internal buffer used by printf.
      GBE: initialize BTI structure to zero.
      GBE: Fix a bug in gatherBTI.
      GBE/libocl: Fix sub_sat corner case.
      GBE: Fix sub_sat corner case.
      GBE: Output linkModules's error message.
      GBE/libocl: Add __gen_ocl_get_timestamp() to get timestamp.
      GBE: Fix a bug when setting flag register
      GBE: add legalize pass to handle wide integers
      Re-apply "improve the build performance of vector type built-in function."
      GBE: workaround register allocation fail caused by custom loop unroll.
      GBE: Fix live range for temporary register in replaceReg
      GBE: Fix kernel argument size for vector3
      utests: add a test to trigger cl_float3 bug in clSetKernelArg.
      GBE: Fix a bitcast from float vector to wide interger issue in legalize pass.
      GBE: Do topological sorting of basicblocks.
      docs: update mixed_buffer_pointer document.

Yang Rong (54):
      Add some hsw missed pci ids (reserved PCI IDs).
      Add some hsw missed pci ids (reserved PCI IDs).
      Fix a utest compiler_async_stride_copy typo.
      Fix a utest compiler_async_stride_copy typo.
      Only compiler X11 files and do X11 operations when found X11.
      Only compiler X11 files and do X11 operations when found X11.
      Update Beignet.mdwn X11 dependency.
      Two minor fix.
      Fix two bugs.
      Update Beignet.mdwn X11 dependency.
      Two minor fix.
      Fix two bugs.
      Update README for the command parser in drm kernel.
      Update README for the command parser in drm kernel.
      Update license disclaimer.
      Update license disclaimer.
      Avoid use GenNativeInstruction directly out of GenEncode and gen_insn_compact.
      BDW: Add BDW pci ids and BDW device struct.
      BDW: Add BDW instruction define.
      BDW: Add Gen8Encoder and Gen7Encoder.
      BDW: Add class Gen8Context.
      BDW: Pass Jip and Uip when patchJMPI.
      BDW: Refine intel_gpgpu_setup_bti and add intel_gpgpu_set_base_address for BDW.
      BDW: add some BDW function.
      BDW: Fix Pointer argument curbe alloce size.
      BDW: enable SLM in BDW.
      BDW: Fix unsample bug.
      BDW: Refine BDW's int 32*32 multiply.
      BDW: BDW don't need add slm offset, remove it.
      BDW: Add BDW Device id to gen binary generater and binary serialize in backend.
      BDW: Add device's sub slice field, for cl_get_kernel_max_wg_sz.
      BDW: Correct scratch buffer of BDW.
      BDW: Forgot to set UIP of else in BDW.
      BDW: Correct BDW device name.
      BDW: Fix a scaler int 32*32 bug.
      BDW: Need not restore SLM setting in BDW.
      BDW: Correct stack setting in BDW.
      Fix a segment fault.
      Fix a HSW regression.
      Fix memcpy and memset bug.
      Fix HSW thread_n <= 64 assert.
      Fix a HSW constant buffer regression.
      BDW: Change BDW's max work group size to 512.
      BDW: Fix load/store half error.
      BDW: Also need set Shader Channel Select for constant buffer in BDW.
      Fix a upsample regression.
      Fix a HSW regression.
      Refine the the error handling in function cl_command_queue_ND_range_gen7.
      Refine the intel gpgpu delete.
      Fix a size assert when setup bti.
      BDW: Fix bwd 32*32 scalar multiplication bug.
      IVB/HSW/BYT: Revert the Dynamic state Base Addr and relative buffers address setting.
      BDW: Set the URB/REST size to 384K/384K when SLM disable.
      BDW: Change the default tiling mode to TILING_Y on BDW.

Yichao Yu (1):
      Use ${PYTHON_EXECUTABLE} to run python scripts.

Yongjia Zhang (6):
      Add Gen IR IF, ELSE and ENDIF
      Add Gen instruction 'else'
      Add structure identification on ir level
      Use instruction if else and endif manipulate structures
      Enable structural analysis
      GBE: fix empty block disassemble bug.

Zhenyu Wang (5):
      Make use of write enable flag for mem bo map
      Clear batch buffer pointer after unmap
      Use pread/pwrite for buffer enqueue read/write
      Fix AUX buffer for page alignment
      Remove intel_gpgpu_check_binded_buf_address()

Zhigang Gong (111):
      Build: Change versioning policy.
      runtime/driver: refine error handlings.
      runtime: fix some subtle event bugs.
      runtime/driver: refine error handlings.
      runtime: fix some subtle event bugs.
      gbe: add the new else instruction to the assert checking.
      docs: add a NEWS document to point to the release notes pages.
      docs: add a NEWS document to point to the release notes pages.
      Bump to 0.9.2.
      NEWS: update for 0.9.2.
      GBE: cleanup image base index related code.
      GBE: refine post register allocation scheduling for global buffers.
      GBE: refactor the immediate class to support vector data type.
      GBE: simplify processConstant.
      GBE: complete constant expression processing.
      GBE: enable constant expression processing.
      utest: add new test for constant expression processing.
      GBE: Reduce random behaviour of the code generation
      GBE: adjust preferred vector length.
      GBE: refactor the immediate class to support vector data type.
      GBE: simplify processConstant.
      GBE: complete constant expression processing.
      GBE: enable constant expression processing.
      utest: add new test for constant expression processing.
      Revert "GBE: refine post register allocation scheduling for global buffers."
      utests: fix two utest bugs.
      GBE: fix error in the rootn fastpath function for some special input.
      utests: fix two utest bugs.
      GBE: fix error in the rootn fastpath function for some special input.
      Add new vload benchmark/test case.
      GBE: optimize unaligned char and short data vector's load.
      GBE: relax the batch byte/short load vector size restrication.
      GBE: refine the unaligned data gathering.
      GBE: adjust preferred vector length.
      GBE: fixup/refine a bug for image1D array's extra binding index handling.
      GBE: remove the user defined macro cl_khr_fp64.
      GBE: avoid one optimization pass to generate wide integer.
      GBE: avoid one optimization pass to generate wide integer.
      GBE: fix a bug with LLVM 3.3.
      GBE: fallback if we get a wider than i64 constant.
      GBE: fix a bug with LLVM 3.3.
      GBE: fallback if we get a wider than i64 constant.
      GBE: cleanup image base index related code.
      GBE: fixup/refine a bug for image1D array's extra binding index handling.
      build: fix a CXXFLAGS override bug in backend directory.
      GBE: fix some predfeined OCL macros.
      Runtime: Implement clGetExtensionFunctionAddressForPlatform.
      Runtime: Implement clGetExtensionFunctionAddressForPlatform.
      GBE/libocl: fix the wrong prototype of scalar native_powr.
      GBE: fix bugs when handling -cl-std option.
      GBE: fix bugs when handling -cl-std option.
      GBE/libocl: Added one missing prototype fma().
      GBE: don't return error if we get an empty module.
      GBE: Fix a potential segfault.
      GBE: Fix a potential segfault.
      GBE: fix a potential memory leak bug.
      GBE: fix a potential memory leak bug.
      GBE: don't enable double by default.
      GBE: don't enable double by default.
      GBE: fix multiple files compilation bugs.
      runtime: fix program binary type bug.
      runtime: fix build status handling.
      runtime: fix program binary type bug.
      runtime: fix build status handling.
      GBE: fix multiple files compilation bugs.
      Update readme.
      Update readme.
      Document fixup.
      Remove out-of-date document.
      Bump to 0.9.3.
      Remove out-of-date document.
      Update NEWS.
      GBE/libocl: add missing vector builtin definition for fma.
      GBE/libocl: fix a regression after libocl change.
      Revert "improve the build performance of vector type built-in function."
      GBE/libocl: fix build dependency issue.
      GBE: fix a loop header file including bug.
      GBE: structurized loop exit need an extra branching instruction when do reordering.
      GBE: fix a bug in legalize pass.
      GBE: do intrinsics lowering pass earlier.
      GBE: fix a legalize pass bug when bitcast wide integer to incompaitble vector.
      GBE: Add a customized loop unrolling handling mechanism.
      GBE: disable custom loop unroll for LLVM 3.3/3.4.
      GBE: add Selection instruction handler at legalize pass.
      GBE: increase maximum src/dst operands to 32.
      GBE: add basic PHINode support in legalize pass.
      GBE: fix regression caused by simple block optimization.
      GBE: handle dead loop BBs in liveness analysis.
      GBE: set default address space to -1 to avoid incorrect unroll hint.
      GBE: fix a wrong type of cl_device_info.
      utest: change the box_blur_image to be identical to box_blur.
      utests: replace the nodistriutable picture.
      GBE: fix disassembly bug.
      GBE: fix a bool handling bug when SEL on a uniform bool variable.
      GBE: Support more instructions for constant expression handling.
      GBE: remove useless debug info.
      Revert "add test for clCreateImageFromLibvaIntel"
      Revert "fix issue to create cl image from libva with non-zero offset"
      utests: remove all shader toy test cases.
      License: adjust all license version to LGPL v2.1+.
      GBE: fix relocatable issue for pch file.
      Revert "BDW: Change the default tiling mode to TILING_Y on BDW."
      GBE: fix one double related bugs for post register scheduling.
      update some documents.
      runtime: fix one bug in BDW image.
      Update documents.
      runtime: refine version handling.
      runtime: fix bug in cl_enqueue_read_buffer.
      runtime: disable userptr due to random fail.
      GBE: work around error reporting for unresolved symbols
      Bump to 1.0.0.

--
Zhigang Gong, Thanks.