- 21 May, 2020 2 commits
-
-
Jeroen Baert authored
-
Jeroen Baert authored
-
- 19 May, 2020 8 commits
-
-
Jeroen Baert authored
-
Jeroen Baert authored
Updated notes about contributing
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
Travis: Use GCC9 to build on bionic for icelake compilation support
-
- 18 May, 2020 8 commits
-
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
_mm512_bitshuffle_epi64_mask implementation
-
- 10 Apr, 2020 6 commits
-
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
Cmake support added by @shohirose
-
- 09 Apr, 2020 5 commits
-
-
Sho Hirose authored
-
Sho Hirose authored
-
Sho Hirose authored
-
Sho Hirose authored
-
Sho Hirose authored
-
- 05 Apr, 2020 5 commits
-
-
Wunkolo authored
-
Wunkolo authored
-
Wunkolo authored
Updated benchmarks using a proper release-mode build ``` CPU: Topology: Quad Core model: Intel Core i7-1065G7 bits: 64 type: MT MCP L2 cache: 8192 KiB Speed: 950 MHz min/max: 400/3900 MHz Core speeds (MHz): 1: 864 2: 958 3: 615 4: 806 5: 977 6: 704 7: 702 8: 840 00.915 ms 0.854 ms : 64-bit BMI2 instruction set 01.060 ms 1.033 ms : 64-bit AVX512 instruction set 0.881 ms 0.841 ms : 32-bit BMI2 instruction set 1.065 ms 1.102 ms : 32-bit AVX512 instruction set 1.082 ms 4.436 ms : 64-bit BMI2 Instruction set 0.894 ms 4.216 ms : 64-bit AVX512 Instruction set 00.933 ms 5.062 ms : 32-bit BMI2 Instruction set 01.075 ms 4.701 ms : 32-bit AVX512 Instruction set 08.231 ms 6.995 ms : 64-bit BMI2 instruction set 08.235 ms 8.252 ms : 64-bit AVX512 instruction set 7.091 ms 6.755 ms : 32-bit BMI2 instruction set 8.238 ms 8.264 ms : 32-bit AVX512 instruction set 6.836 ms 33.431 ms : 64-bit BMI2 Instruction set 6.872 ms 33.619 ms : 64-bit AVX512 Instruction set 06.908 ms 33.352 ms : 32-bit BMI2 Instruction set 06.839 ms 33.265 ms : 32-bit AVX512 Instruction set 57.233 ms 57.425 ms : 64-bit BMI2 instruction set 65.779 ms 66.391 ms : 64-bit AVX512 instruction set 56.659 ms 55.211 ms : 32-bit BMI2 instruction set 65.746 ms 68.178 ms : 32-bit AVX512 instruction set 55.670 ms 268.382 ms : 64-bit BMI2 Instruction set 55.339 ms 269.114 ms : 64-bit AVX512 Instruction set 60.553 ms 291.424 ms : 32-bit BMI2 Instruction set 59.651 ms 293.627 ms : 32-bit AVX512 Instruction set ```
-
Wunkolo authored
Implements fully implements encoding for 2D and 3D cases. Not particular optimized but passes all the tests. Uses BMI2 placeholder implementation for decoding for verification Current comparison against BMI2 ``` CPU: Topology: Quad Core model: Intel Core i7-1065G7 bits: 64 type: MT MCP L2 cache: 8192 KiB Speed: 834 MHz min/max: 400/3900 MHz Core speeds (MHz): 1: 969 2: 1018 3: 923 4: 1037 5: 708 6: 992 7: 613 8: 691 02.050 ms 1.818 ms : 64-bit BMI2 instruction set 04.213 ms 4.208 ms : 64-bit AVX512 instruction set 1.835 ms 1.803 ms : 32-bit BMI2 instruction set 4.222 ms 4.216 ms : 32-bit AVX512 instruction set 2.021 ms 5.489 ms : 64-bit BMI2 Instruction set 1.952 ms 5.462 ms : 64-bit AVX512 Instruction set 01.964 ms 5.285 ms : 32-bit BMI2 Instruction set 01.964 ms 5.517 ms : 32-bit AVX512 Instruction set 14.823 ms 14.787 ms : 64-bit BMI2 instruction set 33.683 ms 34.622 ms : 64-bit AVX512 instruction set 14.487 ms 14.675 ms : 32-bit BMI2 instruction set 33.629 ms 33.935 ms : 32-bit AVX512 instruction set 15.448 ms 41.621 ms : 64-bit BMI2 Instruction set 15.282 ms 43.666 ms : 64-bit AVX512 Instruction set 15.635 ms 42.059 ms : 32-bit BMI2 Instruction set 16.512 ms 44.433 ms : 32-bit AVX512 Instruction set 137.746 ms 135.549 ms : 64-bit BMI2 instruction set 314.947 ms 325.968 ms : 64-bit AVX512 instruction set 136.730 ms 132.937 ms : 32-bit BMI2 instruction set 315.256 ms 321.689 ms : 32-bit AVX512 instruction set 141.504 ms 374.482 ms : 64-bit BMI2 Instruction set 136.387 ms 381.539 ms : 64-bit AVX512 Instruction set 153.533 ms 370.826 ms : 32-bit BMI2 Instruction set 136.177 ms 377.794 ms : 32-bit AVX512 Instruction set ```
-
Wunkolo authored
-
- 04 Mar, 2020 5 commits
-
-
Jeroen Baert authored
-
Jeroen Baert authored
-
Jeroen Baert authored
Removed unused math.h
-
Jeroen Baert authored
-
Jeroen Baert authored
-
- 24 Nov, 2019 1 commit
-
-
Jeroen Baert authored
-