On 5/19/16 4:12 AM, Jens Müller wrote: > The code applying the sentinel optimization assumes mutability of > the input. That needs to be checked for. Indeed. As I mentioned after discussing find, I didn't worry about those checks assuming they were obvious. > That's fine for partition > because that is assumed to be in-place. But for other algorithms it's > not so obvious. It's sad that the optimization works only for > non-const input. Several optimizations only apply to mutable data. Others apply to immutable data. It's the way things go. > I didn't get the idea behind sentinels for sparse dot product. I picked the > smallest of the last elements (so you need bidirectional ranges) and fix > up as > needed. For gdc I get a speedup (baseline over new implementation) of > 1.2 in > best case and >1.0 in worst case. On average it's about 1.1 I would say. I > expected more. How would you approach sentinels with the sparse dot > product. Can > you elaborate the idea from the video? I didn't get it. That's the idea - to only need to bounds check on one of the three possibilities. What test data did you use? 10%-20% win on dot product is significant because for many algorithms dot product is a kernel and dominates everything else. For those any win goes straight to the bottom line. > The base line (dot1 in the graphs) is the straightforward version > > --- > size_t i,j = 0; > double sum = 0; > while (i < a.length && j < b.length) > { > if (a[i].index < b[j].index) i++; > else if (a[i].index > b[j].index) j++; > else > { > assert(a[i].index == b[j].index); > sum += a[i].value * b[j].value; > i++; > j++; > } > } > return sum; > --- That does redundant checking. There's a better baseline: --- if (a.length == 0 || b.length == 0) return 0; const amax = a.length - 1, bmax = b.length - 1; size_t i,j = 0; double sum = 0; for (;;) { if (a[i].index < b[j].index) { if (i++ == amax) break; } else if (a[i].index > b[j].index) { bumpJ: if (j++ == bmax) break; } else { assert(a[i].index == b[j].index); sum += a[i].value * b[j].value; if (i++ == amax) break; goto bumpJ; } } return sum; --- Then if you add the sentinel you only need the bounds tests in the third case. > BTW the effects vary greatly for different compilers. > For example with dmd the optimized version is slowest. The baseline is > best. Weird. With gdc the optimized is best and gdc's code is always > faster than dmd's code. With ldc it's really strange. Slower than dmd. I > assume I'm doing something wrong here. > > Used compiler flags > dmd v2.071.0 > -wi -dw -g -O -inline -release -noboundscheck > gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0 > -Wall -g -O3 -fomit-frame-pointer -finline-functions -frelease > -fno-bounds-check -ffast-math > ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1 > -wi -dw -g -O3 -enable-inlining -release -boundscheck=off > > Am I missing some flags? These look reasonable. > I uploaded my plots. > - running time > - speed up I uploaded my plots.- running time https:// www.scribd. com/doc/ 312951947/ Running-Time - speed up https:// www.scr ibd.com/ doc/312951964/ Speedup