an extension of the previous algorithm to perform nearly all computations on the GPU itself, thereby reducing CPU load

the technique is "easy to implement", although it requires an NVIDIA GeForce 6-class card and there is no public implementation