Date: Sun, 2 Sep 2012 19:13:37 +0400 From: Solar Designer <solar@...nwall.com> To: crypt-dev@...ts.openwall.com Subject: Re: using scrypt for user authentication Since this thread is so old, I'll over-quote, then add the new stuff below the quote: On Tue, Aug 07, 2012 at 10:54:36AM +0200, Pavel Kankovsky wrote: > On Thu, 12 May 2011, Solar Designer wrote: > > > 1. Use such settings that scrypt doesn't use more than, say, 1 MB of > > memory. Is 1 MB way too low? Is scrypt at this setting significantly > > better than bcrypt or not? > > According to Colin Percival's BSDCan2009 paper the amortized cost (chip > area times time) of scrypt is (at least) 1024 N^2 r^2 p s t where > parameters N and r determine the size of memory (1024 N r + O(r) bits), p > is a paralellization parameter and s and t are unit costs of storage and > computation. > > The paper claims the cost of scrypt with (N, r, p) = (2^14, 8, 1) is > approximately 35 times higher than the cost of bcrypt with cost = 11 while > the time needed to compute both of those functions on a general-purpose > CPU is comparable. These ratios are probably quite stable even when > hardware evolves and unit costs (s, t) change. > > The aformentioned parameters (N = 2^14, r = 8) correspond to 16 MiB of RAM > if my calculation is correct. In order to reduce memory consumption to > 1 MiB you would have to reduce the product of N and r 16-fold. p can be > increased from 1 to 16 now but the overall cost would still be reduced by > a factor of 16 because its dependence on N and r is quadratic. > > Such a change would degrade the strength of scrypt almost to the level of > bcrypt. On customized hardware. On the other hand, it would probably use > enough memory and memory bandwidth to choke GPUs and other hardware that > has not been explicitly designed to crack it. We got a curious real-world data point due to Litecoin using scrypt with (2^10, 1, 1), which corresponds to 128 KiB. I guess the (misguided?) intent was to use L2 cache instead of RAM. IIRC, scrypt was designed not to bump into RAM bandwidth on typical computers anyway, so there was little point in limiting it to just L2 cache. Well, perhaps it'd bump into RAM bandwidth with many CPU cores running instances of scrypt at once. (How many concurrent instances would this take?) A result is that GPU miners for Litecoin now exist, and they outperform CPUs roughly by a factor of 10 per-chip: https://github.com/litecoin-project/litecoin/wiki/Mining-hardware-comparison Yes, they do optionally use the time-memory tradeoff previously mentioned in here: http://www.openwall.com/lists/crypt-dev/2011/07/01/1 http://www.openwall.com/lists/crypt-dev/2011/07/01/2 although according to the documentation for cgminer "performance peaks at a gap of 2", which corresponds to very light use of recomputation (avoiding storage of every other lookup table element). Thus, perhaps part of the 10x speedup comes not from the GPUs' computing power, but rather from GPU cards' memory subsystem being on par with or faster than CPUs' L2 cache for scrypt's access pattern (entire cache lines being requested) and from the greater concurrency of such accesses. For comparison, current implementations of bcrypt on CPU and GPU achieve roughly the same speed per-chip (no advantage from GPUs yet, except that it may be cheaper to build hobbyist multi-GPU than multi-CPU systems). Thus, we can see that scrypt with improper settings can be worse than bcrypt for a very realistic attack. (For attacks with ASICs, things may be different, but attacks with GPUs are relevant too.) Curiously, bcrypt uses only 4 KiB, which is less than Litecoin scrypt's 128 KiB. However, bcrypt's use of L1 (rather than L2) cache on CPUs provides an advantage to CPU implementations, and its access pattern (random 32-bit accesses) makes attempts to use L2+ cache or RAM (for many concurrent instances of bcrypt) very inefficient. On GPUs, best performance at bcrypt is currently achieved by using local memory (very limited in size) and keeping many of the execution units idle (since there's not enough local memory to use them all). Trying to use global memory (partially cached) slows things down. (A local+global mixed implementation might be a little bit faster than local alone, though. This was briefly discussed on john-dev.) Credit: the experiments with bcrypt on GPU were performed by Sayantan Datta. It is plausible that scrypt at 1 MiB is comparable to bcrypt in terms of attacks with GPUs, but this is not certain. scrypt at 1 MiB might as well be somewhat weaker than bcrypt in this respect. Overall, I think scrypt at 1 MiB is not a good replacement for bcrypt currently. We need bigger settings. (And if we go for much bigger settings, this may imply a need to limit concurrency when scrypt is used on authentication servers.) We might also want to have a tradeoff-defeating variation of scrypt, as Anthony Ferrara suggested or otherwise. It appears that this would make scrypt maybe a factor of 2 more resistant to attacks with current GPUs at Litecoin's low settings and perhaps a lot more at bigger settings (where the total GPU card's RAM size is more easily hit if the tradeoff is not used in an attack). Maybe this property should be configurable (a Boolean parameter, to be encoded along with the resulting encrypted data or password hash). As usual, I'd appreciate any comments. Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.