Created on 2012-06-13 23:00 by christian.heimes, last changed 2012-06-24 13:12 by christian.heimes. This issue is now closed.

Messages (96)

msg162739 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-13 23:00

The secure_compare() function immediately returns False when both strings don't have equal length. With the patch the run time of secure_compare() always depends on the length of the right side. It no longer gives away information about the length of the left side. The patch should be applied in combination with the patch in issue #14955.

msg162758 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-14 08:33

secure_compare leaks the password always. Note that it takes different time to create a result of ord() depending whether it's <=100 or > 100 due to caching of small numbers. Such functions should be written in C.

msg162759 - (view) Author: Hynek Schlawack (hynek) * Date: 2012-06-14 09:10

We should. Adding secure functions that aren't really secure is something we should rather avoid. :) Christian, are you willing to do that?

msg162760 - (view) Author: Armin Rigo (arigo) * Date: 2012-06-14 09:18

fijal: while I agree with you, the limit for small ints has actually been pushed to 257 in recent CPythons. So it should still theoretically work --- of course, assuming a predictable CPU, which is wrong, and assuming a simple interpreter. (We can probably dig enough to find a timing issue even with CPython, and of course it's clear with any of the other Python interpreters out there.)

msg162761 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-14 09:19

I don't see how the function is going to leak this information when both this patch and the patch in #14955 are applied. With http://bugs.python.org/file25801/secure-compare-fix-v2.patch ord() is no longer used and thus avoid the timing difference for integers > 256 (NSMALLPOSINTS is defined as 257, not 100).

msg162762 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-14 09:22

Ah unicodes. is encode('unicode-internal') independent on the string characters? I heavily doubt so. you leak at least some information through that function alone.

msg162763 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-14 09:31

With PEP 393 unicode objects can have several representations, which makes it unlikely that *really* constant-timing functions can be devised. Speaking about this particular patch, I don't understand the point. secure_compare() is obviously meant to be used on inputs of some known length, not arbitrary data.

msg162764 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-14 09:55

IMHO it's not obvious to all users. Better safe than sorry. ;) The invariant 'known and equal length' impresses an artificial limitation. Code may need to compare outside data with internal data without exposing too many details about the structure of the internal data. The other matter should be discussed at #14955.

msg162765 - (view) Author: Hynek Schlawack (hynek) * Date: 2012-06-14 10:02

I don’t want to be the killjoy but I find it highly questionable to add a function that is advertised as "secure" while we can't fully grok the complexities at play. If we can't produce a provable secure one, we should scrub the function for good; or at least rename it somehow. Unjustified perceived security is IMHO the worst possible case.

msg162766 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-14 10:10

> I don’t want to be the killjoy but I find it highly questionable to > add a function that is advertised as "secure" while we can't fully > grok the complexities at play. If we can't produce a provable secure > one, we should scrub the function for good; or at least rename it > somehow. The function is probably secure (modulo unseen bugs) in the bytestrings-of-the-same-size case. To make it "provably" secure, we could write a C version (which would be quite easy). For unicode strings things are a bit trickier though. Again, a C version could provide some guarantees (and could raise an error if the passed unicode strings use a different representation from each other).

msg162767 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-14 10:13

Antoine, seriously? You want to explore a function that's called "secure" when the only thing you know about it is "probably secure"? This is extremely tricky business and I think it should be called secure only if you can prove it's secure. Otherwise it's plain insecure and should not be named that.

msg162768 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-14 10:15

export not explore. Why can't I edit my own post?

msg162769 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-14 10:18

> Antoine, seriously? You want to explore a function that's called > "secure" when the only thing you know about it is "probably secure"? > This is extremely tricky business and I think it should be called > secure only if you can prove it's secure. Otherwise it's plain > insecure and should not be named that. What's the methodology to "prove" that it's secure? We could rename "secure" to "safe" to downtone it a bit, but it's still an improvement on the nominal equality comparison.

msg162770 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-14 10:27

For unicode at the very least it's not an improvement at all. With the patch mentioned that does encode it's also not an improvement at all. Prove as in reason about the function in C and make sure it does not do any conditionals depending on the input data. This is much easier than it is in Python. We did this exercise for PyPy once, just for the sake of it. We looked at generated IR and made sure a comparison is not leaking any data. As far as the function goes right now - I don't know. For now following the entire code of long_bitwise is a lot of effort - I genuinely can't say that it'll be the same for all numbers of 0-255. Can you? It's easier with low-level language simply (And yes, this is one of the few cases where I would argue it makes sense to implement something in C :)

msg162773 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-14 11:17

I recommend to revert the addition of the function, given that it can't be made secure.

msg162775 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-14 12:01

I've two suggestions: * rename the function to 'total_compare'. The name explains what the function actually does in comparison to '=='. It takes the total input values into account instead of using short circuit comparison. * restrict the function to bytes. The implementation works well with bytes but not with unicode. It's not an issue since hash digests are bytes. The docs could explain the issue with unicode and how user code can implement a reasonable safe conversion.

msg162777 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-14 12:16

Hi Christian. It's either secure or it's not. If it's not, there is no point in introducing it at all as I don't think it's a good idea to have a kind-of-secure-but-i-dont-know functions in stdlib. If you restrict input to bytes it looks okish, but I looked at all the code that's invoked on the C side and it's quite a lot of code. Does you or anyone else actually go and review all the C code that's called via various operations to check if it does or does not depend on the value of various characters? I can't tell myself, it's too long.

msg162778 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-14 12:26

> It's either secure or it's not. I don't think that's true. By that reasoning, Python is not secure so there's no point in fixing crashes or providing a hashlib module. That said, I think renaming to "total_compare" isn't really helpful. The point of the function is to be secure (as much as possible), not to do a "total" comparison.

msg162838 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 01:37

Maciej, please read http://mjg59.dreamwidth.org/13061.html "Secure" vs "not secure" is not a binary state - it's about making attacks progressively more difficult. Something that is secure against a casual script kiddie scatter gunning attacks on various sites with an automated script won't stand up to a systematic attack from a motivated attacker (also see the reporting on Flame and Stuxnet for what a *really* motivated and well resourced attacker can achieve). The hash randomisation changes didn't make Python completely secure against hashing DoS attacks - it just made them harder, by requiring attackers to figure out the hashing seed for the currently running process first. It's protecting against scatter gun attacks, not targeted ones. Performing a timing attack on Python's default short-circuiting comparison operation is *relatively easy* because the timing variations can be so large (send strings of increasing length until the time stops increasing - now you know the target digest length. Then try various initial characters until the time starts increasing again - now you know the first character. Repeat for the last character, then start with the second character and work through the string. Now you have the target hash, which you can try to crack offline at your leisure. The new comparison function is designed to significantly *reduce* the variance, thus leaking *less* information about the target hash, and making the attack *harder* (albeit, probably still not impossible). I agree with Christian's last two suggestions: change the name to total_compare, and only allow use on byte sequences (where the integer values are certain to be cached). Nothing should ever be called "safe" or "secure" in the standard library, because the immediate question from anyone that knows what they're talking about is "Secure/safe against what level of threat and what kind of threat"? People that *don't* know what they're doing will think "Secure/safe against everything" and that's dangerously misleading. Improving protection against remote timing attacks (e.g. by reducing comparison timing variance to the point where it is small relative to network timing variance) is a *lot* easier than protecting against local timing attacks. Protecting against someone with physical access to the machine hosting the target hash is harder still. Regardless, the target needs to be *improving the status quo*. Being able to tell people "using hmac.total_compare will make you less vulnerable to timing attacks than using ordinary short circuiting comparisons" is a *good thing*. We just need to be careful not to oversell it as making you *immune* to timing attacks.

msg162845 - (view) Author: Hynek Schlawack (hynek) * Date: 2012-06-15 06:19

> "Secure" vs "not secure" is not a binary state - it's about making attacks progressively more difficult. Something that is secure against a casual script kiddie scatter gunning attacks on various sites with an automated script won't stand up to a systematic attack from a motivated attacker (also see the reporting on Flame and Stuxnet for what a *really* motivated and well resourced attacker can achieve). The problem here is, that _if_ you add a "secure" to the name of a method, it becomes binary. At least in the minds of the users. I know you address that, but that's the main point here. > Regardless, the target needs to be *improving the status quo*. > > Being able to tell people "using hmac.total_compare will make you less vulnerable to timing attacks than using ordinary short circuiting comparisons" is a *good thing*. We just need to be careful not to oversell it as making you *immune* to timing attacks. Why not write a C function which can be more secure than Python code? I would argue that would be an general asset for the stdlib, not just for HMAC (therefore, I'd put it elsewhere).

msg162846 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 06:31

On 14.06.2012 14:26, Antoine Pitrou wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > >> It's either secure or it's not. > > I don't think that's true. By that reasoning, Python is not secure so > there's no point in fixing crashes or providing a hashlib module. The proper statement is "It's either time-independent or it's not". This *is* a binary state (I agree that being secure is not a binary state).

msg162847 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 06:37

> Being able to tell people "using hmac.total_compare will make you > less vulnerable to timing attacks than using ordinary short > circuiting comparisons" is a *good thing*. No, it's not. It's a *bad thing*. The two issues that have been opened since the function was first submitted indicate that people will keep inspecting the code and find out that it's not time-independent. If they had been relying on that it is, they will be upset. Since it's inherently impossible to make the function time-independent, people will be constantly annoyed about this function. I can't find anything good in that. If nobody else does, I'll revert the addition before the beta. Note that there is no *actual* issue that is being resolved by this function; it was added only because of its cuteness value.

msg162848 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 06:40

> Why not write a C function which can be more secure than Python code? For Unicode strings, it's impossible to write a time-independent comparison function even in C > I would argue that would be an general asset for the stdlib I would argue that it's not. No actual use case for this function has been demonstrated so far.

msg162850 - (view) Author: Hynek Schlawack (hynek) * Date: 2012-06-15 07:08

>> Why not write a C function which can be more secure than Python code? > For Unicode strings, it's impossible to write a time-independent > comparison function even in C Really? Some comments sounded different. That's too bad but also what I suspected in the first place – it seems to complex. However, this function seems only useful to bytes anyway so why not strip it down if it _is_ possible with bytes? Am I missing something? >> I would argue that would be an general asset for the stdlib > I would argue that it's not. No actual use case for this function > has been demonstrated so far. Well, one example: https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py and any other place that compares passwords, tokens, …

msg162852 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 07:28

Can people please stop raising a false dichotomy and using that as an excuse not to do anything? The decision is not between "leak some information" and "leak no information". It is between "leak more information" and "leak less information". The timing variations with standard comparison are relatively massive and relatively easy to analyse (if the time taken goes up, you got the previous digit correct). With this comparison, they're far more subtle and require much greater analysis to figure out the significance of the timing changes. That reduces the pool of attackers to those capable of performing that analysis (or in possession of tools that will perform that analysis for them). Yes, the docs and name are currently completely unacceptable. But scorched earth is not a good answer, because that just means people will fall back to using "==" which is *even worse* from a security point of view.

msg162853 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 07:38

> Well, one example: > https://github.com/mitsuhiko/python-pbkdf2/blob/master/pbkdf2.py It says that it needs that, but I fail to understand why. pbkdf2 is used to generate encryption keys from passwords, where you don't need to compare strings at all. Instead, you derive a key from the password, and use the key e.g. for AES encryption. If you use pdkdf2 for password hashing, then you do need a comparison function, but it's irrelevant whether that is time-independent. If an attacker was able to determine that his hash brings him close to the actual hash, this is no gain in cracking - since similar hashes do not at all mean that the passwords are similar. > and any other place that compares passwords, tokens, … No no no. Any sensible place to compare passwords would use some sort of one-way function (password hash) before the comparison, so that someone breaking into the machine will not gain the clear text passwords. As a side effect, timing attacks become futile, since hash functions provide confusion and diffusion, so if a timing attack detects that it found a key that hashes similar to the real key, that doesn't get it any closer to revealing the real key.

msg162855 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 07:41

To repeat, the specific feature being proposed for retention is: * a function called hmac.total_compare() that is clearly documented as being still vulnerable to timing analysis given a sufficiently sophisticated attacker, while still being more resistant to such analysis than the standard comparison operator * restricting that function to operating on bytes, to eliminate timing variations associated with encoding/decoding of Unicode text and reduce those associated with the calculation of integer values Leaking less information on each comparison is intended to increase the effectiveness of higher level timing attack countermeasures (such as rate limiting and lockouts). Anyone that would use "hmac.total_compare" and call it done is likely using ordinary comparison today (which is even worse).

msg162856 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 07:45

> The timing variations with standard comparison are relatively massive > and relatively easy to analyse (if the time taken goes up, you got > the previous digit correct). If you have an application that is vulnerable to such an attack, you better reconsider your entire approach, rather than using a library function that says it will harden your approach (but may actually not). > Yes, the docs and name are currently completely unacceptable. But > scorched earth is not a good answer, because that just means people > will fall back to using "==" which is *even worse* from a security > point of view. It's not scorched earth. It's the standard procedure for adding features to the standard library. *At a minimum* there needs to be a use case, which has not been demonstrated yet (the OP of the original report thought he had a use case, but then agreed that he was wrong). Then, the use case should be fairly relevant for inclusion in the standard library. I wish there was an AES implementation in the library before we start worrying about stuff like this. Then, ideally, the new library function has been in wide use for some time. This was rushed, and it needs to be reverted.

msg162857 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 07:47

> To repeat, the specific feature being proposed for retention is: To repeat, no use case has been demonstrated for that function. It has been added because it was fun to write, not because it is useful.

msg162858 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-15 07:49

On Fri, Jun 15, 2012 at 9:41 AM, Nick Coghlan <report@bugs.python.org>wrote: > > Nick Coghlan <ncoghlan@gmail.com> added the comment: > > To repeat, the specific feature being proposed for retention is: > > * a function called hmac.total_compare() that is clearly documented as > being still vulnerable to timing analysis given a sufficiently > sophisticated attacker, while still being more resistant to such analysis > than the standard comparison operator > > * restricting that function to operating on bytes, to eliminate timing > variations associated with encoding/decoding of Unicode text and reduce > those associated with the calculation of integer values > > Leaking less information on each comparison is intended to increase the > effectiveness of higher level timing attack countermeasures (such as rate > limiting and lockouts). Anyone that would use "hmac.total_compare" and call > it done is likely using ordinary comparison today (which is even worse). > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue15061> > _______________________________________ > Nick, I fail to understand why are you opposing writing such a function in C. Such a function can be provably time-independent (and as MvL says this is a binary state), at least as long as it operates on bytes (I'll refrain from asking about unicode, I think it's possible, but I dunno). For the same function in python it's at the very least much harder to prove (and has bugs as we've seen) Cheers, fijal

msg162859 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-15 07:50

On Fri, Jun 15, 2012 at 9:47 AM, Martin v. Löwis <report@bugs.python.org>wrote: > > Martin v. Löwis <martin@v.loewis.de> added the comment: > > > To repeat, the specific feature being proposed for retention is: > > To repeat, no use case has been demonstrated for that function. It > has been added because it was fun to write, not because it is useful. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue15061> > _______________________________________ > Is comparing passwords against a secure one not useful?

msg162860 - (view) Author: Hynek Schlawack (hynek) * Date: 2012-06-15 07:55

>> and any other place that compares passwords, tokens, … > > No no no. Any sensible place to compare passwords would use some > sort of one-way function (password hash) before the comparison, > so that someone breaking into the machine will not gain the clear > text passwords. I agree that this is the right way to do. However I disagree that it's also the only sensible way to do in the real world. Sometimes you just _have_ to compare sensitive strings, whether you like it or not. I see your point that adding such a function would leverage bad security behavior and thus may be a bad thing. The usefulness of such a function to some(?) people is IMHO not disputable though.

msg162861 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-15 07:58

On Fri, Jun 15, 2012 at 9:55 AM, Hynek Schlawack <report@bugs.python.org>wrote: > > Hynek Schlawack <hs@ox.cx> added the comment: > > >> and any other place that compares passwords, tokens, … > > > > No no no. Any sensible place to compare passwords would use some > > sort of one-way function (password hash) before the comparison, > > so that someone breaking into the machine will not gain the clear > > text passwords. > > I agree that this is the right way to do. However I disagree that it's > also the only sensible way to do in the real world. Sometimes you just > _have_ to compare sensitive strings, whether you like it or not. > > I see your point that adding such a function would leverage bad security > behavior and thus may be a bad thing. The usefulness of such a function to > some(?) people is IMHO not disputable though. > Note that this does not relief you from using a time-independent comparison function. If you call some hash function (which time is known to the attacker), then you compare it against a stored hashed version. If you use a normal compare you're leaking the hash. This is indeed not as bad as leaking the password, but it has been demonstrated that one-direction functions are still vulnerable to some sort of attacks, so it's not ideal either.

msg162862 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 07:59

This point was discussed in #14532 when the new API was added. From http://bugs.python.org/issue14532#msg158045: """Given that this issue has affected a lot of security-sensitive third-party code (keyczar, openid providers, almost every python web service that implements "secure cookies" [1] or other HMAC-based REST API signatures), I do like the idea of adding a warning in the relevant documentation as sbt proposed. The only reason I'd recommend _not_ putting a time_independent_comparison() function in the hmac module is that it's not really HMAC-specific. In practice, any fixed-length secrets should be compared in a time-independent manner. It just happens that HMAC verification is a pretty common case for this vulnerable construct. :-)""" For password hashing, the attacker is unlikely to be able to provide the digest directly, but for signature checking it's far more likely to be the case. The idea is to make it easy for people to reduce the time variance of their digest comparisons as the *default* choice when writing security related code. Deciding whether or not the risk of a timing attack is actually significant requires you to look at the system as a whole and decide "Oh, OK, shortcircuiting comparison doesn't leave us open to timing analysis here, we can use it as a performance enhancement". (Although, in most systems, there are likely to be plenty of other less sensitive places to go after for performance improvements first)

msg162863 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 08:03

I'm not really opposed to writing it in C - I just don't think rewriting it in C should be a requirement for keeping it. Even in pure Python, it still leaks less information than the standard comparison operator.

msg162864 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 08:03

> Is comparing passwords against a secure one not useful? I claim that this use case doesn't occur in practice. Everybody uses hashed passwords. If they do compare against a plain-text password, and they want to change something about it, they should switch to hashed passwords.

msg162865 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 08:06

> I see your point that adding such a function would leverage bad > security behavior and thus may be a bad thing. The usefulness of such > a function to some(?) people is IMHO not disputable though. I think this entire issue is out of scale. There is really bigger fish to fry. The two people who really need this should post a package to PyPI.

msg162866 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 08:09

> Note that this does not relief you from using a time-independent comparison > function. If you call some hash function (which time is known to the > attacker), then you compare it against a stored hashed version. If you use > a normal compare you're leaking the hash. This is indeed not as bad as > leaking the password, but it has been demonstrated that one-direction > functions are still vulnerable to some sort of attacks, so it's not ideal > either. But you don't leak the hash - you leak the first byte of the hash if you make 256 tries, and the first two bytes if you make 65536 tries. To leak the first four bytes of the hash, you need to make 2**32 tries. So this is equivalent to a brute-force attack, which works just as well against a time-independent function. So using a time-independent function does not add any security.

msg162867 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-15 08:12

On Fri, Jun 15, 2012 at 10:09 AM, Martin v. Löwis <report@bugs.python.org>wrote: > > Martin v. Löwis <martin@v.loewis.de> added the comment: > > > Note that this does not relief you from using a time-independent > comparison > > function. If you call some hash function (which time is known to the > > attacker), then you compare it against a stored hashed version. If you > use > > a normal compare you're leaking the hash. This is indeed not as bad as > > leaking the password, but it has been demonstrated that one-direction > > functions are still vulnerable to some sort of attacks, so it's not ideal > > either. > > But you don't leak the hash - you leak the first byte of the hash if you > make 256 tries, and the first two bytes if you make 65536 tries. To leak > the first four bytes of the hash, you need to make 2**32 tries. > So this is equivalent to a brute-force attack, which works just as well > against a time-independent function. So using a time-independent > function does not add any security. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue15061> > _______________________________________ > Martin, you fail to understand how this works. You don't do 2**32 tries to leak the 4 charaters, you need 4 * 256, that's why this attack is so bad, because the time needed for the next character is brute force, but then you can move on to the next one.

msg162868 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 08:13

> For password hashing, the attacker is unlikely to be able to provide > the digest directly, but for signature checking it's far more likely > to be the case. Can you elaborate? What is the application, where is the digest checking, and what is the threat?

msg162871 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 08:18

> Martin, you fail to understand how this works. You don't do 2**32 tries to > leak the 4 charaters, you need 4 * 256, that's why this attack is so bad, > because the time needed for the next character is brute force, but then you > can move on to the next one. How so? Assume we have a hashed password, and assume we have somehow guessed the first three bytes. How can I then find out the fourth byte in only 256 tries? I would have to generate passwords whose *hash* matches in the first three bytes. This is not feasible, for any hash function that is worth its salt.

msg162872 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 08:32

That's why the vulnerable cases are far more likely to be related to *signature* checking. In those you can generally provide both the hash input (the message) and the hash target (the purported "signature"). If the signature check uses a time-dependent comparison that exhibits a lot of externally visible variance, then you can use a timing attack to find the signature that corresponds to a particular message (by keeping the message constant and changing the "signature"). Depending on the nature of the message, you're potentially done at that point (since on your final attempt your signed message was accepted), or else you may be after data that you can feed into an analysis aimed at breaking the signing key itself (a much harder prospect, but still possible given a sufficiently large sample, or a signing algorithm that is vulnerable to leaking the key as a result of chosen plaintext attacks). Yes, system level defences are also important (that's why multiprocessing turned out to not, in fact, be vulnerable to an attack based on time dependent signature comparisons), but minimising information leakage is just a good principle of secure design.

msg162873 - (view) Author: Petri Lehtinen (petri.lehtinen) * Date: 2012-06-15 08:36

For example, Django uses time independent comparison to compare signatures of signed cookies. A signed cookie consists of a plain-text value followed by a signature. An attacker wants to construct a cookie that has a malformed value and a valid signature for that value. Let's assume that a signature is a string of 16 hex characters. If a short-cut comparison was used, the attacker would require at most 16 tries to find out the first character. He first tries the signature "000...0", then "100...0", and so on until he notices that Django takes a slightly longer time to respond. Now he know what's the first character of the hash, let's assume it's "8". He then tries "8000...0", "810...0", and so on until he finds the second character. He continues this until he has the correct 16 characters. This takes at most 16 * 16 tries. But because Django uses a constant-time comparison function, the attacker cannot guess one character at a time, and he needs 16 ** 16 tries. In real world, 16 * 16 tries is not enough, of course. But repeating the same requests many times, the timing variations can be used to reveal which is the correct character in each step.

msg162875 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 08:42

FWIW, Petri's example also explains why leaking the expected length of the string is considered an acceptable optimisation in most reimplementations of this signature check comparison: the attacker is assumed to already know the expected length of the signature, because it's part of a documented protocol or API. However, I think it's more reasonable for a standard library implementation to omit that optimisation by default.

msg162877 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-15 09:11

> That's why the vulnerable cases are far more likely to be related to > *signature* checking. In those you can generally provide both the > hash input (the message) and the hash target (the purported > "signature"). I see. I wonder how feasible this attack is, though, since the signature validation involves a lot of computation (including hashing the original message), of which comparing the signature values is but a tiny step. In addition, for public-key signatures (RSA and DSA), I fail to see the threat. I can verify the signature offline, so I don't need to rely on timing attacks to find out what the correct signature would be. Plus I still need to brute-force the signatures, whether for offline generation, or in the timing attack, because it's *not* the case that the expected signature gets compared with the provided signature. Instead, something derived from the message gets compared with something derived from the purported signature. It's really only HMAC which may be vulnerable. In http://seb.dbzteam.org/crypto/python-oauth-timing-hmac.pdf the author discusses that the real protection against this attack is to fix the actual bugs in the OAuth implementation (when OAuth is designed to prevent replay attacks which would also prevent this attack). In the end, he claims that using time-independent comparison would "add even more security", but in the same hand-waving fashion shown in this issue (i.e. without providing any proof that his proposed attack actually works - which I claim it won't, due to the noise caused by the HMAC generation).

msg162880 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-15 10:00

Oh dead god, what have I done ... I threw a small stone and caused a major landslide. :) I'm all with Nick on this topic. A correctly named and documented function provides a tool to users that greatly reduced the change of a side channel attack. It's all about teaching good practice. I also agree that we must neither call it 'secure' nor documented it as 'secure'. I believe the correct term is 'hardened against timing analysis and side channel attacks' I could wrap up a quick C implementation if you like. The operator module is a better place for a total_compare() function. Do you a agree? I recommend that you read/watch Geremy Condra's PyCon talk "Through the Side Channel: Timing and Implementation Attacks in Python". The slides contain timing analysis diagrams.

msg162882 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-15 10:31

> I could wrap up a quick C implementation if you like. The operator > module is a better place for a total_compare() function. Do you a > agree? I think the function is fine in either hashlib or hmac. Putting it in one of these modules is a hint that it's security-related. On the other hand, linking to it from these modules' documentations is just as fine, if it is put in the operator module. If you make a C implementation, it could also be interesting to cover the pure-ASCII unicode case.

msg162885 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 10:52

As a first step, I'm going to make a change to: 1. Rename the function to "compare_digest" 2. Remove the support for comparing strings 3. Update the documentation to be much clearer about its limitations (including why it's considered OK to leak the expected length of the digest) If a C implemented operator.total_compare is made available, then hmac.compare_digest could be updated to use it (retaining the length shortcircuiting behaviour)

msg162888 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-15 11:14

New changeset f36af3766a20 by Nick Coghlan in branch 'default': Issue #15061: Don't oversell the capabilities of the new non-shortcircuiting comparison function in hmac http://hg.python.org/cpython/rev/f36af3766a20

msg162891 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 12:16

OK, the worst aspects (the misleading name and documentation) have been dealt with, so that leaves the questions of: 1. Avoiding leaking the length information (seems unnecessary, since most digests are part of protocols where they have a known, published length, or you can find out the length by looking at public source code) 2. Providing a C implementation via the operator module (given the restriction to bytes values, and the assumption of caching for all relevant integers, would a C reimplementation really be buying us much additional security?) As far as restoring unicode support (even in a C implementation) goes, I actually don't like the idea. For the general unicode case, as suggested in the updated documentation for hexdigest(), I believe the better approach is to try to stay in the bytes domain as much as possible, and avoid having a Unicode->bytes conversion for the expected value anywhere in the comparison timing path.

msg162892 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-15 12:21

> 2. Providing a C implementation via the operator module (given the > restriction to bytes values, and the assumption of caching for all > relevant integers, would a C reimplementation really be buying us much > additional security?) I like the fact that a C implementation can be audited much more easily. Who knows what kind of effects the Python implementation can trigger, if some optimizations get added in the future. > As far as restoring unicode support (even in a C implementation) goes, > I actually don't like the idea. For the general unicode case, as > suggested in the updated documentation for hexdigest(), I believe the > better approach is to try to stay in the bytes domain as much as > possible, and avoid having a Unicode->bytes conversion for the > expected value anywhere in the comparison timing path. The point of supporting unicode would precisely be to avoid a unicode->bytes conversion when unicode strings are received.

msg162893 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-15 12:34

Am 15.06.2012 14:21, schrieb Antoine Pitrou: > I like the fact that a C implementation can be audited much more easily. > Who knows what kind of effects the Python implementation can trigger, if > some optimizations get added in the future. Secondly we can predict the function's timing on other implementations of Python. Jython or PyPy might have different settings for small int caching -- or non at all. > The point of supporting unicode would precisely be to avoid a > unicode->bytes conversion when unicode strings are received. A byte-wise comparison of the memory representation would work IFF both sides have the same type and unicode kind. Anything else could give away details of the content. Either: PyBytes_CheckExact(a) && PyBytes_CheckExact(b) or PyUnicode_CheckExact(a) && PyUnicode_CheckExact(b) && PyUnicode_KIND(a) == PyUnicode_KIND(b) I'm not sure about CheckExact, just being paranoid.

msg162895 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-15 12:53

> > The point of supporting unicode would precisely be to avoid a > > unicode->bytes conversion when unicode strings are received. > > A byte-wise comparison of the memory representation would work IFF both > sides have the same type and unicode kind. Anything else could give away > details of the content. My proposal was to only allow them on ASCII strings. Any other unicode kind would raise an error (perhaps NotImplementedError).

msg162899 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-15 14:01

(Ah, the dangers of using a real text editor for edit fields. This got rather long, but I think it's all still relevant) I'm persuaded that a C implementation is a good idea in the long run. However, I *don't* think we should rush the design of it. It doesn't seem right to me to be adding a new function to the operator module this close to feature freeze (assuming that's even the right place to add this functionality), so I'd like to close this issue on the basis of my last round of updates, and create a new one aimed at 3.4 to provide a lower level time independent bytes comparison primitive. I have a couple of major reasons for feeling this way: Firstly, the various Python web servers and frameworks have achieved a tolerable level of security through pure Python bytes comparisons that simply avoid short circuiting (sufficent to avoid them getting CVEs raised against them, anyway) Secondly, it seems to me that the proposed lower level feature may make more sense as a bytes method rather than as a function in the operator module. Consider the proposed properties for the lower level security primitive: 1. Handle comparison of bytes objects via C level indexing and xor operations I agree on the value of this part - it makes our implementation easy to audit, and explicitly puts the burden on other implementations to support this as a security primitive. Either approach (bytes method or operator function) would achieve this goal. 2. Only leak length information for one of the supplied arguments under timing analysis. Initially I was thinking a function based implementation of this would be confusing, as it would be quite hard to remember which parameter controlled the perceived length. However, I realised it doesn't actually matter - if controlled by the expected value, then the comparison time appears independent of the input length, and doesn't leak clear timing information on the expected length. If controlled by the input value, then it varies with the input length and still doesn't leak clear information regarding the length of the expected value. However, we can exploit the asymmetry of a method call signature to make it clear that in expected.compare(provided), the time taken for the comparison is dictated by the length of "expected" (or vice-versa - whichever way gets chosen, the asymmetry should help make it easier to remember). In either case, hmac.compare_digest would remain as a simpler "argument order doesn't matter, but the expected length is assumed to be public" API (which, given real world use cases, is actually a reasonable assumption). 3. Handles Unicode objects of the same kind via C level indexing and xor operations I'm still not seeing the point of this. To avoid timing attacks based on encoding your best bet is to do any encoding steps *before* calculating and comparing any signatures (that way the encoding duration is certain to be independent of any secrets involved in the signature verification). The reason the embedded unicode support was problematic was because both the supplied *and* the expected value were being converted to integers during the comparison, and a timing analysis could potentially exploit that to retrieve information about the expected value. The timing vulnerability comes from this sequence: 1. Receive Unicode message, with signature 2. In Unicode, derive expected signature from message 3. Encode expected signature <-- Timing vulnerability here 4. Encode received signature 5. Compare the derived and expected signature The original implementation had this problem, because the repeated calls to ord() were roughly equivalent to encoding the string as UTF-32, only with even bigger timing variations (due to the integer caching behaviour) Compare that with the following sequence, which avoids the problem: 1. Receive Unicode message, with signature 2. Encode the received signature to bytes 3. Encode the message to bytes <-- Cannot expose signature related secrets 4. Derive the expected signature from the encoded message 5. Compare the derived and expected signature Given that crypto primitives are designed to operate on byte sequences *anyway*, the latter approach is likely to be the only one that works regardless of Unicode support (or lack thereof). Really, I think the Unicode support in the initial version of the function was just a mistake, pure and simple - the use of ord() during the iteration is a 2.x backwards compatibility trick for 8-bit strings that isn't applicable in the 3.x standard library. You can see this in the equivalent code in the tornado web server, where the ord() based implementation is invoked to handle Python 2 byte strings, not Unicode text: https://github.com/facebook/tornado/blob/master/tornado/web.py#L2004 4. The operator module is about protocols, this is not a protocol Unlike the other operations in operator, this operation care more about consistency of timing, and the only way of achieving that is through strict type limitations. That's more appropriately handled through a new method on the affected types than it is through a new operator function.

msg162914 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-15 15:48

> Secondly, it seems to me that the proposed lower level feature may > make more sense as a bytes method rather than as a function in the > operator module. If it's a function, though, it can compare all kinds of buffer-like objects (bytearrays, memoryviews, etc.).

msg162949 - (view) Author: Jon Oberheide (Jon.Oberheide) Date: 2012-06-16 03:05

Wow, that escalated quickly. :-) Nick, thanks for keeping things focused and on track. To recap, the primary motivation here is two-fold. First, folks are using == pretty frequently in an unsafe manner when comparing digests, signatures, and other fixed-length strings. This is not good. Second, as we've seen in this thread and elsewhere, getting this right is not easy. Which is the exact reason it belongs in python's stdlib, so that folks do not try to implement it themselves incorrectly. And again, preventing the leakage of the length of the inputs is not an intended goal here for this use case. It'd be best to keep things as simple as possible while achieving the desired security objective.

msg162950 - (view) Author: Jon Oberheide (Jon.Oberheide) Date: 2012-06-16 03:16

On a side note, it may be useful to follow the conventions that already exist in OpenBSD for their timingsafe_bcmp(3): http://www.rootr.net/man/man/timingsafe_bcmp/3 "timingsafe" may be a more reasonable naming convention that is a bit less strong the "secure" which may be more appropriate. Also, the implementation does leak the length of the string (well, actually you provide the length "n", but in real-world usage "n" is the static length of the expected input): ftp://ftp.fr.openbsd.org/pub/OpenBSD/src/lib/libc/string/timingsafe_bcmp.c

msg163159 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-19 13:10

I've increased the priority to "release blocker". Reason: We should come to an agreement how to handle the issue. In particular we must not pronounce something as secure that isn't secure. Options: 1) Remove the function. 2) Rename the function to a more sensible name and provide a bytes only implementation. I like the Jon's proposal and suggest timingsafe_compare(). 2b) optionally create a C implementation as it's much easier to check C code for timing issues.

msg163163 - (view) Author: Hynek Schlawack (hynek) * Date: 2012-06-19 13:27

I thought this is settled as of f36af3766a20 (option 2)?

msg163168 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-19 14:27

Oh, I totally missed Nick's checkin. Sorry for the noise. Should we add the encode('unicode-internal') trick from #14955 as the next best way to compare to unicode strings?

msg163170 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-19 14:33

> Should we add the encode('unicode-internal') trick from #14955 as the next best way to compare to unicode strings? I don't think so.

msg163186 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * Date: 2012-06-19 16:07

Unicode string timing depends on the string implementation which depends on the maximum character code in the string. Strings 'A'*9999+'$' 'A'*9999+'€' have different timings for almost all operations (inluding encode('unicode-internal')).

msg163188 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * Date: 2012-06-19 16:14

Oh, I see, Antoine said the same thing (msg162771).

msg163192 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-19 17:08

I'm well aware of the fact that they have different timings. That's why I argued against including a unicode aware variant of the timing safe compare function. I've used Guido's time machine and seen requests for a unicode function in the future. ;) I think (educated guess) that s.encode('unicode-internal') discloses the least amount of information. That way I argued that we suggest it in the documentation.

msg163193 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-19 17:12

> I'm well aware of the fact that they have different timings. That's > why I argued against including a unicode aware variant of the timing > safe compare function. I would not want to repeat myself, but the compare function can be made safe if it restricts itself to a given kind of unicode string (say, all-ASCII).

msg163196 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-19 17:24

Alright, Antoine. Shall explore option 2b) "optionally create a C implementation as it's much easier to check C code for timing issues" as I suggested in http://bugs.python.org/issue15061#msg162893 ?

msg163204 - (view) Author: Georg Brandl (georg.brandl) * Date: 2012-06-19 20:08

So it's not a blocker anymore, right?

msg163329 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-21 12:14

> Shall explore option 2b) "optionally create a C implementation as it's much easier to check C code for timing issues" Definitely. I'm not sure whether that can go in 3.3 post-beta, though.

msg163333 - (view) Author: Maciej Fijalkowski (fijall) * Date: 2012-06-21 14:03

Hi. This is what we did with Armin: http://bpaste.net/show/32123/ It seems there is still *some* information leaking via side-channels, although it's a bit unclear what. Feel free to play with it (try swapping, having different object etc.)

msg163343 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-21 16:24

I've attached a header for that implements a single C function timingsafe_eq(a, b). The file is targeted for Objects/stringlib/timingsafe.h. Please review the file. Comments -------- - I only handle exact byte or unicode types (no subclasses) since a user may have overwritten __eq__ and I don't want to special case it. - The unicode path works only with compact ASCII strings. I'm not familiar with the new API so please scream if I did it wrong. - length difference is currently optimized, length 0 isn't. I could easily un-optimize the len(a) != len(b) case or optimize the len(a) == len(b) == 0 case. Open questions -------------- Where should I place the function? hashlib would be a nice place but there are multiple backends for hashlib. _hashopenssl.c seems wrong.

msg163347 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-21 16:41

> The file is targeted for Objects/stringlib/timingsafe.h. stringlib is for type-generic functions, so I don't think it should be put there. > - I only handle exact byte or unicode types (no subclasses) since a > user may have overwritten __eq__ and I don't want to special case it. We could handle all bytes-compatible objects, using the buffer API. > - The unicode path works only with compact ASCII strings. I'm not > familiar with the new API so please scream if I did it wrong. It looks ok to me. > Where should I place the function? hashlib would be a nice place but > there are multiple backends for hashlib. _hashopenssl.c seems wrong. Well, if it's meant to be a private function called from hmac, where it's defined is an implementation detail. I think practicality beats purity, so _hashopenssl is a good enough place.

msg163365 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * Date: 2012-06-21 20:55

> > - I only handle exact byte or unicode types (no subclasses) since a > > user may have overwritten __eq__ and I don't want to special case it. > We could handle all bytes-compatible objects, using the buffer API. It is timing unsafe. > > - The unicode path works only with compact ASCII strings. I'm not > > familiar with the new API so please scream if I did it wrong. > It looks ok to me. The user can just do timingsafe_eq(a.decode('ascii'), b.decode('ascii')). I do not see a necessity in support of unicode strings. Support ASCII strings will create the false impression that all strings are supported. About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) != 0)).

msg163366 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-21 21:00

> The user can just do timingsafe_eq(a.decode('ascii'), > b.decode('ascii')). You mean .encode()? > I do not see a necessity in support of unicode > strings. Support ASCII strings will create the false impression that all > strings are supported. I agree. > About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you > should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) != > 0)). What's the difference? They are the same.

msg163368 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * Date: 2012-06-21 21:09

> You mean .encode()? Yes, of cause. timingsafe_eq(a.encode('ascii'), b.encode('ascii')). > > About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you > > should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) != > > 0)). > > What's the difference? They are the same. Laziness. If "a" (a secret key) is not bytes then PyBytes_CheckExact(b) ("b" is a user input) is not called. It exposes secret key type. I'm not sure if it is real secret however.

msg163371 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-21 21:45

> > > - I only handle exact byte or unicode types (no subclasses) since a > > > user may have overwritten __eq__ and I don't want to special case it. > > We could handle all bytes-compatible objects, using the buffer API. > > It is timing unsafe. How so? > > > - The unicode path works only with compact ASCII strings. I'm not > > > familiar with the new API so please scream if I did it wrong. > > It looks ok to me. > > The user can just do timingsafe_eq(a.decode('ascii'), > b.decode('ascii')). I don't think that's the right answer, because people will instead e.g. encode('utf-8'), and suddently the encodingly will not be timing-safe.

msg163377 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-21 22:40

I'm a bit rusty and I hope I got it right. The ASCII unicode case is a good idea and IMO timing safe. The buffer path is also timing safe once I have both views. The function leaks some timing information when an error occurs. Since the timing just reveals minimal information about the involved types and none about the bytes it's IMO safe. The acquiring of the buffer views may leak an unknown amount of timing data which may be an issue. The comparison is still safe. I've introduced a new module _hashlibfb (fb = fallback) for systems without openssl. I'm also open for a completely new module for future implementation of other digest, key derivation (PBKDF2) and password related C code.

msg163378 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-21 23:26

The patch has another flaw. The compiler may choose to fold and optimize code in _tscmp(). I'm going to declare the length of the right side and both char* as volatile. That should stop any compiler. I could also add some pragmas: MSVC: #pragma optimize("", off) code #pragma optimize("", on) GCC 4.4+: #pragma GCC push_options #pragma GCC optimize ("O0") code #pragma GCC pop_options

msg163385 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * Date: 2012-06-22 06:08

> > > We could handle all bytes-compatible objects, using the buffer API. > > It is timing unsafe. > How so? I checked myself, and I see that most likely I was wrong. At least for bytes and bytearrays it is timing safe. > I don't think that's the right answer, because people will instead e.g. > encode('utf-8'), and suddently the encodingly will not be timing-safe. And what of that? It is outside of the timingsafe_eq function. People can also do other timing unsafe operations with the secret key (for example reading it from file or from DB) or not to use timingsafe_eq at all. The secret key should be pre-encoded. The error will be if code works for developer from ASCII word, and then on the other side of ocean it will no longer work with non-ASCII strings. You are expected to be familiar with such issues. In any case, the obvious (and simplest, and fastest) way to check that a string is ASCII-only is try to encoded it to ASCII.

msg163390 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-22 10:09

> The error will be if code works for developer from ASCII word, and then > on the other side of ocean it will no longer work with non-ASCII > strings. You are expected to be familiar with such issues. In any case, > the obvious (and simplest, and fastest) way to check that a string is > ASCII-only is try to encoded it to ASCII. No, the fastest way is to check the kind attribute of the unicode object in C code. That doesn't involve any additional conversion or Python function call. The function is deliberately limited. The ASCII fallback is very useful as most people will store hex encoded bytes of their passphrases in their databases. With ASCII support you can do timingsafe_compare(hex_from_db, hmac.hexdigest()). Maciej: http://pastebin.com/ZAAjSkJh The C function is one order of magnitude faster and the spread is one order smaller. 1e-07 is within the noise level on my idle computer. A busy server will have a higher noise level.

msg163468 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-22 19:42

In order to get the patch in before the beta release I'm willing to drop the promise and document that the function may leak some information if the arguments differ in length or the arguments' types are incompatible.

msg163469 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-22 19:50

> In order to get the patch in before the beta release I'm willing to > drop the promise and document that the function may leak some > information if the arguments differ in length or the arguments' types > are incompatible. That's not a problem to me. Programming errors are not the nominal case.

msg163613 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-23 14:20

Updated patch with volatile, better error report for non-ASCII strings and updated comments

msg163614 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-23 14:25

I'm not really happy with the addition of a separate extension module for a single private function. You could just put it in the operator module, for instance. Also, the idea was not to expose timingsafe_cmp but to use it in compare_digest().

msg163615 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-23 14:41

Me neither but you didn't want it in the operator module in the first place (msg162882). :) Please make a decision. I'm happy to follow it. My idea is to drop the pure Python implementation of compare_digest() and just use the C implementation.

msg163617 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-23 14:50

> Me neither but you didn't want it in the operator module in the first > place (msg162882). :) Please make a decision. I'm happy to follow it. Oh, sorry. I've changed my mind about it, but I think the operator module should only export a private function (and hmac or hashlib export it publicly).

msg163619 - (view) Author: Georg Brandl (georg.brandl) * Date: 2012-06-23 14:54

Doesn't belong into operator IMO. We used to have a "strop" module where it would have fitted...

msg163623 - (view) Author: Nick Coghlan (ncoghlan) * Date: 2012-06-23 15:08

This is why I wanted to close the issue with the pure Python implementation, and punt on the question of a C accelerator for the moment. compare_digest is effectively the same as what all the Python web servers and frameworks do now for signature checking. Yes, it's more vulnerable to timing attacks than a C implementation, but it's going to to take a sophisticated attacker to attack that through the noise of network jitter. It's sufficiently hardened that's it's unlikely to be the weakest link in the security chain. For 3.4, I hope to see a discussion open up regarding the idea of something like a "securitytools" module that aims to provide some basic primitives for operations where Python's standard assumptions (such as flexibility and short circuiting behaviour) are a bad fit for security reasons. That would include exposing a C level full_compare option, as well as the core pbkdf2 algorithm.

msg163625 - (view) Author: Antoine Pitrou (pitrou) * Date: 2012-06-23 15:19

> Doesn't belong into operator IMO. We used to have a "strop" module > where it would have fitted... Again, it can be a private function in the operator module that happens to be wrapped or exposed in the hmac module. Practicality beats purity.

msg163626 - (view) Author: Georg Brandl (georg.brandl) * Date: 2012-06-23 15:22

Yes.

msg163627 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-23 15:24

> Again, it can be a private function in the operator module that happens > to be wrapped or exposed in the hmac module. Practicality beats purity. Yes, we just need a place for the function. The operator module is a good place if we don't want to introduce a new module. Nick: I'll target pbkdf2 and other password hashing / security related for 3.4.

msg163630 - (view) Author: Hynek Schlawack (hynek) * Date: 2012-06-23 15:35

> For 3.4, I hope to see a discussion open up regarding the idea of something like a "securitytools" module that aims to provide some basic primitives for operations where Python's standard assumptions (such as flexibility and short circuiting behaviour) are a bad fit for security reasons. That would include exposing a C level full_compare option, as well as the core pbkdf2 algorithm. Strong +1 on that one. We could even consider adding bcrypt and scrypt as C isn't really an issue for us. Ideally we'd add a module with docs which both promote and leverage secure behavior. Basically how to realize http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html in Python.

msg163652 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-23 17:57

New patch. The compare_digest method now lives in the operator module as operator._compare_digest

msg163671 - (view) Author: Martin v. Löwis (loewis) * Date: 2012-06-23 20:42

>>> About code. Instead (PyBytes_CheckExact(a) && PyBytes_CheckExact(b)) you >>> should use ((PyBytes_CheckExact(a) != 0) & (PyBytes_CheckExact(b) != >>> 0)). >> >> What's the difference? They are the same. > > Laziness. If "a" (a secret key) is not bytes then PyBytes_CheckExact(b) > ("b" is a user input) is not called. It exposes secret key type. I'm not > sure if it is real secret however. I see; I missed that your version was using &. In any case, I don't think this is a threat: you couldn't use it to get the secret key faster.

msg163696 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-24 00:07

New patch. I've removed the special handling of PyBytes_CheckExact, support subclasses of str, non compact ASCII str and updated the docs. (Next time I'll create a sandbox and push my work to its own branch.)

msg163780 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-24 11:49

New changeset c82451eeb595 by Christian Heimes in branch 'default': Issue #15061: Re-implemented hmac.compare_digest() in C http://hg.python.org/cpython/rev/c82451eeb595

msg163784 - (view) Author: Christian Heimes (christian.heimes) * Date: 2012-06-24 13:12