PEP 509 -- Add a private version to dict

PEP: 509
Title: Add a private version to dict
Author: Victor Stinner <vstinner at python.org>
Status: Final
Type: Standards Track
Created: 4-January-2016
Python-Version: 3.6

Abstract

Add a new private version to the builtin dict type, incremented at each dictionary creation and at each dictionary change, to implement fast guards on namespaces.

Rationale

In Python, the builtin dict type is used by many instructions. For example, the LOAD_GLOBAL instruction looks up a variable in the global namespace, or in the builtins namespace (two dict lookups). Python uses dict for the builtins namespace, globals namespace, type namespaces, instance namespaces, etc. The local namespace (function namespace) is usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, ... can be modified at runtime. Implementing optimizations that respect the Python semantics requires detecting when "something changes": we will call these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This PEP proposes to add a private version to dictionaries to implement fast guards on namespaces.

Dictionary lookups can be skipped if the version does not change, which is the common case for most namespaces. The version is globally unique, so checking the version is also enough to verify that the namespace dictionary was not replaced with a new dictionary.

When the dictionary version does not change, the performance of a guard does not depend on the number of watched dictionary entries: the complexity is O(1).

Example of optimization: copy the value of a global variable to function constants. This optimization requires a guard on the global variable to check if it was modified after it was copied. If the global variable is not modified, the function uses the cached copy. If the global variable is modified, the function uses a regular lookup, and may also deoptimize the function (to remove the overhead of the guard check for the next function calls).

See PEP 510 -- Specialized functions with guards for concrete usage of guards to specialize functions and for a more general rationale on Python static optimizers.

Guard example

Pseudo-code of a fast guard to check if a dictionary entry was modified (created, updated or deleted) using a hypothetical dict_get_version(dict) function:

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False
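The pseudo-code above cannot run as written because dict_get_version() is hypothetical. The following is a minimal sketch that emulates it in pure Python so the guard logic can be exercised. The VersionedDict subclass, its version attribute and the module-level counter are assumptions made for illustration only; the version proposed by this PEP is stored in C and is not reachable from Python code.

```python
import itertools

# Stand-in for the global dictionary version (assumption: a simple counter).
_global_version = itertools.count(1)
UNSET = object()


class VersionedDict(dict):
    """Hypothetical dict subclass emulating ma_version_tag in pure Python."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_global_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_global_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_global_version)


def dict_get_version(d):
    """Hypothetical accessor used by the pseudo-code of the PEP."""
    return d.version


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.value = d.get(key, UNSET)
        self.version = dict_get_version(d)

    def check(self):
        version = dict_get_version(self.dict)
        if version == self.version:
            return True               # fast path: no dictionary lookup
        value = self.dict.get(self.key, UNSET)
        if value is self.value:
            self.version = version    # another key changed: cache new version
            return True
        return False                  # the watched key changed


namespace = VersionedDict(x=1, y=2)
guard = GuardDictKey(namespace, "x")
unchanged = guard.check()             # version is the same: fast path
namespace["y"] = 3                    # another key changes
other_key_changed = guard.check()     # slow path, but "x" is untouched
namespace["x"] = 10                   # the watched key changes
watched_key_changed = guard.check()
```

In this sketch the guard keeps succeeding while only other keys change, and fails as soon as the watched key is rebound.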

Changes

Add a ma_version_tag field to the PyDictObject structure with the C type PY_UINT64_T, a 64-bit unsigned integer. Also add a global dictionary version.

Each time a dictionary is created, the global version is incremented and the dictionary version is initialized to the global version.

Each time the dictionary content is modified, the global version must be incremented and copied to the dictionary version. Dictionary methods which can modify its content:

* clear()
* pop(key)
* popitem()
* setdefault(key, value)
* __delitem__(key)
* __setitem__(key, value)
* update(...)

The choice of increasing or not the version when a dictionary method does not change its content is left to the Python implementation. A Python implementation can decide not to increase the version, to avoid dictionary lookups in guards. Examples of cases when dictionary methods don't modify its content:

* clear() if the dict is already empty
* pop(key) if the key does not exist
* popitem() if the dict is empty
* setdefault(key, value) if the key already exists
* __delitem__(key) if the key does not exist
* __setitem__(key, value) if the new value is identical to the current value
* update() if called without argument or if new values are identical to current values

Setting a key to a new value that is equal to the old value (but not the identical object) is also considered an operation modifying the dictionary content.

Two different empty dictionaries must have a different version, so that a dictionary can be identified just by its version. This makes it possible to verify in a guard that a namespace was not replaced, without storing a strong reference to the dictionary. Using a borrowed reference does not work: if the old dictionary is destroyed, it is possible that a new dictionary is allocated at the same memory address. By the way, dictionaries don't support weak references.

The version increase must be atomic. In CPython, the Global Interpreter Lock (GIL) already protects dict methods to make changes atomic.

Example using a hypothetical dict_get_version(dict) function:

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The field is called ma_version_tag, rather than ma_version, to suggest comparing it using version_tag == old_version_tag, rather than version <= old_version, which becomes wrong after an integer overflow.
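The rules above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: TaggedDict, version_tag and _next_version() are hypothetical names, the real field is maintained by the C implementation, and this model bumps the version on every call to the overridden methods, which is one of the choices the PEP leaves to the implementation.

```python
_MASK = (1 << 64) - 1     # PY_UINT64_T wraps modulo 2**64
_global_version = 0       # model of the global dictionary version


def _next_version():
    """Increment the global version and return it (wraps like a C uint64)."""
    global _global_version
    _global_version = (_global_version + 1) & _MASK
    return _global_version


class TaggedDict(dict):
    """Hypothetical pure-Python model of a dict carrying a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = _next_version()   # creation takes a fresh version

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version_tag = _next_version()

    def __delitem__(self, key):
        super().__delitem__(key)             # KeyError is raised before any bump
        self.version_tag = _next_version()

    def clear(self):
        super().clear()
        self.version_tag = _next_version()


a = TaggedDict()
b = TaggedDict()
tags = [a.version_tag, b.version_tag]   # two empty dicts, two versions
a["key"] = "value"
tags.append(a.version_tag)
a["key"] = "value"                      # same value: this model still bumps
tags.append(a.version_tag)
del a["key"]
tags.append(a.version_tag)
```

Because every creation and every change draws from the same global counter, no two dictionaries in this model ever share a version, which is the property the guard relies on to detect a replaced namespace.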

Backwards Compatibility

Since the PyDictObject structure is not part of the stable ABI and the new dictionary version is not exposed at the Python scope, changes are backward compatible.

Implementation and Performance

The issue #26058: PEP 509: Add ma_version_tag to PyDictObject contains a patch implementing this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add any overhead on dictionary operations. For example, the following timeit micro-benchmark takes 318 nanoseconds before and after the change:

    python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

When the version does not change, PyDict_GetItem() takes 14.8 ns for a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, a guard can watch for multiple keys. For example, for an optimization using 10 global variables in a function, 10 dictionary lookups cost 148 ns, whereas the guard still only costs 3.8 ns when the version does not change (39x as fast).

The fat module implements such guards: fat.GuardDict is based on the dictionary version.
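The "39x" figure follows directly from the two timings quoted above. As a small worked example (the function names are illustrative, the two constants are the measurements reported in this section):

```python
LOOKUP_NS = 14.8   # PyDict_GetItem() cost reported above, in nanoseconds
GUARD_NS = 3.8     # guard check cost when the version is unchanged


def lookup_cost(n_keys):
    """Cost of looking up n_keys watched entries one by one."""
    return n_keys * LOOKUP_NS


def speedup(n_keys):
    """How many times faster one guard check is than n_keys lookups."""
    return lookup_cost(n_keys) / GUARD_NS


ten_key_cost = lookup_cost(10)      # about 148 ns
ten_key_speedup = speedup(10)       # a little under 39
```

The guard cost stays constant while the lookup cost grows linearly with the number of watched keys, so the ratio grows with the number of globals an optimization depends on.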

Integer overflow

The implementation uses the C type PY_UINT64_T to store the version: a 64-bit unsigned integer. The C code uses version++. On integer overflow, the version is wrapped to 0 (and then continues to be incremented) according to the C standard.

After an integer overflow, a guard can succeed even though the watched dictionary key was modified. The bug only occurs at a guard check if there are exactly 2 ** 64 dictionary creations or modifications since the previous guard check.

If a dictionary is modified every nanosecond, 2 ** 64 modifications take longer than 584 years. Using a 32-bit version, it only takes 4 seconds. That's why a 64-bit unsigned type is also used on 32-bit systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.
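The figures above, and the reason guards must compare versions with == rather than <=, can be checked with a short calculation. This is an illustrative sketch: the year length of 365.25 days and the variable names are assumptions, and the wraparound is simulated with a bit mask because Python integers do not overflow.

```python
NS_PER_SECOND = 10**9
NS_PER_YEAR = 365.25 * 24 * 3600 * NS_PER_SECOND   # assumption: Julian year

# Time needed to exhaust the counter at one change per nanosecond.
years_64 = 2**64 / NS_PER_YEAR
seconds_32 = 2**32 / NS_PER_SECOND

# Simulated wraparound of a 64-bit unsigned counter.
MASK64 = 2**64 - 1
old_tag = MASK64 - 1                  # version recorded just before the wrap
new_tag = (old_tag + 3) & MASK64      # three more changes: counter wrapped

equality_detects_change = new_tag != old_tag
ordering_detects_change = not (new_tag <= old_tag)
```

After the wrap the new tag is numerically smaller than the old one, so a guard written as version <= old_version would wrongly conclude that nothing changed, while the equality test still notices the difference.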

Alternatives

Expose the version at Python level as a read-only __version__ property

The first version of the PEP proposed to expose the dictionary version as a read-only __version__ property at Python level, and also to add the property to collections.UserDict (since this type must mimic the dict API).

There are multiple issues:

* To be consistent and avoid bad surprises, the version must be added to all mapping types. Implementing a new mapping type would require extra work for no benefit, since the version is only required on the dict type in practice.

* All Python implementations would have to implement this new property. This gives more work to other implementations, whereas they may not use the dictionary version at all.

* Exposing the dictionary version at the Python level can lead to false assumptions about performance. Checking dict.__version__ at the Python level is not faster than a dictionary lookup. A dictionary lookup in Python has a cost of 48.7 ns and checking the version has a cost of 47.5 ns; the difference is only 1.2 ns (3%):

      $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
      10000000 loops, best of 3: 0.0487 usec per loop
      $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
      10000000 loops, best of 3: 0.0475 usec per loop

* The __version__ can be wrapped on integer overflow. It is error prone: using dict.__version__ <= guard_version is wrong, dict.__version__ == guard_version must be used instead to reduce the risk of a bug on integer overflow (even if the integer overflow is unlikely in practice).

Mandatory bikeshedding on the property name:

* __cache_token__: name proposed by Nick Coghlan, name coming from abc.get_cache_token().
* __version__
* __version_tag__
* __timestamp__

Add a version to each dict entry

A single version per dictionary requires keeping a strong reference to the value, which can keep the value alive longer than expected. If we also add a version per dictionary entry, the guard can store only the entry version (a simple integer) and avoid the strong reference to the value: only strong references to the dictionary and to the key are needed.

Changes: add a me_version_tag field to the PyDictKeyEntry structure; the field has the C type PY_UINT64_T. When a key is created or modified, the entry version is set to the dictionary version, which is incremented at any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modified using hypothetical dict_get_version(dict) and dict_get_entry_version(dict, key) functions:

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary to read the entry version
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                self.entry_version = entry_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint. It increases the size of each dictionary entry, so the overhead depends on the number of buckets (dictionary entries, used or not used). For example, it increases the size of each dictionary entry by 8 bytes on a 64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

* PEP 393 -- Flexible String Representation
* PEP 412 -- Key-Sharing Dictionary

Add a new dict subtype

Add a new verdict type, subtype of dict. When guards are needed, use the verdict for namespaces (module namespace, type namespace, instance namespace, etc.) instead of dict. Leave the dict type unchanged to not add any overhead (CPU, memory footprint) when guards are not used.

Technical issue: a lot of C code in the wild, including CPython core, expects the exact dict type. Issues:

* exec() requires a dict for globals and locals. A lot of code uses globals={}. It is not possible to cast the dict to a dict subtype because the caller expects the globals parameter to be modified (dict is mutable).

* C functions call PyDict_xxx() functions directly, instead of calling PyObject_xxx() if the object is a dict subtype.

* The PyDict_CheckExact() check fails on a dict subtype, whereas some functions require the exact dict type.

* Python/ceval.c does not completely support dict subtypes for namespaces.

The exec() issue is a blocker issue.

Other issues:

* The garbage collector has special code to "untrack" dict instances. If a dict subtype is used for namespaces, the garbage collector may be unable to break some reference cycles.

* Some functions have a fast-path for dict which would not be taken for dict subtypes, and so it would make Python a little bit slower.
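Two of the issues listed above can be observed from Python. The sketch below uses a hypothetical LoggingDict subclass. It first shows that exec() mutates the very dict object the caller passed in, which is why the interpreter cannot silently swap it for an instance of another type. It then shows, as observed on CPython, that dict.update() writes through the C-level code path without calling an overridden __setitem__, an analogue of C code calling PyDict_xxx() functions directly.

```python
class LoggingDict(dict):
    """Hypothetical dict subclass that records calls to __setitem__."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def __setitem__(self, key, value):
        self.calls.append(key)
        super().__setitem__(key, value)


# exec() stores names in the dict supplied by the caller, and the caller
# reads the results back from that same object.
caller_globals = {}
exec("answer = 42", caller_globals)
seen_by_caller = caller_globals["answer"]

# The subscript assignment goes through the override; update() does not
# (CPython behaviour).
d = LoggingDict()
d["a"] = 1
d.update(b=2)
recorded_calls = list(d.calls)
```

A version maintained only in the overridden methods of a subtype would therefore miss changes made through such direct paths, which is part of why the PEP puts the version in dict itself.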

Acceptance

The PEP was accepted on 2016-09-07 by Guido van Rossum. The PEP implementation has since been committed to the repository.