[Security-announce] Typo squatting and malicious packages on PyPI

This is an incident report covering the recent takedown of a number of malicious packages from the Python Package Index (PyPI), as well as the subsequent pre-emptive reservation of a range of additional project names by the PyPI administrators.

New dedicated list for security announcements
=============================================

After malicious packages were removed from the Python Package Index based on a report received by the Python Security Response Team (PSRT), the PSRT discussed how to announce the issue, as it had no official public channel for communicating Python security announcements. To address that, a new security-announce at python.org mailing list has been created. You can subscribe to the new mailing list here:

https://mail.python.org/mailman/listinfo/security-announce

This is an announce-only list; discussions are redirected to security-sig at python.org:

https://mail.python.org/mailman/listinfo/security-sig

Rather than waiting for the new list to be available, the report and subsequent package removal were announced on the python-dev mailing list to start a discussion on how we can prevent further attempts or make them less effective:

[Python-Dev] SK-CSIRT identified malicious software libraries in the official Python package repository, PyPI
https://mail.python.org/pipermail/python-dev/2017-September/149569.html

Malicious packages published in June 2017
=========================================

On the 6th of September 2017, the National Security Authority of Slovakia contacted the Python Security Response Team (PSRT) to report that the Python Package Index (PyPI) was hosting malicious packages. The PSRT contacted the PyPI administrators, and all identified packages were taken down within 70 minutes of the PSRT receiving the report.
Installing these packages sent data (the name and version of the fake package, the user name of the user installing the package, and the hostname) to an HTTP server, but also installed the expected module, so the attack wasn't easy to notice.

List of the 11 malicious packages:

- acqusition
- apidev-coop
- bzip
- crypt
- django-server
- pwd
- setup-tools
- telnet
- urlib3
- urllib
- xml

See the advisory for more information:

http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/

Thanks to the National Security Authority of Slovakia for reporting the issue!

Typo Squatting
==============

This incident was not due to a compromise of the PyPI service or of any third-party project. It was instead an instance of the "typo squatting" concern that inherently arises from the nature of PyPI as an open publication platform with deliberately minimal barriers to participation.

This form of attack is not specific to Python or the Python Package Index - it arises for any publication platform which does not impose a formal pre-publication review process on potential software publishers, and instead expects consumers of the published components to conduct their own post-publication review (either individually or collectively).

Examples of other systems impacted by this kind of problem include the npmjs.com repository for JavaScript projects, the rubygems.org repository for Ruby projects, and the Domain Name System itself (which provides the origin of the term: https://en.wikipedia.org/wiki/Typosquatting ).
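
To make the pattern concrete, the following is a minimal Python sketch (not taken from PyPI or from any existing tool) that flags a requested name when it closely resembles, but does not exactly match, a well-known project name. The POPULAR list, the likely_typo_of helper, and the 0.85 cutoff are all assumptions chosen for illustration; the similarity scoring comes from the standard library difflib module.

```python
import difflib
import re

# Hypothetical list of well-known project names (an assumption for this
# sketch); a real tool would load a much larger list.
POPULAR = ["urllib3", "setuptools", "acquisition", "django", "requests"]


def normalize(name):
    """Lower-case the name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def likely_typo_of(name, known=POPULAR, cutoff=0.85):
    """Return the known name that `name` closely resembles, or None."""
    candidate = normalize(name)
    targets = {normalize(k): k for k in known}
    if candidate in targets:
        return None  # exact match, so not a typo
    # Compare with separators removed, so "setup-tools" matches "setuptools".
    squashed = candidate.replace("-", "")
    for norm, original in targets.items():
        if squashed == norm.replace("-", ""):
            return original
    # Otherwise fall back to a fuzzy comparison of the normalized names.
    close = difflib.get_close_matches(candidate, list(targets), n=1, cutoff=cutoff)
    return targets[close[0]] if close else None
```

With this list, likely_typo_of("urlib3") returns "urllib3" and likely_typo_of("setup-tools") returns "setuptools", while the exact name "urllib3" returns None. A name such as pwd or xml, which copies a standard library module name rather than misspelling a PyPI project, is not caught by this kind of similarity check.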
Recent History
==============

The parallels between typo squatting domain names and typo squatting dynamic language package managers were highlighted in June 2016, when Nikolai Tschacher at the University of Hamburg quantified the potential scope of the problem for the Python, Ruby, and JavaScript ecosystems in his undergraduate thesis:

* Summary: http://incolumitas.com/2016/06/08/typosquatting-package-managers/
* Full PDF: http://incolumitas.com/data/thesis.pdf

Subsequently, in May 2017, fate0 published around 26 packages, which remained online for a period of 5 days. fate0 wrote a blog post summarising the results of the experiment:

http://blog.fatezero.org/2017/06/01/package-fishing/

In June 2017, 11 malicious packages were published on PyPI and remained online until the 6th of September (a period of 3 months). These packages sent basic user data to a server at 121.42.217.44 (TCP port 8080), and are the main subject of this incident report.

In September 2017, Benjamin Bach and Hanno Böck started a project to reserve PyPI package names to prevent malicious usage:

https://www.pytosquatting.org/

The setup.py of their packages sends an HTTP request for statistics collection purposes (https://github.com/benjaoming/pytosquatting ), and they report that they blocked around 7500 attempted standard library package installations over a period of 4 days, from September 13th to 16th (as a point of reference, this figure represents around 0.006% of the ~123 million PyPI package downloads that took place over that period).

For additional links related to typo squatting and Python package security, see:

https://python-security.readthedocs.io/packages.html#pypi-typo-squatting

Mitigation technique: 3rd party component review
================================================

The primary mitigation technique for these kinds of attacks (both typo squatting and social engineering) is to rely on 3rd party component reviewers that are independent of the original publishers.
While this approach does tend to significantly reduce the number of available Python components to either the hundreds (commercial Python redistributors and commercially supported Linux distributions) or the thousands (community Linux distributions), as compared to the tens of thousands of components available on PyPI, it also substantially reduces the risks of inadvertently installing a malicious package.

Mitigation technique: blocking package registration
===================================================

The PyPI administrators have historically only had limited tools available to prohibit the use of particular project names:

* registering the name themselves
* updating a list of prohibited names stored directly in the source code

These mechanisms have now been replaced by a database-backed mechanism which administrators can update directly through the admin interface:

https://github.com/pypa/warehouse/pull/2396

In addition to these explicitly prohibited names, the server now also dynamically prohibits the use of any standard library package names, based on the list extracted from the standard library documentation by the stdlib-list project:

https://github.com/pypa/warehouse/pull/2409

https://github.com/pypa/warehouse/issues/2401 is an open issue to discuss whether or not we want to make any further changes to the error messages reported when attempting to either download or register a project with a prohibited name (the current behaviour is to simply report a generic 404 error for attempted downloads, and a 403 error noting that the name is prohibited for attempted uploads).
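
The two-layer check described in this section (an administrator-maintained list plus standard library names) can be sketched roughly as follows. This is an illustration only, not the Warehouse implementation: the PROHIBITED set and the registration_status helper are hypothetical, and sys.stdlib_module_names (available from Python 3.10) is used here as a stand-in for the list produced by the stdlib-list project.

```python
import re
import sys


def normalize(name):
    """Lower-case the name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical stand-in for the admin-managed table of prohibited names.
PROHIBITED = {"setup-tools", "urlib3"}

# Assumption: sys.stdlib_module_names (Python 3.10+) substitutes for the
# stdlib-list data; on older interpreters this set is simply empty.
STDLIB = {normalize(n) for n in getattr(sys, "stdlib_module_names", ())}


def registration_status(name):
    """Return an HTTP-style status code for an attempted registration."""
    norm = normalize(name)
    if norm in PROHIBITED or norm in STDLIB:
        return 403  # name is prohibited
    return 200  # name may be registered
```

Normalizing the name before the lookup means that variants such as "Setup_Tools" and "setup-tools" are treated as the same project name, so a single prohibited entry covers all of them.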
Mitigation technique: client side typo detection and notification
=================================================================

While the default pip client utility is unlikely to ever implement typo detection due to the additional dependencies required, the higher level pipenv client (which also incorporates virtual environment management) has been enhanced to check for similarities to the top 1000 most popular downloads from PyPI and notify the user if the package they're installing is similar to, but not the same as, one of those names:

https://github.com/kennethreitz/pipenv/commit/aeaabf42f16e8167ca67af5ab7a34d864e7b358d

Potential mitigation technique: server notifications for similar project names
==============================================================================

While not yet implemented, notifying project maintainers (rather than the PyPI admins) when projects with similar names to their existing ones are registered offers a potential mechanism for reviewing new projects for potential typosquatting concerns without overwhelming the available resources of the Python Software Foundation's infrastructure management staff and volunteers.

The PyPI admins would then only need to deal with cases where either a new project name is similar to a name on the prohibited list, or a maintainer of a previously published project has reviewed the new project and considers it potentially suspicious.

This feature has *not* yet been implemented, and can be discussed further at https://github.com/pypa/warehouse/issues/2268

Mozilla Open Source Support grant
=================================

The Python Package Index is currently undergoing a migration from the legacy service hosted at https://pypi.python.org to an updated service hosted at https://pypi.org.
This migration is taking place as the original service was built in a way that limited the applicability of most modern design & development techniques (such as test-driven development and continuous integration).

While the upload features of the legacy service were successfully switched off in July 2017, a number of other enhancements to the replacement service are still required before the legacy service can be shut off, and PyPI development can focus entirely on the new, more robust, and more contributor friendly implementation.

To that end, the PSF's Packaging Working Group applied for (and was awarded in September 2017) a $170k Mozilla Open Source Support foundational grant. The scope of this grant covers the design, development, and project management activities needed to finalise the migration from pypi.python.org to pypi.org, and thus make the implementation of additional security enhancements and other features more feasible.

Ongoing sustaining engineering funding
======================================

While the MOSS grant will be incredibly beneficial, the fact remains that the PyPI service and the related client applications are noticeably understaffed given their importance as pieces of infrastructure for some of the world's largest organisations.

Complex migrations that could potentially have been performed in a matter of months given more focused attention (e.g. migrating to per-user installations as the default in `pip`) have instead lingered for years, as the developers involved know that they're going to have to deal with a lot of users being upset by the change, and there's only so much of that anyone is prepared to put up with as part of a volunteer activity.
In-kind donations of online services have been most welcome (especially the Fastly CDN, without which there is no way the legacy service would be able to handle the current download volumes), but they primarily serve to sustain current operations: they don't typically help to move the ecosystem forward through the addition of new capabilities or improvements to default behaviours.

Currently, ongoing PyPI maintenance and operations are largely being handled by two individuals: Donald Stufft (both on his own time, and on time granted by his employer, Amazon Web Services), and Ernest W. Durbin III (entirely on his own time). They are supported in this effort by the PSF's Infrastructure Manager, Mark Mangoba, and the PSF Board. The funding received from the MOSS grant award will also allow Nicole Harris and Sumana Harihareswara to dedicate additional time to design & project management activities.

Client tools benefit from a broader contributor base (as updating them doesn't carry the same risk of immediately breaking a key production service for the community), but even there, the number of paid, full-time contributors stands at a grand total of zero.

In many ways, this is similar to the situation that existed with the OpenSSL project prior to 2014, before the major security vulnerability "Heartbleed" was disclosed, and customers of OpenSSL redistributors all realised that their assumption that someone was already taking care of ensuring OpenSSL's sustainability was incorrect. The industry's collective response to the crisis was the Core Infrastructure Initiative, a multimillion-dollar project announced by the Linux Foundation in April 2014 to provide funds to critical elements of the global information infrastructure.
While the PSF is currently undertaking a membership drive to encourage sign-ups of new supporting members, and maintains a page for targeted PyPI-specific donations at https://donate.pypi.io/, these initiatives are not expected to be sufficient on their own to fully cover the task of suitably maintaining the shared PyPI service.

Rather, what is likely needed is for Python's larger commercial redistributors to acknowledge the importance of the Python Package Index to the developer experience they're offering to their customers, and determine an appropriate level of active upstream contribution as part of their own sustaining engineering plans.

Additional Links
================

Discussion on the latest typo squatting issue reported by SK-CSIRT:

* Advisory: http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/
* Ars Technica: https://arstechnica.com/information-technology/2017/09/devs-unknowingly-use-malicious-modules-put-into-official-python-repository/
* Python-Dev: https://mail.python.org/pipermail/python-dev/2017-September/149569.html
* Hacker News: https://news.ycombinator.com/item?id=15256121
* LWN: https://lwn.net/Articles/733853/

Links to Python security:

* Python Security: http://python-security.readthedocs.io/
* PyPI security: https://pypi.org/security/

--
Python Security Response Team (PSRT)