TNO-S3/piiip

piiip - piiip interactively installs intended packages

piiip (Piiip Interactively Installs Intended Packages) is a wrapper around pip that helps avoid installing a different package than was intended. For example, when executing piiip install pandaa (pandaa instead of pandas), piiip asks for confirmation before starting the installation of pandaa. Accidentally installing an unintended package is a security risk: an attacker can gain control over the machine on which the package is installed [1]. piiip is a drop-in replacement for pip; its usage is identical.

What can go wrong?

Using pip, it is trivial to install any package from PyPI by just specifying its name. If the package name is incorrect, however, for example due to a typo, a different package is installed than was intended. This package might contain outdated, vulnerable or even outright malicious software, which can result in a compromised machine (see [1] for an overview of when and how packages can execute arbitrary code). Malicious parties actively upload packages to PyPI to compromise systems, similar to domain typosquatting attacks. These packages, whose names are designed to be confused with the name of a legitimate package, are used to steal information or private keys, or to install backdoors on target machines [2].

Does this actually go wrong in practice?

Yes. Several projects that aim to protect pip users have registered dummy packages with names that are easily confused with popular packages. Once these names are claimed, real attackers can no longer use them for typosquatting. This is called "defensive typosquatting". The packages of two defensive typosquatting projects [3] [4] were downloaded more than a million times in total, which shows how often typos happen. Furthermore, a student was able to run code on 17,000 unique hosts only 7 weeks after uploading 200 packages with names that could easily be confused with popular packages [5]. The Advanced Persistent Threat (APT) Lazarus has also employed the package name confusion technique [6]. Other groups have attracted attention by using package name confusion to steal source code, cryptocurrency, SSH and GPG keys, credentials and Discord tokens.

Package name confusion and typosquatting

The term "package name confusion" describes all the ways in which a user can end up installing a different package than intended. The most intuitive example is a typing error (typosquatting: panddas instead of pandas). Other causes include a different spelling (colourama instead of colorama), delimiter modification (charsetnormalizer instead of charset-normalizer) and prefix/suffix augmentation (py-pandas instead of pandas). Neupane et al. created an overview of package name confusion categories [7].

How does piiip help?

piiip adds a layer of safety by asking for confirmation before installing packages. It only asks for confirmation if the package name might not refer to the package that the user intended to install. This way, piiip does not burden the user, but can still prevent security issues. For example, when running piiip install pandas the behavior of piiip is identical to that of pip. But when running piiip install pandaa, piiip asks:

A package named pandas instead of pandaa exists. Are you sure you want to install pandaa? (y/n)
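
The confirmation step can be sketched in a few lines of Python. This is a minimal illustration, not piiip's actual implementation: the function names `confirm_install`, `pip_command` and `run` are hypothetical, and taking the last argument as the package name is a simplifying assumption.

```python
import subprocess
import sys


def confirm_install(requested, alternative, ask=input):
    """Return True if the user confirms installing `requested`.

    `ask` is injectable so the prompt can be exercised without a terminal.
    """
    prompt = (
        f"A package named {alternative} instead of {requested} exists. "
        f"Are you sure you want to install {requested}? (y/n) "
    )
    while True:
        answer = ask(prompt).strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        # Any other answer: ask again.


def pip_command(args):
    """Build the pip invocation that the wrapper forwards to."""
    return [sys.executable, "-m", "pip", *args]


def run(args, alternative=None, ask=input):
    """Forward `args` to pip unless the user declines a flagged install.

    `alternative` is a more popular look-alike of the requested package,
    or None when nothing suspicious was found (hypothetical input).
    Assumption: the package name is the last element of `args`.
    """
    if alternative is not None and not confirm_install(args[-1], alternative, ask):
        return 1  # user declined; pip is never started
    return subprocess.call(pip_command(args))
```

When no alternative is passed, `run` forwards the arguments to pip unchanged, which mirrors the drop-in behavior described above.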

Examples of real malicious packages that would have triggered a warning by piiip are:

| Malicious package name | Real package name | Category according to [7] | Source |
| --- | --- | --- | --- |
| python3-dateutil | dateutil | Prefix/suffix augmentation | Snyk |
| urlib3 | urllib3 | 1-step Damerau/Levenshtein distance | IQT |
| colourama | colorama | Alternate spelling | Neupane et al. |
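
To make the "1-step Damerau/Levenshtein distance" category concrete, the sketch below computes the optimal string alignment distance, a common restricted variant of the Damerau-Levenshtein distance. It is an illustration only; the function name `osa_distance` is hypothetical and this README does not state which algorithm piiip uses internally.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts the insertions, deletions, substitutions and swaps of two
    adjacent characters needed to turn `a` into `b`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(osa_distance("urlib3", "urllib3"))             # 1
print(osa_distance("colourama", "colorama"))         # 1
print(osa_distance("python3-dateutil", "dateutil"))  # 8
```

The last call shows why prefix/suffix augmentation is a category of its own: python3-dateutil is eight edits away from dateutil, so a check based only on edit distance would miss it.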

Usage

piiip is fully compatible with pip. You can use piiip in exactly the same manner as pip (or pip3), and you will not see any difference until a possible name confusion occurs. In that case, piiip asks you to confirm the installation of the package. Note that packages installed with the option --index-url are not analyzed for name confusion.

For example, if you want to install pandas you run:

piiip install pandas

For more information, run

piiip --help

piiip demo

Features

piiip currently detects the following categories [7] of package name confusion:

| Category | Protects against | Example |
| --- | --- | --- |
| Character omission | Forgetting a character in the package name | panda |
| Character addition | Adding an additional character in the package name | panddas |
| Swapped character | Changing the location of two characters | panads |
| Substituted character | Exchanging a character for a random other character | panfas |
| Prefix/suffix augmentation [8] | Adding a keyword before or after the package name | pandas-py |
| Alternate spelling | Exchanging a British word for an American word or vice versa | colorama -> colourama |
| Homographic replacement | Exchanging one or more characters that look alike | colorama -> col0rama |

Note that piiip only detects names that contain a single mistake. Names with two or more mistakes, or with mistakes from two different categories, are not detected. Examples of names that are not detected: panddas-py, pandddas and pndass.
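
The single-mistake rule can be illustrated with a short Python sketch that generates every name one mistake away from the name that was typed. This is a simplified illustration under stated assumptions, not piiip's code: the function name `candidates` and the `ALPHABET`, `AFFIXES` and `SPELLINGS` tables are hypothetical and deliberately tiny.

```python
import string

# Assumed character set for package names (hypothetical).
ALPHABET = string.ascii_lowercase + string.digits + "-_."
# Hypothetical, tiny lookup tables; piiip's real lists are not shown here.
AFFIXES = ("py", "python", "python3")
SPELLINGS = {"colour": "color"}  # British -> American


def candidates(name):
    """Return names the user may have intended, one mistake away from `name`."""
    found = set()
    for i in range(len(name) + 1):
        for ch in ALPHABET:
            # Character omission: the typed name lacks a character, put one back.
            found.add(name[:i] + ch + name[i:])
    for i in range(len(name)):
        # Character addition: the typed name has an extra character, drop it.
        found.add(name[:i] + name[i + 1:])
        for ch in ALPHABET:
            # Substituted character: replace one character by another.
            found.add(name[:i] + ch + name[i + 1:])
    for i in range(len(name) - 1):
        # Swapped character: swap two adjacent characters back.
        found.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    for affix in AFFIXES:
        for sep in "-_.":
            # Prefix/suffix augmentation: strip a known keyword.
            if name.startswith(affix + sep):
                found.add(name[len(affix) + 1:])
            if name.endswith(sep + affix):
                found.add(name[: -len(affix) - 1])
    for british, american in SPELLINGS.items():
        # Alternate spelling: try the other variant in both directions.
        found.add(name.replace(british, american))
        found.add(name.replace(american, british))
    found.discard(name)  # the typed name itself is not an alternative
    found.discard("")
    return found


print("pandas" in candidates("panads"))      # True: one swapped character
print("pandas" in candidates("panddas-py"))  # False: two mistakes
```

In this sketch a single look-alike character such as the 0 in col0rama is already covered by the substitution step; multi-character homographs would need their own table.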

Installing piiip

Method 1:

  1. Clone the repository
  2. Run python -m pip install .

Method 2:

Run pip install piiip.

Roadmap

  • Add detection methods for other categories [7] of package name confusion:
    • Sequence reordering
    • Grammatical substitution
    • Semantic substitution
    • Asemantic substitution
    • Homophonic similarity
    • Simplification
  • Implement a more robust method to determine package popularity

How does piiip work?

piiip performs two main tasks when it receives a package name:

  1. Generating alternative package names that the user might have intended instead of the received package name
  2. Determining the popularity of the received package and of every alternative package name

If one of the alternative package names belongs to a package that is more popular than the received package, piiip shows the warning. Alternative package names are generated for the categories listed under Features. Package popularity is currently determined from the download statistics published by pypistats.org.
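
The decision rule described above can be sketched as follows. The function name `find_more_popular` and the `downloads` callable are hypothetical; in this sketch `downloads` stands in for a lookup of download statistics (for example from pypistats.org) and is replaced by a made-up dictionary so that the example runs offline.

```python
def find_more_popular(requested, alternatives, downloads):
    """Return the most popular alternative that beats `requested`, else None.

    `downloads` maps a package name to a download count and returns None
    for names that do not exist. It is a stand-in for a real statistics
    lookup and is an assumption of this sketch.
    """
    best_name = None
    best_count = downloads(requested) or 0  # unknown packages count as 0
    for name in alternatives:
        count = downloads(name)
        if count is not None and count > best_count:
            best_name, best_count = name, count
    return best_name


# Made-up download counts, for illustration only.
fake_stats = {"pandas": 1_000_000, "panda": 500, "pandaa": 12}

print(find_more_popular("pandaa", ["pandas", "panda", "pandab"], fake_stats.get))  # pandas
print(find_more_popular("pandas", ["panda", "pandaa"], fake_stats.get))            # None
```

A return value other than None corresponds to the case in which the warning is shown; None means the installation proceeds without a prompt.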

Alternatives for other online package repositories

piiip only works for pip. For npm, TypoGard [9] by Taylor et al. can be used. TypoGard has the same goal as piiip and has been integrated into (a specific version of) the npm package installer [10].

Footnotes

  1. https://arxiv.org/pdf/2307.09087

  2. https://arxiv.org/pdf/2309.11021

  3. https://medium.com/@williambengtson/python-typosquatting-for-fun-not-profit-99869579c35d

  4. https://hackernoon.com/building-a-botnet-on-pypi-be1ad280b8d6

  5. https://incolumitas.com/data/thesis.pdf

  6. https://blogs.jpcert.or.jp/en/2024/02/lazarus_pypi.html

  7. The listed categories are taken from "Beyond Typosquatting: An In-depth Look at Package Confusion" by Neupane et al.

  8. for a very limited set of prefixes/suffixes

  9. https://ldklab.github.io/assets/papers/nss20-typogard.pdf

  10. https://github.com/mt3443/typogard
