Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: Change order of discovery of python environments for uv pip install #7124

Open
kdheepak opened this issue Sep 6, 2024 · 37 comments · Fixed by #7691
Open

Enhancement: Change order of discovery of python environments for uv pip install #7124

kdheepak opened this issue Sep 6, 2024 · 37 comments · Fixed by #7691
Labels
breaking A breaking change needs-decision Undecided if this should be done

Comments

@kdheepak
Copy link
Contributor

kdheepak commented Sep 6, 2024

From https://docs.astral.sh/uv/pip/environments/#discovery-of-python-environments

When running a command that mutates an environment such as uv pip sync or uv pip install, uv
will search for a virtual environment in the following order:

  • An activated virtual environment based on the VIRTUAL_ENV environment variable.
  • An activated Conda environment based on the CONDA_PREFIX environment variable.
  • A virtual environment at .venv in the current directory, or in the nearest parent directory.

If no virtual environment is found, uv will prompt the user to create one in the current directory
via uv venv.

I believe the implementation should be changed to check for the conda environment at the very end, i.e. this is the order it should operate in:

1. An activated virtual environment based on the `VIRTUAL_ENV` environment variable.
2. A virtual environment at `.venv` in the current directory, or in the nearest parent directory. (currently this is 3.)
3. An activated Conda environment based on the `CONDA_PREFIX` environment variable. (currently this is 2.)

This is because most users that have installed conda will have auto activate base enabled, since that is the default. For example:

image

This means when someone runs uv pip install even with a .venv folder in the current directory, it'll install all the dependencies into the conda environment, which in my opinion is not what users would want. If a .venv folder doesn't exist, I think it is appropriate to install into CONDA_PREFIX.

I made this mistake and accidentally installed all the project dependencies into my base conda environment environment.

@zanieb
Copy link
Member

zanieb commented Sep 6, 2024

Hm. I think you have a fair point but we can't tell if it's a conda base environment or if it's an actual active conda environment, right? If it's not the base environment I would expect us to prefer it over .venv.

Note this would be a breaking change.

@zanieb zanieb added the breaking A breaking change label Sep 6, 2024
@kdheepak
Copy link
Contributor Author

kdheepak commented Sep 6, 2024

If it's not the base environment I would expect us to prefer it over .venv.

I thought about that too and I think there's an argument to be made for it either way.

For consistency, I personally think it is easier to remember it as "uv always prefers virtual environments", i.e. CONDA_PREFIX would be preferred last.

By changing the order, if a user really wants to install it in CONDA_PREFIX, all they have to do is delete the .venv folder. And a user that does activate a conda environment with a .venv folder present, wouldn't be surprised if uv installed into their .venv folder, given how much the documentation focuses on virtual environments and how little the documentation mentions system or conda environments.

@notatallshaw
Copy link
Contributor

Hm. I think you have a fair point but we can't tell if it's a conda base environment or if it's an actual active conda environment, right? If it's not the base environment I would expect us to prefer it over .venv.

If I am understanding the issue correctly I don't think this matters, virtual environments can be children of conda environments, but not the otherway around.

If I am using both conda and venv my typical worfklow would be:

  1. conda activate my_env
  2. python -m venv .venv / uv venv
  3. source .venv/bin/activate
  4. uv pip install {foo}

I would expect that final uv command to install in to my virtual environment, not my conda environment.

@notatallshaw
Copy link
Contributor

notatallshaw commented Sep 6, 2024

Wait, does sourcing a virtual environment always create the env variable VIRTUAL_ENV? In that case I don't agree with @kdheepak

In particular if I run:

  1. conda activate my_env
  2. uv pip install {foo}

And there happens to be a ".venv" in my folder, I would expect uv to install in the conda environment, not in the the virtual environment. Perhaps print a warning that ".venv" folder was detected but not activated, so installing in conda environment.

In general, if a user is using conda, I would have an expectation that they want to install into conda environments, unless a virtual environment was explicitly activated.

So going back to the "base" environment issue, IMO conda should solve this on their side my adding an EXTERNALLY-MANAGED marker to that environment (conda/conda#12245).

@kdheepak
Copy link
Contributor Author

kdheepak commented Sep 6, 2024

Can I ask when do you expect a user to do both conda activate my_env and python -m venv .venv? And if someone explicitly does create an environment, why would you want uv to install into the conda environment ever.

If someone explicitly does choose to create a .venv, then if that exists in that folder, I'd expect all installs to go into that environment. Otherwise, if it did install into conda environment, then imagine for instance a situation where you are working in a folder with a .venv, you cd out into a different folder, activate a conda environment, do something else, and come back to the folder with a .venv, and now your installs with uv will affect the conda environment.

Additionally, I can imagine users wanting to install jupyter, jq, fzf, bat, nodejs etc in a conda environment, and have that activated across different projects, (e.g. conda activate dev) and have uv manage individual python project environments, and in that case they'd want uv to install into an individual folder's .venv.

I guess my point is that if there's a .venv folder, uv should always only install in there because that's probably almost always going to be more foolproof.

@notatallshaw
Copy link
Contributor

notatallshaw commented Sep 6, 2024

Can I ask when do you expect a user to do both conda activate my_env and python -m venv .venv? And if someone explicitly does create an environment, why would you want uv to install into the conda environment ever.

When I want to create a virtual environment for a specific project, but have it based on the Python versions created by conda. This can happen either because of how some dependencies are managed, or because some tooling/testing works well with virtual environments but not conda environments but I can still use conda to create the binaries (such as Python executable, openssl, etc.), I have done this a lot in the past for pip test suite.

Otherwise, if it did install into conda environment, then imagine for instance a situation where you are working in a folder with a .venv, you cd out into a different folder, activate a conda environment, do something else, and come back to the folder with a .venv, and now your installs with uv will affect the conda environment.

The opposite conclusion can be reached with similar reasoning, I'm cding around, I remember I need to install something in my conda environment, I activate it, I install with uv, and to my surprise it doesn't get installed in my conda environment when I try and run python -m {thingIjustinstalled}

I guess my point is that if there's a .venv folder, uv should always only install in there because that's probably almost always going to be more foolproof.

I do think uv should be explicit about if this is always true or not, even if it is the default behaviour, I think there should be some option to install into my conda environment regardless of if there is a .venv, the other way around is easy, you activate the .venv.

@kdheepak
Copy link
Contributor Author

kdheepak commented Sep 6, 2024

Fair points. Accidentally installing into base is my real gripe here. If someone explicitly activates a conda environment and also has a .venv, I'd be happy with either installing into conda, installing into .venv or even just prompting the user on what to do.

@zanieb
Copy link
Member

zanieb commented Sep 6, 2024

I guess this really goes back to

We can't tell if it's a conda base environment or if it's an actual active conda environment, right?

If we can tell, we can have better behavior.

@notatallshaw
Copy link
Contributor

You could manually add the EXTERNALLY-MANAGED file to your base environment 😂.

If we can tell, we can have better behavior.

btw, is there a plan to try and locate all environments and the details from environment managers like conda? (I think https://github.com/microsoft/python-environment-tools attempts to do this). Then you would know if it's the base environment because it would be the conda environment either called "base" or "root".

@zanieb
Copy link
Member

zanieb commented Sep 6, 2024

Naw, we don't intend to read other environment managers or tools at this time. I think it's better to read from standard locations.

@kdheepak
Copy link
Contributor Author

kdheepak commented Sep 6, 2024

Given that there's not an easy way to tell the difference, and without special casing base, my vote is for changing the order. Primarily because changing CONDA_PREFIX's dependencies unintentionally seems like a bad idea (a user likely manages that with a environment.yml file).

@notatallshaw
Copy link
Contributor

Thinking on it this would directly affect my tooling which does the following:

  1. Creates a series of conda environments for the user
  2. Activates each one and installs specific requirements into them with uv

If the order is changed, and the user happens to have created a virtual environment in the directory they run the script, then it would keep installing the requirements for each environment in the venv.

So, while I'm not against changing the ordering, I think it should be made a very explicit change, with an option to ignore the local ".venv".

@zanieb
Copy link
Member

zanieb commented Sep 6, 2024

We're pretty unlikely to change the order. This is the first complaint we've gotten and the behavior has been this way for a while. Furthermore, active environments always take precedence over .venv detection in the uv pip interface. I think this is correct and best matches pip's behavio. If you use the project interface or uv run, you shouldn't have this problem.

@kdheepak
Copy link
Contributor Author

kdheepak commented Sep 6, 2024

Can you clarify what is the project interface?

@zanieb
Copy link
Member

zanieb commented Sep 6, 2024

@kdheepak
Copy link
Contributor Author

kdheepak commented Sep 6, 2024

I see, thanks for sharing.

I will defer to your judgement on this, you definitely will have a better sense of what users will likely want.

I do want to make two counter arguments though:

  1. My intuition is most people that use conda don't use uv at the moment, and manage environments manually with conda. So past complaints or lack thereof may not be descriptive.
  2. I see uv equivalent to npm for python projects, and I see .venv folders equivalent to node_modules. Having uv pip install do what is the equivalent of npm install -g (i.e. installing into the base conda environment) seems like a bad idea imho. By default, conda auto activates base, so anyone that uses uv pip install will run into this accidentally, and as uv continues to grow in the scientific community you'll have more users potentially run into this. uv even comes with --system for explicitly installing into the system version of python, i.e. uv pip install --system and currently it is only the CONDA_PREFIX feature that behaves differently. I see this one and only one feature as incongruent with what is otherwise a clean, well designed and self contained python packaging solution.

@notatallshaw
Copy link
Contributor

notatallshaw commented Sep 6, 2024

FWIW, it appears you can detect if the base environment is currently activated if CONDA_PREFIX_1 is not set or CONDA_DEFAULT_ENV is equal to base (on very old versions of conda this was named root) :

https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/activation.html#conda-activate

If someone wants to make a seperate issue about uv avoiding installing into conda base environments as though they were system versions of Python?

@notatallshaw
Copy link
Contributor

Nevermind, I just made it: #7137

@zanieb
Copy link
Member

zanieb commented Sep 7, 2024

Thanks @notatallshaw! I wonder if we should close this in favor of that?

@zanieb zanieb added the needs-decision Undecided if this should be done label Sep 15, 2024
@chrisrodrigue
Copy link

chrisrodrigue commented Sep 18, 2024

We can't tell if it's a conda base environment or if it's an actual active conda environment, right?

Thinking out loud here.

Anaconda let's the user customize the installation location (on Windows it defaults to C:\ProgramFiles\anaconda3) so reading from there is unreliable. Additionally, it would probably be bad practice to read Anaconda config files (which also have platform-specific locations).

It seems that the conda philosophy is to not mutate/pollute the system path or use environment variables and instead it utilizes the current shell context by modifying the shell startup scripts via the conda init command. The base conda environment is named base by default (t was called root in legacy versions).

I think that calling/shelling out to a conda command/subprocess could be a viable option to tell apart a conda env from a conda base env. We can always rely on the conda command being accessible regardless of the platform, installation location, or whether it is accessed via shell context or path presence.

conda env list returns a list of environments with an asterisk next to the active one:

(base) PS C:\Users\me> conda env list
# conda environments:
#
base                  *  C:\ProgramData\anaconda3

This would probably be good enough, but as an extra precaution it could be cross-referenced with the conda info command to be dead certain that the true base environment is the same as the active named environment (in case someone renamed the base environment or something).

(base) PS C:\Users\me> conda info

     active environment : base
     # ...
       base environment : C:\ProgramData\anaconda3  (read only)
     # ...

Note that the conda base environment is marked as read-only in the output of conda info, so I think that it is definitely incorrect behavior for any tool to mutate the conda base environment.

@chrisrodrigue
Copy link

chrisrodrigue commented Sep 18, 2024

The output of conda info / conda env list should be the same across platforms so you could use some regex to parse out the base env path.

let conda_info_pattern = r"base environment\s*:\s*(.+?)(?:\s*\(read only\))?\s*$";
let conda_env_list_pattern = r"^\s*\*\s+([^\s]+)\s+([^\s]+)";

@chrisrodrigue
Copy link

chrisrodrigue commented Sep 18, 2024

Maybe a helper function to do this can look something like this?

/// Checks if the current conda environment is the base environment.
///
/// This function runs `conda info` and `conda env list` commands asynchronously,
/// captures the relevant information using regular expressions, and compares
/// the base environment path with the active environment path.
///
/// # Returns
/// - `Ok(true)` if the current environment is the base environment.
/// - `Ok(false)` if the current environment is not the base environment.
/// - `Err` if there is an error running the commands or processing the output.
async fn is_conda_base_env() -> Result<bool, Box<dyn Error>> {
    let base_env_pattern = Regex::new(r"base environment\s*:\s*(.+?)(?:\s*\(read only\))?\s*$")?;
    let active_env_pattern = Regex::new(r"^\s*\*\s+([^\s]+)\s+([^\s]+)")?;

    let (conda_info_output, conda_env_list_output) = try_join!(
        Command::new("conda").arg("info").output(),
        Command::new("conda").arg("env").arg("list").output()
    )?;

    Ok(
        base_env_pattern
            .captures(&String::from_utf8_lossy(&conda_info_output.stdout))
            .and_then(|caps| caps.get(1).map(|m| m.as_str().trim()))
            == active_env_pattern
                .captures(&String::from_utf8_lossy(&conda_env_list_output.stdout))
                .and_then(|caps| caps.get(2).map(|m| m.as_str()))
    )
}

@zanieb
Copy link
Member

zanieb commented Sep 18, 2024

Thanks for digging into this. I'm very hesitant to shell out to conda, it seems brittle and slow. Do you some data about how long that invocation takes?

@chrisrodrigue
Copy link

I don't, just hacked this together real quick. Not really sure what the best entrypoint for this check could be or how to go about caching the conda environment state.

Probably way better ways to do this.

@chrisrodrigue
Copy link

chrisrodrigue commented Sep 18, 2024

Environment variables might be a lot faster. Seems like conda has quite a few of them so there's probably more than one way to get the correct paths.

This page shows a lot of the environment variables that conda uses: https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/activation.html#activation-deactivation-scripts

/// Checks if the current conda environment is the base environment.
///
/// This function reads the `CONDA_PREFIX` and `CONDA_DEFAULT_ENV` environment variables,
/// extracts the last component of the `CONDA_PREFIX` path, and compares it with `CONDA_DEFAULT_ENV`.
///
/// # Returns
/// - `true` if the current environment is the base environment.
/// - `false` if the current environment is not the base environment or if either environment variable does not exist.
fn is_conda_base_env() -> bool {
    if let (Ok(conda_prefix), Ok(base_env)) = (env::var("CONDA_PREFIX"), env::var("CONDA_DEFAULT_ENV")) {
        if let Some(current_env) = Path::new(&conda_prefix).file_name().and_then(|s| s.to_str()) {
            return current_env == base_env;
        }
    }
    false
}

@notatallshaw
Copy link
Contributor

Thanks for digging into this. I'm very hesitant to shell out to conda, it seems brittle and slow. Do you some data about how long that invocation takes?

I would concur, from yy experience wrapping conda, particularly on Windows with lots of security tools installed, you can be talking about a few seconds to get a response back.

@zanieb
Copy link
Member

zanieb commented Sep 25, 2024

What about this though?

Screenshot 2024-09-25 at 12 55 00 PM

Seems like CONDA_DEFAULT_ENV doesn't just give for the base environment based on that. Should we just check if it's called base?

@zanieb
Copy link
Member

zanieb commented Sep 25, 2024

Hm

This might work actually..

(base) root@ba74a7a304f6:/# echo $CONDA_DEFAULT_ENV
base
(base) root@ba74a7a304f6:/# echo $CONDA_PREFIX     
/usr/local
(base) root@ba74a7a304f6:/# conda activate foo
(foo) root@ba74a7a304f6:/# echo $CONDA_PREFIX
/usr/local/envs/foo
(foo) root@ba74a7a304f6:/# echo $CONDA_DEFAULT_ENV
foo

@zanieb
Copy link
Member

zanieb commented Sep 25, 2024

I think I have the right idea in #7691

@notatallshaw
Copy link
Contributor

That's just #7137 with the suggestions of env vars I made there right?

@zanieb
Copy link
Member

zanieb commented Sep 25, 2024

Sorry I forgot about that issue and was riffing off of @chrisrodrigue's function. It is similar to your suggestion there, though the heuristic is a little different. I'm not sure which approach is better.

@notatallshaw
Copy link
Contributor

notatallshaw commented Sep 25, 2024

As long as you don't make any assumptions about the relationship of the paths between the base environment and the other environments you should be fine. As users can both change where non-base environments are created, and in fact manually specify the directory for them on each creation.

@zanieb
Copy link
Member

zanieb commented Sep 25, 2024

Basically the assumption is that a base environment is always named "root" or "base" and doesn't have a prefix path ending in the name.

@notatallshaw
Copy link
Contributor

notatallshaw commented Sep 27, 2024

Basically the assumption is that a base environment is always named "root" or "base" and doesn't have a prefix path ending in the name.

This is probably fine, but be aware nothing stops a user installing their conda software into a directory called "base", e.g. $HOME/.miniforge3/base. I think the absense of CONDA_PREFIX_1 might be a safer heuristic, but I'm no conda expert.

@chrisrodrigue
Copy link

Is CONDA_PREFIX_1 supposed to be the previously active conda env? That could be a good check to make sure that base is the “official” base created by default by conda.

Could an arbitrary user created conda env called base be exploited in this context?

@notatallshaw
Copy link
Contributor

Is CONDA_PREFIX_1 supposed to be the previously active conda env? That could be a good check to make sure that base is the “official” base created by default by conda.

I did a bit of investigation and found it doesn't do what I think, so I take back my suggestion for using CONDA_PREFIX_1! Apologies!

What I found was there are a series of CONDA_PREFIX_{N} variables that show which environments are activates on top of the other, e.g. if you start a terminal with base activates and do:

  1. "conda activate abc" - CONDA_PREFIX gets pointed to abc environment and CONDA_PREFIX_1 points to base environment
  2. "conda activate def" - CONDA_PREFIX gets pointed to def environment and CONDA_PREFIX_2 points to abc environment
  3. etc.

This is problematic to use as an indication of what is base then because the user can activate base again via conda activate base, and they might not even start in the base environment as you can configure conda not to activate an environment, e.g. auto_activate_base: False, and then they could start with conda activate abc.

@chrisrodrigue
Copy link

Good find!

I wish the conda environment variable documentation were a little more verbose so that we didn’t need to experiment 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking A breaking change needs-decision Undecided if this should be done
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants