Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't register a non-Null logging handler #232

Closed
wants to merge 1 commit into from

Conversation

shoyer
Copy link
Contributor

@shoyer shoyer commented Apr 18, 2021

Per the Python docs [1]: "It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers."

[1] https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library

Pooch is a library, so the best practice to only add "NullHandler" and to let applications add their own handlers if desired.

I imagine this change might have some implications for Pooch users, but I think it probably makes sense? On the other hand, I know the boundaries between "applications" and "libraries" can get blurry for libraries that are mostly run interactively...

Per the Python docs [1]: "It is strongly advised that you do not add any
handlers other than NullHandler to your library’s loggers."

[1] https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library

The best practice to only add "NullHandler" and to let _applications_
add their own handlers if desired.
@welcome
Copy link

welcome bot commented Apr 18, 2021

💖 Thanks for opening your first pull request! 💖

Please make sure you read the following:

  • Authorship Guidelines: Our rules for giving you credit for your contributions, including authorship on publications and Zenodo archives.
  • Contributing Guide: What the review process is like and our infrastructure for testing and documentation.
  • Code of Conduct: How we expect people to interact in our projects.

A few things to keep in mind:

  • Remember to run make format to make sure your code follows our style guide.
  • If you need help writing tests, take a look at the existing ones for inspiration. If you don't know where to start, let us know and we'll walk you through it.
  • All new features should be documented. It helps to write the docstrings for your functions/classes before writing the code. This will help you think about your code design and results in better code.
  • No matter what, we are really grateful that you put in the effort to do this! ⭐

@dopplershift
Copy link
Contributor

This very well could have been copied from MetPy/Siphon. The behavior you mention for Python (where it logs to stderr by default) was new with 3.x. So in the days where everyone was support 2.7 and 3.x, adding the StreamHandler() was the most expedient way to get logging on both (without doing ugly version checks).

But yeah, not necessary these days thankfully.

@leouieda
Copy link
Member

leouieda commented May 5, 2021

@shoyer thanks for submitting this! I agree that in most cases users of the library should set the handler. But this does get tricky when we blur the line between end users and developers. Pooch serves both library developers (who mainly use pooch.Pooch) and end users in notebooks (who mainly use pooch.retrieve). I'd hate to have to learn about and configure logging just to get pooch.retrieve to print a message in my data analysis notebook. That said, I'm also unhappy with the way we handle verbosity. I always have to look up how to silence pooch.retrieve when doing batch downloads.

One issue I see with simply disabling the default handler is that it would kind of break our current behavior for dependent packages. I don't want to have them suddenly wondering why Pooch stopped printing download messages. We could implement this change and make a 2.0 release but I would want to give people warning for at least 1 release that this will be changed.

A possible solution would be:

  1. Add warnings to the changelog and a DeprecationWarning (or should it be a FutureWarning?) to pooch.Pooch that logging will be off by default from v2.0.
  2. Add a verbose argument to Pooch and retrieve that removes/adds a handler temporarily while they are being executed. This is easier than having to call pooch.get_logger and works well for use in notebooks.
  3. The verbose argument can default to True to maintain current behavior and would be changed to False in 2.0.

I don't know if this is recommended or even reasonable way to use logging. I would welcome suggestions from people more experienced with this. For example, could adding/removing handlers inside functions cause problems when executing in threads?

cc @danshapero who implemented the logging code

@leouieda leouieda mentioned this pull request May 5, 2021
30 tasks
@danshapero
Copy link
Contributor

The reason I added the logging code in the first place was to make it possible to configure how much feedback Pooch would give you. Before, the code used warnings.warn to give information about, for example, the fact that a file was being downloaded fresh or that the hashes didn't match or whatever. You could silence all of these warnings at the risk of losing feedback you might want, even from other libraries. When I changed this code to use logging instead, I made it use a non-null default handler in order to preserve the old behavior of giving feedback when a file was fetched as opposed to retrieved from cache.

I'm not really sure whether I think this behavior should be changed. It seems really unlikely that it would break someone's code or workflow, so it's maybe not of the same severity as changing the API. As a developer of a downstream application, I set the Pooch logging level to warning which mostly silences the text feedback. At the same time, most of the datasets I work with are > 1GB, so having a progress bar while downloading something is much more valuable feedback than a logging event after it's completed downloading.

@leouieda
Copy link
Member

leouieda commented Jun 1, 2021

@danshapero thanks for the feedback 👍🏽

I've been thinking a bit more about this but I'm not very familiar with logging so maybe someone can shed some light:

If we register a handler (say by calling pooch.enable_logging()) inside package X, does that interfere with the logging of package Y which didn't set a handler? My instinct says "no" but I'm not sure if imports are singletons and shared between all packages currently loaded.

If the answer is "no", then we could not set the default handler and move it inside a pooch.enable_logging function. This would be accompanied by user documentation on how to set your own handler if so desired. But when using Pooch in a notebook (or just to have the previous behavior), the new function is much easier than having to deal with logging.

If it's "yes", then we'll have to think of a workaround. Having to import logging would make interactive use a bit of a pain and I wouldn't want to sacrifice that.

@leouieda
Copy link
Member

I've been doing some digging here and I'm thinking that the logging module is just not what we need for info output to users. Here is an example where our current implementation fails (remember that our default is verbosity ON):

package1.py: This one DOES NOT want logging from pooch

import pooch

pooch.get_logger().setLevel("WARNING")

def fetch():
    fname = pooch.retrieve(
        url="doi:10.6084/m9.figshare.14763051.v1/tiny-data.txt",
        known_hash="md5:70e2afd3fd7e336ae478b1e740a5f08e",
    ) 
    return fname

package2.py: Depends on package1 but DOES want logging from pooch

import package1
import pooch

def fetch():
    fname = pooch.retrieve(
        url="doi:10.6084/m9.figshare.14763051.v1/store.zip",
        known_hash="md5:7008231125631739b64720d1526619ae",
    )
    return fname

script.py: User code that calls package2.fetch and their own use of Pooch with logging on.

import package2
import pooch

package2.fetch()

pooch.retrieve(
    url="doi:10.5281/zenodo.4924875/store.zip",
    known_hash="md5:7008231125631739b64720d1526619ae",
)

In this case, there is no logging output for scipt.py because package2 imported package1. So it's possible for downstream dependencies to turn off logging for upstream packages unwillingly (or vice-versa). This is clearly not the behaviour that we want.

Requiring users to set handlers and make sure they don't interfere with other people's code seems a bit unreasonable when this could all be fixed by simply printing to STDERR and introducing a verbose keyword argument to retrieve and Pooch (which we could default to False if everyone prefers).

@leouieda
Copy link
Member

leouieda commented Feb 9, 2022

Moving this discussion to #302 and closing the PR since we're not going to change the default behaviour of Pooch to be silent by default.

Thanks for bringing this up @shoyer and getting the process started!

@leouieda leouieda closed this Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants