Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix log output #810

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Fix log output #810

wants to merge 6 commits into from

Conversation

bryanvaz
Copy link

@bryanvaz bryanvaz commented Oct 18, 2024

Fixes issue where \n characters are randomly inserted into the console for stdout messages. This results in non-deterministic output breaking any downstream process.

Connected to #433, #496
Closes #765

Current Behaviour:

When there is significant execution time or async activity between stdout/stderr calls in an output cell, the default logger inserts newline terminators to the papermill console output. This output is different than output as executed in a live notebook client. (See below for screenshots)

Example 1:
Notebook Output:
Screenshot 2024-10-18 at 3 17 11 AM
Console Output:
Screenshot 2024-10-18 at 3 19 02 AM

=============

Example 2:
Notebook Output:

Screenshot 2024-10-18 at 3 25 31 AM
Console Output:
Screenshot 2024-10-18 at 3 24 30 AM

=============

Example 3:
Notebook Output:

Screenshot 2024-10-18 at 3 27 04 AM
Console Output:
Screenshot 2024-10-18 at 3 27 32 AM

Expected Behaviour:

When using the --log-output option, especially when using the cli, the cell output text from papermill should match the text output when running a notebook in jupyter (or other .ipynb client)

Fix:

Added a secondary logger notebook_logger to log.py that removes the extraneous terminator from the stream handlers. The notebook logger is then only used by cell outputs, leaving the default papermill logger to handle all papermill messages.

Users can override this behaviour when using papermill as a library by either setting the log.notebook_logger to the default log.loggeror modifying the parameters when instantiating a Papermill client.

Output with fix:
Screenshot 2024-10-18 at 3 47 10 AM

Reasoning behind Fix

The papermill client was using the default log.info of the default notebook client for all stdout messages from a notebook when the explicit —log-output option is used. This calls the default logger which is built as a logger for syslog. As a result, the log formatter will automatically add \n when flushing.

In Jupyter, the input cell’s stdout and stderr calls are captured and redirected to the notebook’s output cells without additional \n characters added in order to preserve the intended output formatting of the notebook’s author. If the author needs to modify their output to ensure it is compatible with whatever downstream tool/process, they should be responsible for making those changes in the notebook itself and trusting that papermill will not alter the cell’s output.

Unfortunately this does not fix the same issue in the underlying nbclient. As a result, the cell output in the output notebook is still different than if you were to run the notebook in a live system (additional newline characters.)

Excerpt from python logging that is causing the issue:

class StreamHandler(Handler):
…
    terminator = '\n'def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline.  If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream.  If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            msg = self.format(record)
            stream = self.stream
            # issue 35046: merged two stream.writes into one.
            stream.write(msg + self.terminator) # <-- `\n` terminator is added to every call of log._log
            self.flush()

Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 9 lines in your changes missing coverage. Please review.

Project coverage is 90.87%. Comparing base (cb2eb37) to head (2ec6897).
Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #810      +/-   ##
==========================================
- Coverage   91.54%   90.87%   -0.68%     
==========================================
  Files          17       17              
  Lines        1621     1654      +33     
==========================================
+ Hits         1484     1503      +19     
- Misses        137      151      +14     

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Does tqdm print properly with --log-output?
1 participant