Fixes an issue where \n characters are inserted at seemingly random points into the console output for stdout messages. The resulting output is non-deterministic and breaks downstream processes.

Connected to #433, #496
Closes #765
Current Behaviour:
When there is significant execution time or asynchronous activity between stdout/stderr calls in an output cell, the default logger inserts newline terminators into papermill's console output. This output differs from the output produced when the same notebook runs in a live notebook client (see the screenshots below).
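The symptom can be reproduced with the standard library alone. The following is a minimal sketch, not papermill's code: it assumes, as described under Reasoning behind Fix, that each stdout message from a cell reaches the console as a separate logger.info call. The helper log_chunks and the logger name demo.default are hypothetical. It compares that result with the plain concatenation a notebook client displays.

```python
import io
import logging


def log_chunks(chunks):
    """Hypothetical helper: log each stdout chunk as its own record."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)  # default terminator is "\n"
    logger = logging.getLogger("demo.default")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(handler)
    try:
        for chunk in chunks:
            logger.info(chunk)
    finally:
        logger.removeHandler(handler)
    return buffer.getvalue()


# A cell that prints a progress line in pieces, e.g. print("Loading...", end="")
# followed later by print(" done"), produces two separate stdout messages.
chunks = ["Loading...", " done\n"]

notebook_text = "".join(chunks)    # what the notebook output cell shows
console_text = log_chunks(chunks)  # what a per-message logger writes

print(repr(notebook_text))  # 'Loading... done\n'
print(repr(console_text))   # 'Loading...\n done\n\n'
```

The longer the gap between the two print calls, the more likely a client is to deliver them as separate messages, which is why the extra newlines appear to be inserted at random.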
Example 1: [screenshots: Notebook Output vs. Console Output]
Example 2: [screenshots: Notebook Output vs. Console Output]
Example 3: [screenshots: Notebook Output vs. Console Output]
Expected Behaviour:
When using the --log-output option, especially from the CLI, the cell output text logged by papermill should match the text output produced when running the notebook in Jupyter (or another .ipynb client).

Fix:
Added a secondary logger, notebook_logger, to log.py that removes the extraneous terminator from its stream handlers. The notebook logger is used only for cell outputs, leaving the default papermill logger to handle all papermill messages.

Users of papermill as a library can override this behaviour either by setting log.notebook_logger to the default log.logger or by modifying the parameters when instantiating a papermill client.

Output with fix: [screenshot]
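A minimal sketch of the logger change described under Fix, using only the standard library. The helper make_notebook_logger and the logger name demo.notebook are hypothetical and are not papermill's actual log.py. The sketch shows only that a StreamHandler whose terminator is an empty string writes each cell output chunk exactly as produced.

```python
import io
import logging


def make_notebook_logger(name="demo.notebook", stream=None):
    """Hypothetical helper: build a logger that writes cell output verbatim."""
    handler = logging.StreamHandler(stream)  # stream=None falls back to sys.stderr
    handler.terminator = ""  # do not append "\n" after each record
    handler.setFormatter(logging.Formatter("%(message)s"))  # message text only

    notebook_logger = logging.getLogger(name)
    notebook_logger.setLevel(logging.INFO)
    notebook_logger.propagate = False  # avoid re-emitting via ancestor handlers
    notebook_logger.handlers.clear()   # make repeated calls idempotent
    notebook_logger.addHandler(handler)
    return notebook_logger


# Usage: two stdout chunks from one cell reach the console unchanged.
buffer = io.StringIO()
notebook_logger = make_notebook_logger(stream=buffer)
for chunk in ["Loading...", " done\n"]:
    notebook_logger.info(chunk)

print(repr(buffer.getvalue()))  # 'Loading... done\n'
```

Keeping the default papermill logger separate means papermill's own status messages still end with "\n", while only cell output is written verbatim. Routing cell output back through a logger that keeps the default terminator (per the description above, log.notebook_logger = log.logger) restores the previous behaviour.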
Reasoning behind Fix:
The papermill client was using the notebook client's default log.info for every stdout message from a notebook when the explicit --log-output option was used. This routes each message through the default papermill logger, which, like a syslog-style logger, treats every call as a separate line-oriented record. As a result, the logging stream handler appends a \n terminator to each message it emits.

In Jupyter, stdout and stderr writes from a cell are captured and redirected to the notebook's output cells without any additional \n characters, which preserves the output formatting the notebook's author intended. If the author needs to adapt that output for a downstream tool or process, they should make those changes in the notebook itself and be able to trust that papermill will not alter the cell's output.

Unfortunately, this does not fix the same issue in the underlying nbclient. As a result, the cell output saved in the output notebook still differs from a live run because it contains additional newline characters.
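To see where the extra \n comes from, the formatter and the handler can be exercised separately. This is a standard-library sketch with a made-up message. It shows that the formatter returns the message text unchanged, while StreamHandler appends its terminator attribute (default "\n") each time it emits a record.

```python
import io
import logging

# A single stdout chunk represented as a log record (message text only).
record = logging.makeLogRecord({"msg": "Loading..."})

formatter = logging.Formatter("%(message)s")
formatted = formatter.format(record)
print(repr(formatted))  # 'Loading...'  (the formatter adds nothing)

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(formatter)
handler.emit(record)
print(repr(buffer.getvalue()))   # 'Loading...\n'  (the handler appended "\n")
print(repr(handler.terminator))  # '\n'
```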
Excerpt from Python's logging module that is causing the issue: