Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Fix line breaks in markdown tables #156

Merged

Conversation

BenjaminBossan
Copy link
Collaborator

Solves #154

Description

Markdown table entires shouldn't contain "\n" line breaks. Instead, those should be replaced by "<br />". This is now done for both the hyperparameter table and for tables added through card.add_table.

An example of a fixed hyperparameter table: https://huggingface.co/skops-ci/hf_hub_example-a4b4abaf-d1c3-4300-a5ac-cc9e3c65bd39#hyperparameters

Implementation

The bugfix has to be applied after calling tabulate, as the table entries can contain arbitrary objects, whose repr is only called by tabulate. That also means we have to be careful as the tabulated table string will contain valid "\n" line breaks that should be left untouched.

Unfortunately, the hyperparameter table doesn't use card.add_table, which is why the fix has to be applied in two places. IMO it would be better if it used that method under the hood but that would result in the hyperparameter table being placed in a different section (because of the inflexibility of the templating).

A minor naming issue I found: The method that actually creates the hyperparameter table is called _extract_estimator_config. I would have never guessed to be honest.

Markdown tables shouldn't contain "\n" line breaks. Instead, they should
be replaced by "<br />". This is now done for both the hyperparameter
table and for tables added through card.add_table.

Implementation

The bugfix has to be applied after calling tabulate, as the table
entries can contain arbitrary objects, whose repr is only called by
tabulate. That also means we have to be careful as the tabulated table
string will contain valid "\n" line breaks that should be left
untouched.

Unfortunately, the hyperparameter table doesn't use card.add_table,
which is why the fix has to be applied in two places. IMO it would be
better if it used the same method under the hood but that would result
in the hyperparameter table being placed in a different section (because
of the inflexibility of the templating).

A minor naming issue I found: The method _extract_estimator_config
actually creates the hyperparameter table. I would have never guessed to
be honest.
@BenjaminBossan
Copy link
Collaborator Author

BenjaminBossan commented Sep 28, 2022

Ready for review @skops-dev/maintainers

Note: CI is still failing because of the breaking change in huggingface_hub.

@BenjaminBossan BenjaminBossan requested review from adrinjalali and removed request for adrinjalali September 28, 2022 16:07
@merveenoyan
Copy link
Collaborator

@BenjaminBossan Can we just pin the version of HF hub used in CI for "test" separately until we get up to date with Hub changes?

"huggingface_hub": ("0.9.0rc3", "install", None),

Copy link
Collaborator

@merveenoyan merveenoyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot! can you please post the model cards (of the skorch example and another example that table already works) after this change here?

@BenjaminBossan
Copy link
Collaborator Author

Can we just pin the version of HF hub used in CI for "test" separately until we get up to date with Hub changes?

I suggested pinning as a quick fix until we get a proper fix here.

can you please post the model cards

I linked this above, that should be sufficient, right?

docs/changes.rst Outdated
@@ -16,6 +16,8 @@ v0.3
- Add ``private`` as an optional argument to :meth:`.hub_utils.push` to
optionally set the visibility status of a repo when pushing to the hub.
:pr:`130` by `Adrin Jalali`_.
- Fix a bug that resulted in markdown tables being rendered incorrectly if
entries contained line breaks by `Benjamin Bossan`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR number missing here.

# replace line breaks "\n" with html tag <br />, however, leave end-of-line
# line breaks (eol_lb) intact
eol_lb = "|\n"
placeholder = "$%!?" # arbitrary sting that never appears naturally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a friend once said, equivalent to banging my head against the keyboard lol

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's stupid :D If you have a better idea, please let me know

About the PR number: I fixed that. How do you add the correct number before you even created the PR? Do you just check the highest number so far and hope that until you actually open the PR, there hasn't been another PR in-between?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't, I just add that as a commit after I create the PR :D

Ignore if there are multiple whitespaces in tables by replacing all n>1
whitespaces with n=1 whitespace.
Also ignore multiple "-", as in the 2nd line of a markdown table.
Also ignore multiple "-", as in the 2nd line of a markdown table.
@BenjaminBossan
Copy link
Collaborator Author

@skops-dev/maintainers I fixed the failing CI tests, so the PR is ready for re-review.

The error was that the markdown table had different padding between the expected and the actual outcome, e.g.:

expected:

| foo | bar |
|-----|-----|
|  1  |  2  |

result:

| foo |  bar  |
|-----|-------|
|  1  |   2   |

Neither Adrin nor I could reproduce the error locally, even on the same OS and Python version. As this padding is only relevant when reading the table in raw text but not for how it's rendered, I opted to remove it completely when comparing the resulting table with the expected table:

after processing:

| foo | bar |
|-|-|
| 1 | 2 |

I'm not super happy with this but it seemed like an okay solution. Alternatively, I thought about parsing the markdown table into a pandas DataFrame, but that seemed to be less rigorous. If you know any better solution, LMK.

@adrinjalali adrinjalali changed the title BUG Fix line breaks in markdown tables FIX Fix line breaks in markdown tables Oct 4, 2022
@adrinjalali adrinjalali merged commit f41f891 into skops-dev:main Oct 4, 2022
@BenjaminBossan BenjaminBossan deleted the bugfix-line-break-in-tables branch October 5, 2022 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants