Collaborator

@MaxDall commented Dec 4, 2025

Adds a script that determines, from the runs of the Publisher Coverage workflow, when each currently failing publisher last succeeded.

Example output:

python -m scripts.check_coverage -n 150

Found workflow ID: 73755076
Latest run on '2025-12-04 14:15:34+00:00' with 16 failed publishers.
['CBC News', 'SeznamZpravy', 'Euronews (DE)', 'FreiePresse', 'N-Tv', 'Euronews (FR)', 'Le Monde', 'Morgunbladid', 'Die Neue Südtiroler Tageszeitung', 'Landesspiegel', 'Lesotho Times', 'Dagbladet', 'The Portugal News', 'Euronews (EN)', 'The Mirror', 'Daily Maverick']
Scanning runs in descending date order...
Scanning run 16171749722 from 2025-07-09: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [02:08<00:00,  1.17it/s] 

====== Publisher Failure Timeline ======

Publisher                        'Last Success'
-------------------------------------------------------------------------------------
Morgunbladid                     2025-12-02
Euronews (FR)                    2025-11-24
SeznamZpravy                     2025-11-24
Euronews (EN)                    2025-11-24
Euronews (DE)                    2025-11-24
Daily Maverick                   2025-11-19
The Mirror                       2025-11-19
N-Tv                             2025-11-05
The Portugal News                2025-10-21
CBC News                         2025-09-14
Lesotho Times                    2025-09-12
Le Monde                         UNKNOWN
FreiePresse                      UNKNOWN
Dagbladet                        UNKNOWN
Landesspiegel                    UNKNOWN
Die Neue Südtiroler Tageszeitung UNKNOWN
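The bookkeeping behind this report could look roughly like the sketch below. It is not the script's actual code: it assumes each run's coverage file has already been parsed into a `{publisher: passed}` dict and that runs arrive newest first (as in "Scanning runs in descending date order"). `last_success_dates`, `format_timeline`, and the `Run` shape are hypothetical names.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical shape: one entry per workflow run, newest first,
# holding the run date and a {publisher: passed} mapping parsed from its artifact.
Run = Tuple[date, Dict[str, bool]]


def last_success_dates(
    failing: List[str], runs_newest_first: Iterable[Run]
) -> Dict[str, Optional[date]]:
    """Map each failing publisher to the newest date it passed (None if never seen passing)."""
    result: Dict[str, Optional[date]] = {name: None for name in failing}
    pending = set(failing)
    for run_date, outcomes in runs_newest_first:
        if not pending:
            break  # every publisher is resolved, so older runs need not be scanned
        for name in list(pending):
            if outcomes.get(name):  # publishers absent from a run count as "not passed"
                result[name] = run_date
                pending.discard(name)
    return result


def format_timeline(result: Dict[str, Optional[date]]) -> str:
    """Render known dates newest first, followed by UNKNOWN entries."""
    width = max((len(name) for name in result), default=0)
    known = sorted(
        ((d, name) for name, d in result.items() if d is not None), reverse=True
    )
    lines = [f"{name:<{width}} {d.isoformat()}" for d, name in known]
    lines += [f"{name:<{width}} UNKNOWN" for name, d in result.items() if d is None]
    return "\n".join(lines)


if __name__ == "__main__":
    runs = [
        (date(2025, 12, 4), {"A": False, "B": False}),
        (date(2025, 12, 2), {"A": True, "B": False}),
        (date(2025, 11, 24), {"A": True, "B": False}),
    ]
    print(format_timeline(last_success_dates(["A", "B"], runs)))
```

Stopping as soon as every failing publisher has a date means older artifacts need not be downloaded at all; publishers that never pass within the scanned window stay `None`, which corresponds to the `UNKNOWN` rows in the report.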

@MaxDall requested review from addie9800 and dobbersc December 4, 2025 15:16
Collaborator

@addie9800 left a comment

This is a really useful addition; it will save us a lot of headaches when addressing layout changes!

return None

# Use cached file if it exists and caching is enabled
if use_cache and os.path.exists(cache_path):
Collaborator

I wonder if we should display a message indicating whether the cached files are being used.

Collaborator Author

I can add something, but it should be easy to notice anyway from the execution time. I also think the usual workflow would rely on the cached files.
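A minimal sketch of a cache lookup that reports which path it took, assuming one cached text file per run ID in a cache directory. `load_coverage_text`, the `fetch` callback, and the `<run_id>.txt` layout are hypothetical; the PR's own code only shows `use_cache` and `cache_path`.

```python
import os
import sys
import tempfile
from typing import Callable, Optional


def load_coverage_text(
    run_id: int,
    cache_dir: str,
    fetch: Callable[[int], Optional[str]],
    use_cache: bool = True,
    verbose: bool = True,
) -> Optional[str]:
    """Return a run's coverage text, preferring the on-disk cache when enabled."""
    cache_path = os.path.join(cache_dir, f"{run_id}.txt")  # hypothetical file layout

    if use_cache and os.path.exists(cache_path):
        if verbose:
            print(f"[cache] run {run_id}: reading {cache_path}", file=sys.stderr)
        with open(cache_path, encoding="utf-8") as f:
            return f.read()

    if verbose:
        print(f"[download] run {run_id}: fetching artifact", file=sys.stderr)
    txt = fetch(run_id)
    if txt is not None and use_cache:
        # Store the result so the next invocation can skip the download
        os.makedirs(cache_dir, exist_ok=True)
        with open(cache_path, "w", encoding="utf-8") as f:
            f.write(txt)
    return txt


if __name__ == "__main__":
    def fake_fetch(run_id: int) -> str:
        return f"coverage of run {run_id}"

    with tempfile.TemporaryDirectory() as tmp:
        load_coverage_text(1, tmp, fake_fetch)  # prints a [download] line
        load_coverage_text(1, tmp, fake_fetch)  # prints a [cache] line
```

Since the usual workflow relies on the cache, a single summary line at startup (for example, how many runs were served from disk) may be less noisy than one message per run.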

if a.name == __ARTIFACT_NAME__:
    zip_url = a.archive_download_url
    r = requests.get(zip_url, headers={"Authorization": f"token {__TOKEN__}"})
    if r.status_code != 200:
Collaborator

I would also add some user feedback in this case to make debugging easier.
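One way to add that feedback, sketched under two assumptions: the artifact is a zip containing a single text file, and the caller passes in `r.status_code` and `r.content` from the `requests` response, so the function can be exercised without network access. `extract_coverage_text` is a hypothetical name.

```python
import io
import sys
import zipfile
from typing import Optional


def extract_coverage_text(
    status_code: int, content: bytes, run_id: int, artifact_name: str
) -> Optional[str]:
    """Turn an artifact download into text, saying why it failed if it did."""
    if status_code != 200:
        # Surface the HTTP status instead of silently skipping the run
        print(
            f"Warning: downloading artifact '{artifact_name}' for run {run_id} "
            f"failed with HTTP {status_code}",
            file=sys.stderr,
        )
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(content)) as zf:
            names = zf.namelist()
            if not names:
                print(f"Warning: artifact for run {run_id} is empty", file=sys.stderr)
                return None
            # Assumption: the artifact holds exactly one UTF-8 text file
            return zf.read(names[0]).decode("utf-8")
    except zipfile.BadZipFile:
        print(f"Warning: artifact for run {run_id} is not a valid zip file", file=sys.stderr)
        return None
```

In the script this would be called roughly as `extract_coverage_text(r.status_code, r.content, <run id>, a.name)`.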

__REPO__ = "flairNLP/fundus"
__WORKFLOW_NAME__ = "Publisher Coverage"
__ARTIFACT_NAME__ = "Publisher Coverage"
__TOKEN__ = os.getenv("GITHUB_TOKEN") # needs to be a fine-grained token
Collaborator

I have multiple questions here. A fine-grained token would be a personal access token linked to a specific personal account. Since it can be restricted to a single repository (or so I thought; I couldn't select fundus explicitly, though), that might not be too much of an issue, but did you also look into the option of a GitHub App Access token? Also, if we stick to the version with the fine-grained token, I would also add the necessary permissions this token must have in order to keep the setup as simple as possible.

Collaborator Author

did you also look into the option of a GitHub App Access token?

No, and I also wouldn't know what that is or how it differs from the other options. Maybe you can take a look into this?

Also, if we stick to the version with the fine-grained token, I would also add the necessary permissions this token must have in order to keep the setup as simple as possible.

I will add something about the permissions. As far as I understand, we can only download artifacts from GitHub using a fine-grained token, but it doesn't need to have any permissions.

Collaborator

but it doesn't need to have any permissions

Ah yes, my bad. It seems like I misunderstood the docs there.

Maybe you can take a look into this?

It seems like there is a bit of overhead in setting this up, but from what I understand, GitHub Apps cover the functionality we would ideally like to have: they provide an authentication option that is not linked to a specific user and can be tied to a specific repository. By default, only the repo owner can install a GitHub App, though, so if we want to try it, we could ask for the necessary permissions. I would be happy to look into it if you think we should go ahead. The alternative would be for one of us to create a token that gives read access to all public repositories we are a part of, which I guess is not that bad, but probably something that should be avoided if possible.
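For the fine-grained token variant, a sketch of reading the token from the environment and failing early with setup instructions, so a missing `GITHUB_TOKEN` does not surface later as a failed download. The error text reflects this thread's conclusion that a token with read access to public repositories and no extra permissions is enough; treat that as an assumption to check against GitHub's docs. `read_token`, `auth_headers`, and `find_workflow_id` are hypothetical helpers; the last one reads the JSON body of GitHub's `GET /repos/{owner}/{repo}/actions/workflows` endpoint. A GitHub App would instead sign a JWT and exchange it for an installation access token, which is not shown here.

```python
import os
from typing import Any, Dict, Mapping, Optional


def read_token(env: Mapping[str, str] = os.environ) -> str:
    """Return GITHUB_TOKEN or fail early with setup instructions."""
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set. Create a fine-grained personal access token "
            "with read-only access to public repositories and export it as GITHUB_TOKEN."
        )
    return token


def auth_headers(token: str) -> Dict[str, str]:
    """Headers for GitHub REST API requests (same auth scheme as the PR's download call)."""
    return {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }


def find_workflow_id(payload: Dict[str, Any], name: str) -> Optional[int]:
    """Pick a workflow ID by name from the list-workflows response body."""
    for workflow in payload.get("workflows", []):
        if workflow.get("name") == name:
            return workflow["id"]
    return None
```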

pbar.update()

if not txt:
    continue
Collaborator

Maybe we should add a warning here.

Collaborator Author

Added some messages and a verbosity flag.
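A minimal sketch of such a verbosity flag, assuming skip messages go through the standard `logging` module at INFO level so they only appear with `-v`. The `--verbose` spelling, the logger name, `report_skip`, and the `-n` default are placeholders, not necessarily what the PR implements.

```python
import argparse
import logging
from typing import List, Optional

log = logging.getLogger("check_coverage")  # hypothetical logger name


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="check_coverage")
    # -n is used in the PR's example invocation; this default is only a placeholder
    parser.add_argument("-n", type=int, default=100, help="number of workflow runs to scan")
    parser.add_argument(
        "-v", "--verbose", action="store_true", help="report every skipped run"
    )
    return parser.parse_args(argv)


def configure_logging(verbose: bool) -> None:
    # Skip messages are logged at INFO, so they stay hidden unless --verbose is given
    logging.basicConfig(
        level=logging.INFO if verbose else logging.WARNING,
        format="%(levelname)s: %(message)s",
    )


def report_skip(run_id: int, reason: str) -> None:
    log.info("skipping run %s: %s", run_id, reason)


if __name__ == "__main__":
    args = parse_args()
    configure_logging(args.verbose)
    report_skip(12345, "no coverage text")  # visible only with -v
```

If the script keeps its `tqdm` progress bar, plain output can break the bar line; `tqdm.write` is the usual workaround.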

continue

if not (parsed := parse_coverage_file(txt)):
    continue
Collaborator

And here as well.
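Both skips could also be tallied and summarized once after the scan, which keeps a long run readable. A sketch, where `parse` stands in for `parse_coverage_file` and is assumed to return a `{publisher: passed}` dict, or a falsy value on failure; `collect_parsed_runs` is a hypothetical name.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

Outcomes = Dict[str, bool]  # assumed shape: {publisher: passed}


def collect_parsed_runs(
    runs: Iterable[Tuple[int, Optional[str]]],
    parse: Callable[[str], Optional[Outcomes]],
) -> Tuple[Dict[int, Outcomes], Counter]:
    """Parse each run's coverage text and count why runs were skipped."""
    parsed_runs: Dict[int, Outcomes] = {}
    skipped: Counter = Counter()
    for run_id, txt in runs:
        if not txt:
            skipped["no coverage text"] += 1
            continue
        if not (parsed := parse(txt)):
            skipped["unparseable coverage file"] += 1
            continue
        parsed_runs[run_id] = parsed
    return parsed_runs, skipped


if __name__ == "__main__":
    def demo_parse(txt: str) -> Optional[Outcomes]:
        return {"Le Monde": True} if txt == "ok" else None

    parsed, skipped = collect_parsed_runs([(1, "ok"), (2, None), (3, "garbled")], demo_parse)
    for reason, count in skipped.items():
        print(f"Skipped {count} run(s): {reason}")
```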

3 participants