Add downloader for AVCX #202

afontenot · 2024-07-31T04:10:31Z

Adds a downloader for AVCX (American Values Club Crosswords). These are popular crosswords from a variety of creators, see https://avxwords.com/about-us/.

This is a subscription-only crossword series, and requires authentication. This is handled in exactly the same way as NYT.

This downloader may not seem to serve an obvious purpose, given that AVCX emails subscribers an AcrossLite compatible .puz file for every new release. However, I'm thinking it will be useful for the following features:

* Automatic downloading of new crosswords, e.g. using a crontab.
* Would allow downstream software like Gnome Crosswords to fetch AVCX automatically.
* I do a few fixups on the .puz files and include difficulty metadata and the puzzle notes.
* Some crosswords are not available in .puz from the AVCX website, e.g. barred crosswords like https://avxwords.com/puzzles/1621. As these are available in JPZ I think it would be nice to do a very minimal translation into a close .puz approximation. (This is currently TODO.)

afontenot · 2024-07-31T05:19:21Z

Just noticed there's sort of a JPZ parser already in compilerdownloader.py, but there are subtle differences. The compiler parser doesn't handle when the clue text is inside a  element, as it is in AVCX, and it also appears to have no handling at all for barred crosswords. It would have to be extended if it were to work for AVCX, but it's certainly a starting point.
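As a rough illustration of the nested-clue case mentioned above, here is a minimal sketch using only the standard library. The XML fragment, the `span` wrapper, and the `clue_text` helper are hypothetical stand-ins, not the actual AVCX markup or the code in compilerdownloader.py; real JPZ files also use XML namespaces, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical JPZ-like fragment (assumption): one clue holds its text
# directly, the other wraps it in child elements.
SAMPLE = """
<clues>
  <clue number="1">Plain clue text</clue>
  <clue number="2"><span>Clue with <i>nested</i> markup</span></clue>
</clues>
""".strip()


def clue_text(clue):
    """Return all text under a clue element, including nested children."""
    # clue.text alone would be None for clue 2; itertext() walks the subtree.
    return "".join(clue.itertext()).strip()


root = ET.fromstring(SAMPLE)
clues = {c.get("number"): clue_text(c) for c in root.iter("clue")}
```

Reading `clue.text` only returns the text before the first child element, which is why a parser written for flat clues can come back empty on nested ones.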

thisisparker · 2024-07-31T17:56:14Z

xword_dl/downloader/avcxdownloader.py

+
+    def find_solver(self, url):
+        if "puzzles" in url:
+            url = url.removesuffix("/")


This is 3.9+ and 3.8 is not quite EOL yet but I'm okay with that

Yep, normally I'd be all for backward compatibility, but 3.8 will probably be EOL before the next release of xword-dl, and at this point even Debian oldstable has 3.9. Still, willing to change it if you'd prefer.
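If 3.8 support were wanted after all, a small helper could stand in for `str.removesuffix`. This is only a sketch; `strip_trailing_slash` is a hypothetical name and is not part of the PR.

```python
def strip_trailing_slash(url: str) -> str:
    """Remove a single trailing slash, like url.removesuffix("/") on 3.9+."""
    return url[:-1] if url.endswith("/") else url
```

Unlike `url.rstrip("/")`, this removes at most one slash, which matches what `removesuffix` does.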

thisisparker · 2024-07-31T19:34:24Z

At a glance, this is great! I want to poke around at it some and test it out, but as an AVCX subscriber I would totally use this.

afontenot · 2024-08-03T20:55:24Z

@thisisparker Question about using puzzle.notes in a downloader: the saved file lacks the newline characters of the original string. Is this something that xword_dl is stripping out (e.g. perhaps treating the notes field as HTML?), or do I need to chase down an issue in the puzpy library?

thisisparker · 2024-08-03T21:45:00Z

It's likely my cleanup function being a little overzealous. These are \n characters getting stripped? I will take a look and confirm.

afontenot · 2024-08-03T21:52:35Z

> It's likely my cleanup function being a little overzealous. These are \n characters getting stripped? I will take a look and confirm.

Yep, my AVCX code slaps several bits of metadata into the notes with "\n\n".join(self.descriptions), but there are no \n characters at all in the resulting file.

afontenot · 2024-08-04T01:25:14Z

I had a look myself, this is an issue with using html2text on the notes. Space is not significant in HTML so this is correct behavior from the html2text library, but we should probably only be calling it on fields that contain HTML.

Also, what's the intended purpose of using this library? It converts HTML to a Markdown equivalent, but does the AcrossLite PUZ specification support Markdown text representation? Are there specific programs that display it correctly? I tried putting the HTML markup directly in puzzle.notes but the resulting document contained a bunch of Markdown links which made the notes hard to read in Gnome Crosswords.
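One way to read "only calling it on fields that contain HTML" is a guard in front of the conversion step. The sketch below is an assumption about how that could look, not xword_dl's actual cleanup code: `looks_like_html` and `clean_field` are hypothetical names, the tag check is a crude heuristic, and the converter is passed in as a plain callable so the example does not depend on html2text.

```python
import re

# Heuristic (assumption): an opening or closing tag such as <p>, </a>, <br/>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def looks_like_html(text: str) -> bool:
    """Return True if the text appears to contain at least one HTML tag."""
    return bool(TAG_RE.search(text))


def clean_field(text: str, convert) -> str:
    """Apply `convert` only to HTML-looking text; leave plain text untouched."""
    return convert(text) if looks_like_html(text) else text
```

With a guard like this, a plain-text notes string joined with "\n\n" would pass through unchanged, while fields with markup would still be converted.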

thisisparker · 2024-08-04T17:23:21Z

The intention behind html2text is to convert from something that looks "marked up" to something that doesn't, because some clients don't render html and e.g. foreign phrase probably looks worse than the same thing in _s. (In other words, I'm actually just looking for a "plaintext" representation of formatted text, and for formatting elements markdown is pretty good, but it's not great for links as you note.) This is kind of orthogonal to the puz spec itself, which is afaict silent on markup questions, though it's possible the "observed spec" has moved a bit in the direction of HTML if AcrossLite now supports it; I actually don't know whether that's the case.

That's all probably a matter of opinion! Which is why I added the --preserve-html flag, which should skip the invocation of html2text entirely. Again sorry, writing this quickly, but does that happen to do the right thing for you?

afontenot · 2024-08-04T17:27:42Z

> Again sorry, writing this quickly, but does that happen to do the right thing for you?

Yes, that fixes the issue with removing new lines.

thisisparker · 2024-08-04T17:34:41Z

> Yes, that fixes the issue with removing new lines.

Alright! Then one option is to pass it at runtime each time, or another would be to put a preserve_html line in your settings file (under the general section or a specific outlet). I'm not inclined to change this behavior in the short term because I personally use a client that doesn't render the HTML and I prefer the look of unformatted markdown, but I am aware that's probably increasingly idiosyncratic.

afontenot · 2024-08-04T17:42:49Z

Hmm, well me not liking the look of it is one thing, but it removing any new lines in the notes string is another. That seems like it should be avoided. Should downloaders that have plain text notes replace \n with <br> to get the correct output?

Seems like this ought to affect the Puzzle Society downloader too, although that one is currently disabled.

thisisparker · 2024-08-04T19:30:10Z

I think that using <br> in these notes instead of \n is the right solution. By default they'll be converted, and if you're saving for a context that will render HTML, you'll be using the preserve flag and you'll still get the newlines.

Semantically it's probably even a touch better to just wrap paragraphs in <p> tags, which should have the same effect after html2text, but which might not be quite as concise as just using '<br>'.join(). (I can contrive a scenario where <p> rendering is cleaner than <br>s, but that's fully speculative.)
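The two options discussed here can be sketched side by side, assuming the tags in question are line-break and paragraph tags and that each description is plain text (so it is escaped before being wrapped). Both function names are hypothetical and not part of the PR.

```python
import html


def notes_with_breaks(descriptions):
    """Join plain-text descriptions with HTML line breaks."""
    return "<br><br>".join(html.escape(d, quote=False) for d in descriptions)


def notes_with_paragraphs(descriptions):
    """Wrap each plain-text description in its own paragraph element."""
    return "".join(f"<p>{html.escape(d, quote=False)}</p>" for d in descriptions)
```

If the descriptions can already contain markup such as links, the escaping step would have to be dropped or applied more selectively.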

thisisparker · 2024-08-04T19:34:27Z

xword_dl/downloader/avcxdownloader.py

+            self.descriptions.append(f"Edited by {parts[2]}.")
+
+        if self.descriptions:
+            puzzle.notes = "\n\n".join(self.descriptions)


Per my last comment, I think you could do this like

puzzle.notes = "\n\n".join([f"<p>{d}</p>" for d in self.descriptions])

Okay, I've mostly done that. There's a little more tinkering to get a nice plain text rendering; I'm replacing any links with just the text when preserve-html is off, hopefully you think that's a reasonable compromise if no one is actually rendering the Markdown at present.
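Replacing links with just their text could look roughly like the sketch below. It is an illustration under assumptions, not the code from the commit: `unlink` is a hypothetical name, and a regular expression is only adequate for simple, non-nested anchor tags.

```python
import re

# Match <a ...>text</a>; \b keeps tags like <abbr> from matching.
LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def unlink(text: str) -> str:
    """Replace each anchor element with its inner text."""
    return LINK_RE.sub(r"\1", text)
```

A downloader could apply something like this only when preserve-html is off, so the plain-text output keeps the link text without the Markdown link syntax.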

afontenot added 2 commits July 31, 2024 00:07

Improve error message in JPZ case.

6b1bc7b

thisisparker reviewed Jul 31, 2024

move puzzle_type data to subclasses

e9d4b85

afontenot added 2 commits July 31, 2024 15:59

use more robust date parsing method

9291f3e

make by-date mode more useful by printing closest match on error

51856c7

fix issue with printing an error for avcx when no URL given

303681b

thisisparker reviewed Aug 4, 2024

afontenot added 2 commits August 4, 2024 16:20

improve rendering of notes in plain text mode

5ecee4a

update help text for authentication to include AVCX

6a5a87c
