Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up a number of small issues and improve error checking #205

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

afontenot
Copy link
Contributor

I'm piling a large number of commits here that basically just improve code quality, with no (intended) changes in behavior (other than fixing several probable bugs caused by typos).

Summary:

  • When methods of urllib.parse are used, it's incorrect to only import urllib and in fact this only worked because importing the request module fixed it
  • Fixed several easily overlooked issues, e.g. typo == for =, missing reference to self on a method, accidentally using an undefined setting key with no default value (which happened to work because None is falsey, and False was the default behavior). Removed some code that looked broken and dead as well.
  • Clean up cases where .get() was used as a non-failing way to get the value for some key on an object, but then the result immediately fails with a type error if .get() returns None. These errors are harder to understand and debug than the clear KeyError: 'key' or equivalent.
  • Clean up a ton of cases where a subclass implementation had a different API for a method than its parent, causing breakage if something called a method on an instance of the class while assuming the parent API.
  • Try to improve the error handling by dealing with all the cases where type annotations in dependencies indicated that we do not deal with all error conditions. BeautifulSoup is very bad about silently running None and depending on the situation an error like this may not be caught for many lines, making debugging more difficult.

The last of these changes is the most involved, and it's possible I went a little too far to make the type checker happy, e.g. dealing with cases that are impossible (I think!) like soup.find("TagName") returning a NavigableString. There are also a few cases where existing error checking (with AttributeError) which did look sufficiently robust was replaced with a solution that the type checker could analyze instead. If need be some of these changes could be reverted.

Many files import urllib, but this is a package that contains the module
`urllib.parse`. This works, when it does (I haven't checked every case)
because `requests` is also imported, and it imports the module
correctly, which makes it available to xword-dl under the urllib
namespace.
Because of the way the settings dict is populated, if the preserve_html
option is not provided, the key will not exist.

`settings.get()` usually works because `None` is false-y, but has the
potential to fail unexpectedly
@afontenot
Copy link
Contributor Author

I realize reviewing this is a big ask. No hurry of course. Also, (a) I'm happy to help with anything I can in getting this in a mergeable state, and (b) I'm happy to see selected commits from this branch cherry-picked into main, and I can rebase my changes and force push here once that happens.

Clean up some uses of get where the code either immediately assumes that
the result is not None or else provides a replacement value, e.g.
through a ternary expression.

Keeping this clean makes error messages more debuggable when unexpected
errors aren't caught.
Many sub-classes modify their parents' methods and alter the allowed
arguments. This is a problem because if a function call is written on
the API of the parent class, it will result in an error if it uses
keyword arguments unavailable in the child class, and may fail or yield
unexpected results if it uses positional arguments in a different order.

Even when this can be guaranteed not to be the case through analysis of
the code, it's still a bad practice that can break newly added code
unexpectedly. I've fixed all of these issues that I could find.
This is mainly just catching and dealing with type mismatch errors
reported by Pyright. This catches a lot of cases where HTML parsing can
fall through, but not all of them, e.g. IndexError and KeyError are not
type errors. (But failing to prove an optional type has a value is.)

How you deal with these issues is ultimately a matter of some
preference. In two or three places, I removed a perfectly adequate check
that looked for an AttributeError, since exception catching doesn't
satisfy the type checker. But this commit could be revised to leave
these checks in or even do more of them.

A follow up to this should probably add checks on the result of every
`request.get` and `request.post`, as I only did that here where it was
convenient.
These lines are an implementation of `guess_date_from_id`, which is
supposed to turn a puzzle ID into a `datetime`, but they can't possibly
have worked because you can't call `datetime.strftime` that way, and in
any case the IDs now appear to be random hashes without date
information.
@thisisparker
Copy link
Owner

This is amazing and greatly appreciated. I do want to review it with some care but I anticipate I will probably merge all these commits, maybe with some very minor changes.

(It probably goes without saying, but this project started with pretty humble ambitions and my skills have grown along with it! With the exception of a pretty major refactor a few years back, it hasn't gotten this kind of holistic review, and I am very grateful for it coming from a pair of "outside" eyes.)

@afontenot
Copy link
Contributor Author

Great! Was also meaning to ask whether there is a particular code style you prefer to use for this repository? There seems to be a mix of styles from what I've seen, but I'd like to clean up at least the code I've touched for consistency. If you don't have a preference my go-to is ruff format with ruff.

This commit changes how importing the downloaders happens slightly.
The point is to avoid making any assumptions in xword_dl.py about
what plugins are available and what methods they have.

This has a number of advantages, for example, that it's trivial for
someone to add a downloader plugin that requires authentication
without adding another special case to the main control flow. This
also happens to make code checkers happy as a side effect, because
they can't determine whether the plugin classes are in the downloader
namespace or not without executing the downloader __init__ code.

The work in this commit is prefatory to an additional refactor which
would create a generic Downloader class that automatically finds
plugins and abstracts the control flow away from the application logic
in xword_dl.py.
Gets rid of a lot of the complicated introspection required to pick
up the available downloader classes, and also moves all of this
logic out of xword_dl.py.

This change could potentially be amended to add the plugins to the
downloader namespace as before, but after this change along with the
last several, this has no real benefit and leaves the namespace
cluttered.
With this commit the entire repository is Pyright clean!!
@afontenot
Copy link
Contributor Author

I've added a few more commits. Other than the usual cleaning up of nits I've noticed, there's the following two significant changes:

  • I refactored the behavior of features like authentication and embedded URLs so that xword_dl.py is entirely neutral with respect to downloader plugins. Any plugin that offers the requisite features will be allowed to use the authentication option, for example.
  • Building on this, I refactored the way importing plugins worked pretty significantly, in a way that gets rid of a lot of the introspection required previously. Now the downloader package only makes public the function get_plugins(), and (for now) xword_dl.py uses this to populate a list of the available plugin classes. With this change the downloader package's __init__.py is much cleaner (IMHO), and the none of the code needs the inspect module any more. (As a result of this and some of the small changes, Pyright issues zero complaints for xword-dl now.)

I think it would be quite nice to try to refactor some of the download logic out to a separate file, if you're okay with that idea. My idea for how to structure this is to have only main() in xword_dl.py (along with helper functions), and have main() instantiate a Downloader class. With this class, we can make the importing of plugins more explicit (instead of a file global), and centralize functions like filtering downloader plugins that are currently spread throughout the file.

Other than a few small issues (e.g. the single-file utils package with a compressed namespace), the project is linter clean as well! The only other significant change I wanted was a little more error checking around requests.

@afontenot
Copy link
Contributor Author

Note to self: need to fixup all the 3.10+ style typing annotations type | type to use Union instead before this gets merged.

Simplifies __get_subclasses(), and adds type annotations to important
methods in __init__.py, xword_dl.py, and utils.py. The result of this
work was to reveal a number of issues with API inconsistency, which
this commit fixes as well.
More appropriate as this function isn't specific to the downloader
types.
@afontenot
Copy link
Contributor Author

Just FYI, I'm pushing pause on this until you have time to review the changes (no pressure). There's already way more than enough here. I think the headline feature is that the codebase now passes pylint and pyright other than a couple of nits I was unable to fix without heavy refactoring.

I still plan to make the type annotations Python 3.9 compatible before this gets merged, but that fixup can happen after review.

The one significant future change I think would be helpful to add to this would be factoring the puzzle handling code out of xword_dl.py and into its own file. I'd like to make it possible to plug in alternative file formats, like ipuz (libipuz), by implementing the same puzzle API as puzpy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants