-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor content fetching #237
Conversation
Separates code for determining target file name from code that fetches the file from URI.
We will not split URI into 2 parts and then pass it as arguments to methods. Instead we will keep the whole URI as a property of ContentBringer instances. This change will simplify the code.
The argument of various methods in this class that holds the destination directory name is always set to self.CONTENT_DOWNLOAD_LOCATION, which means that these methods don't need to have this argument but they can use self.CONTENT_DOWNLOAD_LOCATION directly.
The static method ContentBringer.__get_dest_file_name is called only in a single place with arguments self.content_uri and self.CONTENT_DOWNLOAD_LOCATION which are both members of the object. So it can become also a property of the object.
The destination file name depends on URI and changes only if the URI is set. It doesn't make sense to compute the destination file name each time it's requested. Instead, we can compute it only once, when the URI is changed.
The parameter report_callback isn't used anywhere.
Split _finish_actual_fetch into 2 methods: 1. _finish_actual_fetch 2. _analyze_fetched_content The reason is to separate fetching from content analysis.
We know the path where the file is downloaded, we have the path stored in self.dest_file_name, that means we don't need to pass the path as an argument
The self.fetched_content isn't used anywhere.
Pass the target URL as a parameter to fetch_content instead of digging it from the PolicyData Class. This change will simplify us to remove PolicyData class from ContentBringer in future.
This function only manipulates PolicyData members, so it should be a method of the PolicyData class.
The class memebers rule_data and dry_run aren't set and aren't used anywhere.
Move method use_system_content from ContentBringer class to PolicyData class. The reason is that this method only manipulates the PolicyData instance referenced from the ContentBringer so it should be a responsibility of the PolicyData instance.
The outcome of this move is that self._addon_data won't be directly read in the get_preferred_tailoring() method, and that will allow further refactoring in future.
The preferred tailoring file is a property of the ObtainedContent class, so the get_preffered_tailoring method should belong to the ObtainedContent
The outcome of this move is that self._addon_data won't be directly read in the get_preferred_content() method, and that will allow further refactoring in future.
Move method get_preffered_content to the ObtainedContent class because the preffered content path is a property of the ObtainedContent class.
Move the use_downloaded_content method from the ContentBringer to the PolicyData class. This method mostly manipulates the PolicyData class members, so the PolicyData is a more suitable place for the method.
This change will cause that allow_one_expected_tailoring_or_no_tailoring will not directl acess self._addon_data, therefore will not depend on self._addon_data and can be usable without self._addon_data.
Prevent shadowing, the name "expected_path" is already used above with different value.
Don't access self._addon_data directly in filter_discovered_content but instead pass them as parameter. This will reduce the dependency on the PolicyData class (self._addon_data).
Stop accessing self._addon_data in the _analyze_fetched_content to reduce dependency on PolicyData
Don't access self._addon_data directly in finish_content_fetch but instead pass them as parameter. This will reduce the dependency on the PolicyData class (self._addon_data).
After recent changes, the PolicyData instance isn't used in the ContentBringer at all.
This method doesn't access self and isn't oused outside of the class.
This method doesn't use self so it can be a static method
Change allow_one_expected_tailoring_or_no_tailoring method to private static method because it doesn't access self and it isn't used outside of this class.
Change filter_discovered_content to private static method. This method doesn't read self and is used only inside this class.
Change the _analyze_fetched_content method to private static method because this method doesn't use self (apart form accessing the constant) and is called only inside this class.
Move all static methods from ContentBringer out to a new class ContentAnalyzer. This faciliates the responsiibilities split. ContentBringer contains code related to fetching content from URI resources. ContentAnalyzer contains code analyzing the fetched content.
This commit will move content analysis away from the ContentBringer.finish_content_fetch() method. Therefore, the ContentBringer class will be responsible only for download of the content and not for the analysis of the downloaded content. Instead, there will be a new method static public method content_discovery.ContentAnalyzer.analyze() and we will add calling this method everywhere where ContentBringer.finish_content_fetch() is called.
These parameters of finish_content_fetch aren't used in it.
From now on, we will call finish_content_fetch only if fetch_content has been called. These 2 calls will therefore always form a kind of "pair" of calls.
The function update_progress_label isn't used anywhere.
Move the decision whether to verify fingerprint outside from the `finish_content_fetch` method. The fingerprint will be verified if the `fingerprint` parameter of the method is set (is not None). This method is called from 2 places. When calling from _init_data_fetch, the dest_filename is always not None. And, when calling from `PrepareValidContent.run()` we can set fingerprint to a non-None value if the content_dest is set. The reason for this change is that the `finish_content_fetch` method will not depend on the destination content file name so it will reduce amount of needed parameters.
We will pass the failure callback parameter in the constructor instead of the individual methods, because it's always the same function, it doesn't differ in different methods and depends only on the caller class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, it does many changes totally right, while some I don't like, so let's talk about them.
Great:
- Streamlining of the code.
- Factoring the analyzer out of the bringer.
Controversial:
- PolicyData getting new abilities. I agree with the ability to "clear" itself, but the rest should be responsibility of a different class. PolicyData should remain a data structure, an interface that other classes understand and use. This data structure shouldn't know that a thing s.a. "system content" or "downloadable content" exist.
|
||
@staticmethod | ||
def __get_dest_file_name(url, destdir): | ||
path = url.split("://")[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will crash in some cases of invalid URLs s.a. http://
. We had a BZ about this recently, and the code that extracts path in a robust way is already in the project, that could be extracted as well.
The code also assumes that destdir is a pathlib path, so adding an appropriate type hint may be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but in later commits in this PR, the __get_dest_file_name
is further removed and the code using the URI is consolidated at a single place: the content_uri
setter method, in which the sanity of the URI is checked and an exception is raised.
@@ -84,7 +83,7 @@ def get_content_type(self, url): | |||
|
|||
@property | |||
def content_uri(self): | |||
return self.content_uri_scheme + "://" + self.content_uri_path | |||
return self._content_uri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest renaming to _valid_content_uri
, so the code that uses it knows that it is safe to make certain assumptions.
path = scheme_and_maybe_path[1] | ||
if "/" not in path: | ||
msg = f"Missing the path component of the '{uri}' URL" | ||
raise KickstartValueError(msg) | ||
basename = path.rsplit("/", 1)[1] | ||
if not basename: | ||
msg = f"Unable to deduce basename from the '{uri}' URL" | ||
raise KickstartValueError(msg) | ||
self._content_uri = uri | ||
self.dest_file_name = self.CONTENT_DOWNLOAD_LOCATION / basename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to keep the code in that private method, and call it only once in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the basename is derived from URI of the content and the URI will change each time a new content will be requested to fetch.
@@ -159,7 +159,7 @@ def _start_actual_fetch(self, ca_certs_path): | |||
) | |||
return fetching_thread_name | |||
|
|||
def finish_content_fetch(self, fetching_thread_name, fingerprint, report_callback, dest_filename, | |||
def finish_content_fetch(self, fetching_thread_name, fingerprint, dest_filename, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky, the report callback is supposed to be a function that I can use to get information out of the depths of data processing to the user in a good form. It may not be used because of simple omission, not because it doesn't make sense to use it. IOW, the fix for this unused argument could be that it is put to a use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an idea what information it should return?
org_fedora_oscap/structures.py
Outdated
# internal values | ||
self.rule_data = rule_handling.RuleData() | ||
self.dry_run = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting, almost suspicious. Note to self - needs further investigation.
@@ -1215,6 +1215,6 @@ def on_change_content_clicked(self, *args): | |||
self._refresh_ui() | |||
|
|||
def on_use_ssg_clicked(self, *args): | |||
self.content_bringer.use_system_content() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree here - the content spec data class shouldn't have the knowledge what "system content" even is, and even how to get it. The "bringer" class knows this, and, of course, hammers the information into the content spec aspect of the "policy data" dataclass. It is OK for the bringer to know the data format of the content spec, and the content spec just has to store the data, it doesn't have to fully understand them to the finest extent.
org_fedora_oscap/structures.py
Outdated
@@ -209,3 +209,22 @@ def use_system_content(self): | |||
self.clear_all() | |||
self.content_type = "scap-security-guide" | |||
self.content_path = common.get_ssg_path() | |||
|
|||
def use_downloaded_content(self, content): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, the logic of obtaining content is independent from the description of the intent. The PolicyData should store data that have objective meaning, and a single line of its definition shouldn't need to be touched if we decide to use different algorithm for content download, or other implementation detail.
@@ -436,14 +436,14 @@ def update_progress_label(msg): | |||
|
|||
if actually_fetched_content: | |||
content_path = common.get_raw_preinst_content_path(self._policy_data) | |||
self.content_bringer.finish_content_fetch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for self: Probably the right thing to do now, but we need to change how it works.
Code should read like a prose, not like a cipher game, as it reads right now.
self.content_bringer.finish_content_fetch( | ||
fetching_thread_name, self._policy_data.fingerprint, | ||
content_dest, _handle_error) | ||
if fetching_thread_name is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: Probably the right thing to do now, but we need to change how it works.
Code should read like a prose, not like a cipher game, as it reads right now. It is very well visible here that a variable storing a name of a thread is seriously abused.
Even if this breaks othe object-oriented principles, the PolicyData should remain a data structure, an interface that other classes understand and use. This data structure shouldn't know that a thing s.a. "system content" or "downloadable content" exist.
A new inspection was created. |
I have moved the methods out from the |
@matejak ^ |
LGTM, thanks! |
Description:
This PR contains a series of small refactoring of the code responsible for content fetching, with focus on the
org_fedora_oscap.content_discovery.ContentBringer
class.The main change is that the code performing analysis of the content has been moved away from
ContentBringer
to a new classContentAnalyzer
.Moreover, code manipulating the members of the
PolicyData
class has been moved away fromContentBringer
to thePolicyData
class.The outcome is that now the
ContentBringer
contains only code performing the download of the content andContentAnalyzer
contains only code performing the analysis of the already fetched content. These 2 classes are independent.For more details, please read commit messages of all commits.
Rationale:
This change prepares the code for further refactoring with the goal to consolidate content fetching functionality. The separation of content fetching and content analysis functionality will allow us to focus on each of these 2 stages separately, with lower mutual collisions, in future PRs.
Review Hints:
cd testing_files/
python3 -m http.server
./create_update_image.sh install_addon_from_repo
./reanaconda.py updates ~/work/git/oscap-anaconda-addon/update.img