-
Notifications
You must be signed in to change notification settings - Fork 23
INTEGRITY: Continuing set.dat's processing #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: integrity
Are you sure you want to change the base?
INTEGRITY: Continuing set.dat's processing #32
Conversation
93d01e2
to
1fee585
Compare
…echecksum. Add new equal checksums for set.dat's fileset match
…rtial in the same run.
…modification time
…which is never closed.
094d152
to
898ffd0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I have less and less feedback, good job.
Parameterize all your sql queries, and limit the line length :-)
@@ -73,9 +77,87 @@ def get_dirs_at_depth(directory, depth): | |||
if depth == num_sep_this - num_sep: | |||
yield root | |||
|
|||
def read_be_32(byte_stream): | |||
|
|||
def my_escape_string(s: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up with this function name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name was given by the previous developer. The reason was most probably to avoid conflict with pymysql's escape string function - from pymysql.converters import escape_string
. Though in this particular file, that is not being used. So there should not be any issue if I make it escape_string to match dumper companion's implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the case, a solution for would be to use as
. And example: from pymysql.converters import escape_string as pymysql_escape_string
return False | ||
|
||
|
||
def split_path_recursive(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is path a string?
If so use: [i for i in path.split(os.sep) if i ]
|
||
def appledouble_get_resfork_data(file_byte_stream): | ||
""" Returns the resource fork's data section as bytes of an appledouble file as well as its size """ | ||
""" Returns the resource fork's data section as bytes, size of resource fork (size-r) and size of data section of resource fork (size-rd) of an appledouble file""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run the formatter? This line seem long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were some components in this file like CRC16_XMODEM_TABLE
that were getting messed up after running the formatter. So after talking with sev, we decided to not run the formatter on compute_hash.py and clear.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check how this is handled in dumper-companion: https://github.com/scummvm/scummvm/blob/master/devtools/dumper-companion.py#L39 and https://github.com/scummvm/scummvm/blob/master/devtools/dumper-companion.py#L87
data = f"name \"{filename}\" size {filesize}" | ||
for filename, (hashes, size, size_r, size_rd, timestamp) in hash_of_dir.items(): | ||
filename = encode_path_components(filename) | ||
data = f"name \"{filename}\" size {size} size-r {size_r} size-rd {size_rd} timestamp {timestamp}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use triple quotes around the string, then it doesn't need escaping inside.
@@ -144,7 +163,7 @@ def insert_fileset( | |||
return (existing_entry, True) | |||
|
|||
# $game and $key should not be parsed as a mysql string, hence no quotes | |||
query = f"INSERT INTO fileset (game, status, src, `key`, megakey, `timestamp`) VALUES ({game}, '{status}', '{src}', {key}, {megakey}, FROM_UNIXTIME(@fileset_time_last))" | |||
query = f"INSERT INTO fileset (game, status, src, `key`, megakey, `timestamp`, set_dat_metadata) VALUES ({game}, '{status}', '{src}', {key}, {megakey}, FROM_UNIXTIME(@fileset_time_last), '{escape_string(set_dat_metadata)}')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before, use parameterized queries and don't let the lines get too long.
Where's the formatter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I do use parameterized queries whenever I write a new one. I'll definitely fix the older ones one day. Though I did run the formatter on this file.
db_functions.py
Outdated
cursor.execute( | ||
"SELECT status FROM fileset WHERE id = %s", (matched_fileset_id,) | ||
) | ||
status = cursor.fetchone()["status"] | ||
if status == "detection": | ||
update_fileset_status(cursor, matched_fileset_id, "partial") | ||
update_fileset_status(cursor, matched_fileset_id, "parital") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be partial? not parital?
db_functions.py
Outdated
AND f.name = %s | ||
AND f.size = %s | ||
""" | ||
cursor.execute(query, (parent_fileset, file["name"], file["size"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're only checking if there is a result, you count also do a COUNT()
in the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more performant to use LIMIT 1 instead of COUNT?
This would make the SQL engine stop as soon as a result is found.
return "".join(random.choices(string.ascii_letters + string.digits, k=length)) | ||
|
||
cursor.execute("ALTER TABLE file ADD COLUMN detection_type VARCHAR(20);") | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this is a check for a generic exception and not a specific one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I should catch a more specific exception related to the existing column. Or I can directly check if the column exists instead of relying on error handling.
Thank you, and yes, some of the older queries are still left to be parametrised. |
No description provided.