Update database.py so that the sample database will use bigint for fi… #2603
Conversation
…le_size. For users who have an existing database: before upgrading CAPEv2, alter the database schema with `ALTER TABLE samples ALTER COLUMN file_size TYPE BIGINT;` and then restart the related services.
Hello @markykchua, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request to help everyone get up to speed quickly.
This pull request aims to address a potential limitation in the database schema by updating the data type used for storing file sizes. Currently, the `file_size` column in the `samples` table uses an `Integer` type, which might not be sufficient for very large files (a 32-bit signed integer tops out at 2,147,483,647 bytes, roughly 2 GiB). This change updates the column type to `BigInteger` to accommodate larger values, preventing potential overflow issues.

For users who are upgrading CAPEv2 and have an existing database, a manual schema alteration step is required before applying the upgrade. The necessary SQL command (`ALTER TABLE samples ALTER COLUMN file_size TYPE BIGINT;`) is provided in the pull request description.
Highlights
- Database Schema Update: The primary change is updating the `file_size` column in the `samples` database table from `Integer` to `BigInteger` to support larger file sizes.
- Manual Migration Required: Users with existing CAPEv2 databases will need to manually run an `ALTER TABLE` SQL command to update their schema before applying this change.
Changelog
- `lib/cuckoo/core/database.py`
  - Added `BigInteger` to the list of imported SQLAlchemy types.
  - Changed the data type of the `file_size` column in the `Sample` model from `Integer()` to `BigInteger()`.
Size of file grows,
Integer just won't quite fit,
BigInt saves the day.
Code Review
The pull request correctly updates the `file_size` column in the `Sample` model to use `BigInteger`, which is essential for supporting larger files. The core code change is good.

My main concern, detailed in the review comment, revolves around the database schema migration strategy. The project appears to use Alembic for migrations, but this PR relies on a manual `ALTER TABLE` command provided in the PR description. This approach can be risky and lead to inconsistencies if not handled carefully by all users. An Alembic migration script would be a more robust and standardized way to manage this schema change.

The PR description helpfully includes the `ALTER TABLE` command for PostgreSQL. If a manual DDL remains the chosen path (despite the preference for Alembic), it would be beneficial to either provide equivalent commands for other supported databases (like MySQL, SQLite) or advise users to consult their database's documentation for the correct syntax.
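For reference, a hedged sketch of equivalent DDL for other backends (these statements are not taken from the PR, so verify them against your own database version before running them):

```sql
-- PostgreSQL (the command already given in the PR description)
ALTER TABLE samples ALTER COLUMN file_size TYPE BIGINT;

-- MySQL / MariaDB: MODIFY restates the full column definition, so NOT NULL must be repeated
ALTER TABLE samples MODIFY COLUMN file_size BIGINT NOT NULL;

-- SQLite stores INTEGER values in up to 8 bytes, so existing SQLite
-- databases normally need no column alteration.
```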
Summary of Findings
- Database Schema Migration Strategy: The PR modifies the database schema by changing the `file_size` column type. However, it relies on a manual `ALTER TABLE` command instead of an Alembic migration script, which is the established tool in this project. This can lead to inconsistencies and runtime errors if users miss the manual step. An Alembic migration is strongly recommended. (Severity: high)
- SQLAlchemy Import Order: In `lib/cuckoo/core/database.py`, the `BigInteger` import could be placed in alphabetical order within its group (e.g., before `Boolean`). This is a minor stylistic point and was not commented on directly due to review settings. (Severity: low)
Merge Readiness
This pull request addresses an important issue by allowing larger file sizes. However, due to the high-severity concern regarding the database migration strategy (the lack of an Alembic migration script), I recommend that these changes not be merged until this is addressed. Using Alembic for schema changes will ensure better consistency, safety, and maintainability for users upgrading their Cuckoo instances. As a reviewer, I am not authorized to approve pull requests, so please ensure further review and approval from maintainers after addressing the feedback.
@@ -307,7 +308,7 @@ class Sample(Base):
     __tablename__ = "samples"

     id = Column(Integer(), primary_key=True)
-    file_size = Column(Integer(), nullable=False)
+    file_size = Column(BigInteger(), nullable=False)
The change from `Integer` to `BigInteger` for `file_size` is correct for supporting larger files. However, this is a database schema modification, and the project uses Alembic for managing schema migrations (as evidenced by the `AlembicVersion` table and existing migration scripts).
Could you clarify why an Alembic migration script was not included for this change? Relying on a manual `ALTER TABLE` command (as mentioned in the PR description) has a few potential drawbacks:
- Risk of Missed Steps: Users might upgrade the Cuckoo codebase but forget to apply the manual DDL, leading to runtime errors when encountering large files that don't fit the old `INTEGER` column type.
- Inconsistency: Alembic provides a standardized, version-controlled way to apply schema changes. Manual DDLs bypass this.
- `SCHEMA_VERSION` Management: The `SCHEMA_VERSION` constant in this file is typically updated with Alembic revisions. Without an Alembic migration and a corresponding `SCHEMA_VERSION` bump, the Cuckoo startup check (`if last.version_num != SCHEMA_VERSION`) won't be able to verify if this specific schema change has been applied.
- Database Portability: The `ALTER TABLE` command in the PR description is specific to PostgreSQL. Alembic handles generating the correct DDL for different database backends.
It's generally recommended to use Alembic to generate a new migration script for schema changes like this. This script would contain operations like `op.alter_column('samples', 'file_size', type_=sa.BigInteger(), existing_type=sa.Integer())`; a sketch of what that could look like follows.
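For illustration only, a minimal Alembic revision along those lines might look like the following; the revision identifiers are hypothetical placeholders and are not part of this PR:

```python
"""Hypothetical sketch of an Alembic migration widening samples.file_size to BIGINT."""
import sqlalchemy as sa
from alembic import op

# Hypothetical revision identifiers; real values would be generated by `alembic revision`.
revision = "make_file_size_bigint"
down_revision = "previous_revision_id"
branch_labels = None
depends_on = None


def upgrade():
    # Widen samples.file_size from INTEGER to BIGINT.
    op.alter_column(
        "samples",
        "file_size",
        type_=sa.BigInteger(),
        existing_type=sa.Integer(),
        existing_nullable=False,
    )


def downgrade():
    # Revert to the original 32-bit integer type (values above 2**31 - 1 would no longer fit).
    op.alter_column(
        "samples",
        "file_size",
        type_=sa.Integer(),
        existing_type=sa.BigInteger(),
        existing_nullable=False,
    )
```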
If there's a specific reason Alembic can't be used here (e.g., performance concerns with `ALTER COLUMN TYPE` on very large tables), could that be elaborated upon? In such a scenario, bumping `SCHEMA_VERSION` and instructing users to manually update their `alembic_version` table after the DDL would be a safer alternative to ensure consistency.
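As a rough illustration of that fallback, the manual bookkeeping step after running the DDL could look like the statement below; the revision string is a hypothetical placeholder, not a value from this repository, and would need to match whatever `SCHEMA_VERSION` is bumped to:

```sql
-- Hypothetical sketch: record the new schema version after applying the manual DDL.
-- '<new_revision_id>' is a placeholder, not a real revision from this repository.
UPDATE alembic_version SET version_num = '<new_revision_id>';
```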
Hello, thank you. The fix is partial; we need an Alembic migration to properly make users aware of it, we can't just hope that they will check the README. I will implement it once I'm back in June.
Perfect, thanks a lot!