Skip to content

Update database.py so that the sample database will use bigint for fi… #2603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion lib/cuckoo/core/database.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@
ForeignKey,
Index,
Integer,
BigInteger,
String,
Table,
Text,
Expand Down Expand Up @@ -307,7 +308,7 @@ class Sample(Base):
__tablename__ = "samples"

id = Column(Integer(), primary_key=True)
file_size = Column(Integer(), nullable=False)
file_size = Column(BigInteger(), nullable=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The change from Integer to BigInteger for file_size is correct for supporting larger files. However, this is a database schema modification, and the project uses Alembic for managing schema migrations (as evidenced by the AlembicVersion table and existing migration scripts).

Could you clarify why an Alembic migration script was not included for this change? Relying on a manual ALTER TABLE command (as mentioned in the PR description) has a few potential drawbacks:

  1. Risk of Missed Steps: Users might upgrade the Cuckoo codebase but forget to apply the manual DDL, leading to runtime errors when encountering large files that don't fit the old INTEGER column type.
  2. Inconsistency: Alembic provides a standardized, version-controlled way to apply schema changes. Manual DDLs bypass this.
  3. SCHEMA_VERSION Management: The SCHEMA_VERSION constant in this file is typically updated with Alembic revisions. Without an Alembic migration and a corresponding SCHEMA_VERSION bump, the Cuckoo startup check (if last.version_num != SCHEMA_VERSION) won't be able to verify if this specific schema change has been applied.
  4. Database Portability: The ALTER TABLE command in the PR description is specific to PostgreSQL. Alembic handles generating the correct DDL for different database backends.

It's generally recommended to use Alembic to generate a new migration script for schema changes like this. This script would contain operations like op.alter_column('samples', 'file_size', type_=sa.BigInteger(), existing_type=sa.Integer()).

If there's a specific reason Alembic can't be used here (e.g., performance concerns with ALTER COLUMN TYPE on very large tables), could that be elaborated upon? In such a scenario, bumping SCHEMA_VERSION and instructing users to manually update their alembic_version table after the DDL would be a safer alternative to ensure consistency.

Style Guide References

file_type = Column(Text(), nullable=False)
md5 = Column(String(32), nullable=False)
crc32 = Column(String(8), nullable=False)
Expand Down