@EBirkenfeld EBirkenfeld commented Oct 24, 2025

Create a FastAPI file service and add Django storage app with attachment endpoints, permission checks, auth middleware, and object-level permissions, exposing ports 8001 and 8002

Add a new FastAPI service with upload/download APIs, S3 storage integration, auth middleware, centralized exceptions, DI, and Alembic migrations; extend Django with a src.storage app, models, serializers, views, object permissions, migrations, and endpoints for listing and checking attachment access; update docker-compose to run the file service stack and proxy via Nginx; add extensive unit, integration, and E2E tests; note two defects: missing VALIDATION_ERROR_CODES['MISSING_REQUIRED_FIELD'] and a bad key in StorageError.bucket_create_failed.

📍Where to Start

Start with the FastAPI app setup in src.main in storage/src/main.py, then review the API endpoints in src.presentation.api.files in storage/src/presentation/api/files.py, and the Django attachment viewset in backend/src/storage/views.py.


📊 Macroscope summarized a040982. 42 files reviewed, 61 issues evaluated, 58 issues filtered, 1 comment posted

🗂️ Filtered Issues

backend/src/processes/models/workflows/attachment.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 99: The unique_together constraint on ('user', 'attachment') will cause IntegrityError when trying to create a new permission for the same user/attachment pair after soft-deleting the original. Since SoftDeleteModel sets is_deleted=True rather than removing the row, the database constraint still sees the old record and blocks the insert. [ Already posted ]
backend/src/processes/services/attachments.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 165: The check_user_permission method uses prefetch_related('permissions') and accesses attachment.permissions.filter(user_id=user_id), but the FileAttachment model has no permissions relation defined. This will raise an AttributeError at runtime when attempting to access .permissions on a FileAttachment instance with access_type == FileAttachmentAccessType.RESTRICTED. [ Low confidence ]
backend/src/processes/views/attachments.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 41: The AttachmentService() instantiation without any parameters in check_permission action uses src.processes.services.attachments.AttachmentService which has a check_user_permission method that queries FileAttachment model with a file_id field. However, the serializer FileAttachmentCheckPermissionSerializer has max_length=255 while the Attachment model in storage/models.py has file_id with max_length=64. If the same file_id values are expected to be used across both systems, there's a potential mismatch that could cause truncation or lookup failures. [ Low confidence ]
backend/src/storage/middleware.py — 0 comments posted, 3 evaluated, 3 filtered
  • line 54: On line 54, auth_header.split(' ')[1] will return an empty string if the header is exactly 'Bearer ' or 'Token ' (with trailing space but no token value), causing an empty auth token to be set in the cookie. Consider validating the token is non-empty before using it. [ Already posted ]
  • line 69: Accessing settings.FRONTEND_URL on line 69 will raise AttributeError if FRONTEND_URL is not defined in Django settings. The check on line 71 (if not frontend_url:) only handles None or empty string values, but not a missing attribute. Should use getattr(settings, 'FRONTEND_URL', None) instead. [ Already posted ]
  • line 86: The domain extraction logic on line 86 (f'.{parts[-2]}.{parts[-1]}') incorrectly handles multi-part TLDs like example.co.uk or app.example.com.br, returning .co.uk or .com.br (public suffixes) instead of the intended domain. This would cause the cookie to be set on a public suffix domain, which browsers will reject, resulting in the cookie not being set. [ Already posted ]
backend/src/storage/migrations/0002_migrate_file_attachments.py — 0 comments posted, 3 evaluated, 3 filtered
  • line 21: The migration iterates over all FileAttachment records using a queryset in a for loop (line 21) without using .iterator(). For databases with large numbers of records, this loads all objects into memory at once, potentially causing MemoryError or excessive memory consumption that could crash the migration process (a minimal .iterator() sketch follows this file's issues). [ Low confidence ]
  • line 26: Inconsistency between determine_source_type and get_task_from_attachment for the output case: when attachment.output exists but attachment.output.task is None or the task attribute doesn't exist, determine_source_type returns 'Task' (line 60-61 in referenced code) but get_task_from_attachment returns None. This creates Attachment records with source_type='Task' but task=None, which may violate data integrity constraints or cause issues in code that assumes task-sourced attachments have a valid task reference. [ Already posted ]
  • line 45: Using ignore_conflicts=True on line 47 silently discards any records that conflict on unique constraints. If the migration is run multiple times or if there are unexpected duplicates, data loss could occur without any warning or logging, making debugging migration issues difficult. [ Low confidence ]
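
A minimal sketch of the memory-bounded loop suggested for the line 21 issue, assuming FileAttachment is the model the migration iterates; the per-row handler name is illustrative:

# Stream rows in fixed-size batches instead of materialising the whole queryset.
for attachment in FileAttachment.objects.all().iterator(chunk_size=2000):
    migrate_attachment(attachment)  # hypothetical per-row migration step
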
backend/src/storage/models.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 37: The Attachment model has no validation ensuring that the source_type field matches the non-null foreign key. For example, setting source_type=SourceType.TASK while task=None and workflow is set would create an inconsistent record that could cause unexpected behavior when querying or processing attachments based on source_type. [ Low confidence ]
backend/src/storage/services/attachments.py — 1 comment posted, 8 evaluated, 7 filtered
  • line 59: Task.objects.get(id=task.id) uses BaseSoftDeleteManager which filters is_deleted=False. If the task is soft-deleted between the attachment creation and this permission assignment call (race condition), this will raise Task.DoesNotExist exception that is not handled, causing the permission assignment to fail entirely. [ Already posted ]
  • line 64: The code iterates over task.taskperformer_set.all() without filtering out performers with directly_status=DirectlyStatus.DELETED. The Task model has an exclude_directly_deleted_taskperformer_set() method specifically for this purpose. This could grant file access permissions to users or groups that were explicitly removed as performers. [ Already posted ]
  • line 115: Accessing non-existent attribute template.template_owners. The TemplateOwner model defines related_name='owners' on its template ForeignKey, so the correct accessor is template.owners, not template.template_owners. This will raise AttributeError. [ Already posted ]
  • line 135: Workflow.objects.get(id=workflow.id) uses BaseSoftDeleteManager which filters is_deleted=False. If the workflow is soft-deleted between the attachment creation and this permission assignment call (race condition), this will raise Workflow.DoesNotExist exception that is not handled, causing the permission assignment to fail entirely. [ Already posted ]
  • line 141: In _assign_workflow_permissions, workflow.tasks.all() is called at line 141 but taskperformer_set is not prefetched for those tasks. Then line 143 calls task.taskperformer_set.all() for each task, causing N+1 database queries. While not a crash, this could cause significant performance degradation with many tasks (a prefetch sketch follows this file's issues). [ Low confidence ]
  • line 222: Using bulk_create with ignore_conflicts=True does not populate primary keys on returned objects in many database backends. When iterating over created_attachments at line 228, the attachment objects may have id=None, causing _create_related() to potentially fail when assigning permissions to an attachment without a valid database ID. [ Already posted ]
  • line 288: Accessing user.account in get_user_attachments will raise AttributeError if the user object doesn't have an account attribute, or could return incorrect results if user.account is None, matching attachments with account=NULL. [ Low confidence ]
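
For the N+1 noted at line 141, a rough sketch of prefetching the performer rows up front; model and relation names are taken from the issues above, everything else is assumed:

# One query for the workflow's tasks plus one for all their performers,
# instead of one query per task inside the loop.
workflow = (
    Workflow.objects
    .prefetch_related('tasks__taskperformer_set')
    .get(id=workflow.id)
)
for task in workflow.tasks.all():
    performers = task.taskperformer_set.all()  # served from the prefetch cache
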
backend/src/storage/services/file_sync.py — 0 comments posted, 4 evaluated, 3 filtered
  • line 113: The _generate_file_id method generates a random 32-character string using get_salt(32), but the file_id field on FileAttachment has unique=True. If a collision occurs (unlikely but possible), attachment.save(update_fields=['file_id']) in sync_all_files will raise an IntegrityError that is not caught, causing the sync to fail unexpectedly. [ Already posted ]
  • line 155: The Attachment.objects.get_or_create() call on line 144 returns a tuple (instance, created_bool), but the return value is ignored. The code unconditionally increments stats['created'] on line 155 even when the record already existed and was just retrieved (i.e., when created_bool is False). This causes incorrect statistics reporting. [ Already posted ]
  • line 155: In sync_to_new_attachment_model, the stats['created'] counter is incremented regardless of whether get_or_create actually created a new record or returned an existing one. This results in inaccurate statistics since existing records will be counted as created. [ Already posted ]
backend/src/storage/utils.py — 0 comments posted, 3 evaluated, 3 filtered
  • line 56: Accessing non-existent attribute source.description on Task model. The Task model has description_template and clear_description fields but no description attribute, causing AttributeError at runtime. [ Already posted ]
  • line 60: Accessing non-existent attribute source.description on Workflow model. The Workflow model does not have a description attribute, causing AttributeError at runtime. [ Already posted ]
  • line 64: Accessing non-existent attribute source.description on Template model. The Template model does not have a description attribute, causing AttributeError at runtime. [ Already posted ]
backend/src/storage/views.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 36: In get_queryset, AttachmentService is instantiated with user=self.request.user, but the AttachmentService class in storage/services/attachments.py inherits from BaseModelService and does not define __init__ to accept a user parameter. If BaseModelService does not handle arbitrary keyword arguments, this will raise a TypeError. Even if it silently ignores the kwarg, self.user won't be set on the service instance. [ Low confidence ]
  • line 56: In check_permission, AttachmentService is instantiated with user=request.user, but similar to the issue above, if BaseModelService doesn't properly set this attribute, the check_user_permission method's optimization path (hasattr(self, 'user') and self.user and self.user.id == user_id) will fail to work correctly, falling back to an extra database query unnecessarily, or worse, self.user may not be properly initialized. [ Already posted ]
storage/src/application/use_cases/file_upload.py — 0 comments posted, 3 evaluated, 3 filtered
  • line 68: In UploadFileUseCase.execute(), the file is uploaded to S3 first (lines 68-73), then the database record is created (lines 76-78). If the database commit fails (e.g., constraint violation, connection error), the file remains orphaned in S3 with no cleanup. On retry, a new file_id is generated via uuid4(), leaving the original orphan file permanently in storage with no record pointing to it (a compensating-delete sketch follows this file's issues). [ Already posted ]
  • line 77: The FileRecordRepository.create() method wraps session.add() in try-except for IntegrityError, but session.add() is a synchronous operation that doesn't immediately flush to the database. Constraint violations are only raised during flush() or commit(). Since UploadFileUseCase calls await self._unit_of_work.commit() outside the repository's try-except block, any IntegrityError will bubble up as a raw SQLAlchemy exception rather than being caught and wrapped in DatabaseConstraintError. [ Already posted ]
  • line 81: If self._fastapi_base_url ends with a trailing slash, the generated public_url at line 81 will contain a double slash (e.g., https://example.com//file-id). While many HTTP clients handle this, it produces non-canonical URLs that could cause issues with URL matching or caching. [ Already posted ]
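
One way to address the orphaned-object risk from line 68 is to treat the upload as compensable: if the database step fails, delete the just-uploaded object before re-raising. A sketch only; the delete_file method and exact call signatures are assumptions:

await self._storage_service.upload_file(bucket_name, file_path, file_content, content_type)
try:
    async with self._unit_of_work:
        await self._file_repository.create(file_record)
        await self._unit_of_work.commit()
except Exception:
    # Compensating action: remove the object so a failed commit does not
    # leave an orphan in the bucket (delete_file is assumed to exist).
    await self._storage_service.delete_file(bucket_name, file_path)
    raise
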
storage/src/infra/http_client.py — 0 comments posted, 3 evaluated, 3 filtered
  • line 37: The httpx.AsyncClient() is created without a timeout configuration, meaning HTTP requests can hang indefinitely. When httpx.TimeoutException is caught, the code reports timeout=30.0 but this is misleading since no timeout is actually configured on the client. The exception would only be raised if the server itself closes the connection. [ Already posted ]
  • line 47: When user.auth_type is UserType.ANONYMOUS, no authorization headers are set and the request proceeds silently. If the backend requires some form of identification for anonymous users, this will result in permission check failures that may be difficult to debug since no explicit handling exists for this auth type. [ Low confidence ]
  • line 49: If user.auth_type is UserType.AUTHENTICATED but user.token is None or an empty string, the condition on line 49 evaluates to False and no Authorization header is sent. This could cause silent authentication failures where an authenticated user's request is treated as unauthenticated. [ Already posted ]
storage/src/infra/repositories/file_record_repository.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 41: Exception handling in create() is ineffective: session.add() only stages the ORM object in the session—it does not perform any database I/O. The IntegrityError, OperationalError, and SQLAlchemyDatabaseError exceptions will only be raised when session.flush() or session.commit() is called (outside this method), so these except blocks will never catch database errors. [ Already posted ]
storage/src/infra/repositories/storage_service.py — 0 comments posted, 8 evaluated, 8 filtered
  • line 51: When _storage_type is 'google', the code correctly sets signature_version='s3' on line 30, but the else branch (lines 51-60) is used for any non-'local' storage type, not just 'google'. If a different storage type value is configured, GCS-specific parameters (GCS_S3_ENDPOINT, etc.) will be used incorrectly, potentially causing connection failures. [ Already posted ]
  • line 119: StorageError is instantiated incorrectly on line 119-121. The constructor expects an error_code_key (a key from DOMAIN_ERROR_CODES dictionary), but a formatted message string is passed instead. This will cause a KeyError when DOMAIN_ERROR_CODES[error_code_key] is executed. Should use StorageError.bucket_create_failed(details=str(e)) instead. [ Already posted ]
  • line 128: content_type can be None (line 92), but it's passed directly to s3.put_object(ContentType=content_type) on line 128. Some S3-compatible backends may reject None for ContentType, causing a ClientError or TypeError. Consider omitting the parameter when content_type is None or defaulting to 'application/octet-stream'. [ Already posted ]
  • line 131: StorageError is instantiated incorrectly on line 131-133. The constructor expects an error_code_key, but MSG_STORAGE_010.format(details=str(e)) passes a formatted message string. This will raise a KeyError at runtime. Should use StorageError.upload_failed(details=str(e)) instead. [ Already posted ]
  • line 174: StorageError is instantiated incorrectly on line 174. A plain error message string is passed instead of an error_code_key. This will cause a KeyError. Should use StorageError.file_not_found_in_storage(file_path=file_path, bucket_name=bucket_name) instead. [ Already posted ]
  • line 174: StorageError constructor expects an error_code_key (e.g., 'STORAGE_FILE_NOT_FOUND') as the first argument, but a formatted message string is passed instead. This will cause a KeyError when DOMAIN_ERROR_CODES[error_code_key] is accessed. Should use StorageError.file_not_found_in_storage(file_path, bucket_name) instead. [ Already posted ]
  • line 175: StorageError is instantiated incorrectly on line 175-176. MSG_STORAGE_011.format(details=str(e)) passes a formatted message string instead of an error_code_key. This will raise a KeyError. Should use StorageError.download_failed(details=str(e)) instead. [ Already posted ]
  • line 175: StorageError constructor expects an error_code_key as the first argument, but MSG_STORAGE_011.format(details=str(e)) passes a formatted message string. This will cause a KeyError when looking up the key in DOMAIN_ERROR_CODES. Should use StorageError.download_failed(str(e)) instead. [ Already posted ]
storage/src/presentation/api/files.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 131: The media_type parameter is set to file_record.content_type which can be None per the FileRecord entity. While StreamingResponse may handle None gracefully, this could result in responses without a proper Content-Type header, potentially causing client-side issues when processing the file. [ Already posted ]
  • line 134: The Content-Disposition header is constructed using file_record.filename which can be None according to the FileRecord entity definition. This will result in a malformed header value attachment; filename="None" being sent to clients. [ Already posted ]
storage/src/shared_kernel/auth/guest_token.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 34: The __getitem__ method return type annotation is str | int | None, but self.payload is typed as dict[str, Any], so the actual return value could be any type. Additionally, accessing a non-existent key will raise KeyError with no indication in the signature, which may surprise callers expecting None for missing keys. [ Already posted ]
storage/src/shared_kernel/auth/public_token.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 22: The condition if token: on line 22 treats an empty string "" as falsy, causing a new token to be generated instead of validating it. If a caller passes an empty string, they likely expect a ValueError for incorrect length (0 != expected), not silent token generation. The condition should be if token is not None: to match the type hint str | None and properly validate empty strings. [ Already posted ]
storage/src/shared_kernel/auth/redis_client.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 31: The return type annotation dict[str, Any] | None may not match what pickle.loads(value) actually returns. If Redis contains pickled data that isn't a dict (e.g., a list, string, or custom object), the method will return a value that doesn't match its declared type, potentially causing AttributeError or TypeError in callers that expect a dict. [ Low confidence ]
  • line 50: pickle.loads() can raise many exceptions beyond pickle.UnpicklingError, including AttributeError, ImportError, ModuleNotFoundError, EOFError, and TypeError (e.g., when the pickled object's class is unavailable or the data is corrupted). These exceptions will propagate uncaught instead of being wrapped in RedisOperationError, causing unexpected crashes. The except block on line 50 should catch a broader set of exceptions, such as Exception or at minimum add the common deserialization failures. [ Already posted ]
storage/src/shared_kernel/auth/services.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 29: Cookie-based public token authentication will always fail. The authenticate_public_token method receives the raw cookie value from request.cookies.get('public-token') and passes it directly to PublicAuthService.get_token(). However, get_token() expects a header format like "Token <actual_token>" (checking for parts[0] != cls.HEADER_PREFIX on line 29). Cookie values don't have this prefix, so the check len(parts) != cls.HEADER_PARTS or parts[0] != cls.HEADER_PREFIX will always return None for cookie-based authentication. [ Already posted ]
storage/src/shared_kernel/auth/token_auth.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 26: The encrypt method is declared as async but contains no await expressions - it only performs synchronous hashlib.pbkdf2_hmac() operations. While this works, it unnecessarily wraps synchronous CPU-bound work in a coroutine, which could cause performance issues under load as it won't yield to the event loop during the potentially expensive PBKDF2 computation (a to_thread sketch follows below). [ Low confidence ]
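
If the PBKDF2 call ever needs to stay async without blocking, the CPU-bound work can be pushed to a worker thread; a sketch with illustrative parameters, not the service's actual settings:

import asyncio
import hashlib

async def encrypt(self, raw_token: str, salt: bytes) -> bytes:
    # Run the key derivation off the event loop; the algorithm and iteration
    # count here are placeholders.
    return await asyncio.to_thread(
        hashlib.pbkdf2_hmac, 'sha256', raw_token.encode(), salt, 100_000,
    )
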
storage/src/shared_kernel/database/models.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 22: Schema mismatch: FileRecordORM.content_type is defined with nullable=False but FileRecord.content_type is typed as str | None. When entity_to_orm maps an entity with content_type=None, the database will raise an IntegrityError on flush/commit. [ Already posted ]
  • line 23: Schema mismatch: FileRecordORM.filename is defined with nullable=False but FileRecord.filename is typed as str | None. When entity_to_orm maps an entity with filename=None, the database will raise an IntegrityError on flush/commit. [ Already posted ]
storage/src/shared_kernel/exceptions/base_exceptions.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 105: The to_response method accepts arbitrary **kwargs and passes them to ErrorResponse, but ErrorResponse is a dataclass that only accepts specific fields (timestamp, request_id). Passing any other keyword argument will raise a TypeError at runtime (e.g., to_response(user_id=123) would fail with TypeError: __init__() got an unexpected keyword argument 'user_id'). [ Already posted ]
storage/src/shared_kernel/exceptions/exception_handler.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 63: The validation_exception_handler always uses VALIDATION_ERROR_CODES['MISSING_REQUIRED_FIELD'] regardless of the actual validation error type. Pydantic ValidationError can contain many different error types (type errors, value errors, constraint violations, etc.), so the error code and message returned may be misleading or incorrect for non-missing-field validation failures. [ Already posted ]
storage/src/shared_kernel/middleware/auth_middleware.py — 0 comments posted, 4 evaluated, 4 filtered
  • line 122: In authenticate_guest_token, accessing guest_token['user_id'] and guest_token['account_id'] (lines 122-127) will raise KeyError if these keys are missing from the JWT payload, since GuestToken.__getitem__ directly accesses self.payload[key]. While this is caught by the except block, it means any JWT without these specific claims will silently fail authentication even if it's otherwise valid. [ Low confidence ]
  • line 122: The conditional if guest_token['user_id'] (line 123) uses truthiness to decide whether to convert to int. If user_id in the JWT payload is a falsy but valid value like 0 or "0" (empty string after stripping), the user_id will incorrectly be set to None instead of the actual value. [ Already posted ]
  • line 141: The authenticate_public_token method passes raw_token directly to PublicAuthService.get_token(), but get_token() expects a header string in the format "Token <token>" (it splits the input and checks parts[0] != cls.HEADER_PREFIX). When authenticating via the 'public-token' cookie (line 87), the cookie value is just the raw token without the "Token " prefix, causing get_token() to always return None and public token cookie authentication to silently fail. [ Already posted ]
  • line 202: The to_dict() method is called on the ErrorResponse object returned by exc.to_response(), but ErrorResponse is not defined in the visible code and may not have a to_dict() method. This would cause an AttributeError at runtime when authentication fails. [ Low confidence ]
storage/src/shared_kernel/permissions.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 109: CombinedPermissions.get_error_message() always returns a generic 'Access denied' message, but when a permission check fails, the specific permission that failed has its own get_error_message() which is more informative. The __call__ method in BasePermission uses self.get_error_message(), so when CombinedPermissions is called and fails, the user gets a generic message rather than the specific error from the failing permission (e.g., 'Authentication required' from IsAuthenticated). [ Low confidence ]
storage/src/shared_kernel/uow/unit_of_work.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 35: In UnitOfWork.commit(), calling await self._session.commit() while inside the session.begin() context manager commits the transaction. The subsequent __aexit__ call from the context manager may behave unexpectedly if the session's transaction state tracking doesn't align with SQLAlchemy's expected lifecycle. This could cause issues in error scenarios where the explicit commit fails but __aexit__ still runs. [ Already posted ]

…ion API

- Add FileAttachment.access_type field with account/restricted options
- Add FileAttachment.file_id field for unique file identification
- Create FileAttachmentPermission model for user-specific file access
- Implement AttachmentService.check_user_permission method
- Add AttachmentsViewSet with check-permission endpoint
- Add Redis caching for public template authentication
- Add response_forbidden method to BaseResponseMixin
- Include comprehensive test coverage for permission checking

This enables fine-grained file access control with two access types (a rough usage sketch follows the list):
- account: accessible by all users in the same account
- restricted: accessible only by users with explicit permissions
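
A rough sketch of the decision the check-permission endpoint makes; the access_type values and the FileAttachmentPermission model come from the list above, while field names and the import path are assumptions:

from src.processes.models import FileAttachmentPermission  # path assumed

def user_can_access(attachment, user) -> bool:
    if attachment.access_type == 'account':
        # account: any user in the owning account may read the file
        return attachment.account_id == user.account_id
    # restricted: require an explicit, non-deleted per-user permission row
    return FileAttachmentPermission.objects.filter(
        attachment=attachment,
        user=user,
        is_deleted=False,
    ).exists()
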
…integration

- Add FastAPI-based file upload and download endpoints with streaming support
- Implement Clean Architecture with domain entities, use cases, and repositories
- Add authentication middleware with JWT token validation and Redis caching
- Integrate Google Cloud Storage S3-compatible API for file storage
- Add comprehensive error handling with custom exceptions and HTTP status codes
- Implement file access permissions validation through external HTTP service
- Add database models and Alembic migrations for file metadata storage
- Include Docker containerization with docker-compose for local development
- Add comprehensive test suite with unit, integration, and e2e tests
- Configure pre-commit hooks with ruff, mypy, and pytest for code quality
…e_access_rights' into 41526__сreate_file_service

# Conflicts:
#	backend/src/processes/enums.py
#	backend/src/processes/models/__init__.py
#	backend/src/processes/models/workflows/attachment.py
#	backend/src/processes/services/attachments.py
#	backend/src/processes/tests/test_services/test_attachments.py
@EBirkenfeld EBirkenfeld self-assigned this Oct 24, 2025
@EBirkenfeld EBirkenfeld added the Backend API changes request label Oct 24, 2025
)

# Upload file
await s3.put_object(

Both StorageService.upload_file (https://github.com/pneumaticapp/pneumaticworkflow/pull/53/files#diff-8073b0f56943dafd3f23e27584a1233a203192e84be5f7a0685233c6ceccc7f1R1) and the put_object call in storage/src/infra/repositories/storage_service.py:124 forward a content_type that can be None directly to aioboto3 as ContentType=None, causing a ParamValidationError instead of a ClientError and bypassing StorageError wrapping. Consider conditionally including ContentType only when content_type is not None or defaulting it to a safe MIME type (e.g., application/octet-stream).

-                await s3.put_object(
-                    Bucket=bucket_name,
-                    Key=file_path,
-                    Body=file_content,
-                    ContentType=content_type,
-                )
+                params = {
+                    'Bucket': bucket_name,
+                    'Key': file_path,
+                    'Body': file_content,
+                }
+                if content_type is not None:
+                    params['ContentType'] = content_type
+                await s3.put_object(**params)


"""Authenticate public token"""
try:
token = PublicAuthService.get_token(raw_token)
if token:

authenticate_public_token passes the raw token into PublicAuthService.get_token(...), but get_token expects a header-like value in the form "Token <value>". This causes valid cookie/header values containing just the token to fail parsing and the method to silently return None.

Consider constructing the header-formatted string using PublicAuthService.HEADER_PREFIX before calling get_token, or alternatively consider extending get_token to accept raw tokens as well and document the accepted formats.

-            token = PublicAuthService.get_token(raw_token)
+            token = PublicAuthService.get_token(f"{PublicAuthService.HEADER_PREFIX} {raw_token}")

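If the alternative of extending get_token is preferred, it could accept both the header form and a bare cookie value; a sketch assuming HEADER_PREFIX and HEADER_PARTS are the class attributes referenced above:

@classmethod
def get_token(cls, value: str | None) -> str | None:
    # Accept either 'Token <value>' (Authorization header) or a bare token (cookie).
    if not value:
        return None
    parts = value.split()
    if len(parts) == cls.HEADER_PARTS and parts[0] == cls.HEADER_PREFIX:
        return parts[1]
    if len(parts) == 1:
        # Bare cookie value without the 'Token ' prefix.
        return parts[0]
    return None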

error_code = e.response['Error']['Code']
if error_code == 'NoSuchKey':
raise StorageError(
f'File {file_path} not found in bucket {bucket_name}'

StorageError is being initialized with formatted or plain message strings as the error_code_key in multiple methods—create_bucket, upload_file, and download_file. Since the first positional argument is used to look up DOMAIN_ERROR_CODES, this will raise a KeyError and hide the original error. Use the dedicated classmethods (e.g., StorageError.bucket_create_failed, StorageError.upload_failed, StorageError.file_not_found_in_storage) or pass a valid error code key with the message in details to preserve context.

-                    raise StorageError(
-                        f'File {file_path} not found in bucket {bucket_name}'
-                    )
+                    raise StorageError.file_not_found_in_storage(file_path, bucket_name)

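Applied across create_bucket, upload_file, and download_file, the fix would look roughly like this; a sketch assuming the classmethods carry the signatures referenced in the filtered issues (details=... for failures, (file_path, bucket_name) for the not-found case):

from botocore.exceptions import ClientError

try:
    await s3.create_bucket(Bucket=bucket_name)
except ClientError as e:
    raise StorageError.bucket_create_failed(details=str(e)) from e

try:
    await s3.put_object(Bucket=bucket_name, Key=file_path, Body=file_content)
except ClientError as e:
    raise StorageError.upload_failed(details=str(e)) from e

try:
    response = await s3.get_object(Bucket=bucket_name, Key=file_path)
except ClientError as e:
    if e.response['Error']['Code'] == 'NoSuchKey':
        raise StorageError.file_not_found_in_storage(file_path, bucket_name) from e
    raise StorageError.download_failed(details=str(e)) from e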

….toml to dedicated config files

- Move mypy configuration from pyproject.toml to mypy.ini for better separation of concerns
- Simplify ruff.toml configuration by removing extensive rule selections and using "ALL" selector
- Update ruff target version from py37 to py311 to match project Python version
- Remove redundant ruff configuration from pyproject.toml to avoid duplication
- Apply code formatting fixes across entire codebase
- Standardize import statements and code style according to new linting rules
- Update test files to comply with new formatting standards
),
'STORAGE_OPERATION_FAILED': ErrorCode(
code='STORAGE_001',
message=MSG_STORAGE_001,

DOMAIN_ERROR_CODES['STORAGE_OPERATION_FAILED'] uses MSG_STORAGE_001 (upload-specific), which misleads for non-upload failures. Consider using a neutral message (e.g., a generic string or a new constant) so the fallback isn’t tied to uploads.

-        message=MSG_STORAGE_001,
+        message='Storage operation failed',



Issue in backend/src/processes/services/attachments.py:69:

check_user_permission() depends on file_id, but attachments are created without one, so queries with None can match unrelated rows or deny valid access. Consider generating and assigning a unique, non-null file_id on create/clone, and short-circuit when file_id is None.

-     def _create_attachment(
-         self,
-         name: str,
-         url: str,
-         size: int,
-         thumbnail_url: Optional[str] = None,
-     ) -> FileAttachment:
+     def _create_attachment(
+         self,
+         name: str,
+         url: str,
+         size: int,
+         file_id: str,
+         thumbnail_url: Optional[str] = None,
+     ) -> FileAttachment:

Inside _create_attachment, pass the identifier through to the model:

+             file_id=file_id,

At the create/clone call sites, generate it:

+         file_id = get_salt(30)
+             file_id=file_id,
+             file_id=get_salt(30),

And in check_user_permission, short-circuit when the attachment has no file_id:

+         if file_id is None:
+             return False


exc: ValidationError,
) -> JSONResponse:
"""Handle Pydantic validation errors"""
error_code = VALIDATION_ERROR_CODES['MISSING_REQUIRED_FIELD']

validation_exception_handler always returns MISSING_REQUIRED_FIELD for any pydantic.ValidationError, which misreports common cases (e.g., wrong type, extra fields). Consider deriving the code from exc.errors() (e.g., use MISSING_REQUIRED_FIELD only when all errors are missing, otherwise a generic VALIDATION_FAILED).

-        error_code = VALIDATION_ERROR_CODES['MISSING_REQUIRED_FIELD']
-        details = str(exc.errors())
+        errors = exc.errors()
+        missing_only = all(err.get('type') == 'missing' for err in errors)
+        error_code = VALIDATION_ERROR_CODES['MISSING_REQUIRED_FIELD'] if missing_only else VALIDATION_ERROR_CODES['VALIDATION_FAILED']
+        details = str(errors)


…ore rule in ruff configuration

- Update docstrings across various modules to ensure consistency and clarity.
- Remove unused "D" rule from ruff.toml configuration.
- Enhance readability and maintainability of the codebase.
"""
# Read file content
file_content = await file.read()

upload_file reads the whole payload into RAM before checking MAX_FILE_SIZE, which can exhaust memory on oversized uploads. Consider streaming in chunks and abort once the limit is exceeded, then assemble the bytes only if within the limit.

-    file_content = await file.read()
-    file_size = len(file_content)
-
-    if file_size > settings.MAX_FILE_SIZE:
-        raise FileSizeExceededError(file_size, settings.MAX_FILE_SIZE)
-
+    total_size = 0
+    chunks: list[bytes] = []
+    while True:
+        chunk = await file.read(settings.CHUNK_SIZE)
+        if not chunk:
+            break
+        total_size += len(chunk)
+        if total_size > settings.MAX_FILE_SIZE:
+            raise FileSizeExceededError(total_size, settings.MAX_FILE_SIZE)
+        chunks.append(chunk)
+    file_content = b"".join(chunks)
+    file_size = total_size


from collections.abc import AsyncGenerator
from typing import Annotated

from fastapi import (

filename (and content_type) can be None/empty in both upload_file and entity_to_orm, which forwards null into non-nullable DB columns and triggers an IntegrityError on flush. Consider validating and rejecting requests with missing metadata (e.g., return 400) or applying safe defaults before constructing the UploadFileCommand or mapping to the ORM so you never insert null into non-nullable columns.

-    from fastapi import (
-        APIRouter,
-        Depends,
-        File,
-        UploadFile,
-    )
+    from fastapi import (
+        APIRouter,
+        Depends,
+        File,
+        UploadFile,
+        HTTPException,
+    )
+    if not file.filename or not file.filename.strip():
+        raise HTTPException(status_code=400, detail='filename is required')


…ling for consistency

- Adjust import paths in test files to ensure they reference the correct locations.
- Replace instances of FileNotFoundError with DomainFileNotFoundError for better clarity in exception handling.
- Streamline fixture definitions and improve code readability across various test modules.
… configuration

- Update docstrings across various test files for consistency and clarity.
- Add new linting rules in ruff.toml for improved code quality.
- Enhance readability and maintainability of the codebase by refining fixture definitions and mock implementations.
…line permission handling

- Refactor the AuthenticationMiddleware to enhance error handling and response formatting.
- Update permission classes to use direct Request type hints instead of string annotations.
- Consolidate permission checks into FastAPI dependency wrappers for better clarity and usability.
- Remove unused exception classes and error messages to clean up the codebase.
- Adjust test cases to reflect changes in authentication and permission handling.
async for chunk in file_stream:
yield chunk

return StreamingResponse(

Content-Disposition uses file_record.filename unescaped, so quotes or CR/LF in uploads can break the header or enable header injection. Consider sanitizing the filename (e.g., strip CR/LF and quotes, or use RFC 6266 filename* encoding) before building the header.

+    safe_name = (file_record.filename or '').replace('"', '').replace('\r', '').replace('\n', '')
-            'Content-Disposition': (
-                f'attachment; filename="{file_record.filename}"'
-            ),
+            'Content-Disposition': f'attachment; filename="{safe_name}"',

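For non-ASCII filenames, the RFC 6266 route mentioned above would look roughly like this; a sketch, not the service's actual helper:

from urllib.parse import quote

def content_disposition(filename: str | None) -> str:
    # Strip characters that can break or inject into the header, then provide
    # both a plain ASCII fallback and an RFC 6266 filename* parameter.
    safe = (filename or 'download').replace('"', '').replace('\r', '').replace('\n', '')
    ascii_name = safe.encode('ascii', 'ignore').decode() or 'download'
    return (
        f'attachment; filename="{ascii_name}"; '
        f"filename*=UTF-8''{quote(safe, safe='')}"
    )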

…exception tests

- Remove unused infrastructure error codes from error_codes.py to streamline the codebase.
- Update the AuthenticationMiddleware constructor to use direct FastAPI type hints for clarity.
- Add new tests for validation exceptions, including file size and storage errors, to improve coverage and ensure accurate error handling.
"""
self.auth_type = auth_user.auth_type
self.user_id = auth_user.user_id
self.account_id: int = auth_user.account_id # type: ignore[assignment]

# type: ignore[assignment] hides a possible None for account_id, so AuthenticatedUser.account_id: int can be None at runtime. Consider validating in __init__ and raising AuthenticationError if account_id is None instead of ignoring the type.

-        self.account_id: int = auth_user.account_id  # type: ignore[assignment]
+        account_id = auth_user.account_id
+        if account_id is None:
+            raise AuthenticationError(details='User not authenticated')
+        self.account_id = account_id


except httpx.RequestError as e:
raise HttpClientError(
url=self.base_url,
details=MSG_EXT_014.format(details=str(e)),

details is already formatted with MSG_EXT_014 before passing to HttpClientError, which formats its own message again. This doubles the text (e.g., "... failed: HTTP request failed: ..."). Consider passing the raw exception string so it’s formatted once.

Suggested change
details=MSG_EXT_014.format(details=str(e)),
details=str(e),


token: Optional token string.
"""
if token:

if token: treats "" as falsy, generating a new token instead of validating length. Consider if token is not None: so empty strings are validated and raise ValueError.

Suggested change
if token:
if token is not None:


Comment on lines +95 to +99
# Convert all kwargs values to strings for ErrorResponse
str_kwargs = {
key: str(value) if value is not None else None
for key, value in kwargs.items()
}

to_response ignores self.kwargs, so context captured at construction (e.g., file_id, user_id) never reaches the response. Consider merging self.kwargs with method kwargs (with method args overriding) before building ErrorResponse.

-         # Convert all kwargs values to strings for ErrorResponse
-         str_kwargs = {
-             key: str(value) if value is not None else None
-             for key, value in kwargs.items()
-         }
+         # Convert all kwargs values to strings for ErrorResponse
+         merged_kwargs = {**self.kwargs, **kwargs}
+         str_kwargs = {
+             key: str(value) if value is not None else None
+             for key, value in merged_kwargs.items()
+         }


Comment on lines +32 to +34
def __getitem__(self, key: str) -> str | int | None:
"""Get token payload item."""
return self.payload[key]

__getitem__’s return type doesn’t match payload’s dict[str, Any], and it raises KeyError for missing keys. Consider either documenting the dict-like KeyError behavior, or returning Optional[Any] via payload.get(key) to align the type and avoid surprises.

Suggested change
def __getitem__(self, key: str) -> str | int | None:
"""Get token payload item."""
return self.payload[key]
def __getitem__(self, key: str) -> Any | None:
"""Get token payload item."""
return self.payload.get(key)


AccountBaseMixin,
):
class Meta:
unique_together = ('user', 'attachment')

Soft delete conflicts with unique_together on ('user', 'attachment'). Consider including is_deleted, or use a conditional UniqueConstraint for non-deleted rows, so recreating permissions doesn’t raise IntegrityError.

Suggested change
unique_together = ('user', 'attachment')
unique_together = ('user', 'attachment', 'is_deleted')

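The conditional-constraint alternative mentioned above would look roughly like this (Django UniqueConstraint with a condition; requires a backend with partial-index support such as PostgreSQL, and the constraint name is illustrative):

from django.db import models
from django.db.models import Q

class Meta:
    constraints = [
        models.UniqueConstraint(
            fields=('user', 'attachment'),
            condition=Q(is_deleted=False),
            name='unique_active_user_attachment_permission',
        ),
    ]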

def client(self) -> httpx.AsyncClient:
"""Lazy client initialization."""
if self._client is None:
self._client = httpx.AsyncClient()

Timeout mismatch. HttpTimeoutError reports 30.0s, but the httpx.AsyncClient has no timeout set. Consider configuring the client with the same timeout (or derive it from the exception/client) so the message stays accurate.

Suggested change
self._client = httpx.AsyncClient()
self._client = httpx.AsyncClient(timeout=30.0)


Comment on lines +27 to +35
return FileRecordORM(
file_id=entity.file_id,
filename=entity.filename,
content_type=entity.content_type,
size=entity.size,
user_id=entity.user_id,
account_id=entity.account_id,
created_at=entity.created_at,
)

entity_to_orm() can pass None for content_type/filename into non-nullable columns, causing a NOT NULL violation on persist. Consider failing fast here (e.g., raise if either is None) or document why None is acceptable and align the schema.

-        return FileRecordORM(
-            file_id=entity.file_id,
-            filename=entity.filename,
-            content_type=entity.content_type,
-            size=entity.size,
-            user_id=entity.user_id,
-            account_id=entity.account_id,
-            created_at=entity.created_at,
-        )
+        if entity.content_type is None or entity.filename is None:
+            raise ValueError("`content_type` and `filename` must not be None when persisting")
+        return FileRecordORM(
+            file_id=entity.file_id,
+            filename=entity.filename,
+            content_type=entity.content_type,
+            size=entity.size,
+            user_id=entity.user_id,
+            account_id=entity.account_id,
+            created_at=entity.created_at,
+        )


…nt management, including models, serializers, views, and synchronization commands
Gets cookie domain from settings.
Extracts domain from FRONTEND_URL.
"""
frontend_url = settings.FRONTEND_URL

Accessing settings.FRONTEND_URL will raise if it’s missing. Consider using getattr(settings, 'FRONTEND_URL', None) so absent settings don’t crash this middleware.

Suggested change
frontend_url = settings.FRONTEND_URL
frontend_url = getattr(settings, 'FRONTEND_URL', None)


Comment on lines +144 to +155
Attachment.objects.get_or_create(
file_id=old_attachment.file_id,
defaults={
'account': old_attachment.account,
'access_type': old_attachment.access_type,
'source_type': source_type,
'task': self._get_task(old_attachment),
'workflow': old_attachment.workflow,
'template': None,
},
)
stats['created'] += 1

In file_sync, only increment stats['created'] when Attachment.objects.get_or_create(...) returns created=True to avoid inflated counts.

-                Attachment.objects.get_or_create(
+                _, created = Attachment.objects.get_or_create(
-                stats['created'] += 1
+                stats['created'] += int(created)



# Save record to database
async with self._unit_of_work:
await self._file_repository.create(file_record)

Integrity errors aren’t caught here. session.add() doesn’t raise; constraints fire on flush()/commit(). Consider catching IntegrityError around await self._unit_of_work.commit() or calling flush() inside FileRecordRepository.create() so its try/except wraps the violation.

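A sketch of the second option, flushing inside the repository so its existing try/except actually observes constraint violations; DatabaseConstraintError's signature and the entity_to_orm helper are assumed from the issues above:

from sqlalchemy.exc import IntegrityError

async def create(self, file_record: FileRecord) -> FileRecord:
    try:
        self._session.add(entity_to_orm(file_record))
        # flush() emits the INSERT now, so unique/NOT NULL violations surface
        # here and can be wrapped instead of escaping later from commit().
        await self._session.flush()
    except IntegrityError as e:
        raise DatabaseConstraintError(details=str(e)) from e
    return file_record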

# Get main domain (e.g., pneumatic.app)
parts = hostname.split('.')
if len(parts) >= 2:
return f'.{parts[-2]}.{parts[-1]}'

The domain reduction picks only the last two labels, which breaks on multi‑part TLDs (e.g., example.co.uk) and can target a public suffix, so the cookie is rejected. Consider returning hostname directly (or a configured domain) instead of collapsing to two labels.

Suggested change
return f'.{parts[-2]}.{parts[-1]}'
return hostname


await self._unit_of_work.commit()

# Generate public download URL
public_url = f'{self._fastapi_base_url}/{file_record.file_id}'

public_url can contain a double slash if self._fastapi_base_url ends with /. Consider stripping the trailing slash before concatenation.

Suggested change
public_url = f'{self._fastapi_base_url}/{file_record.file_id}'
public_url = f'{self._fastapi_base_url.rstrip("/")}/{file_record.file_id}'


Comment on lines +122 to +124
user_id=int(guest_token['user_id'])
if guest_token['user_id']
else None,

Using truthiness for guest_token['user_id'] can drop a valid 0. Consider checking is not None before casting to keep 0 intact.

Suggested change
user_id=int(guest_token['user_id'])
if guest_token['user_id']
else None,
user_id=int(guest_token['user_id'])
if guest_token['user_id'] is not None
else None,


return

# Assign permissions to template owners
template_owners = template.template_owners.all()

In _assign_template_permissions, use template.owners and treat owners as UserModels; iterate template.owners.all() (or template.get_owners()) and assign perms directly to each owner to avoid AttributeError.

Suggested change
template_owners = template.template_owners.all()
template_owners = template.owners.all()


Comment on lines +51 to +60
else:
# Parameters for Google Cloud Storage via S3 API
self._client_params = {
'service_name': 's3',
'endpoint_url': self._settings.GCS_S3_ENDPOINT,
'aws_access_key_id': self._settings.GCS_S3_ACCESS_KEY,
'aws_secret_access_key': self._settings.GCS_S3_SECRET_KEY,
'region_name': self._settings.GCS_S3_REGION,
'config': self._config,
}

else applies GCS params for any non-'local' type. Consider checking explicitly for 'google' and failing for unknown types to avoid misconfig/connection errors.

Suggested change
else:
# Parameters for Google Cloud Storage via S3 API
self._client_params = {
'service_name': 's3',
'endpoint_url': self._settings.GCS_S3_ENDPOINT,
'aws_access_key_id': self._settings.GCS_S3_ACCESS_KEY,
'aws_secret_access_key': self._settings.GCS_S3_SECRET_KEY,
'region_name': self._settings.GCS_S3_REGION,
'config': self._config,
}
elif self._storage_type == 'google':
# Parameters for Google Cloud Storage via S3 API
self._client_params = {
'service_name': 's3',
'endpoint_url': self._settings.GCS_S3_ENDPOINT,
'aws_access_key_id': self._settings.GCS_S3_ACCESS_KEY,
'aws_secret_access_key': self._settings.GCS_S3_SECRET_KEY,
'region_name': self._settings.GCS_S3_REGION,
'config': self._config,
}
else:
raise StorageError(f"Unsupported STORAGE_TYPE: {self._storage_type}")



if (auth_header.startswith('Bearer ') or
auth_header.startswith('Token ')):
return auth_header.split(' ')[1]

Line 54 can produce an empty token when Authorization is just Bearer or Token . Consider returning only a non-empty token (else None).

Suggested change
return auth_header.split(' ')[1]
token = auth_header.split(' ', 1)[1]
if token.strip():
return token.strip()
return None


…t management, including model, serializer, and service tests
users_set = set()

# Get task performers via intermediate model
task_performers = task.taskperformer_set.all()

Permissions include deleted performers. Suggest filtering with task.taskperformer_set.exclude_directly_deleted() instead of .all() so soft-deleted users/groups don’t get access.

Suggested change
task_performers = task.taskperformer_set.all()
task_performers = task.taskperformer_set.exclude_directly_deleted()


Comment on lines +113 to +115
def _generate_file_id(self) -> str:
"""Generates unique file_id."""
return get_salt(32)

_generate_file_id returns a random ID without checking collisions. With unique=True, a rare collision would raise IntegrityError on save and abort the sync. Consider generating an unused ID (retry a few times, then fail) to avoid unexpected failures.

Suggested change
def _generate_file_id(self) -> str:
"""Generates unique file_id."""
return get_salt(32)
def _generate_file_id(self) -> str:
"""Generates unique file_id."""
for _ in range(5):
candidate = get_salt(32)
if not FileAttachment.objects.filter(file_id=candidate).exists():
return candidate
raise ValueError("Could not generate unique file_id")


return

# Reload workflow from DB to get latest ManyToMany fields
workflow = Workflow.objects.prefetch_related(

Refetching models via default managers excludes soft-deleted rows and can raise DoesNotExist. Suggest using _base_manager when soft-deleted records should be included, or guard with a null-safe lookup and early return.

Suggested change
workflow = Workflow.objects.prefetch_related(
workflow = Workflow._base_manager.prefetch_related(


…ice and update tests for event and output properties
Comment on lines +169 to +174
for user in users_set:
assign_perm(
'storage.view_file_attachment',
user,
self.instance,
)

Group performers aren’t granted permissions in _assign_workflow_permissions. Only users get access, so group-only performers can’t view restricted workflow attachments. Consider also assigning storage.view_file_attachment to each performer.group, similar to _assign_task_permissions.

Suggested change
for user in users_set:
assign_perm(
'storage.view_file_attachment',
user,
self.instance,
)
for user in users_set:
assign_perm(
'storage.view_file_attachment',
user,
self.instance,
)
# Assign permissions to groups
for task in tasks:
for performer in task.taskperformer_set.all():
if performer.group:
assign_perm(
'storage.view_file_attachment',
performer.group,
self.instance,
)

