Skip to content

feat: Check for __all__ presence in __init__.py files#766

Draft
AcquaDiGiorgio wants to merge 10 commits intoDIRACGrid:mainfrom
AcquaDiGiorgio:issue-426-__all__-in-__init__
Draft

feat: Check for __all__ presence in __init__.py files#766
AcquaDiGiorgio wants to merge 10 commits intoDIRACGrid:mainfrom
AcquaDiGiorgio:issue-426-__all__-in-__init__

Conversation

@AcquaDiGiorgio
Copy link
Contributor

See #426

I tried something more sofisticated in python using importlib to check the variables in the module, both instantiated and imported, but there were multiple situations that were extremely difficult to check.

We also might want to set a common way of defining this variable, using either tuples or lists:

  • diracx/routers/health/__init__.py: __all__ = ["router"]
  • diracx/db/os/__init__.py __all__ = ("JobParametersDB",)

@read-the-docs-community
Copy link

read-the-docs-community bot commented Feb 5, 2026

Documentation build overview

📚 diracx | 🛠️ Build #31386613 | 📁 Comparing 613851c against latest (5e30814)


🔍 Preview build

No files changed.

@aldbr aldbr linked an issue Feb 5, 2026 that may be closed by this pull request
@aldbr
Copy link
Contributor

aldbr commented Feb 6, 2026

We also might want to set a common way of defining this variable, using either tuples or lists:

I didn't see any standard way of defining __all__ but indeed, it would be better if consistent.
From what I observed, the list is generally preferred (https://stackoverflow.com/questions/66098828/using-list-instead-of-tuple-in-module-all).

@chrisburr chrisburr self-requested a review February 6, 2026 09:36
chrisburr

This comment was marked as off-topic.

chrisburr

This comment was marked as off-topic.

chrisburr

This comment was marked as off-topic.

Copy link
Member

@chrisburr chrisburr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another fake review for me to be able to test something 😄

@DIRACGridBot DIRACGridBot marked this pull request as draft February 7, 2026 06:06
@AcquaDiGiorgio
Copy link
Contributor Author

AcquaDiGiorgio commented Feb 9, 2026

I don't know how much of an improvement this is, but we might not want to use importlib for this task.

To import an __init__.py file, the only way I was able to find is doing:

spec = importlib.util.spec_from_file_location(module_name, file)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

Here, module_name would be for example, diracx.db.sql.utils for the file diracx-db/src/diracx/db/sql/utils/__init__.py. With this, importlib is able to resolve relative paths.

While importing the module, if the __all__ dunder is defined after the imports, they are resolved first and raise an Exception if the module to be imported is not found. If we want this for the gubbins extensions, we will need to use the gubbins pixi environment, because gubbins.db.sql.utils does not exist in the default one.

However, as a counterpart of this method, importing the module each time is quite slow, and I don't see any issues on using regex.

Also, I found a wild __init__.py file in the tests for the db: diracx-db/tests/pilot_agents/__init__.py. Is this intentional?

@AcquaDiGiorgio
Copy link
Contributor Author

Note: I keep the PR as a draft to not interrupt whatever test you are doing, but is ready to review.

@aldbr
Copy link
Contributor

aldbr commented Feb 9, 2026

Also, I found a wild init.py file in the tests for the db: diracx-db/tests/pilot_agents/init.py. Is this intentional?

No, I don't think so. Can you remove it please?

@AcquaDiGiorgio AcquaDiGiorgio marked this pull request as ready for review February 16, 2026 11:01
@AcquaDiGiorgio
Copy link
Contributor Author

This every dependency I have extracted from the imports on every file:

├─┬ diracx
| ├─┬ diracx.core
| | ├─┬ diracx.core.utils
│ │ │ ├──> EXPIRES_GRACE_SECONDS
│ │ │ ├──> serialize_credentials
│ │ │ ├──> dotenv_files_from_environment
│ │ │ ├──> read_credentials
│ │ │ └──> write_credentials
│ │ │ 
| | ├─┬ diracx.core.models
| | | ├─┬ diracx.core.models.auth
│ │ │ │ ├──> TokenResponse
│ │ │ │ ├──> AccessTokenPayload
│ │ │ │ ├──> RefreshTokenPayload
│ │ │ │ ├──> UserInfo
│ │ │ │ ├──> Metadata
│ │ │ │ ├──> OpenIDConfiguration
│ │ │ │ ├──> InitiateDeviceFlowResponse
│ │ │ │ ├──> GrantType
│ │ │ │ ├──> TokenTypeHint
│ │ │ │ ├──> GroupInfo
│ │ │ │ └──> TokenPayload
│ │ │ │ 
| | | ├─┬ diracx.core.models.search
│ │ │ │ ├──> SearchSpec
│ │ │ │ ├──> VectorSearchOperator
│ │ │ │ ├──> SearchParams
│ │ │ │ ├──> SummaryParams
│ │ │ │ ├──> ScalarSearchOperator
│ │ │ │ ├──> VectorSearchSpec
│ │ │ │ ├──> SortDirection
│ │ │ │ └──> SortSpec
│ │ │ │ 
| | | ├─┬ diracx.core.models.sandbox
│ │ │ │ ├──> SandboxDownloadResponse
│ │ │ │ ├──> SandboxInfo
│ │ │ │ ├──> SandboxUploadResponse
│ │ │ │ └──> SandboxType
│ │ │ │ 
| | | ├─┬ diracx.core.models.job
│ │ │ │ ├──> InsertedJob
│ │ │ │ ├──> HeartbeatData
│ │ │ │ ├──> JobCommand
│ │ │ │ ├──> JobMetaData
│ │ │ │ ├──> JobStatusUpdate
│ │ │ │ ├──> SetJobStatusReturn
│ │ │ │ ├──> JobLoggingRecord
│ │ │ │ ├──> JobStatus
│ │ │ │ ├──> JobAttributes
│ │ │ │ ├──> JobMinorStatus
│ │ │ │ ├──> JobParameters
│ │ │ │ └──> JobStatusReturn
│ │ │ 
| | ├─┬ diracx.core.preferences
│ │ │ ├──> DiracxPreferences
│ │ │ ├──> get_diracx_preferences
│ │ │ └──> OutputFormats
│ │ │ 
| | ├─┬ diracx.core.config
│ │ │ ├──> ConfigSource
│ │ │ ├──> Config
│ │ │ ├──> ConfigSourceUrl
│ │ │ │ 
| | | ├─┬ diracx.core.config.schema
│ │ │ │ ├──> Config
│ │ │ │ ├──> DIRACConfig
│ │ │ │ ├──> GroupConfig
│ │ │ │ ├──> IdpConfig
│ │ │ │ ├──> OperationsConfig
│ │ │ │ ├──> RegistryConfig
│ │ │ │ ├──> UserConfig
│ │ │ │ ├──> Field
│ │ │ │ └──> SupportInfo
│ │ │ 
| │ ├─┬ diracx.core.exceptions
│ │ │ ├──> DiracError
│ │ │ ├──> DiracHttpResponseError
│ │ │ ├──> NotReadyError
│ │ │ ├──> InvalidCredentialsError
│ │ │ ├──> TokenNotFoundError
│ │ │ ├──> AuthorizationError
│ │ │ ├──> IAMClientError
│ │ │ ├──> IAMServerError
│ │ │ ├──> PendingAuthorizationError
│ │ │ ├──> SandboxAlreadyAssignedError
│ │ │ ├──> SandboxNotFoundError
│ │ │ ├──> SandboxAlreadyInsertedError
│ │ │ └──> InvalidQueryError
│ │ │ 
| | ├─┬ diracx.core.extensions
│ │ │ ├──> DiracEntryPoint
│ │ │ ├──> select_from_extension
│ │ │ └──> supports_extending
│ │ │ 
| | ├─┬ diracx.core.settings
│ │ │ ├──> ServiceSettingsBase
│ │ │ ├──> AuthSettings
│ │ │ ├──> DevelopmentSettings
│ │ │ ├──> SandboxStoreSettings
│ │ │ └──> SqlalchemyDsn
│ │ │ 
| | ├─┬ diracx.core.properties
│ │ │ ├──> SecurityProperty
│ │ │ ├──> UnevaluatedProperty
│ │ │ ├──> PROXY_MANAGEMENT
│ │ │ ├──> GENERIC_PILOT
│ │ │ ├──> JOB_ADMINISTRATOR
│ │ │ ├──> NORMAL_USER
│ │ │ └──> JOB_SHARING
│ │ │ 
| | ├─┬ diracx.core.s3
│ │ │ ├──> s3_bucket_exists
│ │ │ ├──> generate_presigned_upload
│ │ │ ├──> s3_bulk_delete_with_retry
│ │ │ └──> s3_object_exists
│ │ │ 
| | ├─┬ diracx.core.resources
│ │ │ └──> find_compatible_platforms
│ │ 
| ├─┬ diracx.client
| | ├─┬ diracx.client.aio
│ │ │ └──> AsyncDiracClient
│ │ │ 
| | ├─┬ diracx.client.models
│ │ │ ├──> DeviceFlowErrorResponse
│ │ │ └──> SandboxInfo
│ │ 
| ├─┬ diracx.db
| | ├─┬ diracx.db.exceptions
│ │ │ └──> DBUnavailableError
│ │ │ 
| | ├─┬ diracx.db.os
│ │ │ ├──> JobParametersDB
│ │ │ │
| | | ├─┬ diracx.db.os.utils
│ │ │ │ └──> BaseOSDB
│ │ │ │ 
| | | ├─┬ diracx.db.os.job_parameters
│ │ │ │ └──> JobParametersDB
│ │ │ 
| | ├─┬ diracx.db.sql
│ │ │ ├──> AuthDB
│ │ │ ├──> JobDB
│ │ │ ├──> JobLoggingDB
│ │ │ ├──> PilotAgentsDB
│ │ │ ├──> SandboxMetadataDB
│ │ │ ├──> TaskQueueDB
│ │ │ │
| | | ├─┬ diracx.db.sql.utils
│ │ │ │ ├──> BaseSQLDB
│ │ │ │ ├──> uuid7_to_datetime
│ │ │ │ ├──> str32
│ │ │ │ ├──> str64
│ │ │ │ ├──> str128
│ │ │ │ ├──> str255
│ │ │ │ ├──> EnumBackedBool
│ │ │ │ ├──> hash
│ │ │ │ ├──> substract_date
│ │ │ │ ├──> uuid7_from_datetime
│ │ │ │ ├──> datetime_now
│ │ │ │ ├──> enum_column
│ │ │ │ ├──> str1024
│ │ │ │ ├──> str512
│ │ │ │ │ 
| | | | ├─┬ diracx.db.sql.utils.functions
│ │ │ │ │ ├──> substract_date
│ │ │ │ │ ├──> utcnow
│ │ │ │ │ └──> days_since
│ │ │ │ │ 
| | | | ├─┬ diracx.db.sql.utils.types
│ │ │ │ │ └──> SmarterDateTime
│ │ │ │ │ 
| | | | ├─┬ diracx.db.sql.utils.base
│ │ │ │ │ └──> BaseSQLDB
│ │ │ │ 
| | | ├─┬ diracx.db.sql.task_queue
| | | | ├─┬ diracx.db.sql.task_queue.db
│ │ │ │ │ └──> TaskQueueDB
│ │ │ │
| | | ├─┬ diracx.db.sql.auth
| | | | ├─┬ diracx.db.sql.auth.schema
│ │ │ │ │ ├──> FlowStatus
│ │ │ │ │ └──> RefreshTokenStatus
│ │ │ │
| | | ├─┬ diracx.db.sql.job
| | | | ├─┬ diracx.db.sql.job.db
│ │ │ │ │ └──> JobDB
│ │ │ │
| | | ├─┬ diracx.db.sql.sandbox_metadata
| | | | ├─┬ diracx.db.sql.sandbox_metadata.db
│ │ │ │ │ └──> SandboxMetadataDB
│ │ │ │
| | | ├─┬ diracx.db.sql.job_logging
| | | | ├─┬ diracx.db.sql.job_logging.db
│ │ │ │ │ └──> JobLoggingDB
│ │ 
| ├─┬ diracx.routers
| | ├─┬ diracx.routers.access_policies
│ │ │ ├──> BaseAccessPolicy
│ │ │ └──> check_permissions
│ │ │ 
| | ├─┬ diracx.routers.dependencies
│ │ │ ├──> DevelopmentSettings
│ │ │ └──> AuthSettings
│ │ │ 
| | ├─┬ diracx.routers.utils
| | | ├─┬ diracx.routers.utils.users
│ │ │ │ ├──> AuthorizedUserInfo
│ │ │ │ └──> verify_dirac_access_token
│ │ 
| ├─┬ diracx.logic
| | ├─┬ diracx.logic.auth
| | | ├─┬ diracx.logic.auth.utils
│ │ │ │ ├──> read_token
│ │ │ │ └──> verify_dirac_refresh_token
│ │ │ │
| | | ├─┬ diracx.logic.auth.management
│ │ │ │ ├──> get_refresh_tokens
│ │ │ │ ├──> revoke_refresh_token_by_jti
│ │ │ │ └──> revoke_refresh_token_by_refresh_token
│ │ │ │ 
| | | ├─┬ diracx.logic.auth.authorize_code_flow
│ │ │ │ ├──> complete_authorization_flow
│ │ │ │ └──> initiate_authorization_flow
│ │ │ │ 
| | | ├─┬ diracx.logic.auth.well_known
│ │ │ │ ├──> get_installation_metadata
│ │ │ │ ├──> get_jwks
│ │ │ │ └──> get_openid_configuration
│ │ │ │ 
| | | ├─┬ diracx.logic.auth.device_flow
│ │ │ │ ├──> do_device_flow
│ │ │ │ ├──> finish_device_flow
│ │ │ │ └──> initiate_device_flow
│ │ │ │ 
| | | ├─┬ diracx.logic.auth.token
│ │ │ │ ├──> create_token
│ │ │ │ ├──> get_oidc_token
│ │ │ │ └──> perform_legacy_exchange
│ │ │ 
| | ├─┬ diracx.logic.jobs
| | | ├─┬ diracx.logic.jobs.sandboxes
│ │ │ │ ├──> SANDBOX_PFN_REGEX
│ │ │ │ ├──> assign_sandbox_to_job
│ │ │ │ ├──> get_job_sandbox
│ │ │ │ ├──> get_job_sandboxes
│ │ │ │ ├──> get_sandbox_file
│ │ │ │ ├──> initiate_sandbox_upload
│ │ │ │ └──> unassign_jobs_sandboxes
│ │ │ │ 
| | | ├─┬ diracx.logic.jobs.submission
│ │ │ │ └──> submit_jdl_jobs
│ │ │ │ 
| | | ├─┬ diracx.logic.jobs.status
│ │ │ │ ├──> add_heartbeat
│ │ │ │ ├──> get_job_commands
│ │ │ │ ├──> reschedule_jobs
│ │ │ │ ├──> set_job_parameters_or_attributes
│ │ │ │ └──> set_job_statuses
│ │ │ │ 
| | | ├─┬ diracx.logic.jobs.query
│ │ │ │ ├──> search
│ │ │ │ └──> summary
│ │ │ │ 
| | | ├─┬ diracx.logic.jobs.utils
│ │ │ │ ├──> check_and_prepare_job
│ │ │ │ └──> make_job_manifest_config
│ │ │ 
| | ├─┬ diracx.logic.task_queues
| | | ├─┬ diracx.logic.task_queues.priority
│ │ │ │ └──> recalculate_tq_shares_for_entity

Sometimes a module is imported, such as from diracx.core.utils import EXPIRES_GRACE_SECONDS and other times a package with __all__ is imported, such as from diracx.db.sql import AuthDB. How do you want to handle __all__s in this case?

I tried to put in the __all__ list every dependency, but for example, the diracx.core package ends up containing the following:

from __future__ import annotations

__all__ = [
    "SecurityProperty",
    "UnevaluatedProperty",
    "PROXY_MANAGEMENT",
    "GENERIC_PILOT",
    "JOB_ADMINISTRATOR",
    "NORMAL_USER",
    "JOB_SHARING",
    "s3_bucket_exists",
    "generate_presigned_upload",
    "s3_bulk_delete_with_retry",
    "s3_object_exists",
    "find_compatible_platforms",
    "ServiceSettingsBase",
    "AuthSettings",
    "DevelopmentSettings",
    "SandboxStoreSettings",
    "SqlalchemyDsn",
    "DiracEntryPoint",
    "select_from_extension",
    "supports_extending",
    "DiracError",
    "DiracHttpResponseError",
    "NotReadyError",
    "InvalidCredentialsError",
    "TokenNotFoundError",
    "AuthorizationError",
    "IAMClientError",
    "IAMServerError",
    "PendingAuthorizationError",
    "SandboxAlreadyAssignedError",
    "SandboxNotFoundError",
    "SandboxAlreadyInsertedError",
    "InvalidQueryError",
    "ConfigSource",
    "Config",
    "ConfigSourceUrl",
    "DiracxPreferences",
    "get_diracx_preferences",
    "OutputFormats",
    "EXPIRES_GRACE_SECONDS",
    "serialize_credentials",
    "EXPIRES_GRACE_SECONDS",
    "serialize_credentials",
    "dotenv_files_from_environment",
    "read_credentials",
    "write_credentials",
]

from .config import Config, ConfigSource, ConfigSourceUrl
from .exceptions import (
    AuthorizationError,
    DiracError,
    DiracHttpResponseError,
    IAMClientError,
    IAMServerError,
    InvalidCredentialsError,
    InvalidQueryError,
    NotReadyError,
    PendingAuthorizationError,
    SandboxAlreadyAssignedError,
    SandboxAlreadyInsertedError,
    SandboxNotFoundError,
    TokenNotFoundError,
)
from .extensions import (
    DiracEntryPoint,
    select_from_extension,
    supports_extending,
)
from .preferences import (
    DiracxPreferences,
    OutputFormats,
    get_diracx_preferences,
)
from .properties import (
    GENERIC_PILOT,
    JOB_ADMINISTRATOR,
    JOB_SHARING,
    NORMAL_USER,
    PROXY_MANAGEMENT,
    SecurityProperty,
    UnevaluatedProperty,
)
from .resources import find_compatible_platforms
from .s3 import (
    generate_presigned_upload,
    s3_bucket_exists,
    s3_bulk_delete_with_retry,
    s3_object_exists,
)
from .settings import (
    AuthSettings,
    DevelopmentSettings,
    SandboxStoreSettings,
    ServiceSettingsBase,
    SqlalchemyDsn,
)
from .utils import (
    EXPIRES_GRACE_SECONDS,
    dotenv_files_from_environment,
    read_credentials,
    serialize_credentials,
    write_credentials,
)

Is this what we want?

@aldbr
Copy link
Contributor

aldbr commented Feb 18, 2026

Here is an example of what we want I think:

  • general rule: consumers should import from submodules (e.g. diracx.logic.auth import do_device_flow, diracx.core.model import TokenPayload). We should only add modules to __all__ if used elsewhere (we want to limit the public API).
  • diracx.(cli/api/client/core/routers/logic/db): __all__ = [] because we don't want to import modules like diracx.logic import device_flow.
  • diracx.db.(sql/os): a bit special but here we keep as-is, consumers "don't get direct access to the modules under sql/os.

I didn't check all the sub directories and modules, it's just to give you a sense of what we should probably aim for.
Any opinion?

@AcquaDiGiorgio
Copy link
Contributor Author

AcquaDiGiorgio commented Feb 18, 2026

Yeah, that makes sense, the example of diracx.logic.auth is 100% clear, my problem is with the modules at the same level of the root __init__.py of each project.

For example, the diracx-routers project:

diracx-routers/src/diracx/routers
├── auth
├── health
├── jobs
├── utils
├── __init__.py
├── access_policies.py
├── configuration.py
├── dependencies.py
├── factory.py
├── fastapi_classes.py
└── otel.py

You said that in this case __all__ must be empty, but should't we discover dependencies from access_policies.py, configuration.py, ..., if any?

If so, those dependencies should be in the the __init__.py of the root package, no?
Or will those dependencies be discovered by importing the modules (diracx.routers.access_policies, diracx.routers.configuration, ...) instead of the package?

@aldbr
Copy link
Contributor

aldbr commented Feb 18, 2026

For example, the diracx-routers project:
You said that in this case __all__ must be empty, but should't we discover dependencies from access_policies.py, configuration.py, ..., if any?
If so, those dependencies should be in the the __init__.py of the root package, no? Or will those dependencies be discovered by importing the modules (diracx.routers.access_policies, diracx.routers.configuration, ...) instead of the package?

Some of them could be if there are meant to be used in the extensions indeed. You can have a look at the lhcbdiracx extension to have an example: https://gitlab.cern.ch/lhcb-dirac/lhcbdiracx

If you have any doubt, you can present the different options with pros and cons 🙂

@AcquaDiGiorgio
Copy link
Contributor Author

AcquaDiGiorgio commented Feb 19, 2026

Okay, after looking through lhcbdiracx a bit , then maybe the best way of proceeding would be:

  • Root packages: __all__ = []
  • Modules at the root of the project: __all__ have to present (e.g.: diracx.routers.access_policies must include it)
  • Subpackages: __all__ present at the __init__.py file, but not in the other modules. When importing, import the names from the package. E.g., instead of from diracx.logic.auth.device_flow import device_flow, do from diracx.logic.auth import device_flow

More or less the same way as you said, but taking into account the modules at root

I could change the pre-commit hook to check modules at the root if we go with this

@AcquaDiGiorgio AcquaDiGiorgio marked this pull request as draft February 19, 2026 14:32
@AcquaDiGiorgio
Copy link
Contributor Author

AcquaDiGiorgio commented Feb 24, 2026

Things to check out

  • Comment at about a closed pr at diracx-core/src/diracx/core/models/__init__.py (TODO: remove after DIRACGrid/DIRAC#8433) [DONE]
  • diracx-routers/src/diracx/routers/jobs/legacy.py is empty
  • diracx.client.patches is a namespace
  • diracx-routers __init__.py has some comments that might not permit an empty all
  • diracx-client __init__.py, what do we do? Should we set the __all__ empty? Will that affect the client generation?

__all__ on other projects

Markitdown

  • Only __init__.py files contain __all__. It contains every single used function, variable, ...
  • Always same level modules imports, never sub-package imports (from .p1.p2 import func)

Django

  • Sometimes, both __init__.py files and module files might contain __all__
  • Other times, no __all__ at all

CPython Lib/

  • Generally, every single module contain __all__ and the __init__.py file contain star imports with the concatenation of each module __all__s.

In summary, no seeming convention

Possible DiracX convention

  • I think having the root __init__.py with an empty __all__ is the best way to proceed
  • To be able to have an API for the root modules, we can setup an __all__ on every module at that level, such as diracx.core.properties (located at diracx/core/properties.py)
  • Inside sub-modules, the __init__.py has to contain the public API of every module at its level, and those modules must not contain __all__

Important info

  • Private objects cannot go to the __all__, for example:
    diracx/core/models/__init__.py
from .replica_map import ReplicaMap

diracx-core/tests/test_replica_map.py

from diracx.core.models.replica_map import (
    _validate_adler32,
    _validate_guid,
    _validate_lfn,
    _validate_pfn,
)
  • To avoid circular imports
    While importing same level modules -> Relative import
    For example, diracx-core:
[diracx]
└─ [core]
   ├─ [config]           # Package
   ├─ [exceptions]       # Package
   ├─ [extensions]       # Package
   ├─ [models]           # Package
   │  └─ [replica_map]   # Module
   ├─ [preferences]      # Module
   ├─ [properties]       # Module
   ├─ [resources]        # Module
   ├─ [s3]               # Module
   ├─ [settings]         # Module
   └─ [utils]            # Module

At diracx/core/config/source.py

...

# vvv Different level vvv
from diracx.core.exceptions import BadConfigurationVersionError
from diracx.core.extensions import DiracEntryPoint, select_from_extension
from diracx.core.utils import TwoLevelCache
# ^^^ --------------- ^^^

# vvv Same level vvv
from .schema import Config  # Located at diracx/core/config/schema.py
# ^^^ ---------- ^^^

...

@AcquaDiGiorgio
Copy link
Contributor Author

This ended up being a huge change, we should discuss it further

@aldbr aldbr requested a review from chrisburr February 26, 2026 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make sure __all__ is present in the __init__ files and up to date

3 participants