migrated arangodb driver to async version #1483

Merged
phenobarbital merged 2 commits into master from or_conditions
Feb 13, 2026

Conversation

@phenobarbital
Owner

@phenobarbital phenobarbital commented Feb 13, 2026

Summary by Sourcery

Migrate the ArangoDB driver and its tests to the async arangoasync client, extend MongoDB driver utilities, and add release and example scripts while bumping the library version to 2.14.0.

New Features:

  • Introduce async ArangoDB driver usage with arangoasync, including async connection handling, collection/graph operations, and AQL query execution.
  • Add MongoDB helper methods for creating indexes, listing collections, inserting and updating documents, and batch inserts.
  • Provide example scripts for async ArangoDB playground usage and timed pandas queries against PostgreSQL.
  • Add a release automation script to bump versions, commit, and tag releases.

Enhancements:

  • Refactor ArangoDB tests to use async mocks and iterators aligned with the async driver interface and updated error messages.
  • Adjust collection creation to use ArangoDB col_type for document and edge collections and update logging configuration to match the new driver.
  • Improve write, document, and graph operations in the ArangoDB driver to work with async methods and async cursor iteration throughout the API.

Tests:

  • Rework ArangoDB test suite to cover async connection lifecycle, AQL queries, document and bulk write operations, graph operations, and performance paths using the new async driver abstractions.

@sourcery-ai
Contributor

sourcery-ai Bot commented Feb 13, 2026

Reviewer's Guide

Migrates the ArangoDB driver and its tests to the async arangoasync client, updating all Arango interactions (connections, collections, graphs, queries, writes) to be fully async-aware, while also adding some Mongo helper methods, a new asyncpg example, a release helper script, and bumping the project version.

Sequence diagram for async ArangoDB connection and query flow

sequenceDiagram
    actor User
    participant ArangoDBDriver
    participant ArangoClient
    participant SystemDatabase as SystemDB__system
    participant TargetDatabase as TargetDB_userdb

    User->>ArangoDBDriver: connection(database)
    activate ArangoDBDriver
    ArangoDBDriver->>ArangoClient: ArangoClient(hosts=url)
    activate ArangoClient
    ArangoDBDriver->>ArangoDBDriver: create Auth(username, password)
    ArangoDBDriver->>ArangoClient: db("_system", auth)
    ArangoClient-->>ArangoDBDriver: SystemDatabase (await)
    deactivate ArangoClient

    ArangoDBDriver->>SystemDatabase: has_database(database_name) (await)
    alt database_missing
        SystemDatabase-->>ArangoDBDriver: False
        ArangoDBDriver->>SystemDatabase: create_database(database_name) (await)
        SystemDatabase-->>ArangoDBDriver: True
    else database_exists
        SystemDatabase-->>ArangoDBDriver: True
    end

    ArangoDBDriver->>ArangoClient: db(database_name, auth) (await)
    activate ArangoClient
    ArangoClient-->>ArangoDBDriver: TargetDatabase
    deactivate ArangoClient

    ArangoDBDriver->>ArangoDBDriver: set _connection, _connected
    deactivate ArangoDBDriver

    User->>ArangoDBDriver: query(sentence, bind_vars)
    activate ArangoDBDriver
    ArangoDBDriver->>TargetDatabase: aql.execute(sentence, bind_vars) (await)
    activate TargetDatabase
    TargetDatabase-->>ArangoDBDriver: async_cursor
    deactivate TargetDatabase

    loop async_iteration
        ArangoDBDriver->>ArangoDBDriver: collect [doc async for doc in cursor]
    end
    ArangoDBDriver-->>User: result list
    deactivate ArangoDBDriver

    User->>ArangoDBDriver: close()
    activate ArangoDBDriver
    ArangoDBDriver->>ArangoClient: close() (await)
    ArangoClient-->>ArangoDBDriver: closed
    deactivate ArangoDBDriver
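
The query half of the diagram (await `aql.execute`, then drain the cursor with `async for`) can be sketched as below. `FakeCursor`, `FakeAQL`, and `FakeDatabase` are hypothetical stand-ins so the snippet runs without a server; they are not the arangoasync classes, and the real driver code may differ.

```python
import asyncio


class FakeCursor:
    """Hypothetical stand-in for an async AQL cursor."""

    def __init__(self, docs):
        self._docs = list(docs)

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        for doc in self._docs:
            await asyncio.sleep(0)  # yield control, as a network fetch would
            yield doc


class FakeAQL:
    """Hypothetical stand-in for the database's AQL interface."""

    async def execute(self, sentence, bind_vars=None):
        # A real client would send the query to the server here.
        limit = (bind_vars or {}).get("limit", 3)
        return FakeCursor({"_key": str(i)} for i in range(limit))


class FakeDatabase:
    aql = FakeAQL()


async def query(db, sentence, bind_vars=None):
    """Await the execute call, then collect the async cursor into a list."""
    cursor = await db.aql.execute(sentence, bind_vars=bind_vars)
    return [doc async for doc in cursor]


async def main():
    sentence = "FOR u IN users LIMIT @limit RETURN u"
    return await query(FakeDatabase(), sentence, {"limit": 2})


result = asyncio.run(main())
print(result)
```

The point of the sketch is the two separate suspension points: one `await` to obtain the cursor and one `async for` to consume it.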

Class diagram for updated async ArangoDB and Mongo drivers

classDiagram
    class ArangoDBDriver {
        - ArangoClient _client
        - Database _connection
        - Auth _auth
        - str _database_name
        - str _auth_method
        - str _username
        - str _password
        + connection(database: str) async
        + close() async
        + use(database: str) async
        + create_database(database: str) async
        + drop_database(database: str) async
        + create_collection(name: str, edge: bool, **kwargs) async
        + drop_collection(name: str) async
        + collection_exists(name: str) bool async
        + create_graph(name: str, edge_definitions: list, orphan_collections: list) async
        + drop_graph(name: str, drop_collections: bool) async
        + graph_exists(name: str) bool async
        + query(sentence: str, bind_vars: dict, **kwargs) async
        + queryrow(sentence: str, bind_vars: dict) async
        + fetch_all(sentence: str, bind_vars: dict) async
        + fetch_one(sentence: str, bind_vars: dict) async
        + fetchval(sentence: str, bind_vars: dict, column: any) async
        + execute(sentence: str, bind_vars: dict) async
        + insert_document(collection: str, document: dict, return_new: bool) async
        + update_document(collection: str, document: dict, return_new: bool) async
        + delete_document(collection: str, document_key: str) async
        + write(collection: str, data: any, batch_size: int) async
        + create_vertex(graph: str, collection: str, vertex: dict) async
        + create_edge(graph: str, collection: str, edge: dict) async
    }

    class ArangoClient {
        + db(name: str, auth: Auth) Database async
        + close() async
    }

    class Database {
        + name str
        + has_database(name: str) bool async
        + create_database(name: str) async
        + delete_database(name: str) async
        + has_collection(name: str) bool async
        + create_collection(name: str, **kwargs) async
        + delete_collection(name: str) async
        + collection(name: str) Collection
        + has_graph(name: str) bool async
        + create_graph(name: str, edge_definitions: list, orphan_collections: list) Graph async
        + delete_graph(name: str, drop_collections: bool) async
        + aql AQLInterface
    }

    class Auth {
        + username str
        + password str
    }

    class Collection {
        + insert(document: dict, return_new: bool) async
        + insert_many(documents: list) async
        + update(document: dict, return_new: bool) async
        + delete(document_key: str) async
    }

    class Graph {
        + vertex_collection(name: str) VertexCollection
        + edge_collection(name: str) EdgeCollection
    }

    class VertexCollection {
        + insert(vertex: dict) async
    }

    class EdgeCollection {
        + insert(edge: dict) async
    }

    class AQLInterface {
        + execute(query: str, bind_vars: dict) Cursor async
    }

    class Cursor {
        + async iteration
    }

    class MongoDriver {
        + create_index(collection_name: str, keys: any, **kwargs) str async
        + create_indexes(collection_name: str, indexes: list, **kwargs) list~str~ async
        + list_collections(filter: dict, **kwargs) list~str~ async
        + insert(collection_name: str, data: any, **kwargs) async
        + update(collection_name: str, filter: dict, update: dict, many: bool, **kwargs) async
        + batch_insert(collection_name: str, data: list, **kwargs) async
        - _select_database() async
    }

    class MongoDatabase {
        + list_collection_names(filter: dict) list~str~ async
        + __getitem__(name: str) MongoCollection
    }

    class MongoCollection {
        + create_index(keys: any, **kwargs) str async
        + create_indexes(indexes: list, **kwargs) list~str~ async
        + insert_one(document: dict, **kwargs) async
        + insert_many(documents: list, **kwargs) async
        + update_one(filter: dict, update: dict, **kwargs) async
        + update_many(filter: dict, update: dict, **kwargs) async
    }

    ArangoDBDriver --> ArangoClient : uses
    ArangoDBDriver --> Database : holds_connection
    ArangoDBDriver --> Auth : uses
    Database --> Collection : exposes
    Database --> Graph : exposes
    Database --> AQLInterface : exposes
    AQLInterface --> Cursor : returns
    Graph --> VertexCollection : exposes
    Graph --> EdgeCollection : exposes

    MongoDriver --> MongoDatabase : uses_via__select_database
    MongoDatabase --> MongoCollection : exposes
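
As a rough illustration of how the `MongoDriver` helpers in the diagram could route to the collection methods, here is a minimal sketch. `FakeCollection`, `insert`, and `update_docs` are hypothetical names for this example; the dispatch rules (list means `insert_many`, `many=True` means `update_many`) are an assumption based on the method signatures shown, not the driver's confirmed logic.

```python
import asyncio


class FakeCollection:
    """Hypothetical stand-in exposing the async methods named in the diagram."""

    def __init__(self):
        self.calls = []

    async def insert_one(self, document, **kwargs):
        self.calls.append(("insert_one", document))

    async def insert_many(self, documents, **kwargs):
        self.calls.append(("insert_many", documents))

    async def update_one(self, filter, update, **kwargs):
        self.calls.append(("update_one", filter, update))

    async def update_many(self, filter, update, **kwargs):
        self.calls.append(("update_many", filter, update))


async def insert(collection, data, **kwargs):
    """Dispatch on the input shape: a single dict or a list of dicts."""
    if isinstance(data, list):
        return await collection.insert_many(data, **kwargs)
    return await collection.insert_one(data, **kwargs)


async def update_docs(collection, filter, update, many=False, **kwargs):
    """Dispatch on the `many` flag."""
    method = collection.update_many if many else collection.update_one
    return await method(filter, update, **kwargs)


async def main():
    col = FakeCollection()
    await insert(col, {"name": "a"})
    await insert(col, [{"name": "b"}, {"name": "c"}])
    await update_docs(col, {"name": "a"}, {"$set": {"seen": True}}, many=True)
    return [call[0] for call in col.calls]


calls = asyncio.run(main())
print(calls)
```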

File-Level Changes

Change: Migrate ArangoDB driver to use the async arangoasync client and async APIs throughout.
Details:
  • Switch driver imports and logging from the sync arango client to arangoasync, including using Auth and Database types
  • Change connection, database selection, and close logic to await client.db(...), sys_db.has_database(...), create/delete_database, and client.close()
  • Refactor collection operations to use awaitable has_collection/create_collection/delete_collection and to map the previous edge flag into col_type values
  • Refactor graph operations to await has_graph, create_graph, and delete_graph, preserving existing semantics
  • Update all AQL query helpers (query, queryrow, fetch_all, fetch_one, fetchval, execute) to await aql.execute and consume async cursors via async iteration
  • Update document-level operations (insert/update/delete) and bulk write logic to await collection methods and ensure collections are created via async calls before use
  • Adjust vertex/edge helpers to await graph vertex/edge insert calls and return their results
Files: asyncdb/drivers/arangodb.py

Change: Update ArangoDB tests to work against the async driver and async arango client API.
Details:
  • Introduce an AsyncIterator helper to mock async AQL cursors
  • Change ArangoClient mocks so db(), close(), collection, graph, and AQL execute are AsyncMocks where appropriate and system DB helpers (has_database/create/delete) are async
  • Update tests to use assert_awaited/assert_awaited_once instead of sync assert_called for async methods
  • Adjust test expectations around error messages and DriverError propagation for connection and query failures
  • Adapt query, fetch, and execute tests to consume async iterators and simplified behaviors (e.g., fewer cases, explicit error string assertions)
  • Update document, write, and graph tests to assume async collection/graph APIs and the new collection creation semantics
  • Add additional safety in integration-like tests to ensure mocked collections/graphs expose AsyncMock methods
Files: tests/test_arangodb.py

Change: Extend Mongo driver with basic high-level helpers for indexes, collection listing, and batch writes.
Details:
  • Add async create_index and create_indexes wrappers that select the current db and call collection.create_index/create_indexes with DriverError wrapping
  • Add async list_collections wrapper around db.list_collection_names
  • Add async insert, update, and batch_insert helpers that call insert_one/insert_many and update_one/update_many depending on inputs and flags
Files: asyncdb/drivers/mongo.py

Change: Add new examples and tooling for async drivers and releases.
Details:
  • Add a timed_pandas example in the asyncpg test script to demonstrate measuring DataFrame query performance and switch main runner to call it instead of pooler
  • Add an arangoasync_playground script showcasing basic async Arango operations (connect, create db/collection, insert/get docs, run AQL, cleanup)
  • Introduce a release helper script that bumps the version in asyncdb/version.py and .bumpversion.cfg, commits, and tags the release
Files: examples/test_asyncpg.py, scripts/arangoasync_playground.py, scripts/release.py

Change: Bump library version to 2.14.0 and align bumpversion config.
Details:
  • Update version in asyncdb/version.py to 2.14.0
  • Update current_version in .bumpversion.cfg to 2.14.0
Files: asyncdb/version.py, .bumpversion.cfg
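
The test changes listed for tests/test_arangodb.py (an AsyncIterator helper, AsyncMock for awaited calls, and assert_awaited-style checks) follow a common pattern that can be sketched with the standard library alone. The `AsyncIterator` class and `fetch_all` function below are illustrative assumptions, not the code from the PR.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class AsyncIterator:
    """Minimal async iterator that imitates an async AQL cursor in tests."""

    def __init__(self, items):
        self._items = iter(items)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._items)
        except StopIteration:
            # Async iteration ends with StopAsyncIteration, not StopIteration.
            raise StopAsyncIteration from None


async def fetch_all(db, sentence):
    """Hypothetical code under test: await execute, then drain the cursor."""
    cursor = await db.aql.execute(sentence)
    return [doc async for doc in cursor]


async def run_check():
    db = MagicMock()
    # execute must be awaitable, so it is an AsyncMock returning the fake cursor.
    db.aql.execute = AsyncMock(return_value=AsyncIterator([{"_key": "1"}]))
    docs = await fetch_all(db, "FOR d IN docs RETURN d")
    # assert_awaited_* verifies the call was actually awaited, not just made.
    db.aql.execute.assert_awaited_once_with("FOR d IN docs RETURN d")
    return docs


docs = asyncio.run(run_check())
print(docs)
```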

@phenobarbital phenobarbital merged commit 15ff7bc into master Feb 13, 2026
2 of 3 checks passed
@phenobarbital phenobarbital deleted the or_conditions branch February 13, 2026 00:54
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 security issues, 2 other issues, and left some high level feedback:

Security issues:

  • Avoid SQL string concatenation: untrusted input concatenated into a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL for writing prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM is a better option. (link)
  • Detected subprocess function 'check_call' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'. (link)

General comments:

  • In the ArangoDB connection method you now unconditionally build an Auth object from username/password and ignore the previous jwt path and jwt_token parameter, which looks like a regression in JWT support; consider preserving distinct handling for JWT-based auth or clearly deprecating it.
  • The new write implementation treats any exception from self._connection.collection(collection) as a signal that the collection does not exist and then creates it, which can mask real connection or permission errors; it would be safer to distinguish CollectionNotFound from other exceptions before falling back to create_collection.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the ArangoDB `connection` method you now unconditionally build an `Auth` object from username/password and ignore the previous `jwt` path and `jwt_token` parameter, which looks like a regression in JWT support; consider preserving distinct handling for JWT-based auth or clearly deprecating it.
- The new `write` implementation treats any exception from `self._connection.collection(collection)` as a signal that the collection does not exist and then creates it, which can mask real connection or permission errors; it would be safer to distinguish `CollectionNotFound` from other exceptions before falling back to `create_collection`.

## Individual Comments

### Comment 1
<location> `asyncdb/drivers/arangodb.py:699-702` </location>
<code_context>
         """
         try:
-            col = self._connection.collection(collection)
+            if not await self._connection.has_collection(collection):
+                col = await self.create_collection(collection)
+            else:
+                col = self._connection.collection(collection)
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Collection creation in write() can mask real errors and perform duplicate work.

Because `write` treats any exception from `has_collection` as "collection does not exist" and immediately tries `create_collection`, transient connection/permission errors will be misclassified and masked, and the create will likely fail differently. This also duplicates the create-on-missing logic you already added. Please narrow the exception handling to true "not found" cases and let other errors propagate so callers see the real failure instead of always falling back to creation.

```suggestion
            col = self._connection.collection(collection)
```
</issue_to_address>
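
A minimal sketch of the behaviour this comment asks for: check existence explicitly, create only when the collection is missing, and let any other failure propagate. `FakeDatabase` and `get_or_create` are hypothetical names for illustration; the method names follow the class diagram in this PR.

```python
import asyncio


class FakeDatabase:
    """Hypothetical stand-in; not the arangoasync Database class."""

    def __init__(self, existing=(), fail=False):
        self._collections = set(existing)
        self._fail = fail
        self.created = []

    async def has_collection(self, name):
        if self._fail:
            raise ConnectionError("server unreachable")
        return name in self._collections

    async def create_collection(self, name):
        self._collections.add(name)
        self.created.append(name)
        return name

    def collection(self, name):
        return name


async def get_or_create(db, name):
    # No try/except here: a failing has_collection() call surfaces as-is
    # instead of being read as "collection missing".
    if await db.has_collection(name):
        return db.collection(name)
    return await db.create_collection(name)


async def main():
    ok = FakeDatabase(existing={"users"})
    await get_or_create(ok, "users")   # exists, nothing created
    await get_or_create(ok, "orders")  # missing, created once
    try:
        await get_or_create(FakeDatabase(fail=True), "users")
    except ConnectionError as exc:
        return ok.created, str(exc)
    return ok.created, None


created, error = asyncio.run(main())
print(created, error)
```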

### Comment 2
<location> `scripts/release.py:62-68` </location>
<code_context>
+    if BUMP_CONFIG.exists():
+        cfg_content = BUMP_CONFIG.read_text()
+        print(f"Updating .bumpversion.cfg to {new_version}")
+        # Simply replacing current_version line if it looks like standard bumpversion logic
+        new_cfg = re.sub(r'(current_version\s*=\s*)[\d\.]+', f'\g<1>{new_version}', cfg_content)
+        BUMP_CONFIG.write_text(new_cfg)
+
</code_context>

<issue_to_address>
**suggestion:** Regex for updating .bumpversion.cfg current_version is fragile for non-pure numeric versions.

This pattern `(current_version\s*=\s*)[\d\.]+` only matches purely numeric dotted versions, so values like `2.14.0-dev` or `2.14.0rc1` would be missed or have their suffix removed. Since `bump_version` already accounts for `rc`/`dev` etc., consider matching any non-whitespace sequence (e.g. `(current_version\s*=\s*)([^\s]+)`) or reusing the parsed `current_version` rather than re-parsing here.

```suggestion
    # Try to update .bumpversion.cfg if it exists
    if BUMP_CONFIG.exists():
        cfg_content = BUMP_CONFIG.read_text()
        print(f"Updating .bumpversion.cfg to {new_version}")
        # Replace current_version value (supports non-pure numeric versions like rc/dev)
        new_cfg = re.sub(r'(current_version\s*=\s*)([^\s]+)', rf'\g<1>{new_version}', cfg_content)
        BUMP_CONFIG.write_text(new_cfg)
```
</issue_to_address>
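
One detail is worth checking whenever the replacement string is a group reference followed directly by a version number: the numeric form `\1` followed by `2.14.1` is parsed by `re` as a reference to group 12, so the explicit `\g<1>` form is the safer choice. The sketch below uses a hypothetical helper name, `set_current_version`, and a made-up config string.

```python
import re

PATTERN = r"(current_version\s*=\s*)(\S+)"


def set_current_version(cfg_content: str, new_version: str) -> str:
    # \g<1> keeps the group reference unambiguous even when the text that
    # follows it starts with a digit.
    return re.sub(PATTERN, rf"\g<1>{new_version}", cfg_content, count=1)


cfg = "[bumpversion]\ncurrent_version = 2.14.0rc1\ncommit = True\n"
updated = set_current_version(cfg, "2.14.1")
print(updated)

# The numeric form breaks as soon as the new version starts with a digit:
# "\1" + "2.14.1" becomes "\12.14.1", and group 12 does not exist.
try:
    re.sub(PATTERN, rf"\1{'2.14.1'}", cfg)
    numeric_ref_failed = False
except re.error:
    numeric_ref_failed = True
print(numeric_ref_failed)
```

Matching `\S+` for the value also covers suffixed versions such as `2.14.0rc1`, which is the case this comment raises.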

### Comment 3
<location> `scripts/arangoasync_playground.py:54` </location>
<code_context>
        cursor = await db.aql.execute(aql)
</code_context>

<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoid SQL string concatenation: untrusted input concatenated into a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL for writing prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM is a better option.

*Source: opengrep*
</issue_to_address>
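
The flagged call executes AQL, where the equivalent of a prepared statement is a bind parameter. A minimal sketch, assuming the `bind_vars` argument shown for `execute` in this PR's class diagram; `build_collection_scan` is a hypothetical helper name.

```python
def build_collection_scan(col_name: str):
    """Return an AQL string plus bind variables instead of interpolating."""
    query = "FOR u IN @@col RETURN u"
    # Collection bind parameters use a doubled @ in the query and a single
    # leading @ in the bind variable key.
    bind_vars = {"@col": col_name}
    return query, bind_vars


query, bind_vars = build_collection_scan("users")
print(query, bind_vars)

# With a connected database object this would be passed along as, e.g.:
#   cursor = await db.aql.execute(query, bind_vars=bind_vars)
```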

### Comment 4
<location> `scripts/release.py:75` </location>
<code_context>
    subprocess.check_call(["git", "add"] + files_to_add)
</code_context>

<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'check_call' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'.

*Source: opengrep*
</issue_to_address>
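
For context, passing an argument list to `subprocess` avoids the shell entirely, so quoting helpers such as `shlex.quote()` only matter when a single command string is built for a shell. A minimal sketch of the list form; `git_add_command` is a hypothetical helper, and the `--` separator is an added precaution rather than something the script is known to use.

```python
import shlex


def git_add_command(files):
    """Build an argument list; no shell is involved, so no quoting is needed."""
    paths = [str(f) for f in files]
    # "--" stops git from reading a path that starts with "-" as an option.
    return ["git", "add", "--", *paths]


cmd = git_add_command(["asyncdb/version.py", ".bumpversion.cfg"])
# shlex.join is used only to display the command; subprocess gets the list.
print(shlex.join(cmd))
# subprocess.check_call(cmd) would run it; omitted here to avoid side effects.
```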


Comment on lines +699 to +702
if not await self._connection.has_collection(collection):
    col = await self.create_collection(collection)
else:
    col = self._connection.collection(collection)

suggestion (bug_risk): Collection creation in write() can mask real errors and perform duplicate work.

Because write treats any exception from has_collection as "collection does not exist" and immediately tries create_collection, transient connection/permission errors will be misclassified and masked, and the create will likely fail differently. This also duplicates the create-on-missing logic you already added. Please narrow the exception handling to true "not found" cases and let other errors propagate so callers see the real failure instead of always falling back to creation.

Suggested change
- if not await self._connection.has_collection(collection):
-     col = await self.create_collection(collection)
- else:
-     col = self._connection.collection(collection)
+ col = self._connection.collection(collection)

Comment thread scripts/release.py
Comment on lines +62 to +68
# Try to update .bumpversion.cfg if it exists
if BUMP_CONFIG.exists():
    cfg_content = BUMP_CONFIG.read_text()
    print(f"Updating .bumpversion.cfg to {new_version}")
    # Simply replacing current_version line if it looks like standard bumpversion logic
    new_cfg = re.sub(r'(current_version\s*=\s*)[\d\.]+', f'\g<1>{new_version}', cfg_content)
    BUMP_CONFIG.write_text(new_cfg)

suggestion: Regex for updating .bumpversion.cfg current_version is fragile for non-pure numeric versions.

This pattern (current_version\s*=\s*)[\d\.]+ only matches purely numeric dotted versions, so values like 2.14.0-dev or 2.14.0rc1 would be missed or have their suffix removed. Since bump_version already accounts for rc/dev etc., consider matching any non-whitespace sequence (e.g. (current_version\s*=\s*)([^\s]+)) or reusing the parsed current_version rather than re-parsing here.

Suggested change
- # Try to update .bumpversion.cfg if it exists
- if BUMP_CONFIG.exists():
-     cfg_content = BUMP_CONFIG.read_text()
-     print(f"Updating .bumpversion.cfg to {new_version}")
-     # Simply replacing current_version line if it looks like standard bumpversion logic
-     new_cfg = re.sub(r'(current_version\s*=\s*)[\d\.]+', f'\g<1>{new_version}', cfg_content)
-     BUMP_CONFIG.write_text(new_cfg)
+ # Try to update .bumpversion.cfg if it exists
+ if BUMP_CONFIG.exists():
+     cfg_content = BUMP_CONFIG.read_text()
+     print(f"Updating .bumpversion.cfg to {new_version}")
+     # Replace current_version value (supports non-pure numeric versions like rc/dev)
+     new_cfg = re.sub(r'(current_version\s*=\s*)([^\s]+)', rf'\g<1>{new_version}', cfg_content)
+     BUMP_CONFIG.write_text(new_cfg)


aql = f"FOR u IN {col_name} RETURN u"
print(f"Running AQL: {aql}")
cursor = await db.aql.execute(aql)

security (python.sqlalchemy.security.sqlalchemy-execute-raw-query): Avoid SQL string concatenation: untrusted input concatenated into a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL for writing prepared statements with named parameters. For complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM is a better option.

Source: opengrep

Comment thread scripts/release.py
    if BUMP_CONFIG.exists():
        files_to_add.append(str(BUMP_CONFIG))

    subprocess.check_call(["git", "add"] + files_to_add)

security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'check_call' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.quote()'.

Source: opengrep

phenobarbital added a commit that referenced this pull request Mar 20, 2026
migrated arangodb driver to async version