migrated arangodb driver to async version#1483
Conversation
Reviewer's GuideMigrates the ArangoDB driver and its tests to the async arangoasync client, updating all Arango interactions (connections, collections, graphs, queries, writes) to be fully async-aware, while also adding some Mongo helper methods, a new asyncpg example, a release helper script, and bumping the project version. Sequence diagram for async ArangoDB connection and query flowsequenceDiagram
actor User
participant ArangoDBDriver
participant ArangoClient
participant SystemDatabase as SystemDB__system
participant TargetDatabase as TargetDB_userdb
User->>ArangoDBDriver: connection(database)
activate ArangoDBDriver
ArangoDBDriver->>ArangoClient: ArangoClient(hosts=url)
activate ArangoClient
ArangoDBDriver->>ArangoDBDriver: create Auth(username, password)
ArangoDBDriver->>ArangoClient: db("_system", auth)
ArangoClient-->>ArangoDBDriver: SystemDatabase (await)
deactivate ArangoClient
ArangoDBDriver->>SystemDatabase: has_database(database_name) (await)
alt database_missing
SystemDatabase-->>ArangoDBDriver: False
ArangoDBDriver->>SystemDatabase: create_database(database_name) (await)
SystemDatabase-->>ArangoDBDriver: True
else database_exists
SystemDatabase-->>ArangoDBDriver: True
end
ArangoDBDriver->>ArangoClient: db(database_name, auth) (await)
activate ArangoClient
ArangoClient-->>ArangoDBDriver: TargetDatabase
deactivate ArangoClient
ArangoDBDriver->>ArangoDBDriver: set _connection, _connected
deactivate ArangoDBDriver
User->>ArangoDBDriver: query(sentence, bind_vars)
activate ArangoDBDriver
ArangoDBDriver->>TargetDatabase: aql.execute(sentence, bind_vars) (await)
activate TargetDatabase
TargetDatabase-->>ArangoDBDriver: async_cursor
deactivate TargetDatabase
loop async_iteration
ArangoDBDriver->>ArangoDBDriver: collect [doc async for doc in cursor]
end
ArangoDBDriver-->>User: result list
deactivate ArangoDBDriver
User->>ArangoDBDriver: close()
activate ArangoDBDriver
ArangoDBDriver->>ArangoClient: close() (await)
ArangoClient-->>ArangoDBDriver: closed
deactivate ArangoDBDriver
Class diagram for updated async ArangoDB and Mongo driversclassDiagram
class ArangoDBDriver {
- ArangoClient _client
- Database _connection
- Auth _auth
- str _database_name
- str _auth_method
- str _username
- str _password
+ connection(database: str) async
+ close() async
+ use(database: str) async
+ create_database(database: str) async
+ drop_database(database: str) async
+ create_collection(name: str, edge: bool, **kwargs) async
+ drop_collection(name: str) async
+ collection_exists(name: str) bool async
+ create_graph(name: str, edge_definitions: list, orphan_collections: list) async
+ drop_graph(name: str, drop_collections: bool) async
+ graph_exists(name: str) bool async
+ query(sentence: str, bind_vars: dict, **kwargs) async
+ queryrow(sentence: str, bind_vars: dict) async
+ fetch_all(sentence: str, bind_vars: dict) async
+ fetch_one(sentence: str, bind_vars: dict) async
+ fetchval(sentence: str, bind_vars: dict, column: any) async
+ execute(sentence: str, bind_vars: dict) async
+ insert_document(collection: str, document: dict, return_new: bool) async
+ update_document(collection: str, document: dict, return_new: bool) async
+ delete_document(collection: str, document_key: str) async
+ write(collection: str, data: any, batch_size: int) async
+ create_vertex(graph: str, collection: str, vertex: dict) async
+ create_edge(graph: str, collection: str, edge: dict) async
}
class ArangoClient {
+ db(name: str, auth: Auth) Database async
+ close() async
}
class Database {
+ name str
+ has_database(name: str) bool async
+ create_database(name: str) async
+ delete_database(name: str) async
+ has_collection(name: str) bool async
+ create_collection(name: str, **kwargs) async
+ delete_collection(name: str) async
+ collection(name: str) Collection
+ has_graph(name: str) bool async
+ create_graph(name: str, edge_definitions: list, orphan_collections: list) Graph async
+ delete_graph(name: str, drop_collections: bool) async
+ aql AQLInterface
}
class Auth {
+ username str
+ password str
}
class Collection {
+ insert(document: dict, return_new: bool) async
+ insert_many(documents: list) async
+ update(document: dict, return_new: bool) async
+ delete(document_key: str) async
}
class Graph {
+ vertex_collection(name: str) VertexCollection
+ edge_collection(name: str) EdgeCollection
}
class VertexCollection {
+ insert(vertex: dict) async
}
class EdgeCollection {
+ insert(edge: dict) async
}
class AQLInterface {
+ execute(query: str, bind_vars: dict) Cursor async
}
class Cursor {
+ async iteration
}
class MongoDriver {
+ create_index(collection_name: str, keys: any, **kwargs) str async
+ create_indexes(collection_name: str, indexes: list, **kwargs) list~str~ async
+ list_collections(filter: dict, **kwargs) list~str~ async
+ insert(collection_name: str, data: any, **kwargs) async
+ update(collection_name: str, filter: dict, update: dict, many: bool, **kwargs) async
+ batch_insert(collection_name: str, data: list, **kwargs) async
- _select_database() async
}
class MongoDatabase {
+ list_collection_names(filter: dict) list~str~ async
+ __getitem__(name: str) MongoCollection
}
class MongoCollection {
+ create_index(keys: any, **kwargs) str async
+ create_indexes(indexes: list, **kwargs) list~str~ async
+ insert_one(document: dict, **kwargs) async
+ insert_many(documents: list, **kwargs) async
+ update_one(filter: dict, update: dict, **kwargs) async
+ update_many(filter: dict, update: dict, **kwargs) async
}
ArangoDBDriver --> ArangoClient : uses
ArangoDBDriver --> Database : holds_connection
ArangoDBDriver --> Auth : uses
Database --> Collection : exposes
Database --> Graph : exposes
Database --> AQLInterface : exposes
AQLInterface --> Cursor : returns
Graph --> VertexCollection : exposes
Graph --> EdgeCollection : exposes
MongoDriver --> MongoDatabase : uses_via__select_database
MongoDatabase --> MongoCollection : exposes
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 security issues, 2 other issues, and left some high level feedback:
Security issues:
- Avoiding SQL string concatenation: untrusted input concatenated with raw SQL query can result in SQL Injection. In order to execute raw query safely, prepared statement should be used. SQLAlchemy provides TextualSQL to easily used prepared statement with named parameters. For complex SQL composition, use SQL Expression Language or Schema Definition Language. In most cases, SQLAlchemy ORM will be a better option. (link)
- Detected subprocess function 'check_call' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
General comments:
- In the ArangoDB
connectionmethod you now unconditionally build anAuthobject from username/password and ignore the previousjwtpath andjwt_tokenparameter, which looks like a regression in JWT support; consider preserving distinct handling for JWT-based auth or clearly deprecating it. - The new
writeimplementation treats any exception fromself._connection.collection(collection)as a signal that the collection does not exist and then creates it, which can mask real connection or permission errors; it would be safer to distinguishCollectionNotFoundfrom other exceptions before falling back tocreate_collection.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the ArangoDB `connection` method you now unconditionally build an `Auth` object from username/password and ignore the previous `jwt` path and `jwt_token` parameter, which looks like a regression in JWT support; consider preserving distinct handling for JWT-based auth or clearly deprecating it.
- The new `write` implementation treats any exception from `self._connection.collection(collection)` as a signal that the collection does not exist and then creates it, which can mask real connection or permission errors; it would be safer to distinguish `CollectionNotFound` from other exceptions before falling back to `create_collection`.
## Individual Comments
### Comment 1
<location> `asyncdb/drivers/arangodb.py:699-702` </location>
<code_context>
"""
try:
- col = self._connection.collection(collection)
+ if not await self._connection.has_collection(collection):
+ col = await self.create_collection(collection)
+ else:
+ col = self._connection.collection(collection)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Collection creation in write() can mask real errors and perform duplicate work.
Because `write` treats any exception from `has_collection` as "collection does not exist" and immediately tries `create_collection`, transient connection/permission errors will be misclassified and masked, and the create will likely fail differently. This also duplicates the create-on-missing logic you already added. Please narrow the exception handling to true "not found" cases and let other errors propagate so callers see the real failure instead of always falling back to creation.
```suggestion
col = self._connection.collection(collection)
```
</issue_to_address>
### Comment 2
<location> `scripts/release.py:62-68` </location>
<code_context>
+ if BUMP_CONFIG.exists():
+ cfg_content = BUMP_CONFIG.read_text()
+ print(f"Updating .bumpversion.cfg to {new_version}")
+ # Simply replacing current_version line if it looks like standard bumpversion logic
+ new_cfg = re.sub(r'(current_version\s*=\s*)[\d\.]+', f'\g<1>{new_version}', cfg_content)
+ BUMP_CONFIG.write_text(new_cfg)
+
</code_context>
<issue_to_address>
**suggestion:** Regex for updating .bumpversion.cfg current_version is fragile for non-pure numeric versions.
This pattern `(current_version\s*=\s*)[\d\.]+` only matches purely numeric dotted versions, so values like `2.14.0-dev` or `2.14.0rc1` would be missed or have their suffix removed. Since `bump_version` already accounts for `rc`/`dev` etc., consider matching any non-whitespace sequence (e.g. `(current_version\s*=\s*)([^\s]+)`) or reusing the parsed `current_version` rather than re-parsing here.
```suggestion
# Try to update .bumpversion.cfg if it exists
if BUMP_CONFIG.exists():
cfg_content = BUMP_CONFIG.read_text()
print(f"Updating .bumpversion.cfg to {new_version}")
# Replace current_version value (supports non-pure numeric versions like rc/dev)
new_cfg = re.sub(r'(current_version\s*=\s*)([^\s]+)', rf'\1{new_version}', cfg_content)
BUMP_CONFIG.write_text(new_cfg)
```
</issue_to_address>
### Comment 3
<location> `scripts/arangoasync_playground.py:54` </location>
<code_context>
cursor = await db.aql.execute(aql)
</code_context>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with raw SQL query can result in SQL Injection. In order to execute raw query safely, prepared statement should be used. SQLAlchemy provides TextualSQL to easily used prepared statement with named parameters. For complex SQL composition, use SQL Expression Language or Schema Definition Language. In most cases, SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Comment 4
<location> `scripts/release.py:75` </location>
<code_context>
subprocess.check_call(["git", "add"] + files_to_add)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'check_call' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if not await self._connection.has_collection(collection): | ||
| col = await self.create_collection(collection) | ||
| else: | ||
| col = self._connection.collection(collection) |
There was a problem hiding this comment.
suggestion (bug_risk): Collection creation in write() can mask real errors and perform duplicate work.
Because write treats any exception from has_collection as "collection does not exist" and immediately tries create_collection, transient connection/permission errors will be misclassified and masked, and the create will likely fail differently. This also duplicates the create-on-missing logic you already added. Please narrow the exception handling to true "not found" cases and let other errors propagate so callers see the real failure instead of always falling back to creation.
| if not await self._connection.has_collection(collection): | |
| col = await self.create_collection(collection) | |
| else: | |
| col = self._connection.collection(collection) | |
| col = self._connection.collection(collection) |
| # Try to update .bumpversion.cfg if it exists | ||
| if BUMP_CONFIG.exists(): | ||
| cfg_content = BUMP_CONFIG.read_text() | ||
| print(f"Updating .bumpversion.cfg to {new_version}") | ||
| # Simply replacing current_version line if it looks like standard bumpversion logic | ||
| new_cfg = re.sub(r'(current_version\s*=\s*)[\d\.]+', f'\g<1>{new_version}', cfg_content) | ||
| BUMP_CONFIG.write_text(new_cfg) |
There was a problem hiding this comment.
suggestion: Regex for updating .bumpversion.cfg current_version is fragile for non-pure numeric versions.
This pattern (current_version\s*=\s*)[\d\.]+ only matches purely numeric dotted versions, so values like 2.14.0-dev or 2.14.0rc1 would be missed or have their suffix removed. Since bump_version already accounts for rc/dev etc., consider matching any non-whitespace sequence (e.g. (current_version\s*=\s*)([^\s]+)) or reusing the parsed current_version rather than re-parsing here.
| # Try to update .bumpversion.cfg if it exists | |
| if BUMP_CONFIG.exists(): | |
| cfg_content = BUMP_CONFIG.read_text() | |
| print(f"Updating .bumpversion.cfg to {new_version}") | |
| # Simply replacing current_version line if it looks like standard bumpversion logic | |
| new_cfg = re.sub(r'(current_version\s*=\s*)[\d\.]+', f'\g<1>{new_version}', cfg_content) | |
| BUMP_CONFIG.write_text(new_cfg) | |
| # Try to update .bumpversion.cfg if it exists | |
| if BUMP_CONFIG.exists(): | |
| cfg_content = BUMP_CONFIG.read_text() | |
| print(f"Updating .bumpversion.cfg to {new_version}") | |
| # Replace current_version value (supports non-pure numeric versions like rc/dev) | |
| new_cfg = re.sub(r'(current_version\s*=\s*)([^\s]+)', rf'\1{new_version}', cfg_content) | |
| BUMP_CONFIG.write_text(new_cfg) |
|
|
||
| aql = f"FOR u IN {col_name} RETURN u" | ||
| print(f"Running AQL: {aql}") | ||
| cursor = await db.aql.execute(aql) |
There was a problem hiding this comment.
security (python.sqlalchemy.security.sqlalchemy-execute-raw-query): Avoiding SQL string concatenation: untrusted input concatenated with raw SQL query can result in SQL Injection. In order to execute raw query safely, prepared statement should be used. SQLAlchemy provides TextualSQL to easily used prepared statement with named parameters. For complex SQL composition, use SQL Expression Language or Schema Definition Language. In most cases, SQLAlchemy ORM will be a better option.
Source: opengrep
| if BUMP_CONFIG.exists(): | ||
| files_to_add.append(str(BUMP_CONFIG)) | ||
|
|
||
| subprocess.check_call(["git", "add"] + files_to_add) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'check_call' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
migrated arangodb driver to async version
Summary by Sourcery
Migrate the ArangoDB driver and its tests to the async arangoasync client, extend MongoDB driver utilities, and add release and example scripts while bumping the library version to 2.14.0.
New Features:
Enhancements:
Tests: