@bobbai00 bobbai00 commented Apr 28, 2025

This PR persists computing unit information to the database and unifies the underlying implementation of single-node mode (with the local computing master as the computing unit) and cluster mode (with a Kubernetes pod as the computing unit).

For Developers

Please take the following steps to incorporate the new changes:

  1. Apply core/scripts/sql/updates/07.sql to your local PostgreSQL instance.
  2. Always start ComputingUnitManagingService when you launch Texera's backend microservices.

Database Schema Changes

Modification of computing_unit Table

The following fields have been added:

  • type: Type of computing unit (e.g., "local" or "kubernetes").
  • resource: A JSON string containing:
    • cpu_limit: CPU resource limit allocated to the computing unit.
    • memory_limit: Memory resource limit allocated to the computing unit.
    • gpu_limit: GPU resource limit allocated to the computing unit.
    • jvm_memory_size: JVM memory size allocated to the computing unit.
    • node_addresses: List of node addresses associated with the computing unit.
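
As a rough illustration of the resource column, the sketch below serializes the five fields listed above into a JSON string and reads them back. The field names come from this PR; the value formats (such as "2" for CPU or "4Gi" for memory) and the helper build_resource are assumptions made for the example, not taken from the actual implementation.

```python
import json

# Field names follow the list above; everything else here is illustrative.
RESOURCE_FIELDS = (
    "cpu_limit",
    "memory_limit",
    "gpu_limit",
    "jvm_memory_size",
    "node_addresses",
)


def build_resource(cpu_limit, memory_limit, gpu_limit, jvm_memory_size, node_addresses):
    """Serialize a resource spec to a JSON string (hypothetical helper)."""
    return json.dumps({
        "cpu_limit": cpu_limit,
        "memory_limit": memory_limit,
        "gpu_limit": gpu_limit,
        "jvm_memory_size": jvm_memory_size,
        "node_addresses": list(node_addresses),
    })


# Value formats below are assumptions; the PR does not specify them.
resource = build_resource("2", "4Gi", "0", "2G", ["http://localhost:8085"])
parsed = json.loads(resource)

# Collect any expected field that did not survive the round trip.
missing = [field for field in RESOURCE_FIELDS if field not in parsed]
```

Storing the spec as a single JSON string keeps the table schema stable if more resource fields are added later, at the cost of validating the fields in application code.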

Modification of workflow_execution Table

A new column cuid has been added to the existing workflow_execution table to establish a relationship between workflow executions and computing units. This column serves as a foreign key referencing the cuid in the computing_unit table.
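
The effect of the foreign key can be sketched with an in-memory SQLite database. This is not the content of 07.sql: it uses SQLite instead of PostgreSQL, and the column types and the sample rows are assumptions chosen only to show that an execution cannot reference a computing unit that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Simplified stand-ins for the real tables; column types are assumptions.
conn.execute(
    "CREATE TABLE computing_unit ("
    " cuid INTEGER PRIMARY KEY,"
    " type TEXT,"
    " resource TEXT)"
)
conn.execute(
    "CREATE TABLE workflow_execution ("
    " eid INTEGER PRIMARY KEY,"
    " wid INTEGER NOT NULL,"
    " cuid INTEGER REFERENCES computing_unit(cuid))"
)

conn.execute("INSERT INTO computing_unit (cuid, type, resource) VALUES (1, 'local', '{}')")
# Valid: computing unit 1 exists.
conn.execute("INSERT INTO workflow_execution (eid, wid, cuid) VALUES (10, 100, 1)")

# Invalid: computing unit 999 does not exist, so the insert is rejected.
try:
    conn.execute("INSERT INTO workflow_execution (eid, wid, cuid) VALUES (11, 100, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

execution_count = conn.execute("SELECT COUNT(*) FROM workflow_execution").fetchone()[0]
```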

Configuration File Changes

computing-unit.conf has been added

A variable computing-unit.local.enabled is added. The default value is true. In cluster mode, this should be set to false.

kubernetes.conf has been changed

A variable kubernetes.enabled is added. The default value is false. In cluster mode, this should be set to true.
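
The two flags together select the deployment mode. The sketch below encodes the combinations described above; the function name deployment_mode is hypothetical, and treating "both enabled" or "both disabled" as an error is an assumption, since this PR does not say how those combinations are handled.

```python
def deployment_mode(local_enabled=True, kubernetes_enabled=False):
    """Map the two config flags to a mode name.

    The defaults mirror the documented defaults of
    computing-unit.local.enabled (true) and kubernetes.enabled (false).
    """
    if local_enabled and not kubernetes_enabled:
        return "single-node"
    if kubernetes_enabled and not local_enabled:
        return "cluster"
    # Assumption: any other combination is treated as a misconfiguration.
    raise ValueError(
        "exactly one of computing-unit.local.enabled and "
        "kubernetes.enabled should be true"
    )


default_mode = deployment_mode()
cluster_mode = deployment_mode(local_enabled=False, kubernetes_enabled=True)
```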

User Experience

1. Cluster Mode

  • Run & interact with the workflow on different computing units at the same time

  • Check which computing unit is used in the execution via the execution history table

  • If a workflow is running on any computing unit, it CANNOT be modified until no executions are RUNNING.

2. Single Node Mode (Local Development)

Single-node mode requires one extra step to connect to a computing unit: in the pop-up window, users enter an address that points to the computing unit master service (the default value is http://localhost:8085).

Users can also disconnect from the local computing unit by clicking the delete button in its drop-down item.

Related Changes

1. Helm Chart

  • KUBERNETES_COMPUTING_UNIT_ENABLED and COMPUTING_UNIT_LOCAL_ENABLED have been added
  • Requests to the endpoint /api/executions/{wid}/stats/{eid} are now routed to Envoy.

2. Execution Lifecycle

  • The latest execution of a workflow is now identified by BOTH wid and cuid, i.e. the latest execution of a workflow on a computing unit.
  • Every time a new execution starts, the previous execution on the same computing unit will be cleared. The latest executions on other computing units will NOT be cleared.
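
The two rules above can be sketched as a small in-memory tracker keyed by the pair (wid, cuid). The class LatestExecutions and its methods are hypothetical names for illustration only, and keying the clearing step by the same pair is an assumed reading of "the previous execution on the same computing unit"; the real bookkeeping lives in the backend and the database.

```python
class LatestExecutions:
    """Track the latest execution per (wid, cuid) pair (illustrative only)."""

    def __init__(self):
        self._latest = {}   # (wid, cuid) -> eid of the latest execution
        self.cleared = []   # eids of executions that were cleared

    def start(self, wid, cuid, eid):
        """Register a new execution and clear the previous one on the same unit."""
        key = (wid, cuid)
        previous = self._latest.get(key)
        if previous is not None:
            # Only the execution on this computing unit is cleared;
            # entries for other computing units are left untouched.
            self.cleared.append(previous)
        self._latest[key] = eid
        return previous

    def latest(self, wid, cuid):
        """Return the latest eid for this workflow on this unit, or None."""
        return self._latest.get((wid, cuid))


tracker = LatestExecutions()
tracker.start(wid=1, cuid=1, eid=10)
tracker.start(wid=1, cuid=2, eid=20)
replaced = tracker.start(wid=1, cuid=1, eid=11)
```

In this run, starting execution 11 on computing unit 1 clears execution 10, while execution 20 on computing unit 2 remains the latest for that unit.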

3. Execution & Result Lifecycle Binding

  • Currently, the lifecycles of execution results and computing units are bound together.
  • As a consequence, queries for results, including port result tables, stats, and console messages, must carry the cuid.

@bobbai00 bobbai00 force-pushed the jiadong-add-cuid-to-execution branch 2 times, most recently from 52daba5 to 0f3d50d Compare May 7, 2025 23:46
@bobbai00 bobbai00 marked this pull request as ready for review May 12, 2025 06:17
@bobbai00 bobbai00 force-pushed the jiadong-add-cuid-to-execution branch from 207e2a0 to ba7986c Compare May 12, 2025 06:23
@bobbai00 bobbai00 self-assigned this May 12, 2025
@bobbai00 bobbai00 requested a review from shengquan-ni May 13, 2025 01:32
@bobbai00 bobbai00 force-pushed the jiadong-add-cuid-to-execution branch 5 times, most recently from b44761d to 95022be Compare May 14, 2025 16:02
@bobbai00 bobbai00 changed the title from "Persist the relationship between computing unit and execution in DB" to "Persist the spec of the computing_unit entity and its relationship with workflow_executions into DB" May 15, 2025
@bobbai00 bobbai00 force-pushed the jiadong-add-cuid-to-execution branch from d71158c to af58603 Compare May 20, 2025 00:29
@bobbai00 bobbai00 force-pushed the jiadong-add-cuid-to-execution branch from af58603 to ff09926 Compare May 26, 2025 18:01
@bobbai00 bobbai00 force-pushed the jiadong-add-cuid-to-execution branch from bd050f3 to 0a9ff4d Compare May 27, 2025 04:04
@bobbai00 bobbai00 merged commit a910bcf into master May 27, 2025
9 checks passed
@bobbai00 bobbai00 deleted the jiadong-add-cuid-to-execution branch May 27, 2025 04:30
2 participants