@YoungHypo YoungHypo commented Jul 22, 2025

Description

This PR introduces VIDEX as a new storage engine plugin for MariaDB. VIDEX is a Disaggregated and Extensible Virtual Index Engine designed to perform efficient and accurate what-if analysis for tasks like index recommendation.

The VIDEX architecture is composed of two core parts:

  1. VIDEX-Optimizer: Implemented as a MariaDB storage engine, it hooks into the query optimizer and simulates the behavior and cost models of other engines (in this PR, InnoDB). It allows developers and DBAs to evaluate the impact of potential indexes on query plans without the overhead of building them on actual data.
  2. VIDEX-Statistic-Server: This is a decoupled service that handles complex statistical computations. The VIDEX-Optimizer forwards requests for cardinality and Number of Distinct Values (NDV) estimation to this server via an HTTP protocol. This design allows users to plug in their own estimation models—from simple heuristics on sampled data to sophisticated AI-powered algorithms—and deploy them independently.
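As a rough illustration of the "simple heuristics on sampled data" end of that spectrum, the sketch below estimates NDV and range cardinality from a row sample. The function names and the scaling rule are assumptions made for this example; they are not the estimators shipped with VIDEX.

```python
import math
from bisect import bisect_left, bisect_right
from collections import Counter


def estimate_ndv(sample, table_rows):
    """Estimate the number of distinct values in a column from a sample.

    Heuristic (an assumption for this sketch): values seen exactly once in
    the sample are scaled up by sqrt(table_rows / sample_size); values seen
    more than once are counted once each.
    """
    if not sample:
        return 0
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    repeated = len(counts) - singletons
    scale = math.sqrt(table_rows / len(sample))
    return min(table_rows, round(scale * singletons + repeated))


def estimate_range_rows(sample, table_rows, low, high):
    """Estimate how many rows fall in [low, high] by scaling the sample hit rate."""
    if not sample:
        return 0
    ordered = sorted(sample)
    hits = bisect_right(ordered, high) - bisect_left(ordered, low)
    return round(table_rows * hits / len(ordered))
```

A model like this could sit behind the statistic server and be swapped for a more accurate one without touching the engine side.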

The statistic server can be implemented in any language or framework. A reference implementation using Python/Flask has already been merged in bytedance/videx#47.
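To make the "any language or framework" point concrete, here is a minimal sketch of such a service using only the Python standard library. The request and response shapes (`kind`, `table`, `column`, `ndv`) and the numbers in `NDV_TABLE` are hypothetical and are not the protocol used by the reference implementation; port 5001 is simply the one that appears in the server log later in this thread.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical, hard-coded statistics keyed by (table, column).
# A real server would compute or load these.
NDV_TABLE = {("part", "P_PARTKEY"): 200000, ("part", "P_BRAND"): 25}


def handle_request(payload):
    """Answer one estimation request; returns (HTTP status, JSON-able body)."""
    if not isinstance(payload, dict):
        return 400, {"error": "request must be a JSON object"}
    key = (payload.get("table"), payload.get("column"))
    if payload.get("kind") == "ndv" and key in NDV_TABLE:
        return 200, {"ndv": NDV_TABLE[key]}
    return 404, {"error": "no statistics for request"}


class StatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            status, body = 400, {"error": "invalid JSON"}
        else:
            status, body = handle_request(payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host="127.0.0.1", port=5001):
    """Build (but do not start) the HTTP server."""
    return HTTPServer((host, port), StatHandler)
```

Calling `make_server().serve_forever()` would start it. Keeping `handle_request` separate from the HTTP handler lets the estimation logic be tested without opening a socket.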

As discussed, this PR contains the implementation for the VIDEX-Optimizer. The corresponding VIDEX-Statistic-Server (developed mainly in Python) will be submitted in a follow-up PR.

As tested on the TPC-H benchmark, VIDEX is capable of producing query plans that are 100% identical to those from MariaDB's native InnoDB engine. The detailed results can be found in the description of bytedance/videx#47.
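How plan identity was measured is described in the linked PR rather than here. As one possible way to check it, the sketch below compares EXPLAIN output from two engines on a few columns. Treating plans as equal when `table`, `type`, and `key` agree row by row is an assumption of this example, as are the function names.

```python
def plan_signature(explain_rows, fields=("table", "type", "key")):
    """Reduce EXPLAIN output (a list of dict rows) to the fields being compared."""
    return [tuple(row.get(f) for f in fields) for row in explain_rows]


def plan_match_rate(plans_a, plans_b):
    """Fraction of shared queries whose plans have identical signatures.

    plans_a and plans_b map a query name to its EXPLAIN rows.
    """
    shared = sorted(set(plans_a) & set(plans_b))
    if not shared:
        return 0.0
    same = sum(
        plan_signature(plans_a[q]) == plan_signature(plans_b[q]) for q in shared
    )
    return same / len(shared)
```

Row estimates are deliberately left out of the signature, so two plans with the same access path but different `rows` values still count as a match.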

Features

  • Statistics Service Integration: Communicates with VIDEX statistics server through HTTP
  • Query Optimizer Support: Implements key interfaces such as records_in_range, info_low, etc.
  • InnoDB Compatibility: Simulates InnoDB's cost model and cardinality estimation behavior
  • Pluggable Architecture: Supports both dynamic loading and static linking

File Structure

storage/videx/
├── ha_videx.cc                              # Main storage engine implementation file
├── videx_utils.cc                            # Utility function implementation file
├── videx_utils.h                             # Utility function header file
├── CMakeLists.txt                            # CMake build configuration file
└── mysql-test/                               # Test suite directory
    └── videx/
        ├── suite.opt                         # Test suite configuration options
        ├── include/                          # Test include files directory
        │   └── have_videx.inc                # VIDEX engine availability check
        ├── create-table-and-index.test       # Table creation and index test
        ├── create-table-and-index.result     # Table creation and index test expected results
        ├── set-debug-skip-http.test          # Debug variable setting test
        └── set-debug-skip-http.result        # Debug variable setting test expected results

Configuration Variables

  • debug_skip_http: skip HTTP requests for debugging
  • server_ip: VIDEX server address
  • options: connection options in JSON format

SET SESSION videx_debug_skip_http = TRUE;
SET SESSION videx_server_ip = 'your_own_ip';
SET SESSION videx_options = '{"timeout": 30, "retry": 3}';
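For anyone writing a test client against a statistics server, the sketch below shows one way an options string of this shape could be interpreted. It is not the engine's implementation: the defaults, the validation rules, and the reading of "retry" as the total number of attempts are all assumptions of this example.

```python
import json
import urllib.request

DEFAULT_OPTIONS = {"timeout": 30, "retry": 3}


def parse_options(raw):
    """Parse the JSON options string, filling in defaults and validating types."""
    options = dict(DEFAULT_OPTIONS)
    if raw:
        parsed = json.loads(raw)
        if not isinstance(parsed, dict):
            raise ValueError("options must be a JSON object")
        options.update(parsed)
    for key in ("timeout", "retry"):
        value = options[key]
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer")
    return options


def post_json(url, payload, options, opener=urllib.request.urlopen):
    """POST payload as JSON; return the decoded reply, or None if every attempt fails.

    Assumption: "retry" is the total number of attempts (at least one).
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    for _ in range(max(1, options["retry"])):
        try:
            with opener(request, timeout=options["timeout"]) as response:
                return json.loads(response.read())
        except (OSError, ValueError):
            continue  # network error, timeout, or malformed reply: try again
    return None
```

Returning `None` instead of raising leaves the caller free to fall back to a default estimate when the server is unreachable.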

How to start VIDEX-Server

See videx/PR-47: Implemented VIDEX-Server on MariaDB.

mysql-test

Test Case 1: create-table-and-index.test

Validates basic VIDEX engine functionality:

  • Table creation and deletion
  • Index creation and management
  • Primary key and foreign key support

Test Case 2: set-debug-skip-http.test

Validates debugging functionality:

  • DEBUG_SKIP_HTTP variable setting
  • HTTP skip logic
  • EXPLAIN query execution

Build Configuration

Dependencies

  • libcurl: HTTP client library
  • zlib: Compression support

CMake Options

  • PLUGIN_VIDEX=YES: Enable VIDEX plugin (default)
  • PLUGIN_VIDEX=STATIC: Static linking
  • PLUGIN_VIDEX=DYNAMIC: Dynamic loading

Compilation Command Example

cmake -DPLUGIN_VIDEX=YES \
      -DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
      -G Ninja --fresh \
      -S /path/to/mariadb \
      -B /path/to/build

Contributor Information

  • Authors: Haibo Yang, Rong Kang

Future Plans

  • Add trace recording in HTTP requests for tracking
  • Complete column_bitmaps_signal to support indexed virtual columns for VIDEX engine
  • Improve server-side cardinality precision by introducing AI model

Summary

The VIDEX storage engine provides MariaDB with a flexible and extensible statistics information management solution. Through external statistics services, it achieves high compatibility with InnoDB while maintaining architectural flexibility and maintainability. This plugin is particularly suitable for scenarios requiring rapid iteration of statistics strategies or deployment of distributed statistics services.

Release Notes

Added a VIDEX engine in storage/videx to support what-if analysis for index strategies. It integrates with AI-based cardinality and NDV (number of distinct values) estimation algorithms.

How can this PR be tested?

TODO: modify the automated test suite to verify that the PR causes MariaDB to behave as intended.
Consult the documentation on "Writing good test cases".

If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the 11.8 tag.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@svoj svoj added the External Contribution label (All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.) Jul 23, 2025
@svoj svoj marked this pull request as draft July 23, 2025 05:24
Contributor

@svoj svoj left a comment

A quick review, mostly compilation issues outlined.

To keep things organised in one place, we should do most of this work under this pull request. If you run into issues and feel the need to create a new pull request, let us know; we could probably help you edit this one instead.

@YoungHypo YoungHypo marked this pull request as ready for review August 25, 2025 22:19
@YoungHypo YoungHypo requested a review from svoj August 25, 2025 22:32
@YoungHypo YoungHypo changed the title [WIP] MDEV-36737: Research and Estimation for Adapting VIDEX to MariaDB MDEV-36737: Research and Estimation for Adapting VIDEX to MariaDB Aug 27, 2025
@YoungHypo
Author

YoungHypo commented Aug 28, 2025

Hi @svoj @gl-sergei @kr11,

This PR implements the VIDEX storage engine, i.e. the VIDEX-Optimizer part. For VIDEX server usage, please refer to bytedance/videx#47. The server will be covered in a future PR.

One of the CI tests is currently failing — could you please help check it, or try re-running the workflow?

Thanks a lot!

Contributor

@svoj svoj left a comment

Looks very good. Would it be possible to squash all commits in one? Some comments inline.

@svoj
Contributor

svoj commented Aug 29, 2025

@YoungHypo regarding failing test: please disregard it. It is unrelated to this PR, we will sort it out when we're ready to merge.

@YoungHypo YoungHypo force-pushed the videx-integration branch 3 times, most recently from f8136b7 to c36a948 Compare August 31, 2025 01:29
@YoungHypo
Author

Thanks for your feedback @svoj. We’ve completed the changes: all non-Videx code has been removed, and the core now consists of only two parts, ha_videx.cc and videx_utils. The mysql-test directory has also been simplified as you suggested, and the commits have been squashed. Please let me know if further adjustments are needed.

storage/videx/
├── ha_videx.cc                              # Main storage engine implementation file
├── videx_utils.cc                            # Utility function implementation file
├── videx_utils.h                             # Utility function header file
├── CMakeLists.txt                            # CMake build configuration file
└── mysql-test/                               # Test suite directory
    └── videx/
        ├── suite.opt                         # Test suite configuration options
        ├── include/                          # Test include files directory
        │   └── have_videx.inc                # VIDEX engine availability check
        ├── create-table-and-index.test       # Table creation and index test
        ├── create-table-and-index.result     # Table creation and index test expected results
        ├── set-debug-skip-http.test          # Debug variable setting test
        └── set-debug-skip-http.result        # Debug variable setting test expected results

Contributor

@svoj svoj left a comment

We should stabilise the test suite.

With cmake -DPLUGIN_VIDEX=NO I get expected result:

videx.create-table-and-index             [ skipped ]  Need VIDEX engine
videx.set-debug-skip-http                [ skipped ]  Need VIDEX engine

With cmake -DPLUGIN_VIDEX=STATIC I get unexpected results:

videx.create-table-and-index             [ fail ]  Found warnings/errors in server log file!
        Test ended at 2025-08-31 19:05:54
line
2025-08-31 19:05:53 4 [Warning] VIDEX: access videx_server failed res_code != curle_ok: 127.0.0.1:5001
2025-08-31 19:05:53 4 [Warning] VIDEX: access videx_server failed res_code != curle_ok: 127.0.0.1:5001
2025-08-31 19:05:53 4 [Warning] VIDEX: access videx_server failed res_code != curle_ok: 127.0.0.1:5001
2025-08-31 19:05:54 4 [Warning] VIDEX: access videx_server failed res_code != curle_ok: 127.0.0.1:5001
^ Found warnings in /dev/shm/build/videx-static/mysql-test/var/log/mysqld.1.err
ok

 - saving '/dev/shm/build/videx-static/mysql-test/var/log/videx.create-table-and-index/' to '/dev/shm/build/videx-static/mysql-test/var/log/videx.create-table-and-index/'
videx.set-debug-skip-http                [ fail ]
        Test ended at 2025-08-31 19:05:54

CURRENT_TEST: videx.set-debug-skip-http
mysqltest: At line 16: query 'SET SESSION videx_debug_skip_http = 'True'' failed: ER_WRONG_VALUE_FOR_VAR (1231): Variable 'videx_debug_skip_http' can't be set to the value of 'True'

The result from queries just before the failure was:
CREATE TABLE `part` (
`P_PARTKEY` int NOT NULL,
`P_NAME` varchar(55) NOT NULL,
`P_MFGR` char(25) NOT NULL,
`P_BRAND` char(10) NOT NULL,
`P_TYPE` varchar(25) NOT NULL,
`P_SIZE` int NOT NULL,
`P_CONTAINER` char(10) NOT NULL,
`P_RETAILPRICE` decimal(15,2) NOT NULL,
`P_COMMENT` varchar(23) NOT NULL,
PRIMARY KEY (`P_PARTKEY`)
) ENGINE=VIDEX;
SET SESSION videx_debug_skip_http = 'True';

With cmake -DPLUGIN_VIDEX=DYNAMIC I get unexpected results:

videx.create-table-and-index             [ skipped ]  Need VIDEX engine
videx.set-debug-skip-http                [ skipped ]  Need VIDEX engine

We should load ha_videx.so in this case. There should be suite.pm:

package My::Suite::Videx;

@ISA = qw(My::Suite);

return "No VIDEX" unless $ENV{HA_VIDEX_SO} or
                           $::mysqld_variables{'videx'} eq "ON";

return "Not run for embedded server" if $::opt_embedded_server;

sub is_default { 1 }

bless { };

suite.opt should probably have:

--plugin-load-add=$HA_VIDEX_SO

If tests require videx server running, it should be checked by either have_videx.inc or suite.pm. We should probably even start/stop videx server for particular tests, but then we still have to check for videx server existence.

Anyway, it'd be great to make suite either "skipped" or "passed" in all of the above cases, that is -DPLUGIN_VIDEX=NO|STATIC|DYNAMIC.
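The intended outcome for the three build flavours can be written down as a small decision table. The sketch below models the skip logic of the suite.pm snippet above in Python purely for illustration. The per-build environments in `BUILDS` are assumptions about what each configuration exposes, and the .so path is a placeholder.

```python
def suite_skip_reason(env, mysqld_variables, embedded_server=False):
    """Return a skip message, or None when the suite should run."""
    if not (env.get("HA_VIDEX_SO") or mysqld_variables.get("videx") == "ON"):
        return "No VIDEX"
    if embedded_server:
        return "Not run for embedded server"
    return None


# Assumed environment and server variables for each -DPLUGIN_VIDEX value.
BUILDS = {
    "NO": ({}, {}),
    "STATIC": ({}, {"videx": "ON"}),
    "DYNAMIC": ({"HA_VIDEX_SO": "/path/to/ha_videx.so"}, {}),
}


def build_matrix():
    """Map each build flavour to its skip reason (None means the suite runs)."""
    return {
        name: suite_skip_reason(env, variables)
        for name, (env, variables) in BUILDS.items()
    }
```

Under these assumptions only the NO build is skipped; STATIC and DYNAMIC both run the suite, which then has to pass.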

@YoungHypo
Author

YoungHypo commented Sep 2, 2025

Thanks for your feedback @svoj ! I've removed some unnecessary code from ha_videx.cc and updated suite.opt and suite.pm to support dynamic builds. I also updated the tests and results to align with the current Videx code.

In my local testing, cmake -DPLUGIN_VIDEX=NO resulted in the suite being skipped, while -DPLUGIN_VIDEX=YES, DYNAMIC, and STATIC all passed.

Currently, the tests set skip_http to True, so they do not depend on the Videx server. In the next PR, we plan to discuss how to introduce a Python-based server implementation. We look forward to your further review and guidance. Thanks again!

BTW, once it’s ready to be merged, I’ll squash all commits into one again.

Contributor

@svoj svoj left a comment

A few more minor issues outlined. Otherwise I'm happy to recommend it for merging.

@YoungHypo
Author

YoungHypo commented Sep 2, 2025

Thanks @svoj. I’ve removed videx_debug_skip_http and now use videx_server_ip to control server enable/disable. I’ve also updated the mysql-test. If it’s merge-ready, I’ll squash the commits.

@YoungHypo YoungHypo requested a review from svoj September 2, 2025 18:58
@svoj
Contributor

svoj commented Sep 2, 2025

@YoungHypo I believe it is in a good enough shape for the merge, so please do squash. Still other developers may request some extra changes in the meantime.

@svoj
Contributor

svoj commented Sep 2, 2025

@YoungHypo it'd also be good to reword main commit description so that it says something like:

MDEV-36737: Research and Estimation for Adapting VIDEX to MariaDB

VIDEX is a Disaggregated and Extensible Virtual Index Engine designed
to perform efficient and accurate what-if analysis for tasks like
index recommendation.

@YoungHypo
Author

YoungHypo commented Sep 2, 2025

I’ve squashed the commits and updated the description. Thanks again @svoj for your guidance and feedback — it’s truly great to have your help.

VIDEX is a Disaggregated and Extensible Virtual Index Engine designed
to perform efficient and accurate what-if analysis for tasks such as
index recommendation.
@YoungHypo
Author

YoungHypo commented Sep 2, 2025

> @YoungHypo I believe it is in a good enough shape for the merge, so please do squash. Still other developers may request some extra changes in the meantime.

@svoj May I ask what the next steps in the review process will be? We’re planning to share our current progress on Jira and Zulip — do you think that’s a good idea? Really looking forward to receiving feedback from other developers.

@svoj
Contributor

svoj commented Sep 2, 2025

>> @YoungHypo I believe it is in a good enough shape for the merge, so please do squash. Still other developers may request some extra changes in the meantime.
>
> @svoj May I ask what the next steps in the review process will be? We’re planning to share our current progress on Jira and Zulip; do you think that’s a good idea? Really looking forward to receiving feedback from other developers.

I will ask some other developers for feedback here. Feel free to share our current progress via jira/zulip.

@svoj
Contributor

svoj commented Sep 2, 2025

@vuvova, @spetrunia I believe initial VIDEX version is in a good enough shape. I aim to get it merged to 11.8, disabled by default, plugin marked as experimental. Do you have any suggestions/objections? Will you want to review this PR too?

@YoungHypo
Author

Hi @svoj , just following up on the status of this PR.
I was wondering if there’s been any further feedback from other developers.
I’m happy to help with any changes if needed — looking forward to the next steps!

@svoj
Contributor

svoj commented Sep 11, 2025

Hi @YoungHypo. No feedback yet. I will be trying to get things rolling. In the meantime, unless absolutely necessary, it'd be good to keep this PR intact, no need to perform merges. So that we can anchor to certain revision. We can update the tree when we're ready to merge. It'd be good to revert recent merge to rev 2f8993d, as it was at the time I approved it.

@YoungHypo
Author

> Hi @YoungHypo. No feedback yet. I will be trying to get things rolling. In the meantime, unless absolutely necessary, it'd be good to keep this PR intact, no need to perform merges. So that we can anchor to certain revision. We can update the tree when we're ready to merge. It'd be good to revert recent merge to rev 2f8993d, as it was at the time I approved it.

Thanks @svoj! I've reset the branch to commit 2f8993d as suggested

@kr11

kr11 commented Sep 15, 2025

Hi @svoj , the branch has been reset as you suggested. It looks like the CI workflow is now awaiting approval (1 workflow awaiting approval). Could you please approve the workflow to get it running?

@svoj
Contributor

svoj commented Sep 15, 2025

@kr11 done, though it was just Windows on ARM, rather minor builder.

@vuvova
Member

vuvova commented Sep 26, 2025

@YoungHypo, just wanted to say, that we're still testing VIDEX — a couple of developers have it installed and run various queries. Unfortunately, it's not very visible in the PR, but we are working on it

@YoungHypo
Author

YoungHypo commented Sep 27, 2025

@vuvova @svoj @kr11
Thank you very much for the update, and really happy to hear that everything is moving forward.

VIDEX - PR 47 already includes the installation and execution steps for VIDEX in MariaDB and its dependency (Statistic Server), as well as the TPC-H benchmark results. If anything in the PR description is unclear, we can continue the discussion either here in this PR or in the Zulip channel. Please feel free to let me know if there’s anything I can assist with.

