Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add storage_e2e_single_threaded test in bazel #23875

Closed
wants to merge 69 commits into from

Conversation

travisdowns
Copy link
Member

Adds the storage_e2e_single_threaded test to the bazel build, including prerequisite work, mostly adding libraries.

Fixes CORE-7649.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

redpanda_test_cc_library(
name = "async",
hdrs = [
"async.h",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also exposed in :gtest.

Can we remove it from there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug on my part, I didn't see that seastar_boost already existed, which was the correct dep for the e2e test, changed in next push.

@vbotbuildovich
Copy link
Collaborator

mmaslankaprv and others added 18 commits October 22, 2024 10:40
Previously the recursive type error message was very short and not
descriptive. Made the message more rich in the context so that it is
easier to understand where the problem is.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
Pulls some utilities out of a test to make it easier to reuse building
state updates.
For use by the upcoming datalake coordinator STM to track files that
need to be committed to Iceberg.
Adds topics_state to the STM, and updates the apply method to
appropriately update it.
In some upcoming changes, we'll expect the STM to be an implementation
detail, and for callers to go through a separate coordinator
abstraction. To that end, this moves most STM methods out of the public
interface and exposes some internals for use in the upcoming
coordinator.
Adds some initial structure for the coordinator. This commit just adds
wrapper around a couple of calls to the underlying STM, so that it can
be plugged into the frontend.

Later commits will have the coordinator reconcile the STM with the
Iceberg catalog, perform local/Raft snapshots, etc.
It is being hidden because we in subsequent commits we are going to
dynamically change this list and want to avoid unexpected consumers of
the list in the future from not taking into account the changes.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
As a first step to adding the iceberg rest client, an oauth token
struct, static credentials and a token parser and associated schema and
structs are added.

The token is returned in exchange for credentials.
The http client now extends a pure virtual class. This virtual
class/interface exposes a single method which fully reads a response
before returning it as part of an aggregate struct. This makes testing
and mocking the http client easier specifically for requests with short
responses. Intended to be used with the iceberg rest-client primarily.
A retry policy contains logic for deciding whether an http call should
and can be retried.

It accepts a resolved future returning a collected response (a collected
response is one whose status has been read).

If the future failed, then the exception is used to decide if the failed
attempt can be retried.

If the future is available but the status code is not of the 2xx
variety, another decision is made if the call can be retried.

Finally if the future is successful and the call is also deemed
successful, no retries are necessary.

This is largely modelled after s3/abs retries but the intent is to make
the logic isolated to make testing easier, so that a retry policy can be
passed in to another section of code.
The catalog client exposes a high level set of operations over the
catalog API. In this changeset a single method is added which acquires
tokens, and a set of helper methods which are expected to be used with
all API calls.

The pattern to follow with all methods of the catalog client should be
similar:

* ensure token exists.
* serialize arguments to json (if payload is required)
* perform call to rest endpoints with optional payload
* deserialize response to domain object

The methods should use expected<> as return type to support chaining
together calls and returning early error in case of failure.
dotnwat and others added 20 commits October 22, 2024 10:40
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Found via clang-tidy warnings.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/10580/execroot/_main/src/v/cluster/cluster_uuid.h:20:23: error: unused variable 'cluster_uuid_key' [clang-diagnostic-unused-const-variable]
   20 | constexpr const char* cluster_uuid_key = "cluster_uuid";
      |                       ^~~~~~~~~~~~~~~~
/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/10580/execroot/_main/src/v/cluster/cluster_uuid.h:21:39: error: unused variable 'cluster_uuid_key_space' [clang-diagnostic-unused-const-variable]
   21 | constexpr storage::kvstore::key_space cluster_uuid_key_space

/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/10979/execroot/_main/src/v/cluster/commands.h:96:25: error: unused variable 'create_non_replicable_topic_cmd_type' [clang-diagnostic-unused-const-variable]
   96 | static constexpr int8_t create_non_replicable_topic_cmd_type = 6;

/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/11273/execroot/_main/src/v/cluster/controller_stm.h:102:31: error: unused variable 'controller_stm_shard' [clang-diagnostic-unused-const-variable]
  102 | static constexpr ss::shard_id controller_stm_shard = 0;

/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/11561/execroot/_main/src/v/cluster/health_monitor_types.h:36:31: error: unused variable 'health_monitor_backend_shard' [clang-diagnostic-unused-const-variable]
   36 | static constexpr ss::shard_id health_monitor_backend_shard = 0;

/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/12282/execroot/_main/src/v/cluster/prefix_truncate_record.h:22:26: error: unused variable 'prefix_truncate_key' [clang-diagnostic-unused-const-variable]
   22 | static constexpr uint8_t prefix_truncate_key = 0;

/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/13997/execroot/_main/src/v/cluster/rm_stm_types.h:18:31: error: unused variable 'abort_snapshot_version' [clang-diagnostic-unused-const-variable]
   18 | static constexpr const int8_t abort_snapshot_version = 0;
      |                               ^~~~~~~~~~~~~~~~~~~~~~
/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/13997/execroot/_main/src/v/cluster/rm_stm_types.h:19:25: error: unused variable 'prepare_control_record_version' [clang-diagnostic-unused-const-variable]
   19 | static constexpr int8_t prepare_control_record_version{0};
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/13997/execroot/_main/src/v/cluster/rm_stm_types.h:20:25: error: unused variable 'fence_control_record_v0_version' [clang-diagnostic-unused-const-variable]
   20 | static constexpr int8_t fence_control_record_v0_version{0};
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/13997/execroot/_main/src/v/cluster/rm_stm_types.h:21:25: error: unused variable 'fence_control_record_v1_version' [clang-diagnostic-unused-const-variable]
   21 | static constexpr int8_t fence_control_record_v1_version{1};
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/13997/execroot/_main/src/v/cluster/rm_stm_types.h:22:25: error: unused variable 'fence_control_record_version' [clang-diagnostic-unused-const-variable]
   22 | static constexpr int8_t fence_control_record_version{2};
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/14429/execroot/_main/src/v/cluster/raft0_utils.h:24:34: error: unused function 'create_raft0' [clang-diagnostic-unused-function]
   24 | static ss::future<consensus_ptr> create_raft0(

/home/nwatkins/.cache/bazel/_bazel_nwatkins/31f1aa7012681f681e8d49a1818a8564/sandbox/linux-sandbox/15018/execroot/_main/src/v/cluster/scheduling/constraints.h:25:35: error: unused variable 'rack_label' [clang-diagnostic-unused-const-variable]
   25 | static constexpr std::string_view rack_label = "rack";

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Adds an class to be used by the coordinator to encapsulate committing
files to the Iceberg catalog. It takes as inputs the STM state for a
given topic, and returns the resulting STM updates that capture the
catalog updates it has performed.

The separate abstraction isn't strictly required, but it does make it
easier to test, and helps avoid making the coordinator control loop very
bloated.
In order to disambiguate between topics with the same name during a
mount process, add the `initial_revision_id` to the `topic_mount_manifest`
path.
If the mount manifest is found in cloud storage during the unmount
process, it implies that a topic with the same name and `initial_revision_id`
has been previously unmounted.

Add a warning log for this case for now.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Added by redpanda-data#23302

Signed-off-by: Michael Boquard <michael@redpanda.com>
A managed notification currently happens before the partition boostraps.
A bug surfaced in datalake/coordinator_mgr where it was trying to access
stm state as a part of notification which resulted in a segfault because
the state machines haven't been initialized by the time notification
triggered.

This fix waits until the partition boostraps to trigger a managed
notification so the waiters have access to stm state. Judging by the
current users of this notification, it doesn't seem the timing of
notification (whether it happens before the boostrap) is important.
Add bazel test libraries needed for the storage e2e test:

 - common
 - storage_test_fixture
 - log_gap_analysis

Ref CORE-7649.
Fix compile warnings (as errors) which appear only in the bazel build,
prior to enabling this test in bazel.

Ref CORE-7649.
Add the storage_e2e_single_threaded test to the bazel build.

Ref CORE-7649.
@travisdowns travisdowns requested review from a team and BenPope as code owners October 22, 2024 13:43
@travisdowns travisdowns requested review from ivotron and removed request for a team October 22, 2024 13:43
@travisdowns travisdowns marked this pull request as draft October 22, 2024 13:43
@travisdowns
Copy link
Member Author

travisdowns commented Oct 22, 2024

Please ignore, bad rebase, will re-upload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.