Skip to content

Ow fork main merge #148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2,056 commits into from
Apr 8, 2025
Merged

Ow fork main merge #148

merged 2,056 commits into from
Apr 8, 2025

Conversation

MarkSh1
Copy link

@MarkSh1 MarkSh1 commented Mar 31, 2025

No description provided.

kakaiu and others added 30 commits December 3, 2024 22:01
* Rust external workload modifications

- add readme
- add simulation configuration file
- minor Rust bindings changes
- fix FDBPerfMetric::format_code default value in the C++ bindings
- Add CWorkload.c to CMake
- Fix cpp_workload test file

---------

Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com>
fdbclient so can be used in fdbclient. Remove the copies of
BackupTLSConfig we had in place named BlobTLSConfig.*.
Keep the old name though it a little clunky.
version only after receiving a commit version reply from the sequencer.
Advancing the min committed version prior to that point may result in
invalid DBRecoveryDurability errors (if a recovery happens after the
advancement) in simulation tests.
Bug introduced by recent refactor adding being able to specify
storage as an option.

Found by Paymaan Raza.

Co-authored-by: michael stack <stack@duboce.com>
Seaweed is started in a subprocess so setting the global had no
effect. Instead write the pid to a file so its available at
cleanup time.

Found by Zhe Wang.

Co-authored-by: michael stack <stack@duboce.com>
Refactor. Replace S3Cp with S3Client (S3Cp is too limiting of a name)
Found by ASAN s3_backup_tests
restartShardTrackers will invalidate `it->range()` used in the trace events.
Fix heap-use-after-free bugs
…ty_error

DBRecoveryDurability errors with version vector
apple#11815)

is based over "TagData::popped" to decide how long to keep the disk
queue positions of versions in memory (instead of using the logic that
is based over "LogData::persistentDataVersion", which is applicable to
spill by value case).
* Implement gRPC support

* Move some CMake stuff around.

* Fix typo

* Add some test

* Add async client

* Add test for checking destroy

* [testing] Automatically discover unit-test and register as ctest

This patch adds `collect_unit_tests()` to CMake which searches over
the codebase and finds all the unit-tests written using Flow's TEST_CASE
macro and register as ctest.

The test then can be then run using ctest command or directly via Test
Explorer in VSCode.

* Fix some tests

* Use NetworkAddress

* Add another variant of call method

* Add a failed call test

* Refactor

* Cleanup shutdown

* Start working on streaming

* Implement server streaming

* Cleanup some unnecessary templating

* Cleanup some tests

* WIP Client Streaming

* WIP

* File Transfer WIP

* Remove UnitTest.h

* Take grpc addresses from command line

* startup grpc in fdbserver

* Cancel if future ref is 0

* noop

* Update some Cmake files

* Fix some build/run issues

* Review comments and remove file transfer

* Compile with gRPC present

* format

* Address review comments

* Add  assert

* fix FLOW_GRPC_ENABLED flag

* include grpc/proto headers for generated files

* fix arm build not finding generated proto

* add debug message for protobuf generation

* add generated dir again

* add check for protoc compiler
* Rename error variable in go tests to err apple#8829

renamed the variables from e to err as mentioned in the https://go.dev/doc/effective_go

* Update packaging/docker/samples/golang/app/main.go

* fixed accidental renames, renamed file from directoryLayer.go to directory_layer.go, to work on go formatting

* Rename error variable in go tests to err apple#8829

renamed the variables from e to err as mentioned in the https://go.dev/doc/effective_go

* fixed accidental renames, renamed file from directoryLayer.go to directory_layer.go, to work on go formatting

* Update packaging/docker/samples/golang/app/main.go

* renamed directoryPartition.go -> directory_partition.go and directorySubspace.go -> directory_subspace.go

* updated: comments in files that were renamed, fixed accidental rename in bindings/go/src/fdb/fdb.go

* Update doc.go

Removed unintentional whitespaces due to formatter

* Update get_encryption_keys.go

removed unintentional whitespaces in get_encryption_keys.go

* removed accidental whitespaces in get_encryption_keys.go

* fixed few minor issues while renaming variable

* updated: minor tweaks, typos

* updated: CMakeLists

* removed: named return value

* handling nil Pointer exception

---------

Co-authored-by: Vishesh Yadav <vishesh3y@gmail.com>
…cpp on appleclang (apple#11830)

* On both Apple clang version 15.0.0 (clang-1500.1.0.2.5) and 16.0.0,
compile fails here:

/Users/stack/checkouts/fdb/foundationdb/fdbserver/workloads/ClogRemoteTLog.actor.cpp:254:67: error: invalid operands to binary expression ('Standalone<StringRef>' and 'Optional<Standalone<StringRef>>')
                        if (ssi.locality.dcId().present() && ssi.locality.dcId().get() == g_simulator->remoteDcId)

* Update fdbserver/workloads/ClogRemoteTLog.actor.cpp

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>

* Formatting

---------

Co-authored-by: stack <stack@duboce.com>
Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>
* Fix issues with clang 19

* Fix format

* Ignore --undefined-version for gcc
dlambrig and others added 25 commits March 13, 2025 18:15
…VECTOR_TLOG_UNICAST is T (apple#12021)

* ENABLE_VERSION_VECTOR_REPLY_RECOVERY can be T only if ENABLE_VERSION_VECTOR_TLOG_UNICAST is T

* Respond to review comments

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>
* change a event name

* add bulkload task count to fdbcli

* nit
`getFuture()` should be called before post as `send`/`sendError`
operation in `ThreadReturnPromise` moves the underlying Promise to
`tagAndForward()`.

Ideally, `ThreadReturnPromise` behavior should stay consistent with the
`Promise`. However, the problem is that it relies on the invariant that
there will always be one owner of its internal `Promise` which is either
itself or `tagOrForward`  -- which is necessary to ensure that only one
thread can operate on the Promise's internal state (ref count, flags
etc) and avoid race conditions.

This patch (1) makes sure that in case of `post()` function we get
future before, (2) adds an ASSERT as this should never happen, (3)
documentation for future users and (4) a test case for potentially
fixing this in future.
* make bulkload job manager logic clear

* bypass task if the task has been completed

* improve scheduleBulkLoadJob
* add level for DDBulkLoad except for datadistribution

* nits
* Add a bulkload user guide

* Forgot to add a file

* Address review comments

---------

Co-authored-by: stack <stack@duboce.com>
This patch adds TLS support for GrpcServer and AsyncGrpcClient by
implementing `GrpcCredentialsProvider` and using that to get channel
credentials. It adds `FlowGrpc` which is a flow global instance, and
initializes TLS credentials that are consistent with the ones provided
to FlowTransport.

- Added `FlowGrpc` to manage gRPC server initialization and TLS
  configuration globally.
- `GrpcCredentialsProvider` abstracts secure/insecure communications
  configurations for server/clients.
- Introduced `GrpcTlsCredentialProvider` for dynamic TLS certificate
  reloading from filesystem and `GrpcTlsCredentialStaticProvider` for
  static in-memory credentials.
- Updated `GrpcServer` to accept a `GrpcCredentialProvider`, enabling
  dynamic TLS credential management.
- Modified `fdbserver` to use `FlowGrpc::init()` for gRPC server
  initialization instead of `GrpcServer::initInstance()`, aligning it
  with FlowTransport behavior.
- Modified `GrpcServer::run()` to use the provided
  `GrpcCredentialProvider` instead of hardcoded insecure credentials.

Testing:
- Implemented a basic mTLS test case (`/fdbrpc/grpc/basic_tls`) to
  verify secure gRPC connections using
  `GrpcTlsCredentialStaticProvider`.

Todo:
- Generate certificates during testruns instead statically.
- Add test for `GrpcTlsCredentialProvider` which reads keys/certs from
  filesystem and monitors changes.
- Verify peers rules/criterias like FDB --verify-peer feature.
Found by simulation:
seed:  -f tests/slow/ApiCorrectnessAtomicRestore.toml -s 177856328 -b on
Commit: 51ad842
Compiler: clang++
Env: Rhel9 okteto

applyMutations() has processed version 801400000-803141392, and before calling sendCommitTransactionRequest(),
which was going to update apply begin version to 803141392. But DID NOT wait for the transaction commit.

Then there is an update on the apply end version to 845345760, which picks up the PREVIOUS apply begin version 801400000.
Thus started another applyMutation() with version range 801400000-845345760. Note because previous
applyMutation() has finished and didn't wait for the transaction commit, thus the starting version
is wrong. As a result, this applyMutation() re-processed version range 801400000-803141392.

The test failed during re-processing, because mutations are missing for the overlapped range.

The fix is to wait for the transaction to commit in sendCommitTransactionRequest().

This bug probably affects DR as well.

See rdar://146877552

20250317-162835-jzhou-ff4c4d6d7c51bfed
Previously I used arena size for accouting, which has problems such as arena
memory usage changes and arena memory block circular reference. And we are
acquiring flow lock multiple times, even though the cursor has already fetched
data into memory.

This change simplifies the accounting to just use message sizes, reducing the
number of flow lock acquisitions.
Avoid calling take() when the number of bytes is 0 and add an overhead factor
for memory usage of a mutation.
20250318-000905-jzhou-f92206b417dfabb7
lock->release(toRelease) can trigger assertion failure that release size is larger
than the active size. This is because lock->take() has not returned yet. So the
memory released has not been reserved. Fix by breaking release into two steps.
After the first release(), the lock->take() will be unblocked, thus the second
release() does the correct accounting.

gcc correctness:
20250319-002950-jzhou-a9ecc09dd1c8812a
* Fix backup worker assertion failure on memory usage

pullAsyncData() can be cancelled thus not take() some of the memory used by
the in memory queue.

20250320-171737-jzhou-752fd8c45fdadd4a

100k tests/slow/ParallelRestoreNewBackupCorrectnessAtomicOp.toml
20250320-172251-jzhou-3bc7db8e7ce5e1de

* Refactor such that messages in the queue have to reserve memory first

Thus, when we release the memory, it must have already been reserved.

100k ParallelRestoreNewBackupCorrectnessAtomicOp.toml
20250320-234748-jzhou-a159972dd0a72e03

20250320-235252-jzhou-54a6ae8c67873b59

* Fix a backup worker assertion failure

The pop version obtained from GRV proxy replies may go backwards, which can
cause assertion failure when pulling mutations, i.e., missing mutations. Fix
this bug and add an assertion that popVersion can go lower.

100k 20250321-012019-jzhou-01eb56938cf0a3fc

100k tests/slow/ParallelRestoreNewBackupCorrectnessAtomicOp.toml
20250321-012144-jzhou-68181bb51aedbb5d

* Fix assertion failure of triggered version

The triggered version could be the same as popVersion.

20250321-025429-jzhou-a088c3f119331b93

100k tests/slow/ParallelRestoreNewBackupCorrectnessAtomicOp.toml
20250321-025532-jzhou-77e2dbd5fd035157

* Update fdbserver/BackupWorker.actor.cpp

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>

---------

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>
)

* Disable version vector unicast with idempotent transactions

* Respond to review requests

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>
…ck Type (apple#12047)

* fix range lock

* make bulkload workload correct

* fix bugs and improve test coverage

* nits

* address comments

* nits

* address comments
@MarkSh1 MarkSh1 requested a review from oleg68 March 31, 2025 11:18
@@ -163,16 +178,16 @@ if(WIN32)
# properly for config mode. So we use the old way on Windows
# find_package(Boost 1.72.0 EXACT QUIET REQUIRED CONFIG PATHS ${BOOST_HINT_PATHS})
# I think depending on the cmake version this will cause weird warnings
find_package(Boost 1.78 COMPONENTS filesystem iostreams serialization system program_options)
find_package(Boost 1.86 COMPONENTS filesystem iostreams serialization system program_options url)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of places where the boost version is defined. It's a time to define it in one place and use everywhere.

@MarkSh1 MarkSh1 merged commit e1a9789 into owtech:ow-fork-main Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.