Flytekit Rust entrypoint #2307

kumare3 · 2024-03-29T05:44:09Z

Tracking issue

Faster performance and faster startup: limited testing shows a speedup of more than 3.8x.
Smaller footprint: our goal is to remove the Python object-store dependencies and replace them with Rust.
Also remove the dependency on gRPC in Python and replace it with Rust.

This also introduces Rust more holistically in flytekit.


codecov · 2024-03-29T05:48:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.71%. Comparing base (d32ce8f) to head (d34ce56).
Report is 27 commits behind head on master.

❗ Current head d34ce56 differs from pull request most recent head 06752b3. Consider uploading reports for the commit 06752b3 to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #2307       +/-   ##
===========================================
+ Coverage   75.95%   95.71%   +19.75%     
===========================================
  Files         181       19      -162     
  Lines       18295      560    -17735     
  Branches     3788        0     -3788     
===========================================
- Hits        13896      536    -13360     
+ Misses       3807       24     -3783     
+ Partials      592        0      -592



vlad-ivanov-name · 2024-04-04T10:03:38Z

src/distribution.rs

+
+#[logfn_inputs(Info, fmt = "Downloading distribution from {} to {}")]
+#[logfn(ok = "INFO", err = "ERROR")]
+pub async fn download_unarchive_distribution(src: &Url, dst: &String) -> Result<(), Box<dyn std::error::Error>> {


Consider using https://docs.rs/tracing/latest/tracing/ framework -- it's considered more modern and is more flexible in terms of what type of data can be collected and how it can be processed. If you'd like, I can share a minimal example of tracing setup for CLI output.

@vlad-ivanov-name can you check

vlad-ivanov-name · 2024-04-04T10:05:34Z

flyrs_test/test.sh

+export PYTHONPATH=`pwd`:$PYTHONPATH
+
+TESTDATA_PATH=`pwd`/testdata
+
+../target/debug/flyrs --inputs ${TESTDATA_PATH}/inputs.pb --output-prefix ${TESTDATA_PATH} --raw-output-data-prefix ${TESTDATA_PATH} --dynamic-addl-distro file://${TESTDATA_PATH}/schedule.tar.gz --dynamic-dest-dir . --resolver "flytekit.core.python_auto_container.default_task_resolver" -- task-module schedule task-name say_hello
+
+cmp testdata/outputs.pb testdata/expected_outputs.pb || echo -e "----------- Outputs file comparision failed! ----------"


Consider writing a test inside the Rust code for this use case instead. Rust has excellent testing support, so it's as simple as annotating a function.

https://doc.rust-lang.org/book/ch11-01-writing-tests.html

hmm i also wanted to test the cli, but let me learn
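As a rough illustration of the suggestion above: a unit test in Rust is just a function annotated with #[test]. The helper below, parse_resolver_args, is hypothetical (it is not taken from the PR) and only pairs up trailing arguments such as task-module schedule task-name say_hello.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not from the PR): turns a flat list such as
/// ["task-module", "schedule", "task-name", "say_hello"] into key/value pairs.
pub fn parse_resolver_args(args: &[&str]) -> Result<HashMap<String, String>, String> {
    if args.len() % 2 != 0 {
        return Err(format!("expected key/value pairs, got {} arguments", args.len()));
    }
    Ok(args
        .chunks(2)
        .map(|pair| (pair[0].to_string(), pair[1].to_string()))
        .collect())
}

// Compiled only for `cargo test`; each #[test] function is one test case.
#[cfg(test)]
mod parse_resolver_args_tests {
    use super::*;

    #[test]
    fn parses_pairs() {
        let parsed =
            parse_resolver_args(&["task-module", "schedule", "task-name", "say_hello"]).unwrap();
        assert_eq!(parsed["task-module"], "schedule");
        assert_eq!(parsed["task-name"], "say_hello");
    }

    #[test]
    fn rejects_odd_number_of_arguments() {
        assert!(parse_resolver_args(&["task-module"]).is_err());
    }
}
```

Running cargo test picks up both functions. For the CLI itself, one option is an integration test under tests/ that spawns the built binary with std::process::Command and compares the output files.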

vlad-ivanov-name · 2024-04-04T10:08:40Z

src/executor.rs

+
+impl Display for ExecutorArgs {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        write!(f, "ExecutorArgs {{ inputs: {}, output_prefix: {}, test: {}, raw_output_data_prefix: {}, resolver: {}, resolver_args: {:?}, checkpoint_path: {:?}, prev_checkpoint: {:?}, dynamic_addl_distro: {:?}, dynamic_dest_dir: {:?} }}",


The notation you've written here is more or less what the Debug derive outputs. So you can, for example, derive Debug on the struct and write write!(f, "{:?}", self)
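A minimal sketch of that idea, assuming a trimmed-down ExecutorArgs (the three fields below are illustrative, not the PR's full struct): Display simply delegates to the derived Debug implementation.

```rust
use std::fmt::{self, Display, Formatter};

/// Trimmed-down stand-in for the PR's ExecutorArgs; the field list is illustrative only.
#[derive(Debug)]
pub struct ExecutorArgs {
    pub inputs: String,
    pub output_prefix: String,
    pub checkpoint_path: Option<String>,
}

impl Display for ExecutorArgs {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Delegate to the derived Debug output instead of listing every field by hand.
        write!(f, "{:?}", self)
    }
}
```

One difference from the hand-written version: Debug quotes string fields, so inputs prints as "inputs.pb" rather than inputs.pb.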

vlad-ivanov-name · 2024-04-04T10:23:21Z

src/executor.rs

+
+#[logfn_inputs(Info, fmt = "Invoking task with {}")]
+#[logfn(ok = "INFO", err = "ERROR")]
+pub async fn execute_task(args: &ExecutorArgs) -> Result<(), Box<dyn std::error::Error>>{


consider using https://docs.rs/anyhow/latest/anyhow/ or https://docs.rs/miette/latest/miette/ as error types: they provide useful facilities such as backtrace, additional error context, and user-friendly error messages

vlad-ivanov-name · 2024-04-04T10:24:55Z

src/executor.rs

+        if executor_args.dynamic_dest_dir.is_none() {
+            return Err("Dynamic distro requires a destination directory".into());
+        }
+        let src_url = url::Url::parse(executor_args.dynamic_addl_distro.clone().unwrap().as_str())?;


same about unwrap -- you can use a match expression
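One way the match could look, as a sketch: the struct below carries only the two fields discussed here, and dynamic_distro is a hypothetical helper name, not something from the PR. URL parsing is left out so the example needs only the standard library.

```rust
use std::error::Error;

/// Illustrative subset of the PR's ExecutorArgs (only the two fields discussed here).
pub struct ExecutorArgs {
    pub dynamic_addl_distro: Option<String>,
    pub dynamic_dest_dir: Option<String>,
}

/// Hypothetical helper: returns (source, destination) when a dynamic distro is configured,
/// None when it is not, and an error when the destination directory is missing.
pub fn dynamic_distro(args: &ExecutorArgs) -> Result<Option<(&str, &str)>, Box<dyn Error>> {
    // Matching on both options at once removes the need for is_none() + unwrap().
    match (args.dynamic_addl_distro.as_deref(), args.dynamic_dest_dir.as_deref()) {
        (Some(src), Some(dst)) => Ok(Some((src, dst))),
        (Some(_), None) => Err("Dynamic distro requires a destination directory".into()),
        (None, _) => Ok(None),
    }
}
```

Because every combination is handled in the match, the compiler checks that no case is forgotten, which the is_none() check followed by unwrap() cannot guarantee.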

kumare3 · 2024-04-04T17:36:05Z

@vlad-ivanov-name thank you for all the comments. As you can tell, I am a complete newbie and I will definitely try to adopt all your suggestions. If you think you can take over the PR and improve it, please feel free to :) I would love to hand it over


vlad-ivanov-name

👍 looks good overall, left a few comments

vlad-ivanov-name · 2024-05-10T08:10:00Z

rust/flyrs/src/distribution.rs

+    let store = store_box.0;
+
+    let src_path = Path::parse(src.path())?;
+    let tar_gz_stream = store.get(&src_path).await.unwrap();


you probably still want to use ? here:

Suggested change

- let tar_gz_stream = store.get(&src_path).await.unwrap();
+ let tar_gz_stream = store.get(&src_path).await?;
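To show the difference in isolation, here is a small synchronous sketch using only the standard library. read_archive and FailingReader are hypothetical names standing in for the download step; the PR's code is async and uses an object store instead.

```rust
use std::error::Error;
use std::io::Read;

/// Hypothetical stand-in for the download step: drains any reader into memory.
pub fn read_archive(mut source: impl Read) -> Result<Vec<u8>, Box<dyn Error>> {
    let mut data = Vec::new();
    // `?` returns early with the error (converted into Box<dyn Error>),
    // where `unwrap()` would panic instead.
    source.read_to_end(&mut data)?;
    Ok(data)
}

/// A reader that always fails, used to exercise the error path.
pub struct FailingReader;

impl Read for FailingReader {
    fn read(&mut self, _buf: &mut [u8]) -> std::io::Result<usize> {
        Err(std::io::Error::new(std::io::ErrorKind::Other, "connection reset"))
    }
}
```

With ?, a failed read reaches the caller as an Err value that can be logged or retried; with unwrap(), the same failure would panic inside the function.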

vlad-ivanov-name · 2024-05-10T08:10:20Z

rust/flyrs/src/distribution.rs

+    let tar_gz_stream = store.get(&src_path).await.unwrap();
+
+    // TODO figure out how to stream unarchive the tar.gz file
+    let tar_gz_data = tar_gz_stream.bytes().await.unwrap();


same

Suggested change

- let tar_gz_data = tar_gz_stream.bytes().await.unwrap();
+ let tar_gz_data = tar_gz_stream.bytes().await?;

vlad-ivanov-name · 2024-05-10T08:14:38Z

rust/flyrs/src/executor.rs

+        debug!("Python path: {:?}", path);
+        debug!("Python version: {:?}", version);
+        debug!("Python modules: {:?}", keys);


there's a better syntax you can use here

Suggested change

- debug!("Python path: {:?}", path);
- debug!("Python version: {:?}", version);
- debug!("Python modules: {:?}", keys);
+ debug!(
+     path = ?path,
+     version = ?version,
+     modules = ?keys,
+     "debug python setup"
+ );

what this achieves is structured tracing: for backends that support it, you won't just log a string but rather a map of fields. it's really useful when e.g. analyzing traces in the cloud

Wow you are indeed making me a better rust programmer - thank you appreciate it

vlad-ivanov-name · 2024-05-10T08:16:37Z

rust/flyrs/src/executor.rs

+
+fn debug_python_setup(py: Python) {
+    if tracing::enabled!(tracing::Level::DEBUG) {
+        let sys = PyModule::import_bound(py, "sys").unwrap();


instead of unwrap you can create another function that returns anyhow::Result, and here you can write something like

if let Ok((path, version, keys)) = get_python_info() { /* log it */ } else { tracing::error!("failed to get python info") }
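The shape of that refactoring can be sketched with the standard library alone. Everything here is an assumption made for illustration: get_runtime_info stands in for the suggested get_python_info, it reads from a plain key/value slice instead of the Python interpreter, and the caller returns a String where the real code would call the tracing macros.

```rust
use std::error::Error;

/// Hypothetical stand-in for the suggested get_python_info(): collects three values
/// and fails as a whole if any single lookup fails.
pub fn get_runtime_info(
    vars: &[(&str, &str)],
) -> Result<(String, String, Vec<String>), Box<dyn Error>> {
    let lookup = |key: &str| -> Result<String, Box<dyn Error>> {
        vars.iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| v.to_string())
            .ok_or_else(|| format!("missing key: {key}").into())
    };
    let path = lookup("path")?;
    let version = lookup("version")?;
    let modules = lookup("modules")?.split(',').map(str::to_string).collect();
    Ok((path, version, modules))
}

/// The caller never panics: it either reports the values or reports a single failure.
pub fn debug_runtime_setup(vars: &[(&str, &str)]) -> String {
    if let Ok((path, version, modules)) = get_runtime_info(vars) {
        format!("path={path:?} version={version:?} modules={modules:?}")
    } else {
        "failed to get runtime info".to_string()
    }
}
```

All fallible steps live in one function that uses ?, so the debug-only caller has a single place to decide what a failure means.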

vlad-ivanov-name · 2024-05-10T08:17:06Z

rust/flyrs/src/executor.rs

+    pyo3::prepare_freethreaded_python();
+    let _ = Python::with_gil(|py| -> Result<()> {
+        debug_python_setup(py);
+        let entrypoint = PyModule::import_bound(py, "flytekit.bin.entrypoint").unwrap();


since you already return a Result, the ? operator would probably work here instead of unwrap

vlad-ivanov-name · 2024-05-10T08:19:07Z

rust/flyrs/src/distribution.rs

+
+
+#[tracing::instrument(err)]
+pub async fn download_unarchive_distribution(src: &Url, dst: &String) -> Result<()> {


the dst type should probably be &str
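A short sketch of why &str is usually the better parameter type. unpack_target is a hypothetical helper, not part of the PR.

```rust
/// Hypothetical helper (not from the PR): builds the path a file would be unpacked to.
/// Taking `&str` lets callers pass string literals, `&String`, or slices alike.
pub fn unpack_target(dst: &str, file_name: &str) -> String {
    format!("{}/{}", dst.trim_end_matches('/'), file_name)
}
```

A &String argument is coerced to &str automatically, so existing callers such as unpack_target(&owned_string, "schedule.py") keep compiling, while a &String parameter would force callers holding a literal to allocate a String first.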

kdubovikov · 2024-08-16T13:55:44Z

@kumare3 hi. Wanted to check when this is planned to be merged?

kumare3 · 2024-08-17T18:35:49Z

This needs more work, especially on tooling, so it is not yet planned, but definitely later this year

kdubovikov · 2024-08-19T14:03:31Z

> This needs more work, especially on tooling, so it is not yet planned, but definitely later this year

@kumare3 is there anything that outside contributors can help with so that the release of this feature will be faster? I have some experience working with pyo3

austin362667 · 2024-08-19T14:27:05Z

Hi @kdubovikov thanks for your interest. I'm an outside contributor too.

Recently I've been working on removing the Python dependencies on grpc and protobuf and replacing them with Rust. You can check the ongoing work here for flyte and flytekit. It's almost done; it's not an easy task, though.

The Flytekit Rust entrypoint is also a challenging task with significant potential for performance gains. Perhaps we can figure out the blockers and share some pitfalls with each other. wdyt?

kdubovikov · 2024-08-20T07:23:01Z

@austin362667, @kumare3, hi. Just a surface-level thought on the overall design. So, Flyte uses gRPC as its default transport, most likely because it is written in Go. Then we inherited this transport in the Python entrypoint and flytekit. Afterward, the community realized that gRPC clients in Python are prone to performance and stability issues in our specific use cases. So the proposed solution is to create native gRPC clients in Rust and wrap them as Python packages with PyO3. Am I correct in this chain of reasoning?

@austin362667 By inspecting your PRs I see what you mean when you say that the task is not easy. You seem to employ quite deep workarounds for solving ref outliving.

If I am, then I wanted to ask whether any alternative and probably simpler approaches were considered:

1. What if the Flyte API had a set of RESTful ports/endpoints in addition to the gRPC ones? Would that remove the performance issues in the entrypoint? Do we know whether the primary contributor to the slowness of the entrypoint is the gRPC client? I did not see much CPU-bound code in there. This approach would also remove the gRPC dependencies from flytekit and make it more lightweight. asyncio could be used for communication to resolve any performance issues, and if the endpoint were served over compressed HTTP/2, I suppose the difference between gRPC and REST would be minimal. Just looking at the effort needed to introduce Rust to the project, along with the additional gRPC client packaging using PyO3, I am wondering whether simpler alternatives are viable. Are there any significant drawbacks that you see?
2. Was encapsulating all communication with Flyte APIs in a sidecar considered, making Task pods lighter? There would be no need for an entrypoint this way, or at least it would be a minimal setup script.

I am not saying that the current solution is bad in any way, just asking questions from a fresh perspective.

austin362667 · 2024-08-20T08:25:50Z

@kdubovikov Exactly! That's sort of the whole context and what we are doing now.

@kumare3 EDIT: gRPC is used not because of Golang, but because it is way faster than REST and offers type safety.

Actually, I think you're almost right about your suggestion.

Refactoring gRPC into REST is another huge story, too. And yes, it would indeed remove the dependencies on Python grpcio and protobuf.

Let's say we want to stay with the gRPC architecture. It's true that the transport layer is not CPU-bound, and I've already enabled non-blocking networking by leveraging asyncio in the "run / register asynchronously" work. So performance is not an issue or bottleneck any more. Another reason to introduce Rust in the Flytekit remote client is to reduce the wheel size, thereby decreasing the Docker image pull wait time for Flytekit when starting a pod. Am I missing anything? @pingsutw

IMO, a better use case for introducing Rust is rewriting the entrypoint, just like what @kumare3 is trying here. Refactoring this component might have much more potential for performance gains.

kumare3 · 2024-08-20T08:34:51Z

gRPC is used not because of Golang, but because it is way faster than REST and offers type safety. The problem has been that Google's Python support for gRPC has not been great. All the top projects in the world today use gRPC and I still prefer it. We are looking at migrating to an HTTP/2-compliant implementation of gRPC: Connect!

austin362667 · 2024-08-20T08:55:36Z

Yeah, no doubt speed and type safety are the two most important reasons to use gRPC!

> Was encapsulating all communication with Flyte APIs in a sidecar considered, making Task pods lighter? There would be no need for an entrypoint this way, or at least it would be a minimal setup script.

I'm not sure on this one. Can you elaborate more? @kdubovikov
And wdyt? @MortalHappiness

kdubovikov · 2024-08-20T09:00:53Z

> gRPC is used not because of Golang, but because it is way faster than REST and offers type safety. The problem has been that Google's Python support for gRPC has not been great. All the top projects in the world today use gRPC and I still prefer it. We are looking at migrating to an HTTP/2-compliant implementation of gRPC: Connect!

@kumare3 understood. I can see the reasoning behind this. I prefer strong typing in remote calls as well. However, if Python is the primary client language and gRPC support there is not great, is that a solid reason for introducing another language (and quite a tricky one) to the project?

In the case of the entrypoint, why not implement it in Go, for example? Is there a drastic difference between calling the Task function from Go and using PyO3 in Rust? If Go were used, we would probably achieve the same goals while avoiding the maintenance burden of introducing a new language to the project.

And feel free to tell me if I am not making much sense, I will stop debating on this particular point :)

> Another reason to introduce Rust in the Flytekit remote client is to reduce the wheel size, thereby decreasing the Docker image pull wait time for Flytekit when starting a pod.

@austin362667 this is a good point indeed. Do we know by how much the wheel size is reduced?

MortalHappiness · 2024-08-20T09:07:30Z

> Was encapsulating all communication with Flyte APIs in a sidecar considered, making Task pods lighter? There would be no need for an entrypoint this way, or at least it would be a minimal setup script.

@kdubovikov @austin362667

Although I do not have much context on the Rust gRPC implementation, from a K8s perspective I do not think this is correct. First off, a sidecar container also runs in the same Pod as the main container, so the task pods will not be lighter. Besides, I don't think it is easier to implement as a sidecar container. As far as I know, the main ways for a sidecar and the main container to communicate are the network or shared volumes, so the sidecar container must either constantly watch for file system changes or serve as a translator between the main container and the Flyte API, capturing the network requests from the main container and translating them to gRPC. Either way, the implementation will not be easier.

Feel free to tell me if you have a better implementation for the sidecar pattern. Maybe I misunderstand something.

kdubovikov · 2024-08-20T09:55:20Z

> > Was encapsulating all communication with Flyte APIs in a sidecar considered, making Task pods lighter? There would be no need for an entrypoint this way, or at least it would be a minimal setup script.
>
> @kdubovikov @austin362667
>
> Although I do not have much context on the Rust gRPC implementation, from a K8s perspective I do not think this is correct. First off, a sidecar container also runs in the same Pod as the main container, so the task pods will not be lighter. Besides, I don't think it is easier to implement as a sidecar container. As far as I know, the main ways for a sidecar and the main container to communicate are the network or shared volumes, so the sidecar container must either constantly watch for file system changes or serve as a translator between the main container and the Flyte API, capturing the network requests from the main container and translating them to gRPC. Either way, the implementation will not be easier.
>
> Feel free to tell me if you have a better implementation for the sidecar pattern. Maybe I misunderstand something.

@MortalHappiness I meant that the Task container will get lighter and possibly won't need extra gRPC dependencies, and the sidecar can act as a bridge and use whatever implementation we consider to be faster. This way the task container won't need much apart from the user code itself, and the sidecar would take care of all of the platform functionality. However, I agree that adding a sidecar does not contribute much to entrypoint performance by itself, and the problem of communication between the sidecar and task containers would still have to be solved. The key to this issue is optimizing the entrypoint cold start time, and the sidecar is more about decoupling one from the other. So yes, let's dismiss the sidecar thread for now.

It's just that using PyO3 and Rust as a workaround for the standard Google gRPC clients in Python does not look like a common or widely accepted pattern; I have not seen it in other codebases (especially those that are not primarily in Rust), so I am asking whether everyone involved is sure that introducing Rust to the project is the best possible solution considering all tradeoffs. Maybe I am not aware of other roadmap points that will broaden the usage of Rust to different use cases as well. Again, I am not saying that using Rust for creating Python gRPC client packages or the entrypoint script is wrong in any way, just mentioning that it adds a whole new layer of maintenance in the long term.

The second use case is to use Rust for the entrypoint script, but I am not entirely sure it will be much different from using Go for the same: run task setup code and then run and monitor the Python process, communicating back to Flyte when necessary.

In short:

- Does Flyte have far-reaching plans to use Rust in the future for any cases other than the entrypoint?
- If not, why not write the entrypoint in Go, which is already the primary language of the platform and should give us performance benefits as well?

** Update **
I see the benefits of using PyO3 as follows:

- It embeds Python in the same process
- No IPC is needed (no pipes / sockets)
- Much more flexible back-and-forth communication between Rust code and Python code

Go entrypoint benefits:

- Does not introduce a new language to the project, better maintainability
- Most likely will be almost as fast as Rust, but this needs checking
- Needs IPC between the Python and entrypoint processes, but the only things I am aware of are sending the startup arguments to the task and checking for exceptions or errors during execution, which is not much

kumare3 · 2024-08-20T12:05:07Z

@kdubovikov gRPC is not used at runtime. This communication happens through the blob store.
The decision on a new language depends on what we can achieve with the new language.
The reason for Rust is the amazing benefit of interoperability with many different languages. The goal is to build a common Rust runtime that allows any language SDK.

As for the sidecar pattern - it is already supported in Flyte through raw container tasks. There are many disadvantages with it too. Check it out

kdubovikov · 2024-08-20T12:35:57Z

> @kdubovikov gRPC is not used at runtime. This communication happens through the blob store. The decision on a new language depends on what we can achieve with the new language. The reason for Rust is the amazing benefit of interoperability with many different languages. The goal is to build a common Rust runtime that allows any language SDK.
>
> As for the sidecar pattern - it is already supported in Flyte through raw container tasks. There are many disadvantages with it too. Check it out

Thanks for the context, @kumare3 . Go would not solve the interoperability for sure, if that's our goal. Then Rust it is 🦀 , closing this discussion.

My first question remains: is there anything that can be done by outside contributors to speed up the release of this feature?

kumare3 requested review from wild-endeavor, eapolinario, pingsutw and cosmicBboy as code owners March 29, 2024 05:44

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 29, 2024

kumare3 marked this pull request as draft March 29, 2024 05:44

improved logging

66ae902

Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com>

vlad-ivanov-name reviewed Apr 4, 2024


austin362667 mentioned this pull request Apr 4, 2024

[WIP] Replace Python gRPC with Rust #2328

Closed

16 tasks

wild-endeavor added 2 commits April 22, 2024 13:20

Merge remote-tracking branch 'origin/master' into refactor-entrypoint

4ec68ea

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

Refactor entrypoint more (#2371)

d34ce56

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

wild-endeavor changed the title ~~[wip] Introducing Flytekit Rust entrypoint [flyrs]~~ Flytekit Rust entrypoint Apr 22, 2024

kumare3 and others added 3 commits April 25, 2024 10:36

updated logger to tracing (#2380)

0027fe1

Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com> Co-authored-by: Ketan Umare <kumare3@users.noreply.github.com>

Added anyhow

bdf27de

Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com>

Fixed merge conflict

06752b3

Signed-off-by: Ketan Umare <kumare3@users.noreply.github.com>

vlad-ivanov-name reviewed May 10, 2024


This was referenced Jun 27, 2024

Add another Rust remote client that directly utilizes the flyteidl-rust Python bindings. #2535

Closed

Add remote client that directly utilizes the flyteidl-rust Python bindings #2536

Closed
