LLVM green commit tracking for coordination experiment #1178

Open
sstamenova opened this issue Aug 8, 2022 · 86 comments

Comments

@sstamenova

Please use #1135 for process discussion.

I'm adding this new issue so that we can track just the green LLVM commits and the corresponding commits/branches in torch-mlir, onnx-mlir and MHLO until we have a more formal process.

@sstamenova
Author

Green LLVM commit for the week of 8/1/2022: ec5def5

@sstamenova
Author

sstamenova commented Aug 8, 2022

Week of 8/8/2022:
Green LLVM commit: 061e0189a3dab6b1831a80d489ff1b15ad93aafb
Green MHLO commit: 0430519b7ebf11a3f44c469fce8b579561fa6052 (branch greencommit/2022-08-08-061e0189)
Corresponding torch-mlir commit: bb47c166a047ddbc913950cb0f2571594c0fe261
Corresponding onnx-mlir commit: 993b86b57febc42c83e13e6533cbb50fe77cf076
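The MHLO branch names in these entries appear to follow a single pattern: `greencommit/`, then the ISO date of the week, then the first eight hex digits of that week's green LLVM commit. The sketch below rebuilds a name from those two inputs. The pattern is inferred from the entries in this thread rather than from any documented tooling, and `green_branch_name` is a hypothetical helper.

```python
from datetime import date

HEX_DIGITS = set("0123456789abcdef")


def green_branch_name(week: date, llvm_commit: str) -> str:
    """Rebuild a name such as greencommit/2022-08-08-061e0189.

    Assumed pattern (inferred from this thread): ISO date of the week,
    then the first 8 hex digits of the green LLVM commit.
    """
    sha = llvm_commit.lower()
    if len(sha) < 8 or not set(sha) <= HEX_DIGITS:
        raise ValueError(f"not a hex commit hash: {llvm_commit!r}")
    return f"greencommit/{week.isoformat()}-{sha[:8]}"


print(green_branch_name(date(2022, 8, 8), "061e0189a3dab6b1831a80d489ff1b15ad93aafb"))
# greencommit/2022-08-08-061e0189
```

The same rule reproduces the later entries too, e.g. the week of 9/4/2023 with `6098d7d5...` gives `greencommit/2023-09-04-6098d7d5`.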

@ashay
Collaborator

ashay commented Aug 15, 2022

Week of 8/15/2022:
Green LLVM commit: 2dde4ba63974daf59f8ce5c346505f194f920131
Green MHLO commit: 9c49473d80a8667e94232ddb5ed60a1a9d8ad266 (branch greencommit/2022-08-15-2dde4ba6)
Corresponding Torch-MLIR commit: 84d345c650adac1645ea4dcb5dae88b4c91d3413

@ashay
Collaborator

ashay commented Aug 22, 2022

Week of 8/22/2022:
Green LLVM commit: e5d5146323ffaa13eb5185616c6ae5c36b69352d (works on Windows and Linux, but not on macOS)
Green MHLO commit: ace4030dd55fce2a74e46f71f1937feb61ed1e3f (branch greencommit/2022-08-22-e5d51463)
Corresponding onnx-mlir commit: a9767f31a933a55aaa99bbb8a6dadcee7f27ad62

@ashay
Collaborator

ashay commented Aug 29, 2022

Week of 8/29/2022:
Green LLVM commit: 00d648bdb5a8b71785269b4851b651c883de2cd9
Green MHLO commit: 305a2f25229660ea789bf70ed8e7336227f6228a (branch greencommit/2022-08-29-00d648bd)

@ashay
Collaborator

ashay commented Sep 5, 2022

Week of 9/5/2022:
Green LLVM commit: d2613d5bb5dca0624833e4747f67db6fe3236ce8
Green MHLO commit: 3bfb91e4ee44352f6620603078e2e2fc587d9a1e (branch greencommit/2022-09-05-d2613d5b)
Corresponding Torch-MLIR commit: 93f7c0ceb566587f67a8ff8b0eaacf85cbbb7376

@alextsao1999 unpinned this issue Sep 5, 2022
@ashay pinned this issue Sep 7, 2022
@ashay
Collaborator

ashay commented Sep 12, 2022

Week of 9/12/2022:
Green LLVM commit: 4d4ca6c9d036a06bf0723786112dd17e491b2f53
Green MHLO commit: 2c8256b49219b4677963ce409a004648d8972df1 (branch greencommit/2022-09-12-4d4ca6c9)
Corresponding Torch-MLIR commit: 2bb5f4d8fe1f91715716f46b9b8b895aec078021

@ashay
Collaborator

ashay commented Sep 19, 2022

Week of 9/19/2022:
Green LLVM commit: 458598ccc50c5118107f05d60f3d043772a91f26
Green MHLO commit: cd9da150e729fd046109e7962e5f63f5fe067a3b (branch greencommit/2022-09-19-458598cc)
Corresponding Torch-MLIR commit: e17fcea94ec384209df5412bb2bac71ff004942c

@hanchenye unpinned this issue Sep 21, 2022
@ashay
Collaborator

ashay commented Sep 26, 2022

Week of 9/26/2022:
Green LLVM commit: bebc96956b76bdbc36f1d82a788c810e5b12e2c5
Green MHLO commit: 7b0ecf7827e3fc07d2af90e147bcedc165bc78ac (branch greencommit/2022-09-26-bebc9695)
Corresponding Torch-MLIR commit: a60acf272d618160d210015ff7c19fa773b1f345

@ashay
Collaborator

ashay commented Oct 2, 2022

Week of 10/02/2022:
Green LLVM commit: 6f46ff3765dcdc178b9cf52ebd8c03437806798a
Green MHLO commit: 2f7c1454bbe4c4ad0ae1c86c5539ac58b6053b6a (branch greencommit/2022-10-02-6f46ff37)

qedawkins pushed a commit to nod-ai/torch-mlir that referenced this issue Oct 3, 2022
…ld in a few places. My team found this while investigating building llvm-project and onnx-mlir in parallel using a single cmake command. (llvm#1178)

I've chosen to add the overarching `MLIRIR` target rather than individual dependencies to avoid having to update these as interfaces are added/removed/moved within the MLIR IR. Below is the exact list of dependencies I found while researching this problem:

    MLIRCallInterfacesIncGen
    MLIRCastInterfacesIncGen
    MLIRDataLayoutInterfacesIncGen
    MLIROpAsmInterfaceIncGen
    MLIRRegionKindInterfaceIncGen
    MLIRSideEffectInterfacesIncGen
    MLIRSymbolInterfacesIncGen
    MLIRTensorEncodingIncGen
  ###ResultTypeInferenceOpInterface.cpp/HasOnnxSubgraphOpInterface.cpp/SpecializedKernelOpInterface.cpp
    MLIRBuiltinTypeInterfacesIncGen
    MLIRTensorEncodingIncGen

    MLIREmitCAttributesIncGen
    MLIRGPUOpsAttributesIncGen
    MLIRGPUOpsEnumsGen
    MLIROpenACCOpsIncGen
    MLIRSparseTensorAttrDefsIncGen
    MLIRTosaStructsIncGen

Signed-off-by: Ian Bearman <ianb@microsoft.com>

Co-authored-by: Stella Stamenova <stilis@microsoft.com>
@ashay
Collaborator

ashay commented Oct 10, 2022

Week of 10/10/2022:
Green LLVM commit: 438e59182b0c2e44c263f5bacc1add0e514354f8
Green MHLO commit: e6f3ec05c679e86c0a922d1c24ef488d99dfd8aa (branch greencommit/2022-10-10-438e5918)

@ashay
Collaborator

ashay commented Oct 17, 2022

Week of 10/17/2022:
Green LLVM commit: 4546397e39589f0a6a707218349d1bf65fe54645
Green MHLO commit: 4de853aa6882ae1aa262dbb983c4ee2d5022e78d (branch greencommit/2022-10-17-4546397e)

cc: @silvasean

@ashay
Collaborator

ashay commented Oct 24, 2022

Week of 10/24/2022:
Green LLVM commit: f8b8426861a7a26ff60fe085800cc338591bee41
Green MHLO commit: ec92f151ddee9c82d405558754612a001ab6cf98 (branch greencommit/2022-10-24-f8b84268)

cc: @silvasean

@ashay
Collaborator

ashay commented Oct 31, 2022

Week of 10/31/2022:
Green LLVM commit: 74fb770de9399d7258a8eda974c93610cfde698e
Green MHLO commit: 2341f70343a5361d4611557c2af9d24b01aa427e (branch greencommit/2022-10-31-74fb770d)

cc: @tanyokwok

@ashay
Collaborator

ashay commented Nov 7, 2022

Week of 11/07/2022:
Green LLVM commit: a2620e00ffa232a406de3a1d8634beeda86956fd
Green MHLO commit: 57ba12a2a1934c3c9fc3cd1580f28f0c233f41d4 (branch greencommit/2022-11-07-a2620e00)

cc: @qiuxiafei

@ashay
Collaborator

ashay commented Nov 14, 2022

Week of 11/14/2022:
Green LLVM commit: e864ac694540342d5e59f59c525c5082f2594fb8
Green MHLO commit: eab364ba2a66bd0613efb94f8a738c1c97aaee92 (branch greencommit/2022-11-14-e864ac69)

cc: @Shukla-Gaurav

@ashay
Collaborator

ashay commented Aug 14, 2023

Week of 08/14/2023:

Green LLVM commit: a3f2751f782f3cdc6ba4790488ec20163a40ac37
Green MHLO commit: 97c7e4b4506c3a2441c923e592833f45da439009 (branch greencommit/2023-08-14-a3f2751f)

cc: @silvasean

@ramiro050
Collaborator

Sean is out of office this week, so I've also taken this week's bump: #2397

@ashay
Collaborator

ashay commented Aug 21, 2023

Week of 08/21/2023:

Green LLVM commit: 91088978d712cd7b33610c59f69d87d5a39e3113
Green MHLO commit: 01402e75f8e8a95aa2cd9c267ce929095a2f2e54 (branch greencommit/2023-08-21-91088978)

cc: @tanyokwok

@ashay
Collaborator

ashay commented Aug 28, 2023

Week of 08/28/2023:

Green LLVM commit: 7ffc50708a80fdb1a2d913394b9de3892766c380
Green MHLO commit: a7fc4fec03f47733d7d04b5e5ead0ff912b50571 (branch greencommit/2023-08-28-7ffc5070)

cc: @qiuxiafei

@ashay
Collaborator

ashay commented Sep 4, 2023

Week of 09/04/2023:

Green LLVM commit: 6098d7d5f6533edb1b873107ddc1acde23b9235b
Green MHLO commit: 5a4c066815ba4358919ced9539e899dfd8dee287 (branch greencommit/2023-09-04-6098d7d5)

cc: @Shukla-Gaurav

@stellaraccident
Collaborator

stellaraccident commented Sep 6, 2023

Hi folks, I see that @ashay posted on Discord that they are no longer able to post green commits. I think we all appreciate the work that has gone into keeping the wheels on the cart: thank you.

I'd like to give the community a chance to speak up, but I don't think that, without a volunteer, we can keep the stablehlo backend enabled in its present state, given that it is tied to mlir-hlo, which is both deprecated and a very finicky dep. Can some folks speak up about the need/investment here so that we don't do something that is disconnected from expectations?

I'd be open to various pragmatic options to take us forward as needed, including any of:

  • Accept a commit that drops the mlir-hlo dep in favor of stablehlo and get a volunteer to keep it updated.
  • Disable stablehlo by default but keep it available in the codebase until the situation is sorted.
  • Disable stablehlo by default and accept a CI job which performs some level of testing to enable it and keep it green.

In any situation I can see, without some work done on the dependency and testing situation, I don't see how we can continue to keep this enabled in the default build. Please speak up if you are invested here and would like to discuss and find a path forward. I'm being a bit direct because this has been dragging on for a year, and I feel that we need to force a resolution. But I'm perfectly willing to discuss and reach a compromise on a supportable path forward.

@vivekkhandelwal1
Collaborator

> Hi folks, I see that @ashay posted on Discord that they are no longer able to post green commits. I think we all appreciate the work that has gone into keeping the wheels on the cart: thank you.
> ......
> In any situation I can see, without some work done on the dependency and testing situation, I don't see how we can continue to keep this enabled in the default build. Please speak up if you are invested here and would like to discuss and find a path forward. I'm being a bit blunt because this has been dragging on for a year, and I feel that we need to force a resolution. But I'm perfectly willing to discuss and reach a compromise on a supportable path forward.

CC: @tanyokwok @qingyunqu @Vremold

@powderluv
Collaborator

I agree we should drop the MHLO dep (given it is deprecated and we don't need black-magic green commits where MHLO works).

I know there are some users of StableHLO (not necessarily with the Linalg lowering). So leaving it off by default and without e2e tests is OK. We can add an FYI CI that builds with it ON, and once @burmako's work to get the Linalg lowering into StableHLO lands, we can potentially re-enable StableHLO e2e testing.

@powderluv
Collaborator

I will also add a RollLLVM CI that rolls LLVM nightly like we do for PyTorch, and if anything fails we can fix it up right away so we are always on top of LLVM (just like we are on PyTorch).

@sstamenova
Author

The other aspect of publishing the green commits was that they pointed to a state of LLVM where we were reasonably sure that it would work correctly on most platforms (including Windows). Then onnx-mlir was able to snap to the same commits and keep torch-mlir and onnx-mlir in sync for those that depend on both.

I am not familiar with the RollLLVM CI, would that help in this situation?

@ashay
Collaborator

ashay commented Sep 7, 2023

Thanks for initiating this discussion. For what it's worth, we don't use MHLO at work, so my opinion (understandably) carries less weight than someone who depends on MHLO.

In the short term, I'd be in favor of not building the MHLO backend (and tests) in the Torch-MLIR CI. I don't think it would help to run a separate CI with the MHLO backend enabled, since we wouldn't have a way to fix problems until we rebase MHLO/StableHlo to the same LLVM commit chosen by Torch-MLIR (which we can't do without creating a branch/fork of MHLO/StableHlo).

My (likely very naive) understanding is that if we drop the MHLO/StableHlo integration, there would be no way to lower PyTorch models to StableHlo (please correct me if I am wrong), which would be sad. If that's correct, I can maintain a fork of StableHlo that periodically rebases upstream StableHlo to the chosen LLVM green commit. Assuming that we don't build Torch-MLIR CI with MHLO enabled, it would not be a problem that Torch-MLIR's submodule of StableHlo would use a lagging LLVM commit.

Of course, that doesn't resolve the problem of picking the LLVM commit hash. The RollLLVM CI sounds nice, but unless ONNX-MLIR uses the same LLVM commit, we'd be risking a similar fragmentation between Torch-MLIR and ONNX-MLIR as we currently have between Torch-MLIR and MHLO/StableHlo, causing headaches for consumers of both Torch-MLIR and ONNX-MLIR (which was the reason we started the coordination experiment).

For the longer term, I wonder if we could kick this upstream and have the LLVM project produce periodic green commits that downstream projects sync on. My (arguably naive) opinion is that unless dependent projects (that build their dependencies from source) use the same commit, there's no way to guarantee problem-free builds.

@stellaraccident
Collaborator

I suspect that we need to switch to a rolling commit branch where we land fixes to API breaks and make automatic LLVM bumps. That way everyone can use the branch as a source of patches to sync to their needs vs having one clock that is impossible to synchronize.

Specific constraints on a version really need to belong to the consumer in this kind of situation. The upstream project can "keep the train moving and tested" so that others can grab what they need, but it should really only have two buttons: pause-and-fix-for-breakage and move-forward.

I've been running an experiment for a couple of months and, leaving aside testing infra deps and all of the gunk that mlir-hlo pulls in, the core of the project is quite tolerant to version skew, and I think we need to make that go faster and patch more incrementally -- not slow it down to the lowest common denominator. Having more sync points is really the only way to meet everyone and stay patched (also, fixing things around actual breakages at the granularity of ~a few upstream patches is the most effective way to stay up to date).

(as usual, I'm expressing a strong opinion to avoid ambiguity but am open to discuss it more)
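The "pause-and-fix-for-breakage" and "move-forward" buttons described above can be sketched as a small loop. This is a minimal illustration, not an existing tool: `roll_forward` is hypothetical, and `builds_green` stands in for whatever builds and tests torch-mlir against a candidate LLVM commit.

```python
from typing import Callable, Iterable, Optional, Tuple


def roll_forward(
    pinned: str,
    upstream_commits: Iterable[str],
    builds_green: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Advance the pinned LLVM commit through newer upstream commits, oldest first.

    Returns (new_pin, first_breaking_commit); the second item is None when
    every candidate stayed green.
    """
    for commit in upstream_commits:
        if not builds_green(commit):
            # "pause-and-fix-for-breakage": keep the last green pin and report the break.
            return pinned, commit
        # "move-forward": this commit is green, so it becomes the new pin.
        pinned = commit
    return pinned, None


# Usage with placeholder hashes and a fake check that fails on "ccc".
print(roll_forward("aaa", ["bbb", "ccc", "ddd"], lambda c: c != "ccc"))
# ('bbb', 'ccc')
```

Stopping at the first red commit keeps each fix scoped to roughly one upstream change, which is the granularity argued for above.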

@stellaraccident
Collaborator

I believe something like what I am describing above is what @powderluv is referring to as a RollLLVMCI. I don't think it exists at the moment but could relatively easily.

What I would like to see is for that CI to roll forward a branch that we keep patched when it breaks and then periodically merge back to main. Then anyone can do a roll-up of the project at the commit that they need. When I've run this experiment in the past, I had the branch move forward only to mlir/ impacting changes, which is relatively low rate. ccache makes smoketest builds of that pretty fast.
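Moving the branch forward only on mlir/-impacting changes amounts to filtering upstream commits by the paths they touch. git can do this directly with a pathspec (`git log --format=%H OLD..NEW -- mlir/`); the sketch below shows the same filter on already-collected `(sha, changed paths)` pairs, such as those parsed from `git log --format=%H --name-only`. `mlir_impacting` is a hypothetical helper and the history is made up.

```python
from typing import Iterable, List, Sequence, Tuple


def mlir_impacting(
    commits: Iterable[Tuple[str, Sequence[str]]],
    prefixes: Tuple[str, ...] = ("mlir/",),
) -> List[str]:
    """Return the hashes of commits that touch at least one path under `prefixes`."""
    return [
        sha
        for sha, paths in commits
        if any(path.startswith(prefixes) for path in paths)
    ]


# Made-up history with placeholder hashes.
history = [
    ("c1", ["mlir/lib/IR/Builders.cpp"]),
    ("c2", ["clang/lib/Sema/SemaExpr.cpp"]),
    ("c3", ["llvm/CMakeLists.txt", "mlir/include/mlir/IR/Value.h"]),
]
print(mlir_impacting(history))  # ['c1', 'c3']
```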

@sstamenova
Author

> I believe something like what I am describing above is what @powderluv is referring to as a RollLLVMCI. I don't think it exists at the moment but could relatively easily.

> What I would like to see is for that CI to roll forward a branch that we keep patched when it breaks and then periodically merge back to main. Then anyone can do a roll-up of the project at the commit that they need. When I've run this experiment in the past, I had the branch move forward only to mlir/ impacting changes, which is relatively low rate. ccache makes smoketest builds of that pretty fast.

If automation like that is fairly easy to setup and fixes end up being cheaper, it seems like that's the way forward. How complex would it be to make sure it runs on all platforms of interest (e.g. windows, linux, mac) and to replicate elsewhere (e.g. onnx-mlir)?

@stellaraccident
Collaborator

> If automation like that is fairly easy to setup and fixes end up being cheaper, it seems like that's the way forward. How complex would it be to make sure it runs on all platforms of interest (e.g. windows, linux, mac) and to replicate elsewhere (e.g. onnx-mlir)?

When I've done this in the past, I've just had each platform be a normal GitHub Actions workflow that either turns green or doesn't and is owned by someone with enough stake in that platform being green for such a project. In my experience there, the issue is not complexity but cost: Windows runners are an order of magnitude more costly than anything else, and they need to be funded by someone who has the need and is willing to pay. Technically, macOS runners aren't cheap either, but that community seems to get by with little armies of Mac minis toiling away in basements.

I expect that the automation part of this can largely be on the order of ~a python script with some knobs.

@qingyunqu
Collaborator

qingyunqu commented Sep 12, 2023

From the point of view of Stablehlo users, first of all, we fully agree to remove the mhlo dependency. Regarding the timing of removing mhlo, we prefer to wait for @burmako to migrate Stablehlo-To-Linalg from IREE to the Stablehlo repository (#2177). But if we want to remove the mhlo dependency before this migration is complete, we can accept temporarily shutting down the Stablehlo e2e tests.
Secondly, regarding green commits, we agree with @ashay: we want to push LLVM upstream to provide green commits. For torch-mlir users, upgrading LLVM every day may be unacceptable. BTW, our repo that integrates torch-mlir is here: https://github.com/bytedance/byteir/tree/main/frontends/torch-frontend

@Vremold
Collaborator

Vremold commented Sep 12, 2023

Basically, I agree with @qingyunqu. One thing I'm concerned about is that disabling the stablehlo e2e tests might make it harder to merge relevant features into torch-mlir. So we hope the migration of the Stablehlo-To-Linalg pass from IREE to Stablehlo can be finished as soon as possible.

@eric-k256
Collaborator

I think I follow what @stellaraccident is describing and it seems reasonable to me. How would we determine when to merge the rolling branch into 'main'? Is it time based, or as needed/desired? Those merges should be low effort, just trying to form a mental model of the flow.

One feature that would be nice would be to include more information when the RollLLVMCI fails. I get the RollPyTorch action failure notifications, but not a lot of information is in the mail. To get more, I need to read the CI log. A summary of the error messages or similar would be convenient, and might help get the right person to act on the failures quickly. Not a blocker, just a nice-to-have after watching what's happening with the LLVM move to GitHub PR notifications.
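One way to get the suggested summary is to scan the CI log for well-known failure markers before sending the notification. The sketch below assumes three marker formats: compiler diagnostics (`file:line:col: error: ...`), ninja (`FAILED: target`) and lit (`FAIL: suite :: test`). The real RollLLVM/RollPyTorch log layout may differ, and `summarize_failures` is a hypothetical helper.

```python
import re
from typing import List

# Assumed failure markers (adjust to the actual CI log format). "XFAIL:" is not
# matched, because \b needs a word boundary right before "FAIL".
_FAILURE = re.compile(r": error: |\bFAILED: |\bFAIL: ")


def summarize_failures(log_text: str, limit: int = 20) -> List[str]:
    """Return up to `limit` distinct failure lines, in order of first appearance."""
    seen = set()
    summary = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if _FAILURE.search(line) and line not in seen:
            seen.add(line)
            summary.append(line)
            if len(summary) >= limit:
                break
    return summary


log = (
    "[1/2] Building\n"
    "FAILED: lib/foo.o\n"
    "/src/foo.cpp:3:1: error: unknown type name 'X'\n"
    "XFAIL: suite :: a.mlir\n"
    "FAIL: suite :: b.mlir (1 of 2)\n"
)
for line in summarize_failures(log):
    print(line)
```

Capping the summary with `limit` keeps the notification short when one broken header cascades into hundreds of errors.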

@stellaraccident
Collaborator

stellaraccident commented Sep 12, 2023

I think the thing we (nod+IREE) are balancing is how much to keep the current state of torch-mlir intact vs starting a fork specifically for pytorch2/dynamo that ejects the legacy and sets up the next major version.

Having spent some time getting reacquainted, I don't think there is a direct line between the current state and where we need to be (and in fact are already for some of our leading edge downstream work): all of the old torchscript stuff, the jit importer, native pytorch dep, shape inference infra, module builder IR/APIs, and several of the other things go away with pytorch2. What is left is the torch dialect infra and conversion pipelines. The rest is provided by direct Python interop with frontends.

This is pretty in line with the long-term vision that Sean laid out, except that now seems to be the time to go down this path for real.

Given that there is clearly utility and people relying on the existing setup, I am thinking that we just let people with a dependency on the pt1 world to stay on main while we take a stripped down dynamo branch forward in line with the long term roadmap. Once the dynamo branch gets far enough, existing users can choose to migrate without going through the disruption of an in place migration and intentional removal of the current infra.

This would let those of us working on the limited subset that exists in the pt2 world to run at a higher rate and disconnected from the stability needs of the main branch. We can likely keep these two in sync and mergable for a while so as to avoid duplicate patching. In addition, the patch history of the dynamo branch can be used to make it easier to keep the main branch updated, since I assume we will be running that one faster.

Happy to have a bigger discussion, but I just wanted to signal that based on what I see, the project will need to break a lot of compatibility to upgrade cleanly to pytorch2, and if we accept that and branch, that also fixes some of the conflicting integrate needs in the short term.

I expect that it will be a few months of work to fully upgrade to pytorch2 and be ready for existing users who prefer a more stable current version to think about migrating. We can build out a lighter weight CI and testing integration as part of the new work (since the integration surface area and testing needs are drastically simplified).

@stellaraccident
Collaborator

> Basically, I agree with @qingyunqu. One thing I'm concerned is that disabling stablehlo e2e tests might cause trouble merging relevant features into torch-mlir. So we hope the migration of Stablehlo-To-Linalg pass from IREE to the Stablehlo can be finished as soon as possible.

Just to be clear: I have no visibility into whether any of this can or will happen, and since the project goals I work on don't require it, I would request that people who do have such a need be proactive about pushing to get this done (or finding an alternative way to convince ourselves that it is tested adequately).

@ashay
Collaborator

ashay commented Sep 12, 2023

> What I would like to see is for that CI to roll forward a branch that we keep patched when it breaks and then periodically merge back to main. Then anyone can do a roll-up of the project at the commit that they need.

Thanks, I think I am beginning to understand your suggestion. The part that I don't comprehend yet is if/when the dynamo branch starts depending on StableHlo, how do we reconcile the possibly-different LLVM commits chosen by Torch-MLIR and StableHlo? More precisely, isn't there a possibility that HEAD on the StableHlo submodule might have chosen an older LLVM commit than the one chosen by the RollLLVM CI, causing the Torch-MLIR build to break?

@stellaraccident
Collaborator

stellaraccident commented Sep 12, 2023

> What I would like to see is for that CI to roll forward a branch that we keep patched when it breaks and then periodically merge back to main. Then anyone can do a roll-up of the project at the commit that they need.

> Thanks, I think I am beginning to understand your suggestion. The part that I don't comprehend yet is if/when the dynamo branch starts depending on StableHlo, how do we reconcile the possibly-different LLVM commits chosen by Torch-MLIR and StableHlo? More precisely, isn't there a possibility that HEAD on the StableHlo submodule might have chosen an older LLVM commit than the one chosen by the RollLLVM CI, causing the Torch-MLIR build to break?

I don't claim to have all of the answers yet, but it is pretty clear to me that the future state has to have more optionality, both in terms of downstream "exits" (e.g. StableHLO, et al.) and the testing approach to them. The StableHLO dep itself is lightweight enough that it does not experience much version skew (and in my experience, it is within the realm of what this project could keep patched -- if operating at the level of conversion pipeline and lit tests). My expectation is that if the upstream project is responsible for the conversion pipeline and downstreams are responsible for contributing a CI job that validates it against a battery of tests (maybe even using the StableHLO interpreter in that case), then the dependency is very cheap.

In short, torch-mlir/dynamo is:

  • Dialects/conversion pipelines, validated by lit and basic IR-level tests as the first line of defense.
  • Optional conversion pipelines (like StableHLO) that the project chooses to include, provided they are light enough to keep on by default from a build/lit-test perspective.
  • Integration testing jobs are provided by contributing some form of a Dynamo backend as a pip package or equivalent (this should work well for StableHLO specifically because it is -- by its nature -- wire compatible for the level of drift this would create).
  • The torch-mlir project may host a battery of Dynamo-based tests (i.e. we got a significant portion of ours for a downstream by adapting Jason's work to scrape them from GitHub) and the glue to create a CI job, but it will not actually be responsible for the lock-step integration testing of every backend (but, like LLVM, people supporting it can hook integration test jobs up to its CI to provide signal).

So I guess to answer your question: I don't think it will be a big problem when the time comes so long as we keep this project out of the business of lock-step integration testing. Dynamo itself lends itself quite well to decoupling middleware like this from backends, and we should just lean in there. The legacy of pt1 was that if we didn't have lock-step integration testing, everything would fall apart. However, I think we've moved on from that need and re-introducing the testing approach from the perspective of pt2, we'd do it completely differently and in a much more manageable way.

With the integration testing sequestered in such a way, the compile/unit-test scoped dep itself isn't too bad to manage.

@stellaraccident
Collaborator

(and yes, on the rolling integrate branch, we may be broken with respect to some backends when core mlir APIs break, but this will often clear on its own as everything catches up. We just don't down merge the integrate branch to main/dynamo until everything is sufficiently green for our tastes. But more aggressive projects can also pin to a state that doesn't have all backends passing. The key is to not have monolithic builds/tests so we let the timeline progress in a partially green state, acknowledging that many kinds of API breaks can be ignored while things catch up and eventually align)
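The down-merge rule above ("don't down merge the integrate branch until everything is sufficiently green") can be sketched as a gate over per-job CI results. This assumes the statuses have already been gathered into a dict keyed by job name; the job names are hypothetical, and which jobs count as required is a per-project policy choice that the thread leaves open.

```python
from typing import Iterable, Mapping


def ready_to_merge(statuses: Mapping[str, bool], required: Iterable[str]) -> bool:
    """True when every required job reported green.

    Jobs that are not required (e.g. a backend still catching up with an
    upstream API break) may be red or missing without blocking the merge.
    A required job with no reported status counts as not green.
    """
    return all(statuses.get(job, False) for job in required)


statuses = {"linux": True, "windows": True, "stablehlo": False}
print(ready_to_merge(statuses, {"linux", "windows"}))               # True
print(ready_to_merge(statuses, {"linux", "windows", "stablehlo"}))  # False
```

A more aggressive downstream could call the same gate with a smaller `required` set and pin to a state where some backends are still red.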

@ashay
Collaborator

ashay commented Sep 12, 2023

> My expectation is that if the upstream project is responsible for the conversion pipeline and downstreams are responsible for contributing a CI job that validates it against a battery of tests (maybe even using the StableHLO interpreter in that case), then the dependency is very cheap.

Thanks, that resolves my concerns. I agree that we should keep the dependencies lightweight to permit a higher tolerance to skew in the chosen LLVM commits, and non-monolithic builds/tests sound like a nice way to make progress faster.

@shauheen

For anyone in this discussion who may not already know, PyTorch/XLA already provides an API to save StableHLO for your PyTorch models. Please feel free to file issues there if you would like the team to look into any particular features.

@stellaraccident
Collaborator

stellaraccident commented Sep 12, 2023

> For anyone on this discussion who may not already know, PyTorch/XLA already does provide API to save StableHLO for your PyTorch models. Please feel free to file issues there if you would like the team to look into any particular features.

+1 - it hasn't been clear to me that long-term the people using StableHLO based pipelines should be serviced by torch-mlir (if there is another tool for the job, supported by the people invested there). I don't have an opinion on that (our projects are not currently doing this). Definitely not looking to deplatform anything -- just supporting the investigation of other supported options that limit cross-tech-stack issues.

For those not using StableHLO, there are a lot of fundamental features that keep us from using StableHLO+PyTorch/XLA, but that is a different conversation vs seeing if there is another, better supported way out of PyTorch to StableHLO. I don't know enough about the motivations of the people here that are using the Torch-MLIR->StableHLO pipelines to know what the gaps are.

Just scanning the StableHLO pipelines, it does appear that they are heavily using dynamic shapes and other lowering/policy tweaks.

Looking at the other pipelines, there are still further gaps that bar entry to using StableHLO at all (more just presenting as an FYI -- we find each of these to be essential and cannot indirect through a layer that constricts them). With the torch-mlir linalg pipeline, for example, we can:

  • Match dynamic shape semantics with how Dynamo formulates them (which is a restricted form of dynamism that is easier on compilers).
  • Have custom ops and lowerings for constant and data management ops.
  • Support weird quantized types at will.
  • Support custom ops.
  • Intermingle other program constructs with the exported functions.
  • Just depend on some dialects/transforms vs taking complicated, out of ecosystem deps.
  • Have one pipeline from start to finish without jumping through serialization or out of the in-memory representation.

@stellaraccident
Collaborator

FYI - the deprecated mlir-hlo dep hit me today with the inability to patch or navigate the contribution process. It is time: here is a patch to remove it: #2460

@dbabokin
Contributor

The Development Notes still refer to this issue for the "green commit", but there are no updates here anymore, and the git submodules use top-of-trunk for LLVM and StableHLO.

So a couple of questions here:

  1. What is the recommended way to pick the "green" commit nowadays?
  2. And should we close this issue and update the Development Notes document?
