Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define Portable SCC #7710

Open
2 of 5 tasks
dsouzai opened this issue Nov 6, 2019 · 15 comments
Open
2 of 5 tasks

Define Portable SCC #7710

dsouzai opened this issue Nov 6, 2019 · 15 comments

Comments

@dsouzai
Copy link
Contributor

dsouzai commented Nov 6, 2019

For the majority of the time the SCC feature has existed, there was no real reason for the SCC to ever be moved from one machine to another. However, with the emergence of Docker containers it is much more likely that a SCC generated on one machine could be used on another. In fact, multi-layered SCCs (introduced in 0.17) facilitates such a workflow. However, because the SCC was never moved from one machine to another, it is not completely portable1. This high level issue exists to track the discussion and follow up work to define and implement a completely portable SCC.

The following is a list of ideas that worth discussing in the OpenJ9 Community Call; feel free to comment other ideas and I'll add it to the list.

  1. Define baseline target processor features / use target processor features specified in SCC (Define baseline target environment for AOT compilations #7635).
  2. Make Target Environment local to compilation (Make target environment local to the compiler eclipse-omr/omr#4518).
  3. Add option to -Xshareclasses to denote "portable" mode.
  4. Save options string to SCC, have ability to set up JVM according to stored options in SCC.
  5. Force compressed shift amount in "portable" mode.
  6. Standardize write barrier (write barrier sequence can vary even in the same GC policy depending on the heap (e.g. fixed heap))

Sub Issues (Discussion):

Overall Principles:

  • A Portable SCC:
    • Should maximize the amount of AOT being used when the cache is moved to a different machine.
    • Should be willing to accept slightly slower AOT code if it makes the code more portable

1 - The lack of portability comes mostly from the stored AOT code; however, this issue tags other components because defining a portable SCC still requires coordination with the other components.

@hangshao0
Copy link
Contributor

I guess the Portable SCC applies to persistent cache only.

@dsouzai
Copy link
Contributor Author

dsouzai commented Nov 6, 2019

I guess the Portable SCC applies to persistent cache only.

Is there a non-persistent SCC?

@hangshao0
Copy link
Contributor

hangshao0 commented Nov 6, 2019

Is there a non-persistent SCC ?

Yes. But the default cache type is persistent on all platforms except z/OS.

@dsouzai
Copy link
Contributor Author

dsouzai commented Nov 6, 2019

Ah right; then yeah, this is only relevant for the persistent cache. More specifically, only if the user wishes to make it portable. IMO I don't think we should enforce that the SCC is always portable since portability limits the amount of optimization that can be done.

@DanHeidinga
Copy link
Member

Notes from OpenJ9 community call:

  • Required for the JIT.
  • gcc mtune / march style options?
  • Global to SCC or per-compilation? Only validate the env once so maybe better to do it globally?
  • Per-method would possibly be easier for multi-layer SCC. Would also allow creating new AOT for current processor config if the previous AOT body wasn't valid for this processor.
  • Discussion on tomorrow's OMR architecture call
  • Add option to -Xshareclasses to denote "portable" mode.

For base layers of SCC, we want more broadly applicable AOT code.

  • Save options string to SCC, have ability to set up JVM according to stored options in SCC.
  • Options already saved in the cache to allow associating them with the GC hints.
  • usability feature for not needing to respecify options?
  • Force compressed shift amount in "portable" mode.
  • Would require / want to ensure the heap layout so that the heap is in the right location for the required shift.
  • Make AOT load the shift from the J9JavaVM at the start of the compiled body
  • Standardize write barrier (write barrier sequence can vary even in the same GC policy depending on the heap (e.g. fixed heap))
  • Would result in the heaviest barriers being compiled.
  • Maybe compile in the heaviest barrier and patch parts on load?
  • Fixing -Xmx / -Xms results in a simpler barrier for gencon, maybe don't use this when doing AOT?
  • Probably don't want to support changing gc policies - already have a mechanism to validate the read/write barrier kinds for the AOT code

Check:

  • gc policy
  • arraylet leaf size
  • more <<TODO: fill in the link to the code that does these checks>>

Make compressedrefs independent of shift. Use a base register rather than a shift.

@dsouzai
Copy link
Contributor Author

dsouzai commented Nov 13, 2019

@andrewcraik @mstoodle @vijaysun-omr @fjeremic could you guys take a look at #7710 (comment) ; what are your thoughts?

@fjeremic
Copy link
Contributor

  • Force compressed shift amount in "portable" mode.
    • Would require / want to ensure the heap layout so that the heap is in the right location for the required shift.
    • Make AOT load the shift from the J9JavaVM at the start of the compiled body

Can you please elaborate on what this means? What does forcing compressed shift amount mean?

@andrewcraik
Copy link
Contributor

andrewcraik commented Nov 13, 2019

Make AOT load the shift from the J9JavaVM at the start of the compiled body

that is going to be really bad for performance - variable width shifts on x86 are massively slower than fixed shifts. Worse still, most of the fixed shifts are folded into memrefs / leas which wouldn't be possible with the variable variant. Fixing the shift would make a lot more sense - could force 3 shift or 4 shift. Could just tell the GC to use the shift amount from the SCC instead.

Maybe compile in the heaviest barrier and patch parts on load?

Would result in way more barriers in general because some GC modes need a lot more barriers than the standard gencon. While you might be able to patch them out you will still have a loss of icache and footprint growth from all of this stuff.

Probably don't want to support changing gc policies

agreed - for reasonable performance we would want the same policy with the same shift amount to allow the compiler to make reasonable assumptions. If there are good diagnostics from the cache when the user overrides the config on the command line it would make sense to just take the config from the cache unless the user specifies overrides on the command line which could then be vetted for compatibility IMO.

Make compressedrefs independent of shift. Use a base register rather than a shift.

This is a lot of work for the x codegen and likely going to have a very bad impact on perf - I don't necessarily agree that this is the right way to go. I think fixing the shift and forcing shifts will give better performance. We only have to worry about 3 vs 4 shift then since that affects object alignment.

Fixing -Xmx / -Xms results in a simpler barrier for gencon, maybe don't use this when doing AOT

Not baking in that assumption seems reasonable to allow flexibility under AOT

@dsouzai
Copy link
Contributor Author

dsouzai commented Nov 13, 2019

Can you please elaborate on what this means? What does forcing compressed shift amount mean?

The idea is that regardless of the heap size, we set the shift to be a certain value (if in compressedrefs mode)

@andrewcraik
Copy link
Contributor

There are also the other "big" JIT mode decisions like HCR support that will have to be decided. They can be enabled, but they will have a big impact on performance (esp since OSR isn't support with AOT yet). Similarly, field watch, FSD, etc.

We may also want to make different choices in the virtual guards on methods to reduce the likelihood of guard patching? Something to think about anyway...

@fjeremic
Copy link
Contributor

The idea is that regardless of the heap size, we set the shift to be a certain value (if in compressedrefs mode)

Ok I suppose the question is under the portable SCC mode what do we care about more:

  • AOT code performance
  • JIT code performance after AOT bodies have been recompiled

If we fix the compressedrefs shift under the portable SCC mode then AOT code performance will be good (in comparison) however JIT code performance will not be as good since we have an artificially large shift value. I'm also not sure of the performance delta of the GC.

I do agree that fixing the shift value under portable SCC is definitely the easiest thing we can do to get started.

@dsouzai
Copy link
Contributor Author

dsouzai commented Nov 13, 2019

There are also the other "big" JIT mode decisions like HCR support that will have to be decided. They can be enabled, but they will have a big impact on performance (esp since OSR isn't support with AOT yet). Similarly, field watch, FSD, etc.

If we don't go down the path described by [1] then what I'm envisioning we could do is what I described in [2]. Until features like OSR, FSD, etc. are enabled in AOT, then I think we should maintain the current approach of not loading AOT in those modes.

[1] eclipse-omr/omr#4518
[2] #7635 (comment)

@dsouzai
Copy link
Contributor Author

dsouzai commented Dec 4, 2019

We had the first portable SCC workgroup meeting today (as decided in [1]). I've updated this PR's description with links to sub issues based on the points brought up where relevant discussions can happen.

[1] https://openj9.slack.com/archives/C8312LCV9/p1574881291118000

@DanHeidinga
Copy link
Member

The other issue that was mentioned during the workgroup discussion was that the SCC does timestamp checking for the JARs / modules file on first access.

In Docker, we expect those checks to continue to pass (file timestamps aren't changed on image run) but are more likely to fail if the cache is manually copied.

There are existing options to disable timestamp checking to work around the manual case.

@pshipton
Copy link
Member

pshipton commented Dec 5, 2019

There are existing options to disable timestamp checking to work around the manual case.

-Xshareclasses: suboption noTimestampChecks.

Using it is not a good idea, if any jar file changes, the changes will be ignored. Better to add support to update the cache with the new timestamps on a machine migration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants