Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the CacheUniqueID #7772

Merged
merged 1 commit into from
Dec 13, 2019
Merged

Conversation

doomerxe
Copy link
Contributor

@doomerxe doomerxe commented Nov 15, 2019

  1. Add the creation time to the OSCache_header2
  2. Fix the CacheUniqueID by adding creation time, metadatabytes, classesbytes, linenumbertablebytes and variabletablebytes at the end
  3. Add new Trace Events j9shr.2326-2329 for the CacheUniqueID

Fixes #8096

Signed-off-by: Jiahan Xi doomerXe@gmail.com

@doomerxe doomerxe changed the title Fix the CacheUniqueID WIP Fix the CacheUniqueID Nov 15, 2019
@doomerxe doomerxe force-pushed the FixCacheUniqueID branch 2 times, most recently from d5971f3 to 95b1d6e Compare November 15, 2019 15:48
@doomerxe doomerxe force-pushed the FixCacheUniqueID branch 6 times, most recently from 59416b9 to d1acb2d Compare November 25, 2019 15:35
@doomerxe
Copy link
Contributor Author

Hi @pshipton , I have fixed the CacheUniqueID by adding more information at the end. The only issue I have is:

On the ppc64_aix, if I use the option -Xshareclasses:listallcaches, the timestamp of the cache file will be changed. In this case, the multi-layer tests will fail after I use the option -Xshareclasses:listallcaches to check the number of layers. On the other platforms, it is not an issue (the timestamp won't be changed after using -Xshareclasses:listallcaches).

For now, I have removed the timestamp and used the filesize , metadatabytes, classesbytes, linenumbertablebytes and variabletablebytes as the cacheuniqueID. Do you think we need more fields on the cacheuniqueID? Could you please give me some advice? Thank you!

@pshipton
Copy link
Member

We can add a creation date/time to the OSCache_header2 structure, there is some reserved room, and use that as part of the cacheuniqueID.

@hangshao0
Copy link
Contributor

There is an issue #7710 about moving the cache from one machine to another. Not sure whether the create time will be changed if it is moved.

@doomerxe
Copy link
Contributor Author

doomerxe commented Nov 25, 2019

Hi @pshipton , to add the creation time, instead of using the st_mtime, I need to use the st_ctime which is actually the time of last status change (the birth time is not provided in Linux). The st_ctime will change if you move the file. Thus, according to @hangshao0 's comment, if there is a case which requires us to move the cache from one machine to another, we probably should not use the st_ctime.

Do you have any other suggestion? Thanks!

@pshipton
Copy link
Member

You can use the current time/date when the cache is created, like omrtime_hires_clock(), it's not required to use st_mtime or st_ctime. This becomes part of the unique ID for the cache, it doesn't matter if the cache is moved to another machine. It gives a value which is fairly unique for every cache. The other values (filesize , metadatabytes, classesbytes, linenumbertablebytes and variabletablebytes) will capture changes to the cache. Together they provide a unique ID which is unlikely to be duplicated by another cache.

@pshipton
Copy link
Member

I think omrtime_nano_time() is better than omrtime_hires_clock(), as I think hires_clock() can be affected by the boot time of the machine, so may repeat.

@pshipton
Copy link
Member

Sorry, I mean omrtime_current_time_nanos().

@doomerxe
Copy link
Contributor Author

doomerxe commented Nov 26, 2019

You can use the current time/date when the cache is created, like omrtime_hires_clock(), it's not required to use st_mtime or st_ctime. This becomes part of the unique ID for the cache, it doesn't matter if the cache is moved to another machine. It gives a value which is fairly unique for every cache. The other values (filesize , metadatabytes, classesbytes, linenumbertablebytes and variabletablebytes) will capture changes to the cache. Together they provide a unique ID which is unlikely to be duplicated by another cache.

Thank you for the explanation! I will make this change.

@doomerxe doomerxe force-pushed the FixCacheUniqueID branch 8 times, most recently from 974b1e2 to 1f5c9b3 Compare December 5, 2019 15:59
@doomerxe doomerxe marked this pull request as ready for review December 5, 2019 16:10
@doomerxe
Copy link
Contributor Author

doomerxe commented Dec 5, 2019

Tests have passed on ppc64_aix, ppc64le_linux, x86-64_linux, x86-64_mac (JDK 8)

@hangshao0 , could you please review it? Thank you!

@doomerxe doomerxe changed the title WIP Fix the CacheUniqueID Fix the CacheUniqueID Dec 5, 2019
@pshipton
Copy link
Member

Please stop rebasing before force pushing. Github will show me your changes if you push just your changes. If you need to rebase, please push it separately.

@pshipton
Copy link
Member

jenkins compile win32 jdk8

@doomerxe doomerxe force-pushed the FixCacheUniqueID branch 3 times, most recently from 29d457b to 1b8796d Compare December 12, 2019 16:25
@doomerxe
Copy link
Contributor Author

I have made the change and resolved the conflict.

@hangshao0
Copy link
Contributor

I believe when this is merged, we can remove

This suboption is experimental; do not use it in production.

From https://eclipse.github.io/openj9-docs-staging/385/xshareclasses/#layer and https://eclipse.github.io/openj9-docs-staging/385/xshareclasses/#createlayer

@pshipton
Copy link
Member

jenkins test sanity,extended win jdk8

@pshipton
Copy link
Member

I believe when this is merged, we can remove This suboption is experimental; do not use it in production.

Not sure I agree, let's discuss it in person.

@pshipton
Copy link
Member

testSCCacheManagement_win_0 is failing
20:21:46.644 0x1479a00 j9shr.1009 * ** ASSERTION FAILED ** at OSCachemmap.cpp:1348: ((0 ))

@doomerxe
Copy link
Contributor Author

testSCCacheManagement_win_0 is failing
20:21:46.644 0x1479a00 j9shr.1009 * ** ASSERTION FAILED ** at OSCachemmap.cpp:1348: ((0 ))

I'm taking a look at this, it's a new function which is added in #7955.

@hangshao0
Copy link
Contributor

Looks like it is caused by my change.

@hangshao0
Copy link
Contributor

hangshao0 commented Dec 13, 2019

Found in the tracepoint;

3XEHSTTYPE 20:21:46:645032016 GMT j9shr.1126 - >OSCache::removeCacheVersionAndGen: Entry - versionLen=14 cacheNameWithVGen=290M4F1A64P_cache3_multilayercache_G41L00
...
3XEHSTTYPE 20:21:46:645123360 GMT j9prt.788 - <j9shmem_getDir: Exiting with buffer=C:\Users\jenkins\AppData\Local\javasharedresources\\
...
3XEHSTTYPE 20:21:46:645312096 GMT j9shr.726 - <SH_OSCachemmap::startup: j9file_open failed for cache path name = C:\Users\jenkins\AppData\Local\javasharedresources\\C290M4F1A64P_ache3_multilayercache_G41L00

Looks like the cache unique ID starts with C:\Users\jenkins\AppData\Local\javasharedresources\C290M4F1A64P_cache3_multilayercache_G41L00, but when getting the cache file from the unique ID, we are using uniqueID + dirLen to get the file name:

SH_OSCache::getCacheNameAndLayerFromUnqiueID(J9JavaVM* vm, const char* cacheDirName, const char* uniqueID, UDATA idLen, char* nameBuf, UDATA nameBuffLen, I_8* layer)
{
	PORT_ACCESS_FROM_JAVAVM(vm);
	UDATA dirLen = strlen(cacheDirName);
	const char* cacheNameWithVGenStart = uniqueID + dirLen;

I guess there are 2 \ in the cacheDirName after "javasharedresources" but only one in the unique ID, making the file name incorrect (290M4F1A64P_cache3_multilayercache_G41L00 missing "C" at the begining). Then we fail to acquire write lock on a non-existing cache file:

3XEHSTTYPE 20:21:46:645312096 GMT j9shr.726 - <SH_OSCachemmap::startup: j9file_open failed for cache path name = C:\Users\jenkins\AppData\Local\javasharedresources\\C290M4F1A64P_ache3_multilayercache_G41L00

"ache3" should be "cache3" here.

@hangshao0
Copy link
Contributor

hangshao0 commented Dec 13, 2019

Can you change SH_OSCache::getCacheNameAndLayerFromUnqiueID() to be more robust without relying on the size of passed in cacheDirName ? @doomerxe

@doomerxe
Copy link
Contributor Author

Hi @pshipton , I have made the changes, could you please re-run the test? I will squash the commits after it passes. Thank you!

@hangshao0
Copy link
Contributor

rc = SH_OSCachemmap::getCacheStats(vm, cacheDirName, groupPerm, cacheNameWithVGen, result, reason, lowerLayerList);

The second parameter should be ctrlDirName instead of cacheDirName.

@pshipton
Copy link
Member

jenkins test sanity,extended win jdk8

@pshipton
Copy link
Member

@hangshao0 can you please review the changes before they are squashed.

@hangshao0
Copy link
Contributor

Except the NLS sample inputs, the rest looks good to me.

@doomerxe
Copy link
Contributor Author

Hi @pshipton , may I squash the commits now? Thanks!

@pshipton
Copy link
Member

@doomerxe yes, please go ahead and squash

1. Add the creation time to the OSCache_header2
2. Fix the CacheUniqueID by adding creation time, metadatabytes, classesbytes, linenumbertablebytes and variabletablebytes at the end
3. Add new Trace Events j9shr.2326-2329 for the CacheUniqueID

Signed-off-by: Jiahan Xi <doomerXe@gmail.com>
@pshipton pshipton merged commit 76c715d into eclipse-openj9:master Dec 13, 2019
@pshipton pshipton self-assigned this Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

openjdk8_j9_extended.functional_x86-64_windows ASSERTION FAILED ** at OSCachemmap.cpp:1336: ((0 ))
3 participants