Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HybridDirectory MMAP Extensions Configurable #3837

Merged
merged 2 commits into from
Jul 21, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,7 @@ public final class IndexScopedSettings extends AbstractScopedSettings {
BitsetFilterCache.INDEX_LOAD_RANDOM_ACCESS_FILTERS_EAGERLY_SETTING,
IndexModule.INDEX_STORE_TYPE_SETTING,
IndexModule.INDEX_STORE_PRE_LOAD_SETTING,
IndexModule.INDEX_STORE_HYBRID_MMAP_EXTENSIONS,
IndexModule.INDEX_RECOVERY_TYPE_SETTING,
IndexModule.INDEX_QUERY_CACHE_ENABLED_SETTING,
FsDirectoryFactory.INDEX_LOCK_FACTOR_SETTING,
Expand Down
12 changes: 12 additions & 0 deletions server/src/main/java/org/opensearch/index/IndexModule.java
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,18 @@ public final class IndexModule {
Property.NodeScope
);

/** Which lucene file extensions to load with the mmap directory when using hybridfs store.
* This is an expert setting.
* @see <a href="https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/codecs/lucene92/package-summary.html#file-names">Lucene File Extensions</a>.
*/
public static final Setting<List<String>> INDEX_STORE_HYBRID_MMAP_EXTENSIONS = Setting.listSetting(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an expert parameter which, I suspect, few are going to touch. Can we javadoc this as expert with a description similar to above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize sure can

"index.store.hybrid.mmap.extensions",
List.of("nvd", "dvd", "tim", "tip", "dim", "kdd", "kdi", "cfs", "doc"),
Function.identity(),
Property.IndexScope,
Property.NodeScope
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to set this as a final setting (Property.Final)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize I don't think so, I would want to be able to change this by closing index, updating setting, then reopening. I thought final would prevent that scenario, but I could be wrong since I am not super familiar with that setting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would prevent that scenario. +1 to go w/o final.

);

public static final String SIMILARITY_SETTINGS_PREFIX = "index.similarity";

// whether to use the query cache
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -97,9 +97,10 @@ protected Directory newFSDirectory(Path location, LockFactory lockFactory, Index
case HYBRIDFS:
// Use Lucene defaults
final FSDirectory primaryDirectory = FSDirectory.open(location, lockFactory);
final Set<String> mmapExtensions = new HashSet<>(indexSettings.getValue(IndexModule.INDEX_STORE_HYBRID_MMAP_EXTENSIONS));
Copy link
Collaborator

@reta reta Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattweber I am trying to wrap my head around these two properties INDEX_STORE_PRE_LOAD_SETTING and INDEX_STORE_HYBRID_MMAP_EXTENSIONS, which apparently serve the same purpose (and do have same set of values), the HybridDirectory is looking like PreLoadMMapDirectory (except the setPreload thing) ... why do we need both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta INDEX_STORE_PRE_LOAD_SETTING controls which files automatically get loaded into os cache and I don't believe this has any defaults as it is an expensive operation. This PR adds INDEX_STORE_HYBRID_MMAP_EXTENSIONS which controls which extensions are with opened with mmap store. The current defaults were picked by elastic after doing benchmarks. I would like it configurable so a user can tune to their specific usecase. For example, I want to mmap everything EXCEPT stored fields. The settings are related, but not the same.

if (primaryDirectory instanceof MMapDirectory) {
MMapDirectory mMapDirectory = (MMapDirectory) primaryDirectory;
return new HybridDirectory(lockFactory, setPreload(mMapDirectory, lockFactory, preLoadExtensions));
return new HybridDirectory(lockFactory, setPreload(mMapDirectory, lockFactory, preLoadExtensions), mmapExtensions);
} else {
return primaryDirectory;
}
Expand Down Expand Up @@ -142,10 +143,12 @@ public static boolean isHybridFs(Directory directory) {
*/
static final class HybridDirectory extends NIOFSDirectory {
private final MMapDirectory delegate;
private final Set<String> mmapExtensions;

HybridDirectory(LockFactory lockFactory, MMapDirectory delegate) throws IOException {
HybridDirectory(LockFactory lockFactory, MMapDirectory delegate, Set<String> mmapExtensions) throws IOException {
super(delegate.getDirectory(), lockFactory);
this.delegate = delegate;
this.mmapExtensions = mmapExtensions;
}

@Override
Expand All @@ -164,43 +167,16 @@ public IndexInput openInput(String name, IOContext context) throws IOException {
}
}

boolean useDelegate(String name) {
final String extension = FileSwitchDirectory.getExtension(name);
return mmapExtensions.contains(extension);
}

@Override
public void close() throws IOException {
IOUtils.close(super::close, delegate);
}

boolean useDelegate(String name) {
String extension = FileSwitchDirectory.getExtension(name);
switch (extension) {
// Norms, doc values and term dictionaries are typically performance-sensitive and hot in the page
// cache, so we use mmap, which provides better performance.
case "nvd":
case "dvd":
case "tim":
// We want to open the terms index and KD-tree index off-heap to save memory, but this only performs
// well if using mmap.
case "tip":
// dim files only apply up to lucene 8.x indices. It can be removed once we are in lucene 10
case "dim":
case "kdd":
case "kdi":
// Compound files are tricky because they store all the information for the segment. Benchmarks
// suggested that not mapping them hurts performance.
case "cfs":
// MMapDirectory has special logic to read long[] arrays in little-endian order that helps speed
// up the decoding of postings. The same logic applies to positions (.pos) of offsets (.pay) but we
// are not mmaping them as queries that leverage positions are more costly and the decoding of postings
// tends to be less a bottleneck.
case "doc":
return true;
// Other files are either less performance-sensitive (e.g. stored field index, norms metadata)
// or are large and have a random access pattern and mmap leads to page cache trashing
// (e.g. stored fields and term vectors).
default:
return false;
}
}

MMapDirectory getDelegate() {
return delegate;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,20 +70,52 @@ public void testPreload() throws IOException {
try (Directory directory = newDirectory(build)) {
assertTrue(FsDirectoryFactory.isHybridFs(directory));
FsDirectoryFactory.HybridDirectory hybridDirectory = (FsDirectoryFactory.HybridDirectory) directory;
assertTrue(hybridDirectory.useDelegate("foo.dvd"));
// test default hybrid mmap extensions
assertTrue(hybridDirectory.useDelegate("foo.nvd"));
assertTrue(hybridDirectory.useDelegate("foo.dvd"));
assertTrue(hybridDirectory.useDelegate("foo.tim"));
assertTrue(hybridDirectory.useDelegate("foo.tip"));
assertTrue(hybridDirectory.useDelegate("foo.cfs"));
assertTrue(hybridDirectory.useDelegate("foo.dim"));
assertTrue(hybridDirectory.useDelegate("foo.kdd"));
assertTrue(hybridDirectory.useDelegate("foo.kdi"));
assertFalse(hybridDirectory.useDelegate("foo.bar"));
assertTrue(hybridDirectory.useDelegate("foo.cfs"));
assertTrue(hybridDirectory.useDelegate("foo.doc"));
assertFalse(hybridDirectory.useDelegate("foo.pos"));
assertFalse(hybridDirectory.useDelegate("foo.pay"));
MMapDirectory delegate = hybridDirectory.getDelegate();
assertThat(delegate, Matchers.instanceOf(FsDirectoryFactory.PreLoadMMapDirectory.class));
FsDirectoryFactory.PreLoadMMapDirectory preLoadMMapDirectory = (FsDirectoryFactory.PreLoadMMapDirectory) delegate;
assertTrue(preLoadMMapDirectory.useDelegate("foo.dvd"));
assertTrue(preLoadMMapDirectory.useDelegate("foo.bar"));
assertFalse(preLoadMMapDirectory.useDelegate("foo.cfs"));
}
build = Settings.builder()
.put(IndexModule.INDEX_STORE_TYPE_SETTING.getKey(), IndexModule.Type.HYBRIDFS.name().toLowerCase(Locale.ROOT))
.putList(IndexModule.INDEX_STORE_PRE_LOAD_SETTING.getKey(), "nvd", "dvd", "cfs")
.putList(IndexModule.INDEX_STORE_HYBRID_MMAP_EXTENSIONS.getKey(), "nvd", "dvd", "tim", "pos", "pay")
.build();
try (Directory directory = newDirectory(build)) {
assertTrue(FsDirectoryFactory.isHybridFs(directory));
FsDirectoryFactory.HybridDirectory hybridDirectory = (FsDirectoryFactory.HybridDirectory) directory;
// test custom hybrid mmap extensions
assertTrue(hybridDirectory.useDelegate("foo.nvd"));
assertTrue(hybridDirectory.useDelegate("foo.dvd"));
assertTrue(hybridDirectory.useDelegate("foo.tim"));
assertFalse(hybridDirectory.useDelegate("foo.tip"));
assertFalse(hybridDirectory.useDelegate("foo.dim"));
assertFalse(hybridDirectory.useDelegate("foo.kdd"));
assertFalse(hybridDirectory.useDelegate("foo.kdi"));
assertFalse(hybridDirectory.useDelegate("foo.cfs"));
assertFalse(hybridDirectory.useDelegate("foo.doc"));
assertTrue(hybridDirectory.useDelegate("foo.pos"));
assertTrue(hybridDirectory.useDelegate("foo.pay"));
MMapDirectory delegate = hybridDirectory.getDelegate();
assertThat(delegate, Matchers.instanceOf(FsDirectoryFactory.PreLoadMMapDirectory.class));
FsDirectoryFactory.PreLoadMMapDirectory preLoadMMapDirectory = (FsDirectoryFactory.PreLoadMMapDirectory) delegate;
assertTrue(preLoadMMapDirectory.useDelegate("foo.dvd"));
assertFalse(preLoadMMapDirectory.useDelegate("foo.bar"));
assertTrue(preLoadMMapDirectory.useDelegate("foo.cfs"));
assertTrue(preLoadMMapDirectory.useDelegate("foo.nvd"));
}
}

Expand Down