Skip to content

Add new Lucene directory to track Lucene files read access usages #50283

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Dec 17, 2019

Note: this pull request targets the feature/searchable-snapshotsbranch

This pull request adds a new StatsDirectoryWrapper which is a Lucene FilterRepository that provides various statistics about the files it opens. It works by maintaining "live" stats for every file it opens and by wrapping all IndexInput it creates so that any operation on them (or on their slices or clones) is susceptible to contribute to the file statistics.

The directory provides two methods (getStats() and getStatsOrNull(fileName)) to capture the statistics for all files or a specific file at a given time, returning an IndexInputStats object.

This object contains information about the number of times the file has been opened, closed, sliced and cloned. It also contains information about read operations and makes the distinction between contiguous (sequential) bytes reads and non-contiguous bytes reads. For each (non-)contiguous type of reads it maintains the number of times the operation has been made, the total number of bytes read and the minimum and maximum length of bytes read at a time. Contiguous and non-contiguous stats are also aggregated under total reads stats.

Similarly to reads, IndexInputStats provides information on seek operations and makes the distinction between forward and backward seeks. For each type of seeking it maintains the number of times the operation has been made, the total number of "skipped" bytes, the min and max and average seeking distance. Both forward and backward seeks stats are also aggregated under total seeks statistics.

Finally, StatsDirectoryWrapper maintains "live" stats so that it can report (on a best effort basis) statistics on currently opened IndexInputs. A follow-up PR will be to expose those stats using a dedicated REST API.

@tlrx tlrx added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Dec 17, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

*/
public class StatsDirectoryWrapper extends FilterDirectory {

private final Map<String, LiveStats> records = ConcurrentCollections.newConcurrentMap();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have added a precision here. The current implementation tracks "live" stats for IndexInputs, meaning that stats can be reported while they are actively updated. It allows to track progress while index inputs are used which I find interesting with IndexInput that can be slow to read. But it comes with additional burden since LiveStats must be used concurrently and captured on demand.

Another option would be to allow each IndexInput to update its own stats and only merge them when the IndexInput is closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think this is a good choice, because I think the updating of the stats should be a good deal cheaper than the operations for which we're collecting stats.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; I left two minor comments.

try {
super.close();
} finally {
records.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we discard the Directory immediately then it makes no difference whether we clear this or not, but if we keep hold of the closed Directory (e.g. for further analysis) then I think it would be useful to keep its stats in place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, let's not remove stats on directory closing.

if (n >= 0) {
forwardSeeks.add(n);
} else {
backwardSeeks.add(n != Long.MIN_VALUE ? -n : 0L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just backwardSeeks.add(n); for simplicity, and deal with the negative total elsewhere.

I also think it might be useful to distinguish forward seeks by size. One possibility would be to set a threshold to distinguish large and small seeks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just backwardSeeks.add(n); for simplicity, and deal with the negative total elsewhere.

Agreed.

I also think it might be useful to distinguish forward seeks by size. One possibility would be to set a threshold to distinguish large and small seeks.

Yes. I added the length of the index input to the stats (it is useful to interpret the stats) and I made the distinction between small and large seeks (I picked < 25% of length as a threshold)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments but the idea makes sense to me. If the purpose is to expose these stats for searchable snapshots only I wonder if this would fit more naturally in the SearchableSnapshotDirectory and SearchableSnapshotIndexInput directly since they use a buffered access pattern ?

/**
* The last read position is kept around in order to detect (non)contiguous reads
**/
private final AtomicLong lastReadPosition;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need an AtomicLong ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed, I thought I removed it already but 🤦‍♂️

public void seek(final long pos) throws IOException {
final long filePointer = getFilePointer();
input.seek(pos);
stats.incrementSeekCount(pos - filePointer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this works with buffered index input that use seekInternal and readInternal to access the underlying input ? The count could be misleading since it doesn't take into account the internal seek that the buffered input imposes ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the count could be misleading.

My initial wish was to have a filter directory that tracks stats and that could wrap any other directory, so that we could capture stats at different layers. For example, capturing stats at the top level searchable snapshot directory or capturing stats at another filter directory (that could take care of caching index input parts etc). I'm still struggling to follow this design though.

Knowing how Lucene access the top level directory is very interesting and correlate the numbers with the internal buffered reads executed by SearchableSnapshotIndexInput is also nice to have so I followed your suggestion and moved stats into the searchable snapshot dir and input. It allows to push the LiveStats object further down in case we need to easily track other things.

@tlrx
Copy link
Member Author

tlrx commented Dec 19, 2019

@DaveCTurner @jimczi Thanks for your comments and sorry for the time it took me to address them. I went back and forth with changes and finally fold the StatsDirectoryWrapper into the existings SearchableSnapshotDirectory and SearchableSnapshotIndexInput. This is something we could revert later but for now I think it allows to easily track what we want in the current implementation.

@DaveCTurner DaveCTurner mentioned this pull request Jan 14, 2020
19 tasks
@tlrx
Copy link
Member Author

tlrx commented Feb 7, 2020

Instrumentation has been added in #51637 and #51815, so I'm closing this one.

Thanks @jimczi and @DaveCTurner for your feedback here which has been incorporated to the two PRs I mentioned above.

@tlrx tlrx closed this Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants