-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In addition to persisting the elfie index as a txt file, persist it as a binary file #76509
In addition to persisting the elfie index as a txt file, persist it as a binary file #76509
Conversation
…s a binary file, allowing it to on sunsequent loads to be read from the binary file (which ScottLo indicated should be more allocation friendly). Created as a draft PR to get speedometer results.
@@ -289,17 +289,45 @@ private async Task<TimeSpan> DownloadFullDatabaseAsync(FileInfo databaseFileInfo | |||
// Write the file out to disk so we'll have it the next time we launch VS. Do this | |||
// after we set the in-memory instance so we at least have something to search while | |||
// we're waiting to write. | |||
await WriteDatabaseFileAsync(databaseFileInfo, bytes, cancellationToken).ConfigureAwait(false); | |||
await WriteDatabaseTextFileAsync(databaseFileInfo, bytes, cancellationToken).ConfigureAwait(false); | |||
await WriteDatabaseBinaryFileAsync(database, databaseFileInfo, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a part of me that is concerned about torn writes. specifically that one of these succeeds and the other does not. leading to an inconsistent state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's OK for the txt file to be written out and the binary file to not be. The subsequent VS session will just read from the txt file and again try to persist out the binary file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the reverse? or do these functions fail instead of continuing on? if this is the case, doc that it's ok to fail to write this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this a little closer, this is tricky and I don't know if it works the way it's intended currently. The RepeatIOAsync doesn't appear to throw if it fails the maximum number of times. I'm hesitant to change it as maybe I'm not understanding the exception flow through that call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a bit more consideration, I think it's ok if the bin file is persisted and the txt file isn't. The next VS run will not find the txt file and will go through the DownloadFullDatabaseAsync path in UpdateDatabaseInBackgroundWorkerAsync and use the newly downloaded file to populate the db rather than the bin file.
{ | ||
LogInfo("Writing database file"); | ||
|
||
await WriteDatabaseFileAsync(databaseFileInfo, new ArraySegment<byte>(bytes), cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curiosu what arraysegment is buying us here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally had this as a ReadOnlySpan, but that caused the following difficulties:
-
IOService.WriteAndFlushAllBytes needed to have different code for the FileStream.Write calls as netstandard2.0 doesn't have the ReadOnlySpan overload.
-
The unit tests hit some issues because they are written in VB and using an It.IsAny specifying a ReadOnlySpan was causing compiler errors and me to bang my head against the wall.
src/Features/Core/Portable/SymbolSearch/Windows/SymbolSearchUpdateEngine.Update.cs
Outdated
Show resolved
Hide resolved
var databaseBytes = _service._ioService.ReadAllBytes(databaseFileInfo.FullName); | ||
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo); | ||
var isBinary = _service._ioService.Exists(databaseBinaryFileInfo); | ||
var databaseBytes = _service._ioService.ReadAllBytes(isBinary ? databaseBinaryFileInfo.FullName : databaseFileInfo.FullName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a race-condition here between checking existence and then reading. could we just attempt to ReadAllBytes of hte binary file, and fall over it we get null there?
var delayUntilUpdate = await ProcessPatchXElementAsync( | ||
databaseFileInfo, | ||
element, | ||
getDatabaseBytes: () => isBinary ? _service._ioService.ReadAllBytes(databaseFileInfo.FullName) : databaseBytes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took me a second. can you doc.
stepping back. if binary is supported (i wasn't aware of that), why not do 100% of our work in the binary space? |
Commit 1 test insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/599405 |
Get rid of a potential race and instead use an exception via ReadAllBytes.
I didn't think that was an option because of the patching code. Although I didn't really try to fully understand that code,. Could patching work without having the original bytes? Manual debugging isn't going through the patch code, but manually inspecting the code leads me to NativePatching.ApplyPatch, which looks like the patchBytes is based upon the original bytes. |
Probably not. Good point :) |
@CyrusNajmabadi -- Does the logging in this code provide value? It doesn't write it to telemetry, but just stores the logged string in a static LinkedList. If it doesn't provide value for you, I'd be inclined to just remove it as it's a bit of clutter. |
It should get written into vs' activity log file. And it has definitely provided value as to why things aren't working. We used it to track down a cdn issue one time :-) |
It's not obvious to me how it's doing that if so In reply to: 2554945112 |
|
||
// Obtain the underlying array from the memory stream. If for some reason this isn't available, | ||
// fall back to reading the stream into a new byte array. | ||
if (!memoryStream.TryGetBuffer(out var arraySegmentBuffer)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outside of this PR. but i think we could benefit from thsi pattern in a few more places. like in the sqlite system. def takea look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find anything with my quick search, was there a particular class you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was thinking about Sqlite Accessor.WriteStream which which works with streams and does some funky stuff with SQLitePersistentStorage.GetBytes to try to an array to write to to then write to the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That method is a very light allocator in the csharp editing speedometer (less than 1 MB allocated). I took a look at the code and it wasn't straight forward to me how to map this pattern to it, so I'll avoid this change unless you push back.
if (!memoryStream.TryGetBuffer(out var arraySegmentBuffer)) | ||
{ | ||
memoryStream.Position = 0; | ||
arraySegmentBuffer = new ArraySegment<byte>(writer.BaseStream.ReadAllBytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you doing writer.BaseStream, instead of just memoryStream here? is there a difference between the two? if so, please doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadAllBytes is defined on a Stream. To get to that from the BinaryWriter, you need to get the BaseStream. (will add doc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chatted offline, and I misread the request. Updating in commit 4
// (intentionally not wrapped in IOUtilities. If this throws we want to restart). | ||
var databaseBytes = _service._ioService.ReadAllBytes(databaseFileInfo.FullName); | ||
|
||
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i feel like i want tshi section extracted into a method or local function.
|
||
// We have patch data. Apply it to our current database bytes to produce the new | ||
// database. | ||
LogInfo("Applying patch"); | ||
var finalBytes = _service._patchService.ApplyPatch(databaseBytes, patchBytes); | ||
LogInfo($"Applying patch completed. finalBytes.Length={finalBytes.Length}"); | ||
|
||
CreateAndSetInMemoryDatabase(finalBytes); | ||
database = CreateAndSetInMemoryDatabase(finalBytes, isBinary: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you should doc why you're passing isBinary:false here.
// (intentionally not wrapped in IOUtilities. If this throws we want to restart). | ||
databaseBytes = _service._ioService.ReadAllBytes(databaseFileInfo.FullName); | ||
} | ||
var databaseBytes = GetDatabaseBytes(databaseFileInfo, out var isBinary); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var databaseBytes = GetDatabaseBytes(databaseFileInfo, out var isBinary); | |
var (databaseBytes, isBinary) = GetDatabaseBytes(databaseFileInfo); |
byte[] GetDatabaseBytes(FileInfo databaseFileInfo, out bool isBinary) | ||
{ | ||
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo); | ||
isBinary = true; | ||
|
||
try | ||
{ | ||
// First attempt to read from the binary file. If that fails, fall back to the text file. | ||
return _service._ioService.ReadAllBytes(databaseBinaryFileInfo.FullName); | ||
} | ||
catch (Exception e) when (IOUtilities.IsNormalIOException(e)) | ||
{ | ||
} | ||
|
||
isBinary = false; | ||
|
||
// (intentionally not wrapped in IOUtilities. If this throws we want to restart). | ||
return _service._ioService.ReadAllBytes(databaseFileInfo.FullName); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byte[] GetDatabaseBytes(FileInfo databaseFileInfo, out bool isBinary) | |
{ | |
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo); | |
isBinary = true; | |
try | |
{ | |
// First attempt to read from the binary file. If that fails, fall back to the text file. | |
return _service._ioService.ReadAllBytes(databaseBinaryFileInfo.FullName); | |
} | |
catch (Exception e) when (IOUtilities.IsNormalIOException(e)) | |
{ | |
} | |
isBinary = false; | |
// (intentionally not wrapped in IOUtilities. If this throws we want to restart). | |
return _service._ioService.ReadAllBytes(databaseFileInfo.FullName); | |
} | |
(byte[] dataBytes, bool isBinary) GetDatabaseBytes(FileInfo databaseFileInfo) | |
{ | |
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo); | |
try | |
{ | |
// First attempt to read from the binary file. If that fails, fall back to the text file. | |
return (_service._ioService.ReadAllBytes(databaseBinaryFileInfo.FullName), isBinary: true); | |
} | |
catch (Exception e) when (IOUtilities.IsNormalIOException(e)) | |
{ | |
} | |
// (intentionally not wrapped in IOUtilities. If this throws we want to restart). | |
return (_service._ioService.ReadAllBytes(databaseFileInfo.FullName), isBinary: false); | |
} |
…and out parameter
This will allow subsequent loads of this index to be read from the binary file (which ScottLo indicated should be more allocation friendly).
Speedometer results (from commit 1) came back good. Filtered CPU/Allocations with IncPaths "elfie"
Old CPU:
Old Allocations:
New CPU:
New Allocations: