Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In addition to persisting the elfie index as a txt file, persist it as a binary file #76509

Merged
merged 5 commits into from
Dec 19, 2024

Conversation

ToddGrun
Copy link
Contributor

@ToddGrun ToddGrun commented Dec 18, 2024

This will allow subsequent loads of this index to be read from the binary file (which ScottLo indicated should be more allocation friendly).

Speedometer results (from commit 1) came back good. Filtered CPU/Allocations with IncPaths "elfie"

Old CPU:
image

Old Allocations:
image

New CPU:
image

New Allocations:
image

…s a binary file, allowing it to on sunsequent loads to be read from the binary file (which ScottLo indicated should be more allocation friendly).

Created as a draft PR to get speedometer results.
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Dec 18, 2024
@@ -289,17 +289,45 @@ private async Task<TimeSpan> DownloadFullDatabaseAsync(FileInfo databaseFileInfo
// Write the file out to disk so we'll have it the next time we launch VS. Do this
// after we set the in-memory instance so we at least have something to search while
// we're waiting to write.
await WriteDatabaseFileAsync(databaseFileInfo, bytes, cancellationToken).ConfigureAwait(false);
await WriteDatabaseTextFileAsync(databaseFileInfo, bytes, cancellationToken).ConfigureAwait(false);
await WriteDatabaseBinaryFileAsync(database, databaseFileInfo, cancellationToken).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a part of me that is concerned about torn writes. specifically that one of these succeeds and the other does not. leading to an inconsistent state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK for the txt file to be written out and the binary file to not be. The subsequent VS session will just read from the txt file and again try to persist out the binary file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the reverse? or do these functions fail instead of continuing on? if this is the case, doc that it's ok to fail to write this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this a little closer, this is tricky and I don't know if it works the way it's intended currently. The RepeatIOAsync doesn't appear to throw if it fails the maximum number of times. I'm hesitant to change it as maybe I'm not understanding the exception flow through that call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a bit more consideration, I think it's ok if the bin file is persisted and the txt file isn't. The next VS run will not find the txt file and will go through the DownloadFullDatabaseAsync path in UpdateDatabaseInBackgroundWorkerAsync and use the newly downloaded file to populate the db rather than the bin file.

{
LogInfo("Writing database file");

await WriteDatabaseFileAsync(databaseFileInfo, new ArraySegment<byte>(bytes), cancellationToken).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curiosu what arraysegment is buying us here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally had this as a ReadOnlySpan, but that caused the following difficulties:

  1. IOService.WriteAndFlushAllBytes needed to have different code for the FileStream.Write calls as netstandard2.0 doesn't have the ReadOnlySpan overload.

  2. The unit tests hit some issues because they are written in VB and using an It.IsAny specifying a ReadOnlySpan was causing compiler errors and me to bang my head against the wall.

var databaseBytes = _service._ioService.ReadAllBytes(databaseFileInfo.FullName);
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo);
var isBinary = _service._ioService.Exists(databaseBinaryFileInfo);
var databaseBytes = _service._ioService.ReadAllBytes(isBinary ? databaseBinaryFileInfo.FullName : databaseFileInfo.FullName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a race-condition here between checking existence and then reading. could we just attempt to ReadAllBytes of hte binary file, and fall over it we get null there?

var delayUntilUpdate = await ProcessPatchXElementAsync(
databaseFileInfo,
element,
getDatabaseBytes: () => isBinary ? _service._ioService.ReadAllBytes(databaseFileInfo.FullName) : databaseBytes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took me a second. can you doc.

@CyrusNajmabadi
Copy link
Member

stepping back. if binary is supported (i wasn't aware of that), why not do 100% of our work in the binary space?

@ToddGrun
Copy link
Contributor Author

Get rid of a potential race and instead use an exception via ReadAllBytes.
@ToddGrun
Copy link
Contributor Author

ToddGrun commented Dec 19, 2024

stepping back. if binary is supported (i wasn't aware of that), why not do 100% of our work in the binary space?

I didn't think that was an option because of the patching code. Although I didn't really try to fully understand that code,. Could patching work without having the original bytes? Manual debugging isn't going through the patch code, but manually inspecting the code leads me to NativePatching.ApplyPatch, which looks like the patchBytes is based upon the original bytes.

@CyrusNajmabadi
Copy link
Member

Could patching work without having the original bytes?

Probably not. Good point :)

@ToddGrun
Copy link
Contributor Author

@CyrusNajmabadi -- Does the logging in this code provide value? It doesn't write it to telemetry, but just stores the logged string in a static LinkedList. If it doesn't provide value for you, I'd be inclined to just remove it as it's a bit of clutter.

@CyrusNajmabadi
Copy link
Member

Does the logging in this code provide value? It doesn't write it to telemetry,

It should get written into vs' activity log file.

And it has definitely provided value as to why things aren't working. We used it to track down a cdn issue one time :-)

@ToddGrun ToddGrun marked this pull request as ready for review December 19, 2024 17:11
@ToddGrun ToddGrun requested a review from a team as a code owner December 19, 2024 17:11
@ToddGrun
Copy link
Contributor Author

It should get written into vs' activity log file.

It's not obvious to me how it's doing that if so


In reply to: 2554945112


// Obtain the underlying array from the memory stream. If for some reason this isn't available,
// fall back to reading the stream into a new byte array.
if (!memoryStream.TryGetBuffer(out var arraySegmentBuffer))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outside of this PR. but i think we could benefit from thsi pattern in a few more places. like in the sqlite system. def takea look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find anything with my quick search, was there a particular class you had in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking about Sqlite Accessor.WriteStream which which works with streams and does some funky stuff with SQLitePersistentStorage.GetBytes to try to an array to write to to then write to the DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That method is a very light allocator in the csharp editing speedometer (less than 1 MB allocated). I took a look at the code and it wasn't straight forward to me how to map this pattern to it, so I'll avoid this change unless you push back.

if (!memoryStream.TryGetBuffer(out var arraySegmentBuffer))
{
memoryStream.Position = 0;
arraySegmentBuffer = new ArraySegment<byte>(writer.BaseStream.ReadAllBytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you doing writer.BaseStream, instead of just memoryStream here? is there a difference between the two? if so, please doc.

Copy link
Contributor Author

@ToddGrun ToddGrun Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadAllBytes is defined on a Stream. To get to that from the BinaryWriter, you need to get the BaseStream. (will add doc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chatted offline, and I misread the request. Updating in commit 4

// (intentionally not wrapped in IOUtilities. If this throws we want to restart).
var databaseBytes = _service._ioService.ReadAllBytes(databaseFileInfo.FullName);

var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like i want tshi section extracted into a method or local function.


// We have patch data. Apply it to our current database bytes to produce the new
// database.
LogInfo("Applying patch");
var finalBytes = _service._patchService.ApplyPatch(databaseBytes, patchBytes);
LogInfo($"Applying patch completed. finalBytes.Length={finalBytes.Length}");

CreateAndSetInMemoryDatabase(finalBytes);
database = CreateAndSetInMemoryDatabase(finalBytes, isBinary: false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you should doc why you're passing isBinary:false here.

Use memoryStream directly instead of writer.BaseStream
Break some code into a separate method
// (intentionally not wrapped in IOUtilities. If this throws we want to restart).
databaseBytes = _service._ioService.ReadAllBytes(databaseFileInfo.FullName);
}
var databaseBytes = GetDatabaseBytes(databaseFileInfo, out var isBinary);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var databaseBytes = GetDatabaseBytes(databaseFileInfo, out var isBinary);
var (databaseBytes, isBinary) = GetDatabaseBytes(databaseFileInfo);

Comment on lines 435 to 453
byte[] GetDatabaseBytes(FileInfo databaseFileInfo, out bool isBinary)
{
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo);
isBinary = true;

try
{
// First attempt to read from the binary file. If that fails, fall back to the text file.
return _service._ioService.ReadAllBytes(databaseBinaryFileInfo.FullName);
}
catch (Exception e) when (IOUtilities.IsNormalIOException(e))
{
}

isBinary = false;

// (intentionally not wrapped in IOUtilities. If this throws we want to restart).
return _service._ioService.ReadAllBytes(databaseFileInfo.FullName);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
byte[] GetDatabaseBytes(FileInfo databaseFileInfo, out bool isBinary)
{
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo);
isBinary = true;
try
{
// First attempt to read from the binary file. If that fails, fall back to the text file.
return _service._ioService.ReadAllBytes(databaseBinaryFileInfo.FullName);
}
catch (Exception e) when (IOUtilities.IsNormalIOException(e))
{
}
isBinary = false;
// (intentionally not wrapped in IOUtilities. If this throws we want to restart).
return _service._ioService.ReadAllBytes(databaseFileInfo.FullName);
}
(byte[] dataBytes, bool isBinary) GetDatabaseBytes(FileInfo databaseFileInfo)
{
var databaseBinaryFileInfo = GetBinaryFileInfo(databaseFileInfo);
try
{
// First attempt to read from the binary file. If that fails, fall back to the text file.
return (_service._ioService.ReadAllBytes(databaseBinaryFileInfo.FullName), isBinary: true);
}
catch (Exception e) when (IOUtilities.IsNormalIOException(e))
{
}
// (intentionally not wrapped in IOUtilities. If this throws we want to restart).
return (_service._ioService.ReadAllBytes(databaseFileInfo.FullName), isBinary: false);
}

@ToddGrun ToddGrun merged commit 9a27ef7 into dotnet:main Dec 19, 2024
25 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Dec 19, 2024
@dibarbet dibarbet modified the milestones: Next, 17.13 P3 Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead VSCode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants