Description
This has taken some time to investigate, and seems that the problem is due to a confluence of issues across separate Seq/serilog libraries
The problem description:
We are using serilog to log to Seq using the DurableSeqSink with a file size limit set. For a particular use case we have implemented an web endpoint that has the same api as Seq raw events endpoint.
What we were finding is that the data being posted often had an extra "}," (close brace and comma) like this:
{
"Events": [{
"Timestamp": "2015-05-09T22:09:08.12345+10:00",
"Level": "Warning",
"MessageTemplate": "Disk space is low on {Drive}",
"Properties": {
"Drive": "C:",
"MachineName": "nblumhardt-rmbp"
}
},
}
]
}
which meant that the payload was not valid json and so would fail to deserialize and return an error.
Eventually I determined that the buffer file for the serilog seq logger had additional bytes (invisible in a normal text editior). These were the bytes "ef bb bf" which happens to be the UTF8 BOM.
Again, while not ideal, this would not have caused the problem but for the fact that the HttpLogShipper does it's own byte counting instead of just relying on StreamReader (which seems to handle the extra BOM part way through a file) and it assumes that the BOM can only exist at the start of the file so it ends up incorrectly positioning its cursor 3 bytes before the end of the line instead of at the beginning of the next line, hence the extra '}'.
Now, as to why the extra bytes get into the file in the first place, it turns out that due to a quirk in the StreamWriter implementation (https://referencesource.microsoft.com/#mscorlib/system/io/streamwriter.cs,208) that only skips writing the encoding preamble when appending to a stream if the stream is seekable, when you set a fileSizeLimitBytes on the FileSink the preamble gets written again.
When using the DurableSeqSink, it uses one file per day by default so we get the error as manytimes as the process starts on a given day.
I have created some unit tests to demonstrate the problem (pull request coming) and I can think of several ways to work around the problem, but ultimately I think that coordinated changes need to be made to both the Serilog.Sinks.File.FileSink and Serilog.Sinks.Seq.HttpLogShipper.
The fix closest to the source of the problem is to make the WriteCountingStream seekable or to use an encoding that does not include a preamble (don't use the Encoding.UTF8 static member). The second option is not viable here because of the way that the HttpLogShipper works as it assumes the file is written using UTF8 with a preamble and it will fail in a different way if it is not. I think that continuing to have the HttpLogShipper do its own stream position management instead of using a StreamReader is a timebomb. I don't actually understand why it is tracking it's own stream position in the first place.