
Conversation

@yanfali
Contributor

@yanfali yanfali commented Jan 2, 2017

  • one request and one response schema, based on real-world schema usage
  • remove allocation of a new bytes.Buffer from the benchmarks; we're not that
    interested in measuring that

@smyrman here's a more realistic Schema based on something in production.

@yanfali
Contributor Author

yanfali commented Jan 2, 2017

I've also removed the allocation of bytes.Buffer from the benchmarks, since that's not what we're measuring here, and simply truncate the buffer before use. It makes a small but measurable difference.

Collaborator

@smyrman smyrman left a comment

As for removing buffer allocation from the benchmark, sure, that's OK, but it needs to be done in a different way to avoid benchmarks affecting each other's results.

)

// Optional schema field
func Optional(s schema.Field) schema.Field {
Collaborator

I would vote for removing this method, and rely only on using Required().

Contributor Author

These are general helpers. I actually have a library of validation helper functions that have different defaults; I copied these over for expediency. I'm not concerned, since they are in the _test framework and not for general use here. I can certainly remove it, it just made it easier to port the code over.

return s
}

func rfc3339Nano() schema.Field {
Collaborator

This could "safely" be a var, just like schema.IDField

Since it's not a pointer value, it will be copied on assignment, so that you could still do:

...
s := Schema{
        Fields: schema.Fields{
                "x": Required(rfc3339Nano),
        },
}

func String(min, max int, description string) schema.Field {
return schema.Field{
Description: description,
Required: true,
Collaborator

Would it not be more appropriate to rely on Required(String(...))?

Contributor Author

Probably. These were all just ported over from places where the default was different.

}
}

var (
Collaborator

@smyrman smyrman Jan 2, 2017

All these schemas are left in the namespace... Perhaps it would be better just to leave Complex1 (rather than requestSchema) and Complex2 (rather than responseSchema) in the namespace and initialize them in an init() function.

Collaborator

Also, it would be nice to place it next to the Student schema definition, or move the Student schema definition to the same place.

Collaborator

We could also rename the Student schema to e.g. "small". Then the benchmark results would read quite well when put in the same table test list.

.../Schema=Small
.../Schema=Complex1
.../Schema=Complex2

Whether it's a "student list", "request" or a "response" is really quite irrelevant...

Contributor Author

it's a pretty specific namespace and only run during benchmarking but sure we can clean that up.

Contributor Author

I don't really consider the student schema to be complex. This schema approaches the complexity of my actual use cases in production.

Collaborator

it's a pretty specific namespace and only run during benchmarking but sure we can clean that up.

The namespace is the same for the entire jsonschema_test package. Anyway thanks for cleaning it up.

I don't really consider the student schema to be complex.

No, which is why I suggest you call it "Small":-)

})
b.Run("response", func(b *testing.B) {
for i := 0; i < b.N; i++ {
buf.Truncate(0)
Collaborator

@smyrman smyrman Jan 2, 2017

Better to use a freshly allocated buffer to get as similar results as possible. Here the first sub-test might grow the buffer (in the first loop), so that when the second test starts, it doesn't need to grow the buffer like the first test did.

Also the ordering of the benchmark tests could change their results, which is always bad.

Collaborator

@smyrman smyrman Jan 2, 2017

PS! It's possible to pause the benchmark timer:

b.StopTimer()
buf := new(bytes.Buffer)
b.StartTimer()
enc := jsonschema.NewEncoder(buf)

Contributor Author

@yanfali yanfali Jan 3, 2017

I understand, but I'm more concerned that we are measuring the noise of new rather than the code under test. One thing I specifically want to see is the effect of all the buffer allocation in the new code vs. simply writing to an io.Writer. The point of the benchmark is to test the code, not new(bytes.Buffer) from the runtime. Turning this into a truncate amortizes the cost once across all the benchmarks and effectively removes it from the test.

}
)

func BenchmarkEncoderComplex(b *testing.B) {
Collaborator

Why not add these benchmarks as cases to the existing table of sub-tests in benchmark_test.go?

Contributor Author

I tried initially, but it took a while to sort out the import naming and conventions. I can merge them now that I have it working.

b.Run(tc.Name, func(b *testing.B) {
for i := 0; i < b.N; i++ {
enc := jsonschema.NewEncoder(new(bytes.Buffer))
buf.Truncate(0)
Collaborator

Same here, we still need to allocate a new buffer to avoid benchmarks affecting each other in a bad way.

Contributor Author

This is deliberate. The allocations are measurable. Switching to truncate removes them and makes it clearer where the time is being spent.

Collaborator

See the docs:

Truncate discards all but the first n unread bytes from the buffer but continues to use the same allocated storage.

So at least we need to allocate a new buffer per sub-test, even if we don't do so within the loop itself... That should eliminate the chance of Grow affecting different sub-tests differently.

Or use ioutil.Discard as the writer.

Contributor Author

Sorry, but this is a strawman argument. We completely control the test loop. The encoder is only using the io.Writer interface.

Collaborator

(See discussion above instead)

@yanfali
Contributor Author

yanfali commented Jan 3, 2017

Here's without the new bytes.Buffer

BenchmarkEncoder/Schema={Fields:{"b":Bool{}}}-4         	 1000000	      1027 ns/op	     240 B/op	      12 allocs/op
BenchmarkEncoder/Schema={Fields:{"s":String{}}}-4       	 2000000	       977 ns/op	     224 B/op	      12 allocs/op
BenchmarkEncoder/Schema={Fields:{"s":String{MaxLen:42}}}-4         	 1000000	      1179 ns/op	     248 B/op	      14 allocs/op
BenchmarkEncoder/Schema=Simple-4                                   	  200000	      8489 ns/op	    1200 B/op	      55 allocs/op
BenchmarkEncoder/Schema=Complex1-4                                 	   50000	     31846 ns/op	    6162 B/op	     228 allocs/op
BenchmarkEncoder/Schema=Complex2-4                                 	   20000	     79457 ns/op	   14237 B/op	     545 allocs/op

Here's with

BenchmarkEncoder/Schema={Fields:{"b":Bool{}}}-4         	 1000000	      1243 ns/op	     496 B/op	      14 allocs/op
BenchmarkEncoder/Schema={Fields:{"s":String{}}}-4       	 1000000	      1243 ns/op	     480 B/op	      14 allocs/op
BenchmarkEncoder/Schema={Fields:{"s":String{MaxLen:42}}}-4         	 1000000	      1639 ns/op	     504 B/op	      16 allocs/op
BenchmarkEncoder/Schema=Simple-4                                   	  200000	      9659 ns/op	    2528 B/op	      59 allocs/op
BenchmarkEncoder/Schema=Complex1-4                                 	   50000	     34222 ns/op	    9827 B/op	     233 allocs/op
BenchmarkEncoder/Schema=Complex2-4                                 	   20000	     84637 ns/op	   26321 B/op	     552 allocs/op

}
buf := new(bytes.Buffer)
buf := &bytes.Buffer{}
buf.Grow(65535)
Owner

Wouldn't it be cleaner to write buf := bytes.NewBuffer(make([]byte, 65535))?

}
}

func getComplexSchema() (schema.Schema, schema.Schema) {
Collaborator

@smyrman smyrman Jan 3, 2017

It's not obvious why this method needs to return two schemas, or why we want to call them requestSchema and responseSchema.

I would suggest moving everything to benchmark_test.go file renaming the file to testutils_test.go and initialize the schemas like this:

// reusable Schemas for benchmarks and tests.
var (
        schemaSmall *schema.Schema
        schemaComplex1 *schema.Schema
        schemaComplex2 *schema.Schema
)

func init() {
        // Initialize the schemas:
        // schemaSmall = today's Student schema (or one with similar complexity).
        // schemaComplex1 = this review's requestSchema
        // schemaComplex2 = this review's responseSchema
}

After that, replace any reference to Student with Small in the other tests.

Collaborator

@smyrman smyrman Jan 3, 2017

I would suggest moving everything to benchmark_test.go file

On second thought... maybe just rename the file to testutils_test.go. That would be a good place for these helper functions, and perhaps some helper functions from all_test.go should also move here later.

As for the definitions of the reusable test/benchmark schemas, I think testutils_test.go, all_test.go, or benchmark_test.go could all work...

Contributor Author

I disagree with your suggestion; now you're back to polluting the namespace. You can't have it both ways. Sure, we can rename the file, but then you lose the intent of what the file was for.

Contributor Author

@yanfali yanfali Jan 3, 2017

getComplexSchema is a template generator for two schemas based on actual production code. The fact that the signature returns multiple schemas is irrelevant; it was done for namespacing reasons. I have an idea that may convey intent better: what if we return the test structs directly?

Collaborator

@smyrman smyrman Jan 3, 2017

I was not concerned with the so-called "requestSchema" and "responseSchema" polluting the namespace, but with all the intermediate ones.

The suggested names complexSchema1/2 could be useful for reuse in tests.

getComplexSchema

Even test code should be clean. Perhaps even cleaner than the main code, since tests also serve as examples/documentation to package maintainers and users.

I believe a good, clean function should do one and only one thing. I think this method does two, which makes it unclean, and it makes it harder than it has to be to reuse (one of) the complex schemas for tests.

Contributor Author

Good argument. I will separate this into two helper functions.

As an aside, this is test code. Getting too attached to excessively "clean" test code is among the worst mistakes I feel you can make, because it can become a burden on the developers; with test code you must always be prepared to completely throw it out. It's not your product - it's an expression of the current state of the implementation that gives you comfort that the code is doing what you intended.

I do agree it should serve as a form of documentation though, but by conveying the intent.

Collaborator

I will separate this into two helper functions.

Better👍

Schema: responseSchema,
},
}
buf := bytes.NewBuffer(make([]byte, 65535))
Collaborator

@smyrman smyrman Jan 3, 2017

This helps, but if any one benchmark were to trigger a Grow (which usually means doubling the buffer size, I think), that one benchmark would be unfairly punished. Maybe it's better to just rely on ioutil.Discard? Then we don't need to worry about the time spent allocating/growing the buffer at all. Of course, the write itself also becomes free.

Alternatively, we should at the very least move this to inside for i := range testCases (before the b.Run call), to ensure each benchmark starts with a writer that has the same capacity for its first loop.

Collaborator

@smyrman smyrman Jan 3, 2017

I think relying on ioutil.Discard as the io.Writer is probably the best and simplest option, btw.

Contributor Author

Sorry, but why would anything call Grow? We control the buffer and we only pass it in as an io.Writer. I like the Discard idea, but it could make debugging harder if we needed to pinpoint any problems. Discard would also not trigger any of the io.Writer error behavior, since it's essentially a null instance. Truncate has the behavior I'm after: it resets the buffer without doing any new allocations.

func (b *Buffer) Truncate(n int) {
	b.lastRead = opInvalid
	switch {
	case n < 0 || n > b.Len():
		panic("bytes.Buffer: truncation out of range")
	case n == 0:
		// Reuse buffer space.
		b.off = 0
	}
	b.buf = b.buf[0 : b.off+n]
}

Contributor Author

On reflection, I actually think the Discard idea is probably not a good one, because the writes are part of the performance of the API. The memory bandwidth of the test host affects the outcome, and with Discard we wouldn't get a measurement of that. The only thing these changes do is hoist the allocation of the Buffer itself outside of the measurement loop.

Collaborator

@smyrman smyrman Jan 3, 2017

Sorry, but why would anything call Grow?

If any schema ends up bigger than the preallocated buffer, Grow will be called implicitly.

Contributor Author

Yes, I see your argument, but we also control the test environment. At worst we can increase the initial buffer size to 1MiB. I don't see any realistic schema ever growing beyond that size.

Contributor Author

This is interesting. Setting the buffer to 2<<20 made the test worse; setting it to 2<<19 actually made it improve. It's probably some hardware-related caching, but it's still interesting. I'm on a 13" MacBook Pro, mid-2014.

Collaborator

Actually, we could also just run the encoding once before the sub-benchmark loop starts! That would also ensure we always have enough space!

Contributor Author

I'm happy to move the buffer inside the test harness loop and outside of the benchmark loop. That's a reasonable compromise. I don't know that we want to be too clever in benchmark or test code; I find it just gets you into trouble :)

Collaborator

for i := range testCases {
	tc := testCases[i]
	enc := jsonschema.NewEncoder(buf)
	// We don't want buf grow to affect the benchmark, so we run through the encoding
	// once before the benchmark starts.
	buf.Truncate(0)
	enc.Encode(&tc.Schema)
 	b.Run(tc.Name, func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buf.Truncate(0)
			enc.Encode(&tc.Schema)
			...

@yanfali
Contributor Author

yanfali commented Jan 3, 2017

Performance appears the same, we're just making the GC work a bit harder.

Collaborator

@smyrman smyrman left a comment

Nice work!

@smyrman
Collaborator

smyrman commented Jan 3, 2017

Performance appears the same, we're just making the GC work a bit harder.

Did you see my last suggestion?

We could run the Encode once before the b.Run starts (code example in inline comment). Then we only allocate just-enough bytes, and we reuse the buffer, but we always do it before the benchmarks start so it's not counted.

@yanfali
Contributor Author

yanfali commented Jan 3, 2017

I did, though I'm not sure it's worth the extra complexity. Let me run a quick benchmark and see how it compares. If it's roughly the same, I think we should just leave it out.

@yanfali
Contributor Author

yanfali commented Jan 3, 2017

Over three runs it's about the same performance as before, so I'd vote to keep it as straightforward as possible and this avoids having the extra comment explaining why we are doing this.

@smyrman
Collaborator

smyrman commented Jan 3, 2017

Over three runs it's about the same performance as before, so I'd vote to keep it as straightforward as possible and this avoids having the extra comment explaining why we are doing this.

Sounds good. And thanks a lot for the new benchmarks. I will re-test #76 once this is merged :-)

@yanfali
Contributor Author

yanfali commented Jan 3, 2017

@rs this is ready to go, I'm going to squash it first.

@yanfali yanfali force-pushed the yanfali-benchmark branch 2 times, most recently from 21d162c to c372bd1 Compare January 3, 2017 17:13
}

func getComplexSchema2() schema.Schema {

Owner

Extra return

Contributor Author

Sorry? I'm a bit confused by this comment.

Owner

There is an extra blank line.

 - Complex schema 1 is based off a request schema and Complex schema 2 is
 based off a response schema from real-world schema usage

 - Move allocation of bytes.Buffer outside of the benchmark loop, as we're
 not interested in measuring that.

 - Set the buffer to 512Kb to avoid dynamic buffer growth for expected test
 usage.
@yanfali yanfali force-pushed the yanfali-benchmark branch from c372bd1 to eb32177 Compare January 3, 2017 18:40
@yanfali
Contributor Author

yanfali commented Jan 3, 2017

@rs thanks, fixed. It was close enough to the return statement above that I was confused by the comment.

@rs rs merged commit c354f40 into rs:master Jan 3, 2017