perf(gatsby): Support lte for indexed fast filters #22932

Merged: 19 commits, Apr 14, 2020
Fix tests, fix lte binary search lookup
pvdz committed Apr 9, 2020
commit 04349bb41ababe530a36cc2659484299be67b99f
34 changes: 19 additions & 15 deletions packages/gatsby/src/redux/nodes.ts
@@ -396,7 +396,7 @@ function addNodeToBucketWithElemMatch(
const binarySearch = (
Contributor

Looks like this function may be handling certain edge cases incorrectly? For example with repeating values:

binarySearch([0, 1], 1)
-> Array [ 1, 1 ]

binarySearch([0, 1, 1], 1)
-> Array [ 1, 1 ]

binarySearch([0, 1, 1, 1], 1)
-> Array [ 2, 2 ]

binarySearch([0, 1, 1, 1, 1], 1)
-> Array [ 2, 2 ]

binarySearch([0, 1, 1, 1, 1, 1], 1)
-> Array [ 3, 3 ]

binarySearch([0, 1, 1, 1, 1, 1, 2], 1)
-> Array [ 3, 3 ]

binarySearch([0, 1, 1, 1, 1, 1, 2, 3], 1)
-> Array [ 4, 4 ]

Maybe we can use existing tools like lodash sortedIndex (it also works even if the element is not in the list)?

Edit: I realized that it always receives a list with unique values, but in that case it deserves a comment.
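
For reference, lodash's sortedIndex and sortedLastIndex bracket a needle in a sorted array even when it repeats or is missing, for example:

const _ = require(`lodash`)

_.sortedIndex([0, 1, 1, 1, 2], 1) // lowest index at which 1 can be inserted and keep the array sorted
-> 1

_.sortedLastIndex([0, 1, 1, 1, 2], 1) // one past the last occurrence of 1
-> 4

_.sortedIndex([0, 2, 4], 3) // also works when the needle is absent
-> 2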

Contributor Author

Yup, unique values guaranteed. I'll add a comment, good point :)

values: Array<FilterValue>,
needle: FilterValue
): number => {
): [number, number] => {
let min = 0
let max = values.length - 1
let pivot = Math.floor(values.length / 2)
@@ -413,33 +413,30 @@ const binarySearch = (
} else {
// This means needle === value
// TODO: except for NaN ... and potentially certain type casting cases
return pivot
return [pivot, pivot]
}

if (min === max) {
if (max - min <= 1) {
// End of search. Needle not found (as expected). Use pivot as index.
return pivot
// If the needle was not found, max-min==1 and max is returned.
return [min, max]
}

pivot = Math.floor((max - min) / 2)
}

// Shouldn't be reachable
return 0
return [0, 0]
}

/**
* Given a ("flat") filter path leading up to "eq", a target value to filter
* for, a set of node types, and a pre-generated lookup cache, return the set
* of Nodes (or, if the property is `id` just the Node) which pass the filter.
* This returns `undefined` if there is no Node that passes the filter.
* Given the cache key for a filter and a target value return the set of nodes
* that resolve to this value.
* This returns `undefined` if there is no such node
*
* Basically if the filter was {a: {b: {slug: {eq: "foo/bar"}}}} then it will
* return all the nodes that have `node.slug === "foo/bar"`. That usually (but
* not always) at most one node for slug, but this filter can apply to anything.
*
* The only exception is `id`, since internally there can be at most one node
* per `id` so there's a minor optimization for that (no need for Sets).
*/
export const getNodesFromCacheByValue = (
filterCacheKey: FilterCacheKey,
@@ -473,14 +470,21 @@ export const getNodesFromCacheByValue = (

// Note: for lte, the valueAsc array must be set at this point
const values = filterCache.meta.valuesAsc as Array<FilterValue>
const pivot = binarySearch(values, filterValue)
// It shouldn't find the targetValue (but it might) and return the index of
// the two values between which targetValue sits, or the first/last element.
const [pivotMin, pivotMax] = binarySearch(values, filterValue)
// Each pivot index must have a value and a range
// The returned min/max index may include the lower/upper bound, so we still
// have to do lte checks for both values.
let pivotValue = values[pivotMax]
if (pivotValue > filterValue) {
pivotValue = values[pivotMin]
}

// Note: the pivot value _shouldnt_ match the filter value because that
// means the value was actually found, but those should have been indexed
// so should have yielded a result in the .get() above.

// Each pivot index must have a value and a range
const pivotValue = values[pivot]
const [start, stop] = filterCache.meta.valueRanges!.get(pivotValue) as [
number,
number
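
Putting the pieces together: the values are indexed sorted and unique, each value maps to a [start, stop] range in an array of nodes ordered by that value, and an lte query binary-searches for the largest indexed value that does not exceed the filter value, then takes the ordered nodes up to the end of that value's range. A minimal self-contained sketch of the idea (simplified, with assumed names like orderedNodes and an exclusive stop index; not the actual Gatsby internals):

type FilterValue = number

interface IndexMeta {
  valuesAsc: Array<FilterValue>                   // sorted, unique values
  valueRanges: Map<FilterValue, [number, number]> // value -> [start, stop] into orderedNodes
  orderedNodes: Array<{ id: string }>             // nodes sorted by the filtered value
}

// Returns the [min, max] indexes that bracket the needle, or [i, i] on an exact hit.
const bracketSearch = (
  values: Array<FilterValue>,
  needle: FilterValue
): [number, number] => {
  let min = 0
  let max = values.length - 1
  while (max - min > 1) {
    const pivot = min + Math.floor((max - min) / 2)
    const value = values[pivot]
    if (needle < value) max = pivot
    else if (needle > value) min = pivot
    else return [pivot, pivot]
  }
  return [min, max]
}

const getNodesLte = (meta: IndexMeta, filterValue: FilterValue) => {
  const { valuesAsc, valueRanges, orderedNodes } = meta
  if (valuesAsc.length === 0) return undefined
  const [pivotMin, pivotMax] = bracketSearch(valuesAsc, filterValue)
  // Prefer the upper bracket; fall back to the lower one if it overshoots.
  let pivotValue = valuesAsc[pivotMax]
  if (pivotValue > filterValue) pivotValue = valuesAsc[pivotMin]
  if (pivotValue > filterValue) return undefined // every indexed value is larger
  const [, stop] = valueRanges.get(pivotValue) as [number, number]
  return orderedNodes.slice(0, stop) // everything up to and including pivotValue
}

With valuesAsc = [0, 1, 2] and filterValue = 1.5, the bracket is [1, 2], the pivot value resolves to 1, and the returned slice covers the nodes whose value is 0 or 1.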
9 changes: 7 additions & 2 deletions packages/gatsby/src/redux/run-sift.js
@@ -141,6 +141,9 @@ function handleMany(siftArgs, nodes) {
* filter.
* Only nodes of given node types will be considered
* A fast index is created if one doesn't exist yet so cold call is slower.
* Returns undefined if an op was not supported for fast indexes or when no
* nodes were found for given (query) value. In the zero nodes case, we have to
* go through Sift to make sure we're not missing an edge case, for now.
*
* @param {Array<DbQuery>} filters Resolved. (Should be checked by caller to exist)
* @param {Array<string>} nodeTypeNames
@@ -184,6 +187,9 @@ const runFiltersWithoutSift = (filters, nodeTypeNames, filtersCache) => {
// Consider the case of {a: {eq: 5}, b: {eq: 10}}, do we cache the [5,10]
// case for all value pairs? How likely is that to ever be reused?

if (result.length === 0) {
return undefined
}
return result
}

@@ -487,8 +493,7 @@ const filterWithoutSift = (filters, nodeTypeNames, filtersCache) => {

if (filters.length === 0) {
// If no filters are given, go through Sift. This does not appear to be
// slower than s
// hortcutting it here.
// slower than shortcutting it here.
return undefined
}

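
On the caller side, undefined from this fast path simply means "fall back to Sift"; roughly (a sketch, the name of the Sift-based fallback is assumed):

const fast = runFiltersWithoutSift(filters, nodeTypeNames, filtersCache)
if (fast !== undefined) {
  // The index answered the query directly.
  return fast
}
// Unsupported op, or zero matches that Sift should double-check.
return runFiltersWithSift(filters, nodeTypeNames)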
53 changes: 43 additions & 10 deletions packages/gatsby/src/schema/__tests__/run-query.js
@@ -216,9 +216,11 @@ function resetDb(nodes) {
)
}

let nodesAfterLastRunQuery
async function runQuery(queryArgs, filtersCache) {
const nodes = makeNodes()
resetDb(nodes)
nodesAfterLastRunQuery = nodes
const { sc, type: gqlType } = makeGqlType(nodes)
const args = {
gqlType,
@@ -372,8 +374,7 @@ it(`should use the cache argument`, async () => {
let result = await runFilter({ hair: { lt: 2 } })

expect(result.length).toEqual(2)
expect(result[0].hair).toEqual(1)
expect(result[1].hair).toEqual(0)
result.forEach(r => expect(r.hair <= 2).toBe(true))
})

it(`handles lt operator with null`, async () => {
@@ -388,30 +389,51 @@ it(`should use the cache argument`, async () => {
it(`handles lte operator with number`, async () => {
let result = await runFilter({ hair: { lte: 1 } })

expect(result.length).toEqual(2)
expect(result[0].hair).toEqual(1)
expect(result[1].hair).toEqual(0)
let actual = nodesAfterLastRunQuery.reduce(
(acc, node) => (node.hair <= 1 ? acc + 1 : acc),
0
)

expect(actual).not.toBe(0) // Test should keep this invariant!
expect(result.length).toEqual(actual)
result.forEach(r => expect(r.hair <= 1).toBe(true))
})

it(`should lte when value is lower than all found values`, async () => {
if (IS_LOKI) return

let result = await runFilter({ float: { lte: 1 } })

expect(result).toEqual(null) // Zero results yields undefined
let actual = nodesAfterLastRunQuery.reduce(
(acc, node) => (node.float <= 1 ? acc + 1 : acc),
0
)

expect(actual).toEqual(0) // Make sure test nodes keep this invariant!
expect(result).toEqual(null) // Zero results yields null
})

it(`should lte when value is in the middle of all found values`, async () => {
let result = await runFilter({ float: { lte: 2 } })

expect(result.length > 0).toEqual(true)
let actual = nodesAfterLastRunQuery.reduce(
(acc, node) => (node.float <= 2 ? acc + 1 : acc),
0
)

expect(result.length).toEqual(actual)
result.forEach(r => expect(r.float <= 2).toBe(true))
})

it(`should lte when value is higher than all found values`, async () => {
let result = await runFilter({ float: { lte: 5 } })

expect(result.length).toEqual(3)
let actual = nodesAfterLastRunQuery.reduce(
(acc, node) => (node.float <= 5 ? acc + 1 : acc),
0
)

expect(result.length).toEqual(actual)
})

it.skip(`should lte when type coercion fails direct value lookup`, async () => {
@@ -420,9 +442,14 @@ it(`should use the cache argument`, async () => {
// value wasn't mapped, that it can't be found.
let result = await runFilter({ float: { lte: `1.5` } })

let actual = nodesAfterLastRunQuery.reduce(
(acc, node) => (node.float <= 1.5 ? acc + 1 : acc),
0
)

expect(result).not.toBe(undefined)
expect(result).not.toBe(null)
expect(result.length > 0).toEqual(true)
expect(result.length).toEqual(actual)
result.forEach(r => expect(r.float <= 2).toBe(true))
})

@@ -431,8 +458,14 @@ it(`should use the cache argument`, async () => {

let result = await runFilter({ nil: { lte: null } })

let actual = nodesAfterLastRunQuery.reduce(
(acc, node) => (node.nil <= null ? acc + 1 : acc),
0
)

// lte null matches null but no nodes without the property (NULL)
expect(result.length).toEqual(1)
expect(actual).not.toBe(0) // Test should keep this invariant!
expect(result.length).toEqual(actual)
expect(result[0].name).toEqual(`The Mad Wax`)
expect(result[0].nil).toEqual(null)
})