Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infer type of schema from JSON and RDF mutations. #4328

Merged
merged 30 commits into from
Dec 19, 2019

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented Nov 27, 2019

This change changes the parser (for both RDF and JSON) to send hints about the type
of the predicates (list or non-list) so predicates that are created during mutations are
created with the appropriate type.

Fixes #3788

This change is Reviewable

If the same subject-predicate pair appears multiple times in a mutation
and the predicate needs to be created, create it as a list to avoid
losing data.
@martinmr martinmr changed the base branch from master to martinmr/infer-schema-as-list November 27, 2019 20:09
@martinmr martinmr marked this pull request as ready for review December 3, 2019 00:32
@martinmr martinmr requested review from manishrjain and a team as code owners December 3, 2019 00:32
@martinmr martinmr changed the base branch from martinmr/infer-schema-as-list to master December 3, 2019 22:41
@martinmr martinmr changed the title Infer type of schema from JSON mutations. Infer type of schema from JSON and RDF mutations. Dec 3, 2019
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 4 unresolved discussions (waiting on @danielmai, @manishrjain, @martinmr, and @MichaelJCompton)


chunker/json_parser.go, line 237 at r1 (raw file):

	nquads            []*api.NQuad
	nqCh              chan []*api.NQuad
	forcedSinglePreds map[string]bool

predHints map[string]enum


chunker/json_parser.go, line 250 at r1 (raw file):

		buf.nquads = make([]*api.NQuad, 0, batchSize)
	}
	buf.forcedSinglePreds = make(map[string]bool)

can have the proto here directly. Protos have maps.


chunker/json_parser.go, line 302 at r1 (raw file):

	for _, pred := range metadata.GetForcedListPreds() {
		if _, ok := buf.forcedSinglePreds[pred]; ok {
			return errors.Errorf("")

fill in the blank.


chunker/json_parser.go, line 417 at r2 (raw file):

			}
			buf.Push(&nq)
			if err := buf.PushPredHint(pred, pb.ParseMetadata_SINGLE); err != nil {

Do we really need to return an error just in case our "hints" don't match up? They are meant to be hints and guesses, not stop the program.

Defaults to a list instead to avoid destroying data.
Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 11 files reviewed, 4 unresolved discussions (waiting on @danielmai, @manishrjain, and @MichaelJCompton)


chunker/json_parser.go, line 237 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

predHints map[string]enum

Done. This is a comment from the previous time you looked at the PR but didn't submit your comments.


chunker/json_parser.go, line 250 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

can have the proto here directly. Protos have maps.

Done. This is a comment from the previous time you looked at the PR but didn't submit your comments.


chunker/json_parser.go, line 302 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

fill in the blank.

Done. This is a comment from the previous time you looked at the PR but didn't submit your comments.


chunker/json_parser.go, line 417 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Do we really need to return an error just in case our "hints" don't match up? They are meant to be hints and guesses, not stop the program.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Got a few comments. Good to go after.

Reviewed 1 of 9 files at r1, 7 of 10 files at r2, 3 of 3 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @danielmai, @manishrjain, @martinmr, and @MichaelJCompton)


chunker/json_parser.go, line 536 at r3 (raw file):

// ParseJSON is a convenience wrapper function to get all NQuads in one call. This can however, lead
// to high memory usage. So be careful using this.
func ParseJSON(b []byte, op int) ([]*api.NQuad, *pb.ParseMetadata, error) {

Maybe return *pb.Mutation.


chunker/rdf_parser.go, line 58 at r3 (raw file):

// ParseRDFs is a convenience wrapper function to get all NQuads in one call. This can however, lead
// to high memory usage. So, be careful using this.
func ParseRDFs(b []byte) ([]*api.NQuad, *pb.ParseMetadata, error) {

This can return *pb.Mutation, I suppose.


protos/pb.proto, line 224 at r3 (raw file):

	string drop_value = 8;

	ParseMetadata parse_metadata = 9;

Metadata metadata


protos/pb.proto, line 227 at r3 (raw file):

}

message ParseMetadata {

s/ParseMetadata/Metadata

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 11 files reviewed, 8 unresolved discussions (waiting on @danielmai, @manishrjain, and @MichaelJCompton)


chunker/json_parser.go, line 536 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Maybe return *pb.Mutation.

pb.Mutations does not have a way to store the nquads inside it so this method would still need to return two objects. There's not much I would gain from switching pb.Metadata to pb.Mutation if that's what you meant instead.


chunker/rdf_parser.go, line 58 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This can return *pb.Mutation, I suppose.

pb.Mutations does not have a way to store the nquads inside it so this method would still need to return two objects.


protos/pb.proto, line 224 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Metadata metadata

Done.


protos/pb.proto, line 227 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

s/ParseMetadata/Metadata

Done.

@martinmr martinmr merged commit 4da3614 into master Dec 19, 2019
@martinmr martinmr deleted the martinmr/json-list-inference branch December 19, 2019 19:35
danielmai pushed a commit that referenced this pull request Jan 12, 2020
If the same subject-predicate pair appears multiple times in a mutation
and the predicate needs to be created, create it as a list to avoid
losing data.

(cherry picked from commit 4da3614)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Dgraph isn't automatically setting values ​​as list in structured mutations as list.
2 participants