BREAKING: fix json marshal unmarshal for namespace > 127 #7810

Merged: 6 commits merged on May 13, 2021


NamanJain8 (Contributor) commented May 12, 2021

We used to store a predicate as `<namespace>|<attribute>` (the pipe `|` signifies concatenation), held in a string. `<namespace>` is an 8-byte uint64, and marshaling such a string to JSON corrupts the predicate. This is because JSON strings must be valid UTF-8: for namespaces greater than 127, namespace bytes above 0x7F are not valid UTF-8 on their own, so the encoder replaces them with another rune (Go's `encoding/json` substitutes U+FFFD, which itself takes three bytes), and the original namespace cannot be recovered on unmarshal. This affects three identified places in Dgraph:

  • Live loader run with Guardian of the Galaxy credentials
  • Backup and List Backup
  • HTTP clients and Ratel
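The corruption is easy to reproduce with Go's `encoding/json`. The sketch below is illustrative rather than Dgraph code: it assumes the namespace is written as 8 big-endian bytes directly in front of the attribute, and `oldPredicate` and `roundTrip` are hypothetical helpers. Namespace 127 survives a marshal/unmarshal round trip, while namespace 128 comes back with its 0x80 byte replaced by U+FFFD, so the string grows from 12 to 14 bytes.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// oldPredicate builds a predicate in the pre-#7810 layout: the namespace as
// 8 raw bytes (big-endian is an assumption of this sketch) followed directly
// by the attribute name.
func oldPredicate(ns uint64, attr string) string {
	buf := make([]byte, 8, 8+len(attr))
	binary.BigEndian.PutUint64(buf, ns)
	return string(append(buf, attr...))
}

// roundTrip marshals s as a JSON string and unmarshals it back.
func roundTrip(s string) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	var out string
	err = json.Unmarshal(b, &out)
	return out, err
}

func main() {
	for _, ns := range []uint64{127, 128} {
		pred := oldPredicate(ns, "name")
		got, err := roundTrip(pred)
		if err != nil {
			panic(err)
		}
		// 0x7F is valid UTF-8 and survives; a lone 0x80 is invalid and is
		// replaced by U+FFFD (EF BF BD) during json.Marshal.
		fmt.Printf("ns=%d intact=%v len before=%d after=%d\n",
			ns, got == pred, len(pred), len(got))
	}
	// ns=127 intact=true len before=12 after=12
	// ns=128 intact=false len before=12 after=14
}
```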

Fix:
Use a valid UTF-8 string whenever the predicate is handled as JSON. Better still, use the UTF-8 string for internal operations too, and convert it to the byte format only when reading from or writing to Badger.
New format: `<namespace>-<attribute>` (`-` is a literal hyphen), where `<namespace>` is written as a hex string; for example, namespace 129 is "81".
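A minimal sketch of the new format, using hypothetical helper names (`namespaceAttr`, `parseNamespaceAttr`, `storageKey`) rather than Dgraph's actual functions. The predicate stays a plain ASCII string everywhere, and only `storageKey` packs it into a byte layout at the Badger boundary; that layout (an 8-byte big-endian namespace followed by the attribute) is an assumption of this sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
	"strings"
)

// namespaceAttr builds the post-#7810 predicate string "<hex ns>-<attr>".
// Hypothetical helper, not Dgraph's API.
func namespaceAttr(ns uint64, attr string) string {
	return strconv.FormatUint(ns, 16) + "-" + attr
}

// parseNamespaceAttr splits at the first hyphen and parses the prefix as hex.
func parseNamespaceAttr(pred string) (uint64, string, error) {
	i := strings.IndexByte(pred, '-')
	if i < 0 {
		return 0, "", fmt.Errorf("predicate %q has no namespace separator", pred)
	}
	ns, err := strconv.ParseUint(pred[:i], 16, 64)
	if err != nil {
		return 0, "", fmt.Errorf("bad namespace in %q: %w", pred, err)
	}
	return ns, pred[i+1:], nil
}

// storageKey converts the string form into the assumed on-disk byte layout.
// This conversion happens only when reading from or writing to Badger.
func storageKey(pred string) ([]byte, error) {
	ns, attr, err := parseNamespaceAttr(pred)
	if err != nil {
		return nil, err
	}
	buf := make([]byte, 8, 8+len(attr))
	binary.BigEndian.PutUint64(buf, ns)
	return append(buf, attr...), nil
}

func main() {
	pred := namespaceAttr(129, "name")
	fmt.Println(pred) // 81-name
	ns, attr, err := parseNamespaceAttr(pred)
	fmt.Println(ns, attr, err) // 129 name <nil>
	key, _ := storageKey(pred)
	fmt.Println(len(key)) // 12
}
```

Splitting at the first hyphen is safe because hex digits never contain `-`, so an attribute such as `my-pred` in `11-my-pred` is preserved intact.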



NamanJain8 changed the title from "fix json marshal unmarshal for namespace > 127" to "[DO NOT MERGE]: fix json marshal unmarshal for namespace > 127" on May 12, 2021
github-actions bot added the area/enterprise (Related to proprietary features), area/graphql (Issues related to GraphQL support on Dgraph) and area/live-loader (Issues related to live loading) labels on May 12, 2021
manishrjain (Contributor) left a comment

LGTM. One thing to consider: could we represent namespaces in hex instead of decimal, for consistency? No `0x` prefix is needed; the bare hex digits should be sufficient.

Reviewed 6 of 6 files at r1.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on @pawanrawal and @vvbalaji-dgraph).

NamanJain8 changed the title from "[DO NOT MERGE]: fix json marshal unmarshal for namespace > 127" to "BREAKING: fix json marshal unmarshal for namespace > 127" on May 13, 2021
NamanJain8 merged commit 90d77f3 into release/v21.03-slash on May 13, 2021
NamanJain8 deleted the naman/fix-json-127 branch on May 13, 2021 08:19
NamanJain8 added a commit that referenced this pull request May 13, 2021
We used to store a predicate as <namespace>|<attribute> (the pipe | signifies concatenation), held in a string. <namespace> is an 8-byte uint64, and marshaling such a string to JSON messes up the predicate. This is because, for a namespace greater than 127, the UTF-8 encoding may take up several bytes (and if no mapping exists, the byte is replaced with some other rune). This affects three identified places in Dgraph:

- Live loader
- Backup and List Backup
- HTTP clients and Ratel

Fix:
Use a valid UTF-8 string when dealing with JSON. Better still, use the UTF-8 string for internal operations too, and convert it to the byte format only when reading from or writing to Badger.
New format: <namespace>-<attribute> (- is a literal hyphen)

(cherry picked from commit 90d77f3)
NamanJain8 added a commit that referenced this pull request May 13, 2021
This PR adds the `dgraph updatemanifest` tool. It can be used to migrate the manifests of backups taken on release/v21.03 (or its parallel branches) to the format introduced by #7810.
NamanJain8 added a commit that referenced this pull request May 17, 2021
…o 2105 (#7825)

The backward compatibility of the backup manifest was broken by #7810, although a tool (#7815) was added to enable smooth migration of the manifest.
This PR makes backups backward compatible by updating the manifest in memory after reading it.
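One way to picture the in-memory upgrade is a version check on the manifest right after it is read. This is a sketch under stated assumptions: `Manifest`, its `Groups` field (group ID to predicate names) and `upgradeInMemory` are simplified, hypothetical stand-ins; the old layout is taken to be an 8-byte big-endian namespace followed by the attribute; and every version below 2105 is treated as the old layout. Bumping `Version` after converting mirrors the later fix in #7828.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// Manifest is a pared-down, hypothetical stand-in for a backup manifest.
type Manifest struct {
	Version int
	Groups  map[uint32][]string // group ID -> predicate names (assumed layout)
}

// manifestVersion2105 is the version number taken from the #7825 title.
const manifestVersion2105 = 2105

// upgradeInMemory rewrites old "<8 raw ns bytes><attr>" predicates into the
// "<hex ns>-<attr>" form after the manifest is read; nothing is written back.
func upgradeInMemory(m *Manifest) error {
	if m.Version >= manifestVersion2105 {
		return nil // already in the new layout
	}
	groups := make(map[uint32][]string, len(m.Groups))
	for gid, preds := range m.Groups {
		conv := make([]string, len(preds))
		for i, p := range preds {
			if len(p) < 8 {
				return fmt.Errorf("group %d: predicate %q shorter than 8-byte namespace", gid, p)
			}
			ns := binary.BigEndian.Uint64([]byte(p[:8]))
			conv[i] = strconv.FormatUint(ns, 16) + "-" + p[8:]
		}
		groups[gid] = conv
	}
	// Swap in the converted map only after every predicate converted cleanly.
	m.Groups = groups
	m.Version = manifestVersion2105 // record that the upgrade happened
	return nil
}

func main() {
	old := string([]byte{0, 0, 0, 0, 0, 0, 0, 0x11}) + "name"
	m := &Manifest{Version: 0, Groups: map[uint32][]string{1: {old}}}
	if err := upgradeInMemory(m); err != nil {
		panic(err)
	}
	fmt.Println(m.Version, m.Groups[1][0]) // 2105 11-name
}
```

This can only recover names whose eight namespace bytes survived the earlier JSON round trip; names whose bytes were already replaced with U+FFFD cannot be restored this way.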
NamanJain8 added a commit that referenced this pull request May 19, 2021
With the #7810 change, we changed the format of the predicate but missed updating the schema and predicate. This PR fixes that.
NamanJain8 added a commit that referenced this pull request May 24, 2021
(cherry picked from commit 90d77f3)
NamanJain8 added a commit that referenced this pull request May 24, 2021
(cherry picked from commit 22db22f)
NamanJain8 added a commit that referenced this pull request May 24, 2021
…o 2105 (#7825)

(cherry picked from commit 83a0c53)
NamanJain8 added a commit that referenced this pull request May 24, 2021
(cherry picked from commit 8504fb1)
MichelDiz added the v22.0.0 (Issues related to v22.0.0) and cherry-pick labels on Oct 24, 2022
mangalaman93 pushed a commit that referenced this pull request Jan 2, 2023
mangalaman93 pushed a commit that referenced this pull request Jan 2, 2023
mangalaman93 pushed a commit that referenced this pull request Jan 3, 2023
harshil-goel pushed a commit that referenced this pull request Jan 13, 2023
harshil-goel pushed a commit that referenced this pull request Jan 16, 2023
…o 2105 (#7825)

harshil-goel pushed a commit that referenced this pull request Jan 30, 2023
harshil-goel pushed a commit that referenced this pull request Jan 30, 2023
harshil-goel pushed a commit that referenced this pull request Jan 30, 2023
…o 2105 (#7825)

harshil-goel pushed a commit that referenced this pull request Jan 31, 2023
harshil-goel pushed a commit that referenced this pull request Jan 31, 2023
harshil-goel pushed a commit that referenced this pull request Jan 31, 2023
…o 2105 (#7825)

harshil-goel pushed a commit that referenced this pull request Feb 2, 2023
harshil-goel pushed a commit that referenced this pull request Feb 2, 2023
harshil-goel pushed a commit that referenced this pull request Feb 2, 2023
…o 2105 (#7825)

harshil-goel pushed a commit that referenced this pull request Feb 3, 2023
We used to store a predicate as <namespace>|<attribute> (the pipe | signifies concatenation), held in a string. <namespace> is an 8-byte uint64, and marshaling such a string to JSON messes up the predicate. This is because, for a namespace greater than 127, the UTF-8 encoding may take up several bytes (and if no mapping exists, the byte is replaced with some other rune). This affects three identified places in Dgraph:

- Live loader
- Backup and List Backup
- HTTP clients and Ratel

Fix:
Use a valid UTF-8 string when dealing with JSON. Better still, use the UTF-8 string for internal operations too, and convert it to the byte format only when reading from or writing to Badger.
New format: <namespace>-<attribute> (- is a literal hyphen)

fix(restore): update the schema and type from 2103 (#7838)

With the #7810 change, we changed the format of the predicate but missed updating the schema and predicate. This PR fixes that.

fix(state): fix hex to uint64 response of list of namespaces (#8091)

There is an issue in ExtractNamespaceFromPredicate: it parsed the ns in <ns>-<attr> as decimal, but it is actually hexadecimal. This leads to the following issues:

- A predicate a-name was skipped.
- A predicate 11-name was parsed as namespace 11, but it is actually namespace 17 (0x11).

fix(backup): handle manifest version logic, update manifest version to 2105 (#7825)

The backward compatibility of the backup manifest was broken by #7810, although a tool (#7815) was added to enable smooth migration of the manifest.
This PR makes backups backward compatible by updating the manifest in memory after reading it.

fix(updatemanifest): update the version of manifest after update (#7828)

We were not updating the manifest version after updating the manifest. This PR fixes that.
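The #8091 bug comes down to the base passed to the integer parser. The sketch below uses a hypothetical `nsFromPredicate` helper (not the actual `ExtractNamespaceFromPredicate`) together with Go's `strconv.ParseUint`: with base 10, `a-name` fails to parse and `11-name` yields 11; with base 16, they yield 10 and 17 respectively.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nsFromPredicate extracts the namespace prefix of "<ns>-<attr>" using the
// given base. Base 10 reproduces the bug described in #8091; base 16 is the fix.
func nsFromPredicate(pred string, base int) (uint64, error) {
	i := strings.IndexByte(pred, '-')
	if i < 0 {
		return 0, fmt.Errorf("no '-' separator in %q", pred)
	}
	return strconv.ParseUint(pred[:i], base, 64)
}

func main() {
	for _, p := range []string{"a-name", "11-name"} {
		dec, decErr := nsFromPredicate(p, 10)
		hex, hexErr := nsFromPredicate(p, 16)
		fmt.Printf("%-8s decimal: %d (err=%v)  hex: %d (err=%v)\n",
			p, dec, decErr, hex, hexErr)
	}
}
```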
harshil-goel pushed a commit that referenced this pull request Feb 3, 2023
harshil-goel pushed a commit that referenced this pull request Feb 3, 2023
harshil-goel added a commit that referenced this pull request Feb 3, 2023
…t json marshal issues (#8601)

We used to store a predicate as <namespace>|<attribute> (the pipe |
signifies concatenation), held in a string. <namespace> is an 8-byte
uint64, and marshaling such a string to JSON messes up the predicate.
This is because, for a namespace greater than 127, the UTF-8 encoding
may take up several bytes (and if no mapping exists, the byte is
replaced with some other rune). This affects the following places in
Dgraph:

- Live loader run with Guardian of the Galaxy credentials
- Backup and List Backup
- HTTP clients and Ratel
- Schema and predicate

Fix:
Use a valid UTF-8 string when dealing with JSON. Better still, use the
UTF-8 string for internal operations too, and convert it to the byte
format only when reading from or writing to Badger.
New format: <namespace>-<attribute> (- is a literal hyphen), where
<namespace> is written as a hex string; for example, namespace 129
is "81".

We also update the manifest version after updating the manifest. This
change ensures that older backups remain compatible and can still be
used for restores.

Contains: 
#7838
#7828
#7825
#7815
#7810
all-seeing-code pushed a commit that referenced this pull request Feb 8, 2023
…t json marshal issues (#8601)

all-seeing-code pushed a commit that referenced this pull request Feb 8, 2023
…t json marshal issues (#8601)

Labels: area/enterprise (Related to proprietary features), area/graphql (Issues related to GraphQL support on Dgraph), area/live-loader (Issues related to live loading), cherry-pick, v22.0.0 (Issues related to v22.0.0)

3 participants