paterczm

This is a document outlining the value generator design. I created a pull request to gather feedback.

Issue: lightblue-platform/lightblue-core#204
Internal discussion: https://mojo.redhat.com/message/957916

@jewzaam
Member

jewzaam commented Mar 30, 2015

While it may make sense for our use cases to only generate for required fields I would prefer we do not force this on all clients. We have other constraints that can be used to mark a field as required. Field value generation would be better off as independent.

@jewzaam
Member

jewzaam commented Mar 30, 2015

I'll pick on the integer generator, but some of this is applicable to any of them

"_id": {
  "constraints": {
    "identity": true
  },
  "generator": "longId",
  "description": "The identifier of this entity.",
  "type": "integer"
}

You're making a new field that is part of the field, not a new constraint. Additionally, we have a requirement to start generation above a specific number. The type generated is implied by the field's type, integer in this example. Adding two specific properties beyond type gives lots of power: minimum and maximum. With those two we can constrain generation to a range of values, which means we can control for signed and unsigned as well. For example, to specify long generation above 150000 without a maximum (implying the default maximum for a signed 64-bit long):

"_id": {
  "constraints": {
    "identity": true,
    "generated": {
      "minimum": 150000
    }
  },
  "description": "The identifier of this entity.",
  "type": "integer"
}

For string, the uid behavior is good. I would probably be explicit in the type of generation though:

"_id": {
  "constraints": {
    "identity": true,
    "generated": "uid"
  },
  "description": "The identifier of this entity.",
  "type": "string"
}

And for date there's a need to specify what date to use. The current use case is current date/time:

"someDate": {
  "constraints": {
    "generated": "$now"
  },
  "description": "some date",
  "type": "date"
}

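Long term we might consider adding more to the generated constraint, at least for integer:

  • signed flag (by itself would default to the full range available with a signed or unsigned 64 bit integer, see http://jewzaam.gitbooks.io/lightblue-specifications/content/backend_specifications/mongodb/metadata.html)
  • sequence flag (if/when sequences are supported)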
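
To make the three proposed behaviours concrete, here is a minimal Python sketch of how a `generated` setting might be interpreted per field type. It is not lightblue code: the function names, the default minimum of 0, and the use of a random UUID for `uid` are assumptions made for illustration only.

```python
import secrets
import uuid
from datetime import datetime, timezone

MAX_SIGNED_64 = 2**63 - 1  # default maximum for a signed 64-bit long


def generate_integer(minimum=0, maximum=MAX_SIGNED_64):
    """Return a random integer in the inclusive range [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return minimum + secrets.randbelow(maximum - minimum + 1)


def generate_value(field_type, generated):
    """Pick a generator from the field type and its 'generated' setting.

    The mapping below is hypothetical; it only mirrors the JSON examples.
    """
    if field_type == "integer":
        options = generated if isinstance(generated, dict) else {}
        return generate_integer(options.get("minimum", 0),
                                options.get("maximum", MAX_SIGNED_64))
    if field_type == "string" and generated == "uid":
        return str(uuid.uuid4())  # assumption: a uid is a random UUID string
    if field_type == "date" and generated == "$now":
        return datetime.now(timezone.utc)
    raise ValueError(f"no generator for {field_type!r} with {generated!r}")
```

For instance, `generate_value("integer", {"minimum": 150000})` returns some value between 150000 and the signed 64-bit maximum. Nothing in this sketch guarantees uniqueness.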

@bserdar

bserdar commented Mar 30, 2015

What exactly does this do? Generate a random integer above 150000?

"constraints": {
  "identity": true,
  "generated": {
    "minimum": 150000
  }
}

On Mon, Mar 30, 2015 at 12:31 PM, Naveen Malik notifications@github.com wrote:

Long term we might consider adding more to the generated constraint, at least for integer:

  • signed flag (by itself would default to the full range available with a signed or unsigned 64 bit integer, see http://jewzaam.gitbooks.io/lightblue-specifications/content/backend_specifications/mongodb/metadata.html)
  • sequence flag (if/when sequences are supported)

@jewzaam
Member

jewzaam commented Mar 30, 2015

@bserdar yes

@bserdar

bserdar commented Mar 30, 2015

Is there a use case for a generated random non-unique value of any type?

@paterczm
Author

@jewzaam

You're making a new field that is part of the field, not a new constraint.

Constraints prevent you from doing/force you to do something. A generator is different - even if you specify a minimum for a generated integer, it will not prevent you from persisting a value below that minimum. In my mind generators and constraints are independent entities.

In your examples, you are specifying generation properties in the schema. I placed them in the entityInfo section, because I think generators should be reusable, like constants.
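
The distinction between a generator and a constraint, and the idea of reusable named definitions, could be sketched in Python roughly as follows. Everything here is hypothetical: `GENERATORS` stands in for an entityInfo-level section, `SCHEMA_FIELDS` for schema fields that reference a generator by name, and `stringId` is assumed to produce a random UUID string.

```python
import uuid

# Hypothetical reusable definitions, standing in for an entityInfo-level section.
GENERATORS = {
    "stringId": lambda: str(uuid.uuid4()),
}

# Hypothetical schema: a field may name one of the generators above.
SCHEMA_FIELDS = {
    "_id": {"type": "string", "generator": "stringId"},
    "name": {"type": "string"},
}


def apply_generators(document, fields=SCHEMA_FIELDS, generators=GENERATORS):
    """Return a copy of document with missing generated fields filled in.

    Values supplied by the client are kept as-is: a generator is not a
    constraint, so it never rejects or overwrites a value.
    """
    result = dict(document)
    for name, field in fields.items():
        generator_name = field.get("generator")
        if generator_name is not None and result.get(name) is None:
            result[name] = generators[generator_name]()
    return result
```

With this sketch, `apply_generators({"name": "a"})` gains a generated `_id`, while a document that already carries an `_id` passes through unchanged.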

@jewzaam
Member

jewzaam commented Mar 31, 2015

@bserdar I can't think of any use case for random non-unique types. Only non-unique example I can think of is date, which is not random.

@paterczm you had the generator definition under "more ideas" so I didn't give it much weight in my comments. If it's the way generators are defined it shouldn't be just an idea, it's part of the spec. Agree generators are not equal to constraints. Is there value in reusable generator definitions? Would the generator definition really be global to all versions of metadata or could it change per version?

@bserdar

bserdar commented Mar 31, 2015

Since having non-unique random numbers is not really useful, why don't we do something simple like this:

  • Have back-ends support generator implementations. MongoDB and RDBMS can support sequences. We can add uid generation support to the core.
  • For MongoDB sequences: have a document containing sequence name (indexed), increment, and current value. You can increment and retrieve the old value atomically in MongoDB.
  • For RDBMS sequences: use a sequence
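
The sequence-document idea above can be sketched without a database. The Python class below is an in-memory stand-in, not a MongoDB client: a lock plays the role of the atomic update, and the field names `name`, `increment`, and `value` are assumptions based on the description. In MongoDB this would presumably map to a find-and-modify style update using `$inc` that returns the pre-update document.

```python
import threading


class InMemorySequenceStore:
    """Stand-in for a collection holding one document per sequence."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # models the atomicity of the update

    def next_value(self, name, initial=1, increment=1):
        """Atomically advance the named sequence and return the old value.

        `initial` and `increment` only take effect when the sequence
        document is created on first use.
        """
        with self._lock:
            doc = self._docs.setdefault(
                name, {"name": name, "increment": increment, "value": initial})
            old = doc["value"]
            doc["value"] = old + doc["increment"]
            return old
```

Creating the sequence with `initial=150000` would also cover the requirement to start generation above a specific number.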

@paterczm
Author

@jewzaam

you had the generator definition under "more ideas" so I didn't give it much weight in my comments. If it's the way generators are defined it shouldn't be just an idea, it's part of the spec.

I can think of only 2 use cases right now (defined at the top of the document). I proposed a minimum set of changes to cover them. The 'more ideas' section covers everything I could think of for lightblue-platform/lightblue-core#204. I think we should have a design for everything which is potentially useful, but focus on those 2 use cases for now, especially since this is blocking the terms work. Sorry for not making this clear enough. I will update the document.

Is there value in reusable generator definitions?

I think so. StringId generator (to replace uid type) will be reusable, same thing with integerId and currentDate.

Would the generator definition really be global to all versions of metadata or could it change per version?

At the very bottom, in the 'Further ideas' section, I proposed: 'allow generator properties to be overridden in the schema'. If there is a need, we can make them versioned, though I don't see such a need right now.

@paterczm
Author

@bserdar

Wouldn't that make it harder to keep Lightblue db agnostic? I don't think there is a common sequence api for all rdbms databases.

@bserdar

bserdar commented Mar 31, 2015

The actual implementation of sequence would be in a Dialect class that should be defined separately for every DB anyway. We will have such a dialect class to write SQL statements. The dialect to be used will be defined in the metadata. So, I don't think adding sequence support affects DB agnosticity.
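
As a rough Python illustration of the dialect idea (not the planned lightblue class), each database gets its own subclass that knows how to phrase "next sequence value" in SQL. The class and method names and the `DIALECTS` lookup are hypothetical; the two SQL forms are the standard PostgreSQL and Oracle syntax.

```python
import re
from abc import ABC, abstractmethod

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_identifier(name):
    """Reject names that are not plain SQL identifiers before interpolating."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid sequence name: {name!r}")
    return name


class Dialect(ABC):
    """Per-database SQL generation; only the sequence part is sketched."""

    @abstractmethod
    def next_sequence_value_sql(self, sequence_name):
        """Return the SQL statement that fetches the next sequence value."""


class PostgresDialect(Dialect):
    def next_sequence_value_sql(self, sequence_name):
        return f"SELECT nextval('{_check_identifier(sequence_name)}')"


class OracleDialect(Dialect):
    def next_sequence_value_sql(self, sequence_name):
        return f"SELECT {_check_identifier(sequence_name)}.NEXTVAL FROM dual"


# Hypothetical lookup keyed by the dialect name stored in the metadata.
DIALECTS = {"postgresql": PostgresDialect, "oracle": OracleDialect}


def dialect_for(name):
    return DIALECTS[name]()
```

The calling code only sees `dialect_for(...).next_sequence_value_sql(...)`, which is how the database-specific part stays out of the core.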

@jewzaam
Member

jewzaam commented Apr 1, 2015

Given there is not a requirement for an actual sequence of identifiers, why not just generate them? The benefit of the sequence in RDBMS is you don't have to worry about uniqueness violation. For mongo we may have issues given lack of transactions and that "update" in lightblue is not atomic. Unless we go against the db directly. I just don't see a reason to do the work to support a true sequence when there isn't a requirement for it.

@paterczm
Author

@jewzaam @bserdar @dcrissman

Let's finish this design, it has become a blocker again. I see the following open questions:

  1. Should the generator definition be placed in the schema, in the entityInfo (as in the current design proposal), or perhaps even be hardcoded (like types)? The last option would be the least flexible but also the most convenient. All we need right now is auto-generated numeric IDs and perhaps auto-generated current dates.
  2. Do we want to support sequences, or is random generation enough? Uniqueness is required, not sequences, though I'm not sure how to ensure the former without the latter. @jewzaam, can you elaborate on Value generator document #22 (comment)? Without sequences we may need a minimum parameter for numeric ID generation to limit the number of conflicts with migrated data (just saying).
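
One way to get uniqueness without sequences, sketched here in Python under stated assumptions, is to pick a random value at or above a minimum and retry when the store reports a conflict. `DuplicateKeyError`, the `insert` callback, and the retry limit are all hypothetical placeholders for whatever the back end actually provides.

```python
import secrets

MAX_SIGNED_64 = 2**63 - 1  # default maximum for a signed 64-bit long


class DuplicateKeyError(Exception):
    """Hypothetical error raised when a unique constraint is violated."""


def insert_with_random_id(insert, minimum=0, maximum=MAX_SIGNED_64, attempts=5):
    """Call insert(candidate_id) with random ids until one is accepted.

    `insert` must raise DuplicateKeyError when the id is already taken,
    which in practice requires a unique index on the id field.
    """
    for _ in range(attempts):
        candidate = minimum + secrets.randbelow(maximum - minimum + 1)
        try:
            insert(candidate)
            return candidate
        except DuplicateKeyError:
            continue  # collision: try another random value
    raise RuntimeError(f"no free id found after {attempts} attempts")
```

Setting `minimum` above the highest migrated ID keeps generated values out of the migrated range, so conflicts can then only occur between generated values.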

@jewzaam
Member

jewzaam commented Jul 22, 2015

@paterczm re: my comment

If we do random values for the numeric generation we have to deal with collision. With an incrementing sequence this isn't a problem.

If definition is in the entityInfo it's not versioned. I think this makes sense. Meaning, if you change how the values are generated it impacts all versions of the entity. There probably shouldn't be variance in how things are generated. This would be the expectation in a sequence or trigger based approach in RDBMS at least.

For random vs sequence, what is easiest? If it's sequence then we could reduce updates to the sequence doc by grabbing a block of numbers from the sequence instead of one at a time. This is something RDBMS sequences support as well. It's a trade-off of managing a set of values that have been consumed already in the sequence doc vs having to deal with unique key violations if using a random value. Which is easier?
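
The block-reservation idea can be sketched in Python as a small allocator that touches the shared sequence only once per block. This is an illustration, not a proposed implementation: `reserve_block` is a hypothetical callback that must atomically advance the shared sequence by the block size and return the first reserved value.

```python
import threading


class BlockAllocator:
    """Hands out ids locally, reserving a new block when the current one is used up."""

    def __init__(self, reserve_block, block_size=100):
        # reserve_block(size) must atomically advance the shared sequence by
        # `size` and return the first value of the reserved block.
        self._reserve_block = reserve_block
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive upper bound of the current block
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                self._next = self._reserve_block(self._block_size)
                self._end = self._next + self._block_size
            value = self._next
            self._next += 1
            return value
```

The cost of this approach is gaps: values left in a block when a process stops are never handed out, so the ids stay unique but are not contiguous.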

@bserdar

bserdar commented Jul 23, 2015

Sequences can be done the same way as the synchronization APIs, as an extension to the CRUD with a specialized implementation for a back end. For MongoDB we can use a special collection with one doc for each sequence.
@jewzaam
Member

jewzaam commented Jul 23, 2015

Ok

@jewzaam
Member

jewzaam commented Jan 19, 2016

@paterczm is this PR still valid? It's been put into the user guide http://docs.lightblue.io/cookbook/value_generators.html

@jewzaam
Member

jewzaam commented Apr 22, 2025

10 years later, probably safe to close? 😆
(shows up in my PR list because I'm mentioned, just cleaning house..)

@jewzaam jewzaam closed this Apr 22, 2025