Commit

refactor(pdl): convert all pdsc to pdl (datahub-project#1678)
Use the automated tool in https://linkedin.github.io/rest.li/pdl_migration
Also update all relevant docs
mars-lan authored May 21, 2020
1 parent 18ce1e1 commit 1283dd3
Showing 365 changed files with 3,779 additions and 4,226 deletions.
2 changes: 1 addition & 1 deletion datahub-web/@datahub/metadata-types/README.md
@@ -22,7 +22,7 @@ import { IDatasetView } from '@datahub/metadata-types/entity/dataset';
The folder structure is laid out similarly to how pdsc models are namespaced.
This is to aid in familiarity and ease of mental mapping between the TypeScript types defined here and the respective MP models.

Please adhere to similar namespace and path structure when creating new types that are representative of pdsc models.
Please adhere to similar namespace and path structure when creating new types that are representative of PDL models.

For cases where a type is needed, for example, an alias for convenience, that is not a corollary with a MP defined model, these types should be defined in local-types if there is no js emit.

6 changes: 3 additions & 3 deletions docs/architecture/metadata-serving.md
@@ -109,9 +109,9 @@ To prevent circular dependency ([rest.li] service depends on remote DAO, which i
Remote DAO will need to construct raw rest.li requests directly, instead of using each entity’s rest.li request builder.


[AutocompleteResult]: ../../metadata-dao/src/main/pegasus/com/linkedin/metadata/query/AutoCompleteResult.pdsc
[Filter]: ../../metadata-dao/src/main/pegasus/com/linkedin/metadata/query/Filter.pdsc
[SortCriterion]: ../../metadata-dao/src/main/pegasus/com/linkedin/metadata/query/SortCriterion.pdsc
[AutocompleteResult]: ../../metadata-dao/src/main/pegasus/com/linkedin/metadata/query/AutoCompleteResult.pdl
[Filter]: ../../metadata-dao/src/main/pegasus/com/linkedin/metadata/query/Filter.pdl
[SortCriterion]: ../../metadata-dao/src/main/pegasus/com/linkedin/metadata/query/SortCriterion.pdl
[SearchResult]: ../../metadata-dao/src/main/java/com/linkedin/metadata/dao/SearchResult.java
[RecordTemplate]: https://github.com/linkedin/rest.li/blob/master/data/src/main/java/com/linkedin/data/template/RecordTemplate.java
[GenericRecord]: https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericRecord.java
2 changes: 1 addition & 1 deletion docs/faq.md
@@ -97,7 +97,7 @@ It’s very similar to what you see on the community version. We have added scre
We’re working on a similar [feature](https://engineering.linkedin.com/blog/2020/data-sentinel-automating-data-validation) internally. Will evaluate and update the roadmap once we have a better idea of the timeline.

## Is DataHub capturing/showing column level [constraints](https://www.w3schools.com/sql/sql_constraints.asp) set at table definition?
The [SchemaField](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/schema/SchemaField.pdsc) model currently does not capture any property/field corresponding to constraints defined in the table definition. However, it should be fairly easy to extend the model to support that if needed.
The [SchemaField](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/schema/SchemaField.pdl) model currently does not capture any property/field corresponding to constraints defined in the table definition. However, it should be fairly easy to extend the model to support that if needed.

## How does DataHub manage extracting metadata from stores residing in different security zones?
MCE is the ideal way to push metadata from different security zones, assuming there is a common Kafka infrastructure that aggregates the events from various security zones.
6 changes: 3 additions & 3 deletions docs/how/add-new-aspect.md
@@ -1,18 +1,18 @@
# How to add a new metadata aspect?

Adding a new metadata [aspect](https://github.com/linkedin/datahub/blob/master/docs/what/aspect.md) is one of the most common ways to extend an existing [entity](https://github.com/linkedin/datahub/blob/master/docs/what/entity.md).
We'll use the [CorpUserEditableInfo](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/identity/CorpUserEditableInfo.pdsc) as an example here.
We'll use the [CorpUserEditableInfo](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/identity/CorpUserEditableInfo.pdl) as an example here.

1. Add the aspect model to the corresponding namespace (e.g. [`com.linkedin.identity`](https://github.com/linkedin/datahub/tree/master/metadata-models/src/main/pegasus/com/linkedin/identity))

2. Extend the entity's aspect union to include the new aspect (e.g. [`CorpUserAspect`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/CorpUserAspect.pdsc))
2. Extend the entity's aspect union to include the new aspect (e.g. [`CorpUserAspect`](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/CorpUserAspect.pdl))

3. Rebuild the rest.li [IDL & snapshot](https://linkedin.github.io/rest.li/modeling/compatibility_check) by running the following command from the project root
```
./gradlew :gms:impl:build -Prest.model.compatibility=ignore
```

4. To surface the new aspect at the top-level [resource endpoint](https://linkedin.github.io/rest.li/user_guide/restli_server#writing-resources), extend the resource data model (e.g. [`CorpUser`](https://github.com/linkedin/datahub/blob/master/gms/api/src/main/pegasus/com/linkedin/identity/CorpUser.pdsc)) with an optional field (e.g. [`editableInfo`](https://github.com/linkedin/datahub/blob/master/gms/api/src/main/pegasus/com/linkedin/identity/CorpUser.pdsc#L19)). You'll also need to extend the `toValue` & `toSnapshot` methods of the top-level resource (e.g. [`CorpUsers`](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/identity/CorpUsers.java)) to convert between the snapshot & value models.
4. To surface the new aspect at the top-level [resource endpoint](https://linkedin.github.io/rest.li/user_guide/restli_server#writing-resources), extend the resource data model (e.g. [`CorpUser`](https://github.com/linkedin/datahub/blob/master/gms/api/src/main/pegasus/com/linkedin/identity/CorpUser.pdl)) with an optional field (e.g. [`editableInfo`](https://github.com/linkedin/datahub/blob/master/gms/api/src/main/pegasus/com/linkedin/identity/CorpUser.pdl#L21)). You'll also need to extend the `toValue` & `toSnapshot` methods of the top-level resource (e.g. [`CorpUsers`](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/identity/CorpUsers.java)) to convert between the snapshot & value models.

5. (Optional) If there's a need to update the aspect via API (instead of/in addition to MCE), add a [sub-resource](https://linkedin.github.io/rest.li/user_guide/restli_server#sub-resources) endpoint for the new aspect (e.g. [`CorpUsersEditableInfoResource`](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/identity/CorpUsersEditableInfoResource.java)). The sub-resource endpoint also allows you to retrieve previous versions of the aspect as well as additional metadata such as the audit stamp.
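
Step 4 above asks for `toValue` and `toSnapshot` to be extended so the new aspect travels between the snapshot and value models. The following is only a rough sketch of that conversion in plain Java. Every class here (`EditableInfoSketch`, `CorpUserSnapshotSketch`, `CorpUserValueSketch`, `CorpUsersSketch`), the `aboutMe` field, and the method parameter lists are hypothetical stand-ins, not the classes rest.li generates from the PDL models or the actual `CorpUsers` resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the new aspect (e.g. CorpUserEditableInfo).
// The real class would be generated by rest.li from the PDL model.
class EditableInfoSketch {
    final String aboutMe; // illustrative field, assumed for this sketch

    EditableInfoSketch(String aboutMe) {
        this.aboutMe = aboutMe;
    }
}

// Hypothetical snapshot: an entity URN plus the list of its aspects.
class CorpUserSnapshotSketch {
    final String urn;
    final List<Object> aspects;

    CorpUserSnapshotSketch(String urn, List<Object> aspects) {
        this.urn = urn;
        this.aspects = aspects;
    }
}

// Hypothetical value model: the new aspect is an optional field (null when absent).
class CorpUserValueSketch {
    EditableInfoSketch editableInfo;
}

class CorpUsersSketch {
    // Snapshot -> value: copy each recognized aspect into its optional field.
    static CorpUserValueSketch toValue(CorpUserSnapshotSketch snapshot) {
        CorpUserValueSketch value = new CorpUserValueSketch();
        for (Object aspect : snapshot.aspects) {
            if (aspect instanceof EditableInfoSketch) {
                value.editableInfo = (EditableInfoSketch) aspect;
            }
        }
        return value;
    }

    // Value -> snapshot: emit one aspect per populated optional field.
    static CorpUserSnapshotSketch toSnapshot(CorpUserValueSketch value, String urn) {
        List<Object> aspects = new ArrayList<>();
        if (value.editableInfo != null) {
            aspects.add(value.editableInfo);
        }
        return new CorpUserSnapshotSketch(urn, aspects);
    }
}
```

In this sketch, a value with no `editableInfo` produces a snapshot with an empty aspect list, which is why the field on the resource data model is declared optional.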

8 changes: 4 additions & 4 deletions docs/how/entity-onboarding.md
@@ -34,12 +34,12 @@ GMS uses [Spring Framework](https://docs.spring.io/spring-framework/docs/current
## 7. UI for entity onboarding [WIP]

[Aspect]: ../what/aspect.md
[`DatasetAspect`]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/DatasetAspect.pdsc
[`DatasetAspect`]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/DatasetAspect.pdl
[Snapshot]: ../what/snapshot.md
[`DatasetSnapshot`]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot/DatasetSnapshot.pdsc
[Snapshot Union]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot/Snapshot.pdsc
[`DatasetSnapshot`]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot/DatasetSnapshot.pdl
[Snapshot Union]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot/Snapshot.pdl
[Entity]: ../what/entity.md
[DatasetEntity]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/entity/DatasetEntity.pdsc
[DatasetEntity]: ../../metadata-models/src/main/pegasus/com/linkedin/metadata/entity/DatasetEntity.pdl
[`CorpUsers`]: ../../gms/impl/src/main/java/com/linkedin/metadata/resources/identity/CorpUsers.java
[resource endpoint]: https://linkedin.github.io/rest.li/user_guide/restli_server#writing-resources
[sub-resource endpoint]: https://linkedin.github.io/rest.li/user_guide/restli_server#sub-resources
38 changes: 16 additions & 22 deletions docs/how/graph-onboarding.md
@@ -5,28 +5,22 @@ If you need to define a [relationship] which is not available in the set of [rel
that relationship model should be implemented as a first step for graph onboarding.
Below is an example model for `OwnedBy` relationship:

```json
{
  "type": "record",
  "name": "OwnedBy",
  "namespace": "com.linkedin.metadata.relationship",
  "doc": "A generic model for the Owned-By relationship",
  "include": [
    "BaseRelationship"
  ],
  "pairings": [
    {
      "source": "com.linkedin.common.urn.DatasetUrn",
      "destination": "com.linkedin.common.urn.CorpuserUrn"
    }
  ],
  "fields": [
    {
      "name": "type",
      "type": "com.linkedin.common.OwnershipType",
      "doc": "The type of the ownership"
    }
  ]
}
```

```
namespace com.linkedin.metadata.relationship
import com.linkedin.common.OwnershipType
/**
 * A generic model for the Owned-By relationship
 */
@pairings = [ {
  "destination" : "com.linkedin.common.urn.CorpuserUrn",
  "source" : "com.linkedin.common.urn.DatasetUrn"
} ]
record OwnedBy includes BaseRelationship {
  /** The type of the ownership */
  type: OwnershipType
}
```

2 changes: 1 addition & 1 deletion docs/how/metadata-modelling.md
@@ -1,5 +1,5 @@
# How to model metadata ?
[GMA](../what/gma.md) uses [rest.li](https://rest.li), which is LinkedIn's open source REST framework. All metadata in GMA needs to be modelled using [Pegasus schema (PDSC)](https://linkedin.github.io/rest.li/pdsc_syntax) which is the data schema for [rest.li](https://rest.li).
[GMA](../what/gma.md) uses [rest.li](https://rest.li), which is LinkedIn's open source REST framework. All metadata in GMA needs to be modelled using [Pegasus schema (PDL)](https://linkedin.github.io/rest.li/pdl_schema) which is the data schema for [rest.li](https://rest.li).

Conceptually we’re modelling metadata as a hybrid graph of nodes ([entities](../what/entity.md)) and edges ([relationships](../what/relationship.md)), with additional documents ([metadata aspects](../what/aspect.md)) attached to each node. You can also think of it as a modified [Entity-Relationship Model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model).
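
To make the "hybrid graph" idea more concrete, here is a minimal in-memory sketch in plain Java: entities are nodes keyed by URN, relationships are typed directed edges, and aspects are documents attached to a node by name. The classes `EdgeSketch` and `MetadataGraphSketch`, their methods, and the URN-like strings used with them are all assumptions made for illustration. GMA itself models these concepts in PDL and stores them in dedicated backends, not in a structure like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical edge: a typed, directed relationship between two entity URNs.
class EdgeSketch {
    final String source;
    final String type;
    final String destination;

    EdgeSketch(String source, String type, String destination) {
        this.source = source;
        this.type = type;
        this.destination = destination;
    }
}

class MetadataGraphSketch {
    // Node URN -> (aspect name -> aspect document).
    private final Map<String, Map<String, Object>> aspectsByEntity = new HashMap<>();
    private final List<EdgeSketch> edges = new ArrayList<>();

    // Registers a node with no aspects if it is not known yet.
    void addEntity(String urn) {
        aspectsByEntity.computeIfAbsent(urn, key -> new HashMap<>());
    }

    // Attaches (or overwrites) a named aspect document on a node.
    void attachAspect(String urn, String aspectName, Object document) {
        addEntity(urn);
        aspectsByEntity.get(urn).put(aspectName, document);
    }

    // Adds a directed edge, creating both endpoint nodes if needed.
    void addRelationship(String source, String type, String destination) {
        addEntity(source);
        addEntity(destination);
        edges.add(new EdgeSketch(source, type, destination));
    }

    // Returns the aspect document, or null when the node or aspect is absent.
    Object getAspect(String urn, String aspectName) {
        Map<String, Object> aspects = aspectsByEntity.get(urn);
        return aspects == null ? null : aspects.get(aspectName);
    }

    // Lists the destinations reachable from a node over edges of one type.
    List<String> destinationsOf(String source, String type) {
        List<String> result = new ArrayList<>();
        for (EdgeSketch edge : edges) {
            if (edge.source.equals(source) && edge.type.equals(type)) {
                result.add(edge.destination);
            }
        }
        return result;
    }
}
```

For example, attaching an `ownership` aspect to a dataset node and adding an `OwnedBy` edge from that dataset to a user node keeps the document and the edge as two separate views of the same fact, which is the distinction the paragraph above draws between aspects and relationships.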

2 changes: 0 additions & 2 deletions docs/roadmap.md
@@ -3,8 +3,6 @@
Below is DataHub's roadmap for the short and medium term. We'll revise this on a regular basis and welcome suggestions from the communities.

## Short term (3-6 months)
### Replace PDSC with [PDL](https://linkedin.github.io/rest.li/pdl_schema) [*WIP*]
- More readable, Java-like syntax + code-gen based on annotations
### Aspect-specific MCE & MAE [*WIP*]
- Split up unified events to improve scalability & modularity
### Metrics as entities [*LinkedIn-internal, waiting to open source*]
47 changes: 20 additions & 27 deletions docs/what/aspect.md
@@ -1,6 +1,6 @@
# What is a metadata aspect?

A metadata aspect is a structured document, or more precisely a `record` in [PDSC](https://linkedin.github.io/rest.li/DATA-Data-Schema-and-Templates),
A metadata aspect is a structured document, or more precisely a `record` in [PDL](https://linkedin.github.io/rest.li/pdl_schema),
that represents a specific kind of metadata (e.g. ownership, schema, statistics, upstreams).
A metadata aspect on its own has no meaning (e.g. ownership for what?) and must be associated with a particular entity (e.g. ownership for PageViewEvent).
We purposely do not impose any model requirements on metadata aspects, as each aspect is expected to differ significantly.
@@ -22,31 +22,24 @@ Here’s an example metadata aspect. Note that the `admin` and `members` fields
It’s very natural to save such relationships as URNs in a metadata aspect.
The [relationship](relationship.md) section explains how this relationship can be explicitly extracted and modelled.

```json
{
  "type": "record",
  "name": "Membership",
  "namespace": "com.linkedin.group",
  "doc": "The membership metadata for a group",
  "fields": [
    {
      "name": "auditStamp",
      "type": "com.linkedin.common.AuditStamp",
      "doc": "Audit stamp for the last change"
    },
    {
      "name": "admin",
      "type": "com.linkedin.common.CorpuserUrn",
      "doc": "Admin of the group"
    },
    {
      "name": "members",
      "type": {
        "type": "array",
        "items": "com.linkedin.common.CorpuserUrn"
      },
      "doc": "Members of the group, ordered in descending importance"
    }
  ]
}
```

```
namespace com.linkedin.group
import com.linkedin.common.AuditStamp
import com.linkedin.common.CorpuserUrn
/**
 * The membership metadata for a group
 */
record Membership {
  /** Audit stamp for the last change */
  auditStamp: AuditStamp
  /** Admin of the group */
  admin: CorpuserUrn
  /** Members of the group, ordered in descending importance */
  members: array[CorpuserUrn]
}
```
97 changes: 40 additions & 57 deletions docs/what/delta.md
@@ -1,77 +1,60 @@
# What is a metadata delta?

Rest.li supports [partial update](https://linkedin.github.io/rest.li/user_guide/restli_server#partial_update) natively without needing explicitly defined models.
However, the granularity of update is always limited to each field in a PDSC model.
However, the granularity of update is always limited to each field in a PDL model.
There are cases where the update needs to happen at an even finer grain, e.g. adding or removing items from an array.

To this end, we’re proposing the following entity-specific metadata delta model that allows atomic partial updates at any desired granularity.
Note that:
1. Just like metadata [aspects](aspect.md), we’re not imposing any limit on the partial update model, as long as it’s a valid PDSC record.
1. Just like metadata [aspects](aspect.md), we’re not imposing any limit on the partial update model, as long as it’s a valid PDL record.
This is because the rest.li endpoint will have the logic that performs the corresponding partial update based on the information in the model.
That said, it’s common to have fields that denote the list of items to be added or removed (e.g. `membersToAdd` & `membersToRemove` from below)
2. Similar to metadata [snapshots](snapshot.md), an entity that supports metadata deltas will add an entity-specific metadata delta
(e.g. `GroupDelta` from below) that unions all supported partial update models.
3. The entity-specific metadata delta is then added to the global `Delta` typeref, which is added as part of [Metadata Change Event](mxe.md#metadata-change-event-mce) and used during [Metadata Ingestion](../architecture/metadata-ingestion.md).

```json
{
  "type": "record",
  "name": "MembershipPartialUpdate",
  "namespace": "com.linkedin.group",
  "doc": "A metadata delta for a specific group entity.",
  "fields": [
    {
      "name": "membersToAdd",
      "doc": "List of members to be added to the group.",
      "type": {
        "type": "array",
        "items": "com.linkedin.common.CorpuserUrn"
      }
    },
    {
      "name": "membersToRemove",
      "doc": "List of members to be removed from the group.",
      "type": {
        "type": "array",
        "items": "com.linkedin.common.CorpuserUrn"
      }
    }
  ]
}
```

```
namespace com.linkedin.group
import com.linkedin.common.CorpuserUrn
/**
 * A metadata delta for a specific group entity
 */
record MembershipPartialUpdate {
  /** List of members to be added to the group */
  membersToAdd: array[CorpuserUrn]
  /** List of members to be removed from the group */
  membersToRemove: array[CorpuserUrn]
}
```

```json
{
  "type": "record",
  "name": "GroupDelta",
  "namespace": "com.linkedin.metadata.delta",
  "doc": "A metadata delta for a specific group entity.",
  "fields": [
    {
      "name": "urn",
      "type": "com.linkedin.common.CorpGroupUrn",
      "doc": "URN for the entity the metadata delta is associated with."
    },
    {
      "name": "delta",
      "doc": "The specific type of metadata delta to apply.",
      "type": [
        "com.linkedin.group.MembershipPartialUpdate"
      ]
    }
  ]
}
```

```
namespace com.linkedin.metadata.delta
import com.linkedin.common.CorpGroupUrn
import com.linkedin.group.MembershipPartialUpdate
/**
 * A metadata delta for a specific group entity
 */
record GroupDelta {
  /** URN for the entity the metadata delta is associated with */
  urn: CorpGroupUrn
  /** The specific type of metadata delta to apply */
  delta: union[MembershipPartialUpdate]
}
```

```json
{
  "type": "typeref",
  "name": "Delta",
  "namespace": "com.linkedin.metadata.delta",
  "doc": "A union of all supported metadata delta types.",
  "ref": [
    "DatasetDelta",
    "GroupDelta"
  ]
}
```

```
namespace com.linkedin.metadata.delta
/**
 * A union of all supported metadata delta types.
 */
typeref Delta = union[GroupDelta]
```
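
The delta models only describe what should change. The rest.li endpoint is what applies the change. As an illustration of the kind of logic such an endpoint might run for `membersToAdd` and `membersToRemove`, here is a small plain-Java sketch. `MembershipPartialUpdateSketch` and `MembershipDeltaApplier` are hypothetical stand-ins for the generated model and the endpoint logic, URNs are represented as plain strings, and the ordering (removals first, then additions, with duplicates skipped) is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for the MembershipPartialUpdate record.
// The real class would be generated by rest.li from the PDL model.
class MembershipPartialUpdateSketch {
    final List<String> membersToAdd;
    final List<String> membersToRemove;

    MembershipPartialUpdateSketch(List<String> membersToAdd, List<String> membersToRemove) {
        this.membersToAdd = membersToAdd;
        this.membersToRemove = membersToRemove;
    }
}

class MembershipDeltaApplier {
    // Returns a new member list and leaves the input list untouched.
    // Assumed semantics: apply removals first, then append additions in order,
    // skipping any URN that is already a member.
    static List<String> apply(List<String> current, MembershipPartialUpdateSketch delta) {
        List<String> result = new ArrayList<>(current);
        result.removeAll(delta.membersToRemove);
        for (String urn : delta.membersToAdd) {
            if (!result.contains(urn)) {
                result.add(urn);
            }
        }
        return result;
    }
}
```

Because the whole change is computed from the current list and one delta, the caller never has to send the full `members` array, which is the finer-than-field granularity the delta model is meant to provide.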