🚀 entity-validation/catalog: Add possibility to validate multiple Entities that depend on each other #568

Open
2 tasks done
knowacki23 opened this issue Jun 24, 2024 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@knowacki23
Contributor

knowacki23 commented Jun 24, 2024

Plugin Name

🚀 entity-validation/catalog

🔖 Feature description

Add the possibility to send more than one entity to /api/catalog/validate-entity, and add a temporary registry of validated entities so that multiple entities that depend on each other can be validated together.
This would help when a user introduces several interdependent entities at once.

We use the validate-entity endpoint to validate entities in Pull Requests. If a user introduces more than one entity and those entities depend on each other, the validator fails.

🎤 Context

While using entity-validation we realized that it is impossible to validate two (or more) separate entities that depend on each other.
For example, if we want to validate a new Group entity and a Component that should be owned by that Group, the validator returns an error for the Component because it doesn't know about the Group entity.
Sample validator input:

apiVersion: backstage.io/v1beta1
kind: Group
metadata:
  namespace: default
  annotations:
  name: fake-team
  title: FAKE-Team
  description: This is a fake team for test purposes
spec:
  type: Team
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: backstage
  title: Backstage
  namespace: delivery
  description: Backstage
spec:
  type: website
  lifecycle: production
  owner: fake-team
  system: fake-system

Validator response:
The Group entity validates correctly.
For the Component entity, the validator returns the following error:

Processor CustomEntityValidationProcessor threw an error while validating the entity component:fake-namespace/backstage; caused by ValidationError: spec.owner: "fake-team" failed endpoint validation as it is not a valid group in backstage; entityOwner: fake-team;

In the Network tab of the browser developer tools I see that the validator sends two separate POST requests to the https://<backstage-url>/api/catalog/validate-entity endpoint, one for each entity passed to Entity Validator.
Screenshot from 2024-06-24 16-00-00
The first one is for the Group entity and it returns 200, the second one is for the Component and it returns 400 due to missing Group entity.

✌️ Possible Implementation

Modify the validator and the validate-entity endpoint from catalog so that the endpoint accepts more than one entity in a single API call.
We would send all the entities that should be validated in a single API call, store them in a temporary array, and validate each entity from the request against both the Software Catalog and that array.
Or maybe some temporary Software Catalog only for validation purposes?

Would it make sense?
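
To make the idea concrete, here is a minimal sketch of checking references against both the batch and the catalog. Everything in it is hypothetical: the `EntityLike` type, `findMissingRefs`, and the `existsInCatalog` predicate are illustration names, not part of catalog-backend. It also assumes that a short ref such as `fake-team` resolves relative to the referencing entity's namespace, and it keeps the catalog lookup synchronous for brevity (a real lookup would most likely be async).

```typescript
// Hypothetical types and helpers for illustration; not the catalog-backend API.
type EntityLike = {
  kind: string;
  metadata: { name: string; namespace?: string };
  spec?: { owner?: string; system?: string };
};

// Canonical "kind:namespace/name" form, lower-cased for comparison.
function toRef(kind: string, namespace: string, name: string): string {
  return `${kind}:${namespace}/${name}`.toLowerCase();
}

// Expands "[kind:][namespace/]name" using the given defaults.
function resolveRef(raw: string, defaultKind: string, defaultNamespace: string): string {
  const colon = raw.indexOf(":");
  const kind = colon === -1 ? defaultKind : raw.slice(0, colon);
  const rest = colon === -1 ? raw : raw.slice(colon + 1);
  const slash = rest.indexOf("/");
  const namespace = slash === -1 ? defaultNamespace : rest.slice(0, slash);
  const name = slash === -1 ? rest : rest.slice(slash + 1);
  return toRef(kind, namespace, name);
}

// Returns, per entity, the owner/system refs found neither in the batch nor in the catalog.
function findMissingRefs(
  batch: EntityLike[],
  existsInCatalog: (ref: string) => boolean, // assumption: stands in for a real catalog query
): Map<string, string[]> {
  const batchRefs = new Set(
    batch.map(e => toRef(e.kind, e.metadata.namespace ?? "default", e.metadata.name)),
  );
  const missing = new Map<string, string[]>();
  for (const entity of batch) {
    const ns = entity.metadata.namespace ?? "default";
    const targets: string[] = [];
    if (entity.spec?.owner) targets.push(resolveRef(entity.spec.owner, "group", ns));
    if (entity.spec?.system) targets.push(resolveRef(entity.spec.system, "system", ns));
    const unresolved = targets.filter(t => !batchRefs.has(t) && !existsInCatalog(t));
    if (unresolved.length > 0) {
      missing.set(toRef(entity.kind, ns, entity.metadata.name), unresolved);
    }
  }
  return missing;
}
```

With a Group and a Component submitted together in the same namespace, the owner ref would be satisfied by the batch, and only a system that exists in neither place would still be reported.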

👀 Have you spent some time to check if this feature request has been raised before?

  • I checked and didn't find a similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

Yes I am willing to submit a PR!

@knowacki23 knowacki23 added the enhancement New feature or request label Jun 24, 2024
@awanlin
Contributor

awanlin commented Jul 2, 2024

Hi @knowacki23, sounds like a solid feature to add, feel free to submit a PR 🚀

@awanlin awanlin added the help wanted Extra attention is needed label Jul 16, 2024
@knowacki23
Contributor Author

Hey @awanlin.
I would like to contribute regarding that issue.
May I ask you to assign it to me?

@knowacki23
Contributor Author

knowacki23 commented Jul 26, 2024

Hey,
I made some changes to catalog-backend so that multiple entities can be sent in a single API call.
This is the request body I came up with:

{
    "location":"url:https://url-to-file/catalog-info.yaml",
    "entities": [
        {
            "entity":{
                "apiVersion":"backstage.io/v1alpha1",
                "kind":"Component",
                "metadata":{
                    "name":"test",
                    "title":"Test",
                    "namespace":"test-namespace",
                    "description":"Component description",
                    "links":[],
                    "tags":[],
                    "annotations":{}
                },
                "spec":{
                    "type":"service",
                    "lifecycle":"experimental",
                    "owner":"entity-owner",
                    "system":"entity-system"
                }
            }
        },
        {
            "entity":{
                "apiVersion":"backstage.io/v1alpha1",
                "kind":"Component",
                "metadata":{
                    "name":"test",
                    "title":"Test",
                    "namespace":"test-namespace",
                    "description":"Component description",
                    "links":[],
                    "tags":[],
                    "annotations":{}
                },
                "spec":{
                    "type":"service",
                    "lifecycle":"experimental",
                    "owner":"entity-owner",
                    "system":"entity-system"
                }
            }
        }
    ]
}

Since all the entities sent to the validator come from the same location, we only need one location parameter per API call. The request body also carries an entities array, where each item wraps a single entity.
The actual response is not implemented yet, but I'm thinking it should be an array with one item per entity, containing the entity name, the validation status, and the errors.
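
As a rough sketch of what these shapes could look like in TypeScript: the names `ValidateEntitiesRequest`, `EntityValidationResult`, and `validateAll` are hypothetical and only mirror the JSON above and the response idea described here, and the per-entity `Validator` callback stands in for whatever the processors would actually do.

```typescript
// Hypothetical shapes for illustration; none of these names exist in catalog-backend.
type Entity = {
  apiVersion: string;
  kind: string;
  metadata: { name: string; namespace?: string };
  spec?: Record<string, unknown>;
};

// One location per call, plus an array of wrapped entities (as in the JSON above).
interface ValidateEntitiesRequest {
  location: string;
  entities: { entity: Entity }[];
}

// One result item per submitted entity: name (as a ref), status, and errors.
interface EntityValidationResult {
  entityRef: string;
  valid: boolean;
  errors: string[];
}

// Assumption: a validator returns a list of error messages, empty when valid.
type Validator = (entity: Entity, location: string) => string[];

function validateAll(
  request: ValidateEntitiesRequest,
  validate: Validator,
): EntityValidationResult[] {
  return request.entities.map(({ entity }) => {
    const namespace = entity.metadata.namespace ?? "default";
    const entityRef = `${entity.kind}:${namespace}/${entity.metadata.name}`.toLowerCase();
    const errors = validate(entity, request.location);
    return { entityRef, valid: errors.length === 0, errors };
  });
}
```

Returning one item per entity, rather than failing the whole request on the first error, would let a Pull Request check report every problem in one pass.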

Looking at that solution, I'm wondering whether it makes sense, or whether it's even a good solution.

I'm afraid that this might be a bad approach.

Does anyone have ideas or strong opinions on this?

The problem I want to solve is that we use custom validation rules that check whether the entity under validation has an owner that actually exists in the Software Catalog, and whether the system it is going to be part of exists in the catalog as well.
In a scenario where we want to validate two entities, let's say a Component and a System (neither registered in the catalog yet), and the Component is part of that new System, the validation for the Component will fail.
We would like to be able to pass an array of entities to the validator processor, so that our custom validator receives all of the entities being validated and can check whether the entities a component is missing are also being validated - if yes, the validation should succeed.

@namco1992

I was facing a similar situation when I built a linter to verify the catalog-info.yaml files:

  1. The existing api/catalog/validate-entity endpoint, which triggers a dry run of processing, doesn't validate the relationships, and that is what we want to do in the linter.
  2. When we have multiple entities to validate and they depend on each other (the exact situation you described in the issue), Backstage won't have the context, as the entities have yet to be ingested into the catalog.
  3. We can introduce a custom processor to solve point 1. The custom processor can emit all the refs and validate whether they exist in the catalog. However, the custom processor must not run during the actual processing: Backstage processes entities one by one during ingestion, so the refs check would fail the processing.

What I did to solve the problem:

  • Created a custom processor to emit all the relations and validate them against the catalog.
  • (Hack 1) Added an annotation enforce-ref-check, and the custom processor only runs when it sees enforce-ref-check=true in the annotations.
    • When we call api/catalog/validate-entity, we will add the annotation for all the entities we send to the API, which triggers the custom processor.
    • During the normal processing, there is no such annotation in the entities and the custom processor is a no-op.
  • (Hack 2) On the linter side, we did another hack to go through the validation errors and exclude the ones caused by the entities submitted together.

As you can see here, I have to do some "dirty" hacks to pull this off. I personally would love to see some progress on this matter so I don't have to maintain all the hacks I did.
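
For readers who want to picture the two hacks, here is a minimal sketch. Only the annotation name `enforce-ref-check` comes from the description above; the types, the function names, and the structured `RefError` shape are assumptions made for illustration (real validation errors arrive as message strings that would need parsing first).

```typescript
// Annotation name taken from the comment above; everything else is hypothetical.
const ENFORCE_REF_CHECK = "enforce-ref-check";

type AnnotatedEntity = {
  kind: string;
  metadata: { name: string; namespace?: string; annotations?: Record<string, string> };
};

// Hack 1, processor side: run the refs check only when the annotation is set to "true".
function shouldCheckRefs(entity: AnnotatedEntity): boolean {
  return entity.metadata.annotations?.[ENFORCE_REF_CHECK] === "true";
}

// Hack 1, linter side: add the annotation to a copy before calling validate-entity.
function withRefCheck(entity: AnnotatedEntity): AnnotatedEntity {
  return {
    ...entity,
    metadata: {
      ...entity.metadata,
      annotations: { ...entity.metadata.annotations, [ENFORCE_REF_CHECK]: "true" },
    },
  };
}

// Assumption: errors have already been parsed into this structured form.
type RefError = { entityRef: string; missingRef: string };

// Hack 2: drop errors whose missing ref is one of the entities submitted together.
function dropErrorsSatisfiedByBatch(
  errors: RefError[],
  submittedRefs: ReadonlySet<string>,
): RefError[] {
  return errors.filter(error => !submittedRefs.has(error.missingRef));
}
```

Entities ingested normally carry no such annotation, so `shouldCheckRefs` returns false and the processor stays a no-op, which matches the behavior described in the list.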


3 participants