JSON Schema Ingestion Fails on $ref Loop in Definitions #14358

@shonigbaum

Describe the bug
When ingesting a JSON Schema using the json-schema source type, the ingestion process fails to detect and handle $ref loops correctly. This results in an infinite loop during import, and the process never completes.

To Reproduce
Run an ingestion recipe with source type json-schema against the following stripped-down schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://schema.com/schema/1.0",
  "type": "object",
  "properties": {
    "test": {
      "$ref": "#/definitions/condition"
    }
  },
  "required": [
    "test"
  ],
  "definitions": {
    "condition": {
      "type": "object",
      "properties": {
        "condition": {
          "$ref": "#/definitions/condition"
        }
      }
    }
  }
}

Expected behavior
The ingestion process should detect the $ref loop and either:

  • Resolve it safely (e.g., by limiting recursion depth), or
  • Fail gracefully with a clear error message indicating the loop.
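
For illustration, both options can be sketched in a few lines of Python. This is not DataHub's code; `expand_refs`, `resolve_pointer`, `RefLoopError`, and the `on_loop` parameter are hypothetical names, and the sketch assumes only local `#/...` references. It tracks the chain of refs currently being expanded and, when a ref reappears in that chain, either leaves it unexpanded or raises an error naming the loop.

```python
from typing import Any

# The schema from this report, used as sample input.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://schema.com/schema/1.0",
    "type": "object",
    "properties": {"test": {"$ref": "#/definitions/condition"}},
    "required": ["test"],
    "definitions": {
        "condition": {
            "type": "object",
            "properties": {"condition": {"$ref": "#/definitions/condition"}},
        }
    },
}


class RefLoopError(ValueError):
    """Raised when a $ref points back at a ref that is still being expanded."""


def resolve_pointer(root: dict, ref: str) -> Any:
    """Look up a local JSON Pointer such as '#/definitions/condition'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are handled in this sketch: {ref}")
    node: Any = root
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): ~1 -> '/', ~0 -> '~'.
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def expand_refs(node: Any, root: dict, active: tuple = (), on_loop: str = "stub") -> Any:
    """Inline local $refs; `active` holds the refs currently being expanded."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            if ref in active:
                if on_loop == "error":
                    chain = " -> ".join(active + (ref,))
                    raise RefLoopError(f"$ref loop detected: {chain}")
                # Option 1: stop here and keep the recursive ref unexpanded.
                return {"$ref": ref}
            target = resolve_pointer(root, ref)
            return expand_refs(target, root, active + (ref,), on_loop)
        return {k: expand_refs(v, root, active, on_loop) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_refs(v, root, active, on_loop) for v in node]
    return node


if __name__ == "__main__":
    expanded = expand_refs(SCHEMA, SCHEMA)
    print(expanded["properties"]["test"])
    try:
        expand_refs(SCHEMA, SCHEMA, on_loop="error")
    except RefLoopError as exc:
        print(exc)
```

With the default `on_loop="stub"`, the `test` property is expanded one level and the inner self-reference stays as a plain `$ref`; with `on_loop="error"`, the call stops immediately with a message that names the looping reference.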

Actual behavior
The ingestion process enters an infinite loop and does not complete.

Environment:

  • Docker Image: acryldata/datahub-ingestion:v1.1.0
  • DataHub version: v1.2.0

Additional context
This kind of $ref loop is valid JSON Schema and is the usual way to describe recursive structures such as trees or nested conditions. Many JSON Schema parsers handle it by tracking visited references. It would be helpful if DataHub could implement similar handling or provide guidance on supported schema patterns.
