Merged
Commits
86 commits
a915fab
Merge pull request #87 from sqlparser/feature/shenhuan
shenhuan2021 Jun 2, 2024
2de9db6
Update getcsv.py
shenhuan2021 Jun 2, 2024
946e2b8
Update GenerateDataLineageDemo.py
shenhuan2021 Jun 2, 2024
9aeb3e8
Update getcsv.py
shenhuan2021 Jun 2, 2024
9a85afc
Update getcsv.py
shenhuan2021 Jun 2, 2024
36e8902
Update getcsv.py
shenhuan2021 Jun 10, 2024
4018232
Update GenerateDataLineageDemo.py
shenhuan2021 Jun 10, 2024
0cf11eb
Update GenerateTokenDemo.py
shenhuan2021 Jun 11, 2024
ce96a2f
Rename GenerateTokenDemo.py to GenerateToken.py
shenhuan2021 Jun 11, 2024
f90ad0e
Update GenerateToken.py
shenhuan2021 Jun 11, 2024
2487454
Update getcsv.py
shenhuan2021 Jun 11, 2024
54f7ce0
Create GenerateLineageParam.py
shenhuan2021 Jun 11, 2024
c417163
Update GenerateDataLineageDemo.py
shenhuan2021 Jun 11, 2024
0bf3949
Update getcsv.py
shenhuan2021 Jun 16, 2024
cf2aeeb
Update getcsv.py
shenhuan2021 Jun 25, 2024
9b12156
Update GenerateToken.py
shenhuan2021 Jun 25, 2024
2a814d1
Update GenerateDataLineageDemo.py
shenhuan2021 Jun 25, 2024
8b1c6d3
fix typo
sqlparser Aug 9, 2024
0279842
add more doc about lineage model
sqlparser Aug 9, 2024
d0f193a
add document for identifier and string literal
sqlparser Aug 28, 2024
5d4c869
add document for identifier and string literal
sqlparser Aug 28, 2024
60366f7
add document for identifier and string literal
sqlparser Aug 28, 2024
5c1b935
sql server proc return record set
sqlparser Aug 28, 2024
187757f
Update identifier-and-string-literal.md
shenhuan2021 Sep 1, 2024
f834029
Update identifier-and-string-literal.md
shenhuan2021 Sep 1, 2024
fa58dc4
add python demo to illustrates how to get token and call the api
sqlparser Sep 12, 2024
bd9fe4d
Create CheckSyntax.py
shenhuan2021 Sep 19, 2024
00f9b2f
Update CheckSyntax.py
shenhuan2021 Sep 19, 2024
7f355c5
Update and rename CheckSyntax.py to checksyntax.py
shenhuan2021 Sep 20, 2024
0d1f187
Create toxml.py
shenhuan2021 Sep 20, 2024
1108d8f
Update checksyntax.py
shenhuan2021 Sep 20, 2024
717b3dc
Update checksyntax.py
shenhuan2021 Sep 20, 2024
5b4e046
Update toxml.py
shenhuan2021 Sep 21, 2024
3e2d74e
Update toxml.py
shenhuan2021 Sep 21, 2024
0332868
Update toxml.py
shenhuan2021 Sep 21, 2024
76c28a8
Update toxml.py
shenhuan2021 Sep 21, 2024
7652835
refine get started doc
sqlparser Sep 29, 2024
e06e998
add doc for intermidate resultset
sqlparser Sep 30, 2024
b331836
add info to intermediate resultset
sqlparser Oct 1, 2024
e726812
add pivot function doc in intermediate reseultset
sqlparser Oct 1, 2024
e3f42a7
table alias intermediate resultset doc
sqlparser Oct 1, 2024
ec3fe7d
add doc for transform and temporary table
sqlparser Oct 2, 2024
e0b0275
update index in doc readme
sqlparser Oct 2, 2024
379cbb9
upload readme index for basic concepts
sqlparser Oct 3, 2024
90610bc
the basic concepts and elements of the data lineage
sqlparser Oct 3, 2024
9874570
basic concepts doc update
sqlparser Oct 7, 2024
d78df2a
add new doc for discover data lineage in cte and sub-queries
sqlparser Oct 13, 2024
5977969
widget doc
sqlparser Oct 25, 2024
1398d93
add widget component
sqlparser Oct 25, 2024
e069aed
replace sqlflowjs with widget
sqlparser Oct 25, 2024
fa62a26
fix bugs
Nov 6, 2024
d782cd3
visualize a job from json
Nov 6, 2024
2e604de
add demo.json
Nov 6, 2024
eb208ba
dlineage doc
sqlparser Nov 21, 2024
4384411
add new dlineage parameters
sqlparser Nov 27, 2024
efa9f64
add new dlineage parameters
sqlparser Nov 27, 2024
f91c0f3
update upstream and downstream parameter
sqlparser Dec 8, 2024
5c4db13
add instructions about how to remove relationship in where clause
sqlparser Feb 21, 2025
f569492
Initial content for MkDocs
sqlparser Apr 29, 2025
eac6ec6
github action for mkdocs
sqlparser Apr 29, 2025
ad206cb
github action for mkdocs 1
sqlparser Apr 29, 2025
d5faf93
doc for gsp, use release/docs branch to build site
sqlparser Apr 29, 2025
e5a0fef
udpate gsp and sqlflow doc
sqlparser Apr 29, 2025
05545ef
rename action name
sqlparser Apr 29, 2025
be0259c
Start to add data lineage schema v2
sqlparser Aug 23, 2025
89fab2f
Docs: Refine data lineage introduction for beginners
sqlparser Aug 31, 2025
1189875
Docs: Add section on upcoming v2 lineage schema
sqlparser Aug 31, 2025
ba456ce
Docs: Enhance direct dataflow documentation with effectType annotations
sqlparser Aug 31, 2025
e2771a0
Docs: Refine indirect dataflow and RelationRows; add v1→v2 guidance w…
sqlparser Aug 31, 2025
55308c6
Docs: Expand indirect dataflow, RelationRows, and v1→v2 mapping
sqlparser Aug 31, 2025
5aa9898
docs(basic-concepts): refine WHERE/GROUP BY indirect dataflow; add v2…
sqlparser Sep 1, 2025
e590ae8
docs(basic-concepts): refine dataflow chain for beginners and v2 model
sqlparser Sep 2, 2025
47c9c58
I'll check the repository's commit message style guide to format the …
sqlparser Sep 3, 2025
0b1de1f
Refactor transforms doc for beginners and v2 mapping
sqlparser Sep 3, 2025
1ec9c99
Add join modeling guide: v1 and v2 with examples
sqlparser Sep 3, 2025
8f5f020
Add ER diagram guide: v1 example and v2 modeling
sqlparser Sep 3, 2025
e2b9361
Docs: Clarify GROUP BY data lineage explanation
sqlparser Sep 4, 2025
b9aca41
Restructure indirect dataflow doc for clarity
sqlparser Sep 4, 2025
a21c7f5
Clarify v1 vs v2 function modeling
sqlparser Sep 4, 2025
070c5f6
Docs: Refine data lineage concepts and JOIN modeling
sqlparser Sep 6, 2025
6393f2a
init for gudu sql omni
sqlparser Sep 28, 2025
58a43d9
add images for gudu sql omni
sqlparser Sep 28, 2025
e8b1489
update gudu sql omni readme
sqlparser Sep 29, 2025
6a8e857
refine license file
sqlparser Oct 2, 2025
4200b01
add blog image https://www.dpriver.com/blog/2025/10/tracking-column-l…
sqlparser Oct 29, 2025
a391785
Add CASE expression lineage documentation
Feb 20, 2026
14 changes: 14 additions & 0 deletions .cursor/rules/git-commit-styleguide.mdc
@@ -0,0 +1,14 @@
---
alwaysApply: true
---
# Git Commit Message Style Guide

When writing commit messages, follow these seven rules:

1. Separate subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how
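The mechanical rules above (1, 2, 3, 4, and 6) can be checked automatically; rules 5 and 7 need human judgment. The following is a minimal sketch of such a check, not part of this pull request. The function name `check_commit_message` and the wording of the returned messages are hypothetical.

```python
def check_commit_message(message: str) -> list[str]:
    """Return the style-guide rules a commit message violates (empty if none).

    Hypothetical helper: covers rules 1, 2, 3, 4 and 6 only.
    Rules 5 (imperative mood) and 7 (what/why vs. how) are not checked.
    """
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    rest = lines[1:]  # everything after the subject line

    # Rule 1: if there is more text, the second line must be blank.
    if rest and rest[0].strip():
        problems.append("separate subject from body with a blank line")
    # Rule 2: subject line is at most 50 characters.
    if len(subject) > 50:
        problems.append("limit the subject line to 50 characters")
    # Rule 3: flag only subjects that start with a lowercase letter.
    if subject and subject[0].isalpha() and subject[0].islower():
        problems.append("capitalize the subject line")
    # Rule 4: no trailing period on the subject.
    if subject.endswith("."):
        problems.append("do not end the subject line with a period")
    # Rule 6: every line after the subject is at most 72 characters.
    if any(len(line) > 72 for line in rest):
        problems.append("wrap the body at 72 characters")
    return problems


if __name__ == "__main__":
    # Two subjects taken from the commit list of this pull request.
    for msg in ("Add CASE expression lineage documentation", "fix typo"):
        print(repr(msg), "->", check_commit_message(msg) or "ok")
```

A check like this could run in a `commit-msg` hook, with the returned list printed as the rejection reason.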
44 changes: 44 additions & 0 deletions .cursor/rules/refine-data-lineage-docs.mdc
@@ -0,0 +1,44 @@
---
alwaysApply: true
---
# Data Lineage Documentation Refinement Guide

When refining or refactoring data lineage documentation, remember that the primary audience is beginners in data governance. The language should be simple, clear, and use analogies to explain complex concepts. The document should be structured into two main parts, following the example set in [1-introduction.md](mdc:sqlflow_public/doc/basic-concepts/1-introduction.md).

## Part 1: Current Data Lineage Model (v1)

This section should explain the foundational concepts of the original SQLFlow data lineage model. Focus on simplicity and core ideas.

- **Core Concepts**: Explain data objects (`dbobjs`) and relationships (`relations`).
- **Relationship Types**: Clearly define `fdd` (direct flow) and `fdr` (indirect/impact flow) with simple SQL examples.
- **Effect Type (v1)**: When available in examples, explain `effectType` as the SQL statement/operation kind that produced the relationship (e.g., `select`, `insert`, `update`, `merge_update`, `create_view`). Use short callouts like “effectType: select” near the example so readers connect the edge to its producing statement.
- **Source of Truth**: Base this section on the information from the v1 schema and design documents.
  - v1 Schema: [data_lineage_schema_v1.json](mdc:gsp_java/docs/AI/cline_sqlflow/data_lineage_schema_v1.json)
  - v1 Design Explanation: [data_lineage_design_explanation_v1.md](mdc:gsp_java/docs/AI/cline_sqlflow/data_lineage_design_explanation_v1.md)

## Part 2: Next-Generation Data Lineage Model (v2)

This section should introduce the new, more powerful v2 schema as an evolution of the v1 model. Emphasize that it's an improvement and still under development.

- **Key Improvements**: Explain the benefits of the new model, such as enhanced precision, traceability, and scalability.
- **Concept Mapping**: Provide a clear mapping from v1 concepts to v2 concepts (e.g., `fdr` becomes `restricts` and `groups`).
- **New Features**: Introduce new concepts like `lineageObjects` with `qualifiedName`, atomic relationships, `observations` for evidence, and `transforms` for detailing logic.
- **Effect Type (v2)**: Use `effectType` on relationships to convey the nature/strength of the mapping. Where possible, add a brief parenthetical after examples, e.g., “effectType: EXACT_COPY”. Recommended guidance:
  - Simple alias or field passthrough: `EXACT_COPY` (no change in meaning)
  - Expression/function transforms (e.g., `ROUND`, `UPPER`): `WEAK_COPY` (value changed)
  - Aggregations (`SUM`, `COUNT`, `AVG`): `AGGREGATION` (or `WEAK_COPY` if `AGGREGATION` isn’t supported)
  - Multi-source expressions (e.g., `a + b`): `PARTIAL_COPY`
  - Uncertain or heuristic mapping: `AMBIGUOUS`
  - For detailed categories, see section "15. Lineage Categories(effectType)与 SQL 推导规则" ("Lineage Categories (effectType) and SQL Derivation Rules") in [data_lineage_design_explanation.md](mdc:gsp_java/docs/AI/cline_sqlflow/data_lineage_design_explanation.md)

  Add a one-line rationale when helpful (e.g., “aggregation changes granularity”).
- **Source of Truth**: Base this section on the information from the v2 schema and design documents.
  - v2 Schema: [data_lineage_schema.json](mdc:gsp_java/docs/AI/cline_sqlflow/data_lineage_schema.json)
  - v2 Design Explanation: [data_lineage_design_explanation.md](mdc:gsp_java/docs/AI/cline_sqlflow/data_lineage_design_explanation.md)

## Writing Tips (applies to both parts)

- Prefer short, concrete examples; add a compact effect type note where it clarifies intent.
- When showing function-based flows, include the `transforms.code` (e.g., `ROUND(salary)`) and set a plausible `effectType` as above.
- For statements producing multiple edges (e.g., `INSERT ... SELECT`), show separate 1→1 edges (v2) and mention a shared `statementKey`; add edge-level `effectType` where appropriate.

The goal is to create a seamless document that first teaches the basics (v1) and then introduces the advanced, more detailed concepts (v2) as a natural progression, with `effectType` annotations to make relationships more precise and intuitive.
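The recommended `effectType` guidance for v2 can be read as a small decision procedure. The sketch below is illustrative only: `suggest_effect_type` is a hypothetical helper, not a SQLFlow API, and it assumes the caller already knows which source columns feed the expression. Treating an expression with no known source columns as `AMBIGUOUS` is also an assumption made for this example.

```python
import re

# Aggregate functions named in the guide; a real analyzer would know more.
AGGREGATES = {"SUM", "COUNT", "AVG"}


def suggest_effect_type(expression: str, source_columns: list[str]) -> str:
    """Suggest a v2 effectType for one select-list expression (heuristic).

    `expression` is the expression text without its alias, and
    `source_columns` lists the columns it reads, as written in the SQL.
    """
    expr = expression.strip()
    if not source_columns:
        # Assumption: no identifiable source means the mapping is uncertain.
        return "AMBIGUOUS"
    # An outermost call to an aggregate function changes granularity.
    call = re.match(r"^([A-Za-z_]\w*)\s*\(", expr)
    if call and call.group(1).upper() in AGGREGATES:
        return "AGGREGATION"
    # Several sources combined into one value, e.g. a + b.
    if len(source_columns) > 1:
        return "PARTIAL_COPY"
    # The expression is exactly the single source column: passthrough.
    if expr.lower() == source_columns[0].lower():
        return "EXACT_COPY"
    # Single source wrapped in a function or expression: value changed.
    return "WEAK_COPY"


if __name__ == "__main__":
    for expr, cols in [
        ("salary", ["salary"]),
        ("ROUND(salary)", ["salary"]),
        ("SUM(amount)", ["amount"]),
        ("a + b", ["a", "b"]),
    ]:
        print(f"{expr}: {suggest_effect_type(expr, cols)}")
```

The order of the checks matters: aggregation is tested before the multi-source case so that an aggregate over several columns is still labeled `AGGREGATION`, which is one possible reading of the guidance rather than a documented rule.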
74 changes: 74 additions & 0 deletions .github/workflows/deploy-docs.yml
@@ -0,0 +1,74 @@
name: Deploy GSP and SQLFlow Documentation to GitHub Pages

on:
  # Trigger the workflow on push events to the release/docs branch
  push:
    branches:
      - release/docs
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read # Read access to checkout the code
  pages: write # Write access to deploy to Pages
  id-token: write # Needed for OIDC token if using advanced deployment methods

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        # If you have git-submodules
        # with:
        #   submodules: recursive

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.x # Use a recent Python 3 version
          cache: 'pip' # Cache pip dependencies

      - name: Install dependencies
        run: pip install -r requirements.txt # Install from requirements.txt

      # --- !!! ---
      # Add steps here to generate automatic content if needed
      # Example:
      # - name: Generate Javadoc
      #   run: |
      #     echo "Running Javadoc generation..."
      #     # Actual command to generate Javadoc into e.g., docs/reference/javadoc
      #     mkdir -p docs/reference/javadoc
      #     echo "<html><body>Generated Javadoc Placeholder</body></html>" > docs/reference/javadoc/index.html
      # --- !!! ---

      - name: Build MkDocs site
        working-directory: ./site-docs
        run: mkdocs build --verbose # Build into the 'site' directory

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3 # Use updated action
        with:
          # Upload entire site directory built by mkdocs
          path: './site-docs/site'

  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }} # Output the deployed URL
    runs-on: ubuntu-latest
    needs: build # Run after the build job is successful
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4 # Use updated action for deployment
18 changes: 18 additions & 0 deletions .gitignore
@@ -24,3 +24,21 @@
# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

# Python virtual environments
venv/
.venv/
env/
.env/
# Ignore venv and .venv directories in any subdirectory too (optional)
*/venv/
*/.venv/

site-docs/site
.cache/
__pycache__/
*.pyc
*.pyo
*.pyd
*.pyw
*.pyz
*.pywz
*.pyzw
14 changes: 14 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,14 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "type": "chrome",
            "request": "launch",
            "name": "Open index.html",
            "file": "f:\\depot\\github\\sqlflow_public\\widget\\index.html"
        }
    ]
}