A set of open source routines for converting between different data representations using the W3C SHACL specification and AI generation.

kurtcagle/shaclify

HR Database to RDF Conversion Package

A complete TARQL-based conversion system for transforming a SQL Server HR database into RDF using a SHACL 1.2 schema.

📦 Package Contents

Core Files

  • hr_database_shacl.ttl - SHACL 1.2 schema defining the RDF structure
  • hr_database.sql - Original SQL Server DDL for reference
  • convert_all.sh - Master bash script to run all conversions
  • CONVERSION_GUIDE.md - Comprehensive step-by-step conversion guide
  • TARQL_QUICK_REFERENCE.md - Quick reference for TARQL patterns

TARQL Query Files (/tarql)

11 SPARQL CONSTRUCT queries, one per table:

  1. 01_departments.tarql - Departments table
  2. 02_positions.tarql - Positions table
  3. 03_employees.tarql - Employees table (main)
  4. 04_employeepositions.tarql - Position assignments
  5. 05_salaries.tarql - Salary history
  6. 06_benefits.tarql - Benefit plans
  7. 07_employeebenefits.tarql - Benefit enrollments
  8. 08_timeoffrequests.tarql - Time off requests
  9. 09_performancereviews.tarql - Performance reviews
  10. 10_trainingcourses.tarql - Training courses
  11. 11_employeetraining.tarql - Training completions

Sample Data (/sample_csv)

  • departments.csv - Sample department data
  • employees.csv - Sample employee data

🚀 Quick Start

1. Export Your SQL Server Data

-- Run in SQL Server Management Studio
USE HumanResources;

-- Export each table to CSV
-- See CONVERSION_GUIDE.md for complete export scripts

2. Place CSV Files

your-project/
├── tarql/              # TARQL query files (from this package)
├── csv/                # Your exported CSV files
│   ├── departments.csv
│   ├── positions.csv
│   ├── employees.csv
│   └── ... (all 11 tables)
└── convert_all.sh      # Conversion script

3. Run Conversion

# Make script executable
chmod +x convert_all.sh

# Run conversion
./convert_all.sh

# Output will be in output/ directory
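The conversion step can be previewed before running it. A minimal dry-run sketch (`convert_all_dry_run` is a hypothetical helper, assuming the `NN_table.tarql` naming used in this package); it prints each tarql command instead of executing it:

```shell
# Print (without executing) one tarql command per query file.
# Assumes the layout shown above: tarql/NN_table.tarql and csv/table.csv.
convert_all_dry_run() {
  for q in tarql/*.tarql; do
    [ -e "$q" ] || continue            # no queries found: print nothing
    name=$(basename "$q" .tarql)       # e.g. 01_departments
    table=${name#??_}                  # strip the "NN_" prefix -> departments
    echo "tarql $q csv/$table.csv > output/$name.ttl"
  done
}
```

Once the printed commands look right, piping the output to `sh` would run the real conversions.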

4. Validate Results

# Using pyshacl
pyshacl -s hr_database_shacl.ttl \
        -d output/hr_database_complete.ttl \
        -f human

📊 What Gets Converted

Data Model

The conversion transforms 11 SQL tables into RDF:

Core Entities:

  • Departments (organizational units)
  • Positions (job definitions)
  • Employees (personnel records)

Relationships:

  • EmployeePositions (assignment history)
  • Salaries (compensation history)

Benefits & Training:

  • Benefits (available plans)
  • EmployeeBenefits (enrollments)
  • TrainingCourses (catalog)
  • EmployeeTraining (completions)

Activities:

  • TimeOffRequests (leave management)
  • PerformanceReviews (evaluations)

IRI Structure

All entities get unique IRIs following this pattern:

http://example.org/hr/{table}/{id}

Examples:

<http://example.org/hr/employee/1001>
<http://example.org/hr/department/5>
<http://example.org/hr/salary/9876>
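The same pattern can be expressed as a small helper for scripting or spot checks (`hr_iri` is hypothetical, not part of this package); it also rejects empty or space-containing IDs, which would produce invalid IRIs:

```shell
# Build an entity IRI from a table name and primary key, mirroring the
# CONCAT pattern used in the TARQL queries. hr_iri is a hypothetical helper.
hr_iri() {
  case "$2" in
    ''|*' '*) echo "error: empty or space-containing ID: '$2'" >&2; return 1 ;;
  esac
  echo "http://example.org/hr/$1/$2"
}

hr_iri employee 1001   # -> http://example.org/hr/employee/1001
```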

Property Mapping

SQL columns → RDF properties via sh:codeIdentifier:

# SQL Column: FirstName
# RDF Property: hr:firstName
# Defined in SHACL as:
hr-shape:Employee-firstName a sh:PropertyShape ;
    sh:path hr:firstName ;
    sh:codeIdentifier "FirstName" ;  # Maps to SQL column
    sh:datatype xsd:string .

Foreign Key Handling

SQL foreign keys → RDF object properties:

-- SQL
ManagerEmployeeID INT FOREIGN KEY REFERENCES Employees(EmployeeID)
# RDF
<http://example.org/hr/department/1>
    hr:managerEmployee <http://example.org/hr/employee/1001> .

🔧 TARQL Query Structure

Each TARQL query follows this pattern:

PREFIX hr: <http://example.org/hr/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  # Generate RDF triples
  ?iri a hr:ClassName ;
    hr:property ?value .
}
WHERE {
  # Bind IRI from primary key
  BIND(URI(CONCAT("http://example.org/hr/table/", STR(?ID))) AS ?iri)
  
  # Cast datatypes
  BIND(xsd:integer(?ID) AS ?value)
  
  # Handle optional fields
  BIND(IF(BOUND(?OptionalColumn) && ?OptionalColumn != "",
          xsd:string(?OptionalColumn),
          ?UNDEF) AS ?optionalValue)
  
  # Create foreign key references
  BIND(URI(CONCAT("http://example.org/hr/employee/", STR(?EmployeeID))) 
       AS ?employeeIri)
}

📝 CSV Format Requirements

Structure

  • First row: Column headers matching SQL column names
  • Delimiter: Comma (,)
  • Encoding: UTF-8
  • Line endings: Unix (LF) or Windows (CRLF)

Data Types

  • Integers: Plain numbers (e.g., 1001)
  • Decimals: Use period (e.g., 50000.00)
  • Strings: No quotes needed (unless the value contains a comma)
  • Dates: ISO format YYYY-MM-DD
  • DateTimes: ISO format YYYY-MM-DD HH:MM:SS
  • Booleans: 1/0 or true/false
  • NULL values: Leave blank
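Rows that break these expectations are easier to fix before conversion than after. A minimal sketch that flags non-ISO dates (`check_dates` is hypothetical; it assumes the date sits in column 5, so adjust `$5` to your file's layout):

```shell
# Flag data rows whose 5th field is not an ISO date (YYYY-MM-DD).
# The column position is an assumption; change $5 to match your CSV.
check_dates() {
  awk -F',' 'NR > 1 && $5 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    print "line " NR ": bad date \"" $5 "\""
  }' "$1"
}
```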

Example CSV

DepartmentID,DepartmentName,DepartmentCode,ManagerEmployeeID,Budget,CreatedDate,ModifiedDate
1,Information Technology,IT,1001,500000.00,2024-01-01 09:00:00,2024-01-01 09:00:00
2,Human Resources,HR,,250000.00,2024-01-01 09:00:00,2024-01-01 09:00:00

🔍 Example Output

Input CSV:

EmployeeID,FirstName,LastName,Email,HireDate,EmployeeStatus
1001,John,Doe,jdoe@example.com,2020-01-15,Active

Output RDF (Turtle):

<http://example.org/hr/employee/1001> a hr:Employee ;
    hr:employeeId 1001 ;
    hr:firstName "John" ;
    hr:lastName "Doe" ;
    hr:email "jdoe@example.com" ;
    hr:hireDate "2020-01-15"^^xsd:date ;
    hr:employeeStatus "Active" .

🎯 Use Cases

1. Data Integration

Convert SQL data to RDF for integration with semantic web applications

2. Knowledge Graph Construction

Build an enterprise knowledge graph from relational data

3. SPARQL Querying

Enable powerful graph queries over your HR data

4. Linked Data Publication

Publish HR data as linked data with proper semantics

5. Data Validation

Use SHACL to validate data quality and business rules

📚 Documentation

Quick References

  • TARQL_QUICK_REFERENCE.md - Common patterns and examples
  • CONVERSION_GUIDE.md - Detailed step-by-step guide
  • hr_shacl_usage_guide.md - SHACL schema usage

Key Concepts

  1. TARQL - Converts CSV to RDF using SPARQL CONSTRUCT
  2. SHACL - Validates RDF structure and constraints
  3. IRI Construction - Unique identifiers for all entities
  4. Foreign Keys - Mapped to RDF object properties
  5. Type Casting - SQL types → XSD datatypes

🛠️ Prerequisites

Required

  • TARQL 1.2+ - CSV to RDF converter
    wget https://github.com/tarql/tarql/releases/download/v1.2/tarql-1.2.tar.gz

Optional

  • pyshacl - Python SHACL validator

    pip install pyshacl
  • Apache Jena - Java-based RDF tools

    wget https://dlcdn.apache.org/jena/binaries/apache-jena-4.10.0.tar.gz

📖 Step-by-Step Workflow

Phase 1: Preparation

  1. Export SQL Server tables to CSV files
  2. Verify CSV format (headers, encoding, delimiters)
  3. Place CSV files in csv/ directory

Phase 2: Conversion

  1. Review TARQL queries in tarql/ directory
  2. Customize IRI base if needed (default: http://example.org/hr/)
  3. Run ./convert_all.sh
  4. Check output/ directory for generated Turtle files

Phase 3: Validation

  1. Validate individual table files
  2. Validate combined output
  3. Check for constraint violations
  4. Fix any data quality issues

Phase 4: Deployment

  1. Load RDF into triple store (Fuseki, GraphDB, Stardog)
  2. Set up SPARQL endpoint
  3. Create sample queries
  4. Enable RDFS/OWL inference if desired
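For the Fuseki option in step 1, Turtle files are loaded over HTTP via the SPARQL Graph Store Protocol. A sketch that builds (but does not run) the request, so it can be reviewed first; the `/hr` dataset name and default port 3030 are assumptions:

```shell
# Build the curl command that POSTs a Turtle file into a Fuseki dataset's
# default graph via the Graph Store Protocol. The /hr dataset name and
# port 3030 are assumptions; the command is echoed, not executed.
fuseki_load_cmd() {
  echo "curl -X POST -H 'Content-Type: text/turtle'" \
       "--data-binary @$1 http://localhost:3030/hr/data?default"
}
```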

🔧 Customization

Change IRI Base

Edit TARQL queries to use your domain:

# Before
BIND(URI(CONCAT("http://example.org/hr/employee/", STR(?EmployeeID))) AS ?empIri)

# After
BIND(URI(CONCAT("https://mycompany.com/hr/employee/", STR(?EmployeeID))) AS ?empIri)

Add Custom Properties

  1. Add property to SHACL schema
  2. Add BIND clause to TARQL query
  3. Map to appropriate CSV column

Modify Data Transformations

TARQL queries can include data transformations:

# Uppercase department codes
BIND(UCASE(?DepartmentCode) AS ?deptCode)

# Concatenate name fields
BIND(CONCAT(?FirstName, " ", ?LastName) AS ?fullName)

# Calculate derived values
BIND(?MaxSalary - ?MinSalary AS ?salaryRange)

πŸ› Troubleshooting

Common Issues

Empty output files

  • Check CSV headers match query variables
  • Verify file encoding is UTF-8
  • Check for BOM (Byte Order Mark) in CSV
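A BOM is a common cause of the first symptom above: it attaches to the first header cell, so the first column name never matches its query variable. A minimal sketch that detects and strips it (`strip_bom` is hypothetical):

```shell
# Detect and strip a UTF-8 BOM (bytes EF BB BF) from the start of a CSV file.
strip_bom() {
  if head -c 3 "$1" | od -An -tx1 | grep -q 'ef bb bf'; then
    tail -c +4 "$1" > "$1.tmp" && mv "$1.tmp" "$1"   # drop the first 3 bytes
  fi
}
```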

Invalid IRIs

  • Ensure no spaces in ID columns
  • Check for special characters
  • Verify IDs are not empty

Type casting errors

  • Use ISO date format: YYYY-MM-DD
  • Remove currency symbols from numbers
  • Use 1/0 or true/false for booleans

Missing triples

  • Check optional field handling
  • Verify foreign key references exist
  • Look for empty vs. NULL values

Validation Failures

# Get detailed error report
pyshacl -s hr_database_shacl.ttl \
        -d output/hr_database_complete.ttl \
        -f human \
        --debug > validation_errors.txt

Common violations:

  • sh:minCount - Missing required properties
  • sh:datatype - Incorrect data types
  • sh:pattern - Format validation failures
  • sh:class - Invalid foreign key references
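With the report saved, a quick tally shows which constraint components dominate the cleanup work (`violation_summary` is hypothetical; it assumes the human-readable report names each violated component, e.g. `MinCountConstraintComponent`):

```shell
# Count violations per SHACL constraint component in a saved pyshacl report.
violation_summary() {
  grep -o '[A-Za-z]*ConstraintComponent' "$1" | sort | uniq -c | sort -rn
}
```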

📊 Performance

Benchmarks

  • 10K employees: ~2 seconds
  • 50K employees: ~8 seconds
  • 100K employees: ~15 seconds
  • 1M employees: ~2.5 minutes

Optimization Tips

  1. Process tables in parallel
  2. Use TARQL's --ntriples for large datasets
  3. Validate incrementally
  4. Use SSD for CSV files and output

🤝 Support

Getting Help

  1. Check CONVERSION_GUIDE.md for detailed instructions
  2. Review TARQL_QUICK_REFERENCE.md for patterns
  3. Examine sample CSV files for format examples
  4. Test with small datasets first

📝 License

This conversion package is provided as-is for use with your HR database conversion project.

🎯 Next Steps

After successful conversion:

  1. Explore with SPARQL

    SELECT ?emp ?name WHERE {
      ?emp a hr:Employee ;
           hr:firstName ?name .
    }
    LIMIT 10
  2. Create Visualizations

    • Use GraphDB's Visual Graph
    • Build custom dashboards
    • Generate org charts
  3. Add Inference Rules

    • Define derivations
    • Add business logic
    • Create computed properties
  4. Integrate Systems

    • Connect to reporting tools
    • Expose SPARQL endpoint
    • Enable federated queries

📞 Contact

For questions about this conversion package, refer to the included documentation files or review the SHACL schema for data model details.


Version: 1.0
Created: 2026-02-19
SHACL Version: 1.2
TARQL Version: 1.2+
