58 changes: 58 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,58 @@
name: Test UnityDataTools

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  test:
    strategy:
      matrix:
        os: [windows, macos]
        arch: [x64, arm64]
        exclude:
          - os: windows
            arch: arm64
          - os: macos
            arch: x64
      fail-fast: false

    runs-on: ${{ matrix.os }}-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 9.0.x

      - name: Restore dependencies
        run: dotnet restore

      - name: Build solution
        run: dotnet build -c Release --no-restore

      - name: Run UnityFileSystem.Tests
        run: dotnet test UnityFileSystem.Tests/UnityFileSystem.Tests.csproj -c Release --no-build --verbosity normal --logger "trx;LogFileName=UnityFileSystem.Tests.trx"

      - name: Run Analyzer.Tests
        run: dotnet test Analyzer.Tests/Analyzer.Tests.csproj -c Release --no-build --verbosity normal --logger "trx;LogFileName=Analyzer.Tests.trx"

      - name: Run UnityDataTool.Tests
        run: dotnet test UnityDataTool.Tests/UnityDataTool.Tests.csproj -c Release --no-build --verbosity normal --logger "trx;LogFileName=UnityDataTool.Tests.trx"

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ matrix.os }}-${{ matrix.arch }}
          path: |
            **/TestResults/*.trx
          retention-days: 30
236 changes: 236 additions & 0 deletions AGENTS.md
@@ -0,0 +1,236 @@
# AGENTS.md

This file provides guidance to AI agents when working with code in this repository.

## Project Overview

UnityDataTools is a .NET 9.0 command-line tool for analyzing Unity build output (AssetBundles, Player builds, Addressables). It extracts data from Unity's proprietary binary formats into SQLite databases and human-readable text files. The tool showcases the UnityFileSystemApi native library and serves as both a production tool and reference implementation.

## Common Commands

### Building
```bash
# Build entire solution in Release mode
dotnet build -c Release

# Build from solution file
dotnet build UnityDataTools.sln -c Release

# Build specific project
dotnet build UnityDataTool/UnityDataTool.csproj -c Release
```

Output location (Windows): `UnityDataTool\bin\Release\net9.0\UnityDataTool.exe`

### Publishing (Mac-specific)
```bash
# Intel Mac
dotnet publish UnityDataTool -c Release -r osx-x64 -p:PublishSingleFile=true -p:UseAppHost=true

# Apple Silicon Mac
dotnet publish UnityDataTool -c Release -r osx-arm64 -p:PublishSingleFile=true -p:UseAppHost=true
```
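
The two publish commands differ only in the runtime identifier. As a rough sketch, the choice could be derived from the machine architecture. The function names `mac_rid` and `publish_command` are hypothetical helpers, not part of the repository; the sketch assumes `uname -m` reports `x86_64` on Intel Macs and `arm64` on Apple Silicon.

```bash
# Map a machine architecture string (as printed by `uname -m` on macOS)
# to the .NET runtime identifier used by the publish commands above.
# Hypothetical helper; not part of the repository.
mac_rid() {
  case "$1" in
    x86_64) echo "osx-x64" ;;
    arm64)  echo "osx-arm64" ;;
    *)      echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the publish command for the given architecture.
publish_command() {
  local rid
  rid="$(mac_rid "$1")" || return 1
  echo "dotnet publish UnityDataTool -c Release -r ${rid} -p:PublishSingleFile=true -p:UseAppHost=true"
}
```

Calling `publish_command "$(uname -m)"` prints the command for the current machine so it can be reviewed before it is run.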

### Testing
```bash
# Run all tests
dotnet test

# Run tests for specific project
dotnet test UnityFileSystem.Tests/UnityFileSystem.Tests.csproj
dotnet test Analyzer.Tests/Analyzer.Tests.csproj
dotnet test UnityDataTool.Tests/UnityDataTool.Tests.csproj

# Run tests with filter
dotnet test --filter "FullyQualifiedName~SerializedFile"
```

Test projects: UnityFileSystem.Tests, Analyzer.Tests, UnityDataTool.Tests, TestCommon (helper library)
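
To reproduce the per-project runs from `.github/workflows/test.yml` locally, something like the following sketch may help. It only prints the commands; `test_command` and `all_test_commands` are hypothetical helper names. Because of `--no-build`, the commands assume a Release build already exists.

```bash
# Test projects listed above (TestCommon is a helper library, so it is not run).
TEST_PROJECTS="UnityFileSystem.Tests Analyzer.Tests UnityDataTool.Tests"

# Print the `dotnet test` invocation for one project, writing a TRX log
# named after the project. Hypothetical helper; not part of the repository.
test_command() {
  local project="$1"
  printf 'dotnet test %s/%s.csproj -c Release --no-build --logger "trx;LogFileName=%s.trx"\n' \
    "$project" "$project" "$project"
}

# Print the commands for every test project, one per line.
all_test_commands() {
  local project
  for project in $TEST_PROJECTS; do
    test_command "$project"
  done
}
```

Running `dotnet build -c Release` first and then executing each printed line should produce one `.trx` file per project.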

### Running the Tool
```bash
# Show all commands
UnityDataTool --help

# Analyze AssetBundles into SQLite database
UnityDataTool analyze /path/to/bundles -o database.db

# Dump binary file to text format
UnityDataTool dump /path/to/file.bundle -o /output/path

# Extract archive contents
UnityDataTool archive extract file.bundle -o contents/

# Find reference chains to an object
UnityDataTool find-refs database.db -n "ObjectName" -t "Texture2D"
```

## Architecture

### Component Hierarchy
```
UnityDataTool (CLI executable)
├── Analyzer → SQLite database generation
├── TextDumper → Human-readable text output
├── ReferenceFinder → Object reference chain tracing
└── UnityFileSystem → C# wrapper for native library
    └── UnityFileSystemApi (native .dll/.dylib/.so)
```

### Key Architectural Patterns

**Native Interop**: UnityFileSystem wraps UnityFileSystemApi (native library from Unity Editor) via P/Invoke in `DllWrapper.cs`. The native library reads Unity Archive and SerializedFile formats.

**TypeTree Navigation**: Unity binary files contain TypeTrees that describe object serialization. The `RandomAccessReader` class navigates these trees like property accessors: `reader["m_Name"].GetValue<string>()`. This enables the tool to interpret objects without hardcoded type knowledge.

**Parser Pattern**: `ISQLiteFileParser` interface allows multiple parsers to handle different file formats:
- `SerializedFileParser` - Unity binary files (AssetBundles, Player data)
- `AddressablesBuildLayoutParser` - JSON build reports

**Handler Registry**: Type-specific handlers extract specialized properties for Unity object types. Handlers implement `ISQLiteHandler` and are registered in `SerializedFileSQLiteWriter.m_Handlers`:
- `MeshHandler` - vertices, indices, bones, blend shapes
- `Texture2DHandler` - width, height, format, mipmaps
- `ShaderHandler` - variants, keywords, subprograms
- `AudioClipHandler` - compression, channels, frequency
- `AnimationClipHandler` - legacy flag, events
- `AssetBundleHandler` - dependencies, preload data
- `PreloadDataHandler` - preloaded assets

**SQL Schema Resources**: Each handler has an embedded `.sql` resource file defining its tables and views (e.g., `Analyzer/SQLite/Resources/Mesh.sql`). Views join type-specific tables with the base `objects` table.

**Command Pattern**: SQL operations are encapsulated in classes derived from `AbstractCommand` with `CreateCommand()`, `SetValue()`, `ExecuteNonQuery()` methods.

### Data Flow (Analyze Command)

1. `Program.cs` → `HandleAnalyze()` → `AnalyzerTool.Analyze()`
2. AnalyzerTool finds files matching search pattern
3. For each file, parsers are tried in order (JSON first, then SerializedFile)
4. `SerializedFileParser.ProcessFile()`:
   - Checks for Unity Archive signature → calls `MountArchive()`
   - Otherwise treats as SerializedFile → calls `OpenSerializedFile()`
5. `SerializedFileSQLiteWriter.WriteSerializedFile()`:
   - Iterates through `sf.Objects`
   - Gets TypeTree via `sf.GetTypeTreeRoot(objectId)`
   - Creates `RandomAccessReader` to navigate properties
   - Looks up type-specific handler in `m_Handlers` dictionary
   - Handler extracts specialized properties (e.g., MeshHandler reads vertex count)
   - Writes to `objects` table + type-specific table (e.g., `meshes`)
   - Optionally processes PPtrs (references) and calculates CRC32
6. SQLiteWriter finalizes database with indexes and views
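
The branch in step 4 can be approximated from the shell when triaging files by hand. This is only a sketch: it assumes a Unity Archive begins with the ASCII signature `UnityFS`, which is not stated in this file, and the check performed by the tool itself may differ. `classify_unity_file` is a hypothetical helper name.

```bash
# Assumption: a Unity Archive starts with the ASCII signature "UnityFS".
UNITY_ARCHIVE_SIGNATURE="UnityFS"

# Print "archive" if the file starts with the signature, otherwise
# "serialized-file" (the fallback branch in step 4).
classify_unity_file() {
  local file="$1" header
  # Read only as many bytes as the signature is long; drop NUL bytes so
  # binary headers do not trigger shell warnings.
  header="$(head -c "${#UNITY_ARCHIVE_SIGNATURE}" -- "$file" | tr -d '\0')"
  if [ "$header" = "$UNITY_ARCHIVE_SIGNATURE" ]; then
    echo "archive"
  else
    echo "serialized-file"
  fi
}
```

A result of `serialized-file` only means the signature was absent; the file may still be something the tool cannot open.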

### Important Files

**Entry Points**:
- `UnityDataTool/Program.cs` - CLI using System.CommandLine
- `UnityDataTool/Commands/` - Command handlers (Analyze.cs, Dump.cs, Archive.cs, FindReferences.cs)

**Core Libraries**:
- `UnityFileSystem/UnityFileSystem.cs` - Init(), MountArchive(), OpenSerializedFile()
- `UnityFileSystem/DllWrapper.cs` - P/Invoke bindings to native library
- `UnityFileSystem/SerializedFile.cs` - Represents binary data files
- `UnityFileSystem/RandomAccessReader.cs` - TypeTree property navigation

**Analyzer**:
- `Analyzer/AnalyzerTool.cs` - Main API entry point
- `Analyzer/SQLite/SQLiteWriter.cs` - Base class for database writers
- `Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs` - Handler registration
- `Analyzer/SQLite/Writers/AddressablesBuildLayoutSQLWriter.cs` - JSON report processing
- `Analyzer/SQLite/Handlers/` - Type-specific extractors
- `Analyzer/SerializedObjects/` - RandomAccessReader-based property readers
- `Analyzer/SQLite/Resources/` - SQL DDL schema files

**TextDumper**:
- `TextDumper/TextDumperTool.cs` - Converts binary to YAML-like text

**ReferenceFinder**:
- `ReferenceFinder/ReferenceFinderTool.cs` - Traces object dependency chains

## Extending the Tool

### Adding New Unity Type Support

1. Create a handler class implementing `ISQLiteHandler`:
   ```
   Analyzer/SQLite/Handlers/FooHandler.cs
   ```

2. Create a reader class using `RandomAccessReader`:
   ```
   Analyzer/SerializedObjects/Foo.cs
   ```

3. Register the handler in `SerializedFileSQLiteWriter.cs`:
   ```csharp
   m_Handlers["Foo"] = new FooHandler();
   ```

4. Create the SQL schema resource:
   ```
   Analyzer/SQLite/Resources/Foo.sql
   ```
   Define tables (e.g., `foos`) and views (e.g., `foo_view` joining `objects` and `foos`)

5. Reference the schema in the handler's `GetResourceName()` method
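
As a quick checklist, the three new files from the steps above can be listed for any type name. `handler_paths` is a hypothetical helper, and the paths simply follow the `Foo` naming pattern shown in the steps.

```bash
# Print the files to create when adding support for a Unity type,
# following the Foo naming pattern from the steps above.
# Hypothetical helper; not part of the repository.
handler_paths() {
  local type_name="$1"
  if [ -z "$type_name" ]; then
    echo "usage: handler_paths <TypeName>" >&2
    return 1
  fi
  printf '%s\n' \
    "Analyzer/SQLite/Handlers/${type_name}Handler.cs" \
    "Analyzer/SerializedObjects/${type_name}.cs" \
    "Analyzer/SQLite/Resources/${type_name}.sql"
}
```

For example, `handler_paths Sprite` lists the handler, reader, and schema files a `Sprite` handler would need. The registration in `SerializedFileSQLiteWriter.cs` is still a manual edit.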

### Adding New File Format Support

1. Create parser implementing `ISQLiteFileParser`
2. Create writer derived from `SQLiteWriter`
3. Add parser to `AnalyzerTool.parsers` list
4. Create SQL schema and Command classes as needed

Example: Addressables support uses `AddressablesBuildLayoutParser` + `AddressablesBuildLayoutSQLWriter` to parse JSON build reports.

## Important Concepts

### TypeTrees
TypeTrees describe how Unity objects are serialized (property names, types, offsets). They enable:
- Backward compatibility - reading files from different Unity versions
- Generic parsing without hardcoded type definitions
- Support for custom MonoBehaviours/ScriptableObjects

**Critical**: Player builds exclude TypeTrees by default for performance. To analyze Player data, enable the "ForceAlwaysWriteTypeTrees" diagnostic switch during build.

### File Formats
- **Unity Archive** - Container format (AssetBundles, .data files). Can be mounted as virtual filesystem.
- **SerializedFile** - Binary format storing Unity objects with TypeTree metadata.
- **Addressables BuildLayout** - JSON build report (buildlogreport.json, AddressablesReport.json)

### Database Views
The SQLite output uses views extensively to join base `objects` table with type-specific tables:
- `object_view` - All objects with basic properties
- `mesh_view` - Objects + mesh-specific columns
- `texture_view` - Objects + texture-specific columns
- `shader_view` - Objects + shader-specific columns
- `view_breakdown_by_type` - Aggregated size by type
- `view_potential_duplicates` - Assets included multiple times
- `asset_view` - Explicitly assigned assets only
- `shader_keyword_ratios` - Keyword variant analysis

See `Analyzer/README.md` and `Documentation/addressables-build-reports.md` for complete database schema documentation.
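
Because column names vary per view, a safe first step is to preview a few rows with `SELECT *`. The sketch below builds such a query for the views listed above. `preview_query` is a hypothetical helper, and the usage note assumes the separate `sqlite3` command-line shell is installed.

```bash
# Views listed in this section.
KNOWN_VIEWS="object_view mesh_view texture_view shader_view view_breakdown_by_type view_potential_duplicates asset_view shader_keyword_ratios"

# Print a query that previews up to $2 rows (default 10) of view $1.
# Rejects view names not listed above and non-numeric limits.
# Hypothetical helper; not part of the repository.
preview_query() {
  local view="$1" limit="${2:-10}" known
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be a number: $limit" >&2; return 1 ;;
  esac
  for known in $KNOWN_VIEWS; do
    if [ "$known" = "$view" ]; then
      echo "SELECT * FROM ${view} LIMIT ${limit};"
      return 0
    fi
  done
  echo "unknown view: $view" >&2
  return 1
}
```

With the `sqlite3` shell available, `sqlite3 database.db "$(preview_query view_breakdown_by_type)"` prints the first ten rows of that view.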

### Common Issues

**TypeTree Errors**: "Invalid object id" during analyze means the SerializedFile lacks TypeTrees. Enable ForceAlwaysWriteTypeTrees or use files built with TypeTrees.

**File Loading Warnings**: "Failed to load... File may be corrupted" is normal for non-Unity files in analyzed directories. Use the `-p` search pattern option to filter them out (e.g., `-p "*.bundle"`).

**SQL UNIQUE Constraint Errors**: These occur when the same SerializedFile name appears in multiple archives, for example when analyzing multiple builds in the same directory or when using AssetBundle variants. See `Documentation/comparing-builds.md` for solutions.

**Mac Security**: "UnityFileSystemApi.dylib cannot be opened" - Open System Preferences → Security & Privacy and allow the library.

## Native Library (UnityFileSystemApi)

The native library is included for Windows, Mac, and Linux in the `UnityFileSystem/` directory. It is backward compatible and reads data files from most Unity versions.

To use a specific Unity version's library:
1. Find library in Unity Editor installation: `{UnityEditor}/Data/Tools/`
2. Copy to `UnityDataTool/UnityFileSystem/`:
   - Windows: `UnityFileSystemApi.dll`
   - Mac: `UnityFileSystemApi.dylib`
   - Linux: `UnityFileSystemApi.so`
3. Rebuild the tool
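
Step 2 can be scripted once the platform is known. The sketch below picks the library file name from the kernel name reported by `uname -s` and prints the copy command. `native_library_name` and `copy_command` are hypothetical helpers; the `MINGW*`, `MSYS*`, and `CYGWIN*` patterns assume a Unix-like shell on Windows such as Git Bash.

```bash
# Map a kernel name (as printed by `uname -s`) to the library file name
# from the list above. Hypothetical helper; not part of the repository.
native_library_name() {
  case "$1" in
    Darwin)               echo "UnityFileSystemApi.dylib" ;;
    Linux)                echo "UnityFileSystemApi.so" ;;
    MINGW*|MSYS*|CYGWIN*) echo "UnityFileSystemApi.dll" ;;
    *) echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Print the copy command for step 2.
# $1 = Unity Editor install root, $2 = kernel name.
copy_command() {
  local lib
  lib="$(native_library_name "$2")" || return 1
  printf 'cp "%s/Data/Tools/%s" UnityDataTool/UnityFileSystem/\n' "$1" "$lib"
}
```

Calling `copy_command /path/to/UnityEditor "$(uname -s)"` prints the command for the current platform. The install root is a placeholder to replace with the real editor location.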

## Testing Data

UnityFileSystemTestData is a Unity project that generates test data for the test suites. TestCommon provides shared test utilities.
4 changes: 2 additions & 2 deletions Analyzer/Resources/AssetBundle.sql
@@ -8,13 +8,13 @@ CREATE TABLE IF NOT EXISTS asset_dependencies(
dependency INTEGER
);

-CREATE VIEW asset_view AS
+CREATE VIEW IF NOT EXISTS asset_view AS
SELECT
a.name AS asset_name,
o.*
FROM assets a INNER JOIN object_view o ON o.id = a.object;

-CREATE VIEW asset_dependencies_view AS
+CREATE VIEW IF NOT EXISTS asset_dependencies_view AS
SELECT a.id, a.asset_name, a.asset_bundle, a.type, od.id dep_id, od.asset_bundle dep_asset_bundle, od.name dep_name, od.type dep_type
FROM asset_view a
INNER JOIN asset_dependencies d ON a.id = d.object
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -0,0 +1 @@
AGENTS.md