Foundation: Storage + Config System #52

@prosdev

🎯 Overview

Implement centralized storage for repository indexes and a project-level configuration system. This foundation enables clean project directories, cross-repository search, and per-project adapter configuration.

Part of Epic: #31

💾 Storage Strategy: Centralized Indexes

Overview

All repository indexes are stored globally in ~/.dev-agent/indexes/ rather than per-project. This enables:

  • ✅ Clean project directories (no large files to gitignore)
  • ✅ Cross-repository search and analysis
  • ✅ Shared indexes across clones
  • ✅ Easy storage management and cleanup
  • ✅ Survives project moves/deletions (git repos are keyed by remote, not by path)

Directory Structure

~/.dev-agent/
  indexes/
    a1b2c3d4/              # Frontend (git remote hash)
      vectors.lance        # Code vectors
      github-state.json    # GitHub cache
      metadata.json        # Repository metadata
    e5f6a7b8/              # Backend
      vectors.lance
      github-state.json
      metadata.json
  
  cache/
    embeddings-model/      # Shared ML model (~100MB)
    github-api/            # GitHub API response cache
  
  config/
    global.json            # Global settings (optional)
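Creating this layout on first use (one of the storage tasks below) can be a single idempotent step. A minimal sketch, assuming Node's built-in `fs`/`os`/`path` modules; `ensureStorageLayout` is a hypothetical name, and taking the root as a parameter is an assumption that lets tests point it at a temporary directory:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: create the ~/.dev-agent layout shown above if it is missing.
// Safe to call on every start: mkdirSync with recursive: true ignores existing directories.
function ensureStorageLayout(root: string = path.join(os.homedir(), ".dev-agent")): string {
  const subdirs = [
    "indexes",
    path.join("cache", "embeddings-model"),
    path.join("cache", "github-api"),
    "config",
  ];
  for (const dir of subdirs) {
    fs.mkdirSync(path.join(root, dir), { recursive: true });
  }
  return root;
}
```

Individual index directories (`indexes/<hash>/`) are left to the indexer, since their names depend on the repository being indexed.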

Storage Location Algorithm

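import * as crypto from 'node:crypto';
import * as os from 'node:os';
import * as path from 'node:path';

// getGitRemote() and normalizeGitRemote() are project helpers (not shown).
// The function awaits getGitRemote(), so it must be async and return a Promise.
async function getStoragePath(repositoryPath: string): Promise<string> {
  const indexesRoot = path.join(os.homedir(), '.dev-agent', 'indexes');

  // 1. Try git remote (stable across clones)
  const gitRemote = await getGitRemote(repositoryPath);

  if (gitRemote) {
    // Normalize: git@github.com:company/repo.git → company/repo
    const normalized = normalizeGitRemote(gitRemote);
    // MD5 is used only as a short, stable directory key, not for security
    const hash = crypto.createHash('md5')
      .update(normalized)
      .digest('hex')
      .slice(0, 8);

    return path.join(indexesRoot, hash);
  }

  // 2. Fallback: absolute path hash (for non-git repos)
  const pathHash = crypto.createHash('md5')
    .update(path.resolve(repositoryPath))
    .digest('hex')
    .slice(0, 8);

  return path.join(indexesRoot, pathHash);
}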
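The algorithm relies on a `normalizeGitRemote` helper that the issue does not define. A minimal sketch, assuming the normalized form is `owner/repo` as the code comment suggests and that scp-style SSH, `ssh://` and HTTPS remotes for the same repository should map to one key; `storageKeyFor` is a hypothetical name for the hashing step:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: reduce a git remote to "owner/repo" so that SSH and HTTPS
// remotes for the same repository produce the same storage key.
function normalizeGitRemote(remote: string): string {
  let rest = remote.trim();

  // scp-style SSH: git@github.com:company/repo.git
  const scpStyle = /^[^@/\s]+@[^:/\s]+:(.+)$/.exec(rest);
  if (scpStyle) {
    rest = scpStyle[1];
  } else {
    try {
      // ssh://git@github.com/company/repo.git, https://github.com/company/repo
      rest = new URL(rest).pathname;
    } catch {
      // Not a URL (e.g. a local path used as a remote): keep it as-is
    }
  }

  return rest
    .replace(/^\/+/, "")    // leading slash from URL paths
    .replace(/\/+$/, "")    // trailing slashes
    .replace(/\.git$/, ""); // optional .git suffix
}

// Hypothetical helper: first 8 hex characters of the MD5 digest, as in getStoragePath().
function storageKeyFor(normalized: string): string {
  return createHash("md5").update(normalized).digest("hex").slice(0, 8);
}
```

Dropping the host follows the comment in `getStoragePath()`; it also means the same `owner/repo` on two different hosts would share an index, so keeping the host in the normalized string is a reasonable alternative.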

Metadata File

Each index directory contains a metadata.json that identifies the source repository and records the last indexing run:

// ~/.dev-agent/indexes/a1b2c3d4/metadata.json
{
  "version": "1.0",
  "repository": {
    "path": "/Users/you/workspace/frontend",
    "remote": "git@github.com:company/frontend.git",
    "branch": "main",
    "lastCommit": "abc123..."
  },
  "indexed": {
    "timestamp": "2025-11-25T12:00:00Z",
    "files": 243,
    "components": 1847,
    "size": 52428800
  },
  "config": {
    "languages": ["typescript", "javascript"],
    "excludePatterns": ["**/node_modules/**"]
  }
}
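For the "metadata.json creation/updates" task, one option is to write the file atomically so an interrupted write never leaves a truncated file behind. A minimal sketch, assuming the shape shown above and that `size` is in bytes; `IndexMetadata`, `writeMetadata` and `readMetadata` are hypothetical names:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical type mirroring the metadata.json example above.
interface IndexMetadata {
  version: string;
  repository: { path: string; remote?: string; branch?: string; lastCommit?: string };
  indexed: { timestamp: string; files: number; components: number; size: number }; // size: bytes (assumed)
  config: { languages: string[]; excludePatterns: string[] };
}

// Write to a temp file in the same directory, then rename it over metadata.json.
function writeMetadata(storagePath: string, metadata: IndexMetadata): string {
  fs.mkdirSync(storagePath, { recursive: true });
  const target = path.join(storagePath, "metadata.json");
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(metadata, null, 2) + "\n");
  fs.renameSync(tmp, target); // on POSIX, a same-directory rename replaces the file atomically
  return target;
}

// Returns undefined when the index has no metadata yet.
function readMetadata(storagePath: string): IndexMetadata | undefined {
  const file = path.join(storagePath, "metadata.json");
  if (!fs.existsSync(file)) return undefined;
  return JSON.parse(fs.readFileSync(file, "utf8")) as IndexMetadata;
}
```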

🔧 Configuration System

Config File: .dev-agent/config.json

{
  "version": "1.0",
  
  "repository": {
    "path": ".",
    "excludePatterns": ["**/node_modules/**", "**/dist/**"],
    "languages": ["typescript", "javascript"]
  },
  
  "mcp": {
    "adapters": {
      "search": { "enabled": true },
      "github": { "enabled": true },
      "plan": { "enabled": true },
      "explore": { "enabled": true },
      "status": { "enabled": false }
    }
  }
}
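The task list asks for config file merging (defaults + user config). A minimal sketch, assuming user values override defaults key by key and that adapter entries are merged one by one, so a user config that only mentions `status` keeps the default entries for the other adapters. `DEFAULT_CONFIG`, `mergeConfig` and `loadConfigFile` are hypothetical names, and the default values are placeholders:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Local copies of the schema types so the sketch is self-contained.
interface AdapterConfig { enabled: boolean; source?: string; settings?: Record<string, string | number | boolean> }
interface DevAgentConfig {
  version: string;
  repository: { path?: string; excludePatterns?: string[]; languages?: string[] };
  mcp?: { adapters?: Record<string, AdapterConfig> };
}

// Hypothetical defaults (placeholder values).
const DEFAULT_CONFIG: DevAgentConfig = {
  version: "1.0",
  repository: { path: ".", excludePatterns: ["**/node_modules/**", "**/dist/**"], languages: [] },
  mcp: { adapters: { search: { enabled: true }, github: { enabled: true } } },
};

// Shallow merge per section; arrays and `settings` objects from the user replace the defaults.
function mergeConfig(defaults: DevAgentConfig, user: Partial<DevAgentConfig>): DevAgentConfig {
  const adapters: Record<string, AdapterConfig> = { ...defaults.mcp?.adapters };
  for (const [name, cfg] of Object.entries(user.mcp?.adapters ?? {})) {
    adapters[name] = { ...adapters[name], ...cfg };
  }
  return {
    version: user.version ?? defaults.version,
    repository: { ...defaults.repository, ...user.repository },
    mcp: { ...defaults.mcp, ...user.mcp, adapters },
  };
}

// Read <project>/.dev-agent/config.json if present; otherwise use the defaults.
function loadConfigFile(projectRoot: string): DevAgentConfig {
  const file = path.join(projectRoot, ".dev-agent", "config.json");
  if (!fs.existsSync(file)) return mergeConfig(DEFAULT_CONFIG, {});
  const user = JSON.parse(fs.readFileSync(file, "utf8")) as Partial<DevAgentConfig>;
  return mergeConfig(DEFAULT_CONFIG, user);
}
```

Validation and `${VAR_NAME}` resolution would run on the parsed user object before merging; they are left out here to keep the merge rule visible.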

Config Schema

interface DevAgentConfig {
  version: string;
  repository: {
    path?: string;
    excludePatterns?: string[];
    languages?: string[];
  };
  mcp?: {
    adapters?: Record<string, AdapterConfig>;
  };
}

interface AdapterConfig {
  enabled: boolean;
  source?: string;  // For custom adapters
  settings?: Record<string, string | number | boolean>;
}
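For "config validation with helpful error messages", a hand-rolled check against the schema above can report every problem at once, each prefixed with the path of the offending field; a schema-validation library would be an equally valid choice. A minimal sketch (`validateConfig` is a hypothetical name):

```typescript
type Json = Record<string, unknown>;

const isObject = (v: unknown): v is Json =>
  typeof v === "object" && v !== null && !Array.isArray(v);
const isStringArray = (v: unknown): boolean =>
  Array.isArray(v) && v.every((item) => typeof item === "string");

// Hypothetical validator for DevAgentConfig: returns all errors; [] means valid.
function validateConfig(config: unknown): string[] {
  if (!isObject(config)) return ["config: expected a JSON object"];
  const errors: string[] = [];

  if (typeof config.version !== "string") {
    errors.push('version: expected a string such as "1.0"');
  }

  const repo = config.repository;
  if (!isObject(repo)) {
    errors.push("repository: expected an object");
  } else {
    if (repo.path !== undefined && typeof repo.path !== "string") {
      errors.push("repository.path: expected a string");
    }
    for (const key of ["excludePatterns", "languages"]) {
      if (repo[key] !== undefined && !isStringArray(repo[key])) {
        errors.push(`repository.${key}: expected an array of strings`);
      }
    }
  }

  const mcp = config.mcp;
  if (mcp === undefined) return errors;
  if (!isObject(mcp)) return [...errors, "mcp: expected an object"];

  const adapters = mcp.adapters;
  if (adapters === undefined) return errors;
  if (!isObject(adapters)) return [...errors, "mcp.adapters: expected an object keyed by adapter name"];

  for (const [name, adapter] of Object.entries(adapters)) {
    const where = `mcp.adapters.${name}`;
    if (!isObject(adapter)) {
      errors.push(`${where}: expected an object`);
      continue;
    }
    if (typeof adapter.enabled !== "boolean") errors.push(`${where}.enabled: expected true or false`);
    if (adapter.source !== undefined && typeof adapter.source !== "string") {
      errors.push(`${where}.source: expected a string`);
    }
    const settings = adapter.settings;
    if (settings === undefined) continue;
    if (!isObject(settings)) {
      errors.push(`${where}.settings: expected an object`);
      continue;
    }
    for (const [key, value] of Object.entries(settings)) {
      if (!["string", "number", "boolean"].includes(typeof value)) {
        errors.push(`${where}.settings.${key}: expected a string, number, or boolean`);
      }
    }
  }
  return errors;
}
```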

Environment Variable Templating

Support ${VAR_NAME} syntax in config string values, resolved from environment variables when the config is loaded:

{
  "mcp": {
    "adapters": {
      "jira": {
        "settings": {
          "apiKey": "${JIRA_API_KEY}"
        }
      }
    }
  }
}
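A minimal sketch of the templating step, assuming it runs on the parsed config object and that an unset variable should be a load error rather than silently becoming an empty string (both are design assumptions; `resolveEnvTemplates` is a hypothetical name). The environment is a parameter so tests do not depend on `process.env`:

```typescript
// Replace ${VAR_NAME} in every string of a parsed config, walking objects and arrays.
// Throws with the field path when a referenced variable is not set.
function resolveEnvTemplates<T>(value: T, env: Record<string, string | undefined> = process.env): T {
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

  const walk = (v: unknown, where: string): unknown => {
    if (typeof v === "string") {
      return v.replace(pattern, (_match, name: string) => {
        const resolved = env[name];
        if (resolved === undefined) {
          throw new Error(`${where}: environment variable ${name} is not set`);
        }
        return resolved;
      });
    }
    if (Array.isArray(v)) return v.map((item, i) => walk(item, `${where}[${i}]`));
    if (v !== null && typeof v === "object") {
      return Object.fromEntries(
        Object.entries(v).map(([key, child]) => [key, walk(child, where ? `${where}.${key}` : key)]),
      );
    }
    return v; // numbers, booleans and null pass through unchanged
  };

  return walk(value, "") as T;
}
```

Substituting after `JSON.parse` rather than in the raw file text means a secret containing quotes or backslashes cannot break the JSON.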

🧠 Memory Management

Lazy Loading

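import * as path from 'node:path';
// RepositoryIndexer, Logger and getStoragePath() are project code (not shown)

class MCPServer {
  private indexer?: RepositoryIndexer;
  private loading?: Promise<RepositoryIndexer>; // in-flight load shared by concurrent calls
  private lastAccessed = Date.now();
  private readonly IDLE_TIMEOUT = 5 * 60 * 1000; // 5 minutes

  constructor(
    private readonly repositoryPath: string,
    private readonly logger: Logger,
  ) {}

  async ensureIndexer(): Promise<RepositoryIndexer> {
    let indexer = this.indexer;

    if (!indexer) {
      // Lazy load on first use; concurrent tool calls await the same load
      this.loading ??= this.loadIndexer().finally(() => {
        this.loading = undefined;
      });
      indexer = await this.loading;
    }

    this.lastAccessed = Date.now();
    return indexer;
  }

  private async loadIndexer(): Promise<RepositoryIndexer> {
    // getStoragePath() is async (it awaits getGitRemote()), so it must be awaited here
    const storagePath = await getStoragePath(this.repositoryPath);

    const indexer = new RepositoryIndexer({
      repositoryPath: this.repositoryPath,
      vectorStorePath: path.join(storagePath, 'vectors.lance'),
    });

    await indexer.initialize();
    this.indexer = indexer; // set before the shared promise settles
    this.logger.info('Loaded indexes', { storagePath });
    return indexer;
  }

  // Auto-unload after idle period
  startIdleMonitor() {
    const timer = setInterval(() => {
      const idleTime = Date.now() - this.lastAccessed;

      if (idleTime > this.IDLE_TIMEOUT && this.indexer) {
        this.indexer.close();
        this.indexer = undefined; // the next tool call reloads lazily
        this.logger.info('Unloaded indexes (idle timeout)', {
          idleMinutes: Math.floor(idleTime / 60000)
        });
      }
    }, 60000); // Check every minute

    timer.unref(); // don't keep the process alive just for this check
  }
}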
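The lazy-load/idle-unload pattern can also be pulled out of the server into a small reusable holder, which makes the "auto-unload after 5 minutes" criterion unit-testable without waiting in real time. A minimal sketch, assuming hypothetical names (`LazyResource`, `unloadIfIdle`) and an injectable clock; the server's `setInterval` callback would call `unloadIfIdle()` instead of inspecting fields directly:

```typescript
// Generic lazy holder: loads on first get(), shares one in-flight load, unloads when idle.
class LazyResource<T> {
  private value?: T;
  private loading?: Promise<T>;
  private lastAccessed: number;

  constructor(
    private readonly load: () => Promise<T>,
    private readonly close: (value: T) => Promise<void> | void,
    private readonly idleTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {
    this.lastAccessed = now();
  }

  get loaded(): boolean {
    return this.value !== undefined;
  }

  async get(): Promise<T> {
    this.lastAccessed = this.now();
    if (this.value !== undefined) return this.value;

    // Concurrent callers await the same promise; the value is stored before it settles.
    this.loading ??= this.load()
      .then((v) => {
        this.value = v;
        return v;
      })
      .finally(() => {
        this.loading = undefined;
      });
    return this.loading;
  }

  // Call periodically; returns true if the resource was unloaded.
  async unloadIfIdle(): Promise<boolean> {
    if (this.value === undefined) return false;
    if (this.now() - this.lastAccessed <= this.idleTimeoutMs) return false;
    const value = this.value;
    this.value = undefined; // the next get() reloads lazily
    await this.close(value);
    return true;
  }
}
```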

📋 Implementation Tasks

Storage System

  • Implement getStoragePath() function (git remote → hash)
  • Create storage directory structure on first use
  • Implement metadata.json creation/updates
  • Update RepositoryIndexer to accept storage path parameter
  • Implement lazy loading in MCP server
  • Add idle timeout and auto-unload mechanism
  • Handle storage path resolution errors gracefully

Configuration System

  • Define DevAgentConfig TypeScript interface
  • Implement loadConfig() function with validation
  • Add environment variable templating (${VAR_NAME})
  • Create config file template/defaults
  • Update packages/cli/src/utils/config.ts to use new schema
  • Add config validation with helpful error messages
  • Support config file merging (defaults + user config)

Migration Path

  • Detect existing project-local indexes
  • Implement dev-agent storage migrate command
  • Move indexes to centralized location
  • Update configs to reference new storage paths
  • Clean up old local indexes (with confirmation)
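Moving an index from a project into ~/.dev-agent/indexes/ may cross filesystems (for example, a project on an external drive), where `fs.renameSync` fails with `EXDEV`. A minimal sketch of the move step, assuming a hypothetical legacy location of `<project>/.dev-agent/vectors.lance` (the issue does not say where project-local indexes live) and Node 16.7+ for `fs.cpSync`; `moveDir` and `migrateLocalIndex` are illustrative names:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Move a file or directory, falling back to copy + delete across filesystems.
function moveDir(from: string, to: string): void {
  fs.mkdirSync(path.dirname(to), { recursive: true });
  try {
    fs.renameSync(from, to);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EXDEV") throw err;
    fs.cpSync(from, to, { recursive: true });
    fs.rmSync(from, { recursive: true, force: true });
  }
}

// Hypothetical legacy layout: <project>/.dev-agent/vectors.lance.
// Returns the new location, or undefined when there is nothing to migrate.
function migrateLocalIndex(projectPath: string, storagePath: string): string | undefined {
  const legacy = path.join(projectPath, ".dev-agent", "vectors.lance");
  if (!fs.existsSync(legacy)) return undefined;

  const target = path.join(storagePath, "vectors.lance");
  if (fs.existsSync(target)) {
    throw new Error(`Refusing to overwrite existing index at ${target}`);
  }
  moveDir(legacy, target);
  return target;
}
```

A move removes the local copy as a side effect, so the command would ask for confirmation before calling it, per the cleanup task above.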

✅ Acceptance Criteria

  • Config loads from .dev-agent/config.json with validation
  • Indexes stored in ~/.dev-agent/indexes/{hash}/ based on git remote
  • metadata.json created/updated for each index
  • Lazy loading works: the indexer loads only on the first tool call
  • Auto-unload works: the indexer unloads after 5 minutes idle
  • Environment variables resolved in config (${VAR_NAME})
  • Migration command successfully moves existing indexes
  • Storage path falls back to path hash for non-git repos
  • Config validation provides helpful error messages

🧪 Testing

  • Unit tests for getStoragePath() (git remote, fallback)
  • Unit tests for config loading/validation
  • Unit tests for environment variable templating
  • Integration test for lazy loading
  • Integration test for auto-unload
  • Integration test for migration path

🔗 Dependencies

Estimate: 1-1.5 days
Priority: High (foundation for other sub-issues)

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests