This document describes the canonical v2 data model used by the code MCP server. The goal is for any developer to quickly understand:
- what is indexed (semantically, into the vector DB),
- what is returned to the AI (structured JSON),
- which tool produces which type of result.
Indexing into the vector DB is done exclusively with the CodeChunk structure
(see internal/codetypes/types.go).
// CodeChunk is the canonical v2 format for indexing/search. It represents a
// semantically meaningful piece of code (usually a function, method, type or
// interface declaration) that is stored in vector search.
type CodeChunk struct {
	// Symbol metadata
	Type     string // function | method | type | interface | file
	Name     string // Symbol name (or file base name for Type=file)
	Package  string // Package/module name
	Language string // go | php | python | typescript etc

	// Source location
	FilePath  string
	URI       string
	StartLine int
	EndLine   int

	// Selection range (for precise navigation to symbol name)
	SelectionStartLine int
	SelectionEndLine   int

	// Content
	Signature string
	Docstring string
	Code      string

	// Extra metadata
	Metadata map[string]any
}
Principles:
- CodeChunk is the only format written to the vector DB for code.
- Signature + Docstring + (sometimes) Code is the text that is embedded.
- Metadata can contain language-specific information, for example:
  - Go: type_info (serialized as JSON from golang.TypeInfo),
  - PHP/Laravel: model info (eloquent_model), populated by the Laravel analyzer for Eloquent models (table, fillable, relationships, scopes, attributes, etc.).
- Tools never read raw Metadata directly. They only access it through a layer that builds the descriptors defined below.
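As an illustration of the embedding principle above, here is a minimal sketch of how the embedded text could be assembled from a chunk. The helper name EmbeddingText, the includeCode flag and the newline separator are assumptions made for this example; the document does not specify how the parts are joined or when Code is included.

```go
package main

import (
	"fmt"
	"strings"
)

// CodeChunk is a trimmed copy holding only the fields this sketch needs.
type CodeChunk struct {
	Signature string
	Docstring string
	Code      string
}

// EmbeddingText is a hypothetical helper. It joins Signature, Docstring and,
// when includeCode is true, Code into one newline-separated string.
// Parts that are empty after trimming are skipped.
func EmbeddingText(c CodeChunk, includeCode bool) string {
	candidates := []string{c.Signature, c.Docstring}
	if includeCode {
		candidates = append(candidates, c.Code)
	}
	parts := make([]string, 0, len(candidates))
	for _, p := range candidates {
		if s := strings.TrimSpace(p); s != "" {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, "\n")
}

func main() {
	c := CodeChunk{
		Signature: "func Add(a, b int) int",
		Docstring: "Add returns the sum of a and b.",
		Code:      "func Add(a, b int) int { return a + b }",
	}
	fmt.Println(EmbeddingText(c, false))
}
```

Keeping the assembly in one function means the indexing side and any re-embedding job produce the same text for the same chunk.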
All tools that support output_format: "json" must serialize one of the
structures defined in internal/codetypes/symbol_schema.go.
type SymbolLocation struct {
	FilePath  string `json:"file_path,omitempty"`
	URI       string `json:"uri,omitempty"`
	StartLine int    `json:"start_line,omitempty"`
	EndLine   int    `json:"end_line,omitempty"`
}
Used everywhere as the location for precise navigation.
type ClassDescriptor struct {
	Language    string               `json:"language"`
	Kind        string               `json:"kind"` // class | interface | trait | struct | type | model
	Name        string               `json:"name"`
	Namespace   string               `json:"namespace,omitempty"`
	Package     string               `json:"package,omitempty"`
	FullName    string               `json:"full_name,omitempty"`
	Signature   string               `json:"signature,omitempty"`
	Description string               `json:"description,omitempty"`
	Location    SymbolLocation       `json:"location,omitempty"`
	Fields      []FieldDescriptor    `json:"fields,omitempty"`
	Methods     []FunctionDescriptor `json:"methods,omitempty"`
	Relations   []RelationDescriptor `json:"relations,omitempty"`

	// Data-model specific (ORM / Eloquent)
	Table      string            `json:"table,omitempty"`
	Fillable   []string          `json:"fillable,omitempty"`
	Hidden     []string          `json:"hidden,omitempty"`
	Visible    []string          `json:"visible,omitempty"`
	Appends    []string          `json:"appends,omitempty"`
	Casts      map[string]string `json:"casts,omitempty"`
	Scopes     []string          `json:"scopes,omitempty"`
	Attributes []string          `json:"attributes,omitempty"`

	Tags     []string       `json:"tags,omitempty"`
	Metadata map[string]any `json:"metadata,omitempty"`
}
Used by:
- find_type_definition with output_format: "json".
  - PHP: classes and Laravel models (User, Lawyer, etc.).
  - Go: types (struct/interface), enriched with Fields and Methods from TypeInfo when available.
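To make the JSON shape concrete, the following sketch serializes a trimmed ClassDescriptor with the standard encoding/json package. The struct is a reduced copy of the definition above, and the field values (file path, line numbers, table name) are invented for illustration; ToJSON is a hypothetical helper, not part of the server.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type SymbolLocation struct {
	FilePath  string `json:"file_path,omitempty"`
	URI       string `json:"uri,omitempty"`
	StartLine int    `json:"start_line,omitempty"`
	EndLine   int    `json:"end_line,omitempty"`
}

// ClassDescriptor is a reduced copy: only a few fields are kept for brevity.
type ClassDescriptor struct {
	Language  string         `json:"language"`
	Kind      string         `json:"kind"`
	Name      string         `json:"name"`
	Namespace string         `json:"namespace,omitempty"`
	Location  SymbolLocation `json:"location,omitempty"`
	Table     string         `json:"table,omitempty"`
	Fillable  []string       `json:"fillable,omitempty"`
}

// ToJSON is a hypothetical helper that returns the compact JSON encoding.
func ToJSON(d ClassDescriptor) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", fmt.Errorf("encode descriptor: %w", err)
	}
	return string(b), nil
}

func main() {
	out, err := ToJSON(ClassDescriptor{
		Language:  "php",
		Kind:      "model",
		Name:      "User",
		Namespace: `App\Models`, // illustrative value
		Location:  SymbolLocation{FilePath: "app/Models/User.php", StartLine: 10, EndLine: 42},
		Table:     "users",
		Fillable:  []string{"name", "email"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

One detail worth knowing when reading such output: with encoding/json, omitempty drops empty strings, nil slices and nil maps, but it does not drop a zero-valued struct, so a descriptor without a location still carries "location":{}.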
type FunctionDescriptor struct {
	Language    string             `json:"language"`
	Kind        string             `json:"kind"` // function | method | scope | accessor | mutator | constructor
	Name        string             `json:"name"`
	Namespace   string             `json:"namespace,omitempty"`
	Receiver    string             `json:"receiver,omitempty"`
	Signature   string             `json:"signature,omitempty"`
	Description string             `json:"description,omitempty"`
	Location    SymbolLocation     `json:"location,omitempty"`
	Parameters  []ParamDescriptor  `json:"parameters,omitempty"`
	Returns     []ReturnDescriptor `json:"returns,omitempty"`
	Visibility  string             `json:"visibility,omitempty"`
	IsStatic    bool               `json:"is_static,omitempty"`
	IsAbstract  bool               `json:"is_abstract,omitempty"`
	IsFinal     bool               `json:"is_final,omitempty"`
	Code        string             `json:"code,omitempty"`
	Tags        []string           `json:"tags,omitempty"`
	Metadata    map[string]any     `json:"metadata,omitempty"`
}
Used by:
- get_function_details with output_format: "json".
  - PHP: functions and methods, including:
    - visibility, is_static, is_abstract, is_final,
    - parameters (with types from PHPDoc / type-hints),
    - returns (including Eloquent types, e.g. BelongsToMany<App\\Role>),
    - Laravel-specific classification (kind: "scope", "accessor", "mutator" for Eloquent special methods).
  - Go: functions/methods in v2 minimal form (signature, description, code, location), extensible later with parameters/returns parsed from the AST.
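The Laravel-specific classification can be sketched with a name-based rule. This is only an assumption about how such a classifier might work, not the analyzer's actual logic: it relies on Laravel's naming conventions for local scopes (scopeXxx) and for the older accessor and mutator style (getXxxAttribute, setXxxAttribute), and it does not cover accessors defined through methods returning an Attribute object. The function name ClassifyEloquentMethod is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPrefixWord reports whether name starts with prefix immediately followed
// by an ASCII upper-case letter, e.g. "scopeActive" for prefix "scope".
func hasPrefixWord(name, prefix string) bool {
	if !strings.HasPrefix(name, prefix) {
		return false
	}
	rest := name[len(prefix):]
	return rest != "" && rest[0] >= 'A' && rest[0] <= 'Z'
}

// wrapsAttribute reports whether name looks like <prefix>Xxx"Attribute" with
// a non-empty middle part, e.g. "getFullNameAttribute" for prefix "get".
func wrapsAttribute(name, prefix string) bool {
	return hasPrefixWord(name, prefix) &&
		strings.HasSuffix(name, "Attribute") &&
		len(name) > len(prefix)+len("Attribute")
}

// ClassifyEloquentMethod is a hypothetical, name-only classifier that returns
// one of the FunctionDescriptor kinds for a PHP method on an Eloquent model.
func ClassifyEloquentMethod(name string) string {
	switch {
	case name == "__construct":
		return "constructor"
	case hasPrefixWord(name, "scope"):
		return "scope"
	case wrapsAttribute(name, "get"):
		return "accessor"
	case wrapsAttribute(name, "set"):
		return "mutator"
	default:
		return "method"
	}
}

func main() {
	for _, n := range []string{"scopeActive", "getFullNameAttribute", "setPasswordAttribute", "roles"} {
		fmt.Printf("%s -> %s\n", n, ClassifyEloquentMethod(n))
	}
}
```

Names such as getAttribute or scopes fall through to "method", because the first has no middle part and the second has no capitalized word after the prefix.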
type SymbolDescriptor struct {
	Language    string         `json:"language"`
	Kind        string         `json:"kind"` // class | interface | trait | function | method | constant | enum | file
	Name        string         `json:"name"`
	Namespace   string         `json:"namespace,omitempty"`
	Package     string         `json:"package,omitempty"`
	Signature   string         `json:"signature,omitempty"`
	Description string         `json:"description,omitempty"`
	Location    SymbolLocation `json:"location,omitempty"`
	Tags        []string       `json:"tags,omitempty"`
	Metadata    map[string]any `json:"metadata,omitempty"`
}
Used by:
- list_package_exports with output_format: "json" (Go + PHP).
- Search-oriented tools (planned) to return compact hits.
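The layer that turns an indexed chunk into a compact descriptor could look roughly like the sketch below. The function DescriptorFromChunk is hypothetical, the structs are trimmed copies, and two mappings are assumptions: CodeChunk.Type is copied unchanged into Kind, and Docstring is used as Description. A real implementation may translate values such as "type" into a language-specific kind.

```go
package main

import "fmt"

// Trimmed copies of the structures defined in this document.
type CodeChunk struct {
	Type, Name, Package, Language string
	FilePath, URI                 string
	StartLine, EndLine            int
	Signature, Docstring          string
}

type SymbolLocation struct {
	FilePath  string `json:"file_path,omitempty"`
	URI       string `json:"uri,omitempty"`
	StartLine int    `json:"start_line,omitempty"`
	EndLine   int    `json:"end_line,omitempty"`
}

type SymbolDescriptor struct {
	Language    string         `json:"language"`
	Kind        string         `json:"kind"`
	Name        string         `json:"name"`
	Package     string         `json:"package,omitempty"`
	Signature   string         `json:"signature,omitempty"`
	Description string         `json:"description,omitempty"`
	Location    SymbolLocation `json:"location,omitempty"`
}

// DescriptorFromChunk is a hypothetical conversion from the indexed form to
// the compact form returned to the AI. Raw code is deliberately left out.
func DescriptorFromChunk(c CodeChunk) SymbolDescriptor {
	return SymbolDescriptor{
		Language:    c.Language,
		Kind:        c.Type, // assumption: copied as-is, no kind translation
		Name:        c.Name,
		Package:     c.Package,
		Signature:   c.Signature,
		Description: c.Docstring, // assumption: docstring doubles as description
		Location: SymbolLocation{
			FilePath:  c.FilePath,
			URI:       c.URI,
			StartLine: c.StartLine,
			EndLine:   c.EndLine,
		},
	}
}

func main() {
	d := DescriptorFromChunk(CodeChunk{
		Type: "function", Name: "Add", Package: "mathutil", Language: "go",
		FilePath: "mathutil/add.go", StartLine: 3, EndLine: 5,
		Signature: "func Add(a, b int) int", Docstring: "Add returns the sum.",
	})
	fmt.Printf("%+v\n", d)
}
```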
find_type_definition:
- Standard input:
  - type_name (required),
  - package / namespace (optional but recommended),
  - output_format: "markdown" (default) or "json".
- Output:
  - markdown – human-friendly view, optimized for reading in a terminal.
  - json – a ClassDescriptor instance.
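The input contract above (one required field, an output format that defaults to markdown) can be sketched as a small argument parser. The type FindTypeArgs and the function ParseFindTypeArgs are hypothetical names, and treating package and namespace as two separate JSON keys is an assumption; the document only lists them as "package / namespace".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// FindTypeArgs is a hypothetical argument struct for find_type_definition.
type FindTypeArgs struct {
	TypeName     string `json:"type_name"`
	Package      string `json:"package,omitempty"`
	Namespace    string `json:"namespace,omitempty"`
	OutputFormat string `json:"output_format,omitempty"`
}

// ParseFindTypeArgs decodes raw JSON arguments, enforces the required
// type_name and applies the "markdown" default for output_format.
func ParseFindTypeArgs(raw []byte) (FindTypeArgs, error) {
	var a FindTypeArgs
	if err := json.Unmarshal(raw, &a); err != nil {
		return a, fmt.Errorf("decode arguments: %w", err)
	}
	if a.TypeName == "" {
		return a, errors.New("type_name is required")
	}
	switch a.OutputFormat {
	case "":
		a.OutputFormat = "markdown" // documented default
	case "markdown", "json":
		// accepted as-is
	default:
		return a, fmt.Errorf("unsupported output_format %q", a.OutputFormat)
	}
	return a, nil
}

func main() {
	a, err := ParseFindTypeArgs([]byte(`{"type_name":"User","namespace":"App\\Models"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", a)
}
```

Rejecting unknown formats early keeps the rest of the tool down to two rendering paths.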
get_function_details:
- Standard input:
  - function_name (required),
  - package / namespace,
  - class_name for PHP methods (implicitly derived from the chunk when possible),
  - output_format.
- Output:
  - markdown – human-friendly view.
  - json – a FunctionDescriptor instance.
list_package_exports:
- Standard input:
  - package / namespace (required),
  - symbol_type (filter; optional),
  - output_format.
- Output:
  - markdown – structured list grouped by kind (function/type/class/etc.).
  - json – []SymbolDescriptor.
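The two behaviours described for this tool, the optional symbol_type filter and the markdown list grouped by kind, can be sketched as follows. FilterByKind and RenderGrouped are hypothetical helpers, SymbolDescriptor is trimmed to two fields, and the exact markdown layout is an assumption; only the idea of grouping by kind comes from the text above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// SymbolDescriptor is trimmed to the two fields this sketch needs.
type SymbolDescriptor struct {
	Kind string
	Name string
}

// FilterByKind keeps only symbols whose Kind equals symbolType.
// An empty symbolType means "no filter" and returns the input unchanged.
func FilterByKind(symbols []SymbolDescriptor, symbolType string) []SymbolDescriptor {
	if symbolType == "" {
		return symbols
	}
	out := make([]SymbolDescriptor, 0, len(symbols))
	for _, s := range symbols {
		if s.Kind == symbolType {
			out = append(out, s)
		}
	}
	return out
}

// RenderGrouped lists symbol names under their kind. Kinds are sorted so the
// output is deterministic; names keep their input order.
func RenderGrouped(symbols []SymbolDescriptor) string {
	groups := map[string][]string{}
	for _, s := range symbols {
		groups[s.Kind] = append(groups[s.Kind], s.Name)
	}
	kinds := make([]string, 0, len(groups))
	for k := range groups {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)

	var b strings.Builder
	for _, k := range kinds {
		fmt.Fprintf(&b, "%s:\n", k)
		for _, name := range groups[k] {
			fmt.Fprintf(&b, "- %s\n", name)
		}
	}
	return b.String()
}

func main() {
	symbols := []SymbolDescriptor{
		{Kind: "function", Name: "Add"},
		{Kind: "type", Name: "CodeChunk"},
		{Kind: "function", Name: "Sub"},
	}
	fmt.Print(RenderGrouped(FilterByKind(symbols, "")))
}
```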
- Semantic search (recall):
  - operates on CodeChunk + embeddings,
  - tools like hybrid_search / search_code should return: []SymbolDescriptor + small snippets of code.
- Structural / analytic (reasoning):
  - operates on the already-selected chunk,
  - for a specific symbol, the recommended tools are:
    - find_type_definition(json) → ClassDescriptor,
    - get_function_details(json) → FunctionDescriptor,
    - list_package_exports(json) → []SymbolDescriptor.
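The hand-off from the recall phase to the reasoning phase can be illustrated with a small routing function: given the kind of a search hit, pick the tool that returns the detailed descriptor. DetailToolFor is a hypothetical helper. The kind lists are taken from the Kind comments on ClassDescriptor and FunctionDescriptor; treating every other kind (constant, enum, file) as having no dedicated detail tool is an assumption.

```go
package main

import "fmt"

// DetailToolFor returns the name of the tool recommended for inspecting a
// symbol of the given kind, and false when no dedicated tool applies.
func DetailToolFor(kind string) (string, bool) {
	switch kind {
	case "class", "interface", "trait", "struct", "type", "model":
		return "find_type_definition", true // yields a ClassDescriptor
	case "function", "method", "scope", "accessor", "mutator", "constructor":
		return "get_function_details", true // yields a FunctionDescriptor
	default:
		return "", false
	}
}

func main() {
	for _, kind := range []string{"model", "method", "constant"} {
		if tool, ok := DetailToolFor(kind); ok {
			fmt.Printf("%s -> %s\n", kind, tool)
		} else {
			fmt.Printf("%s -> no dedicated detail tool\n", kind)
		}
	}
}
```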
This way, the AI uses very few tokens on raw code text and instead has a clear, standardized map of symbols via the schema above.