A Go library for parsing study guides into structured Abstract Syntax Trees (ASTs). It currently supports multiple formats, including college study guides, AP exams, certifications, and more.
The underlying parsing pipeline is robust, but driving it directly requires significant manual work from users to handle format conversions and pipeline orchestration:
- Multiple incompatible token formats: Users must manually convert between lexer.LineInfo, preparser.ParsedLineInfo, and different token types
- Heavy boilerplate: Users need to write conversion functions like convertTokenType() and convertParsedValue()
- Complex pipeline management: Users must manually orchestrate the lexer → scanner → parser flow
- Poor type safety: Heavy use of interface{} makes it hard to work with parsed values
// Current heavy lifting required
tokens := getScannerOutput()
var lines []preparser.ParsedLineInfo
for _, token := range tokens {
	parsedLine := preparser.ParsedLineInfo{
		Number:      token.Number,
		Text:        token.Text,
		Type:        convertTokenType(token.Type),
		ParsedValue: convertParsedValue(token.ParsedValue, token.Type),
	}
	lines = append(lines, parsedLine)
}
// Use a distinct variable name so the parser package is not shadowed
p := parser.NewParser(lines)
ast, err := p.Parse(parserType)
We've successfully implemented the simple API! Users can now parse study guides with just one function call instead of the previous 20+ lines of boilerplate.
New Simple Usage:
// Super simple - just one function call
ast, err := processor.ParseFile("study_guide.txt", config.NewMetadata("colleges"))
if err != nil {
	log.Fatal(err)
}
// Or from strings (ast and err are already declared above, so assign with =)
lines := []string{
	"Mathematics Study Guide",
	"Colleges: Virginia: ODU: MATH 101: Linear Equations",
	"1. What is x? - A variable",
}
ast, err = processor.Parse(lines, config.NewMetadata("colleges"))
Available Functions:
- processor.ParseFile(filename, metadata) - Parse directly from a file
- processor.Parse(lines, metadata) - Parse from string slices
- processor.Preparse(lines, metadata) - Preprocess and tokenize
- processor.Lex(lines, metadata) - Lexical analysis only
Supported Parser Types:
- config.NewMetadata("colleges") - College study guides
- config.NewMetadata("ap_exams") - AP exam study guides
- config.NewMetadata("certifications") - Certification study guides
- config.NewMetadata("dod") - Department of Defense study guides
- config.NewMetadata("entrance_exams") - Entrance exam study guides
We're working to make this library even better. The remaining phases will:
// Super simple - just one function call
ast, err := processor.ParseFile("study_guide.txt", processor.Colleges)
if err != nil {
	log.Fatal(err)
}
// Output as JSON
jsonData, _ := json.MarshalIndent(ast, "", "  ")
fmt.Println(string(jsonData))
Create high-level functions that handle everything internally:
// In core/processor/processor.go
func ParseFile(filename string, parserType ParserType) (*AbstractSyntaxTree, error)
func ParseLines(lines []string, parserType ParserType) (*AbstractSyntaxTree, error)
func ParseTokens(tokens []Token, parserType ParserType) (*AbstractSyntaxTree, error)
Benefits:
- 90% reduction in boilerplate code
- Single function call for most use cases
- Automatic internal conversions
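To illustrate how a file-based entry point could sit on top of a line-based one, here is a minimal sketch. It is not the library's implementation: `ParserType` and `AbstractSyntaxTree` are simplified placeholders, and `ParseLines` is a stub standing in for the real lexer, preparser, and parser stages.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// ParserType and AbstractSyntaxTree are simplified placeholders for the
// library's real types (assumption: the real AST is much richer).
type ParserType string

type AbstractSyntaxTree struct {
	ParserType ParserType
	Lines      []string
}

// ParseLines is a stub for the full pipeline. It only records its input.
func ParseLines(lines []string, parserType ParserType) (*AbstractSyntaxTree, error) {
	if len(lines) == 0 {
		return nil, fmt.Errorf("no input lines")
	}
	return &AbstractSyntaxTree{ParserType: parserType, Lines: lines}, nil
}

// ParseFile reads the file line by line and delegates to ParseLines,
// so callers never touch the intermediate formats.
func ParseFile(filename string, parserType ParserType) (*AbstractSyntaxTree, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", filename, err)
	}
	defer f.Close()

	var lines []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		return nil, fmt.Errorf("read %s: %w", filename, err)
	}
	return ParseLines(lines, parserType)
}

func main() {
	ast, err := ParseLines([]string{"Study Guide: Mathematics", "1. What is x?"}, "colleges")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(ast.Lines), ast.ParserType)
}
```

Keeping the file handling in one thin wrapper means every other entry point can share the same line-based code path.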
Create a unified token format that works across all packages:
// In core/types/token.go
type Token struct {
	Number      int         `json:"number"`
	Text        string      `json:"text"`
	Type        TokenType   `json:"type"`
	ParsedValue ParsedValue `json:"parsed_value,omitempty"`
}

type TokenType string

const (
	TokenTypeFileHeader TokenType = "file_header"
	TokenTypeHeader     TokenType = "header"
	TokenTypeQuestion   TokenType = "question"
	TokenTypeContent    TokenType = "content"
	// ... etc
)
Benefits:
- No more manual token type conversions
- Consistent format across all packages
- Better JSON serialization
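The JSON benefit can be seen in a small round-trip sketch. It copies the proposed `Token` shape but leaves out `ParsedValue` to stay short; `encodeToken` and `decodeToken` are illustrative helper names, not part of the library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type TokenType string

const (
	TokenTypeHeader   TokenType = "header"
	TokenTypeQuestion TokenType = "question"
)

// Token mirrors the proposed struct, minus ParsedValue (omitted for brevity).
type Token struct {
	Number int       `json:"number"`
	Text   string    `json:"text"`
	Type   TokenType `json:"type"`
}

// encodeToken serializes a token; the string-based TokenType is written
// as a readable name rather than an opaque integer.
func encodeToken(t Token) (string, error) {
	b, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeToken parses the JSON produced by encodeToken.
func decodeToken(s string) (Token, error) {
	var t Token
	err := json.Unmarshal([]byte(s), &t)
	return t, err
}

func main() {
	s, err := encodeToken(Token{Number: 1, Text: "1. What is x?", Type: TokenTypeQuestion})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // {"number":1,"text":"1. What is x?","type":"question"}

	back, _ := decodeToken(s)
	fmt.Println(back.Type == TokenTypeQuestion) // true
}
```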
Replace interface{} with concrete types:
type ParsedValue struct {
	FileHeader *FileHeaderResult `json:"file_header,omitempty"`
	Header     *HeaderResult     `json:"header,omitempty"`
	Question   *QuestionResult   `json:"question,omitempty"`
	Comment    *CommentResult    `json:"comment,omitempty"`
	Passage    *PassageResult    `json:"passage,omitempty"`
	LearnMore  *LearnMoreResult  `json:"learn_more,omitempty"`
	Content    *ContentResult    `json:"content,omitempty"`
	Empty      *EmptyLineResult  `json:"empty,omitempty"`
	Binary     *BinaryResult     `json:"binary,omitempty"`
}
Benefits:
- Better type safety
- Easier to work with parsed values
- Better IDE support and autocomplete
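As a rough sketch of what working with the typed value could look like, the example below checks which pointer is set instead of type-asserting an interface{}. Only two variants are shown, and the fields of `HeaderResult` and `QuestionResult` are assumptions made for illustration; the real result types may differ.

```go
package main

import "fmt"

// Hypothetical result types: the field names are assumptions.
type HeaderResult struct{ Title string }

type QuestionResult struct{ Question, Answer string }

// ParsedValue is a two-variant subset of the proposed struct.
type ParsedValue struct {
	Header   *HeaderResult
	Question *QuestionResult
}

// describe inspects whichever variant is populated. The compiler checks
// every field access, which interface{} values cannot offer.
func describe(v ParsedValue) string {
	switch {
	case v.Header != nil:
		return "header: " + v.Header.Title
	case v.Question != nil:
		return "question: " + v.Question.Question + " = " + v.Question.Answer
	default:
		return "empty"
	}
}

func main() {
	fmt.Println(describe(ParsedValue{Header: &HeaderResult{Title: "Algebra"}}))
	fmt.Println(describe(ParsedValue{Question: &QuestionResult{Question: "What is x?", Answer: "A variable"}}))
	fmt.Println(describe(ParsedValue{}))
}
```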
- Builder pattern for advanced configuration
- Better error handling with context and line numbers
- CLI tool for testing and validation
- JSON helpers for easy serialization
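For the error-handling item, one possible shape is an error type that carries the line number and text alongside the underlying cause. This is only a sketch: `ParseError`, `wrapLineError`, and `errMalformedQuestion` are hypothetical names, not existing library API.

```go
package main

import (
	"errors"
	"fmt"
)

// ParseError is a hypothetical error type that records where parsing failed.
type ParseError struct {
	Line int    // 1-based line number in the input
	Text string // the offending line
	Err  error  // underlying cause
}

func (e *ParseError) Error() string {
	return fmt.Sprintf("line %d (%q): %v", e.Line, e.Text, e.Err)
}

// Unwrap lets errors.Is and errors.As reach the underlying cause.
func (e *ParseError) Unwrap() error { return e.Err }

// errMalformedQuestion is an illustrative sentinel, not a library error.
var errMalformedQuestion = errors.New("malformed question")

// wrapLineError attaches line context to err; it returns nil when err is nil.
func wrapLineError(line int, text string, err error) error {
	if err == nil {
		return nil
	}
	return &ParseError{Line: line, Text: text, Err: err}
}

func main() {
	err := wrapLineError(12, "2. How do", errMalformedQuestion)

	var pe *ParseError
	if errors.As(err, &pe) {
		fmt.Println("failed at line", pe.Line)
	}
	fmt.Println(errors.Is(err, errMalformedQuestion))
	fmt.Println(err)
}
```

Because the type implements `Unwrap`, callers can still match on the underlying cause while reporting the exact line to the user.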
type ParserType string

const (
	Colleges       ParserType = "colleges"
	APExams        ParserType = "ap_exams"
	Certifications ParserType = "certifications"
	DOD            ParserType = "dod"
	EntranceExams  ParserType = "entrance_exams"
)
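Since `ParserType` is a plain string type, any string can be converted to it, so input from a CLI flag or HTTP request would need checking. The sketch below shows one way to do that; `ParseParserType` is a hypothetical helper, not a function the library is known to provide.

```go
package main

import "fmt"

type ParserType string

const (
	Colleges       ParserType = "colleges"
	APExams        ParserType = "ap_exams"
	Certifications ParserType = "certifications"
	DOD            ParserType = "dod"
	EntranceExams  ParserType = "entrance_exams"
)

// ParseParserType converts user input into a ParserType and rejects
// anything that is not one of the declared constants.
func ParseParserType(s string) (ParserType, error) {
	switch pt := ParserType(s); pt {
	case Colleges, APExams, Certifications, DOD, EntranceExams:
		return pt, nil
	default:
		return "", fmt.Errorf("unknown parser type %q", s)
	}
}

func main() {
	pt, err := ParseParserType("ap_exams")
	fmt.Println(pt, err)

	_, err = ParseParserType("sat")
	fmt.Println(err)
}
```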
Study Guide: Mathematics
Subject: Algebra
Topic: Linear Equations
1. What is a linear equation?
A linear equation is an equation where the highest power of the variable is 1.
2. How do you solve 2x + 3 = 7?
Subtract 3 from both sides: 2x = 4
Divide both sides by 2: x = 2
Learn More: See Khan Academy's linear equations course.
core/
├── lexer/        # Token classification
├── preparser/    # Token parsing and value extraction
├── parser/       # AST construction
├── cleanstring/  # Text cleaning utilities
├── constants/    # Shared constants
├── regexes/      # Regular expression patterns
└── utils/        # Utility functions
- Go 1.20 or higher
go build ./...
go test ./...
For easy testing and development, a web server is included:
# Option 1: Using make
make server
# Option 2: Using the script
./scripts/dev-server.sh
# Option 3: Direct command
go run cmd/server/main.go
The server will start on http://localhost:8000 and provides:
- Web interface for testing parsing
- Pre-loaded examples for each parser type
- Real-time parsing results
- API endpoints for programmatic access
# Process a study guide file
go run examples/basic_usage.go input.txt
# Validate a file
go run examples/validate.go input.txt
We welcome contributions! Please see our Contributing Guidelines for details.
- Phase 1: Implement simple API functions
- Phase 2: Unify token types across packages
- Phase 3: Improve type safety
- Phase 4: Add convenience features
[Add your license information here]
For questions, issues, or contributions, please:
- Check the Issues page
- Create a new issue with a clear description
- Include example input and expected output
Note: This library is actively being improved. One-call parsing is available through the processor package; the unified token format, typed parsed values, and convenience features described above are planned for later phases.