Merged
25 changes: 0 additions & 25 deletions .github/workflows/TestingCI.yml
@@ -20,28 +20,3 @@ jobs:
run: cargo test --release --verbose
- name: Run fmt check
run: cargo fmt --all -- --check

macos-homebrew:
runs-on: macos-latest
steps:
- uses: actions/checkout@v4
- uses: sfackler/actions/rustup@master
- run: echo "version=$(rustc --version)" >> $GITHUB_OUTPUT
id: rust-version
- uses: actions/cache@v4
with:
path: ~/.cargo/registry/index
key: index-${{ runner.os }}-${{ github.run_number }}
restore-keys: |
index-${{ runner.os }}-
- run: cargo generate-lockfile
- uses: actions/cache@v4
with:
path: ~/.cargo/registry/cache
key: registry-${{ runner.os }}-${{ steps.rust-version.outputs.version }}-${{ hashFiles('Cargo.lock') }}
- name: Fetch
run: cargo fetch
- name: Build
run: cargo build --release --verbose
- name: Run tests
run: cargo test --release --verbose
66 changes: 66 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,66 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Build and Test Commands

```bash
# Build
cargo build --release

# Run all tests
cargo test

# Run a single test by name
cargo test test_name

# Run tests in a specific module
cargo test arithmetic::
cargo test control::
cargo test strings::

# Compile a BASIC program
cargo run -- program.bas # Output: ./program
cargo run -- program.bas -o out # Custom output name
cargo run -- -S program.bas # Emit assembly only (no linking)
```

## Architecture

xbasic64 is a BASIC-to-x86_64 native code compiler with a direct AST-to-assembly pipeline (no IR):

```
Source → Lexer → Parser → CodeGen → Assembly → Executable
(tokens) (AST) (x86-64)
```

### Source Files (`src/`)

- **lexer.rs** - Tokenizer handling case-insensitive keywords, line numbers, type suffixes (`%`, `&`, `!`, `#`, `$`), and BASIC literals
- **parser.rs** - Recursive descent parser producing an AST; handles expression precedence via Pratt parsing
- **codegen.rs** - Direct AST-to-x86-64 assembly translation using System V AMD64 ABI
- **runtime.rs** - Hand-written x86-64 assembly runtime library (I/O, strings, math) using libc
- **main.rs** - CLI driver: reads source, runs pipeline, shells out to `as` and `cc` for linking

### Test Structure (`tests/`)

Integration tests organized by feature area:
- `common/mod.rs` - Test harness with `compile_and_run()` helper that compiles BASIC source and captures output
- Feature modules: `arithmetic/`, `arrays/`, `control/`, `data/`, `file_io/`, `input/`, `math/`, `print/`, `procedures/`, `strings/`, `types/`, `variables/`

### Key Design Decisions

- **No IR**: AST compiles directly to assembly for simplicity
- **System V AMD64 ABI**: Enables libc interoperability for I/O and math
- **GW-BASIC semantics**: Division (`/`) always returns Double; integer division uses `\`
- **Default type is Double**: Unsuffixed numeric variables are `#` (Double), not Single
- **Boolean -1/0**: Comparisons return -1 (true) or 0 (false) for bitwise compatibility

## Language Reference

See [LANGREF.md](LANGREF.md) for the supported BASIC dialect. Key points:
- Types: INTEGER (`%`), LONG (`&`), SINGLE (`!`), DOUBLE (`#`), STRING (`$`)
- Control flow: IF/THEN/ELSE, FOR/NEXT, WHILE/WEND, DO/LOOP, SELECT CASE, GOTO/GOSUB
- Procedures: SUB and FUNCTION with recursion (parameters are by-value only)
- File I/O: OPEN FOR INPUT/OUTPUT/APPEND, PRINT #, INPUT #, LINE INPUT #, CLOSE
- String indexing is 1-based (MID$, INSTR); array indexing is 0-based
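
The "Boolean -1/0" design decision listed in CLAUDE.md can be illustrated with a small sketch. The helper names below (`basic_bool`, `basic_not`, `basic_and`, `basic_or`) are hypothetical and do not come from the repository; they only show why an all-bits-set truth value lets the bitwise operators double as logical ones.

```rust
/// Hypothetical helper: encode a Rust bool using the BASIC -1/0 convention.
fn basic_bool(b: bool) -> i16 {
    if b { -1 } else { 0 }
}

/// Bitwise complement: maps -1 (all bits set) to 0 and 0 to -1.
fn basic_not(x: i16) -> i16 {
    !x
}

/// Bitwise AND: acts as logical AND when both inputs are -1 or 0.
fn basic_and(a: i16, b: i16) -> i16 {
    a & b
}

/// Bitwise OR: acts as logical OR when both inputs are -1 or 0.
fn basic_or(a: i16, b: i16) -> i16 {
    a | b
}
```

If true were encoded as 1 instead, `basic_not(1)` would yield -2, which is still nonzero and would therefore still test as true. That is the compatibility problem the -1 encoding avoids.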
2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ Save as `fib.bas`, compile with `xbasic64 fib.bas`, and run `./fib`.
## Documentation

- **[Language Reference](LANGREF.md)** - Complete guide to the supported BASIC dialect
- **[Design Specification](basic_compiler_design_spec.md)** - Internal compiler architecture and design decisions
- **[Design Specification](design.md)** - Internal compiler architecture and design decisions

## Architecture

File renamed without changes.
49 changes: 38 additions & 11 deletions src/codegen.rs
@@ -85,9 +85,9 @@
//! This allows efficient substring operations without copying.
//!
//! - String values: `rax` = pointer to characters, `rdx` = length
//! - String variables: Two consecutive 8-byte slots at `[rbp - offset]` (ptr) and
//! `[rbp - offset - 8]` (len), but we use `[rbp - offset]` as the base offset
//! and the runtime expects ptr in first position when loading
//! - String variables: Two consecutive 8-byte slots at `[rbp + offset]` (ptr) and
//! `[rbp + offset - 8]` (len), where offset is negative (e.g., -8, -16).
//! The ptr is at higher address, len at lower address (stack grows downward).
//!
//! String literals are emitted in the `.data` section with labels `_str_N`.
//!
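
The string-variable layout described in the updated doc comment (pointer at `[rbp + offset]`, length eight bytes below it, with a negative `offset`) can be sketched as follows. The function `string_slot_operands` is a hypothetical helper written for illustration and is not part of `codegen.rs`; the operand format mirrors the `format!` calls visible later in this diff.

```rust
/// Hypothetical helper: build the memory operands for a string variable.
/// `offset` is the (negative) rbp-relative offset of the pointer slot.
/// Returns (pointer operand, length operand).
fn string_slot_operands(offset: i32) -> (String, String) {
    // The pointer lives at the higher address.
    let ptr = format!("QWORD PTR [rbp + {}]", offset);
    // The length lives 8 bytes lower, since the stack grows downward.
    let len = format!("QWORD PTR [rbp + {}]", offset - 8);
    (ptr, len)
}
```

For a variable whose pointer slot is at offset -16, the length slot is therefore at offset -24.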
@@ -558,8 +558,9 @@ impl CodeGen {
self.emit(" push rbp");
self.emit(" mov rbp, rsp");

// Reserve stack space FIRST (before storing params)
self.emit(" sub rsp, 64 # local vars");
// Reserve stack space (will patch later with actual size)
let placeholder = format!(" sub rsp, 0 # STACK_RESERVE_PROC_{}", name);
self.emit(&placeholder);

// Parameters are passed in registers (System V ABI)
// Store them in the reserved stack space
@@ -600,20 +601,46 @@ impl CodeGen {
self.gen_stmt(stmt);
}

// Return
// Return - load return value into appropriate register based on type
if is_function {
let ret_info = &self.proc_vars[name];
// For now, return all values via xmm0 (will be type-aware later)
self.emit(&format!(
" movsd xmm0, QWORD PTR [rbp + {}]",
ret_info.offset
));
let offset = ret_info.offset;
let data_type = ret_info.data_type;
match data_type {
DataType::Integer => {
self.emit(&format!(" movsx eax, WORD PTR [rbp + {}]", offset));
}
DataType::Long => {
self.emit(&format!(" mov eax, DWORD PTR [rbp + {}]", offset));
}
DataType::Single => {
self.emit(&format!(" movss xmm0, DWORD PTR [rbp + {}]", offset));
}
DataType::Double => {
self.emit(&format!(" movsd xmm0, QWORD PTR [rbp + {}]", offset));
}
DataType::String => {
// Load string (ptr, len) into rax, rdx
self.emit(&format!(" mov rax, QWORD PTR [rbp + {}]", offset));
self.emit(&format!(" mov rdx, QWORD PTR [rbp + {}]", offset - 8));
}
}
}

self.emit(" leave");
self.emit(" ret");
self.emit("");

// Patch the stack reserve placeholder with actual size
let stack_needed = -self.stack_offset;
let stack_size = (stack_needed + 15) & !15; // Round up to multiple of 16
let old_placeholder = format!(" sub rsp, 0 # STACK_RESERVE_PROC_{}", name);
let new_instruction = format!(
" sub rsp, {} # STACK_RESERVE_PROC_{}",
stack_size, name
);
self.output = self.output.replace(&old_placeholder, &new_instruction);

self.current_proc = None;
self.stack_offset = old_stack_offset;
}
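
The placeholder-and-patch approach in the `codegen.rs` changes above can be sketched in isolation. This is a minimal stand-alone version under stated assumptions: `align16` and `patch_stack_reserve` are hypothetical free functions, and the exact whitespace of the emitted line is an assumption, since the diff view collapses it.

```rust
/// Round a byte count up to the next multiple of 16.
fn align16(n: i32) -> i32 {
    (n + 15) & !15
}

/// Hypothetical stand-alone version of the patch step.
/// `stack_offset` is the final (negative) rbp-relative offset reached
/// while generating the procedure body.
fn patch_stack_reserve(output: &str, name: &str, stack_offset: i32) -> String {
    let stack_size = align16(-stack_offset);
    // Placeholder emitted in the prologue before the body size was known.
    let old = format!("    sub rsp, 0  # STACK_RESERVE_PROC_{}", name);
    // Same line with the real, 16-byte-aligned reservation.
    let new = format!("    sub rsp, {}  # STACK_RESERVE_PROC_{}", stack_size, name);
    output.replace(&old, &new)
}
```

One point that may be worth checking: `str::replace` rewrites every occurrence of the pattern, so a procedure whose name is a prefix of another procedure's name could in principle match that other procedure's placeholder if it is still unpatched or legitimately reserves 0 bytes.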
20 changes: 15 additions & 5 deletions src/parser.rs
@@ -4,6 +4,7 @@
// SPDX-License-Identifier: MIT

use crate::lexer::Token;
use std::collections::HashSet;

/// Binary operator precedence levels (higher = tighter binding)
/// Returns (precedence, BinaryOp) or None if not a binary operator
@@ -171,7 +172,6 @@ pub enum GotoTarget {
pub enum Expr {
Literal(Literal),
Variable(String),
#[allow(dead_code)] // Part of AST, will be used when multi-dimensional arrays are implemented
ArrayAccess {
name: String,
indices: Vec<Expr>,
@@ -266,6 +266,8 @@ pub struct Parser {
last_loop_is_until: bool,
/// Stores condition from ELSEIF for nested IF construction
last_elseif_condition: Option<Expr>,
/// Tracks declared array names for distinguishing array access from function calls
declared_arrays: HashSet<String>,
}

impl Parser {
@@ -1004,6 +1006,9 @@ impl Parser {
let dimensions = self.parse_expr_list()?;
self.expect(Token::RParen)?;

// Track this array name for later use in parse_primary
self.declared_arrays.insert(name.to_uppercase());

arrays.push(ArrayDecl { name, dimensions });

if matches!(self.peek(), Token::Comma) {
@@ -1292,10 +1297,15 @@ impl Parser {
let args = self.parse_expr_list()?;
self.expect(Token::RParen)?;

// Could be array access or function call
// We'll treat everything as function call for now
// and distinguish during codegen based on known functions
Ok(Expr::FnCall { name, args })
// Distinguish array access from function call based on DIM declarations
if self.declared_arrays.contains(&name.to_uppercase()) {
Ok(Expr::ArrayAccess {
name,
indices: args,
})
} else {
Ok(Expr::FnCall { name, args })
}
} else {
Ok(Expr::Variable(name))
}
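
The array-versus-function disambiguation added to `parser.rs` above can be sketched with a reduced model. The `Names` struct, its methods, and the simplified `Expr` with integer arguments are hypothetical stand-ins for the real parser types; only the `HashSet` of upper-cased names and the lookup follow the diff.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the parser's expression type.
#[derive(Debug, PartialEq)]
enum Expr {
    ArrayAccess { name: String, indices: Vec<i64> },
    FnCall { name: String, args: Vec<i64> },
}

/// Hypothetical holder for the set of names seen in DIM statements.
struct Names {
    declared_arrays: HashSet<String>,
}

impl Names {
    fn new() -> Self {
        Names { declared_arrays: HashSet::new() }
    }

    /// Record a DIM'd array; names are stored upper-cased so that
    /// lookups are case-insensitive.
    fn declare_array(&mut self, name: &str) {
        self.declared_arrays.insert(name.to_uppercase());
    }

    /// Classify `name(args)` as an array access if the name was DIM'd,
    /// otherwise as a function call.
    fn classify(&self, name: &str, args: Vec<i64>) -> Expr {
        if self.declared_arrays.contains(&name.to_uppercase()) {
            Expr::ArrayAccess { name: name.to_string(), indices: args }
        } else {
            Expr::FnCall { name: name.to_string(), args }
        }
    }
}
```

Because the set is filled while parsing, this scheme presumably relies on the `DIM` statement appearing in the source before the first use of the array; a use that precedes its `DIM` would be classified as a function call.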
13 changes: 8 additions & 5 deletions src/runtime/string.s
@@ -245,6 +245,7 @@ _rt_instr:
push r13
push r14
push r15
sub rsp, 8 # Align stack for calls (return address + 6 pushes = 56 bytes; +8 restores 16-byte alignment)
# Move arguments to callee-saved registers
mov r12, rdi # haystack ptr
mov r13, rsi # haystack len
@@ -286,7 +287,8 @@ _rt_instr:
.Linstr_not_found:
xor rax, rax # return 0
.Linstr_done:
# Restore callee-saved registers
# Restore stack and callee-saved registers
add rsp, 8 # Restore stack alignment
pop r15
pop r14
pop r13
@@ -328,6 +330,7 @@ _rt_strcat:
push r13
push r14
push r15
sub rsp, 16 # Allocate aligned space for temp storage

# Save arguments in callee-saved registers
mov r12, rdi # left ptr
@@ -341,28 +344,28 @@ _rt_strcat:
call {libc}malloc # returns ptr in rax

# Copy left string: memcpy(result, left, left_len)
mov QWORD PTR [rsp], rax # save result ptr (aligned)
mov rdi, rax # dest = malloc result
mov rsi, r12 # src = left ptr
mov rdx, r13 # len = left len
push rax # save result ptr
call {libc}memcpy

# Copy right string: memcpy(result + left_len, right, right_len)
pop rdi # result ptr
push rdi # save again
mov rdi, QWORD PTR [rsp] # restore result ptr
add rdi, r13 # dest = result + left_len
mov rsi, r14 # src = right ptr
mov rdx, r15 # len = right len
call {libc}memcpy

# Null terminate (for safety)
pop rax # result ptr
mov rax, QWORD PTR [rsp] # restore result ptr
lea rcx, [r13 + r15] # total length
mov BYTE PTR [rax + rcx], 0

# Return: rax = ptr (already set), rdx = total length
mov rdx, rcx

add rsp, 16 # Deallocate temp storage
pop r15
pop r14
pop r13
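
The alignment arithmetic behind the `sub rsp, 8` added to `_rt_instr` can be checked with a small sketch. It assumes the System V AMD64 rule that `rsp` is 16-byte aligned immediately before a `call`, so the pushed return address leaves it 8 bytes off on function entry. The function `rsp_misalignment` is a hypothetical helper, not part of the runtime.

```rust
/// Hypothetical helper: how far rsp is from 16-byte alignment (0 = aligned)
/// after the prologue, counting the 8-byte return address pushed by `call`,
/// `pushes` 8-byte register pushes, and `extra_sub` bytes subtracted from rsp.
fn rsp_misalignment(pushes: u64, extra_sub: u64) -> u64 {
    let bytes_used = 8 + 8 * pushes + extra_sub;
    bytes_used % 16
}
```

With six pushes and no adjustment the result is 8, so any nested call to libc would run with a misaligned stack; adding the 8-byte adjustment brings the result to 0.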