Add a lexer for untokenised BBC BASIC files #1280

Merged on Aug 2, 2019 (21 commits)
21 commits
23a6c68
Add a lexer for untokenised BBC BASIC files
bavison Jan 14, 2019
f559c65
[bbcbasic] only colour the leading * of inline CLI command as Generic…
bavison Jul 30, 2019
c3eefd0
[bbcbasic] use correct method for control keywords
bavison Jul 30, 2019
b8b3aeb
[bbcbasic] remove unnecessary escapes within character ranges
bavison Jul 30, 2019
96c5780
[bbcbasic] use dedicated token for binary numbers
bavison Jul 30, 2019
1629f21
[bbcbasic] deduplicate some rules between :root and :assembly2 using …
bavison Jul 31, 2019
8662327
[bbcbasic] where one keyword is a substring of another, list longer o…
bavison Jul 31, 2019
bae4a5f
[bbcbasic] exercise more rules in visual spec
bavison Jul 31, 2019
eb0390b
[bbcbasic] colour CLI command introducer as Keyword
bavison Jul 31, 2019
db27bd2
[bbcbasic] imperative ERROR keyword needs to be captured at higher pr…
bavison Jul 31, 2019
918e1ec
[bbcbasic] attempt to reduce indcidences of * operator matching CLI c…
bavison Jul 31, 2019
46171d4
[bbcbasic] simplify expression states
bavison Aug 1, 2019
a8861a1
[bbcbasic] fix `*` operators being misidentified as inline commands
bavison Aug 1, 2019
b030693
[bbcbasic] add `o` modifiers to all regexps
bavison Aug 1, 2019
40f57d3
[bbcbasic] stricter checking of control flow statements
bavison Aug 1, 2019
33d500f
[bbcbasic] further simplification
bavison Aug 1, 2019
50f3d0f
[bbcbasic] improvements to handling of `PROC`
bavison Aug 1, 2019
3d84b20
[bbcbasic] treat `FN` as a built-in function
bavison Aug 1, 2019
cce0cf4
[bbcbasic] use a different approach for detecting CLI commands
bavison Aug 1, 2019
62819fd
[bbcbasic] handle multiple or newlines leading up to a command
bavison Aug 1, 2019
d70c214
[bbcbasic] fix whitespace requirement after keyword
bavison Aug 1, 2019
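
Several of the commits above deal with keywords where one is a substring of another. Ruby tries regex alternatives from left to right, so a shorter keyword listed first hides the longer ones. The sketch below illustrates the effect on a hypothetical three-keyword subset. It uses `Regexp.union` for escaping instead of the lexer's pre-escaped `%w` lists with `join('|')`; that substitution, and the names `KEYWORDS`, `SHORT_FIRST`, `LONG_FIRST` and `first_match`, are assumptions made for illustration.

```ruby
# Hypothetical keyword subset; Regexp.union escapes `$` and `#` for us.
KEYWORDS = ['GET', 'GET$', 'GET$#'].freeze

# Alternatives are tried left to right, so the shortest keyword wins here.
SHORT_FIRST = Regexp.union(KEYWORDS)

# Listing longer keywords first lets the full keyword match.
LONG_FIRST = Regexp.union(KEYWORDS.sort_by { |word| -word.length })

# Return the first substring of `source` matched by `pattern`, or nil.
def first_match(pattern, source)
  source[pattern]
end

puts first_match(SHORT_FIRST, 'GET$#channel%')
puts first_match(LONG_FIRST, 'GET$#channel%')
```

With the short-first pattern only `GET` is consumed and the `$#` suffix is left for later rules, which is the kind of mis-tokenisation that reordering the lists avoids.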
6 changes: 6 additions & 0 deletions lib/rouge/demos/bbcbasic
@@ -0,0 +1,6 @@
REM > DefaultFilename
REM Ordinary comment
FOR n=1 TO 10
PRINTTAB(n)"Hello there ";FNnumber(n)DIV3+1
NEXT:END
DEFFNnumber(x%)=ABS(x%-4)
115 changes: 115 additions & 0 deletions lib/rouge/lexers/bbcbasic.rb
@@ -0,0 +1,115 @@
# -*- coding: utf-8 -*- #
# frozen_string_literal: true

module Rouge
module Lexers
class BBCBASIC < RegexLexer
title "BBCBASIC"
desc "BBC BASIC syntax"
tag 'bbcbasic'
filenames '*,fd1'

def self.punctuation
@punctuation ||= %w(
[:,;'~] SPC TAB
)
end

def self.function
@function ||= %w(
ABS ACS ADVAL ASC ASN ATN BEATS BEAT BGET# CHR\$ COS COUNT DEG EOF#
ERL ERROR ERR EVAL EXP EXT# GET\$# GET\$ GET HIMEM INKEY\$ INKEY
INSTR INT LEFT\$ LEN LN LOG LOMEM MID\$ OPENIN OPENOUT OPENUP PAGE
POS PTR# RAD REPORT\$ RIGHT\$ RND SGN SIN SQR STR\$ STRING\$ SUMLEN
SUM TAN TEMPO TIME\$ TIME TOP USR VAL VPOS
)
end

def self.control
@control ||= %w(
CASE CHAIN ELSE ENDCASE ENDIF ENDPROC ENDWHILE END FN FOR GOSUB GOTO
IF INSTALL LIBRARY NEXT OF OTHERWISE OVERLAY PROC REPEAT RETURN STEP
STOP THEN TO UNTIL WHEN WHILE
)
end

def self.statement
@statement ||= %w(
BEATS BPUT# CALL CLEAR CLG CLOSE# CLS COLOR COLOUR DATA DIM ENVELOPE
GCOL LET MODE OFF ON ORIGIN OSCI PLOT PRINT# PRINT QUIT READ REPORT
SOUND STEREO SWAP SYS TINT VDU VOICES VOICE WAIT WIDTH
)
end

def self.operator
@operator ||= %w(
<< <= <> < >= >>> >> > [-!\$()*+\/=?^|] AND DIV EOR MOD NOT OR
)
end

def self.constant
@constant ||= %w(
FALSE TRUE
)
end

state :root do
rule %r/[ \n]+/, Text
rule %r/[\[]/, Keyword, :assembly1
rule %r/\*.*/, Generic::Prompt # CLI command
rule %r/REM *>.*/, Comment::Special
rule %r/REM.*/, Comment
rule %r/#{BBCBASIC.punctuation.join('|')}/, Punctuation
rule %r/#{BBCBASIC.function.join('|')}/, Name::Builtin # function or pseudo-variable
rule %r/(?:DIM|POINT)(?=\()/, Name::Builtin # function sharing keyword with statement, distinguished by ()
rule %r/(?:#{BBCBASIC.control.join('|')}|DEF *(?:FN|PROC)|ERROR(?: *EXT)?|ON(?: *ERROR *OFF| *ERROR *LOCAL| *ERROR))/, Keyword # control flow statement
rule %r/(?:#{BBCBASIC.statement.join('|')}|CIRCLE(?: *FILL)?|DRAW(?: *BY)?|ELLIPSE(?: *FILL)?|FILL(?: *BY)?|INPUT(?:#| *LINE)?|LINE(?: *INPUT)?|LOCAL(?: *DATA| *ERROR)?|MOUSE(?: *COLOUR| *OFF| *ON| *RECTANGLE| *STEP| *TO)?|MOVE(?: *BY)?|POINT(?: *BY)?|RECTANGLE(?: *FILL)?|RESTORE(?: *DATA| *ERROR)?|TRACE(?: *CLOSE| *ENDPROC| *OFF| *STEP(?: *FN| *ON| *PROC)?| *TO)?)/, Keyword # other statement
rule %r/#{BBCBASIC.operator.join('|')}/, Operator
rule %r/#{BBCBASIC.constant.join('|')}/, Name::Constant
rule %r/"[^"]*"/, Literal::String
rule %r/[a-z_`][\w`]*[\$%]?/i, Name::Variable
rule %r/@%/, Name::Variable
rule %r/[\d.]+/, Literal::Number
rule %r/%[01]+/, Literal::Number # binary
rule %r/&[\h]+/, Literal::Number::Hex
end

# Assembly statements are parsed as
# {label} {directive|opcode |']' {expressions}} {comment}
# Technically, you don't need whitespace between opcodes and arguments,
# but this is rare in uncrunched source and trying to enumerate all
# possible opcodes here is impractical so we colour it as though
# the whitespace is required. Opcodes and directives can only easily be
# distinguished from the symbols that make up expressions by looking at
# their position within the statement. Similarly, ']' is treated as a
# keyword at the start of a statement or as punctuation elsewhere. This
# requires a two-state state machine.

state :assembly1 do
rule %r/ +/, Text
rule %r/]/, Keyword, :pop!
rule %r/[:\n]/, Punctuation
rule %r/\.[a-z_`][\w`]*%? */i, Name::Label
rule %r/(?:REM|;)[^:\n]*/, Comment
rule %r/[^ :\n]+/, Keyword, :assembly2
end

state :assembly2 do
rule %r/ +/, Text
rule %r/[:\n]/, Punctuation, :pop!
rule %r/(?:REM|;)[^:\n]*/, Comment, :pop!
rule %r/#{BBCBASIC.function.join('|')}/, Name::Builtin # function or pseudo-variable
rule %r/(?:DIM|POINT)(?=\()/, Name::Builtin # function sharing keyword with statement, distinguished by ()
rule %r/#{BBCBASIC.operator.join('|')}/, Operator
rule %r/#{BBCBASIC.constant.join('|')}/, Name::Constant
rule %r/"[^"]*"/, Literal::String
rule %r/[a-z_`][\w`]*[\$%]?/i, Name::Variable
rule %r/@%/, Name::Variable
rule %r/[\d.]+/, Literal::Number
rule %r/%[01]+/, Literal::Number # binary
rule %r/&[\h]+/, Literal::Number::Hex
rule %r/[!#,@\[\]^{}]/, Punctuation
end
end
end
end
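
The comment in the lexer describes classifying assembly symbols by their position within a statement, using two states. The following is a minimal standalone sketch of that idea using `StringScanner` from the Ruby standard library instead of Rouge's `RegexLexer`. The method name `classify_assembly`, the state names `:statement_start` and `:operands`, and the token symbols are all hypothetical. It also simplifies the real states: a closing `]` is not handled, and operands are not broken down into operators, numbers and variables.

```ruby
require 'strscan'

# Hypothetical, simplified stand-in for the :assembly1 / :assembly2 states.
# Returns an array of [type, text] pairs for one line of assembly source.
def classify_assembly(line)
  scanner = StringScanner.new(line)
  state = :statement_start # loosely corresponds to :assembly1
  tokens = []

  until scanner.eos?
    if (text = scanner.scan(/ +/))
      tokens << [:text, text]
    elsif (text = scanner.scan(/[:\n]/))
      # A colon or newline ends the statement, so the next word is an opcode again.
      tokens << [:punctuation, text]
      state = :statement_start
    elsif (text = scanner.scan(/(?:REM|;)[^:\n]*/))
      # Assembly comments stop at a colon or the end of the line.
      tokens << [:comment, text]
      state = :statement_start
    elsif state == :statement_start && (text = scanner.scan(/\.[a-z_`][\w`]*%? */i))
      tokens << [:label, text]
    elsif state == :statement_start
      # First word of a statement: treated as an opcode or directive.
      tokens << [:keyword, scanner.scan(/[^ :\n]+/)]
      state = :operands # loosely corresponds to :assembly2
    else
      # Anything after the opcode is part of the operand expressions.
      tokens << [:operand, scanner.scan(/[^ :\n;]+/)]
    end
  end

  tokens
end

classify_assembly('.label2% LDR r0,[P%MOD2,#0]').each do |type, text|
  puts "#{type}: #{text.inspect}"
end
```

The same word can therefore be coloured differently depending on where it appears, which is what lets opcodes be recognised without enumerating every mnemonic.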
14 changes: 14 additions & 0 deletions spec/lexers/bbcbasic_spec.rb
@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*- #
# frozen_string_literal: true

describe Rouge::Lexers::BBCBASIC do
let(:subject) { Rouge::Lexers::BBCBASIC.new }

describe 'guessing' do
include Support::Guessing

it 'guesses by filename' do
assert_guess :filename => 'foo,fd1'
end
end
end
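
The spec checks that a name such as `foo,fd1` is attributed to this lexer through the `*,fd1` glob declared with `filenames`. As a rough illustration of how such a glob behaves, the sketch below uses `File.fnmatch?` from the Ruby standard library. Whether Rouge's guesser matches in exactly this way is an assumption here, and the helper name `matches_bbcbasic_glob?` is hypothetical.

```ruby
# Hypothetical helper: test a file name against the lexer's glob.
# Only the base name is compared, so directory components are ignored.
def matches_bbcbasic_glob?(filename)
  File.fnmatch?('*,fd1', File.basename(filename))
end

puts matches_bbcbasic_glob?('foo,fd1')
puts matches_bbcbasic_glob?('src/prog,fd1')
puts matches_bbcbasic_glob?('foo.rb')
```

The comma has no special meaning in this glob, so the pattern matches any base name ending in the literal suffix `,fd1`.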
19 changes: 19 additions & 0 deletions spec/visual/samples/bbcbasic
@@ -0,0 +1,19 @@
REM > DefaultFilename
PRINT:REM Ordinary comment: This is still a comment
*Cat
FOR n=1 TO 10
PRINTTAB(n)"Hello there ";FNnumber(n)DIV3+1'INKEY$(500)
NEXT
DIM code% 100
FOR opt=0 TO 3 STEP 3
P%=code%
[OPT opt
.label1%
.label2% LDR r0,[P%MOD2,#0]
MOV pc,r14:REM comments in assembly terminate at colon:EQUD -1
.label3;comment
ALIGN
.label4
]
END
DEFFNnumber(x%)=ABS(x%-4)
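
The sample exercises two different comment rules: in BASIC a `REM` comment runs to the end of the line, while inside an assembly block a comment ends at the next colon. The sketch below applies the two regular expressions from the lexer to lines from the sample, outside of Rouge. The constant names `BASIC_COMMENT` and `ASSEMBLY_COMMENT` are hypothetical.

```ruby
# Patterns copied from the lexer rules; the constant names are illustrative only.
BASIC_COMMENT = /REM.*/
ASSEMBLY_COMMENT = /(?:REM|;)[^:\n]*/

basic_line = 'PRINT:REM Ordinary comment: This is still a comment'
assembly_line = 'MOV pc,r14:REM comments in assembly terminate at colon:EQUD -1'

# In BASIC the comment swallows the rest of the line, colons included.
puts basic_line[BASIC_COMMENT]

# In assembly the comment stops before the colon, so EQUD -1 is still code.
puts assembly_line[ASSEMBLY_COMMENT]

# A semicolon also introduces an assembly comment.
puts '.label3;comment'[ASSEMBLY_COMMENT]
```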