Gold-syntax Specification¶
This document defines the syntax and semantics of gold-syntax units for the jeff65 compiler.
To propose a change to this document, create a pull request from a branch with a
name beginning with spec-gold-, which contains the edits to this document
relevant to enact the change, and if possible, the necessary alterations to the
parser and/or lexer. (If those changes would be high-effort, open a bug once the
PR has been accepted.)
Contents
Syntax¶
Gold-syntax files consist of printable ASCII characters, which are divided into the following groups:
- whitespace, any of SPACE, TAB, LINE FEED, or CARRIAGE RETURN
- special delimiters, any of
( ) [ ] { } : ; . , " \ @ & - alphanumeric characters, any of
A-Za-z0-9. Note that the underscore is not included (though it is allowed in identifiers, but discouraged). - punctuation, any of
! $ % ' * + - / < = > ? ^ _ ` | ~(i.e. the remaining ASCII characters).
Gold-syntax tokens are delimited by whitespace, which is discarded, and the special delimiters, which are considered single tokens. Note that new line characters are not treated specially. Also note that comments are treated specially by the parser; see below.
A token beginning with a digit in 0-9 is assumed to be a numeric value. A
valid numeric value must be one of the following:
- A sequence of characters in
0-9, such as1,42, or094, denotes a decimal (base-10) integer. Note that numbers beginning with a0are considered to be decimal, not octal. - The characters
0xor0X(a digit zero followed by an uppercase or lowercase letter ‘X’) followed by a sequence of characters in0-9A-Fa-f, such as0x1,0xa9, or0xbeef, denotes a hexadecimal (base-16) integer. - The characters
0oor0O(a digit zero followed by an uppercase or lowercase letter ‘O’) followed by a sequence of characters in0-7, such as0o644, denotes an octal (base-8) integer. Note that numbers beginning with a0(zero) followed by digits are considered to be decimal, not octal. - The characters
0bor0B(a digit zero followed by an uppercase or lowercase letter ‘B’) followed by a sequence of characters in0 1, such as0b11010001, denotes a binary (base-2) integer.
A token beginning with a digit in 0-9 and not conforming to one of the above
rules is an error.
A token beginning with a letter in A-Za-z is assumed to be an identifier.
The restriction to a letter applies only to the first character; identifiers may
contain any printable ASCII character which does not delimit a token. For
example, foo-bar? is considered a valid identifier.
A token beginning with a punctuation character continues until the first
non-punctuation character, at which point it will end, regardless of whether a
whitespace character, comment, or special delimiter has been encountered.
Therefore -1 becomes two tokens.
Attention
The above rules imply that the string 1+2 is considered a
single, invalid, numeric token, and the string x-2 is
considered an identifier. Make sure to put spaces around your
operators!
Language Elements¶
A Gold-syntax unit consists of a series of top-level statements.
Binding Statements¶
The following statements introduce bindings of names. They are all allowed at the top level of the file, and a subset are allowed inside executable code blocks.
Top-level bindings are visible throughout the program. Block-level bindings are visible until the end of the scope in which they are introduced. Introducing a block-level binding with the same name as another binding, even in the same scope, will result in the previous binding being shadowed until the new binding goes out of scope, at which time the previous binding will be restored.
It is an error to introduce a new top-level binding with the same name as another top-level binding.
use¶
Top-level usage:
use <identifier>
Locates a unit named with the given identifier, and makes its exported symbols available in the current unit in a namespace bound to the same name as the unit. If the unit cannot be located, a compilation error will be raised.
The binding introduced by a use statement is not exported from the unit.
Units have names derived from the name of the file they are defined in, and thus
the allowed names for units may be further restricted by what characters are
allowed in filenames on your system. For maximum portability, stick to
alphanumeric characters and the character - (hyphen minus) in your unit
names.
constant¶
Top-level / block-level usage:
constant <identifier> : <type> = <known-expression>
Binds a name to a value known at compile time which does not allocate memory in the program image. The value will be inlined at usage sites. Top-level constant bindings are exported from the unit as symbols, and may be referenced in other units.
The restriction to values which do not allocate memory means that arrays and
strings cannot be declared as constant-bindings. It is possible to declare
pointers and slices as constants through the use of certain functions exported
from the built-in mem unit.
let¶
Top-level usage
let [mut] <identifier> : <type> = <known-expression>
Binds a name to a value known at compile time. Always allocates memory in the program image. Top-level let-bindings are exported from the unit as symbols which may be referenced in other units.
Block-level usage:
let [mut] <identifier> : <type> = <expression>
let stash <identifier> : <type> = <known-expression>
Binds a name to a value. In the first form, memory is allocated statically (i.e. memory is reserved, but the value is not included in the program image), and the value is computed and stored when the statement is executed. In the second form, memory is allocated in the program image with the initial value stored.
By default, let-bindings are immutable, though they may be shadowed by
re-binding. If the mut or stash storage classes are applied, then the
binding becomes mutable, and the value may be changed.
fun¶
Top-level usage:
fun <identifier>([<identifier> : <type> [, ...]]) [-> <type-expression>]
[...]
endfun
Binds a name to a function with zero or more arguments and an optional return type. Introduces a new scope, and statements inside are considered block-level statements.
A function with a return type must terminate by executing a return
statement.
Note that the type of the binding introduced is a function type. Function types
may only be used to call the function or get a pointer to its address using the
& operator.
isr¶
Top-level usage:
isr <identifier>
[...]
endisr
Binds a name to an interrupt service routine. Introduces a new scope, and statements inside are considered block-level statements.
Note that the type of the binding introduced is an ISR type. ISR types may only
be used to get a pointer to its address using the & operator.
Control Flow Statements¶
Control flow statements may only be used in block-level contexts. Additional restrictions may apply to individual statements, depending on context. Gold-syntax programs are executed statement-by-statement unless a control-flow statement is encountered.
return¶
Usage:
return [<expression>]
Terminates execution of the current function, returning control to the caller, and possibly returning a value. This will cause any currently-executing loops to terminate.
If the current function does not have a return type, then the expression is disallowed; if the current function does have a return type, then the expression is required, and must have a type assignable to the return type of the function.
May also be used inside an ISR, in which case the expression is always disallowed.
if¶
Usage:
if <expression> then
[...]
[elseif <expression> then
[...]]
[elseif...]
[else
[...]]
end
Causes at most one of the blocks provided to execute. Expressions are tested in
order, and the first expression to evaluate to true causes the corresponding
block to be executed. If none of the expressions evalute to true, the block
beginning with else is executed, if present. Once an expression which
evaluates to true is executed, the rest of the expressions will not be
evaluated.
Each branch introduces a new scope.
while¶
Usage:
while <expression> do
[...]
end
Introduces a loop which executes the provided block zero or more times. The
block is executed repeatedly until the expression evaluates to false, or the
loop is terminated.
The provided block introduces a new scope.
for¶
Usage:
for <identifier> : <type> in <expression> do
[...]
end
Evaluates the given expression once, which must be of type array or slice, then introduces a loop which executes the provided block once for each element of the value of the expression, with the provided identifier bound to the value of the element.
The provided block introduces a new scope.
Type Expressions¶
Primitive Types¶
Primitive types are provided for signed and unsigned integers for 8-bit, 16-bit, 24-bit, and 32-bit integers. They are written as follows:
u8 u16 u24 u32
i8 i16 i24 i32
(Types beginning with u are unsigned.) Primitive types are as wide as the
number of bits divided by eight.
Array Types¶
Array types are written as:
[<base>; <start> to <end>] /* first form */
[<base>; <end>] /* second form */
where <base> is another type, <start> and <end> are the lower and
upper bounds, respectively, where the lower bound is inclusive and upper bound
is exclusive. In the second form, <start> is implied to be 0.
The width of an array type is the width of <base> multiplied by the
difference between <end> and <start>. (For example, [u8; 3 to 7] is
four bytes wide.)
Pointer and Slice Types¶
Pointer types are constructed by prefixing a non-array type with a &, for
example, &u8 is a pointer to an 8-bit unsigned type. Pointers are always two
bytes wide.
Slice types take the form of &[<base>]. Slices have a built-in length, and
are always four bytes wide. Taking a pointer to an array produces a slice.
Value Expressions¶
Expressions are written infix, similar to ‘C’. Operations are resolved in the following order.
(<expr>)
Parenthesised expressions are resolved from innermost out. Whitespace is allowed but not required around parentheses.
<expr>[<expr>]
Indexes into an array. The expression on the left must resolve to an array or
slice type, and the expression on the right must resolve to a u8 or u16.
<fun>([<expr>[, ...]])
Calls a function. <fun> must be an expression which evaluates to a function
or function pointer. Expressions are evaluated and passed as arguments, and the
function expression resolves to the return value of the function.
&<expr>
@<expr>
Takes a pointer to a value, and dereferences a pointer, respectively.
bitnot <expr>
<expr> bitand <expr>
<expr> bitor <expr>
<expr> bitxor <expr>
Bitwise operations are provided for unsigned types. For dyadic operations, both sides must be the same width.
<expr> << <known-expr>
<expr> >> <known-expr>
Left-shift and right-shift operations, respectively. The right-hand side must be known at compile-time.
<expr> * <expr>
<expr> / <expr>
Multiplication and division of integer types.
<expr> + <expr>
<expr> - <expr>
Addition and subtraction of integer types.
<expr> == <expr>
<expr> != <expr>
<expr> <= <expr>
<expr> >= <expr>
<expr> < <expr>
<expr> > <expr>
Comparison operators. Evaluates to a boolean value.
<expr> = <expr>
Assignment operator. The left-hand side must be a bare name, a dereference, or an index expression.