Breeze

Breeze is an Arduino library that provides a Command Script Interpreter. It is currently a work in progress and has not been released yet.

Syntax

The syntax of this command language takes inspiration from BASIC and Tcl.

I looked at many other popular languages and styles, but had to reject them, either because the syntax would take a lot of code to parse and understand (using up a lot of flash program space), or because a large amount of data would need to be stored while the processor was working out what the whole command means.

One command per line

The first rule I made was that commands cannot be continued over multiple lines of code. So each command line must be complete in itself.

Many people like the style of Java, JavaScript, C, C++, etc., but these languages use {braces} to group lines of code together into blocks. These blocks can be nested inside each other. This means that a lot of transient memory has to be allocated to keep track of where we are up to in the structure, so we know what to do next.

In contrast, if we can fully understand one line, then discard any temporary data before executing the command, more precious RAM is available to the rest of the application.

Line numbers

The next idea was taken from BASIC. Commands that are to be stored for later must start with a line number. Commands that do not start with a line number are executed immediately.

This makes it easy to edit stored procedures over a command line interface, either the USB Serial Monitor provided by the Arduino IDE, or a remote Telnet session over an Ethernet or WiFi connection.
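
As a rough illustration of the rule above, here is a minimal sketch of how an incoming line could be sorted into "store" or "execute now". It is not the Breeze implementation: the names `Program` and `acceptLine` are made up for this example, it uses desktop-style C++ containers for brevity (an AVR build would need fixed-size buffers instead), and deleting a line by typing only its number is an assumption borrowed from BASIC.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical program store: line number -> command text.
using Program = std::map<int, std::string>;

static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Returns true if the line started with a number and was stored.
// Returns false if it should be executed immediately; `command` then holds its text.
bool acceptLine(const std::string& line, Program& program, std::string& command) {
    std::size_t i = 0;
    while (i < line.size() && isSpace(line[i])) ++i;      // skip leading blanks
    const std::size_t start = i;
    int number = 0;
    while (i < line.size() && isDigit(line[i])) {         // read the line number, if any
        number = number * 10 + (line[i] - '0');
        ++i;
    }
    if (i == start) {                                     // no digits: immediate command
        command = line.substr(start);
        return false;
    }
    while (i < line.size() && isSpace(line[i])) ++i;      // skip blanks after the number
    const std::string rest = line.substr(i);
    if (rest.empty()) {
        program.erase(number);                            // assumption: bare number deletes the line
    } else {
        program[number] = rest;                           // add or replace the stored line
    }
    return true;
}
```

Because the store is ordered by line number, retyping a line with the same number simply replaces it, which is what makes editing over a serial console practical.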

Line numbers can then be used in GOTO statements, rather than having to group lines together in {braces}.

if ( condition )
{
  code for when true ...
}
else
{
  code for when false ...
}

100 if ( not condition ) goto 200
110 code for when true ...
190 goto 300
200 code for when false ...
300 continue ...

Single Pass Parsing

One of the problems with some of the programming languages I looked at was that the context of a word does not become obvious until the parser has read some characters beyond that word.

variable = value;
function(arg1, arg2);

In the example above, it's not obvious that “variable” is the name of a variable, until the parser sees the equals sign. In the same way, it's the bracket that proves “function” is the name of a function. It is possible to search through a list of known variables or functions to see if we find a match, but it would be quicker if the syntax implied what type of word we're about to encounter before we had to process it.

If we make a small change, by adding the word “set” and an extra space…

set variable = value;
function (arg1, arg2);

We now know that the first word on the line (until the space) is the name of a command. This is a very simple syntax rule for humans and computers to understand.

What if we also removed any characters that are not really needed? That will save space when we want to store a program in memory.

set variable value
function arg1 arg2

This is now looking like a Tcl program. The first word on the line is the name of the command, and everything that follows is simply a parameter to that command. When the command is executed, it will interpret the parameters according to its own rules, and perform its own validation.

This approach means the parser library can remain generic, while the application programmer can define their own commands to add to the lexicon. They just need to define their own C function in the Arduino IDE, and call the library to install the new command.

void funcGetTime(int argc, char** argv);
Breeze::addCommand("getTime", funcGetTime);
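
To make the idea of a generic parser with pluggable commands more concrete, here is a minimal sketch of a command table and dispatcher. It is only an illustration, not the library's code: the free function `addCommand`, the table size, `runLine` and `funcCountParams` are all hypothetical, and passing the command name as `argv[0]` is an assumption. Only the callback shape is taken from the example above.

```cpp
#include <cassert>
#include <cstring>

// Same callback shape as in the example above.
typedef void (*CommandFn)(int argc, char** argv);

struct CommandEntry {
    const char* name;
    CommandFn   fn;
};

const int MAX_COMMANDS = 16;                 // hypothetical fixed table size
static CommandEntry commandTable[MAX_COMMANDS];
static int commandCount = 0;

// Register a command; returns false when the table is full.
bool addCommand(const char* name, CommandFn fn) {
    if (commandCount >= MAX_COMMANDS) return false;
    commandTable[commandCount].name = name;
    commandTable[commandCount].fn = fn;
    ++commandCount;
    return true;
}

// Split the line in place on spaces and tabs, then call the command
// named by the first word. Returns false if no such command exists.
bool runLine(char* line) {
    const int MAX_ARGS = 8;                  // hypothetical limit
    char* argv[MAX_ARGS];
    int argc = 0;
    for (char* word = std::strtok(line, " \t");
         word != nullptr && argc < MAX_ARGS;
         word = std::strtok(nullptr, " \t")) {
        argv[argc++] = word;
    }
    if (argc == 0) return false;
    for (int i = 0; i < commandCount; ++i) {
        if (std::strcmp(commandTable[i].name, argv[0]) == 0) {
            commandTable[i].fn(argc, argv);  // assumption: argv[0] is the command name
            return true;
        }
    }
    return false;
}

// Example command: remembers how many parameters it was given.
static int lastParamCount = -1;
void funcCountParams(int argc, char** argv) {
    (void)argv;
    lastParamCount = argc - 1;
}
```

The parser never needs to know what a command does with its words, so adding a command costs one table entry and no new syntax.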

Data returned by functions

As we have now removed any decorative squiggles and semicolons to make things easier to read, maybe we need to assign a new meaning to them and start using them again! We'll have a look at Tcl again, and borrow a neat idea from there.

set now [ getTime ]

Using Tcl syntax, this would set the value of the variable “now” to be the result of calling the function “getTime”. Tcl does not have a built-in function called “getTime”, and neither does the Breeze library, but you could define one like we described in the previous section.

The square brackets can be nested, so we can build up quite powerful statements without introducing extra syntax rules.

set now [ getTime [ getLocalTimeZone ] ]

So this would (theoretically) call a function to obtain the name of the local time zone, pass that value as an argument to the getTime function, then store the result in the variable called “now”.
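
The nesting falls out naturally from a recursive evaluator. The following is a minimal sketch under several assumptions: commands return their result as text (the `Command` type here is hypothetical and differs from the callback shown earlier), unknown commands return empty text, and quoted strings and joining of adjacent parameters are left out to keep it short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Args = std::vector<std::string>;
using Command = std::function<std::string(const Args&)>;  // hypothetical: result returned as text

std::map<std::string, Command> commands;                  // hypothetical command table

// Evaluate words starting at `pos` until the end of the text or a closing ']'.
// The first word is the command name, the remaining words are its parameters.
std::string evaluate(const std::string& text, std::size_t& pos) {
    Args words;
    while (pos < text.size()) {
        const char c = text[pos];
        if (c == ' ' || c == '\t') { ++pos; continue; }
        if (c == ']') { ++pos; break; }                   // end of this bracketed command
        if (c == '[') {
            ++pos;
            words.push_back(evaluate(text, pos));         // nested command supplies one parameter
            continue;
        }
        const std::size_t start = pos;                    // bare word
        while (pos < text.size() && text[pos] != ' ' && text[pos] != '\t' &&
               text[pos] != '[' && text[pos] != ']') {
            ++pos;
        }
        words.push_back(text.substr(start, pos - start));
    }
    if (words.empty()) return "";
    const auto it = commands.find(words[0]);
    if (it == commands.end()) return "";                  // unknown command: empty result in this sketch
    return it->second(Args(words.begin() + 1, words.end()));
}

std::string evaluateLine(const std::string& line) {
    std::size_t pos = 0;
    return evaluate(line, pos);
}
```

Each level of brackets costs one level of recursion, so the nesting depth of a single line is what bounds the temporary memory needed.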

Reading a variable

We've seen how to set the value of a variable, but how do we read it back again and use it in our program?

In many scripting languages, including Tcl, the value of a variable can be obtained using the dollar sign followed by the name of the variable.

set greeting hello
display $greeting

I do not plan on implementing that syntax, as there is already a way to obtain the value of a variable using the existing syntax rules. I want to keep the parser as small and fast as possible, so I don't want to add any features that are not essential. So I propose to simply provide a built-in command called “get” to obtain the value of a variable.

set greeting hello
display [ get greeting ]
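
Seen from the C++ side, “set” and “get” are then just two ordinary commands sharing a table of variables. This is a minimal sketch, not the library's code: the names `variables`, `cmdSet` and `cmdGet` are hypothetical, and both the idea that “set” returns the stored value (as it does in Tcl) and that an undefined variable reads as empty text are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

std::map<std::string, std::string> variables;   // hypothetical global variable store

// set <name> <value>: store the value and return it (assumption, as in Tcl).
std::string cmdSet(const Args& a) {
    if (a.size() < 2) return "";
    variables[a[0]] = a[1];
    return a[1];
}

// get <name>: return the stored value, or empty text if the variable is not defined (assumption).
std::string cmdGet(const Args& a) {
    if (a.empty()) return "";
    const auto it = variables.find(a[0]);
    return it == variables.end() ? std::string() : it->second;
}
```

No extra parser rule is involved: `[ get greeting ]` goes through the same bracket handling as any other command.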

Quoted Strings

So far we've only seen how bare words are understood by the parser. Any number of spaces or tab characters can be used to separate words, but what if we want to include spaces in parameters?

The Breeze library will support “quoted strings” in the same way they are implemented in most other programming languages. They may include some special characters prefixed by a backslash, e.g. “\n” for newline, etc. The backslash can also be used to include a quote within a string, e.g. "He said \"hello\" to me".
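
Decoding a quoted string can be done in a single pass over the characters. Here is a minimal sketch, assuming the caller has already consumed the opening quote; the function name `readQuoted` is hypothetical, and the set of escapes beyond “\n” and the escaped quote (here “\t”, with any other escaped character kept as-is) is an assumption.

```cpp
#include <cassert>
#include <string>

// Decode the body of a quoted string. `pos` must point just after the
// opening quote; on return it points just after the closing quote.
std::string readQuoted(const std::string& text, std::size_t& pos) {
    std::string out;
    while (pos < text.size() && text[pos] != '"') {
        const char c = text[pos++];
        if (c == '\\' && pos < text.size()) {
            const char escaped = text[pos++];
            switch (escaped) {
                case 'n': out += '\n'; break;       // newline
                case 't': out += '\t'; break;       // tab (assumption)
                default:  out += escaped; break;    // escaped quote, escaped backslash, anything else
            }
        } else {
            out += c;
        }
    }
    if (pos < text.size()) ++pos;                   // skip the closing quote
    return out;
}
```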

Catenation of Parameters

Square brackets have no special meaning within quoted strings (unlike Tcl). But any parameters that are not separated by at least one space will have their values joined together (like Tcl). This allows the data returned by functions to be included in a complex parameter, as well as unquoted words and numbers.

In this example there is only one parameter passed to the “display” command. This is a good example of what is possible, but a bad example of coding style.

set greeting hello
set quote "\""
display He" said "[ get quote ][ get greeting ][ get quote ]" to "me
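
The joining rule amounts to "only whitespace ends a parameter". Below is a minimal sketch limited to bare words and quoted strings; bracketed commands are left out, escape handling is reduced to keeping the escaped character, and the name `splitParams` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a line into parameters. Fragments (bare text or quoted strings) that
// touch each other with no whitespace in between are joined into one parameter.
std::vector<std::string> splitParams(const std::string& line) {
    std::vector<std::string> params;
    std::string current;
    bool inParam = false;                      // true once the current parameter has a fragment
    std::size_t i = 0;
    while (i < line.size()) {
        const char c = line[i];
        if (c == ' ' || c == '\t') {           // whitespace closes the current parameter
            if (inParam) {
                params.push_back(current);
                current.clear();
                inParam = false;
            }
            ++i;
        } else if (c == '"') {                 // quoted fragment: append up to the closing quote
            inParam = true;
            ++i;
            while (i < line.size() && line[i] != '"') {
                if (line[i] == '\\' && i + 1 < line.size()) ++i;   // keep the escaped character
                current += line[i++];
            }
            if (i < line.size()) ++i;          // skip the closing quote
        } else {                               // bare character: append to the same parameter
            inParam = true;
            current += c;
            ++i;
        }
    }
    if (inParam) params.push_back(current);
    return params;
}
```

Tracking `inParam` separately from the text means an empty quoted string still counts as a parameter.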

Maths

I'm building a computer language here, but I've not said how we can use it to compute anything!

We could define a new command called “add” that takes two parameters, adds them together and returns the result. Then we could do something like this…

display [ add 5 7 ]

It doesn't look “normal” like other languages; in fact it looks quite ugly. So I had a look around to see what would be the easiest way to add better syntax while re-using most of the code I have written so far.

After searching for a while, I found information about a small Tcl implementation called "Cricket". It uses [ and ] for command dispatch as in Tcl, but also ( and ) for “second argument dispatch”, meaning that the second word is taken to be the command, rather than the first. This allows more natural looking syntax. It's even better if we rename our “add” function to be “+”.

display ( 5 + 7 )

This executes the command called “+” with “5” as the first parameter and “7” as the second. The same interface can be used to call native C functions that we use for square brackets. The called C function does not need to know if it has been triggered by [ ] or ( ).

If we want to add three values together, we will unfortunately need to call the “+” function twice.

display ( 5 + ( 7 + 8 ) )
display ( ( 5 + 7 ) + 8 )
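
The only difference between the two bracket styles is which word names the command. Here is a minimal sketch of that choice, assuming the words between the delimiters have already been evaluated; the `Command` type, `cmdAdd` and `dispatch` are hypothetical names and not part of Breeze.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

using Args = std::vector<std::string>;
using Command = std::string (*)(const Args&);    // hypothetical: result returned as text

// Adds its parameters together as integers.
std::string cmdAdd(const Args& a) {
    long sum = 0;
    for (const std::string& s : a) sum += std::strtol(s.c_str(), nullptr, 10);
    return std::to_string(sum);
}

// The same native function is registered under both names.
std::map<std::string, Command> commands = { {"+", cmdAdd}, {"add", cmdAdd} };

// infix == false: [ cmd a b ]   the first word is the command
// infix == true:  ( a cmd b )   the second word is the command
std::string dispatch(const Args& words, bool infix) {
    const std::size_t cmdIndex = infix ? 1 : 0;
    if (words.size() <= cmdIndex) return "";
    const auto it = commands.find(words[cmdIndex]);
    if (it == commands.end()) return "";
    Args params;                                  // every word except the command name
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != cmdIndex) params.push_back(words[i]);
    }
    return it->second(params);
}
```

The called function receives the same parameter list either way, which matches the point above that it does not need to know which kind of bracket triggered it.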

The same syntax can be used for comparisons…

100 if ( [ get greeting ] == "hello" ) goto 300
200 if ( [ get greeting ] == "hello" ) [ goto 300 ]

The quotes around “hello” are optional, but putting them in makes the program look more natural.

The word “goto” on line 100 is accepted by “if” as its second parameter, but otherwise ignored.

On line 200 there is a coding error, but not a syntax error. The command “goto” will always be executed, even if the condition is false. Can you guess why?

Named Sections of Code

I think it would be a good compromise to allow the “goto” command to support named places within the code, rather than just use line numbers all of the time. This way, sections of the code can be divided up in the same way that “functions” are defined in other languages. This would make it easier for the programmer to remember where sections of the code are, rather than having to remember line numbers. This would require a little more C++ coding on the Arduino, but would make the scripting language much easier to use. Each line must still have a line number, as they are used when editing the script via the command line interface.

I propose to use the hash character “#” as the “command” to define a label. In many programming languages, the hash character indicates a comment, and so it is not executed by the interpreter. Labels also contain no executable code, so this feels intuitive to programmers, and is understood by many syntax highlighters.

The first parameter will be the name of the label, and must be unique within the script. If there are no parameters, the command does nothing at all.

More parameters may be supplied and will be ignored as comments, but beware of using [brackets] or (brackets) within the parameters, as they will be executed as usual. It is recommended to enclose any comments in quotes.

If the first parameter is “#”, it does not need to be unique and it is ignored. This form can be used to continue the comment from the previous line, to describe this subroutine in more detail.

110 set count 10
210 if ( [ get count ] == 0 ) goto 310
220 display [ get count ]
230 set count ( [ get count ] - 1 )
240 goto 210
310 display "Blast Off!"

100 # CountDown "This is the entry point to this section of code"
105 # # "It counts down from 10 to 1, then signals 'Blast Off'"
110 set count 10
200 # CountDownLoop
210 if ( [ get count ] == 0 ) goto BlastOff
220 display [ get count ]
230 set count ( [ get count ] - 1 )
240 goto CountDownLoop
300 # BlastOff
310 display "Blast Off!"
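
Resolving a “goto” target could then look something like the following minimal sketch. It assumes the script is held as a map from line number to command text (a hypothetical `Program` type), does a plain linear search for the label, and returns -1 when nothing matches; a real implementation might well cache label positions instead.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

using Program = std::map<int, std::string>;   // hypothetical store: line number -> command text

// A numeric target is used directly. Otherwise the program is searched for a
// line of the form "# <label> ...". Returns -1 if the label is not found.
int resolveTarget(const Program& program, const std::string& target) {
    if (!target.empty() && std::isdigit(static_cast<unsigned char>(target[0]))) {
        return std::stoi(target);
    }
    for (const auto& entry : program) {
        std::istringstream words(entry.second);
        std::string command, label;
        words >> command >> label;             // first two words of the stored line
        // "# #" lines are comment continuations, never labels.
        if (command == "#" && label != "#" && label == target) return entry.first;
    }
    return -1;
}
```

Execution would continue at the label line itself, which is harmless because the “#” command does nothing.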

Calling Subroutines

As one of the primary ideas of this script language is to make the interpreter as compact as possible without compromising too much on readability and usability, it would be good to have a way of re-using sections of script, rather than duplicating code.

Most languages have the concept of a subroutine. Some call it a function or a method or procedure. I have called it a subroutine in this documentation because I plan to implement it like subroutines in the BASIC language.

BASIC has the keywords “gosub” and “return”. The “gosub” command works like the “goto” command, but remembers where it came from, so that when a “return” command is encountered, execution continues with the line after the “gosub” command.

It is not possible to pass parameters to subroutines in BASIC using the “gosub” command, but you can define some variables before calling a subroutine, and that routine can return values by setting other variables. In this case it would be good to name the variables associated with the subroutine in a way that makes it obvious what they are related to, and also to prevent clashes with variables used by other subroutines.

100 set AskUser.question "Which country do you come from?"
110 gosub AskUser
120 set AskUser.question "Why have you come here from "[ get AskUser.answer ]"?"
130 gosub AskUser
140 set AskUser.question "Are you a spy?"
150 gosub AskUser
160 display "I don't believe you!"
170 end
200 # AskUser "Keep asking a question until we get an answer"
210 display [ get AskUser.question ]
220 input AskUser.answer
230 if ( [ get AskUser.answer ] == "" ) goto AskUser
240 return
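
The "remembers where it came from" part is just a stack of line numbers. Here is a minimal sketch of that control flow, under clear simplifications: only “goto”, “gosub”, “return” and “end” are interpreted, targets must be numeric, every other command is treated as a no-op, and the `Program` type and `run` function are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Program = std::map<int, std::string>;   // hypothetical store: line number -> command text

// Runs the program from its first line and returns the line numbers
// in the order they were executed.
std::vector<int> run(const Program& program) {
    std::vector<int> trace;
    std::vector<int> returnStack;              // line numbers of the pending gosub commands
    auto pc = program.begin();
    while (pc != program.end()) {
        trace.push_back(pc->first);
        std::istringstream words(pc->second);
        std::string command;
        int target = 0;
        words >> command;
        if (command == "end") break;
        if (command == "goto" && (words >> target)) {
            pc = program.find(target);                      // a missing target ends the run
        } else if (command == "gosub" && (words >> target)) {
            returnStack.push_back(pc->first);               // remember where we came from
            pc = program.find(target);
        } else if (command == "return") {
            if (returnStack.empty()) break;                 // return without gosub: stop
            pc = program.upper_bound(returnStack.back());   // first line after the gosub
            returnStack.pop_back();
        } else {
            ++pc;                                           // any other command: next line
        }
    }
    return trace;
}
```

Each pending “gosub” costs one stack entry, so the depth of nested calls is the only extra RAM this feature needs.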

Calling External Subroutines

Many programming languages have a way of calling sections of code that are not part of the current program. This allows greater re-use of code and makes it easier for groups of people to work together on a project. It also makes it easier to test sections of code in isolation, then include them as part of a larger project.

The BASIC language has a command called “chain”. This allows the current program to be replaced by loading the named program into memory, but without losing any of the current variables. Normally the “load” and “run” commands in BASIC would clear all variables.

So the variables don't “belong” to the current script, but they can be freely read and updated by it. They can be thought of like “environment variables” in MS-DOS, or Linux/Unix command shells, or “global variables” in most programming languages.

Rather than adding a new command like “chain” to my script language, I plan to extend the syntax of the “goto” and “gosub” commands. It would be easy to allow a second parameter to these commands that would specify the name of another script to execute. The first parameter would still be the line number or label name to jump to within the destination script. The “return” command would also need a little extra code to enable it to switch back to the original script.

Switching between scripts in this way would be slow if the source code needed to be loaded into memory each time, but I will be using an AVL Tree to store the script (indexed by line number). Switching from one script to another simply requires the old file to be closed and the new one to be opened. There is no need to read large amounts of data into memory, as each line of the program is read only when it is needed.
