Breeze

Breeze is an Arduino library that provides a Command Script Interpreter. It is currently a work in progress and has not been released yet.

The syntax of this command language takes inspiration from BASIC and Tcl.

I looked at many other popular languages and styles, but had to reject them, either because the syntax would take a lot of effort to parse and understand (using a lot of flash program space), or because a large amount of data would need to be stored while the processor was working out what the whole command means.

The first rule I made was that commands cannot be continued over multiple lines of code. So each command line must be complete in itself.

Many people like the style of Java, JavaScript, C, C++, etc. These languages use {braces} to group lines of code together into blocks, and these blocks can be nested inside each other. This means that a lot of transient memory has to be allocated to keep track of where we are in the structure, so we know what to do next.

In contrast, if we can fully understand one line, then discard any temporary data before executing the command, more precious RAM is available to the rest of the application.

The next idea was taken from BASIC. Commands that are to be stored for later must start with a line number. Commands that do not start with a line number are executed immediately.

This makes it easy to edit stored procedures over a command line interface, either the USB Serial Monitor provided by the Arduino IDE, or a remote Telnet session over an Ethernet or WiFi connection.
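The stored-versus-immediate rule can be sketched in a few lines of C++. This is only an illustration under my own assumptions: the helper name `leadingLineNumber` and its return convention are hypothetical and not part of the Breeze library.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical helper (not a Breeze API): returns the leading line number of
// `line`, or -1 if the line does not start with a digit, which would mean
// "execute immediately". In both cases *rest is set to the command text.
long leadingLineNumber(const char* line, const char** rest) {
    while (*line == ' ' || *line == '\t') ++line;        // skip indentation
    if (!std::isdigit(static_cast<unsigned char>(*line))) {
        *rest = line;                                    // no number: immediate command
        return -1;
    }
    char* end = nullptr;
    long number = std::strtol(line, &end, 10);           // parse the decimal line number
    while (*end == ' ' || *end == '\t') ++end;           // skip spaces after the number
    *rest = end;                                         // command text to be stored
    return number;
}
```

A caller would store `rest` under the returned number when it is not -1, and otherwise hand `rest` straight to the interpreter.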

Line numbers can then be used in GOTO statements, rather than having to group lines together in {braces}.

if ( condition )
{
  code for when true ...
}
else
{
  code for when false ...
}

100 if ( not condition ) goto 200
110 code for when true ...
190 goto 300
200 code for when false ...
300 continue ...

One of the problems with some of the programming languages I looked at is that the context of a word is not obvious until the parser has read some characters beyond that word.

variable = value;
function(arg1, arg2);

In the example above, it's not obvious that “variable” is the name of a variable, until the parser sees the equals sign. In the same way, it's the bracket that proves “function” is the name of a function. It is possible to search through a list of known variables or functions to see if we find a match, but it would be quicker if the syntax implied what type of word we're about to encounter before we had to process it.

If we make a small change, by adding the word “set” and an extra space…

set variable = value;
function (arg1, arg2);

We now know that the first word on the line (until the space) is the name of a command. This is a very simple syntax rule for humans and computers to understand.

What if we also removed any characters that are not really needed? That would save space when we want to store a program in memory.

set variable value
function arg1 arg2

This is now looking like a Tcl program. The first word on the line is the name of the command, and anything that follows is simply a parameter to that command. When the command is executed, it will interpret the parameters according to its own rules, and perform its own validation.

This approach means the parser library can remain generic, while the application programmer can define their own commands to add to the lexicon. They just need to define their own C function in the Arduino IDE, and call the library to install the new command.

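// Declare the C function that implements the new command.
void funcGetTime(int argc, char** argv);
// Install it in the library under the script-level name "getTime".
Breeze::addCommand("getTime", funcGetTime);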
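How the library stores installed commands is not described here, so the following is only one possible shape for it: a small fixed-size table searched by name. Everything in the `sketch` namespace, the table limit, and the stub body of `funcGetTime` are assumptions made for illustration.

```cpp
#include <cstring>

namespace sketch {   // hypothetical stand-in, not the real Breeze class

typedef void (*CommandFn)(int argc, char** argv);

struct Entry {
    const char* name;   // script-level command name
    CommandFn fn;       // C function that implements it
};

const int kMaxCommands = 16;        // assumed limit, chosen for illustration
Entry table[kMaxCommands];
int commandCount = 0;

// Adds a command to the table; returns false when the table is full.
bool addCommand(const char* name, CommandFn fn) {
    if (commandCount >= kMaxCommands) return false;
    table[commandCount].name = name;
    table[commandCount].fn = fn;
    ++commandCount;
    return true;
}

// Looks up argv[0] and calls the matching function with all the words.
// Returns false if the line is empty or the command is unknown.
bool dispatch(int argc, char** argv) {
    if (argc == 0) return false;
    for (int i = 0; i < commandCount; ++i) {
        if (std::strcmp(table[i].name, argv[0]) == 0) {
            table[i].fn(argc, argv);
            return true;
        }
    }
    return false;
}

}  // namespace sketch

// Stub command used only to show that dispatch reaches it.
int getTimeCalls = 0;
void funcGetTime(int /*argc*/, char** /*argv*/) { ++getTimeCalls; }
```

A linear search is slow for large tables, but with a handful of commands it costs very little code and no extra RAM beyond the table itself.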

As we have now removed any decorative squiggles and semicolons to make things easier to read, maybe we need to assign a new meaning to them and start using them again! We'll have a look at Tcl again, and borrow a neat idea from there.

set now [ getTime ]

Using Tcl syntax, this would set the value of the variable “now” to be the result of calling the function “getTime”. Tcl does not have a built-in function called “getTime”, and neither does the Breeze library, but you could define one like we described in the previous section.

The square brackets can be nested, so we can build up quite powerful statements without introducing extra syntax rules.

set now [ getTime [ getLocalTimeZone ] ]

So this would (theoretically) call a function to obtain the name of the local time zone, pass that value as an argument to the getTime function, then store the result in the variable called “now”.
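Nested square brackets fall out naturally from a recursive evaluator: each "[" starts a new command whose result becomes one word of the enclosing command. The sketch below is a desktop C++ illustration only. It uses `std::string` for readability, which a real Arduino build would probably avoid, and `runCommand`, `evaluate`, and the canned return values of `getTime` and `getLocalTimeZone` are all hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

std::map<std::string, std::string> variables;   // hypothetical variable store

// Stand-in commands with canned results, for illustration only.
std::string runCommand(const std::vector<std::string>& words) {
    if (words.empty()) return "";
    if (words[0] == "getLocalTimeZone") return "UTC";
    if (words[0] == "getTime" && words.size() == 2) return "noon@" + words[1];
    if (words[0] == "set" && words.size() == 3) {
        variables[words[1]] = words[2];
        return words[2];
    }
    return "";
}

// Evaluates one command starting at text[pos]. Stops at the end of the text
// or at the ']' that closes this command, leaving pos on that character.
std::string evaluate(const std::string& text, std::size_t& pos) {
    std::vector<std::string> words;
    while (pos < text.size() && text[pos] != ']') {
        char c = text[pos];
        if (c == ' ' || c == '\t') {
            ++pos;                                   // skip separators
        } else if (c == '[') {
            ++pos;                                   // consume '['
            words.push_back(evaluate(text, pos));    // inner result becomes a word
            if (pos < text.size()) ++pos;            // consume ']'
        } else {
            std::size_t start = pos;                 // read a bare word
            while (pos < text.size() && text[pos] != ' ' && text[pos] != '\t' &&
                   text[pos] != '[' && text[pos] != ']') {
                ++pos;
            }
            words.push_back(text.substr(start, pos - start));
        }
    }
    return runCommand(words);
}
```

Note that the inner command always runs before the outer one sees its parameters, because the outer command cannot be called until all of its words are known.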

We've seen how to set the value of a variable, but how do we read it back again and use it in our program?

In many scripting languages, including Tcl, the value of a variable can be obtained using the dollar sign followed by the name of the variable.

set greeting hello
display $greeting

I do not plan on implementing that syntax, as there is already a way to obtain the value of a variable using the existing syntax rules. I want to keep the parser as small and fast as possible, so I don't want to add any features that are not essential. So I propose to simply provide a built-in command called “get” to obtain the value of a variable.

set greeting hello
display [ get greeting ]

So far we've only seen how bare words are understood by the parser. Any number of spaces, or tab characters can be used to separate words, but what if we want to include spaces in parameters?

The Breeze library will support “quoted strings” in the same way they are implemented in most other programming languages. They may include some special characters prefixed by a backslash, e.g. “\n” for newline, etc. The backslash can also be used to include a quote within a string, e.g. "He said \"hello\" to me".

Square brackets have no special meaning within quoted strings (unlike Tcl). But any parameters that are not separated by at least one space will have their values joined together (like Tcl). This allows the data returned by functions to be included in a complex parameter, as well as unquoted words and numbers.

In this example there is only one parameter passed to the “display” command. This is a good example of what is possible, but a bad example of coding style.

set greeting hello
set quote "\""
display He" said "[ get quote ][ get greeting ][ get quote ]" to "me
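To make the joining rule concrete, here is a rough desktop C++ sketch that builds one parameter out of adjacent bare text, quoted text and bracketed commands, stopping only at white space. It is an illustration under assumptions, not the Breeze parser: `run`, `readParam` and `eval` are hypothetical names, `display` returns its text instead of printing it, and escaped characters inside quotes are kept literally rather than translated.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

std::map<std::string, std::string> vars;   // hypothetical variable store

// Stand-in commands, for illustration only.
std::string run(const std::vector<std::string>& w) {
    if (w.size() == 3 && w[0] == "set") { vars[w[1]] = w[2]; return w[2]; }
    if (w.size() == 2 && w[0] == "get") return vars[w[1]];
    if (w.size() == 2 && w[0] == "display") return w[1];   // returns instead of printing
    return "";
}

std::string eval(const std::string& s, std::size_t& i);

// Reads one parameter. Pieces that touch each other are joined; the
// parameter ends at white space, at ']' or at the end of the text.
std::string readParam(const std::string& s, std::size_t& i) {
    std::string out;
    while (i < s.size() && s[i] != ' ' && s[i] != '\t' && s[i] != ']') {
        if (s[i] == '"') {
            ++i;                                             // opening quote
            while (i < s.size() && s[i] != '"') {
                if (s[i] == '\\' && i + 1 < s.size()) ++i;   // keep escaped char as-is
                out += s[i++];                               // brackets are plain text here
            }
            if (i < s.size()) ++i;                           // closing quote
        } else if (s[i] == '[') {
            ++i;                                             // opening bracket
            out += eval(s, i);                               // splice in the command result
            if (i < s.size()) ++i;                           // closing bracket
        } else {
            out += s[i++];                                   // bare character
        }
    }
    return out;
}

// Evaluates one command, leaving i on the closing ']' if there is one.
std::string eval(const std::string& s, std::size_t& i) {
    std::vector<std::string> words;
    while (i < s.size() && s[i] != ']') {
        if (s[i] == ' ' || s[i] == '\t') ++i;
        else words.push_back(readParam(s, i));
    }
    return run(words);
}
```

Run against the three example lines above, the `display` stand-in receives exactly one parameter.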

I'm building a computer language here, but I've not said how we can use it to compute anything!

We could define a new command called “add” that takes two parameters, adds them together and returns the result. Then we could do something like this…

display [ add 5 7 ]

It doesn't look “normal” like other languages. In fact it looks quite ugly, so I had a look around to see what would be the easiest way to add better syntax while re-using most of the code I have written so far.

After searching for a while, I found information about a small Tcl implementation called "Cricket". It uses [ and ] for command dispatch as in Tcl, but also ( and ) for “second argument dispatch”, meaning that the second word is taken to be the command, rather than the first. This allows more natural looking syntax. It's even better if we rename our “add” function to be “+”.

display ( 5 + 7 )

If we want to add three values together, we will unfortunately need to call the “+” function twice.

display ( 5 + ( 7 + 8 ) )
display ( ( 5 + 7 ) + 8 )
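Second argument dispatch needs almost no new code: once the words inside the parentheses are known, the first two are swapped and the result is dispatched like any other command. The sketch below assumes the words are already split into `argv`; `secondArgToFront` and `runArithmetic` are hypothetical names, and integer-only "+" and "-" are just sample commands.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: turns { "5", "+", "7" } into { "+", "5", "7" } so that
// the ordinary first-word dispatch can be reused. Returns false if there are
// fewer than two words.
bool secondArgToFront(int argc, char** argv) {
    if (argc < 2) return false;
    char* first = argv[0];
    argv[0] = argv[1];      // the operator becomes the command name
    argv[1] = first;        // the left operand becomes the first parameter
    return true;
}

// Sample commands for illustration: integer "+" and "-" on two parameters.
// Returns 0 for anything it does not recognise.
long runArithmetic(int argc, char** argv) {
    if (argc != 3) return 0;
    long a = std::strtol(argv[1], nullptr, 10);
    long b = std::strtol(argv[2], nullptr, 10);
    if (std::strcmp(argv[0], "+") == 0) return a + b;
    if (std::strcmp(argv[0], "-") == 0) return a - b;
    return 0;
}
```

Because each pair of parentheses holds exactly one command, operator precedence never arises, which is also why adding three values takes two nested calls.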

The same syntax can be used for comparisons…

100 if ( [ get greeting ] == "hello" ) goto 300
200 if ( [ get greeting ] == "hello" ) [ goto 300 ]

The quotes around “hello” are optional, but putting them in makes the program look more natural.

The word “goto” on line 100 is accepted by “if” as its second parameter, but otherwise ignored.

The command “goto” on line 200 is always executed, even if the condition is false. Can you guess why?

I think it would be a good compromise to allow the “goto” command to support named places within the code, rather than just use line numbers all of the time. This way, sections of the code can be divided up in the same way that “functions” are defined in other languages. This would make it easier for the programmer to remember where sections of the code are, rather than having to remember line numbers. This would require a little more C++ coding on the Arduino, but would make the scripting language much easier to use. Each line must still have a line number, as they are used when editing the script via the command line interface.

I propose to use the hash character “#” as the “command” to define a label. In many programming languages, the hash character indicates a comment, and so it is not executed by the interpreter. Labels also contain no executable code, so this feels intuitive to programmers, and is understood by many syntax highlighters.

The first parameter will be the name of the label, and must be unique within the script. More parameters may be supplied and will be ignored as comments. However, beware of using [brackets] or (brackets) within those parameters, as they will be executed as usual. It is recommended to enclose any comments in quotes.

110 set count 10
210 if ( [ get count ] == 0 ) goto 310
220 display [ get count ]
230 set count ( [ get count ] - 1 )
240 goto 210
310 display "Blast Off!"

100 # CountDown : This is the entry point to this sub-routine
110 set count 10
200 # CountDownLoop
210 if ( [ get count ] == 0 ) goto BlastOff
220 display [ get count ]
230 set count ( [ get count ] - 1 )
240 goto CountDownLoop
300 # BlastOff
310 display "Blast Off!"
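Resolving a named destination can be as simple as scanning the stored lines for a "#" command whose first parameter matches. The sketch below is an assumption about how this might look: the `StoredLine` layout, `findLabel` and `resolveGotoTarget` are hypothetical, and the stored text is assumed to have had its line number already stripped.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical layout of a stored script line.
struct StoredLine {
    int number;         // the line number typed by the user
    const char* text;   // the command text that followed it
};

// Returns the line number of the line "# name ...", or -1 if not found.
int findLabel(const StoredLine* program, int lineCount, const char* name) {
    std::size_t len = std::strlen(name);
    for (int i = 0; i < lineCount; ++i) {
        const char* p = program[i].text;
        if (*p != '#') continue;                    // not a label line
        ++p;
        if (*p != ' ' && *p != '\t') continue;      // '#' must be a word on its own
        while (*p == ' ' || *p == '\t') ++p;        // skip to the label name
        bool sameStart = std::strncmp(p, name, len) == 0;
        bool wordEnds = p[len] == '\0' || p[len] == ' ' || p[len] == '\t';
        if (sameStart && wordEnds) return program[i].number;
    }
    return -1;
}

// A goto target that starts with a digit is a line number; anything else is
// looked up as a label.
int resolveGotoTarget(const StoredLine* program, int lineCount, const char* target) {
    if (std::isdigit(static_cast<unsigned char>(*target))) return std::atoi(target);
    return findLabel(program, lineCount, target);
}
```

Scanning on every goto trades speed for RAM: nothing has to be kept in memory between jumps, at the cost of a pass over the script each time.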