Introduction
This book is dedicated to documenting and teaching the Hash programming language to newcomers.
Hash is an interpreted, garbage-collected, strongly and statically typed language.
Prerequisites about the book
Sections which have the 🚧 icon next to the title are likely to change or be rewritten in the future, because the language is still at an early stage.
Language features
This chapter covers everything you need to know about the Hash language to start writing code in it.
Name bindings
Basics
Name bindings are made up of three distinct components: the name, the type, and the value that is assigned to the name.
Variables are declared using the `:` and `=` symbols:
x: i32 = 3;
The `:` symbol is used to denote the type of a variable, and the `=` symbol is used to assign a value to it.
The type can be omitted, in which case it is inferred:
x := "Terence Tao"; // `x: str` inferred
x: str = "Terence Tao"; // same thing
x: i32 = "Terence Tao"; // Compile error: `str` is not assignable to `i32`.
Declaration and assignment can happen separately:
x: i32;
print(x); // Compile error: `x` might be uninitialised at this point!
x = 3;
print(x); // Ok
A variable declaration is an expression like any other, and returns the value of the variable. This means that you can write something like:
while (bytes_read := read_file(&buffer)) != 0 {
print("Read some bytes!");
}
Hash is statically typed, which means that variables cannot change types:
x := "Ha!";
x = Some("Baaa"); // Compile error: `Option<str>` is not assignable to `str`.
Mutability
By default, all bindings in Hash are constant.
To declare a mutable variable, put the `mut` keyword in front of a name binding.
(mut a, mut b) := (1, 2);
mut test := get_test();
modify_test(&mut test);
a += 2;
b += 3;
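To illustrate the difference, here is a small sketch. It assumes that modifying a binding declared without `mut` is rejected at compile time; the error wording in the comment is illustrative, not the compiler's actual message.

```hash
count := 0;
count += 1; // Compile error (illustrative): `count` is not declared as `mut`

mut total := 0;
total += 1;        // Ok: `total` is mutable
total = total * 2; // Ok: plain reassignment of a mutable binding
```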
Visibility
Visibility modifiers can be added to declarations inside modules or traits.
By default, all declarations in a module scope or `impl` block are private, while all declarations in a `trait` block are public.
To declare that a member is private, use `priv`; to declare that a member is public, use `pub`.
// Visible from outside:
pub foo := 3;
// Not visible from outside:
bar := 4;
Foo := trait {
// Visible from outside
foo: (Self) -> str;
// Not visible from outside
priv bar: (Self) -> str;
};
Grammar
The grammar for name bindings (and partial name bindings/reassignments) is as follows:
pattern = pattern_binding | ...(other patterns)
pattern_binding = ( "pub" | "priv" )? "mut"? identifier
name_binding =
| ( pattern ":=" expr ) // Declaration and assignment, infer type
| ( pattern ( ":" type )? "=" expr ) // Assignment
| ( pattern ( ":" type ) ) // Declaration
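Each alternative of the `name_binding` rule corresponds to a concrete statement form. The sketch below pairs each statement with the rule it uses; the names are made up for illustration, and the `pub mut` line assumes module scope, where visibility modifiers are allowed.

```hash
x := 3;         // pattern ":=" expr (type inferred)
y: i32 = 4;     // pattern ":" type "=" expr
z: i32;         // pattern ":" type (declaration only)
z = 5;          // pattern "=" expr (assigning the declared binding)
pub mut w := 6; // "pub" and "mut" are part of pattern_binding
```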
Functions
Overview
Hash places a lot of emphasis on functions, which are first-class citizens.
Functions can be assigned to name bindings in the same way as any other value, similar to lambdas in Python or arrow functions in JavaScript.
General syntax and notes
Functions are defined by being assigned to bindings:
func := (...args) => { ...body... };
// With a return type
func := (...args) -> return_ty => { ...body... };
The return type of a function is inferred from its body by default, but the `->` syntax can be used to declare it explicitly.
Function arguments are comma separated:
func := (arg0, arg1) => { ...body... };
// and you can optionally specify types:
func := (arg0: str, arg1: char) -> u32 => { ...body... };
Function types can be explicitly provided after the `:` in the declaration, in which case argument names do not have to be specified in the type.
var: (str, char) -> u32 = (arg0, arg1) => { ... };
Function literals can also specify default arguments, which must come after all required arguments:
func := (arg0: str, arg1: char = 'c') -> u32 => { ... };
// The type of the argument is also inferred if provided without a type
// annotation:
func := (arg0: str, arg1 = 'c') -> u32 => { ... };
Default arguments do not need to be specified when the function is called:
func := (a: str, b = 'c', c = 2) -> u32 => { ... };
func("foobar"); // a = "foobar", b = 'c', c = 2
// You can optionally provide the arguments in declaration order:
func("foobar", 'b', 3); // a = "foobar", b = 'b', c = 3
// Or you can provide them as named arguments:
func("foobar", c = 3); // a = "foobar", b = 'c', c = 3
Named arguments can be used to give more context when providing arguments, using the syntax `arg_name = arg_value`.
After the first named argument is provided, all following arguments must be named.
Furthermore, up until but excluding the first named argument, all previous $n$ arguments must be the first $n$ in the function definition.
For example:
foo := (a: str, b: str, c: str, d: str) => { .. };
foo("a", "b", "c", "d") // Allowed -- no arguments are named.
foo(a="a", b="b", c="c", d="d") // Allowed -- all arguments are named.
foo("a", "b", c="c", d="d") // Allowed -- first two arguments are positional, the rest are named.
foo(a="a", "b", c="c", d="d") // Not allowed -- argument b must be named if a is named.
foo("a", "b", c="c", "d") // Not allowed -- argument d must be named.
Grammar
The grammar for function definitions and function types is as follows:
function_param =
| ( ident ":=" expr ) // Declaration and assignment, infer type
| ( ident ( ":" type )? "=" expr ) // Assignment
| ( ident ( ":" type ) ) // Declaration
function_def = "(" ( function_param "," )* function_param? ")" ( "->" type )? ("=>" expr)
function_type = "(" ( function_param "," )* function_param? ")" "->" type
The grammar for function calls is as follows:
function_call_arg = expr | ( ident "=" expr )
function_call = expr "(" ( function_call_arg "," )* function_call_arg? ")"
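Because functions are values, they can be passed to other functions. The following sketch, using the hypothetical names `apply_twice` and `add_three`, combines a function-typed parameter (the `function_type` rule) with positional and named call arguments:

```hash
// `f` is a parameter whose type is a function from i32 to i32
apply_twice := (f: (i32) -> i32, x: i32) -> i32 => f(f(x));
add_three := (n: i32) -> i32 => n + 3;

result := apply_twice(add_three, 1);       // add_three(add_three(1)) = 7
same := apply_twice(f = add_three, x = 1); // the same call, with named arguments
print(result); // 7
```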
Primitives
There are the following primitive types:
- `u8`, `u16`, `u32`, `u64`: unsigned integers
- `i8`, `i16`, `i32`, `i64`: signed integers
- `f32`, `f64`: floating point numbers
- `usize`, `isize`: unsigned and signed pointer-sized integer types (for list indexing)
- `ibig`, `ubig`: unlimited size integers
- `bool`: boolean
- `str`: string, copy on write and immutable
- `[A]`: a list containing type `A`
- `{A}`: a set containing type `A`
- `{A:B}`: a map between type `A` and type `B`
- `(A, B, C)`: a tuple containing types `A`, `B` and `C`. Elements can be accessed by dot notation (`my_tuple.first`)
- `(a: A, b: B, c: C)`: a tuple which contains named members `a`, `b`, `c` with types `A`, `B`, `C` respectively.
- `void` or `()`: the empty tuple type. Has a corresponding `void`/`()` value.
- `never`: a type which can never be inhabited. Used, for example, for functions that never return, like `panic`, or an infinite loop.
Note: the list, map and set type syntax will most likely have to change eventually once literal types are introduced in the language.
Numbers
Numbers in Hash are like numbers in most other statically typed languages. They come in three variants: unsigned, signed, and floating point.
Floating point literals must include either a `.` or a scientific notation exponent, like `3.0`, `3e2`, `30e-1`, etc.
Host-sized integers
The primitives `usize` and `isize` are intended for list indexing.
This is because some systems (32-bit ones) may not be able to index a contiguous region of memory that is larger than the maximum 32-bit value.
So, the width of the `usize` and `isize` primitives depends on the host system.
Compile-time
In the future, Hash will support arbitrary code execution at compile time.
Considering this, the `usize` width of the host machine (on which the code is compiled) might differ from the `usize` width of the target machine (on which the code is executed).
To account for this, any `usize` that is calculated at compile time must be checked to fit within the target's `usize`.
This check happens at compile time, so there is no possibility of memory corruption or wrong data.
Unlimited-sized integers
The `ibig` and `ubig` number primitives are integer types with no upper bound (and, for `ibig`, no lower bound); they grow until the host's memory is exhausted.
These types are intended to be used for heavy mathematical problems, whose values may exceed the maximum 64-bit integer size.
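As a sketch of where `ubig` helps: the square of the largest `u64` value does not fit in 64 bits, but it can be held in a `ubig`. This assumes that integer literals and the `*` operator work on `ubig` the same way they do on fixed-width integers.

```hash
max_u64: ubig = 18446744073709551615; // 2^64 - 1, the largest u64 value
square := max_u64 * max_u64;          // 2^128 - 2^65 + 1, too large for any 64-bit type
print(square);
```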
Lists
Lists are denoted using square bracket syntax where the values are separated by commas.
Examples:
x := [1,2,3,4,5,6]; // multiple elements
y := [];
z := [1,]; // optional trailing comma
w: [u64] = [];
// ^^^^^
// type
Grammar for lists:
list_literal = "[" ( expr "," )* expr? "]"
list_type = "[" type "]"
Tuples
Tuples have a syntax that is familiar from many other languages:
- Empty tuples: `(,)` or `()`
- Singleton tuple: `(A,)`
- Many membered tuple: `(A, B, C)` or `(A, B, C,)`
Examples:
empty_tuple: (,) = (,);
// ^^^
// type
empty_tuple: () = ();
// ^^
// type
some_tuple: (str, u32) = ("string", 12);
// ^^^^^^^^^^
// type
It's worth noting that tuples are syntactic sugar for structures, and their members are indexed using numerical indices like 0, 1, 2, etc.
Although tuples are intended to be used mostly for pattern matching, you can access their members this way.
If you find yourself doing so, consider using a struct instead, which lets you do the same thing with named fields.
Read more about patterns here.
Grammar for tuples:
tuple_literal = ( "(" ( expr "," )* ")" ) | ( "(" ( expr "," )+ expr ")" )
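A short sketch of both ways to get at a tuple's members. The destructuring form follows the tuple pattern used in the Mutability section; the `.0`/`.1` syntax for positional access is an assumption based on the numerical indices described above.

```hash
pair: (str, u32) = ("string", 12);
// Preferred: destructure the tuple with a pattern
(label, count) := pair;
print(label);
// Possible, but a struct with named fields is usually clearer
first := pair.0;  // assumed positional access syntax
second := pair.1; // assumed positional access syntax
```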
Named tuples
Named tuples are tuples that specify field names for each field within the tuple. This can be done, for example, to have nested fields in structs without having to create another struct for each sub-type. For example:
Comment := struct(
contents: str,
anchor: (
start: u32,
end: u32
),
edited: bool,
author_id: str,
);
Then, you can create a `Comment` instance and access its fields like so:
comment := Comment(
contents = "Hello, world",
anchor = (
start = 2,
end = 4
),
edited = false,
author_id = "f9erf8g43"
);
print(comment.anchor.end - comment.anchor.start); // 2
To initialise a tuple that has named fields, this can be done like so:
anchor := (start := 1, end := 2); // `anchor: (start: u32, end: u32)` inferred
// This can also be done like so (but shouldn't be used):
anchor: (start: u32, end: u32) = (1, 2); // Warning: assigning unnamed tuple to named tuple.
Named tuples can be coerced into unnamed tuples if the type layout of both tuples matches. However, this is not recommended because specifically naming tuples implies that the type cares about the names of the fields rather than simply being a positionally structural type.
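Named tuples are also a lightweight way to return several named values from a function. In this sketch, `divide` is a hypothetical helper that returns both the quotient and the remainder of an integer division:

```hash
divide := (a: u32, b: u32) -> (quotient: u32, remainder: u32) => {
    (quotient := a / b, remainder := a % b)
};
result := divide(17, 5);
print(result.quotient);  // 3
print(result.remainder); // 2
```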
Sets
Sets in Hash represent unordered collections of values. The syntax for sets is as follows:
- Empty set: `{,}`
- Singleton set: `{A,}`
- Many membered set: `{A, B, C}` or `{A, B, C,}`
- Set type: `{A}`, for example `foo: {i32} = {1, 2, 3}`
set_literal = ( "{" "," "}" ) | ( "{" ( expr "," )+ "}" ) | ( "{" ( expr "," )+ expr "}" )
Map
Maps in Hash represent collections of key-value pairs.
Any type that implements the `Eq` and `Hash` traits can be used as the key type in a map.
The syntax for maps is as follows:
- Empty map: `{:}`
- Singleton map: `{A:1}` or `{A:1,}`
- Many membered map: `{A: 1, B: 2, C: 3}` or `{A: 1, B: 2, C: 3,}`
- Map type: `{K:V}`, for example `names: {str:str} = {"thom":"yorke", "jonny":"greenwood"}`
map_literal = ( "{" ":" "}" ) | ( "{" ( expr ":" expr "," )* ( expr ":" expr )? "}" )
Note: the grammar for literal types can be found in the Types section.
Conditional statements
Conditional statements in the Hash programming language are very similar to those in other languages such as Python, JavaScript, C and Rust. However, there is one subtle difference: the condition given to a conditional statement must always evaluate to an explicit boolean value.
If-else statements
If statements are very basic constructs in Hash. An example of a basic `if-else` statement is as follows:
// checking if the value 'a' evaluates to 'true'
if a {
    print("a");
} else {
    print("b");
}
// using a comparison operator
if b == 2 {
    print("b is " + conv(b));
} else {
    print("b is not " + conv(b));
}
This checks whether `a`, which must be a boolean, evaluates to `true`. If it does not, the block expression given after `else` is executed.
If you want multiple clauses, you can use the `else-if` syntax to define multiple conditional statements, like this:
if b == 2 {
    print("b is 2")
} else if b == 3 {
    print("b is 3")
} else {
    print("b isn't 2 or 3")
}
As mentioned in the introduction, the condition of a conditional statement must evaluate to an explicit boolean value. The `if` statement syntax in Hash will not infer a boolean value from a non-boolean expression. This design is motivated by the fact that, in many languages, common bugs and mistakes come from implicitly converting values to booleans in conditions.
An example of an invalid program is:
a: u8 = 12;
if a { // Error: `a` is a `u8`, not a boolean
    print("a")
}
Additional syntax
Furthermore, if you do not need an `else` branch, you can write:
if a {
    print("a")
} // if a is true, then execute
which is syntactic sugar for:
if a {
    print("a")
} else {}
Additionally, since the bodies of `if` statements are blocks, just like function bodies, an `if` expression evaluates to the value of its last expression. This means it can be used directly as the body of a function:
abs := (x: i64) -> i64 => if x < 0 { -x } else { x };
You can also assign the result of an `if` statement to a variable, since `if` statements are just blocks:
my_value: i32 = if some_condition == x { 3 } else { 5 };
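Since an `if` chain is itself an expression, `else if` branches can be used in value position too. A sketch with a hypothetical `sign` function:

```hash
sign := (x: i64) -> str => if x < 0 {
    "negative"
} else if x == 0 {
    "zero"
} else {
    "positive"
};
print(sign(-5)); // "negative"
print(sign(0));  // "zero"
```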
However, you cannot do something like this:
abs := (x: i64) -> i64 => if x < 0 { -x };
Note: this is not a syntax error, but you will get an error at a later stage, because the function has no value to return when `x < 0` is false: there is no `else` case.
If statements and Enums 🚧
You can destructure enum values within `if` statements using the `if-let` syntax, like so:
Result := <T, E> => enum(Ok(T), Err(E));
// mission critical, program should exit if it failed
result: Result<u16, str> = Ok(12);
if let Ok(value) = result {
    print("Got '" + conv(value) + "' from operation")
} else if let Err(e) = result {
    panic("Failed to get result: " + e);
}
Furthermore, for more complicated conditions, you can use an expression block, which is treated like a function body:
f: str = "file.txt";
if { a := open(f); is_ok(a) } {
    // run if is_ok(a) returns true
}
// the above statement can also be written as
a := open(f);
if is_ok(a) {
    // run if is_ok(a) returns true
}
The only difference between these two examples is that in the first, `a` is contained within the block of the `if` condition, i.e. the `a` variable is not visible to any outer scope. This is an advantage when you don't need to keep the result of the operation. If you do, you can use the second version, so that `a` is available within the `if` statement body or later on in the program.
Match cases
Match cases are one step above the simple `if-else` syntax. Using a match case, you can express more complicated cases in a more readable format than you can with an `if-else` statement. Additionally, you can destructure enums into their corresponding values. To use a match case, you do the following:
a := input<u8>();
m2 := match a {
    1 => "one";
    2 => "two";
    _ => "not one or two";
};
// Or as a function
convert := (x: u8) -> str => match x {
    1 => "one";
    2 => "two";
    _ => "not one or two";
};
m := convert(input<u8>());
The `_` case is a special wildcard case that matches any value. This is essentially synonymous with the `else` clause in languages like Python or JavaScript. By convention, it should be included in a `match` statement when the set of possible values is not reasonably bounded (like an integer). One subtle difference with the `match` syntax is that you must always explicitly define a `_` case. This language behaviour is designed to enforce that "explicit is better than implicit". So, if you know that a program should never hit the default case:
match x {
    1 => "one";
    2 => "two";
    _ => unreachable(); // we know that 'x' should always be 1 or 2
}
Note: You do not have to provide a default case if you have defined all the cases for a type (this mainly applies to enums).
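For example, an enum whose variants are all listed needs no `_` case. The `Direction` enum below is hypothetical; the sketch uses unqualified variant names in patterns and the qualified `Direction::...` form for values, mirroring the examples in the Enum types section.

```hash
Direction := enum(North, East, South, West);
turn_right := (d: Direction) -> Direction => match d {
    North => Direction::East;
    East => Direction::South;
    South => Direction::West;
    West => Direction::North;
    // no `_` case needed: every variant of `Direction` is covered
};
```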
Additionally, because cases are matched in order, if you write the following:
convert := (x: u8) -> str => match x {
    _ => "not one or two";
    1 => "one";
    2 => "two";
};
then `convert` will always return "not one or two", since the wildcard matches any value.
Match statements are also well suited to destructuring enum types in Hash. For example,
Result := <T, E> => enum(Ok(T), Err(E));
...
// mission critical, program should exit if it failed
result: Result<u16, str> = Ok(12);
match result {
    Ok(value) => print("Got '" + conv(value) + "' from operation");
    Err(e) => panic("Failed to get result: " + e);
}
To specify multiple conditions for a single case within a `match` statement, use the following syntax:
x: u32 = input<u32>();
match x {
    1 | 2 | 3 => print("x is 1, 2, or 3");
    4 | 5 | {2 | 4} => print("x is either 4, 5 or 6"); // `{2 | 4}` uses the bitwise or operator, giving 6
    _ => print("x is something else");
}
To specify more complex conditions within a match case, such as an additional condition that must also hold, you can use the `match-if` syntax, like so:
x: u32 = input<u32>();
y: bool = true;
match x {
    1 | 2 | 3 if y => print("x is 1, 2, or 3 and y is true");
    4 if y => print("x is 4 and y is true");
    {2 | 4} if y => print("x is 6 and y is true"); // using bitwise or operator
    _ => print("x is something else");
}
Loop constructs
Hash contains three distinct loop control constructs: `for`, `while` and `loop`. Each construct has a distinct use case, but they can often be used interchangeably, and the choice between them is merely one of style.
General
Each construct supports the basic `break` and `continue` loop control flow statements. These statements have the same properties as in many other languages like C, Rust, Python, etc.
- `break`: immediately terminates the loop and continues to any statement after the loop (if any).
- `continue`: immediately skips the rest of the current iteration of the loop body and moves on to the next iteration (if any). If no iterations remain, `continue` behaves just like `break`.
For loop
Basics
For loops are special loop control statements that are designed to be used with iterators.
For loops can be defined as:
for i in range(1, 10) { // range is a built-in iterator
    print(i);
}
Iterating over lists is also quite simple, using the `iter` function to convert the list into an iterator:
nums: [u32] = [1,2,3,4,5,6,7,8,9,10];
// using method-style (postfix) notation
for num in nums.iter() {
    print(num);
}
// using function call (prefix) notation
for num in iter(nums) {
    print(num);
}
Iterators
Iterators ship with the standard library, but you can define your own iterators via the Hash generic typing system.
A type `I` is an iterator of `T` if an implementation of `next<I, T>` is in the current scope.
So, for the example above, the `range` function is essentially a `RangeIterator` of the `u8`, `u16`, `u32`, ... types.
More details about generics are here.
While loop
Basics
While loops are identical to `while` constructs in other languages such as Java, C, JavaScript, etc.
The loop checks a given conditional expression (which must evaluate to a `bool`), and if it evaluates to `true`, the loop body is executed; otherwise the interpreter moves on. The loop body can also use loop control flow statements like `break` or `continue` to stop looping early under a given condition.
While loops can be defined as:
mut c: u32 = 0;
while c < 10 {
    print(c);
    c += 1;
}
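Here is a slightly more practical sketch of a `while` loop, which sums the decimal digits of a number by repeatedly taking the last digit with `%` and dropping it with `/=`:

```hash
mut n: u32 = 1234;
mut digit_sum: u32 = 0;
while n > 0 {
    digit_sum += n % 10; // add the last digit
    n /= 10;             // drop the last digit
}
print(digit_sum); // 10
```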
The `loop` keyword is equivalent to writing a `while` loop whose conditional expression always evaluates to `true`, like so:
while true {
    // do something
}
// is the same as...
loop {
    // do something
}
Note: In Hash, you cannot write `do-while` loops, but if you want to write a loop that behaves like a `do-while` statement, here is a good example using `loop`:
loop {
    // do something here, to enable a condition check,
    // and then use an if statement, or a match case, to check
    // if you need to break out of the loop.
    if !condition { break }
}
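As a concrete instance of this pattern, the sketch below always runs its body at least once and only then checks whether to continue, which is the behaviour of a `do-while` loop; here it prints three times.

```hash
mut attempts: u32 = 0;
loop {
    attempts += 1;
    print("attempt " + conv(attempts));
    // the condition is checked after the body, like a do-while loop
    if !(attempts < 3) { break }
}
```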
Expression blocks and behaviour
Furthermore, the looping condition can also be a block, which means it can contain any number of expressions before the final expression. For example:
while { c += 1; c < 10 } {
    print(c);
}
It is worth noting that the looping condition, whether a block or not, must explicitly have the boolean type. For example, the code below will fail typechecking:
c: u32 = 100;

while c -= 1 {
    ...
}
Running the following code snippet produces the following error:
error[0052]: Failed to Typecheck: Mismatching types.
--> 3:7 - 3:12
1 | c: u32 = 100;
2 |
3 | while c -= 1 {
| ^^^^^^ Expression does not have a 'boolean' type
|
= note: The type of the expression was `(,)` but expected an explicit `boolean`.
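One way to satisfy the type checker here, sketched below, is to move the decrement into a block condition whose final expression is a comparison, so that the condition has the boolean type:

```hash
mut c: u32 = 100;
while { c -= 1; c > 0 } {
    // the body runs 99 times, for c = 99 down to 1
}
```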
Loop
The loop construct is the simplest of the three. The basic syntax for a loop is as follows:
mut c: u64 = 1;
loop {
    print("I looped " + conv(c) + " times!");
    c += 1;
} // without a break, this loop never ends
You can also use conditional statements within the loop body (which is equivalent to a function body), like so:
mut c: u64 = 1;
loop {
    if c > 10 { break }
    print("I looped " + conv(c) + " times!");
    c += 1;
} // this prints 10 times, for c = 1 to 10
mut c: u64 = 0;
loop {
    c += 1;
    if c > 10 { break }
    if c % 2 != 0 { continue }
    print("I loop and I print when I get a " + conv(c));
} // the body runs for c = 1 to 10, but only prints for the even values
Operators & Symbols
This section contains all of the syntactic operators that are available within Hash.
General operators 🚧
Here are the general operators for arithmetic, bitwise and assignment operations. This table does not include all of the possible operators specified within the grammar; there are more operators that relate to a specific group of operations or are used to convey meaning within the language.
Operator | Example | Description | Overloadable trait |
---|---|---|---|
== , != | a == 2 , b != 'a' | Equality | eq |
= | a = 2 | Assignment | N/A |
! | !a | Logical not | not |
&& | a && b | Logical and | and |
\|\| | a \|\| b | Logical or | or |
+ | 2 + 2 , 3 + b | Addition | add |
- | 3 - a | Subtraction | sub |
- | -2 | Negation | neg |
* | 3 * 2 , 2 * c | Multiplication | mul |
^^ | 3 ^^ 2 , 3 ^^ 2.3 | Exponentiation | exp |
/ | 4 / 2 , a / b | Division | div |
% | a % 1 | Modulo | mod |
<< | 4 << 1 | Bitwise left shift | shl |
>> | 8 >> 1 | Bitwise right shift | shr |
& | 5 & 4 , a & 2 | Bitwise and | andb |
\| | a \| 2 | Bitwise or | orb |
^ | 3 ^ 2 | Bitwise exclusive or | xorb |
~ | ~2 | Bitwise not | notb |
>= , <= , < , > | 2 < b , c >= 3 | Order comparison | ord |
+= | x += y | Add with assignment | add_eq |
-= | x -= 1 | Subtract with assignment | sub_eq |
*= | b *= 10 | Multiply with assignment | mul_eq |
/= | b /= 2 | Divide with assignment | div_eq |
%= | a %= 3 | Modulo with assignment | mod_eq |
&&= | b &&= c | Logical and with assignment | and_eq |
>>= | b >>= 3 | Bitwise right shift with assignment | shr_eq |
<<= | b <<= 1 | Bitwise left shift with assignment | shl_eq |
\|\|= | b \|\|= c | Logical or with assignment | or_eq |
&= | a &= b | Bitwise and with assignment | andb |
\|= | b \|= SOME_CONST | Bitwise or with assignment | orb |
^= | a ^= 1 | Bitwise xor with assignment | xorb |
. | a.foo | Struct/Tuple enum property accessor | N/A |
: | {2: 'a'} | Map key-value separator | N/A |
:: | io::open() | Namespace symbol access | N/A |
as | t as str | Type assertion | N/A |
@ | N/A | Pattern value binding | N/A |
... | N/A | Spread operator (Not-implemented) | range ? |
; | expression; | statement terminator | N/A |
? | k<T> where s<T, ?> := ... | Type argument wildcard | N/A |
-> | (str) -> usize | Function return type notation | N/A |
=> | (a) => a + 2 | Function Body definition | N/A |
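A few of the arithmetic, bitwise and assignment operators from the table in action. The values in the comments follow from the usual integer semantics these operators are described with; the integer type of the literals is left to inference.

```hash
print(7 % 3);  // 1
print(2 ^^ 3); // 8
print(6 & 3);  // 2
print(6 | 3);  // 7
print(6 ^ 3);  // 5
print(1 << 4); // 16
print(8 >> 1); // 4
mut x := 10;
x += 5; // x is now 15
x %= 4; // x is now 3
```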
Comments 🚧
This table represents the syntax for different types of comments in Hash:
Symbol | Description |
---|---|
//... | Line comment |
/*...*/ | Block comment |
/// | function doc comment 🚧 |
//! | module doc comment 🚧 |
Type Signature Assertions
Basics
As in many other languages, the programmer can specify the type of a variable or a literal using some special syntax. For example, in languages such as TypeScript, you can write:
some_value as string
which asserts that `some_value` is a string. This is essentially a way to avoid explicitly stating the type of a variable every single time, by telling the compiler "Trust me, `some_value` is a string".
The principle is somewhat similar in Hash, but it is more strictly enforced.
For example, within the statement `x := 37;`, the type of `x` can be any of the integer types. This might lead to unexpected behaviour in later statements, where the compiler has decided the type of `x` (which might not be what you intended).
So, you can either declare `x` to be some integer type explicitly, like so:
x: u32 = 37;
Or you can use `as` to state the type of an expression, which the compiler will then use for inference, like so:
x := 37 as u32;
Failing type assertions
If you specify a type assertion, the compiler infers the type of the expression on the left-hand side of the `as` operator and compares it with the type on the right-hand side. If the two types differ, this raises a typechecking failure.
For example, if you were to specify the expression:
"A" as char
The compiler will report this error as:
error[0001]: Types mismatch, got a `str`, but wanted a `char`.
--> <interactive>:1:8
1 | "A" as char
| ^^^^ This specifies that the expression should be of type `char`
--> <interactive>:1:1
1 | "A" as char
| ^^^ Found this to be of type `str`
Usefulness
Why are type assertions needed when there is already type inference within the language? Sometimes the type inference system does not have enough information to infer the types of variables and declarations. This can happen when dealing with generic functions, so it can sometimes be useful to assert to the compiler that a given variable is of a certain type.
Additionally, while the language is at an early stage of maturity and some things are quirky or broken, type assertions can come to the rescue and help the compiler understand your program.
In general, type assertions should be used when the compiler cannot infer the type of some expression with the given information and needs assistance. You shouldn't need to use type assertions often.
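A typical case where inference lacks information is an empty literal: nothing in `[]` says what the element type is. The sketch below supplies that information with an assertion; it assumes `as` can be applied to a list literal with a list type, which the `type` grammar permits but the text above does not show directly.

```hash
// On its own, the element type of `[]` cannot be inferred
scores := [] as [u32];
// Equivalent, using an explicit annotation instead of an assertion
more_scores: [u32] = [];
```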
Types
Grammar
type =
| tuple_type
| list_type
| set_type
| map_type
| grouped_type
| named_type
| function_type
| type_function_call
| type_function
| merge_type
| union_type
| ref_type
tuple_type = ( "(" ( type "," )* ")" ) | ( "(" ( type "," )+ type ")" )
list_type = "[" type "]"
map_type = "{" type ":" type "}"
set_type = "{" type "}"
grouped_type = "(" type ")"
named_type = access_name
function_type_param = type | ( ident ":" type )
function_type = "(" ( function_type_param "," )* function_type_param? ")" "->" type
type_function_call_arg = type | ( ident "=" type )
type_function_call = ( grouped_type | named_type ) "<" ( type_function_call_arg "," )* type_function_call_arg? ">"
type_function_param = ident ( ":" type )? ( "=" type )?
type_function = "<" ( type_function_param "," )* type_function_param? ">" "->" type
merge_type = ( type "~" )+ type
union_type = ( type "|" )+ type
ref_type = "&" ( "raw" )? ( "mut" )? type
Struct types
In Hash, structs are pre-defined collections of heterogeneous types, similar to C or Rust:
FloatVector3 := struct(
    x: f32,
    y: f32,
    z: f32,
);
A struct is comprised of a set of fields. Each field has a name, a type, and an optional default value.
Structs can be instantiated with specific values for each of the fields. Default values can be omitted, but can also be overridden.
Dog := struct(
    age: u32 = 42,
    name: str,
);
d := Dog(name = "Bob");
print(d); // Dog(name = "Bob", age = 42)
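The text above notes that default values can also be overridden. A sketch of that, reusing the `Dog` struct; `puppy` is an illustrative name:

```hash
puppy := Dog(name = "Rex", age = 1); // overrides the default age of 42
print(puppy.age);  // 1
print(puppy.name); // "Rex"
```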
Structs are nominal types.
An argument of type `Dog` can only be fulfilled by an instance of `Dog`; you can't pass in a struct that has the same fields but is of a different named type.
dog_name := (dog: Dog) => dog.name;
FakeDog := struct(
    age: u32 = 42,
    name: str,
);
print(dog_name(d)); // "Bob"
print(dog_name(FakeDog(age = 1, name = "Max"))); // Error: Type mismatch: was expecting `Dog`, got `FakeDog`.
Enum types
Hash enums are similar to Rust enums or Haskell data types. Each variant of an enum can also hold some data. These are also known as algebraic data types, or tagged unions.
NetworkError := enum(
    NoBytesReceived,
    ConnectionTerminated,
    Unexpected(message: str, code: i32),
);
Enum contents consist of a comma-separated list of variants. Each variant can be paired with some data, in the form of a parenthesised, comma-separated list of fields.
err := NetworkError::Unexpected("something went terribly wrong", 32);
They can be `match`ed to discover what they contain:
handle_error := (error: NetworkError) => match error {
    NoBytesReceived => print("No bytes received, stopping");
    ConnectionTerminated => print("Connection was terminated");
    Unexpected(message, code) => print("An unexpected error occurred: " + message + " (" + conv(code) + ")");
};
Like structs, enums are nominal types, rather than structural. Each enum member is essentially a struct type.
Generic types
Because Hash supports type functions, structs and enums can be generic over some type parameters:
LinkedList := <T> => struct(
    head: Option<&raw T>,
);
empty_linked_list := <T> => () -> LinkedList<T> => {
    LinkedList(head = None)
};
x := empty_linked_list<i32>(); // x: LinkedList<i32> inferred
Notice that `struct(...)` and `enum(...)` are expressions, which are bound to names on the left-hand side.
For more information, see type functions.
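As a further sketch, a generic struct can take more than one type parameter. `Pair` and its fields are hypothetical names, and passing the type arguments explicitly (as in `empty_linked_list<i32>()` above) is assumed to work for struct constructors too.

```hash
Pair := <A, B> => struct(
    first: A,
    second: B,
);
p := Pair<str, u32>(first = "answer", second = 42); // p: Pair<str, u32>
print(p.second); // 42
```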
Grammar
The grammar for struct definitions is as follows:
struct_member =
| ( ident ":=" expr ) // Declaration and assignment, infer type
| ( ident ( ":" type )? "=" expr ) // Assignment
| ( ident ( ":" type ) ) // Declaration
struct_def = "struct" "(" ( struct_member "," )* struct_member? ")"
The grammar for enum definitions is as follows:
enum_member =
| ident // No fields
| ident "(" ( struct_member "," )* struct_member? ")" // With fields
enum_def = "enum" "(" ( enum_member "," )* enum_member? ")"
Hash language modules
A module in Hash can contain variable definitions, function definitions, type definitions or include other modules.
Each `.hash` source file is a module, and inline modules can also be created using `mod` blocks.
Importing
Given the project structure:
.
├── lib
│ ├── a.hash
│ ├── b.hash
│ └── sub
│ └── c.hash
└── main.hash
Modules in Hash allow a source to be split up into smaller code fragments, for better source code organisation and maintenance.
You can import modules by specifying the path relative to the current path.
For example, if you wanted to include the modules `a`, `b` and `c` within your main file:
// main.hash
a := import("lib/a");
b := import("lib/b");
c := import("lib/sub/c");
By doing so, you place everything that is defined within each of those modules under its own namespace (`a`, `b` and `c` respectively).
Exporting
In order to export items from a module, use the pub keyword.
For example:
// a.hash
// Visible from outside:
pub a := 1;
// Not visible from outside (priv by default):
b := 1;
// Not visible from outside:
priv c := 1;

// b.hash
{ a } := import("a.hash"); // Ok
{ b } := import("a.hash"); // Error: b is private
{ c } := import("a.hash"); // Error: c is private
Referencing exports
Furthermore, suppose the a module contained a public structure definition like Point:
// a.hash
pub Point := struct(
    x: u32,
    y: u32,
);
Within main, you can create a new Point by doing the following:
// main.hash
a := import("lib/a");

p1 := a::Point(
    x = 2,
    y = 3,
);

print(p1.x); // 2
print(p1.y); // 3
In this example, the :: item access operator is used to reference exports from the module.
You may also want to import only a specific definition from a module, such as the Point structure from module a.
You can do so by destructuring the definitions using the following syntax:
{ Point } := import("lib/a");
p1 := Point(x=2, y=3);
If a member of your current module already uses that name, you can rename the exported members using the as pattern operator:
{ Point as LibPoint } := import("lib/a");
p1 := LibPoint(x=2, y=3);
Inline modules
Besides .hash files, modules can also be created inline using mod blocks:
// a.hash
bar := 3;
pub nested := mod {
    pub Colour := enum(Red, Green, Blue);
};

// b.hash
a := import("a.hash");
red := a::nested::Colour::Red;
These follow the same conventions as .hash files, and members need to be exported with pub in order to be visible from the outside.
However, the items of the enclosing module are always visible from within a mod block, so in the above example, bar can be used from within nested.
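Rust's inline modules behave similarly, with one difference worth noting. A child module in Rust can read its parent's private items, but it has to name them through `super::`, whereas the text above says Hash makes them visible directly. A minimal Rust sketch (not Hash) mirroring the `nested` example, where `bar_from_nested` is a hypothetical helper added for illustration:

```rust
// Rust analogue (not Hash) of the inline-module example.
mod a {
    const BAR: i32 = 3; // private to `a`, but readable by its child modules

    pub mod nested {
        #[derive(Debug, PartialEq)]
        pub enum Colour {
            Red,
            Green,
            Blue,
        }

        // Hypothetical helper: a child module reading its parent's private item.
        pub fn bar_from_nested() -> i32 {
            super::BAR
        }
    }
}

fn main() {
    let red = a::nested::Colour::Red; // like `a::nested::Colour::Red` in Hash
    println!("{:?} {}", red, a::nested::bar_from_nested());
}
```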
Grammar
The grammar for file modules is as follows:
file_module = ( expr ";" )*
The grammar for mod
blocks (which are expressions) is as follows:
mod_block = "mod" "{" ( expr ";" )* "}"
Patterns
Pattern matching is a very big part of Hash and central to the productivity of the language.
Patterns are a declarative form of equality checking, similar to patterns in Rust or Haskell.
Pattern matching within match statements is covered in more detail in the Conditional statements section of the book.
This chapter documents the various kinds of patterns available in Hash.
Literal patterns
Literal patterns are patterns that match a specific value of a primitive type, like a number or a string. For example, consider the following snippet of code:
foo := get_foo(); // foo: i32
match foo {
    1 => print("Got one");
    2 => print("Got two");
    3 => print("Got three");
    _ => print("Got something else");
}
On the left-hand side of the match cases there are the literal patterns 1, 2 and 3.
These perform foo == 1, foo == 2 and foo == 3 in sequence, and the code follows the branch which succeeds first.
If no branch succeeds, the _ branch is followed, which means "match anything".
Literals can be integer literals for integer types (signed or unsigned), string literals for the str type, or character literals for the char type:
match my_char {
    'A' => print("First letter");
    'B' => print("Second letter");
    x => print("Letter is: " + conv(x));
}

match my_str {
    "fizz" => print("Multiple of 3");
    "buzz" => print("Multiple of 5");
    "fizzbuzz" => print("Multiple of 15");
    _ => print("Not a multiple of 3 or 5");
}
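Rust's literal patterns work the same way, so here is a Rust version (not Hash) of the three snippets above for comparison. The functions return strings instead of printing so the results can be checked, and the function names are hypothetical:

```rust
// Rust analogue (not Hash): literal patterns on integers, chars and strings.
// Arms are tried top to bottom; `_` or a plain binding catches the rest.
fn describe_number(foo: i32) -> &'static str {
    match foo {
        1 => "Got one",
        2 => "Got two",
        3 => "Got three",
        _ => "Got something else",
    }
}

fn describe_char(my_char: char) -> String {
    match my_char {
        'A' => "First letter".to_string(),
        'B' => "Second letter".to_string(),
        x => format!("Letter is: {}", x), // binds any other char
    }
}

fn describe_str(my_str: &str) -> &'static str {
    match my_str {
        "fizz" => "Multiple of 3",
        "buzz" => "Multiple of 5",
        "fizzbuzz" => "Multiple of 15",
        _ => "Not a multiple of 3 or 5",
    }
}

fn main() {
    println!("{}", describe_number(2));
    println!("{}", describe_char('Z'));
    println!("{}", describe_str("buzz"));
}
```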
Binding patterns
Nested values within the value being pattern matched can be bound to symbols, using binding patterns. A binding pattern is any valid Hash identifier:
match fallible_operation() { // fallible_operation: () -> Result<f32, i32>
    Ok(success) => print("Got success " + conv(success)); // success: f32
    Err(failure) => print("Got failure " + conv(failure)); // failure: i32
}
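The same binding patterns inside `Ok(..)` and `Err(..)` can be written in Rust (not Hash) as follows. `fallible_operation` is a hypothetical stand-in. It takes a flag so that both branches can be exercised:

```rust
// Rust analogue (not Hash): binding patterns nested inside Result variants.
// `fallible_operation` is hypothetical and returns `Result<f32, i32>`.
fn fallible_operation(succeed: bool) -> Result<f32, i32> {
    if succeed { Ok(1.5) } else { Err(404) }
}

fn report(result: Result<f32, i32>) -> String {
    match result {
        Ok(success) => format!("Got success {}", success),  // success: f32
        Err(failure) => format!("Got failure {}", failure), // failure: i32
    }
}

fn main() {
    println!("{}", report(fallible_operation(true)));
    println!("{}", report(fallible_operation(false)));
}
```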
Tuple patterns
Tuple patterns match a tuple type of some given arity, and contain nested patterns. They are irrefutable if their inner patterns are irrefutable, so they can be used in declarations.
Cat := struct(name: str);

// Creating a tuple:
my_val := (Cat("Bob"), [1, 2, 3]); // my_val: (Cat, [i32])

// Tuple pattern:
(Cat(name), elements) := my_val;

assert(name == "Bob");
assert(elements == [1, 2, 3]);
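Rust (not Hash) also accepts irrefutable tuple patterns in a `let`, including nested struct patterns. The sketch below uses a `Vec<i32>` in place of Hash's `[i32]`. `unpack` is a hypothetical helper that makes the result testable:

```rust
// Rust analogue (not Hash): an irrefutable tuple pattern in `let`,
// with a nested struct pattern that binds the `name` field.
struct Cat {
    name: String,
}

fn unpack(my_val: (Cat, Vec<i32>)) -> (String, Vec<i32>) {
    let (Cat { name }, elements) = my_val;
    (name, elements)
}

fn main() {
    let my_val = (Cat { name: "Bob".to_string() }, vec![1, 2, 3]);
    let (name, elements) = unpack(my_val);
    assert_eq!(name, "Bob");
    assert_eq!(elements, vec![1, 2, 3]);
}
```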
Constructor patterns
Constructor patterns are used to match the members of structs or enum variants. A struct is comprised of a single constructor, while an enum might be comprised of multiple constructors. Struct constructors are irrefutable if their inner patterns are irrefutable, while enum constructors are irrefutable only if the enum contains a single variant. For example:
Option := <T> => enum(Some(value: T), None);

my_val := Some("haha");
match my_val {
    // Matching the Some(..) constructor
    Some(inner) => assert(inner == "haha"); // inner: str
    // Matching the None constructor
    None => assert(false);
}
The names of the members of a constructor need to be specified if the matching isn't done in order:
Dog := struct(name: str, breed: str);

Dog(breed = dog_breed, name = dog_name) := Dog(
    name = "Bob",
    breed = "Husky",
); // dog_breed: str, dog_name: str

// Same as:
Dog(name, breed) := Dog(
    name = "Bob",
    breed = "Husky",
); // breed: str, name: str
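Rust's struct patterns (not Hash) also name their fields, so order does not matter, and `field: binding` renames a field much like `breed = dog_breed` above. Enum constructors are refutable, so they go in a `match`. The function names below are hypothetical:

```rust
// Rust analogue (not Hash): constructor patterns.
struct Dog {
    name: String,
    breed: String,
}

fn describe(dog: Dog) -> String {
    // Irrefutable struct pattern: fields listed out of order and renamed.
    let Dog { breed: dog_breed, name: dog_name } = dog;
    format!("{} is a {}", dog_name, dog_breed)
}

// Refutable enum constructors are matched one arm per variant.
fn unwrap_or_default(value: Option<&str>) -> &str {
    match value {
        Some(inner) => inner,
        None => "default",
    }
}

fn main() {
    let bob = Dog { name: "Bob".to_string(), breed: "Husky".to_string() };
    println!("{}", describe(bob));
    println!("{}", unwrap_or_default(Some("haha")));
}
```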
List patterns
A list pattern can match elements at certain positions of a list by using the following syntax:
match arr {
    [a, b] => print(conv(a) + " " + conv(b));
    _ => print("Other"); // Matches everything other than [X, Y] for some X and Y
}
The ... spread operator can be used to capture or ignore the rest of the elements of the list at some position:
match arr {
    [a, b, ...] => print(conv(a) + " " + conv(b));
    _ => print("Other"); // Only matches [] and [X] for some X
}
If you want to match the remaining elements with some pattern, you can specify a pattern after the spread operator like so:
match arr {
    [a, b, ...rest] => print(conv(a) + " " + conv(b) + " " + conv(rest));
    [...rest, c] => print(conv(c)); // Only matches [X] for some X, rest is always []
    _ => print("Other"); // Only matches []
}
One obvious limitation of the spread operator is that it can only be used once in a list pattern.
For example, the following pattern will be reported as an error by the compiler:
[..., a, ...] := arr;
error: Failed to typecheck:
--> 1:6 - 1:9, 1:15 - 1:18
|
1 | [..., a, ...] := arr;
| ^^^ ^^^
|
= You cannot use multiple spread operators within a single list pattern.
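Rust's slice patterns (Rust, not Hash) are a close analogue. There, `..` plays the role of `...`, `rest @ ..` binds the remaining elements like `...rest`, and Rust likewise allows `..` only once per slice pattern. `classify` is a hypothetical function name:

```rust
// Rust analogue (not Hash): slice patterns with a rest pattern.
fn classify(arr: &[i32]) -> String {
    match arr {
        [a, b] => format!("exactly two: {} {}", a, b),
        [a, b, rest @ ..] => format!("{} {} then {:?}", a, b, rest), // three or more
        [.., c] => format!("only one: {}", c), // only reachable for [X]
        [] => "empty".to_string(),
    }
}

fn main() {
    println!("{}", classify(&[1, 2]));
    println!("{}", classify(&[1, 2, 3, 4]));
    println!("{}", classify(&[9]));
    println!("{}", classify(&[]));
}
```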
Module patterns
Module patterns are used to match members of a module. They are used when importing symbols from other modules. They follow a simple syntax:
// imports only a and b from the module
{a, b} := import("./my_lib");

// imports c as my_c, and d from the module.
{c as my_c, d} := import("./other_lib");

// imports Cat from the nested module as NestedCat
{Cat as NestedCat} := mod {
    pub Cat := struct(name: str, age: i32);
};
You do not need to list all the members of a module in the pattern; the members which are not listed will be ignored. To read more about modules, see the Hash language modules section.
Or-patterns
Or-patterns are specified using the | pattern operator, and allow one to match multiple different patterns, using the one which succeeds.
For example:
symmetric_result: Result<str, str> = Ok("bilbobaggins");
(Ok(inner) | Err(inner)) := symmetric_result; // inner: str
The pattern above is irrefutable because it matches all variants of the Result enum.
Furthermore, each branch has the binding inner, which always has the type str, so it is a valid pattern.
The same name binding can appear in multiple branches of an or-pattern, given that it is bound in every branch, and always to the same type.
Another use-case of or-patterns is to collapse match cases:
match color {
    Red | Blue | Green => print("Primary additive");
    Cyan | Magenta | Yellow => print("Primary subtractive");
    _ => print("Unimportant color");
}
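Rust (not Hash) has the same two uses of or-patterns, with the same rule that every alternative binds the same names at the same types. Like Hash, Rust requires a top-level or-pattern in a `let` to be wrapped in parentheses. The `Color` enum and the function names here are hypothetical:

```rust
// Rust analogue (not Hash): or-patterns to collapse arms, and an
// irrefutable or-pattern in `let` (parentheses are required).
enum Color { Red, Green, Blue, Cyan, Magenta, Yellow, Black }

fn kind(color: Color) -> &'static str {
    match color {
        Color::Red | Color::Blue | Color::Green => "Primary additive",
        Color::Cyan | Color::Magenta | Color::Yellow => "Primary subtractive",
        _ => "Unimportant color",
    }
}

fn inner_of(symmetric_result: Result<String, String>) -> String {
    // Both variants are covered and both bind `inner: String`.
    let (Ok(inner) | Err(inner)) = symmetric_result;
    inner
}

fn main() {
    println!("{}", kind(Color::Magenta));
    println!("{}", inner_of(Ok("bilbobaggins".to_string())));
}
```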
Conditional patterns
Conditional patterns allow one to specify further arbitrary boolean conditions to a pattern for it to match:
match my_result {
    Ok(inner) if inner > threshold * 2.0 => {
        print("Phew, above twice the threshold");
    };
    Ok(inner) if inner > threshold => {
        print("Phew, above the threshold but cutting it close!");
    };
    Ok(inner) => {
        print("The result was successful but the value was below the threshold");
    };
    Err(_) => {
        print("The result was unsuccessful... Commencing auto-destruct sequence.");
        auto_destruct();
    };
}
They are specified using the if keyword after a pattern.
Conditional patterns are always refutable, at least as far as the current version of the language is concerned.
With more advanced type refinement and literal types, this restriction may be lifted in some cases.
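Rust calls these match guards, and they behave the same way: arms are tried from top to bottom, so the stricter guard has to come first. A Rust sketch (not Hash) of the example above follows. `threshold` is passed in as a parameter, and strings are returned instead of printing:

```rust
// Rust analogue (not Hash): patterns with `if` guards.
fn assess(my_result: Result<f64, String>, threshold: f64) -> &'static str {
    match my_result {
        Ok(inner) if inner > threshold * 2.0 => "above twice the threshold",
        Ok(inner) if inner > threshold => "above the threshold",
        Ok(_) => "successful but below the threshold",
        Err(_) => "unsuccessful",
    }
}

fn main() {
    println!("{}", assess(Ok(15.0), 10.0));
}
```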
Pattern grouping
Patterns can be grouped using parentheses ().
This is necessary in declarations, for example, if one wants to specify an or-pattern:
// get_value: () -> bool;
true | false := get_value(); // Error: bitwise-or not implemented between `bool` and `void`
(true | false) := get_value(); // Ok
Grammar
The grammar for patterns is as follows:
pattern =
| single_pattern
| or_pattern
single_pattern =
| binding_pattern
| constructor_pattern
| tuple_pattern
| module_pattern
| literal_pattern
| list_pattern
or_pattern = ( single_pattern "|" )+ single_pattern
binding_pattern = identifier
tuple_pattern_member = identifier | ( identifier "=" single_pattern )
constructor_pattern = access_name ( "(" ( tuple_pattern_member "," )* tuple_pattern_member? ")" )?
tuple_pattern =
| ( "(" ( tuple_pattern_member "," )+ tuple_pattern_member? ")" )
| ( "(" tuple_pattern_member "," ")" )
module_pattern_member = identifier ( "as" single_pattern )?
module_pattern = "{" ( module_pattern_member "," )* module_pattern_member? "}"
literal_pattern = integer_literal | string_literal | character_literal | float_literal
list_pattern_member = pattern | ( "..." identifier? )
list_pattern = "[" ( list_pattern_member "," )* list_pattern_member? "]"
Traits and implementations
Traits
Hash supports compile-time polymorphism through traits. Traits are a core mechanism in Hash; they allow for different implementations of the same set of operations, for different types. They are similar to traits in Rust, type-classes in Haskell, and protocols in Swift. For example:
Printable := trait {
print: (Self) -> void;
};
The above declares a trait called Printable which has a single associated function print.
The special type Self denotes the type for which the trait is implemented.
Traits can be implemented for types in the following way:
Dog := struct(
name: str,
age: i32,
);
Dog ~= Printable {
// `Self = Dog` inferred
print = (self) => io::printf(f"Doge with name {self.name} and age {self.age}");
};
Now a Dog can be used wherever a type with the bound Printable is expected.
The ~= operator is the combination of the ~ and = operators, and it is equivalent to
Dog = Dog ~ Printable { ... };
The ~ operator means "attach", and it is used to attach implementations of traits to structs and enums.
Trait implementations can be created without having to attach them to a specific type:
DogPrintable := Printable {
Self = Dog, // Self can no longer be inferred, it needs to be explicitly specified.
print = (self) => io::printf(f"Doge with name {self.name} and age {self.age}");
};
doge := Dog(..);
DogPrintable::print(doge); // Trait implementations can be called explicitly like this
Dog ~= DogPrintable; // `DogPrintable` can be attached to `Dog` as long as `DogPrintable::Self = Dog`.
// Then you can also do this, and it will be resolved to `DogPrintable::print(doge)`:
doge.print();
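For comparison, the nearest Rust equivalent (Rust, not Hash) is a trait plus an `impl ... for` block. Rust has no standalone named implementation like `DogPrintable`. Both calling styles shown above, the explicit path call and method syntax, do exist in Rust. Here `print` returns a `String` instead of printing, an assumption made for testability:

```rust
// Rust analogue (not Hash) of `Printable` and `Dog ~= Printable { ... }`.
trait Printable {
    fn print(&self) -> String;
}

struct Dog {
    name: String,
    age: i32,
}

impl Printable for Dog {
    fn print(&self) -> String {
        format!("Doge with name {} and age {}", self.name, self.age)
    }
}

fn main() {
    let doge = Dog { name: "Bob".to_string(), age: 3 };
    println!("{}", Printable::print(&doge)); // explicit call through the trait
    println!("{}", doge.print());            // method syntax
}
```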
Traits can also be generic over other types:
Sequence := <T> => trait {
at: (self, index: usize) -> Option<T>;
slice: (self, start: usize, end: usize) -> Self;
};
List := <T> => struct(...);
// For List (of type `<T: type> -> type`) implement Sequence (of type `<T: type> -> trait`):
// This will be implemented for all `T`.
List ~= Sequence;
Notice that in addition to traits, type functions returning traits can also be implemented for other type functions returning types. This is possible as long as both functions on the left hand side and right hand side match:
SomeTrait := <T> => trait {
Self: type; // Restrict what `Self` can be
...
};
// Allowed: `<T: type> -> trait` attachable to `<T: type> -> type`.
(<T> => SomeType) ~= (<T> => SomeTrait<T> {...});
// Not allowed: `<T: type> -> trait` is not attachable to `type`
SomeType ~= (<T> => SomeTrait<T> {...});
// Not allowed: `trait` is not attachable to `<T: type> -> type` because
// `SomeTraitImpl::Self` has type `type` and not `<T: type> -> type`.
(<T> => SomeType) ~= SomeTraitImpl;
Furthermore, traits do not need to have a self type:
Convert := <I, O> => trait {
convert: (I) -> O;
};
ConvertDogeToGatos := Convert<Doge, Gatos> {
convert = (doge) => perform_transformation_from_doge_to_gatos(doge);
};
doggo := Doge(...);
kitty := ConvertDogeToGatos::convert(doggo);
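Rust (not Hash) supports traits that are generic over their inputs and outputs. However, every Rust trait has a `Self` type, so in the sketch below the hypothetical unit struct `DogeToGatos` stands in for the self-less `ConvertDogeToGatos` implementation. The conversion just copies the name, which is an assumption for illustration:

```rust
// Rust analogue (not Hash) of `Convert := <I, O> => trait { convert: (I) -> O; }`.
trait Convert<I, O> {
    fn convert(input: I) -> O;
}

struct Doge { name: String }
struct Gatos { name: String }

// Hypothetical carrier type for the implementation.
struct DogeToGatos;

impl Convert<Doge, Gatos> for DogeToGatos {
    fn convert(doge: Doge) -> Gatos {
        Gatos { name: doge.name } // hypothetical transformation
    }
}

fn main() {
    let doggo = Doge { name: "Rex".to_string() };
    let kitty = <DogeToGatos as Convert<Doge, Gatos>>::convert(doggo);
    println!("{}", kitty.name);
}
```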
Traits can also be used as bounds on type parameters:
print_things_if_eq := <Thing: Printable ~ Eq> => (thing1: Thing, thing2: Thing) => {
if thing1 == thing2 {
print(thing1);
print(thing2);
}
};
Here, Thing must implement Printable and Eq.
Notice the same attachment syntax (~) for multiple trait bounds, just as for attaching trait implementations to types.
Traits are monomorphised at compile time, and are therefore completely erased at runtime. As a result, there is no additional runtime overhead to structuring your code using lots of traits/generics and polymorphism, compared with using plain old functions without any generics. There is, however, additional compile-time cost to very complicated trait hierarchies and trait bounds.
Implementations
Implementations can be attached to types without having to implement a specific trait, using impl blocks.
These are equivalent to trait implementation blocks, but do not correspond to any trait; they simply attach the given items to the type as associated items.
Example:
Vector3 := <T> => struct(x: T, y: T, z: T);
Vector3 ~= <T: Mul ~ Sub> => impl {
// Cross is an associated function on `Vector3<T>` for any `T: Mul ~ Sub`.
cross := (self, other: Self) -> Self => {
Vector3(
self.y * other.z - self.z * other.y,
self.z * other.x - self.x * other.z,
self.x * other.y - self.y * other.x,
)
};
};
print(Vector3(1, 2, 3).cross(Vector3(4, 5, 6)));
By default, members of impl blocks are public, but priv can be written to make them private.
Grammar
The grammar for trait definitions is as follows:
trait_def = "trait" "{" ( expr ";" )* "}"
The grammar for trait implementations is as follows:
trait_impl = ident "{" ( expr ";" )* "}"
The grammar for standalone impl
blocks is as follows:
impl_block = "impl" "{" ( expr ";" )* "}"
Type functions
Hash supports functions both at the value level and at the type level.
Type-level functions correspond to generics in other languages.
They are declared using angle brackets (< and >) rather than parentheses, and all parameters are types.
Other than that, they have the same syntax as normal (value-level) functions.
Type-level functions can be used to create generic structs, enums, functions, and traits.
For example, the generic Result<T, E> type is defined as:
Result := <T, E> => enum(
Ok(T),
Err(E),
);
This declares that Result is a function of kind <T: type, E: type> -> type.
The default bound on each type parameter is type, but it can be any trait (or traits) as well.
Multiple trait bounds can be specified using the ~ binary operator.
For example,
Result := <T: Clone ~ Eq, E: Error ~ Print> => enum(
Ok(T),
Err(E),
);
Here, T must implement Clone and Eq, and E must implement Error and Print.
In order to evaluate type functions, type arguments can be specified in angle brackets:
my_result: Result<i32, str> = Ok(3);
When calling functions or instantiating enums/structs, type arguments can be inferred so that you don't have to specify them manually:
RefCounted := <Inner: Sized> => struct(
ptr: &raw Inner,
references: usize
);
make_ref_counted := <Inner: Sized> => (value: Inner) -> RefCounted<Inner> => {
data_ptr := allocate_bytes_for<Inner>();
RefCounted( // Type argument `Inner` inferred
ptr = data_ptr,
references = 1,
)
};
my_ref_counted_string := make_ref_counted("Bilbo bing bong"); // `Inner = str` inferred
To explicitly request that a specific argument be inferred, use the _ sigil:
Convert := <I, O> => trait {
convert: (input: I) -> O;
};
// ...implementations of convert
x := 3.convert<_, str>(); // `I = i32` inferred, `O = str` given.
x := 3.convert<I = _, O = str>(); // same thing.
x: str = 3.convert(); // same thing.
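Rust (not Hash) has the same `_` placeholder in its turbofish syntax: it asks the compiler to infer one type argument while another is given explicitly. `convert` below is a hypothetical helper built on the standard `From` trait, not a Rust library function:

```rust
// Rust analogue (not Hash): partial inference with `_` in a turbofish.
fn convert<I, O: From<I>>(input: I) -> O {
    O::from(input)
}

fn main() {
    let a = convert::<_, i64>(3i32); // I = i32 inferred, O = i64 given
    let b: i64 = convert(3i32);      // both inferred from context
    let s: String = convert("hi");   // I = &str, O = String
    println!("{} {} {}", a, b, s);
}
```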
Type functions can only return types or functions; they cannot return values (though this is planned eventually). This means that you cannot write
land_with := <T> => land_on_moon_with<T>();
signal := land_with<Rover>;
but you can write
land_with := <T> => () => land_on_moon_with<T>();
signal := land_with<Rover>();
Just like with value-level functions, type-level functions can be provided with named arguments rather than positional arguments. These are subject to the same rules as value-level functions:
make_repository := <
Create, Read,
Update, Delete
> => () -> Repository<Create, Read, Update, Delete> => {
...
};
repo := make_repository<
Create = DogCreate,
Read = DogRead,
Update = DogUpdate,
Delete = DogDelete
>();
Finally, type-level function parameters can be given default arguments, which will be used if the arguments cannot be inferred from context and aren't specified explicitly:
Vec := <T, Allocator: Alloc = GlobalAllocator> => struct(
data: RawRefInAllocator<T, Allocator>,
length: usize,
);
make_vec := <T> => () -> Vec<T> => { ... }; // `Allocator = GlobalAllocator` inferred
make_vec_with_alloc := <T, Allocator: Alloc> => (allocator: Allocator) -> Vec<T, Allocator> => { ... };
x := make_vec<str>(); // `Allocator = GlobalAllocator` inferred
y := make_vec_with_alloc<str, _>(slab_allocator); // `Allocator = SlabAllocator` inferred
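Rust (not Hash) allows default type parameters on structs. One difference: Rust applies the default where the type is written out, as in the return type `MyVec<T>` below, but not as a general inference fallback. `Alloc`, `GlobalAllocator`, `SlabAllocator` and `MyVec` are hypothetical stand-ins, not Rust's real allocator API:

```rust
// Rust analogue (not Hash): a default type parameter on a struct.
trait Alloc {
    fn name(&self) -> &'static str;
}

struct GlobalAllocator;
struct SlabAllocator;

impl Alloc for GlobalAllocator { fn name(&self) -> &'static str { "global" } }
impl Alloc for SlabAllocator { fn name(&self) -> &'static str { "slab" } }

struct MyVec<T, A: Alloc = GlobalAllocator> {
    data: Vec<T>,
    allocator: A,
}

// `MyVec<T>` means `MyVec<T, GlobalAllocator>` because of the default.
fn make_vec<T>() -> MyVec<T> {
    MyVec { data: Vec::new(), allocator: GlobalAllocator }
}

fn make_vec_with_alloc<T, A: Alloc>(allocator: A) -> MyVec<T, A> {
    MyVec { data: Vec::new(), allocator }
}

fn main() {
    let x = make_vec::<String>();                            // A = GlobalAllocator
    let y = make_vec_with_alloc::<String, _>(SlabAllocator); // A = SlabAllocator inferred
    println!("{} {} {}", x.allocator.name(), y.allocator.name(), x.data.len());
}
```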
Grammar
The grammar for type function definitions and type function types can be found in the Types section.
Memory
Still under construction.
Macros
This section describes the syntax for macros in Hash. Macros are a way to write code that writes other code. There are two kinds of macro invocations: one works on AST items, and the other works on tokens.
AST macros
AST-level macros are written with the syntax #macro_name <subject> or #[macro_name(macro_arg)] <subject>. The first form is used as a shorthand for macros that don't take any additional arguments.
For example, the #dump_ast macro accepts any AST item as the subject and prints the parsed AST to the console.
#dump_ast main := () => {
    println("Hello, world!");
}
An example of an AST macro being used to set some attributes on a function:
#[attr(foreign(c), no_mangle, link_name = "jpeg_read_header")]
jpeg_read_header := (&raw cinfo, bool require_image) -> i32;
Token macros
Token macros follow a similar syntax to AST macros, but instead of working on AST items, they work on tokens. The syntax for token macros is @macro_name <subject> or @[macro_name(macro_arg)] <subject>. The first form is used as a shorthand for token macros that have no arguments. However, one significant difference between token macros and AST macros is that a token macro only accepts a token tree as the subject. A token tree is a sequence of tokens enclosed in a pair of delimiters; token trees are either [...], {...} or (...). It is then up to the macro to define the rules for accepting the token tree.
An example of using the min and max macros:
main := () => {
    min := @min {1 + 2, 3 * 4, 7 - 6 + 1 };
    max := @max {1 + 2, 3 * 4, 7 - 6 + 1 };
    if max - min == 0 {
        println("min and max are equal")
    } else {
        println("min and max are not equal")
    }
}
Another example, using a macro that accepts a token tree of HTML:
welcome := () => {
@[xml(variant=html)] {
<html>
<head>
<title>My page</title>
</head>
<body>
<h1>Hello, world!</h1>
</body>
</html>
}
}
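For comparison only, Rust's declarative macros (Rust, not Hash) can accept a comma-separated token list in the same way as the `@min` and `@max` example above. The sketch does not show how Hash macros are defined, since that part of the language is still under construction. It also assumes that each argument is evaluated once and compared pairwise. For the inputs above, the values are `3`, `12` and `2`, so the minimum is `2` and the maximum is `12`:

```rust
// Rust analogue (not Hash): recursive `macro_rules!` macros over a token list.
macro_rules! min {
    ($x:expr) => { $x };
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = min!($($rest),+);
        if a < b { a } else { b }
    }};
}

macro_rules! max {
    ($x:expr) => { $x };
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = max!($($rest),+);
        if a > b { a } else { b }
    }};
}

fn main() {
    let lo = min!{1 + 2, 3 * 4, 7 - 6 + 1};
    let hi = max!{1 + 2, 3 * 4, 7 - 6 + 1};
    if hi - lo == 0 {
        println!("min and max are equal");
    } else {
        println!("min and max are not equal");
    }
}
```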
Defining a macro 🚧
This section hasn't been defined yet, and is still a work in progress.
Macro Rules 🚧
Macro invocation locations
Both styles of macro invocations can appear in the following positions:
Expr
Type
Pat
Param
Arg
TypeArg
PatArg
MatchCase
EnumVariant
Here is an example in code of all of the possible positions where a macro invocation can appear:
#![module_attributes]

#dump_ast Foo := struct<#dump_ast T>(
    #dump_ast x: T,
    #dump_ast y: T,
    #dump_ast z: T,
);

Bar := enum<T>(
    #dump_ast A(T),
    #dump_ast B(T),
    #dump_ast C(T),
);

bing := (#param x: i32) -> #ty i32 => {
    foo := Foo(#arg x = 5, #arg y = 6, #arg z = 7);
    match x {
        #dump_ast 0 => 0,
        #dump_ast 1 => 1,
        #dump_ast (#dump_ast _) => bing(x - 1) + bing(x - 2),
    }
}
Grammar
Formally, the macro syntax invocation can be written as follows:
token_macro_invocation ::= "@" ( macro_name | macro_args ) token_tree;
token_tree ::= "{" any "}"
| "[" any "]"
| "(" any ")";
ast_macro_invocation ::= "#" ( macro_name | macro_args ) macro_subject;
module_macro_invocation ::= "#!" macro_args;
macro_subject ::= expr
| type
| pat
| param
| arg
| type_arg
| pat_arg
| match_case
| enum_variant;
macro_args ::= "[" ( ∅ | macro_invocation ("," macro_invocation)* ","? ) "]";
macro_invocation ::= macro_name ( "(" ( ∅ | expr ( "," expr )* ","? ) ")" )?;
macro_name ::= access_name;
Standard library
Current modules
The standard library includes the following modules:
math: Mathematical functions, constants and other useful numerical methods.
io: File handling and IO module.
iter: Iterators module.
list: Useful list functions, including sorting, manipulation and transformations.
Future expansions 🚧
These modules are currently under construction or proposed:
time: Useful constructs for time-oriented data.
sys: System information about the host OS.
path: Path utilities.
Interpreter
This chapter documents the current interpreter implementation, future plans and a very basic manual for how to use the interpreter (via command-line arguments).
Interpreter command-line arguments
The Hash interpreter has a number of options that you can enable when running an instance of a VM. This page documents the options and configurations that you can change when running a Hash interpreter.
General overview
-e, --execute: Execute a command
Sets the interpreter to 'execute' mode, which immediately runs the provided script rather than launching the interactive mode.
For example:
$ hash -e examples/compute_pi.hash
3.1415926535897
-d, --debug: Run compiler in debug mode
Enables debug mode within the compiler, in which the compiler verbosely reports timings, procedures and, in general, what it is doing at a given moment.
-h, --help: Print command-line help menu
Displays a help dialogue on how to use the command-line arguments with the Hash interpreter.
-v, --version: Compiler version
Displays the current interpreter version with some additional debug information about the installed interpreter.
VM-specific options
-s, --stack-size: Adjust VM stack size
Adjusts the stack size of the virtual machine. Default value is 10,0000
Debug Modes
ast-gen: Generate AST from input file only
This mode tells the compiler to finish at the Abstract Syntax Tree stage and not produce any other kind of output.
-v: Whilst generating AST, output a visual representation of the AST.
-d: Run in debug mode.
ir-gen: Generate IR from input file only
This mode tells the compiler to finish at the IR stage and not produce any other kind of output.
-v: Whilst generating IR, output a visual representation of the IR.
-d: Run in debug mode.
Compiler backends
Current backend
The current backend compiles the program to a bytecode representation that runs in a virtual machine with garbage collection. This is similar to Python's approach to running programs; however, Python is poorly suited to performance-critical code (unless using C bindings).
We want to move away from using a virtual machine as the main backend and instead produce executables that run on x86_64, using either a native (naive) backend or LLVM.
However, there are advantages to having a VM implementation of the language, primarily:
- We can have an interactive mode and execute code on the fly (with a minor performance hit).
- We can run compile-time functions that go beyond templates and constant folding of expressions.
Planned backends
Here are the currently planned backends, which will be worked on and stabilised at some point in the future:
Name | Description | Target platform | Status |
---|---|---|---|
x86_64_native | A native backend for generating executables and performing optimisations ourselves. | x86_64 | ❌ |
x86_64_llvm | A backend powered by LLVM. | x86_64 | ❌ |
vm | Virtual machine backend able to run bytecode compiled programs. | any | ✅ |
elf64 | Backend for generating standalone ELFs for un-named host operating systems. | i386 | ❌ |
wasm | WebAssembly backend, convert hash programs into WebAssembly executables | browser/any | ❌ |
js | JS backend, generate TS/JavaScript code from the provided program. | browser/any | ❌ |
Advanced Concepts
This chapter of the book is dedicated to documenting advanced concepts for developers and contributors.
Compiler internals
This chapter documents some core internal features of the compiler that are noteworthy and should be examined by anyone who is interested not only in using the language but also in contributing to its development.
Loop transpilation
As mentioned at the start of the loops section in the basics chapter, the loop control flow keyword is the most universal control flow construct, since you can use loop to represent both for and while loops.
for loop transpilation
Since for loops are used for iterators in Hash, we transpile the construct into a primitive loop. An iterator can be traversed by calling the next function on it. Since next returns an Option type, we need to check whether there is a value or whether it returns None. If a value does exist, we essentially perform an assignment to the provided pattern. If it is None, the branch immediately breaks out of the for loop.
A rough outline of how a for loop is expressed using loop:
for <pat> in <iterator> {
<block>
}
// converted to
loop {
match next(<iterator>) {
Some(<pat>) => <block>;
None => break;
}
}
An example of the transpilation process:
i := [1,2,3,5].into_iter();
for x in i {
    print("x is " + conv(x));
}

// the same as...
i := [1,2,3,5].into_iter();
loop {
    match next(i) {
        Some(x) => { print("x is " + conv(x)) };
        None => break;
    }
}
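Rust (not Hash) desugars its own `for` loops in much the same way, by repeatedly calling `Iterator::next` and breaking on `None`. The two hypothetical functions below compute the same sum, one with `for` and one with the hand-written `loop` + `match` form, which makes the equivalence easy to check:

```rust
// Rust analogue (not Hash): a `for` loop and its manual `loop`/`match` desugaring.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for x in values.iter() {
        total += x;
    }
    total
}

fn sum_with_loop(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = values.iter();
    loop {
        match iter.next() {
            Some(x) => total += x, // bind the pattern, run the body
            None => break,         // iterator exhausted: leave the loop
        }
    }
    total
}

fn main() {
    println!("{} {}", sum_with_for(&[1, 2, 3, 5]), sum_with_loop(&[1, 2, 3, 5]));
}
```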
While loop internal representation
In general, a while loop is transpiled by moving the looping condition into a match block which matches on the boolean condition. If the condition evaluates to false, the loop immediately breaks. Otherwise, the body expression is executed. A rough outline of the transpilation of a while loop:
while <condition> {
<block>
}
// converted to
loop {
match <condition> {
true => <block>;
false => break;
}
}
This is why the condition must explicitly return a boolean value.
An example of a transpilation, showing a while loop written using the loop directive:
mut c := 0;
loop {
    match c < 5 { // `c < 5` is the condition of the while loop
        true => c += 1;
        false => break;
    }
}

// same as...
mut c := 0;
while c < 5 {
    c += 1;
}
If Statement transpilation
As mentioned at the start of the conditionals section in the basics chapter, if statements can be represented as match statements. This is especially advisable when you have many if branches and more complicated branch conditions.
Internally, the compiler converts if statements into match cases so that it has less work to do in the following stages of compilation.
In general, the transpilation process can be represented as:
if <condition_1> {
    <block_1>
} else if <condition_2> {
    <block_2>
} else if ... {
    ...
} else {
    <block_n>
}
// will be converted to
match true {
    _ if <condition_1> => <block_1>;
    _ if <condition_2> => <block_2>;
    ...
    _ => <block_n>;
}
For example, the following if statement will be converted as follows:
if conditionA {
    print("conditionA")
} else if conditionB {
    print("conditionB")
} else {
    print("Neither")
}

// Internally, this becomes:
match true {
    _ if conditionA => { print("conditionA") };
    _ if conditionB => { print("conditionB") };
    _ => { print("Neither") };
}
However, this representation is not entirely accurate, because the compiler optimises some components out of the transpiled version. Redundant constructs such as match true { ... } undergo constant folding to produce a more optimal AST representation of the program.
Missing 'else' case
If the if statement lacks an else clause or a default case branch, the compiler inserts one automatically to avoid issues with pattern exhaustiveness. This behaviour mimics the control flow of classic if statements, because the inserted else branch is assigned an empty expression block.
From the above example, but without the else branch:
if conditionA {
    print("conditionA")
} else if conditionB {
    print("conditionB")
}

// Internally, this becomes:
match true {
    _ if conditionA => { print("conditionA") };
    _ if conditionB => { print("conditionB") };
    _ => { };
}
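The `match true` form with guards is also valid Rust (Rust, not Hash), so the equivalence can be checked directly. The hypothetical functions below compare an `if`/`else if` chain with its guard-based rewrite over all four input combinations. The last function shows the empty default arm that stands in for a missing `else`:

```rust
// Rust analogue (not Hash): an if-chain versus `match true` with guards.
fn classify_if(condition_a: bool, condition_b: bool) -> &'static str {
    if condition_a {
        "conditionA"
    } else if condition_b {
        "conditionB"
    } else {
        "Neither"
    }
}

fn classify_match(condition_a: bool, condition_b: bool) -> &'static str {
    match true {
        _ if condition_a => "conditionA",
        _ if condition_b => "conditionB",
        _ => "Neither",
    }
}

// No `else`: the default arm is an empty block, so nothing happens.
fn log_if(condition_a: bool, condition_b: bool, log: &mut Vec<&'static str>) {
    match true {
        _ if condition_a => log.push("conditionA"),
        _ if condition_b => log.push("conditionB"),
        _ => {}
    }
}

fn main() {
    for (a, b) in [(true, true), (true, false), (false, true), (false, false)] {
        assert_eq!(classify_if(a, b), classify_match(a, b));
    }
    let mut log = Vec::new();
    log_if(false, false, &mut log);
    println!("{:?}", log);
}
```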
Type inference
🚧 Still under construction! 🚧
Future features
This page is dedicated to documenting future planned features within the language.