Introduction

This book is dedicated to documenting and teaching the Hash programming language to newcomers.

Hash is an interpreted, garbage-collected, strongly and statically typed language.

Prerequisites about the book

Sections which have the 🚧 icon next to the title are likely to change or be rewritten in the future, because the language is still at an early stage.

Language features

This chapter is dedicated to everything you need to know about the Hash language to start writing code in it.

Name bindings

Basics

Name bindings are made of three distinct components: the name, the type, and the value that is assigned to the name.

Declaration of variables happens using the : and = symbols:

x: i32 = 3;

The : symbol is used to denote the type of a variable, and the = symbol is used to assign a value to it. The type can be omitted, in which case it is inferred:

x := "Terence Tao"; // `x: str` inferred
x: str = "Terence Tao"; // same thing
x: i32 = "Terence Tao"; // Compile error: `str` is not assignable to `i32`.

Declaration and assignment can happen separately:

x: i32;
print(x); // Compile error: `x` might be uninitialised at this point!
x = 3; 
print(x); // Ok

A variable declaration is an expression like any other, and returns the value of the variable. This means that you can write something like:

while (bytes_read := read_file(&buffer)) != 0 {
  print("Read some bytes!");
}

Hash is statically typed, which means that variables cannot change types:

x := "Ha!";
x = Some("Baaa"); // Compile error: `Option<str>` is not assignable to `str`.

Mutability

By default, all bindings in Hash are constant. In order to declare a mutable variable, use the mut keyword in front of a name binding.

(mut a, mut b) := (1, 2);
mut test := get_test();

modify_test(&mut test);
a += 2;
b += 3;

Visibility

Visibility modifiers can be added to declarations inside modules or traits. By default, all declarations in a module scope or impl block are private, while all declarations in a trait block are public. To declare that a member is private, use priv, while to declare that a member is public, use pub.

// Visible from outside:
pub foo := 3;

// Not visible from outside:
bar := 4;


Foo := trait {
    // Visible from outside
    foo: (Self) -> str;

    // Not visible from outside
    priv bar: (Self) -> str;
};

Grammar

The grammar for name bindings (and partial name bindings/reassignments) is as follows:

pattern = pattern_binding | ...(other patterns)
pattern_binding = ( "pub" | "priv" )? "mut"? identifier

name_binding =
  | ( pattern ":=" expr )  // Declaration and assignment, infer type
  | ( pattern ( ":" type )? "=" expr  ) // Assignment
  | ( pattern ( ":" type )  ) // Declaration

Functions

Overview

Hash places a lot of emphasis on functions, which are first-class citizens. Functions can be assigned to name bindings in the same way as any other value, similar to Python lambdas or JavaScript arrow functions.

General syntax and notes

Functions are defined by being assigned to bindings:

func := (...args) => { ...body... };

// With a return type
func := (...args) -> return_ty => { ...body... };

The return type of a function is inferred from its body by default, but the -> syntax can be used to explicitly declare it.

Function arguments are comma separated:

func := (arg0, arg1) => { ...body... };

// and you can optionally specify types:
func := (arg0: str, arg1: char) -> u32 => { ...body... };

Function types can be explicitly provided after the : in the declaration, in which case argument names do not have to be specified.

var: (str, char) -> u32 = (arg0, arg1) => { ... };

Function literals can also specify default arguments, which must come after all required arguments:

func := (arg0: str, arg1: char = 'c') -> u32 => { ... };

// The type of the argument is also inferred if provided without a type
// annotation:
func := (arg0: str, arg1 = 'c') -> u32 => { ... };

Default arguments do not need to be specified when the function is called:

func := (a: str, b = 'c', c = 2) -> u32 => { ... };

func("foobar"); // a = "foobar", b = 'c', c = 2

// You can optionally provide the arguments in declaration order:
func("foobar", 'b', 3);  // a = "foobar", b = 'b', c = 3
// Or you can provide them as named arguments:
func("foobar", c = 3);  // a = "foobar", b = 'c', c = 3

Named arguments can be used for more context when providing arguments, using the syntax arg_name = arg_value. After the first named argument is provided, all following arguments must be named. Furthermore, the $n$ positional arguments that precede the first named argument must correspond to the first $n$ parameters in the function definition. For example:

foo := (a: str, b: str, c: str, d: str) => { .. };

foo("a", "b", "c", "d") // Allowed -- no arguments are named.
foo(a="a", b="b", c="c", d="d") // Allowed -- all arguments are named.
foo("a", "b", c="c", d="d") // Allowed -- last two arguments are named.
foo(a="a", "b", c="c", d="d") // Not allowed -- argument b must be named if a is named.
foo("a", "b", c="c", "d") // Not allowed -- argument d must be named.
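
The two rules above can be modelled outside Hash. The following is a minimal Python sketch, not the Hash compiler's actual algorithm: `bind_arguments` and `is_valid_call` are hypothetical helper names, a call is represented as a list of (name, value) pairs with `None` standing for an unnamed argument, and default values and missing required arguments are deliberately ignored.

```python
def bind_arguments(params, args):
    """Bind call arguments to parameters following the two rules above.

    params: parameter names in declaration order.
    args:   call arguments in call order, as (name, value) pairs;
            name is None for a positional (unnamed) argument.
    Returns a dict of parameter name -> value, or raises ValueError.
    """
    bound = {}
    seen_named = False
    for index, (name, value) in enumerate(args):
        if name is None:
            # A positional argument may not follow a named one.
            if seen_named:
                raise ValueError(f"argument {index} must be named")
            if index >= len(params):
                raise ValueError("too many arguments")
            # The n leading positional arguments bind to the first n parameters.
            bound[params[index]] = value
        else:
            seen_named = True
            if name not in params:
                raise ValueError(f"unknown parameter {name!r}")
            if name in bound:
                raise ValueError(f"parameter {name!r} given more than once")
            bound[name] = value
    return bound


def is_valid_call(params, args):
    """Return True when the call satisfies the rules, False otherwise."""
    try:
        bind_arguments(params, args)
        return True
    except ValueError:
        return False


params = ["a", "b", "c", "d"]
all_positional = [(None, "a"), (None, "b"), (None, "c"), (None, "d")]
mixed = [(None, "a"), (None, "b"), ("c", "c"), ("d", "d")]
positional_after_named = [("a", "a"), (None, "b"), ("c", "c"), ("d", "d")]

print(is_valid_call(params, all_positional))          # True
print(is_valid_call(params, mixed))                   # True
print(is_valid_call(params, positional_after_named))  # False
```

The three calls correspond to the first, third and fourth `foo` examples above.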

Grammar

The grammar for function definitions and function types is as follows:

function_param =
  | ( ident ":=" expr )  // Declaration and assignment, infer type
  | ( ident ( ":" type )? "=" expr  ) // Assignment
  | ( ident ( ":" type )  ) // Declaration

function_def = "(" ( function_param "," )* function_param? ")" ( "->" type )? ("=>" expr)
function_type = "(" ( function_param "," )* function_param? ")" "->" type

The grammar for function calls is as follows:

function_call_arg = expr | ( ident "=" expr )
function_call = expr "(" ( function_call_arg "," )* function_call_arg? ")"

Primitives

There are the following primitive types:

  • u8, u16, u32, u64: unsigned integers
  • i8, i16, i32, i64: signed integers
  • f32, f64 : floating point numbers
  • usize, isize: unsigned and signed pointer-sized integer types (for list indexing)
  • ibig, ubig: unlimited size integers
  • bool: boolean
  • str: string, copy on write and immutable
  • [A]: a list containing type A
  • {A:B}: a map between type A and type B
  • (A, B, C): a tuple containing types A, B and C. Elements can be accessed by dot notation (my_tuple.first)
  • (a: A, b: B, c: C): a tuple which contains named members a, b, c with types A, B, C respectively.
  • void or (): the empty tuple type. Has a corresponding void/() value.
  • never: a type which can never be inhabited. Used, for example, for functions that never return, like panic, or an infinite loop.

Note: the list, map and set type syntax will most likely have to change eventually once literal types are introduced in the language.

Numbers

Numbers in Hash are like numbers in most other statically typed languages. They come in three variants: unsigned, signed, and floating point.

Floating point literals must include either a . or a scientific notation exponent, for example 3.0, 3e2 or 30e-1.

Host-sized integers

The primitives usize and isize are intended for list indexing. Some systems (for example, 32-bit ones) cannot index a contiguous region of memory that is larger than the 32-bit maximum value, so the width of usize and isize depends on the host system.

Compile-time

In the future, Hash will support arbitrary code execution at compile time. Because of this, the usize width of the host machine (on which the code is compiled) might differ from that of the target machine (on which the code is executed). To account for this, any usize value calculated at compile time must be checked to ensure that it fits within the target usize. This check happens at compile time, so there is no possibility of memory corruption or wrong data.
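
The check described above amounts to a range test against the target width. Here is a small Python sketch of the idea, assuming the target width is known in bits; `fits_in_target_usize` and `checked_target_usize` are hypothetical names and do not correspond to any Hash compiler API.

```python
def fits_in_target_usize(value, target_bits):
    """Return True if a non-negative integer fits in an unsigned
    integer that is target_bits wide."""
    return 0 <= value < (1 << target_bits)


def checked_target_usize(value, target_bits):
    """Return value unchanged, or raise if it would not fit on the target."""
    if not fits_in_target_usize(value, target_bits):
        raise OverflowError(
            f"{value} does not fit in a {target_bits}-bit usize"
        )
    return value


# A size computed on a 64-bit host: exactly 4 GiB.
size = 2**32

print(fits_in_target_usize(size, 64))  # True
print(fits_in_target_usize(size, 32))  # False
```

A value that passes on the host can therefore still be rejected for a narrower target, which is why the check has to use the target width rather than the host width.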

Unlimited-sized integers

The ibig and ubig number primitives are integer types that have no upper or lower bound and will grow until the host operating system memory is exhausted when storing them. These types are intended to be used when working with heavy mathematical problems which may exceed the maximum 64 bit integer size.
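
To get a feel for when a 64-bit integer stops being enough, the sketch below uses Python, whose built-in int is also unbounded, as a stand-in for ubig. The names `U64_MAX` and `factorial` are illustrative only.

```python
U64_MAX = 2**64 - 1


def factorial(n):
    """Compute n! with unbounded integers."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


print(factorial(20) <= U64_MAX)  # True: 20! still fits in a u64
print(factorial(21) <= U64_MAX)  # False: 21! needs more than 64 bits
```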

Lists

Lists are denoted using square bracket syntax where the values are separated by commas.

Examples:

x := [1,2,3,4,5,6]; // multiple elements
y := [];
z := [1,]; // optional trailing comma

w: [u64] = [];
// ^^^^^
//  type

Grammar for lists:

list_literal = "[" ( expr "," )* expr? "]"
list_type = "[" type "]"

Tuples

Tuples use a syntax that is familiar from many other languages:

  • Empty tuples: (,) or ()
  • Singleton tuple: (A,)
  • Many membered tuple: (A, B, C) or (A, B, C,)

Examples:

empty_tuple: (,) = (,);
//           ^^^
//           type

empty_tuple: () = ();
//           ^^
//          type

some_tuple: (str, u32) = ("string", 12);
//          ^^^^^^^^^^
//             type

It's worth noting that tuples are syntactic sugar for structures whose members are indexed numerically (0, 1, 2, and so on). Although tuples are intended to be used mostly for pattern matching, their members can also be accessed explicitly by index. If you find yourself doing this, consider using a struct instead, which allows you to do the same thing and name the fields. See the section on patterns for more information.

Grammar for tuples:

tuple_literal = ( "(" ( expr "," )* ")" ) | ( "(" ( expr "," )* expr ")" )

Named tuples

Named tuples are tuples that specify field names for each field within the tuple. This can be done, for example, to have nested fields in structs without having to create another struct for each sub-type. For example:

Comment := struct(
    contents: str,
    anchor: (
        start: u32,
        end: u32
    ),
    edited: bool,
    author_id: str,
);

Then, you can create a Comment instance and then access its fields like so:

comment := Comment(
    contents = "Hello, world",
    anchor = (
        start = 2,
        end = 4
    ),
    edited = false,
    author_id = "f9erf8g43"
);

print(abs(comment.anchor.start - comment.anchor.end));

A tuple that has named fields can be initialised like so:

anchor := (start := 1, end := 2); // `anchor: (start: u32, end: u32)` inferred

// This can also be done like so (but shouldn't be used):
anchor: (start: u32, end: u32) = (1, 2); // Warning: assigning unnamed tuple to named tuple.

Named tuples can be coerced into unnamed tuples if the type layout of both tuples matches. However, this is not recommended because specifically naming tuples implies that the type cares about the names of the fields rather than simply being a positionally structural type.

Sets

Sets in Hash represent unordered collections of values. The syntax for sets is as follows:

  • Empty set: {,}.
  • Singleton set: {A,}.
  • Many membered set: {A, B, C} or {A, B, C,}.
  • Set type: {A}, for example foo: {i32} = {1, 2, 3}.

Grammar for sets:

set_literal = ( "{" "," "}" ) | ( "{" ( expr "," )+ "}" ) | ( "{" ( expr "," )* expr "}" )

Map

Maps in Hash represent collections of key-value pairs. Any type that implements the Eq and Hash traits can be used as the key type in a map. The syntax for maps is as follows:

  • Empty map: {:}.
  • Singleton map: {A:1} or {A:1,}.
  • Many membered map: {A: 1, B: 2, C: 3} or {A: 1, B: 2, C: 3,}.
  • Map type: {K:V}, for example names: {str:str} = {"thom":"yorke", "jonny":"greenwood"}.

Grammar for maps:

map_literal = ( "{" ":" "}" ) | ( "{" ( expr ":" expr "," )* ( expr ":" expr )? "}" )

Note: the grammar for literal types can be found in the Types section.

Conditional statements

Conditional statements in the Hash programming language are very similar to those in other languages such as Python, JavaScript, C and Rust. However, there is one subtle difference: the condition provided to a conditional statement must always evaluate to an explicit boolean value.

If-else statements

If statements are very basic constructs in Hash. An example of a basic if-else statement is as follows:

// checking if the value 'a' evaluates to being 'true'
if a { print("a"); } else { print("b"); }

// using a comparison operator
if b == 2 { print("b is " + conv(b)); } else { print("b is not " + conv(b)); }

The first example checks whether a evaluates to true. If it does not, the block expression following else is executed.

If you want multiple clauses, you can chain conditions using the else-if syntax, like this:

if b == 2 {
    print("b is 2")
} else if b == 3 {
    print("b is 3")
} else {
    print("b isn't 2 or 3 ")
}

As mentioned in the introduction, a conditional statement must evaluate to an explicit boolean value. The if statement syntax will not infer a boolean value from an expression of another type. This design is motivated by the fact that, in many languages, bugs commonly arise from the implicit conversion of conditions to booleans.

An example of an invalid program is:

a: u8 = 12;

if a { print("a") } // Error: the condition is a `u8`, not a `bool`

Additional syntax

If you do not need an else clause, you can omit it:

if a { print("a") }  // if a is true, then execute

which is syntactic sugar for:

if a { print("a") } else {}

Additionally, since the bodies of an if statement are equivalent to function bodies, they can evaluate to a value, just as a function body does:

abs := (x: i64) -> i64 => if x < 0 { -x } else { x };

You can also assign the result of an if statement to a variable, since if statements are just blocks:

my_value: i32 = if some_condition == x { 3 } else { 5 };

However, you cannot do something like this:

abs := (x: i64) -> i64 => if x < 0 { -x };

Note: you will not get a syntax error if you run this, but you will encounter an error during the interpretation stage of the program, because nothing defines what the function should return in the else case.

If statements and Enums 🚧

You can destructure enum values within if statements using the if-let syntax, like so:

enum Result = <T, E> => {
   Ok(T);
   Err(E);
};

// mission critical, program should exit if it failed
result: Result<u16, str> = Ok(12);

if let Ok(value) = result {
  print("Got '" + conv(value) + "' from operation")
} else let Err(e) = result {
  panic("Failed to get result: " + e);
}

Furthermore, for more complicated conditions, you can use an expression block, which is treated essentially as a function body, like so:

f: str = "file.txt";

if { a := open(f); is_ok(a) } {
    // run if is_ok(a) returns true
}

// the above statement can also be written as
a := open(f);

if is_ok(a) {
    // run if is_ok(a) returns true
}

The only difference between these two examples is that in the first, a is contained within the condition block of the if statement, i.e. a is not visible outside that block. This is an advantage when you don't wish to keep the result of the operation. If you do need it, use the second version, which makes a available within the if statement body and later on in the program.


Match cases

Match cases are one step above the simple if-else syntax. Using a match case, you can express more complicated cases in a more readable format than you can with an if-else statement. Additionally, you can destructure enums into their corresponding values. To use a match case, you do the following:

a := input<u8>();

m2 := match a {
  1 => "one";
  2 => "two";
  _ => "not one or two";
};

// Or as a function

convert: (x: u8) -> str = (x) => match x {
  1 => "one";
  2 => "two";
  _ => "not one or two";
};

m := convert(input<u8>());

The _ case is a special wildcard case that matches any value. It is essentially synonymous with the else clause in many other languages like Python or JavaScript. One subtle difference with the match syntax is that you must always explicitly define a _ case when the type of the matched value is not reasonably bounded (like an integer). This language behaviour is designed to enforce that explicit is better than implicit. So, if you know that a program should never hit the default case, you can say so explicitly:

match x {
  1 => "one";
  2 => "two";
  _ => unreachable(); // we know that 'x' should always be 1 or 2.
}

Note: You do not have to provide a default case if you have defined all the cases for a type (this mainly applies to enums).

Additionally, cases are matched in order, from top to bottom. So, given the following:

convert: (x: u8) -> str = (x) => match x {
  _ => "not one or two";
  1 => "one";
  2 => "two";
};

convert will always return "not one or two", since the wildcard case is listed first and matches any value.
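
The effect of case ordering can be modelled in a few lines of Python. This is only a sketch of the first-match-wins behaviour described above, not how Hash implements match; `WILDCARD` and `match_first` are hypothetical names, and only literal patterns are supported.

```python
WILDCARD = object()  # stands in for the `_` pattern


def match_first(value, arms):
    """Return the result of the first arm whose pattern matches value.

    arms is a list of (pattern, result) pairs, tried from top to bottom.
    """
    for pattern, result in arms:
        if pattern is WILDCARD or pattern == value:
            return result
    raise ValueError("no arm matched")


wildcard_last = [(1, "one"), (2, "two"), (WILDCARD, "not one or two")]
wildcard_first = [(WILDCARD, "not one or two"), (1, "one"), (2, "two")]

print(match_first(1, wildcard_last))   # one
print(match_first(1, wildcard_first))  # not one or two
```

With the wildcard first, the later arms can never be reached, which is why the wildcard case belongs at the end.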

Match statements are also well suited to destructuring enum types in Hash. For example:

enum Result = <T, E> => {
   Ok(T);
   Err(E);
};

...

// mission critical, program should exit if it failed
result: Result<u16, str> = Ok(12);

match result {
  Ok(value) => print("Got '" + conv(value) + "' from operation");
  Err(e)    => panic("Failed to get result: " + e);
}

To specify multiple patterns for a single case within a match statement, separate them with the | symbol:

x: u32 = input<u32>();

match x {
  1 | 2 | 3       => print("x is 1, 2, or 3");
  4 | 5 | {2 | 4} => print("x is either 4, 5 or 6"); // using bitwise or operator
  _               => print("x is something else");
}

To specify more complex conditions within a match case, such as a logical and, you can use the match-if syntax, like so:

x: u32 = input<u32>();
y: bool = true;

match x {
  1 | 2 | 3 if y => print("x is 1, 2, or 3 when y is true");
  {4 if y} | y   => print("x is 4 and y is true, or x is equal to y");
  {2 | 4 if y}   => print("x is 6 and y is true"); // using bitwise or operator
  _              => print("x is something else");
}

Loop constructs

Hash contains three distinct loop control constructs: for, while and loop. Each construct has a distinct use case, but they can often be used interchangeably, in which case the choice is merely one of style.

General

Each construct supports the basic break and continue loop control flow statements. These statements have the same properties as in many other languages like C, Rust, Python etc.

break - Using this control flow statement immediately terminates the loop and continues to any statement after the loop (if any).

continue - Using this control flow statement immediately skips the rest of the current iteration of the loop body and moves on to the next iteration (if any). If no iterations remain, continue behaves just like break.

For loop

Basics

For loops are special loop control statements that are designed to be used with iterators.

For loops can be defined as:

for i in range(1, 10) { // range is a built in iterator
    print(i);
}

Iterating over lists is also quite simple using the iter function to convert the list into an iterator:

nums: [u32] = [1,2,3,4,5,6,7,8,9,10];

// infix functional notation
for num in nums.iter() {
    print(num);
}

// using the postfix functional notation
for num in iter(nums) {
    print(num);
}

Iterators

Iterators ship with the standard library, but you can define your own iterators via the Hash generic typing system.

For a type I to be an iterator of T, there must be an implementation of next<I, T> in the current scope. So, for the example above, the range function essentially returns a RangeIterator of the u8, u16, u32, ... types.

More details are given in the section on generics.

While loop

Basics

While loops are identical to 'while' constructs in other languages such as Java, C, JavaScript, etc. The loop checks a given conditional expression (which must evaluate to a bool). If it evaluates to true, the loop body is executed; otherwise the interpreter moves on. The loop body can also use loop control flow statements like break or continue to stop looping prematurely for a given condition.

While loops can be defined as:

mut c: u32 = 0;

while c < 10 {
    print(c);
    c += 1;
}

The loop keyword is equivalent to a while loop whose conditional expression always evaluates to true, like so:

while true {
    // do something
}

// is the same as...

loop {
    // do something
}

Note: In Hash, you cannot write do-while loops, but if you want to write a loop that behaves like a do-while statement, here is a good example using loop:

loop {
    // do something here, to enable a condition check,
    // and then use if statement, or match case to check
    // if you need to break out of the loop.

    if !condition { break }
}
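
Python has no do-while statement either, so the same pattern can be shown there in runnable form. The sketch below is only an illustration of the control flow; `do_while` is a hypothetical helper, with `while True` playing the role of Hash's loop.

```python
def do_while(body, condition):
    """Run body once, then keep repeating it while condition() is true.

    Returns the number of times body was executed.
    """
    runs = 0
    while True:  # plays the role of `loop`
        body()
        runs += 1
        # The check comes after the body, so the body always runs at least once.
        if not condition():
            break
    return runs


counter = {"n": 0}


def increment():
    counter["n"] += 1


runs = do_while(increment, lambda: counter["n"] < 3)
print(runs)  # 3

# Even with a condition that is false from the start, the body runs once.
print(do_while(lambda: None, lambda: False))  # 1
```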

Expression blocks and behaviour

Furthermore, the looping condition can also be written as a block, which means it can contain any number of expressions before the final expression. For example:

while {c += 1; c < 10} {
    print(c);
}

It is worth noting that the looping expression, whether it is a block or not, must explicitly have the boolean type. For example, the code below will fail typechecking:

c: u32 = 100;

while c -= 1 {
    ...
}

Running the code snippet above produces the following error:

error[0052]: Failed to Typecheck: Mismatching types.
 --> 3:7 - 3:12
1 | c: u32 = 100;
2 |
3 | while c -= 1 {
  |       ^^^^^^  Expression does not have a 'boolean' type 
  |
  = note: The type of the expression was `(,)` but expected an explicit `boolean`.

Loop

The loop construct is the simplest of the three. The basic syntax for a loop is as follows:

mut c: u64 = 1;

loop {
    print("I looped " + conv(c) + " times!");
    c += 1;
}

You can also use conditional statements within the loop body (which is equivalent to a function body) like so:

mut c: u64 = 1;

loop {
    if c == 10 { break }

    print("I looped " + conv(c) + " times!");
    c += 1;
} // this will print 9 times, and break on the 10th iteration

The continue statement can be used in the same way:

mut c: u64 = 1;

loop {
    c += 1;
    if c % 2 != 0 { continue }

    print("I loop and I print when I get a " + conv(c));
} // this will loop indefinitely, and print only when c is even

Operators & Symbols

This section contains all of the syntactic operators that are available within Hash.

General operators 🚧

Here are the general operators for arithmetic, logical, bitwise and assignment operations. This table does not include all of the possible operators specified within the grammar. There are more operators that are related to a specific group of operations or are used to convey meaning within the language.

Operator        Example                     Description                           Overloadable trait
==, !=          a == 2, b != 'a'            Equality                              eq
=               a = 2                       Assignment                            N/A
!               !a                          Logical not                           not
&&              a && b                      Logical and                           and
||              a || b                      Logical or                            or
+               2 + 2, 3 + b                Addition                              add
-               3 - a                       Subtraction                           sub
-               -2                          Negation                              neg
*               3 * 2, 2 * c                Multiplication                        mul
^^              3 ^^ 2, 3 ^^ 2.3            Exponentiation                        exp
/               4 / 2, a / b                Division                              div
%               a % 1                       Modulo                                mod
<<              4 << 1                      Bitwise left shift                    shl
>>              8 >> 1                      Bitwise right shift                   shr
&               5 & 4, a & 2                Bitwise and                           andb
|               a | 2                       Bitwise or                            orb
^               3 ^ 2                       Bitwise exclusive or                  xorb
~               ~2                          Bitwise not                           notb
>=, <=, <, >    2 < b, c >= 3               Order comparison                      ord
+=              x += y                      Add with assignment                   add_eq
-=              x -= 1                      Subtract with assignment              sub_eq
*=              b *= 10                     Multiply with assignment              mul_eq
/=              b /= 2                      Divide with assignment                div_eq
%=              a %= 3                      Modulo with assignment                mod_eq
&&=             b &&= c                     Logical and with assignment           and_eq
>>=             b >>= 3                     Bitwise right shift equality          shr_eq
<<=             b <<= 1                     Bitwise left shift equality           shl_eq
||=             b ||= c                     Logical or with assignment            or_eq
&=              a &= b                      Bitwise and with assignment           andb
|=              b |= SOME_CONST             Bitwise or with assignment            orb
^=              a ^= 1                      Bitwise xor with assignment           xorb
.               a.foo                       Struct/Tuple enum property accessor   N/A
:               {2: 'a'}                    Map key-value separator               N/A
::              io::open()                  Namespace symbol access               N/A
as              t as str                    Type assertion                        N/A
@               N/A                         Pattern value binding                 N/A
...             N/A                         Spread operator (Not-implemented)     range?
;               expression;                 statement terminator                  N/A
?               k<T> where s<T, ?> := ...   Type argument wildcard                N/A
->              (str) -> usize              Function return type notation         N/A
=>              (a) => a + 2                Function Body definition              N/A

Comments 🚧

This table represents the syntax for different types of comments in Hash:

Symbol      Description
//...       Line comment
/*...*/     Block comment
///         function doc comment 🚧
//!         module doc comment 🚧

Type Signature Assertions

Basics

As in many other languages, the programmer can specify the type of a variable or a literal by using some special syntax. For example, in languages such as TypeScript, you can write:

some_value as str

which asserts that some_value is a string. This is essentially a way of telling the compiler "Trust me, some_value is a string" without having to state the type of the variable explicitly every single time.

The principle is somewhat similar in Hash, but it is more strictly enforced. For example, within the statement x := 37;, the type of x can be any of the integer types. This might lead to unexpected behaviour in later statements, where the compiler has decided the type of x (which might not be what you intended).

So, you can either declare x to be some integer type explicitly like so:

x: u32 = 37;

Or you can use as to imply a type for a variable, which the compiler will assume to be true, like so:

x := 37 as u32;

Failing type assertions

If you specify a type assertion, the compiler will attempt to infer the type of the expression on the left-hand side of the as operator and compare it with the type on the right-hand side. If the inferred type differs from the right-hand side, this will raise a typechecking failure.

For example, if you were to specify the expression:

"A" as char

The compiler will report this error as:

error[0001]: Types mismatch, got a `str`, but wanted a `char`.
 --> <interactive>:1:8
1 |   "A" as char
  |          ^^^^ This specifies that the expression should be of type `char`

 --> <interactive>:1:1
1 |   "A" as char
  |   ^^^ Found this to be of type `str`

Usefulness

Why are type assertions needed when there is already type inference within the language? Well, sometimes the type inference system does not have enough information to infer the types of variables and declarations. This can happen when dealing with functions that are generic, so it can sometimes be useful to assert to the compiler that a given variable is a certain type.

Additionally, whilst the language is in an early stage of maturity and some things are quirky or broken, type assertions can come to the rescue and help the compiler to understand your program.

In general, type assertions should be used when the compiler cannot infer the type of some expression with the given information and needs assistance. You shouldn't need to use type assertions often.

Types

Grammar

type =
  | tuple_type
  | list_type
  | set_type
  | map_type
  | grouped_type
  | named_type
  | function_type
  | type_function_call
  | type_function
  | merge_type
  | union_type
  | ref_type

tuple_type = ( "(" ( type "," )* ")" ) | ( "(" ( type "," )+ type ")" )

list_type = "[" type "]"

map_type = "{" type ":" type "}"

set_type = "{" type "}"

grouped_type = "(" type ")"

named_type = access_name

function_type_param = type | ( ident ":" type )

function_type = "(" ( function_type_param "," )* function_type_param? ")" "->" type

type_function_call_arg = type | ( ident "=" type )
type_function_call = ( grouped_type | named_type ) "<" ( type_function_call_arg "," )* type_function_call_arg? ">"

type_function_param = ident ( ":" type )? ( "=" type )?

type_function = "<" ( type_function_param "," )* type_function_param? ">" "->" type

merge_type = ( type "~" )+ type
union_type = ( type "|" )+ type

ref_type = "&" ( "raw" )? ( "mut" )? type

Struct types

In Hash, structs are pre-defined collections of heterogeneous types, similar to C or Rust:

FloatVector3 := struct(
   x: f32,
   y: f32,
   z: f32,
);

A struct is comprised of a set of fields. Each field has a name, a type, and an optional default value.

Structs can be instantiated with specific values for each of the fields. Fields that have a default value can be omitted, or given a different value to override the default.

Dog := struct(
   age: u32 = 42,
   name: str,
);

d := Dog(name = "Bob");

print(d); // Dog(name = "Bob", age = 42)

Structs are nominal types. An argument of type Dog can only be fulfilled by an instance of Dog, and you can't pass in a struct that has the same fields but is of a different named type.

dog_name := (dog: Dog) => dog.name;

FakeDog := struct(
   age: u32 = 42,
   name: str,
);

print(dog_name(d)); // "Bob"
print(dog_name(FakeDog(age = 1, name = "Max"))); // Error: Type mismatch: was expecting `Dog`, got `FakeDog`.
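
For comparison, nominal typing can be imitated in Python with an explicit check on the type itself rather than on the fields. This is only a sketch: Python performs the check at run time, whereas Hash reports the mismatch during typechecking, and Python dataclasses require fields with defaults to come last, so the field order differs from the Hash example.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    name: str
    age: int = 42


@dataclass
class FakeDog:
    # Same fields as Dog, but a different named type.
    name: str
    age: int = 42


def dog_name(dog):
    # Nominal check: the type must be Dog itself; having the
    # same fields is not enough.
    if type(dog) is not Dog:
        raise TypeError(
            f"was expecting `Dog`, got `{type(dog).__name__}`"
        )
    return dog.name


print(dog_name(Dog(name="Bob")))  # Bob

try:
    dog_name(FakeDog(name="Max", age=1))
except TypeError as error:
    print(error)  # was expecting `Dog`, got `FakeDog`
```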

Enum types

Hash enums are similar to Rust enums or Haskell data types. Each variant of an enum can also hold some data. These are also known as algebraic data types, or tagged unions.

NetworkError := enum(
   NoBytesReceived,
   ConnectionTerminated,
   Unexpected(message: str, code: i32),
);

Enum contents consist of a comma-separated list of variant names. Each variant can be paired with some data, in the form of a comma-separated list of fields.

err := NetworkError::Unexpected("something went terribly wrong", 32);

They can be matched to discover what they contain:

handle_error := (error: NetworkError) => match error {
   NoBytesReceived => print("No bytes received, stopping");
   ConnectionTerminated => print("Connection was terminated");
   Unexpected(message, code) => print("An unexpected error occurred: " + message + " (" + conv(code) + ") ");
};

Like structs, enums are nominal types, rather than structural. Each enum member is essentially a struct type.
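
The idea that each enum member is essentially its own struct type can be sketched in Python by giving every variant its own class. This is an analogy only, not how Hash represents enums; the class names mirror the NetworkError example, and `handle_error` returns the message instead of printing it so that the result is easy to inspect.

```python
from dataclasses import dataclass


@dataclass
class NoBytesReceived:
    pass


@dataclass
class ConnectionTerminated:
    pass


@dataclass
class Unexpected:
    message: str
    code: int


def handle_error(error):
    """Dispatch on the variant, reading its data where it has any."""
    if isinstance(error, NoBytesReceived):
        return "No bytes received, stopping"
    if isinstance(error, ConnectionTerminated):
        return "Connection was terminated"
    if isinstance(error, Unexpected):
        return f"An unexpected error occurred: {error.message} ({error.code})"
    raise TypeError("not a NetworkError variant")


print(handle_error(Unexpected("something went terribly wrong", 32)))
```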

Generic types

Because Hash supports type functions, structs and enums can be generic over some type parameters:

LinkedList := <T> => struct(
   head: Option<&raw T>,
);

empty_linked_list := <T> => () -> LinkedList<T> => {
   LinkedList(head = None)
};

x := empty_linked_list<i32>(); // x: LinkedList<i32> inferred

Notice that struct(...) and enum(...) are expressions, which are bound to names on the left hand side. For more information, see type functions.

Grammar

The grammar for struct definitions is as follows:

struct_member =
  | ( ident ":=" expr )  // Declaration and assignment, infer type
  | ( ident ( ":" type )? "=" expr  ) // Assignment
  | ( ident ( ":" type )  ) // Declaration

struct_def = "struct" "(" struct_member* ")"

The grammar for enum definitions is as follows:

enum_member =
  | ident // No fields
  | ident "(" struct_member* ")" // With fields

enum_def = "enum" "(" enum_member* ")"

Hash language modules

A module in Hash can contain variable definitions, function definitions, type definitions or include other modules. Each .hash source file is a module, and inline modules can also be created using mod blocks.

Importing

Given the project structure:

.
├── lib
│   ├── a.hash
│   ├── b.hash
│   └── sub
│       └── c.hash
└── main.hash

Modules in hash allow for a source to be split up into smaller code fragments, allowing for better source code organisation and maintenance.

You can import modules by specifying the path relative to the current path.

For example, to include the modules a, b and c within your main file:

#![allow(unused)]
fn main() {
// main.hash
a := import("lib/a");
b := import("lib/b");
c := import("lib/sub/c");
}

By doing so, you place the exported definitions of each module under the name that the import is bound to (here a, b and c).

Exporting

In order to export items from a module, use the pub keyword. For example:

#![allow(unused)]
fn main() {
/// a.hash

// Visible from outside:
pub a := 1;

// Not visible from outside (priv by default):
b := 1;

// Not visible from outside:
priv c := 1;

/// b.hash
{ a } := import("a.hash"); // Ok
{ b } := import("a.hash"); // Error: b is private
{ c } := import("a.hash"); // Error: c is private.
}

Referencing exports

Furthermore, if the a module contained a public structure definition like Point:

#![allow(unused)]
fn main() {
// a.hash
pub Point := struct(
    x: u32,
    y: u32,
);
}

Within main, you can create a new Point by doing the following:

#![allow(unused)]
fn main() {
// main.hash
a := import("lib/a");

p1 := a::Point(
    x = 2,
    y = 3,
);

print(p1.x); // 2
print(p1.y); // 3
}

As this example shows, the :: item access operator is used to reference any exports from the module.

What if you wanted to import only a specific definition from a module, such as the Point structure from the module a?

You can do so by destructuring the module with a module pattern, using the following syntax:

#![allow(unused)]
fn main() {
{ Point } := import("lib/a");

p1 := Point(x=2, y=3);
}

In case you have a member of your current module already reserving a name, you can rename the exported members to your liking using the as pattern operator:

#![allow(unused)]
fn main() {
{ Point as LibPoint } := import("lib/a");

p1 := LibPoint(x=2, y=3);
}

Inline modules

Other than through .hash files, modules can be created inline using mod blocks:

#![allow(unused)]
fn main() {
// a.hash
bar := 3;

pub nested := mod {
    pub Colour := enum(Red, Green, Blue);
};

// b.hash
a := import("a.hash");
red := a::nested::Colour::Red;
}

These follow the same conventions as .hash files, and members need to be exported with pub in order to be visible from the outside. However, the external module items are always visible from within a mod block, so in the above example, bar can be used from within nested.

Grammar

The grammar for file modules is as follows:

file_module = ( expr ";" )*

The grammar for mod blocks (which are expressions) is as follows:

mod_block = "mod" "{" ( expr ";" )* "}"

Patterns

Pattern matching is a very big part of Hash and the productivity of the language. Patterns are a declarative form of equality checking, similar to patterns in Rust or Haskell.

Pattern matching within match statements is described in more detail in the Conditional statements section of the book. This chapter documents the various kinds of patterns available in Hash.

Literal patterns

Literal patterns are patterns that match a specific value of a primitive type, like a number or a string. For example, consider the following snippet of code:

#![allow(unused)]
fn main() {
foo := get_foo(); // foo: i32

match foo {
    1 => print("Got one");
    2 => print("Got two");
    3 => print("Got three");
    _ => print("Got something else");
}
}

On the left-hand side of the match cases there are the literal patterns 1, 2 and 3. These perform foo == 1, foo == 2 and foo == 3 in sequence, and the code follows the branch which succeeds first. If no branch succeeds, the _ branch is followed, which means "match anything". Literals can be integer literals for integer types (signed or unsigned), string literals for the str type, or character literals for the char type:

#![allow(unused)]
fn main() {
match my_char {
    'A' => print("First letter");
    'B' => print("Second letter");
    x => print("Letter is: " + conv(x));
}

match my_str {
    "fizz" => print("Multiple of 3");
    "buzz" => print("Multiple of 5");
    "fizzbuzz" => print("Multiple of 15");
    _ => print("Not a multiple of 3 or 5");
}
}

Binding patterns

Nested values within the value being pattern matched can be bound to symbols, using binding patterns. A binding pattern is any valid Hash identifier:

#![allow(unused)]
fn main() {
match fallible_operation() { // fallible_operation: () -> Result<f32, i32>
    Ok(success) => print("Got success " + conv(success)); // success: f32
    Err(failure) => print("Got failure " + conv(failure)); // failure: i32
}
}

Tuple patterns

Tuple patterns match a tuple type of some given arity, and contain nested patterns. They are irrefutable if their inner patterns are irrefutable, so they can be used in declarations.

#![allow(unused)]
fn main() {
Cat := struct(name: str);

// Creating a tuple:
my_val := (Cat("Bob"), [1, 2, 3]); // my_val: (Cat, [i32])

// Tuple pattern:
(Cat(name), elements) := my_val;

assert(name == "Bob");
assert(elements == [1, 2, 3]);
}

Constructor patterns

Constructor patterns are used to match the members of structs or enum variants. A struct is comprised of a single constructor, while an enum might be comprised of multiple constructors. Struct constructors are irrefutable if their inner patterns are irrefutable, while enum constructors are irrefutable only if the enum contains a single variant. For example:

#![allow(unused)]
fn main() {
Option := <T> => enum(Some(value: T), None);

my_val := Some("haha");

match my_val {
    // Matching the Some(..) constructor
    Some(inner) => assert(inner == "haha"); // inner: str
    // Matching the None constructor
    None => assert(false);
}
}

The names of the members of a constructor need to be specified if the matching isn't done in order:

#![allow(unused)]
fn main() {
Dog := struct(name: str, breed: str);

Dog(breed = dog_breed, name = dog_name) := Dog(
    name = "Bob",
    breed = "Husky"
); // dog_breed: str, dog_name: str

// Same as:
Dog(name, breed) := Dog(
    name = "Bob",
    breed = "Husky"
); // breed: str, name: str
}
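
For readers who want to run something similar, the sketch below shows the analogous idea in Rust rather than Hash: a struct pattern that names its fields can list them in any order. The struct mirrors the Dog example above; the helper split_dog is hypothetical and exists only for illustration.

```rust
// Rust analogue of the `Dog` example above (not Hash code).
struct Dog {
    name: String,
    breed: String,
}

// Hypothetical helper: destructures a `Dog` and returns (name, breed).
fn split_dog(dog: Dog) -> (String, String) {
    // Fields are matched by name, so their order in the pattern is free.
    let Dog { breed: dog_breed, name: dog_name } = dog;
    (dog_name, dog_breed)
}
```

Calling split_dog with a Dog whose name is "Bob" and breed is "Husky" yields the pair ("Bob", "Husky"), regardless of the order in which the pattern lists the fields.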

List patterns

A list pattern can match elements at certain positions of a list by using the following syntax:

#![allow(unused)]
fn main() {
match arr {
    [a, b] => print(conv(a) + " " + conv(b));
    _ => print("Other"); // Matches everything other than [X, Y] for some X and Y
}
}

The ... spread operator can be used to capture or ignore the rest of the elements of the list at some position:

#![allow(unused)]
fn main() {
match arr {
    [a, b, ...] => print(conv(a) + " " + conv(b));
    _ => print("Other"); // Only matches [] and [X] for some X
}
}

If you want to match the remaining elements with some pattern, you can specify a pattern after the spread operator like so:

#![allow(unused)]
fn main() {
match arr {
    [a, b, ...rest] => print(conv(a) + " " + conv(b) + " " + conv(rest));
    [...rest, c] => print(conv(c)); // Only matches [X] for some X, rest is always []
    _ => print("Other"); // Only matches []
}
}

One obvious limitation of the spread operator is that you can only use it once in the list pattern. For example, the following pattern will be reported as an error by the compiler:

#![allow(unused)]
fn main() {
[..., a, ...] := arr;
}

error: Failed to typecheck:
 --> 1:6 - 1:9, 1:15 - 1:18
  |
1 | [..., a, ...] := arr;
  |  ^^^     ^^^
  |
  = You cannot use multiple spread operators within a single list pattern.
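
Rust's slice patterns behave in a comparable way, so the list pattern examples above can be tried out there. The following is a sketch in Rust, not Hash; the function name describe is hypothetical. Rust also rejects more than one rest pattern (..) in a single slice pattern.

```rust
// Rust analogue of the list-pattern examples (not Hash code).
// `describe` is a hypothetical helper used only for illustration.
fn describe(arr: &[i32]) -> String {
    match arr {
        // Two or more elements: bind the first two and capture the rest.
        [a, b, rest @ ..] => format!("{} {} {:?}", a, b, rest),
        // Reached only for one-element slices, so `_rest` is always empty.
        [_rest @ .., c] => format!("{}", c),
        // Only the empty slice is left.
        _ => "Other".to_string(),
    }
}
```

Because the first arm already covers every slice of length two or more, the second arm can only ever see a single element, which matches the comments in the Hash example.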

Module patterns

Module patterns are used to match members of a module. They are used when importing symbols from other modules. They follow a simple syntax:

#![allow(unused)]
fn main() {
// imports only a and b from the module
{a, b} := import("./my_lib");

// imports c as my_c, and d from the module.
{c as my_c, d} := import("./other_lib"); 

// imports Cat from the nested module as NestedCat
{Cat as NestedCat} := mod {
    pub Cat := struct(name: str, age: i32);
};
}

You do not need to list all the members of a module in the pattern; the members which are not listed will be ignored. To read more about modules, see the Hash language modules chapter.

Or-patterns

Or-patterns are specified using the | pattern operator, and allow one to match multiple different patterns, and use the one which succeeds. For example:

#![allow(unused)]
fn main() {
symmetric_result: Result<str, str> = Ok("bilbobaggins");

(Ok(inner) | Err(inner)) := symmetric_result; // inner: str
}

The pattern above is irrefutable because it matches all variants of the Result enum. Furthermore, each branch has the binding inner, which always has the type str, and so is a valid pattern. The same name binding can appear in multiple branches of an or-pattern, given that it is bound in every branch, and always to the same type. Another use-case of or-patterns is to collapse match cases:

#![allow(unused)]
fn main() {
match color {
    Red | Blue | Green => print("Primary additive");
    Cyan | Magenta | Yellow => print("Primary subtractive");
    _ => print("Unimportant color");
}
}
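
The irrefutable or-pattern shown earlier has a close counterpart in Rust, which can be used to experiment with the rule that every branch must bind the same names at the same type. This is a Rust sketch, not Hash; unwrap_either is a hypothetical name.

```rust
// Rust analogue of `(Ok(inner) | Err(inner)) := symmetric_result;` (not Hash code).
// `unwrap_either` is a hypothetical helper used only for illustration.
fn unwrap_either(result: Result<String, String>) -> String {
    // Both branches bind `inner` as a `String`, and together they cover
    // every variant of `Result`, so the pattern is irrefutable.
    let (Ok(inner) | Err(inner)) = result;
    inner
}
```

If the two branches bound different names, or bound inner at different types, the Rust compiler would reject the pattern, which is the same restriction described above.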

Conditional patterns

Conditional patterns allow one to specify further arbitrary boolean conditions to a pattern for it to match:

#![allow(unused)]
fn main() {
match my_result {
    Ok(inner) if inner > threshold * 2.0 => {
        print("Phew, above twice the threshold");
    };
    Ok(inner) if inner > threshold => {
        print("Phew, above the threshold but cutting it close!");
    };
    Ok(inner) => {
        print("The result was successful but the value was below the threshold");
    };
    Err(_) => {
        print("The result was unsuccessful... Commencing auto-destruct sequence.");
        auto_destruct();
    };
}
}

They are specified using the if keyword after a pattern. Conditional patterns are always refutable, at least as far as the current version of the language is concerned. With more advanced type refinement and literal types, this restriction may be lifted in some cases.
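
Rust's match guards follow the same idea and are likewise treated as refutable, so a guarded arm always needs an unguarded fallback. The sketch below is Rust, not Hash; the function classify and its return strings are hypothetical.

```rust
// Rust analogue of the conditional-pattern example (not Hash code).
// `classify` is a hypothetical helper used only for illustration.
fn classify(result: Result<f64, String>, threshold: f64) -> &'static str {
    match result {
        Ok(inner) if inner > threshold * 2.0 => "above twice the threshold",
        Ok(inner) if inner > threshold => "above the threshold",
        // Guards never count towards exhaustiveness, so an unguarded
        // `Ok` arm is still required.
        Ok(_) => "at or below the threshold",
        Err(_) => "unsuccessful",
    }
}
```

Removing the unguarded Ok(_) arm makes the match non-exhaustive, even though the two guards might look like they cover the interesting cases.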

Pattern grouping

Patterns can be grouped using parentheses (). This is necessary in declarations, for example, if one wants to specify an or-pattern:

#![allow(unused)]
fn main() {
// get_value: () -> bool;
true | false := get_value(); // Error: bitwise-or not implemented between `bool` and `void`
(true | false) := get_value(); // Ok
}

Grammar

The grammar for patterns is as follows:

pattern = 
    | single_pattern
    | or_pattern

single_pattern =
    | binding_pattern
    | constructor_pattern
    | tuple_pattern
    | module_pattern
    | literal_pattern
    | list_pattern

or_pattern = ( single_pattern "|" )+ single_pattern

binding_pattern = identifier

tuple_pattern_member = identifier | ( identifier "=" single_pattern )

constructor_pattern = access_name ( "(" ( tuple_pattern_member "," )* tuple_pattern_member? ")" )?

tuple_pattern = 
    | ( "(" ( tuple_pattern_member "," )+ tuple_pattern_member? ")" ) 
    | ( "(" tuple_pattern_member "," ")" )

module_pattern_member = identifier ( "as" single_pattern )?

module_pattern = "{" ( module_pattern_member "," )* module_pattern_member? "}"

literal_pattern = integer_literal | string_literal | character_literal | float_literal

list_pattern_member = pattern | ( "..." identifier? )

list_pattern = "[" ( list_pattern_member "," )* list_pattern_member? "]"

Traits and implementations

Traits

Hash supports compile-time polymorphism through traits. Traits are a core mechanism in Hash; they allow for different implementations of the same set of operations, for different types. They are similar to traits in Rust, type-classes in Haskell, and protocols in Swift. For example:

Printable := trait {
  print: (Self) -> void;
};

The above declares a trait called Printable which has a single associated function print. The special type Self denotes the type for which the trait is implemented.

Traits can be implemented for types in the following way:

Dog := struct(
  name: str,
  age: i32,
);

Dog ~= Printable {
  // `Self = Dog` inferred
  print = (self) => io::printf(f"Doge with name {self.name} and age {self.age}");
};

Now a Dog is assignable to any type that has bound Printable.

The ~= operator is the combination of ~ and = operators, and it is equivalent to

Dog = Dog ~ Printable { ... };

The ~ operator means "attach", and it is used to attach implementations of traits to structs and enums.

Trait implementations can be created without having to attach them to a specific type:

DogPrintable := Printable {
  Self = Dog, // Self can no longer be inferred, it needs to be explicitly specified.
  print = (self) => io::printf(f"Doge with name {self.name} and age {self.age}");
};

doge := Dog(..);
DogPrintable::print(doge); // Trait implementations can be called explicitly like this

Dog ~= DogPrintable; // `DogPrintable` can be attached to `Dog` as long as `DogPrintable::Self = Dog`.

// Then you can also do this, and it will be resolved to `DogPrintable::print(doge)`:
doge.print();

Traits can also be generic over other types:

Sequence := <T> => trait {
  at: (self, index: usize) -> Option<T>;
  slice: (self, start: usize, end: usize) -> Self;
};

List := <T> => struct(...);

// For List (of type `<T: type> -> type`) implement Sequence (of type `<T: type> -> trait`):
// This will be implemented for all `T`.
List ~= Sequence; 

Notice that in addition to traits, type functions returning traits can also be implemented for other type functions returning types. This is possible as long as both functions on the left hand side and right hand side match:

SomeTrait := <T> => trait {
  Self: type; // Restrict what `Self` can be
  ...
};

// Allowed: `<T: type> -> trait` attachable to `<T: type> -> type`.
(<T> => SomeType) ~= (<T> => SomeTrait<T> {...});

// Not allowed: `<T: type> -> trait` is not attachable to `type`
SomeType ~= (<T> => SomeTrait<T> {...});

// Not allowed: `trait` is not attachable to `<T: type> -> type` because
// `SomeTraitImpl::Self` has type `type` and not `<T: type> -> type`.
SomeType ~= (<T> => SomeTrait<T> {...});

Furthermore, traits do not need to have a self type:

Convert := <I, O> => trait {
  convert: (I) -> O;
};

ConvertDogeToGatos := Convert<Doge, Gatos> {
  convert = (doge) => perform_transformation_from_doge_to_gatos(doge);
};

doggo := Doge(...);
kitty := ConvertDogeToGatos::convert(doggo);

Traits can also be used as bounds on type parameters:

print_things_if_eq := <Thing: Printable ~ Eq> => (thing1: Thing, thing2: Thing) => {
  if thing1 == thing2 {
    print(thing1);
    print(thing2);
  }
};

Here, Thing must implement Printable and Eq. Notice the same attachment syntax (~) for multiple trait bounds, just as for attaching trait implementations to types.

Traits are monomorphised at compile time, and thus are completely erased at runtime. Therefore, there is no additional runtime overhead to structuring your code using lots of traits, generics and polymorphism, compared with using plain old functions without any generics. There is, however, additional compile-time cost to very complicated trait hierarchies and trait bounds.
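
Rust also monomorphises generic functions at compile time, so the idea can be illustrated there. The following is an analogy in Rust, not a description of the Hash compiler; the trait method render and the function render_twice are hypothetical names.

```rust
// Rust analogue of the `Printable`/`Dog` example (not Hash code).
trait Printable {
    fn render(&self) -> String;
}

struct Dog {
    name: String,
    age: i32,
}

impl Printable for Dog {
    fn render(&self) -> String {
        format!("Doge with name {} and age {}", self.name, self.age)
    }
}

// A generic function bounded by the trait. The compiler emits a separate
// copy of this function for every concrete `T` it is called with, so the
// calls to `render` are resolved statically.
fn render_twice<T: Printable>(thing: &T) -> String {
    format!("{} / {}", thing.render(), thing.render())
}
```

Calling render_twice with a &Dog uses a version of the function specialised for Dog, with no trait lookup at runtime.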

Implementations

Implementations can be attached to types without having to implement a specific trait, using impl blocks. These are equivalent to trait implementation blocks, but do not correspond to any trait, and just attach the given items to the type as associated items. Example:

Vector3 := <T> => struct(x: T, y: T, z: T);

Vector3 ~= <T: Mul ~ Sub> => impl {
  // Cross is an associated function on `Vector3<T>` for any `T: Mul ~ Sub`.
  cross := (self, other: Self) -> Self => {
      Vector3(
        self.y * other.z - self.z * other.y,
        self.z * other.x - self.x * other.z,
        self.x * other.y - self.y * other.x,
      )
  };
};

print(Vector3(1, 2, 3).cross(Vector3(4, 5, 6)));

By default, members of impl blocks are public, but priv can be written to make them private.
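
For comparison, here is a runnable Rust sketch of the same Vector3 example. It is not Hash code: Rust spells the bounds with + instead of ~, and additionally needs a Copy bound (an assumption of this sketch) because each component is used twice.

```rust
use std::ops::{Mul, Sub};

// Rust analogue of the `Vector3` impl block (not Hash code).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vector3<T> {
    x: T,
    y: T,
    z: T,
}

impl<T> Vector3<T>
where
    T: Copy + Mul<Output = T> + Sub<Output = T>,
{
    // `cross` is an associated function on `Vector3<T>` for any `T`
    // that supports multiplication and subtraction.
    fn cross(self, other: Self) -> Self {
        Vector3 {
            x: self.y * other.z - self.z * other.y,
            y: self.z * other.x - self.x * other.z,
            z: self.x * other.y - self.y * other.x,
        }
    }
}
```

For the vectors (1, 2, 3) and (4, 5, 6) used in the Hash example, the cross product is (-3, 6, -3).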

Grammar

The grammar for trait definitions is as follows:

trait_def = "trait" "{" ( expr ";" )* "}"

The grammar for trait implementations is as follows:

trait_impl = ident "{" ( expr ";" )* "}"

The grammar for standalone impl blocks is as follows:

impl_block = "impl" "{" ( expr ";" )* "}"

Type functions

Hash supports functions both at the value level and at the type level. Type-level functions correspond to generics in other languages. They are declared using angular brackets (< and >) rather than parentheses, and all parameters are types. Other than that, they have the same syntax as normal (value-level) functions.

Type-level functions can be used to create generic structs, enums, functions, and traits. For example, the generic Result<T, E> type is defined as

Result := <T, E> => enum(
  Ok(T),
  Err(E),
);

This declares that Result is a function of kind <T: type, E: type> -> type. The default bound on each type parameter is type, but it can be any trait (or traits) as well. Multiple trait bounds can be specified using the ~ binary operator. For example,

Result := <T: Clone ~ Eq, E: Error ~ Print> => enum(
  Ok(T),
  Err(E),
);

Here, T must implement Clone and Eq, and E must implement Error and Print.

In order to evaluate type functions, type arguments can be specified in angle brackets:

my_result: Result<i32, str> = Ok(3);

When calling functions or instantiating enums/structs, type arguments can be inferred so that you don't have to specify them manually:

RefCounted := <Inner: Sized> => struct(
  ptr: &raw Inner,
  references: usize
);

make_ref_counted := <Inner: Sized> => (value: Inner) -> RefCounted<Inner> => {
  data_ptr := allocate_bytes_for<Inner>();

  RefCounted( // Type argument `Inner` inferred
    ptr = data_ptr,
    references = 1,
  )
};

my_ref_counted_string := make_ref_counted("Bilbo bing bong"); // `Inner = str` inferred

To explicitly ask for specific type arguments to be inferred while giving others by hand, you can use the _ sigil:

Convert := <I, O> => trait {
  convert: (input: I) -> O;
};

// ...implementations of convert

x := 3.convert<_, str>(); // `I = i32` inferred, `O = str` given.
x := 3.convert<I = _, O = str>(); // same thing.
x: str = 3.convert(); // same thing.
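
Rust has a similar _ placeholder for type arguments, which gives a way to try out partial inference. The sketch below is Rust, not Hash; the convert function here is a hypothetical stand-in built on the standard ToString and From traits, and Rust has no equivalent of the named form <I = _, O = str>.

```rust
// Rust analogue of partially inferred type arguments (not Hash code).
// `convert` is a hypothetical stand-in for the `Convert` trait above.
fn convert<I: ToString, O: From<String>>(input: I) -> O {
    O::from(input.to_string())
}

// Hypothetical demo: both calls resolve to the same instantiation.
fn demo() -> (String, String) {
    // `I = i32` is inferred from the argument, `O = String` is given.
    let explicit = convert::<_, String>(3);
    // Both arguments are inferred, `O` from the annotation on the binding.
    let annotated: String = convert(3);
    (explicit, annotated)
}
```

Both bindings in demo end up holding the string "3".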

Type functions can only return types or functions; they cannot return values (though this is planned eventually). This means that you cannot write

land_with := <T> => land_on_moon_with<T>();
signal := land_with<Rover>;

but you can write

land_with := <T> => () => land_on_moon_with<T>();
signal := land_with<Rover>();

Just like with value-level functions, type-level functions can be provided with named arguments rather than positional arguments. These are subject to the same rules as value-level functions:

make_repository := <
  Create, Read,
  Update, Delete
> => () -> Repository<Create, Read, Update, Delete> => {
  ...
};

repo := make_repository<
  Create = DogCreate,
  Read = DogRead,
  Update = DogUpdate,
  Delete = DogDelete
>();

Finally, type-level function parameters can be given default arguments, which will be used if the arguments cannot be inferred from context and aren't specified explicitly:

Vec := <T, Allocator: Alloc = GlobalAllocator> => struct(
  data: RawRefInAllocator<T, Allocator>,
  length: usize,
);

make_vec := <T> => () -> Vec<T> => { ... }; // `Allocator = GlobalAllocator` inferred
make_vec_with_alloc := <T, Allocator: Alloc> => (allocator: Allocator) -> Vec<T, Allocator> => { ... };

x := make_vec<str>(); // `Allocator = GlobalAllocator` inferred
y := make_vec_with_alloc<str, _>(slab_allocator); // `Allocator = SlabAllocator` inferred

Grammar

The grammar for type function definitions and type function types can be found in the Types section.

Memory

Still under construction.

Macros

This section describes the syntax for macros in Hash. Macros are a way to write code that writes other code. There are two kinds of macro invocation: one works on AST items, and the other works on tokens.

AST macros

AST-level macros are written with the syntax #macro_name <subject> or #[macro_name(macro_arg)] <subject>. The first form is a shorthand for macros that take no additional arguments.

For example, the #dump_ast macro will accept any AST item as the subject and print the parsed AST to the console.

#![allow(unused)]
fn main() {
#dump_ast main := () => {
    println("Hello, world!");
}
}

An example of an AST macro being used to set some attributes on a function:

#![allow(unused)]
fn main() {
#[attr(foreign(c), no_mangle, link_name = "jpeg_read_header")]
jpeg_read_header := (&raw cinfo, bool require_image) -> i32;
}

Token macros

Token macros follow a similar syntax to AST macros, but instead of working on AST items, they work on tokens. The syntax for token macros is @macro_name <subject> or @[macro_name(macro_arg)] <subject>. The first form is a shorthand for token macros that take no arguments. One significant difference between token macros and AST macros is that a token macro only accepts a token tree as its subject. A token tree is a sequence of tokens enclosed in a pair of delimiters: [...], {...} or (...). It is then up to the macro to define the rules for accepting the token tree.

An example of using min and max macros:

#![allow(unused)]
fn main() {
main := () => {
    min := @min {1 + 2, 3 * 4, 7 - 6 + 1 };
    max := @max {1 + 2, 3 * 4, 7 - 6 + 1 };

    if max - min == 0 {
        println("min and max are equal")
    } else {
        println("min and max are not equal")
    }
}
}

Another example, in which a macro takes a token tree containing HTML:

welcome := () => {
    @[xml(variant=html)] {
        <html>
            <head>
                <title>My page</title>
            </head>
            <body>
                <h1>Hello, world!</h1>
            </body>
        </html>
    }
}

Defining a macro 🚧

This section hasn't been defined yet, and is still a work in progress.

Macro Rules 🚧

Macro invocation locations

Both styles of macro invocations can appear in the following positions:

  • Expr
  • Type
  • Pat
  • Param
  • Arg
  • TypeArg
  • PatArg
  • MatchCase
  • EnumVariant

Here is an example in code of all of the possible positions where a macro invocation can appear:

#![allow(unused)]
#![module_attributes]

fn main() {
#dump_ast
Foo := struct<#dump_ast T>(
    #dump_ast x: T,
    #dump_ast y: T,
    #dump_ast z: T,
);

Bar := enum<T>(
    #dump_ast A(T),
    #dump_ast B(T),
    #dump_ast C(T),
);

bing := (#param x: i32) -> #ty i32 => {
    foo := Foo(#arg x = 5, #arg y = 6, #arg z = 7);

    match x {
        #dump_ast 0 => 0,
        #dump_ast 1 => 1,
        #dump_ast (#dump_ast _) => bing(x - 1) + bing(x - 2),
    }
}
}

Grammar

Formally, the macro syntax invocation can be written as follows:

token_macro_invocation ::= "@" ( macro_name | macro_args ) token_tree;

token_tree ::= "{" any "}" 
             | "[" any "]" 
             | "(" any ")";

ast_macro_invocation ::= "#" ( macro_name | macro_args ) macro_subject;

module_macro_invocation ::= "#!" macro_args;

macro_subject ::= expr 
                  | type 
                  | pat 
                  | param 
                  | arg 
                  | type_arg
                  | pat_arg
                  | match_case 
                  | enum_variant;  

macro_args ::= "[" ( ∅ | macro_invocation ("," macro_invocation)* ","? ) "]";

macro_invocation ::= macro_name ( "(" ( ∅ | expr ("," expr )* ","? ) ")" )?;

macro_name ::= access_name;

Standard library

Current modules

The standard library includes the following modules:

  • math: Mathematical functions, constants and other useful numerical methods.
  • io: File handling and IO module
  • iter: Iterators module
  • list: Useful list functions including sorting, manipulation and transformations

Future expansions 🚧

These modules are currently under construction or proposed:

  • time: Useful constructs for time-oriented data
  • sys: System information about the host OS
  • path: Path utilities

Interpreter

This chapter is dedicated to documenting the current interpreter implementation, future plans and a very basic manual for how to use the interpreter (via commandline arguments).

Interpreter command-line arguments

The Hash interpreter has a number of options that you can enable when running an instance of a VM. This page documents options and configurations that you can change when running a Hash interpreter.

General overview

-e, --execute: Execute a command

Sets the interpreter to 'execute' mode, in which the provided script is run immediately rather than launching an interactive session.

For example:

$ hash -e examples/compute_pi.hash
3.1415926535897

-d, --debug: Run compiler in debug mode

This enables debug mode within the compiler, in which the compiler verbosely reports timings, procedures and, in general, what it is doing at any given moment.

-h, --help: Print commandline help menu

Displays a help dialogue on how to use the command line arguments with the hash interpreter.

-v, --version: Compiler version

Displays the current interpreter version with some additional debug information about the installed interpreter.

VM Specific options

-s, --stack-size: Adjust vm stack size

Adjust the stack size of the Virtual Machine. Default value is 10,0000

Debug Modes

ast-gen: Generate AST from input file only

This mode tells the compiler to finish at the Abstract Syntax Tree stage and not produce any other kind of output.

-v : Whilst generating AST, output a visual representation of the AST.

-d : Run in debug mode.

ir-gen: Generate IR from input file only

This mode tells the compiler to finish at the IR stage and not produce any other kind of output.

-v : Whilst generating IR, output a visual representation of the IR.

-d : Run in debug mode.

Compiler backends

Current backend

The current backend uses a bytecode representation of the program, which runs in a virtual machine that implements garbage collection. This is similar to Python's approach to running programs, although Python is known to be slow for performance-sensitive code (unless C bindings are used).

We want to move away from using a virtual machine as the main backend and instead produce executables that can run on x86_64, using either a native (naive) backend or LLVM.

However, there are advantages to having a VM implementation for the language, which are primarily:

  • We can have an interactive mode, execute code on the fly (with a minor performance hit)
  • We can run compile-time code functions that are beyond just templates and constant folding expressions.

Planned backends

Here are the currently planned backends, that will be worked on and stabilised some time in the future:

Name | Description | Target platform | Status
x86_64_native | A native backend for generating executables and performing optimisations ourselves. | x86_64 |
x86_64_llvm | A backend powered by the might of LLVM. | x86_64 |
vm | Virtual machine backend able to run bytecode-compiled programs. | any |
elf64 | Backend for generating standalone ELFs for un-named host operating systems. | i386 |
wasm | WebAssembly backend; converts Hash programs into WebAssembly executables. | browser/any |
js | JS backend; generates TS/JavaScript code from the provided program. | browser/any |

Advanced Concepts

This chapter of the book is dedicated to documenting advanced concepts for developers and contributors.

Compiler internals

This chapter is dedicated to documenting some core internal features of the compiler which are noteworthy and should be examined by individuals who are interested not only in using the language but also in contributing to its development.

Loop transpilation

As mentioned at the start of the loops section in the basics chapter, loop is the most universal control flow keyword, since you can use loop to represent both for and while loops.

for loop transpilation

Since for loops are used for iterators in Hash, we transpile the construct into a primitive loop. An iterator can be traversed by calling the next function on the iterator. Since next returns an Option type, we need to check whether there is a value or whether it returns None. If a value does exist, we essentially perform an assignment to the pattern provided. If it is None, the branch immediately breaks out of the loop. A rough outline of what the transpilation process for a for loop looks like:

for <pat> in <iterator> {
    <block>
}

// converted to
loop {
    match next(<iterator>) {
        Some(<pat>) => <block>;
        None        => break;
    }
}

An example of the transpilation process:

#![allow(unused)]
fn main() {
i := [1,2,3,5].into_iter();

for x in i {
    print("x is " + conv(x));
}


// the same as...
i := [1,2,3,5].into_iter();

loop {
  match next(i) {
    Some(x) => {print("x is " + conv(x))};
    None => break;
  }
}
}
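
The equivalence can be checked in a language with a runnable toolchain. The following is a sketch in Rust rather than Hash, chosen only because its for, loop, match and Option constructs line up closely with the outline above; the function names collect_with_for and collect_with_loop are hypothetical.

```rust
// Rust analogue of the `for` loop transpilation (not Hash code).
fn collect_with_for(items: &[i32]) -> Vec<String> {
    let mut out = Vec::new();
    for x in items.iter() {
        out.push(format!("x is {}", x));
    }
    out
}

// The same traversal written with a primitive `loop` and a `match`
// on the `Option` returned by `next`.
fn collect_with_loop(items: &[i32]) -> Vec<String> {
    let mut out = Vec::new();
    let mut iterator = items.iter();
    loop {
        match iterator.next() {
            Some(x) => out.push(format!("x is {}", x)),
            None => break,
        }
    }
    out
}
```

Both functions visit the elements in the same order and stop at the first None, so they produce identical output for any input slice.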

while loop transpilation

In general, a while loop is transpiled by moving the looping condition into a match block that matches on a boolean value. If the condition evaluates to false, the loop immediately breaks. Otherwise the body expression is executed. A rough outline of what the transpilation process for a while loop looks like:

while <condition> {
    <block>
}

// converted to
loop {
    match <condition> {
        true  => <block>;
        false => break;
    }
}

This is why the condition must explicitly return a boolean value.

For example, a while loop can be written using loop like so:

#![allow(unused)]
fn main() {
mut c := 0;

loop {
    match c < 5 { // `c < 5` is the condition of the while loop
        true  => c += 1;
        false => break;
    }
}

// same as...
mut c := 0;

while c < 5 {
    c += 1;
}
}
}
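
As with for loops, the equivalence can be tried out in Rust, which accepts the same loop-and-match shape. This is a Rust sketch, not Hash; count_with_while and count_with_loop are hypothetical names.

```rust
// Rust analogue of the `while` loop transpilation (not Hash code).
fn count_with_while(limit: u32) -> u32 {
    let mut c = 0;
    while c < limit {
        c += 1;
    }
    c
}

// The same loop with the condition moved into a `match` on a boolean.
fn count_with_loop(limit: u32) -> u32 {
    let mut c = 0;
    loop {
        match c < limit {
            true => c += 1,
            false => break,
        }
    }
    c
}
```

Note that the counter has to be declared mutable in both versions, just as the Hash binding needs the mut keyword.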

If Statement transpilation

As mentioned at the start of the conditionals section in the basics chapter, if statements can be represented as match statements. This is especially advised when you have many if branches and more complicated branch conditions.

Internally, the compiler will convert if statements into match cases so that it has to do less work in the following stages of compilation.

In general, the transpilation process can be represented as:

if <condition_1> {
    <block_1>
} else if <condition_2> {
    <block_2>
} ... else {
    <block_n>
}

// will be converted to

match true {
    _ if <condition_1> => <block_1>;
    _ if <condition_2> => <block_2>;
    ...
    _ => <block_n>;
}

For example, the following if statement will be converted as follows:

#![allow(unused)]
fn main() {
if conditionA {
  print("conditionA")
} else if conditionB {
  print("conditionB")
} else {
  print("Neither")
}

// Internally, this becomes:

match true {
  _ if conditionA => { print("conditionA") };
  _ if conditionB => { print("conditionB") };
  _ => { print("Neither") };
}
}

However, this representation is not entirely accurate, because the compiler will optimise some components out of the transpiled version. Redundant constructs such as match true { ... } undergo constant folding to produce a more optimal AST representation of the program.
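
The if-to-match rewrite can also be reproduced in Rust, where match true with guarded wildcard arms is valid code. The following is a Rust sketch, not Hash, and says nothing about what the Hash compiler emits after optimisation; describe_if and describe_match are hypothetical names.

```rust
// Rust analogue of the `if` statement transpilation (not Hash code).
fn describe_if(condition_a: bool, condition_b: bool) -> &'static str {
    if condition_a {
        "conditionA"
    } else if condition_b {
        "conditionB"
    } else {
        "Neither"
    }
}

// The same branching as a `match` with guarded wildcard arms. Arms are
// tried top to bottom, which preserves the priority of the `if` chain.
fn describe_match(condition_a: bool, condition_b: bool) -> &'static str {
    match true {
        _ if condition_a => "conditionA",
        _ if condition_b => "conditionB",
        _ => "Neither",
    }
}
```

When both conditions hold, both versions pick the first branch, because guards are evaluated in order just like the conditions of an if chain.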

Missing 'else' case

If the if statement lacks an else clause or a default case branch, the compiler will insert one automatically to avoid issues with pattern exhaustiveness. This behaviour is designed to mimic the control flow of classic if statements because the else branch will have an assigned empty expression block.

From the above example, but without the else branch:

#![allow(unused)]
fn main() {
if conditionA {
  print("conditionA")
} else if conditionB {
  print("conditionB")
}

// Internally, this becomes:

match true {
  _ if conditionA => { print("conditionA") };
  _ if conditionB => { print("conditionB") };
  _ => { };
}
}

Type inference

🚧 Still under construction! 🚧

Future features

This page is dedicated to documenting future planned features within the language.