2. Lexical Elements

Note

The contents of this section are informational.

2:1 The text of a Hash program consists of modules organised into source files. The text of a source file is a sequence of separate lexical elements, each composed of characters; the rules for forming lexical elements are defined in this section.

2.1. Character Set

2.1:1 The program text of a Hash program is a sequence of Unicode characters.

Syntax

2.1:2 A character is defined by this document for each cell in the coding space described by Unicode, regardless of whether or not Unicode allocates a character to that cell.

2.1:3 A whitespace character is one of the following characters:

  • 2.1:4 0x09 (horizontal tab \t)

  • 2.1:5 0x0A (line feed \n)

  • 2.1:6 0x0B (vertical tab \v)

  • 2.1:7 0x0C (form feed \f)

  • 2.1:8 0x0D (carriage return \r)

  • 2.1:9 0x20 (space)

  • 2.1:10 0x200E (left-to-right mark)

  • 2.1:11 0x200F (right-to-left mark)

  • 2.1:12 0x2028 (line separator)

  • 2.1:13 0x2029 (paragraph separator)

2.1:14 A whitespace string is a sequence of one or more whitespace characters.
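A whitespace test can be a plain set lookup over the characters of 2.1:3. The following Python sketch is illustrative only. `WHITESPACE` and `is_whitespace_string` are hypothetical names and are not part of any Hash tool. The sketch deliberately avoids Python's built-in `str.isspace`, whose notion of whitespace differs from the list above: it accepts characters such as U+00A0 (no-break space) and rejects U+200E and U+200F.

```python
# The whitespace characters of 2.1:3, written as code-point escapes.
WHITESPACE = frozenset(
    "\u0009\u000a\u000b\u000c\u000d\u0020\u200e\u200f\u2028\u2029"
)


def is_whitespace_string(text: str) -> bool:
    """Return True if text is a whitespace string (2.1:14): one or more
    characters, every one of which is a whitespace character."""
    return len(text) > 0 and all(ch in WHITESPACE for ch in text)
```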

2.1:15 An AsciiCharacter is any Unicode character in the range 0x00 to 0x7F, inclusive.

Legality Rules

2.1:16 The coded representation of a character is tool defined.

2.2. Lexical Elements, Separators, and Punctuation

2.2:1 A lexical element is the most basic syntactic element in program text.

Syntax

LexicalElement ::=
     Comment
     | Identifier
     | Keyword
     | Literal
     | Punctuation

Punctuation ::=
     Delimiter
     | +
     | -
     | *
     | /
     | %
     | ^
     | ^^
     | &
     | &&
     | |
     | ||
     | ~
     | !
     | <
     | >
     | =
     | ==
     | !=
     | <=
     | >=
     | <<
     | >>
     | =>
     | +=
     | -=
     | *=
     | /=
     | %=
     | ^=
     | ^^=
     | >>=
     | <<=
     | |=
     | ||=
     | &=
     | &&=
     | ~=
     | .
     | ..
     | ...
     | ..<
     | ;
     | ,
     | :
     | ::
     | ?
     | @
     | #
     | $
     | ->

Delimiter ::=
     {
     | }
     | [
     | ]
     | (
     | )

Legality Rules

2.2:2 The text of a source file is a sequence of separate lexical elements.

2.2:3 The meaning of a program depends only on the particular sequence of lexical elements.

2.2:4 A line is a sequence of zero or more characters followed by an end of line.

2.2:5 The representation of an end of line is tool defined (for example, it may differ between operating systems).

2.2:6 A separator is a character or string that separates adjacent lexical elements. A whitespace string is a separator.

2.2:7 A simple punctuator is one of the following characters:

+
-
*
/
%
^
&
|
~
<
>
=
!
;
,
:
?
@
#
$
.
{
}
[
]
(
)
_

2.2:8 A compound punctuator is one of the following sequences of two or more adjacent special characters:

&&
||
^^
==
!=
<=
>=
<<
>>
=>
+=
-=
*=
/=
%=
^=
^^=
>>=
<<=
|=
||=
&=
&&=
~=
..
...
..<
::
->

2.2:9 The following compound punctuators are flexible compound punctuators:

&&
||
<<
>>

2.2:10 A flexible compound punctuator may be treated either as a compound punctuator or as two adjacent simple punctuators.

2.2:11 Each of the special characters listed for simple punctuators is a simple punctuator, except when the character is used as part of a compound punctuator, or as a character of a byte literal, a character literal, a comment, a numeric literal, or a string literal.
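The rules above do not state how a lexer chooses among overlapping punctuators, for example whether >>= is a single compound punctuator or > followed by >=. The Python sketch below assumes a longest-match ("maximal munch") strategy. This strategy is common practice, but it is an assumption here. The sketch also shows how a parser might split a flexible compound punctuator into two simple punctuators, as 2.2:10 permits. The names `lex_punctuators` and `split_flexible` are hypothetical, and the lexer handles only punctuation and whitespace.

```python
from typing import List

# Compound punctuators from 2.2:8, plus << and >>, which 2.2:9 also names
# as compound punctuators. Longer entries are tried first (maximal munch,
# an assumption of this sketch).
COMPOUND = sorted(
    {
        "&&", "||", "^^", "==", "!=", "<=", ">=", "=>", "+=", "-=", "*=",
        "/=", "%=", "^=", "^^=", ">>=", "<<=", "|=", "||=", "&=", "&&=",
        "~=", "..", "...", "..<", "::", "->", "<<", ">>",
    },
    key=len,
    reverse=True,
)

# Simple punctuators from 2.2:7.
SIMPLE = frozenset("+-*/%^&|~<>=!;,:?@#$.{}[]()_")

# Flexible compound punctuators from 2.2:9.
FLEXIBLE = frozenset({"&&", "||", "<<", ">>"})

# Whitespace characters from 2.1:3; whitespace only separates punctuators.
WHITESPACE = frozenset("\t\n\v\f\r \u200e\u200f\u2028\u2029")


def lex_punctuators(text: str) -> List[str]:
    """Split text made of punctuation and whitespace into punctuators."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
            continue
        for punct in COMPOUND:
            if text.startswith(punct, i):
                tokens.append(punct)
                i += len(punct)
                break
        else:  # no compound punctuator starts at position i
            if text[i] not in SIMPLE:
                raise ValueError(f"not a punctuator: {text[i]!r}")
            tokens.append(text[i])
            i += 1
    return tokens


def split_flexible(token: str) -> List[str]:
    """Reinterpret a flexible compound punctuator as two simple ones (2.2:10)."""
    if token not in FLEXIBLE:
        raise ValueError(f"not a flexible compound punctuator: {token!r}")
    return [token[0], token[1]]
```

Under this strategy, `lex_punctuators("..<=>>")` yields `["..<", "=>", ">"]`. A parser that expects a single `>` where the lexer produced `>>` would call `split_flexible(">>")` to obtain `[">", ">"]`.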

2.2:12 The following names are used to refer to the punctuators:

2.2:13  Punctuator  Name
2.2:14  +           Plus
2.2:15  -           Minus
2.2:16  *           Star
2.2:17  /           Slash
2.2:18  %           Modulo
2.2:19  ^           Caret
2.2:20  ^^          Exponent
2.2:21  !           Bang
2.2:22  &           And
2.2:23  |           Or
2.2:24  &&          Logical And, Lazy And, And And
2.2:25  ||          Logical Or, Lazy Or, Or Or
2.2:26  <           Less Than
2.2:27  <<          Left Shift
2.2:28  >           Greater Than
2.2:29  >>          Right Shift
2.2:30  =           Equals, Assign
2.2:31  ==          Logical Equals, Double Equals
2.2:32  !=          Logical Not Equals, Not Equals
2.2:33  <=          Less Than or Equals
2.2:34  >=          Greater Than or Equals
2.2:35  <<=         Left Shift Assign
2.2:36  >>=         Right Shift Assign
2.2:37  +=          Plus Equals
2.2:38  -=          Minus Equals
2.2:39  *=          Star Equals
2.2:40  /=          Slash Equals
2.2:41  %=          Percent Equals
2.2:42  ^=          Caret Equals
2.2:43  ^^=         Exponent Equals
2.2:44  @           At
2.2:45  .           Dot
2.2:46  ..          Range, Dot Dot
2.2:47  ..<         Exclusive Range
2.2:48  ...         Ellipsis, Spread
2.2:49  ,           Comma
2.2:50  ;           Semi
2.2:51  :           Colon
2.2:52  ::          Access
2.2:53  ->          Thin Arrow
2.2:54  =>          Fat Arrow
2.2:55  #           Pound
2.2:56  $           Dollar Sign
2.2:57  ?           Question Mark
2.2:58  {           Left Brace
2.2:59  }           Right Brace
2.2:60  [           Left Bracket
2.2:61  ]           Right Bracket
2.2:62  (           Left Parenthesis
2.2:63  )           Right Parenthesis

2.3. Comments

Syntax

Comment ::=
     LineComment
     | BlockComment

LineComment ::=
     // ~[\n]*

BlockComment ::=
     /* (BlockComment | ~[*/])* */
     | /**/

Legality Rules

2.3:1 A comment is a lexical element that acts as annotation in the program text.

2.3:2 A block comment is a comment that spans one or more lines.

2.3:3 A line comment is a comment that starts with // and extends to the end of the line.

2.3:4 Character 0x0D (carriage return) shall not appear in a comment.

Examples

// This is a comment
/* This is a block comment */
/* /* This is a nested block comment */ */
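Because the BlockComment production is recursive, block comments nest, and a */ closes only the innermost open comment. A lexer can track nesting with a depth counter instead of a recursive parser. The Python sketch below assumes that approach. `skip_block_comment` is a hypothetical helper that returns the index just past the comment. It also rejects a carriage return, as 2.3:4 requires. It treats every character other than the two-character sequences /* and */ as comment text.

```python
def skip_block_comment(src: str, start: int) -> int:
    """Return the index just past the block comment that begins at src[start].

    Each /* increases the nesting depth and each */ decreases it; the
    comment ends when the depth returns to zero.
    """
    if not src.startswith("/*", start):
        raise ValueError("no block comment at this position")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1
            i += 2
        elif src.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        elif src[i] == "\r":
            # 2.3:4: a carriage return shall not appear in a comment.
            raise ValueError("carriage return in comment")
        else:
            i += 1
    raise ValueError("unterminated block comment")
```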

2.4. Identifiers

Syntax

Identifier ::=
     IdentifierStart IdentifierContinue*

IdentifierList ::=
     Identifier ( , Identifier )* ,?

IdentifierStart ::=
     [a-z A-Z _]

IdentifierContinue ::=
     IdentifierStart | [0-9]

Legality Rules

2.4:1 An identifier is a lexical element that refers to a name.

2.4:2 Two identifiers are equivalent if they consist of the same sequence of characters.

Examples

foo
bar2
_identifier
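The Identifier production maps onto a single regular expression. In this Python sketch, `IDENTIFIER` and `is_identifier` are hypothetical names. The pattern spells out the ASCII ranges explicitly, so it rejects non-ASCII letters as the grammar requires. The sketch does not exclude keywords, because the Identifier production itself does not exclude them.

```python
import re

# IdentifierStart is [a-z A-Z _]; IdentifierContinue adds the digits [0-9].
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Check text against the Identifier production."""
    return IDENTIFIER.fullmatch(text) is not None
```

Identifiers are ASCII-only, and 2.4:2 compares them character by character. Ordinary string equality therefore decides whether two identifiers are equivalent, with no case folding or Unicode normalization involved.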

2.5. Keywords

Syntax

Keyword ::=
     for
     | while
     | loop
     | if
     | else
     | false
     | match
     | as
     | in
     | trait
     | enum
     | struct
     | continue
     | break
     | return
     | import
     | raw
     | unsafe
     | pub
     | priv
     | mut
     | mod
     | impl
     | type
     | true

2.5.1. Reserved Keywords

Syntax

ReservedKeyword ::=
     macro
     | use
     | where
     | ref

2.5.1:1 Reserved keywords are keywords that are reserved for future use but are not currently used by the language. They are currently allowed to be used as identifiers; however, because they will likely be used in the future, it is recommended to avoid using them as identifiers.
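The grammar lists keywords separately from identifiers. It does not state how a lexer decides between them when a word such as `match` fits both Identifier and Keyword. A common approach, assumed here, is to lex the word as an identifier first and then look it up in a keyword table. In the Python sketch below, `classify_word` is a hypothetical helper. It returns the reserved keywords of 2.5.1 as identifiers but flags them, so that a tool could warn about them.

```python
# Keywords from 2.5.
KEYWORDS = frozenset({
    "for", "while", "loop", "if", "else", "false", "match", "as", "in",
    "trait", "enum", "struct", "continue", "break", "return", "import",
    "raw", "unsafe", "pub", "priv", "mut", "mod", "impl", "type", "true",
})

# Reserved keywords from 2.5.1; currently still usable as identifiers.
RESERVED_KEYWORDS = frozenset({"macro", "use", "where", "ref"})


def classify_word(word: str) -> str:
    """Classify a word that has already matched the Identifier production."""
    if word in KEYWORDS:
        return "keyword"
    if word in RESERVED_KEYWORDS:
        # Accepted as an identifier, but a tool may warn about future use.
        return "reserved identifier"
    return "identifier"
```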

2.6. Literals

Syntax

Literal ::=
     BooleanLiteral
     | ByteLiteral
     | CharacterLiteral
     | StringLiteral
     | NumericLiteral

Legality Rules

2.6:1 A literal is a fixed value in program text.

2.7. Boolean Literals

Syntax

BooleanLiteral ::=
     true
     | false

Legality Rules

2.7:1 A boolean literal is a literal that denotes one of the truth values of logic and Boolean algebra.

2.7:2 The type of a boolean literal is bool.

Examples

false

2.8. Byte Literals

Syntax

ByteLiteral ::=
     b' ByteContent '

ByteContent ::=
     ByteCharacter
     | ByteEscape

ByteEscape ::=
     \0
     | \n
     | \r
     | \t
     | \a
     | \b
     | \f
     | \v
     | \\
     | \'
     | \"
     | \x OctalDigit HexadecimalDigit

2.8:1 A ByteCharacter is any AsciiCharacter except the characters 0x09 (horizontal tab \t), 0x0A (line feed \n), 0x0D (carriage return \r), 0x27 (single quote '), and 0x5C (backslash \).

Legality Rules

2.8:2 A byte literal is a literal that denotes a fixed byte value.

2.8:3 The type of a byte literal is u8.
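The following Python sketch shows how a tool might compute the u8 value denoted by a byte literal. This section does not state the numeric values of the escapes. The sketch assumes the conventional C meanings, for example 0x07 for \a, 0x08 for \b, 0x0C for \f and 0x0B for \v. The function name `decode_byte_literal` is hypothetical.

```python
# Escape values assumed to follow the usual C conventions.
BYTE_ESCAPES = {
    "0": 0x00, "n": 0x0A, "r": 0x0D, "t": 0x09, "a": 0x07,
    "b": 0x08, "f": 0x0C, "v": 0x0B, "\\": 0x5C, "'": 0x27, '"': 0x22,
}

# ByteCharacter excludes tab, line feed, carriage return, ' and \ (2.8:1).
EXCLUDED = {0x09, 0x0A, 0x0D, 0x27, 0x5C}

OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def decode_byte_literal(text: str) -> int:
    """Return the u8 value denoted by a byte literal such as b'a' or b'\\n'."""
    if not (len(text) >= 4 and text.startswith("b'") and text.endswith("'")):
        raise ValueError("not a byte literal")
    body = text[2:-1]
    if body.startswith("\\"):
        if len(body) == 2 and body[1] in BYTE_ESCAPES:
            return BYTE_ESCAPES[body[1]]
        # \x OctalDigit HexadecimalDigit, so the value is at most 0x7F.
        if len(body) == 4 and body[1] == "x" and body[2] in OCTAL and body[3] in HEX:
            return int(body[2:], 16)
        raise ValueError(f"invalid byte escape: {body!r}")
    if len(body) == 1 and ord(body) <= 0x7F and ord(body) not in EXCLUDED:
        return ord(body)
    raise ValueError(f"invalid byte character: {body!r}")
```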

2.9. Character Literals

Syntax

CharacterLiteral ::=
     ' CharacterContent '

CharacterContent ::=
     AsciiEscape
     | CharacterContentItem
     | UnicodeEscape

AsciiEscape ::=
     \0
     | \n
     | \r
     | \t
     | \a
     | \b
     | \f
     | \v
     | \\
     | \'
     | \"
     | \x OctalDigit HexadecimalDigit

2.9:1 A CharacterContentItem is any Unicode codepoint except for the Unicode characters 0x09 (horizontal tab \t), 0x0A (line feed \n), 0x0D (carriage return \r), 0x27 (single quote '), and 0x5C (backslash \).

2.9:2 A UnicodeEscape starts with \u{, followed by 1 to 6 instances of a HexadecimalDigit, inclusive, followed by a } character. The escape can represent any Unicode codepoint between U+000000 and U+10FFFF, inclusive, except the Unicode surrogate codepoints, which occupy the range U+D800 to U+DFFF, inclusive.

Legality Rules

2.9:3 A character literal is a literal that denotes a fixed Unicode character.

2.9:4 The type of a character literal is char.

Examples

'a'
'\t'
'\x1b'
'\u{1F30}'
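The UnicodeEscape rule of 2.9:2 combines a lexical check with a value check. The lexical check requires 1 to 6 hexadecimal digits between braces. The value check requires a code point no greater than U+10FFFF that is not a surrogate. In this Python sketch, `decode_unicode_escape` is a hypothetical helper that applies both checks.

```python
import re

# \u{ followed by 1 to 6 hexadecimal digits, then } (2.9:2).
UNICODE_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]{1,6})\}")


def decode_unicode_escape(escape: str) -> str:
    """Return the character denoted by a UnicodeEscape such as \\u{1F30}."""
    match = UNICODE_ESCAPE.fullmatch(escape)
    if match is None:
        raise ValueError(f"malformed Unicode escape: {escape!r}")
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF:
        raise ValueError("code point is above U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be escaped")
    return chr(code_point)
```

The six-digit limit is lexical. An escape such as \u{FFFFFF} is therefore well formed but still fails the range check.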

2.10. String Literals

Syntax

StringLiteral ::=
     " StringContent* "

StringContent ::=
     AsciiEscape
     | StringContentItem
     | UnicodeEscape

2.10:1 A StringContentItem is any Unicode codepoint except for the Unicode characters 0x0D (carriage return \r), 0x22 (double quote "), and 0x5C (backslash \).

Legality Rules

2.10:2 A string literal is a literal that denotes a fixed sequence of Unicode characters, enclosed in double quotes (").

2.10:3 The type of a string literal is str.
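Rule 2.10:1 excludes only carriage return, double quote, and backslash, so a raw line feed may appear inside a string literal. The Python sketch below finds where a string literal ends under that reading. `scan_string_literal` is a hypothetical helper. It does not decode or validate escape sequences. It only skips the character after each backslash.

```python
def scan_string_literal(src: str, start: int) -> int:
    """Return the index just past the string literal starting at src[start].

    Escape sequences are skipped, not decoded: after a backslash the next
    character is consumed, unless it is a carriage return.
    """
    if not src.startswith('"', start):
        raise ValueError("no string literal at this position")
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return i + 1  # closing quote
        if ch == "\r":
            raise ValueError("carriage return in string literal")  # 2.10:1
        if ch == "\\":
            if i + 1 >= len(src) or src[i + 1] == "\r":
                raise ValueError("incomplete escape in string literal")
            i += 2  # skip the backslash and the escaped character
            continue
        i += 1  # ordinary StringContentItem, including a line feed
    raise ValueError("unterminated string literal")
```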

Examples

""
"Москва"
"cat"
"\tcol\nrow"
"bell\x07"
"\u{B80a}"

2.11. Numerical Literals

Syntax

NumericLiteral ::=
     IntegerLiteral
     | FloatLiteral

2.11.1. Integer Literals

Syntax

IntegerLiteral ::=
     -? IntegerContent IntegerSuffix?

IntegerContent ::=
     BinaryLiteral
     | OctalLiteral
     | DecimalLiteral
     | HexadecimalLiteral

BinaryLiteral ::=
     0b BinaryDigitOrUnderscore* BinaryDigit BinaryDigitOrUnderscore*

BinaryDigitOrUnderscore ::=
     BinaryDigit
     | _

BinaryDigit ::=
     [0-1]

OctalLiteral ::=
     0o OctalDigitOrUnderscore* OctalDigit OctalDigitOrUnderscore*

OctalDigitOrUnderscore ::=
     OctalDigit
     | _

OctalDigit ::=
     [0-7]

DecimalLiteral ::=
     DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*

DecimalDigitOrUnderscore ::=
     DecimalDigit
     | _

DecimalDigit ::=
     [0-9]

HexadecimalLiteral ::=
     0x HexadecimalDigitOrUnderscore* HexadecimalDigit HexadecimalDigitOrUnderscore*

HexadecimalDigitOrUnderscore ::=
     HexadecimalDigit
     | _

HexadecimalDigit ::=
     [0-9 a-f A-F]

IntegerSuffix ::=
     SignedIntegerSuffix
     | UnsignedIntegerSuffix

SignedIntegerSuffix ::=
     i8
     | i16
     | i32
     | i64
     | i128
     | isize
     | ibig

UnsignedIntegerSuffix ::=
     u8
     | u16
     | u32
     | u64
     | u128
     | usize
     | ubig

Legality Rules

2.11.1:1 An integer literal is a numeric literal that denotes a whole number.

2.11.1:2 A binary literal is an integer literal in base 2.

2.11.1:3 An octal literal is an integer literal in base 8.

2.11.1:4 A decimal literal is an integer literal in base 10.

2.11.1:5 A hexadecimal literal is an integer literal in base 16.

2.11.1:6 An integer suffix is a component of an integer literal that specifies an explicit integer type.

2.11.1:7 A suffixed integer is an integer literal with an integer suffix.

2.11.1:8 An unsuffixed integer is an integer literal without an integer suffix.

2.11.1:9 The type of a suffixed integer is determined by the integer suffix as follows:

  • 2.11.1:10 Suffix i8 specifies the type i8.

  • 2.11.1:11 Suffix i16 specifies the type i16.

  • 2.11.1:12 Suffix i32 specifies the type i32.

  • 2.11.1:13 Suffix i64 specifies the type i64.

  • 2.11.1:14 Suffix i128 specifies the type i128.

  • 2.11.1:15 Suffix isize specifies the type isize.

  • 2.11.1:16 Suffix ibig specifies the type ibig.

  • 2.11.1:17 Suffix u8 specifies the type u8.

  • 2.11.1:18 Suffix u16 specifies the type u16.

  • 2.11.1:19 Suffix u32 specifies the type u32.

  • 2.11.1:20 Suffix u64 specifies the type u64.

  • 2.11.1:21 Suffix u128 specifies the type u128.

  • 2.11.1:22 Suffix usize specifies the type usize.

  • 2.11.1:23 Suffix ubig specifies the type ubig.

2.11.1:24 The type of an unsuffixed integer is determined by type inference as follows:

  • 2.11.1:25 If an integer type can be inferred from the context, then the unsuffixed integer has that type.

  • 2.11.1:26 If the program content under-constrains the type, then the inferred type is i32.

  • 2.11.1:27 If the program content over-constrains the type, then it is considered to be a static error.
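The three inference rules above can be modelled in a deliberately simplified way. The model assumes that the rest of the compiler has already collected, for one unsuffixed integer, the integer types its context demands. The function name `infer_integer_type` is hypothetical, and so is the representation of constraints as a list of type names. Real type inference propagates constraints through expressions rather than receiving them pre-collected.

```python
from typing import List


def infer_integer_type(constraints: List[str]) -> str:
    """Choose the type of an unsuffixed integer from its context's demands.

    `constraints` lists every integer type the surrounding program requires
    of the literal (a hypothetical, pre-collected representation).
    """
    distinct = set(constraints)
    if not distinct:
        # Under-constrained: fall back to i32 (2.11.1:26).
        return "i32"
    if len(distinct) == 1:
        # Exactly one type can be inferred from context (2.11.1:25).
        return distinct.pop()
    # Over-constrained: the requirements conflict, a static error (2.11.1:27).
    raise TypeError(f"conflicting integer types: {sorted(distinct)}")
```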

Examples

0b0010_1110_u8
1___2_3
0xDeAdBeEf_u32
0o77_52i128
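The integer literal rules can be exercised with a small decoder. This Python sketch is one possible reading of the grammar, and `parse_integer_literal` is a hypothetical name. It strips an optional minus sign and an optional suffix, selects the base from the prefix, drops underscores, and checks each remaining digit against the base. It returns the suffix as the literal's type, per 2.11.1:10 to 2.11.1:23, or None for an unsuffixed integer.

```python
from typing import Optional, Tuple

# Every IntegerSuffix; each one names the type it specifies.
INTEGER_SUFFIXES = (
    "i8", "i16", "i32", "i64", "i128", "isize", "ibig",
    "u8", "u16", "u32", "u64", "u128", "usize", "ubig",
)

# Base prefixes, and the digits each base admits.
BASES = {"0b": 2, "0o": 8, "0x": 16}
DIGITS = {
    2: "01",
    8: "01234567",
    10: "0123456789",
    16: "0123456789abcdefABCDEF",
}


def parse_integer_literal(text: str) -> Tuple[int, Optional[str]]:
    """Return (value, suffix type or None) for an integer literal."""
    negative = text.startswith("-")
    body = text[1:] if negative else text

    # Strip a suffix, trying longer suffixes first. Every suffix begins with
    # i or u, which are never digits, so a match cannot consume only digits.
    suffix = None
    for candidate in sorted(INTEGER_SUFFIXES, key=len, reverse=True):
        if body.endswith(candidate):
            suffix = candidate
            body = body[: -len(candidate)]
            break

    base = BASES.get(body[:2], 10)
    digits = body if base == 10 else body[2:]
    cleaned = digits.replace("_", "")  # underscores carry no value
    if not cleaned or any(ch not in DIGITS[base] for ch in cleaned):
        raise ValueError(f"malformed integer literal: {text!r}")
    value = int(cleaned, base)
    return (-value if negative else value), suffix
```

Applied to the examples above, this sketch yields (46, "u8"), (123, None), (0xDEADBEEF, "u32") and (4074, "i128"). It does not check whether a value fits its suffix type, and it does not check how a leading minus interacts with an unsigned suffix.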

2.11.2. Float Literals

Syntax

FloatLiteral ::= -? FloatComponent

FloatComponent ::=
     DecimalLiteral .
     | DecimalLiteral FloatExponent
     | DecimalLiteral . DecimalLiteral FloatExponent?
     | DecimalLiteral (. DecimalLiteral)? FloatExponent? FloatSuffix

FloatExponent ::=
     ExponentAnnotation ExponentSign? ExponentMagnitude

ExponentAnnotation ::=
     e
     | E

ExponentSign ::=
     +
     | -

ExponentMagnitude ::=
     DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*

FloatSuffix ::=
     f32
     | f64

Legality Rules

2.11.2:1 A float literal is a numeric literal that denotes a fractional number.

2.11.2:2 A float suffix is a component of a float literal that specifies an explicit floating point type.

2.11.2:3 A suffixed float is a float literal with a float suffix.

2.11.2:4 An unsuffixed float is a float literal without a float suffix.

2.11.2:5 The type of a suffixed float is determined by the float suffix as follows:

  • 2.11.2:6 Suffix f32 specifies the type f32.

  • 2.11.2:7 Suffix f64 specifies the type f64.

2.11.2:8 The type of an unsuffixed float is determined by type inference as follows:

Examples

45.
8E+1_820
3.14e5
8_031.4_e-12f64
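A recognizer for float literals follows directly from the FloatComponent alternatives. The Python sketch below builds one regular expression per alternative. It treats FloatSuffix in the last alternative as mandatory. This is an assumption about the intended grammar: without it, a bare DecimalLiteral such as 42 would also be a float literal. `parse_float_literal` is a hypothetical helper. It returns the literal's text with underscores removed, together with its suffix.

```python
import re
from typing import Optional, Tuple

DEC = r"[0-9_]*[0-9][0-9_]*"   # DecimalLiteral: at least one digit
EXP = r"[eE][+-]?" + DEC        # FloatExponent
SUFFIX = r"f32|f64"             # FloatSuffix

# The FloatComponent alternatives, in grammar order.
ALTERNATIVES = [
    DEC + r"\.",                                           # 45.
    DEC + EXP,                                             # 8E+1_820
    DEC + r"\." + DEC + r"(?:" + EXP + r")?",              # 3.14e5
    DEC + r"(?:\." + DEC + r")?(?:" + EXP + r")?(?:" + SUFFIX + r")",
]
FLOAT = re.compile(r"-?(?:" + "|".join(ALTERNATIVES) + r")")


def parse_float_literal(text: str) -> Tuple[str, Optional[str]]:
    """Return (text without underscores or suffix, suffix or None)."""
    if FLOAT.fullmatch(text) is None:
        raise ValueError(f"not a float literal: {text!r}")
    suffix = text[-3:] if text.endswith(("f32", "f64")) else None
    body = text[:-3] if suffix else text
    return body.replace("_", ""), suffix
```

The normalized text could then be passed to a host floating point parser. This sketch does not check range against f32 or f64.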