2. Lexical Elements
Note
The contents of this section are informational.
2:1 The text of a Hash program consists of modules organised into source files. The text of a source file is a sequence of separate lexical elements, each composed of characters, whose rules are defined in this section.
2.1. Character Set
2.1:1 The program text of a Hash program is a sequence of Unicode characters.
Syntax
2.1:2 A character is defined by this document for each cell in the coding space described by Unicode, regardless of whether or not Unicode allocates a character to that cell.
2.1:3 A whitespace character is one of the following characters:
2.1:4 0x09 (horizontal tab \t)
2.1:5 0x0A (line feed \n)
2.1:6 0x0B (vertical tab \v)
2.1:7 0x0C (form feed \f)
2.1:8 0x0D (carriage return \r)
2.1:9 0x20 (space)
2.1:10 0x200E (left-to-right mark)
2.1:11 0x200F (right-to-left mark)
2.1:12 0x2028 (line separator)
2.1:13 0x2029 (paragraph separator)
2.1:14 A whitespace string is a sequence of one or more whitespace characters.
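As an informal illustration, the Python sketch below shows one way a lexer might test for the whitespace characters listed above and skip a whitespace string. The helper names is_whitespace and skip_whitespace_string are hypothetical. Characters that are not in the list, such as U+00A0 (no-break space), are not whitespace under these rules.

```python
# Whitespace characters from 2.1:4 to 2.1:13, as Unicode code points.
WHITESPACE = {
    0x09,    # horizontal tab
    0x0A,    # line feed
    0x0B,    # vertical tab
    0x0C,    # form feed
    0x0D,    # carriage return
    0x20,    # space
    0x200E,  # left-to-right mark
    0x200F,  # right-to-left mark
    0x2028,  # line separator
    0x2029,  # paragraph separator
}


def is_whitespace(ch: str) -> bool:
    """Return True if the single character ch is a whitespace character."""
    return ord(ch) in WHITESPACE


def skip_whitespace_string(text: str, pos: int) -> int:
    """Return the index just past the whitespace string starting at pos.

    If no whitespace character starts at pos, pos is returned unchanged.
    """
    while pos < len(text) and is_whitespace(text[pos]):
        pos += 1
    return pos
```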
2.1:15 An AsciiCharacter is any Unicode character in the range 0x00 to 0x7F, inclusive.
Legality Rules
2.1:16 The coded representation of a character is tool defined.
2.2. Lexical Elements, Separators, and Punctuation
2.2:1 A lexical element is the most basic syntactic element in program text.
Syntax
LexicalElement ::=
    Comment
  | Identifier
  | Keyword
  | Literal
  | Punctuation

Punctuation ::=
    Delimiter
  | + | - | * | / | % | ^ | ^^ | & | && | | | || | ~ | ! | < | << | > | >> | = | == | != | <= | >= | => | += | -= | *= | /= | %= | ^= | ^^= | >>= | <<= | |= | ||= | &= | &&= | ~= | . | .. | ... | ..< | ; | , | : | :: | ? | @ | # | $ | ->

Delimiter ::=
    { | } | [ | ] | ( | )
Legality Rules
2.2:2 The text of a source file is a sequence of separate lexical elements. The meaning of a program depends only on the particular sequence of lexical elements.
2.2:4 A line is a sequence of zero or more characters followed by an end of line.
2.2:5 The representation of an end of line is tool defined (i.e. specific to an operating system).
2.2:6 A separator is a character or string that separates adjacent lexical elements. A whitespace string is a separator.
2.2:7 A simple punctuator is one of the following characters:
+ - * / % ^ & | ~ < > = ! ; , : ? @ # $ . { } [ ] ( ) _
2.2:8 A compound punctuator is one of the following sequences of two or more adjacent special characters:
&& || ^^ == != <= >= => += -= *= /= %= ^= ^^= >>= <<= |= ||= &= &&= ~= .. ... ..< :: -> << >>
2.2:9 The following compound punctuators are flexible compound punctuators:
&& || << >>
2.2:10 A flexible compound punctuator may be treated as a compound punctuator or two adjacent simple punctuators.
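2.2:10 does not say when a flexible compound punctuator is split. One plausible approach, sketched below in Python as an assumption rather than a description of any Hash tool, is for the parser to split the token on demand: when it expects a single simple punctuator and finds a flexible compound punctuator that starts with it, it replaces that token with its two halves. The function expect_simple and its list-of-strings token representation are hypothetical.

```python
# Flexible compound punctuators (2.2:9) mapped to their two simple halves.
FLEXIBLE = {"&&": ("&", "&"), "||": ("|", "|"), "<<": ("<", "<"), ">>": (">", ">")}


def expect_simple(tokens: list[str], pos: int, want: str) -> int:
    """Consume the punctuator want at tokens[pos] and return the next position.

    If tokens[pos] is a flexible compound punctuator whose first half is want,
    it is split in place into two simple punctuators and the first is consumed.
    """
    tok = tokens[pos]
    if tok == want:
        return pos + 1
    if tok in FLEXIBLE and FLEXIBLE[tok][0] == want:
        tokens[pos:pos + 1] = list(FLEXIBLE[tok])  # e.g. ">>" becomes ">", ">"
        return pos + 1
    raise SyntaxError(f"expected {want!r}, found {tok!r}")
```

Taking the token whole when the compound form is expected, and splitting it only on demand, means the lexer never has to guess which reading is intended.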
2.2:11 Each of the special characters listed for simple punctuators is a simple punctuator, except when the character is used as part of a compound punctuator, an identifier, a byte literal, a character literal, a comment, a numeric literal, or a string literal.
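This section lists the punctuators but does not state how a run of adjacent special characters is divided into them. A common technique, assumed here rather than stated by this document, is maximal munch: at each position, take the longest compound punctuator that matches, otherwise a single simple punctuator. The Python sketch below (the function lex_punctuators is hypothetical) applies it to text made only of punctuators and spaces. It includes << and >> because 2.2:9 names them as compound punctuators.

```python
# Simple punctuators (2.2:7) and compound punctuators (2.2:8 and 2.2:9).
SIMPLE = set("+-*/%^&|~<>=!;,:?@#$.{}[]()_")
COMPOUND = {
    "&&", "||", "^^", "==", "!=", "<=", ">=", "=>", "+=", "-=", "*=", "/=",
    "%=", "^=", "^^=", ">>=", "<<=", "|=", "||=", "&=", "&&=", "~=",
    "..", "...", "..<", "::", "->", "<<", ">>",
}
LONGEST = max(len(p) for p in COMPOUND)  # 3 characters


def lex_punctuators(text: str) -> list[str]:
    """Split text into punctuators using maximal munch; spaces separate tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":  # a space separates adjacent punctuators
            pos += 1
            continue
        # Try the longest compound punctuators first.
        for length in range(LONGEST, 1, -1):
            candidate = text[pos:pos + length]
            if len(candidate) == length and candidate in COMPOUND:
                tokens.append(candidate)
                pos += length
                break
        else:
            # No compound punctuator matched: fall back to a simple one.
            if text[pos] not in SIMPLE:
                raise ValueError(f"not a punctuator: {text[pos]!r}")
            tokens.append(text[pos])
            pos += 1
    return tokens
```

Under this assumption, !== lexes as != followed by =, and ".. <" (with a space) lexes as .. followed by <.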
2.2:12 The following names are used to refer to the punctuators:
2.2:13  Punctuator  Name
2.2:14  +           Plus
2.2:15  -           Minus
2.2:16  *           Star
2.2:17  /           Slash
2.2:18  %           Modulo
2.2:19  ^           Caret
2.2:20  ^^          Exponent
2.2:21  !           Bang
2.2:22  &           And
2.2:23  |           Or
2.2:24  &&          Logical And, Lazy And, And And
2.2:25  ||          Logical Or, Lazy Or, Or Or
2.2:26  <           Less than
2.2:27  <<          Left Shift
2.2:28  >           Greater than
2.2:29  >>          Right Shift
2.2:30  =           Equals, Assign
2.2:31  ==          Logical Equals, Double Equals
2.2:32  !=          Logical Not Equals, Not Equals
2.2:33  <=          Less than or Equals
2.2:34  >=          Greater than or Equals
2.2:35  <<=         Left Shift Assign
2.2:36  >>=         Right Shift Assign
2.2:37  +=          Plus Equals
2.2:38  -=          Minus Equals
2.2:39  *=          Star Equals
2.2:40  /=          Slash Equals
2.2:41  %=          Percent Equals
2.2:42  ^=          Caret Equals
2.2:43  ^^=         Exponent Equals
2.2:44  @           At
2.2:45  .           Dot
2.2:46  ..          Range, Dot Dot
2.2:47  ..<         Exclusive Range
2.2:48  ...         Ellipsis, Spread
2.2:49  ,           Comma
2.2:50  ;           Semi
2.2:51  :           Colon
2.2:52  ::          Access
2.2:53  ->          Thin Arrow
2.2:54  =>          Fat Arrow
2.2:55  #           Pound
2.2:56  $           Dollar sign
2.2:57  ?           Question Mark
2.2:58  {           Left brace
2.2:59  }           Right brace
2.2:60  [           Left bracket
2.2:61  ]           Right bracket
2.2:62  (           Left parenthesis
2.2:63  )           Right parenthesis
2.4. Identifiers
Syntax
Identifier ::=
    IdentifierStart IdentifierContinue*

IdentifierList ::=
    Identifier (, Identifier)* ,?

IdentifierStart ::=
    [a..z A..Z _]

IdentifierContinue ::=
    IdentifierStart
  | 0..9
Legality Rules
2.4:1 An identifier is a lexical element that refers to a name.
2.4:2 Two identifiers are equivalent if they consist of the same sequence of characters.
Examples
foo
bar2
_identifier
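The Identifier rule can be written as a regular expression. In the Python sketch below (the name is_identifier is hypothetical), only ASCII letters, digits, and underscores are accepted, so Москва is not an identifier even though it is valid string content. The rule on its own also matches keywords such as for; telling the two apart is a separate step.

```python
import re

# IdentifierStart is [a-zA-Z_]; IdentifierContinue also allows the digits 0-9.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Return True if the whole of text matches the Identifier rule."""
    return IDENTIFIER.fullmatch(text) is not None
```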
2.5. Keywords
Syntax
Keyword
::=
for
| while
| loop
| if
| else
| false
| match
| as
| in
| trait
| enum
| struct
| continue
| break
| return
| import
| raw
| unsafe
| pub
| priv
| mut
| mod
| impl
| type
| true
2.5.1. Reserved Keywords
Syntax
ReservedKeyword
::=
macro
| use
| where
| ref
2.5.1:1 Reserved keywords are reserved for future use but are not currently used by the language. They are currently allowed as identifiers; however, because they are likely to gain a meaning in the future, it is recommended not to use them as identifiers.
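A lexer commonly recognises an identifier-shaped word first and then checks it against the keyword lists. In the hypothetical Python helper classify_word below, a reserved keyword is returned as an identifier with a warning flag set, reflecting 2.5.1:1; whether any Hash tool actually emits such a warning is an assumption, not something this document states.

```python
# Keywords from 2.5 and reserved keywords from 2.5.1.
KEYWORDS = {
    "for", "while", "loop", "if", "else", "false", "match", "as", "in",
    "trait", "enum", "struct", "continue", "break", "return", "import",
    "raw", "unsafe", "pub", "priv", "mut", "mod", "impl", "type", "true",
}
RESERVED_KEYWORDS = {"macro", "use", "where", "ref"}


def classify_word(word: str) -> tuple[str, bool]:
    """Return (kind, warn) for an identifier-shaped word.

    kind is "keyword" or "identifier". warn is True for reserved keywords,
    which are still accepted as identifiers but are best avoided.
    """
    if word in KEYWORDS:
        return ("keyword", False)
    return ("identifier", word in RESERVED_KEYWORDS)
```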
2.6. Literals
Syntax
Literal ::=
    BooleanLiteral
  | ByteLiteral
  | CharacterLiteral
  | StringLiteral
  | NumericLiteral
Legality Rules
2.6:1 A literal is a fixed value in program text.
2.7. Boolean Literals
Syntax
BooleanLiteral
::=
true
| false
Legality Rules
2.7:1 A boolean literal is a literal that denotes the truth values of logic and Boolean algebra.
2.7:2 The type of a boolean literal is bool.
Examples
false
2.8. Byte Literals
Syntax
ByteLiteral ::=
    b' ByteContent '

ByteContent ::=
    ByteCharacter
  | ByteEscape

ByteEscape ::=
    \0 | \n | \r | \t | \a | \b | \f | \v | \\ | \' | \" | \x OctalDigit HexadecimalDigit
2.8:1 A ByteCharacter is any AsciiCharacter except the characters 0x09 (horizontal tab \t), 0x0A (line feed \n), 0x0D (carriage return \r), 0x27 (single quote '), and 0x5C (backslash \).
Legality Rules
2.8:2 A byte literal is a literal that denotes a fixed byte value.
2.8:3 The type of a byte literal is u8.
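The Python sketch below decodes a byte literal to its u8 value. The name decode_byte_literal is hypothetical, and the numeric values given to escapes such as \a, \b, \f, and \v are assumed to be their conventional C meanings, since this section lists the escapes without defining their values. Because \x takes an OctalDigit followed by a HexadecimalDigit, the largest such escape is \x7F, which keeps every byte literal in the ASCII range.

```python
# Values of the single-character escapes, assuming their conventional C meanings.
SIMPLE_ESCAPES = {
    "0": 0x00, "n": 0x0A, "r": 0x0D, "t": 0x09, "a": 0x07, "b": 0x08,
    "f": 0x0C, "v": 0x0B, "\\": 0x5C, "'": 0x27, '"': 0x22,
}
# Characters that may not appear unescaped as a ByteCharacter (2.8:1).
FORBIDDEN = {0x09, 0x0A, 0x0D, 0x27, 0x5C}


def decode_byte_literal(lit: str) -> int:
    """Return the u8 value denoted by a byte literal such as b'a' or b'\\x7f'."""
    if len(lit) < 4 or not (lit.startswith("b'") and lit.endswith("'")):
        raise ValueError("not a byte literal")
    body = lit[2:-1]
    if body.startswith("\\"):
        esc = body[1:]
        if len(esc) == 1 and esc in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[esc]
        # \x OctalDigit HexadecimalDigit, so the value is at most 0x7F.
        if (len(esc) == 3 and esc[0] == "x" and esc[1] in "01234567"
                and esc[2] in "0123456789abcdefABCDEF"):
            return int(esc[1:], 16)
        raise ValueError(f"invalid byte escape: {body!r}")
    if len(body) != 1 or ord(body) > 0x7F or ord(body) in FORBIDDEN:
        raise ValueError(f"invalid byte character: {body!r}")
    return ord(body)
```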
2.9. Character Literals
Syntax
CharacterLiteral ::=
    ' CharacterContent '

CharacterContent ::=
    AsciiEscape
  | CharacterContentItem
  | UnicodeEscape

AsciiEscape ::=
    \0 | \n | \r | \t | \a | \b | \f | \v | \\ | \' | \" | \x OctalDigit HexadecimalDigit
2.9:1 A CharacterContentItem is any Unicode codepoint except the Unicode characters 0x09 (horizontal tab \t), 0x0A (line feed \n), 0x0D (carriage return \r), 0x27 (single quote '), and 0x5C (backslash \).
2.9:2 A UnicodeEscape starts with the characters \u{, followed by between 1 and 6 HexadecimalDigit characters, inclusive, followed by a } character. It can represent any Unicode codepoint from U+000000 to U+10FFFF, inclusive, except the Unicode surrogate codepoints U+D800 to U+DFFF, inclusive.
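The rules for a UnicodeEscape amount to three checks: the digit count, the upper bound U+10FFFF, and the surrogate range. A minimal Python sketch, using the hypothetical name decode_unicode_escape:

```python
import re

# \u{ followed by 1 to 6 hexadecimal digits and a closing brace.
UNICODE_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]{1,6})\}")


def decode_unicode_escape(esc: str) -> str:
    """Return the character denoted by a UnicodeEscape such as \\u{1F30}."""
    m = UNICODE_ESCAPE.fullmatch(esc)
    if m is None:
        raise ValueError(f"malformed Unicode escape: {esc!r}")
    cp = int(m.group(1), 16)
    if cp > 0x10FFFF:
        raise ValueError(f"code point above U+10FFFF: U+{cp:X}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point: U+{cp:X}")
    return chr(cp)
```

For example, \u{1F30} yields U+1F30, while \u{D800} is rejected as a surrogate and \u{110000} as out of range.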
Legality Rules
2.9:3 A character literal is a literal that denotes a fixed Unicode character.
2.9:4 The type of a character literal is char.
Examples
'a'
'\t'
'\x1b'
'\u{1F30}'
2.10. String Literals
Syntax
StringLiteral ::=
    " StringContent* "

StringContent ::=
    AsciiEscape
  | StringContentItem
  | UnicodeEscape
2.10:1 A StringContentItem is any Unicode codepoint except the Unicode characters 0x0D (carriage return \r), 0x22 (double quote "), and 0x5C (backslash \).
Legality Rules
2.10:2 A string literal is a literal that denotes a sequence of Unicode characters, written as zero or more StringContent elements enclosed in double quotes (").
2.10:3 The type of a string literal is str.
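Unlike a CharacterContentItem, a StringContentItem may be a raw tab or line feed; only carriage return, the double quote, and the backslash are excluded. The Python sketch below, with the hypothetical name decode_string_literal, turns a string literal into the text it denotes. As with byte literals, the values of the single-character escapes are assumed to follow their conventional C meanings.

```python
import re

# Single-character escapes, assuming their conventional C meanings.
SIMPLE_ESCAPES = {
    "0": "\0", "n": "\n", "r": "\r", "t": "\t", "a": "\a", "b": "\b",
    "f": "\f", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}
HEX_ESCAPE = re.compile(r"x([0-7][0-9a-fA-F])")          # \x OctalDigit HexadecimalDigit
UNICODE_ESCAPE = re.compile(r"u\{([0-9a-fA-F]{1,6})\}")  # \u{...} with 1 to 6 digits


def decode_string_literal(lit: str) -> str:
    """Return the text denoted by a string literal, given with its quotes."""
    if len(lit) < 2 or lit[0] != '"' or lit[-1] != '"':
        raise ValueError("not a string literal")
    body, out, i = lit[1:-1], [], 0
    while i < len(body):
        ch = body[i]
        if ch in ("\r", '"'):
            raise ValueError(f"{ch!r} may not appear unescaped in a string literal")
        if ch != "\\":
            out.append(ch)  # raw tabs and line feeds are allowed here
            i += 1
            continue
        rest = body[i + 1:]  # text after the backslash
        if rest[:1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[rest[0]])
            i += 2
        elif (m := HEX_ESCAPE.match(rest)) is not None:
            out.append(chr(int(m.group(1), 16)))
            i += 1 + m.end()
        elif (m := UNICODE_ESCAPE.match(rest)) is not None:
            cp = int(m.group(1), 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"invalid code point U+{cp:X}")
            out.append(chr(cp))
            i += 1 + m.end()
        else:
            raise ValueError(f"invalid escape at offset {i}")
    return "".join(out)
```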
Examples
""
"Москва"
"cat"
"\tcol\nrow"
"bell\x07"
"\u{B80a}"
2.11. Numerical Literals
Syntax
NumericLiteral ::=
    IntegerLiteral
  | FloatLiteral
2.11.1. Integer Literals
Syntax
IntegerLiteral ::=
    -? IntegerContent IntegerSuffix?

IntegerContent ::=
    BinaryLiteral
  | OctalLiteral
  | DecimalLiteral
  | HexadecimalLiteral

BinaryLiteral ::=
    0b BinaryDigitOrUnderscore* BinaryDigit BinaryDigitOrUnderscore*

BinaryDigitOrUnderscore ::=
    BinaryDigit
  | _

BinaryDigit ::=
    [0-1]

OctalLiteral ::=
    0o OctalDigitOrUnderscore* OctalDigit OctalDigitOrUnderscore*

OctalDigitOrUnderscore ::=
    OctalDigit
  | _

OctalDigit ::=
    [0-7]

DecimalLiteral ::=
    DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*

DecimalDigitOrUnderscore ::=
    DecimalDigit
  | _

DecimalDigit ::=
    [0-9]

HexadecimalLiteral ::=
    0x HexadecimalDigitOrUnderscore* HexadecimalDigit HexadecimalDigitOrUnderscore*

HexadecimalDigitOrUnderscore ::=
    HexadecimalDigit
  | _

HexadecimalDigit ::=
    [0-9 a-f A-F]

IntegerSuffix ::=
    SignedIntegerSuffix
  | UnsignedIntegerSuffix

SignedIntegerSuffix ::=
    i8 | i16 | i32 | i64 | i128 | isize | ibig

UnsignedIntegerSuffix ::=
    u8 | u16 | u32 | u64 | u128 | usize | ubig
Legality Rules
2.11.1:1 An integer literal is a numeric literal that denotes a whole number.
2.11.1:2 A binary literal is an integer literal in base 2.
2.11.1:3 An octal literal is an integer literal in base 8.
2.11.1:4 A decimal literal is an integer literal in base 10.
2.11.1:5 A hexadecimal literal is an integer literal in base 16.
2.11.1:6 An integer suffix is a component of an integer literal that specifies an explicit integer type.
2.11.1:7 A suffixed integer is an integer literal with an integer suffix.
2.11.1:8 An unsuffixed integer is an integer literal without an integer suffix.
2.11.1:9 The type of a suffixed integer is determined by its integer suffix as follows:
2.11.1:10 Suffix i8 specifies the type i8.
2.11.1:11 Suffix i16 specifies the type i16.
2.11.1:12 Suffix i32 specifies the type i32.
2.11.1:13 Suffix i64 specifies the type i64.
2.11.1:14 Suffix i128 specifies the type i128.
2.11.1:15 Suffix isize specifies the type isize.
2.11.1:16 Suffix ibig specifies the type ibig.
2.11.1:17 Suffix u8 specifies the type u8.
2.11.1:18 Suffix u16 specifies the type u16.
2.11.1:19 Suffix u32 specifies the type u32.
2.11.1:20 Suffix u64 specifies the type u64.
2.11.1:21 Suffix u128 specifies the type u128.
2.11.1:22 Suffix usize specifies the type usize.
2.11.1:23 Suffix ubig specifies the type ubig.
2.11.1:24 The type of an unsuffixed integer is determined by type inference as follows:
2.11.1:25 If an integer type can be inferred from the context, then the unsuffixed integer has that type.
2.11.1:26 If the program context under-constrains the type, then the inferred type is i32.
2.11.1:27 If the program context over-constrains the type, then it is a static error.
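The three inference rules can be sketched as a small decision procedure. Real type inference works over the whole program; here, as a simplifying assumption, the context is reduced to the set of integer types it requires of the literal. The function name infer_integer_type is hypothetical.

```python
def infer_integer_type(required: set[str]) -> str:
    """Choose the type of an unsuffixed integer literal.

    required is a simplified stand-in for the context: the set of integer
    types that the surrounding program demands of this literal.
    """
    if not required:
        return "i32"  # under-constrained: the default from 2.11.1:26
    if len(required) == 1:
        return next(iter(required))  # inferred from context (2.11.1:25)
    # More than one distinct type is demanded: over-constrained (2.11.1:27).
    raise TypeError(f"conflicting integer types: {sorted(required)}")
```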
Examples
0b0010_1110_u8
1___2_3
0xDeAdBeEf_u32
0o77_52i128
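The following Python sketch, with the hypothetical name parse_integer_literal, splits an integer literal into its value and optional suffix, handling the base prefixes and underscores described above. It does not check that the value fits the suffix type, since this section states no rule for out-of-range literals.

```python
from typing import Optional

SUFFIXES = ("i8", "i16", "i32", "i64", "i128", "isize", "ibig",
            "u8", "u16", "u32", "u64", "u128", "usize", "ubig")
# Base prefix -> (radix, digits allowed after the prefix).
PREFIXES = {"0b": (2, "01"), "0o": (8, "01234567"),
            "0x": (16, "0123456789abcdefABCDEF")}


def parse_integer_literal(lit: str) -> tuple[int, Optional[str]]:
    """Return (value, suffix) for an integer literal; suffix is None if absent."""
    sign = 1
    if lit.startswith("-"):
        sign, lit = -1, lit[1:]
    suffix = None
    for s in SUFFIXES:
        if lit.endswith(s):
            suffix, lit = s, lit[:-len(s)]
            break
    radix, digits = 10, "0123456789"
    if lit[:2] in PREFIXES:
        radix, digits = PREFIXES[lit[:2]]
        lit = lit[2:]
    # At least one digit is required; underscores may appear anywhere else.
    if not any(c in digits for c in lit) or any(c not in digits + "_" for c in lit):
        raise ValueError("malformed integer literal")
    return sign * int(lit.replace("_", ""), radix), suffix
```

Applied to the examples above, it gives (46, "u8") for 0b0010_1110_u8 and (123, None) for 1___2_3.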
2.11.2. Float Literals
Syntax
FloatLiteral ::=
    -? FloatComponent

FloatComponent ::=
    DecimalLiteral .
  | DecimalLiteral FloatExponent
  | DecimalLiteral . DecimalLiteral FloatExponent?
  | DecimalLiteral (. DecimalLiteral)? FloatExponent? FloatSuffix?

FloatExponent ::=
    ExponentAnnotation ExponentSign? ExponentMagnitude

ExponentAnnotation ::=
    e | E

ExponentSign ::=
    + | -

ExponentMagnitude ::=
    DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*

FloatSuffix ::=
    f32 | f64
Legality Rules
2.11.2:1 A float literal is a numeric literal that denotes a fractional number.
2.11.2:2 A float suffix is a component of a float literal that specifies an explicit floating point type.
2.11.2:3 A suffixed float is a float literal with a float suffix.
2.11.2:4 An unsuffixed float is a float literal without a float suffix.
2.11.2:5 The type of a suffixed float is determined by its float suffix as follows:
2.11.2:6 Suffix f32 specifies the type f32.
2.11.2:7 Suffix f64 specifies the type f64.
2.11.2:8 The type of an unsuffixed float is determined by type inference as follows:
2.11.2:9 If a floating-point type can be inferred from the context, then the unsuffixed float has that type.
2.11.2:10 If the program context under-constrains the type, then the inferred type is f64.
2.11.2:11 If the program context over-constrains the type, then it is a static error.
Examples
45.
8E+1_820
3.14e5
8_031.4_e-12f64
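A float literal can be converted in a similar way. The hypothetical Python helper parse_float_literal below assumes the text has already been recognised as a FloatLiteral (it does not validate the grammar), strips the suffix and underscores, and gives unsuffixed literals the type f64, the under-constrained default from 2.11.2:10; context-driven inference is not modelled. For f32, the value is rounded to single precision using the standard struct module.

```python
import struct


def parse_float_literal(lit: str) -> tuple[float, str]:
    """Return (value, type) for text already recognised as a FloatLiteral."""
    ftype = "f64"  # default when the context does not constrain the type
    for suffix in ("f32", "f64"):
        if lit.endswith(suffix):
            ftype, lit = suffix, lit[:-len(suffix)]
            break
    # Python's float() rejects some underscore placements, so drop them all first.
    value = float(lit.replace("_", ""))
    if ftype == "f32":
        # Round to the nearest single-precision value; struct.pack raises
        # OverflowError if the value is too large for f32.
        value = struct.unpack("<f", struct.pack("<f", value))[0]
    return value, ftype
```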
2.3. Comments
Syntax
Legality Rules
2.3:1 A comment is a lexical element that acts as annotation in the program text.
2.3:2 A block comment is a comment that spans one or more lines.
2.3:3 A line comment is a comment that spans a single line.
2.3:4 Character 0x0D (carriage return) shall not appear in a comment.
Examples