Updated Programming language specification (Gemstone) (markdown)

servostar 2024-02-02 17:44:14 +00:00
parent 03ba645142
commit be6056bc2e
1 changed files with 320 additions and 10 deletions

@ -1,19 +1,62 @@
# Programming language specification
## Introduction
The Gemstone programming language is a statically typed, CPU-based language. It incorporates many features from languages such as Rust, GLSL, C/C++ and Java.
## Primitive data types
The most fundamental types of data. They provide the base upon which everything else is built. A primitive type has two properties: its size in bytes and its arithmetic model (which also specifies the representation at bit level). There are exactly two primitive data types:
| Type name | size in bytes | arithmetic model |
| - | - | - |
| `int` | 4 | signed integer stored as two's complement with the MSB representing the sign bit |
| `float` | 4 | signed single precision floating point as specified by IEEE-754 |
| type name | size in bytes | arithmetic model | default value |
| - | - | - | - |
| `int` | 4 | signed integer stored as two's complement with the MSB representing the sign bit | 0 |
| `float` | 4 | signed single precision floating point as specified by IEEE-754 | 0.0 |
### Automatic type coercion
Primitive types `int` and `float` are automatically converted into each other when needed. This is a lossy conversion. A float may not have the precision to store every digit of a given integer. Likewise, the fraction of any float will be truncated.
### Type cast
Primitive types can be logically transmuted into each other. A type cast tries to translate the logical value of a type into another one. Depending on the case, this conversion may be lossy. A lossy conversion occurs when a float is converted into an integer: the fraction of the float is truncated. When an integer is converted into a float, some significant digits of the integer may be altered, because floating-point numbers have far fewer significant digits than equally sized integers.
A table detailing all conversions can be found below:
:::info
A variable is **Out-Of-Bounds** if its conversion to another type causes an overflow or underflow, or if its value otherwise cannot fit into the target range.
:::
| Destination type | Source type | Conversion method |
| -------- | -------- | -------- |
| `int` | `float` | truncation of fraction, saturation on out-of-bounds, zero for IEEE-754 NaN |
| `float` | `int` | integral is copied into floating-point format with the fraction being zero. Saturation on out-of-bounds |
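The conversion rules in the table can be sketched in Rust, whose `as` operator happens to follow the same rules for 32-bit values (truncation, saturation, zero for NaN). This is only an illustration of the described semantics, not Gemstone code; the function names `float_to_int` and `int_to_float` are hypothetical.

```rust
/// Hypothetical model of the Gemstone cast `<float expr> as int`.
/// Rust's float-to-int `as` cast truncates the fraction, saturates at the
/// bounds of the target type and yields zero for NaN.
pub fn float_to_int(value: f32) -> i32 {
    value as i32
}

/// Hypothetical model of the Gemstone cast `<int expr> as float`.
/// The integral value is copied into floating-point format; integers that
/// need more than 24 significant bits are rounded to the nearest float.
pub fn int_to_float(value: i32) -> f32 {
    value as f32
}
```

Under these assumptions, `float_to_int(3.9)` yields `3`, a value such as `1e20` saturates to the largest `int`, and NaN becomes `0`.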
#### Syntax
A type cast can be performed manually with the syntax:
```rust
<expr> as <composite>
```
### Reinterpret cast
Types can be reinterpreted. This means that the least significant bit pattern of the current value is preserved during conversion. A float or integer is only looked at by its bitwise representation, not by its logical value.
#### Example
The equivalent logical values of an integer and float:
- `3`
- `3.0`
Their corresponding bitwise representation (may depend on platform):
- `00000000000000000000000000000011`
- `01000000010000000000000000000000`
As you can see, both have equal *logical* values but very different bit patterns. A reinterpret cast will try to retain this pattern. Any reinterpret cast between two identically sized types is lossless. On conversion to a type of smaller size, the most significant bytes are dropped. On conversion to a type of larger size, the most significant bits are filled with zeros.
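The behaviour described above can be modelled in Rust as a rough sketch. The function names are hypothetical, and the fixed widths (16, 32 and 64 bits) are assumptions chosen only to show dropping and zero-filling of the most significant bits.

```rust
/// Reinterpret the bits of a 4-byte float as a 4-byte integer (lossless).
pub fn reinterpret_float_as_int(value: f32) -> i32 {
    value.to_bits() as i32
}

/// Reinterpret the bits of a 4-byte integer as a 4-byte float (lossless).
pub fn reinterpret_int_as_float(value: i32) -> f32 {
    f32::from_bits(value as u32)
}

/// Reinterpret as a smaller type: the most significant bytes are dropped.
pub fn narrow_bits(value: i32) -> i16 {
    value as i16
}

/// Reinterpret as a larger type: the most significant bits are zero-filled.
/// Going through `u32` first avoids Rust's sign extension.
pub fn widen_bits(value: i32) -> i64 {
    (value as u32) as i64
}
```

With this model, reinterpreting the float `3.0` gives the integer `0x40400000`, which is the second bit pattern listed in the example.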
#### Syntax
A reinterpretation cast can be performed manually with the syntax:
```
(composite) <expr>
```
## Composite data types
Primitive data types provide the foundation upon which direct derivations can be built. The derivations of primitive types we call composite types. A composite type modifies or extends the definition of a primitive or another composite type.
Extensions or modifications can mean adding or removing a sign bit, exchanging the arithmetic model, etc.
Extensions or modifications can mean anything from adding or removing a sign bit to exchanging the arithmetic model, and more.
The declaration of a composite data type is made with the following syntax:
`<sign> <scale> <base> = $name`
@ -41,19 +84,286 @@ A single byte: `half short int` with size: `4 bytes * 2^(-1) * 2^(-1) = 1 byte`
#### Limitations:
The size of a composite type cannot exceed 32 bytes or be lower than 1 byte, and it must be an integral value.
A notable exception is the primitive type `float`. Its size is constrained to 2, 4, 8, 10, 16 and 32 in order to remain compliant with the IEEE-754 standard. A platform may choose to only provide implementations for only a few of the previously named sizes. On x86 and x64 and ARM the most common composite floats are of the sizes: 4 (float) and 8 (double).
As an example, the largest possible integer is a `double double double int` with a size of 32 bytes.
A notable exception is the primitive type `float`. Its size is constrained to 2, 4, 8, 16 and 32 bytes in order to remain compliant with the IEEE-754 standard. A platform may choose to provide implementations for only a few of these sizes. On x86, x64 and ARM the most common composite floats have the sizes 4 (float) and 8 (double).
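The size rule and its limits can be sketched as a small Rust checker. This is a minimal sketch under assumptions: the helper `composite_size` and the enum `Base` are hypothetical, and the keyword mapping (`half`/`short` halve the size, `double`/`long` double it) is inferred from the examples in this document rather than from a full scale table.

```rust
/// The two primitive bases named in this specification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Int,
    Float,
}

/// Hypothetical helper: size in bytes of a composite built from scale keywords.
pub fn composite_size(base: Base, scales: &[&str]) -> Result<u32, String> {
    const PRIMITIVE_SIZE: u32 = 4; // both primitives are 4 bytes wide
    let mut exponent: i32 = 0;
    for keyword in scales {
        match *keyword {
            "half" | "short" => exponent -= 1, // assumed factor 2^(-1)
            "double" | "long" => exponent += 1, // assumed factor 2^1
            other => return Err(format!("unknown scale keyword: {}", other)),
        }
    }
    // 4 * 2^3 = 32 bytes is the upper limit, 4 * 2^(-2) = 1 byte the lower one.
    if exponent > 3 {
        return Err("size exceeds 32 bytes".to_string());
    }
    if exponent < -2 {
        return Err("size is below 1 byte or not integral".to_string());
    }
    let size = if exponent >= 0 {
        PRIMITIVE_SIZE << (exponent as u32)
    } else {
        PRIMITIVE_SIZE >> ((-exponent) as u32)
    };
    // Floats are limited to the sizes 2, 4, 8, 16 and 32.
    if base == Base::Float && size < 2 {
        return Err("float composites must be at least 2 bytes".to_string());
    }
    Ok(size)
}
```

For instance, `composite_size(Base::Int, &["half", "short"])` evaluates to `Ok(1)`, while a fourth `double` on an `int` is rejected.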
### Base
The base of a composite type specifies on which primitive or composite type a derivation is to be built.
## Example types
### Example composite types
```cpp
long float = double_precision
short short int = ascii
double int = long_int
unsigned half int = word
unsigned int = dword
```
## References
References are abstracted pointers for specific composites. They refer to the location in memory where a contiguous block of memory is stored. This block is divided into heterogeneous sections of statically defined composite types. Unlike pointers, the raw memory address of a reference is immutable. Only the data at the addressed memory location can be altered, not the position itself.
Individual sections can be accessed via an index. Indexing starts at 0 and continues in integral increments until the end of the memory block. Each section has the capacity to store a single instance of the specified composite type.
### Declaration
A reference can be declared in this way:
```
ref <reference-composite>|<composite>: $identifier
```
Here, a `reference-composite` is another reference declaration. That means reference declarations can be recursive. A nested reference declaration is equivalent to a multi-dimensional pointer. A reference to another reference of a primitive data type points to a block of references, each of which individually points to a primitive data type.
:::info
References can be recursive
:::
### Dereferencing
Accessing a section for writing and reading is called dereferencing. The syntax to access a specific section of a reference is as follows:
```glsl
$identifier|<variable>[<expr>]
```
:::info
Variables, expressions and constants used as an index must resolve to an **integral** composite type at compile time.
:::
### Memory allocation
The standard library, and consequently the platform, has to provide a function inside the `mem` module with one of the following declarations:
:::danger
TODO: decide on implementation with either one:
- generic (implication: implement generic system)
- reference cast (implication: undefined behavior on cast, runtime errors)
:::
```glsl
alloc<T>(out ref T, in int: len)
ref float a = alloc<float>(5)
```
```glsl
alloc(out ref int, in int: len)
ref float a = (ref float) alloc(5)
```
### Examples
```glsl
ref int: array = alloc(3)
ref int: g = alloc(9)
array[0] = 7
g = abc[array[2]]
```
## Storage qualifier
Variables may have a storage qualifier. These keywords describe how a variable is to be stored upon its creation. Available options are:
- `local`: store variable on function-local stack
- `global`: store variable on dynamically process-level heap
- `global`: store variable on dynamically allocated process-level heap
- `static`: store variable in process-level data segment with static allocation
Storage qualifiers must prefix the composite type and cannot be used in parameter declarations.
:::info
Storage qualifiers are mutually exclusive.
:::
:::info
The default storage qualifier is `static`.
:::
## Variables
A variable is a monolithic and heterogeneously typed block of memory with an optional identifier associated with it at compile time. Variables must be declared and initialised before they are used. However, declaration and initialisation can be separated syntactically as well as temporally, as long as the previously mentioned condition is not violated.
### Declaration
To declare a variable use the following syntax:
```
<storage-qualifier> <composite>: $identifier
```
Example declarations:
```cpp
long double: pi
short int: a_single_byte
```
:::info
Variables must only be tagged with a single qualifier at once.
:::
### Initialisation
To initialise a variable use the following syntax:
```
$identifier = <expr>
```
Example initialisation:
```cpp
long double: pi
pi = 3.1415926
```
### Definition
To define a variable use the following syntax:
```
<storage-qualifier> <composite>: $identifier = <expr>
```
Example definition:
```cpp
long double: pi = 3.1415926
```
### Shadowing
Any variable can be redefined at any point. This does not reuse the existing memory but rather binds an existing identifier to a new block of memory. When a variable is shadowed, the previous data becomes inaccessible by the identifier but may still be reachable via previously defined references.
```glsl
local int: a = 5
local ref int: b = a
local float: a = 7.0
```
In this example the reference b still yields the value 5 stored in the original memory of variable a as it was defined in the first line. After the shadowing of variable a, the new value of a occupies independent memory and has no influence on reference b.
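Rust has a comparable shadowing rule, so the same situation can be reproduced there as an analogy. This is not Gemstone code, and the function name `shadowing_demo` is hypothetical.

```rust
/// Mirrors the Gemstone example: `b` refers to the first `a`,
/// then `a` is shadowed by a new variable of a different type.
pub fn shadowing_demo() -> (i32, f32) {
    let a: i32 = 5;
    let b: &i32 = &a; // reference to the original memory of `a`
    let a: f32 = 7.0; // shadows `a`; the old value stays alive behind `b`
    (*b, a)
}
```

The returned pair contains the old value read through the reference and the new value read through the identifier.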
## Functions
Functions group together expressions. A function is uniquely identified by an identifier. It declares at least one list of parameters. Parameter lists declare the order and type of the variables which are passed to the function when it is called and which are available during its execution. Parameter declarations can be extended with an IO-qualifier.
### IO-qualifier (Input/Output-Qualifier)
A parameter declaration has to be IO-qualified. This means it has to be defined how the caller and the function interface with a parameter. Parameters are not always readable from inside a function. Only parameters declared with the IO-qualifier `out`, referred to as outputs, are writable. These parameters are 'returned' to the caller of the function. Additionally, parameters can be initialised with a default value. Output parameters without a default value are initialised with the type's default value. Parameters declared with `in` are passed from the caller into the function and are read-only inside the function. Any parameter without an IO-qualifier is handled as if it were declared with `in`. The qualifiers `in` and `out` can be combined. In this case the parameter is initialised by the caller, with an optional default value, and can be written to by the function. The caller can then retrieve the parameter's value. Output parameters passed by the caller which are already initialised will be overwritten by the function call.
Output parameters without a default value have to be initialised inside the function body.
| Qualifier | Access inside function body | Returned to caller |
| -------- | -------- | -------- |
| `in` | Read | no |
| `out` | Write | yes |
| `in out` | Read, Write | yes |
| no qualifier | Read | no |
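As a loose analogy, the access rules in the table can be approximated in Rust with by-value parameters for `in` and `&mut` parameters for `out` and `in out`. The functions `foo` and `accumulate` below are hypothetical, and Rust does not enforce the write-only nature of `out`, so this is only a sketch of the calling convention.

```rust
/// `in a`, `in b`: read-only inside the function.
/// `out c`: written by the function and visible to the caller afterwards.
pub fn foo(a: f64, b: f64, c: &mut f64) {
    *c = a + b;
}

/// `in out total`: initialised by the caller, then read and written here.
pub fn accumulate(value: f64, total: &mut f64) {
    *total += value;
}
```

A caller would declare the output variable first, for example with the type's default value, and pass it by mutable reference.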
#### Function declaration example
```glsl
foo(in double float: a, in double float: b)(out double float: c)
{
c = a + b
}
```
#### Function call example
```glsl
foo(in double float: a, in double float: b)(out double float: c)
{
c = a + b
}
abc(in ref int: output)
{
output = 99
}
main()
{
local double float: a = 7
local double float: b = 9
local double float: c
foo(a, b, c)
local ref int float: d = c
abc(d)
}
```
:::warning
Unlike other languages (Java, C, ...) Gemstone has no special handling for function return types.
:::
Functions must be defined in global scope and cannot be nested. A function can be declared without a body, but its body must be defined eventually. In the following example we declare a function without its body. The declared function can be used freely. However, if no implementation is provided at some point during compilation, an error is raised because no executable code is found.
```glsl
divide(in int: a, in int: b, out int: c)
divide(in int: a, in int: b, out int: c)
{
c = a / b
}
```
## Operations
An operation connects one or two inputs and returns one output. For all operations, a reference has to be dereferenced.
### Logical
A comparison takes two variables of the same datatype as input and returns an integer: 1 for true and 0 for false.
If the variables are not of the same type, the compiler returns an error.
```
a == b: is equals
```
### Boolean
Boolean operations take two integers as inputs and return an integer: 1 for true and 0 for false. A not takes one integer as input and returns one integer: 1 for true and 0 for false.
If one of these operations gets a float as input, the compiler reinterpret-casts the float to an int.
```
a && b: and
a || b: or
a ^^ b: xor
!!(a) : not
```
### Bitwise
Bitwise operations take two integers as inputs and return an integer of the same size as the inputs, with every bit set to 1 for true and 0 for false. Floats as inputs cause a compiler error.
```
a & b: bitwise and
a | b: bitwise or
a ^ b: bitwise xor
!a : bitwise not
```
### Algebra
Algebra operations are ordinary mathematical operations. They return the more powerful datatype: given a float and an int, a float is returned. Likewise, the larger scale is returned, so combining a double datatype and a short one returns a double. The return datatype is thus the combination of both.
```
a + b: addition
a - b: subtraction
a * b: multiplication
a / b: division
```
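The rule for the result datatype of an algebra operation can be sketched as follows. This is a minimal model under assumptions: the types `Base` and `Composite` and the function `result_type` are hypothetical, and the sign of the operands is ignored because the text does not specify how it is combined.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Int,
    Float,
}

/// Simplified composite: base plus size in bytes (sign omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Composite {
    pub base: Base,
    pub size: u32,
}

/// Result datatype of `a <op> b`: float wins over int, larger size wins.
pub fn result_type(a: Composite, b: Composite) -> Composite {
    let base = if a.base == Base::Float || b.base == Base::Float {
        Base::Float
    } else {
        Base::Int
    };
    Composite { base, size: a.size.max(b.size) }
}
```

Combining a `double int` (8 bytes) with a `half float` (2 bytes) would therefore give an 8-byte float under this model.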
To compare two variables with "or", "and" or "xor" you can use the syntax defined below.
```
// or
a || b
// and
a && b
//xor
a ^^ b
```
```
// boolean algebra for bits
c = a | b: a or b
c = a & b: a and b
c = a ^ b: a xor b
c = !(a) : not a
// only for ints
// a and b must have the same datatype
// the expression returns a value with the datatype of a
```
### TODO
Figure out how structs and interfaces should be structured syntactically.