0 Programming language specification (Gemstone)
Filleo edited this page 2024-04-10 13:41:52 +02:00

Programming language specification

Introduction

The Gemstone programming language is a statically typed, CPU-based language. It incorporates many features from languages such as Rust, GLSL, C/C++ and Java.

Primitive data types

The most fundamental types of data. They provide the base upon which everything else is built. A primitive type has two properties: its size in bytes and its arithmetic model, which also specifies the representation at bit level. There are exactly two primitive data types:

| Type name | Size in bytes | Arithmetic model | Default value |
|---|---|---|---|
| int | 4 | signed integer stored as two's complement with the MSB representing the sign bit | 0 |
| float | 4 | signed single precision floating point as specified by IEEE-754 | 0.0 |

Type cast

Primitive types can be logically transmuted into each other. A type cast tries to translate the logical value of one type into another. Depending on the case, this conversion may be lossy. A lossy conversion occurs when a float is converted into an integer: the fraction of the float is truncated. When an integer is converted into a float, some significant digits of the integer may be altered, because a floating-point value has far fewer significant digits than an equally sized integer. The table below details all conversions:

:::info A variable is Out-Of-Bounds if converting it to another type causes an overflow or underflow, or if its value cannot fit into the target range. :::

| Destination type | Source type | Conversion method |
|---|---|---|
| int | float | truncation of fraction, saturation on out-of-bounds, zero for IEEE-754 NaN |
| float | int | integral is copied into floating-point format with the fraction being zero, saturation on out-of-bounds |

Syntax

A type cast can be performed manually with the syntax:

<expr> as <composite>
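The conversion rules in the table above can be modelled outside of Gemstone. The following is a minimal sketch in Python, not the compiler's implementation; the function names `float_to_int` and `int_to_float` are hypothetical and only the 4-byte primitive types are covered.

```python
import math
import struct

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of the 4-byte signed int


def float_to_int(value: float) -> int:
    """Model of `<expr> as int` for a float operand."""
    if math.isnan(value):
        return 0                  # IEEE-754 NaN converts to zero
    if value >= INT_MAX:
        return INT_MAX            # saturate on positive out-of-bounds
    if value <= INT_MIN:
        return INT_MIN            # saturate on negative out-of-bounds
    return math.trunc(value)      # drop the fraction (round toward zero)


def int_to_float(value: int) -> float:
    """Model of `<expr> as float` for an int operand."""
    # Packing into 4 bytes and unpacking again applies single-precision rounding.
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]
```

In this model `float_to_int(3.99)` yields 3, while `int_to_float(16777217)` yields 16777216.0, which shows the altered significant digits mentioned above.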

Reinterpret cast

Types can be reinterpreted. This means that the bit pattern of the current value is preserved during conversion, starting from the least significant bit. A float or integer is only looked at by its bitwise representation, not by its logical value.

Example

The equivalent logical values of an integer and float:

  • 3
  • 3.0

Their corresponding bitwise representation (may depend on platform):

  • 00000000000000000000000000000011
  • 01000000010000000000000000000000

As you can see, both have equal logical values but very different bit patterns. A reinterpret cast will try to retain this pattern. Any reinterpret cast between two identically sized types is lossless. On conversion to a type of smaller size, the most significant bytes are dropped. On conversion to a type of larger size, the most significant bits are filled with zeros.

Syntax

A reinterpretation cast can be performed manually with the syntax:

(<composite>) <expr>
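To make the difference to a type cast concrete, the sketch below models a reinterpret cast in Python with the standard `struct` module. The helper names are hypothetical, and a little-endian 4-byte layout is assumed for illustration only.

```python
import struct


def float_to_bits(value: float) -> int:
    """Reinterpret a 4-byte float as an unsigned 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a 4-byte float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def resize_bits(bits: int, dst_bytes: int) -> int:
    """Keep the least significant bytes; a larger target is zero-filled."""
    return bits & ((1 << (8 * dst_bytes)) - 1)
```

Here `format(float_to_bits(3.0), "032b")` reproduces the second bit pattern from the example, and `resize_bits(0x12345678, 2)` keeps only `0x5678`.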

Composite data types

Primitive data types provide the foundation upon which direct derivations can be built. We call these derivations of primitive types composite types. A composite type modifies or extends the definition of a primitive or another composite type. Extensions or modifications can mean anything from adding or removing a sign bit to exchanging the arithmetic model, among others. The declaration of a composite data type is made with the following syntax:

<sign> <scale> <base> = $name

Sign

The <sign> is a keyword specifying whether to implement a sign bit or not. Possible values are:

  • signed: for a single sign bit
  • unsigned: for no sign bit

Scale

<scale> describes the scaling factor to use. This is a rational factor used alongside the byte size of the base. To calculate the final size of the composite type, multiply all factors with the base type's size in bytes.

Possible values

| Name | Factor |
|---|---|
| short | 2^{-1} |
| long | 2 |
| half | 2^{-1} |
| double | 2 |

Examples:

IEEE-754 double precision floating point: long float with size: 4 bytes * 2 = 8 bytes
32-Bit integer: half long int with size: 4 bytes * 2 * 2^(-1) = 4 bytes
A single byte: half short int with size: 4 bytes * 2^(-1) * 2^(-1) = 1 byte

Limitations:

The size of a composite type cannot exceed 32 bytes, cannot be lower than 1 byte and cannot be a non-integral value. As an example, the largest possible integer is a double double double int with 32 bytes. A notable exception is the primitive type float. Its size is constrained to 2, 4, 8, 16 and 32 bytes in order to remain compliant with the IEEE-754 standard. A platform may choose to provide implementations for only a few of these sizes. On x86, x64 and ARM the most common composite floats have the sizes 4 (float) and 8 (double).
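The size calculation and its limitations can be checked mechanically. The following Python sketch is one possible reading of the rules above; `composite_size` is a hypothetical helper, and it assumes that only the final size (not intermediate products) is validated.

```python
from fractions import Fraction

BASE_SIZE = {"int": 4, "float": 4}            # primitive sizes in bytes
SCALE = {"short": Fraction(1, 2), "half": Fraction(1, 2),
         "long": Fraction(2), "double": Fraction(2)}
FLOAT_SIZES = (2, 4, 8, 16, 32)               # sizes permitted for float


def composite_size(scales, base):
    """Multiply all scale factors with the base size and validate the result."""
    size = Fraction(BASE_SIZE[base])
    for name in scales:
        size *= SCALE[name]
    if size.denominator != 1 or not 1 <= size <= 32:
        raise ValueError(f"invalid composite size: {size} bytes")
    if base == "float" and size not in FLOAT_SIZES:
        raise ValueError(f"unsupported float size: {size} bytes")
    return int(size)
```

For instance, `composite_size(["long"], "float")` gives 8 and `composite_size(["half", "short"], "int")` gives 1, while three `short` scales on an int raise an error because the result would be half a byte.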

Base

The base of a composite type specifies on which primitive or composite type a derivation is to be built.

Example composite types

long float = double_precision
short short int = ascii
double int = long_int
unsigned half int = word
unsigned int = dword 

References

References are abstracted pointers for specific composites. They refer to the location in memory where a contiguous block of memory is stored. This block is divided into heterogeneous sections of statically defined composite types. Unlike pointers, the raw memory address of a reference is immutable. Only the data at the addressed memory location can be altered, not the position itself. Individual sections can be accessed via an index. Indexing starts at 0 and continues in integral increments until the end of the memory block. Each section has the capacity to store a single instance of the specified composite type.

Declaration

A reference can be declared in this way:

ref <reference-composite>|<composite>: $identifier

Here a reference-composite is another reference declaration. That means reference declarations can be recursive. A nested reference declaration is equivalent to a multi-dimensional pointer. A reference referencing another reference of a primitive data type points to a block of references, each of which individually points to a primitive data type.

:::info References can be recursive :::

Dereferencing

Accessing a section for writing and reading is called dereferencing. The syntax to access a specific section of a reference is as follows:

$identifier|<variable>[<expr>]

:::info Variables, expressions and constants used as an index must resolve to an integral composite type at compile time. :::

Memory allocation

The standard library, and with it the underlying platform, has to provide a function inside the mem module with the following declaration:

alloc<T>(out ref T, in int: len)
ref float: a = alloc<float>(5)
alloc(out ref int, in int: len)
ref float: a = (ref float) alloc(5)

Examples

ref int: array = alloc(3)
ref int: g = alloc(9)

array[0] = 7
g = abc[array[2]]

Custom type definitions

Custom types can be defined in the following way:

type <composite-type>: <name>

Example type definitions:

type long float: f64
type short short int: byte

Storage qualifier

Variables may have a storage qualifier. This keyword describes how a variable is to be stored upon its creation. Available options are:

  • local: store variable on function-local stack
  • global: store variable on dynamically allocated process-level heap
  • static: store variable in process-level data segment with static allocation

Storage qualifiers must prefix the composite type and cannot be used in parameter declarations.

:::info Storage qualifiers are mutually exclusive. :::

:::info The default storage qualifier is static. :::

Variables

A variable is a monolithic and heterogeneously typed block of memory with an optional identifier associated with it at compile time. Variables must be declared and initialised before their usage. However, declaration and initialisation can be separated syntactically as well as temporally, as long as the previously mentioned condition is not violated.

Declaration

To declare a variable use the following syntax:

<storage-qualifier> <composite>: $identifier

Example declarations:

long float: pi
short short int: a_single_byte

:::info A variable must be tagged with only a single qualifier at once. :::

Initialisation

To initialise a variable use the following syntax:

$identifier = <expr>

Example initialisation:

long float: pi
pi = 3.1415926

Definition

To define a variable use the following syntax:

<storage-qualifier> <composite>: $identifier = <expr>

Example definition:

long float: pi = 3.1415926

Shadowing

Any variable can be redefined on any basis. This does not mean reusing existing memory but rather binding an existing identifier to a new block of memory. When a variable is shadowed, the previous data becomes inaccessible through the identifier but may still be reachable via previously defined references.

local int: a = 5
local ref int: b = a
local float: a = 7.0

In this example the reference b still yields the value 5 stored in the original memory of variable a as it was defined in the first line. After the shadowing of variable a, the new value of a occupies independent memory and has no influence on reference b.
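The binding behaviour described here can be imitated in Python, where names are also bound to objects rather than to fixed storage. This is only an analogy under that assumption; the `Block` class and the `scope` dictionary are hypothetical stand-ins for memory blocks and identifiers.

```python
class Block:
    """One block of memory holding a single value."""

    def __init__(self, value):
        self.value = value


scope = {}
scope["a"] = Block(5)       # local int: a = 5
scope["b"] = scope["a"]     # local ref int: b = a   (b refers to a's block)
scope["a"] = Block(7.0)     # local float: a = 7.0   (a is bound to a new block)
```

After the third statement `scope["b"].value` is still 5, while `scope["a"].value` is 7.0 and lives in a separate block.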

Flow control

if


if (condition)
{


}

if condition {function}

if x == 5 {blahaj()}

The condition MUST be a primitive or composite, with 0 == false and !0 == true.
The condition cannot be a function. Functions must first be resolved into a variable.


optional else:

if condition {function} else {function}

optional else if:

if condition {function} else if condition {function}

while


while condition {function}

The condition MUST be a primitive or composite, with 0 == false and !0 == true.
The condition cannot be a function. Functions must first be resolved into a variable.

x = 0
while x < 5
{
    . . . 
    x = x + 1
}



Functions

Functions group together expressions. A function is uniquely identified by an identifier. Functions declare at least one list of parameters. Parameter lists declare the order and type of the variables which are passed to the function when calling it and are available during its execution. Parameter declarations can be extended with an IO-qualifier.

IO-qualifier (Input/Output-Qualifier)

Every parameter declaration is IO-qualified. The qualifier defines how the caller and the function interface with the parameter. Parameters are not always readable from inside a function, and only parameters declared with the IO-qualifier out, referred to as outputs, are writable. These parameters are 'returned' to the caller of the function. Additionally, parameters can be initialised with a default value. Output parameters without a default value are initialised with the default value of their type. Parameters declared with in are passed from the caller into the function and are read-only inside the function. Any parameter without an IO-qualifier is handled as if it were declared with in. The qualifiers in and out can be combined. In this case the parameter is initialised by the caller, optionally with a default value, and can be written to by the function. The caller can then retrieve the parameter back. Output parameters that the caller passes already initialised are overwritten by the function call. Output parameters without a default value have to be initialised inside the function body.

| Qualifier | Access inside function body | Returned to caller |
|---|---|---|
| in | Read | no |
| out | Write | yes |
| in out | Read, Write | yes |
| no qualifier | Read | no |
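The qualifier table lends itself to a lookup structure, for example inside a checker that validates parameter accesses. The Python sketch below is a hypothetical illustration of that idea; `ACCESS` and `access_of` are not part of Gemstone.

```python
# Access rights per IO-qualifier, mirroring the table above.
ACCESS = {
    "in":     {"read": True,  "write": False, "returned": False},
    "out":    {"read": False, "write": True,  "returned": True},
    "in out": {"read": True,  "write": True,  "returned": True},
}


def access_of(qualifier: str = "") -> dict:
    """Look up the access rights; a missing qualifier is treated as `in`."""
    return ACCESS[qualifier or "in"]
```

With this table, `access_of("out")["read"]` is false and `access_of()` gives the same rights as `access_of("in")`.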

Function declaration example

fun foo(in double float: a, in double float: b)(out double float: c)
{
    c = a + b
}

Function call example

fun foo(in double float: a, in double float: b)(out double float: c)
{
    c = a + b
}

fun abc(in ref int: output)
{
    output = 99
}

fun main()
{
    local double float: a = 7
    local double float: b = 9
    local double float: c

    foo(a, b, c)
    
    local ref int float: d = c 
    abc(d)
}

:::warning Unlike other languages (Java, C, ...) Gemstone has no special handling for function return types. :::

Functions must be defined in global scope and cannot be nested. A function can be declared without a body, but its body must be defined eventually. In the following example we declare a function without its body. The declared function can be used freely. However, if no implementation is provided at some point during compilation, an error is thrown as no executable code is found.

fun divide(in int: a, in int: b, out int: c)

fun divide(in int: a, in int: b, out int: c)
{
    c = a / b
}

Operations

An operation connects one or two inputs and returns one output. For all operations, a reference has to be dereferenced first.

Logical

A comparison takes two variables of the same data type as input and returns an integer: 1 for true and 0 for false. If the variables are not of the same type, the compiler returns an error.

a == b: equals

Boolean

Boolean operations take two integers as inputs and return an integer: 1 for true and 0 for false. A not takes one integer as input and returns one integer: 1 for true and 0 for false.

If one of these operations gets a float as input, the compiler reinterpret casts the float to an int.

a && b: and
a || b: or
a ^^ b: xor

!!(a) : not
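One consequence of the reinterpret cast rule is worth seeing in a model. The Python sketch below assumes that any non-zero integer counts as true (as stated for conditions in flow control) and does not model short-circuit evaluation, which the text does not specify; all function names are hypothetical.

```python
import struct


def as_int_bits(value):
    """Floats are reinterpreted as their 32-bit pattern; ints pass through."""
    if isinstance(value, float):
        return struct.unpack("<I", struct.pack("<f", value))[0]
    return value


def truth(value) -> int:
    """1 if the (reinterpreted) integer is non-zero, otherwise 0."""
    return 1 if as_int_bits(value) != 0 else 0


def logical_and(a, b) -> int:   # a && b
    return truth(a) & truth(b)


def logical_or(a, b) -> int:    # a || b
    return truth(a) | truth(b)


def logical_xor(a, b) -> int:   # a ^^ b
    return truth(a) ^ truth(b)


def logical_not(a) -> int:      # !!(a)
    return 1 - truth(a)
```

Under these assumptions `truth(0.0)` is 0, but `truth(-0.0)` is 1, because negative zero has its sign bit set and therefore a non-zero bit pattern.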

Bitwise

Bitwise operations take two integers as inputs and return an integer of the same size as the inputs, with 1 for true and 0 for false in every bit. Floats as inputs cause a compiler error.

 a & b: bitwise and
 a | b: bitwise or
 a ^ b: bitwise xor
 
  !a  : bitwise not

Algebra

Algebra operations are normal mathematical operations. The operations return the more powerful data type: with a float and an int, a float is returned. The bigger scale is also returned, so with a double data type and a short one, the double is returned. The return data type is the combination of both.

addition

 a + b: addition
 a - b: subtraction
 a * b: multiplication
 a / b: division
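The result type rule for these operations can be written as a small function. The sketch below is one interpretation in Python: operand types are given as hypothetical `(base, size_in_bytes)` pairs, the sign is not modelled, and "combination of both" is read as taking the stronger base and the larger size independently.

```python
def result_type(lhs, rhs):
    """Return the (base, size_in_bytes) pair of an arithmetic result."""
    # float wins over int; the larger size wins over the smaller one.
    base = "float" if "float" in (lhs[0], rhs[0]) else "int"
    return (base, max(lhs[1], rhs[1]))
```

For example, `result_type(("int", 4), ("float", 4))` gives `("float", 4)`, and combining a 4-byte float with an 8-byte int gives `("float", 8)` under this reading.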

To combine two variables with "or", "and" or "xor", use the syntax defined below.

# or
a || b

# and
a && b

# xor
a ^^ b

Boolean algebra for bits

    c = a | b: a or b
    c = a & b: a and b
    c = a ^ b: a xor b
    c = !(a) : not a

    Only for ints.
    a and b have to be the same data type.
    The expression returns its result as the data type of a.

Container types

A container is a complex type and is declared with the keyword box. A box can contain other types as well as other boxes. Types in the box are bound to the box and cannot be given a storage qualifier. Boxes are not composite types, so you cannot declare them with a scale or a sign.

Boxes are directly initialised with all content set to its default values. You can use or modify the content by writing the name of the content after the instance of the box, separated by a dot.

A box can have functions that require an instance of the box to be called on. To get the content of the current instance, the keyword self has to be used.

When an instance of a box is initialised, you can directly call a function on that instance in the same line with an explicit syntax.

type box: rectangle{
    float: a
    float: b = 43

    fun area(out float: area) {
        area = self.a * self.b
    }
    
    fun new(in float: a, in float: b) {
        self.a = a
        self.b = b
    }
    
    fun fill(in float: a, out float: b) {
        self.a = a
        b = a
    }
}

rectangle: square
square.new(2, 3)


float: b
rectangle: square2.fill(2, b)

Modules

Code in a file is referred to as a module. Modules can be imported into one another with the import keyword followed by a literal specifying the relative path to the file with the module source. To use a function from a certain module, prefix it with the module name, which is the name of the file, separated by a dot.

File bar.gem

fun foo() {

}

File main.gem:

import "bar.gem"

fun main() {
    bar.foo()
}

Imports cascade over modules: if a module is imported in another module, the main file can refer to both, with the filename as the name of the module. Every module name has to be unique; a duplicate name causes an error.

File foo.gem

fun foo() {

}

File bar.gem

import "foo.gem"

File main.gem:

import "bar.gem"

fun main() {
    bar.foo.foo()
}

To hide functionality inside a module so that another file cannot use it, the keyword silent is used. In the example, the function bar and the variable number cannot be used in main.

File bar.gem

fun foo() {

}

silent fun bar() {

}

silent int: number

File main.gem:

import "bar.gem"

fun main() {
    bar.foo()
}
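The visibility rule from this example can be sketched as a lookup. The following Python model is hypothetical: it assumes the module name is the file name without its extension and represents the contents of bar.gem as a hand-written table.

```python
from pathlib import PurePosixPath

# Hypothetical export table for bar.gem: name -> declared with `silent`?
BAR_EXPORTS = {"foo": False, "bar": True, "number": True}
MODULES = {"bar": BAR_EXPORTS}


def module_name(import_path: str) -> str:
    """Derive the module name from the imported file name."""
    return PurePosixPath(import_path).stem


def can_use(import_path: str, name: str) -> bool:
    """True if `name` exists in the module and is not silent."""
    exports = MODULES[module_name(import_path)]
    return name in exports and not exports[name]
```

In this model `can_use("bar.gem", "foo")` holds, while the silent function bar and the silent variable number are rejected.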

For boxes the silent keyword can be used for the whole box and/or specific functions inside of it.

silent type box: foo {
    
    int: number
    
    silent fun bar () {}
    
}

Comments

Comments, that is ignored characters, start with # and end with a line break.


#comment
#########comment

int: foo #comment
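A tool that processes Gemstone source could remove comments line by line. The Python sketch below shows the idea with a hypothetical `strip_comment` helper; it assumes that # never appears inside a literal, since the text above does not describe such a case.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first # to the end of the line."""
    code, _, _ = line.partition("#")
    return code.rstrip()
```

Applied to the examples above, `strip_comment("int: foo #comment")` leaves `int: foo`, and a line consisting only of a comment becomes empty.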



Compiler Meta Functions

Compiler meta functions are predefined "functions". They are called like normal functions and have respective declarations but are built in. They are not executed at runtime but are always evaluated by the compiler at compile time.

Typeof

Resolves the given T's type to a UTF-8 encoded string.

typeof(type gen: T)(in ref T, out ref short short int: typename)

Sizeof

Resolves the given T's type size in bytes.

sizeof(type gen: T)(in ref T, out unsigned long int)

File name

Returns the name of the file the function is called in.

filename(out ref short short int, out unsigned long int)

Function name

Returns the name of the function it is called in.

funname(out ref short short int, out unsigned long int)

Line number

Returns the current line number the function is called in.

lineno(out unsigned long int)

Check for language extension

Returns one if the specified extension (UTF-8 encoded string) is supported by the current compiler and zero otherwise.

extsupport(in ref short short int, out unsigned short short int)

Language Extensions

An extension adds extra functionality to the base language.

Generics Extension

Extension name: GEM_EXT_GENERIC

A generic defines a type alias. Generic types have an unknown size at declaration time. Respective type options can be limited by schemes. Generics must be unique at any time. They cannot be overwritten by other generics or type definitions.

type gen: T = Add, Sub, Mul, Div

fun foo(in T: some)

Generics can be inlined into function parameter lists. Generics defined within parameter lists are only valid within the scope of the function they are declared in.

fun foo(type gen: T, type gen: U)(in T: t, out U: u)


short int: buffer
short short int: number

foo(buffer, number)

foo(short int, short short int, buffer, number)

Note that you can set the types explicitly, but only none or all of them. If the variables do not have the right type, the compiler throws an error.

The types of generics can be resolved to a concrete type with the of syntax. This syntax adds the keyword of to box type declarations using generics. With this syntax you can specify concrete types for generics in their declared order (top to bottom).

type box: Boxname = Scheme1, Scheme2 {
    
    type gen: generic  
    generic: buffer  

}

Boxname of int: box_with_int

Composition and polymorphism

Schemes define certain functionality that is to be implemented by a given type. They define the functions a type has to implement in order to count as a member of the scheme.

scheme Area {
    fun area(out float: area)
}

type box: Rect = Area {
    float: a
    float: b

    fun area(out float: area) {
        area = self.a * self.b
    }
    
    fun new(in float: a, in float: b) {
        self.a = a
        self.b = b
    }
}

scheme cube = Area {
    fun cube(out float: cube)
}

Schemes can inherit the functionalities of other schemes:

type unsigned short short int: ascii

scheme Stringify {
    fun stringify(out ref ascii)
}



scheme Serialize = Stringify {
    type gen: T
    
    fun serialize(in ref ascii)
}

The set of types that can be inferred for a generic can be limited with schemes. A generic that is marked with a specific scheme will only accept types that implement that scheme.

scheme Add {
    type gen: T
    
    fun add(in T, in T, out T)
}

scheme Sub {
    type gen: T
    
    fun sub(in T, in T, out T)
}

scheme Algebra = Add, Sub {
}

fun do_math(type gen: T = Algebra)(in T, in T)
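How scheme inheritance constrains a generic can be sketched as a set computation. The Python model below is a simplification under stated assumptions: schemes are compared by function name only (signatures are ignored), and `SCHEMES`, `required_functions` and `implements` are hypothetical names.

```python
# Each scheme lists its parent schemes and the functions it declares itself.
SCHEMES = {
    "Add":     {"parents": [],             "functions": {"add"}},
    "Sub":     {"parents": [],             "functions": {"sub"}},
    "Algebra": {"parents": ["Add", "Sub"], "functions": set()},
}


def required_functions(scheme: str) -> set:
    """Collect the functions of a scheme including all inherited ones."""
    entry = SCHEMES[scheme]
    required = set(entry["functions"])
    for parent in entry["parents"]:
        required |= required_functions(parent)
    return required


def implements(type_functions: set, scheme: str) -> bool:
    """A type implements a scheme if it provides every required function."""
    return required_functions(scheme) <= type_functions
```

Here `required_functions("Algebra")` is `{"add", "sub"}`, so a type offering only add would not be accepted for a generic limited to Algebra.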

Schemes can be silent for modules. Every function within a silent scheme is silent when implemented. If a silent scheme is inherited, the new scheme is implicitly silent.

silent scheme Add {
    type gen: T
    
    fun add(in T, in T, out T)
}

scheme Algebra = Add {

}

#SAME AS

silent scheme Algebra = Add {

}

. . . 


type box: foo = Algebra {

    fun add (in foo )

}

#SAME AS

type box: foo = Algebra {

    silent fun add (in foo )

}

Standard library

Memory management mem

Ability to allocate and deallocate memory.

fun alloc(in unsigned long int: len, out ref gen: ptr)

fun realloc(in unsigned long int: len, in out ref gen: ptr)

fun free(in ref gen: ptr)

fun copy(type gen: T)(in ref T: src, in ref T: dst)

fun fill(type gen: T)(in ref T: src, in T: elem)

fun append(type gen: T)(in ref T: src, in T: elem)

fun swap(type gen: T)(in ref T: a, in ref T: b)

Common data structures data

Contains boxes which implement common data types used in various other languages such as:

  • Stack
  • Queue
  • Array
  • LinkedList
  • ArrayList
  • Hashmap
  • BTree

UTF-8 Strings str

Growable UTF-8 encoded string data structure and string processing functions.

Files fs

Create, read and write to files.

Input and Output io

Access to standard input, output and error streams.

Formatting fmt

Functions for formatting various arguments into strings and writing them into streams.

Multithreading thread

Access to native operating system threads

Atomic data structures and synchronisation mechanisms atom

Includes:

  • Semaphore
  • Mutex
  • Channel

Native operating system os

Access to information about operating system or platform specific functionality