Introduction

What is Universal Machine Code?

Universal Machine Code is an attempt to provide a portable bytecode, similar to WebAssembly. It uses a register-based machine to achieve fast execution, and the reference virtual machine is on par with existing, fast WebAssembly interpreters such as wasmi. The syntax and semantics of UMC have been explicitly designed to make JIT-compilation to common ISAs such as x86, Arm and RISC-V simple and fast.

Below is an example of iterative fibonacci in UMC.

.FIB:
  mov u64:0, #30
  bz .END, u64:0

  mov u64:1, #0 ; int cur = 0
  mov u64:2, #1 ; int next = 1
  mov u64:3, #0 ; int i = 0

.LOOP: ; u64:3 next
  mov u64:4, u64:2        ; int next2 = next
  add u64:4, u64:4, u64:1 ; next2 += cur
  mov u64:1, u64:2        ; cur = next
  mov u64:2, u64:4        ; next = next2
  add u64:3, u64:3, #1

  dbg u64:3
  dbg u64:0

  xor u64:8, u64:3, u64:0 ; Compare :3 and :0
  bnz .LOOP, u64:8
.END:
  dbg u64:1

UMC Registers

There are a unlimited number of registers in UMC. Each type (including width) is a separate set of registers.

Register Types

Register Set	Syntax	Widths	Description
Unsigned Int	`u32:0`, `u8:0`, `uX:0`	Any	Unsigned Integers
Signed Int	`i32:0`, `i64:0`	Any	Signed Integers
Floats	`f32:0`, `f64:0`	32/64	Floating Point Numbers
Memory Address	`m:0`, `m:1`	At least 32-bit	Memory location
Instruction Address	`n:0`, `n:1`	At least 32-bit	Instruction location

Vector registers

All register types can be extended to a vector register type.

Register Set	Syntax	Description
Unsigned Int	`u32x4:0`	Vector of 4 Unsigned 32-bit Integers
Signed Int	`i64x4:0`	Vector of 4 Signed 64-bit Integers
Float	`f32x8:1`	Vector of 8, 32-bit Floats
Memory Address	`mx8:0`	Vector of 8 memory addresses
Instruction Address	`nx8:0`	Vector of 8 instruction addresses

UMC Instructions

Below is a list of all instructions in UMC. Instructions are written like:

add u32:2, u32:0, u32:1

which adds the value in u32:0 and u32:1 and stores the result in u32:0

Misc

Mneumonic	Operand 1	Operand 2	Description
`mov`	Reg	Any	Sets op1 = op2
`cast`	Reg	Any	Sets op1 = op2, casting
`nop`			Does nothing
`dbg`	Reg		Prints the value and register for debugging

Arithmetic

Mneumonic	Operand 1	Operand 2	Operand 3	Description
`add`	Numeric Reg	Numeric	Numeric	op1 = op2 + op3
`sub`	Numeric Reg	Numeric	Numeric	op1 = op2 - op3
`mul`	Numeric Reg	Numeric	Numeric	op1 = op2 * op3
`div`	Numeric Reg	Numeric	Numeric	op1 = op2 / op3
`mod`	Numeric Reg	Numeric	Numeric	op1 = op2 `mod` op3

Jumps, Calls and Conditional Branches

Mneumonic	Operand 1	Operand 2	Description
`jmp`	Label		Unconditional jump to the given label
`jal`	Label	Instr Reg	Jumps to the given label, setting the instruction register to the next instruction
`bz`	Label	Numeric	Jump to label if op2 is zero
`bnz`	Label	Numeric	Jump to label if op2 is not zero

Comparison Operations

Comparison operations take an unsigned destination register (minimum width 1), and sets the destination to 1 if true, otherwise sets to 0. Comparisons are only allowed between the same register sets.

Mneumonic	Operand 1	Operand 2	Operand 3	Description
`eq`	Unsigned Reg	Numeric	Numeric	\(op2 = op3\)
`gt`	Unsigned Reg	Numeric	Numeric	\(op2 \gt op1\)
`gte`	Unsigned Reg	Numeric	Numeric	\(op2 \ge op3\)
`lt`	Unsigned Reg	Numeric	Numeric	\(op2 \lt op3\)
`lte`	Unsigned Reg	Numeric	Numeric	\(op2 \le op3\)

Bitwise Operations

All (signed and unsigned) integers are defined to be stored in twos-complement representation. The following bitwise operations work only between integer types.

Mneumonic	Operand 1	Operand 2	Operand 3	Description
`and`	Integer Reg	Integer	Integer	Bitwise AND
`or`	Integer Reg	Integer	Integer	Bitwise OR
`xor`	Integer Reg	Integer	Integer	Bitwise XOR
`not`	Integer Reg	Integer	Integer	Bitwise NOT

Memory

Blocks of memory can be allocated with the alloc instruction.

Mneumonic	Operand 1	Operand 2	Description
`alloc`	Address Reg	Unsigned	Allocates a continguous block of memory X bytes long
`free`	Address Reg		De-allocates a block of memory
`load`	Reg	Address Reg	Loads the given register from the address
`store`	Address Reg	Any	Stores a value into the address
`msize`	Unsigned Reg		Sets op1 to the size of a memory address in bytes
`isize`	Unsigned Reg		Sets op1 to the size of a instruction address in bytes

Binary Format

This RFC defines the binary format for UMC.

All UMC binary files begin with the following 16-byte magic data:

Magic: 0x7F 0x55 0x4D 0x43 0x20 0x42 0x79 0x74 0x65 0x64 0x6F 0x64 0x65 0x00 0x00 0x00

This translates to the following ascii string: \x7FUMC Bytecode\0\0\0

All UMC binary files are then immediately followed by two bytes describing the UMC binary format version:

Major Version: 1 byte
Minor Version: 1 byte

If a UMC Virtual Machine encounters an incompatible major version, it should refuse to execute the program. If a UMC Virtual Machine encounters a minor version higher than what is supported for that major version by the Virtual Machine, then it should refuse to execute the program. Lower minor versions than are supported by the UMC VM for a particular major version will work correctly on VMs that correctly follow the UMC Specification for a higher minor version.

Little Endian Base 128 (LEB128)

The UMC Binary File format extensively uses LEB128 for integers.

Version 0.3

Type Header Table

The first header found in a 1.0 UMC Binary file contains the register type header, which maps type indices to types

This table first starts with the number of entries:

Entry Count: Unsigned LEB128

Followed by each entry in series:

Control Byte:
  - Length Flag:   1 bit
  - Constant Flag: 1 bit
  - Reserved (0):  3 bits
  - Register Type: 3 bits
Width: Unsigned LEB128
Length: Unsigned LEB128 (Only if indicated in the control byte)

Duplicate entries are currently not permitted, as they may be significant in future versions. Duplicate entries are permitted for register entries (constant flag not set). A instruction references a different register if it has a different type index, even if the register index is the same. This allows using encoding more registers of the same type in the same number of bytes. For example, it allows 256 unique register operands to be encoded instead of 128 in two bytes.

The register types are encoded as follows:

Register Type	Encoding
Unsigned Integer	0b000
Signed Integer	0b001
Float	0b010
Memory Address	0b011
Instruction Address	0b100

Pre-initialised memory Table

UMC supports pre-initialised memory regions which can be referenced by a memory label, which is encoded as a memory constant. The pre-initialised memory table immediately follows the RT Header, and begins with the number of entries:

Entry Count: Unsigned LEB128

Then for each entry is stored contiguously:

Data Length: Unsigned LEB128
Data: Variable number of bytes

Instruction Encoding

Immediately after the type header table, are all instructions, one after the other.

The first byte of each instruction is the opcode:

Op Code: 1 byte

Op Code	Value
NOP	0b000000
MOV	0b000001
ADD	0b000010
SUB	0b000011
MUL	0b000100
DIV	0b000101
MOD	0b000110
JMP	0b001000
JAL	0b001001
BZ	0b001010
BNZ	0b001011
EQ	0b001100
GT	0b001101
GTE	0b001110
AND	0b010000
OR	0b010001
XOR	0b010010
NOT	0b010011
ALLOC	0b100000
FREE	0b100001
LOAD	0b100010
STORE	0b100011
SIZE	0b100100
CAST	0b110001
ECALL	0b110100
DBG	0b111111

Fixed Instruction Encoding

The majority of instructions are encoded using this format, since for a given op code they have fixed number of arguments. Each operand begins with its type:

Operand Type: Unsigned LEB128

This operand type is an index in the Type Header Table. Depending on this entry, one of the following values are then decoded:

Register Index: Unsigned LEB128

Unsigned integer constants are encoded as:

Constant Value: Unsigned LEB128

Signed integer constants are encoded as:

Constant Value: Signed LEB128

Float constants are encoded as:

Constant Value: 4 or 8 bytes

The number of bytes for the constant value is dictated by the width’s bytes, rounded up. The only supported float constant types are f32 and f64, which use 4 and 8 byte constants respectively.

Memory Label Constants are encoded as:

Constant Value: Unsigned LEB128

These refer to an index in the pre-initialised memory table.

Instruction Label Constants are encoded as:

Constant Value: Unsigned LEB128

Variable Instruction Encoding

This encoding is used for the ECALL opcode. The variable instruction encoding first starts with the operand count:

Operand Count: Unsigned LEB128

And then the encoding is the same as the fixed instruction encoding.

Sizeof Instruction Encoding

msize and isize are encoded like a fixed instruction with two operands:

Register Operand, but with no index
Unsigned Register Operand

Memory Model

UMC uses little-endian byte-ordering, and the smallest unit of memory is the byte.

A given number of bytes can be allocated with the alloc instruction:

alloc m:0, #100

This gives a memory address in m:0, which can be used for load and store operations:

mov u32:0, #7
store m:0, u32:0
load u32:1, m:0

This should be free’d once it is no longer needed:

free m:0

Memory addresses can be offset using add:

add m:0, #10

but it is the programmer’s responsibility to ensure that they address is incremented enough for the given data type.

Memory and instruction addresses have an opaque memory representation. It is undefined behaviour to do anything other than load a memory address from where a memory address is stored, and the same with an instruction address. However, the msize and nsize instructions allow you to know how much room an address will take up:

; Allocate enough for a memory address to be stored
msize u32:0
alloc m:0, u32:0

Pre-allocated strings and arbitrary bytes are supported through memory labels:

&HELLO: "hello"

; Load the first character of "hello" into u8:0
load u8:0, &HELLO

Environment Calls

Environment calls are like system calls in UMC. An environment call requests the the VM perform privilleged, (likely platform specific) action on its behalf.

UMC External Call Table

Name	id	Details	Short Description
`exit`	`0x0`	exit	Halt the UMC VM with the given exit code
`open`	`0x1`	open	Open a file
`close`	`0x2`	close	Close a file
`read`	`0x3`	read	Read bytes from a file
`write`	`0x4`	write	Write bytes to a file
`getarg`	`0x10`	getarg	Retrieve an arbitrary bytes argument passed to the UMC VM by index

External Call List

exit

Exits with the given exit code

Parameters:

Result Register (Ignored)
External Call Code
Exit Code

ecall u1:0, 0x0, #1 ; Exit the VM with exit code 1

open

Open attempts to open a file with the given filename. On success, it returns a positive integer filename handle

Parameters:

Result Register (signed integer)
External Call Code
Null terminated filename (memory address)

ecall i32:1, 0x1, m:0

close

Close the given file. On success, 1 is stored into the result register Parameters:

Result Register (unsigned, at least 1 bit)
External Call Code
File handle (unsigned integer)

ecall u1:0, 0x2, u32:1

read

Read up to the given number of bytes from the file. On success, the number of bytes read is stored into the result register. Parameters:

Result Register (unsigned)
External Call Code
File handle (unsigned integer)
Buffer (memory address)
Maximum number of bytes to read (unsigned integer)

ecall u64:1, 0x3, u32:0, m:1, u64:0

write

Write the given number of bytes from a memory address into a file Stores the number of bytes successfully written into the result register Parameters:

Result Register (unsigned)
External Call Code
File handle (unsigned integer)
Buffer (memory address)
Maximum number of bytes to write (unsigned integer)

ecall u64:0, 0x4, u32:0, m:2, u64:1

getarg

Retrieve an user-provided argument given the UMC Program at runtime.

The first argument is always a memory address of the null-terminated name of the program. The remaining arguments are provided by the user when the UMC VM is started.

All values are copies of the original arguments, and the same arguments can be retrieved multiple times in the program. Memory addresses are also “copies”, but values may be stored in them - this can allow returning parameters to the caller.

UMC programs that use this environment call should document their required parameters. Calling getarg with type or argument number that doesn’t match the provided arguments will cause the program to terminate.

Examples:

ecall m:0,   0x10, #0 ; Get 0th argument - null terminated name of the program
ecall m:1,   0x10, #1 ; Get first argument as a memory address
ecall u64:0, 0x10, #1 ; Get first argument as a 64-bit unsigned integer
ecall i64:1, 0x10, #2 ; Get second argument as a 64-bit signed integer
ecall f32:0, 0x10, #3 ; Get third argument as a 32-bit float

Basic Architecture

The original proposition of UMC is to create a register-based VM bytecode (as opposed to a stack-based VM). LuaJIT successfull implemented a register-based bytecode with significant performance improvements [1]

Syntax Proposal - Typed Register Sets (Accepted)

;;;;; FULLY TYPED REGISTER SETS PROTOTYPE

; Pros:
; No implicit casting since independent register sets
;  - Vector registers can't be reshaped
;  - No undefined behaviour
;  - Easy to implement registers as vecs

; Cons:
; Very verbose
;  - Can be mitigated by implicit register type
;    add u32:0, :0, :1


fib:
  ; i32:0 (n)
  b i32:0 < $0, .L4
  mov u32:0, $0 ; int cur = 0
  mov u32:1, $1 ; int next = 1
  mov u32:2, $0 ; int i = 0
.L3:
  ; u32:3 next
  mov u32:3, u32:2 ; int next2 = next
  add u32:3, u32:3, u32:0 ; next2 += cur
  mov u32:0, u32:1 ; cur = next
  mov u32:2, u32:3 ; next = next2
  add u32:2, u32:2, $1
  b u32:2 != n, .L3
.L4:
  mov i32:0, u32:0

sum:
  ; a:0   - Array of integers
  ; u32:0 - Length of the array
  b i32:0 == 0, .L8    ; If empty, terminate
  mov i32:1, $0        ; int total = 0
  shl u32:0, u32:0, $2 ; Multiply by 4 to get length in bytes
  add a:1, a:0, u0       ; end address = start address + length
.L8:
  load u32:2 a:0
  add i32:1, i32:1, u32:2
  add a:0, a:0, $4
  b   a:0 != a:1, .L8
.L9:
  mov i32:0, $0
  ret

;;;;;;;;;;; IMPLICIT MULTIPLE OPERANDS ;;;;;;;;;;;;

fib:
  ; i32:0 (n)
  b i32:0 < $0, .L4
  mov u32:0, $0 ; int cur = 0
  mov u32:1, $1 ; int next = 1
  mov u32:2, $0 ; int i = 0
.L3:
  ; u32:3 next
  mov u32:3, :2 ; int next2 = next
  add u32:3, :3, u32:0 ; next2 += cur
  mov u32:0, :1 ; cur = next
  mov u32:2, :3 ; next = next2
  add u32:2, :2, $1
  b u32:2 != n, .L3
.L4:
  mov i32:0, u32:0

sum:
  ; a:0   - Array of integers
  ; u32:0 - Length of the array
  b i32:0 == 0, .L8    ; If empty, terminate
  mov i32:1, $0        ; int total = 0
  shl u32:0, :0, $2 ; Multiply by 4 to get length in bytes
  add a:1, a:0, u32:0       ; end address = start address + length
.L8:
  load u32:2 a:0
  add i32:1, i32:1, u32:2
  add a:0, a:0, $4
  b   a:0 != a:1, .L8
.L9:
  mov i32:0, $0
  ret

;;;;;;;;; VECTOR INSTRUCTIONS ;;;;;;;;;;;;;

doublearray:
  load  u32x8:0, a:0
  mul   u32x8:0, u32x8:0, $2
  store u32x8:0, a:0

Original Syntax Proposal - Types don’t include widths

;;;;; Instructions

; Based off RISC-V basic instruction set
; https://projectf.io/posts/riscv-cheat-sheet/#arithmetic


; Pros:
; Large reduction in instructions achieved
; - Lots of instructions have a signed/unsigned variant - add, div, mul, even bit shift, cond. jump
; - Instructions overloaded, or only work on some types of registers
; - Simple idea with different applications
; - Register names (not numbers) describe their capabilities
; Specifying width for same registers matches hardware
;  -  Can be simulated if not big enough
; Vector registers having different name, to avoid implicit conversions
; Destination register gives type of operation

; Cons:
; Undefined behaviour to reshape vector registers?
; - Semantic checker for different acceses?
; Lots of implicit sign casting
; Lots of repeating the same widths?
; - Could use implicit from destination like:
;   add u0:32 u0: u1:

; Arithmetic
add u0:32, $100 ; Immediate
add i0:32, i1:32, i2:32  ; Registers, all same, ints
add f0:32, f1:32, f3:32  ; Registers, all same, floats
add i0:64, i1:64, u1:32  ; Mix of types and widths

cast i2:64, u1:32
add [i64] %0, %1, ~~~u1:32~~~

neg i1:32                ; cast to i1, then negate
sub i0:32, u1:32, u2:32  ; Can subtract unsigned numbers (can the result be signed?)

mul u1:32, u2:32, u3:32  ; Multiplication of several signed numbers
mul u1:64, u2:32, u3:32  ; Multiplication (completely safe)
mul i1:64, i2:32, u3:32  ; Signed Multiplication

div u3:32, u2:32, u1:32  ; Unsigned division
div i3:32, u2:32, i1:32  ; Signed division

; Bitwise logic
and i3:32, u2:32, i1:32  ; Just works on bits. Only width needs to match.
not i0:32, i0:32         ; same again
or  i0:32, i0:32         ; same again
xor i0:32, i0:32         ; same again

; Shifts

shl i0:32, i0:32, $2 ; Arithmetic left shift (x2)
shr u0:32, i0:32, $2 ; Logical shift (determined by destination)

shl i0:32, i1:32, i2:32 ; Arithmetic left shift
shr u0:32, i1:32, i2:32 ; Logical right shift (determined by destination)

; This is quite good, saves a few instructions

; Memory accessing
load u0:32, 0xDEADBEEF ; Loads a 32 bit value into the register
load u0:8,  0xDEADBEEF ; Loads a 8 bit value into the register

mov u0:8, 0xFF
stor 0xDEADBEEF, u0:8
stor 0xDEADBEEF, u0:32 ; Store 32 bits even though its 8 bit

; Immediate offset in load / store?
load u0:32, 4(0xDEADBEEF)

; Jumps

j    0xBEEF     ; Jump to immediate address
j    .LABEL     ; Jump to label (labels have type aX)
jal  a0, 0xBEEF ; Jump to immediate address, store next instruction address in a0
jalr a0, a1, $4 ; Jump to a1+4, store next instruction address in a0 (Should immediate be multiplied by instruction size?)

call .LABEL ; Jump to 
ret         ;


; Conditional branches
beq u0:32, i0:32, .LABEL ; Branch if the two are equal
bne u0:32, i0:32, .LABEL ; Branch if the two are not equal

bgt u0:32, i0:32, .LABEL ; Branch if greater than
bge u0:32, i0:32, .LABEL ; Branch if greater than or equal to

; Allow aX instead of label above?

;;;; zero-register, like ARM64 and RISC-V? Partly an encoding concern

seq u0:1, i0:32, i0:8 ; Set u0 to 1 if equal
sne u0:1, 

; Alternative?
add %0:u32, %1:i32, %2:i32

; Vector registers?
; iv, uv, fv, av avoids

add iv0:32x8 iv0:32x8, iv0:32x8

add av0:x8, av0:x8, $4

;;;; Fibonacci example ;;;;

; Problem - how are parameters passed? As u0,u1,.. by default?
fib:
  ; i0:32 (n)
  b i0:32 < $0, .L4
  mov u0:32, $0 ; int cur = 0
  mov u1:32, $1 ; int next = 1
  mov u2:32, $0 ; int i = 0
.L3:
  ; u3: next
  mov u3:32, u2:32 ; int next2 = next
  add u3:32, u3:32, u0:32 ; next2 += cur
  mov u0:32, u1:32 ; cur = next
  mov u2:32, u3:32 ; next = next2
  add u2:32, u2:32, $1
  b u2:32 != n, .L3
.L4:
  mov i0:32, u0:32
  ret

;;;; Sum array ;;;;
sum:
  ; a0 - Array of integers
  ; u0 - Length of the array
  b u0:32 == 0, .L8    ; If empty, terminate
  mov i1:32, $0        ; int total = 0
  shl u0:32, u0:32, $2 ; Multiply by 4 to get length in bytes
  add a1, a0, u0       ; end address = start address
.L8:
  load u2:32, a0
  add  i1:32, i1:32, u2:32
  add  a0, a0, $4
  b    a0 != a1, .L8
.L9:
  mov i0:32, $0
  ret

;;;;; Vector instructions - Double array
doublearray:
  ; Assume multiple of 4
  load  uv0:32x8, a0
  mul   uv0:32x8, uv0:32x8, $2
  store uv0:32x8, a0

; Pick operations
pick u1:32, uv1:32x4, 0
pick u2:32, uv1:32x4, 1
pick u3:32, uv1:32x4, 2
pick u4:32, uv1:32x4, 3

pick uv1:32x4, uv0:32x8, 0 ; Take first 4
pick uv2:32x4, uv0:32x8, 4 ; Take second 4

; Put (reverse pick) operations
pick uv1:32x4, u1:32, 0

; Pick alternative - similar to ARM
mov u0:32, uv1:32x4[0]
mov u1:32, uv1:32x4[1]
mov u2:32, uv1:32x4[2]
mov u3:32, uv1:32x4[3]

mov uv2:32x2, uv1:32x4[2]

; Broadcast
brdc uv1:32x4, 0


; MUL V0.4S, V2.4S, V3.S[2] multiplies each of the four 32-bit elements in V2 by the 32-bit scalar value in lane 2 of V3, storing the result vector in V0.
; https://developer.arm.com/documentation/102474/0100/Fundamentals-of-Armv8-Neon-technology/Registers--vectors--lanes-and-elements

; is there any point having a stack pointer register?
; special case of a-type register

; How to detect overflow??
; add u3:32, u2:32, u1:32 : $overflow > u4:1

Alternative Syntax Proposal - No types

add  r0:32, $100
adds r0:32, r1:32, r2:32 $-100

neg r0:32, r0:32
sub r3:32, r2:32, r1:32 # r3 = r2 - r1, signed/unsigned doesn't matter

mul  r3:32, r2:32, r1:32
muls r3:32, r2:32, r1:32
mul  r3:64, r2:32, r2:32
muls r3:64, r2:32, r2:32

div  u3:32, u2:32, u1:32
divs u3:32, u2:32, u1:32

; Bitwise - Sign not relevant
and r3:32, r2:32, r1:32
not r3:32, r2:32, r1:32
or  r3:32, r2:32, r1:32
xor r3:32, r2:32, r1:32

; Shifts

sll r3:32, r2:32, r1:32 ; Logical left shift
slr r3:32, r2:32, r1:32 ; Logical right shift

sra r3:32, r2:32, r1:32 ; Arithmetic left shift
; No arithmetic right shift

load  r3:32, 0xDEADBEEF ; Load 32 bits into register
store 0xDEADBEEF, r3:32 ; Store 32 bits into memory

Alternative Syntax Proposal - Typed Instructions

;;;; TYPED INSTRUCTION PROTOTYPE

; Pros:
; No implicit casting since independent register sets
;  - Vector registers can't be reshaped
;  - No undefined behaviour

; Cons:
; %1 and %1 in different instructions, makes it hard to read
;  - This code is mostly going to be generated anyway
; Some instructions are supposed to have different types:
;  - load [u32] a0
;  - vector broadcast

; Memory leak of registers?

fib:
  ; i0:32 (n)
  b   [i32] %0 < #0, .L4
  mov [u32] %0, #0 ; int cur = 0
  mov [u32] %1, #1 ; int next = 1
  mov [u32] %2, #0 ; int i = 0
.L3:
  ; u3: next
  mov [u32] %3, %2     ; int next2 = next
  add [u32] %3, %3, #0 ; next2 += cur
  mov [u32] %0, #1     ; cur = next
  mov [u32] %2, %3     ; next = next2
  add [u32] %2, %2, #1 ; i++
.L4:
  mov [i32] %0, {u0:32}
  ret


sum:
  ; a0 - Array of integers
  ; u0 - Length of the array
  b    [u32] %0 == 0, .L8 ; If empty, terminate
  mov  [i32] %1, #0       ; int total = 0
  shl  [u32] %0, %0, #2   ; Multiply by 4 to get the length in bytes
  add  [a]   %1, %0, {u0:32} ; end address = start address + length
.L8:
  load [u32] %2, [a0] ; Load the memory address
  add  [i32] %1, %1, {u2:32}
  add  [a]   %1, %0, {u0:32}
  b    [a]   %0 == %1, .L8
.L9:
  mov [i32]  %0, #0
  ret

External calls / Syscalls in UMC

; External calls in UMC, function like exact order
ecall retreg, .LABEL?, u32:1, u32:0, f32:3, i32:1,

System V x86-64 ABI Translation

rdi <= u32:1
rsi <= u32:0
rdx <= i32:1

xmm0 <= f32:3

; System V can only return rax, but this is interpreted based on ecall
; e.g. ecall a:0   ; interpret as address
; e.g. ecall u32:0 ; interpret as u32
; e.g. ecall i64:0 ; interpret as i64
ret

UMC External call

.LABEL:
  u32:0 <= u32:1
  u32:1 <= u32:0

  f32:0 <= f32:3

  i32:0 <= i32:1

  ret X ; retreg <= X

Explicit and Implicit Casting

tl;dr: Widening is allowed, but all other casts must be explicit. Calculations are always done in the domain of the domain of the destination register.

Examples

Examples of what is allowed:

add u32:0, u32:0, u32:0 ; Great
add u64:0, u32:0, u32:0 ; Implicit widening cast

; Alowed, and recommended for large multiplications
; Would translate as multiply low and high
mul u64:0, u32:0, u32:0
mul u128:0, u64:0, u64:0

; Comparisons can use any width unsigned register
gt u1:0, u32:0, u32:0
gt u1:0, u64:0, u32:0 ; 64-bit comparison

; Use explicit casts where necessary

; Do 32-bit subtraction
mov u32:1, u64:0 ; Require u64:0 to fit in 32 bits
sub u32:2, u32:1, u32:0
; Do 64-bit subtraction, expect the result to fit in 32 bits
sub u64:2, u64:1, u32:0
mov u32:2, u64:2

mov u32:0, i32:0
gt u1:0, u32:0, u32:1 ; Performs unsigned comparison: Weird if i32:0 is negative

mov i32:0, i32:0
gt u1:0, i32:0, i32:0 ; Performs signed comparison: Weird u32:0 uses the top bit

What is not allowed

; This is not allowed, result depends on cast before vs after, confusing
sub u16:0, u32:0, u32:0

; Confusing comparisons
gt u1:0, u32:0, i32:0 ; This is actually Undefined behaviour in C when i32:0 is negative
; E.g. 0b1000, 0b0001 => Signed comparison interprets u32:0 as smallest negative
; E.g. 0b0000, 0b1000 => Unsigned comparison interprets smallest negative as bigger than 0

Types of casts

What operations do we need to do?

Narrowing cast (signed, unsigned or float)
Unsigned <-> Signed (bit or value-based?)
Unsigned <-> Float (Value-based)
Signed <-> Float (Value-based)

bcast - Binary / Bitwise cast - for twos complement conversions/narrowing vcast - Value cast - Best effort at encoding the same value with the new datatype

bcast u32:0, u64:0 ; Truncation
bcast i32:0, i64:0 ; Truncation

bcast u32:0, i32:0 ; Reinterpret two's complement i32:0 as a u32:0.
bcast i32:0, u32:0 ; Reinterpret unsigned as two's complement - if top bit is set then it signed overflow occurs

vcast f64:0, u32:0 ; Take the unsigned integer and convert it into its closest floating point representation
vcast f64:0, i32:0 ; Take the signed integer and convert it into its closest floating point representation

vcast f32:0, f64:0 ; Take the value of the f64 and re-encode it as a f32 (precision lost)

; Explicit vcast with specific behaviour?
abs u32:0, i32:0 ; Take the signed integer, make it positive so it definitely fits correctly (no vcast for u32 and i32?)

; Maybe just a fcast with specific behaviour too?
fcast f64:0, u32:0 ; Re-encode as a float
fcast f64:0, i32:0 ; Re-encode as a float
fcast f32:0, f64:0 ; Re-encode as a different precision float (not a straightforward bit-cast)

brdc u32x8:0, u32:0 ; Broadcast u32 to vector of u32s.
; Does it make sense to allow every instruction to operate on vector elements?
; Or just allow mov to put/pick (Leaning towards this, otherwise it signficantly complicates the instruction set)

Rationale

Many instructions need to specify whether to operate in signed or unsigned form, such as comparisons. The top bit of an unsigned integer can be confused with a signed negative number.

Doing something like:

sub u32:0, u64:0, u32:1

Does not equate to a single instruction in x86, RISC-V or ARM. It must do a 32-bit or 64-bit operation, with a narrow/widen before/after

Problems with using extra registers

The idea is that the casting instruction will be eliminated during JIT compilation. However, the cast causes there to potentially be 2 registers used for the same value (one of which may outlive the other).

For example, if the following code were to be JIT compiled:

mov u32:1, u64:0
sub u32:2, u32:1, u32:0

it might require one register to store the 64 bit value, and another to store the 32-bit value (one of which may live much longer than the other). In more complex examples, with many more registers, this might cause registers to overflow, even though u32:1 is really not needed. However, if necessary this can be solved by clearing the register in UMC:

mov u32:1, u64:0
sub u32:2, u32:1, u32:0
mov u64:1, #0 ; If the u64:0 value is never needed

Now, only one register is needed if the instruction set supports accessing 64-bit registers as 32-bit. The UMC JIT can set u64:0 to #0 without an output register, as it knows this is guaranteed.

UMC Memory Model

Four Instructions: alloc and free, load and store
One register type with archecture-dependent width: a:0
add and sub allowed with one address register and one (un)signed integer “offset”

Examples

alloc a:0, 4
store a:0, u32:1 ; Store 4-byte integer into address register
load u32:1 ; Load it as a 32 bit integer
free a:0 ; Frees the memory allocated earlier. a:0 is now invalid

Rationale

Allocating and de-allocating memory is a common task, which if we are running in the UMC VM, can easily be delegated to the VM.

Questions:

How do we allocate a block of memory large enough for a memory address if size is machine-dependent? A constant?

UMC Bytecode Format

In UMC we have not only “unlimited” registers, but also “unlimited” register types. These need to be represented in a fixed amount of space, which requires a variable-length encoding of instructions.

Constraints:

Unlimited registers and register types
Minimum 64 instruction opcodes, ideally more or extensible

Register-based bytecodes are generally faster than stack-based ones, but spend significantly longer on “instruction-fetch”. So we want to use an efficient encoding to mitigate this, especially since we have unbounded size.

Register Types

Unsigned Int (+ width)
Signed Int (+ width)
Float (+ width)
Memory Address
Instruction Address

Plus vector types with a “length” This could fit in 4 bits, first bit as a flag for a vector. However we might also want a flag for a constant, at which point we’d need 8 bits (a full byte).

Basic/Naive encoding

Example of 1.5 byte type and 1 byte index

u32:1 | 0001 00010000 00000001 | 2.5 bytes
        ^^^^ ^^^^^^^^ ^^^^^^^^
    unsigned   |      |
              32      index 1

Example of vectorised register

u32x8:1 | 1001 00010000 0100 00000001
          ^^^^ ^^^^^^^^ ^^^^ ^^^^^^^^
 unsigned vector  |      |      |
                  32     8    index

Example of a full u32-u32 instruction (8.5 bytes):

add     | 00000010 <- add opcode
u32:1,  | 0001 00010000 00000001
u32:2,  | 0001 00010000 00000010
u32:3   | 0001 00010000 00000011

Some instructions have only one valid register type, or a constant, so they don’t necessarily need a type:

jmp | 00001000 <- jump opcode
n:0 | 00000000 <- jmp to register 0 value

jmp    | 00001000 <- Jump-constant instruction ideally?
.LABEL | 00001000

A lot of instructions are likely to be “uniform”, using the same types (for the registers that aren’t forced), so we could consider an encoding like (5.5 bytes):

add     | 00000010 <- add opcode
u32     | 0001 00010000
:1      | 0000 0001
:2      | 0000 0010
:3      | 0000 0011

However, using a full byte just to store the 32/64-part of u32/u64 is pretty wasteful, especially considering they are probably the most common type. Unfortunately the full range of u1-u32 is allowed, so we can’t simply store log2(width).

Huffman encoding of register types

Ultimately, the most common register types will be used massively more than the less common types.

I propose a Huffman encoding of types (with their widths / lengths!).

In the following example, a program uses 32-bit registers most often, plus some others used in UMC.

 entry    | register
--------------------
0000 0000 | u32
0000 0001 | i32
0000 0010 | u64
0000 0011 | i64
0000 0100 | f32
0000 0101 | f64
0000 0111 | u1 (used as a boolean in UMC)
0000 1001 | n (Instruction Address)
0000 1011 | m (Memory Address)
0000 1111 | u32x8 (example of vector registers, probably only a handful used)
... a few more application specific, maybe u128/i128, u8 etc.

0000 0000 0000 0001 | If necessary, for rare types, a 2-byte or 1.5-byte encoding can be used

Instruction and Memory address registers will only really be used in add/sub for offsetting, so won’t be too high, but should still fit in the table.

It might also be a good idea to ‘fix’ the first few entries of the table for types we know will be used, like u32, u64, i32, i64, u1, n and m to reduce how much binaries change and make things like combining two umc binary files easier.

Instructions now for non-uniform types (7 bytes):

add     | 0000 0010 <- add opcode
u32x8:1 | 0000 1111 0000 0001
u32x8:2 | 0000 1111 0000 0010
u32:2   | 0000 0000 0000 0010

Covers any 3 operand types with a range of 256 different registers.

Only 5 bytes needed for a uniform instruction

add     | 0100 0010 <- add opcode
u32     | 0000 0000 <- u32 index
:1      | 0000 0001
:2      | 0000 0010
:3      | 0000 0011

Re-indexing with the most common as lower numbers can also be used to ensure the most common register indexes are packed tightly, without changing the result of the program. 1 byte is probably the minimum for a register index. Another advantage is that registers can be stored in an implementation-dependent way without decoding penalty, and we can just store a list of register operands.

Can reserve one or two bits in the instruction opcode for 2-byte encoding where rare types or high indices are used. Leading 1 on operand type could indicate a inline constant

Huffman encoding registers

We could take this even further for a much greater space saving, by encoding the exact registers (type + index).

   entry  | register
---------------------
0000 0000 | u32:0
0000 0001 | u32:1
0000 0010 | u32:2
0000 0011 | u32:3
0000 0100 | u64:0
0000 0101 | u64:1
0000 0110 | i32:0
0000 0111 | i32:1
0000 1000 | i32:2
0000 1001 | f32:0
0000 1010 | f32:1
0000 1011 | u32x8:0

This causes an 4-byte encoding of:

add    | 0000 0010 <- add opcode
u32:2  | 0000 0010
u32:1  | 0000 0001
u32:0  | 0000 0000

The cost of non-uniform encodings is no longer an issue (Still 4 bytes):

add   | 0000 0010 <- add opcode
u64:0 | 0000 0100
u32:1 | 0000 0001
u32:2 | 0000 0010

However this would require a decently large table at the start of the file, and decoding logic would also be more complex, if you wanted to store values of the same type contiguously.

Initialised data

In NASM, its possible to define pre-initialised data in a file, which gets put in the .data segment:

x db "hello\0"

This is especially useful for defining static strings, and it would be nice to be able to do this in UMC.

A simple syntax would be:

.x: "hello\0"

However this would then be confusing / mixing memory addresses versus instruction addresses.

Instead, we could introduce a memory label:

&x: "hello\0" ; A null terminated string constant

&y: [0x01, 0x02, 0x03] ; Arbitrary bytes

And this could only be used as a memory address constant:

load u8:0, &x

Keyboard shortcuts

Universal Machine Code Docs