Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Introduction

What is Universal Machine Code?

UMC Instructions

Misc

MneumonicOperand 1Operand 2Description
movRegAnySets op1 = op2
castRegAnySets op1 = op2, casting
bcastInteger RegRegBit / Reinterpret cast
absNumericSigned NumericGets positive value
nopDoes nothing
dbgRegPrints the value and register for debugging

Arithmetic

MneumonicOperand 1Operand 2Operand 3Description
addNumeric RegNumericNumericop1 = op2 + op3
subNumeric RegNumericNumericop1 = op2 - op3
mulNumeric RegNumericNumericop1 = op2 * op3
divNumeric RegNumericNumericop1 = op2 / op3
modNumeric RegNumericNumericop1 = op2 mod op3

Jumps, Calls and Conditional Branches

MneumonicOperand 1Operand 2Description
jmpLabelUnconditional jump to the given label
jalLabelInstr RegJumps to the given label, setting the instruction register to the next instruction
bzLabelNumericJump to label if op2 is zero
bnzLabelNumericJump to label if op2 is not zero

Comparison Operations

Comparison operations take an unsigned destination register (minimum width 1), and sets the destination to 1 if true, otherwise sets to 0. Comparisons are only allowed between the same register sets.

MneumonicOperand 1Operand 2Operand 3Description
eqUnsigned RegNumericNumeric\(op2 = op3\)
gtUnsigned RegNumericNumeric\(op2 \gt op1\)
gteUnsigned RegNumericNumeric\(op2 \ge op3\)
ltUnsigned RegNumericNumeric\(op2 \lt op3\)
lteUnsigned RegNumericNumeric\(op2 \le op3\)

Bitwise Operations

All (signed and unsigned) integers are defined to be stored in twos-complement representation. The following bitwise operations work only between integer types.

MneumonicOperand 1Operand 2Operand 3Description
andInteger RegIntegerIntegerBitwise AND
orInteger RegIntegerIntegerBitwise OR
xorInteger RegIntegerIntegerBitwise XOR
notInteger RegIntegerIntegerBitwise NOT

Memory

Blocks of memory can be allocated with the alloc instruction.

MneumonicOperand 1Operand 2Description
allocAddress RegUnsignedAllocates a continguous block of memory X bytes long
freeAddress RegDe-allocates a block of memory
loadRegAddress RegLoads the given register from the address
storeAddress RegAnyStores a value into the address
msizeUnsigned RegSets op1 to the size of a memory address in bytes
isizeUnsigned RegSets op1 to the size of a instruction address in bytes

UMC Registers

There are a unlimited number of registers in UMC. Each type (including width) is a separate set of registers.

Register Types

Register SetSyntaxWidthsDescription
Unsigned Intu32:0, u8:0, uX:0AnyUnsigned Integers
Signed Inti32:0, i64:0AnySigned Integers
Floatsf32:0, f64:032/64Floating Point Numbers
Memory Addressm:0, m:1At least 32-bitMemory location
Instruction Addressn:0, n:1At least 32-bitInstruction location

Vector registers

All register types can be extended to a vector register type.

Register SetSyntaxDescription
Unsigned Intu32x4:0Vector of 4 Unsigned 32-bit Integers
Signed Inti64x4:0Vector of 4 Signed 64-bit Integers
Floatf32x8:1Vector of 8, 32-bit Floats
Memory Addressmx8:0Vector of 8 memory addresses
Instruction Addressnx8:0Vector of 8 instruction addresses

Binary Format

This RFC defines the binary format for UMC.

All UMC binary files begin with the following 16-byte magic data:

Magic: 0x7F 0x55 0x4D 0x43 0x20 0x42 0x79 0x74 0x65 0x64 0x6F 0x64 0x65 0x00 0x00 0x00

This translates to the following ascii string: \x7FUMC Bytecode\0\0\0

All UMC binary files are then immediately followed by two bytes describing the UMC binary format version:

Major Version: 1 byte
Minor Version: 1 byte

If a UMC Virtual Machine encounters an incompatible major version, it should refuse to execute the program. If a UMC Virtual Machine encounters a minor version higher than what is supported for that major version by the Virtual Machine, then it should refuse to execute the program. Lower minor versions than are supported by the UMC VM for a particular major version will work correctly on VMs that correctly follow the UMC Specification for a higher minor version.

Little Endian Base 128 (LEB128)

The UMC Binary File format extensively uses LEB128 for integers.

Version 0.3

Type Header Table

The first header found in a 1.0 UMC Binary file contains the register type header, which maps type indices to types

This table first starts with the number of entries:

Entry Count: Unsigned LEB128

Followed by each entry in series:

Control Byte:
  - Length Flag:   1 bit
  - Constant Flag: 1 bit
  - Reserved (0):  5 bits
  - Register Type: 3 bits
Width: Unsigned LEB128
Length: Unsigned LEB128 (Only if indicated in the control byte)

Duplicate entries are currently not permitted, as they may be significant in future versions. Duplicate entries are permitted for register entries (constant flag not set). A instruction references a different register if it has a different type index, even if the register index is the same. This allows using encoding more registers of the same type in the same number of bytes. For example, it allows 256 unique register operands to be encoded instead of 128 in two bytes.

The register types are encoded as follows:

Register TypeEncoding
Unsigned Integer0b000
Signed Integer0b001
Float0b010
Memory Address0b011
Instruction Address0b100

Pre-initialised memory Table

UMC supports pre-initialised memory regions which can be referenced by a memory label, which is encoded as a memory constant. The pre-initialised memory table immediately follows the RT Header, and begins with the number of entries:

Entry Count: Unsigned LEB128

Then for each entry is stored contiguously:

Data Length: Unsigned LEB128
Data: Variable number of bytes

Instruction Encoding

Immediately after the type header table, are all instructions, one after the other.

The first byte of each instruction is the opcode:

Op Code: 1 byte
Op CodeValue
NOP0b000000
MOV0b000001
ADD0b000010
SUB0b000011
MUL0b000100
DIV0b000101
MOD0b000110
JMP0b001000
JAL0b001001
BZ0b001010
BNZ0b001011
EQ0b001100
GT0b001101
GTE0b001110
AND0b010000
OR0b010001
XOR0b010010
NOT0b010011
ALLOC0b100000
FREE0b100001
LOAD0b100010
STORE0b100011
SIZE0b100100
CAST0b110001
ECALL0b110100
DBG0b111111

Fixed Instruction Encoding

The majority of instructions are encoded using this format, since for a given op code they have fixed number of arguments. Each operand begins with its type:

Operand Type: Unsigned LEB128

This operand type is an index in the Type Header Table. Depending on this entry, one of the following values are then decoded:

Register operands are encoded as:

Register Index: Unsigned LEB128

Unsigned integer constants are encoded as:

Constant Value: Unsigned LEB128

Signed integer constants are encoded as:

Constant Value: Signed LEB128

Float constants are encoded as:

Constant Value: 4 or 8 bytes

The number of bytes for the constant value is dictated by the width’s bytes, rounded up. The only supported float constant types are f32 and f64, which use 4 and 8 byte constants respectively.

Memory Label Constants are encoded as:

Constant Value: Unsigned LEB128

These refer to an index in the pre-initialised memory table.

Instruction Label Constants are encoded as:

Constant Value: Unsigned LEB128

Variable Instruction Encoding

This encoding is used for the ECALL opcode. The variable instruction encoding first starts with the operand count:

Operand Count: Unsigned LEB128

And then the encoding is the same as the fixed instruction encoding.

Sizeof Instruction Encoding

msize and isize are encoded like a fixed instruction with two operands:

  • Register Operand, but with no index
  • Unsigned Register Operand

Environment Calls

Environment calls are like system calls in UMC. An environment call requests the the VM perform privilleged, (likely platform specific) action on its behalf.

UMC External Call Table

NameidDetailsShort Description
exit0x0exitHalt the UMC VM with the given exit code
open0x1openOpen a file
close0x2closeClose a file
read0x3readRead bytes from a file
write0x4writeWrite bytes to a file
getarg0x10getargRetrieve an arbitrary bytes argument passed to the UMC VM by index

External Call List

exit

Exits with the given exit code

Parameters:

  • Result Register (Ignored)
  • External Call Code
  • Exit Code
ecall u1:0, 0x0, #1 ; Exit the VM with exit code 1

open

Open attempts to open a file with the given filename. On success, it returns a positive integer filename handle

Parameters:

  • Result Register (signed integer)
  • External Call Code
  • Null terminated filename (memory address)
ecall i32:1, 0x1, m:0

close

Close the given file. On success, 1 is stored into the result register Parameters:

  • Result Register (unsigned, at least 1 bit)
  • External Call Code
  • File handle (unsigned integer)
ecall u1:0, 0x2, u32:1

read

Read up to the given number of bytes from the file. On success, the number of bytes read is stored into the result register. Parameters:

  • Result Register (unsigned)
  • External Call Code
  • File handle (unsigned integer)
  • Buffer (memory address)
  • Maximum number of bytes to read (unsigned integer)
ecall u64:1, 0x3, u32:0, m:1, u64:0

write

Write the given number of bytes from a memory address into a file Stores the number of bytes successfully written into the result register Parameters:

  • Result Register (unsigned)
  • External Call Code
  • File handle (unsigned integer)
  • Buffer (memory address)
  • Maximum number of bytes to write (unsigned integer)
ecall u64:0, 0x4, u32:0, m:2, u64:1

getarg

Retrieve an user-provided argument given the UMC Program at runtime.

The first argument is always a memory address of the null-terminated name of the program. The remaining arguments are provided by the user when the UMC VM is started.

All values are copies of the original arguments, and the same arguments can be retrieved multiple times in the program. Memory addresses are also “copies”, but values may be stored in them - this can allow returning parameters to the caller.

UMC programs that use this environment call should document their required parameters. Calling getarg with type or argument number that doesn’t match the provided arguments will cause the program to terminate.

Examples:

ecall m:0,   0x10, #0 ; Get 0th argument - null terminated name of the program
ecall m:1,   0x10, #1 ; Get first argument as a memory address
ecall u64:0, 0x10, #1 ; Get first argument as a 64-bit unsigned integer
ecall i64:1, 0x10, #2 ; Get second argument as a 64-bit signed integer
ecall f32:0, 0x10, #3 ; Get third argument as a 32-bit float

Basic Architecture

The original proposition of UMC is to create a register-based VM bytecode (as opposed to a stack-based VM). LuaJIT successfull implemented a register-based bytecode with significant performance improvements [1]

Syntax Proposal - Typed Register Sets (Accepted)

;;;;; FULLY TYPED REGISTER SETS PROTOTYPE

; Pros:
; No implicit casting since independent register sets
;  - Vector registers can't be reshaped
;  - No undefined behaviour
;  - Easy to implement registers as vecs

; Cons:
; Very verbose
;  - Can be mitigated by implicit register type
;    add u32:0, :0, :1


fib:
  ; i32:0 (n)
  b i32:0 < $0, .L4
  mov u32:0, $0 ; int cur = 0
  mov u32:1, $1 ; int next = 1
  mov u32:2, $0 ; int i = 0
.L3:
  ; u32:3 next
  mov u32:3, u32:2 ; int next2 = next
  add u32:3, u32:3, u32:0 ; next2 += cur
  mov u32:0, u32:1 ; cur = next
  mov u32:2, u32:3 ; next = next2
  add u32:2, u32:2, $1
  b u32:2 != n, .L3
.L4:
  mov i32:0, u32:0

sum:
  ; a:0   - Array of integers
  ; u32:0 - Length of the array
  b i32:0 == 0, .L8    ; If empty, terminate
  mov i32:1, $0        ; int total = 0
  shl u32:0, u32:0, $2 ; Multiply by 4 to get length in bytes
  add a:1, a:0, u0       ; end address = start address + length
.L8:
  load u32:2 a:0
  add i32:1, i32:1, u32:2
  add a:0, a:0, $4
  b   a:0 != a:1, .L8
.L9:
  mov i32:0, $0
  ret

;;;;;;;;;;; IMPLICIT MULTIPLE OPERANDS ;;;;;;;;;;;;

fib:
  ; i32:0 (n)
  b i32:0 < $0, .L4
  mov u32:0, $0 ; int cur = 0
  mov u32:1, $1 ; int next = 1
  mov u32:2, $0 ; int i = 0
.L3:
  ; u32:3 next
  mov u32:3, :2 ; int next2 = next
  add u32:3, :3, u32:0 ; next2 += cur
  mov u32:0, :1 ; cur = next
  mov u32:2, :3 ; next = next2
  add u32:2, :2, $1
  b u32:2 != n, .L3
.L4:
  mov i32:0, u32:0

sum:
  ; a:0   - Array of integers
  ; u32:0 - Length of the array
  b i32:0 == 0, .L8    ; If empty, terminate
  mov i32:1, $0        ; int total = 0
  shl u32:0, :0, $2 ; Multiply by 4 to get length in bytes
  add a:1, a:0, u32:0       ; end address = start address + length
.L8:
  load u32:2 a:0
  add i32:1, i32:1, u32:2
  add a:0, a:0, $4
  b   a:0 != a:1, .L8
.L9:
  mov i32:0, $0
  ret

;;;;;;;;; VECTOR INSTRUCTIONS ;;;;;;;;;;;;;

doublearray:
  load  u32x8:0, a:0
  mul   u32x8:0, u32x8:0, $2
  store u32x8:0, a:0

Original Syntax Proposal - Types don’t include widths

;;;;; Instructions

; Based off RISC-V basic instruction set
; https://projectf.io/posts/riscv-cheat-sheet/#arithmetic


; Pros:
; Large reduction in instructions achieved
; - Lots of instructions have a signed/unsigned variant - add, div, mul, even bit shift, cond. jump
; - Instructions overloaded, or only work on some types of registers
; - Simple idea with different applications
; - Register names (not numbers) describe their capabilities
; Specifying width for same registers matches hardware
;  -  Can be simulated if not big enough
; Vector registers having different name, to avoid implicit conversions
; Destination register gives type of operation

; Cons:
; Undefined behaviour to reshape vector registers?
; - Semantic checker for different acceses?
; Lots of implicit sign casting
; Lots of repeating the same widths?
; - Could use implicit from destination like:
;   add u0:32 u0: u1:

; Arithmetic
add u0:32, $100 ; Immediate
add i0:32, i1:32, i2:32  ; Registers, all same, ints
add f0:32, f1:32, f3:32  ; Registers, all same, floats
add i0:64, i1:64, u1:32  ; Mix of types and widths

cast i2:64, u1:32
add [i64] %0, %1, ~~~u1:32~~~

neg i1:32                ; cast to i1, then negate
sub i0:32, u1:32, u2:32  ; Can subtract unsigned numbers (can the result be signed?)

mul u1:32, u2:32, u3:32  ; Multiplication of several signed numbers
mul u1:64, u2:32, u3:32  ; Multiplication (completely safe)
mul i1:64, i2:32, u3:32  ; Signed Multiplication

div u3:32, u2:32, u1:32  ; Unsigned division
div i3:32, u2:32, i1:32  ; Signed division

; Bitwise logic
and i3:32, u2:32, i1:32  ; Just works on bits. Only width needs to match.
not i0:32, i0:32         ; same again
or  i0:32, i0:32         ; same again
xor i0:32, i0:32         ; same again

; Shifts

shl i0:32, i0:32, $2 ; Arithmetic left shift (x2)
shr u0:32, i0:32, $2 ; Logical shift (determined by destination)

shl i0:32, i1:32, i2:32 ; Arithmetic left shift
shr u0:32, i1:32, i2:32 ; Logical right shift (determined by destination)

; This is quite good, saves a few instructions

; Memory accessing
load u0:32, 0xDEADBEEF ; Loads a 32 bit value into the register
load u0:8,  0xDEADBEEF ; Loads a 8 bit value into the register

mov u0:8, 0xFF
stor 0xDEADBEEF, u0:8
stor 0xDEADBEEF, u0:32 ; Store 32 bits even though its 8 bit

; Immediate offset in load / store?
load u0:32, 4(0xDEADBEEF)

; Jumps

j    0xBEEF     ; Jump to immediate address
j    .LABEL     ; Jump to label (labels have type aX)
jal  a0, 0xBEEF ; Jump to immediate address, store next instruction address in a0
jalr a0, a1, $4 ; Jump to a1+4, store next instruction address in a0 (Should immediate be multiplied by instruction size?)

call .LABEL ; Jump to 
ret         ;


; Conditional branches
beq u0:32, i0:32, .LABEL ; Branch if the two are equal
bne u0:32, i0:32, .LABEL ; Branch if the two are not equal

bgt u0:32, i0:32, .LABEL ; Branch if greater than
bge u0:32, i0:32, .LABEL ; Branch if greater than or equal to

; Allow aX instead of label above?

;;;; zero-register, like ARM64 and RISC-V? Partly an encoding concern

seq u0:1, i0:32, i0:8 ; Set u0 to 1 if equal
sne u0:1, 

; Alternative?
add %0:u32, %1:i32, %2:i32

; Vector registers?
; iv, uv, fv, av avoids

add iv0:32x8 iv0:32x8, iv0:32x8

add av0:x8, av0:x8, $4

;;;; Fibonacci example ;;;;

; Problem - how are parameters passed? As u0,u1,.. by default?
fib:
  ; i0:32 (n)
  b i0:32 < $0, .L4
  mov u0:32, $0 ; int cur = 0
  mov u1:32, $1 ; int next = 1
  mov u2:32, $0 ; int i = 0
.L3:
  ; u3: next
  mov u3:32, u2:32 ; int next2 = next
  add u3:32, u3:32, u0:32 ; next2 += cur
  mov u0:32, u1:32 ; cur = next
  mov u2:32, u3:32 ; next = next2
  add u2:32, u2:32, $1
  b u2:32 != n, .L3
.L4:
  mov i0:32, u0:32
  ret

;;;; Sum array ;;;;
sum:
  ; a0 - Array of integers
  ; u0 - Length of the array
  b u0:32 == 0, .L8    ; If empty, terminate
  mov i1:32, $0        ; int total = 0
  shl u0:32, u0:32, $2 ; Multiply by 4 to get length in bytes
  add a1, a0, u0       ; end address = start address
.L8:
  load u2:32, a0
  add  i1:32, i1:32, u2:32
  add  a0, a0, $4
  b    a0 != a1, .L8
.L9:
  mov i0:32, $0
  ret

;;;;; Vector instructions - Double array
doublearray:
  ; Assume multiple of 4
  load  uv0:32x8, a0
  mul   uv0:32x8, uv0:32x8, $2
  store uv0:32x8, a0

; Pick operations
pick u1:32, uv1:32x4, 0
pick u2:32, uv1:32x4, 1
pick u3:32, uv1:32x4, 2
pick u4:32, uv1:32x4, 3

pick uv1:32x4, uv0:32x8, 0 ; Take first 4
pick uv2:32x4, uv0:32x8, 4 ; Take second 4

; Put (reverse pick) operations
pick uv1:32x4, u1:32, 0

; Pick alternative - similar to ARM
mov u0:32, uv1:32x4[0]
mov u1:32, uv1:32x4[1]
mov u2:32, uv1:32x4[2]
mov u3:32, uv1:32x4[3]

mov uv2:32x2, uv1:32x4[2]

; Broadcast
brdc uv1:32x4, 0


; MUL V0.4S, V2.4S, V3.S[2] multiplies each of the four 32-bit elements in V2 by the 32-bit scalar value in lane 2 of V3, storing the result vector in V0.
; https://developer.arm.com/documentation/102474/0100/Fundamentals-of-Armv8-Neon-technology/Registers--vectors--lanes-and-elements

; is there any point having a stack pointer register?
; special case of a-type register

; How to detect overflow??
; add u3:32, u2:32, u1:32 : $overflow > u4:1

Alternative Syntax Proposal - No types

add  r0:32, $100
adds r0:32, r1:32, r2:32 $-100

neg r0:32, r0:32
sub r3:32, r2:32, r1:32 # r3 = r2 - r1, signed/unsigned doesn't matter

mul  r3:32, r2:32, r1:32
muls r3:32, r2:32, r1:32
mul  r3:64, r2:32, r2:32
muls r3:64, r2:32, r2:32

div  u3:32, u2:32, u1:32
divs u3:32, u2:32, u1:32

; Bitwise - Sign not relevant
and r3:32, r2:32, r1:32
not r3:32, r2:32, r1:32
or  r3:32, r2:32, r1:32
xor r3:32, r2:32, r1:32

; Shifts

sll r3:32, r2:32, r1:32 ; Logical left shift
slr r3:32, r2:32, r1:32 ; Logical right shift

sra r3:32, r2:32, r1:32 ; Arithmetic left shift
; No arithmetic right shift

load  r3:32, 0xDEADBEEF ; Load 32 bits into register
store 0xDEADBEEF, r3:32 ; Store 32 bits into memory

Alternative Syntax Proposal - Typed Instructions

;;;; TYPED INSTRUCTION PROTOTYPE

; Pros:
; No implicit casting since independent register sets
;  - Vector registers can't be reshaped
;  - No undefined behaviour

; Cons:
; %1 and %1 in different instructions, makes it hard to read
;  - This code is mostly going to be generated anyway
; Some instructions are supposed to have different types:
;  - load [u32] a0
;  - vector broadcast

; Memory leak of registers?

fib:
  ; i0:32 (n)
  b   [i32] %0 < #0, .L4
  mov [u32] %0, #0 ; int cur = 0
  mov [u32] %1, #1 ; int next = 1
  mov [u32] %2, #0 ; int i = 0
.L3:
  ; u3: next
  mov [u32] %3, %2     ; int next2 = next
  add [u32] %3, %3, #0 ; next2 += cur
  mov [u32] %0, #1     ; cur = next
  mov [u32] %2, %3     ; next = next2
  add [u32] %2, %2, #1 ; i++
.L4:
  mov [i32] %0, {u0:32}
  ret


sum:
  ; a0 - Array of integers
  ; u0 - Length of the array
  b    [u32] %0 == 0, .L8 ; If empty, terminate
  mov  [i32] %1, #0       ; int total = 0
  shl  [u32] %0, %0, #2   ; Multiply by 4 to get the length in bytes
  add  [a]   %1, %0, {u0:32} ; end address = start address + length
.L8:
  load [u32] %2, [a0] ; Load the memory address
  add  [i32] %1, %1, {u2:32}
  add  [a]   %1, %0, {u0:32}
  b    [a]   %0 == %1, .L8
.L9:
  mov [i32]  %0, #0
  ret

External calls / Syscalls in UMC

; External calls in UMC, function like exact order
ecall retreg, .LABEL?, u32:1, u32:0, f32:3, i32:1,

System V x86-64 ABI Translation

rdi <= u32:1
rsi <= u32:0
rdx <= i32:1

xmm0 <= f32:3

; System V can only return rax, but this is interpreted based on ecall
; e.g. ecall a:0   ; interpret as address
; e.g. ecall u32:0 ; interpret as u32
; e.g. ecall i64:0 ; interpret as i64
ret

UMC External call

.LABEL:
  u32:0 <= u32:1
  u32:1 <= u32:0

  f32:0 <= f32:3

  i32:0 <= i32:1

  ret X ; retreg <= X

Explicit and Implicit Casting

tl;dr: Widening is allowed, but all other casts must be explicit. Calculations are always done in the domain of the domain of the destination register.

Examples

Examples of what is allowed:

add u32:0, u32:0, u32:0 ; Great
add u64:0, u32:0, u32:0 ; Implicit widening cast

; Alowed, and recommended for large multiplications
; Would translate as multiply low and high
mul u64:0, u32:0, u32:0
mul u128:0, u64:0, u64:0

; Comparisons can use any width unsigned register
gt u1:0, u32:0, u32:0
gt u1:0, u64:0, u32:0 ; 64-bit comparison

; Use explicit casts where necessary

; Do 32-bit subtraction
mov u32:1, u64:0 ; Require u64:0 to fit in 32 bits
sub u32:2, u32:1, u32:0
; Do 64-bit subtraction, expect the result to fit in 32 bits
sub u64:2, u64:1, u32:0
mov u32:2, u64:2

mov u32:0, i32:0
gt u1:0, u32:0, u32:1 ; Performs unsigned comparison: Weird if i32:0 is negative

mov i32:0, i32:0
gt u1:0, i32:0, i32:0 ; Performs signed comparison: Weird u32:0 uses the top bit

What is not allowed

; This is not allowed, result depends on cast before vs after, confusing
sub u16:0, u32:0, u32:0

; Confusing comparisons
gt u1:0, u32:0, i32:0 ; This is actually Undefined behaviour in C when i32:0 is negative
; E.g. 0b1000, 0b0001 => Signed comparison interprets u32:0 as smallest negative
; E.g. 0b0000, 0b1000 => Unsigned comparison interprets smallest negative as bigger than 0

Types of casts

What operations do we need to do?

  • Narrowing cast (signed, unsigned or float)
  • Unsigned <-> Signed (bit or value-based?)
  • Unsigned <-> Float (Value-based)
  • Signed <-> Float (Value-based)

bcast - Binary / Bitwise cast - for twos complement conversions/narrowing vcast - Value cast - Best effort at encoding the same value with the new datatype

bcast u32:0, u64:0 ; Truncation
bcast i32:0, i64:0 ; Truncation

bcast u32:0, i32:0 ; Reinterpret two's complement i32:0 as a u32:0.
bcast i32:0, u32:0 ; Reinterpret unsigned as two's complement - if top bit is set then it signed overflow occurs

vcast f64:0, u32:0 ; Take the unsigned integer and convert it into its closest floating point representation
vcast f64:0, i32:0 ; Take the signed integer and convert it into its closest floating point representation

vcast f32:0, f64:0 ; Take the value of the f64 and re-encode it as a f32 (precision lost)

; Explicit vcast with specific behaviour?
abs u32:0, i32:0 ; Take the signed integer, make it positive so it definitely fits correctly (no vcast for u32 and i32?)

; Maybe just a fcast with specific behaviour too?
fcast f64:0, u32:0 ; Re-encode as a float
fcast f64:0, i32:0 ; Re-encode as a float
fcast f32:0, f64:0 ; Re-encode as a different precision float (not a straightforward bit-cast)

brdc u32x8:0, u32:0 ; Broadcast u32 to vector of u32s.
; Does it make sense to allow every instruction to operate on vector elements?
; Or just allow mov to put/pick (Leaning towards this, otherwise it signficantly complicates the instruction set)

Rationale

Many instructions need to specify whether to operate in signed or unsigned form, such as comparisons. The top bit of an unsigned integer can be confused with a signed negative number.

Doing something like:

sub u32:0, u64:0, u32:1

Does not equate to a single instruction in x86, RISC-V or ARM. It must do a 32-bit or 64-bit operation, with a narrow/widen before/after

Problems with using extra registers

The idea is that the casting instruction will be eliminated during JIT compilation. However, the cast causes there to potentially be 2 registers used for the same value (one of which may outlive the other).

For example, if the following code were to be JIT compiled:

mov u32:1, u64:0
sub u32:2, u32:1, u32:0

it might require one register to store the 64 bit value, and another to store the 32-bit value (one of which may live much longer than the other). In more complex examples, with many more registers, this might cause registers to overflow, even though u32:1 is really not needed. However, if necessary this can be solved by clearing the register in UMC:

mov u32:1, u64:0
sub u32:2, u32:1, u32:0
mov u64:1, #0 ; If the u64:0 value is never needed

Now, only one register is needed if the instruction set supports accessing 64-bit registers as 32-bit. The UMC JIT can set u64:0 to #0 without an output register, as it knows this is guaranteed.

UMC Memory Model

  • Four Instructions: alloc and free, load and store
  • One register type with archecture-dependent width: a:0
  • add and sub allowed with one address register and one (un)signed integer “offset”

Examples

alloc a:0, 4
store a:0, u32:1 ; Store 4-byte integer into address register
load u32:1 ; Load it as a 32 bit integer
free a:0 ; Frees the memory allocated earlier. a:0 is now invalid

Rationale

Allocating and de-allocating memory is a common task, which if we are running in the UMC VM, can easily be delegated to the VM.

Questions:

  • How do we allocate a block of memory large enough for a memory address if size is machine-dependent? A constant?

UMC Bytecode Format

In UMC we have not only “unlimited” registers, but also “unlimited” register types. These need to be represented in a fixed amount of space, which requires a variable-length encoding of instructions.

Constraints:

  • Unlimited registers and register types
  • Minimum 64 instruction opcodes, ideally more or extensible

Register-based bytecodes are generally faster than stack-based ones, but spend significantly longer on “instruction-fetch”. So we want to use an efficient encoding to mitigate this, especially since we have unbounded size.

Register Types

  • Unsigned Int (+ width)
  • Signed Int (+ width)
  • Float (+ width)
  • Memory Address
  • Instruction Address

Plus vector types with a “length” This could fit in 4 bits, first bit as a flag for a vector. However we might also want a flag for a constant, at which point we’d need 8 bits (a full byte).

Basic/Naive encoding

Example of 1.5 byte type and 1 byte index

u32:1 | 0001 00010000 00000001 | 2.5 bytes
        ^^^^ ^^^^^^^^ ^^^^^^^^
    unsigned   |      |
              32      index 1

Example of vectorised register

u32x8:1 | 1001 00010000 0100 00000001
          ^^^^ ^^^^^^^^ ^^^^ ^^^^^^^^
 unsigned vector  |      |      |
                  32     8    index

Example of a full u32-u32 instruction (8.5 bytes):

add     | 00000010 <- add opcode
u32:1,  | 0001 00010000 00000001
u32:2,  | 0001 00010000 00000010
u32:3   | 0001 00010000 00000011

Some instructions have only one valid register type, or a constant, so they don’t necessarily need a type:

jmp | 00001000 <- jump opcode
n:0 | 00000000 <- jmp to register 0 value

jmp    | 00001000 <- Jump-constant instruction ideally?
.LABEL | 00001000

A lot of instructions are likely to be “uniform”, using the same types (for the registers that aren’t forced), so we could consider an encoding like (5.5 bytes):

add     | 00000010 <- add opcode
u32     | 0001 00010000
:1      | 0000 0001
:2      | 0000 0010
:3      | 0000 0011

However, using a full byte just to store the 32/64-part of u32/u64 is pretty wasteful, especially considering they are probably the most common type. Unfortunately the full range of u1-u32 is allowed, so we can’t simply store log2(width).

Huffman encoding of register types

Ultimately, the most common register types will be used massively more than the less common types.

I propose a Huffman encoding of types (with their widths / lengths!).

In the following example, a program uses 32-bit registers most often, plus some others used in UMC.

 entry    | register
--------------------
0000 0000 | u32
0000 0001 | i32
0000 0010 | u64
0000 0011 | i64
0000 0100 | f32
0000 0101 | f64
0000 0111 | u1 (used as a boolean in UMC)
0000 1001 | n (Instruction Address)
0000 1011 | m (Memory Address)
0000 1111 | u32x8 (example of vector registers, probably only a handful used)
... a few more application specific, maybe u128/i128, u8 etc.

0000 0000 0000 0001 | If necessary, for rare types, a 2-byte or 1.5-byte encoding can be used

Instruction and Memory address registers will only really be used in add/sub for offsetting, so won’t be too high, but should still fit in the table.

It might also be a good idea to ‘fix’ the first few entries of the table for types we know will be used, like u32, u64, i32, i64, u1, n and m to reduce how much binaries change and make things like combining two umc binary files easier.

Instructions now for non-uniform types (7 bytes):

add     | 0000 0010 <- add opcode
u32x8:1 | 0000 1111 0000 0001
u32x8:2 | 0000 1111 0000 0010
u32:2   | 0000 0000 0000 0010

Covers any 3 operand types with a range of 256 different registers.

Only 5 bytes needed for a uniform instruction

add     | 0100 0010 <- add opcode
u32     | 0000 0000 <- u32 index
:1      | 0000 0001
:2      | 0000 0010
:3      | 0000 0011

Re-indexing with the most common as lower numbers can also be used to ensure the most common register indexes are packed tightly, without changing the result of the program. 1 byte is probably the minimum for a register index. Another advantage is that registers can be stored in an implementation-dependent way without decoding penalty, and we can just store a list of register operands.

Can reserve one or two bits in the instruction opcode for 2-byte encoding where rare types or high indices are used. Leading 1 on operand type could indicate a inline constant

Huffman encoding registers

We could take this even further for a much greater space saving, by encoding the exact registers (type + index).

   entry  | register
---------------------
0000 0000 | u32:0
0000 0001 | u32:1
0000 0010 | u32:2
0000 0011 | u32:3
0000 0100 | u64:0
0000 0101 | u64:1
0000 0110 | i32:0
0000 0111 | i32:1
0000 1000 | i32:2
0000 1001 | f32:0
0000 1010 | f32:1
0000 1011 | u32x8:0

This causes an 4-byte encoding of:

add    | 0000 0010 <- add opcode
u32:2  | 0000 0010
u32:1  | 0000 0001
u32:0  | 0000 0000

The cost of non-uniform encodings is no longer an issue (Still 4 bytes):

add   | 0000 0010 <- add opcode
u64:0 | 0000 0100
u32:1 | 0000 0001
u32:2 | 0000 0010

However this would require a decently large table at the start of the file, and decoding logic would also be more complex, if you wanted to store values of the same type contiguously.

Initialised data

In NASM, its possible to define pre-initialised data in a file, which gets put in the .data segment:

x db "hello\0"

This is especially useful for defining static strings, and it would be nice to be able to do this in UMC.

A simple syntax would be:

.x: "hello\0"

However this would then be confusing / mixing memory addresses versus instruction addresses.

Instead, we could introduce a memory label:

&x: "hello\0" ; A null terminated string constant

&y: [0x01, 0x02, 0x03] ; Arbitrary bytes

And this could only be used as a memory address constant:

load u8:0, &x