Thursday, October 1, 2015

Toying around with LE PowerPC64 via the PowerNV QEMU

I've validated that my ppc64le_hello example runs on top of BenH's PowerNV QEMU tree. Runs really snappy!

The only thing that doesn't work is mixed page-size segment support (MPSS, like 16MB in a 4K segment). QEMU does not support MPSS at the moment. Also, QEMU does not implement any of the IBM simulator's crazy Mambo calls.

Monday, July 13, 2015

Toying around with LE PowerPC64 via the Power8 simulator

ppc64le_hello is a simple example of what it takes to write stand-alone (that is, system or OS) code that runs in Little-Endian and Hypervisor modes on the latest OpenPOWER/Power8 chips. Of course, I don't have a spare $3k to get one of these nice Tyan reference systems, but IBM does have a free, albeit glacially slow and non-OSS, POWER8 Functional Simulator.

What you get is a simple payload you can boot via skiboot, or another OPAL-compatible firmware. Features, in no particular order:
  • 64-bit real-mode HV LE operation.
  • logging via sim interface (mambo_write).
  • logging via OPAL firmware (opal_write).
  • calling C code, stack/BSS/linkage setup/TOC.
  • calling BE code from LE.
  • FDT parsing, dumping FDT.
  • Taking and returning from exceptions, handling unrecoverable/nested exceptions.
  • Timebase (i.e. the "timestamp counter"), decrementer and hypervisor decrementer manipulation with some basic timer support (done for periodic callbacks into OPAL).
  • Running at HV alias addresses (loaded at 0x00000000200XXXXX, linked at 0x80000000200XXXXX). The idea being that the code will access physical RAM and its own data structures solely using the HV addresses.
  • SLB setup: demonstrates 1T segments with 4K and 16M base page sizes. One segment (slot = 0) is used to back the HV alias addresses with 16M pages. Another segment maps EA to VA 1:1 using 4K pages.
  • Very basic HTAB setup. Mapping and unmapping for pages in the 4K and 16M segments, supporting MPSS (16M pages in the 4K segment). No secondary PTEG. No eviction support. Not SMP safe. Any access within the HV alias addresses gets mapped in. Any faults to other unmapped locations are crashes, as addresses below 0x8000000000000000 should only ever be mapped explicitly.
  • Taking exception vectors with MMU on at the alternate vector location (AIL) 0xc000000000004000.
  • Running unprivileged code.
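To make the address layout above a bit more concrete, here is a small Python model (a sketch, not the actual ppc64le_hello code, which is assembly and C). It assumes 1T segments, and that an HV alias address is simply the physical address with the top bit set, as the load/link addresses above suggest. `hv_alias_fault` is a hypothetical helper mirroring the fault policy from the list; actually writing SLB entries and PTEs is left out.

```python
HV_ALIAS_BIT = 1 << 63          # 0x8000000000000000
SEGMENT_SHIFT_1T = 40           # 1T segments: EA bits below 40 are the segment offset
PAGE_SHIFT_4K, PAGE_SHIFT_16M = 12, 24


def esid(ea):
    """Effective segment ID of the 1T segment containing ea."""
    return ea >> SEGMENT_SHIFT_1T


def page_number(ea, page_shift):
    """Index of the page containing ea, counted from the start of its segment."""
    return (ea & ((1 << SEGMENT_SHIFT_1T) - 1)) >> page_shift


def hv_alias_fault(ea):
    """Hypothetical fault policy: a fault on an HV alias address maps the
    surrounding 16M page onto the same physical address minus the alias bit;
    a fault anywhere else is fatal."""
    if not ea & HV_ALIAS_BIT:
        raise RuntimeError(f"fault at unmapped {ea:#018x}")
    page_ea = ea & ~((1 << PAGE_SHIFT_16M) - 1)
    return page_ea, page_ea & ~HV_ALIAS_BIT    # (effective, real) page addresses


ea = 0x8000000020001234         # an address inside the linked image
print(hex(esid(ea)), page_number(ea, PAGE_SHIFT_16M))   # 0x800000 32
print([hex(a) for a in hv_alias_fault(ea)])             # ['0x8000000020000000', '0x20000000']
```

Keeping the alias as "physical plus one bit" is what lets the same addresses work both in real mode and once the HTAB mappings are in place.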
See README for more information, including how to build and run. At some point it ran on a real Power8 machine - and may run still ;-).

Monday, July 6, 2015

DOES> in Jonesforth

Jonesforth 47 quoth:
NOTES ----------------------------------------------------------------------

DOES> isn't possible to implement with this FORTH because we don't have a separate
data pointer.

Thankfully, that's not true. The following is a tad AArch32-specific, given that I am playing with pijFORTHos, but the principle remains the same. Let's first look at how DOES> gets used.
This creates a word MKCON that, when invoked like:
1337 MKCON PUSH1337
...creates a new word PUSH1337 that will behave as if it were defined as:
: PUSH1337 1337 ;
Recall the CREATE...;CODE example. DOES> is very similar to ;CODE, except that you want Forth words invoked, not native machine words. With CREATE...;CODE, native machine words are embedded in the defining word after ;CODE; with CREATE...DOES>, it will be Forth words instead. So if we had no DOES> word, we could write something like:
...where $DODOES is the machine code generator word that creates the magic we've yet to figure out. $DODOES needs to behave like a mix between DOCOL and NEXT, that is, it adjusts FIP (the indirect threaded code instruction pointer, pointing to the next word to execute) to point past $DODOES to the @ word. The DFA of the CREATEd word (i.e. PUSH1337) is put on the stack, so @ can read the constant (1337) out. This means the simplest CREATE...DOES> example is:
...because we need to clean up the DFA for ADUMMY that is pushed on its invocation. Anyway, we could thus define DOES> like:
Let's look at two ways of implementing $DODOES. Way 1 - fully inline. The address of the Forth words (the new FIP) is calculated by skipping past the bits emitted by $DODOES.
        .macro COMPILE_INSN, insn:vararg
        .int LIT
        \insn
        .int COMMA
        .endm

        .macro NEXT_BODY, wrap_insn:vararg=
        \wrap_insn ldr r0, [FIP], #4
        \wrap_insn ldr r1, [r0]
        \wrap_insn bx  r1
        .endm
@ A CREATE...DOES> word is basically a special CREATE...;CODE
@ word, where the forth words follow $DODOES. $DODOES thus
@ adjusts FIP to point right past $DODOES and does NEXT.
@ You can think of this as a special DOCOL that sets FIP to a
@ certain offset into the CREATE...DOES> word's DFA. This
@ version is embedded into the DFA so finding FIP is
@ as easy as moving FIP past itself.
@ - Just like DOCOL, we enter with CFA in r0.
@ - Just like DOCOL, we need to push (old) FIP for EXIT to pop.
@ - The forth words expect DFA on stack.
        .macro DODOES_BODY, magic=, wrap_insn:vararg=
0:      \wrap_insn PUSHRSP FIP
1:      \wrap_insn ldr FIP, [r0]
        \wrap_insn add FIP, FIP, #((2f-0b)/((1b-0b)/(4)))
        \wrap_insn add r0, r0, #4
        \wrap_insn PUSHDSP r0
        NEXT_BODY \wrap_insn
2:
        .endm
@ $DODOES ( -- ) emits the machine words used by DOES>.
        .int EXIT

Way 2 - partly inline, where the emitted code does an absolute branch and link. This reduces the amount of memory used per definition at the cost of a branch. This is the solution I ultimately adopted. _DODOES calculates the new FIP by adjusting the return address from the branch-and-link done by the inlined bits.
_DODOES:
        PUSHRSP FIP        @ just like DOCOL, for EXIT to work
        mov FIP, lr        @ FIP now points to label 3 below
        add FIP, FIP, #4   @ skip past the .long _DODOES literal
        add r0, r0, #4     @ r0 was CFA, now DFA
        PUSHDSP r0         @ need to push DFA onto stack
        NEXT

        .macro DODOES_BODY, wrap_insn:vararg=
1:      \wrap_insn ldr r12, . + ((3f-1b)/((2f-1b)/(4)))
2:      \wrap_insn blx r12
3:      \wrap_insn .long _DODOES
        .endm

@ $DODOES ( -- ) emits the machine words used by DOES>.
        .int EXIT
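The offset expressions in both DODOES_BODY variants look odd because the same macro is assembled two ways: bare, and wrapped by COMPILE_INSN, where each instruction becomes three cells (.int LIT, the instruction, .int COMMA). Dividing a label distance by ((1b-0b)/4), i.e. the size of one wrapped instruction in cells, recovers the packed size of the code that actually gets emitted. A quick Python check of that arithmetic, assuming 4-byte instructions and the instruction counts read off the macros above (the helper names are made up):

```python
CELL = 4          # every ARM instruction (and every Forth cell) is 4 bytes
WRAP_CELLS = 3    # COMPILE_INSN turns one insn into: .int LIT / insn / .int COMMA


def assembled_span(n_insns, wrapped):
    """Bytes between two labels n_insns apart, as the assembler sees them."""
    return n_insns * (WRAP_CELLS if wrapped else 1) * CELL


def scaled(n_insns, wrapped):
    """(target - start) / ((1b - 0b) / 4), where 1b - 0b spans one instruction."""
    return assembled_span(n_insns, wrapped) // (assembled_span(1, wrapped) // CELL)


# Way 1: 0b..2f covers PUSHRSP, ldr, add, add, PUSHDSP and the 3 NEXT insns.
way1_skip = scaled(8, wrapped=True)
# Way 2: 1b..3f covers the ldr and the blx; the literal sits right after them.
way2_target = scaled(2, wrapped=True)
# A32 reads PC as the current instruction + 8, so the ldr's encoded offset is:
way2_imm = way2_target - 8

print(way1_skip, way2_target, way2_imm)   # 32 8 0
```

Wrapped or not, the results are the same: 32 is the #0x20 in E28AA020, and an encoded offset of 0 is what E59FC000 (ldr r12, [pc, #0]) carries in the SEE dumps below.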
In either case, just like DOCOL, we need to push the old FIP before calculating the new one. The old FIP corresponds to the address within the word that called the DOES>-created word. In both cases we also need to push the DFA of the executing word onto the stack (on AArch32 Jonesforth the CFA arrives in r0, so the DFA is r0 + 4).
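Here is a toy Python model of what the $DODOES code does at run time, just to show the FIP and stack juggling. It's a sketch under a few assumptions: memory is a dict of 4-byte cells, "machine code" is a Python function looked up by code address, the CFA is handed to it the way r0 is, and MKCON's DOES> part is assumed to be just @ followed by EXIT. `VM`, `make_dodoes` and the other names are hypothetical.

```python
CELL = 4


def _fetch(vm, cfa):          # @ ( addr -- x )
    vm.dstack.append(vm.mem[vm.dstack.pop()])


def _exit(vm, cfa):           # EXIT: resume the caller
    vm.fip = vm.rstack.pop()


def _halt(vm, cfa):           # stops the toy NEXT loop
    vm.running = False


class VM:
    """Toy indirect-threaded Forth: a word's CFA cell holds a code address and
    `native` maps code addresses to Python stand-ins for machine code."""

    def __init__(self):
        self.mem, self.native = {}, {}
        self.here, self.next_code = 0x1000, 0x100
        self.dstack, self.rstack = [], []
        self.fip, self.running = 0, False
        self.fetch = self.prim(_fetch)
        self.exit = self.prim(_exit)
        self.halt = self.prim(_halt)

    def comma(self, value):                     # Forth ,
        self.mem[self.here] = value
        self.here += CELL
        return self.here - CELL

    def prim(self, handler):                    # native word: CFA -> handler
        code, self.next_code = self.next_code, self.next_code + CELL
        self.native[code] = handler
        return self.comma(code)

    def run(self, cfa, scratch=0x8000):
        """Execute one word from a throwaway thread that ends in HALT."""
        self.mem[scratch], self.mem[scratch + CELL] = cfa, self.halt
        self.fip, self.running = scratch, True
        while self.running:
            cfa = self.mem[self.fip]                # ldr r0, [FIP], #4
            self.fip += CELL
            self.native[self.mem[cfa]](self, cfa)   # ldr r1, [r0]; bx r1


def make_dodoes(code_cells):
    """Stand-in for the code $DODOES emits. code_cells is the size of that code
    (8 cells for Way 1, 3 for Way 2); the Forth words after DOES> follow it."""
    def dodoes(vm, cfa):
        vm.rstack.append(vm.fip)                    # PUSHRSP FIP, as in DOCOL
        vm.fip = vm.mem[cfa] + code_cells * CELL    # skip past the emitted code
        vm.dstack.append(cfa + CELL)                # push the DFA (CFA + 4)
    return dodoes


vm = VM()
# Assumed tail of MKCON: the DOES> machine code (Way 2 size), then @ EXIT.
does_code = vm.here
for _ in range(3):
    vm.comma(0)               # placeholders for ldr r12 / blx r12 / .long _DODOES
vm.native[does_code] = make_dodoes(3)
vm.comma(vm.fetch)
vm.comma(vm.exit)

# What "1337 MKCON PUSH1337" leaves behind: the codeword, then the constant.
push1337 = vm.comma(does_code)    # CFA -> the DOES> code inside MKCON
vm.comma(1337)                    # DFA holds the constant

vm.run(push1337)
print(vm.dstack)                  # [1337]
```

Note that the return stack ends up empty again: the EXIT compiled after @ pops the FIP that the DOES> code pushed, exactly like a colon definition.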

Finally, in both cases the CREATE...DOES> word is indistinguishable from a CREATE...;CODE word, and the created word is indistinguishable from a word created by a CREATE...;CODE word.
\ This is the CREATE...;CODE $DOCON END-CODE example from before.
: MKCON WORD CREATE 0 , , ;CODE ( MKCON+7 ) E590C004 E52DC004 E49A0004 E5901000 E12FFF11 (END-CODE)

\ Fully inlined CREATE...DOES>.
: MKCON_WAY1 WORD CREATE 0 , , ;CODE ( MKCON_WAY1+7 ) E52BA004 E590A000 E28AA020 E2800004 E52D0004 E49A0004 E5901000 E12FFF11 9714 938C (END-CODE)

\ Partly-inlined CREATE...DOES>. 
: MKCON_WAY2 WORD CREATE 0 , , ;CODE ( MKCON_WAY2+7 ) E59FC000 E12FFF3C 9F64 9714 938C (END-CODE)
This makes decompiling (i.e. SEE) a bit tricky, but not impossible. As you can see here, I haven't written a good disassembler yet, which would detect these sequences as $DOCON. IMHO this is still a lesser evil than introducing new fields or flags into the word definition header.
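For the disassembler that doesn't exist yet, detection could be as simple as matching the exact instruction sequences shown above. A hypothetical Python sketch, with the patterns copied straight from the dumps (so they are tied to this build's register assignment and to the #0x20 offset in Way 1):

```python
# Sequences copied from the SEE dumps above (FIP = r10, RSP = r11, DSP = sp).
NEXT = (0xE49A0004, 0xE5901000, 0xE12FFF11)        # ldr r0,[r10],#4; ldr r1,[r0]; bx r1
DOCON = (0xE590C004, 0xE52DC004)                   # ldr r12,[r0,#4]; str r12,[sp,#-4]!
DODOES_WAY1 = (0xE52BA004, 0xE590A000, 0xE28AA020,
               0xE2800004, 0xE52D0004) + NEXT      # fully inline $DODOES
DODOES_WAY2 = (0xE59FC000, 0xE12FFF3C)             # ldr r12,[pc,#0]; blx r12


def dodoes_length(rest):
    """Cells used by either $DODOES variant at the start of rest, else 0."""
    if rest[:len(DODOES_WAY1)] == DODOES_WAY1:
        return len(DODOES_WAY1)
    if rest[:len(DODOES_WAY2)] == DODOES_WAY2:
        return len(DODOES_WAY2) + 1            # plus the .long _DODOES literal
    return 0


def describe(cells):
    """Name the known idioms in the cells a CFA points at; hex for the rest."""
    out, i = [], 0
    while i < len(cells):
        rest = tuple(cells[i:])
        skip = dodoes_length(rest)
        if skip:
            # Whatever follows the $DODOES code is ordinary threaded code (CFAs).
            return out + ["DOES>"] + [f"{c:X}" for c in cells[i + skip:]]
        if rest[:len(NEXT)] == NEXT:
            out.append("$NEXT")
            i += len(NEXT)
        elif rest[:len(DOCON)] == DOCON:
            out.append("$DOCON")
            i += len(DOCON)
        else:
            out.append(f"{cells[i]:X}")
            i += 1
    return out


print(describe([0xE590C004, 0xE52DC004, 0xE49A0004, 0xE5901000, 0xE12FFF11]))
# ['$DOCON', '$NEXT']
print(describe([0xE59FC000, 0xE12FFF3C, 0x9F64, 0x9714, 0x938C]))
# ['DOES>', '9714', '938C']
```

Turning the trailing CFAs back into names would still need a dictionary lookup, which is the part SEE already knows how to do for colon definitions.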

P.S. Defining constants is a classical example of using DOES>, but a bit silly when applied to Jonesforth, where it's an intrinsic. It's an intrinsic so that certain compile-time constants, known only at assembler time, can be exposed to the Forth prelude and beyond. The other classical example of DOES> is struct-like definitions.

P.P.S. You might be wondering how I'm SEEing into code words, as neither Jonesforth nor pijFORTHos supports it. I guess I'll blog about that next real soon whenever... The ( CODEWORD XXX ) business here shows the "code word" pointed to by the CFA, which is necessarily not DOCOL (otherwise it would be a regular colon definition, not CODE). The ( CODEWORD word+offset ) notation tells you that the machine words pointed to by the CFA are part of a different word. Native (jonesforth.s-defined) intrinsics would decompile as something like:

Sunday, July 5, 2015

Implementing ;CODE in AArch32 Jonesforth for real

The Jonesforth ;CODE definition is unfortunately little more than a curiosity. After all, if you wanted to write a native machine word, you'd probably follow along and implement it inside jonesforth.s proper using the defcode macro. The real power of ;CODE would be in coupling it with the CREATE word, letting you have words that define other words.

That is, we want to be able to do something like:
    defword "$DOCON",F_IMM,ASMDOCON
        .int LIT            @ r0 points to CFA
        ldr r12, [r0, #4]   @ read cell from DFA (CFA + 4)
        .int COMMA
        .int LIT
        PUSHDSP r12         @ push to stack
        .int COMMA
        .int EXIT
    : MKCON
       WORD CREATE
       0 ,        ( push dummy codeword, rewritten by (;CODE) )
       ,          ( actual constant )
    ;CODE
       $DOCON
    END-CODE
    5 MKCON CON5  ( create word CON5 that will push 5 on stack )
    CON5 . CR     ( prints 5 )
So ;CODE is the variant to be used with CREATE, while the plain ol' make-me-a-native-word variant is called CODE. And both get to be matched with END-CODE, not semicolon. At least according to F83 or something. We're not trying to stick to any Forth standard, but the definitions have to be useful...right? So the ;CODE business now looks a bit different:
    \ This used to look like : FOO ;CODE
    CODE FOO END-CODE

    @ push r0 to stack
    defword "$<R0",F_IMM,ASMFROMR0
        .int LIT
        PUSHDSP r0
        .int COMMA
        .int EXIT
    @ push r7 to stack
    defword "$<R7",F_IMM,ASMFROMR7
        .int LIT
        PUSHDSP r7
        .int COMMA
        .int EXIT
    @ pop stack to r0
    defword "$>R0",F_IMM,ASMTOR0
        .int LIT
        POPDSP r0
        .int COMMA
        .int EXIT
    @ pop stack to r7
    defword "$>R7",F_IMM,ASMTOR7
        .int LIT
        POPDSP r7
        .int COMMA
        .int EXIT
    CODE SWAP $>R0 $>R7 $<R0 $<R7 END-CODE
    HEX 1337 F00F SWAP . . ( prints 1337 F00F )
So now for the actual definitions. It /looks/ pretty tame...but it took me a week to wrap my mind around it.
: (;CODE) R> LATEST @ >CFA ! ;
: CODE : (CODE) ;
Most interesting here is the behavior of ;CODE. Let's examine the example I gave first. ;CODE is an IMMEDIATE word that compiles (;CODE) into MKCON, followed by the machine code placed by generators like $NEXT or $DOCON. When MKCON is executed, (;CODE) updates the CFA of CON5 to point to the machine words inside MKCON that follow (;CODE), replacing the dummy codeword. The address of those machine words is, of course, on top of the return stack, since it's the cell immediately following (;CODE) in MKCON. Aaaaand because we pop that return address, (;CODE) EXITs not back into MKCON but to MKCON's caller, thereby not crashing on the crazy machine code placed by $DOCON.
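The same return-stack trick, modeled in Python. This is a sketch with loud assumptions: there are no dictionary headers, so LATEST holds the CFA directly and >CFA is dropped; WORD (the name parsing) is skipped; and a single cell with a Python handler stands in for the $DOCON + NEXT machine code. `VM`, `colon` and the other helpers are hypothetical.

```python
CELL = 4


def docol(vm, cfa):
    vm.rstack.append(vm.fip)                  # like PUSHRSP FIP
    vm.fip = cfa + CELL                       # body follows the codeword


class VM:
    """Toy indirect-threaded Forth. A word's CFA cell holds a code address;
    `native` maps code addresses to Python stand-ins for machine code."""

    def __init__(self):
        self.mem, self.native = {}, {}
        self.here, self.next_code = 0x1000, 0x100
        self.dstack, self.rstack = [], []
        self.fip, self.running = 0, False
        self.latest_var = self.comma(0)       # the cell LATEST points at
        self.docol = self.code(docol)
        self.halt = self.prim(lambda vm, c: setattr(vm, "running", False))

    def comma(self, value):                   # Forth ,
        self.mem[self.here] = value
        self.here += CELL
        return self.here - CELL

    def code(self, handler):                  # fake address for "machine code"
        addr, self.next_code = self.next_code, self.next_code + CELL
        self.native[addr] = handler
        return addr

    def prim(self, handler):                  # native word: CFA -> handler
        return self.comma(self.code(handler))

    def colon(self, *cells):                  # colon definition: CFA -> DOCOL
        cfa = self.comma(self.docol)
        for cell in cells:
            self.comma(cell)
        return cfa

    def run(self, cfa, scratch=0x8000):
        """Execute one word from a throwaway thread that ends in HALT."""
        self.mem[scratch], self.mem[scratch + CELL] = cfa, self.halt
        self.fip, self.running = scratch, True
        while self.running:                   # NEXT, over and over
            cfa = self.mem[self.fip]
            self.fip += CELL
            self.native[self.mem[cfa]](self, cfa)


def lit(vm, cfa):                             # LIT: push the next cell
    vm.dstack.append(vm.mem[vm.fip])
    vm.fip += CELL


def store(vm, cfa):                           # ! ( x addr -- )
    addr, x = vm.dstack.pop(), vm.dstack.pop()
    vm.mem[addr] = x


def create(vm, cfa):                          # CREATE, minus name and header
    vm.mem[vm.latest_var] = vm.here


vm = VM()
EXIT = vm.prim(lambda vm, c: setattr(vm, "fip", vm.rstack.pop()))
RFROM = vm.prim(lambda vm, c: vm.dstack.append(vm.rstack.pop()))
LATEST = vm.prim(lambda vm, c: vm.dstack.append(vm.latest_var))
FETCH = vm.prim(lambda vm, c: vm.dstack.append(vm.mem[vm.dstack.pop()]))
COMMA = vm.prim(lambda vm, c: vm.comma(vm.dstack.pop()))
LIT, STORE, CREATE = vm.prim(lit), vm.prim(store), vm.prim(create)

# : (;CODE) R> LATEST @ >CFA ! ;   (>CFA omitted: LATEST holds the CFA here)
PSEMICODE = vm.colon(RFROM, LATEST, FETCH, STORE, EXIT)

# MKCON as compiled: CREATE 0 , , (;CODE) followed by its machine code.
MKCON = vm.colon(CREATE, LIT, 0, COMMA, COMMA, PSEMICODE)
machine_code = vm.comma(0)                    # one cell standing in for $DOCON + NEXT
vm.native[machine_code] = lambda vm, cfa: vm.dstack.append(vm.mem[cfa + CELL])

vm.dstack.append(5)
vm.run(MKCON)                                 # 5 MKCON CON5
CON5 = vm.mem[vm.latest_var]
vm.run(CON5)                                  # CON5
print(vm.dstack)                              # [5]
```

MKCON has no EXIT of its own: the EXIT at the end of (;CODE) pops MKCON's caller straight off the return stack, so the cell holding the machine code is never executed as a CFA.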

Fun. Hope that made sense. I had to meditate for a while over Brad Rodriguez' Moving Forth 3 article before it made any sense to me. But like all ingenious, beautiful things, it ends up being dead simple.

Implementing ;CODE in AArch32 Jonesforth

So I got a new Raspberry Pi and, me being me, got sidetracked playing with a toy Forth implementation, pijFORTHos, which is a standalone AArch32 port of Jonesforth, which is/was an IA32-only affair. Of course, I've always been amused by the idea of writing a kernel in Forth, so why not? Sadly, I probably won't do much with this...

Anyway. To cut to the chase, pijFORTHos was missing the ;CODE functionality from Jonesforth 47, which lets you define native machine words in Forth... i.e. an assembler, basically. A couple of completely empty and useless examples that do nothing (and yet do not crash) would look like:
The latter is redundant, since $NEXT is already emitted by ;CODE. The implementation is straightforward, although I took the liberty of sticking it into jonesforth.s instead of the Forth prelude, and in the actual commit I'm a bit smarter about defining $NEXT and the actual _NEXT/NEXT bits used by the Forth core itself. You might wonder why I bother emitting the NEXT bits inline instead of branching, but the latter would take up 3 cells as well (ldr, bx and the immediate for ldr) and would also involve a branch. Look at how the $NEXT word is defined. Isn't this crazy? It's an IMMEDIATE word that writes literals, which just happen to be machine code, at HERE, effectively compiling them into the current word definition when used in compile mode (such as inside a colon definition).
@ $NEXT ( -- ) emits the _NEXT body at HERE, to be used
@ in ;CODE or ;CODE-defined words.
        defword "$NEXT",F_IMM,ASMNEXT
        .int LIT
        ldr r0, [FIP], #4
        .int COMMA
        .int LIT
        ldr r1, [r0]
        .int COMMA
        .int LIT
        bx r1
        .int COMMA
        .int EXIT
@ Finishes a machine code colon definition in Forth, as
@ a really basic assembler.
        .int ASMNEXT                      @ end the word with NEXT macro
        .int LATEST, FETCH, DUP           @ LATEST points to the compiled word
        .int HIDDEN                       @ unhide the compiled word
        .int DUP, TDFA, SWAP, TCFA, STORE @ set codeword to data instead of DOCOL
        .int LBRAC                        @ just like ";" exit compile mode
        .int EXIT
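To see that those literals really are just NEXT, here is a Python sketch that encodes the instructions using the A32 LDR/STR (immediate) and BX encodings, and compares them with the E49A0004 E5901000 E12FFF11 tail and the E590C004 E52DC004 $DOCON words in the SEE dumps from the DOES> post above. FIP being r10 and the data stack pointer being sp are inferred from those dumps; `mem_imm`, `bx` and `asm_next` are made-up helpers, and a Python list stands in for the dictionary at HERE.

```python
FIP, SP = 10, 13   # register numbers inferred from the SEE dumps


def mem_imm(load, rd, rn, offset=0, pre=True, writeback=False, cond=0xE):
    """Encode a word-sized A32 LDR/STR with a 12-bit immediate offset."""
    if abs(offset) >= 4096:
        raise ValueError("offset does not fit in 12 bits")
    return (cond << 28 | 0b010 << 25 | int(pre) << 24 | int(offset >= 0) << 23
            | int(writeback) << 21 | int(load) << 20 | rn << 16 | rd << 12
            | abs(offset))


def bx(rm, cond=0xE):
    """Encode A32 BX <rm>."""
    return cond << 28 | 0x012FFF10 | rm


def next_body():
    """The three NEXT instructions, as $NEXT compiles them."""
    return [
        mem_imm(True, 0, FIP, 4, pre=False),   # ldr r0, [FIP], #4
        mem_imm(True, 1, 0),                   # ldr r1, [r0]
        bx(1),                                 # bx r1
    ]


def docon_body():
    """$DOCON: ldr r12, [r0, #4]; PUSHDSP r12, i.e. str r12, [sp, #-4]!"""
    return [mem_imm(True, 12, 0, 4), mem_imm(False, 12, SP, -4, writeback=True)]


def asm_next(dictionary):
    """What running $NEXT amounts to: LIT <insn> COMMA for each instruction,
    i.e. append each machine word at HERE (the end of `dictionary`)."""
    for insn in next_body():
        dictionary.append(insn)


word = []          # stand-in for the definition being compiled
asm_next(word)
print([f"{w:08X}" for w in word])   # ['E49A0004', 'E5901000', 'E12FFF11']
```

Doing the encoding by hand like this is exactly what the `.int LIT` / instruction / `.int COMMA` sandwich avoids: GNU as does the encoding at build time, and $NEXT merely copies the resulting words.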