GHC on m68k

This all started from this ghc bug report.

The bug stated that on m68k ABI return values of int-type are passed in register %d0 while void *-type are passed in register %a0. GHC’s C codegen was not using return types consistently.

I had zero knowledge of m68k at that time. I only had a vague impression that CPU of this type is typically attached to ~1MB RAM or something. I expected it be 32-bit-ish or less :)

Portability bugs are my favourite and I’ve started The Quest by building a C crosscompiler.

Gentoo allows you to do it with crossdev script:

$ crossdev -t m68k-unknown-linux-gnu -t m68k-unknown-linux-gnu # Done! $ m68k-unknown-linux-gnu-gcc --version --version m68k-unknown-linux-gnu-gcc (Gentoo 5.3.0 p1.0, pie-0.6.5) 5.3.0 (Gentoo 5.3.0 p1.0, pie-0.6.5)

the assembly

The m68k ISA is very simple. Smaller example from bug report:

long f( long a) { return a; } f(a) {a; } void * g( void * p) { return p; } * g(* p) {p; }

# m68k-unknown-linux-gnu-gcc -S -O2 -fomit-frame-pointer a.c f: 4 (% sp ),%d0 move.l(%),%d0 rts g: 4 (% sp ),%a0 move.l(%),%a0 move.l %a0,%d0 rts

Real-world code example (just to get a feeling about patterns used by gcc):

# m68k-unknown-linux-gnu-objdump -d /usr/m68k-unknown-linux-gnu/usr/bin/locale 80001754 <.text>: : 80001754: 4e56 fff8 linkw %fp,#- 8 fff8 linkw %fp,#- 80001758: 48e7 3f3c moveml %d2-%d7/%a2-%a5,%sp@- 3f3c moveml %d2-%d7/%a2-%a5,%sp@- 8000175c: 42b9 8000 96b0 clrl 800096b0 <stdout@@GLIBC_ 2.0 + 0x1c > 42b996b0 clrl 800096b0 80001762: 42b9 8000 96ac clrl 800096ac <stdout@@GLIBC_ 2.0 + 0x18 > 42b996ac clrl 800096ac 80001768: 4879 8000 4258 pea 80004258 <_IO_stdin_used@@Base+ 0x178 > pea 8000176e: 42a7 clrl %sp@- 42a7 clrl %sp@- 80001770: 45f9 8000 163c lea 8000163c < setl ocale@plt>,%a2 45f9163c8000163c ,%a2 80001776: 4e92 js r %a2@ r %a2@ 80001778: 508f addql # 8 ,% sp 508f addql #,% 8000177a: 4a88 tstl %a0 4a88 tstl %a0 8000177c: 6700 02a2 beqw 80001a20 <calloc@plt+ 0x2e0 > 02a2 beqw 80001a20 80001780: 4879 8000 4258 pea 80004258 <_IO_stdin_used@@Base+ 0x178 > pea 80001786: 4878 0005 pea 5 <strstr@plt -0x800011eb > pea 8000178a: 4e92 js r %a2@ r %a2@ 8000178c: 508f addql # 8 ,% sp 508f addql #,% 8000178e: 4a88 tstl %a0 4a88 tstl %a0 80001790: 6700 02cc beqw 80001a5e <calloc@plt+ 0x31e > 02cc beqw 80001a5e 80001794: 4879 8000 968c pea 8000968c <_libc_intl_domainname@@GLIBC_ 2.0 > 968c pea 8000968c 8000179a: 4eb9 8000 131c js r 8000131c <textdomain@plt> 4eb9131cr 8000131c 800017a0: 42a7 clrl %sp@- 42a7 clrl %sp@- 800017a2: 486e fff8 pea %fp@(- 8 ) 486e fff8 pea %fp@(- 800017a6: 42a7 clrl %sp@- 42a7 clrl %sp@- 800017a8: 2f2e 000c movel %fp@( 12 ),%sp@- 2f2e 000c movel %fp@(),%sp@- 800017ac: 2f2e 0008 movel %fp@( 8 ),%sp@- 2f2emovel %fp@(),%sp@-

first attempt

“Should be easy to weed out all the bugs” was my thought :)

Having spent some time arguing with gcc devs about handling of incomplete prototypes in gcc bugzilla and on gcc mailing list it was time to start fixing GHC.

I was familiar with the GHC code responsible for C code generation and gave first shot without actually building GHC compiler for m68k: patch.

Adrian tried to test the patch and reported GHC still SIGSEGVed as before. It was not very clear if patch changed anything. Did it crash later than used to or did I break something subtle?

cross-ghc nano-howto

Next step was to try to actually build GHC as a crosscompiler.

It appears to be as simple as building cross-gcc:

That’s it! The first line is enough but it’s faster for subsequent rebuilds to use system dependencies and not bundled libs. Thus i’ve installed dependencies to the SYSROOT:

$ ARCH= m68k emerge-m68k-unknown-linux-gnu -1 \ m68k-1 sys-libs/ncurses \ dev-libs/gmp \ virtual/libffi \ sys-libs/binutils-libs

And i’ve got cross-ghc! Testing:

$ echo 'main = print 42' > /tmp/hi.hs /tmp/hi.hs $ inplace/bin/ghc-stage1 --make /tmp/hi.hs --make /tmp/hi.hs $ file /tmp/hi /tmp/hi /tmp /hi: ELF 32-bit MSB executable, Motorola m68k, 68020, version 1 (SYSV), /hi:32-bit MSB executable, Motorola m68k, 68020, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so.1, for GNU/Linux 2.6.32, not stripped linked, interpreter /lib/ld.so.1, for GNU/Linux 2.6.32, not stripped

Simple!

qemu-user nano-howto

qemu allows you to run not only system images (qemu-system) but also standalone executables (qemu-user) for foreign architectures and even other kernels!

The simple usage example for powerpc64 (it’s slightly better supported by qemu) would be:

#include <stdio.h> int main() { main() { "Hello world!

" ); printf (); }

$ powerpc64-unknown-linux-gnu-gcc a.c -o a a.c -o a $ /usr/bin/qemu-ppc64 -L /usr/powerpc64-unknown-linux-gnu ./a -L /usr/powerpc64-unknown-linux-gnu ./a Hello world! world!

With help of binfmt_misc kernel module you can run foreign binaries seamlessly as native binaries. This allows running things like testsuites for crosscompilers as they would have produced binaries you can run on your machine.

Setup for powerpc64:

$ echo ':ppc64:M::\x7fELF\x02\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x15:\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/qemu-ppc64:' > /proc/sys/fs/binfmt_misc/register /proc/sys/fs/binfmt_misc/register $ cat /qemu-ppc64 /qemu-ppc64 #!/bin/bash exec /usr/bin/qemu-ppc64 -L /usr/powerpc64-unknown-linux-gnu/ " $@ " /usr/bin/qemu-ppc64 -L /usr/powerpc64-unknown-linux-gnu/

$ file a a : ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), : ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked, interpreter /lib64/ld64.so.1, for GNU/Linux 2.6.32, not stripped linked, interpreter /lib64/ld64.so.1, for GNU/Linux 2.6.32, not stripped $ ./a Hello world! world!

Gentoo has a detailed wiki page on how to get qemu-user on your system.

m68k hoops

I did the same for m68k but found out qemu-m68k can’t even run simplest programs:

$ /usr/bin/qemu-m68k -L /usr/m68k-unknown-linux-gnu ./a -L /usr/m68k-unknown-linux-gnu ./a qemu : fatal: Illegal instruction: ebc0 @ f67e36a8 : fatal: Illegal instruction: ebc0 @ f67e36a8 D0 = 6ffffef5 A0 = f67fb6bc F0 = 0000000000000000 ( 0) = 6ffffef5 A0 = f67fb6bc F0 = 0000000000000000 ( 0) D1 = 0000010a A1 = f67de000 F1 = 0000000000000000 ( 0) = 0000010a A1 = f67de000 F1 = 0000000000000000 ( 0) D2 = 0000000f A2 = f6ffec04 F2 = 0000000000000000 ( 0) = 0000000f A2 = f6ffec04 F2 = 0000000000000000 ( 0) D3 = 00000000 A3 = 00000000 F3 = 0000000000000000 ( 0) = 00000000 A3 = 00000000 F3 = 0000000000000000 ( 0) D4 = 00000000 A4 = 00000000 F4 = 0000000000000000 ( 0) = 00000000 A4 = 00000000 F4 = 0000000000000000 ( 0) D5 = 00000000 A5 = f67fb774 F5 = 0000000000000000 ( 0) = 00000000 A5 = f67fb774 F5 = 0000000000000000 ( 0) D6 = 00000000 A6 = f6ffee54 F6 = 0000000000000000 ( 0) = 00000000 A6 = f6ffee54 F6 = 0000000000000000 ( 0) D7 = 00000000 A7 = f6ffec04 F7 = 0000000000000000 ( 0) = 00000000 A7 = f6ffec04 F7 = 0000000000000000 ( 0) PC = f67e36a8 SR = 0000 ----- FPRESULT = 0 = f67e36a8 SR = 0000 ----- FPRESULT = 0

That means qemu does not support some instructions generated by gcc.

objdump -d can help us find out if it’s a sane opcode or gcc/ld generated garbage:

$ m68k-unknown-linux-gnu-gcc -static a.c -o a -static a.c -o a $ m68k-unknown-linux-gnu-objdump -d ./a | grep ebc0 -d ./aebc0 800180ec : ebc0 105f bfexts %d0,1,31,%d1 : ebc0 105f bfexts %d0,1,31,%d1 8001ebc0 : 226e 0014 moveal %fp@(20), %a1 : 226e 0014 moveal %fp@(20), 80037ab8 : ebc0 105f bfexts %d0,1,31,%d1 : ebc0 105f bfexts %d0,1,31,%d1 8003ebc0 : 6612 bnes 8003ebd4 < _dl_close_worker+0x96a > : 6612 bnes 8003ebd4_dl_close_worker+0x96a 80058168 : 61ff ffff ebc0 bsrl 80056d2a < get_cie_encoding > : 61ff ffff ebc0 bsrl 80056d2aget_cie_encoding

ebc0 looks like a normal bfexts instruction (Bit Field Extract Signed). But Debian guys who reported a bug somehow can run m68k chroot!

hoop #1: qemu fork

Debian has a nice wiki page about it Basically the trick to emulate m68k binaries is to use Laurent’s qemu git tree. It allowed me to run C hello-world.

hoop #2: minimum alignment

Time to try haskell hello-world:

$ echo 'main = print 42' > /tmp/hi.hs /tmp/hi.hs $ inplace/bin/ghc-stage1 --make -debug /tmp/hi.hs --make -debug /tmp/hi.hs $ /tmp/hi SIGSEGV

We are on the right track. qemu understands all the instructions it sees.

qemu-user is capable of generating nice gdb-readable core files.

$ gdb /tmp/hi qemu_hi*.core /tmp/hi qemu_hi*.core Core was generated by ` /tmp/hi +RTS -Ds -Di -Dw -DG -Dg -Db -DS -Dt -Dp -Da -Dl -Dm -Dz -Dc -Dr '. was generated by+RTS -Ds -Di -Dw -DG -Dg -Db -DS -Dt -Dp -Da -Dl -Dm -Dz -Dc -Dr Program terminated with signal SIGSEGV, Segmentation fault. #0 0x80463b0a in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=32858) at includes/rts/storage/ClosureMacros.h:248 248 return (info->type != INVALID_OBJECT && info->type < N_CLOSURE_TYPES) ? rtsTrue : rtsFalse; (gdb) bt #0 0x80463b0a in LOOKS_LIKE_INFO_PTR_NOT_NULL (p=32858) at includes/rts/storage/ClosureMacros.h:248 #1 0x80463b46 in LOOKS_LIKE_INFO_PTR (p=32858) at includes/rts/storage/ClosureMacros.h:253 #2 0x80463b6c in LOOKS_LIKE_CLOSURE_PTR (p=0x805aac6e <stg_dummy_ret_closure>) at includes/rts/storage/ClosureMacros.h:258 #3 0x80463e4c in initStorage () at rts/sm/Storage.c:121 #4 0x8043ffb4 in hs_init_ghc (argc=0xf6ffebf0, argv=0xf6ffebf4, rts_config=...) at rts/RtsStartup.c:181 #5 0x80455982 in hs_main (argc=1, argv=0xf6ffed54, main_closure=0x804b8f70 <ZCMain_main_closure>, rts_config=...) at rts/RtsMain.c:51 #6 0x80003c1c in main ()

These +RTS -Ds -Di -Dw -DG -Dg -Db -DS -Dt -Dp -Da -Dl -Dm -Dz -Dc -Dr are all flags for debug runtime. Sanity checks of sorts.

This sanity failure basically says that it got tagged pointer where if should not be tagged. GHC uses least significant bits for closure pointers to distinct between evaluated and unevaluated closures. On 32-bit systems last 2 bits are used for tags assuming those will always be zero.

// includes/Cmm.h #if SIZEOF_VOID_P == 4 #define W_ bits32 /* Maybe it's better to include MachDeps.h */ #define TAG_BITS 2 #elif SIZEOF_VOID_P == 8 #define W_ bits64 /* Maybe it's better to include MachDeps.h */ #define TAG_BITS 3 #else #error Unknown word size #endif /* * The RTS must sometimes UNTAG a pointer before dereferencing it. * See the wiki page Commentary/Rts/HaskellExecution/PointerTagging */ #define TAG_MASK ((1 << TAG_BITS) - 1) #define UNTAG(p) (p & ~TAG_MASK) #define GETTAG(p) (p & TAG_MASK)

// includes/rts/storage/ClosureMacros.h ... static inline StgClosure * StgClosure * UNTAG_CLOSURE(StgClosure * p) { return (StgClosure*)((StgWord)p & ~TAG_MASK); (StgClosure*)((StgWord)p & ~TAG_MASK); }

Back to our backtrace. stg_dummy_ret_closure is an int array (untagged) with currently evaluated closure. But it’s address is clearly not aligned by 4-byte boundary:

Prelude > 0x805aac6e `mod` 4 2

Most of architectures have sizeof(int) == __alignof__(int) alignment for various reasons (performance, lack of support for unaligned access in memory/cache unit).

#include <stdio.h> int main() { main() { "sizeof (int) = %u, __alignof__ (int) = %u

" printf ( unsigned ) sizeof ( int ), ( unsigned ) __alignof__ ( int )); , (), () __alignof__ ()); return 0 ; }

Here is the result for a few targets:

target sizeof (int) __alignof__ (int) x86_64-pc-linux-gnu 4 4 i686-pc-linux-gnu 4 4 powerpc-unknown-linux-gnu 4 4 powerpc64-unknown-linux-gnu 4 4 sparc-unknown-linux-gnu 4 4 sparc64-unknown-linux-gnu 4 4 ia64-unknown-linux-gnu 4 4 <you get the pattern> … … m68k-unknown-linux-gnu 4 2(!)

The fix for this is simple: we need to add __attribute__((aligned(sizeof(int)))) to all closure declarations in .data section. Thus the fix does roughly that.

Python also had suspiciously similar problem reported here.

hoop #3: qemu bug

I did not expect GHC to work at this time as I’ve not fixed int / void* prototypes mismatch yet. But hello-world haskell program was already working!

The next step is to try to run GHC testsuite and see what will break:

$ make fasttest TEST_HC= $( pwd ) /inplace/bin/ghc-stage1 THREADS=12 fasttest TEST_HC=/inplace/bin/ghc-stage1 THREADS=12

testsuite/mk/*inplace_bin_ghc-stage1.mk requires some manual tweaks in autodetected variables for our crosscompiler. Updated variables:

WORDSIZE=32 # was 64 TARGETPLATFORM=m68k-unknown-linux # was x86_64-unknown-linux TargetARCH_CPP=m68k # was x86_64 GhcWithInterpreter=NO # was YES

What was unclear is why qemu itself did throw assertion errors on many samples:

qemu-m68k/tcg/tcg.c:1774: tcg fatal error

The samples seemingly didn’t do anything fancy. It was time to dive into qemu!

I’ve picked one of small testcases that failed: mul2

-- testsuite/tests/numeric/should_run/mul2.hs {-# LANGUAGE MagicHash, UnboxedTuples #-} import GHC.Prim import GHC.Word import Data.Bits main :: IO () () = do f 5 6 main f 0xFD94E3B7FE36FB18 49 f 0xFD94E3B7FE36FB18 0xFC1D8A3BFB29FC6A f :: Word -> Word -> IO () () @ ( W # x) wy @ ( W # y) f wxx) wyy) = do putStrLn "-----" putStrLn ( "Doing " ++ show wx ++ " * " ++ show wy) wxwy) case x `timesWord2#` y of ( # h, l # ) -> h, l do let wh = W # h wh = W # l wl r = shiftL ( fromIntegral wh) (bitSize wh) shiftL (wh) (bitSize wh) + fromIntegral wl wl putStrLn ( "High: " ++ show wh) wh) putStrLn ( "Low: " ++ show wl) wl) putStrLn ( "Result: " ++ show ( r :: Integer )) ))

Stepping aside a few words on how qemu works:

qemu is a tiny jit compiler which has the following passes:

target native instructions->TCG IR: JIT decodes target’s instructions and translates them into intermediate representation (TCG). TCG IR->TCG IR: a few basic optimisations like dead store elimination TCG IR->host-native instructions: intermediate representation is then compiled to host’s instruction set execute host-native instructions: executed host code; return status explains why generated code finished

qemu does these steps not on instruction-by-instruction basis but on sequence-by-sequence basis. Simplictically qemu disassembles up to the first branching instruction.

Our qemu assertion happens in pass 3. somewhere here

To find out bad instruction sequence it’s enough to run qemu with -d in_asm flag. It dumps whatever is read by pass 1.

$ qemu-m68k-git -d in_asm -L /usr/m68k-unknown-linux-gnu/ /tmp/mul2 ---------------- IN: 0xf550c812: moveq #32,%d5 0xf550c814: subl %d4,%d5 0xf550c816: movel %d6,%d0 0xf550c818: asll #2,%d0 0xf550c81a: addal %d0,%a0 0xf550c81c: addal %d0,%a1 0xf550c81e: movel %a0@-,%d2 0xf550c820: movel %d2,%d0 0xf550c822: lsrl %d5,%d0 0xf550c824: lsll %d4,%d2 0xf550c826: movel %d2,%d1 0xf550c828: subql #1,%d6 0xf550c82a: beqs 0xf550c856 # qemu-m68k/tcg/tcg.c:1774: tcg fatal error

The offending instruction sequence was quite large but it was easy to trim it down to 2 instructions:

_start: 2 ,%d0 asll #,%d0 movel %a0@,%d2 rts

I’ve just copied original disassembly dump, pasted it in a text file and built a tiny program from it (removing instructions one by one):

$ m68k-unknown-linux-gnu-gcc -nostdlib -nostartfiles m68k.S -o foo -nostdlib -nostartfiles m68k.S -o foo $ qemu-m68k-git -d in_asm -L /usr/m68k-unknown-linux-gnu/ ./foo -d in_asm -L /usr/m68k-unknown-linux-gnu/ ./foo # fails as: # IN: # 0x80000054: asll #2,%d0 # 0x80000056: movel %a0@,%d2 # 0x80000058: rts # qemu-m68k/tcg/tcg.c:1774: tcg fatal error

We don’t need to get a working program. It’s enough to keep qemu crashing in TCG pass.

Having explored failure a bit more I’ve found out it’s enough to have single bad instruction to appear and crash qemu:

# m68k.S _start: 1 ,%d0 asll #,%d0 br _start

Using qemu’s -d op flag allows us to look at intermediate representation of TCG:

$ m68k-unknown-linux-gnu-gcc -nostdlib -nostartfiles m68k.S -o foo -nostdlib -nostartfiles m68k.S -o foo $ qemu-m68k-git -d in_asm,op -L ./foo -d in_asm,op -L ./foo # asll #1,%d0 muops: ---- 80000054 ffffffff 80000054 ffffffff movi_i32 tmp0, $0 x0 # tmp0 = 0 tmp0,x0 # tmp0 = 0 movi_i32 tmp1, $0 x1f # tmp1 = 31 tmp1,x1f # tmp1 = 31 shr_i32 CC_C,D0,tmp1 # CC_C = D0 >> tmp1 CC_C,D0,tmp1 # CC_C = D0tmp1 movi_i32 tmp1, $0 x1 # tmp1 = 1 tmp1,x1 # tmp1 = 1 shl_i32 CC_N,D0,tmp1 # CC_N = D0 << tmp1 CC_N,D0,tmp1 # CC_N = D0 mov_i32 CC_V,tmp0 # CC_V = tmp0 $0 x1f # tmp1 = 31 movi_i32 tmp1,x1f # tmp1 = 31 $0 x1e # tmp3 = 30 movi_i32 tmp3,x1e # tmp3 = 30 shr_i32 CC_V,D0,tmp3 # CC_V = D0 << tmp3 sar_i32 tmp2,D0,tmp2 # tmp2 = D0 >> tmp2, but tmp2 is used uninitilalized! ...

Our tiny instruction got translated to a pile of intermediate muops. An interesting fact: tmp2 (a temporary register) is used in sar_i32 without initialization.

It’s clearly qemu’s decoding pass 1. bug (could be optimizing pass 2. but I disabled it) which generates read from freshly introduced temp register.

// somewhere in target-m68k/translate.c static inline void shift_im(DisasContext *s, uint16_t insn, int opsize) { shift_im(DisasContext *s,insn,opsize) { .... TCGv t0 = tcg_temp_new(); 1 - count); tcg_gen_shri_i32(QREG_CC_V, reg, bits -- count); // that's our broken shift! tcg_gen_sar_i32(t0, reg, t0); tcg_gen_not_i32(t0, t0); ...

I’ve fixed that with the simple patch and reran testsuite again.

hoop #4: crosscompiling doubles is hard

Or not that hard :) A lot of floating-point related tests worked without crashes but resulted in garbage outputs.

The broken test was to print a Float (32bit ‘float’ type in C-speak):

v :: Float v = 43 = print v -- prints "0.0" main

This code printed “0.0” when was built with ghc-stage1. At this point I’ve looked up if m68k was a big-endian platform by chance. And it was!

GHC does weird things to encode double values in generated code. Basically it uses platform’s pointer-sized integrals to encode everything. GHC calls the type StgWord:

// somewhere in includes/stg/Types.h #if SIZEOF_INT == 4 typedef unsigned int StgWord32; StgWord32; #elif SIZEOF_LONG == 4 typedef unsigned long StgWord32; StgWord32; #else #error GHC untested on this architecture: sizeof(int) != 4 #endif ... #if SIZEOF_VOID_P == 8 typedef StgWord64 StgWord; StgWord64 StgWord; #else #if SIZEOF_VOID_P == 4 typedef StgWord32 StgWord; StgWord32 StgWord; #else #error GHC untested on this architecture: sizeof(void *) != 4 or 8 #endif #endif

Thus in the end of the day GHC encodes ‘double’ and ‘float’ values as StgWord closure[] = { (StgWord)&useful_callback, 0x00000000, 0x40458000, … }.

I’s an interesting question how 43.0 should look like:

int main() { main() { double d = 43.0 ; d = float f = 43.0 ; f = int * i2 = &d; printf ( "i[2]: %08X %08X

" , i2[ 0 ], i2[ 1 ]); * i2 = &d; printf (, i2[], i2[]); long long * l = &d; printf ( " l: %016llX

" , l[ 0 ]); * l = &d; printf (, l[]); int * i = &f; printf ( " i: %08X

" , i[ 0 ]); * i = &f; printf (, i[]); }

# 64bit-LE $ x86_64-pc-linux-gnu-gcc a.c -o a && ./a a.c -o a i [2]: 00000000 40458000 [2]: 00000000 40458000 l : 4045800000000000 : 4045800000000000 i : 422C0000 : 422C0000 # 32bit-LE $ i686-pc-linux-gnu-clang a.c -o a && ./a a.c -o a i [2]: 00000000 40458000 [2]: 00000000 40458000 l : 4045800000000000 : 4045800000000000 i : 422C0000 : 422C0000 # 32bit-BE $ m68k-unknown-linux-gnu-gcc a.c -o a && ./a a.c -o a i [2]: 40458000 00000000 [2]: 40458000 00000000 l : 4045800000000000 : 4045800000000000 i : 422C0000 : 422C0000 # 64bit-BE $ powerpc64-unknown-linux-gnu-gcc a.c -o a && ./a a.c -o a i [2]: 40458000 00000000 [2]: 40458000 00000000 l : 4045800000000000 : 4045800000000000 i : 422C0000 : 422C0000 # 64bit-BE $ sparc64-unknown-linux-gnu-gcc a.c -o a && ./a a.c -o a i [2]: 40458000 00000000 [2]: 40458000 00000000 l : 4045800000000000 : 4045800000000000 i : 422C0000 : 422C0000

The only thing that changes here is the order of elements of i[2] array. Just as like you would see if you stored 64-bit integral value.

The problem in original code was the assumption of target having the same endianness as a host. It took me a while to fix it cleanly in GHC: the result.

hoop #5: Cmm call annotations

Even more tests started working better but some SIGSEGVs still remained. I’ve looked at one example and noticed stack squeezing code consistently crashed garbage collector right after threadStackUnderflow() function call.

It was another case of int versus void* return type in m68k. One of the intermediate representations (or just pieces of low-level code written to deal with closures) is **Cmm** (c-minus-minus)

Suppose we need to call external C function from haskell. Typical cmm code would look like that:

foo(bits32 a) { bits32 r; "C" cfun(a); (r) = foreigncfun(a); return (r); (r); }

Generated C code for it looks like that:

// $ inplace/bin/ghc-stage1 -c -keep-tmp-files a.cmm // $ cat /tmp/ghc22508_0/ghc_3.hc /* GHC_PACKAGES */ #include "Stg.h" EFF_(cfun); FN_(foo) { StgWord32 _c1; StgWord32 _c2; W_ _c3; StgWord32 _c4; _c5: _c1 = R1.w; _c3 = (W_)&cfun; _c4 = _c1; _c2 = ((StgWord32 (*)(StgWord32))_c3)(_c4);; R1.w = _c2; JMP_(*((P_)(*Sp))); }

Let’s tweak original example and add a hint to result type:

@@ -1,6 +1,6 @@ foo(bits32 a) { bits32 r; - (r) = foreign "C" cfun(a); + ("ptr" r) = foreign "C" cfun(a); return (r); }

@@ -1,18 +1,18 @@ /* GHC_PACKAGES */ #include "Stg.h" .... EFF_(cfun); FN_(foo) { StgWord32 _c1; StgWord32 _c2; W_ _c3; StgWord32 _c4; _c5: _c1 = R1.w; _c3 = (W_)&cfun; _c4 = _c1; - _c2 = ((StgWord32 (*)(StgWord32))_c3)(_c4);; + _c2 = (StgWord32)((void * (*)(StgWord32))_c3)(_c4);; R1.w = _c2; JMP_(*((P_)(*Sp))); }

On most 32-bit platforms this change does nothing. But on m68k it’s precisely what flips between %d0 and %a0 return registers.

The fix for this issue looks like that. I tried to eyeball most of calls’ return values but could have easily missed something.

At this point i’ve got GHCi working:

$ inplace/bin/ghc-stage2 --interactive --interactive GHCi , version 8.1.20160305: http://www.haskell.org/ghc/ :? for help , version 8.1.20160305: http://www.haskell.org/ghc/ :? for help Loaded GHCi configuration from /home/slyfox/.ghci GHCi configuration from /home/slyfox/.ghci

The result

Not all tests are passing yet. But the result is good enough for people to try out GHC and fix the bugs themselves. I’ve triaged some of these failures but did not bother fixing them yet.

Fun facts (as usual):

building GHC as a crosscompiler is simple!

as a crosscompiler is simple! m68k is a 32-bit big-endian architecture

is a 32-bit big-endian architecture m68k has alignment 2 (less than natural alignment)

has alignment 2 (less than natural alignment) m68k is a rare system which uses different registers for integral and pointer types

is a rare system which uses different registers for integral and pointer types qemu (and it’s TCG ) is a fun tiny project to play with :)

) is a fun tiny project to play with :) binfmt_misc linux kernel feature can run arbitrary binaries as executables on your system

linux kernel feature can run arbitrary binaries as executables on your system doubles are not as scary as I thought :)

as a result GHC is friendlier for BE platforms both as a host and target for crosscompilation :)

Thanks!

Posted on March 12, 2016

Please enable JavaScript to view the comments powered by Disqus.

Disqus