leonardo

View: Recent Entries. View: Archive. View: Friends. View: Profile. View: Website (My Website). February 9th, 2009 Tags: benchmark, c++, d language, programming Security: Subject: Struct/class benchmark with D and C++ Time: 02:01 pm



As usual benchmarks are very tricky things, so if you see things to fix, please tell me. I may redo the benchmarks.



---------------------

TIMINGS, N = 28, best of 3: fibs_c++: 0.27 s fibs_d: 0.38 s fib_c++: 3.06 s fib2_d: 8.50 s fib_d: 1.07 s (with scope) fib2_c++: 3.17 s (virtual method) With llvm-g++: fibs_c++: 0.31 s fib_c++: 2.73 s fib2_c++: 2.83 s (virtual method)

The results don't look much good for D.

At the bottom of this post you can see part of the ASM produced by LDC and GCC.



The slowness in the struct-based case of the struct may come from the fact that GCC seems to have unrolled lot of recursive calls, while LDC has not done it. Maybe (probably) there are ways to do the same with LLVM (and LDC) too.



The slowness of the class-based code may partially come from the virtual nature of the calls (absent in C++) and from time spent by the GC. I have tried to add the "final" keyword everywhere (in the D class code) with no difference in running time.



----------------------------------------



UPDATE Feb 11 2009:



I timed two more versions:



An alternative version of fibs_d, with "scope" where each struct is defined, like:

scope auto f1 = Fib(_value - 1);

But of course the timing is the same, it's useless, because scope acts where there's a memory allocation.



I have tried a C++ version that is more similar to the D code, using a virtual function:

virtual int value() {

The results (fi2_c++, code not shown) are a bit worse: 3.17 s instead of 3.06 s.



I have tried to make the method in fib_d final, so it's not virtual (as in C++):

final int value() {

but the running time is unchanged or a bit worse.



Then to show a more apple-to-apple comparison I have timed the C++ code using llvm-g++:

fibs_c++: 0.31 s

fib_c++: 2.73 s

fib2_c++: 2.83 s

The fibs timing is intermediate, while fib_c/fib2_c is now faster. It's a lot a matter of how the recursivity is unrolled, I think.



Benchmark details: CPU: Pentium3 500 Mhz, 256 MB RAM OS: Ubuntu 8.10 Timings done with the 'time' command, 'real' timings, best of 3 runs. Code compiled with: g++ -O3 -s fibs_cpp.cpp -o fibs_cpp ldc -O3 -release -inline fibs_d.d Compilers used: LLVM D Compiler rev.939, based on DMD v1.039 and llvm 2.4 (Wed Feb 4 23:09:12 2009) gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)

----------------------------------------

UPDATE Jul 3 2009:



I have done tests again on Pubuntu, with GCC 4.2.4 and LDC based on DMD v1.045 and llvm 2.6svn (Thu Jul 2 23:07:48 2009):

TIMINGS, N = 28, best of 3: fibs_c++: 0.02 fibs_d: 0.04 fib_c++: 0.54 fib2_d: 0.62 fib_d: 0.18 (with scope) fib2_c++: 0.56 (virtual method) Now timings are similar to the C++ code. LDC isn't able still to de-virtualize the value() method.



(A problem of LDC is that it's not a compiler, but a front-end for a compiler developed by other people, so when LDC doesn't perform certain optimizations it's sometimes not easy to understand where the problem is.)

----------------------------------------



// fibs_cpp.cpp #include "stdio.h" #define N 28 class Fib { private: int _value; public: Fib(int n) { _value = n; } int value() { if (_value <= 2) return 1; Fib f1 = Fib(_value - 1); Fib f2 = Fib(_value - 2); return f1.value() + f2.value(); } }; int main() { int tot = 0; for (int i = 0; i < 10; i++) { Fib x = Fib(N); tot += x.value(); } printf("tot: %d

", tot); return 0; } --------------------- // fibs_d.d import tango.io.Stdout: Stdout; const int N = 28; struct Fib { private int _value; int value() { if (_value <= 2) return 1; auto f1 = Fib(_value - 1); auto f2 = Fib(_value - 2); return f1.value + f2.value; } } void main() { int tot; for (int i; i < 10; i++) { auto f = Fib(N); tot += f.value; } Stdout.formatln("tot: {}", tot); } --------------------- // fib_cpp.cpp #include "stdio.h" #define N 28 class Fib { private: int _value; public: Fib(int n) { _value = n; } int value() { if (_value <= 2) return 1; Fib *f1 = new Fib(_value - 1); Fib *f2 = new Fib(_value - 2); int n1 = f1->value(); int n2 = f2->value(); delete(f1); delete(f2); return n1 + n2; } }; int main() { int tot = 0; for (int i = 0; i < 10; i++) { Fib *x = new Fib(N); tot += x->value(); delete(x); } printf("tot: %d

", tot); return 0; } --------------------- // fib_d.d import tango.io.Stdout: Stdout; const int N = 28; class Fib { private int _value; this(int n) { _value = n; } int value() { if (_value <= 2) return 1; scope f1 = new Fib(_value - 1); scope f2 = new Fib(_value - 2); return f1.value + f2.value; } } void main() { int tot; for (int i; i < 10; i++) { scope x = new Fib(N); tot += x.value; } Stdout.formatln("tot: {}", tot); } --------------------- // fib2_d.d import tango.io.Stdout: Stdout; const int N = 28; class Fib { private int _value; this(int n) { _value = n; } int value() { if (_value <= 2) return 1; auto f1 = new Fib(_value - 1); auto f2 = new Fib(_value - 2); return f1.value + f2.value; } } void main() { int tot; for (int i; i < 10; i++) { auto x = new Fib(N); tot += x.value; } Stdout.formatln("tot: {}", tot); } --------------------- ASM of fib_d.d relative to the main and struct method: .text .align 16 .globl _Dmain .type _Dmain,@function _Dmain: .Leh_func_begin1: .Llabel1: pushl %ebx pushl %edi pushl %esi subl $48, %esp xorl %esi, %esi movl %esi, %edi .align 16 .LBB1_1: # forbody movl $27, 40(%esp) movl $26, 32(%esp) leal 40(%esp), %eax call _D4fibs3Fib5valueMFZi movl %eax, %ebx leal 32(%esp), %eax call _D4fibs3Fib5valueMFZi addl %edi, %ebx addl %eax, %ebx incl %esi cmpl $10, %esi movl %ebx, %edi jne .LBB1_1 # forbody .LBB1_2: # endfor movl _D5tango2io6Stdout6StdoutC5tango2io6stream6Format20__T12FormatOutputTaZ12FormatOutput, %eax movl %ebx, 24(%esp) leal 24(%esp), %edx movl %edx, 8(%esp) movl $.str1, 16(%esp) movl $7, 12(%esp) movl $._arguments.storage, 4(%esp) movl $1, (%esp) call _D5tango2io6stream6Format20__T12FormatOutputTaZ12FormatOutput8formatlnMFAaYC5tango2io6stream6Format20__T12FormatOutputTaZ12FormatOutput subl $20, %esp xorl %eax, %eax addl $48, %esp popl %esi popl %edi popl %ebx ret $8 .size _Dmain, .-_Dmain .Leh_func_end1: .align 16 .globl _D4fibs3Fib5valueMFZi .type _D4fibs3Fib5valueMFZi,@function _D4fibs3Fib5valueMFZi: pushl %esi subl $16, %esp movl (%eax), %eax cmpl $2, %eax jg .LBB2_3 # endif .LBB2_1: # if movl $1, %eax .LBB2_2: # if addl $16, %esp popl %esi ret .LBB2_3: # endif leal -1(%eax), %ecx movl %ecx, 8(%esp) addl $4294967294, %eax movl %eax, (%esp) leal 8(%esp), %eax call _D4fibs3Fib5valueMFZi movl %eax, %esi leal (%esp), %eax call _D4fibs3Fib5valueMFZi addl %esi, %eax jmp .LBB2_2 # if .size _D4fibs3Fib5valueMFZi, .-_D4fibs3Fib5valueMFZi --------------------- ASM of fibs_cpp.cpp relative to the main and struct method: .section .text._ZN3Fib5valueEv,"axG",@progbits,_ZN3Fib5valueEv,comdat .align 2 .p2align 4,,15 .weak _ZN3Fib5valueEv .type _ZN3Fib5valueEv, @function _ZN3Fib5valueEv: .LFB36: pushl %ebp .LCFI0: movl %esp, %ebp .LCFI1: subl $88, %esp .LCFI2: movl 8(%ebp), %eax movl %ebx, -12(%ebp) .LCFI3: movl %esi, -8(%ebp) .LCFI4: movl %edi, -4(%ebp) .LCFI5: movl (%eax), %edx movl $1, %eax cmpl $2, %edx jg .L46 .L3: movl -12(%ebp), %ebx movl -8(%ebp), %esi movl -4(%ebp), %edi movl %ebp, %esp popl %ebp ret .p2align 4,,7 .p2align 3 .L46: cmpl $3, %edx leal -2(%edx), %esi movl $1, -84(%ebp) jne .L47 cmpl $2, %esi movl $1, %eax jg .L48 .L19: addl -84(%ebp), %eax jmp .L3 .p2align 4,,7 .p2align 3 .L47: cmpl $2, %esi leal -3(%edx), %edi movl $1, -80(%ebp) jg .L49 .L7: cmpl $2, %edi movl $1, %eax jg .L50 .L13: addl -80(%ebp), %eax cmpl $2, %esi movl %eax, -84(%ebp) movl $1, %eax jle .L19 .L48: cmpl $3, %esi leal -2(%esi), %edi movl $1, -60(%ebp) jne .L51 .L21: cmpl $2, %edi movl $1, %eax jg .L52 .L31: addl -60(%ebp), %eax jmp .L19 .p2align 4,,7 .p2align 3 .L49: leal -4(%edx), %eax cmpl $2, %edi movl %eax, -76(%ebp) movl $1, -72(%ebp) jle .L9 movl %eax, -16(%ebp) leal -5(%edx), %eax movl %eax, -20(%ebp) leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv leal (%eax,%ebx), %ebx movl %ebx, -72(%ebp) .L9: cmpl $2, -76(%ebp) movl $1, %eax jle .L11 movl -76(%ebp), %eax subl $1, %eax movl %eax, -20(%ebp) movl -76(%ebp), %eax subl $2, %eax movl %eax, -16(%ebp) leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv addl %ebx, %eax .L11: addl -72(%ebp), %eax movl %eax, -80(%ebp) jmp .L7 .p2align 4,,7 .p2align 3 .L52: cmpl $3, %edi leal -2(%edi), %esi movl $1, -44(%ebp) jne .L53 .L33: cmpl $2, %esi movl $1, %eax jg .L54 .L39: addl -44(%ebp), %eax jmp .L31 .p2align 4,,7 .p2align 3 .L51: leal -3(%esi), %eax cmpl $2, %edi movl %eax, -56(%ebp) movl $1, -52(%ebp) jle .L23 movl %eax, -16(%ebp) leal -4(%esi), %eax movl %eax, -20(%ebp) leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv leal (%eax,%ebx), %ebx movl %ebx, -52(%ebp) .L23: cmpl $2, -56(%ebp) movl $1, %eax jg .L55 .L25: addl -52(%ebp), %eax movl %eax, -60(%ebp) jmp .L21 .p2align 4,,7 .p2align 3 .L50: leal -2(%edi), %eax cmpl $3, %edi movl %eax, -68(%ebp) movl $1, -64(%ebp) je .L15 movl %eax, -16(%ebp) leal -3(%edi), %eax movl %eax, -20(%ebp) leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv leal (%eax,%ebx), %ebx movl %ebx, -64(%ebp) .L15: cmpl $2, -68(%ebp) movl $1, %eax jle .L17 movl -68(%ebp), %eax subl $1, %eax movl %eax, -20(%ebp) movl -68(%ebp), %eax subl $2, %eax movl %eax, -16(%ebp) leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv addl %ebx, %eax .L17: addl -64(%ebp), %eax jmp .L13 .p2align 4,,7 .p2align 3 .L55: movl -56(%ebp), %esi movl $1, -48(%ebp) subl $2, %esi cmpl $3, -56(%ebp) je .L27 movl -56(%ebp), %eax movl %esi, -20(%ebp) subl $3, %eax movl %eax, -16(%ebp) leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv leal (%eax,%ebx), %ebx movl %ebx, -48(%ebp) .L27: cmpl $2, %esi movl $1, %eax jle .L29 leal -1(%esi), %eax movl %eax, -16(%ebp) leal -2(%esi), %eax movl %eax, -20(%ebp) leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv addl %ebx, %eax .L29: addl -48(%ebp), %eax jmp .L25 .p2align 4,,7 .p2align 3 .L54: cmpl $3, %esi leal -2(%esi), %edi movl $1, -32(%ebp) je .L41 leal -3(%esi), %eax movl %eax, -16(%ebp) leal -20(%ebp), %eax movl %edi, -20(%ebp) movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv leal (%eax,%ebx), %ebx movl %ebx, -32(%ebp) .L41: cmpl $2, %edi movl $1, %eax jle .L43 leal -1(%edi), %eax movl %eax, -16(%ebp) leal -2(%edi), %eax movl %eax, -20(%ebp) leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv addl %ebx, %eax .L43: addl -32(%ebp), %eax jmp .L39 .p2align 4,,7 .p2align 3 .L53: leal -3(%edi), %eax cmpl $2, %esi movl %eax, -40(%ebp) movl $1, -36(%ebp) jle .L35 movl %eax, -20(%ebp) leal -4(%edi), %eax movl %eax, -16(%ebp) leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv leal (%eax,%ebx), %ebx movl %ebx, -36(%ebp) .L35: cmpl $2, -40(%ebp) movl $1, %eax jle .L37 movl -40(%ebp), %eax subl $1, %eax movl %eax, -16(%ebp) movl -40(%ebp), %eax subl $2, %eax movl %eax, -20(%ebp) leal -16(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv movl %eax, %ebx leal -20(%ebp), %eax movl %eax, (%esp) call _ZN3Fib5valueEv addl %ebx, %eax .L37: addl -36(%ebp), %eax movl %eax, -44(%ebp) jmp .L33 .LFE36: .size _ZN3Fib5valueEv, .-_ZN3Fib5valueEv .section .rodata.str1.1,"aMS",@progbits,1 .LC0: .string "tot: %d

" .text .p2align 4,,15 .globl main .type main, @function main: .LFB37: leal 4(%esp), %ecx .LCFI6: andl $-16, %esp pushl -4(%ecx) .LCFI7: pushl %ebp .LCFI8: movl %esp, %ebp .LCFI9: pushl %edi .LCFI10: pushl %esi .LCFI11: pushl %ebx .LCFI12: pushl %ecx .LCFI13: subl $104, %esp .LCFI14: leal -20(%ebp), %ebx movl $27, -20(%ebp) leal -24(%ebp), %esi movl $26, -24(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -96(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -92(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -88(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -84(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -80(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -76(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -72(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -68(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -64(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -60(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -56(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -52(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -48(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -44(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %eax, -40(%ebp) movl %esi, (%esp) call _ZN3Fib5valueEv movl $27, -20(%ebp) movl $26, -24(%ebp) movl %eax, -36(%ebp) movl %ebx, (%esp) call _ZN3Fib5valueEv movl %esi, (%esp) movl %eax, %edi call _ZN3Fib5valueEv movl -84(%ebp), %edx addl -88(%ebp), %edx addl -80(%ebp), %edx addl -76(%ebp), %edx addl -72(%ebp), %edx addl -68(%ebp), %edx addl -64(%ebp), %edx addl -60(%ebp), %edx addl -56(%ebp), %edx addl -52(%ebp), %edx addl -48(%ebp), %edx addl -44(%ebp), %edx addl -40(%ebp), %edx addl -36(%ebp), %edx movl $27, -20(%ebp) movl $26, -24(%ebp) addl %edi, %edx addl %eax, %edx movl -92(%ebp), %eax addl -96(%ebp), %eax movl %ebx, (%esp) leal (%eax,%edx), %edi call _ZN3Fib5valueEv movl %esi, (%esp) movl %eax, %ebx call _ZN3Fib5valueEv movl $.LC0, 4(%esp) movl $1, (%esp) addl %ebx, %eax addl %edi, %eax movl %eax, 8(%esp) call __printf_chk addl $104, %esp xorl %eax, %eax popl %ecx popl %ebx popl %esi popl %edi popl %ebp leal -4(%ecx), %esp ret .LFE37: .size main, .-main .section .eh_frame,"a",@progbits .Lframe1: .long .LECIE1-.LSCIE1 .LSCIE1: .long 0x0 .byte 0x1 .globl __gxx_personality_v0 .string "zP" .uleb128 0x1 .sleb128 -4 .byte 0x8 .uleb128 0x5 .byte 0x0 .long __gxx_personality_v0 .byte 0xc .uleb128 0x4 .uleb128 0x4 .byte 0x88 .uleb128 0x1 .align 4 .LECIE1: .LSFDE1: .long .LEFDE1-.LASFDE1 .LASFDE1: .long .LASFDE1-.Lframe1 .long .LFB36 .long .LFE36-.LFB36 .uleb128 0x0 .byte 0x4 .long .LCFI0-.LFB36 .byte 0xe .uleb128 0x8 .byte 0x85 .uleb128 0x2 .byte 0x4 .long .LCFI1-.LCFI0 .byte 0xd .uleb128 0x5 .byte 0x4 .long .LCFI5-.LCFI1 .byte 0x87 .uleb128 0x3 .byte 0x86 .uleb128 0x4 .byte 0x83 .uleb128 0x5 .align 4 .LEFDE1: .LSFDE3: .long .LEFDE3-.LASFDE3 .LASFDE3: .long .LASFDE3-.Lframe1 .long .LFB37 .long .LFE37-.LFB37 .uleb128 0x0 .byte 0x4 .long .LCFI6-.LFB37 .byte 0xc .uleb128 0x1 .uleb128 0x0 .byte 0x9 .uleb128 0x4 .uleb128 0x1 .byte 0x4 .long .LCFI7-.LCFI6 .byte 0xc .uleb128 0x4 .uleb128 0x4 .byte 0x4 .long .LCFI8-.LCFI7 .byte 0xe .uleb128 0x8 .byte 0x85 .uleb128 0x2 .byte 0x4 .long .LCFI9-.LCFI8 .byte 0xd .uleb128 0x5 .byte 0x4 .long .LCFI13-.LCFI9 .byte 0x84 .uleb128 0x6 .byte 0x83 .uleb128 0x5 .byte 0x86 .uleb128 0x4 .byte 0x87 .uleb128 0x3 .align 4 .LEFDE3: .ident "GCC: (Ubuntu 4.3.2-1ubuntu12) 4.3.2" .section .note.GNU-stack,"",@progbits

I think the (close) future of D compilers is in LDC. I can accept the DMD backend to be not much efficient, but seeing how C programs compiled with LLVM-GCC are about as efficient as the ones compiled with GCC, I want D to run fast when compiled with LDC. So I've run similar (but not equal) benchmarks with LDC (D1 compiler, 32 bit, adapted to Tango, that has a GC partially different from the Phobos one, so memory allocation timings are different and usually better).As usual benchmarks are very tricky things, so if you see things to fix, please tell me. I may redo the benchmarks.---------------------The results don't look much good for D.At the bottom of this post you can see part of the ASM produced by LDC and GCC.The slowness in the struct-based case of the struct may come from the fact that GCC seems to have unrolled lot of recursive calls, while LDC has not done it. Maybe (probably) there are ways to do the same with LLVM (and LDC) too.The slowness of the class-based code may partially come from the virtual nature of the calls (absent in C++) and from time spent by the GC. I have tried to add the "final" keyword everywhere (in the D class code) with no difference in running time.----------------------------------------UPDATE Feb 11 2009:I timed two more versions:An alternative version of fibs_d, with "scope" where each struct is defined, like:scope auto f1 = Fib(_value - 1);But of course the timing is the same, it's useless, because scope acts where there's a memory allocation.I have tried a C++ version that is more similar to the D code, using a virtual function:virtual int value() {The results (fi2_c++, code not shown) are a bit worse: 3.17 s instead of 3.06 s.I have tried to make the method in fib_d final, so it's not virtual (as in C++):final int value() {but the running time is unchanged or a bit worse.Then to show a more apple-to-apple comparison I have timed the C++ code using llvm-g++:fibs_c++: 0.31 sfib_c++: 2.73 sfib2_c++: 2.83 sThe fibs timing is intermediate, while fib_c/fib2_c is now faster. It's a lot a matter of how the recursivity is unrolled, I think.----------------------------------------UPDATE Jul 3 2009:I have done tests again on Pubuntu, with GCC 4.2.4 and LDC based on DMD v1.045 and llvm 2.6svn (Thu Jul 2 23:07:48 2009):Now timings are similar to the C++ code. LDC isn't able still to de-virtualize the value() method.(A problem of LDC is that it's not a compiler, but a front-end for a compiler developed by other people, so when LDC doesn't perform certain optimizations it's sometimes not easy to understand where the problem is.)---------------------------------------- comments: Leave a comment

(Anonymous) Subject: Use LLVM backend for all the tests Link: (Link) Time: 2009-02-09 05:23 pm (UTC) If you really want to benchmark C++ vs. D, you should probably use the same backend. You can use llvm-g++ which uses GCC's C++ frontend and LLVM backend. (Reply) (Thread)



leonardo_m Subject: Re: Use LLVM backend for all the tests Link: (Link) Time: 2009-02-09 08:44 pm (UTC) You are right, I'll improve the timing data soon. (Reply) (Parent) (Thread)

(Anonymous) Subject: is this really a problem? Link: (Link) Time: 2009-03-10 02:40 pm (UTC) This benchmark seems a little artificial: fib2.d is simply the wrong way to do it (classes in c++ and d are not quite the same so its not really apples to apples). This type of computation really should be done with a value type, and only allocated on the heap if we were worried about stack. In that case though the algorithm would probably change to a preallocated vector container or something.



It would be nice to see faster class instantiation of course, but I don't think that's a deal breaker in most cases.



merlin (Reply) (Thread)



leonardo_m Subject: Re: is this really a problem? Link: (Link) Time: 2009-03-10 03:13 pm (UTC) I think it's a problem, even if a small one. Many small problems summed together lead to a slow performance, or to lot of tuning time to have a fast program.



Take a small C++ program, like a 1000-lines long good enough C++ toy ray tracer (a program not already translated to D), translate it to D, and then take a look at how much time you need to obtain a program with 80% of the performance of the C++ code compiled with a good enough compiler (DMD doesn't have a good enough back-end).



If you want enough C++ programmers to start using D, you have to care for the D performance compared to C++ too (even if it's not exactly the same), because they too will perform benchmarks.

So my benchmarks are much better than nothing, because they reveal small or big troubles. (Reply) (Parent) (Thread)

(Anonymous) Subject: Re: is this really a problem? Link: (Link) Time: 2009-07-15 03:48 pm (UTC) You are exactly right here. As somebody coming from C++, Java and C# the D programming language really looks very promising to me. It has a sophisticated design which leads to a much higher productivity and of course fun when coding in it than in C++. But at the moment the development time you saved while using it, you have to invest in optimizations.



Of course you should never write a D program like you would write the equivalent program in C++, Java or C#, only because the D programming language looks, from the syntax point of view, very similar to the other languages and then wonder why your programs are slow.



But the D frontend and the original dmd backend lack a lot of now really common optimizations and at the moment you still have to do a lot of optimizations by hand to get near C++ speed (which is really possible with d which shows that it has the potential of a future C++ replacement) which can also lead to make code look a bit untidy.



If you write your programs in C++, Java or C# in a clean code style and don't do any big mistakes you will always get a result which has a good performance right from the start. In D i sometimes was shocked how my "well-written" code performed after i compiled it and only a lot of tweaking got the speed right.



So to appeal C++, Java, C# and x (x for your favorite programming language) programmers to D it has to produce executables out of the box which are, from a performance point of view, at least comparable to Java and for the C++ crowd even a bit closer to C++.



I can live with having about 75% of the speed of C++ if i can have the better productivity of D. But at the moment sometimes getting 75% of the speed of C++ with D out of the box without any optimizations is rarely possible.



So i hope Leonardo you keep going on. And even if it might sometimes be a bit annoying to the D community it is good that somebody points out the weak spots. It is not about making the language look bad but to improve it. If D wants to succeed, these are the things which have to be addressed.



(Reply) (Parent) (Thread)



leonardo_m Subject: Re: is this really a problem? Link: (Link) Time: 2009-07-15 09:40 pm (UTC) You are exactly right here. As somebody coming from C++, Java and C# the D programming language really looks very promising to me.



For a lot of things C# is better than D2. If you want to develop generic desktop applications in a practical way, and you care of Windows only, C# today may be the best language.

But I like D more anyway because it gives me more freedom about memory is used (but not much more, because C# has structs and even unions), it's less Windows-centric, and first of all because helping design a new language in a quite small community of people is fun and teaches me many things.





It has a sophisticated design which leads to a much higher productivity and of course fun when coding in it than in C++.



It's simpler than C++, better designed, less powerful than C++, and it's quite less bug-prone. Those for me are the main advantages over C++.





But at the moment the development time you saved while using it, you have to invest in optimizations.



In the meantime things have changed. On Linux now you can use LDC, that produces binaries that are about as efficient (more or less, usually within 5-15%, sometimes they are faster) than binaries produced by Gcc/G++ from C/C++ code.



So with LDC if you write D1 code like (usign the style of) you write C code, or like you write C++ code, then I think you usually will not have performance problems. Be happy.



In D with LDC there are some performance problems expecially if you use it as Java or Scala, because:

- The GC is primitive still, so if you allocate and deallocate many small objects, like you do usually in Java because of its advanced GC that allows such carefree style of OOP programming; or because of functional-style programming that keeps producing tons of tiny immutables, then your performance will be low or very low.

- If you use D2 as Scala, with many higher-order functions, the code will get slow, because currently D compilers aren't able to perform most of the smart optimizations done by Scala compiler.

- Currently the built-in associative arrays are very strong, because despite being hash-based they never work worse than a balanced search tree (unlike Python dicts, that can degenerate in O(n^2) if the hash values are degenerate. This means D built-in associative arrays are strong against external attacks too), but in normal usage they are slow compared to other implementations, like Python dicts.



But this situation can and will improve (if D will become widespread), because:



- The GC is conservative, because of the lower level nature of D code compared to Java code. But programmers are smart people, and usually some better solution can be found. For example in future D may have a GC made in two parts, one is a moving one, that acts like a Java GC, especially useful in SafeD code, this is the GC used by most of the D code of most programs, and the second part of the GC acts in a conservative way, like the current GC. This is just an idea.

- Some optimizations strategies can be added to the D compiler to inline virtual methods and closures.

- Regarding associative arrays, you can use an efficient templated hash map from some library, and in future the situation is going to improve for the build in associative arrays too (because they will be partially defined in the library, keeping their handy syntax too). (Reply) (Parent) (Thread)

(Anonymous) Subject: Re: is this really a problem? Link: (Link) Time: 2009-07-17 01:53 am (UTC) "For a lot of things C# is better than D2. If you want to develop generic desktop applications in a practical way, and you care of Windows only, C# today may be the best language."



The thing is since mono, C# is also quite nice for the Linux world (i don't want to argue here about the prejudices against it because it originated by Microsoft or the nearly religious open source flame wars around it). And big pluses here are that it has a very good toolchain which is really nice to use, a good ide, a great framework with everything you need, one (!!!) standard library and you can really see a constant progress in it getting more mature. I hope i will see this at some point for D too.







"But I like D more anyway because it gives me more freedom about memory is used (but not much more, because C# has structs and even unions), it's less Windows-centric, and first of all because helping design a new language in a quite small community of people is fun and teaches me many things."



And not to forget that D has a better template system which is also an advantage over C#, and at least on the Linux world, a better performance (at least for the near future). And of course, conversions from C / C++ code are easier done. And it has system programming language like capabilities. So it really has a lot to offer and a greater and different range of application.







"It's simpler than C++, better designed, less powerful than C++, and it's quite less bug-prone. Those for me are the main advantages over C++."



Thats what i meant with higher productivity :-) But the great thing here is, that it's not so much less powerful than C++ (the only things missing are: no preprocessor (could be emulated by templates in a cleaner way), multiple inheritance (could be emulated by templates too) and limited operator overloading if i remember correctly, which are all really good things to be left out). So basically you have all the power of C++ which is something you can not say about Java / C#.









"In the meantime things have changed. On Linux now you can use LDC, that produces binaries that are about as efficient (more or less, usually within 5-15%, sometimes they are faster) than binaries produced by Gcc/G++ from C/C++ code.



So with LDC if you write D1 code like (usign the style of) you write C code, or like you write C++ code, then I think you usually will not have performance problems. Be happy."





Thank you very much for the info. This sounds nice and i really will try it out. But at the moment, as far as i have seen, it's not running with windows yet and it seems to support tango only. I hope that this will change soon.









"- The GC is primitive still, so if you allocate and deallocate many small objects, like you do usually in Java because of its advanced GC that allows such carefree style of OOP programming; or because of functional-style programming that keeps producing tons of tiny immutables, then your performance will be low or very low."



This is really a pity since D2 will offer some functional-style programming facilities. D really needs another garbage collector.







"But this situation can and will improve (if D will become widespread) ..."



Let's really hope for it and try to help.









"Some optimizations strategies can be added to the D compiler to inline virtual methods and closures."



They really have to be added since D is really lacking in this regard.



(Reply) (Parent) (Thread)

leonardo

View: Recent Entries. View: Archive. View: Friends. View: Profile. View: Website (My Website).