On comparing languages, C++ and Go

So, I recently stumbled upon Karan Misra’s post comparing Go and C++ on a neat little business card raytracer which made the rounds a few days ago.

Performance is a tricky matter. Novice programmers have a tendency to first over-optimize everything, then sometime in their career hear Knuth’s “premature optimization is the root of all evil” and then deride everybody who thinks about performance.

C++ is not my favorite language, but it is the one I spend the most time using. A lot of my working days are spent writing the kind of software where performance matters. That isn’t the case for the large majority of programs or programmers, but there are certain kinds of software (typically embedded and real time) where it’s worth spending a few hours thinking about how to make it run fast. Your web site is probably not one of them, even if you were just featured on Engadget. An LTE modem which should transfer 300 Mbit/s while consuming milliwatts is a better candidate.

Rob Pike is a brilliant programmer and person, and if you need convincing of that, well – just read his Wikipedia page. But I found his post on why C++ programmers haven’t flocked to Go quite a bit off the mark. I certainly don’t use C++ instead of Go because I like 200 pages of template error messages. But it offers me something I haven’t found in any other language: it is expressive enough that I don’t feel completely stuck in the 70s, yet it gives me enough control that I can predict what the machine will do. With its mandatory GC and lack of access to machine primitives, Go is not a good answer for those cases.

Because when you actually do code for performance, in those small bits of code in inner loops where it’s warranted to do so, your priorities change. The language you code in ends up being… less relevant; abstractions fade away and you try to communicate directly with the hardware that will be running your code. The thinking goes from “how do I express this idea clearly in code?” towards “how do I pull and poke at the processor so that it executes this with a full pipeline?”. There’s you, the processor and the compiler in some kind of symbiosis, and the transformations each of them performs matter more than syntax.

So, after four paragraphs of rambling, let me try to stumble back to where I started: a raytracer. I took a look at it to see what you could do if you wrote it like you write software where performance matters. I didn’t want to spend all day doing it, so I stuck to modifying a single function, which I reimplemented using g++ vector extensions and Intel’s AVX instruction set.
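
To give a flavor of what that style looks like, here is a minimal sketch of a one-ray-against-eight-spheres test written with GCC's vector extensions. It is not the code from the repository: SphereBlock, Hit and nearest_hit are hypothetical names, the unit-radius spheres and the structure-of-arrays layout are assumptions, and the final pick of the nearest hit is left scalar for brevity.

```cpp
#include <cassert>
#include <cmath>

// Eight floats per value: one 256-bit AVX register when built with -mavx.
typedef float v8f __attribute__((vector_size(32)));

// Hypothetical structure-of-arrays block: lane i holds the centre of sphere i.
struct SphereBlock {
    v8f cx, cy, cz;
};

// Result of a query: lane of the nearest sphere hit (-1 for a miss) and the
// distance along the ray to that hit.
struct Hit {
    int lane;
    float t;
};

// Tests one ray (origin o, unit-length direction d) against eight unit spheres.
static Hit nearest_hit(const SphereBlock& s,
                       float ox, float oy, float oz,
                       float dx, float dy, float dz) {
    // The quadratic below is only valid for a normalised direction.
    assert(std::fabs(dx * dx + dy * dy + dz * dz - 1.0f) < 1e-3f);

    // Vector from each centre to the ray origin; scalars are broadcast to all lanes.
    v8f px = ox - s.cx;
    v8f py = oy - s.cy;
    v8f pz = oz - s.cz;

    // Solve t^2 + 2*b*t + c = 0 for all eight spheres at once.
    v8f b = px * dx + py * dy + pz * dz;
    v8f c = px * px + py * py + pz * pz - 1.0f;
    v8f disc = b * b - c;

    // Scalar reduction over the lanes, kept simple for readability.
    Hit best = {-1, 1e9f};
    for (int i = 0; i < 8; ++i) {
        if (disc[i] > 0.0f) {
            float t = -b[i] - std::sqrt(disc[i]);
            if (t > 0.01f && t < best.t) {
                best.lane = i;
                best.t = t;
            }
        }
    }
    return best;
}
```

Built with -mavx (or -march=native on an AVX machine), each arithmetic line on v8f values should map to 256-bit instructions; without it, GCC splits the same operations into narrower ones, so the sketch still runs, just without the benefit.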

[Chart: impact on run time, from top to bottom: Go, original C++ and my optimized version.]

My compiler flags were: c++ -std=c++11 -O2 -g -Wall -pthread -ffast-math -mtune=native -march=native (gcc version 4.6.3)

Prior to my optimization (as reported by perf stat):

       8863,934376 task-clock                #    3,934 CPUs utilized          
             1 213 context-switches          #    0,137 K/sec                  
                 7 cpu-migrations            #    0,001 K/sec                  
               535 page-faults               #    0,060 K/sec                  
    22 063 102 197 cycles                    #    2,489 GHz                     [83,27%]
    16 064 668 982 stalled-cycles-frontend   #   72,81% frontend cycles idle    [83,28%]
     5 227 501 506 stalled-cycles-backend    #   23,69% backend  cycles idle    [66,79%]
    21 652 209 811 instructions              #    0,98  insns per cycle        
                                             #    0,74  stalled cycles per insn [83,38%]
     1 979 364 705 branches                  #  223,305 M/sec                   [83,36%]
        55 751 528 branch-misses             #    2,82% of all branches         [83,34%]

       2,253349546 seconds time elapsed

and afterwards:

       4056,958385 task-clock                #    3,900 CPUs utilized          
               603 context-switches          #    0,149 K/sec                  
                 7 cpu-migrations            #    0,002 K/sec                  
               538 page-faults               #    0,133 K/sec                  
    10 091 407 696 cycles                    #    2,487 GHz                     [83,25%]
     6 897 321 723 stalled-cycles-frontend   #   68,35% frontend cycles idle    [82,97%]
     2 478 649 915 stalled-cycles-backend    #   24,56% backend  cycles idle    [66,89%]
    10 626 263 820 instructions              #    1,05  insns per cycle        
                                             #    0,65  stalled cycles per insn [83,49%]
       896 560 476 branches                  #  220,993 M/sec                   [83,49%]
        53 713 339 branch-misses             #    5,99% of all branches         [83,52%]

       1,040250808 seconds time elapsed

It goes from 22 billion to 10 billion cycles, and from 2.2 seconds to 1 second, on my not-very-fast-at-all Core i3-2100T. The Go version (1.2rc1) takes 5.0 seconds on the same hardware. So the decently optimized C++ version is 2.2x faster than a decently optimized Go version. But if you’re willing to really talk to the processor in one single function, you can gain an additional 2.2x. And this is before we’ve seriously started to structure the raytracing kernel for performance; if this were highly optimized production code, I would not be surprised to see another 2-3x. That is the kind of expressiveness that matters, for those few programs where performance matters.
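
As a rough check of those ratios, the measurements can be plugged into a few lines of C++. The struct and function names are hypothetical; the numbers are copied from the perf stat output above, plus the 5.0 seconds quoted for Go.

```cpp
#include <cassert>

// One perf stat run, with figures copied from the output above.
struct Run {
    double seconds;       // "seconds time elapsed"
    double cycles;
    double instructions;
};

static const Run kOriginal  = {2.253349546, 22063102197.0, 21652209811.0};
static const Run kOptimized = {1.040250808, 10091407696.0, 10626263820.0};
static const double kGoSeconds = 5.0;  // Go 1.2rc1 timing quoted in the text

// How many times smaller `after` is than `before` (above 1.0 means an improvement).
static double speedup(double before, double after) {
    assert(after > 0.0);
    return before / after;
}

// Instructions retired per cycle for one run.
static double ipc(const Run& r) {
    return r.instructions / r.cycles;
}
```

With these inputs, speedup(kGoSeconds, kOriginal.seconds) and speedup(kOriginal.seconds, kOptimized.seconds) both come out at roughly 2.2, and speedup(kOriginal.instructions, kOptimized.instructions) at roughly 2.0, while ipc() barely moves (0.98 to 1.05). Read that way, most of the gain seems to come from retiring about half as many instructions rather than from a fuller pipeline.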

The results for the other image sizes were really quite similar.

Code is on GitHub.

If you’re interested in this kind of silly optimizing-for-favorite-language-until-it-bleeds, you might find Debian’s Computer Language Benchmarks Game fun. And yes, before some eagle-eyed commenter notices – it doesn’t work if the number of objects is not divisible by the number of elements in your SIMD words. But this is a toy and I didn’t want to clutter the code.
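
For the curious, one common way around that limitation is to pad the object list up to a multiple of the SIMD width. The sketch below is an assumption about how that could look, not what the toy code does: Scene, padded_count and pad_scene are hypothetical names, and the padding simply repeats the last real sphere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kLanes = 8;  // floats per 256-bit AVX register

// Hypothetical structure-of-arrays scene: entry i is the centre of sphere i.
struct Scene {
    std::vector<float> cx, cy, cz;
};

// Rounds n up to the next multiple of kLanes.
static std::size_t padded_count(std::size_t n) {
    return (n + kLanes - 1) / kLanes * kLanes;
}

// Repeats the last sphere until the count is a multiple of kLanes.
// A duplicate sits at exactly the same distance as the original, so it
// cannot produce a nearer hit than the real spheres do.
static void pad_scene(Scene& s) {
    assert(s.cx.size() == s.cy.size() && s.cy.size() == s.cz.size());
    if (s.cx.empty()) {
        return;  // nothing to duplicate in an empty scene
    }
    // Copy the last centre first so resize never reads from a moved buffer.
    const float lx = s.cx.back();
    const float ly = s.cy.back();
    const float lz = s.cz.back();
    const std::size_t n = padded_count(s.cx.size());
    s.cx.resize(n, lx);
    s.cy.resize(n, ly);
    s.cz.resize(n, lz);
}
```

Duplicating a real sphere avoids sentinel values such as infinities or NaNs, which tend to behave unpredictably under -ffast-math. The cost is a few wasted lanes in the last block; as long as the hit test keeps the first of two equally distant hits, the original sphere wins the tie.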

Go is a good language, but as Rob found – it offers more to Python and Ruby programmers, who get a good performance gain at little cost in expressiveness. I don’t see it replacing C or C++ as the tool of choice for writing core infrastructure. That is unfortunate, because C++ needs replacing. Personally, I’m hoping for Rust.

Future readers might be interested in checking out the Hacker News and /r/programming posts for more detail and discussion.

29 Responses to “On comparing languages, C++ and Go”

  1. Martin says:

    The thing is: how much effort does it take to write the same app in Go and in C++?
    I think Go (or any other language like Python, Ruby, etc.) may be slower, but you can get results quicker and then have room for optimizations.

  2. I recommend checking out D if you haven’t already. Definitely my favorite language and a worthy replacement for C++.

  3. henrik says:

    Absolutely. Python is my personal favorite for when resources don’t matter (which is most of the time). But Go actually considers itself a systems programming language, which I think makes it fair to critique performance.

  4. henrik says:

    I’ve followed D from the sidelines for quite a while. I think the language itself is great, but I’m not convinced they’ll be able to get enough interest to have the critical mass to build a good compiler and tools. Quite the Catch-22.

  5. steve says:

    You can’t make a reasonable comparison between two languages if you spend much more time optimizing the code for one language than for the other.
    Of course there is room for improvement in both the C++ code and the Go code.
    I think it’s more interesting to see how well the compilers perform given a straightforward idiomatic implementation.

  6. YG says:

    I think you’ve missed other points as well. Rob states very well that the problem they were trying to solve was going from 84 keywords(?), plus a massive manual on how to use and combine them, down to only 25. I think this is what attracts me to Go: there is no bloat, and the performance seems to beat all other existing frameworks except the C++ ones (http://www.techempower.com/benchmarks/).

    Interesting post though :)

  7. Rajiv says:

    I couldn’t agree more. I dislike writing C++, though C++11 has made things a bit less torturous. When performance matters, it’s difficult to beat it though. I feel like a language like Go might be a better choice in some applications. With its ever improving standard library (C++ still sucks here) and sensible defaults, it’s easier to get an application server running, giving more room for profiling and optimizing. Hence Rob’s observation of Python and Ruby developers flocking to it. I don’t see embedded systems or infrastructure software being written in a garbage collected language anytime soon.
    I like where Rust is going, but the threading model seems like Go’s lightweight coroutines. The IO is also managed by the scheduler, just like it is in Go. While this makes life easier, it does take away control. Explicit OS threads that can be locked to cores, and control over IO through epoll/select/kqueue etc., are a requirement for a certain class of applications. People might still end up using C++ for those. I wish Rust would build its lightweight threads and pretty blocking IO as a library, so that it leaves room for developers who really want to make those decisions. Otherwise I am really psyched about it.

  8. Rajiv says:

    Can you also compare it to the java timings?

  9. Major Asshole says:

    Use the right tool for the job. If you are getting a complex process to work on an 8 bit processor, use C. If you are putting a date on a web page, use PHP. Is this hard?

  10. moriarty says:

    YG, I think the main issue is that Rob wonders why C++ programmers don’t switch to Go. Everyone seems to agree that Go is faster than Python/Ruby/PHP; that’s why those folks can switch to Go.
    On the other hand, if someone really needs performance (LTE modem example), they have absolutely no reason to switch from C++ to Go. If they were in a state where they could spare some CPU cycles, they probably wouldn’t be using C++ in the first place.

  11. Brennan Riddell says:

    You mention, at the end, that ‘C++ needs replacing.’ As an amateur Linux Network programmer, primarily in C++, I am wondering *why* you say this?

  12. CJ says:

    Thank you, thank you, thank you for pointing this out. The whole “which language is fastest” thing seems pretty ridiculous to people who know about measuring performance and improving it by making better use of the hardware. This is true even in expensive runtime systems like Java, as the Mechanical Sympathy blog written by Martin Thompson never tires of discussing.

  13. shane says:

    Go will never be as optimizable as a non-GC language, but I think that’s fine. I think they’ve made very good performance/convenience compromises. (Managing memory across threads is hell!) The ray-trace article is an anomaly, at best, but for those of us looking for a Java alternative, Go is more than viable.

  14. Francisco says:

    Hi Henrik,

    Thank you for your post. For people who know just the basics of several programming languages, this kind of article is good for getting other points of view. Some of the comments are also well worth reading. (I added you to my feedly to follow your blog ;-) )

    Personally I know assembler for some microprocessors, C, some C++ and now I’m learning Python, but I think the more abstraction you put between yourself and the processor’s language, the poorer the performance. Yes, it depends on the programmer and the compiler, but it also depends on how close the programmer’s mind is to the processor’s way of working. Doesn’t it?

    There is another point: sometimes a company or a whole industry sector asks you to work in a particular programming language. In electronics, for example, they ask you to program in C and C++, obviously for the performance, but also because of the software assets and ways of working these companies already have. Don’t you think so?

    Kind regards and keep going!
    Francisco

  15. mike says:

    Go has excellent interoperability with C, and it is quite easy to use, so why not just call C functions from Go for the really critical parts of the code? Sounds like a marriage made in heaven!

  16. henrik says:

    @mike: Well, two reasons: Go considers itself a systems programming language and aims to be as fast as C (See the performance section of http://golang.org/doc/faq). And I’m really hoping for something able to improve on C and C++ for the performance code niche.

    I think Go is actually turning out to be used more as a faster Python/Ruby than as a safer C. Which is not a bad thing at all! In fact it’s great! But it doesn’t solve my problem of a better tool to write performance sensitive bits in.

  17. henrik says:

    @Brennan: I’ve used C++ almost daily for a decade, and built and shipped products using it for nearly as long. Maybe part of it is just that I would like to see something new? C++ is obviously not going anywhere; it will be a huge and widely used language for decades to come. All I’m really looking for is a better tool for the type of code I write.

  18. Ivan Kovač says:

    I’ve found that post [Pike] rather annoying. I think Go misses on all fronts – it is both less performant and less convenient.

    Much like the C programmer looks at it and says “but it’s 3x slower”, I say “but it’s 5x less convenient”. It offers the middle road, but apparently not many are enticed by this particular trade-off. It’s C with memory management and built-in concurrency. To the first camp, the second part is the problem; to the second camp, the first bit is the unappealing one – if all you wanted to do was knock out some code super fast, you’d use Python, and if you wanted to make some code run super fast, you’d use C [actually, as a Python programmer, I’d use Cython, but I was using Python as an example there, rather than mandating what everyone should use].

    Even when it comes to concurrency, the one reason I’d think about it, I’d probably still opt for something nicer – most likely Clojure (ha, no article about Python programmers flocking to Lisp, wonder why).

  19. Both Rust and D are poised to be the “next great thing” for systems languages, aiming squarely to compete with C++. I’ve used both for hobby projects and find they do an excellent job of offering better abstractions, and both feel like they require less code than their C++ counterparts (while performing at similar speeds and memory consumption).

    Personally, I’ve been in favor of D. It’s less opinionated than Rust, and the syntax hasn’t been changed as dramatically.

  20. Tom Maisey says:

    Bravo. Agreed – although Go looks really nice, I’m into realtime audio synthesis, and therefore GC is anathema. I need to know when memory is being allocated and freed, simple as that.

  21. Daniel says:

    I’d just like to point out that, as far as I know, Google has quietly removed any references to Go being a systems language long ago, instead promoting it for web applications. Reminds me of Sun and Java, in this regard…

  22. pjmlp says:

    You are comparing compiler backends, not languages.

    In which ANSI C++ section are the vector instructions defined?

  23. henrik says:

    @pjmlp: I’m comparing programming environments for the use case of “I want it to go fast damnit”, for those few small inner loops where you’re willing to sacrifice portability and clean code for speed. Afaik, Go does offer a way of interacting with assembly functions, but nothing comparable to SIMD intrinsics. Had Go had a decent way of exposing vector instructions, I would have happily done a Go version of the AVX code.

  24. mortdeus says:

    Let’s completely throw out the fact that you can write your performance critical parts in C, and that Go’s compilers are nowhere near as mature as GCC/LLVM. None of you have taken into consideration the things that make Go so useful. Things that translate into faster programs in general, rather than faster programs in theory.

    The designers of Go have correctly identified the programmer as the bottleneck when it comes to writing highly concurrent, efficient code. For every 20 C++ programmers there are maybe 2-3 competent programmers who write scalable systems that are worth relying on. In my experience working on projects written in Go, code written in Go is more reliable.

    In a world where open source and multicore CPUs are becoming the de facto environment we work in as developers, we need a language that is harder to abuse when collaborating on concurrent and scalable system design. Anybody too ignorant to see that needs to stop thinking about zeros and ones for a second so they can start looking at the bigger picture.

  25. Daniel says:

    (I stand corrected in that the FAQ still retains a few mentions of Go aiming to be a systems language. The difference in other pages when compared to 2009 is quite telling, though.)

  26. YG says:

    Just wanted to add this link here, as I see it as quite relevant to Mr. Abelsson’s work.

    http://benchmarksgame.alioth.debian.org

    @moriarty: I read the article and I still don’t think that’s his original idea. He mentions here (https://www.youtube.com/watch?v=p9VUCp98ay4) clearly that C++ is horrible for being too complex and too expressive. Only a few projects need that level of expressiveness; quite often that is not the case.

  27. pjmlp says:

    @henrik

    All your optimizations are not C++:

    1 – They are not defined in ANSI/ISO C++

    2 – They are g++ specific and not even portable across C++ compilers

    The only correct way to compare languages, instead of compiler backends, is without any use of language extensions or compiler specific extensions.

    How would your benchmark fare with another set of compilers?

    Personally I don’t care that much about Go, being in the C++ group Rob Pike talks about. What I care about is rigor from a compiler design point of view.

  28. henrik says:

    @pjmlp – Despite appearances, I’m not trying to compare languages – I think that is a rather pointless exercise. I was trying to point out how real high performance, well optimized code is written: you often make the decision to target specific compilers and machines in certain parts of the code. You do this to exploit features not available in the abstract machine and language models. You’re quite right that you won’t find SIMD intrinsics in the C++ ISO spec, but you do find them in all practical implementations that matter.

    And that is the only language comparison that I think is interesting in this article – C and C++, in practice if not in theory, offer escape hatches to exploit low level features, while languages such as Go, Java, C# and friends do not. Which is fine; most programs neither need them nor should use them. But I’m coming from the point of view of someone who writes the low level libraries that enable people to do fast image filtering, MPEG decoding or whatever computationally heavy task is used often enough to be worth spending time optimizing.

    So again, I’m not trying to compare languages. I’m trying to point out what a practical language/compiler for well optimized code must have: enough control so that you can shove the language out of the way when it impairs performance too much.
