Source: https://blogs.fau.de/hager/archives/5260

Georg Hager's Blog

Random thoughts on High Performance Computing

Fooling the masses with performance results on parallel computers – prelude

April 30, 2010

In 1991, David H. Bailey published his insightful “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers.” In that humorous article, Bailey pinpointed typical “evade and disguise” techniques for presenting mediocre performance results in the best possible light. These are the original 12 ways:

  1. Quote only 32-bit performance results, not 64-bit results.
  2. Present performance figures for an inner kernel, and then represent these figures as the performance of the entire application.
  3. Quietly employ assembly code and other low-level language constructs.
  4. Scale up the problem size with the number of processors, but omit any mention of this fact.
  5. Quote performance results projected to a full system.
  6. Compare your results against scalar, unoptimized code on Crays.
  7. When direct run time comparisons are required, compare with an old code on an obsolete system.
  8. If MFLOPS rates must be quoted, base the operation count on the parallel implementation, not on the best sequential implementation.
  9. Quote performance in terms of processor utilization, parallel speedups or MFLOPS per dollar.
  10. Mutilate the algorithm used in the parallel implementation to match the architecture.
  11. Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment.
  12. If all else fails, show pretty pictures and animated videos, and don't talk about performance.

There are further explanations in the original paper for each item.
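Item 8 deserves a concrete illustration. The numbers below are made up purely for demonstration: a parallel algorithm that performs redundant extra operations can report a higher MFLOPS rate than the best sequential algorithm, even when it delivers far less useful work per second.

```python
def mflops(op_count, runtime_s):
    """MFLOPS = floating-point operations / runtime, in millions."""
    return op_count / runtime_s / 1e6

# Hypothetical figures, for illustration only.
# Best sequential implementation: 1e9 operations in 10 s.
seq_ops, seq_time = 1.0e9, 10.0

# Parallel implementation: 4x the operations (redundant work) in 5 s.
par_ops, par_time = 4.0e9, 5.0

print(f"sequential: {mflops(seq_ops, seq_time):.0f} MFLOPS")  # 100 MFLOPS
print(f"parallel:   {mflops(par_ops, par_time):.0f} MFLOPS")  # 800 MFLOPS

# Counting only the 1e9 operations of the best sequential algorithm,
# the parallel code delivers 1e9 / 5 s = 200 MFLOPS of useful work,
# i.e., a 2x speedup -- not the 8x the raw MFLOPS ratio suggests.
```

Basing the operation count on the parallel implementation thus quadruples the headline number without making the answer arrive any sooner.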

After two decades, it's high time for an update. In 1991 the supercomputing landscape was governed by the "chicken vs. oxen" debate: The famous question "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?" is attributed to Seymour Cray, who couldn't have said it better. Cray's machines were certainly dominating in the oxen department, but competition from massively parallel systems like the Connection Machine was building up. At that time, users were much more used to diving into system-specific optimizations: with no MPI and OpenMP standards, portability of parallel programs was pretty much restricted to a certain vendor. And the use of double precision floating point was probably not as much a matter of course as it is today.

In the past two decades, hybrid, hierarchical systems, multi-core processors, accelerator technology, and the dominating presence of commodity hardware have reshaped the landscape of High Performance Computing. It's also not so much oxen vs. chickens anymore; ants have received more than their share of hype. However, some things never change. My points (which I prefer to call "stunts") are derived from Bailey's original collection, and some are identical or merely reformulated. Others are new, reflecting today's boundary conditions.

Although these musings are certainly inspired by experience with many publications and talks in HPC, I wish to point out that (i) no offense is intended, (ii) I am not immune to the inherent temptations myself, and (iii) this is all still just meant to be fun.

This is the list of stunts. It will be extended along the way:

  1. Report speedup instead of absolute performance!
  2. Slow down code execution!
  3. The log scale is your friend!
  4. Quietly employ weak scaling to show off!
  5. Instead of performance, plot absolute runtime versus CPU count!
  6. Ignore affinity and topology issues! 
  7. Be creative when comparing scaled performance!
  8. Impress your audience with awe-inspiring accuracy!
  9. Boast massive speedups with accelerators!
  10. Always emphasize the "interesting" part of your work!
  11. Show data! Plenty. And then some.
  12. Redefine “performance” to suit your needs!
  13. If they get you cornered, blame it all on OS jitter!
  14. Secretly use fancy hardware setups and software tricks!
  15. Play mysterious!
  16. Worship the God of Automation!

 

 

This entry was posted in Fooling the masses and tagged Fooling the masses by Georg Hager.
