Supercomputer performance

Several common metrics are used to measure how powerful a supercomputer is and to compare supercomputers with each other. The oldest and most classic one is the number of floating-point operations per second (FLOP/s) that the computer can perform.

What is a FLOP/s? A floating-point number is the computer representation of a real number. If we do a single calculation involving two real numbers, e.g. 2.1 + 4.3, in one second, that is equal to one floating-point operation per second (1 FLOP/s). The operations considered in this measure are the basic arithmetic operations: addition, subtraction, multiplication, and division. Computers can also operate on integers, and even though integer performance is important for some applications, in most scientific problems the vast majority of arithmetic operations involve real numbers, and thus FLOP/s has become the standard measure.

When a computer can execute 1 billion ($10^{9}$) FLOP/s, we say its performance is 1 giga FLOP/s (GFLOP/s). Similarly, one trillion ($10^{12}$) FLOP/s is 1 tera FLOP/s (TFLOP/s), and when a supercomputer can execute 1 quadrillion ($10^{15}$) FLOP/s, its performance is 1 peta FLOP/s (PFLOP/s).

The pure computing power of a CPU core is determined by the clock speed and the maximum number of floating-point operations it can perform in one clock cycle. As an example, a CPU core in a laptop might have a clock speed of 3 GHz, meaning that in one second it can perform 3 billion cycles.

$$ 3\,\text{GHz} = 3\times10^{9}\,\frac{1}{\text{s}} $$

Furthermore, in one cycle this theoretical core can perform 16 floating-point operations (16 FLOP). Altogether, the core has a peak performance of 48 GFLOP/s.

$$ 3\times10^{9}\,\frac{1}{\text{s}} \times 16\,\text{FLOP} = 48\,\frac{\text{GFLOP}}{\text{s}} $$

This pure computing power is the theoretical peak performance of a CPU core. For a multicore CPU, the theoretical peak performance is the number of CPU cores times the theoretical peak of a single core. Similarly, one can calculate the theoretical peak performance of GPUs. Furthermore, the theoretical peak performance of a whole supercomputer is obtained by multiplying the theoretical peak performance of its CPUs and GPUs by the number of them in the system.
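The calculation above can be sketched in a few lines of Python. The 3 GHz clock and 16 FLOP per cycle come from the example in the text; the core and CPU counts used for the whole system are hypothetical values chosen only for illustration.

```python
def core_peak_flops(clock_hz, flop_per_cycle):
    """Theoretical peak of one core: cycles per second times FLOP per cycle."""
    return clock_hz * flop_per_cycle


def system_peak_flops(clock_hz, flop_per_cycle, cores_per_cpu, cpus_in_system):
    """Theoretical peak of a CPU-only system built from identical CPUs."""
    cpu_peak = core_peak_flops(clock_hz, flop_per_cycle) * cores_per_cpu
    return cpu_peak * cpus_in_system


# The laptop core from the text: 3 GHz and 16 FLOP per cycle.
laptop_core = core_peak_flops(3e9, 16)
print(f"one core: {laptop_core / 1e9:.0f} GFLOP/s")

# Hypothetical system: 8 cores per CPU and 1000 CPUs (illustrative numbers only).
system = system_peak_flops(3e9, 16, cores_per_cpu=8, cpus_in_system=1000)
print(f"system:   {system / 1e12:.0f} TFLOP/s")
```

A system with GPUs would add a similar term for the GPUs, using their own clock speed and operations per cycle.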

As the term theoretical might suggest, this computing power cannot normally be reached in real calculations. Before a CPU can calculate 2.1 + 4.3, it needs to fetch two numbers from memory, and afterwards it needs to store the result back to memory. This does not happen instantaneously, so in practice computational speed is determined not only by the pure computing power of a CPU, but also by how fast the CPU can access the memory. Different applications have a different ratio of floating-point operations per memory access. In some cases the same number is used in multiple computations, e.g. when calculating both 2.1 + 4.3 and 2.1 + 5.3. If the speed of memory access is the limiting factor (as is the case with most modern computers), applications performing many floating-point operations with the same data can achieve a larger proportion of the peak performance than applications with fewer operations per data item.

In supercomputers, a CPU in one node might also need to access data in another node, and thus the speed of communicating data between nodes can also limit the practical performance. Real-world applications also need to read and write data to the disk, which means that the speed of I/O (input/output, or transfer of data between processors and storage) may further limit the performance.
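The interplay between computing power and memory access can be illustrated with a simplified bound in the spirit of the roofline model, which is not named in the text above: the sustained performance cannot exceed the peak, nor the memory bandwidth multiplied by the number of operations done per byte moved. The 20 GB/s bandwidth and the operation counts below are assumptions chosen only for illustration.

```python
def attainable_flops(peak_flops, memory_bandwidth_bytes_per_s, flop_per_byte):
    """Upper bound on sustained FLOP/s: limited by compute or by memory traffic."""
    return min(peak_flops, memory_bandwidth_bytes_per_s * flop_per_byte)


PEAK = 48e9        # the 48 GFLOP/s core from the example in the text
BANDWIDTH = 20e9   # hypothetical memory bandwidth: 20 GB/s

# One addition that loads two 8-byte numbers and stores one 8-byte result:
# 1 FLOP per 24 bytes of memory traffic.
low_reuse = attainable_flops(PEAK, BANDWIDTH, 1 / 24)

# Hypothetical kernel that reuses each loaded number many times:
# 10 FLOP per byte of memory traffic.
high_reuse = attainable_flops(PEAK, BANDWIDTH, 10.0)

print(f"low data reuse:  {low_reuse / 1e9:.2f} GFLOP/s")
print(f"high data reuse: {high_reuse / 1e9:.2f} GFLOP/s")
```

With these assumed numbers, the low-reuse case stays far below the peak, while the high-reuse case is limited by the peak itself. Communication between nodes and I/O are not modelled in this sketch.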

TOP500 list

A benchmark is an application which is used to measure the performance and functionality of a (super)computer. Typically, the runtime of the benchmark application is recorded and used as a metric. With benchmarks, one can compare the performance of different computers.

LINPACK is a common benchmark which measures a system's floating-point computing power. LINPACK performs linear algebra operations to solve a system of linear equations, and it typically achieves about 75% of the theoretical peak performance.
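The idea behind such a measurement can be sketched with a toy example. This is not the LINPACK code itself: it is a plain Python Gaussian elimination, timed on a random system, with the operation count estimated by its leading term of roughly 2/3 n³. The function names and the problem size are assumptions made for this illustration.

```python
import random
import time


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs stay unchanged
    b = b[:]
    for k in range(n):
        # Swap in the row with the largest pivot to keep the elimination stable.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_flops(n=100, seed=0):
    """Time one solve of a random n x n system and estimate the FLOP/s reached."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    start = time.perf_counter()
    solve_linear_system(a, b)
    elapsed = time.perf_counter() - start
    flop_estimate = 2.0 / 3.0 * n**3  # leading-order operation count only
    return flop_estimate / elapsed


if __name__ == "__main__":
    measured = measure_flops()
    peak = 48e9  # the theoretical single-core peak from the earlier example
    print(f"measured: {measured / 1e6:.1f} MFLOP/s")
    print(f"fraction of peak: {measured / peak:.5f}")
```

Interpreted Python reaches only a tiny fraction of the peak, so the number printed here says more about the interpreter than about the hardware. A tuned LINPACK implementation is what gets close to the figure quoted above.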

TOP500 is a ranking list for supercomputers that collects LINPACK results submitted by organizations that operate a supercomputer. The list is released twice a year and shows the 500 most powerful supercomputers in the world ranked according to their computational power measured by the LINPACK benchmark.

In the first ever TOP500 list, June 1993, the most powerful supercomputer was from the USA and had a performance of 60 GFLOP/s. In comparison, in November 2020, the fastest supercomputer (from Japan) had a performance of 440 PFLOP/s, which is more than 7 million times faster than the winner 27 years before. Similarly, the last system (#500) on the list in June 1993 had a performance of 0.4 GFLOP/s, while in November 2020 it had a performance of 1.3 PFLOP/s, which is about 3 million times faster. Just like normal computers, supercomputers have exhibited a tremendous increase in computational power over these years.
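Written out with the rounded figures quoted above, the two speed-up factors are:

$$ \frac{440\times10^{15}\,\text{FLOP/s}}{60\times10^{9}\,\text{FLOP/s}} \approx 7.3\times10^{6}, \qquad \frac{1.3\times10^{15}\,\text{FLOP/s}}{0.4\times10^{9}\,\text{FLOP/s}} = 3.25\times10^{6} $$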

Figure: TOP500 performance (TOP500 list from 1993).

To put this in perspective, a modern laptop can have a performance of around 300 GFLOP/s, which means that it would have been the number one supercomputer in June 1993. In 2020, the number one supercomputer has the power of over a million such laptops.

The Mahti supercomputer at CSC – IT Center for Science has a performance of 7.5 PFLOP/s, which means it can execute $7.5\times10^{15}$ operations per second, corresponding to around 25 000 laptops combined. Even if every person on Earth did one math operation per second, the combined performance would still be about 1 million times lower than that of Mahti.
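These comparisons are simple ratios, which the following sketch reproduces with the rounded figures from the text. The world population of 7.8 billion is an assumption (an approximate 2020 value) that is not stated above.

```python
LAPTOP_FLOPS = 300e9       # modern laptop, as quoted in the text
MAHTI_FLOPS = 7.5e15       # Mahti, as quoted in the text
TOP1_2020_FLOPS = 440e15   # number one system in November 2020, as quoted
WORLD_POPULATION = 7.8e9   # assumption: approximate 2020 world population


def equivalent_count(system_flops, unit_flops):
    """How many units of the given performance add up to the system."""
    return system_flops / unit_flops


print(f"Mahti in laptops:      {equivalent_count(MAHTI_FLOPS, LAPTOP_FLOPS):,.0f}")
print(f"2020 number one:       {equivalent_count(TOP1_2020_FLOPS, LAPTOP_FLOPS):,.0f}")
# Everyone on Earth doing one operation per second is WORLD_POPULATION FLOP/s.
print(f"Mahti versus humanity: {equivalent_count(MAHTI_FLOPS, WORLD_POPULATION):,.0f}")
```

With these inputs the results land close to the figures quoted above: tens of thousands of laptops for Mahti, over a million for the 2020 number one system, and a factor of roughly one million between Mahti and all of humanity.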

The LUMI supercomputer will have a theoretical peak performance of 550 PFLOP/s and is expected to be in the top 10 supercomputers of the world when it is installed in 2021.

Performance terminology:

| Operations per second | Scientific notation | Metric prefix | Unit |
|---|---|---|---|
| 1 000 | $10^{3}$ | Kilo | KFLOP/s |
| 1 000 000 | $10^{6}$ | Mega | MFLOP/s |
| 1 000 000 000 | $10^{9}$ | Giga | GFLOP/s |
| 1 000 000 000 000 | $10^{12}$ | Tera | TFLOP/s |
| 1 000 000 000 000 000 | $10^{15}$ | Peta | PFLOP/s |
| 1 000 000 000 000 000 000 | $10^{18}$ | Exa | EFLOP/s |
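As a small illustration of how the prefixes are applied, the following sketch converts a raw FLOP/s value into the largest fitting unit from the terminology above. The helper name `format_flops` is hypothetical.

```python
# Units from the terminology above, largest first.
PREFIXES = [
    (1e18, "EFLOP/s"),
    (1e15, "PFLOP/s"),
    (1e12, "TFLOP/s"),
    (1e9, "GFLOP/s"),
    (1e6, "MFLOP/s"),
    (1e3, "KFLOP/s"),
]


def format_flops(value):
    """Express a FLOP/s value using the largest prefix that does not exceed it."""
    for scale, unit in PREFIXES:
        if value >= scale:
            return f"{value / scale:g} {unit}"
    return f"{value:g} FLOP/s"


print(format_flops(48e9))    # 48 GFLOP/s
print(format_flops(7.5e15))  # 7.5 PFLOP/s
```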