KCORES Graphics Card Memory Leaderboard for LLMs Inference
This evaluation focuses on the memory bandwidth of graphics cards and ranks them by bandwidth availability. Here, bandwidth availability means how well a combination of graphics cards can supply the memory bandwidth needed to run a specific model (llama-3.1-70b-instruct-4bit, which requires approximately 48GB of VRAM) when a parallel inference framework such as mlc-llm spreads the computation across the cards.
The score is calculated as:

VMEM_Availability = ( VMEM_Bandwidth * Num ) / LLMs_Size

where VMEM_Bandwidth is the memory bandwidth (GB/s) of a single card of the given model, Num is the minimum number of those cards needed to run the 48GB model, and LLMs_Size is 48GB. VMEM_Availability is the final availability score.
The formula assumes that generating each token requires reading a region of memory equal to the model size, i.e. all of the model weights, once. Dividing the combined memory bandwidth of the cards by the model size therefore gives the throughput of that combination, which can be read simply as the theoretical maximum number of tokens per second it can output.
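As a quick check of the formula, the minimal Python sketch below recomputes a few rows of the leaderboard. The helper name `vmem_availability` is hypothetical, and the per-card bandwidth values (760.3, 512 and 546 GB/s) are assumed spec figures chosen because they reproduce the corresponding table rows; the code does not fetch anything from TechPowerUp.

```python
# Sketch of the VMEM_Availability formula under the stated assumptions.
LLM_SIZE_GB = 48.0  # llama-3.1-70b-instruct-4bit, approximate VRAM footprint


def vmem_availability(bandwidth_gbs: float, num: int,
                      llm_size_gb: float = LLM_SIZE_GB) -> float:
    """Hypothetical helper: (per-card bandwidth * Num) / model size.

    Interpreted as an upper bound on tokens/s, assuming every token
    reads all model weights from memory exactly once.
    """
    return bandwidth_gbs * num / llm_size_gb


# name: (assumed per-card bandwidth in GB/s, Num from the leaderboard)
examples = {
    "NVIDIA GeForce RTX 3080 10GB": (760.3, 8),
    "Intel Arc A750 8GB": (512.0, 8),
    "Apple MacBook Pro M4 Max 48GB": (546.0, 2),
}

for name, (bw, num) in examples.items():
    print(f"{name}: x{num} -> {vmem_availability(bw, num):.2f}")
```

Rounded to two decimals, these give 126.72, 85.33 and 22.75, matching the RTX 3080 10GB, Arc A750 8GB and MacBook Pro M4 Max 48GB rows.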
| Device | Num | VMEM_Availability |
|---|---|---|
| NVIDIA GB200 NVL72 | x1 | 12000.00 |
| NVIDIA GB200 Grace Blackwell Superchip | x1 | 333.33 |
| NVIDIA B200 SXM 192GB | x1 | 170.83 |
| NVIDIA GeForce RTX 3080 10GB | x8 | 126.72 |
| NVIDIA GeForce RTX 2080 Ti 11GB | x8 | 102.67 |
| NVIDIA GeForce RTX 3070 Ti 8GB | x8 | 101.38 |
| NVIDIA GeForce RTX 3060 Ti GDDR6X 8GB | x8 | 101.38 |
| NVIDIA Tesla V100 SXM2 16GB (version 2019) | x4 | 94.17 |
| NVIDIA GeForce RTX 5080 16GB (Preliminary) | x4 | 85.33 |
| Intel Arc A750 8GB | x8 | 85.33 |
| Intel Arc A580 8GB | x8 | 85.33 |
| NVIDIA GRID/DRIVE A100A 32GB | x2 | 77.92 |
| NVIDIA GeForce RTX 3080 12GB (Ti 12GB) | x4 | 76.03 |
| NVIDIA Tesla V100 PCIe/SXM2 16GB | x4 | 74.75 |
| NVIDIA GeForce RTX 3070 8GB | x8 | 74.67 |
| NVIDIA GeForce RTX 3060 Ti 8GB | x8 | 74.67 |
| NVIDIA GeForce RTX 2070 8GB | x8 | 74.67 |
| NVIDIA GeForce RTX 2080 8GB | x8 | 74.67 |
| NVIDIA GeForce RTX 5090 32GB (Preliminary) | x2 | 74.67 |
| NVIDIA GeForce RTX 5070 Ti 16GB (Preliminary) | x4 | 74.67 |
| NVIDIA GeForce RTX 5060 8GB (Preliminary) | x8 | 74.67 |
| NVIDIA H100 PCIe/SXM5 96GB | x1 | 70.00 |
| NVIDIA H100 SXM5 80GB | x1 | 70.00 |
| NVIDIA H800 SXM5 80GB | x1 | 70.00 |
| NVIDIA Quadro RTX 4000 8GB | x8 | 69.33 |
| NVIDIA A100 PCIe/SXM4 40GB | x2 | 65.00 |
| NVIDIA A800 40GB Active Ampere | x2 | 65.00 |
| NVIDIA GeForce RTX 3080 Ti 20GB | x4 | 63.36 |
| Intel Arc B570 10GB | x8 | 63.33 |
| NVIDIA GeForce RTX 4080 SUPER 16GB | x4 | 61.36 |
| NVIDIA Quadro GP100 16GB | x4 | 61.02 |
| NVIDIA Tesla P100 SXM2/DGXS 16GB | x4 | 61.02 |
| NVIDIA GeForce RTX 4080 16GB | x4 | 59.73 |
| NVIDIA GeForce RTX 4070 Ti SUPER 16GB | x4 | 56.03 |
| NVIDIA GeForce RTX 5070 12GB (Preliminary) | x4 | 56.00 |
| NVIDIA TITAN V 12GB | x4 | 54.28 |
| NVIDIA RTX A4500 Ampere 20GB | x4 | 53.33 |
| NVIDIA A30X 24GB | x2 | 50.83 |
| NVIDIA GeForce RTX 3070 Ti 16GB | x4 | 50.69 |
| NVIDIA GeForce RTX 4060 Ti 8GB | x8 | 48.00 |
| NVIDIA Tesla V100S PCIe 32GB | x2 | 47.08 |
| Intel Arc A770 16GB | x4 | 46.67 |
| NVIDIA Tesla P100 PCIe 12GB | x4 | 45.76 |
| NVIDIA TITAN Xp 12GB | x4 | 45.63 |
| NVIDIA GeForce RTX 4060 8GB | x8 | 45.33 |
| NVIDIA H100 PCIe/CNX 80GB | x1 | 42.50 |
| NVIDIA A100/A100X SXM4 80GB | x1 | 42.50 |
| NVIDIA A800 SXM4 80GB | x1 | 42.50 |
| NVIDIA H800 PCIe 80GB | x1 | 42.50 |
| NVIDIA H100 SXM5 64GB | x1 | 42.08 |
| NVIDIA GeForce RTX 4090 24GB | x2 | 42.08 |
| NVIDIA GeForce RTX 3090 Ti 24GB | x2 | 42.08 |
| NVIDIA GeForce RTX 4080 12GB | x4 | 42.02 |
| NVIDIA GeForce RTX 4070 Ti 12GB | x4 | 42.02 |
| NVIDIA GeForce RTX 4070 SUPER 12GB | x4 | 42.02 |
| NVIDIA GeForce RTX 4070 12GB | x4 | 42.02 |
| NVIDIA Tesla V100 SXM3 32GB | x2 | 40.88 |
| NVIDIA Quadro P4000 8GB | x8 | 40.55 |
| NVIDIA A100 PCIe 80GB | x1 | 40.42 |
| NVIDIA A800 80GB Active Ampere | x1 | 40.42 |
| NVIDIA GeForce RTX 3060 8GB | x8 | 40.00 |
| NVIDIA GeForce RTX 3090 24GB | x2 | 39.01 |
| NVIDIA GRID A100B 48GB | x1 | 38.96 |
| NVIDIA A30 PCIe 24GB | x2 | 38.88 |
| Intel Arc B580 12GB | x4 | 38.00 |
| NVIDIA Tesla V100 PCIe/SXM2/DGXS 32GB | x2 | 37.42 |
| NVIDIA RTX A4000 Ampere 16GB | x4 | 37.33 |
| NVIDIA Quadro RTX 5000 16GB | x4 | 37.33 |
| NVIDIA GeForce RTX 3050 8GB | x8 | 37.33 |
| NVIDIA Quadro GV100 32GB | x2 | 36.18 |
| NVIDIA TITAN V CEO Edition 32GB | x2 | 36.18 |
| NVIDIA L40/L40G 24GB | x2 | 36.00 |
| NVIDIA GeForce RTX 4080 Mobile 12GB | x4 | 36.00 |
| NVIDIA Tesla T10 16GB | x4 | 35.85 |
| Apple MacMini M2 Pro 16GB | x8 | 33.33 |
| NVIDIA Tesla P4 8GB | x8 | 32.05 |
| NVIDIA RTX A5500 Ampere 24GB | x2 | 32.00 |
| NVIDIA RTX A5000 Ampere 24GB | x2 | 32.00 |
| NVIDIA RTX A1000 Ampere 8GB | x8 | 32.00 |
| NVIDIA GeForce RTX 3060 12GB | x4 | 30.00 |
| NVIDIA RTX 4000 Ada 20GB | x4 | 30.00 |
| NVIDIA Tesla P10 24GB | x2 | 28.93 |
| NVIDIA Quadro RTX 6000 24GB | x2 | 28.00 |
| NVIDIA TITAN RTX 24GB | x2 | 28.00 |
| NVIDIA Tesla M60 16GB | x4 | 26.73 |
| NVIDIA RTX 4000 SFF Ada 20GB | x4 | 26.67 |
| NVIDIA T1000 8GB Turing | x8 | 26.67 |
| NVIDIA Tesla T4/T4G 16GB | x4 | 26.67 |
| NVIDIA A10M 24GB | x2 | 25.01 |
| NVIDIA A10/A10G PCIe 24GB | x2 | 25.01 |
| NVIDIA Tesla M40 12GB | x4 | 24.03 |
| NVIDIA RTX 5000 Ada 32GB | x2 | 24.00 |
| NVIDIA RTX A2000 12GB Ampere | x4 | 24.00 |
| NVIDIA GeForce RTX 4060 Ti 16GB | x4 | 24.00 |
| Apple MacMini M4 Pro 24GB | x4 | 22.75 |
| Apple MacBook Pro M4 Max 48GB | x2 | 22.75 |
| NVIDIA Tesla K80 24GB | x2 | 20.05 |
| NVIDIA RTX 6000 Ada 48GB | x1 | 20.00 |
| Apple MacMini M4 16GB | x8 | 20.00 |
| NVIDIA RTX 2000 Ada 16GB | x4 | 18.67 |
| NVIDIA Quadro P6000 24GB | x2 | 18.03 |
| NVIDIA L20 48GB | x1 | 18.00 |
| NVIDIA L40/L40S 48GB | x1 | 18.00 |
| NVIDIA RTX 5880 Ada 48GB | x1 | 18.00 |
| NVIDIA RTX 4500 Ada 24GB | x2 | 18.00 |
| Apple MacStudio M1 Max 32GB | x2 | 17.07 |
| Apple MacStudio M1 Ultra 64GB | x1 | 17.07 |
| Apple MacStudio M1 Ultra 128GB | x1 | 17.07 |
| Apple MacStudio M2 Max 32GB | x2 | 17.07 |
| Apple MacStudio M2 Ultra 64GB | x1 | 17.07 |
| Apple MacStudio M2 Ultra 128GB | x1 | 17.07 |
| Apple MacStudio M2 Ultra 192GB | x1 | 17.07 |
| NVIDIA Jetson Orin NX 8GB | x8 | 17.07 |
| Apple MacBook Pro M3 Max 48GB | x2 | 17.07 |
| NVIDIA Jetson Orin Nano Super 8GB | x8 | 17.00 |
| DDR6 12 Channel 8400 512GB | x1 | 16.80 |
| NVIDIA A16 PCIe 64GB | x1 | 16.68 |
| NVIDIA A2 16GB | x4 | 16.68 |
| Apple MacMini M2 16GB | x8 | 16.67 |
| NVIDIA RTX A6000 Ampere 48GB | x1 | 16.00 |
| NVIDIA A40 PCIe 48GB | x1 | 14.50 |
| NVIDIA Tesla P40 24GB | x2 | 14.46 |
| NVIDIA Quadro RTX 8000 48GB | x1 | 14.00 |
| NVIDIA Tesla M10 32GB | x2 | 13.87 |
| NVIDIA L4 24GB | x2 | 12.50 |
| NVIDIA Tesla M40 24GB | x2 | 12.02 |
| NVIDIA Jetson Orin Nano 8GB | x8 | 11.38 |
| NVIDIA Jetson AGX Xavier 16GB | x4 | 11.38 |
| Apple MacMini M4 Pro 48GB | x2 | 11.38 |
| Apple MacBook Pro M4 Max 64GB | x1 | 11.38 |
| Apple MacBook Pro M4 Max 128GB | x1 | 11.38 |
| Apple MacMini M1 16GB | x8 | 11.11 |
| NVIDIA Project DIGITS 128GB | x1 | 10.67 |
| Apple MacMini M4 24GB | x4 | 10.00 |
| NVIDIA Jetson Xavier NX 8GB | x8 | 9.95 |
| DDR5 12 Channel 4800 512GB | x1 | 9.38 |
| Apple MacStudio M1 Max 64GB | x1 | 8.53 |
| Apple MacStudio M2 Max 64GB | x1 | 8.53 |
| Apple MacStudio M2 Max 96GB | x1 | 8.53 |
| NVIDIA Jetson AGX Orin 32GB | x2 | 8.53 |
| NVIDIA Jetson Orin NX 16GB | x4 | 8.53 |
| Apple MacBook Pro M3 24GB | x4 | 8.53 |
| Apple MacBook Pro M3 Max 64GB | x1 | 8.53 |
| Apple MacBook Pro M3 Max 128GB | x1 | 8.53 |
| Apple MacMini M2 Pro 32GB | x2 | 8.33 |
| Apple MacMini M2 24GB | x4 | 8.33 |
| Apple MacBook Pro M3 Pro 36GB | x2 | 6.40 |
| DDR5 8 Channel 4800 512GB | x1 | 6.25 |
| NVIDIA Jetson AGX Xavier 32GB | x2 | 5.69 |
| Apple MacMini M4 Pro 64GB | x1 | 5.69 |
| Apple MacBook Pro M4 Pro 64GB | x1 | 5.69 |
| Apple MacMini M4 32GB | x2 | 5.00 |
| Apple MacBook Pro M4 32GB | x2 | 5.00 |
| NVIDIA Jetson Xavier NX 16GB | x4 | 4.98 |
| NVIDIA Jetson AGX Orin 64GB | x1 | 4.27 |
| DDR4 8 Channel 3200 512GB (EPYC SP3 LGA-4189) | x1 | 4.17 |
| DDR4 6 Channel 2933 384GB (LGA-3647) | x1 | 2.85 |
| DDR4 4 Channel 3200 256GB (LGA-2011-3) | x1 | 1.56 |
- The computation is split across cards with tensor parallelism, which only accepts a number of GPUs that is a power of 2. So although a 48GB model would theoretically fit on 6 GPUs with 8GB of memory each, 8 cards are needed to enable tensor parallelism. The number of GPUs has therefore been rounded up to the next power of 2.
- Models without Tensor Cores are marked in gray, indicating that these graphics cards may perform poorly when running LLMs. (You can read more about the advantages of Tensor Cores here: Tensor Cores.)
- All graphics card data is sourced from TechPowerUp.
- All Mac devices reserve 8GB of memory for the system, so only the remaining memory is counted toward holding the model.
- NVIDIA CMP series graphics cards are not listed. Although these cards are equipped with HBM memory, their floating-point performance is severely limited. For more information, see: nvidia-cmp-170hx-review.
- Some graphics cards carry multiple GPU dies; for these, the calculation simply sums the VRAM capacity and bandwidth of all dies. (NVIDIA B200 SXM 192GB has 2 dies, NVIDIA Tesla M60 16GB has 2, NVIDIA Tesla K80 24GB has 2, NVIDIA A16 PCIe 64GB has 4, and NVIDIA Tesla M10 32GB has 4.)
- Flash-Attention does not currently support first-generation Tensor Cores (the Volta generation, e.g. V100), so these graphics cards are also marked in gray. For more details, see: V100 GPU not supported.
- Unofficial modified cards are not listed, such as the 2070 modified to carry 22GB of memory.
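The card-count rules in the notes above (round the card count up to a power of 2 for tensor parallelism, and hold back 8GB on Mac devices) can be sketched as follows. This is an illustrative reconstruction from those stated rules, not the leaderboard's own script; `next_power_of_two` and `min_devices` are hypothetical names.

```python
import math

LLM_SIZE_GB = 48           # model footprint used by the leaderboard
MAC_SYSTEM_RESERVE_GB = 8  # memory held back for the system on Mac devices


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def min_devices(mem_gb: float, is_mac: bool = False,
                llm_size_gb: float = LLM_SIZE_GB) -> int:
    """Hypothetical helper: minimum device count (Num) for the model."""
    usable = mem_gb - MAC_SYSTEM_RESERVE_GB if is_mac else mem_gb
    if usable <= 0:
        raise ValueError("no memory left for the model")
    # Devices needed just to hold the weights.
    raw = math.ceil(llm_size_gb / usable)
    # Assumption from the notes: tensor parallelism needs a power-of-two count.
    return next_power_of_two(raw)


for label, mem, mac in [("8GB card", 8, False), ("11GB card", 11, False),
                        ("20GB card", 20, False), ("Mac 24GB", 24, True),
                        ("Mac 64GB", 64, True)]:
    print(f"{label}: x{min_devices(mem, mac)}")
```

This prints x8, x8, x4, x4 and x1, consistent with the Num values listed for, e.g., RTX 3070 8GB, RTX 2080 Ti 11GB, RTX A4500 20GB, MacMini M4 Pro 24GB and MacMini M4 Pro 64GB.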