MLPerf Inference Benchmark Data Download
Quick Start: Copy one of the commands below and run it in your terminal to download the indicated model or dataset.
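Every command on this page has the same shape: `curl -s` quietly fetches the downloader script, `bash <(...)` runs it through process substitution, and the last argument is a metadata `.uri` address that identifies the item. If you would rather save the script to a file before running it, a minimal sketch could look like the following. The helper names `mlc_uri` and `mlc_download` are hypothetical, and the sketch assumes the script behaves the same when run from a saved file as it does through process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; not part of the MLCommons tooling.
MLC_DOWNLOADER_URL="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"
MLC_METADATA_BASE="https://inference.mlcommons-storage.org/metadata"

# mlc_uri NAME: print the metadata URI for NAME (e.g. whisper-model).
mlc_uri() {
  printf '%s/%s.uri\n' "$MLC_METADATA_BASE" "$1"
}

# mlc_download NAME [DEST]: save the downloader script to a temp file, then
# run it, passing -d DEST only when DEST is given (as the commands here do).
mlc_download() {
  local name="$1" dest="${2:-}" script rc
  script="$(mktemp)" || return 1
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  if ! curl -fsSL "$MLC_DOWNLOADER_URL" -o "$script"; then
    rm -f "$script"
    return 1
  fi
  if [ -n "$dest" ]; then
    bash "$script" -d "$dest" "$(mlc_uri "$name")"
  else
    bash "$script" "$(mlc_uri "$name")"
  fi
  rc=$?
  rm -f "$script"
  return "$rc"
}
```

For example, `mlc_download whisper-dataset whisper/dataset` would be expected to mirror the Whisper dataset command listed on this page.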
Available Downloads
DeepSeek-R1 Benchmark
DeepSeek-R1-0528 Model
DeepSeek-R1-0528 model for the DeepSeek-R1 benchmark (~689GB)
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
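Because the model is roughly 689GB, it may be worth confirming that the target filesystem has enough room before starting. The sketch below is one way to do that; `has_free_space` is a hypothetical helper, and the 700 threshold in the usage comment is an assumed figure that leaves a little headroom.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the MLCommons tooling.
# has_free_space DIR REQUIRED_GB: succeed if the filesystem holding DIR has
# at least REQUIRED_GB available (counting 1 GB as 1024 * 1024 KB).
has_free_space() {
  local dir="$1" required_gb="$2" avail_kb
  # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}') || return 1
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Usage sketch (not executed here):
# if has_free_space . 700; then
#   bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
# else
#   echo "Not enough free space for the DeepSeek-R1-0528 model" >&2
# fi
```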
DeepSeek-R1 Datasets
Full preprocessed dataset and calibration dataset for the DeepSeek-R1 benchmark (~163MB)
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d ./ https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
Llama 3.1 8b Benchmark
Full CNN evaluation dataset (Inference Datacenter)
CNN dataset for the Llama 3.1 8b Inference Datacenter benchmark (~267MB)
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-eval.uri
5,000-sample CNN evaluation dataset (Inference Edge)
5,000-sample CNN dataset for the Llama 3.1 8b Inference Edge benchmark (~101MB)
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/llama3-1-8b-sample-cnn-eval-5000.uri
CNN-DailyMail calibration dataset
CNN-DailyMail calibration dataset for the Llama 3.1 8b benchmark (~21MB)
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-dailymail-calibration.uri
Whisper Benchmark
Whisper Model
Whisper large-v3 model for the Whisper benchmark (~25GB)
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/model https://inference.mlcommons-storage.org/metadata/whisper-model.uri
Whisper Dataset
LibriSpeech dataset for the Whisper benchmark (~4.6GB)
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/dataset https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
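The two Whisper commands pass `-d` with different paths, which suggests that `-d` selects the download directory; that reading is an assumption, since this page does not document the flag. Under that assumption, a small sketch for creating and checking a target directory ahead of time might look like this (`prepare_dest` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the MLCommons tooling.
# prepare_dest DIR: create DIR if needed, check that it is writable,
# and print its absolute path.
prepare_dest() {
  local dest="$1"
  mkdir -p "$dest" || return 1
  [ -w "$dest" ] || return 1
  (cd "$dest" && pwd)
}

# Usage sketch (not executed here):
# prepare_dest whisper/model && prepare_dest whisper/dataset
```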