Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)

Model Type

Text

Max Context

Max Output

Parameter Scale

253B

Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) is a text model from NVIDIA, with a time to first token (TTFT) of 0.721 s and an output speed of 42.5 tokens/s.

Input Modality

Output Modality

Inference Speed

42.502 tokens/s

Latest Release Date

4/7/2025

SDK Ecosystem

artificial-analysis, manufacturer, aa-bootstrap

Model Overview Fields

Name

Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)

Release date

4/7/2025

Performance

42.502 tokens/s

Model creator

NVIDIA

AA Evaluation Scores

AA Intelligence Index

AA Coding Index

13.1

AA Math Index

63.7

MMLU Pro

0.825

GPQA

0.728

HLE

0.081

LiveCodeBench

0.641

SciCode

0.347

Math-500

0.952

AIME

0.747

AIME 2025

0.637

IFBench

0.382

LCR

0.073

TerminalBench Hard

0.023

TAU2

0.114

Output Speed (tokens/s)

42.502 tok/s

TTFT (s)

0.721s

First Answer Token (s)

47.778s
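
The latency figures listed here can be related with simple arithmetic. The sketch below is illustrative only. It assumes, and the listing does not state this, that tokens are generated at a constant 42.502 tokens/s once the first token arrives, and that the gap between TTFT and the first answer token is spent producing reasoning tokens. The function names are hypothetical.

```python
# Figures copied from the listing.
TTFT_S = 0.721                 # time to first token, in seconds
FIRST_ANSWER_TOKEN_S = 47.778  # time to first answer token, in seconds
OUTPUT_SPEED_TPS = 42.502      # output speed, in tokens per second


def estimated_reasoning_tokens(ttft_s=TTFT_S,
                               first_answer_s=FIRST_ANSWER_TOKEN_S,
                               speed_tps=OUTPUT_SPEED_TPS):
    """Tokens produced between the first token and the first answer token.

    Assumption: generation runs at a constant speed over that interval.
    """
    return (first_answer_s - ttft_s) * speed_tps


def estimated_response_time(answer_tokens,
                            first_answer_s=FIRST_ANSWER_TOKEN_S,
                            speed_tps=OUTPUT_SPEED_TPS):
    """Rough seconds until an answer of `answer_tokens` tokens is complete.

    Assumption: the answer streams at the same constant speed.
    """
    return first_answer_s + answer_tokens / speed_tps


if __name__ == "__main__":
    print(f"{estimated_reasoning_tokens():.0f} reasoning tokens (estimate)")
    print(f"{estimated_response_time(500):.1f} s for a 500-token answer (estimate)")
```

Under these assumptions the model spends roughly 2,000 tokens on reasoning before the answer begins, so the first answer token time, not TTFT, dominates the perceived wait.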

$0.60/M / $1.80/M
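
A per-request cost can be estimated from the two prices listed here. The sketch below assumes they are USD per million input tokens and USD per million output tokens, in that order; the listing does not label them. It also assumes reasoning tokens are billed as output tokens. The function name is hypothetical.

```python
# Prices copied from the listing; the input/output labels are an assumption.
INPUT_PRICE_PER_M_USD = 0.60
OUTPUT_PRICE_PER_M_USD = 1.80


def estimate_cost_usd(input_tokens, output_tokens,
                      input_price=INPUT_PRICE_PER_M_USD,
                      output_price=OUTPUT_PRICE_PER_M_USD):
    """Estimated cost in USD for one request.

    `output_tokens` should include reasoning tokens if they are billed
    as output (an assumption, not stated in the listing).
    """
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with 2,500 tokens of output.
    print(f"${estimate_cost_usd(10_000, 2_500):.4f}")
```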
