2025-06-17 00:46:44

An organization (SGLang) now achieves 7,583 tokens per second per GPU running an AI model, R1, on the GB200 NVL72, a 2.7x leap over H100.

We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at


