This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
SGLang now achieves 7,583 tokens per second per GPU running the R1 AI model on the GB200 NVL72, 2.7x the per-GPU throughput of the H100.
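The headline numbers can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not SGLang's or NVIDIA's benchmarking methodology. It backs out the approximate H100 per-GPU rate implied by the rounded 2.7x ratio. It then multiplies the quoted rate by the 72 GPUs in an NVL72 rack, assuming every GPU sustains that rate. Finally, it shows how cost per token scales inversely with throughput at a fixed hourly GPU price. The price `HYPOTHETICAL_GPU_HOUR_USD` is an illustrative placeholder, not a published figure.

```python
# Figures from the post: per-GPU throughput on GB200 NVL72 and the stated speedup.
GB200_TOKENS_PER_SEC_PER_GPU = 7583
SPEEDUP_VS_H100 = 2.7          # rounded ratio as reported
GPUS_PER_NVL72 = 72            # a GB200 NVL72 rack contains 72 Blackwell GPUs
HYPOTHETICAL_GPU_HOUR_USD = 4.0  # hypothetical price, for illustration only


def implied_h100_rate(gb200_rate: float, speedup: float) -> float:
    """Approximate H100 per-GPU rate implied by the (rounded) speedup ratio."""
    return gb200_rate / speedup


def rack_throughput(per_gpu_rate: float, num_gpus: int) -> float:
    """Aggregate rack rate, assuming every GPU sustains the per-GPU figure."""
    return per_gpu_rate * num_gpus


def cost_per_million_tokens(gpu_hour_price: float, per_gpu_rate: float) -> float:
    """Cost of generating 1M tokens on one fully utilized GPU at an hourly price."""
    tokens_per_hour = per_gpu_rate * 3600
    return gpu_hour_price / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    h100_rate = implied_h100_rate(GB200_TOKENS_PER_SEC_PER_GPU, SPEEDUP_VS_H100)
    rack_rate = rack_throughput(GB200_TOKENS_PER_SEC_PER_GPU, GPUS_PER_NVL72)
    print(f"Implied H100 rate: ~{h100_rate:.0f} tokens/s per GPU")
    print(f"NVL72 rack aggregate: {rack_rate:,.0f} tokens/s")
    # At the same hypothetical hourly price, cost per token falls by the speedup factor.
    gb200_cost = cost_per_million_tokens(HYPOTHETICAL_GPU_HOUR_USD, GB200_TOKENS_PER_SEC_PER_GPU)
    h100_cost = cost_per_million_tokens(HYPOTHETICAL_GPU_HOUR_USD, h100_rate)
    print(f"Cost per 1M tokens: GB200 ${gb200_cost:.3f} vs H100 ${h100_cost:.3f}")
```

In practice the two systems are priced differently per GPU-hour, so the real cost-per-token gap depends on pricing and utilization as well as raw throughput.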
We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at