NVIDIA Targets About 90% Reduction in Token Generation Costs With Rubin Platform

Published January 6, 2026 · By Abhishek

At CES 2026, Nvidia CEO Jensen Huang presented Rubin, the company’s first “extreme-codesigned” AI platform. Nvidia says the new platform will reduce token-generation costs by roughly 90% compared with Blackwell, its predecessor. The Blackwell successor, now in production, is named after astronomer Vera Rubin.

The platform includes Rubin GPUs capable of 50 petaflops of NVFP4 inference, Vera CPUs designed for data movement and agentic processing, NVLink 6 (scale‑up networking), Spectrum‑X Ethernet Photonics (scale‑out networking), ConnectX‑9 SuperNICs, and BlueField‑4 DPUs. Huang explained that all of these components were codesigned because reaching gigascale AI requires chips, trays, racks, networking, storage, and software to be fully optimized and free of bottlenecks, keeping training and inference costs as low as possible.
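To give a sense of what NVFP4 inference involves: NVFP4 is built on a 4-bit floating-point encoding (E2M1), which can only represent a handful of magnitudes, so values are quantized in small blocks that share a scale factor. The sketch below illustrates that general idea in plain Python; the block size, scale choice, and rounding here are illustrative assumptions, not NVIDIA’s actual implementation.

```python
# Illustrative sketch of 4-bit E2M1 quantization, the encoding underlying
# NVFP4. Assumes a simple shared per-block scale; NOT NVIDIA's actual scheme.
# An E2M1 value can represent only the magnitudes below (plus sign).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize a block of floats to E2M1 values with one shared scale."""
    # Scale so the largest magnitude in the block maps to E2M1's max (6.0).
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0
    quantized = []
    for x in block:
        # Snap the scaled magnitude to the nearest representable value.
        mag = min(E2M1_MAGNITUDES, key=lambda v: abs(abs(x) / scale - v))
        quantized.append(mag if x >= 0 else -mag)
    return quantized, scale

def dequantize_block(quantized, scale):
    """Recover approximate floats from E2M1 values and the block scale."""
    return [q * scale for q in quantized]

vals = [0.1, -0.8, 2.4, -6.0]
q, s = quantize_block(vals)
approx = dequantize_block(q, s)
```

Each weight then costs 4 bits plus a small amortized share of the scale factor, which is where the memory and throughput savings of low-precision inference come from.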

Completing the platform is the NVIDIA Inference Context Memory Storage Platform, an AI-native KV-cache tier that delivers 5x improvements in cost efficiency, performance, and power efficiency in long-context inference. “The faster you train AI models, the faster you can get the next frontier out to the world,” said Huang. “This is your time to market. This is technology leadership.”
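For context on what a KV-cache tier stores: in autoregressive inference, the attention keys and values computed for past tokens are cached so each new token reuses them instead of recomputing the whole sequence; long contexts make this cache very large, which is what a dedicated storage tier targets. The sketch below shows the basic KV-cache mechanism in plain Python; it is a generic illustration of the technique, not NVIDIA’s platform.

```python
# Minimal sketch of KV caching in autoregressive attention: keys/values for
# past tokens are stored once and reused by every later query. Generic
# illustration only; not NVIDIA's Inference Context Memory Storage Platform.
import math

class KVCache:
    def __init__(self):
        self.keys = []    # one key vector per cached token
        self.values = []  # one value vector per cached token

    def append(self, k, v):
        """Cache the key/value vectors computed for a new token."""
        self.keys.append(k)
        self.values.append(v)

    def attend(self, query):
        """Scaled dot-product attention of a new query over cached tokens."""
        d = len(query)
        scores = [sum(qi * ki for qi, ki in zip(query, k)) / math.sqrt(d)
                  for k in self.keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the cached value vectors.
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(d)]

cache = KVCache()
cache.append([1.0, 0.0], [2.0, 0.0])  # token 1
cache.append([0.0, 1.0], [0.0, 2.0])  # token 2
out = cache.attend([1.0, 0.0])        # new token attends over the cache
```

Note that the cache grows linearly with context length, and each cached vector must be read back on every decoding step; that access pattern is why cost, bandwidth, and power efficiency of the KV tier matter so much for long-context inference.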
