
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Nvidia proposes “speculative decoding,” which uses a second, smaller model to guess what the main model will output for a given prompt in an attempt to speed it up

August 26, 2025 //  by Finnovate

Nvidia announced advances in artificial intelligence software and networking aimed at accelerating AI infrastructure and model deployment. It unveiled Spectrum-XGS, or “giga-scale,” an extension of its Spectrum-X Ethernet switching platform designed for AI workloads. Spectrum-X connects entire clusters within a data center, allowing massive datasets to stream across AI models; Spectrum-XGS extends this by orchestrating and interconnecting multiple data centers. “We’re introducing this new term, ‘scale across,’” said Dave Salvator, director of accelerated computing products at Nvidia. “These switches are basically purpose built to enable multi-site scale with different data centers able to communicate with each other and essentially act as one gigantic GPU.” Salvator said the system minimizes jitter and latency: the variability in packet arrival times and the delay between sending data and receiving a response.

Nvidia also highlighted Dynamo, its inference-serving framework, which governs how models are deployed and serve requests. In addition, the company is researching “speculative decoding,” which uses a second, smaller model to guess what the main model will output for a given prompt in an attempt to speed it up. “The way that this works is you have what’s called a draft model, which is a smaller model which attempts to sort of essentially generate potential next tokens,” said Salvator. Because the smaller model is faster but less accurate, it can generate multiple guesses for the main model to verify. “And we’ve already seen about a 35% performance gain using these techniques.” According to Salvator, the main AI model verifies the draft’s tokens in parallel against its learned probability distribution; only accepted tokens are committed, and rejected tokens are discarded. This keeps latency under 200 milliseconds, which he described as “snappy and interactive.”
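The draft-then-verify loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Nvidia's Dynamo implementation: `target_next` and `draft_next` are hypothetical stand-ins for the large and small models, and the verification loop stands in for what would be a single batched forward pass on the GPU.

```python
import random

random.seed(0)
VOCAB = list(range(100))

def target_next(context):
    # Stand-in for the main ("target") model: a deterministic
    # hash-based next-token choice that serves as ground truth.
    return hash(tuple(context)) % len(VOCAB)

def draft_next(context):
    # Stand-in for the small draft model: cheaper but less accurate.
    # Here it agrees with the target ~80% of the time.
    if random.random() < 0.8:
        return target_next(context)
    return random.choice(VOCAB)

def speculative_decode(prompt, n_tokens, k=4):
    """Greedy speculative-decoding sketch.

    The draft model proposes k tokens at a time; the target model
    verifies all k positions at once (one "pass"), commits the
    accepted prefix, and substitutes its own token at the first
    mismatch. Rejected draft tokens are discarded.
    """
    out = list(prompt)
    target_passes = 0  # each verification batch counts as one target pass
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft model speculates k tokens sequentially (it is cheap).
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target model verifies the k positions in one pass.
        target_passes += 1
        accepted = []
        for t in draft:
            expect = target_next(out + accepted)
            if t == expect:
                accepted.append(t)       # draft token verified: commit it
            else:
                accepted.append(expect)  # reject: commit target's token instead
                break
        out.extend(accepted)
    return out[len(prompt):][:n_tokens], target_passes

tokens, passes = speculative_decode([1, 2, 3], n_tokens=32, k=4)
print(len(tokens), passes)  # 32 tokens in fewer than 32 target passes
```

Because each committed token is checked against the target model's own choice, the output is identical to decoding with the target model alone; the speedup comes from covering several tokens per verification pass whenever the draft guesses correctly.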


Category: AI & Machine Economy, Innovation Topics


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188
