Scaling next-gen AI will require a shift towards tightly integrated, domain-specific, compute-centric networking, built on specialized interconnects that support direct memory-to-memory transfers and dedicated hardware that speeds information sharing among processors

August 5, 2025 // by Finnovate

Fulfilling the promise of AI requires a step-change in capabilities far exceeding the advancements of the internet era. To achieve this, we as an industry must revisit some of the foundations that drove the previous transformation and innovate collectively to rethink the entire technology stack.

We are now witnessing a decisive shift towards specialized hardware, including ASICs, GPUs and tensor processing units (TPUs), that delivers orders-of-magnitude improvements in performance per dollar and per watt compared to general-purpose CPUs. This proliferation of domain-specific compute units, optimized for narrower tasks, will be critical to driving continued rapid advances in AI.

These specialized systems will often require “all-to-all” communication, with terabit-per-second bandwidth and nanosecond-scale latencies that approach local memory speeds. To scale gen AI workloads across vast clusters of specialized accelerators, we are seeing the rise of specialized interconnects, such as ICI (inter-chip interconnect) for TPUs and NVLink for GPUs. These purpose-built networks prioritize direct memory-to-memory transfers and use dedicated hardware to speed information sharing among processors, effectively bypassing the overhead of traditional, layered networking stacks. This move towards tightly integrated, compute-centric networking will be essential to overcoming communication bottlenecks and scaling the next generation of AI efficiently (a minimal sketch of such a collective operation follows below).

Traditional fault tolerance relies on redundancy among loosely connected systems to achieve high uptime, but ML computing demands a different approach. First, the sheer scale of computation makes over-provisioning too costly. Second, model training is a tightly synchronized process, where a single failure can cascade across thousands of processors. Finally, advanced ML hardware often pushes the boundary of current technology, potentially leading to higher failure rates. The emerging strategy instead involves frequent checkpointing (saving computation state) coupled with real-time monitoring, rapid allocation of spare resources and quick restarts (see the checkpoint-and-restart sketch below). The underlying hardware and network design must enable swift failure detection and seamless component replacement to maintain performance.

While traditional system design focuses on maximum performance per chip, we must shift to an end-to-end design focused on delivered, at-scale performance per watt. This approach is vital because it considers all system components (compute, network, memory, power delivery, cooling and fault tolerance) working together seamlessly to sustain performance.
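To make the interconnect point concrete, here is a minimal data-parallel sketch in JAX: each device computes a gradient on its own data shard, and jax.lax.pmean performs the all-reduce over the accelerator fabric (ICI on TPUs, NVLink on GPUs) rather than a conventional layered network stack. The toy loss, shapes and names such as train_step are illustrative assumptions, not details from the article.

```python
import functools

import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Toy squared-error loss; the "model" is a single linear layer.
    return jnp.mean((x @ w - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(w, x, y):
    g = jax.grad(loss)(w, x, y)
    # All-reduce: every device contributes its local gradient and receives
    # the mean, moved device-to-device over the accelerator interconnect.
    g = jax.lax.pmean(g, axis_name="devices")
    return w - 0.01 * g

n = jax.local_device_count()
weights = jnp.stack([jnp.zeros((4, 1))] * n)  # one weight replica per device
xs = jnp.ones((n, 8, 4))                      # one data shard per device
ys = jnp.ones((n, 8, 1))
weights = train_step(weights, xs, ys)
```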
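The checkpoint-and-restart strategy can be sketched just as simply. Below is a minimal skeleton in plain Python: state is saved every CHECKPOINT_EVERY steps with a write-then-rename so a crash mid-save never corrupts the last good copy, and a restart resumes from the most recent checkpoint, bounding lost work. Every name here (CKPT, train_step, the interval) is a hypothetical stand-in; production systems use dedicated checkpointing libraries such as Orbax.

```python
import os
import pickle

CKPT = "train_state.pkl"  # hypothetical checkpoint path
CHECKPOINT_EVERY = 100    # steps between saves
TOTAL_STEPS = 1000

def init_state():
    return {"weights": 0.0}

def train_step(state):
    # Stand-in for one tightly synchronized training step; in a real
    # cluster this is where a hardware failure would surface.
    return {"weights": state["weights"] + 0.001}

def save_checkpoint(step, state):
    # Write to a temp file, then atomically rename, so a crash mid-save
    # never corrupts the last good checkpoint.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((step, state), f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)  # resume from the last save
    return 0, init_state()         # cold start

step, state = load_checkpoint()
while step < TOTAL_STEPS:
    state = train_step(state)      # a failure here loses at most
    step += 1                      # CHECKPOINT_EVERY steps of work
    if step % CHECKPOINT_EVERY == 0:
        save_checkpoint(step, state)
```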

Read Article

Category: Additional Reading

