The ever-growing cost of running AI models in production (known as inference) has long been a bottleneck for companies looking to deploy artificial intelligence at scale. Now, Google Cloud and NVIDIA have announced a collaboration that promises to change this picture radically.

New A5X Instances Promise Up to 10x Lower Costs

During the Google Cloud Next conference, which took place on April 22–23, 2026, the two tech giants presented what they describe as a new generation of AI infrastructure. The core of the offering consists of A5X bare-metal instances, built on NVIDIA Vera Rubin NVL72 rack-scale systems, according to AI News.

Through close coordination between hardware and software — what is known in the industry as “co-design” — the parties claim that the new architecture can deliver up to ten times lower inference cost per token and ten times higher token throughput per megawatt compared to the previous generation.

Ten times lower inference cost per token is not a marginal improvement — it's a potential restructuring of AI budgets across the entire industry.
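To make the scale of a 10x claim concrete, here is a minimal sketch of the budget impact. The baseline per-token price and the monthly workload below are invented for illustration; Google has published no A5X pricing.

```python
# Hypothetical illustration of a 10x per-token cost reduction.
# Both the baseline price and the workload are invented; no A5X pricing exists yet.

baseline_usd_per_million_tokens = 2.00   # assumed current inference price
claimed_improvement = 10                 # Google/NVIDIA's stated target

a5x_usd_per_million_tokens = baseline_usd_per_million_tokens / claimed_improvement

monthly_tokens = 5_000_000_000           # hypothetical 5 billion tokens/month
before = monthly_tokens / 1_000_000 * baseline_usd_per_million_tokens
after = monthly_tokens / 1_000_000 * a5x_usd_per_million_tokens
print(f"Monthly inference bill: ${before:,.0f} -> ${after:,.0f}")
```

Under these assumed numbers, a $10,000 monthly inference bill would drop to $1,000; the point is the order of magnitude, not the specific figures.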

It is important to emphasize that Google Cloud has not yet announced concrete hourly prices for the A5X instances. The claims of cost improvements are currently based on the company's own stated performance targets and cannot be independently verified.


Massive Scalability

One of the more striking technical specifications is the system's ability to scale. The A5X instances use NVIDIA ConnectX-9 SuperNICs combined with Google's own Virgo network. This is intended to enable clusters of up to 80,000 NVIDIA Rubin GPUs within a single data center, and further up to 960,000 GPUs distributed across multiple locations.
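Taken at face value, the two scaling figures imply on the order of a dozen sites at full per-data-center density, a quick sanity check on the numbers above:

```python
# Sanity check on the cited scaling figures: 80,000 GPUs per data center
# and 960,000 GPUs across multiple locations.
gpus_per_datacenter = 80_000
gpus_multi_site = 960_000

implied_sites = gpus_multi_site // gpus_per_datacenter
print(f"Implied number of fully packed data centers: {implied_sites}")  # 12
```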

The goal is to handle what is described as agentic AI and physical AI at scale — i.e., AI systems that act autonomously and potentially interact with the physical world.


Competitive Landscape: AWS Responds with Cuts

The A5X launch does not happen in a vacuum. Amazon Web Services has already implemented price reductions on its GPU infrastructure. According to available pricing information, an AWS P5 instance with eight NVIDIA H100 GPUs cost around $60 per hour before summer 2025. After AWS announced reductions of up to 45 percent, the price dropped to approximately $33–$34 per hour. Spot purchases and Savings Plans can, according to market data, push the GPU price down to $1.90–$2.10 per GPU-hour.
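As a rough sanity check on the AWS figures cited above, the arithmetic can be sketched as follows. The prices are the approximate numbers from this article, not official AWS list prices:

```python
# Illustrative cost arithmetic based on the approximate figures cited above.
# These are article figures, not official AWS list prices.

p5_on_demand_hourly = 60.0     # USD/hour for a P5 instance (8x H100), pre-summer 2025
reduction = 0.45               # announced cut of up to 45 percent
gpus_per_instance = 8

reduced_hourly = p5_on_demand_hourly * (1 - reduction)
per_gpu_hour = reduced_hourly / gpus_per_instance

print(f"Reduced instance price: ~${reduced_hourly:.0f}/hour")  # ~$33/hour
print(f"Per GPU-hour:           ~${per_gpu_hour:.2f}")         # ~$4.13
```

That on-demand per-GPU-hour figure of roughly $4 also shows how far below it the cited spot and Savings Plan rates of $1.90–$2.10 sit.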

AWS also offers its own custom-built chips. Inferentia-based instances are marketed with up to 70 percent lower cost per inference compared to comparable EC2 instances, while Trainium2 is claimed to provide 30–40 percent better price-performance than P5 instances.

- 10x: promised cost reduction (per token/inference) on A5X
- 45%: AWS price reduction on H100 P5 instances (2025)

What Does This Mean for Norwegian Companies?

For Norwegian businesses that are already running or planning to run AI in production, this development is potentially significant. Inference costs — that is, the cost of actually using a pre-trained model — constitute the largest ongoing AI expense for many companies, often surpassing the cost of training itself.
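Why inference tends to dominate can be shown with a simple break-even sketch. All figures below are invented for illustration:

```python
# Hypothetical break-even between a one-time training spend and ongoing
# inference (serving) costs. All figures are invented for illustration.

training_cost = 500_000.0          # USD, one-time training/fine-tuning spend
monthly_inference_cost = 60_000.0  # USD, ongoing serving cost

months_to_overtake = training_cost / monthly_inference_cost
print(f"Inference spend overtakes training after ~{months_to_overtake:.1f} months")
```

Under these assumed numbers, cumulative serving costs pass the one-time training bill in well under a year, which is why a 10x cut in inference cost matters more than cheaper training for most production deployments.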

If Google Cloud's claims hold true in practice, companies in finance, healthcare, energy, and industry — sectors where Norway has major players — could see significantly lower operating costs for AI-based systems. However, since A5X is currently aimed at massive scale, it is primarily the largest players who will get access at first.

Inference costs are the hidden expense in Norwegian AI initiatives — and now prices are being pushed down from multiple fronts.

Until Google Cloud publishes actual prices and independent benchmarks are available, Norwegian IT managers and purchasers should treat the promised performance figures as indications rather than guarantees. The competition between Google and AWS is real, however, and the pressure on prices seems to continue downwards regardless of which platform ends up taking the lead.