
July 2026

July 1, 2026

New Release Deploy Pods with private AWS ECR images - BETA

New tutorial covering how to pull container images from private AWS ECR repositories into Runpod Pods using cross-account IAM delegation. Includes configuring ECR repository policies, adding ECR credentials in the Runpod console, and deploying a Pod with a private image, without managing credentials directly. Read the tutorial

Breaking Lifecycle operations are now CLI-only

Flash SDK methods for endpoint and app lifecycle operations—deploy, undeploy, update, and creating or deleting apps and environments—now raise a FlashUsageError that points to the equivalent flash command. Run these operations through the Flash CLI instead, which keeps the build and manifest pipeline and local state tracking consistent.

New Release High-Performance Network Volumes now available

You can now attach high-performance network volumes to Pods, Serverless endpoints, and Instant Clusters for significantly faster model load times. Look for the purple diamond icon to identify compatible datacenters.

New Release Deploy When Available

You can now request a GPU that’s currently out of capacity and get notified by email when it becomes available. Runpod saves your Pod configuration so you can deploy immediately when capacity returns.

Improvement Hub navigation consolidated

Hub navigation items are now consolidated into a single unified entry, making it easier to find templates and repos.

Bug Fix Billing records now show correct data for deleted resources

SKU, region, and creation timestamps now appear correctly in billing views and exports for deleted Pods and network volumes.

New Release Async Jobs for Serverless

You can now submit a job to a Serverless endpoint and retrieve the result asynchronously when capacity is available. Jobs queue and process automatically when a worker is free, with no always-on workers or polling loops required.

New Release Serverless Worker Fitness Checks

Serverless workers now run automated health checks before accepting jobs. Runpod automatically removes unhealthy workers from rotation, reducing failed requests and improving endpoint reliability.

New Release 24GB MIG instances now available

You can now partition H100 and RTX PRO 6000 GPUs into up to seven independent 24GB MIG (Multi-Instance GPU) instances, giving you more granular, lower-cost access without reserving a full card.

New Release Cost Centers now generally available

Cost Centers let teams allocate and track GPU spend by project, team, or business unit. Detailed cost breakdowns are now available in billing, and all users receive itemized invoices as of May 1.

Improvement New Pod deploy flow with workload-first GPU selection

The Pod deployment experience has been redesigned. Instead of picking a GPU first, you now choose a template or workload type, and GPUs are ranked as recommended, compatible, or incompatible for that workload. The new flow includes Save as Template, AI-assisted GPU selection, and a Notify Me When Available option for out-of-capacity cards.

New Release Flash is now generally available

Flash is now generally available. You can run Python functions on cloud GPUs using a single @Endpoint decorator, with no containers or infrastructure setup required. Workers scale automatically, dependencies install on remote workers, and you can deploy production APIs with flash deploy.

New Release Instant Cluster Expansion and Priority FlashBoot now live

Instant Clusters can now expand to more nodes faster, and Priority FlashBoot reduces cold-start times for cluster workers. Both features are live and require no configuration changes. Expanding an existing cluster is currently limited to Runpod admins, so to add nodes to an existing cluster, contact the Runpod team.

New Release FlashBoot for CPU Serverless now in public beta

CPU Serverless workers now support FlashBoot, dramatically reducing cold-start times for your CPU endpoints. GA is planned for later this quarter.

Improvement GPU price reductions across popular SKUs

GPU prices have been reduced across a range of SKUs, lowering the cost of your training and inference workloads. Updated pricing is reflected in the console and pricing page.

Bug Fix Serverless GPU exclusions now correctly respected

GPU type exclusions set on Serverless endpoints were not being enforced, causing workloads to land on excluded GPU types and resulting in incorrect billing. The issue is now fixed, and new alerting has been added to detect recurrence.

We’ve updated our release notes format for easier navigation. Updates from April 2026 onwards are listed above. Browse earlier releases by year and month in the archive below.

Flash beta: Run Python functions on cloud GPUs

Flash is now in public beta. Flash is a Python SDK that lets you run functions on Runpod Serverless GPUs with a single decorator:
```python
import asyncio

from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="hello-gpu",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    dependencies=["torch"],
)
async def hello():  # This function runs on Runpod
    import torch
    gpu_name = torch.cuda.get_device_name(0)
    print(f"Hello from your GPU! ({gpu_name})")
    return {"gpu": gpu_name}

asyncio.run(hello())
print("Done!")  # This runs locally
```

Key features:
  • Remote execution: Mark functions with @Endpoint to run on GPUs/CPUs automatically.
  • Auto-scaling: Workers scale from 0 to N based on demand.
  • Dependency management: Packages install automatically on remote workers.
  • Two patterns: Queue-based endpoints for batch work and load-balanced endpoints for REST APIs.
  • Flash apps: Build production-ready APIs with flash init, flash dev, and flash deploy.
Get started:
  • Overview: Learn more about Flash.
  • Quickstart: Run your first GPU workload in 5 minutes.
  • Create endpoints: Learn queue-based and load-balanced patterns.
  • Flash CLI: Development and deployment commands.

Flash: Multi-datacenter deployments

Flash now supports deploying endpoints to multiple datacenters simultaneously. Pass a list of datacenters to distribute your workload across regions for improved availability and reduced latency. You can also attach network volumes per datacenter for region-specific data access.

New Public Endpoints and expanded examples

New Public Endpoints: Expanded the range of available models across all categories.

New integrations and guides:
  • Vercel AI SDK integration: New @runpod/ai-sdk-provider package for TypeScript projects with streaming, text generation, and image generation support.
  • AI coding tools guide: Configure OpenCode, Cursor, and Cline to use Runpod Public Endpoints as your model provider.
New tutorials:

GitHub release rollback GA and load balancing Serverless repos in beta

  • GitHub release rollback: Roll back your Serverless endpoint to any previous build from the console. Restore an earlier version when you encounter issues without waiting for a new GitHub release.
  • Load balancing Serverless repos (beta): Load balancing endpoints are now available in the Hub. Publish or convert any listing to load balancer type by setting "endpointType": "LB" in your hub.json file, then deploy as a Serverless endpoint or Pod from the Hub page. Maintain a single listing for your model and let users choose their deployment method—autoscaling Serverless or dedicated Pod resources.
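As a rough sketch of the hub.json change described above, the script below sets "endpointType" to "LB" and leaves every other key in the file untouched. It assumes that endpointType is a top-level key in hub.json and that the file is plain JSON; the helper name set_endpoint_type is hypothetical and not part of any Runpod tooling.

```python
import json
from pathlib import Path


def set_endpoint_type(hub_json_path: str, endpoint_type: str = "LB") -> dict:
    """Set the listing's endpoint type in a hub.json file and write it back.

    Assumption: "endpointType" lives at the top level of hub.json. Check the
    Hub documentation if your listing's schema places it elsewhere.
    """
    path = Path(hub_json_path)
    # Load the existing listing so all other settings are preserved.
    config = json.loads(path.read_text(encoding="utf-8"))
    # "LB" marks the listing as a load balancer type endpoint.
    config["endpointType"] = endpoint_type
    # Write the updated listing back with stable, readable formatting.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_endpoint_type("hub.json") converts the listing file in the current directory; passing a different endpoint_type value writes that value instead.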