New features, fixes, and improvements for the Runpod platform.
July 2026
July 1, 2026
New Release Deploy Pods with private AWS ECR images - BETA
A new tutorial covers how to pull container images from private AWS ECR repositories into Runpod Pods using cross-account IAM delegation. It walks through configuring ECR repository policies, adding ECR credentials in the Runpod console, and deploying a Pod with a private image, all without managing credentials directly.
June 2026
Breaking Lifecycle operations are now CLI-only
Flash SDK methods for endpoint and app lifecycle operations—deploy, undeploy, update, and creating or deleting apps and environments—now raise a FlashUsageError that points to the equivalent flash command. Run these operations through the Flash CLI instead, which keeps the build and manifest pipeline and local state tracking consistent.
New Release High-Performance Network Volumes now available
You can now attach high-performance network volumes to Pods, Serverless endpoints, and Instant Clusters for significantly faster model load times. Look for the purple diamond icon to identify compatible datacenters.
New Release Deploy When Available
You can now request a GPU that’s currently out of capacity and get notified by email when it becomes available. Runpod saves your Pod configuration so you can deploy immediately when capacity returns.
Improvement Hub navigation consolidated
Hub navigation items are now consolidated into a single unified entry, making it easier to find templates and repos.
Bug Fix Billing records now show correct data for deleted resources
SKU, region, and creation timestamps now appear correctly in billing views and exports for deleted Pods and network volumes.
May 2026
New Release Async Jobs for Serverless
You can now submit a job to a Serverless endpoint and retrieve the result asynchronously when capacity is available. Jobs queue and process automatically when a worker is free, with no always-on workers or polling loops required.
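As a rough sketch of what asynchronous submission can look like from a client, the snippet below builds (but does not send) a job request. It assumes the existing Serverless REST convention of POSTing to `https://api.runpod.ai/v2/<endpoint_id>/run` with an optional `webhook` field for result delivery; whether Async Jobs uses exactly this route is an assumption, and the endpoint ID, API key, and webhook URL are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"  # assumed base URL for Serverless endpoints


def build_async_job_request(endpoint_id, api_key, job_input, webhook_url=None):
    """Build a POST request that queues a job without waiting for the result."""
    body = {"input": job_input}
    if webhook_url is not None:
        # Assumed: the platform calls this URL when the job finishes,
        # so the client does not need a polling loop.
        body["webhook"] = webhook_url
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint_id}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_async_job_request(
    endpoint_id="ENDPOINT_ID",  # placeholder
    api_key="YOUR_API_KEY",  # placeholder
    job_input={"prompt": "hello"},
    webhook_url="https://example.com/runpod-callback",  # placeholder
)
print(request.get_method(), request.full_url)
# To actually submit the job: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable offline and makes the payload easy to inspect before any credits are spent.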
New Release Serverless Worker Fitness Checks
Serverless workers now run automated health checks before accepting jobs. Runpod automatically removes unhealthy workers from rotation, reducing failed requests and improving endpoint reliability.
New Release 24GB MIG instances now available
You can now partition H100 and RTX PRO 6000 GPUs into up to seven independent 24GB MIG (Multi-Instance GPU) instances, giving you more granular, lower-cost access without reserving a full card.
New Release Cost Centers now generally available
Cost Centers let teams allocate and track GPU spend by project, team, or business unit. Detailed cost breakdowns are now available in billing, and all users receive itemized invoices as of May 1.
Improvement New Pod deploy flow with workload-first GPU selection
The Pod deployment experience has been redesigned. Instead of picking a GPU first, you now choose a template or workload type and see GPUs ranked as recommended, compatible, or incompatible. The new flow includes Save as Template, AI-assisted GPU selection, and a Notify Me When Available option for out-of-capacity cards.
April 2026
New Release Flash is now generally available
Flash is now generally available. You can run Python functions on cloud GPUs with a single @Endpoint decorator, with no containers or infrastructure setup required. Workers scale automatically, dependencies install on remote workers, and you can deploy production APIs with flash deploy.
New Release Instant Cluster Expansion and Priority FlashBoot now live
Instant Clusters can now expand to more nodes faster. Priority FlashBoot reduces cold-start times for cluster workers. Both features are live with no configuration changes needed. Expanding an existing cluster is currently only available to Runpod admins. To add nodes to an existing cluster, reach out to the Runpod team.
New Release FlashBoot for CPU Serverless now in public beta
CPU Serverless workers now support FlashBoot, dramatically reducing cold-start times for your CPU endpoints. GA is planned for later this quarter.
Improvement GPU price reductions across popular SKUs
GPU prices have been reduced across a range of SKUs, lowering the cost of your training and inference workloads. Updated pricing is reflected in the console and pricing page.
Bug Fix Serverless GPU exclusions now correctly respected
GPU type exclusions set on Serverless endpoints were not being enforced, causing workloads to land on excluded GPU types and resulting in incorrect billing. The issue is now fixed, and new alerting has been added to detect recurrence.
We’ve updated our release notes format for easier navigation. Updates from April 2026 onwards are listed above. Browse earlier releases by year and month in the archive below.
Flash is now in public beta. Flash is a Python SDK that lets you run functions on Runpod Serverless GPUs with a single decorator:
    import asyncio

    from runpod_flash import Endpoint, GpuType

    @Endpoint(
        name="hello-gpu",
        gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
        dependencies=["torch"]
    )
    async def hello():
        # This function runs on Runpod
        import torch
        gpu_name = torch.cuda.get_device_name(0)
        print(f"Hello from your GPU! ({gpu_name})")
        return {"gpu": gpu_name}

    asyncio.run(hello())
    print("Done!")  # This runs locally
Key features:
Remote execution: Mark functions with @Endpoint to run on GPUs/CPUs automatically.
Auto-scaling: Workers scale from 0 to N based on demand.
Dependency management: Packages install automatically on remote workers.
Two patterns: Queue-based endpoints for batch work, load-balanced endpoints for REST APIs.
Flash apps: Build production-ready APIs with flash init, flash dev, and flash deploy.
Flash now supports deploying endpoints to multiple datacenters simultaneously. Pass a list of datacenters to distribute your workload across regions for improved availability and reduced latency. You can also attach network volumes per datacenter for region-specific data access.
GitHub release rollback GA and load balancing Serverless repos in beta
GitHub release rollback: Roll back your Serverless endpoint to any previous build from the console. Restore an earlier version when you encounter issues without waiting for a new GitHub release.
Load balancing Serverless repos (beta): Load balancing endpoints are now available in the Hub. Publish or convert any listing to load balancer type by setting "endpointType": "LB" in your hub.json file, then deploy as a Serverless endpoint or Pod from the Hub page. Maintain a single listing for your model and let users choose their deployment method—autoscaling Serverless or dedicated Pod resources.
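A minimal sketch of the conversion step described above: it loads a `hub.json` file, sets `"endpointType": "LB"`, and writes the file back. Only the `endpointType` key and its `LB` value come from the note above; the helper name and the sample `title` field are hypothetical, and any other fields already in a real `hub.json` are left untouched.

```python
import json
import tempfile
from pathlib import Path


def set_load_balancer_type(hub_json_path):
    """Set endpointType to "LB" in a hub.json file, preserving all other keys."""
    path = Path(hub_json_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["endpointType"] = "LB"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file; the "title" field is illustrative only.
demo_dir = Path(tempfile.mkdtemp())
demo_path = demo_dir / "hub.json"
demo_path.write_text(json.dumps({"title": "my-model"}), encoding="utf-8")

updated = set_load_balancer_type(demo_path)
print(updated)
```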
Pod migration in beta and Serverless development guides
Pod migration (beta): Migrate your Pod to a new machine when your stopped Pod’s GPU is occupied. Runpod provisions a new Pod with the same specifications on an available machine and automatically transfers your data to it.
New Serverless development guides: We’ve added a comprehensive new set of guides for developing, testing, and debugging Serverless endpoints.
Slurm Clusters GA, cached models in beta, and new Public Endpoints available
Slurm Clusters are now generally available: Deploy production-ready HPC clusters in seconds. These clusters support multi-node performance for distributed training and large-scale simulations with pay-as-you-go billing and no idle costs.
Cached models are now in beta: Eliminate model download times when starting workers. The system places cached models on host machines before workers start, prioritizing hosts with your model already available for instant startup.
Hub revenue sharing launches and Pods UI gets refreshed
Hub revenue share model: Publish to the Runpod Hub and earn credits when others deploy your repo. Earn up to 7% of compute revenue through monthly tiers with credits auto-deposited into your account.
Pods UI updated: Refreshed modern interface for interacting with Runpod Pods.
S3-compatible storage and updated referral program
S3-compatible API for network volumes: Upload and retrieve files from your network volumes without running compute, using the AWS CLI or Boto3. Integrate Runpod storage into any AI pipeline with zero-config setup and object-level control.
Referral program revamp: Updated rewards and tiers with clearer dashboards to track performance.
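For the S3-compatible API above, a Boto3 client mainly needs a custom endpoint URL. The sketch below only assembles the client arguments, so it runs without network access or Boto3 installed. The endpoint URL pattern, the datacenter ID, and the use of the network volume ID as the bucket name are assumptions to check against the Runpod storage documentation, and the credentials are placeholders.

```python
def s3_client_kwargs(datacenter_id, access_key, secret_key):
    """Return keyword arguments for boto3.client("s3", ...)."""
    return {
        # Assumed URL pattern; confirm the real endpoint for your datacenter.
        "endpoint_url": f"https://s3api-{datacenter_id.lower()}.runpod.io",
        "region_name": datacenter_id,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def upload_to_volume(local_path, volume_id, object_key, client_kwargs):
    """Upload one file, treating the network volume ID as the bucket (assumed)."""
    import boto3  # third-party dependency, imported only when uploading

    client = boto3.client("s3", **client_kwargs)
    client.upload_file(local_path, volume_id, object_key)


kwargs = s3_client_kwargs("EU-RO-1", "ACCESS_KEY", "SECRET_KEY")  # placeholders
print(kwargs["endpoint_url"])
# Example call (needs real credentials and a real volume):
# upload_to_volume("model.safetensors", "VOLUME_ID", "models/model.safetensors", kwargs)
```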
Port labeling, price drops, Runpod Hub, and Tetra beta test
Port labeling: Name exposed ports in the UI and API to help team members identify services like Jupyter or TensorBoard.
Price drops: Additional price reductions on popular GPU SKUs to lower training and inference costs.
Runpod Hub: A curated catalog of one-click endpoints and templates for deploying community projects without starting from scratch.
Tetra beta test: A Python library for running code on GPU with Runpod. Add a @remote() decorator to functions that need GPU power while the rest of your code runs locally.
Smoother auth and multi-region Serverless with persistent storage
The new and improved Runpod login experience: Streamlined sign-in and team access for faster, more consistent auth flows.
Network volumes added to Serverless: Attach persistent storage to Serverless workers to retain models and artifacts across restarts and speed cold starts through caching.
Serverless region support: Pin or allow specific regions for endpoints to reduce latency and meet data-residency needs.
Serverless API v2: Revised request and response schema with improved error semantics and new endpoints for better control over job lifecycle and observability.
Runpod now offers encrypted volumes: Enable at-rest encryption for persistent volumes with no application changes required using platform-managed keys.