# Video Processor: Per-Endpoint Resource Configuration Research
## Problem
We have two distinct workload profiles hitting the same video processor Cloud Run service:
| Workload | Concurrency | File Size | Resource Needs |
|---|---|---|---|
| Response videos (from video flow) | High — many concurrent | Typically < 50 MB | Lower CPU/memory, higher concurrency |
| Final/asset videos (user uploads) | Low — rarely 2 at a time | Up to 500 MB | Higher CPU/memory, lower concurrency |
Currently, the single `video-processor-${environment}` service uses one set of resource limits (CPU, memory, concurrency, max instances) for all endpoints. This means we either over-provision for light requests or under-provision for heavy ones.
## Can Cloud Run configure resources per endpoint?
No. Cloud Run resource configuration (CPU, memory, concurrency, scaling) is a service-level setting. It applies uniformly to every request regardless of the URL path. There is no native feature for per-route resource configuration.
## Options
### Option A: Two Cloud Run services + Load Balancer path routing
Deploy two Cloud Run services from the same Docker image with different resource configs. Put a Google Cloud Application Load Balancer in front with a URL map that routes by path using Serverless NEGs (Network Endpoint Groups).
```
                 +---------------------+
requests ------> |   Application LB    |
                 |      (URL Map)      |
                 |                     |
                 | /api/process-video  |--> video-processor-responses (high concurrency, 2 CPU, 2Gi)
                 | /api/hls-encode     |--> "
                 | /api/thumbnail      |--> "
                 |                     |
                 | /api/process-asset  |--> video-processor-assets (low concurrency, 4 CPU, 8Gi)
                 | /api/rotate-video   |--> "
                 +---------------------+
```
Pros:
- Clean separation, each service scales independently
- Single external URL for callers
- Native GCP feature, well-documented
Cons:
- Adds a load balancer (~$18/month + per-request cost)
- More terraform complexity (LB, SSL cert, URL map, serverless NEGs, backend services)
- Slightly higher latency from the extra hop
### Option B: Two Cloud Run services, two env vars (simple split)
Skip the load balancer. Deploy two services and have callers use the right URL directly:
- `VIDEO_PROCESSOR_URL` -> response processing service
- `ASSET_PROCESSOR_URL` -> asset/final video service
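Assuming this two-env-var split, the caller-side selection is a one-line branch. A minimal sketch (the `processorUrl` helper and `WorkloadKind` type are illustrative, not existing code; only the two env var names come from this doc):

```typescript
// Sketch: pick the processor base URL by workload type.
type WorkloadKind = "response" | "asset";

function processorUrl(kind: WorkloadKind): string {
  const url =
    kind === "asset"
      ? process.env.ASSET_PROCESSOR_URL
      : process.env.VIDEO_PROCESSOR_URL;
  if (!url) throw new Error(`Missing processor URL for ${kind} workload`);
  return url;
}
```

Since the call sites already know whether they are handling a response video or a user-uploaded asset, this keeps the routing decision in one place.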
Pros:
- Simplest to implement
- No extra infra cost
- Minimal terraform change — reuse existing module twice with different configs
- Our terraform module already accepts `video_processor_config` overrides
Cons:
- Callers need to know which URL to use
- Two URLs to manage per environment
### Option C: Application-level concurrency control (no infra change)
Keep one service but add a semaphore/queue in the Express app that limits concurrent heavy operations:
```js
// Pseudocode: Semaphore is not built into Node; use e.g. the
// async-mutex package or a small promise-based implementation
const heavySemaphore = new Semaphore(1);  // only 1 asset job at a time
const lightSemaphore = new Semaphore(10); // up to 10 response jobs at a time
```
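For reference, a dependency-free version of such a semaphore is small. This is a sketch, not production code (no timeouts, so the "one stuck job blocks a slot" con below applies as-is):

```typescript
// Minimal promise-based counting semaphore (sketch, no external deps).
class Semaphore {
  private waiting: Array<() => void> = [];
  private available: number;

  constructor(max: number) {
    this.available = max;
  }

  // Runs `job` once a slot is free; releases the slot when it settles.
  async run<T>(job: () => Promise<T>): Promise<T> {
    if (this.available > 0) {
      this.available--;
    } else {
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    try {
      return await job();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // hand the slot directly to the next waiter
      else this.available++;
    }
  }
}

const heavySemaphore = new Semaphore(1);  // one asset job at a time
const lightSemaphore = new Semaphore(10); // up to ten response jobs
```

A route handler would then wrap its work in `heavySemaphore.run(() => ...)` so excess requests queue instead of competing for memory.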
Pros:
- Zero infra change, works today
Cons:
- Doesn't solve the memory/CPU problem — still need to provision for the worst case (500 MB asset video), wasting resources on light requests
- Doesn't help with independent scaling
- One stuck asset job blocks a slot
## Benchmark Data (2026-03-05/06)
Full data in `hls-benchmark-data.csv`. Key findings:
### Memory requirements (HLS encoding, 3 renditions at 1080p)
| Memory | Small videos (<34 MB) | Large videos (200MB+, 179s) |
|---|---|---|
| 512 Mi | OOM | OOM |
| 1 Gi | OOM | OOM |
| 2 Gi | OK | OOM |
| 4 Gi | OK | OK |
### CPU scaling (200MB+ video, 179s duration, 4Gi memory)
| CPU | FFmpeg Time | Speedup vs 1 CPU | Realtime ratio |
|---|---|---|---|
| 1 | 34.4s | 1x | 5.2x |
| 2 | 15.2s | 2.3x | 11.8x |
| 4 | 8.0s | 4.3x | 22.4x |
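As a sanity check on the metric: the "Realtime ratio" column is just video duration divided by FFmpeg time, using the 179s clip from the table above.

```typescript
// Realtime ratio = video duration / ffmpeg processing time.
// Duration (179s) and timings are the benchmark values from the table above.
const durationSeconds = 179;
const realtimeRatio = (ffmpegSeconds: number) =>
  durationSeconds / ffmpegSeconds;

console.log(realtimeRatio(34.4).toFixed(1)); // 1 CPU -> "5.2"
console.log(realtimeRatio(15.2).toFixed(1)); // 2 CPU -> "11.8"
console.log(realtimeRatio(8.0).toFixed(1));  // 4 CPU -> "22.4"
```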
### CPU scaling (small videos, 2Gi memory)
| CPU | Avg FFmpeg Time (6s video) | Realtime ratio |
|---|---|---|
| 1 | ~50s | ~10x |
| 2 | ~25s | ~4x |
### Other observations
- CPU scaling is roughly linear, even slightly superlinear here: 2 CPU gave ~2.3x and 4 CPU gave ~4.3x speedup over 1 CPU
- FFmpeg is 95%+ of total processing time
- Download from Vercel Blob: <0.5s even for 200MB
- Upload to R2: <1s consistently
- Cold start with startup probe: ~8s
- ffprobe latency: ~1.8s cold, ~0.2s warm
## Suggested configs (Option B)
| Service | CPU | Memory | Concurrency | Max instances | Rationale |
|---|---|---|---|---|---|
| Response videos | 2 | 2Gi | 1 | 20 | Handles <50MB videos, good speed/cost balance |
| Asset/final videos | 2–4 | 4Gi | 1 | 5 | Must handle 200MB+, lower concurrency needed |
## Recommendation
Option B is the pragmatic choice. The terraform module already supports config overrides via `video_processor_config`, so instantiating it twice with different configs is straightforward. The caller-side change is small since the video processing client already uses an env var for the URL.
Option A makes sense later if we want a single public endpoint or need more sophisticated traffic management, but it adds meaningful complexity for the current scale.