docs/architecture/video-processor-split-research.md

Video Processor: Per-Endpoint Resource Configuration Research

Problem

We have two distinct workload profiles hitting the same video processor Cloud Run service:

| Workload | Concurrency | File Size | Resource Needs |
|---|---|---|---|
| Response videos (from video flow) | High — many concurrent | Typically < 50 MB | Lower CPU/memory, higher concurrency |
| Final/asset videos (user uploads) | Low — rarely 2 at a time | Up to 500 MB | Higher CPU/memory, lower concurrency |

Currently, the single video-processor-${environment} service uses one set of resource limits (CPU, memory, concurrency, max instances) for all endpoints. This means we either over-provision for light requests or under-provision for heavy ones.

Can Cloud Run configure resources per endpoint?

No. Cloud Run resource configuration (CPU, memory, concurrency, scaling) is a service-level setting. It applies uniformly to every request regardless of the URL path. There is no native feature for per-route resource configuration.

Options

Option A: Two Cloud Run services + Load Balancer path routing

Deploy two Cloud Run services from the same Docker image with different resource configs. Put a Google Cloud Application Load Balancer in front with a URL map that routes by path using Serverless NEGs (Network Endpoint Groups).

                    +---------------------+
  requests -------> |  Application LB     |
                    |  (URL Map)          |
                    |                     |
                    |  /api/process-video |--> video-processor-responses (high concurrency, 2 CPU, 2Gi)
                    |  /api/hls-encode    |-->        "
                    |  /api/thumbnail     |-->        "
                    |                     |
                    |  /api/process-asset |--> video-processor-assets (low concurrency, 4 CPU, 8Gi)
                    |  /api/rotate-video  |-->        "
                    +---------------------+

Pros:

  • Clean separation, each service scales independently
  • Single external URL for callers
  • Native GCP feature, well-documented

Cons:

  • Adds a load balancer (~$18/month + per-request cost)
  • More terraform complexity (LB, SSL cert, URL map, serverless NEGs, backend services)
  • Slightly higher latency from the extra hop
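
The routing piece of Option A can be sketched in terraform. This is a condensed, one-sided illustration under assumptions: resource names, the variable names, and the asset-side resources (which would mirror the response side) are all hypothetical, not existing code.

```terraform
# Sketch only: one serverless NEG + backend for the response service,
# plus a URL map sending heavy paths to an asset backend defined the same way.
resource "google_compute_region_network_endpoint_group" "responses" {
  name                  = "video-processor-responses-neg"
  region                = var.region
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = "video-processor-responses-${var.environment}"
  }
}

resource "google_compute_backend_service" "responses" {
  name                  = "video-processor-responses-backend"
  load_balancing_scheme = "EXTERNAL_MANAGED"

  backend {
    group = google_compute_region_network_endpoint_group.responses.id
  }
}

# Heavy asset paths go to the asset backend; everything else defaults
# to the response backend.
resource "google_compute_url_map" "video_processor" {
  name            = "video-processor-url-map"
  default_service = google_compute_backend_service.responses.id

  host_rule {
    hosts        = ["*"]
    path_matcher = "video"
  }

  path_matcher {
    name            = "video"
    default_service = google_compute_backend_service.responses.id

    path_rule {
      paths   = ["/api/process-asset", "/api/rotate-video"]
      # assumed: an "assets" backend service mirroring "responses" above
      service = google_compute_backend_service.assets.id
    }
  }
}
```

This is most of the "more terraform complexity" cost listed above: two NEGs, two backend services, the URL map, plus the forwarding rule and SSL cert not shown here.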

Option B: Two Cloud Run services, two env vars (simple split)

Skip the load balancer. Deploy two services and have callers use the right URL directly:

  • VIDEO_PROCESSOR_URL -> response processing service
  • ASSET_PROCESSOR_URL -> asset/final video service
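
Caller-side selection is then a small branch on operation type. A sketch, assuming only the two env var names above; the operation names mirror the endpoint paths in the Option A diagram, and the `processorUrl` helper is illustrative, not existing code:

```typescript
// Route an operation to the right processor service (Option B).
// Env var names come from this doc; everything else is illustrative.
type VideoOperation =
  | "process-video"
  | "hls-encode"
  | "thumbnail"
  | "process-asset"
  | "rotate-video";

// Light, high-concurrency operations go to the response-video service;
// everything else goes to the asset/final-video service.
const LIGHT_OPS = new Set<VideoOperation>([
  "process-video",
  "hls-encode",
  "thumbnail",
]);

function processorUrl(op: VideoOperation): string {
  const base = LIGHT_OPS.has(op)
    ? process.env.VIDEO_PROCESSOR_URL // response processing service
    : process.env.ASSET_PROCESSOR_URL; // asset/final video service
  if (!base) throw new Error(`no processor URL configured for ${op}`);
  return `${base}/api/${op}`;
}
```

The existing client already reads its base URL from an env var, so the change is confined to this one lookup.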

Pros:

  • Simplest to implement
  • No extra infra cost
  • Minimal terraform change — reuse existing module twice with different configs
  • Our terraform module already accepts video_processor_config overrides

Cons:

  • Callers need to know which URL to use
  • Two URLs to manage per environment

Option C: Application-level concurrency control (no infra change)

Keep one service but add a semaphore/queue in the Express app that limits concurrent heavy operations:

// Pseudocode
const heavySemaphore = new Semaphore(1); // only 1 asset job at a time
const lightSemaphore = new Semaphore(10); // 10 response jobs at a time
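
The pseudocode above can be made concrete with a small promise-based semaphore. A minimal sketch, assuming a hand-rolled Semaphore class (a library such as async-mutex provides an equivalent):

```typescript
// Minimal counting semaphore: at most `permits` tasks run concurrently;
// extra callers queue until a running task releases its permit.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit free: park until release() resolves us.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the next waiter
    } else {
      this.available++;
    }
  }

  // Run a task under the semaphore, releasing even if it throws.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

const heavySemaphore = new Semaphore(1); // only 1 asset job at a time
const lightSemaphore = new Semaphore(10); // 10 response jobs at a time
// In a route handler: await heavySemaphore.run(() => processAssetVideo(req));
```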

Pros:

  • Zero infra change, works today

Cons:

  • Doesn't solve the memory/CPU problem — still need to provision for the worst case (500 MB asset video), wasting resources on light requests
  • Doesn't help with independent scaling
  • One stuck asset job blocks a slot

Benchmark Data (2026-03-05/06)

Full data in hls-benchmark-data.csv. Key findings:

Memory requirements (HLS encoding, 3 renditions at 1080p)

| Memory | Small videos (<34 MB) | Large videos (200MB+, 179s) |
|---|---|---|
| 512 Mi | OOM | OOM |
| 1 Gi | OOM | OOM |
| 2 Gi | OK | OOM |
| 4 Gi | OK | OK |

CPU scaling (200MB+ video, 179s duration, 4Gi memory)

| CPU | FFmpeg Time | Speedup vs 1 CPU | Realtime ratio |
|---|---|---|---|
| 1 | 34.4s | 1x | 5.2x |
| 2 | 15.2s | 2.3x | 11.8x |
| 4 | 8.0s | 4.3x | 22.4x |

CPU scaling (small videos, 2Gi memory)

| CPU | Avg FFmpeg Time (6s video) | Realtime ratio |
|---|---|---|
| 1 | ~50s | ~10x |
| 2 | ~25s | ~4x |

Other observations

  • CPU scaling is nearly linear — 2x CPU gives ~2.3x speedup
  • FFmpeg is 95%+ of total processing time
  • Download from Vercel Blob: <0.5s even for 200MB
  • Upload to R2: <1s consistently
  • Cold start with startup probe: ~8s
  • ffprobe latency: ~1.8s cold, ~0.2s warm

Suggested configs (Option B)

| Service | CPU | Memory | Concurrency | Max instances | Rationale |
|---|---|---|---|---|---|
| Response videos | 2 | 2Gi | 1 | 20 | Handles <50MB videos, good speed/cost balance |
| Asset/final videos | 2–4 | 4Gi | 1 | 5 | Must handle 200MB+, lower concurrency needed |

Recommendation

Option B is the pragmatic choice. The terraform module already supports config overrides via video_processor_config, so instantiating it twice with different configs is straightforward. The caller-side change is small since the video processing client already uses an env var for the URL.
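
Instantiating the module twice could look roughly like the sketch below. Only video_processor_config is named in this doc; the module path, service_name variable, config field names, and resource numbers (taken from the suggested-configs table) are assumptions to be checked against the actual module interface.

```terraform
# Sketch only: field and variable names are assumed, not existing code.
module "video_processor_responses" {
  source = "../modules/video-processor" # assumed path

  service_name = "video-processor-responses-${var.environment}"
  video_processor_config = {
    cpu           = "2"
    memory        = "2Gi"
    concurrency   = 1
    max_instances = 20
  }
}

module "video_processor_assets" {
  source = "../modules/video-processor" # assumed path

  service_name = "video-processor-assets-${var.environment}"
  video_processor_config = {
    cpu           = "4"
    memory        = "4Gi"
    concurrency   = 1
    max_instances = 5
  }
}
```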

Option A makes sense later if we want a single public endpoint or need more sophisticated traffic management, but it adds meaningful complexity for the current scale.

References