docs/architecture/video-processor-split-research.md

Video Processor: Per-Endpoint Resource Configuration Research

Problem

We have two distinct workload profiles hitting the same video processor Cloud Run service:

| Workload | Concurrency | File Size | Resource Needs |
|---|---|---|---|
| Response videos (from video flow) | High — many concurrent | Typically < 50 MB | Lower CPU/memory, higher concurrency |
| Final/asset videos (user uploads) | Low — rarely 2 at a time | Up to 500 MB | Higher CPU/memory, lower concurrency |

Currently, the single video-processor-${environment} service uses one set of resource limits (CPU, memory, concurrency, max instances) for all endpoints. This means we either over-provision for light requests or under-provision for heavy ones.

Can Cloud Run configure resources per endpoint?

No. Cloud Run resource configuration (CPU, memory, concurrency, scaling) is a service-level setting. It applies uniformly to every request regardless of the URL path. There is no native feature for per-route resource configuration.

Options

Option A: Two Cloud Run services + Load Balancer path routing

Deploy two Cloud Run services from the same Docker image with different resource configs. Put a Google Cloud Application Load Balancer in front with a URL map that routes by path using Serverless NEGs (Network Endpoint Groups).

                    +---------------------+
  requests -------> |  Application LB     |
                    |  (URL Map)          |
                    |                     |
                    |  /api/process-video |--> video-processor-responses (high concurrency, 2 CPU, 2Gi)
                    |  /api/hls-encode    |-->        "
                    |  /api/thumbnail     |-->        "
                    |                     |
                    |  /api/process-asset |--> video-processor-assets (low concurrency, 4 CPU, 8Gi)
                    |  /api/rotate-video  |-->        "
                    +---------------------+

Pros:

  • Clean separation, each service scales independently
  • Single external URL for callers
  • Native GCP feature, well-documented

Cons:

  • Adds a load balancer (~$18/month + per-request cost)
  • More terraform complexity (LB, SSL cert, URL map, serverless NEGs, backend services)
  • Slightly higher latency from the extra hop
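
The routing piece of Option A can be sketched in terraform. This is a condensed, one-sided illustration under assumptions: resource names, the variable names, and the asset-side resources (which would mirror the response side) are all hypothetical, not existing code.

```terraform
# Sketch only: one serverless NEG + backend for the response service,
# plus a URL map sending heavy paths to an asset backend defined the same way.
resource "google_compute_region_network_endpoint_group" "responses" {
  name                  = "video-processor-responses-neg"
  region                = var.region
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = "video-processor-responses-${var.environment}"
  }
}

resource "google_compute_backend_service" "responses" {
  name                  = "video-processor-responses-backend"
  load_balancing_scheme = "EXTERNAL_MANAGED"

  backend {
    group = google_compute_region_network_endpoint_group.responses.id
  }
}

# Heavy asset paths go to the asset backend; everything else defaults
# to the response backend.
resource "google_compute_url_map" "video_processor" {
  name            = "video-processor-url-map"
  default_service = google_compute_backend_service.responses.id

  host_rule {
    hosts        = ["*"]
    path_matcher = "video"
  }

  path_matcher {
    name            = "video"
    default_service = google_compute_backend_service.responses.id

    path_rule {
      paths   = ["/api/process-asset", "/api/rotate-video"]
      # assumed: an "assets" backend service mirroring "responses" above
      service = google_compute_backend_service.assets.id
    }
  }
}
```

This is most of the "more terraform complexity" cost listed above: two NEGs, two backend services, the URL map, plus the forwarding rule and SSL cert not shown here.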

Option B: Two Cloud Run services, two env vars (simple split)

Skip the load balancer. Deploy two services and have callers use the right URL directly:

  • VIDEO_PROCESSOR_URL -> response processing service
  • ASSET_PROCESSOR_URL -> asset/final video service
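
Caller-side selection is then a small branch on operation type. A sketch, assuming only the two env var names above; the operation names mirror the endpoint paths in the Option A diagram, and the `processorUrl` helper is illustrative, not existing code:

```typescript
// Route an operation to the right processor service (Option B).
// Env var names come from this doc; everything else is illustrative.
type VideoOperation =
  | "process-video"
  | "hls-encode"
  | "thumbnail"
  | "process-asset"
  | "rotate-video";

// Light, high-concurrency operations go to the response-video service;
// everything else goes to the asset/final-video service.
const LIGHT_OPS = new Set<VideoOperation>([
  "process-video",
  "hls-encode",
  "thumbnail",
]);

function processorUrl(op: VideoOperation): string {
  const base = LIGHT_OPS.has(op)
    ? process.env.VIDEO_PROCESSOR_URL // response processing service
    : process.env.ASSET_PROCESSOR_URL; // asset/final video service
  if (!base) throw new Error(`no processor URL configured for ${op}`);
  return `${base}/api/${op}`;
}
```

The existing client already reads its base URL from an env var, so the change is confined to this one lookup.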

Pros:

  • Simplest to implement
  • No extra infra cost
  • Minimal terraform change — reuse existing module twice with different configs
  • Our terraform module already accepts video_processor_config overrides

Cons:

  • Callers need to know which URL to use
  • Two URLs to manage per environment

Option C: Application-level concurrency control (no infra change)

Keep one service but add a semaphore/queue in the Express app that limits concurrent heavy operations:

// Pseudocode
const heavySemaphore = new Semaphore(1); // only 1 asset job at a time
const lightSemaphore = new Semaphore(10); // 10 response jobs at a time
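
The pseudocode above can be made concrete with a small promise-based semaphore. A minimal sketch, assuming a hand-rolled Semaphore class (a library such as async-mutex provides an equivalent):

```typescript
// Minimal counting semaphore: at most `permits` tasks run concurrently;
// extra callers queue until a running task releases its permit.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit free: park until release() resolves us.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the next waiter
    } else {
      this.available++;
    }
  }

  // Run a task under the semaphore, releasing even if it throws.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

const heavySemaphore = new Semaphore(1); // only 1 asset job at a time
const lightSemaphore = new Semaphore(10); // 10 response jobs at a time
// In a route handler: await heavySemaphore.run(() => processAssetVideo(req));
```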

Pros:

  • Zero infra change, works today

Cons:

  • Doesn't solve the memory/CPU problem — still need to provision for the worst case (500 MB asset video), wasting resources on light requests
  • Doesn't help with independent scaling
  • One stuck asset job blocks a slot

Benchmark Data (2026-03-05/06)

Full data in hls-benchmark-data.csv. Key findings:

Memory requirements (HLS encoding, 3 renditions at 1080p)

| Memory | Small videos (<34 MB) | Large videos (200MB+, 179s) |
|---|---|---|
| 512 Mi | OOM | OOM |
| 1 Gi | OOM | OOM |
| 2 Gi | OK | OOM |
| 4 Gi | OK | OK |

CPU scaling (200MB+ video, 179s duration, 4Gi memory)

| CPU | FFmpeg Time | Speedup vs 1 CPU | Realtime ratio |
|---|---|---|---|
| 1 | 34.4s | 1x | 5.2x |
| 2 | 15.2s | 2.3x | 11.8x |
| 4 | 8.0s | 4.3x | 22.4x |

CPU scaling (small videos, 2Gi memory)

| CPU | Avg FFmpeg Time (6s video) | Realtime ratio |
|---|---|---|
| 1 | ~50s | ~10x |
| 2 | ~25s | ~4x |

Other observations

  • CPU scaling is nearly linear — 2x CPU gives ~2.3x speedup
  • FFmpeg is 95%+ of total processing time
  • Download from Vercel Blob: <0.5s even for 200MB
  • Upload to R2: <1s consistently
  • Cold start with startup probe: ~8s
  • ffprobe latency: ~1.8s cold, ~0.2s warm

Suggested configs (Option B)

| Service | CPU | Memory | Concurrency | Max instances | Rationale |
|---|---|---|---|---|---|
| Response videos | 2 | 2Gi | 1 | 20 | Handles <50MB videos, good speed/cost balance |
| Asset/final videos | 2–4 | 4Gi | 1 | 5 | Must handle 200MB+, lower concurrency needed |

Recommendation

Option B is the pragmatic choice. The terraform module already supports config overrides via video_processor_config, so instantiating it twice with different configs is straightforward. The caller-side change is small since the video processing client already uses an env var for the URL.
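
Instantiating the module twice could look roughly like the sketch below. Only video_processor_config is named in this doc; the module path, service_name variable, config field names, and resource numbers (taken from the suggested-configs table) are assumptions to be checked against the actual module interface.

```terraform
# Sketch only: field and variable names are assumed, not existing code.
module "video_processor_responses" {
  source = "../modules/video-processor" # assumed path

  service_name = "video-processor-responses-${var.environment}"
  video_processor_config = {
    cpu           = "2"
    memory        = "2Gi"
    concurrency   = 1
    max_instances = 20
  }
}

module "video_processor_assets" {
  source = "../modules/video-processor" # assumed path

  service_name = "video-processor-assets-${var.environment}"
  video_processor_config = {
    cpu           = "4"
    memory        = "4Gi"
    concurrency   = 1
    max_instances = 5
  }
}
```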

Option A makes sense later if we want a single public endpoint or need more sophisticated traffic management, but it adds meaningful complexity for the current scale.

References