XYK

Serverless video trimmer & thumbnailer on GCP Functions

June 15th, 2019

Projects

A lot of services exist to dynamically crop, transform, and watermark images on-demand. Thumbor, Imgix, and Cloudinary are some of the more popular ones. These can be very useful for asset management and design, as images don’t need to be pre-processed into multiple formats, dimensions, and crops. A single master or mezzanine file is kept, and all output renditions are generated dynamically and cached.

Such a service doesn’t quite exist yet for video-based content, so I worked on a serverless function that can generate user-specified video clips and thumbnails. A single, long video file kept in a cloud bucket can be trimmed to arbitrary length on-demand, with latencies under a few seconds. Results are cached and subsequent requests are returned from that cache.

The byte-range request method of querying the mezzanine video file puts some requirements on the source, namely that the moov atom must be placed at the front. This helps save a lot of time scanning through the file to the desired trim timestamp. Ffmpeg’s default HTTP input protocol will automatically make correct byte-range requests to the source, only requesting the parts that are desired as specified by the parameters.

Usage

The general format of the URL is https://FUNCTION_URL/{operation}/{parameters}/{source_filename}.

The operation is either a video trim or a thumbnail.

Allowed parameters are fast, width, height, start and end:

  • fast (optional) does a straight codec copy (-c copy in ffmpeg), avoiding a slow transcoding step
  • width and height (optional) specifies output width and height in pixels. For thumbnails, a percentage can be specified.
  • start and end (optional) are the start and finish timestamps in seconds.

Example live URLs

These examples are on a source video of a timer counting from 00:00.

Return a thumbnail 200px wide at 50 seconds: https://asia-northeast1-personal-projects-225512.cloudfunctions.net/video-segmentation/thumbnail/start:50,width:200/5_minute_timer.mp4

Return a thumbnail at 50% duration (original video size): https://asia-northeast1-personal-projects-225512.cloudfunctions.net/video-segmentation/thumbnail/start:50%/5_minute_timer.mp4

Return a video clip from 10.5 seconds to 20.5 seconds, compressed to 240p: https://asia-northeast1-personal-projects-225512.cloudfunctions.net/video-segmentation/trim/start:10.5,end:20.5,height:240/5_minute_timer.mp4

Return a video clip from 10.5 seconds to 20.5 seconds (original video size): https://asia-northeast1-personal-projects-225512.cloudfunctions.net/video-segmentation/trim/start:10.5,end:20.5,fast/5_minute_timer.mp4

Advantages & disadvantages

The pseudo-streaming / byte-range approach to the input file has several key advantages:

  • Not limited by Lambda / GCF limitations on storage capacity, which for longer videos can be easy to hit
  • Trimming LONG videos (>5GB) is much faster, as the entire original file does not need to be downloaded first, only the relevant section
  • The video is being transcoded as it is being downloaded

Of course, serverless also comes with its own advantages:

  • Massive scalability & concurrency
  • No need to provision servers
  • Reduced costs, because functions run only when requested

This approach also comes with disadvantages (some pretty big):

  • The first uncached request will take some time before a response is generated
  • Not all video formats support pseudo/progressive streaming
  • Requires generation of signed URLs as it relies on HTTP byte range requests
  • Transcoding can be slow, as even the largest GCF configurations are very weak for video processing (max 2 vCPU)
  • GCF as of this time does not support caching with a CDN layer in front natively, so a regional GCS bucket is used instead as a asset store
  • 301/302/307 replies for existing cached assets is another set of handshakes for the client, slightly increasing load times

Possible improvements

  • Switch from HTTP triggers to a pub/sub model
  • Put behind Firebase Hosting and use it as a CDN
  • Improve cold start times
  • Improve ffmpeg start up times with a barebones static build
  • More video options such as watermarking, smart cropping
  • Improved HTTP byte range request behavior

Technologies used

  • python
  • Google Cloud Datastore
  • Google Cloud Storage
  • Google Cloud Functions
  • ffmpeg
  • ffprobe
  • Flask
  • terraform

Deployment

  • From gcloud command line: gcloud functions deploy serverless-ffmpeg-segmentation --runtime python37 --region=asia-northeast1 --trigger-http --entry-point=trim --memory=2048MB
  • Using terraform: run terraform init then terraform apply from the terraform folder. Will set up artifact bucket and automatically zip and create a function video-segmentation.

Notes

I ran into issues with the static build of ffmpeg in that it errors with a segfault if doing a overlay filter with a seeked HTTP streaming input. I believe it is related to the timestamp being negative after seeking on a pseudo-streaming input, but more testing is needed. No issues with the build installed through apt.

Github

Project is on Github.