📸 Photo Studio

A self-hosted photo & motion API: AI background removal, face-aware compositing, a full raster-edit pipeline, and reel-native video effects — one Docker container.

FastAPI + Docker | rembg · 5 cutout models | smart compositing | resize · crop · colour · filters | 6 motion effects → MP4 | CPU-only, runs on the scraper | ● live · foto-api.autonomousmedia.io

01 What it does

Five capability groups behind a small HTTP API. Everything runs on CPU and is built for batch jobs on the scraper server. iPhone HEIC inputs are accepted everywhere.

✂️ Cutout

AI background removal via rembg. Five models from fast people-segmentation to BiRefNet portrait matting with clean hair edges. Returns a transparent PNG.

🪄 Compose

Drop one or more subjects onto a new background. Face/profile detection auto-decides side, scale and gaze — or place manually, or let a vision model decide.

🎛️ Edit

An ordered pipeline of raster ops: resize, crop, rotate, brightness/contrast/saturation/sharpness, blur, grayscale, sepia, autocontrast, vignette, borders, format convert.

🎬 Motion

Six reel-native effects rendered to MP4: Ken Burns, parallax, kinetic captions, karaoke subtitles, count-up data, whip/glitch transitions — for a stills + TTS news pipeline.

🗞️ Styles

Three editorial "news illustration" looks rendered from a photo + its cutout mask: torn-paper split-face, red sticker cutout, and neon money glow — ready for social posts.

02 Background removal

One call strips the background and returns a transparent PNG. BiRefNet keeps fine hair and edge detail clean.

[Image: Original photo]
[Image: Cutout (BiRefNet portrait), transparent background]
# model: u2net_human_seg (default, fast) · birefnet-portrait (best edges)
curl -F file=@photo.jpg "https://foto-api.autonomousmedia.io/v1/cutout?model=birefnet-portrait" -o cutout.png

03 Smart compositing

Place a cut-out subject onto a new scene. In auto mode, face & profile detection picks the side, scale, and which way the subject should face — so the gaze leads into the frame and the scene's focal point stays visible.

Auto mode — zero config

[Image: Subject]
[Image: Background]
[Image: Auto result, placed front-left, facing the path & water]
curl -F background=@path.jpg -F subjects=@man.jpg \
     -F 'params={"mode":"auto","cutout_model":"birefnet-portrait"}' \
     https://foto-api.autonomousmedia.io/v1/compose -o out.jpg

Multiple subjects & manual control

Pass several subjects in one call. Manual mode gives per-subject control of scale, position, facing and draw order (depth).

[Image: Two subjects composited onto one scene (manual placement, draw order = depth)]
curl -F background=@path.jpg -F subjects=@man.jpg -F subjects=@woman.jpg \
     -F 'params={"mode":"manual","cutout_model":"birefnet-portrait","items":[
        {"scale":0.86,"target_frac":0.26,"draw_order":1,"warm":1.01},
        {"scale":0.60,"target_frac":0.60,"base_frac":0.99,"draw_order":0}
     ]}' \
     https://foto-api.autonomousmedia.io/v1/compose -o combo.jpg

Three placement modes

Mode | How placement is decided | Cost
auto | Face + profile detection (OpenCV). Picks side, scale, gaze. Default. | Free, fast, scales infinitely
manual | You supply explicit per-subject params (items). | Free
llm | A vision model looks at the scene and decides placement. Opt-in. | Tokens per image; needs ANTHROPIC_API_KEY

Manual items fields: scale (0–1 of frame height) · target_frac (0–1, where the face sits horizontally) · base_frac (0–1, vertical position of the subject's feet/bottom — lower value = further back) · facing · flip · draw_order (lower = behind) · feather · warm (<1 cooler, >1 warmer) · color_match (0–0.4, pull toward scene colour).
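
The field ranges listed above can be checked client-side before a manual-mode call. This is a minimal sketch, not part of the service: `manual_params` and `RANGES` are hypothetical names, and only the fields with a documented numeric range are validated.

```python
import json

# Ranges copied from the field list above; all other fields pass through unchecked.
RANGES = {
    "scale": (0.0, 1.0),
    "target_frac": (0.0, 1.0),
    "base_frac": (0.0, 1.0),
    "color_match": (0.0, 0.4),
}

def manual_params(items, cutout_model="birefnet-portrait"):
    """Return the JSON string for the `params` form field in manual mode."""
    for i, item in enumerate(items):
        for key, (lo, hi) in RANGES.items():
            if key in item and not lo <= item[key] <= hi:
                raise ValueError(f"items[{i}].{key}={item[key]} is outside {lo}..{hi}")
    return json.dumps({"mode": "manual", "cutout_model": cutout_model, "items": items})

# Same two subjects as the curl example: item 0 drawn in front, item 1 behind.
params = manual_params([
    {"scale": 0.86, "target_frac": 0.26, "draw_order": 1, "warm": 1.01},
    {"scale": 0.60, "target_frac": 0.60, "base_frac": 0.99, "draw_order": 0},
])
```

The resulting string goes into the `params` form field of `/v1/compose`, alongside the `background` and repeated `subjects` files.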

04 General edits

Send an ordered list of operations; they apply in sequence. Good for thumbnails, normalisation, watermark-free filters, and format conversion.

[Images: original · grayscale · sepia · autocontrast + saturation + vignette]
curl -F file=@photo.jpg \
     -F 'ops=[{"op":"resize","max":1600},{"op":"autocontrast"},{"op":"saturation","factor":1.3},{"op":"vignette","strength":0.45}]' \
     -F output_format=jpeg -F quality=90 \
     https://foto-api.autonomousmedia.io/v1/edit -o edited.jpg

Available operations

op | params
resize | w / h / max (bounds long edge, keeps aspect)
crop | box=[l,t,r,b] or aspect="16:9" (centre crop)
rotate / flip | deg · axis="h"|"v"
brightness · contrast · saturation · sharpness | factor (1.0 = unchanged)
blur | radius
grayscale · sepia · invert · equalize | (none listed)
autocontrast | cutoff (percent clipped, default 1)
posterize | bits (1–8)
vignette | strength (0–1)
border | size, color="#rrggbb"
convert | format="jpeg"|"png"|"webp"
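
Because the pipeline is just an ordered JSON array, it can be assembled and sanity-checked in code before upload. The sketch below is illustrative only: `build_ops` and `KNOWN_OPS` are hypothetical names, the op list is copied from the table above, and parameters are passed through without validation.

```python
import json

# Op names copied from the table above; parameters are not validated here.
KNOWN_OPS = {
    "resize", "crop", "rotate", "flip", "brightness", "contrast", "saturation",
    "sharpness", "blur", "grayscale", "sepia", "invert", "equalize",
    "autocontrast", "posterize", "vignette", "border", "convert",
}

def build_ops(*steps):
    """Turn (name, params) pairs into the JSON string for the `ops` form field."""
    ops = []
    for name, params in steps:
        if name not in KNOWN_OPS:
            raise ValueError(f"unknown op: {name}")
        ops.append({"op": name, **params})  # order of steps is preserved
    return json.dumps(ops)

# Same pipeline as the curl example above.
ops = build_ops(
    ("resize", {"max": 1600}),
    ("autocontrast", {}),
    ("saturation", {"factor": 1.3}),
    ("vignette", {"strength": 0.45}),
)
```

The string is sent as the `ops` form field of `/v1/edit`, together with `output_format` and `quality`.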

05 Motion / reels

Six reel-native effects rendered server-side to 9:16 MP4 — built for a stills + TTS news pipeline (no After Effects, no manual editing). Each is a pure function of time → frame; the same model maps 1:1 to a Remotion useCurrentFrame() setup if you move rendering to Node later. The clips below are produced by the service.

Ken Burns: photo → motion · slow zoom + drift on a still
Parallax: reuses the cutout · subject moves faster than the background
Kinetic captions: highest ROI · word-pop with spring overshoot
Karaoke subtitles: TTS word-sync highlight
Count-up data: number ticks up + bars grow in
Whip / glitch: fast slide + blur spike + RGB split
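
To make the "pure function of time → frame" idea concrete, here is one way a Ken Burns move could be expressed as a crop box that depends only on `t`. This is a sketch under stated assumptions, not the service's renderer: `ken_burns_crop` is a hypothetical helper, the smoothstep easing and the fixed up-left drift are illustrative choices, and only the default zoom of 1.16 comes from the parameter table.

```python
def ken_burns_crop(t, duration, width, height, zoom=1.16):
    """Crop box (left, top, right, bottom) in source pixels at time t seconds."""
    p = min(max(t / duration, 0.0), 1.0)  # progress clamped to 0..1
    p = p * p * (3 - 2 * p)               # smoothstep easing (assumed, not documented)
    z = 1.0 + (zoom - 1.0) * p            # current zoom factor
    cw, ch = width / z, height / z        # crop shrinks as the zoom grows
    # Drift up-left: the box starts full-frame and ends pinned to the top-left corner.
    left = (width - cw) * 0.5 * (1 - p)
    top = (height - ch) * 0.5 * (1 - p)
    return (left, top, left + cw, top + ch)

# One box per frame of a 4 s clip at 30 fps on a 1080x1920 still.
fps, duration = 30, 4
boxes = [ken_burns_crop(i / fps, duration, 1080, 1920) for i in range(duration * fps)]
```

Each box would then be cropped from the still and resized to the output size. Because nothing depends on the previous frame, frames can be rendered in any order or in parallel.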
Two effects depend on word-level timing. Kinetic captions and karaoke land best when driven off per-word timestamps — pass words:[{"w":"Räntan","t":0.0},…] from your ElevenLabs with_timestamps response (or WhisperX). Without timings they fall back to an even beat.
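
If the TTS response carries per-character rather than per-word timings, the characters need grouping into words first. The sketch below assumes two parallel lists, characters and their start times in seconds, which is how the ElevenLabs with-timestamps alignment is commonly described; `words_from_chars` and its argument names are hypothetical, so adapt them to the actual response shape.

```python
def words_from_chars(chars, starts):
    """Group per-character timings into [{"w": word, "t": start_seconds}, ...]."""
    words, current, t0 = [], "", None
    for ch, t in zip(chars, starts):
        if ch.isspace():
            if current:
                words.append({"w": current, "t": t0})
            current, t0 = "", None
        else:
            if not current:
                t0 = t  # the first character of a word sets the word's start time
            current += ch
    if current:
        words.append({"w": current, "t": t0})
    return words

# Toy alignment: one character every 0.1 s.
chars = list("Räntan sänks")
starts = [round(i * 0.1, 2) for i in range(len(chars))]
words = words_from_chars(chars, starts)
```

The result has the `[{"w": ..., "t": ...}]` shape shown above. Splitting on whitespace yields single words only, so multi-word caption chunks such as "med 0,25" would have to be merged afterwards.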
POST /v1/motion/{effect}

field | type | notes
effect | path | ken_burns · parallax · captions · karaoke · countup · transition
image | file | required for ken_burns/parallax; optional background for the others
image2 | file | second scene for transition (optional)
params | form (JSON) | common: w, h, fps, duration · plus per-effect (below)

→ video/mp4 (H.264, 9:16 by default).

Per-effect params

effect | params
ken_burns | zoom (1.16) · pan ("up-left"…) · kicker · headline
parallax | cutout_model · kicker · headline
captions | words (list) · highlight (index) · eyebrow · beat (s)
karaoke | words (list or [{w,t}]) · gap (s) · lead (s)
countup | title · corner · value · delta · bars (list) · labels (list)
transition | scene_a/scene_b {kicker,headline,tint}, or pass image+image2

# Ken Burns on a still, with a lower-third headline
curl -F image=@photo.jpg \
     -F 'params={"duration":4,"zoom":1.18,"kicker":"Stockholm · 06 juni","headline":"Stadshuset i kvällsljus"}' \
     https://foto-api.autonomousmedia.io/v1/motion/ken_burns -o kenburns.mp4

# Kinetic captions from a plain word list (no timings supplied, so words fall on an even beat)
curl -F 'params={"words":["Räntan","sänks","med 0,25","i juni"],"highlight":2}' \
     https://foto-api.autonomousmedia.io/v1/motion/captions -o captions.mp4

# Count-up data card from your pipeline JSON
curl -F 'params={"title":"OMXS30 · stängning","value":2487.6,"delta":1.4,
        "bars":[-0.8,1.2,0.9,-0.4,1.4],"labels":["VOLVO","EVO","SBB","SINCH","ATCO"]}' \
     https://foto-api.autonomousmedia.io/v1/motion/countup -o data.mp4

06 Editorial styles

Three "news illustration" looks applied automatically to an ordinary photo. The subject is cut out (rembg) and the art — duotone colour ramps, torn-paper seams, sticker strokes, neon edge glow, halftone, scanlines — is composited around the mask. One photo, three social-ready looks:

[Image: Source photo (input)]
[Image: split: torn-paper warm/cool split-face on rust · 1080×1080]
[Image: red: white-outline sticker, red duotone, splatter bg · 1080×1350]
[Image: money: teal duotone, neon-green glow, $-tile, scanlines · 1600×900]

Mask source. The original experiments used an Apple Vision Swift tool for the cutout; here the mask is the alpha from the built-in rembg cutout, so it runs anywhere (and keeps every person in multi-subject shots). Tune via cutout_model: u2net_human_seg (default, fast) or birefnet-portrait (cleaner edges).
POST /v1/style/{style}

field | type | notes
style | path | split · red · money
file | file | the photo (JPG/PNG/HEIC)
cutout_model | query | default u2net_human_seg
quality | query | JPEG quality, default 92

→ image/jpeg at the style's native social size.

curl -F file=@photo.jpg "https://foto-api.autonomousmedia.io/v1/style/red" -o red.jpg
curl -F file=@photo.heic "https://foto-api.autonomousmedia.io/v1/style/money?cutout_model=birefnet-portrait" -o money.jpg
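
For batch jobs it can help to generate the style URLs and know the expected output size up front. A minimal sketch, assuming the sizes in the captions above are fixed per style; `STYLE_SIZES` and `style_url` are hypothetical names and not part of the API.

```python
from urllib.parse import urlencode

BASE = "https://foto-api.autonomousmedia.io"

# Output sizes as listed in the example captions above (width, height).
STYLE_SIZES = {"split": (1080, 1080), "red": (1080, 1350), "money": (1600, 900)}

def style_url(style, cutout_model="u2net_human_seg", quality=92):
    """Build the /v1/style/{style} URL with its two documented query parameters."""
    if style not in STYLE_SIZES:
        raise ValueError(f"unknown style: {style}")
    query = urlencode({"cutout_model": cutout_model, "quality": quality})
    return f"{BASE}/v1/style/{style}?{query}"

# One URL per look, e.g. to render all three from the same photo.
urls = {style: style_url(style) for style in STYLE_SIZES}
```

Each URL takes the photo as the multipart `file` field, exactly as in the curl calls above.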

08 API reference

All endpoints return the resulting media bytes (PNG/JPEG/WebP/MP4) except the JSON discovery routes. Interactive Swagger UI is at /swagger.

Base URL: the examples use the live endpoint https://foto-api.autonomousmedia.io (public, TLS via Let's Encrypt). For local development, swap it for http://localhost:8000.
method | route | returns
GET | /health | JSON: liveness + capabilities
GET | /v1/models | JSON: models, ops, effects, modes
POST | /v1/cutout | image/png (transparent)
POST | /v1/compose | image/jpeg (or png/webp)
POST | /v1/edit | image/jpeg (or png/webp)
POST | /v1/motion/{effect} | video/mp4 (9:16)
POST | /v1/style/{style} | image/jpeg (editorial look)

GET /health

Liveness + capability snapshot (models, ops, whether LLM mode is enabled).

GET /v1/models

Lists cutout models, edit ops (with param hints), and available compose modes.

POST /v1/cutout

field | type | notes
file | file | the image (multipart)
model | query | u2net_human_seg (default) · u2net · isnet-general-use · birefnet-portrait · birefnet-general
alpha_matting | query | bool, default true; softer, cleaner edges

→ image/png with alpha.

POST /v1/compose

field | type | notes
background | file | the scene
subjects | file[] | one or more (repeat the field). Raw photos are auto-cut; pre-cut RGBA PNGs are used as-is.
params | form (JSON) | mode, cutout_model, auto_cutout, output_format, quality, items[]

→ image/jpeg (or PNG/WebP via output_format).

POST /v1/edit

field | type | notes
file | file | the image
ops | form (JSON) | array of {"op":...} objects, applied in order
output_format · quality | form | default jpeg · 94

POST /v1/motion/{effect}

Renders a 9:16 MP4. Full fields & per-effect params in §05 Motion. → video/mp4.

POST /v1/style/{style}

Applies an editorial look (split/red/money). Details in §06 Editorial styles. → image/jpeg.

Python client

import json
import requests

B = "https://foto-api.autonomousmedia.io"

# 1) cut a subject out
with open("me.jpg", "rb") as f:
    r = requests.post(f"{B}/v1/cutout", params={"model": "birefnet-portrait"},
                      files={"file": f})
r.raise_for_status()  # fail loudly instead of saving an error body as an image
with open("me.png", "wb") as out:
    out.write(r.content)

# 2) composite onto a new scene (auto placement); the raw photo is cut out server-side
with open("scene.jpg", "rb") as bg, open("me.jpg", "rb") as subj:
    r = requests.post(f"{B}/v1/compose",
                      files=[("background", bg), ("subjects", subj)],
                      data={"params": json.dumps({"mode": "auto"})})
r.raise_for_status()
with open("out.jpg", "wb") as out:
    out.write(r.content)

09 Deploy

Local

docker compose up -d --build
# → http://localhost:8000  (docs at /, Swagger at /swagger)
curl -s localhost:8000/health | jq

Scraper server

Heavy/batch image work belongs on the scraper. Ship the repo to /opt/apps/photo-studio/ and bring it up with Compose.

rsync -az --exclude .git ./ sandenskog@SCRAPER_IP:/opt/apps/photo-studio/
ssh sandenskog@SCRAPER_IP "cd /opt/apps/photo-studio && docker compose up -d --build"
Model storage. The default model is baked into the image; BiRefNet (~1 GB) downloads on first use into the models Docker volume so it persists across restarts. To bake it in too, build with --build-arg PREFETCH_BIREFNET=1.

Enable the vision-LLM placement mode (optional)

# in docker-compose.yml → environment:
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
PLACEMENT_MODEL: claude-haiku-4-5   # cheap default; bump to claude-opus-4-8 for max quality

When set, {"mode":"llm"} becomes available on /v1/compose and /health reports llm_placement:true.
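
A client can use that flag to fall back gracefully when the key is not configured. This is a minimal sketch: `pick_mode` is a hypothetical helper, and it assumes `llm_placement` is a top-level boolean in the `/health` JSON, which the text above implies but does not spell out.

```python
def pick_mode(health, prefer_llm=True):
    """Return "llm" only when /health advertises it; otherwise fall back to "auto"."""
    if prefer_llm and health.get("llm_placement") is True:
        return "llm"
    return "auto"

# Example payloads standing in for the parsed /health response.
with_key = {"llm_placement": True}
without_key = {"llm_placement": False}

mode_a = pick_mode(with_key)
mode_b = pick_mode(without_key)
```

In practice `health` would be the parsed JSON from `GET /health`, and the returned value goes into the `mode` key of the `/v1/compose` params.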

10 Notes & limits