01 What it does
Five capability groups behind a small HTTP API. Everything runs on CPU and is built for batch jobs on the scraper server. iPhone HEIC inputs are accepted everywhere.
✂️ Cutout
AI background removal via rembg. Five models from fast people-segmentation to BiRefNet portrait matting with clean hair edges. Returns a transparent PNG.
🪄 Compose
Drop one or more subjects onto a new background. Face/profile detection auto-decides side, scale and gaze — or place manually, or let a vision model decide.
🎛️ Edit
An ordered pipeline of raster ops: resize, crop, rotate, brightness/contrast/saturation/sharpness, blur, grayscale, sepia, autocontrast, vignette, borders, format convert.
🎬 Motion
Six reel-native effects rendered to MP4: Ken Burns, parallax, kinetic captions, karaoke subtitles, count-up data, whip/glitch transitions — for a stills + TTS news pipeline.
🗞️ Styles
Three editorial "news illustration" looks rendered from a photo + its cutout mask: torn-paper split-face, red sticker cutout, and neon money glow — ready for social posts.
02 Background removal
One call strips the background and returns a transparent PNG. BiRefNet keeps fine hair and edge detail clean.


# model: u2net_human_seg (default, fast) · birefnet-portrait (best edges)
curl -F file=@photo.jpg "https://foto-api.autonomousmedia.io/v1/cutout?model=birefnet-portrait" -o cutout.png
03 Smart compositing
Place a cut-out subject onto a new scene. In auto mode, face & profile detection picks the side, scale, and which way the subject should face — so the gaze leads into the frame and the scene's focal point stays visible.
Auto mode — zero config



curl -F background=@path.jpg -F subjects=@man.jpg \
-F 'params={"mode":"auto","cutout_model":"birefnet-portrait"}' \
https://foto-api.autonomousmedia.io/v1/compose -o out.jpg
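The gaze rule described for auto mode (the subject should look into the frame while the scene's focal point stays visible) can be illustrated with a small sketch. This is not the service's implementation: the function name `choose_side`, the `focal_x` input and the "stand opposite the focal point" rule are all assumptions made for illustration.

```python
def choose_side(facing: str, focal_x: float) -> dict:
    """Pick a side for the subject and decide whether to mirror it.

    facing  -- "left" or "right": the direction the subject looks in the source photo
    focal_x -- horizontal position of the scene's focal point, 0.0 (left) to 1.0 (right)
    """
    if facing not in ("left", "right"):
        raise ValueError("facing must be 'left' or 'right'")
    if not 0.0 <= focal_x <= 1.0:
        raise ValueError("focal_x must be within 0..1")

    # Keep the focal point visible: stand on the opposite half of the frame.
    side = "left" if focal_x >= 0.5 else "right"
    # Gaze should lead into the frame: a subject on the left looks right.
    wanted = "right" if side == "left" else "left"
    return {"side": side, "flip": facing != wanted}
```

With a focal point on the right of the scene and a subject photographed looking left, this sketch places the subject on the left and mirrors it, so the gaze points back toward the scene.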
Multiple subjects & manual control
Pass several subjects in one call. Manual mode gives per-subject control of scale, position, facing and draw order (depth).

curl -F background=@path.jpg -F subjects=@man.jpg -F subjects=@woman.jpg \
-F 'params={"mode":"manual","cutout_model":"birefnet-portrait","items":[
{"scale":0.86,"target_frac":0.26,"draw_order":1,"warm":1.01},
{"scale":0.60,"target_frac":0.60,"base_frac":0.99,"draw_order":0}
]}' \
https://foto-api.autonomousmedia.io/v1/compose -o combo.jpg
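When the manual `params` JSON is assembled in code rather than typed into a curl command, a small helper can catch out-of-range values before the request is sent. The sketch below assumes the value ranges given under "items fields" in this section; the helper name `manual_params` is hypothetical, and whether the server performs the same checks is not stated here.

```python
import json

# Ranges taken from the "items fields" list in this section.
RANGES = {
    "scale": (0.0, 1.0),
    "target_frac": (0.0, 1.0),
    "base_frac": (0.0, 1.0),
    "color_match": (0.0, 0.4),
}

def manual_params(items, cutout_model="birefnet-portrait"):
    """Return the JSON string for the `params` form field in manual mode."""
    for i, item in enumerate(items):
        for key, (lo, hi) in RANGES.items():
            if key in item and not lo <= item[key] <= hi:
                raise ValueError(f"items[{i}].{key}={item[key]} is outside {lo}..{hi}")
    return json.dumps({"mode": "manual", "cutout_model": cutout_model, "items": items})
```

The returned string is what goes into the `params` form field, one `items` entry per uploaded subject, in the same order as the `subjects` files.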
Three placement modes
| Mode | How placement is decided | Cost |
|---|---|---|
| auto | Face + profile detection (OpenCV). Picks side, scale, gaze. Default. | Free, fast, scales infinitely |
| manual | You supply explicit per-subject params (items). | Free |
| llm | A vision model looks at the scene and decides placement. Opt-in. | Tokens per image; needs ANTHROPIC_API_KEY |
items fields: scale (0–1 of frame height) · target_frac (0–1, where the face sits horizontally) · base_frac (0–1, vertical position of the subject's feet/bottom; lower value = further back) · facing · flip · draw_order (lower = behind) · feather · warm (<1 cooler, >1 warmer) · color_match (0–0.4, pull toward scene colour).
04 General edits
Send an ordered list of operations; they apply in sequence. Good for thumbnails, normalisation, watermark-free filters, and format conversion.




curl -F file=@photo.jpg \
-F 'ops=[{"op":"resize","max":1600},{"op":"autocontrast"},{"op":"saturation","factor":1.3},{"op":"vignette","strength":0.45}]' \
-F output_format=jpeg -F quality=90 \
https://foto-api.autonomousmedia.io/v1/edit -o edited.jpg
Available operations
| op | params |
|---|---|
| resize | w / h / max (bounds long edge, keeps aspect) |
| crop | box=[l,t,r,b] or aspect="16:9" (centre crop) |
| rotate / flip | deg · axis="h" or "v" |
| brightness · contrast · saturation · sharpness | factor (1.0 = unchanged) |
| blur | radius |
| grayscale · sepia · invert · equalize | (none) |
| autocontrast | cutoff (percent clipped, default 1) |
| posterize | bits (1–8) |
| vignette | strength (0–1) |
| border | size, color="#rrggbb" |
| convert | format="jpeg", "png" or "webp" |
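Because the ops are applied in the order given, it can help to build the list programmatically and reject misspelled op names before uploading. The following is a minimal client-side sketch: the helper name `build_ops` is hypothetical, and the set of names is simply copied from the table above.

```python
import json

# Op names copied from the table of available operations.
KNOWN_OPS = {
    "resize", "crop", "rotate", "flip", "brightness", "contrast",
    "saturation", "sharpness", "blur", "grayscale", "sepia", "invert",
    "equalize", "autocontrast", "posterize", "vignette", "border", "convert",
}

def build_ops(*steps):
    """Turn (name, params) pairs into the JSON string for the `ops` field.

    Order is preserved, because the service applies ops in sequence.
    """
    ops = []
    for name, params in steps:
        if name not in KNOWN_OPS:
            raise ValueError(f"unknown op: {name}")
        ops.append({"op": name, **params})
    return json.dumps(ops)
```

For example, `build_ops(("resize", {"max": 1600}), ("autocontrast", {}), ("vignette", {"strength": 0.45}))` yields a string that can be sent as the `ops` form field of /v1/edit.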
05 Motion / reels
Six reel-native effects rendered server-side to 9:16 MP4 — built for a stills + TTS news pipeline (no After Effects, no manual editing). Each is a pure function of time → frame; the same model maps 1:1 to a Remotion useCurrentFrame() setup if you move rendering to Node later. The clips below are produced by the service.
Word timings: pass words:[{"w":"Räntan","t":0.0},…] from your ElevenLabs with_timestamps response (or WhisperX). Without timings, the words fall back to an even beat.
| field | type | notes |
|---|---|---|
| effect | path | ken_burns · parallax · captions · karaoke · countup · transition |
| image | file | required for ken_burns/parallax; optional background for the others |
| image2 | file | second scene for transition (optional) |
| params | form (JSON) | common: w, h, fps, duration · plus per-effect (below) |
→ video/mp4 (H.264, 9:16 by default).
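The word-timing note in this section mentions an even-beat fallback when no timings are supplied. The sketch below shows one way to produce the same `[{"w", "t"}]` shape on the client, for example to preview pacing before sending a request. The function name `with_timings` and the 0.4 s default beat are assumptions; the service's actual default is not stated here.

```python
def with_timings(words, beat=0.4):
    """Normalise a word list to [{"w": ..., "t": ...}].

    Items that are already {"w", "t"} dicts are kept as they are; plain
    strings get evenly spaced start times (index * beat seconds).
    """
    timed = []
    for i, word in enumerate(words):
        if isinstance(word, dict):
            timed.append({"w": word["w"], "t": float(word["t"])})
        else:
            timed.append({"w": str(word), "t": round(i * beat, 3)})
    return timed
```

Real timings from a TTS response are preferable when available, since an even beat ignores the natural length of each word.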
Per-effect params
| effect | params |
|---|---|
| ken_burns | zoom (1.16) · pan ("up-left"…) · kicker · headline |
| parallax | cutout_model · kicker · headline |
| captions | words (list) · highlight (index) · eyebrow · beat (s) |
| karaoke | words (list or [{w,t}]) · gap (s) · lead (s) |
| countup | title · corner · value · delta · bars (list) · labels (list) |
| transition | scene_a/scene_b {kicker,headline,tint}, or pass image+image2 |
# Ken Burns on a still, with a lower-third headline
curl -F image=@photo.jpg \
-F 'params={"duration":4,"zoom":1.18,"kicker":"Stockholm · 06 juni","headline":"Stadshuset i kvällsljus"}' \
https://foto-api.autonomousmedia.io/v1/motion/ken_burns -o kenburns.mp4

# Kinetic captions driven by word timings from TTS
curl -F 'params={"words":["Räntan","sänks","med 0,25","i juni"],"highlight":2}' \
https://foto-api.autonomousmedia.io/v1/motion/captions -o captions.mp4

# Count-up data card from your pipeline JSON
curl -F 'params={"title":"OMXS30 · stängning","value":2487.6,"delta":1.4,"bars":[-0.8,1.2,0.9,-0.4,1.4],"labels":["VOLVO","EVO","SBB","SINCH","ATCO"]}' \
https://foto-api.autonomousmedia.io/v1/motion/countup -o data.mp4
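The "pure function of time → frame" model described in this section can be sketched for the Ken Burns case: given a time `t`, compute which part of the still to crop and scale up to the output size. This is an illustrative sketch, not the service's renderer. The function name, the smoothstep easing and the numeric encoding of `pan` as a direction pair are assumptions; only the `zoom` default of 1.16 comes from the params table.

```python
def ken_burns_box(t, duration, w, h, zoom=1.16, pan=(-1.0, -1.0)):
    """Return the (left, top, right, bottom) source crop for time t.

    The crop starts as the full frame and shrinks to 1/zoom of it, drifting
    toward the corner named by `pan` ((-1, -1) = up-left, (1, 1) = down-right).
    """
    p = min(max(t / duration, 0.0), 1.0)      # progress, clamped to 0..1
    p = p * p * (3.0 - 2.0 * p)               # smoothstep easing
    scale = 1.0 + (zoom - 1.0) * p            # current zoom factor
    cw, ch = w / scale, h / scale             # crop size at this instant
    # Slack left around the crop; pan decides how it is split between sides.
    left = (w - cw) * (1.0 + pan[0] * p) / 2.0
    top = (h - ch) * (1.0 + pan[1] * p) / 2.0
    return (left, top, left + cw, top + ch)
```

Because the function depends only on `t`, it carries over to a frame-indexed renderer by passing `t = frame / fps`, which is the mapping to a Remotion `useCurrentFrame()` setup that the text refers to.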
06 Editorial styles
Three "news illustration" looks applied automatically to an ordinary photo. The subject is cut out (rembg) and the art — duotone colour ramps, torn-paper seams, sticker strokes, neon edge glow, halftone, scanlines — is composited around the mask. One photo, three social-ready looks:




cutout_model: u2net_human_seg (default, fast) or birefnet-portrait (cleaner edges).
| field | type | notes |
|---|---|---|
| style | path | split · red · money |
| file | file | the photo (JPG/PNG/HEIC) |
| cutout_model | query | default u2net_human_seg |
| quality | query | JPEG quality, default 92 |
→ image/jpeg at the style's native social size.
curl -F file=@photo.jpg "https://foto-api.autonomousmedia.io/v1/style/red" -o red.jpg
curl -F file=@photo.heic "https://foto-api.autonomousmedia.io/v1/style/money?cutout_model=birefnet-portrait" -o money.jpg
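To render all three looks from one photo in a script, the request URLs can be generated rather than hard-coded. A minimal sketch follows, assuming the path and query fields from the table in this section; the helper name `style_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://foto-api.autonomousmedia.io"
STYLES = ("split", "red", "money")

def style_url(style, cutout_model=None, quality=None, base=BASE):
    """Build the POST URL for /v1/style/{style} with optional query params."""
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}")
    query = {}
    if cutout_model is not None:
        query["cutout_model"] = cutout_model
    if quality is not None:
        query["quality"] = quality
    url = f"{base}/v1/style/{style}"
    return f"{url}?{urlencode(query)}" if query else url
```

Looping over `STYLES` and posting the same file to each URL gives the "one photo, three looks" set; for local development, pass `base="http://localhost:8000"`.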
07 Gallery
A mixed bag of effects across varied source photos — portraits, live music, architecture, coast — all produced by the endpoints above.

neon glow on a portrait

sticker cutout on live music

torn-paper split-face
a street still → motion
cutout depth on a portrait

a subject dropped onto a beach

autocontrast + saturation + vignette
08 API reference
All endpoints return the resulting media bytes (PNG/JPEG/WebP/MP4) except the JSON discovery routes. Interactive Swagger UI is at /swagger.
Base URL: https://foto-api.autonomousmedia.io (public, TLS via Let's Encrypt). For local development, swap it for http://localhost:8000.
| method | route | returns |
|---|---|---|
| GET | /health | JSON: liveness + capabilities |
| GET | /v1/models | JSON: models, ops, effects, modes |
| POST | /v1/cutout | image/png (transparent) |
| POST | /v1/compose | image/jpeg (or png/webp) |
| POST | /v1/edit | image/jpeg (or png/webp) |
| POST | /v1/motion/{effect} | video/mp4 (9:16) |
| POST | /v1/style/{style} | image/jpeg (editorial look) |
GET /health
Liveness + capability snapshot (models, ops, whether LLM mode is enabled).
GET /v1/models
Lists cutout models, edit ops (with param hints), and available compose modes.
POST /v1/cutout
| field | type | notes |
|---|---|---|
| file | file | the image (multipart) |
| model | query | u2net_human_seg (default) · u2net · isnet-general-use · birefnet-portrait · birefnet-general |
| alpha_matting | query | bool, default true; softer, cleaner edges |
→ image/png with alpha.
POST /v1/compose
| field | type | notes |
|---|---|---|
| background | file | the scene |
| subjects | file[] | one or more (repeat the field). Raw photos are auto-cut; pre-cut RGBA PNGs are used as-is. |
| params | form (JSON) | mode, cutout_model, auto_cutout, output_format, quality, items[] |
→ image/jpeg (or PNG/WebP via output_format).
POST /v1/edit
| field | type | notes |
|---|---|---|
| file | file | the image |
| ops | form (JSON) | array of {"op":...} objects, applied in order |
| output_format · quality | form | default jpeg · 94 |
POST /v1/motion/{effect}
Renders a 9:16 MP4. Full fields & per-effect params in §05 Motion. → video/mp4.
POST /v1/style/{style}
Applies an editorial look (split/red/money). Details in §06 Editorial styles. → image/jpeg.
Python client
import json
import requests

B = "https://foto-api.autonomousmedia.io"

# 1) cut a subject out
with open("me.jpg", "rb") as f:
    r = requests.post(f"{B}/v1/cutout", params={"model": "birefnet-portrait"}, files={"file": f})
r.raise_for_status()  # fail loudly instead of saving an error body as an image
with open("me.png", "wb") as out:
    out.write(r.content)

# 2) composite onto a new scene (auto placement)
with open("scene.jpg", "rb") as bg, open("me.jpg", "rb") as subj:
    r = requests.post(f"{B}/v1/compose",
                      files=[("background", bg), ("subjects", subj)],
                      data={"params": json.dumps({"mode": "auto"})})
r.raise_for_status()
with open("out.jpg", "wb") as out:
    out.write(r.content)
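Since the endpoints return PNG, JPEG, WebP or MP4 depending on the route and on `output_format`, a client that saves results may want to derive the file extension from the response rather than hard-coding it. The sketch below assumes the service sets a matching `Content-Type` header, which the "returns" column suggests but does not spell out; `extension_for` is a hypothetical helper name.

```python
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/webp": ".webp",
    "video/mp4": ".mp4",
}

def extension_for(content_type):
    """Map a response Content-Type header to a file extension."""
    # Drop parameters such as "; charset=..." and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        return EXTENSIONS[media_type]
    except KeyError:
        raise ValueError(f"unexpected media type: {media_type}") from None
```

With the `requests` client, this would be called as `extension_for(r.headers["Content-Type"])` after `r.raise_for_status()`. An unexpected type (for instance a JSON error body) raises instead of being written to disk as an image.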
09 Deploy
Local
docker compose up -d --build
# → http://localhost:8000 (docs at /, Swagger at /swagger)
curl -s localhost:8000/health | jq
Scraper server
Heavy/batch image work belongs on the scraper. Ship the repo to /opt/apps/photo-studio/ and bring it up with Compose.
rsync -az --exclude .git ./ sandenskog@SCRAPER_IP:/opt/apps/photo-studio/
ssh sandenskog@SCRAPER_IP "cd /opt/apps/photo-studio && docker compose up -d --build"
Downloaded model weights are stored in the models Docker volume so they persist across restarts. To bake BiRefNet into the image too, build with --build-arg PREFETCH_BIREFNET=1.
Enable the vision-LLM placement mode (optional)
# in docker-compose.yml → environment:
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
PLACEMENT_MODEL: claude-haiku-4-5   # cheap default; bump to claude-opus-4-8 for max quality
When set, {"mode":"llm"} becomes available on /v1/compose and /health reports llm_placement:true.
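A client can use the /health flag to decide whether to request LLM placement or stay on the default. A minimal sketch follows, assuming `llm_placement` is a top-level boolean in the /health JSON (the exact response shape is not shown here); `pick_mode` is a hypothetical helper name.

```python
def pick_mode(health, prefer_llm=False):
    """Choose a compose mode from a parsed /health response.

    Falls back to "auto" unless LLM placement is both wanted and reported.
    """
    llm_enabled = bool(health.get("llm_placement", False))
    return "llm" if (prefer_llm and llm_enabled) else "auto"
```

Falling back to `auto` keeps batch jobs running on deployments where no API key is configured, and avoids per-image token cost unless it was explicitly asked for.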
10 Notes & limits
- CPU inference. On the scraper, cutout runs on CPU: fast for u2net_human_seg, a few seconds for BiRefNet. GPU is not required.
- Auto-placement is a heuristic. It reads faces & profiles; a raised arm or an ambiguous pose can fool the gaze guess. Override with manual, or enable llm mode for hard cases.
- Foreground framing. Close-up/bust subjects compose believably as foreground figures (anchored near the bottom). You can't shrink a bust into a tiny distant figure, because there are no legs to show.
- Motion = render-to-MP4. Effects are rendered frame-by-frame (PIL/numpy) and encoded with bundled ffmpeg (H.264, 9:16, CPU only). Fonts are configurable via PHOTO_FONT_BOLD/PHOTO_FONT_MONO; the Docker image ships DejaVu. For production reels, move rendering to Remotion (Node); the time→frame model maps 1:1.
- Parallax depth is approximate. The subject is cut out and moved over a blurred copy of the original; small movements read as depth without an inpainting/depth-map step. Keep motion subtle.
- Size guard. Inputs above MAX_PIXELS (default 50 MP) are rejected with 413.
- No external calls by default. Cutout, compose-auto/manual and edit are fully local. Only llm mode sends images to the Anthropic API.
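For batch jobs, the size guard can be checked on the client before uploading, which avoids sending large files only to receive a 413. The sketch below assumes "50 MP" means 50,000,000 pixels and that the limit applies to width × height; both are assumptions, and `fits_size_guard` is a hypothetical helper name.

```python
MAX_PIXELS = 50_000_000  # assumed reading of "50 MP"; match the server's setting

def fits_size_guard(width, height, max_pixels=MAX_PIXELS):
    """Return True if an image of this size should pass the size guard."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # "Inputs above MAX_PIXELS are rejected", so exactly MAX_PIXELS passes.
    return width * height <= max_pixels
```

The dimensions can come from any image library that reads headers without decoding the full file; images that fail the check can be downscaled locally or skipped.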