Overview

ondoki can convert video recordings into step-by-step workflows. Upload a video file, and an async processing pipeline extracts frames, transcribes audio, and generates annotated steps with AI.

Supported Formats

Format  Extension
MP4     .mp4
MOV     .mov
AVI     .avi
MKV     .mkv
WebM    .webm
M4V     .m4v
Maximum file size: 2 GB
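
A client can check a file against the table and the size limit before uploading, so that obviously invalid files are rejected without a round trip. The sketch below is illustrative only: `check_video` is a hypothetical helper, not part of ondoki, and it assumes that "2 GB" means 2 × 1024³ bytes. The server remains the authority on what is accepted.

```python
from pathlib import Path

# Extensions from the Supported Formats table.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v"}
# Assumption: "2 GB" is read as 2 * 1024**3 bytes; the server may count differently.
MAX_SIZE_BYTES = 2 * 1024**3


def check_video(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_SIZE_BYTES}")
    return problems
```

The extension comparison is case-insensitive here, which is a choice of this sketch; whether the server treats `.MP4` and `.mp4` alike is not stated.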

How It Works

  1. Upload Video
     Upload a video file via the Video Import page or the API. The file is validated and saved to disk.

  2. Queue Processing
     A MediaProcessingJob is created and queued for the Celery media worker.

  3. Extract Audio
     Audio is extracted from the video for transcription.

  4. Transcribe
     Audio is transcribed using Whisper (configurable model size: base, small, medium, large).

  5. Extract Key Frames
     Key frames are extracted from the video at regular intervals.

  6. Analyze & Generate
     AI analyzes the frames and transcript to generate annotated steps with titles and descriptions.

  7. Create Workflow
     A new ProcessRecordingSession is created with source_type “video”, containing the generated steps and guide.

Processing Stages

You can track progress via the status endpoint:
Stage              Description
uploading          File is being uploaded
queued             Job is queued for processing
extracting_audio   Audio track being extracted
transcribing       Audio being transcribed to text
extracting_frames  Key frames being extracted
analyzing          AI analyzing content
generating         Generating steps and guide
done               Processing complete
failed             Processing failed (check error)
Progress is tracked as a percentage (0–100%).

API

Upload

POST /api/v1/video-import/upload
Content-Type: multipart/form-data

file: <video file>
project_id: <project_id>
Returns: session_id, job_id, task_id
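
As an illustration of the request shape, the sketch below builds the multipart upload with only the Python standard library. Several things here are assumptions rather than documented behavior: the base URL is a placeholder, no authentication header is added (the source does not describe one), and `build_upload_request` is a hypothetical helper name. It reads the whole file into memory, which is fine for a small test clip but not for files near the 2 GB limit.

```python
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(base_url: str, video_path: str, project_id: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    path = Path(video_path)
    # Two form fields, as listed above: project_id and file.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="project_id"\r\n\r\n'
        f"{project_id}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    body = head + path.read_bytes() + tail  # whole file in memory: sketch only
    return urllib.request.Request(
        f"{base_url}/api/v1/video-import/upload",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the result to `urllib.request.urlopen` sends it. According to the description above, the response should carry `session_id`, `job_id`, and `task_id`; the exact response encoding is not shown in this page.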

Check Status

GET /api/v1/video-import/status/{session_id}
Returns:
{
  "status": "processing",
  "processing_stage": "transcribing",
  "progress": 45,
  "video_filename": "demo.mp4",
  "video_size_bytes": 52428800,
  "estimated_frames": 120,
  "frame_count": 0,
  "has_guide": false
}
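
Because processing is asynchronous, a client typically polls this endpoint until a terminal stage is reached. The following is a minimal sketch under stated assumptions: it treats `done` and `failed` in `processing_stage` as terminal (the values of the top-level `status` field are not fully documented here), and `wait_for_import`, its polling interval, and its timeout are hypothetical choices. The HTTP call is passed in as a function so the loop stays independent of any particular client library.

```python
import time
from typing import Callable

# Stages after which polling can stop, taken from the Processing Stages table.
TERMINAL_STAGES = {"done", "failed"}


def wait_for_import(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() repeatedly until the stage is done or failed."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("processing_stage") in TERMINAL_STAGES:
            return status
        if time.monotonic() >= deadline:
            # A job that never leaves "queued" may mean the media worker is not running.
            raise TimeoutError(f"still in stage {status.get('processing_stage')!r}")
        sleep(interval)
```

In practice `fetch_status` would issue `GET /api/v1/video-import/status/{session_id}` and return the decoded JSON body.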

Check Job Status

GET /api/v1/video-import/jobs/{job_id}

Requirements

Video import requires:
  1. Celery media worker — must be running (media-worker service in docker-compose)
  2. Whisper model — configured via WHISPER_MODEL env var (default: base)
  3. OpenAI API key — if using OpenAI’s Whisper API (set OPENAI_API_KEY)
Without the media worker running, video uploads will be accepted but processing will not start. The job will remain in queued status.

Whisper Model Sizes

Model   Parameters  Speed    Accuracy
base    74 M        Fast     Good for clear audio
small   244 M       Medium   Better accuracy
medium  769 M       Slower   High accuracy
large   1550 M      Slowest  Best accuracy
Set via WHISPER_MODEL environment variable. Larger models require more RAM and processing time.
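
A deployment script can check the variable before starting the worker, so that a typo surfaces early instead of at transcription time. This is a sketch, not ondoki's own configuration code: `resolve_whisper_model` is a hypothetical helper, and it assumes the four names listed above are the only accepted values and that they are case-sensitive.

```python
import os
from typing import Mapping

# Model names listed in the table above; default per the Requirements section.
WHISPER_MODELS = ("base", "small", "medium", "large")
DEFAULT_WHISPER_MODEL = "base"


def resolve_whisper_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured Whisper model name, or raise if it is not recognized."""
    model = env.get("WHISPER_MODEL", DEFAULT_WHISPER_MODEL).strip()
    if model not in WHISPER_MODELS:
        raise ValueError(
            f"WHISPER_MODEL={model!r} is not one of {', '.join(WHISPER_MODELS)}"
        )
    return model
```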

Retry Behavior

Failed processing jobs are retried up to 3 times (configurable via max_attempts). The job status tracks:
  • Number of attempts
  • Error messages
  • Start and finish timestamps
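
The bookkeeping described above can be pictured with a small model. This is an illustration only and does not reproduce ondoki's MediaProcessingJob or its Celery retry configuration: `JobRecord` and `run_with_retries` are hypothetical names, and the sketch assumes `max_attempts` counts total attempts (including the first), which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class JobRecord:
    """Hypothetical stand-in for the tracked job fields."""
    max_attempts: int = 3
    attempts: int = 0
    errors: list = field(default_factory=list)
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    status: str = "queued"


def run_with_retries(job: JobRecord, process: Callable[[], None]) -> JobRecord:
    """Run process() until it succeeds or max_attempts is exhausted."""
    job.started_at = datetime.now(timezone.utc)
    while job.attempts < job.max_attempts:
        job.attempts += 1
        try:
            process()
        except Exception as exc:
            job.errors.append(str(exc))  # keep every error message
        else:
            job.status = "done"
            break
    else:
        # Loop ended without a successful attempt.
        job.status = "failed"
    job.finished_at = datetime.now(timezone.utc)
    return job
```

A real worker would usually wait between attempts; the delay is left out here because the source does not describe one.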