Overview
ondoki can convert video recordings into step-by-step workflows. Upload a video file, and an async processing pipeline extracts frames, transcribes audio, and generates annotated steps with AI.
Supported Formats
| Format | Extension |
|---|---|
| MP4 | .mp4 |
| MOV | .mov |
| AVI | .avi |
| MKV | .mkv |
| WebM | .webm |
| M4V | .m4v |
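Before uploading, a client can check the file extension against the table above. The following is a minimal sketch, not ondoki's own validation: the helper name `is_supported_video` is hypothetical, and the server may check more than the extension.

```python
from pathlib import Path

# Extensions copied from the Supported Formats table.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v"}


def is_supported_video(filename: str) -> bool:
    """Return True if the filename ends in a supported extension.

    The comparison is case-insensitive, which is an assumption of this
    sketch rather than documented server behavior.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Only the final suffix is inspected, so a name such as `recording.mp4.zip` is rejected.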
How It Works
Upload Video
Upload a video file via the Video Import page or the API. The file is validated and saved to disk.
Transcribe
Audio is transcribed using Whisper (configurable model size: base, small, medium, large).
Analyze & Generate
AI analyzes the frames and transcript to generate annotated steps with titles and descriptions.
Processing Stages
You can track progress via the status endpoint:
| Stage | Description |
|---|---|
| uploading | File is being uploaded |
| queued | Job is queued for processing |
| extracting_audio | Audio track being extracted |
| transcribing | Audio being transcribed to text |
| extracting_frames | Key frames being extracted |
| analyzing | AI analyzing content |
| generating | Generating steps and guide |
| done | Processing complete |
| failed | Processing failed (check error) |
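A client that polls the status endpoint mainly needs to know which stages are terminal. The sketch below assumes stages occur in the order the table lists them, which the documentation does not guarantee. `fetch_stage` is a hypothetical callable that wraps whatever request returns the current stage name.

```python
import time
from typing import Callable

# Stage names from the Processing Stages table, in the order listed there.
STAGES = [
    "uploading", "queued", "extracting_audio", "transcribing",
    "extracting_frames", "analyzing", "generating", "done",
]
TERMINAL_STAGES = {"done", "failed"}


def progress_fraction(stage: str) -> float:
    """Rough progress in [0, 1], based only on the stage's position in STAGES."""
    if stage == "failed":
        raise ValueError("job failed; no progress fraction")
    return STAGES.index(stage) / (len(STAGES) - 1)


def wait_for_job(fetch_stage: Callable[[], str], interval: float = 2.0,
                 timeout: float = 600.0) -> str:
    """Poll fetch_stage() until it reports 'done' or 'failed', or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        stage = fetch_stage()
        if stage in TERMINAL_STAGES:
            return stage
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in stage {stage!r} after {timeout}s")
        time.sleep(interval)
```

Passing the fetcher in as a callable keeps the polling logic independent of the HTTP client and makes it easy to exercise with a fake sequence of stages.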
API
Upload
The response includes session_id, job_id, and task_id.
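The three identifiers named here (session_id, job_id, task_id) can be captured in a small typed record on the client side. This is a sketch under assumptions: it supposes the upload response is a JSON object with these three keys at the top level, and it coerces the values to strings because their types are not documented. `UploadResult` and `parse_upload_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class UploadResult:
    session_id: str
    job_id: str
    task_id: str


def parse_upload_response(payload: Mapping[str, Any]) -> UploadResult:
    """Extract the three identifiers, failing loudly if any is absent."""
    required = ("session_id", "job_id", "task_id")
    missing = [key for key in required if key not in payload]
    if missing:
        raise KeyError(f"upload response missing fields: {', '.join(missing)}")
    # Coerce to str: the identifier types are an assumption of this sketch.
    return UploadResult(*(str(payload[key]) for key in required))
```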
Check Status
Requirements
Video import requires:
- Celery media worker: must be running (the media-worker service in docker-compose)
- Whisper model: configured via the WHISPER_MODEL env var (default: base)
- OpenAI API key: needed only if using OpenAI’s Whisper API (set OPENAI_API_KEY)
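The WHISPER_MODEL setting and its default can be checked at startup with a few lines. The sketch below illustrates the documented default (base) and the four documented model names. The function name is hypothetical, and trimming and lowercasing the value is a convenience assumption, not documented behavior.

```python
import os
from typing import Mapping, Optional

# Model names listed in this document.
VALID_WHISPER_MODELS = ("base", "small", "medium", "large")


def resolve_whisper_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured Whisper model name, defaulting to 'base'."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the documented default.
    model = (env.get("WHISPER_MODEL") or "base").strip().lower()
    if model not in VALID_WHISPER_MODELS:
        raise ValueError(
            f"WHISPER_MODEL={model!r} is not one of {', '.join(VALID_WHISPER_MODELS)}"
        )
    return model
```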
Whisper Model Sizes
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| base | 74 MB | Fast | Good for clear audio |
| small | 244 MB | Medium | Better accuracy |
| medium | 769 MB | Slower | High accuracy |
| large | 1.5 GB | Slowest | Best accuracy |
Set the model with the WHISPER_MODEL environment variable. Larger models require more RAM and processing time.
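As a rough aid for choosing a model, the sizes in the table can be compared against a size budget. This is a sketch only: it uses the listed model sizes (1.5 GB taken as 1500 MB) as a proxy, and actual RAM use during transcription is not stated here and is likely higher. `largest_model_within` is a hypothetical helper.

```python
from typing import Optional

# Sizes in MB from the Whisper Model Sizes table; 1.5 GB approximated as 1500 MB.
MODEL_SIZES_MB = {"base": 74, "small": 244, "medium": 769, "large": 1500}


def largest_model_within(budget_mb: float) -> Optional[str]:
    """Return the largest model whose listed size fits the budget, else None."""
    fitting = [name for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        return None
    return max(fitting, key=MODEL_SIZES_MB.__getitem__)
```

For example, a budget of 800 MB selects medium, while a budget below 74 MB yields no model at all.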
Retry Behavior
Failed processing jobs are retried up to 3 times (configurable via max_attempts). The job status tracks:
- Number of attempts
- Error messages
- Start and finish timestamps
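The retry bookkeeping described above can be illustrated in plain Python. This sketch does not use Celery and is not ondoki's implementation. It assumes max_attempts counts total attempts (including the first), and the `JobRecord` fields are hypothetical names for the tracked values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class JobRecord:
    attempts: int = 0
    errors: List[str] = field(default_factory=list)
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    status: str = "queued"


def run_with_retries(task: Callable[[], None], max_attempts: int = 3) -> JobRecord:
    """Run task until it succeeds or max_attempts is exhausted."""
    record = JobRecord(started_at=datetime.now(timezone.utc))
    while record.attempts < max_attempts:
        record.attempts += 1
        try:
            task()
        except Exception as exc:  # record the error and try again
            record.errors.append(str(exc))
        else:
            record.status = "done"
            break
    else:
        # Loop ended without a successful attempt.
        record.status = "failed"
    record.finished_at = datetime.now(timezone.utc)
    return record
```

A task that fails twice and then succeeds ends with status done, three attempts, and two recorded error messages.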