Overview
TwelveLabs is a multimodal AI company focused on deep video understanding. It offers video indexing and APIs for search, embedding, analysis/summarization, and generation. The platform is built on video foundation models, with two named model lines: Marengo, a video encoder and embedding model used for indexing and multimodal embeddings, and Pegasus, a video-language model used for analysis, summarization, and generation. The ViSeRet algorithm is highlighted as top-performing in the ICCV VALUE retrieval challenge. Capabilities include semantic, natural-language, Ctrl+F-style search across speech, text, audio, and visuals; precise scene pinpointing; video-to-text generation (summaries, chapters, highlights); multimodal embeddings for search and recommendation; and the option to deploy custom models with customer data. Deployment options span cloud, private cloud, and on-premises, and the company claims scalable, NVIDIA-accelerated infrastructure.
Key Features
Marengo model
Video encoder / embedding model used for indexing and multimodal embeddings.
Pegasus model
Video-language (video-first) model used for analysis, summarization, and generation.
ViSeRet
Algorithm cited as top-performing in the ICCV VALUE retrieval challenge.
Multimodal search
Semantic, natural-language, Ctrl+F-style search across speech, text, audio, and visuals.
Video-to-text generation
Generates summaries, chapters, and highlights from video input.
Embeddings for search/recommendation
Multimodal embeddings to power search and recommendations.
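To illustrate how embeddings can power search, here is a minimal sketch that ranks video segments by cosine similarity to a query vector. It is not the TwelveLabs API: the function names, segment IDs, and three-dimensional vectors are made up for illustration, and in practice the vectors would come from an embedding model such as Marengo.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_segments(query_vec, segments, top_k=3):
    """Return the top_k (segment_id, score) pairs, best match first.

    `segments` maps a hypothetical segment ID to its embedding vector.
    """
    scored = [
        (cosine_similarity(query_vec, vec), seg_id)
        for seg_id, vec in segments.items()
    ]
    scored.sort(reverse=True)
    return [(seg_id, score) for score, seg_id in scored[:top_k]]


# Toy embeddings (hypothetical values, not real model output).
segments = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.9, 0.1, 0.0],
    "clip_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]

for seg_id, score in rank_segments(query, segments, top_k=2):
    print(seg_id, round(score, 3))
```

The same ranking step applies to recommendations: replace the query vector with the embedding of a video the user has already watched.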

Who Can Use This Tool?
- Enterprise teams: Index internal videos and search corporate video libraries.
- Media & Entertainment: Automate clip/highlight generation, chapters, and fast content production.
- Government & Security: Sensitive-content detection and policy-compliant video analysis.
- Advertising analytics: ROI analysis for ads, sponsorships, and sports analytics.
Pricing Plans
Free tier with limited indexing and API access for development
- ✓ Up to 600 minutes (10 hours) of indexing
- ✓ Index access: 90 days from creation
- ✓ Concurrent indexing tasks: 5
- ✓ Basic APIs available in free tier
Pay-as-you-go development plan with usage-based pricing (Marengo, Pegasus, and API usage)
- ✓ Video indexing (one-time): $0.042 / minute
- ✓ Embedding infrastructure services: $0.0015 / minute
- ✓ Search API usage: $4 / 1,000 queries
- ✓ Embed API: Video $0.042 / minute; Audio $0.0083 / minute; Image $0.10 / 1,000; Text $0.07 / 1,000
- ✓ Analyze / Summarize (Pegasus): Input video $0.021 / minute; Output text $0.0075 / 1,000 tokens
- ✓ Index limits: Unlimited for paid tiers; up to 25 concurrent indexing tasks
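As a rough guide to how these usage-based rates add up, the sketch below estimates a bill from the listed prices. The function name and parameters are hypothetical. It also assumes the embedding infrastructure rate is charged once per indexed minute, since the listing does not state a billing period, and it leaves out Embed API charges. Treat the result as an estimate to confirm against current pricing.

```python
# Rates in USD, copied from the pay-as-you-go list above.
INDEXING_PER_MIN = 0.042
INFRA_PER_MIN = 0.0015  # assumption: charged once per indexed minute
SEARCH_PER_1000_QUERIES = 4.0
ANALYZE_INPUT_PER_MIN = 0.021
ANALYZE_OUTPUT_PER_1000_TOKENS = 0.0075


def estimate_cost(indexed_minutes, search_queries=0,
                  analyzed_minutes=0, output_tokens=0):
    """Return a per-item cost breakdown plus the total, in USD."""
    breakdown = {
        "indexing": indexed_minutes * INDEXING_PER_MIN,
        "infrastructure": indexed_minutes * INFRA_PER_MIN,
        "search": search_queries / 1000 * SEARCH_PER_1000_QUERIES,
        "analyze_input": analyzed_minutes * ANALYZE_INPUT_PER_MIN,
        "analyze_output": output_tokens / 1000 * ANALYZE_OUTPUT_PER_1000_TOKENS,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Example workload (made-up numbers): 1,000 indexed minutes, 2,000 searches,
# 100 minutes analyzed with Pegasus, 50,000 output tokens.
cost = estimate_cost(indexed_minutes=1000, search_queries=2000,
                     analyzed_minutes=100, output_tokens=50_000)
for item, usd in cost.items():
    print(f"{item}: ${usd:.3f}")
```

For this example workload, one-time indexing dominates the bill at $42 of roughly $54, so the number of minutes indexed is the main cost driver.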
Custom pricing and SLAs; rate limits scale with committed monthly spend.
- ✓ Custom SLAs and support
- ✓ Dedicated success manager and onboarding
- ✓ Higher rate limits and concurrency
Pros & Cons
✓ Pros
- ✓ Comprehensive video understanding with search, embedding, and generation capabilities
- ✓ API-first with multiple deployment options (cloud, private cloud, on-prem)
- ✓ Notable models (Marengo, Pegasus) and the ViSeRet algorithm
- ✓ Multimodal embeddings for search and recommendations
- ✓ Free tier available
✗ Cons
- ✗ Pricing is complex, and exact quantities require confirmation
- ✗ The documentation root showed a 404 page that points readers to the Playground, blog, and research pages
- ✗ Some pricing details (e.g., the final output-text price per 1M tokens) need to be verified with Sales
- ✗ Enterprise specifics (rate limits, SLAs) are custom and require direct engagement
