Teapot AI

Models

Teapot models are designed to run anywhere — from local CPUs and edge devices to production-scale systems — while staying strong on general-purpose tasks like question answering, summarization, and information extraction. They’re optimized for fast, efficient, and grounded responses, making them a good fit when latency, cost, and reliability matter.

Whether you need a lightweight model for on-device inference or a larger model for higher accuracy, the Teapot family provides flexible options that excel at in-context reasoning and produce hallucination-resistant outputs. If you want to fine-tune for your specific use case (proprietary data, domain workflows, custom formatting or refusals), contact us to discuss custom training and deployment.

TinyTeapot 🫖

Edge / CPU-friendly
Params: 77M | Speed: ~40 tok/s (Colab CPU)

A lightweight grounded model designed for fast, low-latency inference that still performs strongly on in-context Q&A and hallucination-resistant extraction when given a document or passages to cite from.

Total downloads: 1k+
Best for: mobile/CPU demos, low-latency grounded answering.

TeapotLLM 🫖

Higher accuracy
Params: 0.8B | Speed: ~5 tok/s (Colab CPU)

The larger, earlier-released model in the Teapot family: stronger grounding and extraction fidelity for context-faithful Q&A, refusal behavior, and structured information extraction, at higher compute cost.

Total downloads: 10k+
Best for: highest-quality grounded extraction and Q&A when latency is less critical.
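The grounded answering both cards describe amounts to packing the source passage into the prompt alongside the question, so the model answers only from the supplied context. A minimal Python sketch of that pattern (the `build_grounded_prompt` helper is illustrative, and the commented-out model id is an assumption, not a confirmed Teapot API):

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Pack the source passage and the question into one prompt so the
    model is instructed to answer only from the supplied context."""
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "Teapot models run on local CPUs, edge devices, and servers.",
    "Where can Teapot models run?",
)

# Hypothetical model call (model id assumed for illustration):
# from transformers import pipeline
# qa = pipeline("text2text-generation", model="teapotai/teapotllm")
# print(qa(prompt)[0]["generated_text"])
```

The same prompt shape works for either model; the smaller TinyTeapot trades some extraction fidelity for the ~8x speedup listed above.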