Check out our demo at: https://teapotai.com/chat
We’re excited to introduce Teapot AI, an innovative browser-based language model agent that prioritizes privacy and runs entirely on the user’s device. This breakthrough allows for strong AI reasoning capabilities without compromising user data. Here’s a deep dive into the architecture that makes it all possible.
We created Teapot AI with a simple goal: to keep your data private and make AI affordable by running it on your own device. No more data leaks or hefty cloud fees. Along the way, this uncovered many interesting technical challenges.
Teapot AI is structured into several key components that interact to provide a seamless and private AI experience:
The foundation of Teapot AI’s knowledge acquisition lies in its scraping capabilities, which fetch and process web content directly in the browser.
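To illustrate the kind of work an in-browser scraper does, here is a minimal sketch of HTML-to-text extraction and chunking. The function names and chunk size are illustrative assumptions, not Teapot AI's actual API:

```javascript
// Sketch of in-browser text extraction: strip scripts, styles, and tags
// from raw HTML, leaving only the visible text. Illustrative only.
function extractText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Split extracted text into fixed-size word chunks, a common
// preprocessing step before embedding for retrieval.
function chunkText(text, maxWords = 100) {
  const words = text.split(" ");
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

In a real browser context the HTML would come from a `fetch` response; regex stripping is used here only to keep the sketch self-contained.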
Once the data is scraped, Teapot AI needs to understand and find the best information related to your queries:
At the heart of our system is the Teapot Agent, which handles the core interactions with the user:
- Conversation: Manages the ongoing dialogue, maintaining the context and flow of the conversation.
- Chat Context: Retains the user’s conversation history to provide relevant and coherent responses.
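Chat-context management of this kind can be sketched as a rolling window of conversation turns. The class name, window size, and prompt format below are illustrative assumptions, not Teapot AI's implementation:

```javascript
// Sketch of chat-context management: keep a rolling window of recent
// turns so the rendered prompt stays within a small model's input limit.
class ChatContext {
  constructor(maxTurns = 6) {
    this.maxTurns = maxTurns;
    this.turns = [];
  }

  add(role, text) {
    this.turns.push({ role, text });
    // Drop the oldest turns once the window is full.
    if (this.turns.length > this.maxTurns) {
      this.turns = this.turns.slice(-this.maxTurns);
    }
  }

  // Render the retained history as plain text for a text2text model.
  toPrompt() {
    return this.turns.map(t => `${t.role}: ${t.text}`).join("\n");
  }
}
```

Truncating to recent turns is a pragmatic trade-off for small on-device models: it bounds latency and memory at the cost of forgetting older parts of the conversation.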
To accurately interpret and respond to user queries, Teapot AI employs custom models:
- Intent Model: Powered by brain.js, it determines the user’s intent: whether they are asking a question, requesting data to be scraped, or making social conversation.

The AI’s brain is built on Transformers.js:
- Fine-tuned text2text model: Specifically trained for browser efficiency, it generates relevant completions based on input data.
- Embedding Model: Converts text into vector form, enabling nearest-neighbor search for relevant information.
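The nearest-neighbor search over embedding vectors can be sketched with plain cosine similarity. In Teapot AI the vectors would come from the embedding model; here they are hard-coded arrays, and the function names are illustrative:

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the top-k documents most similar to the query vector.
// Each doc is assumed to carry a precomputed `vector` field.
function nearestNeighbors(queryVec, docs, k = 3) {
  return docs
    .map(d => ({ ...d, score: cosineSimilarity(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force scan like this is perfectly adequate at the scale of a single user's scraped pages, which is one reason in-browser retrieval is practical.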
In our tests, Teapot AI has shown strong efficiency and accuracy on browser-based AI tasks. A key learning has been that smaller models running locally can significantly reduce latency while keeping user data private: we measured an average latency of under 10 seconds on a MacBook Pro. We also found that while models like flan-t5 are not capable of robust chain-of-thought reasoning, they perform well on direct question-answering tasks augmented through techniques like Retrieval-Augmented Generation (RAG).
Average Latency: <10 seconds on MacBook Pro
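The RAG approach described above, feeding retrieved context to a small text2text model so it can answer directly rather than reason step by step, comes down to prompt construction. The template below is an illustrative assumption, not Teapot AI's exact prompt:

```javascript
// Sketch of RAG prompt construction for a small text2text model like
// flan-t5: retrieved chunks are prepended as numbered context so the
// model can answer the question directly from the evidence.
function buildRagPrompt(question, retrievedChunks) {
  const context = retrievedChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  return `Answer the question using only the context below.\n` +
         `Context:\n${context}\n` +
         `Question: ${question}\nAnswer:`;
}
```

The resulting string would then be passed to the fine-tuned text2text model as its input.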
We believe in community-driven development and are looking to build a team focused on creating strong AI reasoning agents that respect user privacy.
Join us on Discord
Check us out on Hugging Face
Join our mission to bring private, powerful AI directly to your browser!