Net wt. 1 SDK
100% real inference
Nutrition Facts
Serving size: 1 Sipp Client
Amount per pour
- Dependencies: 0
- Cold start: < 1000 ms
- Engine: Rust · C++ · GGML
- Backend: Browser-native WebGPU
% Dev Value
- Open source: 100%
- Type-safe: 100%
- Framework sludge: 0%
The fastest runtime for the web. Run models right in the browser, zero install, and zero dependencies. Build games, agents, vision, and chat. Add a secure cloud gateway when you need it, all from one client.
WebGPU · WASM · Rust · C++ · GGUF · TypeScript · 100% real tokens. Contains no frameworks, no concentrate, no added sugar.
Manage and query multiple inference endpoints through a single, unified API. Switch or split traffic between local browser execution and cloud gateways without rewriting your code.
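Splitting traffic between a local endpoint and a gateway can be pictured as a weighted choice made before each request. The sketch below is only an illustration of that idea: `pickRoute` and the `Route` type are hypothetical helpers, not part of the Sipp API, and the 80/20 weights are placeholder values.

```typescript
// Hypothetical helper, NOT part of the Sipp API: chooses an endpoint name
// according to fixed traffic weights.
type Route = { name: string; weight: number };

function pickRoute(routes: Route[], random: () => number = Math.random): string {
  const total = routes.reduce((sum, route) => sum + route.weight, 0);
  if (routes.length === 0 || total <= 0) {
    throw new Error('pickRoute needs at least one route with a positive weight');
  }
  // Walk the routes, subtracting each weight until the threshold is used up.
  let threshold = random() * total;
  for (const route of routes) {
    threshold -= route.weight;
    if (threshold < 0) return route.name;
  }
  // Guard against floating-point drift on the final route.
  return routes[routes.length - 1].name;
}

// Placeholder split: roughly 80% of requests stay in the browser.
const target = pickRoute([
  { name: 'edge', weight: 80 },
  { name: 'cloud', weight: 20 },
]);
console.log(`next request goes to: ${target}`);
```

The returned name could then be used to look up whichever endpoint handle was registered under it. Passing `random` as a parameter keeps the choice deterministic in tests.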
import { SippClient } from '@sipp/sipp';

// One client. Pour in the browser or from the cloud.
const blender = new SippClient();

// Run in the browser on WebGPU (or go native: CUDA · Vulkan · Metal)
const juice = await blender.add('edge', {
  kind: 'local',
  source: '/models/llama3.gguf',
});

// ...or pour from a secure cloud gateway. Same interface, either way.
const ice = await blender.add('cloud', {
  kind: 'gateway',
  baseUrl: 'https://gateway.example.com/v1/',
});

// Stream inference from either endpoint with one symmetric API
const [smoothie, snowcone] = await Promise.all([
  blender.chat([{ role: 'user', content: 'Explain Sipp.' }], { endpoint: juice }),
  blender.chat([{ role: 'user', content: 'Create a Sipp app.' }], { endpoint: ice })
]);

Sipp's WebGPU backend runs the same weights up to 5× faster than other browser runtimes. No native install. Pick a model and watch the multipliers stack up.
Mobile support is currently being worked on. Try demos on desktop.
Sipp vs …: 8.4× faster
Sipp vs …: 5.4× faster
Measured on Qwen 2.5 0.5B · Q4_K_M. LILO · 1024 in / 512 out · NVIDIA 3080 · Chrome (N=3, 9 runs, 1 warmup). Multipliers show how many times faster Sipp runs vs each browser runtime.
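A multiplier of this kind can be derived from timed runs as a ratio of run times. The sketch below assumes the median of the non-warmup runs is used as the summary statistic, which the benchmark note does not state. The function names and the timings are invented for illustration and are not measured results.

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of an empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Speedup of a candidate runtime over a baseline: ratio of median run times,
// after dropping the leading warmup runs from both series.
function speedup(baselineSeconds: number[], candidateSeconds: number[], warmupRuns = 1): number {
  return median(baselineSeconds.slice(warmupRuns)) / median(candidateSeconds.slice(warmupRuns));
}

// Decode throughput for a run that produced `outputTokens` tokens.
function tokensPerSecond(outputTokens: number, seconds: number): number {
  return outputTokens / seconds;
}

// Placeholder timings in seconds (first entry is the warmup run).
const baselineRuns = [15, 12, 12, 12];
const candidateRuns = [5, 3, 3, 3];
console.log(`speedup: ${speedup(baselineRuns, candidateRuns)}x`);
console.log(`throughput: ${tokensPerSecond(512, 4)} tokens/s`);
```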
A bare-bones chat running 100% in your browser. Pick a model, start the tap, and then chat. No account, no server.
The juice machine
Idle
1 · Pick your flavor
Real apps running real models with Sipp. No servers, no install, no waiting. Every one runs the model right in your browser.
Game
A wizard duel where every spell is generated on the fly by a local LLM. No two casts the same.
Desktop only
Agents
A swarm of little agents reason in-browser, each running a local model to pick its next move, all fighting for one banana.
Desktop only
Vision
Draw something and a local vision model snapshots the canvas, reads it, and gives you live feedback.
Desktop only
Chat
Chat with a VRM character whose emotes, actions, and replies are all chosen live by a local model.
Desktop only
Sipp leads with the fastest runtime on the web. The same client API follows you to Node, Rust, Python, and a self-hosted gateway.
Run model weights in the browser on WebGPU. No servers and no dependencies, just pure bliss.
Server-side inference and framework route handlers in any Node runtime.
Native apps and services built on the sipp crate.
Local and gateway inference from Python, with bare-metal backends for fast compute.
One HTTP boundary that owns your keys, routing, policies, and metrics.
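Because the browser target depends on WebGPU, an app may want to check for it before choosing between a local endpoint and a gateway. The sketch below uses the standard `navigator.gpu.requestAdapter()` call. The `hasWebGPU` and `chooseKind` helpers and the `NavigatorLike` type are hypothetical names for illustration, and only the `'local'` and `'gateway'` values come from the sample above.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
type GpuLike = { requestAdapter(): Promise<unknown> };
type NavigatorLike = { gpu?: GpuLike };

// True only when the WebGPU entry point exists and returns a usable adapter.
async function hasWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}

// Hypothetical helper: pick the endpoint kind to register for this device.
async function chooseKind(nav: NavigatorLike | undefined): Promise<'local' | 'gateway'> {
  return (await hasWebGPU(nav)) ? 'local' : 'gateway';
}

// In a browser this could be called as:
//   const kind = await chooseKind(navigator as NavigatorLike);
```

Taking the navigator object as a parameter keeps the check testable outside a browser, where `navigator.gpu` is usually absent.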
Need managed infrastructure for production workloads?
Commercial solutions
Fresh batch ready
Install Sipp, run a model in your browser on WebGPU, then scale to Node, Rust, Python, or your own gateway.