Blazing-fast WebGPU runtime · Open Source

AI inference,

The fastest runtime for the web. Run models right in the browser with zero install and zero dependencies. Build games, agents, vision apps, and chat. Add a secure cloud gateway when you need it, all from one client.

Get Started
Open source · Zero install · Local + gateway
Sipp · Pink Lemonade Ed.

Net wt. 1 SDK

100% real inference

Nutrition Facts

Serving size: 1 Sipp Client

Amount per pour

  • Dependencies: 0
  • Cold start: < 1000 ms
  • Engine: Rust · C++ · GGML
  • Backend: Browser-native WebGPU

% Dev Value

  • Open source: 100%
  • Type-safe: 100%
  • Framework sludge: 0%
$ npm install @sipp/sipp
Runs in your browser · Fastest WebGPU runtime · Build games · Agents & bots · Vision & chat · Zero install · Local + gateway · Fully Open Source
Ingredients

WebGPU · WASM · Rust · C++ · GGUF · TypeScript · 100% real tokens. Contains no frameworks, no concentrate, no added sugar.

Taste test

Easy Setup.
One Simple API.

Manage and query multiple inference endpoints through a single, unified API. Switch or split traffic between local browser execution and cloud gateways without rewriting your code.

  • Identical Code Paths
    Run the same query code against edge and cloud endpoints.
  • Multi-Endpoint Control
    Register local and remote models under one unified client.
  • Native Performance
    Tap local WebGPU execution or cloud gateways with equal ease.
recipe.ts
import { SippClient } from '@sipp/sipp';

// One client. Pour in the browser or from the cloud.
const blender = new SippClient();

// Run in the browser on WebGPU (or go native: CUDA · Vulkan · Metal)
const juice = await blender.add('edge', {
  kind: 'local',
  source: '/models/llama3.gguf',
});

// ...or pour from a secure cloud gateway. Same interface, either way.
const ice = await blender.add('cloud', {
  kind: 'gateway',
  baseUrl: 'https://gateway.example.com/v1/',
});

// Query either endpoint with one symmetric API (here, both at once)
const [smoothie, snowcone] = await Promise.all([
  blender.chat([{ role: 'user', content: 'Explain Sipp.' }], { endpoint: juice }),
  blender.chat([{ role: 'user', content: 'Create a Sipp app.' }], { endpoint: ice })
]);
Same symmetric API, local or cloud.
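The idea of switching or splitting traffic between a local endpoint and a cloud gateway behind one call site can be sketched without Sipp itself. This is a minimal, hypothetical illustration: `Message`, `Endpoint`, `SplitRouter`, and the weights are assumptions made for this example, not Sipp's API, and Sipp may route requests differently.

```typescript
// Hypothetical types for this sketch only; they are not Sipp's API.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

interface Endpoint {
  name: string;
  chat(messages: Message[]): Promise<string>;
}

interface Route {
  endpoint: Endpoint;
  weight: number; // relative share of traffic; 0 disables the route
}

class SplitRouter {
  private readonly routes: Route[];
  private readonly total: number;
  private readonly random: () => number;

  constructor(routes: Route[], random: () => number = Math.random) {
    if (routes.length === 0) throw new Error('at least one route is required');
    if (routes.some((r) => !(r.weight >= 0))) throw new Error('weights must be non-negative numbers');
    const total = routes.reduce((sum, r) => sum + r.weight, 0);
    if (!(total > 0)) throw new Error('total weight must be positive');
    this.routes = routes;
    this.total = total;
    this.random = random;
  }

  // Weighted pick: draw u in [0, total) and walk the cumulative weights.
  pick(): Endpoint {
    let u = this.random() * this.total;
    for (const r of this.routes) {
      if (u < r.weight) return r.endpoint;
      u -= r.weight;
    }
    // Guard against floating-point drift: fall back to the last enabled route.
    const enabled = this.routes.filter((r) => r.weight > 0);
    return enabled[enabled.length - 1].endpoint;
  }

  // One call site, whichever endpoint ends up serving the request.
  chat(messages: Message[]): Promise<string> {
    return this.pick().chat(messages);
  }
}
```

With weights 3 and 1, about three in four requests land on the first endpoint on average; setting a weight to 0 switches that endpoint off without touching the calling code.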
Benchmark · WebGPU showdown

Same model.
Faster in the browser.

Sipp's WebGPU backend runs the same weights several times faster than other browser runtimes: up to 8.4× faster time to first token in the benchmark below, with no native install. Pick a model and watch the multipliers stack up.

Mobile support is in progress; try the demos on desktop.

Sipp vs Transformers.js: 8.4× faster

  • TTFT: 8.4× faster
  • Decode: 3.8× the tok/s
  • E2E latency: 3.5× faster

Sipp vs WebLLM: 5.4× faster

  • TTFT: 5.4× faster
  • Decode: 3.5× the tok/s
  • E2E latency: 3.3× faster

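Measured on Qwen 2.5 0.5B · Q4_K_M. LILO (long input, long output) · 1024 in / 512 out · NVIDIA RTX 3080 · Chrome (N=3, 9 runs, 1 warmup). Multipliers show how many times faster Sipp runs than each browser runtime; for decode, the multiplier is Sipp's tokens per second divided by the other runtime's.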
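The multipliers are plain ratios, oriented so that a value above 1 always favors Sipp. One plausible way to compute them is sketched below, assuming per-run samples are averaged after discarding warmup runs (the page does not say how runs are aggregated). `Measurement`, `multipliers`, `meanAfterWarmup`, and `estimateE2eMs` are hypothetical names, not part of any benchmark harness shipped with Sipp.

```typescript
// Per-runtime measurements for one benchmark configuration (hypothetical shape).
interface Measurement {
  ttftMs: number;        // time to first token in ms (lower is better)
  decodeTokPerS: number; // decode throughput in tokens/s (higher is better)
  e2eMs: number;         // end-to-end request latency in ms (lower is better)
}

interface Multipliers {
  ttft: number;
  decode: number;
  e2e: number;
}

// Latency ratios are baseline / sipp and the throughput ratio is sipp / baseline,
// so every value above 1 reads as "Sipp is better by this factor".
function multipliers(sipp: Measurement, baseline: Measurement): Multipliers {
  return {
    ttft: baseline.ttftMs / sipp.ttftMs,
    decode: sipp.decodeTokPerS / baseline.decodeTokPerS,
    e2e: baseline.e2eMs / sipp.e2eMs,
  };
}

// Assumed aggregation: drop the first `warmup` samples, then take the mean.
function meanAfterWarmup(samples: number[], warmup: number): number {
  const kept = samples.slice(warmup);
  if (kept.length === 0) throw new Error('no samples left after warmup');
  return kept.reduce((sum, x) => sum + x, 0) / kept.length;
}

// Rough E2E model: TTFT covers prefill plus the first token, and the
// remaining (outputTokens - 1) tokens are generated at decode speed.
function estimateE2eMs(ttftMs: number, outputTokens: number, decodeTokPerS: number): number {
  return ttftMs + ((outputTokens - 1) / decodeTokPerS) * 1000;
}
```

Under this rough model, a 512-token output spends most of its time in decode, so the E2E ratio tracks the decode ratio more closely than the TTFT ratio; that is one way to read why the E2E multipliers sit near the decode multipliers.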

Live demo · Fresh squeeze

Pick a model.
Sip the tokens.

A bare-bones chat running 100% in your browser. Pick a model, start the tap, and then chat. No account, no server.
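Because the demo runs entirely on WebGPU and mobile support is still in progress, a page like this one might check for WebGPU before loading a model. The sketch below uses the standard WebGPU entry point `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists. `NavigatorLike`, `detectWebGpu`, and the status names are hypothetical, and this is not necessarily how Sipp's demo performs the check.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(options?: {
    powerPreference?: 'low-power' | 'high-performance';
  }): Promise<unknown | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = 'supported' | 'no-api' | 'no-adapter' | 'error';

// Report why WebGPU is or isn't usable, so the page can show a fallback notice.
async function detectWebGpu(nav: NavigatorLike | undefined): Promise<WebGpuStatus> {
  // The browser does not expose navigator.gpu at all.
  if (!nav || !nav.gpu) return 'no-api';
  try {
    // Resolves to null when the browser has no suitable GPU adapter.
    const adapter = await nav.gpu.requestAdapter({ powerPreference: 'high-performance' });
    return adapter ? 'supported' : 'no-adapter';
  } catch {
    return 'error';
  }
}
```

In a browser, call it as `detectWebGpu(navigator as unknown as NavigatorLike)` and show the desktop notice for any status other than `'supported'`.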


The juice machine


1 · Pick your flavor



Start the tap on the left to wake the model, then chat away.

Built with Sipp · 100% in-browser

Pour it into
anything.

Real apps running real models with Sipp. No servers, no install, no waiting. Every one runs the model right in your browser.


🪄 PromptCast
Game · Desktop only

A wizard duel where every spell is generated on the fly by a local LLM. No two casts are alike.

🍌 Banana Brawl
Agents · Desktop only

A swarm of little agents reason in-browser, each running a local model to pick its next move, all fighting for one banana.

🎨 Sketch Critic
Vision · Desktop only

Draw something, and a local vision model snapshots the canvas, reads it, and gives you live feedback.

💬 Aria
Chat · Desktop only

Chat with a VRM character whose emotes, actions, and replies are all chosen live by a local model.
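Banana Brawl's description, where each agent asks a local model for its next move, implies a step that maps free-text replies back onto legal moves. Below is a minimal sketch of that pattern, assuming a hypothetical `Model` function in place of a real Sipp call; the prompt wording, move names, and fallback rule are illustrative only, and this is not the demo's actual code.

```typescript
// Hypothetical stand-in for a local text-generation call; not Sipp's API.
type Model = (prompt: string) => Promise<string>;

// Build a prompt that lists the only moves the agent may take.
function movePrompt(agentId: string, state: string, moves: readonly string[]): string {
  return [
    `You are agent ${agentId}. Game state: ${state}`,
    `Choose exactly one move from: ${moves.join(', ')}.`,
    'Reply with the move name only.',
  ].join('\n');
}

// Map a free-text reply onto a legal move: exact match first, then the
// earliest legal move mentioned anywhere in the reply, else the fallback.
function parseMove(reply: string, moves: readonly string[], fallback: string): string {
  const text = reply.trim().toLowerCase();
  const exact = moves.find((m) => m.toLowerCase() === text);
  if (exact !== undefined) return exact;
  let best: string | undefined;
  let bestIndex = Infinity;
  for (const m of moves) {
    const i = text.indexOf(m.toLowerCase());
    if (i !== -1 && i < bestIndex) {
      best = m;
      bestIndex = i;
    }
  }
  return best ?? fallback;
}

// One game tick: every agent queries the model, then each reply is validated.
async function tick(
  model: Model,
  agents: readonly string[],
  state: string,
  moves: readonly string[],
): Promise<Map<string, string>> {
  if (moves.length === 0) throw new Error('no legal moves');
  const replies = await Promise.all(agents.map((id) => model(movePrompt(id, state, moves))));
  const chosen = new Map<string, string>();
  agents.forEach((id, k) => chosen.set(id, parseMove(replies[k], moves, moves[0])));
  return chosen;
}
```

Substring matching is deliberately simple here; a real game might constrain the model's output format more tightly instead.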

Fresh batch ready

Pour your first inference.

Install Sipp, run a model in your browser on WebGPU, then scale to Node, Rust, Python, or your own gateway.

Get Started