Blazing-fast WebGPU runtime - Open Source

AI inference,

Engineered in Rust and C++, Sipp keeps model weights and tokens on-device and minimizes copies across the WASM boundary, powering real-time games, local agents, and vision/chat apps on WebGPU or desktop. When work moves beyond the device, the same API extends to self-hosted gateways or trusted providers.
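Minimizing copies across the WASM boundary usually means reading native results through a typed-array view over the module's linear memory instead of copying them into a fresh JS buffer. The block below is a minimal sketch of that general technique, not Sipp's internals. It assumes a standalone `WebAssembly.Memory` standing in for a compiled Rust/C++ module's memory, and `logitsPtr` / `logitsLen` are hypothetical values that a WASM export might return.

```typescript
// Stand-in for a Rust/C++ module's linear memory (1 page = 64 KiB).
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical location of a logits buffer written by native code.
const logitsPtr = 1024; // byte offset; must be 4-byte aligned for Float32Array
const logitsLen = 8;

// Zero-copy: the returned array aliases WASM memory directly.
function viewLogits(mem: WebAssembly.Memory, ptr: number, len: number): Float32Array {
  return new Float32Array(mem.buffer, ptr, len);
}

// Copying: .slice() allocates a JS-owned buffer and duplicates the bytes.
function copyLogits(mem: WebAssembly.Memory, ptr: number, len: number): Float32Array {
  return new Float32Array(mem.buffer, ptr, len).slice();
}

// Simulate native code filling its own memory.
new Float32Array(memory.buffer, logitsPtr, logitsLen).set([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]);

const view = viewLogits(memory, logitsPtr, logitsLen);
const copy = copyLogits(memory, logitsPtr, logitsLen);

// A later native write shows up through the view but not in the copy.
new Float32Array(memory.buffer, logitsPtr, 1)[0] = 9;
console.log(view[0], copy[0]); // view[0] is now 9; copy[0] is still ~0.1
```

One caveat with views: calling `memory.grow()` detaches the old `ArrayBuffer`, so any view must be re-created after the memory grows.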

Get Started

Open source · WebGPU fast path · Self-host gateway

Sipp · Pink Lemonade Ed.

Net wt. 1 SDK

100% real inference

Nutrition Facts

Serving size: 1 Sipp Client

Amount per pour

Dependencies: 0
Model format: GGUF
Engine: Rust + C++ + GGML
Backend: Browser-native WebGPU

% Dev Value

Open source: 100%
Type-safe: 100%
Endpoint mixer: 1 API

$ npm install @sipphq/sipp

Runs in your browser · WebGPU runtime · Rust + C++ core · GGUF weights · Endpoint API · Self-host gateway · OPFS cache · Build games · Local agents · Vision & chat · Zero install · Fully Open Source

Ingredients

WebGPU, WASM, Rust, C++, GGUF, TypeScript, OPFS-backed cache, gateways, providers. Contains no black-box runtime, no framework lock-in, no added sugar.

Taste test

Easy Setup.
One Simple API.

Initialize local WebGPU inference, stream real tokens, and keep GGUF weights in browser storage for repeat runs. When you need more, offload to a self-hosted gateway or trusted provider through the same API.
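Deciding whether to run locally or offload starts with a capability check. The block below is a minimal sketch built on the standard WebGPU entry point, `navigator.gpu.requestAdapter()`. The `choosePlacement` function and its two-way policy are hypothetical illustrations, not part of the Sipp API.

```typescript
// Minimal structural type so this compiles without @webgpu/types installed.
interface MinimalGPU {
  requestAdapter(): Promise<unknown | null>;
}

type Placement = "local-webgpu" | "offload";

// True only if the runtime exposes WebGPU and actually hands back an adapter.
async function hasWebGPU(): Promise<boolean> {
  const nav = (globalThis as unknown as { navigator?: { gpu?: MinimalGPU } }).navigator;
  if (!nav?.gpu) return false; // API not exposed (older browsers, Node, many mobile browsers)
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null: no suitable GPU or a blocklisted driver
  } catch {
    return false;
  }
}

// Hypothetical policy: run locally when WebGPU is usable, otherwise offload.
async function choosePlacement(): Promise<Placement> {
  return (await hasWebGPU()) ? "local-webgpu" : "offload";
}

choosePlacement().then((p) => console.log(`placement: ${p}`));
```

A `null` adapter is treated as "no WebGPU" on purpose. Browsers can expose `navigator.gpu` and still decline to provide an adapter.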

Local Model Setup
Load GGUF weights into WebGPU from a single API, with no server path required.
Explicit Endpoint Mixing
Register local, gateway, or provider targets and mix between them.
Native Hot Path
Let Rust and C++ handle scheduling, memory, token throughput, and gateway builds.
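This page doesn't say what policy "mixing" endpoints follows. One plausible policy is ordered fallback: try local first, then a gateway, then a provider. The block below is a minimal sketch of that idea, written against a hypothetical `ChatEndpoint` interface rather than Sipp's real types. The stub endpoints are for illustration only.

```typescript
// Hypothetical common interface; Sipp's actual endpoint types may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatEndpoint {
  name: string;
  chat(messages: ChatMessage[]): Promise<string>;
}

// Try endpoints in order and return the first successful reply plus who served it.
async function chatWithFallback(
  endpoints: ChatEndpoint[],
  messages: ChatMessage[],
): Promise<{ servedBy: string; reply: string }> {
  const errors: string[] = [];
  for (const ep of endpoints) {
    try {
      return { servedBy: ep.name, reply: await ep.chat(messages) };
    } catch (err) {
      errors.push(`${ep.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All endpoints failed -> ${errors.join("; ")}`);
}

// Stub endpoints (illustration only): local fails, gateway echoes the last message.
const local: ChatEndpoint = {
  name: "local",
  chat: async () => {
    throw new Error("WebGPU unavailable");
  },
};
const gateway: ChatEndpoint = {
  name: "gateway",
  chat: async (msgs) => `echo: ${msgs[msgs.length - 1].content}`,
};

chatWithFallback([local, gateway], [{ role: "user", content: "Explain Sipp." }])
  .then((r) => console.log(r)); // { servedBy: 'gateway', reply: 'echo: Explain Sipp.' }
```

Collecting each endpoint's error before throwing means a total failure still reports why every target was skipped.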

recipe.ts

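import { SippClient } from '@sipphq/sipp';

// Read a required environment variable, failing fast if it is missing or empty.
function requiredEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}
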
// One client. Pour in the browser or from the cloud.
const blender = new SippClient();

// Run in the browser on WebGPU (or go native: CUDA · Vulkan · Metal)
const juice = await blender.add('local', {
  kind: 'local',
  source: '/models/llama3.gguf',
});

// ...or pour from a provider you love. Same interface, either way.
const ice = await blender.add('provider', {
  kind: 'provider',
  provider: 'openai',
  model: process.env.OPENAI_MODEL ?? 'gpt-5-mini',
  apiKey: requiredEnv('OPENAI_API_KEY'),
});

// Stream inference from either endpoint with one symmetric API
const [smoothie, snowcone] = await Promise.all([
  blender.chat([{ role: 'user', content: 'Explain Sipp.' }], { endpoint: juice }),
  blender.chat([{ role: 'user', content: 'Create a Sipp app.' }], { endpoint: ice })
]);

* Local WebGPU, GGUF weights, OPFS-backed cache.

Benchmark · WebGPU showdown

Same model.
Faster in the browser.

Sipp's WebGPU backend cuts time to first token (TTFT) and decodes up to 3.8× faster than other browser runtimes, while keeping the same GGUF weights local. No native install. Pick a model and inspect the multipliers.

Run the benchmark

Mobile support is currently being worked on. Try demos on desktop.

Sipp vs            TTFT           Decode        E2E latency
Transformers.js    8.4× faster    3.8× tok/s    3.5× faster
WebLLM             5.4× faster    3.5× tok/s    3.3× faster

Measured on Qwen 2.5 0.5B · Q4_K_M. LILO · 1024 in / 512 out · NVIDIA RTX 3080 · Chrome (N=3, 9 runs, 1 warmup). Multipliers show how many times faster Sipp runs than each browser runtime.
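The caption doesn't spell out how each multiplier is derived. The block below is a minimal sketch of one common convention, assuming per-token arrival timestamps are available. Latency metrics (TTFT, E2E) divide the other runtime's time by Sipp's, and throughput divides Sipp's tok/s by the other's, so any value above 1 favors Sipp. `RunTiming`, `metrics`, `multipliers`, and the median-after-warmup aggregation are all assumptions, not Sipp's benchmark harness. The timings in the usage lines are synthetic, not measured data.

```typescript
// Hypothetical per-run timing record; all times in milliseconds.
interface RunTiming {
  requestStart: number; // when the prompt was submitted
  tokenTimes: number[]; // arrival time of each generated token, in order
}

interface RunMetrics {
  ttftMs: number; // time to first token
  decodeTokPerSec: number; // tokens after the first, per second of decode time
  e2eMs: number; // request start to last token
}

function metrics({ requestStart, tokenTimes }: RunTiming): RunMetrics {
  if (tokenTimes.length < 2) throw new Error("need at least two tokens");
  const first = tokenTimes[0];
  const last = tokenTimes[tokenTimes.length - 1];
  return {
    ttftMs: first - requestStart,
    decodeTokPerSec: ((tokenTimes.length - 1) * 1000) / (last - first),
    e2eMs: last - requestStart,
  };
}

// Values above 1 favor Sipp: latency is other/sipp, throughput is sipp/other.
function multipliers(sipp: RunMetrics, other: RunMetrics) {
  return {
    ttft: other.ttftMs / sipp.ttftMs,
    decode: sipp.decodeTokPerSec / other.decodeTokPerSec,
    e2e: other.e2eMs / sipp.e2eMs,
  };
}

function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("no samples");
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Assumed aggregation: discard the first `warmup` runs, then take the median.
function aggregate(samples: number[], warmup: number): number {
  return median(samples.slice(warmup));
}

// Synthetic timings for illustration only (not measured data).
const sipp = metrics({ requestStart: 0, tokenTimes: [100, 110, 120, 130, 140] });
const other = metrics({ requestStart: 0, tokenTimes: [400, 440, 480, 520, 560] });
console.log(multipliers(sipp, other)); // { ttft: 4, decode: 4, e2e: 4 }
```

Flipping the ratio for throughput keeps all three multipliers reading the same way, as "how many times better Sipp did".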

Live demo · Fresh squeeze

Pick a model.
Sip the tokens.

A bare-bones chat running 100% in your browser. Pick a model, start the tap, and then chat. No account, no server.

Try the full demo

The juice machine

1 · Pick your flavor

Nothing downloads until you start. Weights are cached after the first pour.
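"Cached after the first pour" maps naturally onto a cache-aside pattern over the Origin Private File System (OPFS), which this page names as the cache. The block below is a minimal sketch using the standard OPFS calls `navigator.storage.getDirectory()`, `getFileHandle`, and `createWritable`. The `ByteStore` interface, `loadWeights`, and the flat file naming are hypothetical. A real loader would also stream large GGUF files rather than buffering them whole.

```typescript
// Minimal byte store: OPFS in browsers, an in-memory Map anywhere (handy for tests).
interface ByteStore {
  get(name: string): Promise<ArrayBuffer | undefined>;
  put(name: string, bytes: ArrayBuffer): Promise<void>;
}

function memoryStore(): ByteStore {
  const m = new Map<string, ArrayBuffer>();
  return {
    get: async (name) => m.get(name),
    put: async (name, bytes) => {
      m.set(name, bytes);
    },
  };
}

// Origin Private File System store (browser only; not called outside a browser here).
function opfsStore(): ByteStore {
  return {
    async get(name) {
      const dir = await navigator.storage.getDirectory();
      try {
        const handle = await dir.getFileHandle(name); // rejects with NotFoundError if absent
        return await (await handle.getFile()).arrayBuffer();
      } catch {
        return undefined;
      }
    },
    async put(name, bytes) {
      const dir = await navigator.storage.getDirectory();
      const handle = await dir.getFileHandle(name, { create: true });
      const writable = await handle.createWritable();
      await writable.write(bytes);
      await writable.close(); // data is committed on close
    },
  };
}

// Cache-aside: download only when the store misses, then serve from the store.
async function loadWeights(
  name: string,
  download: () => Promise<ArrayBuffer>,
  store: ByteStore,
): Promise<ArrayBuffer> {
  const cached = await store.get(name);
  if (cached) return cached;
  const bytes = await download();
  await store.put(name, bytes);
  return bytes;
}
```

In a browser, `loadWeights("model.gguf", () => fetch(url).then((r) => r.arrayBuffer()), opfsStore())` would fetch once and then read from OPFS on later visits.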

sipp · chat · offline

Start the tap on the left to wake the model, then chat away.

Built with Sipp · 100% in-browser

Pour it into
anything.

Real apps running real models with Sipp. No servers, no install, no waiting. Every one runs the model right in your browser.

Game · Local

🪄 Live demo

PromptCast

A wizard duel where every spell is generated on the fly by a local LLM. No two casts the same.

Play in browser ›

Agents · Local

🍌 Live demo

Banana Brawl

A swarm of little agents reason in-browser, each running a local model to pick its next move, all fighting for one banana.

Play in browser ›
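When a small local model "picks its next move", the usual approach is to constrain the answer to a fixed action set and validate whatever text comes back. The block below is a minimal sketch of that pattern. The move list, prompt wording, and fallback are hypothetical and not taken from Banana Brawl, and the model call itself is left out.

```typescript
// Hypothetical action set for one agent turn.
const MOVES = ["left", "right", "jump", "grab", "wait"] as const;
type Move = (typeof MOVES)[number];

// Prompt that asks the model to answer with exactly one allowed move.
function buildPrompt(agentId: number, observation: string): string {
  return [
    `You are agent ${agentId} competing for one banana.`,
    `Observation: ${observation}`,
    `Reply with exactly one word from: ${MOVES.join(", ")}.`,
  ].join("\n");
}

// Small models often add extra words, so take the first allowed move in the reply
// and fall back to a safe default instead of breaking the game loop.
function parseMove(reply: string, fallback: Move = "wait"): Move {
  const words = reply.toLowerCase().match(/[a-z]+/g) ?? [];
  for (const w of words) {
    if ((MOVES as readonly string[]).includes(w)) return w as Move;
  }
  return fallback;
}

console.log(parseMove("I will JUMP toward the banana!")); // jump
console.log(parseMove("Hmm, not sure.")); // wait
```

Validating in code keeps each agent's turn well-defined even when the model ignores the instruction.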

Vision · Local

🎨 Live demo

Sketch Critic

Draw something and a local vision model snapshots the canvas, reads it, and gives you live feedback.

Play in browser ›

Chat · Local

💬 Live demo

Aria

Chat with a VRM character whose emotes, actions, and replies are all chosen live by a local model.

Play in browser ›
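To let a local model choose emotes, actions, and replies together, one common approach is to ask for a small JSON object and validate it before it drives the avatar. The block below is a minimal sketch under that assumption. The `CharacterTurn` fields and the emote list are hypothetical, not Aria's actual schema.

```typescript
// Hypothetical schema for one character turn.
const EMOTES = ["neutral", "happy", "sad", "surprised"] as const;
type Emote = (typeof EMOTES)[number];

interface CharacterTurn {
  emote: Emote;
  action: string | null; // e.g. an animation name the VRM rig understands
  reply: string;
}

// Take the outermost {...} span (models often wrap JSON in prose), then validate each field.
function parseTurn(raw: string): CharacterTurn {
  const fallback: CharacterTurn = { emote: "neutral", action: null, reply: raw.trim() };
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return fallback;
  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return fallback;
  }
  if (typeof data !== "object" || data === null) return fallback;
  const obj = data as Record<string, unknown>;
  const emote = (EMOTES as readonly unknown[]).includes(obj.emote) ? (obj.emote as Emote) : "neutral";
  const action = typeof obj.action === "string" ? obj.action : null;
  const reply = typeof obj.reply === "string" ? obj.reply : "";
  return { emote, action, reply };
}

console.log(parseTurn('Sure! {"emote":"happy","action":"wave","reply":"Hi there!"}'));
```

Unknown emotes degrade to `neutral`, and plain-text replies still get spoken. A malformed turn therefore never leaves the character frozen.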

One client · every target

Start in the browser.
Mix where it runs.

The same endpoint API follows you to native runtimes, trusted provider calls, and a self-hosted gateway when you want one boundary for local and remote work.

Featured: WebGPU · zero install