Device Support

Sipp runs across a range of devices, operating systems, browsers, and GPU accelerators. This page documents which configurations are supported, at what level, and any known limitations.

Compute Backends

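Backend names are shared between build configuration and runtime selection, so the same name selects a backend in both places. WebGPU is the exception at build time: it is enabled through the GGML_WEBGPU CMake option rather than a named feature flag.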

| Backend | Status | Feature flag | Default | Platforms | Notes |
| --- | --- | --- | --- | --- | --- |
| CPU | Supported | native | Yes | All | Portable fallback, no accelerator required |
| CUDA | Supported | cuda | No | Linux, Windows | NVIDIA GPUs, compute capability 7.5+ |
| Metal | Supported | metal | No | macOS | Apple Silicon and AMD GPUs; use CPU on Intel integrated GPUs |
| Vulkan | Supported | vulkan | No | Linux, Windows | Vulkan 1.2+ GPU required |
| WebGPU | Supported | GGML_WEBGPU (CMake) | No | WASM browsers | Browser-only, requires shader-f16 |

Runtime selection:

  • CLI: --backend auto|cpu|cuda|metal|vulkan
  • Node.js: SIPP_NODE_BACKEND=cpu|vulkan|cuda|metal
  • Python: SIPP_PYTHON_BACKEND=cpu|vulkan|cuda|metal
  • Browser: backend: 'auto' | 'cpu' | 'webgpu' in model load options

For Node.js and Python, leave the environment variable unset to let Sipp select a backend automatically.
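As a rough sketch of how a Node.js service might validate the variable before starting, the helper below maps a raw value to one of the documented backend names and treats an unset value as automatic selection. The function name `resolveBackend` and the `'auto'` return value are hypothetical and chosen for illustration; only the `SIPP_NODE_BACKEND` variable and its four values come from the list above.

```typescript
// Backend names documented for SIPP_NODE_BACKEND and SIPP_PYTHON_BACKEND.
const NATIVE_BACKENDS = ['cpu', 'vulkan', 'cuda', 'metal'] as const;
type NativeBackend = (typeof NATIVE_BACKENDS)[number];

// Hypothetical helper (not part of Sipp): returns the requested backend, or
// 'auto' when the variable is unset or empty. Unknown values throw so that
// typos surface at startup rather than at model load time.
function resolveBackend(value: string | undefined): NativeBackend | 'auto' {
  if (value === undefined || value.trim() === '') {
    return 'auto';
  }
  const name = value.trim();
  const match = NATIVE_BACKENDS.find((backend) => backend === name);
  if (match === undefined) {
    throw new Error(
      `Unsupported backend "${value}"; expected one of ${NATIVE_BACKENDS.join(', ')}`,
    );
  }
  return match;
}
```

In a Node.js process this could be called as `resolveBackend(process.env.SIPP_NODE_BACKEND)`.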

Backend Availability by Package

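| Backend | Node.js | Python | Rust | Browser (WASM) | Gateway |
| --- | --- | --- | --- | --- | --- |
| CPU | Yes | Yes | Yes | Yes | Yes |
| CUDA | Yes | Yes | Yes | - | Yes |
| Metal | Yes | Yes | Yes | - | - |
| Vulkan | Yes | Yes | Yes | - | Yes |
| WebGPU | - | - | - | Yes | - |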

Additional llama.cpp Backends (Not Yet Exposed)

The vendored llama.cpp supports additional backends that Sipp does not currently expose as feature flags. Community contributions are welcome.

  • SYCL (Intel oneAPI)
  • HIP / ROCm (AMD)
  • OpenCL
  • OpenVINO
  • CANN (Huawei Ascend)
  • MUSA (Moore Threads)
  • Hexagon (Qualcomm DSP)
  • ZenDNN (AMD)
  • RPC (remote backend)

These backends require custom CMake flags on top of the vendored llama.cpp build and are not available through Sipp’s standard build or package commands.


Desktop Browser Support Matrix

The table below shows the first browser version in which each feature is available on desktop operating systems. A dash (-) means the feature is not supported.

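| Browser | Support | WASM st | WASM pthread¹ | WebGPU | WebGPU + f16² | OPFS³ | Workers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chrome (Win, Mac, Linux) | ✅ Tested | 57 | 92⁴ | 113 | 113 | 86 | 4 |
| Edge (Win, Mac, Linux) | ❌ Untested | 79⁵ | 92⁴ | 113 | 113 | 86 | 79⁵ |
| Firefox (Windows) | ❌ Untested | 52 | 79⁴ | 141 | 141 | 111 | 3.5 |
| Firefox (macOS) | ❌ Untested | 52 | 79⁴ | 145⁶ | 145⁶ | 111 | 3.5 |
| Firefox (Linux) | ❌ Untested | 52 | 79⁴ | ⚠ Nightly | ⚠ Nightly | 111 | 3.5 |
| Safari (macOS) | ❌ Untested | 11 | 15.2⁴ | 26 | 26 | 16.4 | 4 |
| Opera (Win, Mac, Linux) | ❌ Untested | 44 | 78⁴ | 99 | 99 | 72 | 11.5 |
| ChromeOS | ❌ Untested | 57 | 92⁴ | 113 | 113 | 86 | 4 |
| Other Chromium-based⁷ | ❌ Untested | 57+ | 92⁴ | 113 | 113 | 86+ | 4+ |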

Footnotes:

  • ¹ WASM pthread requires the server to send Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp (or credentialless) HTTP headers. See WASM Threading below.
  • ² The shader-f16 WebGPU feature is required by Sipp’s browser WebGPU backend. Availability depends on GPU and driver support in addition to the browser version.
  • ³ Origin Private File System. Used for model data caching. Requires a secure context (HTTPS). In Firefox, OPFS was behind the dom.fs.enabled preference before version 111.
  • ⁴ Version listed is when SharedArrayBuffer became available with cross-origin isolation headers. Earlier versions may have had the feature without the header requirement.
  • ⁵ Edge switched to a Chromium engine at version 79. The Chromium-based Edge supports WASM single-thread from 79, Workers from 79. The legacy EdgeHTML engine supported Workers from version 12 and WASM from version 16.
  • ⁶ Firefox 145 enables WebGPU on macOS version 26 (ARM64). Intel Mac support is in progress in Nightly.
  • ⁷ Includes Brave, Vivaldi, Arc, and other Chromium-derived browsers. Versions match their underlying Chromium release.
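The header requirement in footnote ¹ can be sketched as a small helper that a static file server attaches to every response. This is a minimal illustration, not Sipp code: `crossOriginIsolationHeaders` and `EmbedderPolicy` are hypothetical names, while the header names and values are the ones listed in the footnote.

```typescript
// The two Cross-Origin-Embedder-Policy values named in footnote 1.
type EmbedderPolicy = 'require-corp' | 'credentialless';

// Hypothetical helper: builds the response headers that make a page
// cross-origin isolated, which is what unlocks SharedArrayBuffer.
function crossOriginIsolationHeaders(
  embedderPolicy: EmbedderPolicy = 'require-corp',
): Record<string, string> {
  return {
    'Cross-Origin-Opener-Policy': 'same-origin',
    'Cross-Origin-Embedder-Policy': embedderPolicy,
  };
}
```

With Node's built-in `http` module, for example, each entry can be applied with `res.setHeader(name, value)` before the response body is sent. Once both headers are in place, `globalThis.crossOriginIsolated` should report `true` in the page.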

Mobile Browser Support Matrix

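| Browser | Support | WASM st | WASM pthread¹ | WebGPU | WebGPU + f16² | OPFS³ | Workers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chrome (Android) | 🟡 Pending | 57 | 92⁴ | 121⁵ | 121⁵ | 86 | 56 |
| Safari (iOS / iPadOS) | ❌ Untested | 11 | 15.2⁴ | 26 | 26 | 16.4 | 5 |
| Safari (visionOS) | ❌ Untested | 11 | 15.2⁴ | 26 | 26 | 16.4 | 5 |
| Samsung Internet (Android) | ❌ Untested | 8 | 16⁴ | 24 | 24 | 21 | 4 |
| Opera (Android) | ❌ Untested | 44 | 78⁴ | 80 | 80 | 72 | 11.5 |
| Firefox (Android) | ❌ Untested | 52 | 79⁴ | ⚠ Beta/Nightly | ⚠ Beta/Nightly | 150 | 52 |
| Android WebView | ❌ Untested | 57 | 92⁴ | ⚠ Flag⁶ | ⚠ Flag⁶ | 86 | 56 |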

Footnotes:

  • ¹ Requires COOP/COEP HTTP headers as described in WASM Threading.
  • ² The shader-f16 feature may not be available on all mobile GPU/driver combinations even when the browser version supports it.
  • ³ Origin Private File System. Chrome for Android and Samsung Internet support OPFS. iOS Safari supports OPFS from 16.4.
  • ⁴ Version listed is when SharedArrayBuffer became available with cross-origin isolation headers.
  • ⁵ Chrome 121 on Android 12+ with Qualcomm or ARM GPUs. Support on other GPU vendors (Imagination, Samsung Xclipse) is still rolling out.
  • ⁶ Android WebView requires the --enable-unsafe-webgpu flag. Not recommended for production use.
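Because OPFS availability varies across the browsers above, a client may want to probe for it before relying on cached model data. The sketch below is one way to do that, under stated assumptions: `supportsOpfs` and `StorageLike` are hypothetical names, and the check relies only on the standard `navigator.storage.getDirectory()` entry point and the secure-context requirement from footnote ³.

```typescript
// Minimal structural type so the sketch compiles without DOM typings.
interface StorageLike {
  getDirectory?: () => Promise<unknown>;
}

// Hypothetical helper: returns true when OPFS looks usable, meaning the page
// runs in a secure context and the storage manager exposes getDirectory().
function supportsOpfs(
  storage: StorageLike | undefined,
  isSecureContext: boolean,
): boolean {
  return isSecureContext && typeof storage?.getDirectory === 'function';
}
```

In a browser this could be called as `supportsOpfs(navigator.storage, globalThis.isSecureContext)`. The check is synchronous and only confirms that the API is present; `getDirectory()` itself may still reject in some browsing modes, so callers that need certainty should also await it inside a `try`/`catch`.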

WASM Threading

Sipp ships two WASM runtime artifacts:

| Artifact | Thread count | Token streaming | Requirements |
| --- | --- | --- | --- |
| sipp-wasm.js (single-thread) | 1 | postMessage | None |
| sipp-wasm-pthread.js (pthread) | up to 4⁷ | SharedArrayBuffer ring | COOP + COEP headers, secure context |

⁷ Defaults to min(4, navigator.hardwareConcurrency). Override with runtime.context.n_threads in model load options.
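The default in footnote ⁷ amounts to a one-line clamp. The sketch below spells it out; `defaultThreadCount` is a hypothetical name, and the fallback to a single thread when `hardwareConcurrency` is missing or invalid is an assumption rather than documented Sipp behavior.

```typescript
// Upper bound on pthread workers, taken from the artifact table.
const MAX_WASM_THREADS = 4;

// Hypothetical helper mirroring min(4, navigator.hardwareConcurrency).
function defaultThreadCount(hardwareConcurrency: number | undefined): number {
  // Assumption: fall back to one thread if the browser reports nothing usable.
  if (
    hardwareConcurrency === undefined ||
    !Number.isFinite(hardwareConcurrency) ||
    hardwareConcurrency < 1
  ) {
    return 1;
  }
  return Math.min(MAX_WASM_THREADS, Math.floor(hardwareConcurrency));
}
```

On an 8-core machine this yields 4, and on a dual-core device it yields 2.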

The client auto-detects pthread availability at runtime:

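// True only when every prerequisite for the pthread build is present.
function supportsWasmPthreads(): boolean {
  return (
    // Shared memory between the main thread and workers.
    typeof SharedArrayBuffer !== 'undefined' &&
    // The page was served with the COOP/COEP headers.
    globalThis.crossOriginIsolated === true &&
    // Web Workers are available to host the threads.
    typeof Worker !== 'undefined'
  );
}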

Set wasmThreading: 'single-thread' in client options when the hosting environment cannot serve COOP/COEP headers (for example, GitHub Pages or shared hosting without header control).
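Putting the detection result and the `wasmThreading` option together, artifact selection might look like the sketch below. The function `pickWasmArtifact` and the `'auto'` option value are hypothetical; the artifact file names and the `'single-thread'` value come from this section.

```typescript
// 'single-thread' is documented above; 'auto' is an assumed default name.
type WasmThreading = 'auto' | 'single-thread';

// Hypothetical helper: picks which runtime artifact to load.
// `pthreadsAvailable` would be the result of the runtime detection above.
function pickWasmArtifact(
  pthreadsAvailable: boolean,
  wasmThreading: WasmThreading = 'auto',
): string {
  // An explicit opt-out, or a failed detection, selects the single-thread build.
  if (wasmThreading === 'single-thread' || !pthreadsAvailable) {
    return 'sipp-wasm.js';
  }
  return 'sipp-wasm-pthread.js';
}
```

Forcing `'single-thread'` skips the pthread build even when the browser could support it, which keeps behavior predictable on hosts where the headers are set inconsistently.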


Platform & OS Support

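| OS | x64 | arm64 | Other architectures | Available bindings |
| --- | --- | --- | --- | --- |
| Linux (glibc) | Yes | Yes | arm, loong64, riscv64, ppc64, s390x | Node.js, Python, Rust |
| Linux (musl) | Yes | Yes | arm, loong64, riscv64 | Node.js |
| Windows (MSVC) | Yes | Yes | ia32 | Node.js, Python, Rust |
| Windows (GNU) | Yes | - | - | Node.js |
| macOS | Yes | Yes | universal2 | Node.js, Python, Rust |
| Android | - | Yes | arm (eabi) | Node.js |
| FreeBSD | Yes | Yes | - | Node.js |
| OpenHarmony | Yes | Yes | arm | Node.js |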

Docker Containers

| Profile | Backend | Host OS | Notes |
| --- | --- | --- | --- |
| CPU | CPU | Linux, macOS, Windows | Works everywhere, no GPU passthrough |
| CUDA | CUDA | Linux, Windows (WSL2) | Requires NVIDIA Container Toolkit |
| Vulkan | Vulkan | Linux only | Windows Docker Desktop does not support Vulkan passthrough |
| - | Metal | - | Metal unavailable inside Linux containers |

GPU & Accelerator Support

NVIDIA CUDA

Sipp targets NVIDIA GPUs with compute capability 7.5 and above. CUDA 13 removes support for architectures below 7.5.

| Architecture | Compute Capability | Target GPUs |
| --- | --- | --- |
| Turing | 7.5 | T4, Quadro RTX, GeForce RTX 20-series |
| Ampere | 8.0, 8.6 | A100, A10, A40, RTX A6000, GeForce RTX 30-series |
| Ada Lovelace | 8.9 | L4, L40S, GeForce RTX 40-series |
| Hopper | 9.0 | H100, H200 |
| Blackwell (Data Center) | 10.0 | B100, B200, GB200 |
| Blackwell (Consumer/Edge) | 12.0, 12.1 | GeForce RTX 50-series, RTX PRO Blackwell |
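A deployment script can check the 7.5 minimum before selecting the CUDA backend. The sketch below is illustrative only: `parseComputeCapability` and `meetsMinimumCapability` are hypothetical helpers, and the input is assumed to be a `major.minor` string such as those in the table above.

```typescript
// Parses "8.6" into [8, 6]; returns undefined for malformed input.
function parseComputeCapability(text: string): [number, number] | undefined {
  const match = /^(\d+)\.(\d+)$/.exec(text.trim());
  if (match === null) {
    return undefined;
  }
  return [Number(match[1]), Number(match[2])];
}

// Hypothetical check against the documented minimum of 7.5.
function meetsMinimumCapability(text: string): boolean {
  const parsed = parseComputeCapability(text);
  if (parsed === undefined) {
    return false;
  }
  const [major, minor] = parsed;
  // Compare major and minor separately instead of parsing as a float.
  return major > 7 || (major === 7 && minor >= 5);
}
```

Comparing the two components as integers avoids treating the version as a decimal number, where a value such as `7.10` would wrongly sort below `7.5`.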

Vulkan

Any GPU with Vulkan 1.2 or later driver support works on Linux and Windows. Tested on:

  • NVIDIA: Turing, Ampere, Ada Lovelace, Hopper (proprietary driver)
  • AMD: RDNA 2 and later (AMDGPU PRO or RADV)
  • Intel: Gen12/Xe and later (ANV)

Windows Docker Desktop does not support the Vulkan backend.

macOS source builds can compile Vulkan through the LunarG SDK, but LunarG’s macOS drivers translate Vulkan to Metal. Sipp does not publish macOS Vulkan packages because the native Metal backend is simpler for normal macOS use and macOS Vulkan adds loader/ICD runtime requirements.

Metal

  • Apple Silicon: M1, M2, M3, M4 series
  • AMD: GPUs supported by macOS (Radeon Pro series)

Metal is macOS-only and unavailable inside Docker containers. Intel integrated GPUs expose Metal, but Sipp does not treat them as a recommended Metal target; use the CPU backend on those Macs unless you have tested the exact model, context size, and device and confirmed that Metal is stable and faster than CPU.

Apple Silicon can run x64 processes through Rosetta 2. A darwin-x64 Node or Python native package is only used by an x64 Node/Python process; native arm64 Node/Python installations use the darwin-arm64 packages and are the preferred path on Apple Silicon.

WebGPU (Browser)

Any GPU that the host browser exposes as a WebGPU adapter may work, but Sipp requires the shader-f16 feature for WebGPU acceleration. Common configurations:

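| GPU Family | Chrome (D3D12) | Chrome (Vulkan) | Firefox (wgpu) | Safari (Metal) |
| --- | --- | --- | --- | --- |
| NVIDIA | Yes | Yes (Linux) | Yes | - |
| AMD | Yes | Yes (Linux) | Yes | Yes |
| Intel integrated | Yes | Yes (Linux) | Yes | Yes |
| Apple Silicon | - | - | Yes | Yes |
| Qualcomm (Android) | - | Yes | - | - |
| ARM Mali | - | Yes (Android) | - | - |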
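Since the table only indicates which combinations may work, a page can confirm the shader-f16 requirement at runtime before requesting the WebGPU backend. The sketch below uses the standard WebGPU calls `requestAdapter()` and `adapter.features.has()`; the function name `supportsWebGpuF16` and the small structural interfaces are hypothetical and exist only so the example compiles without WebGPU typings.

```typescript
// Minimal structural types standing in for GPU and GPUAdapter.
interface GpuAdapterLike {
  features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<GpuAdapterLike | null>;
}

// Hypothetical helper: resolves to true only when an adapter exists and it
// advertises the shader-f16 feature.
async function supportsWebGpuF16(gpu: GpuLike | undefined): Promise<boolean> {
  if (gpu === undefined) {
    return false; // The browser does not expose WebGPU at all.
  }
  const adapter = await gpu.requestAdapter();
  if (adapter === null) {
    return false; // WebGPU exists, but no usable adapter was found.
  }
  return adapter.features.has('shader-f16');
}
```

In a browser, pass `navigator.gpu` as the argument. When the result is `false`, falling back to the `'cpu'` backend in the model load options is one reasonable choice.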

Language Binding Support

| Package | Install command | Status | Runtime | Primary use |
| --- | --- | --- | --- | --- |
| Browser (@sipp/sipp) | npm install @sipp/sipp | Published (npm) | WASM / WebGPU | Browser-local GGUF inference, gateway clients |
| Node.js (@sipp/sipp-server) | npm install @sipp/sipp-server | Published (npm) | N-API native | Server processes, route handlers, backend services |
| Python (sipppy) | pip install sipppy | Published (PyPI) | PyO3 native | Python services, scripts, gateway clients |
| Rust (sipp-rs) | cargo add sipp-rs | Published (crates.io) | Native-backed Rust crate | Rust applications and services |
| Gateway server | Source-built | Source only | Axum binary | HTTP gateway for local and provider targets |
| Gateway Docker | Docker from source | Source only | Container | Production container workflows |
| Gateway toolkit | Source artifact | Source only | Rust crate | Custom gateway applications |

Limitations & Work in Progress

  • Gateway server does not have a published binary or public container image yet. It must be built from source.
  • Windows Docker Vulkan is not supported. Use the CUDA or CPU profiles on Windows with WSL2.
  • macOS Docker is CPU-only. Metal cannot run inside a Linux Docker container.
  • Android and iOS are not first-class package targets. The browser WASM package works on mobile web browsers, but no native Android or iOS packages are published.
  • Chrome (desktop) is the primary tested browser target. Other desktop browsers (Edge, Firefox, Safari, Opera, Chromium derivatives) are untested.
  • Mobile browser support has not been validated yet. Chrome (Android) is the next target for testing.
  • Firefox WebGPU on Linux and Android is in active development (Nightly / Beta). Firefox WebGPU on macOS Intel is also in progress.
  • The gateway currently works with OpenAI, OpenAI-compatible providers, and Anthropic. Support for additional providers is added over time.