Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Library API Overview

The Sipp libraries for Rust, Node.js, Python, and Browser expose the same endpoint-oriented client model.

At a high level:

  1. Register an endpoint with add.
  2. Keep the returned EndpointRef.
  3. Pass that reference to query, chat, or embed.

This keeps application code the same whether inference runs locally, through a gateway, through a provider, or across a hybrid setup.

Core Client Methods

SippClient exposes four primary methods:

MethodPurpose
addRegister a local, gateway, or provider endpoint and return an EndpointRef.
queryGenerate text from a raw prompt string. No chat template is applied.
chatGenerate text from ordered { role, content } messages.
embedGenerate an embedding vector from text input.

add() — Register an Endpoint

add(id: string, descriptor: EndpointDescriptor) -> EndpointRef

add registers an endpoint with the current client instance.

The id is caller-defined and scoped to the client. Reusing an id replaces the existing endpoint. The returned EndpointRef is a lightweight handle with:

FieldDescription
kindEndpoint kind: "local", "gateway", or "provider".
idThe endpoint id registered on this client.

Pass the returned EndpointRef to query, chat, or embed to choose where the operation runs.

Local Endpoint

A local endpoint loads a GGUF model into the current process. The application owns model selection, runtime lifecycle, and cleanup.

FieldTypeDescription
kind"local"Endpoint kind selector.
modelPathstring / PathBufFilesystem path or browser URL for the GGUF artifact.
configNativeRuntimeConfig optionalLoad-time runtime configuration, including context size, GPU placement, scheduler policy, cache mode, sampling defaults, and observability.

Use a local endpoint when the current process should own model execution.

Gateway Endpoint

A gateway endpoint sends requests to a remote Sipp gateway over HTTP. The gateway process owns provider credentials, local model paths, access policy, concurrency, and metrics.

FieldTypeDescription
kind"gateway"Endpoint kind selector.
targetstringPublic target name resolved by the gateway. Sent as the model field in gateway profile requests.
baseUrlstringAbsolute HTTP(S) URL of the gateway service.
authentication{ kind, value?, headerName? }Auth strategy: "none", "bearer", or "header".
staticHeaders{ name, value }[] optionalAdditional HTTP headers attached to every request.
timeoutMs / timeoutPolicynumber / struct optionalConnection, request, and streaming read deadlines.
queryRoutestring optionalQuery route. Defaults to /v1/query.
chatRoutestring optionalChat route. Defaults to /v1/chat.
embedRoutestring optionalEmbedding route. Defaults to /v1/embed.
protocolOptionsmap optionalProfile-specific options merged into every request body.

Use a gateway endpoint when a separate service should own model access and operational policy.

Provider Endpoint

A provider endpoint calls a model provider directly. This is intended for trusted server-side code that manages its own credential lifecycle.

FieldTypeDescription
kind"provider"Endpoint kind selector.
provider"openai" / "anthropic" / "openai_compatible"Provider adapter.
modelstringProvider model identifier.
apiKeystring optionalProvider API key.
baseUrlstring optionalOverride for the provider base URL.

Use a provider endpoint when server-side code should call a provider API directly without a Sipp gateway.


query() — Generate from a Raw Prompt

query(request: SippQueryRequest) -> SippTextRun

query sends the prompt string to the selected endpoint exactly as supplied. No chat template is applied.

Use query when the application owns the full prompt shape, including custom templates, completion-style models, encoder-decoder text models, few-shot prompts, or agent loops that render prompts themselves.

Request Fields

FieldTypeDescription
endpointEndpointRefRegistered endpoint to target. May be omitted only when exactly one local endpoint supports the operation.
promptstringRaw prompt text.
optionsSippTextOptions optionalShared generation options: maxTokens, temperature, topP, and stop.
localLocalTextOptions optionalLocal-only options such as contextKey, grammar, jsonSchema, sampling overrides, and media inputs. Rejected by gateway endpoints.
endpointOptionsmap optionalFree-form options forwarded to gateway endpoint implementations.
providerOptionsmap optionalFree-form options forwarded to direct provider adapters. Rejected by gateway endpoints.
emitTokensbooleanWhen true, stream TokenBatch values through the returned run handle.

Return Value

query returns a SippTextRun.

MemberTypeDescription
responsePromise / FutureResolves to SippTextResponse when generation completes.
tokensAsync iterableStreams TokenBatch values when emitTokens is true.
cancel(reason)methodCancels an in-flight generation.

SippTextResponse contains the generated text, finishReason, token usage, and optional localStats for local endpoints.


chat() — Generate from Role Messages

chat(request: SippChatRequest) -> SippTextRun

chat sends ordered role/content messages to the selected endpoint. The endpoint owns message rendering.

Endpoint kindMessage handling
LocalRenders messages through the GGUF-declared tokenizer.chat_template. Fails if the model has no template.
GatewayForwards messages to the resolved gateway target. Provider targets handle their own message mapping.
ProviderSends messages using the provider’s native chat-completions format.

Request Fields

FieldTypeDescription
endpointEndpointRefRegistered endpoint to target.
messages{ role, content }[]Ordered conversation turns.
optionsSippTextOptionsSame shared generation options as query.
localLocalTextOptionsSame local-only options as query.
emitTokensbooleanSame streaming control as query.

Return Value

chat returns the same SippTextRun shape as query.


embed() — Generate an Embedding

embed(request: SippEmbedRequest) -> SippEmbeddingRun

embed produces a single embedding vector from text input. It does not accept generation options and does not stream tokens.

Request Fields

FieldTypeDescription
endpointEndpointRefRegistered endpoint to target.
inputstringText to vectorize.
localLocalEmbedOptions optionalLocal embedding options, including contextKey and normalize.
endpointOptionsmap optionalFree-form options for gateway endpoint implementations.
providerOptionsmap optionalFree-form options for direct provider adapters.

Return Value

embed returns a SippEmbeddingRun.

MemberTypeDescription
responsePromise / FutureResolves to SippEmbeddingResponse when encoding completes.
cancel(reason)methodCancels an in-flight embedding.

SippEmbeddingResponse contains the float values array, optional token usage, the pooling strategy, and the normalized flag.


Gateway and Client Symmetry

The same SippClient API works on both sides of the gateway boundary.

Server Side

A server process creates a SippClient, registers local endpoints, and maps HTTP routes to query, chat, or embed.

Server client:
  add("local-model", LocalDescriptor { modelPath, config })
  -> route handler decodes HTTP request
  -> route handler calls client.query/chat/embed
  -> route handler encodes HTTP response

The first-party Gateway Server uses this pattern. Application-owned Node, Python, or Rust servers can also use it through the gateway profile helpers.

Client Side

A client process creates a SippClient, registers gateway endpoints, and calls query, chat, or embed the same way it would call a local endpoint.

Client client:
  add("remote", GatewayDescriptor { target, baseUrl, authentication })
  -> client.query/chat/embed({ endpoint: ref, ... })
  -> request is sent to the gateway over HTTP

Hybrid Pattern

A single client can register multiple endpoint kinds. The application chooses where an operation runs by passing a different endpoint reference.

localRef = client.add("local", LocalDescriptor { ... })
gatewayRef = client.add("gateway", GatewayDescriptor { ... })

client.query({ endpoint: localRef, prompt, ... })
client.query({ endpoint: gatewayRef, prompt, ... })

The operation code stays the same. Only the endpoint reference changes.

Why the Endpoint Model Matters

The endpoint model gives applications one API surface across multiple deployment shapes.

BenefitDescription
Stable operation codequery, chat, and embed are called the same way for local, gateway, provider, and hybrid setups.
Swappable execution targetsMove inference between local models, gateway targets, and direct providers by changing endpoint descriptors.
Clear ownership boundariesLocal endpoints keep lifecycle in-process; gateway endpoints move access, credentials, policy, and metrics to a service boundary.
Language symmetryPatterns learned in one language package transfer directly to the others.
Extensible endpoint kindsNew endpoint kinds can be added without changing the operation call pattern.

Visual Summary

flowchart LR
    %% -------------------------
    %% Node Styling
    %% -------------------------
    classDef client_node fill:#eef6ff,stroke:#4a90e2,stroke-width:1.5px,color:#111,rx:6,ry:6;
    classDef setup_node fill:#f7f7f7,stroke:#999,stroke-width:1px,color:#111,rx:6,ry:6;
    classDef runtime_node fill:#f3fff0,stroke:#52a852,stroke-width:2px,color:#111,rx:6,ry:6;
    classDef gateway_node fill:#fff7e6,stroke:#d99000,stroke-width:2px,color:#111,rx:6,ry:6;
    classDef provider_node fill:#f8f0ff,stroke:#8e44ad,stroke-width:1.5px,color:#111,rx:6,ry:6;

    %% -------------------------
    %% Client Process
    %% -------------------------
    subgraph CLIENT["Client Process"]
        direction TB
        CApp["Application Code"]:::client_node
        CClient["SippClient<br/>add(...) -> EndpointRef<br/>query / chat / embed"]:::client_node
        CApp --> CClient

        %% Logical grouping for endpoint registration options
        subgraph CSetup["Endpoint Setup (options)"]
            direction LR
            CLocalEP["local (GGUF)"]:::setup_node
            CGatewayEP["gateway (Remote)"]:::setup_node
            CProviderEP["provider (API)"]:::setup_node
        end
        CClient -. "Registers" .-> CSetup

        %% Local execution flow for local ref
        subgraph CLocalRuntime["Local Runtime"]
            direction LR
            CLocalRun["GGUF Runtime"]:::runtime_node
        end

        %% Connection for local usage
        CClient -- "Local Ref (query)" --> CLocalRun
    end

    %% -------------------------
    %% Server Process
    %% -------------------------
    subgraph SERVER["Server Process / Gateway Server"]
        direction TB
        SGateway["Gateway Server<br/>HTTP: /v1/query, /chat, /embed"]:::gateway_node
        SClient["SippClient (same lib)"]:::client_node
        SGateway --> SClient

        %% Logical grouping for endpoint registration options
        subgraph SSetup["Endpoint Setup (options)"]
            direction LR
            SLocalEP["local (GGUF)"]:::setup_node
            SProviderEP["provider (API)"]:::setup_node
        end
        SClient -. "Registers" .-> SSetup

        %% Local execution flow for local ref
        subgraph SLocalRuntime["Local Runtime"]
            direction LR
            SLocalRun["GGUF Runtime"]:::runtime_node
        end

        %% Connection for local usage
        SClient -- "Local Ref (query)" --> SLocalRun
    end

    %% -------------------------
    %% External Providers
    %% -------------------------
    Providers["Provider APIs<br/>OpenAI / Gemini / Anthropic / etc."]:::provider_node

    %% -------------------------
    %% Cross-process / Remote connections
    %% -------------------------
    CClient == "Gateway Ref (query)" ==> SGateway
    CClient == "Provider Ref (query)" ==> Providers
    SClient == "Provider Ref (query)" ==> Providers

    %% -------------------------
    %% Styling Assignment to Nodes
    %% -------------------------
    class CApp,CClient client_node;
    class CLocalEP,CGatewayEP,CProviderEP,SLocalEP,SProviderEP setup_node;
    class CLocalRun,SLocalRun runtime_node;
    class SGateway gateway_node;
    class Providers provider_node;