Library API Overview
The Sipp libraries for Rust, Node.js, Python, and Browser expose the same endpoint-oriented client model.
At a high level:
- Register an endpoint with
add. - Keep the returned
EndpointRef. - Pass that reference to
query,chat, orembed.
This keeps application code the same whether inference runs locally, through a gateway, through a provider, or across a hybrid setup.
Core Client Methods
SippClient exposes four primary methods:
| Method | Purpose |
|---|---|
add | Register a local, gateway, or provider endpoint and return an EndpointRef. |
query | Generate text from a raw prompt string. No chat template is applied. |
chat | Generate text from ordered { role, content } messages. |
embed | Generate an embedding vector from text input. |
add() — Register an Endpoint
add(id: string, descriptor: EndpointDescriptor) -> EndpointRef
add registers an endpoint with the current client instance.
The id is caller-defined and scoped to the client. Reusing an id replaces
the existing endpoint. The returned EndpointRef is a lightweight handle with:
| Field | Description |
|---|---|
kind | Endpoint kind: "local", "gateway", or "provider". |
id | The endpoint id registered on this client. |
Pass the returned EndpointRef to query, chat, or embed to choose where
the operation runs.
Local Endpoint
A local endpoint loads a GGUF model into the current process. The application owns model selection, runtime lifecycle, and cleanup.
| Field | Type | Description |
|---|---|---|
kind | "local" | Endpoint kind selector. |
modelPath | string / PathBuf | Filesystem path or browser URL for the GGUF artifact. |
config | NativeRuntimeConfig optional | Load-time runtime configuration, including context size, GPU placement, scheduler policy, cache mode, sampling defaults, and observability. |
Use a local endpoint when the current process should own model execution.
Gateway Endpoint
A gateway endpoint sends requests to a remote Sipp gateway over HTTP. The gateway process owns provider credentials, local model paths, access policy, concurrency, and metrics.
| Field | Type | Description |
|---|---|---|
kind | "gateway" | Endpoint kind selector. |
target | string | Public target name resolved by the gateway. Sent as the model field in gateway profile requests. |
baseUrl | string | Absolute HTTP(S) URL of the gateway service. |
authentication | { kind, value?, headerName? } | Auth strategy: "none", "bearer", or "header". |
staticHeaders | { name, value }[] optional | Additional HTTP headers attached to every request. |
timeoutMs / timeoutPolicy | number / struct optional | Connection, request, and streaming read deadlines. |
queryRoute | string optional | Query route. Defaults to /v1/query. |
chatRoute | string optional | Chat route. Defaults to /v1/chat. |
embedRoute | string optional | Embedding route. Defaults to /v1/embed. |
protocolOptions | map optional | Profile-specific options merged into every request body. |
Use a gateway endpoint when a separate service should own model access and operational policy.
Provider Endpoint
A provider endpoint calls a model provider directly. This is intended for trusted server-side code that manages its own credential lifecycle.
| Field | Type | Description |
|---|---|---|
kind | "provider" | Endpoint kind selector. |
provider | "openai" / "anthropic" / "openai_compatible" | Provider adapter. |
model | string | Provider model identifier. |
apiKey | string optional | Provider API key. |
baseUrl | string optional | Override for the provider base URL. |
Use a provider endpoint when server-side code should call a provider API directly without a Sipp gateway.
query() — Generate from a Raw Prompt
query(request: SippQueryRequest) -> SippTextRun
query sends the prompt string to the selected endpoint exactly as supplied.
No chat template is applied.
Use query when the application owns the full prompt shape, including custom
templates, completion-style models, encoder-decoder text models, few-shot
prompts, or agent loops that render prompts themselves.
Request Fields
| Field | Type | Description |
|---|---|---|
endpoint | EndpointRef | Registered endpoint to target. May be omitted only when exactly one local endpoint supports the operation. |
prompt | string | Raw prompt text. |
options | SippTextOptions optional | Shared generation options: maxTokens, temperature, topP, and stop. |
local | LocalTextOptions optional | Local-only options such as contextKey, grammar, jsonSchema, sampling overrides, and media inputs. Rejected by gateway endpoints. |
endpointOptions | map optional | Free-form options forwarded to gateway endpoint implementations. |
providerOptions | map optional | Free-form options forwarded to direct provider adapters. Rejected by gateway endpoints. |
emitTokens | boolean | When true, stream TokenBatch values through the returned run handle. |
Return Value
query returns a SippTextRun.
| Member | Type | Description |
|---|---|---|
response | Promise / Future | Resolves to SippTextResponse when generation completes. |
tokens | Async iterable | Streams TokenBatch values when emitTokens is true. |
cancel(reason) | method | Cancels an in-flight generation. |
SippTextResponse contains the generated text, finishReason, token
usage, and optional localStats for local endpoints.
chat() — Generate from Role Messages
chat(request: SippChatRequest) -> SippTextRun
chat sends ordered role/content messages to the selected endpoint. The
endpoint owns message rendering.
| Endpoint kind | Message handling |
|---|---|
| Local | Renders messages through the GGUF-declared tokenizer.chat_template. Fails if the model has no template. |
| Gateway | Forwards messages to the resolved gateway target. Provider targets handle their own message mapping. |
| Provider | Sends messages using the provider’s native chat-completions format. |
Request Fields
| Field | Type | Description |
|---|---|---|
endpoint | EndpointRef | Registered endpoint to target. |
messages | { role, content }[] | Ordered conversation turns. |
options | SippTextOptions | Same shared generation options as query. |
local | LocalTextOptions | Same local-only options as query. |
emitTokens | boolean | Same streaming control as query. |
Return Value
chat returns the same SippTextRun shape as query.
embed() — Generate an Embedding
embed(request: SippEmbedRequest) -> SippEmbeddingRun
embed produces a single embedding vector from text input. It does not accept
generation options and does not stream tokens.
Request Fields
| Field | Type | Description |
|---|---|---|
endpoint | EndpointRef | Registered endpoint to target. |
input | string | Text to vectorize. |
local | LocalEmbedOptions optional | Local embedding options, including contextKey and normalize. |
endpointOptions | map optional | Free-form options for gateway endpoint implementations. |
providerOptions | map optional | Free-form options for direct provider adapters. |
Return Value
embed returns a SippEmbeddingRun.
| Member | Type | Description |
|---|---|---|
response | Promise / Future | Resolves to SippEmbeddingResponse when encoding completes. |
cancel(reason) | method | Cancels an in-flight embedding. |
SippEmbeddingResponse contains the float values array, optional token
usage, the pooling strategy, and the normalized flag.
Gateway and Client Symmetry
The same SippClient API works on both sides of the gateway boundary.
Server Side
A server process creates a SippClient, registers local endpoints, and maps
HTTP routes to query, chat, or embed.
Server client:
add("local-model", LocalDescriptor { modelPath, config })
-> route handler decodes HTTP request
-> route handler calls client.query/chat/embed
-> route handler encodes HTTP response
The first-party Gateway Server uses this pattern. Application-owned Node, Python, or Rust servers can also use it through the gateway profile helpers.
Client Side
A client process creates a SippClient, registers gateway endpoints, and
calls query, chat, or embed the same way it would call a local endpoint.
Client client:
add("remote", GatewayDescriptor { target, baseUrl, authentication })
-> client.query/chat/embed({ endpoint: ref, ... })
-> request is sent to the gateway over HTTP
Hybrid Pattern
A single client can register multiple endpoint kinds. The application chooses where an operation runs by passing a different endpoint reference.
localRef = client.add("local", LocalDescriptor { ... })
gatewayRef = client.add("gateway", GatewayDescriptor { ... })
client.query({ endpoint: localRef, prompt, ... })
client.query({ endpoint: gatewayRef, prompt, ... })
The operation code stays the same. Only the endpoint reference changes.
Why the Endpoint Model Matters
The endpoint model gives applications one API surface across multiple deployment shapes.
| Benefit | Description |
|---|---|
| Stable operation code | query, chat, and embed are called the same way for local, gateway, provider, and hybrid setups. |
| Swappable execution targets | Move inference between local models, gateway targets, and direct providers by changing endpoint descriptors. |
| Clear ownership boundaries | Local endpoints keep lifecycle in-process; gateway endpoints move access, credentials, policy, and metrics to a service boundary. |
| Language symmetry | Patterns learned in one language package transfer directly to the others. |
| Extensible endpoint kinds | New endpoint kinds can be added without changing the operation call pattern. |
Visual Summary
flowchart LR
%% -------------------------
%% Node Styling
%% -------------------------
classDef client_node fill:#eef6ff,stroke:#4a90e2,stroke-width:1.5px,color:#111,rx:6,ry:6;
classDef setup_node fill:#f7f7f7,stroke:#999,stroke-width:1px,color:#111,rx:6,ry:6;
classDef runtime_node fill:#f3fff0,stroke:#52a852,stroke-width:2px,color:#111,rx:6,ry:6;
classDef gateway_node fill:#fff7e6,stroke:#d99000,stroke-width:2px,color:#111,rx:6,ry:6;
classDef provider_node fill:#f8f0ff,stroke:#8e44ad,stroke-width:1.5px,color:#111,rx:6,ry:6;
%% -------------------------
%% Client Process
%% -------------------------
subgraph CLIENT["Client Process"]
direction TB
CApp["Application Code"]:::client_node
CClient["SippClient<br/>add(...) -> EndpointRef<br/>query / chat / embed"]:::client_node
CApp --> CClient
%% Logical grouping for endpoint registration options
subgraph CSetup["Endpoint Setup (options)"]
direction LR
CLocalEP["local (GGUF)"]:::setup_node
CGatewayEP["gateway (Remote)"]:::setup_node
CProviderEP["provider (API)"]:::setup_node
end
CClient -. "Registers" .-> CSetup
%% Local execution flow for local ref
subgraph CLocalRuntime["Local Runtime"]
direction LR
CLocalRun["GGUF Runtime"]:::runtime_node
end
%% Connection for local usage
CClient -- "Local Ref (query)" --> CLocalRun
end
%% -------------------------
%% Server Process
%% -------------------------
subgraph SERVER["Server Process / Gateway Server"]
direction TB
SGateway["Gateway Server<br/>HTTP: /v1/query, /chat, /embed"]:::gateway_node
SClient["SippClient (same lib)"]:::client_node
SGateway --> SClient
%% Logical grouping for endpoint registration options
subgraph SSetup["Endpoint Setup (options)"]
direction LR
SLocalEP["local (GGUF)"]:::setup_node
SProviderEP["provider (API)"]:::setup_node
end
SClient -. "Registers" .-> SSetup
%% Local execution flow for local ref
subgraph SLocalRuntime["Local Runtime"]
direction LR
SLocalRun["GGUF Runtime"]:::runtime_node
end
%% Connection for local usage
SClient -- "Local Ref (query)" --> SLocalRun
end
%% -------------------------
%% External Providers
%% -------------------------
Providers["Provider APIs<br/>OpenAI / Gemini / Anthropic / etc."]:::provider_node
%% -------------------------
%% Cross-process / Remote connections
%% -------------------------
CClient == "Gateway Ref (query)" ==> SGateway
CClient == "Provider Ref (query)" ==> Providers
SClient == "Provider Ref (query)" ==> Providers
%% -------------------------
%% Styling Assignment to Nodes
%% -------------------------
class CApp,CClient client_node;
class CLocalEP,CGatewayEP,CProviderEP,SLocalEP,SProviderEP setup_node;
class CLocalRun,SLocalRun runtime_node;
class SGateway gateway_node;
class Providers provider_node;
Related Docs
- Using the Core Library — per-language install steps and examples.
- Inference Operations — operation contracts, template behavior, and gateway target mapping.
- Local Inference — model sources, runtime options, threads, and browser execution.
- Gateway and Hybrid Inference — deployment shapes, endpoint model, and authentication patterns.
- Runtime Options — complete option layer map and field reference.