Node.js Package

The Node.js package target is @sipp/sipp-server. It exposes the native Sipp client API to Node server processes, route handlers, and framework server functions. Applications own framework routes, request validation, auth, and deployment policy.

See the Library API Overview for the shared add, query, chat, and embed contracts.

Install

npm install @sipp/sipp-server

Use this package only in Node runtime code. Browser components should use @sipp/sipp.

@sipp/sipp-server is a wrapper package. npm installs the matching optional platform package for the current OS and CPU, and the runtime loader selects the best packaged backend for that host.

Use It For

  • Server-side local GGUF inference.
  • Gateway-backed and provider-backed inference from server code.
  • Token streaming from Node processes.
  • Framework route handlers in Node runtimes.
  • Backend selection for native bindings.

Local GGUF Query

import { SippClient } from '@sipp/sipp-server';

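// Fail early with a clear message when no model path is given.
const modelPath = process.argv[2];
if (!modelPath) {
  throw new Error('Pass the path to a GGUF model file as the first argument');
}

const client = new SippClient();
const endpoint = await client.add('default', {
  kind: 'local',
  modelPath,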
  config: {
    context: { n_ctx: 2048 },
    scheduler: { continuous_batching: true, prefill_chunk_size: 0 },
    cache: { mode: 'live_slot_prefix' },
    observability: { runtime_metrics: true },
  },
});
const queryPrompt = [
  '<|system|>',
  'Answer concisely.',
  '<|user|>',
  'Explain Sipp in one sentence.',
  '<|assistant|>',
].join('\n');

const run = client.query({
  endpoint,
  // query() sends the prompt as raw text; replace the markers above with the target model's chat template.
  prompt: queryPrompt,
  emitTokens: true,
  options: { maxTokens: 64, temperature: 0.7 },
  local: { contextKey: 'node-local' },
});

let streamed = '';
for await (const batch of run) {
  streamed += batch.text;
}
const response = await run.response;
console.log(streamed || response.text);

Set SIPP_NODE_BACKEND=cpu|vulkan|cuda|metal to choose a native backend. By default, macOS tries metal then cpu; Windows and Linux try cuda, vulkan, then cpu. See Runtime Options for local runtime config groups and request option boundaries.
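The default ordering described above can be pictured with a small sketch. The `backendOrder` helper below is hypothetical and is not the package's loader. It also assumes that an explicit override selects exactly one backend with no fallback, that unknown values are rejected, and that platforms other than macOS, Windows, and Linux fall back to cpu only; the text above does not confirm any of those three points.

```typescript
type Backend = 'cpu' | 'vulkan' | 'cuda' | 'metal';

const BACKENDS: readonly Backend[] = ['cpu', 'vulkan', 'cuda', 'metal'];

// Hypothetical helper: mirrors the documented defaults, not the loader's actual code.
function backendOrder(platform: string, override?: string): Backend[] {
  if (override !== undefined && override !== '') {
    // Assumption: an explicit SIPP_NODE_BACKEND value pins a single backend.
    if (!BACKENDS.includes(override as Backend)) {
      throw new Error(`Unsupported SIPP_NODE_BACKEND value: ${override}`);
    }
    return [override as Backend];
  }
  if (platform === 'darwin') {
    return ['metal', 'cpu']; // macOS: metal, then cpu
  }
  if (platform === 'win32' || platform === 'linux') {
    return ['cuda', 'vulkan', 'cpu']; // Windows and Linux: cuda, vulkan, then cpu
  }
  // Assumption: other platforms are not covered by the text; try cpu only.
  return ['cpu'];
}

console.log(backendOrder('linux'));
console.log(backendOrder('darwin', 'cpu'));
```

In a real process the inputs would come from `process.platform` and `process.env.SIPP_NODE_BACKEND`.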

On Intel Macs with integrated GPUs, prefer SIPP_NODE_BACKEND=cpu. The Metal backend is intended for Apple Silicon and tested AMD Mac GPUs. Apple Silicon can run x64 Node through Rosetta 2, but x64 packages are used only by an x64 Node process; native arm64 Node should use arm64 packages.
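A quick way to see which case applies is to look at the architecture of the running Node binary. The sketch below is a hypothetical diagnostic, and its messages are its own wording. It relies only on the fact that `process.arch` reports the architecture Node was built for, so x64 Node under Rosetta 2 reports `x64` just as an Intel Mac does. It cannot tell an integrated GPU from an AMD GPU.

```typescript
// Hypothetical diagnostic helper; the returned messages are this sketch's own wording.
function describeNodeArch(platform: string, arch: string): string {
  if (platform !== 'darwin') {
    return 'not macOS: the Metal notes do not apply';
  }
  if (arch === 'arm64') {
    return 'native arm64 Node: arm64 packages are used';
  }
  if (arch === 'x64') {
    // process.arch describes the Node binary, so this branch covers both Intel Macs
    // and x64 Node running under Rosetta 2 on Apple Silicon.
    return 'x64 Node: x64 packages are used; consider SIPP_NODE_BACKEND=cpu on Intel Macs with integrated GPUs';
  }
  return `unrecognized macOS architecture: ${arch}`;
}

console.log(describeNodeArch('darwin', 'x64'));
```

Call it as `describeNodeArch(process.platform, process.arch)` at startup to log the result before adding a local endpoint.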

Gateway Chat

import { SippClient } from '@sipp/sipp-server';

const client = new SippClient();

function requiredEnv(name: string): string {
  const value = process.env[name];
  if (value == null || value === '') {
    throw new Error(`${name} is required`);
  }
  return value;
}

const endpoint = await client.add('gateway', {
  kind: 'gateway',
  target: requiredEnv('SIPP_GATEWAY_TARGET'),
  baseUrl: requiredEnv('SIPP_GATEWAY_URL'),
  authentication: {
    kind: 'bearer',
    value: requiredEnv('SIPP_GATEWAY_TOKEN'),
  },
});
const messages = [
  { role: 'system', content: 'Answer concisely.' },
  { role: 'user', content: 'Explain gateway inference.' },
];
const run = client.chat({
  endpoint,
  messages,
  options: { maxTokens: 64 },
});
console.log((await run.response).text);

The application only needs the gateway URL, bearer token, and public target. Provider credentials and local model paths stay in the gateway process.
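Because those three values are the whole application-side configuration, it can help to validate them together at startup. The sketch below is one possible shape, not part of the package. `gatewayConfigFromEnv` and the `GatewayEndpointConfig` interface are hypothetical names, and the object layout simply mirrors the `client.add('gateway', …)` call shown above.

```typescript
// Hypothetical type mirroring the gateway endpoint object used in the example above.
interface GatewayEndpointConfig {
  kind: 'gateway';
  target: string;
  baseUrl: string;
  authentication: { kind: 'bearer'; value: string };
}

type Env = Record<string, string | undefined>;

// Collects every missing variable so a misconfigured deployment fails with one message.
function gatewayConfigFromEnv(env: Env): GatewayEndpointConfig {
  const names = ['SIPP_GATEWAY_TARGET', 'SIPP_GATEWAY_URL', 'SIPP_GATEWAY_TOKEN'] as const;
  const missing = names.filter((name) => env[name] == null || env[name] === '');
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return {
    kind: 'gateway',
    target: env['SIPP_GATEWAY_TARGET'] as string,
    baseUrl: env['SIPP_GATEWAY_URL'] as string,
    authentication: { kind: 'bearer', value: env['SIPP_GATEWAY_TOKEN'] as string },
  };
}

// Example with placeholder values only.
const exampleConfig = gatewayConfigFromEnv({
  SIPP_GATEWAY_TARGET: 'example-target',
  SIPP_GATEWAY_URL: 'https://gateway.example.invalid',
  SIPP_GATEWAY_TOKEN: '<mock-gateway-token>',
});
console.log(exampleConfig.baseUrl);
```

With the real client, the result would be passed as the second argument to `client.add('gateway', gatewayConfigFromEnv(process.env))`.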

Direct Provider Chat

Use direct provider endpoints only in trusted server code. Keep the provider key in the server environment; OPENAI_API_KEY="<mock-openai-key>" is only a placeholder value in examples.

import { SippClient } from '@sipp/sipp-server';

const client = new SippClient();

function requiredEnv(name: string): string {
  const value = process.env[name];
  if (value == null || value === '') {
    throw new Error(`${name} is required`);
  }
  return value;
}

const endpoint = await client.add('provider', {
  kind: 'provider',
  provider: 'openai',
  model: process.env.OPENAI_MODEL ?? 'gpt-5-mini',
  apiKey: requiredEnv('OPENAI_API_KEY'),
});
const messages = [
  { role: 'system', content: 'Answer concisely.' },
  { role: 'user', content: 'Explain provider inference.' },
];
const run = client.chat({
  endpoint,
  messages,
  options: { maxTokens: 64 },
});
console.log((await run.response).text);

Pass provider-only request fields through providerOptions. See Providers for the full provider/gateway split.

Gateway Profile Helpers

Use the gateway profile helpers when a Node route should behave like a first-party gateway endpoint for browser clients that add endpoints with kind: 'gateway'. The helpers decode the model, prompt, messages, input, and snake_case generation options from the request body, then format JSON or SSE responses. The route can execute the decoded request against a provider, a local endpoint, or a separate gateway.

import {
  SippClient,
  decodeGatewayQueryBody,
  gatewayErrorResponse,
  gatewayTextResponseBody,
  gatewayTextStreamResponse,
} from '@sipp/sipp-server';

function requiredEnv(name: string): string {
  const value = process.env[name];
  if (value == null || value === '') {
    throw new Error(`${name} is required`);
  }
  return value;
}

export async function handleQuery(request: Request): Promise<Response> {
  try {
    const decoded = decodeGatewayQueryBody(await request.json());
    const client = new SippClient();
    const endpoint = await client.add('provider', {
      kind: 'provider',
      provider: 'openai',
      model: decoded.target,
      apiKey: requiredEnv('OPENAI_API_KEY'),
    });
    const run = client.query({ ...decoded.request, endpoint });
    return decoded.stream
      ? gatewayTextStreamResponse(run)
      : Response.json(
          gatewayTextResponseBody(decoded.target, await run.response),
        );
  } catch (error) {
    const response = gatewayErrorResponse(error);
    return Response.json(response.body, response.init);
  }
}
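To make the snake_case decoding step concrete, here is a minimal sketch of the kind of key renaming involved. It is not the implementation of decodeGatewayQueryBody(). The `snakeToCamel` and `camelizeOptions` names are hypothetical, and the wire field names `max_tokens` and `top_p` are assumptions chosen to match the camelCase `maxTokens` option used earlier on this page.

```typescript
// Converts one snake_case key to camelCase, for example max_tokens -> maxTokens.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Shallow rename of option keys; nested objects are copied as-is.
function camelizeOptions(options: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(options)) {
    out[snakeToCamel(key)] = value;
  }
  return out;
}

// Assumed wire-format field names, for illustration only.
const decodedOptions = camelizeOptions({ max_tokens: 64, temperature: 0.7 });
console.log(decodedOptions);
```

The real helpers do more than rename keys: per the description above they also extract the target, the stream flag, and the request payload.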

Use decodeGatewayChatBody() and decodeGatewayEmbedBody() for routes compatible with /v1/chat and /v1/embed. Use gatewayEmbeddingResponseBody() for finite embedding responses.

Framework Routes

Use @sipp/sipp-server in server-only code such as Next.js App Router route handlers with runtime = 'nodejs', TanStack Start server functions, Express routes, or background workers. Do not import it from browser bundles.