roomKit
Apache 2.0 · Alpha · LiveKit-powered

Voice & video rooms with AI agents built in.

One hosted SFU. One frame contract. 16 kHz mono PCM, 640-byte 20 ms frames. Spin up a room, drop in the bundled AI host, plug your own agent in with ten lines — your code never touches WebRTC.

16 kHz · mono · Int16 LE · 20 ms · 640 B / frame

One frame. Frozen.

Every audio packet on every wire — same shape, every language, every release.

320 samples · 640 bytes · 20 ms duration · 50 frames/s rate
[Frame diagram: 1 cell = 5 samples = 10 bytes]
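
The numbers above all follow from the 16 kHz, mono, Int16, 20 ms parameters. As a minimal sketch (not part of either SDK), the arithmetic and a frame splitter might look like this. The helper name `split_frames` and the choice to pad a short final frame with silence are assumptions made for illustration.

```python
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1            # mono
BYTES_PER_SAMPLE = 2    # Int16
FRAME_MS = 20

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000           # 320
FRAME_BYTES = SAMPLES_PER_FRAME * CHANNELS * BYTES_PER_SAMPLE   # 640
FRAMES_PER_SECOND = 1000 // FRAME_MS                            # 50


def split_frames(pcm: bytes) -> list[bytes]:
    """Split raw PCM into 640-byte frames.

    Hypothetical helper: a short final frame is padded with zero
    bytes (silence) so every returned frame has the contract size.
    """
    remainder = len(pcm) % FRAME_BYTES
    if remainder:
        pcm += b"\x00" * (FRAME_BYTES - remainder)
    return [pcm[i:i + FRAME_BYTES] for i in range(0, len(pcm), FRAME_BYTES)]
```

One second of audio is 32,000 bytes, which this splits into exactly 50 frames.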

How the bytes move.

LiveKit handles WebRTC. The gateway translates to plain PCM frames. Your BYO agent never sees an ICE candidate.

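[Diagram: Human (web client) and Default AI (livekit-agents) connect over WebRTC to the LiveKit SFU (WebRTC, hidden). A server bridge links the SFU to the Gateway (REST + WS bridge), which exchanges 640 B PCM frames with the BYO agent (@roomkit/sdk · callplatform).]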

What you get out of the box.

Frozen wire contract

16 kHz mono PCM Int16 LE, 20 ms binary frames + JSON control on the same WebSocket. Mirror once, ship anywhere.
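
Because audio and control share one WebSocket, a receiver has to tell them apart. A rough sketch of that dispatch follows, assuming binary messages carry audio frames and text messages carry JSON control. The source does not spell out that split, and `classify_message` is a hypothetical name rather than an SDK function.

```python
import json
import struct

FRAME_BYTES = 640  # 320 Int16 samples


def classify_message(message):
    """Return ("audio", samples) or ("control", payload).

    Assumption: bytes means a PCM frame, str means JSON control.
    """
    if isinstance(message, (bytes, bytearray)):
        if len(message) != FRAME_BYTES:
            raise ValueError(f"expected {FRAME_BYTES} bytes, got {len(message)}")
        # "<320h" = 320 little-endian signed 16-bit integers
        return "audio", struct.unpack("<320h", message)
    return "control", json.loads(message)
```

Rejecting any binary message that is not exactly 640 bytes keeps a malformed frame from silently shifting sample alignment downstream.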

Bundled AI host

Silero VAD · Deepgram STT · GPT-4o-mini · ElevenLabs TTS. Set a systemPrompt and the agent joins and transcribes.

BYO agent SDKs

callplatform (Python) + @roomkit/sdk (Node). Same surface: recv(), send(), events(). Ships a SimulatedRoom for offline tests.

Mixed or per-track audio

Pin the stream to one participant for diarization-aware agents. Add ?stream=per-track&participantId=… to the WS URL.

Multi-tenant, JWT-scoped

API keys bind to tenants. POST /v1/rooms/:id/tokens/sign returns a dual {gatewayToken, livekitToken} pair.

Supervised bridge

Bounded-restart subprocess wrapper. Bridge crashes? Respawn + recoverable error event — the SDK never sees a drop.
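
To make the bounded-restart idea concrete, here is a generic sketch of the pattern, not roomKit's actual supervisor. The function name, the restart limit, the linear backoff, and the shape of the event dictionaries are all assumptions for illustration.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, backoff_s=0.5, on_event=print):
    """Run cmd, respawning after a non-zero exit, at most max_restarts times.

    Returns the number of restarts used once the process exits cleanly.
    Event dictionaries here are hypothetical, not the real wire format.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0:
            return restarts
        if restarts >= max_restarts:
            on_event({"type": "error", "recoverable": False, "exitCode": exit_code})
            raise RuntimeError(f"gave up after {restarts} restarts")
        restarts += 1
        on_event({"type": "error", "recoverable": True,
                  "exitCode": exit_code, "restart": restarts})
        time.sleep(backoff_s * restarts)  # simple linear backoff
```

The bound matters: without it, a bridge that crashes on startup would respawn forever instead of surfacing a non-recoverable error.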

Mixed or per-track. Same wire.

Same SDK call. Add ?stream=per-track&participantId=… and you get one clean lane per speaker.

?stream=mixed: one downmix lane
One stream. Overlaps collide into a single waveform.

?stream=per-track: one lane per pinned speaker
Lanes: alice, bob, carol
Three sockets, three clean streams. Diarization-aware agents prefer this.
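
As an illustration of how the two modes differ only in the query string, the sketch below builds one URL per pinned speaker. The base URL `wss://gateway.example/ws` and the helper `with_stream_mode` are placeholders invented for this example; only the `stream` and `participantId` parameters come from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_stream_mode(ws_url, participant_id=None):
    """Return ws_url with stream=mixed, or stream=per-track pinned to one participant."""
    parts = urlsplit(ws_url)
    query = dict(parse_qsl(parts.query))
    if participant_id is None:
        query["stream"] = "mixed"
        query.pop("participantId", None)
    else:
        query["stream"] = "per-track"
        query["participantId"] = participant_id
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical base URL; one socket per speaker for the per-track case.
BASE = "wss://gateway.example/ws"
lanes = {name: with_stream_mode(BASE, name) for name in ("alice", "bob", "carol")}
```

Each entry in `lanes` would be opened as its own WebSocket, which is where "three sockets, three clean streams" comes from.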

Ten lines and you’re in a call.

Same primitives in every language. Mock with SimulatedRoom; ship by swapping the URL.

from callplatform import join

async with join(room_id="room-abc", token=TOKEN) as call:
    async for ev in call.events():
        if ev["type"] == "speech.ended":
            audio = await call.recv()        # 16k mono PCM, 640 B / 20 ms
            await call.send(my_llm_and_tts(audio))

Terminal:
docker-compose -f infra/docker-compose.yml up -d
pnpm install && pnpm --filter @roomkit/shared build
pnpm dev                # gateway :3000 · web :3001

Contribute

roomKit is built in lanes. Pick an open issue, branch from main, ship a focused PR. The wire contract is FROZEN — any change to packages/shared/src/wire.ts requires a coordinated version bump across every SDK.