WebSocket server for Yjs document sync, whiteboard collaboration, and code snapshots.
Overview#
NestJS WebSocket server on port 3001. It holds Yjs CRDT documents in memory (one for code, one for whiteboard) and syncs them to connected clients. It also handles awareness (cursors, selections), editor lock enforcement, and periodic code snapshots.

Frontends connect over WebSocket with a short-lived collab token. The control-plane drives document lifecycle (create/destroy) and permission changes (lock/kick) through an internal HTTP API. When a notable event occurs, such as a snapshot being ready or a user disconnecting, the collab-plane POSTs back to the control-plane.

Architecture#
| Direction | Protocol | Purpose |
|---|---|---|
| Frontend --> Collab-plane | WebSocket | Yjs sync, awareness, room events |
| Control-plane --> Collab-plane | HTTP (internal) | Document lifecycle, kick, health |
| Collab-plane --> Control-plane | HTTP (callback) | Snapshot ready, user disconnected |
WebSocket Protocol#
Connection Lifecycle#
Authentication#
The control-plane issues a collab token (JWT) when a user joins a room via POST /rooms/:roomId/join. Payload:

The token is passed as a query parameter: ws://collab-plane:3001?token=<collabToken>. The POST /rooms/:roomId/join response includes the full collabUrl. Clients should use it as-is.

- The signature is checked against JWT_SECRET.
- The roomId in the token must match the room the client joins.
- Expired or revoked tokens get close code 4001.
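The claim checks listed above can be sketched as a small pure function. This is an illustration only: the `CollabClaims` shape (just `roomId` and a standard JWT `exp`), the function name, and the choice to answer a roomId mismatch with 4001 are assumptions, and signature verification and revocation lookup are taken to happen elsewhere.

```typescript
// Hypothetical claim shape: only roomId and exp are assumed here.
// The real token payload may carry more fields.
interface CollabClaims {
  roomId: string;
  exp: number; // expiry in seconds since the epoch (standard JWT "exp")
}

const CLOSE_UNAUTHORIZED = 4001;

// Returns a close code when the connection should be refused, or null when
// the claims are acceptable. Signature verification is assumed to have
// already succeeded before this is called.
function checkCollabClaims(
  claims: CollabClaims,
  requestedRoomId: string,
  nowSeconds: number,
): number | null {
  if (claims.exp <= nowSeconds) {
    return CLOSE_UNAUTHORIZED; // expired token
  }
  if (claims.roomId !== requestedRoomId) {
    // Assumption: a room mismatch is also reported as "unauthorized".
    return CLOSE_UNAUTHORIZED;
  }
  return null;
}
```

Keeping the check free of I/O makes it easy to unit test with a fixed clock value.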
Message Framing#
Two message types on the wire:

| Type | Format | Purpose |
|---|---|---|
| Binary | Raw Uint8Array | Yjs sync protocol (document updates, state vectors) |
| Text | JSON string | Control messages (join, awareness, room events, errors) |
JSON messages use this envelope:

Yjs Document Sync Protocol#
Implements the standard y-protocols sync protocol. All sync messages are binary-encoded.

y-protocols message type constants:

- messageYjsSyncStep1 = 0: contains state vector
- messageYjsSyncStep2 = 1: contains encoded state updates (diff)
- messageYjsUpdate = 2: incremental document update
Binary encoding: every sync message is wrapped in a messageSync outer envelope (byte 0x00). The second byte picks the sub-type. Wire format: [messageType, syncSubType, ...payload].

| Outer Type | Inner Sub-Type | Wire Prefix | Name | Direction | Payload | Purpose |
|---|---|---|---|---|---|---|
| messageSync (0x00) | 0x00 | [0, 0, ...] | SyncStep1 | Both | State vector (Uint8Array) | "Here is what I have; tell me what I'm missing" |
| messageSync (0x00) | 0x01 | [0, 1, ...] | SyncStep2 | Both | Encoded updates (Uint8Array) | "Here are the updates you're missing" |
| messageSync (0x00) | 0x02 | [0, 2, ...] | Update | Both | Incremental update (Uint8Array) | Real-time document mutation |
| messageAwareness (0x01) | -- | [1, ...] | Awareness | Both | Awareness JSON | Cursor, selection, presence state |
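To make the wire prefixes in the table concrete, here is a minimal sketch that classifies a binary frame by its leading bytes. The `Frame` type and `classifyFrame` name are hypothetical. The sketch returns the remaining bytes untouched; in y-protocols the payload is itself length-prefixed, which is not decoded here.

```typescript
type FrameKind = "syncStep1" | "syncStep2" | "update" | "awareness" | "unknown";

interface Frame {
  kind: FrameKind;
  payload: Uint8Array; // bytes after the prefix, left undecoded
}

const MESSAGE_SYNC = 0;
const MESSAGE_AWARENESS = 1;

// Classifies a binary frame using the [messageType, syncSubType, ...payload]
// layout described in the table.
function classifyFrame(data: Uint8Array): Frame {
  if (data.length >= 1 && data[0] === MESSAGE_AWARENESS) {
    return { kind: "awareness", payload: data.subarray(1) };
  }
  if (data.length >= 2 && data[0] === MESSAGE_SYNC) {
    const payload = data.subarray(2);
    switch (data[1]) {
      case 0:
        return { kind: "syncStep1", payload };
      case 1:
        return { kind: "syncStep2", payload };
      case 2:
        return { kind: "update", payload };
    }
  }
  // Anything else (empty frame, messageAuth, unknown sub-type) is not handled.
  return { kind: "unknown", payload: data };
}
```

A frame starting with `[0, 2]` is classified as an incremental update, and one starting with `[1]` as awareness.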
The top-level discriminators (messageSync = 0, messageAwareness = 1) are defined by y-websocket, the reference WebSocket provider. The sync sub-types (SyncStep1 = 0, SyncStep2 = 1, Update = 2) are defined by y-protocols/sync.js. y-websocket also defines messageAuth = 2 and messageQueryAwareness = 3, though these are not used here.

Standard y-protocols allows both sides to send SyncStep1 simultaneously. This implementation uses a server-first sequence instead.

Each room has two separate Yjs documents:

| Document | Y.js Shared Type | Content |
|---|---|---|
| Code editor | Y.Text | Source code with language metadata |
| Whiteboard | YKeyValue<TLRecord> (backed by Y.Array) | Flat store of all tldraw records (shapes, bindings, assets, pages) |
Awareness Protocol#
Awareness is ephemeral, non-persisted state: cursors, selections, typing indicators. It is JSON-encoded and broadcast to all peers in the room.

| Direction | Type | Purpose |
|---|---|---|
| Client --> Server | awareness | Client sends own awareness state |
| Server --> Clients | awareness | Server broadcasts all peers' awareness states |
| Server --> Clients | awareness-remove | Peer disconnected, remove their cursor/selection |
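The awareness rate limit and expiry used in this section (a 50 ms minimum interval per client, removal after 30 seconds of inactivity) can be modeled with a small tracker. This is a sketch under assumptions: the class and method names are hypothetical, throttled updates are simply dropped rather than coalesced, and a dropped update does not refresh the inactivity timer.

```typescript
const MIN_INTERVAL_MS = 50;
const EXPIRY_MS = 30000;

class AwarenessTracker {
  // Timestamp (ms) of the last accepted update per client.
  private lastAccepted = new Map<string, number>();

  // Returns true if the update should be accepted and broadcast.
  accept(clientId: string, nowMs: number): boolean {
    const last = this.lastAccepted.get(clientId);
    if (last !== undefined && nowMs - last < MIN_INTERVAL_MS) {
      return false; // arrived too soon after the previous accepted update
    }
    this.lastAccepted.set(clientId, nowMs);
    return true;
  }

  // Drops clients with no accepted update for EXPIRY_MS and returns their
  // ids so the caller can broadcast awareness-remove for each.
  sweep(nowMs: number): string[] {
    const expired: string[] = [];
    this.lastAccepted.forEach((last, clientId) => {
      if (nowMs - last >= EXPIRY_MS) expired.push(clientId);
    });
    for (const clientId of expired) this.lastAccepted.delete(clientId);
    return expired;
  }
}
```

Passing the clock in as a parameter keeps the timing rules testable without real timers.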
Updates are throttled to a minimum interval of 50 ms per client. States expire after 30 seconds of inactivity (standard y-protocols behavior). The server cleans up stale entries and broadcasts removals.

Room Events#
Broadcast to all connected clients in a room:

Whiteboard (Phase 4)#
Covers drawing algorithm diagrams, collaborative annotation, and whiteboard export.

The whiteboard uses a separate Yjs document from the code editor, so the two never conflict. The frontend uses tldraw for the whiteboard UI. tldraw does not integrate with Yjs natively; it uses its own TLStore internally. A custom sync bridge (a useYjsStore hook) connects tldraw's store to the shared Yjs document, using YKeyValue from y-utility to store TLRecord objects flat in a Y.Array.

Data Model#
The whiteboard Yjs document stores all tldraw records (shapes, bindings, assets, pages) in a single flat YKeyValue map:

The sync bridge listens in both directions:

- Local to remote: store.listen(callback, { source: 'user', scope: 'document' }) fires on local edits and writes the changes into the Yjs doc via yDoc.transact().
- Remote to local: yStore.on('change', ...) fires on incoming Yjs updates and applies them via store.mergeRemoteChanges(), which prevents echo loops.
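Why source tagging prevents echo loops can be seen in a dependency-free model of the bridge. Nothing here is tldraw or Yjs API: `TinyStore` and `bridge` are hypothetical stand-ins, with the `"user"` / `"remote"` tag playing the role of the `source: 'user'` filter on one side and `mergeRemoteChanges()` on the other.

```typescript
type Source = "user" | "remote";
type Listener<T> = (key: string, value: T, source: Source) => void;

// Hypothetical stand-in for either side of the bridge.
class TinyStore<T> {
  private data = new Map<string, T>();
  private listeners: Listener<T>[] = [];
  writes = 0; // counts writes, to show that no echo occurs

  listen(fn: Listener<T>): void {
    this.listeners.push(fn);
  }

  put(key: string, value: T, source: Source): void {
    this.writes++;
    this.data.set(key, value);
    for (const fn of this.listeners) fn(key, value, source);
  }

  get(key: string): T | undefined {
    return this.data.get(key);
  }
}

function bridge<T>(local: TinyStore<T>, shared: TinyStore<T>): void {
  // Local to remote: forward only edits made by the local user.
  local.listen((key, value, source) => {
    if (source === "user") shared.put(key, value, "user");
  });
  // Remote to local: forward only changes that arrived from peers, and tag
  // them "remote" so the listener above ignores them.
  shared.listen((key, value, source) => {
    if (source === "remote") local.put(key, value, "remote");
  });
}
```

Each change is written exactly once per store: a local edit reaches the shared side tagged `"user"` and is not sent back, and a peer's change reaches the local side tagged `"remote"` and is not re-forwarded.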
No whiteboard-specific WebSocket messages are needed. The Yjs CRDT handles merge and broadcast through the same sync protocol used for code editing.

Drawing Capabilities#
Interviewer and candidate can draw simultaneously: freehand lines, arrows, rectangles, ellipses, text labels, sticky notes, connected flowcharts. Each user gets a color. Spectators see everything in real time but can't edit.

Export#
During a session, users can export via tldraw's Editor API:

| Format | Method | Notes |
|---|---|---|
| PNG | editor.toImage(shapes, { format: 'png' }) | Returns { blob: Blob }. Use pixelRatio: 2 for high-DPI. |
| SVG | editor.toImage(shapes, { format: 'svg' }) | Vector output, suitable for print |
| JSON | Serialize Yjs doc to JSON | Full fidelity, can be re-imported |
When the room transitions to finished, the collab-plane renders the whiteboard to PNG and uploads it to S3. The control-plane stores the S3 key in the session record. That is the canonical whiteboard artifact. Client-side exports during the session are for personal use only.

Code Snapshots#
Covers code diffs, pre-submission read-only snapshots, session replay timeline, and contextual replay.

Snapshots capture the Yjs document state at regular intervals, stored as binary state vectors. The control-plane uses them for code diffs, pre-submission review, and session replay.

Snapshot Triggers#
| Trigger | When | Initiated By |
|---|---|---|
| Periodic | Every 30 seconds during coding / wrapup phases | Collab-plane scheduler |
| Phase change | On every room phase transition | Control-plane (via phase-change event) |
| Manual submit | Before code submission for evaluation | Control-plane (locks the editor via POST /internal/documents/:roomId/lock, triggering a snapshot) |
| Session end | When room transitions to finished | Control-plane (destroyDocument) |
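The periodic trigger in the table reduces to a simple check that a scheduler tick could run. This is a sketch only: the function name is hypothetical, and the phase strings `"coding"` and `"wrapup"` are assumed spellings of the phases named in the table.

```typescript
const PERIODIC_INTERVAL_MS = 30000;

// Assumed phase identifiers; the real enum values may be spelled differently.
const PERIODIC_PHASES = ["coding", "wrapup"];

// Returns true when a periodic snapshot should be taken: the room is in a
// phase where periodic snapshots apply and at least 30 seconds have passed
// since the previous snapshot.
function periodicSnapshotDue(
  phase: string,
  lastSnapshotMs: number,
  nowMs: number,
): boolean {
  const phaseApplies = PERIODIC_PHASES.indexOf(phase) !== -1;
  return phaseApplies && nowMs - lastSnapshotMs >= PERIODIC_INTERVAL_MS;
}
```

The other triggers (phase change, manual submit, session end) are event-driven and would bypass this check.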
Snapshot Flow#
Snapshot Storage#
Snapshots are delivered to the control-plane via the CONTROL_INTERNAL.SNAPSHOT_READY callback as a JSON array of bytes, then persisted in the code_snapshots table:

| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| roomId | UUID | FK to rooms |
| snapshot | bytea | Binary Yjs state vector |
| timestamp | timestamptz | When the snapshot was taken |
| trigger | enum | periodic, phase-change, submit, session-end |
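Since the snapshot travels as a JSON array of bytes, both sides need a conversion between `Uint8Array` and a plain number array. A minimal sketch, with hypothetical function names; the field name that carries the array in the callback body is not specified here and is not assumed.

```typescript
// Converts snapshot bytes into a JSON-serializable array of numbers (0-255).
function snapshotToJson(bytes: Uint8Array): number[] {
  return Array.from(bytes);
}

// Validates and converts a parsed JSON value back into bytes.
// Throws if the value is not an array of integers in the range 0-255.
function snapshotFromJson(value: unknown): Uint8Array {
  if (!Array.isArray(value)) {
    throw new TypeError("snapshot must be an array of bytes");
  }
  for (const b of value) {
    if (typeof b !== "number" || !Number.isInteger(b) || b < 0 || b > 255) {
      throw new RangeError("snapshot contains a non-byte value");
    }
  }
  return Uint8Array.from(value as number[]);
}
```

Validating on the receiving side matters because `Uint8Array.from` would otherwise silently wrap out-of-range values (256 becomes 0).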
Diff Computation#
Diffs are computed client-side. The frontend applies two snapshots to temporary Yjs documents and compares the Y.Text contents. The collab-plane just delivers raw snapshots.

Snapshot A (t=60s) + Snapshot B (t=120s) --> frontend diff view
Read-Only Snapshot#
Before submission, the frontend fetches the latest snapshot and shows it in a read-only editor for confirmation. Only after the user approves does the control-plane forward the code to the execution-plane.

Editor Lock#
Covers locking and unlocking code editing permissions.

The lock is enforced on both server and client. When locked, candidates can't edit; their Yjs updates are rejected server-side and the editor UI disables input.

Lock Flow#
1. Host or interviewer toggles the lock via POST /rooms/:roomId/control/lock-editor.
2. Control-plane sends the lock state to the collab-plane via POST /internal/documents/:roomId/lock (see Internal HTTP API).
3. Collab-plane sets the lock state on the Yjs document metadata.
4. Collab-plane broadcasts an editor-lock event to all connected clients.
5. Server rejects incoming Yjs updates from locked users (not applied, not broadcast).
6. Client disables editor input.
Lock Rules#
| Role | When Locked | When Unlocked |
|---|---|---|
| Host | Can edit | Can edit |
| Interviewer | Can edit | Can edit |
| Candidate | Read-only (updates rejected) | Can edit |
| Spectator | Always read-only | Always read-only |
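The lock rules in the table map directly onto a permission check that the server could run before applying a code-document update. This is a sketch: the lowercase role strings and the function name are assumptions.

```typescript
// Assumed role identifiers; the real values may be spelled differently.
type Role = "host" | "interviewer" | "candidate" | "spectator";

// Returns true if a user with the given role may edit the code document.
function canEditCode(role: Role, locked: boolean): boolean {
  switch (role) {
    case "host":
    case "interviewer":
      return true; // unaffected by the lock
    case "candidate":
      return !locked; // read-only while locked
    case "spectator":
      return false; // always read-only
  }
}
```

A Yjs update from a user for whom `canEditCode` returns false would be neither applied nor broadcast.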
The whiteboard is not affected by the editor lock. Both participants can always annotate regardless of code lock state.

The control-plane separately blocks POST /rooms/:roomId/run and POST /rooms/:roomId/submit when locked (returns 409 ROOM_EDITOR_LOCKED). The collab-plane enforces the CRDT write lock; the control-plane enforces the execution lock.
Reconnection#
Covers reconnection and state restoration after network interruption.

Yjs CRDTs handle reconnection naturally: the sync protocol merges divergent states. The server holds the authoritative document, so reconnecting clients just re-sync.

Reconnection Flow#
Offline Edits#
Edits made while disconnected stay in the local Yjs document. On reconnect, the sync protocol exchanges missing updates in both directions and the CRDT merge converges without conflicts.

Reconnection Guarantees#
| Scenario | Behavior |
|---|---|
| Brief network blip (< collab token lifetime) | Auto-reconnect, full state sync, no data loss |
| Extended disconnection (collab token expired) | Client must re-join via control-plane to get a fresh collab token |
| Server restart | Documents lost from memory. On restart, server pulls the latest snapshot from the control-plane to reconstruct; clients re-sync on reconnect. If no snapshot exists, clients provide full state via SyncStep1/SyncStep2. |
| All clients disconnect | Document stays in memory for a configurable TTL (DOCUMENT_IDLE_TTL_MS, default 5 min). Final snapshot sent to control-plane on cleanup. |
Internal HTTP API#
Only the control-plane calls these. Not exposed to frontends. Secured with the X-Internal-Secret header.

Document Lifecycle#
Callback Sequence#
Error Handling for Callbacks#
If the control-plane is unreachable: retry with exponential backoff (3 attempts, 1s / 2s / 4s), buffer snapshots in memory, and deliver them on the next successful callback. Real-time collaboration is never blocked by callback failures.

Error Handling#
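The callback retry and buffering policy (exponential backoff at 1s / 2s / 4s, in-memory buffering, delivery on the next success) can be sketched as follows. This reads the policy as one initial try followed by up to three retries, which is an assumption, as are all names below. `send` and `sleep` are injected so the sketch has no HTTP or timer dependency.

```typescript
type Send<T> = (item: T) => Promise<void>;
type Sleep = (ms: number) => Promise<void>;

// Delays before each retry: base, 2x base, 4x base, ...
function backoffDelays(retries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) delays.push(baseMs * Math.pow(2, i));
  return delays;
}

// Tries once, then retries after each backoff delay. Resolves to false if
// every attempt failed.
async function sendWithRetry<T>(item: T, send: Send<T>, sleep: Sleep): Promise<boolean> {
  try { await send(item); return true; } catch { /* fall through to retries */ }
  for (const delay of backoffDelays(3, 1000)) {
    await sleep(delay);
    try { await send(item); return true; } catch { /* try the next delay */ }
  }
  return false;
}

// Buffers undelivered items in memory and flushes them, oldest first,
// whenever a later delivery is attempted.
class CallbackQueue<T> {
  private pending: T[] = [];
  private flushing = false;

  constructor(private send: Send<T>, private sleep: Sleep) {}

  get buffered(): number { return this.pending.length; }

  async deliver(item: T): Promise<void> {
    this.pending.push(item);
    if (this.flushing) return; // the flush in progress will pick this up
    this.flushing = true;
    try {
      while (this.pending.length > 0) {
        const ok = await sendWithRetry(this.pending[0], this.send, this.sleep);
        if (!ok) break; // keep it buffered for the next deliver() call
        this.pending.shift();
      }
    } finally {
      this.flushing = false;
    }
  }
}
```

Because `deliver` is awaited independently of the sync path, a failing control-plane only grows the buffer and does not stall collaboration. A production version would also need a bound on the buffer size.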
WebSocket Close Codes#
| Code | Meaning | Client Action |
|---|---|---|
| 1000 | Normal closure | Session ended normally |
| 1001 | Going away | Server shutting down, reconnect |
| 1008 | Policy violation | Invalid message format, do not reconnect |
| 1011 | Internal error | Server error, reconnect with backoff |
| 4001 | Unauthorized | Collab token invalid/expired, re-authenticate |
| 4002 | Kicked | User removed from room, do not reconnect |
| 4003 | Room closed | Document destroyed, do not reconnect |
| 4004 | Room not found | Room does not exist or has been destroyed |
| 4009 | Already connected | Duplicate connection for same user+room |
| 4029 | Rate limited | Too many messages, back off |
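A client could turn the close-code table into a single decision function. The sketch below is illustrative: the action names are hypothetical, and the handling of 4004, 4009, and unlisted codes (all treated as "stop") is an assumption, since the table does not state a reconnect policy for them.

```typescript
type ClientAction =
  | "done"                   // session ended normally
  | "reconnect"              // reconnect right away
  | "reconnect-with-backoff" // reconnect after a delay
  | "rejoin"                 // get a fresh collab token from the control-plane
  | "stop";                  // do not reconnect

function actionForCloseCode(code: number): ClientAction {
  switch (code) {
    case 1000:
      return "done";
    case 1001:
      return "reconnect";
    case 1011:
    case 4029:
      return "reconnect-with-backoff";
    case 4001:
      return "rejoin";
    case 1008:
    case 4002:
    case 4003:
      return "stop";
    case 4004: // assumption: nothing to reconnect to
    case 4009: // assumption: another connection is already active
    default:   // assumption: unknown codes are not retried
      return "stop";
  }
}
```

Centralizing the mapping keeps the reconnect logic in the WebSocket `close` handler to a single lookup.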
WebSocket Error Messages#
Sent as JSON right before closing (when applicable):

Internal HTTP Errors#
| Status | Code | When | Response |
|---|---|---|---|
| 200 | | Success (all endpoints except document creation) | Typed response body |
| 201 | | Document created | POST /internal/documents returns 201 on success |
| 400 | COLLAB_INVALID_REQUEST | Invalid request body (missing roomId, bad format) | ErrorResponse |
| 404 | COLLAB_DOCUMENT_NOT_FOUND | No document exists for this room | ErrorResponse |
| 409 | COLLAB_DOCUMENT_ALREADY_EXISTS | Document already exists (create) | ErrorResponse |
| 500 | COLLAB_INTERNAL_ERROR | Unexpected internal error | ErrorResponse |
Modified at 2026-03-12 05:29:50