Collab-Plane

WebSocket server for Yjs document sync, whiteboard collaboration, and code snapshots.

Overview

NestJS WebSocket server on port 3001. Holds Yjs CRDT documents in memory (two per room: one for code, one for the whiteboard) and syncs them to connected clients. Also handles awareness (cursors, selections), editor lock enforcement, and periodic code snapshots.

Frontends connect over WebSocket with a short-lived collab token. The control-plane drives document lifecycle (create/destroy) and permission changes (lock/kick) through an internal HTTP API. The collab-plane POSTs back to the control-plane when a snapshot is ready or a user disconnects.

Architecture

Communication patterns:

Direction	Protocol	Purpose
Frontend --> Collab-plane	WebSocket	Yjs sync, awareness, room events
Control-plane --> Collab-plane	HTTP (internal)	Document lifecycle, kick, health
Collab-plane --> Control-plane	HTTP (callback)	Snapshot ready, user disconnected

WebSocket Protocol

Connection Lifecycle

Authentication

The control-plane issues a collab token (JWT) when a user joins a room via POST /rooms/:roomId/join. Payload:

The client passes the token as a query parameter: ws://collab-plane:3001?token=<collabToken>. The POST /rooms/:roomId/join response includes the full collabUrl, which clients should use as-is rather than building the URL themselves.

Validation:

Signature checked against COLLAB_JWT_SECRET

roomId in the token must match the room the client joins

Expired or revoked tokens get close code 4001
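The validation rules above can be sketched as a pure check over already-decoded claims. This is a minimal illustration, not the service's code: it assumes the signature was verified against COLLAB_JWT_SECRET by a JWT library beforehand, that expiry uses the standard `exp` claim, and that a room mismatch also closes with 4001 (this document only states 4001 for expired or revoked tokens). The names `CollabClaims`, `checkCollabClaims`, and `isRevoked` are hypothetical.

```typescript
// Hypothetical claim shape: only `roomId` is named in this document;
// `exp` is the standard JWT expiry claim (seconds since the epoch).
interface CollabClaims {
  roomId: string;
  exp: number;
}

const CLOSE_UNAUTHORIZED = 4001;

// Returns a close code when the connection must be refused, or null when
// the claims are acceptable. Assumes the signature was already verified.
function checkCollabClaims(
  claims: CollabClaims,
  requestedRoomId: string,
  nowMs: number,
  isRevoked: (claims: CollabClaims) => boolean = () => false,
): number | null {
  if (claims.exp * 1000 <= nowMs) return CLOSE_UNAUTHORIZED; // expired
  if (isRevoked(claims)) return CLOSE_UNAUTHORIZED; // revoked
  // Assumption: a room mismatch is reported with the same close code.
  if (claims.roomId !== requestedRoomId) return CLOSE_UNAUTHORIZED;
  return null;
}
```

Keeping the check free of I/O makes it easy to unit-test; the revocation lookup is injected because this document does not say where revocations are stored.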

Message Framing

Two message types on the wire:

Type	Format	Purpose
Binary	Raw `Uint8Array`	Yjs sync protocol (document updates, state vectors)
Text	JSON string	Control messages (join, awareness, room events, errors)

JSON messages use this envelope:

Yjs Document Sync Protocol

Implements the standard y-protocols sync protocol. All sync messages are binary-encoded.

y-protocols message type constants:

messageYjsSyncStep1 = 0: contains state vector

messageYjsSyncStep2 = 1: contains encoded state updates (diff)

messageYjsUpdate = 2: incremental document update

Binary encoding: every sync message is wrapped in a messageSync outer envelope (byte 0x00). The second byte picks the sub-type. Wire format: [messageType, syncSubType, ...payload].

Outer Type	Inner Sub-Type	Wire Prefix	Name	Direction	Payload	Purpose
`messageSync` (`0x00`)	`0x00`	`[0, 0, ...]`	`SyncStep1`	Both	State vector (`Uint8Array`)	"Here is what I have; tell me what I'm missing"
`messageSync` (`0x00`)	`0x01`	`[0, 1, ...]`	`SyncStep2`	Both	Encoded updates (`Uint8Array`)	"Here are the updates you're missing"
`messageSync` (`0x00`)	`0x02`	`[0, 2, ...]`	`Update`	Both	Incremental update (`Uint8Array`)	Real-time document mutation
`messageAwareness` (`0x01`)	--	`[1, ...]`	`Awareness`	Both	Awareness JSON	Cursor, selection, presence state

The top-level discriminators (messageSync = 0, messageAwareness = 1) are defined by y-websocket, the reference WebSocket provider. The sync sub-types (SyncStep1 = 0, SyncStep2 = 1, Update = 2) are defined by y-protocols/sync.js. y-websocket also defines messageAuth = 2 and messageQueryAwareness = 3, though these are not used here.
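As a rough illustration of the wire prefixes in the table above, the sketch below classifies a binary frame by its leading bytes. The `FrameKind` type and `classifyFrame` function are hypothetical names; a real handler would hand the payload to the y-protocols decoders rather than inspect bytes by hand.

```typescript
type FrameKind = "sync-step1" | "sync-step2" | "update" | "awareness" | "unknown";

const MESSAGE_SYNC = 0;
const MESSAGE_AWARENESS = 1;

// Classifies a binary frame by its prefix. The prefix values are all below
// 128, so each occupies exactly one byte in the variable-length encoding.
function classifyFrame(frame: Uint8Array): FrameKind {
  if (frame.length === 0) return "unknown";
  if (frame[0] === MESSAGE_AWARENESS) return "awareness";
  if (frame[0] !== MESSAGE_SYNC || frame.length < 2) return "unknown";
  switch (frame[1]) {
    case 0:
      return "sync-step1";
    case 1:
      return "sync-step2";
    case 2:
      return "update";
    default:
      return "unknown";
  }
}
```

Frames that classify as `unknown` include the unused `messageAuth` and `messageQueryAwareness` types.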

Standard y-protocols allows both sides to send SyncStep1 simultaneously. This implementation uses a server-first sequence instead.

Each room has two separate Yjs documents:

Document	Yjs Shared Type	Content
Code editor	`Y.Text`	Source code with language metadata
Whiteboard	`YKeyValue<TLRecord>` (backed by `Y.Array`)	Flat store of all tldraw records (shapes, bindings, assets, pages)

Awareness Protocol

Awareness is ephemeral, non-persisted state: cursors, selections, typing indicators. JSON-encoded and broadcast to all peers in the room.

Awareness messages:

Direction	Type	Purpose
Client --> Server	`awareness`	Client sends own awareness state
Server --> Clients	`awareness`	Server broadcasts all peers' awareness states
Server --> Clients	`awareness-remove`	Peer disconnected, remove their cursor/selection

Updates are throttled to at most one per 50 ms per client. States expire after 30 seconds of inactivity (standard y-protocols behavior). The server cleans up stale entries and broadcasts removals.
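The throttle and expiry rules can be sketched with a small tracker that takes the current time as an argument. This is an assumption-laden illustration: the class name `AwarenessTracker` is hypothetical, and dropping updates that arrive too early (rather than coalescing them) is one possible reading of "throttled".

```typescript
const MIN_INTERVAL_MS = 50;
const STALE_AFTER_MS = 30000;

class AwarenessTracker {
  private lastAccepted = new Map<string, number>();

  // Returns true when the update should be applied and broadcast.
  // Assumption: updates arriving within 50 ms of the last accepted one are dropped.
  accept(clientId: string, nowMs: number): boolean {
    const last = this.lastAccepted.get(clientId);
    if (last !== undefined && nowMs - last < MIN_INTERVAL_MS) return false;
    this.lastAccepted.set(clientId, nowMs);
    return true;
  }

  // Removes clients idle for 30 s or more and returns their ids so the
  // caller can broadcast an `awareness-remove` for each.
  sweep(nowMs: number): string[] {
    const removed: string[] = [];
    this.lastAccepted.forEach((last, clientId) => {
      if (nowMs - last >= STALE_AFTER_MS) removed.push(clientId);
    });
    for (const clientId of removed) this.lastAccepted.delete(clientId);
    return removed;
  }
}
```

Passing the clock in explicitly keeps the tracker deterministic under test; a server would call `sweep(Date.now())` on a timer.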

Room Events

Broadcast to all connected clients in a room:

Whiteboard (Phase 4)

Covers drawing algorithm diagrams, collaborative annotation, and whiteboard export.

Separate Yjs document from the code editor, so the two never conflict. The frontend uses tldraw for the whiteboard UI. tldraw does not integrate with Yjs natively; it uses its own TLStore internally. A custom sync bridge (a useYjsStore hook) connects tldraw's store to the shared Yjs document, using YKeyValue from y-utility to store TLRecord objects flat in a Y.Array.

Data Model

The whiteboard Yjs document stores all tldraw records (shapes, bindings, assets, pages) in a single flat YKeyValue map:

The sync bridge listens in both directions:

Local to remote: store.listen(callback, { source: 'user', scope: 'document' }) fires on local edits, writes changes into the Yjs doc via yDoc.transact().

Remote to local: yStore.on('change', ...) fires on incoming Yjs updates, applies them via store.mergeRemoteChanges() (which prevents echo loops).

No whiteboard-specific WebSocket messages are needed. The Yjs CRDT handles merge and broadcast through the same sync protocol used for code editing.

Drawing Capabilities

Interviewer and candidate can draw simultaneously: freehand lines, arrows, rectangles, ellipses, text labels, sticky notes, connected flowcharts. Each user gets a color. Observers see everything in real time but can't edit.

Export

During a session, users can export via tldraw's Editor API:

Format	Method	Notes
PNG	`editor.toImage(shapes, { format: 'png' })`	Returns `{ blob: Blob }`. Use `pixelRatio: 2` for high-DPI.
SVG	`editor.toImage(shapes, { format: 'svg' })`	Vector output, suitable for print
JSON	Serialize Yjs doc to JSON	Full fidelity, can be re-imported

When the room transitions to finished, the collab-plane renders the whiteboard to PNG and uploads it to S3. The control-plane stores the S3 key in the session record. That is the canonical whiteboard artifact. Client-side exports during the session are for personal use only.

Code Snapshots

Covers code diffs, pre-submission read-only snapshots, session replay timeline, and contextual replay.

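Snapshots capture the Yjs document state at regular intervals, stored as binary-encoded document state (a full Yjs update rather than a bare state vector, since a state vector records only per-client clocks and cannot reconstruct content). The control-plane uses them for code diffs, pre-submission review, and session replay.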

Snapshot Triggers

Trigger	When	Initiated By
Periodic	Every 30 seconds during `coding` / `wrapup` phases	Collab-plane scheduler
Phase change	On every room phase transition	Control-plane (via phase-change event)
Manual submit	Before code submission for evaluation	Control-plane (updates room state via `POST /internal/documents/:roomId/state` with `editorLocked=true`, triggering a snapshot)
Session end	When room transitions to `finished`	Control-plane (destroyDocument)
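For the periodic trigger in the table above, the scheduling decision reduces to a phase check plus an elapsed-time check. The sketch below is illustrative only: `isPeriodicSnapshotDue` is a hypothetical name, and taking a snapshot immediately when none exists yet is an assumption.

```typescript
const SNAPSHOT_INTERVAL_MS = 30000;
const PERIODIC_PHASES = ["coding", "wrapup"];

// Decides whether the scheduler should take a periodic snapshot now.
function isPeriodicSnapshotDue(
  phase: string,
  lastSnapshotMs: number | null,
  nowMs: number,
): boolean {
  if (PERIODIC_PHASES.indexOf(phase) === -1) return false;
  // Assumption: with no earlier snapshot, the first one is due right away.
  if (lastSnapshotMs === null) return true;
  return nowMs - lastSnapshotMs >= SNAPSHOT_INTERVAL_MS;
}
```

The other triggers (phase change, manual submit, session end) are event-driven and bypass this check.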

Snapshot Flow

Snapshot Storage

Delivered to the control-plane via the CONTROL_INTERNAL.SNAPSHOT_READY callback as a JSON array of bytes, then persisted in the code_snapshots table:

Field	Type	Description
`id`	UUID	Primary key
`roomId`	UUID	FK to rooms
`snapshot`	`bytea`	Binary-encoded Yjs document state
`timestamp`	`timestamptz`	When the snapshot was taken
`trigger`	enum	`periodic`, `phase-change`, `submit`, `session-end`
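Because the callback carries the snapshot as a JSON array of bytes, both sides need a conversion between `Uint8Array` and a plain number array. A minimal sketch, with hypothetical function names:

```typescript
// Sender side: a Uint8Array does not serialize to a JSON array on its own,
// so it is copied into a plain number[] first.
function snapshotToJson(bytes: Uint8Array): number[] {
  return Array.from(bytes);
}

// Receiver side: validate before trusting the payload as binary data.
function snapshotFromJson(value: unknown): Uint8Array {
  if (!Array.isArray(value)) throw new TypeError("snapshot must be a JSON array");
  const bytes = new Uint8Array(value.length);
  for (let i = 0; i < value.length; i++) {
    const b: unknown = value[i];
    if (typeof b !== "number" || !Number.isInteger(b) || b < 0 || b > 255) {
      throw new RangeError(`snapshot[${i}] is not a byte`);
    }
    bytes[i] = b;
  }
  return bytes;
}
```

The validation matters because assigning an out-of-range number to a `Uint8Array` element silently wraps instead of failing.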

Diff Computation

Diffs are computed client-side. The frontend applies two snapshots to temporary Yjs documents and compares the Y.Text contents. The collab-plane just delivers raw snapshots.

Snapshot A (t=60s) + Snapshot B (t=120s) --> frontend diff view
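Once each snapshot has been applied to a temporary Yjs document and its Y.Text read out as a string, the comparison step is an ordinary text diff. The sketch below shows one way to do it with a longest-common-subsequence line diff; the frontend's actual diff algorithm is not specified here, and `diffLines` and `DiffOp` are hypothetical names.

```typescript
type DiffOp = { kind: "same" | "added" | "removed"; line: string };

// Line-based diff using a longest-common-subsequence table.
function diffLines(before: string, after: string): DiffOp[] {
  const a = before.split("\n");
  const b = after.split("\n");
  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    const row: number[] = [];
    for (let j = 0; j <= b.length; j++) row.push(0);
    lcs.push(row);
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the start, emitting one operation per line.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "same", line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "removed", line: a[i] });
      i++;
    } else {
      ops.push({ kind: "added", line: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "removed", line: a[i++] });
  while (j < b.length) ops.push({ kind: "added", line: b[j++] });
  return ops;
}
```

The table costs time and memory proportional to the product of the two line counts, which is usually acceptable for interview-sized files.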

Read-Only Snapshot

Before submission, the frontend fetches the latest snapshot and shows it in a read-only editor for confirmation. Only after the user approves does the control-plane forward the code to the execution-plane.

Editor Lock

Covers locking and unlocking code editing permissions.

Enforced at both server and client. When locked, candidates can't edit; their Yjs updates are rejected server-side and the editor UI disables input.

Lock Flow

Host or current interviewer toggles lock via POST /rooms/:roomId/control/lock-editor

Control-plane sends the updated phase and lock state to the collab-plane via POST /internal/documents/:roomId/state (see Internal HTTP API)

Collab-plane sets lock state on the Yjs document metadata

Broadcasts editor-lock event to all connected clients

Server rejects incoming Yjs updates from locked users (not applied, not broadcast)

Client disables editor input

Lock Rules

Role	When Locked	When Unlocked
Host	Can edit	Can edit
Interviewer	Can edit	Can edit
Candidate	Read-only (updates rejected)	Can edit
Spectator	Always read-only	Always read-only
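The lock rules table translates directly into a permission check that the server can run before applying an incoming code update. A minimal sketch; the lowercase role strings and the name `canEditCode` are assumptions, since this document does not show how roles are represented.

```typescript
// Role identifiers are hypothetical; the table only gives display names.
type Role = "host" | "interviewer" | "candidate" | "spectator";

// Returns true when a code-document update from this role may be applied.
function canEditCode(role: Role, editorLocked: boolean): boolean {
  if (role === "host" || role === "interviewer") return true;
  if (role === "candidate") return !editorLocked;
  return false; // spectators are always read-only
}
```

Updates that fail this check would be neither applied nor broadcast, matching the lock flow above.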

Whiteboard is not affected by editor lock. Both participants can always annotate regardless of code lock state.

The control-plane separately blocks POST /rooms/:roomId/run and POST /rooms/:roomId/submit when locked (returns HTTP 409 with code ROOM_EDITOR_LOCKED). The collab-plane enforces the CRDT write lock; the control-plane enforces the execution lock.

Reconnection

Covers reconnection and state restoration after network interruption.

Yjs CRDTs handle reconnection naturally: the sync protocol merges divergent states. The server holds the authoritative document, so reconnecting clients just re-sync.

Reconnection Flow

Offline Edits

Edits made while disconnected stay in the local Yjs document. On reconnect, the sync protocol exchanges missing updates in both directions and the CRDT merge converges without conflicts.

Reconnection Guarantees

Scenario	Behavior
Brief network blip (< collab token lifetime)	Auto-reconnect, full state sync, no data loss
Extended disconnection (collab token expired)	Client must re-join via control-plane to get a fresh collab token
Server restart	Documents lost from memory. On restart, server pulls the latest snapshot from the control-plane to reconstruct; clients re-sync on reconnect. If no snapshot exists, clients provide full state via SyncStep1/SyncStep2.
All clients disconnect	Document stays in memory for a configurable TTL (`DOCUMENT_IDLE_TTL_MS`, default 5 min). Final snapshot sent to control-plane on cleanup.

Internal HTTP API

Only the control-plane calls these. Not exposed to frontends. Secured with the X-Internal-Secret header.

Document Lifecycle

Callback Sequence

Error Handling for Callbacks

If the control-plane is unreachable: retry with exponential backoff (3 attempts, 1s / 2s / 4s), buffer snapshots in memory, and deliver them on the next successful callback. Real-time collaboration is never blocked by callback failures.
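The retry-and-buffer behavior can be sketched as a small queue with injected `send` and `sleep` functions. This is an illustration under stated assumptions, not the service's implementation: `CallbackQueue` and `backoffDelays` are hypothetical names, and "3 attempts, 1s / 2s / 4s" is read here as an initial try followed by up to three retries after those waits.

```typescript
// Waits before each retry: base, 2x base, 4x base, ...
function backoffDelays(retries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) delays.push(baseMs * Math.pow(2, i));
  return delays;
}

class CallbackQueue<T> {
  private buffer: T[] = [];
  private flushing = false;
  private send: (payload: T) => Promise<void>;
  private sleep: (ms: number) => Promise<void>;

  constructor(send: (payload: T) => Promise<void>, sleep: (ms: number) => Promise<void>) {
    this.send = send;
    this.sleep = sleep;
  }

  get pending(): number {
    return this.buffer.length;
  }

  // Queues the payload, then tries to deliver everything buffered, oldest first.
  // Resolves to true only if this call left the buffer empty.
  async deliver(payload: T): Promise<boolean> {
    this.buffer.push(payload);
    if (this.flushing) return false; // the running flush will pick this item up
    this.flushing = true;
    try {
      while (this.buffer.length > 0) {
        const next = this.buffer[0] as T;
        if (!(await this.sendWithRetry(next))) return false; // stays buffered
        this.buffer.shift();
      }
      return true;
    } finally {
      this.flushing = false;
    }
  }

  private async sendWithRetry(payload: T): Promise<boolean> {
    const waits = backoffDelays(3, 1000); // 1 s, 2 s, 4 s
    for (let attempt = 0; ; attempt++) {
      try {
        await this.send(payload);
        return true;
      } catch (err) {
        const wait = waits[attempt];
        if (wait === undefined) return false; // retries exhausted
        await this.sleep(wait);
      }
    }
  }
}
```

Callers do not need to await `deliver` on the sync path, which is how collaboration stays unblocked while callbacks fail. An unbounded in-memory buffer is a simplification; a production version would likely cap it.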

Error Handling

WebSocket Close Codes

Code	Meaning	Client Action
`1000`	Normal closure	Session ended normally
`1001`	Going away	Server shutting down, reconnect
`1008`	Policy violation	Invalid message format, do not reconnect
`1011`	Internal error	Server error, reconnect with backoff
`4001`	Unauthorized	Collab token invalid/expired, re-authenticate
`4002`	Kicked	User removed from room, do not reconnect
`4003`	Room closed	Document destroyed, do not reconnect
`4004`	Room not found	Room does not exist or has been destroyed
`4009`	Already connected	Duplicate connection for same user+room
`4029`	Rate limited	Too many messages, back off
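On the client, the close-code table can be folded into a single decision function. The sketch below is one possible mapping: the `ClientAction` names are hypothetical, and the handling of 4004, 4009, and unrecognized codes is an assumption, since the table describes those codes without naming a client action.

```typescript
type ClientAction =
  | "done"
  | "reconnect"
  | "reconnect-with-backoff"
  | "reauthenticate"
  | "stop";

// Maps a WebSocket close code to what the client should do next.
function actionForCloseCode(code: number): ClientAction {
  switch (code) {
    case 1000:
      return "done"; // session ended normally
    case 1001:
      return "reconnect"; // server shutting down
    case 1011:
      return "reconnect-with-backoff"; // server error
    case 4029:
      return "reconnect-with-backoff"; // rate limited: back off first
    case 4001:
      return "reauthenticate"; // fetch a fresh collab token
    case 1008:
    case 4002:
    case 4003:
      return "stop"; // table says: do not reconnect
    case 4004:
    case 4009:
      return "stop"; // assumption: no action is listed for these codes
    default:
      return "stop"; // assumption: unknown codes are not retried
  }
}
```

A reconnecting provider would call this from its `close` handler and only schedule a retry for the two reconnect actions.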

WebSocket Error Messages

Sent as JSON right before closing (when applicable):

Internal HTTP Errors

Status	Code	When	Response
`200`		Success (all endpoints except document creation)	Typed response body
`201`		Document created	`POST /internal/documents` returns `201` on success
`400`	`COLLAB_INVALID_REQUEST`	Invalid request body (missing roomId, bad format)	`ErrorResponse`
`404`	`COLLAB_DOCUMENT_NOT_FOUND`	No document exists for this room	`ErrorResponse`
`409`	`COLLAB_DOCUMENT_ALREADY_EXISTS`	Document already exists (create)	`ErrorResponse`
`500`	`COLLAB_INTERNAL_ERROR`	Unexpected internal error	`ErrorResponse`
