
Execution-Plane

Queue-driven sandbox worker for code execution, test-case evaluation, and static analysis.

Overview#

The execution-plane is a standalone NestJS application context with no HTTP server. It pulls jobs off BullMQ queues, runs user code in an isolated sandbox, and pushes results back to a result queue. The control-plane caches those results in Redis (24h TTL) and serves them to the frontend via polling endpoints.
It processes up to 5 concurrent executions per worker instance. Sandbox backends are swappable through the ISandboxProvider port, and the whole thing is wired into OpenTelemetry for traces, metrics, and logs.

Architecture#

A circuit breaker on the control-plane side guards against Redis connectivity issues.

Queue Contracts#

Run Code#

Queue names:
| Queue | Direction | Purpose |
|---|---|---|
| execution.run-code | Control-plane --> Execution-plane | Job requests |
| execution.run-code.results | Execution-plane --> Control-plane | Job results |
Sequence:
Job payload (existing contract):
Result payload (existing contract):

Submit Code#

The control-plane owns the fan-out. For N test cases, it enqueues N RunCodeRequest jobs on the same execution.run-code queue, each with the test case's stdin injected. The execution-plane has no concept of submissions or test cases; it just runs code. Aggregation and verdict computation happen in the control-plane.
Queue names: Same as Run Code (execution.run-code / execution.run-code.results). No new queues needed.
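The fan-out described above can be sketched as a pure function that turns one submission into N run-code jobs. This is a minimal illustration, not the actual contract: the field names on `RunCodeRequest`, the `Submission` and `TestCase` shapes, and the `submissionId:testCaseId` job id scheme are all assumptions.

```typescript
// Hypothetical shapes: the real RunCodeRequest contract is not reproduced here.
interface RunCodeRequest {
  jobId: string;
  language: string;
  code: string;
  stdin: string;
  timeoutMs: number;
}

interface TestCase {
  id: string;
  stdin: string;
}

interface Submission {
  submissionId: string;
  language: string;
  code: string;
  timeoutMs: number;
}

// Build one RunCodeRequest per test case. Every job targets the same
// execution.run-code queue; only stdin (and the job id) differs.
function fanOut(submission: Submission, testCases: TestCase[]): RunCodeRequest[] {
  return testCases.map((tc) => ({
    jobId: `${submission.submissionId}:${tc.id}`, // assumed id scheme
    language: submission.language,
    code: submission.code,
    stdin: tc.stdin,
    timeoutMs: submission.timeoutMs,
  }));
}

const jobs = fanOut(
  { submissionId: "sub-1", language: "python", code: "print(input())", timeoutMs: 5000 },
  [
    { id: "tc-1", stdin: "1\n" },
    { id: "tc-2", stdin: "2\n" },
  ],
);
console.log(jobs.length); // one job per test case
```

Because the execution-plane only ever sees individual jobs, any mapping from job ids back to a submission would have to live in the control-plane.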
Sequence:
Submission result:

Static Analysis#

Same sandbox infrastructure, but runs linters instead of user code. Covers lint issues, cyclomatic/cognitive complexity, and duplication detection.
Queue names:
| Queue | Direction | Purpose |
|---|---|---|
| execution.analyze | Control-plane --> Execution-plane | Analysis job requests |
| execution.analyze.results | Execution-plane --> Control-plane | Analysis results |
Sequence:
Job schemas:
AnalyzeCodeResult is the internal queue payload. The HTTP response at GET /rooms/:roomId/analyze/:jobId follows the same polymorphic pattern as execution results: { status: 'queued' | 'running' } while pending, full result fields when done. The control-plane reads the queue payload's status to decide which shape to return.
Linter selection by language:
| Language | Linter | Complexity Tool |
|---|---|---|
| Python | ruff | radon |
| JavaScript / TypeScript | biome (lint mode) | escomplex |
| Java | checkstyle | PMD |
| C / C++ | cppcheck | lizard |
| Go | golangci-lint | gocyclo |
| Rust | clippy | (built-in) |

Job Schemas#

All job and result types across queues, in one place.

Run Code and Execution Client#

Static Analysis and Submission Aggregation#

Sandbox Providers#

Code execution is delegated to whatever ISandboxProvider is wired up in the infrastructure module.
Port interface:
Internal execution types (processor-to-sandbox, separate from the contract types):

Available Implementations#

| Provider | Description |
|---|---|
| E2bSandboxAdapter | E2B Code Interpreter cloud sandboxes. One sandbox per execution, killed on completion. |
| DockerSandboxAdapter | Local Docker containers with per-language images. Needed for languages E2B does not support (C, Go, Rust). |
| KataSandboxAdapter | Kata Containers for stronger isolation in production. |
The active provider is selected in the infrastructure module via the ISandboxProvider DI binding.
Open question: once KataSandboxAdapter is implemented, it could replace the other providers.

Supported Languages#

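| Language | Identifier | E2B Support | Docker (planned) | Kata (planned) |
|---|---|---|---|---|
| Python | python | Yes | Planned | Planned |
| JavaScript | javascript | Yes | Planned | Planned |
| TypeScript | typescript | Yes | Planned | Planned |
| Java | java | Yes | Planned | Planned |
| C++ | cpp | Yes | Planned | Planned |
| C | c | No | Planned | Planned |
| Go | go | No | Planned | Planned |
| Rust | rust | No | Planned | Planned |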
The canonical list is defined as SUPPORTED_LANGUAGES in the shared package. E2B currently covers Python, JavaScript, TypeScript, Java, and C++. Jobs for unsupported languages fail permanently (no retry).

Resource Limits and Timeouts#

Per-Execution Limits#

| Resource | Default | Maximum | Enforcement |
|---|---|---|---|
| Wall-clock timeout | 30 seconds | 5 minutes (300,000 ms) | Sandbox-level kill |
| Memory | 128 MB | 1,024 MB | Sandbox-level OOM kill |
| Output size | Unlimited | Sandbox-dependent | outputTruncated flag set when exceeded |
| CPU time | No limit | Bounded by wall-clock timeout | Reported via cpuTimeMs |
Two layers of limits are in play. The control-plane caps requests at the HTTP boundary (default timeout 5s, max 30s; default memory 256 MB, max 512 MB). The execution-plane accepts wider ranges (max timeout 300s, max memory 1024 MB) as a safety net. The control-plane limits are what users actually hit.
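A minimal sketch of how the two layers relate, using the numbers above. The function names and the `RequestedLimits` shape are hypothetical, and the sketch assumes the control-plane clamps out-of-range values; it may instead reject them with a validation error.

```typescript
// Limits quoted in this section.
const CONTROL_PLANE_LIMITS = {
  timeoutMs: { default: 5_000, max: 30_000 },
  memoryMb: { default: 256, max: 512 },
} as const;

const EXECUTION_PLANE_MAX = { timeoutMs: 300_000, memoryMb: 1_024 } as const;

// Hypothetical request shape: both fields are optional at the HTTP boundary.
interface RequestedLimits {
  timeoutMs?: number;
  memoryMb?: number;
}

interface ResolvedLimits {
  timeoutMs: number;
  memoryMb: number;
}

// First layer: apply control-plane defaults, then cap at the control-plane max.
// Assumption: values are clamped rather than rejected.
function resolveAtHttpBoundary(req: RequestedLimits): ResolvedLimits {
  return {
    timeoutMs: Math.min(
      req.timeoutMs ?? CONTROL_PLANE_LIMITS.timeoutMs.default,
      CONTROL_PLANE_LIMITS.timeoutMs.max,
    ),
    memoryMb: Math.min(
      req.memoryMb ?? CONTROL_PLANE_LIMITS.memoryMb.default,
      CONTROL_PLANE_LIMITS.memoryMb.max,
    ),
  };
}

// Second layer: the execution-plane's wider safety net.
function withinExecutionPlaneLimits(limits: ResolvedLimits): boolean {
  return (
    limits.timeoutMs <= EXECUTION_PLANE_MAX.timeoutMs &&
    limits.memoryMb <= EXECUTION_PLANE_MAX.memoryMb
  );
}

const resolved = resolveAtHttpBoundary({ timeoutMs: 120_000, memoryMb: 2_048 });
console.log(resolved, withinExecutionPlaneLimits(resolved));
```

Anything that passes the first layer is always inside the second, which is why the control-plane limits are the ones users actually hit.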

Queue-Level Configuration#

| Setting | Value | Purpose |
|---|---|---|
| Concurrency | 5 | Max simultaneous executions per worker instance |
| Lock duration | 330,000 ms (max timeout 300,000 ms + 30,000 ms safety margin) | Must exceed max timeout to prevent stalled-job false positives |
| Stalled interval | 5 sec (default) | How often BullMQ checks for stalled jobs |
| Shutdown timeout | 30 sec (default) | Graceful shutdown wait for in-flight jobs |
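The settings above could be expressed as a plain options object. This is a sketch only: the constant names are made up for illustration, and the option keys (`concurrency`, `lockDuration`, `stalledInterval`) are chosen to match BullMQ's worker options rather than copied from the actual worker setup.

```typescript
const MAX_TIMEOUT_MS = 300_000; // execution-plane maximum wall-clock timeout
const LOCK_SAFETY_MARGIN_MS = 30_000;

// Keys mirror BullMQ worker options; how they are passed to the worker
// in the real module is not shown in this document.
const workerOptions = {
  concurrency: 5,
  // The lock must outlive the longest possible job, otherwise BullMQ
  // could flag a still-running job as stalled.
  lockDuration: MAX_TIMEOUT_MS + LOCK_SAFETY_MARGIN_MS,
  stalledInterval: 5_000,
};

// Hypothetical name for the graceful-shutdown wait.
const SHUTDOWN_TIMEOUT_MS = 30_000;

console.log(workerOptions.lockDuration, SHUTDOWN_TIMEOUT_MS);
```

Deriving the lock duration from the maximum timeout, instead of hard-coding 330,000, keeps the two values from drifting apart if the maximum timeout changes.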

Rate Limits (enforced by control-plane)#

| Scope | Limit | Window |
|---|---|---|
| Code execution (run) | 10 requests | 1 minute per user |
| Code submission (submit) | 3 requests | 1 minute per user |
Violations return 429 Too Many Requests with a Retry-After header.
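To illustrate how a per-user limit and a Retry-After value fit together, here is a small in-memory sketch. It assumes a fixed-window counter; the document does not say which algorithm or store the control-plane uses, and `FixedWindowLimiter` is a hypothetical name.

```typescript
type Scope = "run" | "submit";

interface RateLimitRule {
  limit: number;
  windowMs: number;
}

// Limits from the table above.
const RULES: Record<Scope, RateLimitRule> = {
  run: { limit: 10, windowMs: 60_000 },
  submit: { limit: 3, windowMs: 60_000 },
};

type Decision =
  | { allowed: true }
  | { allowed: false; status: 429; retryAfterSeconds: number };

// Assumption: fixed window per (scope, user), kept in process memory.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  check(userId: string, scope: Scope, nowMs: number): Decision {
    const rule = RULES[scope];
    const key = `${scope}:${userId}`;
    const current = this.windows.get(key);

    // No window yet, or the previous window has expired: start a new one.
    if (!current || nowMs - current.start >= rule.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return { allowed: true };
    }
    if (current.count < rule.limit) {
      current.count += 1;
      return { allowed: true };
    }
    // Over the limit: tell the client how long until the window resets.
    const retryAfterSeconds = Math.ceil((current.start + rule.windowMs - nowMs) / 1000);
    return { allowed: false, status: 429, retryAfterSeconds };
  }
}
```

The `retryAfterSeconds` value is what would populate the Retry-After header on the 429 response.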

Result Caching#

Results are cached in Redis by the control-plane so the frontend can poll for them.
| Parameter | Value |
|---|---|
| Cache key format | exec-result:{jobId} |
| TTL | 24 hours |
| Storage | Redis (via ICacheService / RedisCacheAdapter) |
| Write | Control-plane result listener, on receiving a result from the result queue |
| Read | GET /execution/:jobId endpoint |
Polling behavior:
1. GET /execution/:jobId checks the cache first.
2. Cache miss: falls back to IExecutionClient.getJobStatus() to return the queue status (queued | running).
3. Job not found at all: 404 Not Found (EXECUTION_JOB_NOT_FOUND).
Polymorphic response:
- Pending: { status: 'queued' } or { status: 'running' }
- Done: Full RunCodeResult with status: 'completed' | 'failed'
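The polling decision and the polymorphic response can be sketched as a pure function over two already-fetched values: the cache lookup and the queue status. The `RunCodeResult` fields other than `status`, the 404 body shape, and the function names are assumptions made for illustration.

```typescript
type QueueStatus = "queued" | "running";

// Hypothetical subset of RunCodeResult; the full contract is not shown here.
interface RunCodeResult {
  status: "completed" | "failed";
  stdout: string;
  durationMs: number;
}

type PollResponse =
  | { httpStatus: 200; body: { status: QueueStatus } | RunCodeResult }
  | { httpStatus: 404; body: { code: "EXECUTION_JOB_NOT_FOUND" } };

const RESULT_TTL_SECONDS = 24 * 60 * 60; // 24 hours

function cacheKey(jobId: string): string {
  return `exec-result:${jobId}`;
}

// cached: value stored under cacheKey(jobId), or null on a cache miss.
// queueStatus: what IExecutionClient.getJobStatus() reported, or null if
// the queue does not know the job either.
function resolvePoll(
  cached: RunCodeResult | null,
  queueStatus: QueueStatus | null,
): PollResponse {
  if (cached) return { httpStatus: 200, body: cached }; // done
  if (queueStatus) return { httpStatus: 200, body: { status: queueStatus } }; // pending
  return { httpStatus: 404, body: { code: "EXECUTION_JOB_NOT_FOUND" } };
}

console.log(cacheKey("job-42"), resolvePoll(null, "running"));
```

A frontend can treat any `status` other than `queued` or `running` as terminal and stop polling.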

Error Handling#

Validation Errors (permanent failures)#

The processor validates each job before touching the sandbox. Validation failures are permanent (no retry), and a failed result is published immediately.
| Condition | Error Message |
|---|---|
| Unsupported language (not in SUPPORTED_LANGUAGES) | Unsupported language: {language} |
| Sandbox does not support language | Sandbox does not support language: {language} |
| Empty code | Code cannot be empty |
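The three checks can be sketched as a single validation function that returns the error message for a permanent failure, or null when the job may proceed. The order of the checks, the `Job` shape, and treating whitespace-only code as empty are assumptions; the language lists are taken from the Supported Languages table.

```typescript
// Identifiers from the Supported Languages table.
const SUPPORTED_LANGUAGES = new Set<string>([
  "python", "javascript", "typescript", "java", "cpp", "c", "go", "rust",
]);

// Languages the E2B provider currently covers, per the same table.
const E2B_LANGUAGES = new Set<string>([
  "python", "javascript", "typescript", "java", "cpp",
]);

// Hypothetical minimal job shape for validation purposes.
interface Job {
  language: string;
  code: string;
}

// Returns the permanent-failure message, or null if the job is valid.
function validateJob(job: Job, sandboxSupports: (language: string) => boolean): string | null {
  if (!SUPPORTED_LANGUAGES.has(job.language)) {
    return `Unsupported language: ${job.language}`;
  }
  if (!sandboxSupports(job.language)) {
    return `Sandbox does not support language: ${job.language}`;
  }
  // Assumption: whitespace-only code counts as empty.
  if (job.code.trim().length === 0) {
    return "Code cannot be empty";
  }
  return null;
}

const e2bSupports = (language: string) => E2B_LANGUAGES.has(language);
console.log(validateJob({ language: "go", code: "package main" }, e2bSupports));
```

Running validation before any sandbox is created means an invalid job never consumes sandbox capacity.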

Sandbox Errors#

Most errors are caught inside the E2B adapter and returned as status: 'failed'. These are permanent failures with no retry. This covers bad user code, runtime exceptions, and runCode failures.
The one exception: sandbox creation failures (Sandbox.create()). These propagate to BullMQ and trigger retries (3 attempts, exponential backoff, 1s base delay). Retry policy is set by the control-plane at enqueue time, not by the processor. These failures are transient (E2B API outage, network timeout, etc.).
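As a rough sketch, the retry policy described above could be attached at enqueue time with options like the following. The option keys follow BullMQ's job options (`attempts`, `backoff.type`, `backoff.delay`); the helper that lists the delays is illustrative and assumes the delay doubles from the base on each retry with no jitter.

```typescript
// Job options the control-plane would pass when enqueueing (sketch).
const runCodeJobOptions = {
  attempts: 3, // 1 initial attempt + up to 2 retries
  backoff: { type: "exponential", delay: 1_000 }, // 1s base delay
};

// Delay before each retry, assuming doubling from the base delay.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}

console.log(
  retryDelaysMs(runCodeJobOptions.attempts, runCodeJobOptions.backoff.delay),
);
```

Under these assumptions a sandbox creation failure is retried after roughly 1 s and then 2 s before the job is marked failed.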

Timeout Handling#

When execution exceeds timeoutMs:
1. The sandbox kills the process.
2. Result comes back as status: 'failed'.
3. timedOut is set by heuristic: durationMs >= timeoutMs. Works when the sandbox throws on timeout, but can be false if the process is killed silently.
4. durationMs reflects actual elapsed time.
With E2B, timedOut: true is best-effort because it relies on runCode throwing and elapsed time meeting timeoutMs. If the sandbox kills the process without throwing (just returns empty output), timedOut stays false. Docker and Kata providers should have more reliable timeout signaling.
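The heuristic can be written down in a few lines. `SandboxOutcome` and `inferTimedOut` are hypothetical names; the sketch only captures the rule stated above (the call threw and the elapsed time reached the timeout), not the adapter's actual code.

```typescript
// Hypothetical summary of what the adapter observed for one execution.
interface SandboxOutcome {
  threw: boolean; // did the sandbox call throw?
  durationMs: number; // measured elapsed time
}

// Best-effort timeout detection: both conditions must hold.
function inferTimedOut(outcome: SandboxOutcome, timeoutMs: number): boolean {
  return outcome.threw && outcome.durationMs >= timeoutMs;
}

// Throwing at the deadline is reported as a timeout.
console.log(inferTimedOut({ threw: true, durationMs: 5_003 }, 5_000));
// A silent kill (no throw, empty output) is not detected.
console.log(inferTimedOut({ threw: false, durationMs: 5_003 }, 5_000));
```

The second call shows the blind spot: the process was killed at the deadline, but without a thrown error the flag stays false.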

Circuit Breaker (control-plane side)#

IExecutionClient is wrapped with a circuit breaker proxy. When the circuit opens:
- POST /rooms/:id/run: 503 (error propagates directly from runCode).
- POST /rooms/:id/submit: per-test-case errors in the response array (Promise.all with .catch() per job, never 503).
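The submit-side behavior (Promise.all with a .catch() per job) can be sketched as follows. The `EnqueueOutcome` shape and the `enqueueAll` name are assumptions; `runCode` stands in for the circuit-breaker-wrapped client call.

```typescript
interface RunCodeRequest {
  jobId: string;
}

// Hypothetical per-test-case entry in the submit response array.
type EnqueueOutcome =
  | { jobId: string; ok: true }
  | { jobId: string; ok: false; error: string };

// Each job gets its own .catch(), so one rejected call (for example an
// open circuit) becomes an error entry instead of rejecting Promise.all.
async function enqueueAll(
  jobs: RunCodeRequest[],
  runCode: (job: RunCodeRequest) => Promise<void>,
): Promise<EnqueueOutcome[]> {
  return Promise.all(
    jobs.map((job) =>
      runCode(job)
        .then((): EnqueueOutcome => ({ jobId: job.jobId, ok: true }))
        .catch((err: unknown): EnqueueOutcome => ({
          jobId: job.jobId,
          ok: false,
          error: err instanceof Error ? err.message : String(err),
        })),
    ),
  );
}
```

By contrast, the run endpoint awaits a single call with no per-job catch, which is why an open circuit surfaces there as a 503.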

Observability#

Telemetry Stack#

| Signal | Exporter | Destination |
|---|---|---|
| Traces | OTLP/HTTP | Tempo (via OTEL_EXPORTER_OTLP_ENDPOINT) |
| Metrics | OTLP/HTTP (60s interval) | Prometheus (via OTLP receiver) |
| Logs | pino-opentelemetry-transport | Loki (via OTLP receiver) |

Key Metrics#

| Metric | Type | Labels | Purpose |
|---|---|---|---|
| Queue depth (waiting + active) | Gauge | queue | Capacity monitoring |
| Job processing duration | Histogram | queue, language, status | Latency tracking |
| Job failure rate | Counter | queue, language, error_type | Reliability monitoring |
| Language distribution | Counter | language | Usage analytics |
| Sandbox creation latency | Histogram | provider | Provider health |
| Active sandboxes | Gauge | provider | Concurrency monitoring |

Environment Variables#

| Variable | Required | Default | Description |
|---|---|---|---|
| REDIS_URL | No | redis://localhost:6379 | Redis connection for BullMQ |
| E2B_API_KEY | Yes | -- | E2B Code Interpreter API key |
| OTEL_EXPORTER_OTLP_ENDPOINT | No | -- | OTLP endpoint (disables telemetry if unset) |
| NODE_ENV | No | development | Environment mode |
Modified at 2026-03-12 05:26:10