AI-powered interview assistant: generates hints, performs code reviews, and conducts interactive AI interviews.
## Overview
The AI-plane is a standalone NestJS application context (`NestFactory.createApplicationContext()`) with no HTTP server. It pulls jobs from BullMQ queues, calls LLM providers, and pushes results back through paired result queues. The control-plane is the only component that talks to it, and only via Redis.

The AI-plane handles three job types: interview responses (conversation plus follow-ups), hint generation, and code review.

## Architecture
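The pairing between job queues and result queues follows a simple naming convention. A sketch of that convention (the helper name is illustrative, not from the codebase):

```typescript
// Each BullMQ job queue is paired with a result queue named by appending
// ".results". The helper name `resultQueueOf` is illustrative.
const JOB_QUEUES = [
  "ai.interview-response",
  "ai.generate-hint",
  "ai.review-code",
] as const;

function resultQueueOf(queue: string): string {
  return `${queue}.results`;
}

// e.g. "ai.interview-response.results", "ai.generate-hint.results", ...
const RESULT_QUEUES = JOB_QUEUES.map(resultQueueOf);
```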
## Queue Contracts
### AI Interview Response
Covers AI interviewer conversation and code-aware follow-up questions.

| Property | Value |
|---|---|
| Job queue | `ai.interview-response` |
| Result queue | `ai.interview-response.results` |
| Concurrency | 3 |
| Job kind tag | `ai:interview` |
#### HTTP request body
What the frontend sends to `POST /rooms/:roomId/ai/message`.

#### BullMQ job payload
What the control-plane enqueues after enrichment.

#### Result payload
`weaknessSignals` is internal to the control-plane's weakness-aggregation pipeline; it is not returned in HTTP polling responses.
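As a rough sketch of that result shape: only `difficulty`, `audioUrl`, and `weaknessSignals` are named elsewhere on this page, so the remaining fields below are hypothetical, as is the stripping helper.

```typescript
// Hypothetical sketch of the interview response result. Only
// `difficulty`, `audioUrl`, and `weaknessSignals` are named in this
// doc; the other fields are illustrative.
interface InterviewResponseResult {
  message: string;                        // AI interviewer's reply text
  difficulty: "easy" | "medium" | "hard"; // tier the AI is targeting
  audioUrl?: string;                      // presigned TTS audio URL, if TTS enabled
  weaknessSignals: string[];              // internal-only, e.g. "edge_cases"
}

// The control-plane strips internal fields before serving polling responses.
function toHttpResponse(result: InterviewResponseResult) {
  const { weaknessSignals, ...publicFields } = result;
  return publicFields;
}
```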
#### Sequence diagram
### Generate Hint
Covers layered hint generation for candidates.

| Property | Value |
|---|---|
| Job queue | `ai.generate-hint` |
| Result queue | `ai.generate-hint.results` |
| Concurrency | 5 |
| Job kind tag | `ai:hint` |
#### HTTP request body
What the frontend sends to `POST /rooms/:roomId/ai/hint`.

#### BullMQ job payload

What the control-plane enqueues after enrichment.

#### Result payload
#### Sequence diagram
### Code Review
Covers structured evaluation reports, improvement suggestions, and evidence-based scoring.

| Property | Value |
|---|---|
| Job queue | `ai.review-code` |
| Result queue | `ai.review-code.results` |
| Concurrency | 3 |
| Job kind tag | `ai:review` |
#### HTTP request body
What the frontend sends to `POST /rooms/:roomId/ai/review`.

#### BullMQ job payload

What the control-plane enqueues after enrichment.

#### Result payload
Typed categories, line-level suggestions, and evidence-based scoring. As with interview responses, `weaknessSignals` is internal only and stripped from HTTP polling responses.
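A sketch of what such a review result might look like. The category names mirror the review dimensions mentioned on this page (correctness, readability, edge cases); the field names and the scoring helper are assumptions.

```typescript
// Hypothetical sketch of a code-review result. Categories mirror the
// review dimensions this page mentions; field names are illustrative.
type ReviewCategory = "correctness" | "readability" | "edge_cases";

interface LineSuggestion {
  line: number;             // 1-based line in the submitted code
  category: ReviewCategory;
  comment: string;          // improvement suggestion with evidence
}

interface CodeReviewResult {
  suggestions: LineSuggestion[];
  scores: Record<ReviewCategory, number>; // evidence-based, e.g. 0-10
  weaknessSignals: string[];              // internal-only, stripped from HTTP
}

// Overall score as a simple average of category scores (illustrative).
function overallScore(r: CodeReviewResult): number {
  const values = Object.values(r.scores);
  return values.reduce((a, b) => a + b, 0) / values.length;
}
```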
#### Sequence diagram
## Conversation History
Served by the control-plane directly from PostgreSQL; no AI-plane involvement.

| Property | Value |
|---|---|
| Endpoint | `GET /rooms/:roomId/ai/messages` |
| Auth | Bearer (requires `code:view` room capability) |
| Pagination | Cursor-based (`cursor`, `limit` default 50) |
Individual messages come through the interview response queue. The control-plane persists them and serves history from the DB.
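Cursor-based pagination of this kind is typically implemented by encoding the last-seen row's sort key. A minimal in-memory sketch; the cursor format (the last message id) is an assumption, not the control-plane's actual encoding:

```typescript
// Minimal sketch of cursor pagination over an ordered message list.
// An array stands in for an indexed DB query; the cursor encoding
// (the last-seen message id as a string) is assumed.
interface Message { id: number; role: "user" | "ai"; text: string }

function pageMessages(
  all: Message[],
  cursor?: string,  // opaque cursor from the previous page
  limit = 50,       // default page size from the table above
): { items: Message[]; nextCursor?: string } {
  const after = cursor ? Number(cursor) : -Infinity;
  const items = all
    .filter((m) => m.id > after)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
  const last = items[items.length - 1];
  // Only hand back a cursor when a full page was returned.
  const nextCursor = items.length === limit && last ? String(last.id) : undefined;
  return { items, nextCursor };
}
```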
## AI Capabilities by Room Mode
| Capability | AI-mode rooms | Peer-mode rooms |
|---|---|---|
| AI interview conversation | Yes | No |
| AI follow-up questions | Yes | No |
| Adaptive difficulty | Yes | No |
| TTS voice output | Yes | No |
| STT voice input | Yes | No |
| Hint generation | Yes | Yes |
| Code review | Yes | Yes (post-session) |
| Weakness signal emission | Yes | Yes (from reviews) |
In AI-mode, the AI runs the interview. In peer-mode, hints and reviews are supplementary tools for the human participants.

## Adaptive Difficulty
Question difficulty adjusts per-session based on candidate performance. Each interview response job analyzes conversation history and code quality to decide the next difficulty tier.

Difficulty signals (inputs):

| Signal | Source | Indicates |
|---|---|---|
| Code correctness | Test case pass rate from execution results | Solution quality |
| Response time | Timestamps between messages | Comfort with topic |
| Hint usage | Count of hint requests in session | Struggle level |
| Conversation depth | Number of follow-ups without resolution | Difficulty |
| Code complexity | Cyclomatic complexity, nesting depth | Solution sophistication |
Difficulty adjustments (outputs):

| Current performance | Next question difficulty | Follow-up type |
|---|---|---|
| Solving quickly, no hints | `hard`: deeper algorithmic questions | `question` |
| Moderate pace, few hints | `medium`: standard follow-ups | `question` or `evaluation` |
| Struggling, multiple hints | `easy`: simpler sub-problems | `hint` or `encouragement` |
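The adjustment rules above could be sketched as a decision function. The threshold values below are invented for illustration; the document only specifies the qualitative rules, and in practice these signals feed the LLM rather than hard-coded branches.

```typescript
// Illustrative mapping from performance signals to the next difficulty
// tier. Threshold values are invented; only the qualitative rules come
// from the adjustments table.
type Difficulty = "easy" | "medium" | "hard";

interface PerformanceSignals {
  hintsUsed: number;      // hint requests this session
  avgResponseSec: number; // average time between messages
  passRate: number;       // test-case pass rate, 0..1
}

function nextDifficulty(s: PerformanceSignals): Difficulty {
  // Struggling: multiple hints or low pass rate -> simpler sub-problems.
  if (s.hintsUsed >= 3 || s.passRate < 0.5) return "easy";
  // Solving quickly with no hints -> deeper algorithmic questions.
  if (s.hintsUsed === 0 && s.passRate >= 0.9 && s.avgResponseSec < 60) return "hard";
  // Moderate pace, few hints -> standard follow-ups.
  return "medium";
}
```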
The `difficulty` field in `InterviewResponseResult` tells the frontend which tier the AI is targeting, so the UI can show progress indicators.

## Voice: TTS and STT
In AI-mode rooms, the AI interviewer communicates by voice. This involves two directions: text-to-speech (TTS) for the AI's spoken output, and speech-to-text (STT) for transcribing the candidate's voice input.

### TTS (AI speaks)
After the LLM generates a text response, the AI-plane sends it to a TTS provider, uploads the resulting audio to SeaweedFS with a presigned URL (1-hour expiry), and returns that URL as `audioUrl` in the job result. The frontend fetches and plays it.

The TTS provider is abstracted behind an `ITtsProvider` interface so implementations can be swapped without changing job-processing logic.

| Var | Description |
|---|---|
| `TTS_PROVIDER` | `openai` / `google` / `azure` / `none` |
| `TTS_VOICE` | Voice ID (provider-specific) |
| `TTS_AUDIO_FORMAT` | `mp3` / `ogg` (default: `mp3`) |
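The `ITtsProvider` abstraction might look roughly like this. Only the interface name appears on this page; the method shape and the null implementation are assumptions.

```typescript
// Sketch of the TTS provider abstraction. Only the name `ITtsProvider`
// comes from this page; the method shape is assumed.
interface TtsRequest {
  text: string;
  voice: string;          // TTS_VOICE
  format: "mp3" | "ogg";  // TTS_AUDIO_FORMAT
}

interface ITtsProvider {
  synthesize(req: TtsRequest): Promise<Uint8Array>; // raw audio bytes
}

// With TTS_PROVIDER=none, a null implementation lets job processing
// skip audio generation entirely.
class NoneTtsProvider implements ITtsProvider {
  async synthesize(): Promise<Uint8Array> {
    return new Uint8Array(0); // empty -> caller omits audioUrl
  }
}
```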
When `TTS_PROVIDER` is `none` or unset, audio generation is skipped and `audioUrl` is omitted from the result.

### STT (Candidate speaks)
The candidate speaks into their microphone. The browser captures audio via the Web Audio API / MediaRecorder, encodes it (Opus in WebM or raw PCM), and sends chunks to the control-plane. The control-plane forwards the audio to the AI-plane for transcription, and the resulting text is injected into the conversation as a user message before triggering the next interview response job.

| Property | Value |
|---|---|
| Job queue | `ai.transcribe` |
| Result queue | `ai.transcribe.results` |
| Concurrency | 5 |
| Job kind tag | `ai:transcribe` |
Like TTS, the STT provider is abstracted behind an `ISttProvider` interface.

| Var | Description |
|---|---|
| `STT_PROVIDER` | `openai` / `google` / `azure` / `none` |
| `STT_LANGUAGE` | Default language hint (BCP-47 tag, e.g., `en`) |
| `STT_MAX_AUDIO_SIZE_MB` | Maximum upload size (default: 25 MB, matching most provider limits) |
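A sketch of the STT abstraction and the upload-size guard implied by `STT_MAX_AUDIO_SIZE_MB`. Only the interface name and the env var come from this page; the method shape and helper are illustrative.

```typescript
// Sketch of the STT abstraction plus the upload-size guard. Only
// `ISttProvider` and STT_MAX_AUDIO_SIZE_MB are named on this page.
interface ISttProvider {
  transcribe(audio: Uint8Array, languageHint?: string): Promise<string>;
}

const DEFAULT_MAX_AUDIO_MB = 25; // STT_MAX_AUDIO_SIZE_MB default

function withinSizeLimit(sizeBytes: number, maxMb = DEFAULT_MAX_AUDIO_MB): boolean {
  return sizeBytes <= maxMb * 1024 * 1024;
}
```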
When `STT_PROVIDER` is `none` or unset, the voice-input endpoint returns `400` and voice input is unavailable; users type instead.

## Weakness Tracking
Cross-session weakness aggregation.

### Data flow
### How it works
The AI-plane tags interview and review results with `weaknessSignals`: short string identifiers like `edge_cases`, `time_complexity`, `off_by_one`. The control-plane's result consumer persists these to PostgreSQL, tied to the user and session. `GET /users/me/ai/weaknesses` aggregates across sessions.

Future enhancement: the AI-plane could receive the user's historical weaknesses as part of the job data, letting it probe known weak areas.

### Weakness categories
| Category | Description | Detected from |
|---|---|---|
| `edge_cases` | Missing boundary/edge-case handling | Code review, follow-up questions |
| `time_complexity` | Suboptimal algorithmic complexity | Code review, interview discussion |
| `space_complexity` | Excessive memory usage | Code review |
| `variable_naming` | Poor variable/function naming | Code review (readability) |
| `code_structure` | Deeply nested or poorly organized code | Code review (readability) |
| `off_by_one` | Off-by-one errors in loops/indices | Code review (correctness) |
| `input_validation` | Missing null/empty/type checks | Code review (edge cases) |
| `communication` | Unclear explanation of approach | Interview conversation analysis |
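The cross-session aggregation behind `GET /users/me/ai/weaknesses` amounts to a count-and-rank over the persisted signals. An in-memory sketch; the real query is a `GROUP BY` in PostgreSQL:

```typescript
// In-memory sketch of the weakness aggregation; in production this is
// a GROUP BY over persisted rows in PostgreSQL.
interface WeaknessRow { sessionId: string; signal: string }

function aggregateWeaknesses(rows: WeaknessRow[]): { signal: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.signal, (counts.get(row.signal) ?? 0) + 1);
  }
  // Most frequent weaknesses first.
  return [...counts.entries()]
    .map(([signal, count]) => ({ signal, count }))
    .sort((a, b) => b.count - a.count);
}
```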
## Rate Limiting
Rate limits are enforced by the control-plane before jobs are enqueued.

| Scope | Limit | Window | Enforced at |
|---|---|---|---|
| AI hints | 3 requests | 5 min | Per user per room |
| AI messages | 20 requests | 1 min | Per user per room |
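Per-user, per-room limits like these are often enforced with a fixed-window counter. A minimal in-memory sketch; the control-plane's actual implementation (presumably Redis-backed) and key shape are assumptions.

```typescript
// In-memory fixed-window rate limiter sketch. The real enforcement is
// in the control-plane (likely Redis-backed); the key shape is assumed.
class FixedWindowLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(userId: string, roomId: string, now = Date.now()): boolean {
    const key = `${userId}:${roomId}`;
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count++;
      return true;
    }
    return false; // caller responds 429 with Retry-After
  }
}

// Limits from the table above: 3 hints / 5 min, 20 messages / 1 min.
const hintLimiter = new FixedWindowLimiter(3, 5 * 60_000);
const messageLimiter = new FixedWindowLimiter(20, 60_000);
```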
Over-limit requests get `429 Too Many Requests` with a `Retry-After` header; the job never hits the queue. The 429 body includes a user-facing message so the frontend can surface quota exhaustion clearly.

## Result Caching
The control-plane caches job results in Redis after consuming them from result queues.

| Property | Value |
|---|---|
| Cache key format | `ai-result:{jobId}` |
| TTL | 24 hours (86,400 seconds) |
| Storage | Redis (`ICacheService`) |
| Written by | Control-plane result queue consumer |
| Read by | Control-plane GET endpoint |
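The read side of this cache can be sketched as cache-first with a live queue-status fallback. Here a `Map` stands in for Redis/`ICacheService`, and the fallback function stands in for the `IAiClient` status calls; both are illustrative.

```typescript
// Cache-first read sketch: check the ai-result:{jobId} key, then fall
// back to live queue state. A Map stands in for Redis; the callback
// stands in for IAiClient job-status queries.
type JobState = "queued" | "running";
interface AiResult { status: "completed" | "failed"; payload?: unknown }

function getJobResult(
  cache: Map<string, AiResult>,
  queueStatus: (jobId: string) => JobState,
  jobId: string,
): AiResult | { status: JobState } {
  const cached = cache.get(`ai-result:${jobId}`);
  if (cached) return cached;
  return { status: queueStatus(jobId) };
}
```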
Same Redis instance as execution results and other caches.

Polling flow: the frontend polls the per-type endpoint (`GET /rooms/:roomId/ai/message/:jobId`, `GET /rooms/:roomId/ai/hint/:jobId`, or `GET /rooms/:roomId/ai/review/:jobId`) until it gets `completed` or `failed`. The control-plane checks the cache first; if nothing is cached yet, it queries BullMQ job status via `IAiClient.getHintJobStatus()` / `getReviewJobStatus()` / `getInterviewJobStatus()` and returns the queue state (`queued` | `running`).

## Error Handling
### LLM provider failures
| Error type | AI-plane behavior | Control-plane behavior |
|---|---|---|
| LLM API timeout | Job fails; BullMQ retries (exponential backoff) | Returns `queued` or `running` status to frontend |
| LLM API rate limit | Job fails with retryable error | Same as timeout |
| LLM API auth error | Job fails permanently (no retry) | Returns `failed` status with error message |
| Invalid LLM response | Logged; job fails and retries | Returns `failed` if retries exhausted |
| TTS failure | Logged; result returned without `audioUrl` | Transparent; result has no audio |
| STT failure | Job fails permanently (no retry; audio may be corrupted) | Returns `failed` status with error message |
### BullMQ retry configuration
| Setting | Value |
|---|---|
| Max attempts | 3 |
| Backoff type | Exponential |
| Backoff delay | 5000 ms (5 s, 10 s, 20 s) |
| Stall detection | 30 s lock duration |
| Dead letter | Jobs moved to DLQ after max attempts |
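With exponential backoff from a 5000 ms base, the delay doubles on each attempt, which yields the 5 s / 10 s / 20 s sequence above. A quick sketch of the arithmetic (the helper is illustrative; BullMQ computes this internally from the `backoff` job options):

```typescript
// Exponential backoff delays for the settings above: 5000 ms base,
// doubling per attempt -> 5 s, 10 s, 20 s across 3 attempts.
function backoffDelayMs(attempt: number, baseMs = 5000): number {
  // attempt is 1-based: 1 -> 5000, 2 -> 10000, 3 -> 20000
  return baseMs * 2 ** (attempt - 1);
}
```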
### Circuit breaker (control-plane side)
`IAiClient` is wrapped with a circuit breaker proxy. When the AI-plane is unresponsive:

- `CircuitBreakerOpenError` -> `503 Service Unavailable`
- `CircuitBreakerTimeoutError` -> `504 Gateway Timeout`
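This mapping might live in a small helper or NestJS exception filter. A sketch of the status translation; the error class names come from this page, but the helper itself (and its 500 default) is hypothetical:

```typescript
// Sketch of circuit-breaker error -> HTTP status translation. The
// error class names appear on this page; the helper is hypothetical.
class CircuitBreakerOpenError extends Error {}
class CircuitBreakerTimeoutError extends Error {}

function toHttpStatus(err: unknown): number {
  if (err instanceof CircuitBreakerOpenError) return 503;    // Service Unavailable
  if (err instanceof CircuitBreakerTimeoutError) return 504; // Gateway Timeout
  return 500; // anything else: generic server error (assumed default)
}
```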
`healthCheck()` bypasses the circuit breaker; it always makes a real call so the control-plane health endpoint can report AI-plane status accurately.

## Observability
OTel for traces and metrics; `pino-opentelemetry-transport` for structured log shipping.

### Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| `ai.queue.depth` | Gauge | `queue` | Jobs waiting in each queue |
| `ai.queue.active` | Gauge | `queue` | Jobs currently being processed |
| `ai.job.duration_ms` | Histogram | `queue`, `status` | End-to-end job processing time |
| `ai.llm.latency_ms` | Histogram | `provider`, `model` | LLM API call latency |
| `ai.llm.tokens.input` | Counter | `provider`, `model` | Input tokens consumed |
| `ai.llm.tokens.output` | Counter | `provider`, `model` | Output tokens generated |
| `ai.llm.failures` | Counter | `provider`, `error_type` | LLM API failures by type |
| `ai.tts.latency_ms` | Histogram | `provider` | TTS generation latency |
| `ai.tts.failures` | Counter | `provider` | TTS generation failures |
| `ai.stt.latency_ms` | Histogram | `provider` | STT transcription latency |
| `ai.stt.failures` | Counter | `provider` | STT transcription failures |
| `ai.stt.audio_duration_ms` | Histogram | `provider` | Duration of audio submitted for transcription |
### Tracing

Each job creates a span with:

- `ai.job.type`: `hint` / `review` / `interview` / `transcribe`
- Child spans for LLM calls, TTS generation, and STT transcription
### Logging

Structured logs via `nestjs-pino` with:

- Log level: `debug` in development, `info` in production
- OTel log shipping when `OTEL_EXPORTER_OTLP_ENDPOINT` is configured
Modified at 2026-03-12 05:26:10