How Listen Labs' Billboard Token Hack Was Built: A Technical Teardown for Engineers

2026-02-23
10 min read

Reverse-engineer Listen Labs' billboard hack: token encoding, challenge pipeline, scoring, and scaling — with code and architecture to reuse.

Why engineering hiring needs hacker-proof, scalable puzzles

Engineering teams are drowning in resumes and automated screenings that surface the same cookie-cutter profiles. Senior hiring managers and platform engineers tell us one thing: they want reliable, production-like signals of problem-solving under pressure — and they want them at scale. The Listen Labs billboard hiring stunt solved that problem publicly and virally in 2025. This teardown reverse-engineers how a production-grade gamified hiring system like that can be built in 2026, with reusable code, clear scoring rules, secure pipelines, and scaling design that teams can adapt for hiring 10 to 10,000 engineers.

The big-picture architecture

Before diving into token encoding and scoring, understand the event flow. A minimal, robust architecture for a billboard-style coding challenge looks like this:

Billboard -> Landing page -> Token decode -> Challenge portal -> Submission webhook -> Queue -> Sandbox workers -> Evaluator -> Scoring service -> ATS/web dashboard

Key components are a stateless public entry point, a signed token decoding service, a distributed evaluation pipeline, and a scoring/orchestration layer that writes to your ATS and triggers recruiter webhooks.

ASCII architecture diagram

+------------------+   GET   +--------------+   POST   +--------------+
|  Billboard QR/   |-------->|  Landing     |--------->|  Submission  |
|  token string    |         |  Page/Client |          |  Webhook     |
+------------------+         +--------------+          +------+-------+
                                                              |
                                                              v
                                                    +--------------------+
                                                    |  Message Queue     |
                                                    | (SQS/Kafka/Rabbit) |
                                                    +---------+----------+
                                                              |
                                        +---------------------+----------------------+
                                        |                                            |
                                        v                                            v
                               +------------------+                         +------------------+
                               | Sandbox Worker   |                         | Async Evaluator  |
                               |  (Firecracker)   |                         |  (LLM + Tests)   |
                               +--------+---------+                         +--------+---------+
                                        |                                            |
                                        v                                            v
                               +------------------+                         +------------------+
                               | Results Storage  |<------------------------| Scoring Service  |
                               | (Postgres/S3)    |                         +--------+---------+
                               +------------------+                                  |
                                                                                     v
                                                                            +-----------------+
                                                                            | ATS / Dashboard |
                                                                            +-----------------+

Reverse-engineering the billboard token encoding

Listen Labs displayed five strings of numbers that decoded to a challenge. From an engineering perspective, the ideal public-facing token must be:

  • Compact and visually distinct for offline media like billboards.
  • Opaque to prevent trivial guessing.
  • Verifiable server-side so you can safely map a token to a payload.
  • Human-readable if possible, or at least copyable from a phone.

A practical, reproducible token design

We propose a design that meets those requirements: payload JSON -> compressed -> base32 -> HMAC signature -> chunked display groups. This prints cleanly as short character groups on a billboard without leaking raw data.

1. payload = { id: 'campaign-berghain-2025', issued: 1700000000 }
2. bytes = gzip(payload)
3. token_core = base32(bytes)  // uppercase, no padding
4. sig = hmac_sha256(secret, token_core)
5. full = token_core + '.' + base32(sig[0..6])  // truncated signature tag
6. display = split_into_groups(full, 5)  // group into five short display blocks

Node.js encode/decode snippets

const zlib = require('zlib')
const crypto = require('crypto')
const base32 = require('hi-base32')

function encodeToken(payload, secret) {
  const bytes = zlib.gzipSync(Buffer.from(JSON.stringify(payload)))
  const core = base32.encode(bytes).replace(/=+$/, '').toUpperCase()
  const sig = crypto.createHmac('sha256', secret).update(core).digest()
  const shortSig = base32.encode(sig.slice(0, 6)).replace(/=+$/, '').toUpperCase()
  const full = `${core}.${shortSig}`
  return splitToGroups(full, 5)
}

function splitToGroups(s, parts) {
  const n = Math.ceil(s.length / parts)
  const groups = []
  for (let i = 0; i < parts; i++) groups.push(s.slice(i*n, (i+1)*n) || '')
  return groups.join(' ')
}

function decodeToken(display, secret) {
  const full = display.replace(/\s+/g, '')
  const [core, shortSig] = full.split('.')
  const expected = crypto.createHmac('sha256', secret).update(core).digest()
  const expectedShort = base32.encode(expected.slice(0, 6)).replace(/=+$/, '').toUpperCase()
  // Constant-time comparison so signature bytes can't be guessed via timing
  if (!shortSig || shortSig.length !== expectedShort.length ||
      !crypto.timingSafeEqual(Buffer.from(shortSig), Buffer.from(expectedShort))) {
    throw new Error('Invalid signature')
  }
  const bytes = base32.decode.asBytes(core)
  return JSON.parse(zlib.gunzipSync(Buffer.from(bytes)).toString())
}

This pattern provides a signed, compact, and reversible token that can be printed as short character groups on a billboard. Listen Labs' real token may have used a different scheme entirely; the point is that this implementation is deterministic, verifiable, and easy to debug.

Server-side challenge pipeline

Once a candidate copies the token into the landing page, the real work begins. The pipeline must be resilient to high burst traffic, support asynchronous evaluation, and be secure against abuse. We recommend a serverless-first, queue-driven architecture that separates front-line traffic from heavy compute.

Event flow and components

  • Landing page: minimal client that decodes token locally for UX but always validates server-side.
  • Submission webhook: POST endpoint that enqueues job metadata and returns a submission ID.
  • Message queue: SQS/Kafka/RabbitMQ to buffer evaluation requests.
  • Worker pool: autoscaling sandbox workers that run candidate code in a locked-down environment.
  • Evaluator: runs tests, gathers telemetry, invokes LLM-based analysis for heuristics, stores artifacts.
  • Scoring service: computes a score and triggers webhooks/ATS updates.

Example submission endpoint (Express + SQS)

const express = require('express')
const crypto = require('crypto')
const AWS = require('aws-sdk')
const SECRET = process.env.TOKEN_SECRET

const sqs = new AWS.SQS()
const app = express()
app.use(express.json())  // body-parser is bundled with Express 4.16+

app.post('/submit', async (req, res) => {
  try {
    const { token, code, language } = req.body
    const payload = decodeToken(token, SECRET)  // from the token section above
    const submissionId = crypto.randomUUID()
    const job = { submissionId, payload, code, language, submittedAt: Date.now() }
    await sqs.sendMessage({ QueueUrl: process.env.QUEUE_URL, MessageBody: JSON.stringify(job) }).promise()
    return res.status(202).json({ status: 'queued', submissionId })
  } catch (err) {
    return res.status(400).json({ error: 'invalid token' })
  }
})

app.listen(3000)

Return a lightweight response so client UX is fast. The heavy lifting happens asynchronously.
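
On the client side, the submission ID can drive a polling loop with capped exponential backoff. A minimal Python sketch; the status-endpoint contract and delay schedule here are assumptions for illustration, not part of Listen Labs' published system:

```python
def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield capped exponential backoff delays: 1, 2, 4, ... up to `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

def poll_for_result(fetch_status, max_attempts=10, sleep=None):
    """Poll `fetch_status()` until it reports a terminal status.

    `fetch_status` is any callable returning a dict like {'status': 'queued'|'done'|'failed'};
    `sleep` is injectable so the loop can be tested without real waiting.
    """
    sleep = sleep or (lambda seconds: None)
    for _attempt, delay in zip(range(max_attempts), backoff_delays()):
        result = fetch_status()
        if result.get('status') in ('done', 'failed'):
            return result
        sleep(delay)
    return {'status': 'timeout'}
```

Injecting `fetch_status` and `sleep` keeps the loop unit-testable and makes it trivial to swap in a real HTTP client later.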

Sandboxing and secure code execution

Evaluation must be realistic but safe. Modern approaches use lightweight virtualization or container sandboxes with strict resource limits and syscall filtering. In 2026, providers often use Firecracker microVMs, gVisor, or dedicated FaaS sandboxes optimized for running untrusted code at scale.

  • Resource limits: CPU, memory, filesystem size, network egress rules.
  • Timeouts: strict wall-clock and CPU-time limits to avoid DoS.
  • Artifact capture: store stdout, stderr, test traces, and code snapshot to S3/Postgres.
  • Language containers: prebuilt images for common languages to reduce cold-starts.

Worker pseudocode

loop:
  msg = queue.receive()
  job = parse(msg)
  spawn sandbox(workdir, limits)
  write job.code to workdir
  run tests with time and resource caps
  collect stdout, test results, coverage
  compute heuristics (plagiarism, runtime traces)
  upload artifacts to storage
  post results to scoring service
  ack queue message
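
The core evaluation step of that loop can be sketched in Python, with a subprocess timeout standing in for the microVM's resource caps. Queue and storage plumbing are omitted, and a bare subprocess is NOT a security boundary — in production the isolation must come from Firecracker or gVisor:

```python
import os
import subprocess
import sys
import tempfile

def evaluate_submission(code, timeout_s=5):
    """Run candidate code with a wall-clock cap and capture stdout/stderr.

    WARNING: this is a sketch. A subprocess shares the host kernel; real
    deployments wrap this step in a microVM or gVisor sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, 'solution.py')
        with open(path, 'w') as f:
            f.write(code)
        try:
            proc = subprocess.run(
                [sys.executable, path], capture_output=True, text=True,
                timeout=timeout_s, cwd=workdir)
            return {'status': 'ok', 'stdout': proc.stdout,
                    'stderr': proc.stderr, 'exit_code': proc.returncode}
        except subprocess.TimeoutExpired:
            # Runaway code is killed at the wall-clock cap to avoid DoS
            return {'status': 'timeout', 'stdout': '', 'stderr': '', 'exit_code': None}
```

The returned dict maps directly onto the artifact-capture step: store it alongside test traces before posting to the scoring service.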

Candidate scoring: deterministic, explainable, and auditable

Organizations want a ranking that maps to interview decisions and is defensible for compliance. Use a composite rubric combining automated test pass rates, performance metrics, robustness, and anti-cheat signals.

Suggested weighted rubric

  • Correctness (unit/integration tests): 50%
  • Performance (time and memory): 15%
  • Robustness (edge-case tests, input fuzzing): 10%
  • Code quality (linters, style): 10%
  • Originality / anti-cheat (plagiarism + AI-assist signals): 15%

Scoring function example (Python)

def compute_score(results):
    score = 0
    score += results['tests_passed_ratio'] * 50
    score += max(0, (1 - results['median_time_s'] / results['time_budget'])) * 15
    score += results['robustness_score'] * 10
    score += results['lint_score'] * 10
    score += (1 - results['plagiarism_risk']) * 15
    return round(score, 2)

Store the raw metrics and the final score so recruiters can audit any decision. In 2026, regulatory scrutiny on automated decision-making in hiring is higher, so produce evidence and human-review hooks.
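
One way to make that audit trail concrete is to persist the inputs next to the output so any score can be recomputed later. A sketch, with illustrative field names:

```python
import hashlib
import json

# Mirrors the weighted rubric above (percentages)
RUBRIC_WEIGHTS = {'correctness': 50, 'performance': 15, 'robustness': 10,
                  'code_quality': 10, 'originality': 15}

def build_audit_record(candidate_id, raw_metrics, score):
    """Bundle raw metrics, rubric weights, and the final score into one record.

    The checksum ties the score to the exact inputs and weights used,
    so a human reviewer can detect after-the-fact tampering.
    """
    body = {'candidate_id': candidate_id, 'raw_metrics': raw_metrics,
            'rubric_weights': RUBRIC_WEIGHTS, 'score': score}
    canonical = json.dumps(body, sort_keys=True)
    body['checksum'] = hashlib.sha256(canonical.encode()).hexdigest()
    return body
```

Storing the weights inside the record matters: if the rubric changes next quarter, old scores remain explainable under the rubric that produced them.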

Webhooks and real-time notifications

Once scoring is complete, notify downstream systems. Design a retry-safe webhook system and include signed payloads to protect integrity.

POST /webhook/recruiter
{ candidateId: 'abc', score: 87.2, artifactsUrl: 'https://s3/...', signature: '...' }

Server computes HMAC over payload and delivers with retries/backoff.

Use a backoff strategy and dead-letter queue for failed deliveries. For ATS integration, map score thresholds to recruiter tags and create triggers for human review for borderline cases.
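
The delivery logic above can be sketched as follows: HMAC-sign the canonical payload, retry with backoff, and park undeliverable payloads in a dead-letter store. `send` stands in for the actual HTTP call and is an assumption of this sketch:

```python
import hashlib
import hmac
import json

def sign_webhook(payload: dict, secret: bytes) -> str:
    """HMAC-SHA256 over the canonical JSON body; the receiver recomputes
    the digest and compares with hmac.compare_digest."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def deliver_with_retries(payload, secret, send, max_attempts=4, dead_letter=None):
    """Attempt delivery via `send(payload, signature)` (True on 2xx);
    after the final failure, hand the payload to the dead-letter store."""
    sig = sign_webhook(payload, secret)
    for attempt in range(max_attempts):
        if send(payload, sig):
            return True
        # Real code would sleep min(2 ** attempt, cap) seconds here
    if dead_letter is not None:
        dead_letter.append(payload)
    return False
```

Signing the canonical (sorted-keys) JSON avoids signature mismatches caused by key-order differences between sender and receiver serializers.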

Scaling considerations and cost control

Large public events can spike traffic by orders of magnitude. Plan for bursty concurrency and protect your infrastructure.

Traffic protection

  • Rate limiting at the edge and per-IP with progressive delays.
  • Captcha or device-fingerprint gating for large volumes from single clients.
  • WAF rules to block signature patterns used by scraping bots.
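
Per-client rate limiting at the application tier is commonly a token bucket; edge/WAF enforcement remains the first line of defense. A minimal sketch with an injectable clock for testability:

```python
import time

class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/second up to a
    `capacity` burst; each allowed request consumes one token."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate, self.capacity, self.now = rate, capacity, now
        self.tokens = capacity
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In production the bucket state lives in Redis or at the load balancer, keyed by IP or device fingerprint, with progressive delays layered on top.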

Compute scaling

  • Buffer with a message queue to decouple ingestion from evaluation.
  • Autoscale worker nodes based on queue length and worker latency.
  • Keep warm containers or microVM pools to reduce cold-start times for sandboxes.
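
Autoscaling on queue length can be a simple target-tracking calculation: size the pool so the current backlog drains within a target window. The thresholds below are illustrative:

```python
import math

def desired_workers(queue_depth, avg_job_seconds, target_drain_seconds,
                    min_workers=1, max_workers=200):
    """Return the worker count needed to drain the backlog in the target window,
    clamped to a [min, max] range to bound cost and cold-start churn."""
    if queue_depth <= 0:
        return min_workers
    needed = math.ceil(queue_depth * avg_job_seconds / target_drain_seconds)
    return max(min_workers, min(max_workers, needed))
```

A controller would evaluate this every 30-60 seconds and scale toward the result, with the warm pool absorbing the gap while new workers boot.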

Cost controls

  • Tiered evaluation: quick syntactic linting and unit tests first, deep analysis only for candidates who pass initial gates.
  • Use spot instances for non-critical batch evaluations.
  • Cap per-candidate evaluation time and retries.
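
Tiered evaluation is naturally a short-circuiting pipeline: cheap gates run first, and the expensive stage only runs when both pass. A sketch with each stage as an injected callable (the stage names are illustrative):

```python
def tiered_evaluate(submission, lint, quick_tests, deep_analysis):
    """Run cheap checks first; pay for deep analysis only when both gates pass.

    Each stage is a callable so the pipeline can be tested with stubs and
    wired to real linters, test runners, and LLM analysis in production.
    """
    if not lint(submission):
        return {'stage': 'lint', 'passed': False}
    if not quick_tests(submission):
        return {'stage': 'quick_tests', 'passed': False}
    return {'stage': 'deep', 'passed': True, 'report': deep_analysis(submission)}
```

Because most public-event submissions fail the first gate, this ordering typically cuts deep-analysis compute by an order of magnitude.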

Anti-cheat, plagiarism, and AI-assist detection

By 2026, candidates regularly use AI coding assistants. Good systems distinguish between acceptable AI-assisted work and copied solutions.

  • Plagiarism: use similarity checks against public repos and previously seen submissions with fuzzy matching and shingling.
  • AI-assist signals: detect sudden jumps in style, usage of very obscure function names, or improbable performance for the stated approach.
  • Time-series telemetry: if a candidate submits identical outputs in multiple languages or with matching exception traces, flag for review.

Example plagiarism heuristic

from difflib import SequenceMatcher

def similarity(a, b):
    # Order-sensitive, character-level similarity in [0, 1]
    return SequenceMatcher(None, a, b).ratio()

if similarity(candidate_code, known_repo_code) > 0.8:
    flag_plagiarism()  # placeholder hook: route to human review, never auto-reject

Pair automated signals with manual review workflows so recruiters can determine context (e.g., permissible use of libraries).
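
The anti-cheat list above also mentions shingling. A sketch of shingle-based Jaccard similarity, which tolerates statement reordering far better than the order-sensitive SequenceMatcher check:

```python
def shingles(code, k=5):
    """k-character shingles over whitespace-normalized code."""
    norm = ' '.join(code.split())
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}

def jaccard_similarity(a, b, k=5):
    """Jaccard overlap of shingle sets in [0, 1]; reordered statements
    still share most of their shingles, unlike sequence matching."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

At corpus scale, pair this with winnowing (keep only a hashed subset of shingles per document) so comparisons against millions of prior submissions stay cheap.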

UX and gamification patterns you can reuse

Gamified recruiting works when the challenge is fun, fair, and clearly tied to the job. Reuse these patterns:

  • Progressive reveal: show the next puzzle only after verification or a scoring threshold to prevent mass leaking.
  • Leaderboards: but make them ephemeral and privacy-aware to avoid doxxing.
  • Immediate feedback: provide unit-test results and hints to keep candidates engaged.
  • Multiple paths: allow both algorithmic and systems-focused tasks to surface different skill sets.

The 2026 context

Late 2025 and early 2026 saw three trends that make this design timely:

  • Event-driven hiring platforms became mainstream: webhooks, ephemeral sandboxes, and queue-driven evaluation are now best practices.
  • LLM function-calling and program synthesis are widely used for automated grading and code-understanding, but they require human-audited scoring to avoid bias.
  • Regulatory focus on automated hiring decisions means observable, auditable scoring pipelines with human review are essential.

Gamified hiring is not a gimmick: when well engineered, it gives reproducible, production-relevant signals that traditional interviews miss.

Operational checklist before you go public

  1. Validate token signing and decoding end-to-end with test data.
  2. Load-test landing page and submission endpoint with 10x expected peak.
  3. Provision warm sandbox capacity and test VM lifecycle at scale.
  4. Set up monitoring and alerts for queue depth, worker error rates, and evaluation latency.
  5. Define clear human-review flows and compliance logging for auditability.

Actionable takeaways

  • Use signed, compact tokens for offline media; base32 + HMAC is easy and robust.
  • Queue-driven evaluation decouples UX from heavy compute and protects your frontend.
  • Sandbox execution in microVMs or gVisor keeps candidate code safe while preserving realism.
  • Composite scoring that is auditable reduces bias and maps to hiring actions.
  • Plan for bursts and abuse — rate limiting, captcha gating, and warm pools are non-negotiable.

Final notes and ethical considerations

Billboard stunts are high-visibility experiments. Respect candidate privacy, avoid discriminatory signal weights, and make your automated decisions reviewable. Document scoring rules and retain artifacts for investigations. By 2026, transparency and fairness are both regulatory expectations and hiring best practices.

Call to action

If you are designing a gamified hiring pipeline, start with a short pilot: build a signed-token landing page, a submission webhook, and a small sandbox worker pool. Use the code snippets in this teardown as a blueprint and run a closed beta with internal engineers. Want a ready-made reference architecture and terraform modules that implement the patterns above? Contact us to get the Listen Labs-style kit for scalable, auditable, and fun engineering hiring.
