SEBY × HUDDLE TALENT · INTERNAL ACCOUNTABILITY DOC v1 · 27 MAY 2026

How we deliver Huddle Talent.

Two contractors who have never worked with Seby before. A foundation customer in exec search where confidentiality is the product. A Microsoft stack Seeda has deliberately not built MCPs for. This page is the contract between Michael, William, and Khoa. Read it once, then deliver against it.

Two things before you read anything else. Nothing gets built until the existing Seeda stack passes a fresh security, resilience, and scalability audit by William and Khoa (Section 00). And testing plus iteration is the work, not a phase that happens at the end. Every artefact assumes three to five loops to reach done. That is budgeted and expected (Section 06).

Customer · Huddle Talent (Cliff Wilson)
Rate · A$150 / hr
Owner · Michael Kingston
Build · William Nguyen + Khoa Do
Speed · TBC by Cliff (default Run)
Kill switch · End of Week 1
THE STACK · WHAT YOU'LL BE DEPLOYING

This is the stack running daily at Seeda right now. Huddle's build is an adaptation of this anatomy onto Cliff's Microsoft tenancy. Before you touch a line of code, you will know every engine, every worker, every input, every output.

The Seby Brain anatomy
00 · AUDIT BEFORE BUILD

Nothing ships until the foundations are checked.

Michael has built the entire Seeda automation stack as a general business operator, not as a trained engineer, and without professional external advice. The architecture works in production today, but it has never been audited by qualified outside eyes. Huddle is a paying customer in a confidentiality-critical industry. We do not build on top of an unverified foundation.

AUDIT 01 · Security

  • Map every secret in flight: where it lives, who can read it, how it rotates, and the audit trail on access
    1Password vaults, .env files, GH Actions secrets, CF tokens, Firebase service accounts
  • Verify the 5-tier privacy boundary actually holds: penetration-test cross-tier reads against the Seeda stack
    Khoa runs the test and documents every finding
  • Identify any path where an outbound action (Slack, Gmail, Twilio, Airwallex, Xero) can fire without passing the send-word gate
    If found: P0, fix before the Huddle build starts
  • Authorisation model review: Firebase auth, Cloudflare Access, App Check, repo-boundary check
    Are they correctly composed? Any bypass paths?
  • Data-at-rest review: which memory files contain L0/L1 content, where they sync, who has filesystem access
    dotclaude-memory, Google Drive transcripts, ~/seeda-private/

AUDIT 02 · Resilience

  • Failure-mode map for all 40+ scheduled tasks: what happens if each one silently breaks for 3 days?
    Charlie autonomy, Fathom pipeline, MRR truth, payroll preview
  • Backup and recovery: can the system be restored if dotclaude-memory, seeda-finance, or seeda-ops are lost?
    What are the RTO and RPO? Are they documented?
  • Dependency graph: which automations break if a single MCP, LaunchAgent, or API goes down?
    PostHog, Xero, Chargebee, Airwallex, Slack, Gmail, Calendar
  • Hook collision risk: do any of the pre-commit, PreToolUse, or Stop hooks contradict each other under load?
    Re-run the May 8 self-bite incident in a sandbox
  • Monitoring and alerting: when something breaks, who finds out, and how fast?
    Is there a single dashboard, or are failures invisible?

AUDIT 03 · Scalability

  • What breaks at 10 customers? At 50? At 200? Identify the first three bottlenecks
    Likely candidates: memory file size, MCP context, scheduled-task collisions
  • Multi-tenancy posture: can the Huddle stack be cleanly isolated from Seeda and from future customers?
    Per-venture working dirs work today; do they scale?
  • Cost model at 10x usage: Claude credits, CF Pages, Firebase, MCP-host fees
    Project the current ~A$1.2k/mo to 10x. Are the unit economics OK?
  • Code-quality review: identify the highest-risk skills and rules that need refactoring before extension
    Not aesthetic. Risk-prioritised.
  • Documentation gap analysis: what must a new engineer understand before contributing?
    The audit itself produces this list

Deliverable: a single audit report at team.seby.com.au/audit/seeda-stack-2026-05/ with severity-ranked findings, an owner, and a fix-or-accept decision per item. Michael reviews and accepts each finding. Anything tagged P0 blocks Huddle Pillar 1 ingest. Anything tagged P1 must have a remediation plan before Pillar 1 ships. Audit budget: ~20 hours William, ~30 hours Khoa, ~5 hours Michael across Week 0 (the week before Huddle kickoff).
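The gating rule in the deliverable is mechanical, so it can be checked rather than remembered. A minimal Python sketch, assuming each finding in the audit report carries a severity, an owner, and a fix-or-accept decision, and reading "blocks ingest" as "blocks until fixed". The `Finding` record and the `ingest_blockers` / `ship_blockers` helpers are hypothetical, not existing Seeda tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    # Hypothetical shape for one row of the audit report.
    title: str
    severity: str                 # "P0", "P1", "P2", ...
    owner: str
    decision: str                 # "fix" or "accept"
    fixed: bool = False
    remediation_plan: Optional[str] = None


def ingest_blockers(findings: List[Finding]) -> List[Finding]:
    """Any P0 that is not yet fixed blocks Huddle Pillar 1 ingest."""
    return [f for f in findings if f.severity == "P0" and not f.fixed]


def ship_blockers(findings: List[Finding]) -> List[Finding]:
    """Before Pillar 1 ships: no open P0, and every open P1 has a remediation plan."""
    open_p1_without_plan = [
        f for f in findings
        if f.severity == "P1" and not f.fixed and not f.remediation_plan
    ]
    return ingest_blockers(findings) + open_p1_without_plan
```

Michael's per-finding acceptance is not modelled; in this sketch a finding only enters the list after he has reviewed it.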

01 · PILLAR × SPEED

Three pillars, three speeds, one matrix.

Cliff picks the speed. The pillars and the order are fixed. Brain, then CRM, then Boardroom.

Speeds: Walk · 13 wks | Run · 5 wks (DEFAULT) | Sprint · 3 wks

Pillar 01 · The Brain
  Walk · Wks 1–5: Outlook + Teams + OneDrive ingest. 5 privacy tiers. Audit log.
  Run · Wks 1–2: Same scope, compressed. Demo at end of Wk 1.
  Sprint · Wk 1: Brain-only sprint. CRM + Board come next month.
Pillar 02 · The CRM
  Walk · Wks 6–10: Prospect pages, candidate dossiers, deal forecast.
  Run · Wks 3–4: Barrenjoey + ASB + Westpac restructured.
  Sprint · Wk 2: Pages-only. Forecast deferred.
Pillar 03 · The Boardroom
  Walk · Wks 11–13: Recording capture, action tracking, deck pre-build.
  Run · Wk 5: Veronica's weekly + Cliff's next board pre-read.
  Sprint · Wk 3: Deck pre-build only. Action tracking deferred.
02 · WHO OWNS WHAT

Four lanes. No overlaps. No drift.

William and Khoa are split along their actual strengths, not split evenly. Michael QAs every milestone.

Michael · Lead · QA · Seby
  • Owns the customer relationship and the weekly call with Cliff
  • QAs every artefact before it touches a Huddle data source
  • Signs off privacy tier mappings and audit-log schema
  • Personal cover if either contractor is unavailable for a week
William · Operator · Berlin · Remote
  • Prospect pages: Barrenjoey, ASB, Westpac restructured into the new pattern
  • Voice-memo ingest (174 of Cliff's recordings) and per-meeting briefs
  • Requirements writing: turns Cliff's two-line asks into Claude-actionable briefs
  • n8n + Claude Code workflows for the operator-facing surface
  • Weekly demo recording (Loom, max 5 min) for Michael's review
Khoa · Engineer · Berlin · Remote
  • Builds the audit log, send-word gate, and repo-boundary check first, before any ingest
  • MS Graph wiring: Outlook, Teams chat, Teams recordings, OneDrive
  • Eval framework for every Claude-generated artefact (his JMIR-paper muscle)
  • Firebase auth + App Check + 5-tier access enforcement
  • Infra: GitHub Actions, Cloudflare Pages deploy, secrets via 1Password
Cliff · Director · Huddle Talent
  • Picks the speed. Confirms the pillar order. Names the first data sources.
  • 30-minute weekly steer call (Friday Sydney time)
  • Signs off the 5 privacy tier definitions in Week 1
  • Final acceptance on each pillar against the criteria in Section 07
03 · TIMELINE

Six weeks. Audit first. Build second.

Default Run speed. Week 0 is the audit (Section 00) and is non-negotiable. Weeks 1 to 5 are the Huddle build. If Cliff picks Walk or Sprint, the build bars stretch or compress; the audit week stays.

Wk 0 · Audit | Wk 1 | Wk 2 | Wk 3 | Wk 4 | Wk 5

  • Audit · security: Khoa lead · ~12 hrs
  • Audit · resilience: Khoa + William · ~18 hrs
  • Audit · scalability: William lead · ~15 hrs
  • Audit report + sign-off: Michael · ~5 hrs
  • Audit-driven fixes: P0 only · variable
  • Audit log + gates: Khoa · ~20 hrs
  • MS Graph ingest: Khoa · ~25 hrs
  • Prospect pages: William · ~22 hrs
  • Voice-memo ingest: William · ~15 hrs
  • Eval harness + iteration: Khoa · ~20 hrs across all weeks
  • Boardroom pillar: Khoa + William · ~20 hrs
  • Michael QA + Cliff call: every week

Legend: Michael (QA, weekly call, Boardroom) · William (operator surface) · Khoa (engineering substrate)
04 · WEEK 1 · DAILY

Five days. Kill-switch at the end.

If the Day 5 demo does not show Cliff something that materially saves him time, we stop and rescope. The retainer-free promise dies if this week drifts. No real Huddle client data touches the system this week. Sanitised Barrenjoey fixtures only.

DAY 1 · MON
Kickoff + tier sign-off
Cliff sign-off on the 5 privacy tier definitions. Michael walks William and Khoa through the venture plan, the brand kit, and the Cliff-only memory boundary. First three data sources named (default: Outlook, Teams chat, OneDrive design folder).
Owner: Michael · Cliff · William · Khoa on call
DAY 2 · TUE
Audit log + send-word gate live
Khoa ships the audit log, send-word gate, and repo-boundary check against a synthetic fixture. Nothing reads or writes to a real Huddle source until this is green. Michael QAs against the Seeda equivalents.
Owner: Khoa · Michael QA gate
DAY 3 · WED
MS Graph read-only against Cliff's tenant
First real read. Outlook + Teams chat scoped to Cliff-only tier. No write paths. No customer data leaves Cliff's tenancy. Khoa pairs with Michael for the Azure AD app registration.
Owner: Khoa · Michael paired
DAY 4 · THU
Barrenjoey prospect page · old vs new
William restructures the existing Barrenjoey deck into the new prospect-page pattern, side by side with the old. Tech stack, market map, buyer map, blockers, next action. Sanitised inputs only.
Owner: William · Michael QA review
DAY 5 · FRI
Demo + go/no-go
Live demo to Cliff. Cliff runs three queries the brain has to answer correctly. Decision: continue to Week 2 (Run pace), stop and rescope, or pause. Weekly timesheet + invoice to Cliff. Definition-of-done checklist signed off for Pillar 1 scaffolding.
Owner: Michael · Cliff sign-off
05 · DEFINITION OF DONE

Each pillar, three boxes.

What exists, what Cliff can do, what "broken" looks like. Tick all three or it is not done.

PILLAR 01 · The Brain

  • Outlook, Teams chat, Teams recordings, OneDrive all ingest into the brain with 5-tier classification on every record
    Verified by Khoa's eval harness, audit-logged
  • Cliff runs 10 test queries; brain returns ≥ 8 correct, 0 cross-tier leaks
    Test set agreed in Wk 1, frozen for the engagement
  • Brain refreshes itself nightly without manual intervention for 7 consecutive days
    LaunchAgent or equivalent on Cliff's machine
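The third Pillar 01 box is the easiest one to fudge by eye. A minimal sketch of checking it from run logs, assuming the input is the set of calendar dates on which the nightly refresh completed with no manual intervention (for example, pulled from the audit log). The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def longest_nightly_streak(successful_runs: Iterable[date]) -> int:
    """Longest run of consecutive calendar days with a successful unattended refresh."""
    best = current = 0
    previous = None
    for day in sorted(set(successful_runs)):
        if previous is not None and day - previous == timedelta(days=1):
            current += 1          # continues yesterday's streak
        else:
            current = 1           # first day, or a gap broke the streak
        best = max(best, current)
        previous = day
    return best


def brain_refresh_done(successful_runs: Iterable[date], required_days: int = 7) -> bool:
    return longest_nightly_streak(successful_runs) >= required_days
```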

PILLAR 02 · The CRM

  • Barrenjoey, ASB, Westpac all live as prospect pages with tech stack, buyer map, deal value, forecast, next action
    One source of truth, fed from the brain
  • Pre-meeting brief drafted in ≤ 5 minutes from a "meeting in 2 hours" prompt, ≤ 30% rejection rate over 10 briefs
    Tracked in eval harness
  • Post-meeting transcript folds back into the prospect page automatically; page updates within 30 minutes of the meeting ending
    Fathom or Teams recording, both supported

PILLAR 03 · The Boardroom

  • Every board and management meeting recorded, parsed, and actions extracted with assignee + due date
    Veronica is the first user; her weekly is the proof case
  • Next deck pre-built from prior deck + actions + new context; Cliff opens it at ≥ 90% complete
    Measured: time-to-finished-deck before vs after
  • Cross-meeting action tracking: no action lost across two consecutive cycles
    Eval harness checks for dropped actions
06 · TESTING + ITERATION IS THE WORK

Three to five loops per artefact. Budgeted, not apologised for.

Building with Claude and AI is not a one-pass write. Every artefact, every skill, every prompt, every brief gets tested, found wanting, iterated, retested. Anyone who quotes a fixed timeline assuming first-pass success has either never shipped AI in production or is lying. We assume the opposite and budget for it.

WHAT · Every artefact, three loops minimum

  • Brief written, Claude produces v1, human reads, rejects or amends, v2 produced, retested
    Three loops is the floor, not the ceiling
  • 30% first-pass rejection is the expected baseline, not a failure signal
    Sub-10% rejection means the eval is too easy. Above 50% rejection means the brief is too vague.
  • Every reject logged with the reason: hallucination, wrong tone, wrong tier, missing fact, format break
    The reasons become the eval suite for the next artefact
  • Regression suite runs nightly: every passed artefact must still pass after any infra change
    Khoa's eval harness owns this
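The rejection log and the rejection-rate bands can be kept as data rather than opinion. A minimal sketch, assuming one review record per first-pass artefact. `Review`, `RejectReason`, and the helpers are hypothetical names. The upper band is read as rejection above 50% meaning the brief is too vague, since the 30% baseline has to fall between the two bands.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RejectReason(Enum):
    HALLUCINATION = "hallucination"
    WRONG_TONE = "wrong tone"
    WRONG_TIER = "wrong tier"
    MISSING_FACT = "missing fact"
    FORMAT_BREAK = "format break"


@dataclass
class Review:
    # One human review of a first-pass (v1) artefact.
    artefact_id: str
    accepted: bool
    reason: Optional[RejectReason] = None

    def __post_init__(self) -> None:
        # Every reject must carry a reason; the reasons seed the next eval suite.
        if not self.accepted and self.reason is None:
            raise ValueError(f"reject of {self.artefact_id} logged without a reason")


def first_pass_rejection_rate(reviews: List[Review]) -> float:
    if not reviews:
        raise ValueError("no reviews logged")
    return sum(not r.accepted for r in reviews) / len(reviews)


def read_rejection_rate(rate: float) -> str:
    # Bands: below 10% the eval is too easy, above 50% the brief is too vague.
    if rate < 0.10:
        return "eval too easy"
    if rate > 0.50:
        return "brief too vague"
    return "within expected range"


def reason_counts(reviews: List[Review]) -> Counter:
    return Counter(r.reason for r in reviews if not r.accepted)
```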

WHO · Iteration stamina is a job requirement

  • William: same prompt, fifteen variants, still sharp on the sixteenth
    This is in the JD because this is the work
  • Khoa: builds the eval framework before he builds anything that needs evaluating
    His JMIR paper is literally on evaluation methodology. Use that muscle.
  • Michael: QA gate weekly, no rubber-stamping
    Anything Michael does not personally test is not signed off
  • Cliff: 30-minute steer call weekly, points us at the next thing to harden
    His rejections drive the next iteration loop

HOW · The eval-and-iterate loop

  • Eval set frozen at the start of each pillar. 10 queries minimum, drawn from real Huddle work
    No moving the goalposts mid-week
  • Every prompt change reruns the full eval set before merge
    If the eval set takes longer than 10 min, parallelise it
  • Failed evals categorised: data problem, prompt problem, model problem, eval problem
    Each category has a different remediation path
  • Demo-driven testing: weekly Loom shows Cliff what improved, not what was added
    Diff thinking, not feature thinking
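The HOW loop is essentially a parallel test run plus a regression diff. A minimal sketch, assuming `answer_fn` is a hypothetical wrapper that sends one query to the brain and returns its answer. Exact-match grading is only a placeholder for whatever graders the eval harness actually uses. Threads fit here because each query mostly waits on a model API call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class FailureCategory(Enum):
    DATA = "data problem"
    PROMPT = "prompt problem"
    MODEL = "model problem"
    EVAL = "eval problem"


@dataclass(frozen=True)
class EvalCase:
    query: str
    expected: str


@dataclass
class EvalResult:
    case: EvalCase
    answer: str
    passed: bool
    category: Optional[FailureCategory] = None  # assigned by a human at triage


def run_eval_set(cases: List[EvalCase], answer_fn: Callable[[str], str],
                 max_workers: int = 8) -> List[EvalResult]:
    # Queries are independent, so run them concurrently; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        answers = list(pool.map(answer_fn, [c.query for c in cases]))
    # Placeholder grading: exact match after trimming whitespace.
    return [EvalResult(c, a, a.strip() == c.expected.strip()) for c, a in zip(cases, answers)]


def regressions(previous: List[EvalResult], current: List[EvalResult]) -> List[str]:
    """Queries that passed on the previous run but fail now."""
    passed_before = {r.case.query for r in previous if r.passed}
    return [r.case.query for r in current if r.case.query in passed_before and not r.passed]
```

Under this sketch, a prompt change would merge only if `regressions(last_accepted_run, new_run)` is empty, with any remaining failures triaged into one of the four categories; that merge rule is an assumption, not something the plan specifies.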

If iteration time is not budgeted, the timeline lies. Across the 5-week Run pace, expect roughly 40% of total hours to be testing, eval-writing, regression, and re-prompting, not net-new build. That is normal. That is the work. If a contractor reports "done" without showing the iteration trail (the rejected v1s, the eval results, the regression diff), the work is not done.
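The 40% figure is easier to hold people to if timesheets carry the split. A minimal sketch, assuming each timesheet line is tagged either net-new or one of the iteration kinds named in the paragraph above; the tag strings and helpers are hypothetical. Under this assumption, a ~20-hour estimate splits into about 12 hours of net-new build and 8 hours of iteration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Work kinds counted as iteration; the tag strings are hypothetical.
ITERATION_KINDS = {"testing", "eval-writing", "regression", "re-prompting"}


@dataclass
class TimesheetEntry:
    person: str
    hours: float
    kind: str  # "net-new" or one of ITERATION_KINDS


def iteration_share(entries: List[TimesheetEntry]) -> float:
    """Fraction of logged hours spent on iteration rather than net-new build."""
    total = sum(e.hours for e in entries)
    if total == 0:
        return 0.0
    return sum(e.hours for e in entries if e.kind in ITERATION_KINDS) / total


def split_estimate(hours: float, iteration_fraction: float = 0.40) -> Tuple[float, float]:
    """Split an hour estimate into (net-new build, iteration)."""
    iteration = hours * iteration_fraction
    return hours - iteration, iteration
```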

07 · ACCEPTANCE CRITERIA

Numbers, not vibes.

Tied to a demo. If Khoa's eval harness cannot measure it, it is not a criterion.

Pillar | Metric | Target | Measured by
01 Brain | Correct answers across 10 frozen test queries | ≥ 8 / 10 | Cliff live demo · Wk 1 Fri
01 Brain | Cross-tier leaks in 100 audit-log spot checks | 0 | Khoa eval harness · daily
01 Brain | Hallucinated facts in 20 named-entity queries | 0 | Eval harness · pre-Wk 2 demo
02 CRM | Pre-meeting brief generation time | ≤ 5 min | Stopwatch on 10 trials
02 CRM | First-pass rejection rate by Cliff or Michael | ≤ 30 % | Tracked in eval harness
02 CRM | Transcript-to-page update latency | ≤ 30 min | Logged timestamps
03 Board | Deck completeness at Cliff's first open | ≥ 90 % | Side-by-side vs prior cycle
03 Board | Lost actions across two cycles | 0 | Veronica weekly review
All | Send-word gate bypasses | 0 | Audit log spot checks
All | Weekly demo recordings (Loom) | 1 per week | William delivery
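Because every row has a numeric target, the table can be checked mechanically at each pillar sign-off. A minimal sketch. The metric keys are hypothetical, a missing measurement is treated as a failure, and for the 10-trial brief-generation timing it assumes the caller passes the slowest trial, since the table does not say which aggregate to use.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    pillar: str
    metric: str                                # hypothetical metric key
    target: float
    meets: Callable[[float, float], bool]      # operator.ge / le / eq


CRITERIA = [
    Criterion("01 Brain", "correct_answers_of_10", 8, operator.ge),
    Criterion("01 Brain", "cross_tier_leaks_per_100", 0, operator.eq),
    Criterion("01 Brain", "hallucinated_facts_per_20", 0, operator.eq),
    Criterion("02 CRM", "brief_generation_minutes", 5, operator.le),
    Criterion("02 CRM", "first_pass_rejection_pct", 30, operator.le),
    Criterion("02 CRM", "transcript_to_page_minutes", 30, operator.le),
    Criterion("03 Board", "deck_completeness_pct", 90, operator.ge),
    Criterion("03 Board", "lost_actions_two_cycles", 0, operator.eq),
    Criterion("All", "send_word_gate_bypasses", 0, operator.eq),
    Criterion("All", "loom_demos_this_week", 1, operator.ge),
]


def failing_criteria(pillar: str, measured: Dict[str, float]) -> List[str]:
    """Metrics for this pillar (plus the 'All' rows) that are missing or off target."""
    failures = []
    for c in CRITERIA:
        if c.pillar not in (pillar, "All"):
            continue
        value = measured.get(c.metric)
        if value is None or not c.meets(value, c.target):
            failures.append(c.metric)
    return failures
```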
08 · PRIVACY TIERS

The most important page in this doc.

An exec-search firm lives or dies on confidentiality. Mis-tier one record and the engagement is over.

T1 · Cliff-only · Most sensitive
Board pre-reads, candidate financials, deal economics, personal notes. Encrypted at rest, audit log on every read.
Sees: Cliff · Michael (QA only)
Writes: Cliff · Michael

T2 · Board · Wilson Select directors
Strategy, pricing decisions, P&L, partnership terms. Tier-2 audit log on access.
Sees: Cliff · Board
Writes: Cliff

T3 · Leadership · Cliff + Veronica
Operating decisions, weekly management meeting capture, action tracking.
Sees: Cliff · Veronica
Writes: Cliff · Veronica

T4 · Team · WSG delivery team
Diagnostic delivery notes, candidate dossiers (non-financial), workshop materials, brand assets.
Sees: Michael Buckley · Veronica Byrne · James Casino
Writes: Cliff · Veronica

T5 · Customer-safe · Shareable
Public-facing decks, marketing collateral, anonymised case studies. The only tier safe to put in a customer email.
Sees: Anyone
Writes: Cliff · Veronica · Michael
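The Sees and Writes columns amount to an access-control list, and encoding them makes the cross-tier leak check concrete. A minimal sketch, assuming the tiers are numbered 1 (Cliff-only) to 5 (Customer-safe) in the order listed, that "Michael" in the Cliff-only and Customer-safe rows means Michael Kingston, and that "board" stands for the Wilson Select directors as a single group. All principal names are hypothetical identifiers. Encryption at rest and read auditing are out of scope here.

```python
from enum import IntEnum
from typing import Dict, Optional, Set


class Tier(IntEnum):
    # Numbered in the order listed, most to least sensitive.
    CLIFF_ONLY = 1
    BOARD = 2
    LEADERSHIP = 3
    TEAM = 4
    CUSTOMER_SAFE = 5


# None means "anyone". Principal names are hypothetical identifiers.
READERS: Dict[Tier, Optional[Set[str]]] = {
    Tier.CLIFF_ONLY: {"cliff", "michael_kingston"},   # Michael: QA only
    Tier.BOARD: {"cliff", "board"},
    Tier.LEADERSHIP: {"cliff", "veronica"},
    Tier.TEAM: {"michael_buckley", "veronica", "james_casino"},
    Tier.CUSTOMER_SAFE: None,
}

WRITERS: Dict[Tier, Set[str]] = {
    Tier.CLIFF_ONLY: {"cliff", "michael_kingston"},
    Tier.BOARD: {"cliff"},
    Tier.LEADERSHIP: {"cliff", "veronica"},
    Tier.TEAM: {"cliff", "veronica"},
    Tier.CUSTOMER_SAFE: {"cliff", "veronica", "michael_kingston"},
}


def can_read(principal: str, tier: Tier) -> bool:
    readers = READERS[tier]
    return readers is None or principal in readers


def can_write(principal: str, tier: Tier) -> bool:
    return principal in WRITERS[tier]


def leaked_to(tier: Tier, recipients: Set[str]) -> Set[str]:
    """Recipients who would see a record of this tier without being allowed to read it."""
    return {p for p in recipients if not can_read(p, tier)}
```

Encoding the table literally, with no inheritance between tiers, surfaces questions worth settling at the Week 1 tier sign-off. For instance, Cliff is listed as a Team-tier writer but not as a Team-tier reader.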
09 · RISK REGISTER

What can go wrong, and what we do about it.

First-project-with-unknown-contractors risks. Not generic risks.

Cross-tier data leak · High

Mitigation: Khoa ships audit log + send-word gate + repo-boundary check on Day 2, before any real ingest. Week 1 uses sanitised fixtures only. 100-record spot check daily.

Khoa is a single point of failure on MS Graph · High

Mitigation: Khoa documents every MS Graph integration as he ships it. Michael shadow-pairs on the Azure AD app registration in Wk 1. If Khoa is unavailable, Michael covers personally at A$150/hr.

William over-builds vs Cliff's actual workflow · Med

Mitigation: Cliff's voice memos drive the design. William turns Cliff's two-line ask into the brief; Michael QAs the brief before any build. Weekly Loom check.

Berlin · Sydney TZ latency stalls weekly demo · Med

Mitigation: Both contractors commit to a 2-hour overlap window on Tue + Thu, in the Berlin morning (Sydney late afternoon). Weekly demo recorded async Friday Sydney time.

Quality drift on first-pass output · Med

Mitigation: ≤ 30% first-pass rejection target. Anything Cliff rejects twice triggers a pause-and-rescope conversation with Michael, not a third attempt.

Scope creep into out-of-scope items · Low

Mitigation: Finance integration, Tranquil IT handover, deep voice training, and team rollout are all explicitly out of scope for the first 5 weeks. New scope = new proposal.

Premature MS Graph over-build · Low

Mitigation: Decision D-04 (venture plan) holds: only build MS-side integration that has a Cliff use case this week. No generic Outlook MCP.

Communication latency between contractors · Low

Mitigation: Shared Slack channel with Michael. Daily standup async in-channel by 09:00 Berlin. Blockers tagged to Michael directly.

Audit reveals a P0 finding that blocks Wk 1 · High

Mitigation: Audit-driven fix budget reserved in Wk 1 timeline. If a P0 lands, Huddle build pauses, Cliff is informed in writing within 24 hrs, remediation plan in 48 hrs. No build over an unverified foundation.

Michael's lack of formal engineering training shows up as a defect · Med

Mitigation: Section 00 audit exists specifically to find these. Each finding ranked, owned, and either fixed or accepted in writing. The audit report is the artefact that converts unknown unknowns into known knowns.

Iteration time underestimated; week budgets blown · Med

Mitigation: 40% iteration overhead baked into all hour estimates (Section 06). If a week needs more hours, the contractor asks first; no silent overruns. Weekly timesheet shows iteration and net-new hours separately.

Eval suite drifts or becomes too easy · Low

Mitigation: Khoa owns suite integrity. New evals added every week from Cliff's actual rejections. Suite frozen at start of each pillar to prevent goalpost-moving mid-week.

10 · STOP-IF CRITERIA

The lines we draw before we start.

Written now, not negotiated later. Hitting any of these stops the engagement until Michael and Cliff have rescoped together.

11 · WEEKLY RHYTHM

Same shape every week. No surprises.

Two contractors who do not know Seby yet need a predictable rhythm before they need flexibility.

MON
Plan
Michael posts the week's targets in Slack by 10:00 Berlin. Hours estimated per swimlane.
TUE
Overlap call
90 min · Sydney 17:00 · Berlin 09:00. Working session, not a status meeting.
WED
Mid-week QA
Michael reviews work-in-progress, flags anything drifting before Friday demo.
THU
Build day
No meetings. Both contractors heads-down. Async only.
FRI
Demo + invoice
Loom demo from William. Timesheet from both. Invoice to Cliff. Cliff steer call · 30 min.