v1.0 · Sub-500ms on-device signal processing

The moment
before you react.

Cue reads your voice acoustically — no transcription, no cloud upload — and delivers a peripheral nudge in under 500ms. Coaching that arrives while the conversation is still in flight.

Download for macOS / Windows · How it works

On-device only · No audio leaves your device · HIPAA-compatible by architecture

Live session · 04:38

Speech Rate (ZCR) ↑

Energy (RMS)

Spectral Centroid ↑

Spectral Flatness

Monologue Duration ↑

432ms

Slow down.

Pace 18% above baseline for 640ms. Listeners read this as defensive.

Architecture

Five signals.
No words.

Cue processes raw audio locally using an AudioWorklet, extracting acoustic features 128ms at a time. No speech-to-text. No NLP. No upload.
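A rough sketch of the windowing step, written in Python for brevity (a real AudioWorklet processor would be JavaScript). It relies on one fact from the Web Audio API: `process()` receives 128 sample-frames per call. The 128ms window comes from the text above. The 48 kHz sample rate and the names `FrameAccumulator` and `window_size` are assumptions made for illustration, not Cue's actual code.

```python
RENDER_QUANTUM = 128  # sample-frames delivered per AudioWorklet process() call


def window_size(sample_rate, window_ms=128):
    """Number of samples in one analysis window of window_ms milliseconds."""
    return sample_rate * window_ms // 1000


class FrameAccumulator:
    """Collects small render blocks until one full analysis window is ready."""

    def __init__(self, sample_rate, window_ms=128):
        self.size = window_size(sample_rate, window_ms)
        self.buffer = []

    def push(self, quantum):
        """Append one block; return a full window when ready, else None."""
        self.buffer.extend(quantum)
        if len(self.buffer) < self.size:
            return None
        window, self.buffer = self.buffer[:self.size], self.buffer[self.size:]
        return window


# Assumed 48 kHz input: one 128ms window is 6144 samples, or 48 render blocks.
acc = FrameAccumulator(48_000)
windows = [acc.push([0.0] * RENDER_QUANTUM) for _ in range(48)]
```

Under these assumptions, only the 48th push returns a window. Feature extraction would then run once per window rather than once per render block.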

Energy (RMS)

Volume and projection proxy. Sustained elevation reads as aggression or defensiveness before words register.

Root Mean Square
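As a minimal sketch, RMS over one frame is the square root of the mean squared sample. The test tone, the 16 kHz sample rate, and the helper `relative_to_baseline` are illustrative assumptions; the source does not describe how Cue compares a value to its baseline.

```python
import math


def rms(frame):
    """Root mean square of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def relative_to_baseline(value, baseline):
    """Fractional deviation from a calibrated baseline (0.18 means 18% above)."""
    return (value - baseline) / baseline


# Hypothetical 128ms frame: a 220 Hz sine at half amplitude, 16 kHz sample rate.
SAMPLE_RATE = 16_000
frame = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(2048)]
level = rms(frame)  # close to 0.5 / sqrt(2) for a sine of amplitude 0.5
```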

Speech Rate (ZCR)

Zero-crossing rate as a pace estimator, measured against your personal calibrated baseline. A rate above baseline signals cognitive overload.

Zero-Crossing Rate
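A small sketch of the zero-crossing rate: the fraction of adjacent sample pairs that change sign. How Cue maps this value to speaking pace is not specified in the source, so `percent_above_baseline` and the test tones are assumptions for illustration only.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def percent_above_baseline(current, baseline):
    """Deviation from the personal baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def tone(freq_hz, sample_rate=16_000, n=2048):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# A higher-frequency tone crosses zero more often than a lower one.
low = zero_crossing_rate(tone(200))
high = zero_crossing_rate(tone(400))
```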

Spectral Centroid

Laryngeal tension proxy. Higher values indicate sympathetic activation — tense voices escalate conversations via listener mirroring.

Frequency centroid
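The spectral centroid is the magnitude-weighted mean frequency of a frame. The sketch below uses a naive DFT so it needs no external libraries; a production implementation would use an FFT. The 8 kHz sample rate, the frame length, and the test tones are assumptions chosen to keep the example small.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz (0.0 for silence)."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


# Hypothetical frames: pure tones that land exactly on DFT bins.
SAMPLE_RATE, N = 8_000, 256
tone_500 = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) for i in range(N)]
tone_1000 = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(N)]
```

For a pure tone the centroid sits at the tone's frequency, so `tone_1000` scores higher than `tone_500`. Real speech spreads energy across many bins, and the centroid shifts upward as high-frequency energy grows.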

Spectral Flatness

Harmonic-to-noise ratio proxy. Tracks vocal quality and breath support under pressure.

Wiener entropy
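Spectral flatness (Wiener entropy) is the ratio of the geometric mean to the arithmetic mean of the power spectrum: near 0 for a pure tone, near 1 for a flat, noise-like spectrum. This is a sketch under assumptions: the naive DFT, the `eps` guard against `log(0)`, and the test frames are illustrative choices, not taken from the source.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of DFT bins 0..N/2, computed with a naive O(N^2) transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum, in [0, 1]."""
    power = power_spectrum(frame)
    if sum(power) == 0:
        return 0.0  # silence: treat as not noise-like
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arithmetic_mean = sum(p + eps for p in power) / len(power)
    return math.exp(log_mean) / arithmetic_mean


# Hypothetical frames: a bin-aligned tone (peaky) and an impulse (perfectly flat).
N = 64
tone = [math.cos(2 * math.pi * 8 * i / N) for i in range(N)]
impulse = [1.0] + [0.0] * (N - 1)
```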

Monologue Duration

Continuous unbroken speech detection. Triggers after 30 seconds without a 2-second pause — the turn-taking signal.

Unbroken speech ≥ 30s
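One way to sketch the rule above is a small state machine fed one frame at a time. The 30-second limit and 2-second pause come from the text. Everything else is an assumption: the class name, the choice to count short gaps toward the run, and the `is_speech` flag, which would come from a separate voice-activity check not described in the source.

```python
class MonologueDetector:
    """Flags speech that runs for limit_s without a pause of at least pause_s."""

    def __init__(self, limit_s=30.0, pause_s=2.0):
        self.limit_s = limit_s
        self.pause_s = pause_s
        self.run_s = 0.0      # time since the last qualifying pause
        self.silence_s = 0.0  # length of the current silence

    def update(self, is_speech, frame_s):
        """Feed one frame of frame_s seconds; return True once the limit is hit."""
        if is_speech:
            self.silence_s = 0.0
            self.run_s += frame_s
        else:
            self.silence_s += frame_s
            if self.silence_s >= self.pause_s:
                self.run_s = 0.0        # a real pause: the turn resets
            elif self.run_s > 0:
                self.run_s += frame_s   # short gaps still count toward the run
        return self.run_s >= self.limit_s


# Hypothetical stream: 30 seconds of speech in half-second frames.
detector = MonologueDetector()
flags = [detector.update(True, 0.5) for _ in range(60)]
```

In this run only the last frame returns `True`. A 2-second silence afterwards resets the run to zero.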

End-to-end latency

<500ms

Escalation

When all three primary signals stay above baseline at the same time, Cue activates a 24-second breathing pacer (three cycles of 4s inhale / 4s exhale) in your peripheral vision.
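The escalation rule and the pacer timing can be sketched as follows. The 24-second total and the 4s / 4s split come from the text, which works out to three full breaths. The source does not say which three signals are primary or how long "sustained" means, so `should_escalate`, its `min_frames` parameter, and the sample history are hypothetical.

```python
PACER_TOTAL_S = 24.0
INHALE_S = 4.0
EXHALE_S = 4.0


def pacer_phase(elapsed_s):
    """Return 'inhale' or 'exhale' for a time inside the pacer, else None."""
    if not 0 <= elapsed_s < PACER_TOTAL_S:
        return None
    in_cycle = elapsed_s % (INHALE_S + EXHALE_S)
    return "inhale" if in_cycle < INHALE_S else "exhale"


def should_escalate(history, min_frames):
    """True when each of the last min_frames frames is above baseline on every signal.

    history holds one tuple per frame with the fractional deviation from
    baseline for each primary signal (positive means above baseline).
    """
    if len(history) < min_frames:
        return False
    return all(all(d > 0 for d in frame) for frame in history[-min_frames:])


# Hypothetical history: five frames with all three signals elevated.
history = [(0.10, 0.18, 0.05)] * 5
```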

Post-session

Integration Tape: EQ score, emotional arc, the moment you missed, and one micro-skill to practice next time.

Scientific foundation

Acoustic truth
precedes thought.

Listeners decode vocal emotion in milliseconds, before words are processed (a fast, automatic System 1 response). Cue intervenes at the acoustic layer, where behavior actually forms.

Scherer / Juslin & Laukka

Vocal emotion decodes across cultures in milliseconds

Listeners respond to acoustic features before processing words — your tone is already working on the other person.

Goldman-Eisler

Speech rate varies systematically with arousal

Pace above personal baseline reads as defensiveness or anxiety to listeners — independent of what you're actually saying.

Sundberg / Titze / Banse & Scherer

Spectral centroid rises with laryngeal tension

Tense voices trigger sympathetic mirroring in listeners, escalating conversations through acoustic contagion.

Winstein

Feedback timing governs skill transfer

Post-action feedback doesn't build durable habits. In-moment feedback does. This is why post-call tools plateau.

Duncan / Sacks / Stivers

Turn gaps cluster near 200ms cross-culturally

Pause behavior is the primary turn-taking signal. Cue's monologue detector catches the gap before it becomes a rupture.

Proprietary corpus

Real-world labeled emotional signal data

The free consumer tier generates a signal corpus that competitors cannot purchase or replicate. The moat compounds.

Why post-call dashboards fail

Winstein (1994) showed that feedback timing governs skill transfer. Post-action feedback does not produce durable behavior change. In-moment feedback does.

The Cue thesis

Gong, Chorus, and Salesforce Einstein are post-call. Their feedback arrives hours after the moment has passed. Cue arrives in 432ms.

Category distinction

Post-call analytics
is a different category.

Gong, Chorus, and Otter are transcription-first. By the time their feedback arrives, the conversation is over. Cue operates in a different category.

Capability	Cue	Gong / Chorus	Otter / Fireflies
Feedback latency	<500ms	2–8 hours	Post-call
Audio uploaded to cloud
Speech-to-text transcription
Works during the conversation
Personal baseline calibration
HIPAA / legal / clinical safe
Works on any audio source		Zoom only	Zoom / Meet
Peripheral nudge (non-intrusive)

The privacy moat

HIPAA, FERPA, attorney-client privilege, clinical confidentiality, and M&A discretion make cloud-based transcription tools unusable in the segments where real-time coaching matters most. Cue's on-device architecture isn't a feature — it's the only viable form factor.

B2B SaaS Sales · Customer Success · Healthcare / Clinicians · Legal / Attorneys · Executive Coaching · Financial Advisors · Therapists · Insurance

Pricing

Start free.
Scale on outcomes.

Enterprise pricing is outcome-based — you pay on measurable conversation deltas, not per seat.

Free

$0 forever

Full signal processing. Unlimited sessions. Your data never leaves your device.

All 5 acoustic signal features
Personal baseline calibration
Peripheral nudge + breathing pacer
Integration Tape post-session
macOS + Windows desktop app

Download free

Pro

$15 / month

Session history, trend analysis, and advanced coaching for individuals serious about their craft.

Everything in Free
Full session history + arc charts
Cross-session trend analysis
Moment-you-missed callouts
Priority access to new features

Start Pro

Enterprise

Outcome contract

Priced on measurable conversation deltas for revenue teams. Baseline established, outcomes tracked.

Everything in Pro × entire team
Outcome contract pricing (no seats)
Baseline + delta measurement
CRO / VP Sales outcome reporting
Dedicated onboarding + CSM

Enterprise contracts are priced on measurable outcomes — deal-loss rate reduction, rep ramp time, retention success rate.
You don't pay for seats. You pay for results.
