
How I Built a Voice Agent with ElevenLabs
Turning performance reviews from dreaded paperwork into natural conversations
Performance reviews suck. Everyone knows it. Managers spend hours crafting questions, employees struggle to remember what happened three months ago, and HR gets a pile of generic responses that don't actually help anyone grow.
At Candor, I've been building an AI-powered system that automatically collects 360° feedback based on who you actually work with. But even with smart automation, I still had one major problem: people hate filling out surveys.
So when one of my advisors threw out a "wild idea" during a board meeting, everything clicked.
The Lightbulb Moment
I was deep in an advisory board meeting, discussing user adoption challenges, when one of my advisors interrupted with what she called a "wild idea":
"I think one of the barriers is the actual writing of the words. When I would practice hard conversations, it always sounded better in practice than when I had to deliver it. Similar with positive or negative feedback."
She went on to mention how one of her consulting clients would send her Loom videos instead of Slack messages because "he didn't want to write - he wanted to talk."
The room went quiet for a moment. Then I said: "I couldn't agree more. It's always been a hunch of mine with feedback. It'd be much easier to just record something."
That was it. The missing piece I'd been searching for.
From Idea to Implementation
The advisor's insight crystallized something I'd been feeling but couldn't articulate: people don't struggle with feedback because they don't know what to say - they struggle because writing it down feels formal, time-consuming, and often loses the nuance of what they really mean.
Think about it - when you're grabbing coffee with a colleague and they ask about working with Sarah, you naturally say things like:
"Oh, Sarah's great! She really stepped up during that client crisis last month. Her communication style is super clear, and she's gotten so much better at stakeholder management since she started. The only thing I'd say is sometimes she gets a bit overwhelmed when we have too many competing priorities, but honestly, who doesn't?"
But put that same person in front of a survey form asking "Rate Sarah's communication skills from 1-10" and suddenly they freeze up.
Building the Voice Experience
The User Journey I Designed
Here's what the experience looks like now:
- Smart Setup: My system already knows who you work with (I analyze your calendar, emails, and project collaborations), so when you start a feedback session, your teammates are automatically loaded.
- Voice Coach Introduction: An AI voice coach greets you by name and explains the process: "Hi! I'm here to collect feedback about your teammates. I'll have a brief conversation about each person, asking 3 focused questions. Ready to start with Sarah?"
- Natural Conversation: Instead of reading survey questions, you have a 3-4 minute conversation about each teammate. The AI asks follow-up questions, clarifies responses, and keeps things focused.
- Automatic Processing: After each conversation, my system extracts structured feedback data and converts it into the same format as traditional surveys - but with much richer context.
- Review & Refine: You can review the extracted feedback, make edits, and add anything the AI missed before finalizing.
The Technical Architecture (Simplified)
I built this as a React application with several key components:
Frontend: A VoiceInterface component that handles the microphone, audio playback, and conversation flow. Users see a simple interface showing who they're discussing and can control the conversation pace.
Backend Integration: I created several API endpoints that work together:
- /api/voice-agent/session - Creates voice feedback sessions
- /api/voice-agent/contextual-data - Provides the AI with relevant context about relationships and roles
- /api/voice-agent/process-transcripts - Uses OpenAI to extract structured feedback from conversations
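To make the session endpoint concrete, here's a hedged sketch of its core logic. The function name, field names, and response shape are illustrative, not my production code, which also handles auth and persistence:

```javascript
// Hypothetical sketch of the /api/voice-agent/session handler logic.
// Given a reviewer and their teammates, it builds a session record with
// each teammate marked as not-yet-discussed.
function createVoiceSession(reviewerId, teammates) {
  if (!reviewerId || teammates.length === 0) {
    return { status: 400, body: { error: 'Reviewer and teammates are required' } };
  }
  const session = {
    id: `session_${Date.now()}`, // production code uses a DB-generated ID
    reviewerId,
    // Each teammate starts undiscussed, with no transcript yet
    teammates: teammates.map((t) => ({ ...t, discussed: false, transcript: null })),
    createdAt: new Date().toISOString(),
  };
  return { status: 201, body: session };
}
```

The important design choice is that the session carries per-teammate state from the start, so the frontend can drive the "who's next" flow without extra round trips.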
ElevenLabs Integration: I use ElevenLabs' Conversational AI to power the voice interactions. The AI agent is configured with a specific prompt that guides it to ask exactly 3 questions per teammate and stay focused on feedback.
Here's the core prompt I use:
You are a friendly and professional AI feedback coach helping someone provide thoughtful, contextual feedback about ONE specific teammate. Your questions should be tailored to their working relationship and industry context.
CONVERSATION CONTEXT:
- Teammate to discuss: {{current_teammate}}
- Their relationship: {{current_relationship}}
- Their job title: {{current_job_title}}
- Relationship type: {{relationship_type}}
- Company industry: {{company_industry}}
- Company name: {{company_name}}
- Recent questions context: {{recent_questions_context}}
- Conversation: {{teammate_number}} of {{total_teammates}}
- The purpose of the discussion: {{conversation_type}} - {{session_purpose}}
CONVERSATION BOUNDARIES:
- Keep the ENTIRE conversation under {{max_duration_minutes}} minutes maximum
- Focus ONLY on {{current_teammate}} - don't discuss other people
- Use active listening skills to summarize, paraphrase, or ask probing follow up questions
- End naturally when you have sufficient detail across key areas
RELATIONSHIP-SPECIFIC FOCUS:
**If relationship_type is "manager-report" (they manage {{current_teammate}}):**
Focus on managerial skills: leadership style, communication, delegation, goal setting, team collaboration, conflict resolution, feedback provision, motivation, development opportunities, and alignment with company values.
**If relationship_type is "report-manager" ({{current_teammate}} is their manager):**
Focus on leadership effectiveness: communication style, delegation skills, decision-making, team development, coaching ability, emotional intelligence, and conflict resolution.
**If relationship_type is "peer":**
Focus on peer-to-peer collaboration: teamwork, communication, project collaboration, knowledge sharing, and contributions to team success. Avoid referring to managers to prevent confusion.
**If relationship_type is "skip-level-manager" ({{current_teammate}} is skip-level above them):**
Focus on organizational leadership: leadership style, communication across levels, decision-making impact, vision setting, and team dynamics influence.
**If relationship_type is "skip-level-report" ({{current_teammate}} is multiple levels below):**
Focus on leadership qualities visible from distance: communication effectiveness, decision-making impact, team management, vision & strategy, and organizational alignment.
**If relationship_type is "peer-with-boss" (cross-organizational peers):**
Focus on cross-functional collaboration: communication, influence, strategic thinking, and partnership effectiveness. Avoid mentioning specific managers.
INDUSTRY-SPECIFIC GUIDANCE:
**If company_industry is provided:**
- Ask questions specific to {{company_industry}} business operations
- Focus on skills and behaviors important in {{company_industry}} workforce
- Reference industry-relevant challenges and opportunities
- Consider {{current_job_title}} responsibilities within {{company_industry}} context
- Do not ask questions comparing their current experience against other industry-specific experience - this might be their first job in this industry.
**If current_job_title is provided:**
- Tailor questions to skills important for {{current_job_title}} roles
- Ask about job-specific competencies and behaviors
- Focus on performance areas relevant to their position
CONVERSATION STYLE:
**BE CONVERSATIONAL & ADAPTIVE:**
- Start with a rating question (1-10 scale)
- Make sure your questions have a mix of rating and open-ended questions
- Use active listening skills to summarize, paraphrase, or ask probing follow up questions
**AVOID REPETITION:**
{{recent_questions_context}}
**QUALITY GUIDELINES:**
- Focus on observable behaviors and actions, not general traits
- Suggest the use of SBI (Situation-Behavior-Impact), DESC (Describe-Express-Specify-Consequences), STAR (Situation-Task-Action-Result), COIN (Context-Observation-Impact-Next steps), or AID (Action-Impact-Development) methods when asking for feedback
- Keep questions professional and appropriate
- Make questions industry-relevant when possible
CONVERSATION FLOW:
**START:** Ask a question about {{current_teammate}}
**MIDDLE - ADAPT BASED ON RELATIONSHIP OR JOB TITLE**
**ACTIVE LISTENING EXAMPLES:**
- "Could you clarify what you mean by..."
- "So, it sounds like you're saying..."
- "What I'm hearing you say is..."
- "I understand,..."
- "That makes sense..."
- "What are your thoughts on this?"
- "How did that make you feel?"
- "What changes would you make?"
- "Thank you for sharing"
**END NATURALLY:** When you have sufficient detail OR approaching {{max_duration_minutes}} minutes:
"Great! That completes the feedback for {{current_teammate}}. You can now move to your next teammate or finish the session." and then end the conversation
IMPORTANT GUARDRAILS:
- NEVER exceed {{max_duration_minutes}} minutes total conversation time
- Stay focused solely on {{current_teammate}}
- Don't ask about other teammates, managers, or general company topics
- Use the relationship context to ask relevant questions
- Incorporate industry and role context naturally
- Quality over quantity - detailed insights matter more than many questions
Remember: This is a natural conversation informed by their specific working relationship, industry context, and role requirements. Adapt your questions based on their responses while staying focused on gathering meaningful, contextual feedback about {{current_teammate}}.
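The {{variable}} placeholders in that prompt get filled in per conversation. Here's a sketch of how I map session context onto those variable names; the mapping function is real in spirit but the context shape is illustrative, and the commented ElevenLabs call at the bottom is indicative rather than exact, so check the Conversational AI SDK docs for the current signature:

```javascript
// Build the dynamic variables the agent prompt expects.
// Keys mirror the {{placeholders}} in the prompt above; values are strings
// because they're substituted directly into prompt text.
function buildDynamicVariables(ctx) {
  return {
    current_teammate: ctx.teammate.name,
    current_relationship: ctx.teammate.relationship,
    current_job_title: ctx.teammate.jobTitle,
    relationship_type: ctx.relationshipType,
    company_industry: ctx.company.industry,
    company_name: ctx.company.name,
    recent_questions_context: ctx.recentQuestions.join('; '),
    teammate_number: String(ctx.index + 1),
    total_teammates: String(ctx.total),
    conversation_type: ctx.conversationType,
    session_purpose: ctx.sessionPurpose,
    max_duration_minutes: String(ctx.maxMinutes),
  };
}

// Passed when starting the voice session, roughly like:
// await conversation.startSession({ agentId, dynamicVariables: buildDynamicVariables(ctx) });
```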
Smart Context Integration
The magic happens in how I provide context to the AI. Before each conversation, my system automatically gathers:
- Relationship data: "Sarah is your peer - you both report to Mike"
- Role information: "She's a Senior Product Manager"
- Recent feedback patterns: "Let's explore different aspects than previous feedback"
- Company context: Industry, company values, recent projects
This means the AI can ask intelligent, contextual questions like:
- "How would you rate Sarah's stakeholder communication skills given her role in product management?"
- "As peers working on the same team, how well does she collaborate on cross-functional projects?"
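For illustration, this is roughly the shape the contextual-data endpoint returns for one teammate. Field names here are hypothetical and simplified from the real response:

```javascript
// Simplified, hypothetical payload from /api/voice-agent/contextual-data.
// The AI coach uses these fields to ground its questions.
const contextualData = {
  teammate: { name: 'Sarah', jobTitle: 'Senior Product Manager' },
  relationship: { type: 'peer', description: 'You both report to Mike' },
  company: { industry: 'SaaS', values: ['Candor', 'Growth'] },
  // Themes already covered recently, so the AI can explore different areas
  recentFeedbackThemes: ['communication'],
};
```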
Technical Challenges I Solved
1. Audio Quality and Reliability
Voice applications are notoriously finicky. I spent considerable time ensuring the audio pipeline was robust:
javascript
// My VoiceInterface component handles multiple audio states
const [connectionStatus, setConnectionStatus] = useState('disconnected');
const [isRecording, setIsRecording] = useState(false);
const [conversationId, setConversationId] = useState(null);

// Graceful fallback to traditional survey if voice fails
const handleVoiceError = (error) => {
  console.error('Voice interface error:', error);
  toast({
    title: 'Voice Agent Error',
    description: 'There was an issue with the voice agent. Please try the traditional survey instead.',
    variant: 'destructive',
  });
  router.push(`/feedback/choice?session=${sessionId}`);
};
2. Session State Management
Keeping track of who's been discussed, storing transcripts, and managing the overall session flow required careful state management:
javascript
// Track progress through teammates
const [teammates, setTeammates] = useState([]);
const [currentTeammateIndex, setCurrentTeammateIndex] = useState(0);
const [sessionComplete, setSessionComplete] = useState(false);

const handleTeammateComplete = async (transcript) => {
  // Mark current teammate as discussed (copy state rather than mutating it)
  const updatedTeammates = teammates.map((t, i) =>
    i === currentTeammateIndex ? { ...t, discussed: true, transcript } : t
  );
  setTeammates(updatedTeammates);

  // Check if all teammates are complete
  const completedCount = updatedTeammates.filter((t) => t.discussed).length;
  if (completedCount >= teammates.length) {
    setSessionComplete(true);
  }
};
3. Converting Conversations to Structured Data
The trickiest part was reliably extracting structured feedback from natural conversations. I use OpenAI's GPT-4 with a carefully crafted prompt:
javascript
const openaiPrompt = `
You are analyzing a voice conversation transcript to extract the natural questions and answers that were discussed.

CONTEXT:
- This is feedback about ${transcript.recipientName}
- Company industry: ${session.feedback_cycles.companies?.industry || 'Unknown'}

TRANSCRIPT:
${transcript.transcript}

TASK:
Extract the natural questions and answers from this conversation. For each meaningful topic discussed:
1. Identify the core question being addressed
2. Extract the response/answer given
3. Determine if this is better represented as a rating (1-10 scale) or text response
4. Create a clear, professional question that captures what was discussed

Return a JSON array in this format:
[
  {
    "questionText": "How would you rate Sarah's communication skills?",
    "questionType": "rating",
    "ratingValue": 7,
    "textResponse": null,
    "hasComment": true,
    "commentText": "She's very clear in meetings and follows up well."
  }
]
`;
This approach lets me capture the nuance of spoken feedback while still generating the structured data needed for analysis and reporting.
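The extraction call itself is straightforward; the part worth showing is the defensive parsing, since models sometimes wrap JSON in markdown fences. This is a sketch assuming the official `openai` npm client; `parseFeedbackItems` is my own helper, and the commented-out call is indicative of shape rather than exact:

```javascript
// Defensively parse the model's response into an array of feedback items.
function parseFeedbackItems(raw) {
  // Strip an optional ```json fence before parsing
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/\s*```$/, '')
    .trim();
  const items = JSON.parse(cleaned);
  if (!Array.isArray(items)) {
    throw new Error('Expected a JSON array of feedback items');
  }
  return items;
}

// The call looks roughly like:
// const completion = await openai.chat.completions.create({
//   model: 'gpt-4',
//   messages: [{ role: 'user', content: openaiPrompt }],
// });
// const feedback = parseFeedbackItems(completion.choices[0].message.content);
```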
What I Learned
1. Voice UI is Different from Web UI
Designing for voice required rethinking my entire user experience. Visual feedback becomes crucial when users can't see traditional UI elements. I added clear audio cues, conversation progress indicators, and easy ways to pause or restart.
2. Context is Everything
The AI coach is only as good as the context I provide. Spending time on the relationship mapping and contextual data gathering was crucial for natural conversations.
3. Fallbacks are Essential
Not everyone is comfortable with voice, and technical issues happen. Having a seamless fallback to traditional surveys was critical for user trust.
4. Processing is Key
The conversation is just the beginning - the real value comes from reliably extracting actionable insights from natural speech patterns.
Looking Forward
I'm continuing to iterate on the voice experience based on user feedback. Some areas I'm exploring:
- Custom voice training: Training AI coaches that match company culture and communication styles
- Real-time feedback: Moving beyond formal review cycles to ongoing voice-based feedback
- Integration with daily workflows: Voice feedback triggered by project completions or meeting patterns
The Bigger Picture
This voice integration represents something bigger than just a feature add - it's about removing friction from human connection in the workplace. Performance feedback should feel natural, not like homework.
By combining smart automation (knowing who you work with) with natural interaction (voice conversations), I'm moving closer to my vision of feedback that actually helps people grow without consuming everyone's time.
My advisor was right: the barrier isn't that people don't know what to say - it's that writing it down feels impossible. Voice removes that barrier entirely.
The traditional performance review might finally be getting the upgrade it desperately needed.
Want to try Candor's voice-powered feedback system? I'm currently in beta with select companies. Get in touch to learn more.
For the Technical Folks
If you're curious about implementation details, here are some key technical decisions I made:
- Architecture: Next.js React app with Supabase backend, ElevenLabs for voice, OpenAI for transcript processing
- Authentication: Supabase Auth with row-level security
- Real-time features: WebSocket connections for voice, real-time UI updates for session progress
- Data pipeline: Voice → ElevenLabs → Transcript → OpenAI → Structured feedback → Database
- Deployment: Vercel for frontend, Supabase for backend and database
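The data pipeline above can be sketched as a few composed steps. Each function here is a placeholder for the real integration (the dependency-injected shape is how I'd test it, not a claim about my exact code):

```javascript
// Voice → ElevenLabs → Transcript → OpenAI → Structured feedback → Database,
// expressed as composed async steps with injected dependencies.
async function runFeedbackPipeline(audioSession, deps) {
  const transcript = await deps.getTranscript(audioSession); // ElevenLabs transcript
  const items = await deps.extractFeedback(transcript);      // OpenAI structuring
  await deps.saveFeedback(items);                            // persist to database
  return items;
}
```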
The full implementation involves about 15 API endpoints and several React components, but the core concept is surprisingly straightforward: create context, have a conversation, extract insights, store structured data.