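Using the Browser Speech Recognition API in a React App

The SpeechRecognition interface of the browser's Web Speech API makes it fairly simple to add voice dictation to a React app. Based on real project experience, this post covers browser support, hook design, and performance/UX tips in one place.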

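1. Support and constraints

- Supported browsers: it works mainly in Chromium-based browsers (Chrome, Edge) and Safari. The constructor is usually exposed as webkitSpeechRecognition, and the unprefixed SpeechRecognition is also available in some environments. Firefox does not support it in its default configuration.

- Security context: as with microphone permission in general, it works reliably only over HTTPS (or localhost). Recognition should be started from a user gesture such as a button click.

- Session behavior: depending on the browser and OS, continuous recognition can end partway through, so restart handling in onend is needed.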
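
The first two constraints can be checked up front, before any UI is rendered. Below is a minimal sketch; checkSpeechSupport is a hypothetical helper name that does not appear in the hook later in this post, and it takes the global object as a parameter so it can also be called safely on the server.

```javascript
// Hypothetical helper (an assumption for illustration): reports whether
// speech recognition can be attempted in the given global object.
function checkSpeechSupport(globalObj) {
  if (!globalObj) {
    // No window at all, e.g. during server-side rendering.
    return { ctor: null, secure: false, usable: false };
  }
  // Prefer the unprefixed constructor, fall back to the webkit-prefixed one.
  const ctor = globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
  // isSecureContext is true on HTTPS pages and on localhost.
  const secure = globalObj.isSecureContext === true;
  return { ctor, secure, usable: Boolean(ctor) && secure };
}

// Usage: pass null on the server and window in the browser.
const support = checkSpeechSupport(typeof window === 'undefined' ? null : window);
```

Returning the separate ctor and secure fields lets the UI distinguish "unsupported browser" from "page not served over HTTPS" when showing a message.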

2. Building a custom hook (useSpeechRecognition)

import { useEffect, useMemo, useRef, useState, useCallback } from 'react';

export function useSpeechRecognition(options = {}) {
  const {
    lang = 'ko-KR',
    interim = true,
    continuous = true,
    autoRestart = true,
    throttleMs = 120,
  } = options;

  const SpeechRecognition = useMemo(() => {
    if (typeof window === 'undefined') return null;
    return window.SpeechRecognition || window.webkitSpeechRecognition || null;
  }, []);

  const supported = !!SpeechRecognition;

  const recognitionRef = useRef(null);
  const shouldRestartRef = useRef(false);
  const [isListening, setIsListening] = useState(false);
  const [error, setError] = useState(null);
  const [currentLang, setCurrentLang] = useState(lang);
  const [interimTranscript, setInterimTranscript] = useState('');
  const [finalTranscript, setFinalTranscript] = useState('');
  const interimBufferRef = useRef('');
  const finalBufferRef = useRef('');
  const throttleTimerRef = useRef(null);

  useEffect(() => {
    if (!supported) return;
    const rec = new SpeechRecognition();
    rec.interimResults = interim;
    rec.continuous = continuous;
    rec.lang = currentLang;

    rec.onstart = () => {
      setIsListening(true);
      setError(null);
    };

    rec.onresult = (e) => {
      let interimChunk = '';
      let finalChunk = '';
      for (let i = e.resultIndex; i < e.results.length; i++) {
        const res = e.results[i];
        const text = res[0].transcript;
        if (res.isFinal) {
          finalChunk += text + ' ';
        } else {
          interimChunk += text + ' ';
        }
      }
      if (finalChunk) {
        finalBufferRef.current += finalChunk;
        setFinalTranscript(finalBufferRef.current.trim());
        interimBufferRef.current = '';
        setInterimTranscript('');
      }
      if (interimChunk) {
        interimBufferRef.current = interimChunk;
        if (!throttleTimerRef.current) {
          throttleTimerRef.current = setTimeout(() => {
            setInterimTranscript(interimBufferRef.current.trim());
            throttleTimerRef.current = null;
          }, throttleMs);
        }
      }
    };

    rec.onerror = (e) => {
      setError(e.error || 'unknown-error');
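      // These errors do not clear up on retry, so stop auto-restart to avoid a restart loop.
      const fatal = ['not-allowed', 'service-not-allowed', 'audio-capture', 'bad-grammar'].includes(e.error);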
      if (fatal) {
        shouldRestartRef.current = false;
      }
    };

    rec.onend = () => {
      setIsListening(false);
      if (autoRestart && shouldRestartRef.current) {
        try { rec.start(); } catch (_) {}
      }
    };

    recognitionRef.current = rec;
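    return () => {
      // Detach handlers first so the stop() below cannot trigger an auto-restart.
      try { rec.onresult = null; rec.onend = null; rec.onerror = null; rec.onstart = null; } catch (_) {}
      try { rec.stop(); } catch (_) {}
      // onend is detached, so clear the pending throttle timer and the listening flag here.
      if (throttleTimerRef.current) {
        clearTimeout(throttleTimerRef.current);
        throttleTimerRef.current = null;
      }
      shouldRestartRef.current = false;
      setIsListening(false);
      recognitionRef.current = null;
    };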
  }, [SpeechRecognition, supported, interim, continuous, autoRestart, currentLang, throttleMs]);

  const start = useCallback(() => {
    if (!supported || !recognitionRef.current) return;
    shouldRestartRef.current = true;
    setError(null);
    try { recognitionRef.current.start(); } catch (_) {}
  }, [supported]);

  const stop = useCallback(() => {
    shouldRestartRef.current = false;
    if (!supported || !recognitionRef.current) return;
    try { recognitionRef.current.stop(); } catch (_) {}
  }, [supported]);

  const toggle = useCallback(() => {
    if (isListening) stop(); else start();
  }, [isListening, start, stop]);

  const reset = useCallback(() => {
    interimBufferRef.current = '';
    finalBufferRef.current = '';
    setInterimTranscript('');
    setFinalTranscript('');
    setError(null);
  }, []);

  const setLang = useCallback((next) => {
    setCurrentLang(next);
  }, []);

  return {
    supported,
    isListening,
    error,
    interimTranscript,
    finalTranscript,
    start,
    stop,
    toggle,
    reset,
    setLang,
    lang: currentLang,
  };
}

Key points: the hook checks both window.SpeechRecognition and window.webkitSpeechRecognition. State updates for interim results are throttled to prevent a flood of re-renders. Auto-restart in onend mitigates the early-termination problem on mobile and Safari.
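
The result handling inside onresult can also be pulled out into a pure function, which makes it easy to unit test without a browser. This is a sketch only: splitResults is a hypothetical name, and the event object is assumed to have the same shape the hook reads (resultIndex, results[i][0].transcript, results[i].isFinal).

```javascript
// Sketch: split a SpeechRecognitionEvent-like object into final and interim text.
// Only results from event.resultIndex onward are new in this event.
function splitResults(event) {
  let finalText = '';
  let interimText = '';
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    // result[0] is the top-ranked alternative for this result.
    const text = result[0].transcript;
    if (result.isFinal) {
      finalText += text + ' ';
    } else {
      interimText += text + ' ';
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

Inside the hook, onresult would then only need to append finalText to the final buffer and hand interimText to the throttled state update.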

3. Example component: language selection + start/stop + copy

import React, { useState } from 'react';
import { useSpeechRecognition } from './useSpeechRecognition';

const LANGS = [
  { code: 'ko-KR', label: '한국어' },
  { code: 'en-US', label: 'English (US)' },
  { code: 'ja-JP', label: '日本語' },
];

export default function VoiceDemo() {
  const [selected, setSelected] = useState('ko-KR');
  const {
    supported,
    isListening,
    error,
    interimTranscript,
    finalTranscript,
    start,
    stop,
    toggle,
    reset,
    setLang,
    lang,
  } = useSpeechRecognition({ lang: selected, interim: true, continuous: true, autoRestart: true });

  const onChangeLang = (e) => {
    const next = e.target.value;
    setSelected(next);
    setLang(next);
  };

  const copyAll = async () => {
    try {
      await navigator.clipboard.writeText((finalTranscript + ' ' + interimTranscript).trim());
      alert('Copied to the clipboard.');
    } catch (e) {
      console.error(e);
      alert('Copy failed.');
    }
  };

  if (!supported) {
    return (
      <div>
        <p>This browser does not support the Speech Recognition API.</p>
        <p>Alternatives: use a WebAssembly or server-based STT (e.g., OpenAI Whisper, Google Cloud STT, Azure Speech), or open this page in a supported browser.</p>
      </div>
    );
  }

  return (
    <div style={{ maxWidth: 720, margin: '0 auto', fontFamily: 'system-ui, sans-serif' }}>
      <h2>Voice dictation demo</h2>

      <label>
        Language: <select value={lang} onChange={onChangeLang}>
          {LANGS.map(l => (
            <option key={l.code} value={l.code}>{l.label}</option>
          ))}
        </select>
      </label>

      <div style={{ marginTop: 12 }}>
        <button onClick={toggle} aria-pressed={isListening}>{isListening ? 'Stop' : 'Start'}</button>
        <button onClick={reset} style={{ marginLeft: 8 }}>Reset</button>
        <button onClick={copyAll} style={{ marginLeft: 8 }}>Copy</button>
      </div>

      <div style={{ marginTop: 16, padding: 12, border: '1px solid #ddd', borderRadius: 6 }}>
        <p><strong>Final text</strong></p>
        <p>{finalTranscript || 'Recognized text will appear here.'}</p>
        <p style={{ opacity: 0.7, fontStyle: 'italic' }}>{interimTranscript}</p>
      </div>

      {error && (
        <p style={{ color: 'crimson' }}>Error: {String(error)}</p>
      )}

      <p style={{ marginTop: 8, fontSize: 12, color: '#666' }}>
        Tip: start from a button click so the permission prompt appears reliably. After changing the language, stop and start again to apply it.
      </p>
    </div>
  );
}

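Accessibility tip: set aria-pressed on the toggle button, and convey state changes as text as well as visually. Where possible, make sure the controls are keyboard accessible.

4. Safe handling for Next.js/SSR

There is no window in an SSR environment, so load the component as client-only.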

// In a pages route (in the app router, place this in a Client Component marked 'use client')
import dynamic from 'next/dynamic';
const VoiceDemo = dynamic(() => import('../components/VoiceDemo'), { ssr: false });
export default function Page() {
  return <VoiceDemo />;
}

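The hook already checks typeof window, so it does not crash during server rendering. However, supported is false on the server and becomes true on the client, which can cause a visible flicker and a hydration mismatch on the first render. Loading the component with ssr: false as above avoids this and is the recommended approach.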

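5. UX and performance tips

- Minimize state updates: throttle interimTranscript updates to reduce re-renders. Accumulate the growing transcript in a useRef and call setState only to reflect it on screen.

- Auto-restart: recognition can end frequently on mobile and Safari, so set up a conditional restart in onend.

- Language codes: set lang to a BCP 47 tag (ko-KR, en-US, etc.) that matches the speaker's language, since a mismatched language hurts recognition accuracy. Expose a language switcher in the UI.

- User gesture: starting automatically may be blocked. Call start() from a button click.

- Network/service limits: quality varies between browser implementations. If accuracy is critical, offer server-based STT as an option.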
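
An unconditional restart in onend can spin in a tight loop if the session keeps ending immediately. One way to bound it is a small rate limiter, sketched below. createRestartPolicy and its maxRestarts/windowMs options are hypothetical names and defaults chosen for illustration, not part of the hook above.

```javascript
// Sketch: allow at most `maxRestarts` restarts within any `windowMs` window.
function createRestartPolicy({ maxRestarts = 5, windowMs = 10000 } = {}) {
  let timestamps = [];
  return {
    // Returns true (and records the attempt) if a restart is allowed at time `now`.
    shouldRestart(now = Date.now()) {
      // Drop attempts that have fallen out of the sliding window.
      timestamps = timestamps.filter((t) => now - t < windowMs);
      if (timestamps.length >= maxRestarts) return false;
      timestamps.push(now);
      return true;
    },
    // Forget past attempts, e.g. when the user presses Start manually.
    reset() {
      timestamps = [];
    },
  };
}
```

In onend, the restart condition would then become something like autoRestart && shouldRestartRef.current && policy.shouldRestart(), with policy.reset() called from start().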

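6. Error and debugging checklist

- not-allowed: microphone permission was denied. The user has to allow it in the browser settings.

- audio-capture: there is no input device, or it cannot be accessed. Check the external microphone.

- no-speech: timed out on silence. Check microphone sensitivity and how long the user speaks.

- service-not-allowed: a browser policy or flag issue. Update to the latest version or check the settings.

When an error occurs that will not resolve without user action, stop auto-restart and show the user a guidance message and a retry button.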
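
The checklist above can be turned into a small lookup that drives both the guidance message and the decision to stop auto-restart. This is a sketch: describeSpeechError and the message wording are hypothetical, and treating audio-capture as fatal is an assumption of this example.

```javascript
// Sketch: map SpeechRecognition error codes to a user-facing message and a
// `fatal` flag (fatal = do not auto-restart until the user acts).
const SPEECH_ERRORS = new Map([
  ['not-allowed', { fatal: true, message: 'Microphone permission was denied. Allow it in the browser settings.' }],
  ['service-not-allowed', { fatal: true, message: 'The browser blocked the recognition service. Update the browser or check its settings.' }],
  // Assumption: a missing or inaccessible microphone will not fix itself on retry.
  ['audio-capture', { fatal: true, message: 'No microphone could be accessed. Check the input device.' }],
  ['no-speech', { fatal: false, message: 'No speech was detected. Check the microphone and try speaking again.' }],
]);

function describeSpeechError(code) {
  const known = SPEECH_ERRORS.get(code);
  if (known) return known;
  // Unknown codes are reported as-is and treated as recoverable.
  return { fatal: false, message: 'Speech recognition error: ' + (code || 'unknown-error') };
}
```

A component could render describeSpeechError(error).message next to a retry button, and the hook's onerror handler could use the fatal flag in place of a hard-coded list.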

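7. Extension ideas

- Voice commands: parse keywords (e.g., "reset") out of the final text and trigger app actions.

- Punctuation: insert periods at regular lengths, or provide shortcuts and UI for inserting punctuation.

- Server fallback: keep a server STT API such as Whisper as a backup path for better accuracy, and switch to it automatically when the browser lacks support.

8. Wrapping up

The Web Speech Recognition API is practical enough for simple dictation and voice commands. With the hook and component pattern above, you can build a safe and smooth experience in React. In production, be sure to handle browser support, permissions, auto-restart, and performance optimization.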
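
As a rough sketch of the voice command idea, the function below matches a keyword at the end of the final text. matchCommand and the shape of the command list are hypothetical, and matching only the trailing words is a design assumption meant to avoid firing when the keyword appears mid-sentence.

```javascript
// Sketch: return the action whose keyword ends the recognized text, or null.
// `commands` is an array of { keyword, action } pairs (a hypothetical shape).
function matchCommand(finalText, commands) {
  // Lowercase and strip trailing whitespace/punctuation some engines append.
  const normalized = finalText.toLowerCase().replace(/[\s.,!?]+$/, '');
  for (const { keyword, action } of commands) {
    if (normalized.endsWith(keyword.toLowerCase())) {
      return action;
    }
  }
  return null;
}

// Usage: the action can be any value the app understands.
const commands = [{ keyword: 'reset', action: 'RESET' }];
const action = matchCommand('Please write this down. Reset.', commands);
```

In the demo component, a matching action could be dispatched from an effect that watches finalTranscript, for example calling reset() when the action is 'RESET'.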

