VoiceCast is a bridge that links voice commands to Hytale RootInteractions, enabling modders to build voice-activated casting and interaction flows.

Description

VoiceCast

You define “spells” (castables) in assets, each pointing to a RootInteraction plus one or more voice aliases (keywords). When the player says a keyword, VoiceCast triggers that interaction.


What is VoiceCast?

VoiceCast is a lightweight system that listens to a player’s voice and triggers RootInteractions when recognized keywords match your configured spell aliases.

In practice, you create a VoiceCastCastable asset with:

  • rootInteractionId (what gets executed)
  • languageIds / aliases (what words/phrases can activate it)
  • optional requirements (required item / consumed item)
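As a rough illustration of those three parts, here is a minimal Java sketch of how a castable could be modeled in memory. It is not the plugin's real schema: only rootInteractionId comes from this page, while the record name, the aliasesByLanguage map, requiredItemId, consumedItemId, and the requirementsMet helper are hypothetical names chosen for the example. The actual asset format is documented in the Wiki.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory model of a castable (NOT the real asset schema).
record CastableSketch(
        String rootInteractionId,                     // what gets executed
        Map<String, List<String>> aliasesByLanguage,  // language id -> spoken aliases
        Optional<String> requiredItemId,              // item the player must hold (assumed)
        Optional<String> consumedItemId) {            // item used up on cast (assumed)

    // Aliases configured for one language, or an empty list if none exist.
    List<String> aliasesFor(String languageId) {
        return aliasesByLanguage.getOrDefault(languageId, List.of());
    }

    // True when every configured item requirement is present in the given set.
    boolean requirementsMet(Set<String> heldItemIds) {
        boolean hasRequired = requiredItemId.map(heldItemIds::contains).orElse(true);
        boolean hasConsumable = consumedItemId.map(heldItemIds::contains).orElse(true);
        return hasRequired && hasConsumable;
    }
}
```

Keeping the requirements optional mirrors the list above: a castable with no item requirements is always allowed, and one with a required item is only allowed when that item is present.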

Modes

VoiceCast currently supports two backends:

1) Native (Experimental)

Native mode hooks directly into the game’s voice chat stream (the proximity voice chat packets) and performs offline speech-to-text using Vosk.

  • Status: Experimental
  • Availability: currently limited to pre-release builds that include built-in voice chat
  • Pros: no browser link, no Web Speech API quirks, works offline, consistent behavior server-side
  • Requirement: a Vosk model (auto-download supported)

2) Web (Legacy / optional)

Web mode exposes a small web UI where the player opens a private link in a browser. The browser captures mic audio and performs speech recognition (Web Speech API).

  • Status: available, but no longer the main focus now that native voice chat exists in pre-release
  • Pros: no model download, conceptually simple
  • Cons: requires opening a browser and depends heavily on browser support and permissions

Language support

VoiceCast supports multiple languages depending on the selected backend:

  • Native (Vosk): depends on which Vosk model you install (or auto-download). Default is English out of the box.
  • Web (Web Speech API): depends on the browser’s speech engine and locale.

If you want a language added to the default auto-download mapping, tell me which language and which Vosk model you would like as its “recommended small model”.
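To make the idea of a language-to-model mapping concrete, here is a minimal Java sketch with an English fallback. The class and method names are hypothetical, and the entries are only examples of small models from the public Vosk model list as I recall it; VoiceCast's actual default mapping may differ, so verify model names before relying on them.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical language -> Vosk model lookup (not VoiceCast's real mapping).
class VoskModelMappingSketch {
    // English is the out-of-the-box default according to this page.
    static final String DEFAULT_MODEL = "vosk-model-small-en-us-0.15";

    // Example entries only; check the Vosk model list for current names.
    static final Map<String, String> SMALL_MODELS = Map.of(
            "en", DEFAULT_MODEL,
            "de", "vosk-model-small-de-0.15",
            "fr", "vosk-model-small-fr-0.22");

    // Resolves ids such as "fr", "fr-FR" or "FR_fr"; unknown ids fall back to English.
    static String modelFor(String languageId) {
        if (languageId == null) {
            return DEFAULT_MODEL;
        }
        String primary = languageId.toLowerCase(Locale.ROOT).split("[-_]", 2)[0];
        return SMALL_MODELS.getOrDefault(primary, DEFAULT_MODEL);
    }
}
```

Falling back to the default model instead of failing keeps recognition working for unmapped languages, at the cost of recognizing them with the English model.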


How it works (high level)

Native (Experimental)

  1. VoiceCast reads incoming voice chat audio packets (pre-release voice chat).
  2. Audio is decoded server-side and sent to Vosk.
  3. The transcript is matched against your spell aliases.
  4. VoiceCast triggers the mapped RootInteraction for that player.
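Steps 3 and 4 can be sketched in Java as follows. This is one plausible matching strategy (normalize the transcript, look for whole-word aliases, prefer the longest match), not necessarily the one VoiceCast uses. The class and method names are hypothetical, and the returned string stands in for the RootInteraction id that the plugin would then trigger.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical alias matcher; VoiceCast's real matching rules may differ.
class AliasMatcherSketch {
    // Normalized alias -> rootInteractionId.
    private final Map<String, String> interactionByAlias = new LinkedHashMap<>();

    void register(String alias, String rootInteractionId) {
        String key = normalize(alias);
        if (!key.isEmpty()) {
            interactionByAlias.put(key, rootInteractionId);
        }
    }

    // Lowercases and collapses punctuation and whitespace into single spaces.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
    }

    // Returns the interaction of the longest alias found as whole words in the transcript.
    Optional<String> match(String transcript) {
        String padded = " " + normalize(transcript) + " ";
        String bestAlias = null;
        for (String alias : interactionByAlias.keySet()) {
            boolean found = padded.contains(" " + alias + " ");
            if (found && (bestAlias == null || alias.length() > bestAlias.length())) {
                bestAlias = alias;
            }
        }
        return Optional.ofNullable(bestAlias).map(interactionByAlias::get);
    }
}
```

Matching on whole words avoids triggering "fire" when the player says "campfire", and preferring the longest alias lets "ice wall" win over "ice" when both are registered.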

Web (Optional)

  1. The server exposes a small web UI (embedded web server).
  2. A player runs /voicecast to generate a private clickable link.
  3. The player opens the link in a browser and starts listening.
  4. The browser sends the recognized transcript to the server, and VoiceCast matches it against your spell aliases and triggers the mapped RootInteraction.
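The private link in step 2 could work roughly like the Java sketch below: the server creates an unguessable token, remembers which player it belongs to, and embeds it in the URL. This is an assumption about the mechanism, not VoiceCast's actual implementation. The class name, the token query parameter, and the base URL are all hypothetical.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-player link issuer (not VoiceCast's real code).
class PrivateLinkSketch {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, UUID> playerByToken = new ConcurrentHashMap<>();
    private final String baseUrl;

    PrivateLinkSketch(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Creates a random URL-safe token and binds it to the player.
    String createLink(UUID playerId) {
        byte[] bytes = new byte[24]; // 24 random bytes -> 32 Base64 characters
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        playerByToken.put(token, playerId);
        return baseUrl + "/?token=" + token;
    }

    // Looks up the player for a token sent back by the browser.
    Optional<UUID> resolve(String token) {
        if (token == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(playerByToken.get(token));
    }
}
```

With a token like this, a transcript arriving from the browser can be attributed to the right player without the browser ever knowing the player's identity. A real deployment would likely also expire tokens and serve the page over HTTPS.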

Configuration & server setup

To keep this README short and accurate, the full configuration guide lives in the Wiki, including:

  • Native setup (Vosk model auto-download, language mapping, troubleshooting)
  • Web setup (LAN/dedicated/domain/HTTPS)
  • Common issues and recommended settings

Roadmap

  • Improve native accuracy and UX (better phrase handling, better alias matching, per-language tuning)
  • Expand default model auto-download mappings (more languages)
  • Keep web mode as a fallback where native voice chat isn’t available

Bug reports

Please report bugs and unexpected behavior. When reporting, include:

  • Your server version (especially whether it’s the pre-release with voice chat)
  • Your VoiceCast config
  • Console logs (enable debug logs if needed)
  • A sample spell config (redact private info)