VoiceCast
VoiceCast is a bridge that links voice commands to Hytale RootInteractions, letting modders build voice-activated casting and interaction flows.
You define “spells” (castables) in assets, each pointing to a RootInteraction plus one or more voice aliases (keywords). When the player says a keyword, VoiceCast triggers that interaction.
What is VoiceCast?
VoiceCast is a lightweight system that listens to a player’s voice and triggers RootInteractions when recognized keywords match your configured spell aliases.
In practice: you create a VoiceCastCastable asset with:
- rootInteractionId (what gets executed)
- languageIds / aliases (what words or phrases can activate it)
- optional requirements (required item / consumed item)
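To make the shape of a castable more concrete, here is a minimal Java sketch that models the fields listed above as a plain record. It is an illustration only: the real asset schema lives in the Wiki, and the names CastableSketch, requiredItemId, consumedItemId, and requirementsMet are hypothetical, as are the example IDs.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of a castable; NOT the real VoiceCastCastable asset schema.
record CastableSketch(
        String rootInteractionId,          // what gets executed
        List<String> aliases,              // words/phrases that can activate it
        Optional<String> requiredItemId,   // hypothetical name: item the player must have
        Optional<String> consumedItemId) { // hypothetical name: item used up on cast

    // Compact constructor: reject castables that could never be triggered.
    CastableSketch {
        if (rootInteractionId == null || rootInteractionId.isBlank()) {
            throw new IllegalArgumentException("rootInteractionId is required");
        }
        if (aliases == null || aliases.isEmpty()) {
            throw new IllegalArgumentException("at least one alias is required");
        }
        aliases = List.copyOf(aliases); // defensive, immutable copy
    }

    // True when every configured item requirement is among the player's item IDs.
    boolean requirementsMet(Set<String> playerItemIds) {
        return requiredItemId.map(playerItemIds::contains).orElse(true)
                && consumedItemId.map(playerItemIds::contains).orElse(true);
    }

    public static void main(String[] args) {
        CastableSketch fireball = new CastableSketch(
                "Example_Fireball",
                List.of("fireball", "fire ball"),
                Optional.of("Example_Wand"),
                Optional.empty());
        System.out.println(fireball.requirementsMet(Set.of("Example_Wand"))); // true
        System.out.println(fireball.requirementsMet(Set.of()));               // false
    }
}
```

Keeping the item requirements optional mirrors the description above, where only the interaction ID and the aliases are mandatory.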
Modes
VoiceCast currently supports two backends:
1) Native (Experimental)
Native mode hooks directly into the game’s voice chat stream (the proximity voice chat packets) and performs offline speech-to-text using Vosk.
- Status: Experimental
- Availability: Exclusive (for now) to the current pre-release builds that include built-in voice chat
- Pros: no browser link, no Web Speech API quirks, works offline, consistent behavior server-side
- Requirement: a Vosk model (auto-download supported)
2) Web (Legacy / optional)
Web mode exposes a small web UI where the player opens a private link in a browser. The browser captures mic audio and performs speech recognition (Web Speech API).
- Status: available, but no longer the main focus now that pre-release builds include native voice chat
- Pros: zero model download, conceptually simple
- Cons: requires opening a browser and depends heavily on browser support and permissions
Language support
VoiceCast supports multiple languages depending on the selected backend:
- Native (Vosk): depends on which Vosk model you install (or auto-download). Default is English out of the box.
- Web (Web Speech API): depends on the browser’s speech engine and locale.
If you want a language added to the default auto-download mapping, tell me which language + which Vosk model you want as the “recommended small model”.
How it works (high level)
Native (Experimental)
- VoiceCast reads incoming voice chat audio packets (pre-release voice chat).
- Audio is decoded server-side and sent to Vosk.
- The transcript is matched against your spell aliases.
- VoiceCast triggers the mapped RootInteraction for that player.
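The matching step can be pictured with a small Java sketch. This is not VoiceCast's actual matcher: it assumes a simple strategy (lowercase the transcript, strip punctuation, then look for each alias as a whole word or phrase), and the class name AliasMatcherSketch and the example interaction IDs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of transcript-to-alias matching; not VoiceCast's real code.
final class AliasMatcherSketch {

    // Lowercases the text and turns every run of non-letter, non-digit
    // characters (punctuation, whitespace) into a single space.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
    }

    // Returns the rootInteractionId of the first spell (in map order) that has
    // an alias occurring in the transcript as a whole word or phrase.
    static Optional<String> match(String transcript,
                                  Map<String, List<String>> aliasesByInteraction) {
        String padded = " " + normalize(transcript) + " ";
        for (Map.Entry<String, List<String>> entry : aliasesByInteraction.entrySet()) {
            for (String alias : entry.getValue()) {
                String needle = normalize(alias);
                if (!needle.isEmpty() && padded.contains(" " + needle + " ")) {
                    return Optional.of(entry.getKey());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, List<String>> spells = new LinkedHashMap<>();
        spells.put("Example_Fireball", List.of("fireball", "fire ball"));
        spells.put("Example_Heal", List.of("heal me"));

        System.out.println(match("cast FIREBALL now!", spells)); // Optional[Example_Fireball]
        System.out.println(match("please heal me", spells));     // Optional[Example_Heal]
        System.out.println(match("firewall", spells));           // Optional.empty
    }
}
```

Padding both strings with spaces keeps "firewall" from matching "fireball" by accident, and listing both "fireball" and "fire ball" as aliases covers the two ways a speech engine might transcribe the same word.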
Web (Optional)
- The server exposes a small web UI (embedded web server).
- A player runs /voicecast to generate a private clickable link.
- The player opens the link in a browser and starts listening.
- VoiceCast matches the transcript and triggers the mapped RootInteraction.
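One way to think about the private link is as an unguessable token that the server maps back to a player. The Java sketch below shows that idea under stated assumptions: the class PrivateLinkSketch, the /listen path, the token query parameter, and the base URL are all hypothetical and are not taken from VoiceCast itself.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-player private links; not VoiceCast's real code.
final class PrivateLinkSketch {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, UUID> playerByToken = new ConcurrentHashMap<>();
    private final String baseUrl; // assumed value, e.g. "http://localhost:8080"

    PrivateLinkSketch(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Creates a random URL-safe token, remembers its owner, and returns the link.
    String createLink(UUID playerId) {
        byte[] bytes = new byte[24];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        playerByToken.put(token, playerId);
        return baseUrl + "/listen?token=" + token; // path and parameter are assumptions
    }

    // Looks up which player a token belongs to; empty for unknown tokens.
    Optional<UUID> resolve(String token) {
        return token == null ? Optional.empty()
                             : Optional.ofNullable(playerByToken.get(token));
    }

    public static void main(String[] args) {
        PrivateLinkSketch links = new PrivateLinkSketch("http://localhost:8080");
        UUID player = UUID.randomUUID();
        String link = links.createLink(player);
        String token = link.substring(link.indexOf("token=") + "token=".length());
        System.out.println(links.resolve(token).equals(Optional.of(player))); // true
        System.out.println(links.resolve("not-a-real-token").isPresent());    // false
    }
}
```

Because the token identifies the player, transcripts arriving from the browser can be attributed to the right person without any extra login step. A real deployment would also want tokens to expire.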
Configuration & server setup
To keep this README short and accurate, the full configuration guide lives in the Wiki, including:
- Native setup (Vosk model auto-download, language mapping, troubleshooting)
- Web setup (LAN/dedicated/domain/HTTPS)
- Common issues and recommended settings
Roadmap
- Improve native accuracy and UX (better phrase handling, better alias matching, per-language tuning)
- Expand default model auto-download mappings (more languages)
- Keep web mode as a fallback where native voice chat isn’t available
Bug reports
Please report bugs and weird behavior. When reporting, include:
- Your server version (especially whether it’s the pre-release with voice chat)
- Your VoiceCast config
- Console logs (enable debug logs if needed)
- A sample spell config (redact private info)


