Voice Cast

By S3B4S5C

VoiceCast is a bridge that links voice commands to Hytale RootInteractions, enabling modders to build voice-activated casting and interaction flows.

VoiceCast

You define “spells” (castables) in assets, each pointing to a RootInteraction plus one or more voice aliases (keywords). When the player says a keyword, VoiceCast triggers that interaction.

What is VoiceCast?

VoiceCast is a lightweight system that listens to a player’s voice and triggers RootInteractions when recognized keywords match your configured spell aliases.

In practice, you create a VoiceCastCastable asset with:

rootInteractionId (what gets executed)
languageIds / aliases (what words/phrases can activate it)
optional requirements (required item / consumed item)
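
To make the shape of a castable concrete, here is a minimal Python sketch of the data it carries. It is not the actual asset schema: the class layout, the snake_case field names, the `can_cast` helper, and every `Example_*` id are assumptions made for illustration. Only the concepts (a root interaction id, per-language aliases, a required item, and a consumed item) come from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoiceCastCastable:
    """Hypothetical in-memory view of a castable; the real asset schema may differ."""

    root_interaction_id: str              # what gets executed
    aliases: Dict[str, List[str]]         # language id -> spoken keywords
    required_item: Optional[str] = None   # must be present to cast (assumed semantics)
    consumed_item: Optional[str] = None   # used up on cast (assumed semantics)

    def can_cast(self, inventory: Dict[str, int]) -> bool:
        """Return True when every configured item requirement is present."""
        for item in (self.required_item, self.consumed_item):
            if item is not None and inventory.get(item, 0) < 1:
                return False
        return True


# Placeholder ids, not real Hytale or VoiceCast identifiers.
fireball = VoiceCastCastable(
    root_interaction_id="Example_Fireball",
    aliases={"en": ["fireball", "fire ball"]},
    required_item="Example_Staff",
    consumed_item="Example_Mana_Crystal",
)
```

With this sketch, `fireball.can_cast({"Example_Staff": 1})` is false because the consumed item is missing from the inventory.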

Modes

VoiceCast currently supports two backends:

1) Native (Experimental)

Native mode hooks directly into the game’s voice chat stream (the proximity voice chat packets) and performs offline speech-to-text using Vosk.

Status: Experimental
Availability: currently limited to pre-release builds that include built-in voice chat
Pros: no browser link, no Web Speech API quirks, works offline, consistent behavior server-side
Requirement: a Vosk model (auto-download supported)

2) Web (Legacy / optional)

Web mode exposes a small web UI where the player opens a private link in a browser. The browser captures mic audio and performs speech recognition (Web Speech API).

Status: available, but no longer the primary focus now that native voice chat exists in pre-release builds
Pros: zero model download, conceptually simple
Cons: requires opening a browser and depends heavily on browser support and permissions

Language support

VoiceCast supports multiple languages depending on the selected backend:

Native (Vosk): depends on which Vosk model you install (or auto-download). The default is English.
Web (Web Speech API): depends on the browser’s speech engine and locale.

If you want a language added to the default auto-download mapping, tell me which language and which Vosk model you want as the “recommended small model”.
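
As a rough illustration of what a language-to-model mapping with an English fallback could look like, here is a small Python sketch. The dictionary contents, the `resolve_model` helper, and the locale-normalization rule are assumptions, not VoiceCast's actual defaults. The model names are meant to match entries on the public Vosk model list and should be verified there before use.

```python
# Hypothetical mapping; these are NOT VoiceCast's shipped defaults.
RECOMMENDED_SMALL_MODELS = {
    "en": "vosk-model-small-en-us-0.15",
    "de": "vosk-model-small-de-0.15",
    "es": "vosk-model-small-es-0.42",
}
DEFAULT_LANGUAGE = "en"


def resolve_model(language_id: str) -> str:
    """Map a language id such as 'es', 'es-ES', or 'EN_us' to a model name.

    Unknown languages fall back to the English model.
    """
    key = language_id.strip().lower().replace("_", "-").split("-")[0]
    return RECOMMENDED_SMALL_MODELS.get(
        key, RECOMMENDED_SMALL_MODELS[DEFAULT_LANGUAGE]
    )
```

Falling back to English for unmapped languages mirrors the out-of-the-box default described above.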

How it works (high level)

Native (Experimental)

VoiceCast reads incoming voice chat audio packets (pre-release voice chat).
Audio is decoded server-side and sent to Vosk.
The transcript is matched against your spell aliases.
VoiceCast triggers the mapped RootInteraction for that player.
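
The matching step in the flow above can be sketched in Python as follows. This is one plausible approach, assuming whole-word phrase matching with a longest-alias-wins rule; VoiceCast's real matcher may work differently. The function names and `Example_*` ids are hypothetical.

```python
import re
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_spell(transcript: str, spells: Dict[str, List[str]]) -> Optional[str]:
    """Return the interaction id whose longest alias occurs in the transcript."""
    padded = f" {normalize(transcript)} "
    best_id, best_len = None, 0
    for interaction_id, aliases in spells.items():
        for alias in aliases:
            phrase = normalize(alias)
            # Pad with spaces so "ice" does not match inside "nice".
            if phrase and f" {phrase} " in padded and len(phrase) > best_len:
                best_id, best_len = interaction_id, len(phrase)
    return best_id


# Placeholder spell table: interaction id -> aliases.
SPELLS = {
    "Example_Fireball": ["fireball", "fire ball"],
    "Example_Ice_Shard": ["ice", "ice shard"],
}
```

Preferring the longest matching alias keeps a short keyword from shadowing a longer phrase that contains it.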

Web (Optional)

The server exposes a small web UI (embedded web server).
A player runs /voicecast to generate a private clickable link.
The player opens the link in a browser and starts listening.
The browser sends the recognized transcript back to the server, where VoiceCast matches it against your spell aliases and triggers the mapped RootInteraction.
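
The private link in the web flow can be pictured as an unguessable token tied to one player. The Python sketch below shows that idea only. The `/listen` path, the `token` query parameter, the in-memory store, and the localhost URL are all assumptions, not VoiceCast's actual endpoints.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store: token -> player id.
_sessions: Dict[str, str] = {}


def create_private_link(player_id: str, base_url: str) -> str:
    """Issue an unguessable token for the player and return the link to open."""
    token = secrets.token_urlsafe(32)
    _sessions[token] = player_id
    return f"{base_url.rstrip('/')}/listen?token={token}"


def resolve_player(token: str) -> Optional[str]:
    """Look up which player a token belongs to, or None if it is unknown."""
    return _sessions.get(token)


# Placeholder player id and base URL.
link = create_private_link("player-123", "http://localhost:8080/")
```

Binding each token to a single player is what lets the server attribute an incoming transcript to the right caster. A real deployment would also need token expiry and HTTPS, which this sketch leaves out.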

Configuration & server setup

To keep this README short and accurate, the full configuration guide lives in the Wiki, including:

Native setup (Vosk model auto-download, language mapping, troubleshooting)
Web setup (LAN/dedicated/domain/HTTPS)
Common issues and recommended settings

Roadmap

Improve native accuracy and UX (better phrase handling, better alias matching, per-language tuning)
Expand default model auto-download mappings (more languages)
Keep web mode as a fallback where native voice chat isn’t available

Bug reports

Please report bugs and weird behavior. When reporting, include:

Your server version (especially whether it’s the pre-release with voice chat)
Your VoiceCast config
Console logs (enable debug logs if needed)
A sample spell config (redact private info)