C2ME OpenCL Acceleration Module

Experimental C2ME addon that provides hardware accelerated world generation through OpenCL. Requires the base C2ME mod.
It is strongly recommended to install ScalableLux, because lighting can easily become a bottleneck.

Note

This mod requires Java 25 to function correctly, even on versions before 26.1.

World generation should have full vanilla parity in vanilla worldgen, with one exception:
Biome borders may get shifted by one or two blocks in very rare cases due to the vanilla implementation being order-dependent.

Usual worldgen non-determinism applies.

Currently only the noise and biome stages are implemented.
The expected performance uplift is 80+% in the vanilla overworld when CPU-bound. Performance will vary depending on seeds, datapacks, etc.

Also, worldgen can now be affected by GPU driver bugs. Please back up your worlds before using this on existing worlds.
Some worldgen mods are known to fail catastrophically. See the Mod compatibility section for details.

Platform compatibility matrix

  • Supported: known to be fully working in general
  • Partial: known to be working with some big caveats
  • Unsupported: known to be not working at all
  • N/A: Not applicable because the combination doesn't exist.

Vendor | Generation | Driver | Windows | Linux
NVIDIA | Maxwell and beyond | Proprietary and Open | Supported | Partial¹
NVIDIA | Kepler | Proprietary | Unknown | Supported
NVIDIA | Older cards | Any | Unknown | Unknown
NVIDIA | nouveau supported GPUs | Rusticl on nouveau | N/A | Unknown
Intel | Gen9, Gen9.5⁵ | Official² | Partial³ | Partial⁴
Intel | Gen11, Gen12, Gen12.5⁶ | Official² | Unsupported¹² | Unsupported¹²
Intel | Gen12.7, Xe2, Xe3 and beyond⁷ | Official² | Partial³ | Partial⁴
Intel | Older graphics | Official² | Unknown | Unknown
Intel | iris supported GPUs | Rusticl on iris | N/A | Unknown
AMD | RDNA1 and beyond¹⁴ | Official⁸ | Supported⁹ | Supported
AMD | GCN | Official¹⁰ | Unsupported¹³ | Unsupported¹³
AMD | radeonsi supported GPUs | Rusticl on radeonsi | N/A | Partial¹¹
Qualcomm | Any | Official | Unsupported¹² | Unsupported¹²
Apple | Any | Official | Unsupported¹² | Unsupported¹²

  • ¹ The driver is known to hang after a while. Driver branch 535 LTS seems to work fine.
  • ² The official driver package on Windows. Gen9 and Gen9.5 need up-to-date drivers:
    Gen9: https://www.intel.com/content/www/us/en/download/762755/intel-6th-gen-processor-graphics-windows.html
    Gen9.5: https://www.intel.com/content/www/us/en/download/776137/intel-7th-10th-gen-processor-graphics-windows.html
    For Linux: https://github.com/intel/compute-runtime
  • ³ GPU is known to crash on pretty much all non-vanilla worldgen. Your mileage may vary.
  • ⁴ GPU is known to crash with some complex worldgen datapacks, such as Terralith. Your mileage may vary.
  • ⁵ Gen9 and Gen9.5 are the integrated graphics in 6th-9th gen Core processors and 10th gen non-G-series Core processors.
  • ⁶ Gen11, Gen12 and Gen12.5 here are the integrated graphics in 10th gen G-series Core processors and 11th-14th gen Core processors, plus Arc DG1 and Arc A-series.
  • ⁷ Gen12.7 refers to Meteor Lake and Arrow Lake integrated graphics.
    In practice, this row covers integrated graphics in Core Ultra 100 series and above, plus Battlemage and newer dedicated graphics.
  • ⁸ The official driver package on Windows. The ROCm runtime on Linux.
  • ⁹ Driver version 26.5.1 is known to always crash. Existing installations upgraded to 26.6.1 may also crash.
    If you are experiencing crashes on 26.6.1, it is recommended to run DDU (Display Driver Uninstaller) and then do a fresh installation.
  • ¹⁰ The official driver package on Windows. The AMDGPU-Pro runtime on Linux.
  • ¹¹ The Mesa 26.1.x branch is known to work on RDNA3/4. Anything can happen with any hardware combination, including corrupted worldgen. Your mileage may vary.
  • ¹² Missing FP64 support
  • ¹³ Driver crashes
  • ¹⁴ Integrated graphics in the Ryzen 7000 and 9000 series are not included. They are too slow for this task.

Any hardware not listed here is in Unknown status. Feel free to test other hardware configurations that meet the minimum requirements detailed below.

Minimum hardware requirements

  • a working OpenCL 1.2+ driver
  • cl_khr_fp64 support (fp64 support)
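
As a rough way to check these two requirements yourself, the sketch below takes the "Device Version" and "Device Extensions" strings that a tool such as clinfo reports and tests them. The function names are made up for illustration and this is not how the mod itself performs the check; it only assumes the standard OpenCL formats (a version string starting with "OpenCL <major>.<minor>" and a space-separated extension list).

```python
import re


def parse_opencl_version(version_string):
    """Extract (major, minor) from a string such as 'OpenCL 3.0 <vendor info>'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum_requirements(version_string, extensions_string):
    """Hypothetical helper: True if the device reports OpenCL 1.2+ and cl_khr_fp64."""
    version_ok = parse_opencl_version(version_string) >= (1, 2)
    # Extensions are reported as a single space-separated string.
    fp64_ok = "cl_khr_fp64" in extensions_string.split()
    return version_ok and fp64_ok
```

A device that passes this check can still fail for other reasons (see the compatibility matrix), so treat it as a first filter only.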

Nice to have things

  • a working OpenCL 3.0 driver
  • cl_khr_device_uuid for stable device matching
  • CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE for optimal performance. Not present on AMD GPUs
  • cl_khr_priority_hints and cl_khr_throttle_hints for queue priority. Only known to be present on Intel GPUs
  • Non-uniform workgroups, not present on Nvidia GPUs and some AMD ones

Performance expectations using Chunky

For 1200+ cps targets in vanilla overworld:

CPUs: modern mid-range desktops (9700X, 9800X3D, 245K)
GPUs:

  • Nvidia GTX 1060 or greater
  • AMD Radeon RX 6500 XT or greater
  • Intel Arc B570 or greater (there are no weaker supported GPUs from Intel, so there's that)

For 2500+ cps targets in vanilla overworld:

CPUs: modern flagship desktops (9950X, 9950X3D, 285K, 270K+)
GPUs:

  • Nvidia GTX 1080 Ti or higher, RTX 4060 Ti or higher
  • AMD Radeon RX 7600 XT or greater
  • Intel Arc B570 or greater (there are still no weaker supported GPUs from Intel)

Usage

Windows

  1. Install Fabric
  2. Install the mod
  3. Done

Linux

Since most Linux distributions do not ship OpenCL out of the box, you'll have to install it manually.

Running in Flatpak is an unsupported configuration

Flatpak currently only supports Rusticl (and probably NVIDIA) as an OpenCL runtime, and Rusticl support is very rough. See the compatibility matrix above.

For Nvidia users

Usually the nvidia driver package your distro provides includes the OpenCL driver. You should be able to just use it. If it does not, see the following distro-specific setups.

Debian / Ubuntu

All vendors: install ocl-icd-opencl-dev
Nvidia: install nvidia-driver-full package
AMD: install rocm-opencl-icd package
Intel: install intel-opencl-icd for Gen12 and above or intel-opencl-icd-legacy for older iGPUs (not available on Debian 13 though, you'll have to compile them)

Arch Linux

https://wiki.archlinux.org/title/General-purpose_computing_on_graphics_processing_units#OpenCL
All vendors: install ocl-icd
Nvidia: install opencl-nvidia
AMD: install rocm-opencl-runtime
Intel: install intel-compute-runtime for Gen12 and above or intel-compute-runtime-legacy from AUR for older iGPUs

Fedora

All vendors: install ocl-icd-devel
Nvidia: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/fedora.html
AMD: install rocm-opencl
Intel: install intel-opencl for Xe

Using Rusticl (very experimental)

This requires Mesa 26.1 or newer, with both Rusticl and its fp64 support enabled.

Compatibility

Datapack compatibility

This feature is guaranteed to work with datapacks that can be loaded with vanilla. For example:

  • Stardust Labs datapacks (Terralith, Incendium, …)
  • Tectonic
  • CliffTree
  • … more

Mod compatibility

Most non-worldgen mods should work.

For worldgen mods:

  • Mods that repackage datapacks (that is, the mod file still works as a datapack when renamed to .zip): see Datapack compatibility.
  • Tectonic as a mod works
  • Mods that use custom density functions do NOT work for now (such as Enderscape).
  • Mods that roll their own entire world generator do NOT work and probably never will without a lot of work (such as Big Globe).
  • Some other special cases:
    • Biomes O' Plenty: causes biome placement to fail completely
    • TerraBlender: also causes biome placement to fail completely

Known issues:

  • Shader compilation is known to take a while, depending on the datapack used.
  • Extra memory usage outside of heap is expected for shader compilation.
  • PoCL with CPU backend will almost certainly crash even if pocl is blacklisted in the config. The solution is to remove it entirely.
  • Datapacks referencing the minecraft:beardifier density function directly can have slight errors in terrain shape. There are no plans to fix this, as vanilla isn't affected and fixing it would halve the GPU throughput.

Tuning recommendations, for people that just want worldgen to go fast

Credit goes to skillnoob_ on discord.

Mods:

  • ScalableLux (Light Engine Optimization, bottleneck in high performance chunk generation)
  • Lithium (General Optimization mod for various things)
  • FerriteCore (Memory usage Improvements)
  • Structure Layout Optimizer (Makes Structures generate faster)
  • zFastNoise (speeds up noise and surface builder in worldgen)

Java/JVM flags:

  • -XX:+UseCompactObjectHeaders -Dchunky.maxWorkingCount=768 (The -Dchunky.maxWorkingCount=768 argument is only relevant if you are using chunky).
  • Use -XX:+UseZGC if you are allocating more than 16GB of memory, otherwise use -XX:+UseG1GC or -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational.
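
The flag choice above can be written down as a small helper. This is only a sketch of the rule of thumb in this section: the function name and its parameters are invented for illustration, and it picks G1GC for the 16GB-or-less case even though generational Shenandoah is listed as an equally valid alternative.

```python
def suggest_jvm_flags(heap_gb, using_chunky=False):
    """Hypothetical helper that assembles the JVM flags recommended above."""
    flags = ["-XX:+UseCompactObjectHeaders"]
    if heap_gb > 16:
        # More than 16GB allocated: ZGC is the recommendation.
        flags.append("-XX:+UseZGC")
    else:
        # Otherwise G1GC; generational Shenandoah is the listed alternative.
        flags.append("-XX:+UseG1GC")
    if using_chunky:
        # Only relevant when pregenerating with Chunky.
        flags.append("-Dchunky.maxWorkingCount=768")
    return flags


print(" ".join(suggest_jvm_flags(heap_gb=24, using_chunky=True)))
```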

Then in the c2me.toml in your config folder you can change the globalExecutorParallelism = "default" option to your thread count or slightly below.
For example if you have a 16 thread CPU you'd need to change it to globalExecutorParallelism = 16.
You can also enable gcFreeChunkSerializer = true in the config, which can increase chunk gen performance.
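
If you are unsure of your thread count, a short script can print a candidate line for c2me.toml. This is a sketch only: the helper name and the `reserve` parameter (how many threads to leave free, matching the "slightly below" advice) are assumptions, and you still need to place the line where the option already lives in your config file.

```python
import os


def parallelism_setting(thread_count=None, reserve=0):
    """Hypothetical helper: build the config line from the CPU thread count."""
    if thread_count is None:
        # os.cpu_count() reports logical CPUs (threads); it can return None.
        thread_count = os.cpu_count() or 1
    value = max(1, thread_count - reserve)
    return f"globalExecutorParallelism = {value}"


print(parallelism_setting())
```

For a 16 thread CPU, `parallelism_setting(16)` yields the same line as the example above, and `parallelism_setting(16, reserve=2)` leaves two threads free.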

Frequently Asked Questions

Does this use multiple GPUs?

By default, it does least-busy scheduling on all OpenCL devices it can find.
However, usually it is your CPU that's the bottleneck. See below.
So don't expect multi-GPU to bring any improvements unless you only have a bunch of GT1030.
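
To illustrate what least-busy scheduling means in general, here is a minimal sketch. It is not the mod's implementation: the class, the use of an in-flight task counter as the "busyness" measure, and the device names are all assumptions made for the example.

```python
class LeastBusyScheduler:
    """Toy scheduler that hands each task to the device with the fewest tasks in flight."""

    def __init__(self, device_names):
        self.in_flight = {name: 0 for name in device_names}

    def acquire(self):
        # min() returns the first minimal entry, so ties go to the earliest-listed device.
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def release(self, name):
        # Call when the task submitted to this device has finished.
        self.in_flight[name] -= 1


scheduler = LeastBusyScheduler(["gpu0", "gpu1"])
first = scheduler.acquire()   # both idle, so the first device is chosen
second = scheduler.acquire()  # gpu0 is busy now, so gpu1 is chosen
```

With a scheme like this, a second GPU only helps when the first one is actually saturated, which is rarely the case while the CPU is the bottleneck.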

My GPU is barely used and my CPU is being maxed out. What is happening?

With a reasonably matched CPU and GPU, you will be CPU-bound. This is mostly because only the noise and biome stages are implemented.
Other stages may get implemented in the future.

How do I select GPUs?

You can specify whitelists and blacklists in the config file. Device UUID can be found in the logs.
AMD GPU names are gfx something in the logs, not their marketing names.
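
Conceptually, whitelist and blacklist filtering can be pictured as below. The exact semantics of the mod's config are not documented here, so this sketch makes its own assumptions: an empty whitelist allows every device, the blacklist always wins, and entries may match either the device UUID or the device name. The function name and the sample devices are hypothetical.

```python
def select_devices(devices, whitelist=(), blacklist=()):
    """Filter (uuid, name) pairs by whitelist and blacklist (assumed semantics)."""
    allowed = set(whitelist)
    denied = set(blacklist)
    selected = []
    for uuid, name in devices:
        keys = {uuid, name}
        if allowed and not keys & allowed:
            continue  # a whitelist is set and this device is not on it
        if keys & denied:
            continue  # blacklisted devices are always dropped
        selected.append((uuid, name))
    return selected


devices = [("uuid-a", "gfx1100"), ("uuid-b", "Example GPU"), ("uuid-c", "pocl-cpu")]
print(select_devices(devices, blacklist=["pocl-cpu"]))
```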

Does Voxy work with this?

Yes.
In short, to generate terrain out to Voxy's render distance, install Chunky, run /voxy import current, then start a Chunky task. It is recommended to join their Discord server for more information.

Does Distant Horizons work with this?

Short answer: not recommended. Use Voxy instead.

Long answer: DH is too slow to see the benefit this brings.
If you still intend to use DH, use the Internal Server / Full - Save Chunks mode in DH for acceleration to work.
Even so, you may not see improvements because LoD generation is the slowest part in the chain already.

I'm getting OpenCL error [-1001]. What does it mean?

The OpenCL ICD loader is unable to locate any OpenCL drivers. Check your driver installation.
It is recommended to use the clinfo tool as a quick check.
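
Error -1001 is CL_PLATFORM_NOT_FOUND_KHR. On Linux, the ICD loader conventionally discovers drivers through .icd files in /etc/OpenCL/vendors, each containing the name or path of a driver library. The sketch below lists those files; the function name is made up, and the directory is the usual default rather than something the mod controls, so your distribution may differ.

```python
from pathlib import Path


def registered_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd file name: library it points to} for a vendors directory."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    icds = {}
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file holds the name or path of one OpenCL driver library.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


if __name__ == "__main__":
    found = registered_icds()
    if not found:
        print("No ICD files found; the loader has no drivers to offer.")
    for name, library in found.items():
        print(f"{name}: {library}")
```

An empty result is consistent with error -1001 and usually means the vendor's OpenCL package is not installed.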

Does this work on dedicated servers?

Yes. It works on dedicated servers and in singleplayer as long as the drivers are in place.
Only Linux x86_64, Linux arm64 and Windows x86_64 binaries are shipped for dedicated servers.

Can I make it fall back to normal worldgen if initialization fails?

Not by default. This can be done with openclAccel.allowIncompatibilityFallback in the config file.

I heard that Vulkan is THE graphics API to go. Why OpenCL?

  • I'm familiar with it
  • I need untyped pointers, which did not exist in Vulkan until very recently;
    relying on them effectively requires Vulkan 1.4, shrinking hardware compatibility by a lot
  • Vulkan does not clearly specify FP64 precision beyond "at least that of FP32"
  • Correctly rounded FP division and sqrt are still missing from the Vulkan spec

Why not CUDA, ROCm, Level Zero, or Metal?

No vendor-locked APIs.

I'm on AMD with an RDNA GPU, and I'm seeing crashes before worldgen even starts. Why?

Driver version 26.5.1 is known to always crash. Existing installations upgraded to 26.6.1 may also crash.
If you are experiencing crashes on 26.6.1, it is recommended to run DDU (Display Driver Uninstaller) and then do a fresh driver installation.

See footnote 9 in Platform compatibility matrix.
