Running Local LLMs on RHEL 9: A CTO’s Guide to Overcoming fapolicyd and GPU Detection

The Challenge: Enterprise Security vs. Modern AI

Building an AI lab on Red Hat Enterprise Linux (RHEL) 9 is a different beast compared to Ubuntu. While most tutorials assume a "wide open" environment, RHEL comes with "hardened" defaults: SELinux and fapolicyd.

Before we dive into the terminal commands and configuration files, the accompanying video gives a high-level overview of why I made the switch from Ubuntu to RHEL and shows the final result in action.

Recently, while setting up an HP Z4 G4 workstation with an NVIDIA RTX 2000 Blackwell (16GB VRAM) and 128GB RAM, I ran into a classic RHEL wall: Ollama would start, but it couldn't "see" the GPU. It kept defaulting to the CPU, turning a powerhouse into a typewriter.

Here is how I solved it without compromising the system's security posture.
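
If you want to reproduce the diagnosis before changing anything, two commands are enough. This is a minimal check, assuming Ollama runs under the stock systemd unit named ollama:

# Check what Ollama logged about GPU discovery since the last boot
journalctl -u ollama -b | grep -iE 'cuda|gpu|vram'

# Confirm the driver layer itself can see the card
nvidia-smi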

0. The Driver Pivot - From Proprietary to Open Kernel

The first instinct on RHEL is often to install the standard proprietary NVIDIA driver. However, with the RTX 2000 Blackwell, I hit a wall where the closed-source modules wouldn't properly negotiate memory mapping with the RHEL 9 kernel, leading to silent failures in Ollama’s CUDA initialization.

The Fix: I pivoted to the NVIDIA Open Kernel Modules (available in the 590+ driver branch).

  • Why it matters: Unlike the legacy closed drivers, the Open Kernel modules are designed to integrate more natively with the modern Linux kernel's memory management. For a CTO, this is the "cleaner" way to run high-performance compute on RHEL 9.
  • The Symlink Hack: Even with the right drivers, Ollama's discovery engine specifically looks for libnvidia-ml.so. In RHEL, the library is installed as libnvidia-ml.so.1. Without a manual symbolic link, the GPU remains "invisible" to the software.

The Action:

# Ensure you are using the 'open' flavor of the driver
sudo dnf module install nvidia-driver:open-dkms 

# Bridge the library gap for NVML detection
sudo ln -sf /usr/lib64/libnvidia-ml.so.1 /usr/lib64/libnvidia-ml.so
sudo ldconfig
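
As a quick sanity check, you can confirm the open flavor is the one actually loaded before moving on; the open modules identify themselves by their license string:

# The open modules report a dual MIT/GPL license; the closed ones report "NVIDIA"
modinfo nvidia | grep -i license

# The card should now appear with its driver version and VRAM
nvidia-smi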

1. The Invisible Barrier: fapolicyd

The most common reason for "0 VRAM" detection on RHEL 9 is fapolicyd (File Access Policy Daemon). It doesn't just block unauthorized binaries; it blocks authorized binaries from loading "untrusted" shared libraries.

When Ollama attempts to load its GPU runner (libggml-cuda.so), fapolicyd steps in and says "No."

The Solution: Instead of disabling security, I created a priority rule file to allow the Ollama stack (the 05- prefix ensures fapolicyd evaluates it before its default rules).

What to change: Create /etc/fapolicyd/rules.d/05-ollama.rules:

allow perm=open exe=/usr/local/bin/ollama : dir=/usr/local/lib/ollama/
allow perm=execute exe=/usr/local/bin/ollama : dir=/usr/local/lib/ollama/
allow perm=open all : path=/usr/lib64/libcuda.so.590.48.01
allow perm=open all : path=/usr/lib64/libnvidia-ml.so.590.48.01

Note: I also added the entire library directory to the Trust database using fapolicyd-cli -f add /usr/local/lib/ollama/. Trust changes only take effect after fapolicyd-cli --update (or a daemon restart), as shown below.
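
Putting the trust and rule changes together, this is roughly the sequence I would expect to apply them, assuming Ollama's default install path; fapolicyd --debug-deny is the standard way to watch denials live while you test:

# Add the Ollama library directory to the trust database, then push the update
sudo fapolicyd-cli -f add /usr/local/lib/ollama/
sudo fapolicyd-cli --update

# Restart so the new rules.d file is compiled in
sudo systemctl restart fapolicyd

# If the GPU runner is still blocked, stop the service and watch denials live
sudo systemctl stop fapolicyd
sudo fapolicyd --debug-deny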

2. Environment Overrides in Systemd

RHEL's service isolation means that even if you have the drivers installed, the ollama service might not know where to look for them.

What to change: Use systemctl edit ollama.service to inject the necessary paths and lift the default locked-memory limit:

[Service]
Environment="LD_LIBRARY_PATH=/usr/lib64"
Environment="OLLAMA_MODELS=/home/ollama_models"
Environment="OLLAMA_LLM_LIBRARY=cuda_v12"
LimitMEMLOCK=infinity

This forces Ollama to use the system's NVIDIA libraries and lets it pin the memory it needs without hitting systemd's default MEMLOCK ceiling.
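
After saving the override, the usual systemd steps apply. systemctl show is a convenient way to confirm the variables actually reached the unit:

# Apply the override and restart the service
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Confirm the variables actually reached the unit
systemctl show ollama -p Environment

# Tail the logs while it initializes CUDA
journalctl -u ollama -f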

3. Storage and the SELinux Trap

If you have 128GB of RAM, you’ll likely want to run massive models (like Llama 3.1 70B), and these don't fit in the root partition. However, moving models to /home/ollama_models triggers SELinux denials, because anything under /home carries home-directory labels that the service is not allowed to read.

The Solution: Proper labeling.

sudo semanage fcontext -a -t var_lib_t "/home/ollama_models(/.*)?"
sudo restorecon -Rv /home/ollama_models

This tells RHEL that the custom directory is a valid location for service data, preventing "Permission Denied" errors during model loading.
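
If model loads still fail after relabeling, the audit log will tell you whether SELinux is still in the way (ausearch comes with the audit package):

# Any fresh SELinux denials?
sudo ausearch -m AVC -ts recent

# Confirm the directory now carries the var_lib_t label
ls -Z /home/ollama_models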

4. Satisfying the NVML Discovery

Ollama uses the NVIDIA Management Library (NVML) to detect how much VRAM is available. On RHEL, these libraries are often suffixed with version numbers that the detection script might miss.

The Solution: The same symbolic link created in Step 0 ensures the "handshake" between the software and the hardware happens successfully (repeated here for completeness):

sudo ln -sf /usr/lib64/libnvidia-ml.so.1 /usr/lib64/libnvidia-ml.so
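
You can verify the link is visible to the dynamic linker before restarting the service:

# The unversioned name should now resolve via the linker cache
ldconfig -p | grep libnvidia-ml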

The Result: 15.9 GiB VRAM & Hybrid Inference

After these adjustments, the system successfully initialized the Blackwell GPU.

  • Small Models (8B): Running at 70-90 tokens/sec.
  • Large Models (70B): Running in Hybrid Mode. 14GB stays in VRAM, while the remaining 26GB+ spills over into the 128GB system RAM; the split can be verified as shown below.
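
To verify the split on your own hardware, ollama ps reports the per-model CPU/GPU ratio, and nvidia-smi shows VRAM filling in real time:

# Shows each loaded model and its CPU/GPU split
ollama ps

# Watch VRAM usage in real time while a model loads
watch -n 1 nvidia-smi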

For a CTO or an IT Consultant, this setup provides the best of both worlds: the speed of local hardware and the compliance of an Enterprise Linux environment.

Don't disable your firewall or your policy daemons just to get AI working. In a regulated environment, security is not optional. With the right rules in place, RHEL 9 is perhaps the most stable platform for local AI development available today.