Lately, there has been a need for a private chatbot service as a complete alternative to OpenAI’s ChatGPT. So, I decided to set one up at home and make it accessible to everyone in my household, alongside my network printer and NAS (OpenMediaVault).

In the past, I used to recommend the Llama series for English tasks and the Qwen series for Chinese tasks. There was no open-source model strong enough at multilingual tasks compared to the proprietary ones (GPT/Claude).

However, as we all know, things have changed recently. I had been using DeepSeek-V2 occasionally whenever I got tired of Qwen2.5, and I fell behind on DeepSeek V2.5 and V3 due to a lack of hardware. But DeepSeek didn’t let me down: R1 performs impressively and comes in distilled variants as small as 1.5B!

This means we can run it even on a CPU with a reasonable user experience, and since many people have GPUs for gaming, speed is not an issue. Letting local LLMs process uploaded documents and images is a big advantage, since OpenAI limits this feature for free accounts.

Installing Open WebUI with bundled Ollama support is very easy with the official one-line command:

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

However, getting RAG (web search) working is not easy for most people, so I wanted an out-of-the-box solution.

As I mentioned in my last post, harbor is a great testbed for experimenting with different LLM stacks. But it is not only great for that; it’s also an all-in-one solution for self-hosting local LLMs with RAG working out of the box. So, let’s implement it from scratch. Feel free to skip steps, since most people don’t start from OS installation.

System Preparation (Optional)

As before, go through the install process using debian-11.6.0-amd64-netinst.iso

Add the user to sudoers with usermod -aG sudo username, then reboot
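
On a fresh Debian netinst install, sudo itself may not be present yet, so this step has to run as root; a minimal sketch (username is a placeholder for your actual login):

su -
apt install sudo
usermod -aG sudo username
reboot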

(Optional) Add extra swap

fallocate -l 64G /home/swapfile
chmod 600 /home/swapfile
mkswap /home/swapfile
swapon /home/swapfile

and make the swapfile persistent with nano /etc/fstab (the UUID line is the existing swap partition; the pri= values give it higher priority than the swapfile):

UUID=xxxxx-xxx swap swap defaults,pri=100 0 0
/home/swapfile swap swap defaults,pri=10 0 0

Check with swapon --show or free -h

Disable the Nouveau driver

bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
update-initramfs -u
update-grub
reboot
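
After the reboot, you can confirm Nouveau is no longer loaded; no output means the blacklist took effect:

lsmod | grep nouveau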

Install dependencies

apt install linux-headers-`uname -r` build-essential libglu1-mesa-dev libx11-dev libxi-dev libxmu-dev gcc software-properties-common sudo git python3 python3-venv pip libgl1 git-lfs -y

(Optional) Uninstall previous NVIDIA/CUDA packages if needed

apt-get purge nvidia*
apt remove nvidia*
apt-get purge cuda*
apt remove cuda*
rm /etc/apt/sources.list.d/cuda*
apt-get autoremove && apt-get autoclean
rm -rf /usr/local/cuda*

Install the CUDA toolkit and drivers

wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-debian11-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo cp /var/cuda-repo-debian11-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4

sudo apt install libxnvctrl0=550.54.15-1
sudo apt-get install -y cuda-drivers

Install the NVIDIA Container Toolkit, since harbor is Docker-based

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Then sudo apt-get update and sudo apt-get install -y nvidia-container-toolkit
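
Depending on the setup, the toolkit may still need to be registered as a Docker runtime; these are the usual commands from NVIDIA’s post-install docs (run them once Docker itself is installed, covered below):

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker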

Perform the CUDA post-install actions with nano ~/.bashrc

export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then run sudo update-initramfs -u, sudo ldconfig, or source ~/.bashrc to apply the changes

After a reboot, confirm with nvidia-smi and nvcc --version

Install Miniconda (optional, not needed for harbor)

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && sudo chmod +x Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
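
Assuming you let the installer initialize your shell, reload it and verify conda is on the PATH:

source ~/.bashrc
conda --version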

Docker & Harbor

Install Docker

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Perform the post-install steps so docker works without sudo

sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
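
Since harbor will run GPU workloads inside containers, it’s worth confirming Docker can actually see the GPU before continuing; the CUDA image tag below is just an example, any recent one works:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi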

Manually install Harbor

git clone https://github.com/av/harbor.git && cd harbor
./harbor.sh ln

Verify with harbor --version

Add RAG support to the defaults with harbor defaults add searxng

Use harbor defaults list to check; there should now be three services active: ollama, webui, searxng

Run with harbor up to bring up these services in docker
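
The matching command to stop everything later is harbor down:

harbor down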

Use harbor ps like docker ps, and harbor logs to see tailing logs

Now the open-webui frontend is serving at 0.0.0.0:33801 and can be accessed at http://localhost:33801, or from LAN clients via the server’s IP address.
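
If the ports differ on your machine, harbor can report or open them directly (check harbor --help in case your version lacks these subcommands):

harbor url webui
harbor open webui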

Monitor VRAM usage with watch -n 0.3 nvidia-smi

Monitor logs with harbor up ollama --tail or harbor logs

All ollama commands are usable, such as harbor ollama list
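
For example, models can also be pulled from the CLI instead of the web UI; the tag below is just one of the distilled R1 variants on ollama.com:

harbor ollama pull deepseek-r1:1.5b
harbor ollama list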

Now it’s time to access the server from other devices (desktop/mobile) to register an admin account and download models.

Using the Local LLM

After logging in with the admin account, click the avatar icon at the top right and open Admin Panel, then Settings, or simply go to http://ip:33801/admin/settings.

Click Models, and at the top right click Manage Models, which looks like a download button.

Enter deepseek-r1 or any other model name in the textbox below Pull a model from Ollama.com and click the download button on the right side.

After the model is downloaded, a page refresh may be required; the newly downloaded model will then be selectable from the dropdown menu on the New Chat (home) page.

Now it’s not only running a chatbot alternative to ChatGPT, but also a fully functional API alternative to the OpenAI API, plus a private search engine alternative to Google!

webui is accessible within LAN via: http://ip:33801

ollama is accessible within LAN via: http://ip:33821

searxng is accessible within LAN via: http://ip:33811

Call the Ollama API from any application with LLM API integration:

http://ip:33821/api/ps
http://ip:33821/v1/models
http://ip:33821/api/generate
http://ip:33821/v1/chat/completions
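
A quick smoke test of the OpenAI-compatible endpoint from any LAN machine (replace ip, and use a model name you have actually pulled):

curl http://ip:33821/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'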