Lately, there has been a need for a private chatbot service as a complete alternative to OpenAI’s ChatGPT. So I decided to implement one at home and make it accessible to everyone in my household, alongside my network printer and NAS (OpenMediaVault).
In the past, I used to recommend the Llama series for English tasks and the Qwen series for Chinese tasks; no open-source model was strong enough at multilingual tasks compared to proprietary ones (GPT/Claude).
However, as we all know, things have changed recently. I had been using DeepSeek-V2 occasionally whenever I got tired of Qwen2.5, but fell behind on DeepSeek-V2.5 and V3 for lack of hardware. DeepSeek didn’t let me down, though: R1 performs impressively and comes in distilled versions as small as 1.5B!
This means we can run it even on a CPU with a reasonable user experience, and since many people have GPUs for gaming, speed is not an issue. Letting local LLMs process uploaded documents and images is a big advantage, since OpenAI limits this for free accounts.
Installing Open WebUI with bundled Ollama support is easy with the official one-line command:
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
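For a machine without a GPU, the project also documents a CPU-only variant, which is the same command minus the GPU flag:
docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama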
But getting RAG (web search) to work is not easy for most people, so I went looking for an out-of-the-box solution.
As I mentioned in my last post, harbor is a great testbed for experimenting with different LLM stacks. It’s not just good for that, though; it’s also an all-in-one solution for self-hosting local LLMs with RAG working out of the box. So let’s implement it from scratch; feel free to skip steps, since most people won’t start from OS installation.
System Preparation (Optional)
As before, go through the install process using debian-11.6.0-amd64-netinst.iso
Add your user to sudoers with usermod -aG sudo username
then reboot
(Optional) Add extra swap
fallocate -l 64G /home/swapfile
chmod 600 /home/swapfile
mkswap /home/swapfile
swapon /home/swapfile
and make the swapfile persistent nano /etc/fstab
UUID=xxxxx-xxx swap swap defaults,pri=100 0 0
/home/swapfile swap swap defaults,pri=10 0 0
Check with swapon --show
or free -h
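With a swap partition from the installer plus the new file, swapon --show prints a table roughly like this (names and sizes will differ):
NAME           TYPE       SIZE USED PRIO
/dev/sda3      partition    8G   0B  100
/home/swapfile file        64G   0B   10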
Disable Nouveau driver
bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
update-initramfs -u
update-grub
reboot
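After the reboot, confirm Nouveau is no longer loaded; this should print nothing if the blacklist took effect:
lsmod | grep nouveau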
Install dependencies
apt install linux-headers-`uname -r` build-essential libglu1-mesa-dev libx11-dev libxi-dev libxmu-dev gcc software-properties-common sudo git python3 python3-venv pip libgl1 git-lfs -y
(Optional) Uninstall previous NVIDIA/CUDA packages if needed
apt-get purge nvidia*
apt remove nvidia*
apt-get purge cuda*
apt remove cuda*
rm /etc/apt/sources.list.d/cuda*
apt-get autoremove && apt-get autoclean
rm -rf /usr/local/cuda*
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-debian11-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo cp /var/cuda-repo-debian11-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
sudo apt install libxnvctrl0=550.54.15-1
sudo apt-get install -y cuda-drivers
Install the NVIDIA Container Toolkit since harbor is docker-based
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Then sudo apt-get update
and sudo apt-get install -y nvidia-container-toolkit
Perform the CUDA post-install actions: nano ~/.bashrc
export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then run sudo update-initramfs -u, ldconfig, or source ~/.bashrc to apply the changes.
After a reboot, confirm with nvidia-smi and nvcc --version
Install Miniconda (optional, not needed for harbor)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && sudo chmod +x Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
Docker & Harbor
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Perform the Docker post-install steps to use docker without sudo
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
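One step that is easy to miss: the NVIDIA Container Toolkit installed earlier still has to be registered as a Docker runtime. The nvidia-ctk command below is the toolkit’s documented way to do that; the CUDA image tag in the test is just an example:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# should print the same table as nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi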
git clone https://github.com/av/harbor.git && cd harbor
./harbor.sh ln
Verify with harbor --version
Add RAG (web search) support to the defaults with harbor defaults add searxng
Use harbor defaults list to check; there should now be three services active: ollama, webui, searxng
Run harbor up to bring up these services in Docker
Use harbor ps like docker ps, and harbor logs to tail the logs
Now the open-webui frontend is serving at 0.0.0.0:33801 and can be accessed at http://localhost:33801, or from LAN clients using the server’s IP address.
Monitor VRAM usage with watch -n 0.3 nvidia-smi
Monitor logs with harbor up ollama --tail or harbor logs
All ollama commands are usable, such as harbor ollama list
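For example, models can also be pulled from the command line instead of the web UI; a quick sketch, assuming the 1.5B distilled tag from Ollama’s registry:
# pull the smallest R1 distill (the tag is an example, any Ollama tag works)
harbor ollama pull deepseek-r1:1.5b
# list downloaded models, then show which ones are loaded in memory
harbor ollama list
harbor ollama ps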
Now it’s time to access it from other devices (desktop/mobile), register an admin account, and download models.
Using Local LLM
After logging in with the admin account, click the avatar icon at the top right and open Admin Panel, then Settings, or simply go to http://ip:33801/admin/settings
Click Models, and at the top right click Manage Models, which looks like a download button.
Put deepseek-r1 or any other model tag in the textbox below Pull a model from Ollama.com and click the download button on its right.
After the model is downloaded, a refresh may be required; the newly downloaded model will then be available in the dropdown menu on the New Chat (home) page.
Now it’s not only running a chatbot alternative to ChatGPT, but also a fully functional API alternative to the OpenAI API, plus a private search engine alternative to Google!
webui is accessible within LAN via: http://ip:33801
ollama is accessible within LAN via: http://ip:33821
searxng is accessible within LAN via: http://ip:33811
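searxng can also be queried directly; a quick check, assuming the JSON output format is enabled in SearXNG’s settings:
curl "http://ip:33811/search?q=deepseek&format=json"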
Call the Ollama API from any application with LLM API integration:
http://ip:33821/api/ps
http://ip:33821/v1/models
http://ip:33821/api/generate
http://ip:33821/v1/chat/completions
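For example, a minimal chat completion via curl, assuming the deepseek-r1:1.5b tag has been pulled:
curl http://ip:33821/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'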