No DALL-E, No Midjourney, and No Colab

This is a guide showing how to build your own stable diffusion server on what you already have, or on cheap used hardware. It may not be suitable for serious production use, but it's perfectly viable for learning, testing, or casual use.

Before we start, here are some comments on OpenAI:

I have been using ChatGPT and its API regularly. Since my last post, the API has been upgraded to gpt-3.5-turbo, which broke my program, and GPT-4 seems to be coming soon. Therefore, I may not fix my code proactively.

I also built a customized chatbot project that ran on the web version of ChatGPT. However, one day it suddenly got blocked by OpenAI's Cloudflare filter.

I believe that was done on purpose, by tightening up the IP ranges of all the popular VPN providers. I had to switch over to a proxy login endpoint, https://bypass.duti.tech/api/, thanks to acheong08's Reverse engineered ChatGPT API. This compromises some security, but it's better than using the paid API.

The reason I keep trying this route is not just to save money, but to push back against the Big Tech companies.

There is a list of alternatives to OpenAI's GPT. I would like to test llama.cpp sometime, as it can run on hardware as low-profile as a Raspberry Pi.

The stand I'm taking is simple. I love the technology, but I have a problem with big tech companies. I'll work against them unless the product goes truly open source and the business model goes nonprofit, because neutrality and transparency are crucial for such a technology.

AI itself is not the threat; the capital behind it is.

Hardware

Requirements

This article covers both old GPUs and CPU-only setups. If using a GPU, make sure it supports FP16; otherwise just run on the CPU, because a card without FP16 has to hold full FP32 weights and will run into VRAM issues.

An Nvidia card with 2GB or more VRAM, from the first Maxwell generation (GTX 745, 750, and 750 Ti) or newer

According to this buying guide, my spare GTX 750 Ti with 2GB VRAM is the oldest GPU that supports FP16.

RAM should be larger than 8GB. In my tests, 4GB won't even run and 8GB works only with small models, so 12~16GB is the minimum for anything beyond testing.

Hard drive size is not important; the minimum is 20GB (Debian clean install + base sd-webui + 2 pruned models).

An SSD speeds up loading weights (switching models), but not generation.

My mini Server Build

The 2GB of VRAM on the 750 Ti is barely enough for real production use (Restore faces + Hires. fix + VAE + LoRA + multi-ControlNet + inpainting + upscaling), and after working for hours, SD crashes quite often. Therefore, I had to rely heavily on my Self-Healing Daemon.

Finally, I decided to build a dedicated GPU server which is smaller and more capable for various AI projects.

  • HP Z2 G4 Mini Workstation - $85
    • Barebones w/o AC adapter
    • C246 Chipset
    • w/ P600 Mobile GPU (rare 4GB version)
  • HP 230W Power Supply - $18
    • 19.5V 11.8A
    • 7.4mm x 5.0mm Connector
    • HSTNN-xxxx for EliteBook Mobile Workstations
  • Intel Pentium Gold G5420 $22
    • 3.8 GHz 2 cores 4 threads 54W
    • LGA1151 revision 2 for Coffee Lake
  • Spare RAM sticks - $0/$40
    • DDR4 16+4GB 2133MHz SODIMM
  • Spare SSD - $0/$30
    • 512GB NVMe M.2 2280

The total cost for me was about $120; buying everything from scratch would be around $200.

Note:

  • These are all used parts from eBay, so prices and availability change quite a bit.
  • Performance should be similar across different Pascal mobile GPUs, e.g. Quadro P500, P520, P600, P620, P1000, MX1x0 and GTX10x0. Even between Maxwell (750 Ti) and Pascal (P600), I didn't gain a noticeable speed improvement, but the doubled VRAM adds capability.
  • The CPU does not matter when SD runs in GPU mode, so a Pentium/Celeron is good enough for generating images. However, TensorFlow-based tools like Dreambooth (for training) require a CPU that supports AVX instructions. In that case, an i3-8100 or Xeon E-2124 (more $$) can be considered, though a software workaround is also available for tinkerers; a quick way to check for AVX is shown after this list. I don't intend to do any training on this build anyway.
  • Creativity takes time to think and plan before taking the shot. People who like to spray and pray tend to spend more and hope for the best, but that's far from the way. Both bolt-action and full-auto have their value; I'd prefer doing it just right.
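
A quick way to check AVX support on an existing Linux machine:

grep -o -m1 '\bavx\b' /proc/cpuinfo   # prints "avx" if supported, nothing otherwise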

Asuka Benchmark

If you don't understand the naming, never mind; it's just the "hello world" test for SD, a.k.a. the "hello asuka" test.

This was the gold standard in the community.

Sampler: Euler
Seed: 2870305590
CFG: 12
Resolution: 512x512
Prompt: masterpiece, best quality, masterpiece, asuka langley sitting cross legged on a chair
Negative Prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts,signature, watermark, username, blurry, artist name
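
Once the webui is set up (see Installation below) and launched with --api added to COMMANDLINE_ARGS, the same run can be reproduced headlessly through the built-in API. A sketch, with the server address assumed; steps is set to 20, the webui default, which matches the timings below (56s at 2.80s/it ≈ 20 steps):

curl -s http://192.168.1.x:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "masterpiece, best quality, masterpiece, asuka langley sitting cross legged on a chair",
    "negative_prompt": "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name",
    "seed": 2870305590,
    "cfg_scale": 12,
    "sampler_name": "Euler",
    "steps": 20,
    "width": 512,
    "height": 512
  }'

The response is JSON with the generated image base64-encoded in its images field.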

Results:

  • Run on Quadro P600 (384 CUDA cores, 4GB) - 0:56 per asuka, 2.80s/it
  • Run on GTX 750 Ti (640 CUDA cores, 2GB) - 1:09 per asuka, 3.47s/it
  • Run on E5-2670 (8C16T) - 3:45 per asuka, 10s/it
  • Run on E3-1245 (4C8T) - 5:30 per asuka, 16s/it
  • Run on i5-2300 (4C4T) - 6:40 per asuka, 20s/it
  • Run on i5-2520M (2C4T) - 12:30 per asuka, 37s/it
  • Run on G5420 (2C4T) - 12:50 per asuka, 38s/it

Installation

Debian Linux

Why Linux?

Because it runs efficiently and is open source.

Why Debian?

I've tried RPM distros and that didn't go well. Then I found that the script says:

Tested on Debian 11 (Bullseye)

So just go to debian.org and get debian-11.6.0-amd64-netinst.iso

Use Etcher or Rufus to flash the ISO file onto a USB drive.
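
On Linux, plain dd works as well; /dev/sdX below is a placeholder for the USB device, so double-check it with lsblk first:

sudo dd if=debian-11.6.0-amd64-netinst.iso of=/dev/sdX bs=4M status=progress conv=fsync   # erases the USB drive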

Boot into the USB installer we just created and select Graphical install

Before reaching the network configuration step, connect the Ethernet cable so the installer can configure the network interfaces automatically.

After setting up root and user credentials, I used the entire disk when partitioning, since this device is dedicated to SD.

The other steps are fine at their defaults, except software selection.

Do not install any desktop environment, since it may cause trouble when installing the graphics driver later on.

Enable the SSH server, because we want to use SD from other machines on the local network.

When installing GRUB, in my case the target disk is /dev/sda.

After the installation finishes and the machine reboots, log in with the root account.

Run apt-get install sudo -y, then usermod -aG sudo username to add the user account to the sudo group. Reboot to apply this change.

Run ip a to get the IP address; in my case the Ethernet interface is enp0s25.

Use mRemoteNG, Remmina, or a terminal to SSH into the SD machine from a daily-driver computer.

Install the dependencies:

apt install wget git python3 python3-venv libgl1 git-lfs libglib2.0-0

Troubleshooting NIC

If the Ethernet doesn't work and you see something like "Missing firmware rtl81xxxxx.fw", we need to install the firmware-realtek driver package.

Download firmware-realtek_20210315-3_all.deb on another machine, copy it to a USB drive, and plug it in.

Run lsblk to confirm the USB drive is sdb1, then:

mount /dev/sdb1 /mnt

cd /mnt

apt install /mnt/firmware-realtek_20210315-3_all.deb

After installing the driver, we need to make sure the network config is right.

nano /etc/network/interfaces

auto enp2s0
allow-hotplug enp2s0
iface enp2s0 inet dhcp

Run systemctl restart networking, then ip a; the network connection should be working now.

Use umount /mnt to eject the USB drive.

NVIDIA Driver and CUDA Toolkit

I didn't follow the Debian wiki to install the driver; that method installs the stable version 470, while this way we can install the latest 530 together with the CUDA Toolkit.

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run

apt install linux-headers-`uname -r` build-essential libglu1-mesa-dev libx11-dev libxi-dev libxmu-dev -y

chmod +x cuda_12.1.0_530.30.02_linux.run

sh cuda_12.1.0_530.30.02_linux.run
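
Alternatively, the runfile supports an unattended install; these flags come from its silent mode (verify with sh cuda_12.1.0_530.30.02_linux.run --help on your version):

sh cuda_12.1.0_530.30.02_linux.run --silent --driver --toolkit   # non-interactive: driver and CUDA toolkit only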

If running the installer interactively, select Driver and Toolkit. After the installation, run nvidia-smi to verify:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 750 Ti       Off| 00000000:03:00.0 Off |                  N/A |
| 42%   32C    P8                1W /  52W|    527MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    277320      C   python3                                     524MiB |
+---------------------------------------------------------------------------------------+

Troubleshooting nvidia

Disable the Nouveau driver if needed:

bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
update-initramfs -u
update-grub
reboot

Stable Diffusion web UI

Thanks to AUTOMATIC1111 for making everything so easy.

I am aware of both cmdr2's and InvokeAI's projects, but AUTOMATIC1111's is the most mature and best supported.

Run this script with the user account to install:

bash <(wget -qO- https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh)

It may return some errors on non-GPU devices, and that's fine.

Go into the SD’s directory:

cd stable-diffusion-webui/

Edit the #export COMMANDLINE_ARGS="" line in the config file:

nano webui-user.sh

For an old GPU (like my 750 Ti):

export COMMANDLINE_ARGS="--lowvram --listen --xformers --always-batch-cond-uncond --opt-split-attention --enable-insecure-extension-access"

export PYTORCH_CUDA_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:24"

For CPU only:

export COMMANDLINE_ARGS="--listen --skip-torch-cuda-test --use-cpu all --no-half --no-half-vae --opt-split-attention --enable-insecure-extension-access"

Then run ./webui.sh to start

Note: Even with the optimization flags above, it may still hit a CUDA out of memory error under heavy load. Cases could be:

  • Sending to img2img/inpaint/sketch back and forth
  • Increasing Batch size or Width/Height
  • Using SD upscale script with large Batch count
  • Using large ControlNet models (only control_*.safetensors, coadapter-*.pth and t2iadapter_*.safetensors work fine, even with multi-ControlNet)

Working with caution avoids the issue most of the time. When CUDA out of memory occurs, just refresh the web page and generate again. If it persists, go to the SSH session, press Ctrl+C to terminate the webui process, and run ./webui.sh to restart.

However, when I need to use those large ControlNet models, such as normal, hed, mlsd and scribble, I have to switch the COMMANDLINE_ARGS to CPU only. This way, CPU mode can handle larger models and heavier loads by using RAM in place of VRAM, so the capacity increases from 2GB to 16GB at the cost of slow generation. A way to script the switch is sketched below.
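
Rather than editing webui-user.sh each time, the two arg sets can be selected with an environment variable. A minimal sketch, assuming a hypothetical SD_CPU toggle (webui.sh sources webui-user.sh, so the variable carries through):

# in webui-user.sh: pick the arg set via the hypothetical SD_CPU toggle
if [ "${SD_CPU:-0}" = "1" ]; then
    # CPU mode: slower, but RAM replaces VRAM so large ControlNet models fit
    export COMMANDLINE_ARGS="--listen --skip-torch-cuda-test --use-cpu all --no-half --no-half-vae --opt-split-attention --enable-insecure-extension-access"
else
    # GPU mode: low-VRAM optimizations for the 2GB card
    export COMMANDLINE_ARGS="--lowvram --listen --xformers --always-batch-cond-uncond --opt-split-attention --enable-insecure-extension-access"
    export PYTORCH_CUDA_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:24"
fi

Then SD_CPU=1 ./webui.sh starts in CPU mode, and a plain ./webui.sh stays on the GPU.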

Basic Usage

  • Open 192.168.1.x:7860 in a modern browser to start using SD.

  • From the SD server machine or through SSH, run watch -n 2 nvidia-smi to monitor the GPU status. For CPU usage, simply use top or apt install bpytop then bpytop.

  • Go to Civitai and Hugging Face to find and download models. Use wget or git-lfs to download models via SSH directly onto the server, or use scp/rsync/sftp/syncthing to transfer files between local and remote.

  • For Civitai, right-click the Download button and select Copy Link, then in the SSH session run wget https://civitai.com/api/download/models/xxxxx --content-disposition. For Hugging Face, just use wget with the raw file link.

  • For example, if you want to batch download a collection of anime models, use git clone https://huggingface.co/AIARTCHAN/aichan_blend then mv aichan_blend/*.safetensors stable-diffusion-webui/models/Stable-diffusion/

  • To batch download ControlNet models, use git clone https://huggingface.co/webui/ControlNet-modules-safetensors then mv ControlNet-modules-safetensors/*.safetensors stable-diffusion-webui/extensions/sd-webui-controlnet/models/

  • Go to the official wiki to find and download extensions or use the webui built-in extensions page.

  • Use git pull to update the webui when needed.

  • Manage/remove styles by editing stable-diffusion-webui/styles.csv with nano (see the example after this list).

  • Learn more from the SD RESOURCE GOLDMINE and Educational MegaGrid.
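
For reference, styles.csv is a plain CSV with name, prompt and negative_prompt columns, where {prompt} marks the spot the current prompt is inserted into. A hypothetical two-entry example (quote any field that contains commas):

name,prompt,negative_prompt
"anime base","masterpiece, best quality, {prompt}","lowres, bad anatomy, jpeg artifacts"
"clean photo","RAW photo, 8k uhd, {prompt}","cartoon, painting, sketch"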

ControlNet

Learn everything from these preprocessor and model combinations:

canny -> control_canny - t2iadapter_canny
mlsd -> control_mlsd
hed -> control_hed
scribble -> control_scribble - t2iadapter_sketch
fake_scribble -> control_scribble - t2iadapter_sketch
openpose -> control_openpose - t2iadapter_openpose - t2iadapter_keypose
openpose_hand -> control_openpose - t2iadapter_openpose
segmentation -> control_seg - t2iadapter_seg
depth -> control_depth - t2iadapter_depth
depth_leres -> control_depth - t2iadapter_depth
depth_leres_boost -> control_depth - t2iadapter_depth
normal_map -> control_normal
binary -> control_scribble - t2iadapter_sketch
color -> t2iadapter_color
pidinet -> control_hed
clip_vision -> t2iadapter_style

Self-Healing Daemon

To make the webui auto-restart in the background when it crashes, we need another script. Thanks to this guide.

nano webuid.sh

#!/bin/bash
# Restart sd-webui if no python3 process is running.
# The [p] in the pattern stops grep from matching its own process,
# replacing the usual `grep -v grep` filter.
if ! ps -ef | grep '[p]ython3' > /dev/null
then
      echo "restarting sd-webui" && cd /home/$username/stable-diffusion-webui && ./webui.sh
fi

sudo chmod 755 /home/$username/stable-diffusion-webui/webuid.sh

Run ./webuid.sh to test it

Run crontab -e and add the following to make it auto-start:

@reboot /home/$username/stable-diffusion-webui/webuid.sh
*/1 * * * * /home/$username/stable-diffusion-webui/webuid.sh

PS: This script may conflict with other Python tools such as bpytop, since any running python3 process passes the check; use top instead.

To see the output from webui.sh, put exec &> >(tee -a "webui.log") at the beginning of webui-user.sh, use tail -f webui.log to watch the output as before, and use pkill python3 to force a restart.
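
Alternatively, systemd can supervise the webui natively instead of cron. A minimal sketch of a unit file, with $username as a placeholder for the actual account name (Restart=always relaunches it after a crash):

# /etc/systemd/system/sd-webui.service (replace $username with the real user)
[Unit]
Description=Stable Diffusion web UI
After=network-online.target

[Service]
User=$username
WorkingDirectory=/home/$username/stable-diffusion-webui
ExecStart=/home/$username/stable-diffusion-webui/webui.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl enable --now sd-webui, then follow the logs with journalctl -u sd-webui -f.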