- Running Microsoft’s 1-Bit BitNet LLM on My Dell R7910 – A Self-Hosting Adventure
So Microsoft dropped this 1-bit LLM called BitNet, and I couldn’t resist trying to get it running on my new homelab server. Spoiler alert: it actually works incredibly well, and now I have a pretty capable AI assistant running entirely on CPU power! For the demo, click the plus button on the bottom right ;)
My Setup (And Why This Matters)
I’m running this on my Dell Precision Rack 7910 – yeah, it’s basically a workstation crammed into a rack case, but hey, it works! Here’s what I’m working with:
My Dell R7910:
- Dual Xeon E5-2690V4 processors (28 cores total)
- 64GB ECC RAM
- Running Proxmox VE
- Already hosting Nextcloud, Jellyfin, and WordPress
The cool thing about BitNet is that it doesn’t need fancy GPU hardware. While I’m running it on dual Xeons, you could probably get away with much less.
Minimum specs you’d probably want:
- Any modern 4+ core CPU
- 8GB RAM (though 16GB+ is better)
- 50GB storage space
- That’s literally it – no GPU required!
What the Heck is BitNet Anyway?
Before we dive in, let me explain why I got excited about this. Most AI models use 32-bit or 16-bit numbers for their “weights” (basically the model’s learned knowledge). BitNet uses just three values: -1, 0, and +1 – technically about 1.58 bits per weight, since log₂ 3 ≈ 1.58, which is where the “b1.58” in the model’s name comes from.
Sounds crazy, right? But somehow it works! The 2 billion parameter BitNet model:
- Uses only ~400MB of RAM (my Llama models use 4-8GB+)
- Runs 2-6x faster than similar models
- Uses way less power
- Still gives pretty decent responses
I mean, when I first heard “1-bit AI,” I thought it would be terrible, but Microsoft’s research team clearly knew what they were doing.
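If you’re curious what that looks like in practice, here’s a tiny sketch of the “absmean” ternary quantization scheme described in the BitNet b1.58 paper – just an illustration of the idea, not code from the actual model:

```javascript
// Illustrative sketch of BitNet b1.58-style "absmean" quantization
// (not the model's actual code): scale each weight by the mean
// absolute value, then round and clamp to {-1, 0, +1}.
function ternaryQuantize(weights) {
  const eps = 1e-6;
  const gamma = weights.reduce((sum, w) => sum + Math.abs(w), 0) / weights.length;
  return weights.map((w) => Math.max(-1, Math.min(1, Math.round(w / (gamma + eps)))));
}

console.log(ternaryQuantize([0.9, 0.05, -1.3, 0.4]));
// -> [1, 0, -1, 1]  (every weight is now one of three values)
```

Multiplying by -1, 0, or +1 is just sign flips and skipped zeros rather than full floating-point math, which is a big part of why this runs so well on a plain CPU.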
The Journey: Setting This Thing Up
Step 1: Creating a Container for BitNet
Since I’m already running a bunch of services on Proxmox on my R7910, I decided to give BitNet its own LXC container. This keeps things clean and prevents it from messing with my other stuff.
In Proxmox, I created a new container with these specs:
- Template: Ubuntu 22.04 LTS
- CPU: 16 cores (leaving 12 for my other services)
- Memory: 32GB (plenty of headroom)
- Storage: 80GB
Important: You need to edit the container config file to add these lines, or the build will fail:
```
# Edit /etc/pve/lxc/[YOUR_CONTAINER_ID].conf
features: nesting=1
lxc.apparmor.profile: unconfined
```
Trust me, I learned this the hard way after wondering why cmake was throwing mysterious errors!
Step 2: Getting the Environment Ready
First things first – we need the right tools. BitNet is picky about its build environment:
```bash
# Basic stuff
apt update && apt upgrade -y
apt install -y curl wget git build-essential cmake clang libomp-dev
```
Now here’s where I made my first mistake – I tried to use Python’s venv initially, but BitNet’s instructions specifically mention conda, and there’s a good reason for that. Just install Miniconda:
```bash
cd /tmp
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
```
Step 3: The BitNet Installation Saga
This is where things got interesting. The GitHub instructions look straightforward, but there are some gotchas:
```bash
mkdir -p /opt/bitnet && cd /opt/bitnet

# This --recursive flag is CRUCIAL - don't skip it!
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet

# Create the conda environment
conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp
pip install -r requirements.txt
pip install huggingface_hub
```
Now for the fun part – downloading the model and building everything:
```bash
# Download the official Microsoft model
mkdir -p models
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir models/BitNet-b1.58-2B-4T

# Build the whole thing
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```
Pro tip: This build step takes a while. On my dual Xeons, it was about 10 minutes of heavy CPU usage. Grab a coffee – it’s compiling a ton of optimized C++ code.
Step 4: Testing My New AI
Once the build finished, I had to try it out:
```bash
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello, how are you?" -cnv
```
And it worked! The responses weren’t GPT-4 quality, but they were coherent and surprisingly good for something running entirely on CPU. My partner thought I was connected to a service like OpenAI because the responses were so fast and the resource usage was so low ^_^
But I didn’t want to just run it in a terminal. I wanted to integrate it with AnythingLLM that I already had running.
Step 5: Making BitNet Play Nice with AnythingLLM
Here’s the cool part – BitNet comes with a built-in API server:
```bash
python run_inference_server.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p 8080 --host 0.0.0.0
```
Then in AnythingLLM, I just added it as a “Generic OpenAI” provider:
- API Endpoint: http://[my_container_ip]:8080
- Model Name: BitNet-b1.58-2B-4T
- Token Context Window: 4096 (can be adjusted)
- Max Tokens: 1024 (can be adjusted)
And boom – I had BitNet responding to queries through AnythingLLM’s nice web interface!
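If you’d rather hit the API directly instead of going through AnythingLLM, something like this should work. I’m assuming the usual llama.cpp-style OpenAI-compatible /v1/chat/completions endpoint here (BitNet’s inference stack is built on llama.cpp), so treat it as a sketch rather than gospel:

```javascript
// Quick sanity check against the BitNet server (Node 18+, built-in fetch).
// Assumes an OpenAI-compatible chat completions endpoint, as typically
// exposed by llama.cpp-based servers.
const response = await fetch("http://[my_container_ip]:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "BitNet-b1.58-2B-4T",
    messages: [{ role: "user", content: "Hello, how are you?" }],
    max_tokens: 1024,
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```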
Step 6: Making It Actually Reliable
Running things manually is fun for testing, but I wanted this to be a proper service. So I created a systemd service:
```bash
sudo nano /etc/systemd/system/bitnet.service
```
```
[Unit]
Description=BitNet LLM API Server
After=network.target
Wants=network.target

[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/opt/bitnet/BitNet
Environment=PATH=/root/miniconda3/envs/bitnet-cpp/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ExecStart=/root/miniconda3/envs/bitnet-cpp/bin/python run_inference_server.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p 8080 --host 0.0.0.0
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable bitnet.service
sudo systemctl start bitnet.service
```
Now BitNet automatically starts when the container boots, and if it crashes, systemd brings it right back up.
The Results: How Does It Actually Perform?
I’ve been running this setup for a while now, and I’m honestly impressed. On my R7910:
- Response time: Usually 1-3 seconds for short responses when running in the container itself.
- When connected through AnythingLLM, with the context window and max-token settings above, it averaged about 7-8 seconds before responding, but then streamed the response relatively quickly
- Memory usage: Steady ~400MB as advertised
- CPU usage: Spikes during inference, then drops to almost nothing
- Quality: Good enough for basic tasks, coding help, and general questions
It’s not going to replace GPT-4 for complex reasoning, but for a lot of everyday AI tasks, it’s surprisingly capable. And the fact that it’s running entirely on my own hardware with no API calls or subscriptions? That’s pretty sweet.
Demo!!
Click the plus button on the bottom right!
Lessons Learned and Gotchas
Things that tripped me up:
- Forgetting the `--recursive` flag when cloning – this downloads the necessary submodules
- Not installing clang – the error messages weren’t super clear about this
- Trying to use venv instead of conda – just follow their instructions!
- Container permissions – those LXC config additions are crucial
Performance tips:
- Give it plenty of CPU cores if you can
- 32GB RAM is probably overkill, but it’s nice to have headroom
- The `i2_s` quantization seems to be the sweet spot for quality vs. speed
What’s Next?
I’m planning to experiment with:
- Different quantization types (`tl1` vs `i2_s`)
- Running multiple model variants simultaneously
- Maybe trying some fine-tuning if Microsoft releases tools for that
The self-hosting AI space is moving fast, and BitNet feels like a real game-changer for those of us who want capable AI without needing a mortgage-sized GPU budget.
Wrapping Up
Setting up BitNet on my Dell R7910 turned out to be way more straightforward than I expected, once I figured out the few gotchas. If you’ve got a decent CPU and some spare RAM, I’d definitely recommend giving it a shot.
Having a capable AI assistant running entirely on your own hardware is pretty liberating. No API keys, no usage limits, no privacy concerns about your data leaving your network. Just pure, self-hosted AI goodness.
Plus, there’s something satisfying about telling people your AI assistant is running on a 1-bit model that uses less RAM than Chrome with a few tabs open!
- Documenting Dad: An AR-Enhanced Scientific Study of Dad in the Wild
In the spirit of Iceland’s cherished Jólabókaflóðið (Christmas Book Flood) tradition, I embarked on creating something special for my dad this Christmas – a whimsical “field guide” studying the curious creature I fondly call “Pops.”
The Concept
Imagine David Attenborough narrating a nature documentary, but instead of following a rare bird or elusive big cat, he’s studying… my dad. That’s essentially what “Pops: A Field Guide” is – a scientific-style observation of my father’s natural habits, behavioral patterns, and comical fear of cute dogs, all written with love and humor.
The Technology
What made this book truly exciting for me to create is its augmented reality (AR) feature. Certain photos throughout the book serve as AR markers – when viewed through a phone or tablet, they spring to life as videos. Making it work is as simple as scanning a QR code in the book and opening up the webpage.
[Code Repository: GitLab Link]
The AR implementation uses:
- AR.js for marker-based augmented reality
- A-Frame for 3D scene rendering
- Web-based delivery (no app required!)
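To give a flavor of how the marker-to-video wiring works: in an A-Frame scene, AR.js marker elements emit markerFound and markerLost events that you can hook video playback onto. The element IDs below are made up for illustration – the real markup and marker files are in the GitLab repo:

```javascript
// Illustrative only - the actual scene lives in the GitLab repo.
// AR.js <a-marker> elements fire markerFound / markerLost events,
// which start and pause the video tied to each photo.
const marker = document.querySelector("#pops-photo-marker"); // hypothetical ID
const video = document.querySelector("#pops-video");         // hypothetical ID

marker.addEventListener("markerFound", () => {
  video.play(); // photo recognized: bring the memory to life
});

marker.addEventListener("markerLost", () => {
  video.pause(); // camera moved off the photo: pause playback
});
```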
The Creation Process
Writing and Design
Using Canva’s book creation tools, I crafted each page to mimic traditional nature field guides, complete with:
- “Scientific” observations
- Habitat studies
- Behavioral patterns
- Timeline analysis
All written with a blend of factual family history and playful scientific parody.
Audio Narration
To complete the nature documentary feel, I created an audiobook version narrated in a style reminiscent of Sir David Attenborough. This added an extra layer of authenticity to the “scientific” observations. Here is a clip:
Technical Implementation
The AR features were implemented using marker-based tracking, allowing videos to appear when specific photos are scanned. The entire system runs in a web browser, making it accessible without any special software installation.
The Jólabókaflóðið Connection
The Icelandic tradition of Jólabókaflóðið, where books are exchanged on Christmas Eve, provided the perfect occasion for this gift. This tradition, which translates to “Christmas Book Flood,” celebrates the joy of reading and sharing stories during the holiday season.
The Result
The final product is a multi-layered experience:
- A physical book filled with photos and humorous observations
- An AR experience bringing memories to life through video
- An audiobook adding another dimension of entertainment
- A celebration of family, technology, and tradition
Technical Details
For those interested in creating similar AR-enhanced books, the technical implementation is available on GitLab. The system uses:
- Pattern-based markers for video triggering
- Web-based AR for universal accessibility
- Responsive design for various devices
- Audio on/off functionality
- Cross-platform compatibility
Creating Your Own
Interested in making something similar? The basic steps are:
- Design your book (I used Canva)
- Generate AR markers for your chosen images (link in the GitLab README)
- Set up the web-based AR viewer on a server (local or on the net)
- Test thoroughly across devices
- Print and share!
Conclusion
“Pops: A Field Guide” represents more than just a clever use of technology – it’s a celebration of family, humor, and the joy of giving. By combining traditional bookmaking with modern AR technology, it creates a unique way to preserve and share family memories.
- Magical Talking Pumpkin!
This Halloween, I decided to venture beyond the traditional jack-o’-lanterns and candy bowls. Social media had been flooded with these incredible window displays where people projected spooky videos to create the illusion of ghosts and zombies wandering through their homes. While setting up my own window projection, I found myself wondering: could I push the Halloween magic even further?
During my research, I stumbled upon videos designed to be projected onto real pumpkins, making them appear to sing. Pretty cool concept, but the short, repetitive loops left me wanting more. That’s when inspiration struck – why not create a pumpkin that could actually interact with trick-or-treaters?
Enter ‘Pumpkin’, my magical talking jack-o’-lantern with a surprisingly sassy personality! Using GPT-3.5 for natural conversation and Azure AI services to synchronize the mouth movements in real-time, I managed to bring this charming character to life. The result? A carved pumpkin that doesn’t just sit there looking spooky – it chats, jokes, and sometimes even throws a bit of playful shade at passersby.
The technical setup might sound complex, but it’s essentially a marriage of AI language models and animation technology. GPT-3.5 handles the conversational heavy lifting, while Azure’s AI services translate those words into perfectly timed mouth movements projected onto a real pumpkin.
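I won’t reproduce my exact code here, but the lip-sync piece hinges on the Azure Speech SDK emitting “viseme” events – mouth-shape IDs with timestamps – while it synthesizes audio. A rough sketch of that piece, with placeholder values for the key and region and a hypothetical animation hook:

```javascript
// Sketch of the lip-sync piece (Node.js). The key, region, and
// scheduleMouthShape() are placeholders, not my actual setup.
const sdk = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = sdk.SpeechConfig.fromSubscription("YOUR_KEY", "YOUR_REGION");
const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

// Hypothetical animation hook: in the real setup this drives the
// projected mouth; here it just logs the timeline.
function scheduleMouthShape(visemeId, timeMs) {
  console.log(`viseme ${visemeId} at ${timeMs.toFixed(0)}ms`);
}

// Azure fires a viseme event (mouth-shape ID + timestamp) for each
// sound as it synthesizes speech - exactly what's needed to sync the
// projected mouth to the audio.
synthesizer.visemeReceived = (sender, event) => {
  const timeMs = event.audioOffset / 10000; // audioOffset is in 100ns ticks
  scheduleMouthShape(event.visemeId, timeMs);
};

// replyText would come back from the GPT-3.5 chat completion call
const replyText = "Careful out there - I was harvested too early for this!";
synthesizer.speakTextAsync(
  replyText,
  () => synthesizer.close(),
  (err) => { console.error(err); synthesizer.close(); }
);
```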
What makes Pumpkin special isn’t just the technology – it’s the personality that emerged. Whether it’s cracking jokes about being “harvested too early” or commenting on costume choices, each interaction is unique and unexpected. It’s exactly the kind of magical Halloween moment I hoped to create for our community, blending traditional holiday spookiness with modern technology in a way that brings smiles to both kids and adults.
Technical Details
Source code is on GitHub!
- “Binary Dawn” Rises: Afroswing Meets Solarpunk
About the Album
Created with Udio, “Binary Dawn” is a mesmerizing fusion of Afroswing and ambient sounds, painting a solarpunk vision of a sustainable future. This album explores the intersection of technology and nature, weaving together themes of artificial intelligence, universal basic income, and permaculture. With its pulsing rhythms and ethereal soundscapes, “Binary Dawn” invites listeners to imagine a world where open-source solutions and communal living harmonize with cutting-edge technology.
- Creating a Game with AI
In the ever-evolving world of game development, the integration of AI has opened up new avenues for creativity and efficiency. One of the tools I’m trying out in this space is Claude, an AI model designed to assist in generating detailed, interactive content. Recently, I embarked on a journey to create a fully functional 2D game using Claude.
The Concept
After watching the Horizon documentary ‘The Trouble with Space Junk’, I had an idea for a game that shows the consequences of space pollution. Inspired by the classic game Asteroids, “The Trouble with Space Junk” (the game) was born. The objective? Navigate a spacecraft through space, manage the debris, and survive as long as possible. When you play it, it might feel futile, but there is a way to not die ;)
The Process
Using Claude, I was able to transform this idea into a working game with minimal coding (there was one bug I found and corrected). The process began with crafting a detailed prompt for Claude, outlining the game mechanics, design elements, and overall objectives. This prompt served as the blueprint for the game, guiding Claude in generating the necessary HTML, CSS, and JavaScript code.
Here’s the prompt I used: https://thinkcolorful.org/wp-content/uploads/2024/08/SpaceJunkGamePrompt.txt
With this prompt, Claude was able to produce a functional game. The AI handled everything from the movement controls and shooting mechanics to the dynamic behavior of the space junk, which breaks down into smaller, faster pieces as the game progresses.
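That splitting behavior is the heart of the Kessler-syndrome feel. This isn’t the generated code itself (the real thing is on the playable page), but the rule works roughly like this:

```javascript
// Sketch of an Asteroids-style splitting rule, not the actual generated
// code. Shooting a piece of junk removes it but spawns smaller, faster
// fragments - so mindless shooting makes orbit more dangerous, not less.
function splitJunk(junk) {
  if (junk.size <= 1) return []; // the smallest fragments just vanish
  const fragments = [];
  for (let i = 0; i < 2; i++) {
    const angle = Math.random() * 2 * Math.PI; // scatter in random directions
    const speed = junk.speed * 1.5;            // children fly faster than the parent
    fragments.push({
      x: junk.x,
      y: junk.y,
      size: junk.size - 1,
      speed,
      vx: Math.cos(angle) * speed,
      vy: Math.sin(angle) * speed,
    });
  }
  return fragments;
}

// Example: one large piece becomes two medium pieces moving 1.5x faster.
console.log(splitJunk({ x: 100, y: 100, size: 3, speed: 2, vx: 2, vy: 0 }));
```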
The Result
The result was a retro-style game that not only entertains but also puts the player in a situation where they have to deal with space debris.
You can play the game here (note: this is not mobile friendly and you will need a keyboard): https://thinkcolorful.org/spacejunk.html
The game begins with a simple start screen, and as players navigate through the increasingly cluttered space, they quickly learn that mindlessly shooting at debris only makes the problem worse. Survive for 5 minutes, and you’ll be rewarded with a message of triumph.
Conclusion
Creating a concept game with AI using Claude was lots of fun. It allowed me to focus more on the creative aspects of game design and experimenting with different game mechanics while the AI handled the technical details. This collaboration between human creativity and AI efficiency opens up endless possibilities for indie developers, educators, and hobbyists alike.