Blogs

Making my local LLM voice assistant faster and more scalable with RAG

If you read my previous blog post, you probably already know that I like my smart home open-source and very local, and that certainly includes any voice assistant I may have. If you watched the video demo, you have probably also found out that it’s… slow. Trust me, I did too. Prefix caching helps, but it feels like cheating. Sure, it’ll look amazing in a demo, but as soon as I start using my LLM for other things (which I do, quite often), that cache is going to get evicted and that first prompt is still going to be slow....

Building a fully local LLM voice assistant to control my smart home

I’ve had my days with Siri and Google Assistant. While they can control your devices, they cannot be customized and they inherently rely on cloud services. In hopes of learning something new and having something cool I could use in my life, I decided I wanted better. The premises are simple: I want my new assistant to be sassy and sarcastic. I want everything running locally. No exceptions. There is no reason for my coffee machine downstairs to talk to a server on the other side of the country....

Self-hosting and NAT Loopback

When I started hosting my services, I quickly ran into a major problem. Everything was timing out, but it was somehow working just fine when I was not connected to my home network! It turns out this was because my router does not support what’s called NAT Loopback (also called NAT Hairpinning). Like many things you’ll see in production, the 32-bit address space of IPv4 was meant for a prototype....