3 seconds for a local LLM is a GREAT objective.... I am interested in a product like this, but I think I'd prefer to run it all locally on Unraid, with a powerful GPU. Good work.
I like where you're going with this! While I applaud your approach on the server for most of your market, I, like many of the IT geeks here in the comments, am not your target audience for the hardware portion of that product. (But we all know people who are the audience for it! Don't be discouraged there!) I'm definitely keen, however, to keep an eye on the software configuration for my own eventual hardware.

As far as the satellites go, I agree with others (and you) in the comments that Google and Amazon have spoiled us with the pricing on their devices. You're packing more features in, and dealing with quantities that won't provide the cost efficiencies in manufacturing, nor the financial margin, to price it competitively with those pervasive competitors. But... you're also offering significant flexibility and privacy, and for some, that's worth the extra spend. I'm presuming that you're planning on building an integrated board rather than off-the-shelf components (such as the Matrix Voice) to shave production costs.

One suggestion I have is to offer two levels of satellite. For a large space like a living room or entertainment space, that large array of microphones is likely critical. But for a smaller room (like a bedroom or a home office), with far less ambient noise and space to cover, those seven microphones (at a base cost of $65 USD just for that hat) are overkill. Build a "Large room" model and a "Small room" model. The small-room model could carry the same features but fewer microphones (two or three might be sufficient), and perhaps a less expensive speaker, as it won't need to be as loud... Maybe not as diminutive as an ATOM Echo, but something closer to the Seeed ReSpeaker 2-Mic, which retails for $12.90 USD for the hat. (Yes, I know it lacks the extra connectors you'd need for the other sensors, so there's some cost adjustment there.)
Since the Matrix is open-source hardware, this reduced configuration shouldn't be difficult for a qualified engineer with that design as an inspiration. This could satisfy many people who would want a satellite in multiple living spaces but might balk at the price of that one device across many rooms. Allow them a less expensive option for some of the rooms, with the nicer piece in the larger rooms, and you may hit a sweet spot in the market.
I'm hopeful the Satellite1 hardware we just launched on FutureProofHomes.net hits the correct price point. It's got some serious hardware in there and is a true contender to big tech. Check out the waitlist and the product. Thanks for all your thoughts and feedback.
The 9s of prompt processing can be further optimized if you switch from llama.cpp to vLLM (or Aphrodite) and change the format of the LLM from GGUF to EXL2. The biggest advantage of EXL2 is that it runs fully in GPU memory, and vLLM will do a better job of keeping the model always loaded than default-settings llama.cpp. Depending on the memory consumption of faster-whisper you might also be able to go up to a Q6, but that will need testing. Another thing that might interest you is reading up on 4-bit KV caching. My guess, from the little info I could see on screen, is that a non-trivial amount of those 9s is spent moving the LLM in and out of memory, but since you are using a 7B model at Q4 you should be able to leave the model always in memory and not have any issues, especially since you are using faster-whisper with its reduced memory footprint. It will be a tiny bit more involved; ideally you might also want to ditch Docker (don't trust me there, I'm just not familiar with Docker's impact), but I do believe that would be a very nice starting point.
Oh, I didn't expect this to be highlighted; I typed it in a rush and there are some errors. I'm nowhere near an expert, just a dev who likes to tinker.

GGUF vs EXL2/AWQ/...: Both formats are perfectly fine for quantizing an LLM. GGUF is designed to run everywhere (ARM CPUs, x86, GPUs, ...) while EXL2/AWQ are focused on GPU-only inferencing. Since the board has a GPU, and that GPU is more powerful than the ARM cores, picking EXL2/AWQ might help efficiency a bit. On the other hand, there is some anecdotal evidence that GGUF can be better (higher-quality answers) at the same quantization than EXL2, but even then it might be worth a small tradeoff if you can improve the responsiveness a bit. Regardless, this will need a lot of testing, because different functionality in cards from the same generation can have a big impact on performance (P40 vs P100, for example).

llama.cpp vs vLLM (or others): There is nothing wrong with llama.cpp, though note that EXL2 needs an ExLlamaV2-based backend rather than llama.cpp. vLLM, Aphrodite, and KoboldCpp are just common recommendations for high-performance, low-latency inferencing (there are many others too).

4-bit KV caching: You can use 4-bit quantization on the context itself (prompt, responses, ...); this method is a bit smarter than 8-bit (truncation) and will save you some memory. When you only have 8 GB for the whole system, every bit helps.

Docker: Same as with the 4-bit cache, I just assumed taking Docker out of the equation might save a bit more RAM for inferencing. I did no research on this or its impact.

Adding on, things are rapidly changing, with 1-bit and 2-bit quants based on the 1.58-bit paper now starting to show up with very good results; memory capacity might not even be an issue by the time you are ready to ship. Sorry if my rushed comment made you waste some time on non-useful research.
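To put some rough numbers on the 4-bit KV caching point above, here's a back-of-envelope sketch. The model shape values are assumptions for illustration (a Mistral-7B-like model with grouped-query attention), not measurements from the video:

```python
# Back-of-envelope KV-cache sizing. Shape values below are assumed for a
# Mistral-7B-like model with grouped-query attention (8 KV heads).
n_layers = 32
n_kv_heads = 8
head_dim = 128
seq_len = 4096  # context positions to cache

def kv_cache_bytes(bits_per_value: int) -> int:
    # 2x for keys + values, stored per layer, per KV head, per position
    values = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return values * bits_per_value // 8

print(f"fp16 KV cache:  {kv_cache_bytes(16) / 2**20:.0f} MiB")  # 512 MiB
print(f"4-bit KV cache: {kv_cache_bytes(4) / 2**20:.0f} MiB")   # 128 MiB
```

On an 8 GB board, shaving the cache from ~512 MiB down to ~128 MiB at full context is exactly the kind of headroom the commenter is describing.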
I will definitely try this! However, vLLM states as a requirement that you need a GPU with a minimum compute capability of 7.0 (Nvidia grade). I am using an older GPU (a GTX 1080) with compute capability 6.1; will this work? Currently, I am running llama-cpp-python within a Docker container and the natural language processing of "what time is it?" takes 3 seconds. Single function calling takes about 4 to 6 seconds, double function calling 8-9 seconds, and triple 12 seconds.
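The compute-capability question boils down to a tuple comparison: CUDA capabilities order like (major, minor) pairs, and the 7.0 figure is the minimum the commenter cites from vLLM's docs (check the current docs, as requirements change):

```python
# Does a given GPU meet vLLM's stated minimum compute capability?
# The 7.0 minimum comes from the comment above, not verified against
# current vLLM releases.
VLLM_MIN_CC = (7, 0)

def meets_vllm_requirement(cc: tuple) -> bool:
    # CUDA compute capabilities compare naturally as (major, minor) tuples
    return cc >= VLLM_MIN_CC

print(meets_vllm_requirement((6, 1)))  # False: GTX 1080 (Pascal) falls short
print(meets_vllm_requirement((7, 5)))  # True: e.g. Turing (RTX 20-series)
```

So on a stock build the GTX 1080 would be refused; the Pascal generation tops out at 6.x.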
Love the concept. Would definitely buy the voice/Bluetooth/mmWave part if it could be ported over to a local LLM running on any personal hardware (an old PC converted to a server, Nvidia GPU acceleration, Apple Silicon Mac, or any hardware that can run the llama server).
I think it will be done by design. Any model capable of OpenAI functions can do the job with the correct system message. The open question is speed and reliability. But since an open-source model is used, you can spin it up on, say, Ollama, on your own hardware.
Yep. That’s the easy path forward. But it still has the challenge that folks won't configure the model/engine correctly. The main challenge is how to get this tech into everyone’s home.
It would be cool to have a device that can plug directly into a sound system or single speaker that turns it into voice assistant with actually good sound quality (for playing music/casting)
I bought a new Intel Arc A770 graphics card with 16GB RAM for running LLMs. It was relatively cheap (under 300 euros in Germany), and it should outperform an Nvidia RTX 4060. I haven't tested it yet since it was delivered yesterday, and now I am setting up a new server (Proxmox on an AMD Ryzen 7 5700G). I can't wait to see more videos from you for building my own offline Alexa system with interaction in Home Assistant.
This concept is the holy grail for replacing a house full of Google Home or Alexa devices. I agree with everyone else that unless you can find some HW that is purpose-built for LLMs, selling a server will likely not be as popular as having the customer provide their own compute. As for pricing, the max I think I could pay would be $100 CAD per room. Any more and it is too expensive to get multiples for every room. Obviously, the cheaper you make it the more adoption you'd see. I assume the satellite would also run an audio streaming client like you did in your previous video. While I think an AUX out is a must so you can plug it into a real speaker, you might want to consider putting a smaller one onboard for those who just want a one-device solution.
So cool! Onboarded and gave my feedback through the link. I'd buy something like this in a heartbeat to replace my Alexa/HA voice control setup. Essential to me would be: 1/ local 5GHz WiFi, 2/ TTS, 3/ "just works" out of the box (happy to do all the HA config, but I don't want any stuffing around with hardware), 4/ the ability to select what AI engine to use, both now and into the future. Thanks for the awesome video and fantastic build! What an exciting project!
I just love the work that you are doing! I tried to go in the exact same direction, but you are far more advanced. I also would love to see this setup as a product, because I've realized that progress on HA also impacts my custom setup, which therefore needs a ton of attention and maintenance. So yes, I absolutely would be interested if you can offer this as a product in the future.
Thanks for the kind words @Marc. You'll be excited to know a new video is coming soon and that we just launched our own hardware on FutureProofHomes.net. It's a waitlist for now, but the product is real and I'm excited to share.
Thank you! At this point in time this is all meant to be inspirational, so I'm glad it's working. I'll continue driving toward a more fleshed-out product though.
Another really cool video 😎 Obviously, performance would need to dramatically improve to make this viable to anyone; 3 seconds is still probably pushing it, even. I think the price point could be a stumbling block too; the server looks like it will need a fair bit of grunt to run it all. However, the privacy aspect is very appealing, especially in this day and age where everything seems to be stealing your data… Best of luck with it, I will be keeping a close eye on your progress 👍🏻
Thank you for the video. I was hoping you would continue down the path you were on in the previous videos: running LLMs locally, but with dedicated graphics card integration along with the Raspberry Pi Zeros. A lot of folks already have a dedicated server that they can slap a dedicated GPU in. There are some local LLMs that supposedly work well with HA, but I would rather fine-tune something like the Mistral LLM to perform the commands and also use it for a wealth of information. Where I’m getting caught up is defining some of the parameters that the large language models can use to execute the commands/system prompts.
Perhaps I can do a deep dive on making LLMs generate the outputs that Home Assistant expects. I’ve tried all the LLMs (even the ones for HA), and I’ve found this LLM the best all-around solution. After some fine-tuning it could be spot on.
It isn't that simple. I saw an interview recently with the guy who created JA, and Nvidia has already reached out to him wanting to build a JA LLM. Obviously Nvidia wants to use Jetson, but they have to port Wyoming and other add-ons to run on GPUs and not CPUs, so they have to optimize the code. There is a detailed thread on Nvidia's dev forums with the head speech guy from HA. Even Nvidia developers are having issues getting the framework in place.
Love it! Get the performance a bit faster and I'll be ready to buy. I do need presence detectors, so I'd get this, but it's a bit too large to put in the corner of my rooms.
I am still confused about the pros and cons of going down this path vs Willow. I think you'd need a video to explain why this option is better: advantages in the future, etc. Exciting project. I'll watch and wait with interest.
I think this is a very interesting concept. However, in my opinion it would become even more interesting if you could reduce the price of the satellite to a similar level as Amazon's devices. Also, I think it would be great if the satellite were more of an appliance, meaning a microcontroller and not a Raspberry Pi with a Linux distro. There is the ESP32-S3 and the recently announced ESP32-P4 with more support for AI. It would be interesting to see how much can be done with such a device; probably stable local wake word and more.
Agree. The final product will not be based on an RPi. And cost efficiency for global distribution is important. I'll do my absolute best to hit the mark without sacrificing key features. $20-$40 Amazon prices may not be possible, however. Amazon loses money on Alexa products because they make revenue elsewhere, which I don't/can't do with a fully private voice assistant.
I would consider paying €250-€300 for the brains and €75 for a satellite. This is on the high end for a product with some rough edges, but worth it given the unique local characteristics.
I've been looking at doing something similar to this, although my setup is perhaps slightly more 'traditional'. I have a giant server with the potential for a full size modern GPU, and I'd really only need a bunch of mic remotes for each room. While I do want GPT functionality, the main purpose is to actually continually record audio, and only save it if I tell it. It's a way for me to hash out ideas/solve problems/etc, when I don't have pen and paper in front of me. I can then have an LLM ingest the audio, run a summariser, and log it into my notes database along with the audio.
Fantastic! I'd certainly be interested if it could support 10+ satellites. I've been playing with some Onju Voice devices, which do work but don't quite hit the mark functionality-wise. Including mmWave and ESPresense is definitely a plus. I think the key to success is having it be as easy (or easier) to get up and running as a Google Home or Alexa device. Well done. Keep at it.
Super cool!! But I’m not quite following regarding the LLM setup. However, it is fairly simple to set up the Jetson Orin Nano 8GB with a downloaded image: just connect the NVMe drive via a USB cradle to a PC and “burn” the downloaded image onto it (e.g. using Raspberry Pi Imager). Afterwards, mount the image on a Linux system, edit “/boot/extlinux/extlinux.conf”, and change “root=/dev/mmcblk0p1” to “root=/dev/nvme0n1p1”.
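The extlinux.conf edit described above can be sketched on a scratch copy, so the substitution is visible before touching the real file on the mounted image. The sample APPEND line here is a made-up stand-in, not the exact Jetson file contents:

```python
# Scratch-copy sketch of the root= swap in /boot/extlinux/extlinux.conf.
# The APPEND line below is an illustrative stand-in for the real file.
from pathlib import Path

conf = Path("extlinux.conf.sample")
conf.write_text("LABEL primary\n  APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait\n")

# Swap the SD-card root partition for the NVMe one
patched = conf.read_text().replace("root=/dev/mmcblk0p1", "root=/dev/nvme0n1p1")
conf.write_text(patched)
print(patched)  # the kernel root= argument now points at the NVMe partition
```

On the actual mounted image you'd do the same one-line substitution with your editor of choice (or sed).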
Local everything! I love this idea and that all the speakers aren't going to snitch to Google... but as someone with a Home Lab in the basement, I have MORE than enough CPU/GPU. Would the server also come in a docker stack or a VM, with the 'drop in' style setup? Another odd sized, non-rackable device isn't something I'd really need.
I would definitely buy something like this. If needed, I wonder if you could offload the LLM portion to a central server with more power, and just use the nodes for the voice recognition portions.
Yep. When the satellite is released you should be able to use it with your own server, or the HomeX server if you need/want it. It's all open-source software. The sky is the limit.
Very excited to sit down and watch this when I can pay more attention. I've put in 25 hours trying to make local AI work, and the first 20 of that was just trying to figure out why it wasn't working with my GPU. Their documentation is written for people already working on the project, and the error logs are all but useless. Looking forward to trying this method.
Super interesting project! I just built my first voice assistant box based on an ESP32-S3, an addressable LED strip, a little speaker and an INMP441 microphone for around 20€, and now I'm hooked 😁 It works beautifully with local wake word detection. Not yet sure about the microphone performance in a loud environment, but at first look it works like a charm (with Home Assistant Cloud and ChatGPT as the brain) :) Maybe worth a look if you want to get up a "HomeX Lite" with more competitive pricing! Would love an off-the-shelf version for a reasonable price 😊
Just launched a waitlist for our upcoming voice satellite hardware on the website. I hope this first product's price point is "lite enough". Let me know. :)
@@FutureProofHomes Just subscribed to the waiting list :) Do I understand it right that I'll need an RPi Zero 2 + your PCBs for it to work? So it's about 100€ for the full system? Or does it work without a Pi?
The biggest struggle is a local satellite with good microphones. Totally on board if you want to go and order 300+ ready sets from PCBWay or other sources. Maybe they have nice-looking cases too, not like that home-printed crap. I'd suggest designing the board with mmWave as optional, maybe?
Are you envisioning a completely custom built product at the end of this? Or still something built around 'off the shelf' components? It's a super cool project, and I can certainly see a lot of demand for an 'off-the-cloud' AI assistant. I guess price is going to be a barrier for many if the entry point is north of $500.
Custom product at the end of the rainbow. Server will be optional as the software will be open source. If you want to use your own server/gpu, then by all means, do that! That’s the vision.
@@FutureProofHomes It's a very cool project, and an ultimate goal for many home automation enthusiasts, I'm sure. I expect there is a good chance that by the time you get your product designed, the built-in NPU units hitting all mainstream CPUs at the moment will be both better and more available in lower-end hardware, lowering the cost of entry. It would also be great to have the ability for your satellites to work with a lower-end HA setup and offload the AI part to the cloud... I know this may be against the original goals of the project, but if your satellites turn out to be the best thing on the market for that purpose you would capture a whole lot more folk who aren't ready to spend north of $500 on the server. I know it was kind of an open-ended question in your video... but do you have an idea of what kind of price point you are targeting for your satellites?
Great to catch up and see another video, I am at least casually interested in the multi-sensor satellite and the server. I re-subbed today, somehow YT dropped me again. Keep up the good work... we're watching!
I'm doing the same thing except I offload everything to my homelab, which is so much faster than yours. And I'd prefer to make an ESP mic work, actually. What's the harm in letting your homelab listen to the mic? It's powerful enough.
Very cool project. I also would like to build something like that myself. Your video made me aware of the Jetsons - thank you for that. I'll do tests with my desktop GPU first, but in the longer term I'd like to do it like you (preferably with the assistant called Jarvis :). I watched your video a few weeks ago and yes: please stay tuned - the (rapid) development of your environment, as can be seen in your videos, is impressive 👍👍
Very cool project, can't wait to see what you can make. Satellites are needed, especially with the LEDs :). But I prefer to run the LLM on my Unraid server with a GPU.
You can run Functionary + vLLM on Unraid right now if you want. docs.vllm.ai/en/latest/serving/deploying_with_docker.html Using the HomeX server will be optional. Use your own GPUs/Server if ya want
This setup would require really high-performing hardware on the server side, and then limit it to being used only as a voice assistant. Ideally that same hardware could/should be accessible for any local computing that's needed, for example object detection for the security cameras. If the server were built around PC hardware to get the price down, it could still be made into a custom all-in-one device for this purpose only. It wouldn't be super compact, but at least it would have the performance needed. In all honesty, though, if paying what the server would cost, the end user would still expect to be able to use it for more, I think. This last suggestion would be a much bigger investment on the software side, but in my opinion it would also make the hardware a much more attractive purchase: set the server up with a hypervisor and an easy-to-use UI where the user could pick and choose what features (VMs) should be on the machine, things like:
- Home Assistant
- Voice Control
- Security Cameras (Frigate NVR)
Have you considered a hybrid business model similar to what iXsystems does with TrueNAS? Essentially you'd provide the software, or a validated configuration of various software bundled into a single package, under an open-source license. If you did this for both the client side and the server side it would likely encourage adoption, community engagement and contributions to the project. And then sell "prosumer" or even commercial-grade, supported "official/validated hardware" for it to run on, to further fund the project? I'm not sure how feasible this is for this project, but it's just an idea, as many other businesses operate like this: Red Hat, Canonical and Docker, just to name a few.
Amazing video! Thanks for the effort. Even though you don't have a tutorial yet, I was able to have Extended OpenAI work with the Functionary LLM by going through the settings used in this video and your multiple Github issue posts. I am currently running the small-v2.4 model on a GTX 1080 and can get Natural Language Processing times of 3 seconds!
I am really keen on the satellite with added microwave presence detection. Regardless of the other hardware, the all-in-one satellite is a winning combination device.
I had a handful of Google Home devices scattered around my home. Music playback was the goal; we used their voice control sparingly. I removed them after they repeatedly responded to conversations even though the wake word hadn't been spoken. (Yes, I know I could turn off their microphones.) I didn't like looking at them sitting on shelves and tables, and I didn't like the sound quality.

I just completed the installation of ceiling-mounted stereo speakers in multiple rooms. The amps and streaming devices are rack-mounted in a central location, which means I have full control over streaming technology and amp specs, and I don't have to worry about speaker placement near outlets. But I've lost voice control.

I watch lots of videos about wall-mounted dashboards and the idea is appealing, until I think about getting up and walking to another location to see the status of something or to control a scene, device, etc. It seems to me voice control is the better option: it's more convenient, and inherently more mobile. We all carry phones around, so I can use mine as a mini-dashboard when I need that sort of interface.

I think the answer to your "price" question may be connected to competing with the cost of implementing a dashboard solution. I didn't give you a definitive answer, but I will intently watch this progress. Thank you for the effort.
Problem with using a watch or phone as a satellite is you don’t always have the device with you, or a guest doesn’t have it with them, and when the assistant speaks back you can’t hear them. I think we need home voice assistants and personal voice assistants on devices to work together.
How many people have some form of GPU around already for transcoding on their Plex servers? A 4060 Ti with 16GB memory is around $500. I saw someone suggesting those because of the higher memory: terrible for gaming, but great for starting out with LLMs. It would be interesting to see how fast that would run on a cheap motherboard with a Celeron processor and minimal system RAM, depending on how low the power draw is when the card is idling… Either way, keep up the good work. I'd focus on your satellites first. People could use those for presence etc. and tie them to public ChatGPT in the meantime if they wanted.
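A quick sanity check on why a 16 GB card is attractive for this. Both numbers below are assumptions for illustration: roughly 4.5 bits/weight for a typical Q4 quant, and a flat 3 GiB allowance for KV cache, CUDA context and activations:

```python
# Rough VRAM budget for a 7B model at Q4 on a 16 GiB card.
# bits_per_weight and overhead_gib are illustrative assumptions.
params = 7e9
bits_per_weight = 4.5
weights_gib = params * bits_per_weight / 8 / 2**30
overhead_gib = 3.0
total_gib = weights_gib + overhead_gib

print(f"weights ~{weights_gib:.1f} GiB, total ~{total_gib:.1f} GiB")
print("fits in 16 GiB with room to spare" if total_gib < 16 else "does not fit")
```

Under those assumptions a 7B Q4 model uses well under half the card, leaving VRAM free for Whisper, longer contexts, or a larger quant.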
@@FutureProofHomes this is something that is massively missing in the wild; it's extremely complicated to use alternative voice assistants other than Google, Amazon or Apple. I backed $1000 for Mycroft, but that seemed to be a flop. I really like your videos and have been following for a while. If I had more spare time I'd help out on the project in a heartbeat. Would you be interested in running this as a Kickstarter?
I love the principle of it; however, having another server feels wasteful in addition to the HA server. Why don't you team up with Nabu Casa, given that they most likely already have an AI version of their server hardware in the works? Even if they don't, they should, so teaming up would be symbiotic 🤔
Yep. It would be ideal to have an HA server w/ GPU to do everything under one hood. It should be noted that the HomeX satellite will not require the HomeX server. You could power the satellite from your own hardware.
I'd be ready to buy a bunch, one for each of the rooms in my house. It would be good to have the satellites < $100 and flexibility on hardware. Given the costs of JoshAI (and its closed nature), I would think $400-500 for the central server would be reasonable for a larger-scale, sub-4-second-response core.
I am impressed with your work, but it's going to turn into a hardware compute problem, because LLM compute times will be difficult to decrease. The box or hardware that you want to sell as a product is going to end up being quite expensive if you want a performant system. Best to go with a software package and customer-supplied hardware requirements.
Valid points! All I know is that I cannot accept a future where all the compute is in one company's cloud. We need a DIY solution for the nerds and an out-of-the-box solution for the non-technical people.
regarding pricing... I think that depends on the performance. Not just speed, but the abilities of the LLM. And I am concerned about how small the LLM will have to be to work well on the hardware. How capable will it be? LLMs are getting better at smaller sizes. Fine tuning will help to a degree. But I'm going to watch and see how things progress.
I was doing sketches for a similar thing :) Basically an all-in-one, plug-and-play satellite device for one room (presence, BLE device tracking, temp, humidity, mic, speaker). Though I was thinking for the AI to be a complete desktop, with two GPUs maybe. That way it would be more powerful, and I could potentially use it for other tasks besides home automation, like a real-life assistant.
I really do want the product to be a self-contained unit rather than a multi-purpose server or exposed hardware. I’ll keep trying but I am already running things faster on my server.
Very interesting project. Could you imagine running the server on a Raspberry Pi 5 with a Coral USB accelerator? That could drop the price by nearly $200.
If your GPU supports CUDA, try running it through your CUDA cores rather than the onboard processor. I'm betting you could cut your LLM time by 80%. (This was a suggestion from my IT son; it's way above my computer knowledge, as I'm more hardware and he's more software.)
I want something like this and would pay a few hundred dollars for it. One caveat is that I very much prefer to be able to train it to integrate directly with RTI, my home automation system, rather than having to add HA to my ecosystem.
Hope this goes somewhere; this is exactly what I want, though the hardware to run it seems odd. I would rather have something that could work on most thin-client PCs. I can only assume there are cheaper ways to get a more powerful CPU and run a consumer desktop GPU.
@FutureProofHomes any idea if I can use Grok AI on my Wyoming satellite? Love Home Assistant with ChatGPT, but it ain't fun! Lol. Have been looking at local AI; learning a lot, but trying not to rush anything.
In one of your questions on your website, you asked how much I would pay for one, and it's NOT $500! Sorry, but this isn't an Apple tax! I would pay up to maybe $150.
That pricing is for the optional server with a gpu in it. You'd be hard pressed to even find a GPU for $150 that can produce a less than 3 sec response time. No Apple Tax here my friend. I promise. I do hope hardware prices go down in the future or software evolves to be highly optimized. Last thought too... the server is optional. If you want to use your own hardware/server/gpu you can do that.
Can you please post your prompt and functions? I want to play with functionary and a few other models to compare. Hopefully I can get a quantized version of firefunction running.
Prompt
===========
Your name is Jarvis. I want you to act as smart home manager of Home Assistant. I will provide information of smart home along with a question, you will truthfully make correction or answer using information provided in one sentence in everyday language. Do not ask for permissions or confirmation.

Current Time: {{now()}}
Please announce all timestamps in a human readable 12hr format EST.

Current Area: {{area_id(current_device_id)}}

Available Devices:
```csv
entity_id,name,state,area_id,aliases
{% for entity in exposed_entities -%}
{{ entity.entity_id }},{{ entity.name }},{{ entity.state }},{{area_id(entity.entity_id)}},{{entity.aliases | join('/')}}
{% endfor -%}
```

Areas:
```csv
area_id,name
{% for area_id in areas() -%}
{{area_id}},{{area_name(area_id)}}
{% endfor -%}
```

The current state of devices is provided in available devices.
Use execute_services function is only for requested action, not for current states.
Make decisions based on current area first.
Functions
==========
- spec:
    name: execute_services
    description: Use this function to execute service of devices in Home Assistant.
    parameters:
      type: object
      properties:
        list:
          type: array
          items:
            type: object
            properties:
              domain:
                type: string
                description: The domain of the service
              service:
                type: string
                description: The service to be called
              service_data:
                type: object
                description: The service data object to indicate what to control.
                properties:
                  entity_id:
                    type: string
                    description: >-
                      The entity_id retrieved from available devices. It must start with domain, followed by dot character.
                required:
                  - entity_id
            required:
              - domain
              - service
              - service_data
  function:
    type: native
    name: execute_service
- spec:
    name: get_attributes
    description: Get attributes of any home assistant entity
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: entity_id
      required:
        - entity_id
  function:
    type: template
    value_template: "{{states[entity_id]}}"
- spec:
    name: play_music
    description: Use this function to play music on a certain media_player
    parameters:
      type: object
      properties:
        music_query:
          type: string
          description: The artist, album, or type of music to play
        mass_media_player:
          type: string
          description: The correct entity value starts with "media_player" and ends with "satellite".
      required:
        - music_query
  function:
    type: script
    sequence:
      - service: script.play_music
        data:
          music_query: '{{music_query}}'
          mass_media_player: '{{mass_media_player}}'
I’m not a “guru”, but I’m a self-taught web and app developer. I taught myself how to program in Flutter for apps and Qwik for websites. I also know React Native, but I know Qwik better.
Have you considered using a Coral TPU instead of an Nvidia GPU? I’ve seen some people who have set up local LLMs using Coral, and it might provide you a lower-cost alternative to the Orin board.
Good stuff. The Orin Nano isn't quite fast enough despite being quite pricey. I wonder how its GPU would compare performance-wise with an old graphics card like a 1080 Ti or something else people might have gathering dust. It would be more power-hungry and take up space (you'd have to hide a case somewhere), but might be an option for people not wanting to buy the Jetson.
Agree. I wish the Orin Nano packed more of a punch. I really do want the product to be a self-contained and feel like a single unit rather than a multi-purpose server or exposed hardware.
After watching your previous videos about local Smart Home assistant, I also started to play around with Local AI, llama.cpp, vllm and so on.. Thanks a lot for all these amazing videos! I have an AMD GPU doing the heavy lifting, got ROCm / Hardware Acceleration to work and so on. But I somehow have issues with every prompt through Home Assistant returning me HTTP 500 and several validation errors. Could you maybe share your prompt and functions / tools? I'm especially interested in how you got the model to respond to your "chit chat" questions (first president of the US).
@@FutureProofHomes I tried to respond multiple times yesterday, but it seems no comment actually made it. I'm using Functionary v2.2 small. It behaves similarly in Q4 or F16. The errors it generates are something like "missing tools / functions," etc. So it seems that the model might not be receiving from Home Assistant what it needs? Or maybe the other way round?
@@FutureProofHomes just tried the new Python bindings and chit-chat now works more or less as expected. Have to try function calling and also its multilanguage abilities next :) Thanks for your great work! I've seen that you were digging deep and opened some issues; highly appreciated!
I've been looking at the ASUS NUC 14 Pro which has an Arc GPU and NPU which could be interesting for AI, but unclear how it compares to the Jetson in terms of performance.
I think you’re so focused on making everything an IoT item that you’re not seeing the bigger picture. At $550 you can buy a refurbished small-form-factor computer with an 8600T in it and add a brand-new GeForce GTX 1650 to it. Use the excess horsepower to run other Docker containers to assist home automation, or self-host some cloud-type apps. Use the Intel iGPU for transcoding. Load Linux and toss it in the closet next to your router and let it run. Now the remotes, though: those could be a very compelling purchase.
How do you get something like this shipped and running in homes all over the world, though? Homes where IT nerds don’t live to set up and maintain the thing? That’s the challenge. Somebody will figure this out. And it will be a big win for all of us.
3 seconds for a local LLM is a GREAT objective.... I am interested in a product like this, but I think I'd prefer to run it all locally on Unraid, with a powerful GPU.
good work.
TY!
@@FutureProofHomes I made a mistake: I meant to say Unraid, not unpaid.😄
I like where you're going with this!
While I applaud your approach on the server for most of your market, I, like many of the IT geeks here in the comments, am not your target audience for the hardware portion for that product. (But we all know people who are the audience for it! Don't be discouraged there!) I'm definitely keen, however, to keep an eye on the software configuration for my own eventual hardware.
But as far as the satellites go, I agree with others (and you) in the comments that Google and Amazon have spoiled us with the pricing on their devices. You're packing more features in, and dealing with quantities that won't provide the cost efficiencies in manufacturing, nor the financial margin for pricing it competitively with those pervasive competitors.
But... you're also offering significant flexibility and privacy. And for some, that's worth the extra spend.
I'm presuming that you're planning on building an integrated board rather than off-the-shelf components (such as the Matrix Voice) to shave production costs.
One suggestion I have is to offer two levels for the satellites.
For a large space like a living room or entertainment space, that large array of microphones is likely critical. But for a smaller room (like a bedroom or a home office) with far less ambient noise and space to cover, those seven microphones (at a base cost of $65 USD just for that hat) are overkill. Build a "Large room" model and a "Small room" model. The small-room model could carry the same features, but fewer microphones (two or three might be sufficient), and perhaps a less expensive speaker, as it won't need to be as loud... Maybe not as diminutive as an ATOM Echo, but something closer to the Seeed ReSpeaker 2-Mic, which retails for $12.90 USD for the hat. (Yes, I know it lacks the extra connectors you'd need for the other sensors, so there's some cost adjustment there.) Since the Matrix is open-source hardware, this reduced configuration shouldn't be difficult for a qualified engineer with that design as an inspiration.
This could satisfy many people that would want a satellite in multiple living spaces, but might balk at the price of that one device across many rooms. Allow them a less expensive option for some of the rooms, with the nicer piece in the larger rooms, and you may hit a sweet spot in the market.
I'm hopeful the Satellite1 hardware we just launched on FutureProofHomes.net hits the correct price point. It's got some serious hardware in there to compete with the big guys and be a true contender to big tech. Checkout the waitlist and the product. Thanks for all your thoughts and feedback.
The 9s of prompt processing can be further optimized if you switch from llama.cpp to vLLM (or Aphrodite) and change the format of the LLM from GGUF to EXL2. The biggest advantage of EXL2 is that it runs fully in memory, and vLLM will do a better job keeping the model always loaded than default-settings llama.cpp. Depending on the memory consumption of faster-whisper, you might also be able to go up to a Q6, but that will need testing. Another thing that might interest you is reading up on 4-bit KV caching.
My guess from the little info I could see on screen is that a non-trivial amount of those 9s is spent moving the LLM in and out of memory, but since you are using a 7B model at Q4 you should be able to leave the model always in memory and not have any issues, especially since you are using faster-whisper with its reduced memory footprint.
It will be a tiny bit more involved; ideally you might also want to ditch Docker (don't trust me there, I'm just not familiar with Docker's impact), but I do believe that would be a very nice starting point.
Oh, I didn't expect this to be highlighted; I typed it in a rush and there are some errors. I'm nowhere near an expert, just a dev who likes to tinker.
GGUF vs EXL2/AWQ/...: Both formats are perfectly fine for quantizing an LLM. GGUF is designed to run everywhere (ARM CPUs, x86, GPUs, ...) while EXL2/AWQ are focused on GPU-only inferencing. Since the board has a GPU and that GPU is more powerful than the ARM cores, picking EXL2/AWQ might help efficiency a bit. On the other hand, there is some anecdotal evidence that GGUF can be better (higher quality answers) at the same quantization than EXL2, but even then it might be worth a small tradeoff if you can improve the responsiveness a bit. Regardless, this will need a lot of testing because different functionality in cards from the same generation can have a big impact on performance (P40 vs P100, for example).
llama.cpp vs vLLM (or others): There is nothing wrong with llama.cpp, though EXL2 needs a different backend (ExLlamaV2) rather than llama.cpp. vLLM, Aphrodite, and KoboldCpp are just common recommendations for high-performance, low-latency inferencing (there are many others too).
4-bit KV caching: You can use 4-bit quantization on the context itself (prompt, responses, ...); this method is a bit smarter than 8-bit (truncation) and will save you some memory. When you only have 8GB for the whole system, every bit helps.
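Rough numbers behind that 4-bit KV cache point, assuming typical Llama-7B-class dimensions (32 layers, 32 KV heads, head dim 128, 4096-token context; these are assumed values, not figures from the video):

```python
# Back-of-the-envelope KV cache sizing: K and V each hold
# layers * seq_len * heads * head_dim elements.
def kv_cache_bytes(layers=32, heads=32, head_dim=128, seq_len=4096, bytes_per_elem=2.0):
    return int(2 * layers * seq_len * heads * head_dim * bytes_per_elem)

fp16 = kv_cache_bytes(bytes_per_elem=2.0)  # 16-bit cache
q4 = kv_cache_bytes(bytes_per_elem=0.5)    # 4-bit cache
print(f"fp16: {fp16 / 2**30:.1f} GiB, 4-bit: {q4 / 2**30:.1f} GiB")
# prints: fp16: 2.0 GiB, 4-bit: 0.5 GiB
```

On an 8GB board, recovering ~1.5 GiB at full context is a meaningful chunk of memory.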
Docker: Same as with the 4-bit cache, I just assumed taking Docker out of the equation might save a bit more RAM for inferencing. I did no research on this or its impact.
Adding on, things are rapidly changing: 1-bit and 2-bit quants based on the 1.58b paper are now starting to show up with very good results, so memory capacity might not even be an issue by the time you are ready to ship.
Sorry if my rushed comment made you waste some time on non-useful research.
Sounds like you know your stuff. If interested schedule some time with me via my website so we can speak. I have questions.
I will definitely try this! However, vLLM states as a requirement that you need a GPU with a minimum compute capability of 7.0 (Nvidia grading). I am using an older GPU (GTX 1080) with compute capability 6.1; will this work?
Currently, I am running llama-cpp-python within a Docker container and the Natural Language Processing of the "what time is it?" takes 3 seconds. Single function calling takes about 4 to 6 seconds, double function calling 8-9 seconds and triple 12 seconds.
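On the compute capability question above: on real hardware, torch.cuda.get_device_capability() reports the value directly. As an illustration of the >= 7.0 floor that vLLM states, a small lookup sketch (the table holds well-known NVIDIA values, but verify your exact card):

```python
# NVIDIA compute capabilities for a few common cards (known values;
# double-check your exact model). vLLM states it needs >= 7.0.
COMPUTE_CAPABILITY = {
    "GTX 1080": (6, 1),     # Pascal
    "Tesla P40": (6, 1),    # Pascal
    "RTX 2060": (7, 5),     # Turing
    "RTX 3090": (8, 6),     # Ampere
    "Jetson Orin": (8, 7),  # Ampere-based SoC
}

def vllm_supported(gpu: str, minimum=(7, 0)) -> bool:
    # Tuple comparison handles major/minor versions correctly.
    return COMPUTE_CAPABILITY[gpu] >= minimum

print(vllm_supported("GTX 1080"))  # False: Pascal sits below the stated floor
print(vllm_supported("RTX 3090"))  # True
```

So out of the box the GTX 1080 falls below vLLM's stated requirement, and llama.cpp remains the practical backend there.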
@@FutureProofHomes @Rasekov I love when smart people with ideas meet like this. Very often, magic happens
Love the concept. Would definitely buy the voice/bluetooth/mwave part, if could be ported over to local LLM running on any personal hardware (old pc converted to server, nvidia GPU acceleration, MAC silicon run, or any hardware that can run the llama server).
this
I think it will be done by design. Any model, capable of OpenAI functions, can do the job with correct system message. The question stands in speed and reliability. But since open source model is used, you can spin it on, say, Ollama, on your own hardware.
Yep. That’s the easy path forward. But still has challenge that folks don’t configure the model/engine correctly. The main challenge is how to get this tech in everyone’s home.
Ready to buy 4 !
It would be cool to have a device that can plug directly into a sound system or single speaker that turns it into voice assistant with actually good sound quality (for playing music/casting)
This can do that!
I bought a new Intel Arc A770 graphics card with 16GB RAM for running LLMs. It was relatively cheap (under 300 euros in Germany), and it should outperform an Nvidia RTX 4060.
I haven't tested it yet since it was delivered yesterday, and now I am setting up a new server (Proxmox on an AMD Ryzen 7 5700G).
I can't wait to see more videos from you for building my own offline alexa system with interaction in home assistant.
Yes. An always-on, voice-activated, private AI with Home Assistant would be great! £50 per room and a £300 server would be reasonable.
Noted!
This concept is the holy grail for replacing a house full of Google Home or Alexa devices. I agree with everyone else that unless you can find some HW that is purpose built for LLMs, selling a server will likely not be as popular as having the customer provide their own compute.
As for pricing, the max I think i could pay would be $100 CAD per room. Any more and it is too expensive to get multiples for every room. Obviously the cheaper you make it the more adoption you'd see.
I assume the satellite would also run an audio streaming client like you did in your previous video. While I think an AUX out is a must so you can plug it in to a real speaker, you might want to consider putting a smaller one onboard for those who just want a one device solution.
So cool! Onboard, and I gave my feedback through the link. I'd buy something like this in a heartbeat to replace my Alexa/HA voice control setup. Essential to me would be: 1/ local 5GHz WiFi, 2/ TTS, 3/ "just works" out of the box (happy to do all the HA config, but don't want any stuffing around with hardware), 4/ ability to select which AI engine to use, both now and into the future. Thanks for the awesome video and fantastic build! What an exciting project!
I’d use Fallback Conversation Agent in HA. So easy commands can be done by Home Assistant quickly. And LLM would only be used for complex requests.
Noted! Good feedback!
The ability to differentiate the voices and start the prompt with such would be amazing (to tailor memory and responses)
I am loving seeing this; it's something I have been planning on making for a bit.
Yep. This product isn't rocket science. But many people want it!
I just love the work that you are doing! I tried to go in the exact same direction, but you are far more advanced. I also would love to see this setup as a product, because I realized that progress on HA also impacts my custom setup, and it therefore needs a ton of attention and maintenance. So yes, I absolutely would be interested if you can offer this as a product in the future.
Thanks for the kind words @Marc. You'll be excited to know a new video is coming soon and that we just launched our own hardware on FutureProofHomes.net. It's a waitlist for now, but the product is real and I'm excited to share.
This is fantastic progress, I want one! What about NUC +16GB RAM, 512GB SSD & Coral TPU? sure wattage goes out the window.
Love your work, you inspired me to create something similar on my local hardware
Thank you! At this point in time this is all meant to be inspirational, so I'm glad it working. I'll continue driving for a more fleshed out product though.
Another really cool video 😎
Obviously, performance would need to dramatically improve to make this viable to anyone; even 3 seconds is still probably pushing it.
I think the price point could be a stumbling block too, the server looks like it will need a fair bit of grunt to run it all.
However, the privacy aspect is very appealing, especially in this day and age where everything seems to be stealing your data…
Best of luck with it, I will be keeping a close eye on your progress 👍🏻
Thanks for the kind & truthful words. If it was easy/fast/cheap everyone would be doing it.. today. We’re just not there yet… but it will happen. :)
@@FutureProofHomes It will happen? It's so encouraging it's not just vaporware!
super cool project. looking forward to this developing.
Thank you for the video. I was hoping you would continue to go down the path you were in the previous videos running LLMs locally but with a dedicated graphics card integration along with the raspberry pi zeros. A lot of folks already have a dedicated server that they can slap a dedicated GPU in.
There are some local LLMs that supposedly work well with HA, but I would rather fine-tune something like the Mistral LLM to perform the commands and also use it for a wealth of information.
Where I’m getting caught up is defining some of the parameters that the large language models can use to execute the commands/system prompts.
Perhaps I can do a deep dive on making LLMs generate the outputs that Home Assistant expects.
I’ve tried all the LLMs (even the ones for HA), I’ve found this LLM the best all-around solution. After some fine-tuning it could be spot on.
It isn't that simple. I saw an interview recently with the guy who created JA, and Nvidia has already reached out to him wanting to build a JA LLM. Obviously Nvidia wants to use Jetson, but they have to port Wyoming and other add-ons to run on GPUs and not CPUs, so they have to optimize the code. There is a detailed thread on Nvidia's dev forums with the head speech guy from HA. Even Nvidia developers are having issues getting the framework in place.
Hi! Definitely YES. Very interesting project. When do you think you can have it ready approximately?
We're waiting for the next episode ;)
Me too! :) :) I know what you guys want though. The FutureProofHomes.net site just went live with waitlist for our satellite hardware.
Love it! Get the performance a bit faster and I'll be ready to buy. I do need presence detectors, so I'd get this, but it's a bit too large to put in the corner of my rooms.
Moar performance! I hear you. We’re gonna find it too.
@@FutureProofHomes if I can help in any way, let me know. I like what you are doing man, keep it up!
I am still confused of the pros and cons of going down this path vs Willow. I think you'd need a video to explain why this option is better. Advantages in the future etc. Exciting project. I watch and wait with interest.
I think this is a very interesting concept. However in my opinion it would become even more interesting if you can reduce price of satellite to similar level as amazon devices. Also I think it would be great if satellite is more like appliance, meaning microcontroller and not Raspberry Pi with linux distro. There is ESP32-S3 and recently announced ESP32-P4 with more support for AI. It would be interesting to see how much can be done with such device, probably stable local wake word and more.
Agree. The final product will not be based on an RPi. And cost efficiency for global distribution is important. I'll do my absolute best to hit the mark without sacrificing key features. $20-$40 Amazon prices may not be possible, however: Amazon loses money on Alexa products because it makes revenue elsewhere, which I don't/can't do with a fully private voice assistant.
I would consider to pay €250-€300 for the brains and €75 for a satellite.
This is on the high end for a product with some rough edges, but worth it given the unique local characteristics
Appreciate the feedback!
I've been looking at doing something similar to this, although my setup is perhaps slightly more 'traditional'. I have a giant server with the potential for a full size modern GPU, and I'd really only need a bunch of mic remotes for each room. While I do want GPT functionality, the main purpose is to actually continually record audio, and only save it if I tell it. It's a way for me to hash out ideas/solve problems/etc, when I don't have pen and paper in front of me. I can then have an LLM ingest the audio, run a summariser, and log it into my notes database along with the audio.
Fantastic! I'd certainly be interested if it could support 10+ satellites. I've been playing with some Onju Voice devices, which do work but don't quite hit the mark functionality-wise. Including mmWave and ESPresense is definitely a plus. I think the key to success is having it be as easy (or easier) to get up and running as a Google Home or Alexa device. Well done. Keep at it.
Agree. Elegant onboarding and ease of use is the key.
I'm liking where you are going with this, to be quite honest.
Thanks @MatthewN8OHU!
Super cool!! But I’m not quite following regarding the LLM setup. However it is fairly simple to install the Jetson Orin Nano 8GB with a downloaded image. Just connect the NVMe memory via a USB cradle to a PC and “burn” the downloaded image onto it (e.g. using Raspberry Pi Imager). Afterwards mount the image onto a Linux system and edit “/boot/extlinux/extlinux.conf” change “root=/dev/mmcblk0p1” to “root=/dev/nvme0n1p1”.
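The extlinux.conf edit described above can also be scripted. A small sketch; the retarget_root helper is illustrative, and the /mnt path assumes the image is mounted where the comment suggests:

```python
from pathlib import Path

def retarget_root(conf_path, old="root=/dev/mmcblk0p1", new="root=/dev/nvme0n1p1"):
    """Point the Jetson's boot root at the NVMe partition instead of the SD card."""
    p = Path(conf_path)
    p.write_text(p.read_text().replace(old, new))

# On the mounted image (path as described in the comment above):
# retarget_root("/mnt/boot/extlinux/extlinux.conf")
```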
Local everything! I love this idea and that all the speakers aren't going to snitch to Google... but as someone with a Home Lab in the basement, I have MORE than enough CPU/GPU. Would the server also come in a docker stack or a VM, with the 'drop in' style setup? Another odd sized, non-rackable device isn't something I'd really need.
Great work. I'd prefer to run it on standard PC/server hardware instead of the jetson.
I would definitely buy something like this. If needed wonder if you could offload the LLM portion to a central server with more power. Just use the nodes for voice recognition portions.
Yep. When the satellite is released you should be able to use with your own server or the HomeX server if you need/want it. It's all open source software. Sky is the limit.
Very excited to sit down and watch this when I can pay more attention. I've put in 25 hours trying to make local AI work, and the first 20 of that was just trying to figure out why it wasn't working with my GPU. The documentation is written by the people working on the project, and the error logs are all but useless. Looking forward to trying this method.
Yep. This is all new tech. So a bunch of nerds writing documentation for themselves. :) At least we know we're not late to the game.
Yes. Comes down to the price! If you use the "sell many" format, I would probably buy 1
Noted!
Yes yes yes!!! Great work. I want it
Love your work! Keep it up!
Can’t stop. Won’t stop.
Might be slow, but this impressive work! Thanks for the video!
Thanks for watching!
Super interesting project! I just built my first voice assistant box based on an ESP32-S3, an addressable LED strip, a little speaker, and an INMP441 microphone for around 20€, and now I'm hooked 😁 It works beautifully with local wake word detection. Not yet sure about the microphone performance in a loud environment, but at first look it works like a charm (with Home Assistant Cloud and ChatGPT as the brain) :) Maybe worth a look if you want to get a "HomeX lite" going with more competitive pricing! Would love an off-the-shelf version for a reasonable price 😊
Just launched a waitlist for our upcoming voice satellite hardware on the website. I hope this first product's price point is "lite enough". Let me know. :)
@@FutureProofHomes Just subscribed to the waiting list :) Do I understand it right that I'll need an RPi Zero 2 plus your PCBs for it to work? So it's about 100€ for the full system? Or does it work without a Pi?
I would love to have this on the market. Happy to pay up to $1500, plus monthly for upgrade kits and support.
High roller over here! :D
I mean for the right specs and service it could definitely be priced for it :)
this is amazing !! looking forward next videos!
Yes Yes Yes. I call dibs at being first in line.
My bouncer knows your name now. We got you.
Biggest struggle is a local satellite with good microphones. Totally onboard if you want to go and order 300+ ready sets from PCBWay or other sources. Maybe they have nice-looking cases also, not like that home-printed crap. I'd suggest designing the board with mmWave as optional, maybe?
Go to FutureProofHomes.net. Website just went live to get on waitlist for our dedicated satellite hardware.
I could definitely use one of these in each room.
Lol! The Satellite1 waitlist just went live on our website. Check it out!
Are you envisioning a completely custom built product at the end of this? Or still something built around 'off the shelf' components? It's a super cool project, and I can certainly see a lot of demand for an 'off-the-cloud' AI assistant. I guess price is going to be a barrier for many if the entry point is north of $500.
Custom product at the end of the rainbow. Server will be optional as the software will be open source. If you want to use your own server/gpu, then by all means, do that! That’s the vision.
@@FutureProofHomes It's a very cool project, and an ultimate goal for many home automation enthusiasts, I'm sure. I expect there is a good chance that by the time you get your product designed, the built-in NPU units hitting all mainstream CPUs at the moment will be both better and more available in lower-end hardware, lowering the cost of entry. It would also be great to have the ability for your satellites to work with a lower-end HA setup and offload the AI part to the cloud... I know this may be against the original goals of the project, but if your satellites turn out to be the best thing on the market for that purpose, you would capture a whole lot more folks who aren't ready to spend north of $500 on the server.
I know it was kind of an open-ended question in your video... but do you have an idea of what kind of price point you are targeting for your satellites?
@@FutureProofHomes WOW yes that is how it should be! I am going to follow this channel closely !
Great to catch up and see another video, I am at least casually interested in the multi-sensor satellite and the server. I re-subbed today, somehow YT dropped me again. Keep up the good work... we're watching!
Glad to have you back. :)
Absolutely amazing. Looking forward to a sub 1 second response time that feels like you are talking to a human in real time!
Me too! It's gonna happen.
You had me at Jetson Orin Nano
Looking forward to your GPT-4 updates!
I'm doing the same thing, except I offload everything to my homelab, which is so much faster than yours.
And I'd prefer to make the ESP mic work, actually. What's the harm in letting your homelab listen to the mic? It's powerful enough.
Very cool project. I also would like to build something like that myself. Your video made me aware of the Jetsons - thank you for that. I'll do tests with my desktop GPU first, but in the longer term I'd like to do it like you (preferably with the assistant called Jarvis :). I watched your video a few weeks ago and yes: please stay tuned - the (rapid) development of your environment, as can be seen in your videos, is impressive 👍👍
Very cool project, can't wait to see what you can make. Satellites are needed, especially with the LEDs :). But I prefer to run the LLM on my Unraid server with a GPU.
You can run Functionary + vLLM on Unraid right now if you want. docs.vllm.ai/en/latest/serving/deploying_with_docker.html
Using the HomeX server will be optional. Use your own GPUs/Server if ya want
Having just the client would be amazing with esp32 and raspberry pi with mmwave espresence etc
This setup would require really high-performing hardware on the server side, and then limit it to being used only as a voice assistant. Ideally that same hardware could/should be accessible for any local computing that would be needed.
For example object detection for the security cameras.
If the server would be built around PC hardware to get the price down it could still be made into a custom all in one device made for this purpose only.
It wouldn't be super compact but at least it would have the performance needed.
But in all honesty, if paying the amount the server would cost, the end user would still expect it to be usable for more, I think.
This last suggestion would be a much bigger investment on the software side, but in my opinion it would also make the hardware a much more attractive purchase.
If the server were set up as a VM host with some easy-to-use UI where the user could pick and choose which features (VMs) should be on the machine, things like:
- Home Assistant
- Voice Control
- Security Cameras (Frigate NVR)
Mini-PC with multi-purpose capabilities. I get it. Thanks for thinking out loud here!
Have you considered a hybrid business model similar to what iXsystems does with TrueNAS?
Essentially you'd provide the software, or a validated configuration of various software bundled as a single package under an open-source license. If you did this for both the client side and server side, it would likely encourage adoption, community engagement, and contributions to the project.
And then sell "prosumer" or even commercial-grade, supported "official / validated hardware" for it to run on, to further fund the project?
I'm not sure how feasible this is for this project, but it's just an idea, as many other businesses operate like this: Red Hat, Canonical, and Docker, just to name a few.
I haven't, but now I am. :) Really interesting food for thought. Noted. Thank you.
Amazing video! Thanks for the effort. Even though you don't have a tutorial yet, I was able to have Extended OpenAI work with the Functionary LLM by going through the settings used in this video and your multiple Github issue posts. I am currently running the small-v2.4 model on a GTX 1080 and can get Natural Language Processing times of 3 seconds!
Huge! Awesome to hear!
Hmm, wonder what my 3090 would do?
Love the project. Keep up the good work.
Thanks, will do!
do want! very cool :)
I am really keen on the satellite with added microwave presence detection. Regardless of the other hardware, the all in 1 satellite is a winner combination device
Perfect. Yes. The satellite itself can be successful in its own right.
Awesome! Learning a lot here thank you!
Happy to hear! You’re welcome!
I had a handful of Google Home devices scattered around my home. Music playback was the goal; we used their voice control sparingly. I removed them after they repeatedly responded to conversations even though the wake word hadn't been spoken. (Yes, I know I could turn off their microphones.) I didn't like looking at them sitting on shelves and tables, and I didn't like the sound quality. I just completed the installation of ceiling-mounted stereo speakers in multiple rooms. The amps and streaming devices are rack-mounted in a central location, which means I have full control over streaming technology and amp specs, and I don't have to worry about speaker placement near outlets. But I've lost voice control. I watch lots of videos about wall-mounted dashboards, and the idea is appealing, until I think about getting up and walking to another location to see the status of something or to control a scene, device, etc. It seems to me voice control is the better option. Voice control is more convenient. We all carry phones around, so I can use mine as a mini-dashboard when I need that sort of interface. I think the answer to your "price" question may be connected to competing with the cost of implementing a dashboard solution. A voice solution is inherently more mobile, too. I didn't give you a definitive answer, but I will intently watch this progress. Thank you for the effort.
Problem with using a watch or phone as a satellite is you don’t always have the device with you, or a guest doesn’t have it with them, and when the assistant speaks back you can’t hear them. I think we need home voice assistants and personal voice assistants on devices to work together.
Yep I’m ready to buy
How many people already have some form of GPU around for transcoding on their Plex servers? A 4060 Ti with 16GB memory is around $500. I saw someone suggesting those because of the higher memory: terrible for gaming, but great for starting out with LLMs. It would be interesting to see how fast that would run on a cheap motherboard with a Celeron processor and minimal system RAM, depending on how low the power draw is when the card is idling.... Either way, keep up the good work. I'd focus on your satellites first. People could use those for presence etc. and tie into public ChatGPT in the meantime if they wanted.
Absolutely! satellites are the key here..
I would love to see this project!!
Absolutely would buy, name your price man! Haha
Nice! Really glad to hear peeps are interested.
@@FutureProofHomes This is something that is massively missing in the wild; it's extremely complicated to use alternative voice assistants other than Google, Amazon, or Apple. I backed $1000 for Mycroft, but that seemed to be a flop. I really like your videos and have been following for a while. If I had more spare time I'd help out on the project in a heartbeat. Would you be interested in running this as a Kickstarter?
I love the principle of it; however, having another server in addition to HA's server feels wasteful. Why don't you team up with Nabu Casa, given that they most likely already have an AI version of their server hardware in the works? Even if they don't, they should, so teaming up would be symbiotic 🤔
Yep. Would be ideal to have an HA server with a GPU to do everything under one hood. It should be noted that the HomeX satellite will not require the HomeX server. You could power the satellite from your own hardware.
I'd be ready to buy a bunch, one for each of the rooms in my house. It would be good to have the satellites < $100 and flexibility on hardware. Given the costs of Josh.ai (and its closed nature), I would think $400-500 for the central server would be reasonable for a larger-scale, sub-4-second-response core.
I know Josh.ai! They know me. Hey guys!
11:05 Parallel function calling is about the LLM returning multiple function calls at once, instead of just one.
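To illustrate that distinction, a sketch of an OpenAI-style assistant message carrying parallel tool calls (the payload and entity names are hypothetical):

```python
import json

# One assistant turn returning TWO function calls at once: that is
# "parallel function calling", as opposed to one call per round trip.
response_message = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function", "function": {
            "name": "execute_services",
            "arguments": '{"list": [{"domain": "light", "service": "turn_off", "service_data": {"entity_id": "light.kitchen"}}]}'}},
        {"id": "call_2", "type": "function", "function": {
            "name": "execute_services",
            "arguments": '{"list": [{"domain": "switch", "service": "turn_off", "service_data": {"entity_id": "switch.fan"}}]}'}},
    ],
}

# The client can dispatch every call from this one response, even concurrently.
targets = [json.loads(c["function"]["arguments"])["list"][0]["service_data"]["entity_id"]
           for c in response_message["tool_calls"]]
print(targets)  # ['light.kitchen', 'switch.fan']
```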
So no luck having several assistants working in parallel?
Monsti, where did you read this? Can you share.
I did test multiple voice assistants at once. It works.
Great one, full power to you man!
TY!
I am impressed with your work, but it's going to turn into a hardware compute problem because LLM compute times will be difficult to decrease. The box or hardware that you want to sell as a product is going to end up being quite expensive if you want a performant system. Best to go with a software package and customer-supplied hardware.
Valid points! All I know is that I cannot accept a future where all the compute is in one company's cloud. We need a DIY solution for the nerds and an out-of-the-box solution for the non-technical people.
regarding pricing... I think that depends on the performance. Not just speed, but the abilities of the LLM. And I am concerned about how small the LLM will have to be to work well on the hardware. How capable will it be? LLMs are getting better at smaller sizes. Fine tuning will help to a degree. But I'm going to watch and see how things progress.
I would gratefully buy some of the satellites but i would prefer to run the server stuff on my local NUC ( if possible).
As long as your NUC has a GPU then all of this is possible. Take a look at the new Ollama integration home assistant has rolled out.
eagerly awaiting an update.
Just launched a waitlist for our upcoming voice satellite hardware! Check out the website and the product! Video coming soon.
@FutureProofHomes Signed up! Thanks!
I was doing sketches for a similar thing :) Basically an all-in-one, plug-and-play satellite device for one room (presence, BLE device tracking, temp, humidity, mic, speaker). Though I was thinking the AI would be a complete desktop with maybe two GPUs. That way it would be more powerful, and I could potentially use it for other tasks besides home automation, like a real-life assistant.
I really do want the product to be a self-contained unit rather than a multi-purpose server or exposed hardware. I’ll keep trying but I am already running things faster on my server.
@FutureProofHomes Yep, as for the product, I understand and agree. You are doing amazing stuff here. Really appreciate all of it.
Very interesting project. Could you imagine running the server on a Raspberry Pi 5 with a Coral USB accelerator? That could drop the price by nearly $200.
Coral doesn't have the hardware capacity to run Llama. It only has 8 MB of memory, and Llama likes memory xD
Coral is not only too weak, but isn't purposed to run LLMs
Unfortunately Coral doesn’t do LLM inference well. Gotta keep looking for best solution out there. The hunt continues!
If your GPU supports CUDA, try running it through your CUDA cores rather than the onboard processor. I'm betting you could cut your LLM time by 80%.
(This was a suggestion from my IT son, way above my computer knowledge as I’m more hardware and he’s more software.)
Tell your son this is already running it through CUDA on GPU, but to keep the recommendations coming! :D
It's like having Einstein in your basement and asking him to turn your lights on and off.
Not really.
If you want it could tell you anything you want to know about the theory of relativity.
I want something like this and would pay a few hundred dollars for it. One caveat is that I very much prefer to be able to train it to integrate directly with RTI, my home automation system, rather than having to add HA to my ecosystem.
Inquiring minds want to know why you feel this way.
Hope this goes somewhere. This is exactly what I want, though the hardware to run it seems odd. I would rather have something that could work on most thin-client PCs. I can only assume there are cheaper ways to get a more powerful CPU and run a consumer desktop GPU.
Let's go 🔥
Yeaaa boyeee!
@FutureProofHomes Any idea if I can use Grok AI on my Wyoming satellite?
Love Home Assistant with ChatGPT, but it ain't fun! Lol. I have been looking at local AI and learning a lot, but I'm trying not to rush anything.
Any update on this? It looks great!
Just launched a waitlist for our upcoming voice satellite hardware! Check out the website and the product! Video coming soon.
Why do you use a slow Pi?
Plug-and-play out-of-the-box? Shut Up And Take My Money!
In one of the questions on your website, you asked how much I would pay for one, and it's NOT $500! Sorry, but this isn't an Apple tax! I would pay up to maybe $150.
That pricing is for the optional server with a GPU in it. You'd be hard-pressed to even find a GPU for $150 that can produce a less-than-3-second response time. No Apple tax here, my friend, I promise. I do hope hardware prices go down in the future, or that software evolves to be highly optimized.
Last thought too... the server is optional. If you want to use your own hardware/server/GPU, you can do that.
Would the Raspberry Pi AI Kit with the Hailo-8L work in place of the Jetson Orin Nano?
Added to the list of things to research. I don't know off the top of my head. Thanks for pointing it out. Anyone else know?
Why this project over Willow? Willow already has faster response times than an Echo.
Need to dig into Willow more. Does it integrate with LLMs? Does it control and integrate with Home Assistant devices easily?
Ready to buy 2! 😎
The Satellite1 waitlist just went live on our FutureProofHomes.net website. :)
I would love to replace a house full of Google Nest Minis with this.
It's gonna happen. Just a matter of time.
Super cool!
TY!
amazing work
Thank you! Cheers!
I'm really interested in putting this in a course and selling your hardware to our 100k+ students on our platform
Fascinating. :)
Just when I thought I could finally use my Jetson TX2 and Jetson Nano, I realise they're on the Maxwell architecture :(
Interested in buying or building some of the satellites. My machine that runs Home Assistant already has STT on it.
Can you please post your prompt and functions? I want to play with functionary and a few other models to compare. Hopefully I can get a quantized version of firefunction running.
Prompt
===========
Your name is Jarvis. I want you to act as smart home manager of Home Assistant.
I will provide information about the smart home along with a question, and you will truthfully make corrections or answer using the information provided, in one sentence, in everyday language.
Do not ask for permissions or confirmation.
Current Time: {{now()}}
Please announce all timestamps in a human readable 12hr format EST.
Current Area: {{area_id(current_device_id)}}
Available Devices:
```csv
entity_id,name,state,area_id,aliases
{% for entity in exposed_entities -%}
{{ entity.entity_id }},{{ entity.name }},{{ entity.state }},{{area_id(entity.entity_id)}},{{entity.aliases | join('/')}}
{% endfor -%}
```
Areas:
```csv
area_id,name
{% for area_id in areas() -%}
{{area_id}},{{area_name(area_id)}}
{% endfor -%}
```
The current state of devices is provided in available devices.
Use the execute_services function only for requested actions, not for current states.
Make decisions based on current area first.
Functions
==========
```yaml
- spec:
    name: execute_services
    description: Use this function to execute service of devices in Home Assistant.
    parameters:
      type: object
      properties:
        list:
          type: array
          items:
            type: object
            properties:
              domain:
                type: string
                description: The domain of the service
              service:
                type: string
                description: The service to be called
              service_data:
                type: object
                description: The service data object to indicate what to control.
                properties:
                  entity_id:
                    type: string
                    description: >-
                      The entity_id retrieved from available devices. It must
                      start with domain, followed by dot character.
                required:
                  - entity_id
            required:
              - domain
              - service
              - service_data
  function:
    type: native
    name: execute_service
- spec:
    name: get_attributes
    description: Get attributes of any home assistant entity
    parameters:
      type: object
      properties:
        entity_id:
          type: string
          description: entity_id
      required:
        - entity_id
  function:
    type: template
    value_template: "{{states[entity_id]}}"
- spec:
    name: play_music
    description: Use this function to play music on a certain media_player
    parameters:
      type: object
      properties:
        music_query:
          type: string
          description: The artist, album, or type of music to play
        mass_media_player:
          type: string
          description: The correct entity value starts with "media_player" and ends with "satellite".
      required:
        - music_query
  function:
    type: script
    sequence:
      - service: script.play_music
        data:
          music_query: '{{music_query}}'
          mass_media_player: '{{mass_media_player}}'
```
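For context, when the model decides to act it emits JSON arguments matching the execute_services spec. Here's a minimal sketch of parsing and sanity-checking such a call (the entity ID and arguments are hypothetical examples, not output captured from a real model):

```python
import json

# Hypothetical arguments the LLM might emit for execute_services:
# a list of {domain, service, service_data} objects per the spec.
arguments = json.loads("""
{
  "list": [
    {
      "domain": "light",
      "service": "turn_on",
      "service_data": {"entity_id": "light.office"}
    }
  ]
}
""")

for call in arguments["list"]:
    # The spec requires domain, service, and service_data.entity_id,
    # and entity_id must start with the domain followed by a dot.
    assert {"domain", "service", "service_data"} <= call.keys()
    entity_id = call["service_data"]["entity_id"]
    assert entity_id.startswith(call["domain"] + ".")
    print(f"call {call['domain']}.{call['service']} on {entity_id}")
```

Validating that each entity_id starts with its domain before calling the service is a cheap guard against the model hallucinating entity names.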
See my previous videos so that the prompt and functions make more sense.
Woah, nice!
I'm not a "guru", but I'm a self-taught web and app developer. I taught myself how to program in Flutter for apps and Qwik for websites. I also know React Native, but I know Qwik better.
Have you considered using a Coral TPU instead of an Nvidia GPU? I've seen some people who have set up local LLMs using Coral, and it might provide a lower-cost alternative to the Orin board.
Good stuff. The Orin Nano isn't quite fast enough despite being quite pricey. I wonder how its GPU would compare performance-wise with an old graphics card like a 1080 Ti, or something else people might have gathering dust. It would be more power-hungry and take up space (you'd have to hide a case somewhere), but it might be an option for people not wanting to buy the Jetson.
If your 1080 Ti is gathering dust, you're a bad, bad person. :)
Agree. I wish the Orin Nano packed more of a punch. I really do want the product to be self-contained and feel like a single unit rather than a multi-purpose server or exposed hardware.
After watching your previous videos about local Smart Home assistant, I also started to play around with Local AI, llama.cpp, vllm and so on.. Thanks a lot for all these amazing videos!
I have an AMD GPU doing the heavy lifting, got ROCm / Hardware Acceleration to work and so on. But I somehow have issues with every prompt through Home Assistant returning me HTTP 500 and several validation errors.
Could you maybe share your prompt and functions / tools? I'm especially interested in how you got the model to respond to your "chit chat" questions (first president of the US).
What model are you using? And what is the server-side error behind the 500?
@FutureProofHomes I tried to respond multiple times yesterday, but it seems no comment actually made it. I'm using Functionary v2.2 small. It behaves similarly in Q4 or F16.
The errors it generates are something like "missing tools / functions", etc. So it seems the model might not be receiving what it needs from Home Assistant? Or the other way round, maybe?
@beatwalti4433 I think this GH issue will solve your problem: github.com/MeetKai/functionary/issues/136
@FutureProofHomes Just tried the new Python bindings, and chit-chat now works more or less as expected. I have to try function calling and also its multilanguage abilities next :) Thanks for your great work! I've seen that you were digging deep and opened some issues - highly appreciated!
I've been looking at the ASUS NUC 14 Pro which has an Arc GPU and NPU which could be interesting for AI, but unclear how it compares to the Jetson in terms of performance.
I think you're too focused on making everything an IoT item and you're not seeing the bigger picture. At $550 you can buy a refurbished small-form-factor computer with an 8600T in it and add a brand-new GeForce GTX 1650. Use the excess horsepower to run other Docker containers to assist home automation, or self-host some cloud-type apps. Use the Intel iGPU for transcoding. Load Linux, toss it in the closet next to your router, and let it run.
Now the remotes, those could be very compelling purchase.
I guess one goal is to keep energy consumption low.
It is a question, I suppose, of power usage over time. The power saving from downclocking when idling is more effective on more recent computers.
How do you get something like this shipped and running in homes all over the world, though? Homes where IT nerds don't live to set up and maintain the thing? That's the challenge.
Somebody will figure this out. And it will be a big win for all of us.
Extremely good point! I humbly withdraw my criticism.