It strikes me as clear that multimodal models will dominate this sort of humanoid-robot LLM interaction at some point: tonality, pacing, body language, etc. make up a huge share of human (sub)conscious communication, and this needs to be considered.
Neat that the one you chatted with was a local model. A fine-tuned Llama variant, I assume? I'd like to see the same conversation with an improved/tweaked version and with a SoTA non-local LLM, just for comparison's sake, to get a better gauge of how far we are from genuinely comfortable interaction.
Cute objects on the desk, sir 😙👀