So can we host this on Twilio serverless? If so, which file would we point the incoming call to? Also, it can be modified to greet the caller first, correct? I'm thinking for a business AI assistant to take calls, give information etc. I have created these AI apps with Vapi, but it gets pretty expensive. Twilio would be so much cheaper.
I think with the need for a persistent WebSocket connection you're probably going to be best served doing this outside of our serverless Twilio Functions. I can double check with the team though! As for greeting, you can definitely change the <Say> tags to customize the greeting from Twilio, or I believe you could pre-prompt OpenAI with a text prompt using the Realtime API if you want the greeting to come from the assistant.
@@TwilioDevs Thanks, I'll mess around with it some. Is that voice coming from AWS? I've never heard that voice, but it's really good and would be terrific for most professional business applications. The latency is next to nothing, which has been the biggest hurdle, it seems, with these voice AI assistants. Good to see Twilio is now in the game!
@@TwilioDevs I've created a Twilio program before, and using the gather method I was able to choose the language, but with the OpenAI Realtime API I tried their language parameter for whisper-1 and it doesn't work. Sadly, the current state of auto detection is 75% flawed in my tests.
@@WaiZe0 At 03:02 we set up a system prompt. You can tell it what language you'd like for it to use in that prompt (and also tell it how to greet the caller, etc.). From my testing it has obeyed that quite well. I told it to converse only in Spanish and I wasn't able to get it to break out of that even by insisting I only knew English.
@@TwilioDevs I noticed it works well in English and Spanish, but I'm working with Arabic and it gets it right only 1 in 10 times, even with the clearest system prompt. Is there a way to set the language like Twilio's gather method?
I'm so frustrated I'm literally at the last step. I got the twilio and openai API to work together and when I call the phone number it says please wait speak your AI agent brought to you by openai and twilio and then says okay you can speak and then hangs up. Can anyone help I have been using chat GPT and Claude and they're both making me run around in circles
The symptoms sound like an OpenAI Realtime API key issue. Seems like the call is hanging up at the point the OpenAI Realtime API should be getting connected. Are you getting any errors in the terminal? Please refer to the blog post or GitHub repo in the video description to make sure your code is 100% correct. You can also check on your API key's access at platform.openai.com
Confused. Instructions say "Step 2: Get your Account Sid and Auth Token from the Twilio Console to get started.", but nowhere does it say what to do with them. Also call connects ago, but it can't seem to hear me, then disconnected after 5 seconds. Related?
Connected to the OpenAI Realtime API
Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
Disconnected from the OpenAI Realtime API
If the call is working at all, the Twilio side of this is fine which means you're okay on the Twilio credentials front. This looks like it's not getting audio over to the OpenAI API. There are some more logging types you can enable with the code in the blog post. Can you try turning those on and see what you get in the terminal?
Hi Bharath, Thank you for getting in touch with our Social Support Team. Unfortunately, Twilio does not offer the ability to purchase Indian phone numbers directly. However, there are some workarounds and considerations you can explore. Kindly dm us for more information.
I believe I read that OpenAI will detect the regional accent and speak the responses in that accent. I think you can add that to the instructions (SYSTEM_MESSAGE) in the app to help reinforce the goal.
@@0xb1sh0p8 how do you know if you have access to the api, other than a server 403 error I’m not getting an exact messaging regarding the api.. do you have it available in the playground ?
@@EDashMan I don't have anything public right now. When you sign up with Twilio, you'll create an account. When you go to that account's dashboard and scroll down, it will show you your SID and Auth Token to access the API.
@@EDashMan hmm, did my last comment get deleted? You'll have access to the API when you sign up and create an account. At the bottom of the account page you'll see your SID and Auth Token to use.
This is the future. The problem is that OpenAI's voices in Spanish don't sound very good; they sound like they have an American accent. Is there a way to integrate a different voice, using ElevenLabs instead of GPT's voice, without losing the realtime benefit of Twilio + OpenAI?
If you use advanced mode, switch your system language to Spanish, open a new conversation, and tell the assistant: "can you speak to me using a Castilian Spanish accent?"
@@mandrews817 But is advanced mode available in the API, or are you talking about the voice assistant that OpenAI is currently launching? If it's the former, could you please tell me where I can read more about it? I have never heard of an advanced mode in the speech-to-text API.
@@boytenesee3494 Will try this. Maybe it will delay the responses a little bit, but I think it wouldn't be very noticeable. I will give it a try, thank you for the idea.
From the video description: "OpenAI is rolling out Realtime API access incrementally. Please watch their site for updates." This is likely due to this.
Probably lots of use cases. This example is very basic but imagine an assistant that replaces the typical phone tree at a company with something that speaks naturally to them, can answer some questions they may have, and ultimately can redirect the call to an actual human if it detects it needs to.
There is an issue with the quality of the answers, especially when dealing with local dialects. While it can somewhat handle English (not well), it struggles significantly with dialects like Darija or other regional languages. The difference in transcription accuracy between the current implementation and the OpenAI Playground is very noticeable.
OAI dashboard billing limits says I do have access:
"Realtime
gpt-4o-realtime-preview 20,000 TPM 5,000 RPM
gpt-4o-realtime-preview-2024-10-01 20,000 TPM 5,000 RPM"
But I can only hear the clunky Twilio TTS at the beginning of the call and do not get connected. Also DTMF button press seems to end the session:
"Server is listening on port 5050
Client connected
Received non-media event: connected
Incoming stream has started MZcbf17dca62564c8a46602ce815cd43bd
Connected to the OpenAI Realtime API
Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
Received non-media event: dtmf
Disconnected from the OpenAI Realtime API
Received non-media event: stop
Client disconnected."
What should we build next?
Next up on the channel is likely going to be the Python version of this tutorial followed by some updates regarding interruptions and having the AI talk first.
AI talk first pleaseeee. Couldn't find any tutorial on that on the web.
Want to see how the AI talks first
This makes things so much easier. I was trying to do this manually, converting voice to text, sending prompt to openai, and then converting the response back to voice..
Yeah. This is great feature. Imagine the lag by passing data between the different apis. 😊
Yeah and the old way you only got back a 'reading' of the text, no emotions at all
That Twilio robotic voice needs an update, thanks for the content!!!
There are other voice options available that sound better, but definitely agree that one is from a different era 😅
Now you just have to provide customer data from Segment to the model. Then when a customer calls the model can give a personalized answer.
For example, a customer calls a car repair shop. Then the model using RAG accesses a customer’s data to check on the status of a car repair. Lastly, the model responds with the status of the car repair.
All the customer has to do is call the car repair shop and ask a simple question with voice. A great customer experience if you ask me 😊
Yes, this is a great scenario! That's exactly the type of exciting things that can be enabled by combining all of the pieces. Thanks for watching and for the comment!
I am literally building this right now...
@@CodyDietzofficial Awesome! Can’t wait to see it :)
Yes car repair shops is where the big bucks are to be made
That's for us to do, not Twilio :) They just give you the utensils, you gotta make the meal.
This will start a new age of AI...
It's really impressive how interactive it is!
@@TwilioDevs Yoo that’s crazy. I’m going to test the repo myself first, seeing is believing haha!
@@EDashMan Let me know how it goes! I know OpenAI is rolling this out in stages so if it doesn't work at first, check to make sure you have access to the OpenAI Realtime API. I was blown away the first time I got this working though. Feel free to mix up the SYSTEM_MESSAGE prompt and the temperature a bit too. It's pretty amazing. I feel like I should have it coach me through making a meal :D
@@TwilioDevs Yeah I'm getting: Error in the OpenAI WebSocket: Error: Unexpected server response: 403
I don't even have gpt-4o-realtime-preview-2024-10-01 in my playground. I guess I can't use it yet :(
@@EDashMan Bummer! Yeah hopefully it'll roll out pretty quickly.
There's a small bug in the blog post guide. The websocket connection URL is mistyped (should contain a single model=, atm has two)
Thanks, I'll let Paul know!
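For anyone hitting the same typo, here is a minimal sketch of building the connection URL so that it carries exactly one model= parameter. The helper name buildRealtimeUrl is made up for illustration; the endpoint is the publicly documented Realtime API WebSocket address and the model name is the one quoted in this thread.

```javascript
// Hypothetical helper (not from the blog post): builds the Realtime API
// WebSocket URL with a single "model" query parameter.
function buildRealtimeUrl(model) {
  const url = new URL('wss://api.openai.com/v1/realtime');
  // searchParams.set() replaces any existing value instead of appending,
  // so the URL can never end up with two "model=" entries.
  url.searchParams.set('model', model);
  return url.toString();
}

console.log(buildRealtimeUrl('gpt-4o-realtime-preview-2024-10-01'));
// prints "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
```

The resulting string can be passed to the WebSocket constructor in place of a hand-typed URL.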
It doesn't handle interruptions while the AI is speaking. Am i missing something?
Check the repo that is linked in the description. We figured out how to add that after the video shipped. Thanks for watching!
Great. How difficult is it to modify this to use the Realtime API with an OpenAI assistant trained on a specific knowledge base instead of a generic OpenAI model?
Hey! I have used function calling in this Realtime API for calendar booking, but I am struggling with how to send the response of the function back to the API for TTS. Can you please help me with that?
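One way this is commonly done, sketched under assumptions: after running the function yourself, send a conversation.item.create event whose item has type function_call_output and the call_id from the model's function call, then send response.create so the model speaks a reply that uses the result. The helper name sendFunctionResult, the stand-in socket, and the sample call ID and booking payload are all hypothetical; in the tutorial code the socket would be the OpenAI WebSocket.

```javascript
// Sketch: report a tool result back to the Realtime API so the model can speak it.
// `ws` is any object with a send(string) method (e.g. the OpenAI WebSocket).
// `callId` is the call_id from the function call the model emitted.
function sendFunctionResult(ws, callId, result) {
  ws.send(JSON.stringify({
    type: 'conversation.item.create',
    item: {
      type: 'function_call_output',
      call_id: callId,
      output: JSON.stringify(result) // the output field is sent as a string
    }
  }));
  // Ask the model to generate a new response that takes the result into account.
  ws.send(JSON.stringify({ type: 'response.create' }));
}

// Usage with a stand-in socket that only records what was sent.
const sentToOpenAi = [];
const fakeOpenAiWs = { send: (msg) => sentToOpenAi.push(JSON.parse(msg)) };
sendFunctionResult(fakeOpenAiWs, 'call_123', { booked: true, time: '10:00' });
console.log(sentToOpenAi.map((m) => m.type));
```

Without the trailing response.create the result is only added to the conversation, and the caller hears nothing until the next turn.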
got this working using my azure endpoint with some help from chatgpt!
I did notice this example doesn't handle interruptions, will you be updating the repo with more features in the future?
that's awesome! thanks for giving it a try.
We decided to leave interruptions out for this blog post/video because the code was already pretty long. We talked about doing follow-ups for things like interruptions and function calling. I'll check with the team and see what the plan is.
@@TwilioDevs to be honest, I'd really appreciate this - this is a huge part of what makes this tech so amazing. Any high level support on how to accomplish this, if it's even possible? thanks!
Great! Would you mind to share your code?
@@TwilioDevs looking for this also...
@@TwilioDevs Also looking for this - even just the samples of the code would be great don't need a full video.
Is there a way to trigger the first response without needing to say something first?
Check the GitHub repo. It has an "assistant speaks first" option in it that got added after this video was made.
hey this is amazing , revolutionary even! , how do i connect my model to a vector_store / a knowledge base that it can refer to? or is that not supported yet ? I am trying to figure out if i should implement that in the function calling ; tools {} parameter or not? Thanks !!!!
I'm wondering if this is possible / how to do this as well
@@natevance3661 I have found out about some crazy shit , trying to piece it all together but you gotta use make
Did you figure out how to do that? Let me know if you do
Please let me know as well!
One thing I don't understand - how to make OpenAI speak first when it answers the call?
Right after the code sends the sessionUpdate object you can send something like this (feel free to modify the prompt):
const event = {
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [
      {
        type: 'input_text',
        text: 'Please greet the caller and say "hi there, how can i help you?"'
      }
    ]
  }
};
openAiWs.send(JSON.stringify(event));
openAiWs.send(JSON.stringify({type: 'response.create'}));
@@TwilioDevs Thank you very MUCH! I got the code, but still no access to realtime API. Hopefully soon! Thanks again. Twilio is good.
@@randotkatsenko5157 try livekit
Hello sir, I am able to call, but the AI does not connect. I don't see the incoming-call request when I receive the call.
Also, do you guys have any thoughts you would care to share on outbound calling?
What specifically are you looking for thoughts on?
Hello sir, I did something similar in Python Flask, but I am getting a huge delay (5 seconds) downloading the audio file from Twilio. Is there any alternative? Please reply.
Thanks again for the tutorial. What is needed to make it possible to interrupt the AI? I think Twilio may be buffering received audio from OpenAI that it finishes playing even when interrupted.
I tried several changes to try to fix things. I wonder if the audio from OpenAI is sent to twilio that is buffering it. Then when it is interrupted, that is why it still keeps playing what it's already received. Is there a way to tell twilio to stop playing what had already been sent when an interruption is detected.
The web-only implementations with WebRTC handle interruptions immediately, just like the official ChatGPT app. I know phone networks have a delay, but this is more than that: it seems to keep talking for many seconds.
Thank you in advance.
Hey hey! Check out this timestamp from our recent livestream where I helped Alex and Bianca add this (i'm the robot 😂). The timestamp starts at their first interaction with it where they see how the lack of interruptions impacts things and then we walk through how to add a version of interrupt to it: ua-cam.com/video/_itrbiszfiE/v-deo.htmlfeature=shared&t=2843
@@TwilioDevs Perfect and thank you! I watched the livestream recording and rebased my stuff on the newer version. It is working well now.
What are you using to be a robot in the livestream?
@@mspicela Total custom build inside of OBS (obsproject.com). It's a pile of PNG files, a waveform generator for the mouth, and some subtle motion effects.
@@mspicela Also super glad you got it working! Let us know if there's anything else we can help with!
has anyone here figured out how to modify this code for interrupts?
Working on this at the moment. Hopefully have an update early this coming week.
I was able to figure it out! thanks
@@gurumack Happy to hear it!
@gurumack can you please share it? How do you handle interruptions?
Love the video Brent! 💪🚀
Thank you Craig!
for the interruption issue :
you need to clear the twilio buffer and then send response.cancel
Can you share how you implemented this? I tried sending the following commands when the response type is input_audio_buffer.speech_started:
await openai_ws.send(json.dumps({"type": "response.cancel"}))
await openai_ws.send(json.dumps({"type": "output_audio_buffer.clear"}))
No dice though :( Your help here would be greatly appreciated!
@@johns332 Use this:
case 'input_audio_buffer.speech_started': {
  console.log('Speech Start:', response.type);
  // Tell Twilio to drop any audio it has buffered but not yet played.
  twilioWs.send(
    JSON.stringify({
      streamSid: streamSid,
      event: 'clear',
    })
  );
  console.log('Cancelling AI speech from the server');
  // Then ask OpenAI to stop generating the current response.
  const interruptMessage = {
    type: 'response.cancel'
  };
  openaiWs.send(JSON.stringify(interruptMessage));
  break;
}
Would like to see a tutorial about using OpenAI to get on-screen transcriptions of phone calls
That's a cool idea. I'll see what we can do!
@@TwilioDevs yes please!
Perfect timing!
Can I use this as it is and deploy it on Twilio itself as a function/build?
I needed this 18 months ago lol
We all did 😅
This is great. But I have been struggling with the ability to interrupt the AI when on a call with Twilio.
Working on something for this! Stay tuned.
Deployment? Great work, great explanation - what's the best place to deploy this? TW Services? Or that wouldn't work?
I usually leave out deployment since it can be a fairly personal choice and outside of the scope of the tutorial. That said, this code should work anywhere you can deploy a full Node.js app. Some popular options include Render (render.com), Railway (railway.app), DigitalOcean or building your own setup within a VPS.
Lots of options out there! Thanks for watching and let us know if you need any further help.
This was a great video. I am looking for a way to output the conversation both what was received and how it responded. Is that possible through the realtimeapi? Currently I can capture the response in text but I have not figured out how to capture what is said to it in text, via realtimeapi.
Thanks again.
I'll see if I can put something together for that. First up is the Python version of this tutorial which got delayed a little bit.
So for clarity, you want the text of what the caller says to the AI?
@@TwilioDevs yes, and thank you so much. I can get the text for the realtime api response, but the text for the caller is where I am struggling. I don’t know if realtime has a way, and I recently saw something in Twilio that could possibly help. But thank you again, I truly appreciate your response and consideration.
No promises but I'll see what I can do. If not a video perhaps we can at least get you a code snippet.
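In the meantime, a rough sketch of one approach, assuming the preview Realtime API behaves as documented at the time: adding input_audio_transcription (with the whisper-1 model) to the session.update asks the API to transcribe the caller, and the finished text then arrives in conversation.item.input_audio_transcription.completed events. The assistant's side arrives in response.audio_transcript.done. The helper names withCallerTranscription and transcriptLine and the sample events are made up for illustration.

```javascript
// Sketch: merge this into the session object sent with session.update
// to ask the API to transcribe the caller's audio.
function withCallerTranscription(session) {
  return { ...session, input_audio_transcription: { model: 'whisper-1' } };
}

// Returns { speaker, text } for events that carry a finished transcript,
// or null for every other event type.
function transcriptLine(event) {
  if (event.type === 'conversation.item.input_audio_transcription.completed') {
    return { speaker: 'caller', text: event.transcript };
  }
  if (event.type === 'response.audio_transcript.done') {
    return { speaker: 'assistant', text: event.transcript };
  }
  return null;
}

// Usage with hand-written sample events (not real API output).
const sampleEvents = [
  { type: 'conversation.item.input_audio_transcription.completed', transcript: 'Is my car ready?' },
  { type: 'response.audio.delta', delta: '...' },
  { type: 'response.audio_transcript.done', transcript: 'Yes, it is ready for pickup.' }
];
const conversationLog = sampleEvents.map(transcriptLine).filter(Boolean);
console.log(conversationLog);
```

In the tutorial code, transcriptLine would be called from the OpenAI WebSocket message handler after parsing each incoming event. Caller transcripts can arrive slightly after the assistant has started responding, so ordering by arrival time is only approximate.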
@@TwilioDevs you are amazing thank you 🙏
Can you show how we can integrate Function Calling as well?
That's a good idea for a follow-up video, thanks!
"Thank you, Brent! Do I need a Twilio subscription for communication between two valid numbers? (The trial only provides one valid number.) When I try to make a call using the Twilio dev phone with the same number, I don't receive anything." it seems i need two numbers?
You can add a verified number to test your app with your own phone during trial: help.twilio.com/articles/223180048-Adding-a-Verified-Phone-Number-or-Caller-ID-with-Twilio
can i use it in danish, turkish or german?
I struggled to make audio input detect for a specific language, even with whisper’s language parameter. Tell me if u were able to choose any other language.
This is going to really help you guys. I worked on this immediately when this was dropped but this setup has a weakness. Interruptions don’t work when you interrupt the agent in the middle of a larger audio playback (ask it to read an example paragraph) and then try to interrupt it in the middle - it won’t work. I tried messing with it but nothing worked.
We're working on it! I should have something to share this week.
@@TwilioDevs Fantastic video! Just curious if you've uploaded anything regarding how to deal with interruptions
Thanks for the fantastic video! Do I need to upgrade my Twilio account to a full version to use this? I have set up everything based on the tutorial, but I get no response from the AI even after I speak the first sentence. Alas..
running `twilio dev-phone` launches the dev phone but also updates the webhooks. anyone get this to work?
You need to use a different phone number than the one you are testing.
Is this still working? I got it to work some weeks ago, but strangely, it is not working anymore - When I call my Twilio Phone Number, in the nodejs output I get the event "input_audio_buffer.speech_started", and after I finished speaking, nothing happens, and the bot does not answer me.
Hey, I am also facing the same problem, did you find anything to solve this?
Should still be working, yes. We were just building it again on our livestream today and it was working.
@@AbhishekMishra-db2tj Hey, somehow it does not properly detect when I finished speaking with my phone. When trying from a different phone, it worked. Not sure why that is the case.
How can I load my own trained models in this?
To avoid any confusion, it’s important to clearly state that even using the development phone incurs charges for both making and receiving calls (x2 charges), as some users might assume it’s free otherwise. Why not be clear?
The Twilio Dev Phone documentation page states that it is using one of your own Twilio numbers to make the call. There's no intended deception here. I used the Dev Phone in the video as an option to not use my personal phone for the demo since it's easier to see the interaction and logs. It's just an option.
Thank you for the tutorial. I built an AI phone agent/bot with this combined with function calling from OpenAI and it worked very well. Unfortunately, now I can no longer edit my phone numbers configuration -- "Voice configuration is unavailable for this phone number" -- but this isn't true because it lists my URL still and worked for days. To make things worse, the support spins and spins so I can't submit a trouble ticket.
Hi! Thanks for watching and I'm happy you built this. Sorry you're having trouble though (both with the app, and support).
If you go here: help.twilio.com/ and ask a question, see if anything there helps resolve this.
If not, there's a section at the bottom asking "Is this helpful?" and you can hit the thumbs down which will prompt you to either log in to submit a ticket or click the link next to it to submit a ticket without logging in.
Once you have a ticket number, I can try to help escalate (no promises but worth a try!).
Hello Michael,
Thank you for getting in touch with our Social Support Team. We sincerely apologize for the inconvenience caused.
Could you please dm us the email address on file?
@@TwilioDevs thank you for the reply. It's working now! I didn't do anything to change it but it resolved itself.
Awesome news! That's much easier to triage 😀 Glad it's working again!
Twilio Folks,
Is there any tutorial on using the Realtime API for outbound calls, i.e., triggering a call and taking it forward?
Is there any replacement for ngrok? I'm having issues with my terminal.
There's a full list of alternatives here: github.com/anderspitman/awesome-tunneling
How to make OpenAI speak the function_call results? Like if the appointment is created successfully, then how to let the user know that the appointment is created successfully.
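One common pattern, shown here as a minimal sketch rather than the tutorial's own code: after running your function, send its result back as a `function_call_output` conversation item and then request a new response so the model speaks the outcome. The event names follow the OpenAI Realtime API; the helper name `buildFunctionResultEvents`, the `openAiWs` socket, and the sample result are assumptions for illustration. The `call_id` is expected to come from the function call event the model emitted.

```javascript
// Hypothetical helper: builds the two events that return a tool result to the
// model and then ask it to respond (which, with audio enabled, is spoken).
function buildFunctionResultEvents(callId, result) {
  return [
    {
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: callId, // must match the call_id of the model's function call
        output: JSON.stringify(result), // the output is sent as a string
      },
    },
    // Without this, the result is stored but the model stays silent.
    { type: 'response.create' },
  ];
}
```

Usage would look something like `buildFunctionResultEvents(callId, { status: 'booked' }).forEach((e) => openAiWs.send(JSON.stringify(e)))`, assuming `openAiWs` is the already-open WebSocket to OpenAI.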
Already built this on my channel will be crazy
Such a cool API
Is there a way to connect this to a GPT assistant?
How can we get access to the Realtime API on an OpenAI account (I already have a paid account)? I integrated the code and added my OpenAI key, but the problem is that during the call it starts talking and does not listen to me (no two-way communication). Can someone help me out?
This is great! Thanks for sharing!
Glad you enjoyed it! Thanks for watching 🎉
What VS Code theme do you have?
Night Owl!
@@TwilioDevs many thanks!
So can we host this on Twilio serverless? If so, which file would we point the incoming call to? Also, it can be modified to greet the caller first, correct? I'm thinking for a business AI assistant to take calls, give information etc. I have created these AI apps with Vapi, but it gets pretty expensive. Twilio would be so much cheaper.
I think with the need for a persistent web socket connection you're probably going to be best served doing this outside of our serverless Twilio Functions. I can double check with the team though!
As for the greeting, you can definitely change the <Say> tags to customize the greeting from Twilio, or I believe you could pre-prompt OpenAI with a text prompt using the Realtime API if you want the greeting to come from the assistant.
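For the second option, here is a minimal sketch of pre-prompting the model so that it talks first. It assumes the session has already been configured and that `openAiWs` is the open WebSocket to OpenAI; the helper name and the greeting text are made up for illustration.

```javascript
// Hypothetical helper: builds a text "user" turn that instructs the model to
// greet the caller, followed by a request to generate a response right away.
function buildGreetingEvents(greetingInstruction) {
  return [
    {
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: greetingInstruction }],
      },
    },
    // Ask the model to answer immediately instead of waiting for caller audio.
    { type: 'response.create' },
  ];
}
```

You would send these once, shortly after the session update, for example `buildGreetingEvents('Greet the caller and ask how you can help.').forEach((e) => openAiWs.send(JSON.stringify(e)))`.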
@@TwilioDevs Thanks, I'll mess around with it some. Is that voice coming from AWS? I've never heard that voice, but its really good and would be terrific for most professional business applications. The latency is next to nothing, which has been the biggest hurdle it seems with these voice AI assistants. Good to see Twilio is now in the game!
@@wordpressobsessed9067 It's one of OpenAI's voices. I agree it's very natural sounding!
@@TwilioDevs Correct you'll need a persistent ws listening for a unique stream for each number/assistant you're hosting.
You can probably run it through your crm before answering to get all the phone info if any.
How can I set the input language to something other than English?
You can change the system prompt to indicate the language you want to use. It will also usually match whatever language you speak to it.
@@TwilioDevs I’ve created a Twilio program before, and using the gather method I was able to choose the language, but with the OpenAI Realtime API I tried their language parameter for whisper-1 and it doesn’t work.
And sadly the current state of auto detection is 75% flawed in my tests.
@@WaiZe0 At 03:02 we set up a system prompt. You can tell it what language you'd like for it to use in that prompt (and also tell it how to greet the caller, etc.). From my testing it has obeyed that quite well. I told it to converse only in Spanish and I wasn't able to get it to break out of that even by insisting I only knew English.
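As a rough sketch of what that looks like in code, the language instruction can be placed in the `instructions` field of the session update. The session fields below mirror the session update that the sample app logs; the helper name and the exact wording of the prompt are assumptions, and how strictly the model obeys may vary by language.

```javascript
// Hypothetical helper: builds a session.update event whose system prompt
// pins the conversation to one language.
function buildSessionUpdate(language) {
  return {
    type: 'session.update',
    session: {
      turn_detection: { type: 'server_vad' },
      input_audio_format: 'g711_ulaw',
      output_audio_format: 'g711_ulaw',
      voice: 'alloy',
      instructions:
        `You are a helpful assistant. Converse only in ${language}, ` +
        `even if the caller speaks or asks for another language.`,
      modalities: ['text', 'audio'],
      temperature: 0.8,
    },
  };
}
```

For example, `openAiWs.send(JSON.stringify(buildSessionUpdate('Spanish')))`, assuming `openAiWs` is the open WebSocket to OpenAI.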
@@TwilioDevs I noticed it works well in English and Spanish, but I'm working with Arabic and it gets it right only 1/10 times even with the clearest system prompt. Is there a way to set the language like with Twilio’s gather method?
I'm so frustrated; I'm literally at the last step. I got the Twilio and OpenAI APIs to work together, and when I call the phone number it says "please wait, speak your AI agent, brought to you by OpenAI and Twilio", then says "okay, you can speak", and then hangs up. Can anyone help? I have been using ChatGPT and Claude and they're both making me run around in circles.
The symptoms sound like an OpenAI Realtime API key issue. Seems like the call is hanging up at the point the OpenAI Realtime API should be getting connected. Are you getting any errors in the terminal?
Please refer to the blog post or GitHub repo in the video description to make sure your code is 100% correct. You can also check on your API key's access at platform.openai.com
Is there any guide on how to add function calling? Also, can't we buy an Indian number right now?
What is the reason for using fastify over express?
The websocket module for fastify is nice to work with and fastify is more performant than Express for this use case.
Confused. Instructions say "Step 2: Get your Account Sid and Auth Token from the Twilio Console to get started.", but nowhere does it say what to do with them. Also, the call connects, but it can't seem to hear me, and then it disconnects after 5 seconds. Related?
Connected to the OpenAI Realtime API
Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
Disconnected from the OpenAI Realtime API
If the call is working at all, the Twilio side of this is fine which means you're okay on the Twilio credentials front. This looks like it's not getting audio over to the OpenAI API. There are some more logging types you can enable with the code in the blog post. Can you try turning those on and see what you get in the terminal?
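A minimal sketch of that kind of logging, assuming you parse each message from the OpenAI WebSocket yourself: keep a list of event types you care about and print only those. The event names below exist in the Realtime API, but this particular selection and the helper name are assumptions; adjust them to match the blog post's code.

```javascript
// Event types worth printing while debugging a call that connects but
// never answers (a selection made for illustration).
const LOG_EVENT_TYPES = [
  'error',
  'session.created',
  'session.updated',
  'input_audio_buffer.speech_started',
  'input_audio_buffer.speech_stopped',
  'input_audio_buffer.committed',
  'response.done',
  'rate_limits.updated',
];

// Hypothetical helper: parses a raw WebSocket message and logs it if its
// type is in the list. Returns true when something was logged.
function logIfInteresting(rawMessage) {
  let event;
  try {
    event = JSON.parse(rawMessage);
  } catch (err) {
    console.error('Could not parse message from OpenAI:', err.message);
    return false;
  }
  if (LOG_EVENT_TYPES.includes(event.type)) {
    console.log(`Received event: ${event.type}`, event);
    return true;
  }
  return false;
}
```

If you see `speech_started` but never `speech_stopped` or `response.done`, the audio is arriving but the turn is not being detected as finished; an `error` event usually carries a message explaining what the API rejected.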
What if I want to use this example without a Twilio call, directly from my mic and a web page?
You'll need to stream audio from your local microphone to the OpenAI websocket.
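As a rough sketch of the sending side: each chunk of raw microphone audio is base64-encoded and wrapped in an `input_audio_buffer.append` event. Capturing the microphone (for example in the browser and relaying to your server) is not shown. The helper name is hypothetical, and the chunk is assumed to already be in the audio format configured for the session (without Twilio you would likely use PCM16 rather than `g711_ulaw`).

```javascript
// Hypothetical helper: wraps one chunk of raw audio bytes in the event the
// Realtime API expects for incoming audio.
function buildAudioAppendEvent(audioChunk) {
  return {
    type: 'input_audio_buffer.append',
    // Accepts a Buffer or Uint8Array; the API expects base64 text.
    audio: Buffer.from(audioChunk).toString('base64'),
  };
}
```

Each chunk would then be sent with something like `openAiWs.send(JSON.stringify(buildAudioAppendEvent(chunk)))`, where `openAiWs` is assumed to be the open WebSocket to OpenAI.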
Is there a way to buy Indian numbers on Twilio? If not, what is the workaround right now?
Hi Bharath,
Thank you for getting in touch with our Social Support Team. Unfortunately, Twilio does not offer the ability to purchase Indian phone numbers directly. However, there are some workarounds and considerations you can explore.
Kindly dm us for more information.
Will this work with changing the default voices accents to accents like Australian, English/UK and others?
I believe I read that OpenAI will detect the regional accent and speak the responses in that accent. I think you can add that to the instructions (SYSTEM_MESSAGE) in the app to help reinforce the goal.
This is for incoming calls, right? What about outgoing calls?
Can you make a tutorial for this on azure as well?
This is great!
Thanks for watching!
Does this work with gemini 1.5 flash??
This tutorial is specifically for the OpenAI Realtime API.
Is the speed really this fast?
Yes! The phone calls shown are not sped up or edited 😃
I can vouch for the speed. I'm just wrapping up development on a project that uses this flow along with some other options for generating assistants.
@@0xb1sh0p8 how do you know if you have access to the api, other than a server 403 error I’m not getting an exact messaging regarding the api.. do you have it available in the playground ?
@@EDashMan I don't have anything public right now. When you signup with twilio, you'll create an account. When you go to that account's dashboard and scroll down, it will show you your SID and Auth Token to access the API
@@EDashMan Hmm, did my last comment get deleted? You'll have access to the API when you sign up and create an account. At the bottom of the account page you'll see your SID and Auth Token to use.
can you do it using python?
Yes! Should we make a Python video tutorial?
For now, here's a blog post: www.twilio.com/en-us/blog/voice-ai-assistant-openai-realtime-api-python
Sorry for the delay!
ua-cam.com/video/OVguB1h-eTs/v-deo.html
This is the future. The problem is that OpenAI's voices in Spanish don't sound very good; they sound like they have an American accent. Is there a way to integrate a different voice, using ElevenLabs instead of GPT's voice, without losing the realtime benefit of Twilio and OpenAI?
If you use advanced mode, switch your system language to Spanish, open a new conversation, and tell the assistant: "can you speak to me using a Castilian Spanish accent?"
The Realtime API allows either a speech or a text response. You can send the text response to 11labs and then push the resulting audio back into Twilio afterward.
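A minimal sketch of the text-only half of that idea, under the assumption that the session is switched to text output and the streamed text events are collected before being handed to another TTS service. The helper name and the `onText` callback are hypothetical, and the call to the external TTS provider and the conversion back to Twilio's audio format are deliberately left out.

```javascript
// Session update asking the model for text-only output (no generated audio).
const textOnlySessionUpdate = {
  type: 'session.update',
  session: { modalities: ['text'] },
};

// Hypothetical helper: accumulates streamed text deltas and calls onText
// with the full text once the model signals that the text is complete.
function createTextCollector(onText) {
  let buffer = '';
  return function handleEvent(event) {
    if (event.type === 'response.text.delta') {
      buffer += event.delta;
    } else if (event.type === 'response.text.done') {
      onText(buffer);
      buffer = ''; // reset for the next response
    }
  };
}
```

Waiting for the whole response before synthesizing is the simplest approach but adds latency; passing text to the TTS service sentence by sentence would likely feel more responsive.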
Have you tried the options provided by the other commenters yet? Would love to help you find success.
@@mandrews817 But is advanced mode available in the API, or are you talking about the voice assistant that OpenAI is currently launching? If it's the former, could you please tell me where I can read more about it? I have never heard of an advanced mode in the API's speech to text.
@@boytenesee3494 I will try this. Maybe it will delay the responses a little bit, but I think it wouldn't be very noticeable. I will give it a try, thank you for the idea.
Thanks 🎉
Anyone else getting 403 errors?
From the video description:
"OpenAI is rolling out Realtime API access incrementally. Please watch their site for updates."
This is likely due to this.
Darn, thanks for the video and response though! @TwilioDevs
@@johns332 Thank you for watching 😃 Let us know when you get access. Happy building!
My next project ❤
Let us know how it goes!
Would you prefer to see this tutorial in Python? Check it out here: ua-cam.com/video/OVguB1h-eTs/v-deo.html
Using tools and azure realtime endpoint
Somewhat helpful, but why would you want this?
Probably lots of use cases. This example is very basic but imagine an assistant that replaces the typical phone tree at a company with something that speaks naturally to them, can answer some questions they may have, and ultimately can redirect the call to an actual human if it detects it needs to.
There is an issue with the quality of the answers, especially when dealing with local dialects. While it can somewhat handle English (not well), it struggles significantly with dialects like Darija or other regional languages. The difference in transcription accuracy between the current implementation and the OpenAI Playground is very noticeable.
Node.js 18+
Correct, version 18 or higher. Not sure why I said 18+ like it was an age or something 🤣
OAI dashboard billing limits says I do have access "Realtime
gpt-4o-realtime-preview 20,000 TPM 5,000 RPM
gpt-4o-realtime-preview-2024-10-01 20,000 TPM 5,000 RPM"
But I can only hear the clunky Twilio TTS at the beginning of the call and do not get connected. Also, a DTMF button press seems to end the session:
"Server is listening on port 5050
Client connected
Received non-media event: connected
Incoming stream has started MZcbf17dca62564c8a46602ce815cd43bd
Connected to the OpenAI Realtime API
Sending session update: {"type":"session.update","session":{"turn_detection":{"type":"server_vad"},"input_audio_format":"g711_ulaw","output_audio_format":"g711_ulaw","voice":"alloy","instructions":"You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested about and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling - subtly. Always stay positive, but work in a joke when appropriate.","modalities":["text","audio"],"temperature":0.8}}
Received non-media event: dtmf
Disconnected from the OpenAI Realtime API
Received non-media event: stop
Client disconnected."