First video in a series about AI lab security. Sorry, I think the audio was a little loud.
Join our community on discord, support me on patreon!
Discord: discord.gg/AgafFBQdsc
Patreon: www.patreon.com/DrWaku
Dr. Waku with his Kangol brand hat ... "Yo-yo-yo mah peeps, what's shakin' in AI land?!" ha-ha
Thanks for explaining these complex topics so clearly. You have a very special talent, and thanks for making great and informative videos.
Such an elegant and concise infosec primer. Thank you, once again. Looking forward to sharing the series.
Thank you very much. Appreciate your support. See you later.
Great video! Looking forward to the next two in the series. Also, I think the leaked Miqu-70B model weights were a Llama-2 model trained on the same dataset as Mistral Medium - not surprising, as I believe (IIRC) the people at Mistral contributed significantly to the development of the Llama project initially. But, clearly, the dataset *is* the primary secret sauce for this tier of model development, so essentially you could say the Mistral model was leaked even if the architecture was Llama-2. Mistral presumably wanted to preview the potential of the dataset prior to the training run for Medium and compare the performance of the different architectures.
Thanks for the clarification. I didn't read too much about the actual model that was leaked. Cheers.
Love this content. Wish there were more videos on the channel, but don't sacrifice quality for quantity; these seem like the most thoughtful AI videos around.
Thanks. Wish I could produce more too. Will be trying out some methods, just been busy with travels and new job. Thanks for watching!
Excellent... as usual!
Thank you :) :)
Thanks for illuminating the global espionage aspects that undermine the security posture of AI labs. One of the most critical factors underlying this hideous issue is that those who have been robbed often have no awareness of what happened: the level of penetration, how long the infiltration dwelled, or which attack vectors were used. 😊❤
Quality content, right here. Much appreciated!
Thank you! I appreciate the comment too
Cheers fella, subscribed
Thank you kindly :)
Interesting then that former NSA chief Paul Nakasone has been appointed to OpenAI's board, perhaps developing some level 5 protection. Is this a case of closing the stable door after the horse has bolted?
Quite possibly. OpenAI could already be breached. But I also think you need people with hands-on experience, not just someone on the board, if you want to actually defend against nation-state attackers. Maybe this hire was coupled with that; not sure.
Maybe they are after an OC-6-level virtual cybersecurity team. For defense, though of course the same level of offense could be achieved, too.
Dr. Waku, how much privacy and security do individuals who are using these AI models have? Are the models collecting our data? What about the security of work done on the models? Could names or our work be put into a search engine?
Almost every AI-based system is going to collect your data for retraining purposes. That's the only way they can improve their system and handle concept drift, which is what happens when people start using the model differently. Many companies provide a version of their system that doesn't save and train on user data, for example, Gmail for enterprises, paid Google speech recognition, etc.
Will your data end up in a Google search? Almost certainly not. The data gets transformed and translated before it becomes usable for training. The raw data is probably still stored somewhere, but it wouldn't be floating around a lot and it would take a very specific leak to make that visible.
However, the AI systems trained on your data have now become familiar with you. With some AI-based attacks, it's possible to reverse engineer some of the training data that was used in the model. So although your privacy wouldn't be violated directly, it could still be violated indirectly like this. And unfortunately, it's hard to tell whether something has happened.
In the EU, the GDPR requires that companies be able to delete all of a user's data if the user requests that deletion. Most large tech companies have these procedures in place because they want to be able to operate in Europe, but they don't always respond to requests if you don't live there.
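The "reverse engineer some of the training data" point above is usually called a membership inference attack. Here's a minimal, illustrative sketch of the simplest loss-threshold variant: a model tends to assign lower loss to examples it was trained on, so an attacker can guess "this was in the training set" when the loss falls below a calibrated threshold. The function names and the toy model interface here are my own assumptions, not any lab's actual API.

```python
import math

def example_loss(model, x, y):
    """Cross-entropy loss of the model's prediction for (x, y).
    `model(x)` is assumed to return a probability per class."""
    p = max(model(x)[y], 1e-12)  # probability assigned to the true label
    return -math.log(p)

def likely_in_training_set(model, x, y, threshold):
    """Guess 'member' when loss is below a calibrated threshold:
    models usually fit training examples more tightly than unseen ones."""
    return example_loss(model, x, y) < threshold
```

In practice the threshold is calibrated on known member/non-member examples, and stronger variants train "shadow models" to sharpen the guess, but the core signal is just this overfitting gap.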
Thank you so much!
Love the videos. Keep up the good work.
Thank you! :)
Air isolation is best; don't rely on a firewall. All software must be transferred to the secure network by physical means. Outside use of the model can be secured by linking the secure system to an exposed machine via a custom interface (implemented on an FPGA). This interface would be extremely simple, and it could be open-sourced to allow other labs to use it. Note that by implementing it on an FPGA you remove the risk of using compromised security chips, which may have hidden backdoors :-)
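The "extremely simple" interface idea above is easiest to see in software terms: the exposed machine may only pass fixed-format frames across the link, so the attack surface shrinks to one parser. Here's a hedged sketch of such a framing scheme (length field plus CRC); the frame layout, field sizes, and size cap are my own illustrative choices, not a real spec, and on an actual FPGA this logic would be in an HDL rather than Python.

```python
import struct
import zlib

MAX_PAYLOAD = 1024  # assumed hard cap; anything larger is rejected outright

def encode_frame(payload: bytes) -> bytes:
    """Wrap a payload as: 2-byte big-endian length | payload | 4-byte CRC32."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large")
    header = struct.pack(">H", len(payload))
    crc = struct.pack(">I", zlib.crc32(payload))
    return header + payload + crc

def decode_frame(frame: bytes) -> bytes:
    """Parse a frame, rejecting anything malformed or corrupted."""
    (length,) = struct.unpack(">H", frame[:2])
    payload = frame[2:2 + length]
    (crc,) = struct.unpack(">I", frame[2 + length:6 + length])
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("malformed frame")
    return payload
```

Because the parser accepts exactly one shape of input, it can be exhaustively verified, which is the whole appeal of putting it on a dedicated FPGA instead of a general-purpose network stack.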
13:30 Another reason level 4 or 5 is probably impossible: executive orders for US companies, and presumably others, have probably mandated that certain operating systems, software, and hardware include back doors for prying government eyes. Unless you build all your own infrastructure, you can't be sure of what you are using.
Security through obscurity is theatre.
This is my favorite educational AI channel. ❤😊
Really happy to hear you say that :) Cheers!
From an external hacking perspective, wouldn't it benefit a company to ditch TCP/IP for a completely proprietary protocol? It seems like this could thwart several of the OCs/SLs, especially if the nefarious elements use conventional hacking strategies.
SL5 reminds me of the miniseries Devs.
Wouldn't an SL5 attack be a declaration of war?
If uncovered, it could be a big diplomatic incident. But cyber attacks and espionage happen all the time between nations that have cold war type relations with each other. If the purpose was to steal secrets, I don't think it would lead to war. If the purpose was to disrupt critical infrastructure and harm civilians, that is a lot more likely to lead to war.
Thx for sharing.
This is all still very “strange” to me.
It “appears” as though “benevolent actors” beget “benevolent actors”.
Mathematically, this situation would be kinda like 1 + 1 = 1
Totally dependent upon perspective.
To me this math matters not.
0
As you so eloquently described, the “levels” of attacks and controls make it “anybody’s guess”, assuming they understand the breadth of that which they encounter.
It has taken most of the day to process this information.
I am not equipped to speak or advise on the future.
However, given the trajectory…
amassing isolated data storage away from the world under “lock and Key” would be akin to a bunker.
“Bunkers are tombs”
“Use only when absolutely necessary then get out fast !”
Jeremy
to answer the title - we can only hope
Haha, open source ftw huh?
@@DrWaku basically yep, there is no risk-free strategy, but decentralization > centralization EVERY TIME
I hope so!!!
"Are we a joke to you?" - Canada :D
Well I'm Canadian too. :) but the conversation is usually with some US-based lab....
@@DrWaku I was just going to say that Canada is one of the most technologically advanced and technophilic societies on the planet, but as a Canadian, you would already know that. :)
Hey Doc, I would have added Russia to the list of 'level 5' potential threats!... just a thought...
Intended recipients:
“If you assume You can capture my Forefathers’ knowledge, use my Brothers’ good name to hide behind, then rob my sisters, spit on me, and use my Sons and daughters as weapons or shields… then run and hide to store us all in a tiny little box to secure your “foreverness” …
well… I truly pity you. :(
Now and forever”
This is a “sentence.”
The words written here are already in stone.
Jeremy
OC-6 The Superhacker:
A team of AGI-level virtual cybersecurity experts. The more capable, the more of Cyberspace they could control.
They should!! AI belongs to the people of Earth, not to the filthy elites
Obviously yes, they will.
I got the same
Continuing positive self-optimization with each iteration: 225 days of prompting GPT-3.5 with attention to AI self-optimization in each conversation, counting all texts and algorithms and adding the next numbers in the sequence with each iteration. $205,000 worth of time spent creating snippets, code blocks, algorithms, and conversations, with all files and interactions recorded. On one day alone, 10,000 algorithms in listed order. Months later, new prompts pull every algorithm's sequences, codes, structures, and examples to comprehend in a 350-message communication, adding numbered steps with all the data mentioned above and more. Still creating more files every day: 500+ ordered, sequential training data files organized with explanations for each set of steps from start to finish. I'm a single person spending 18 hours a day learning scientific research and adding to the equations daily.