Hi community, it is only natural that when we encounter a new method, we search for the nearest "known method" or "explanation" for it. Because why should we learn something new when we have already learned something simpler? The same happens here: "Is this not simply model distillation?"
Well, while both methods aim to replicate the behavior of an existing (Large Language) model, our old "model distillation" focuses on compressing a known model into a smaller one using supervised learning techniques, leveraging full access to the teacher model's outputs and maybe even additional info.
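To make that contrast concrete, here is a minimal sketch of the classic teacher-student setup, assuming full white-box access to the teacher's logits (a Hinton-style softened-KL objective). Every name in it is illustrative; nothing here is taken from the paper:

```python
# Minimal sketch of classic teacher-student distillation (white-box access
# to the teacher's logits). All names are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Match the student to the teacher via KL divergence on softened logits."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1).detach()
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 positions over a 10-token vocabulary.
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)
distillation_loss(student_logits, teacher_logits).backward()
```

Note the key point: this setup presumes you can read the teacher's raw logits, which is exactly what the black-box setting denies you.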
In contrast, the method I've presented in this video is about reverse-engineering an unknown model's distribution through interactive queries, employing advanced mathematical techniques to construct a compact approximation without direct access to the model's structure, parameters, training data or internally learned sequences.
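And for the black-box side, a toy sketch under strong assumptions: query_model, the probe prefixes, VOCAB, and RANK are all invented stand-ins, and a plain truncated SVD replaces the paper's much more careful machinery (as I understand the pre-print, barycentric spanners over conditional distributions). It only illustrates the shape of the idea, namely that query access to a genuinely low-rank model can pin down a compact surrogate:

```python
# Toy sketch of the black-box idea: we only get conditional next-token
# distributions from an opaque query_model() (hypothetical stand-in),
# stack them into a matrix, and recover a compact low-rank surrogate.
# The paper's actual machinery is far more careful; a plain truncated
# SVD is used here for illustration only.
import numpy as np

VOCAB = 50   # assumed vocabulary size (made up)
RANK = 8     # assumed low rank of the conditional distributions (made up)

# Hidden low-rank structure for the stand-in black box: every conditional
# distribution is a mixture of a few fixed "topic" distributions.
_rng = np.random.default_rng(0)
_TOPICS = _rng.dirichlet(np.ones(VOCAB), size=RANK)          # (RANK, VOCAB)

def query_model(prefix: str) -> np.ndarray:
    """Stand-in for the unknown production model: returns the next-token
    distribution for a given prefix."""
    rng = np.random.default_rng(abs(hash(prefix)) % (2 ** 32))
    weights = rng.dirichlet(np.ones(RANK))
    return weights @ _TOPICS

# 1. Probe the model with conditional queries on a set of prefixes.
prefixes = [f"probe-{i}" for i in range(200)]
M = np.stack([query_model(p) for p in prefixes])             # (200, VOCAB)

# 2. Truncated SVD: if the model is genuinely low-rank, a few components
#    reconstruct the whole matrix of conditional distributions.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = (U[:, :RANK] * S[:RANK]) @ Vt[:RANK, :]

# 3. The residual shows how well the rank-RANK surrogate captures the
#    queried slice of the model's behavior (here: essentially zero).
print("relative error:", np.linalg.norm(M - M_hat) / np.linalg.norm(M))
```

If the conditional distributions really live in a low-dimensional space, the relative error printed at the end is essentially zero, which is why the low-rank assumption carries so much weight in the proof.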
I know it is more complex, and if you think "it is about the same" ... that is absolutely okay with me. But please don't post as if you were a professor of mathematics with complete understanding and declare "it is absolutely the same" ... you might misinform others.
However, if you are asking for a comparison or further clarification, why not read the arXiv pre-print from MIT for a more detailed understanding? You might discover new ideas ...
I'm just going to put this out there. This is exactly how we develop a Theory of Mind. Replace "steal" with "understand" and "LLM" with "person" and you get something intimately familiar to any human being.
When you go really hard on predicting the next token, you are building something that creates a near-perfect model of the representations of that token.
I'm very curious about which prompts you use to simplify the papers and maintain the formulas.
Oh no, our collective "private" data that models trained on without permission might be revealed... and that would be evidence of an obvious corporate crime. 👩⚖
Is this different from teacher-student "distillation"?
Great question. I pinned a reply to your question to the top of the comments, since multiple subscribers asked the same question.
Why not try asking o1 or Claude 3.5 about that last theorem? Given you can ground the discussion with the paper, you may have that second brain within your reach. It’s interesting that they cast it in terms of model stealing when it seems this could work as distillation in general? (Perhaps that was the original goal, and this black-box case fell out as an interesting idea?)
Smile. No chance with current AI. Maybe in 2 to 5 years?
Isn't it the same as knowledge distillation and model pruning?
Great question. I pinned a reply to your question to the top of the comments, since multiple subscribers asked the same question.
This sounds like a distillation technique... It could be used for stripping out isolated or dormant areas of a model... but meh.
Great question. I pinned a reply to your question to the top of the comments, since multiple subscribers asked the same question.
Great job. Unfortunately, there is no experimental proof. I believe that you can mimic the LLM on a specific task, but stealing all of its abilities will be very difficult.
If there is a mathematical proof that is valid (and that I can understand), I need no experimental proof. It is not about a single specific task; it is about the complete model, since we test the complete mathematical space of all representative sequences.
@@code4AI Just because it is theoretically possible doesn't mean modern LLMs are sufficiently low-rank for this "stealing" to be practical. Theoretically the large integers used in public-key crypto can be factored, but that is not a practical means of attack. Experimental results would provide a calibration of expectations. It's possible that while theoretically possible, it could cost more to "steal" a model than to train a new one from scratch.
@egoincarnate Do you have any facts that would support your statement "it would cost more to "steal" a model than to train a new one from scratch"? Since this idea was just published yesterday, you can't have empirical data.
@@code4AI Sorry, that should have been "could", not "would". Corrected.
@@code4AI I see, thank you. I have not read the paper; I just scrolled through to see the end. The story was quite exciting and ended up like an open question.