Trading Inference-Time Compute for Adversarial Robustness

Published 10 Feb 2025
Robustness to adversarial attacks has been one of the thorns in AI's side for more than a decade. In 2014, researchers showed that imperceptible perturbations (subtle alterations undetectable to the human eye) can cause models to misclassify images, illustrating one example of a model's vulnerability to adversarial attacks. Addressing this weakness has become more urgent as models are used in high-stakes applications and act as agents that can browse the web and take actions on behalf of their users.
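
As a concrete illustration of this kind of attack (not taken from the paper), the sketch below shows the well-known fast gradient sign method in PyTorch: the input image is nudged by a small step in the direction of the sign of the loss gradient, which can be enough to flip a classifier's prediction while remaining visually imperceptible. The image and label here are placeholders for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal FGSM sketch: perturb an image by a small step in the direction
# of the sign of the loss gradient with respect to the input.
def fgsm_attack(model, image, label, epsilon=0.01):
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # One signed-gradient step; epsilon controls how visible the change is.
    # A real attack would also clamp the result back to the valid pixel range.
    return (image + epsilon * image.grad.sign()).detach()

# Example usage with a pretrained classifier (weights download on first run).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224)   # placeholder image tensor
label = torch.tensor([207])          # placeholder class index
adv = fgsm_attack(model, image, label)
print(model(adv).argmax(dim=1))      # prediction may no longer match `label`
```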
Despite years of intense research, the problem of defending against adversarial attacks is far from solved. Nicholas Carlini, an expert in the field, recently said that "in adversarial machine learning, we wrote over 9,000 papers in ten years and got nowhere." One reason is that, unlike other areas of AI progress, simply increasing model size has not been sufficient to make models robust to adversarial attacks.
In a new paper, we present preliminary evidence that increasing inference-time compute (giving reasoning models more time and resources to "think") can improve robustness to multiple types of attacks. This approach uses reasoning models such as o1-preview and o1-mini, which can adapt their computation during inference. We also explore new attacks designed specifically for reasoning models like o1, as well as settings where inference-time compute does not improve robustness, and we speculate on the reasons for this and on ways to address it.
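
The paper itself describes how attack success is measured as inference-time compute varies; purely as a rough illustration of that kind of evaluation, the sketch below sweeps a compute budget and records the fraction of adversarial prompts that succeed. The helpers `query_model` and `is_attack_successful` are hypothetical placeholders, not real APIs, standing in for whatever model interface and grader are available.

```python
# Hypothetical sketch of measuring robustness versus inference-time compute.
# `query_model(prompt, budget)` would call a reasoning model with a given
# compute budget; `is_attack_successful(prompt, response)` would judge
# whether the adversarial goal was achieved. Both are placeholders.
from typing import Callable

def attack_success_rate(
    adversarial_prompts: list[str],
    query_model: Callable[[str, int], str],
    is_attack_successful: Callable[[str, str], bool],
    compute_budget: int,
) -> float:
    """Fraction of adversarial prompts that succeed at a given compute budget."""
    successes = 0
    for prompt in adversarial_prompts:
        response = query_model(prompt, compute_budget)
        if is_attack_successful(prompt, response):
            successes += 1
    return successes / len(adversarial_prompts)

def sweep(prompts, query_model, is_attack_successful, budgets=(1, 4, 16, 64)):
    # If more inference-time compute helps, the success rate should trend
    # downward as the budget grows.
    return {b: attack_success_rate(prompts, query_model, is_attack_successful, b)
            for b in budgets}
```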
    openai.com/ind...
