SatisfIA - Updates From Our Project On Aspiration Based Agent Designs - Jobst Heitzig

  • Published 27 Sep 2024
  • Session presented during the Virtual AI Safety Unconference 2024
    Speaker: Jobst Heitzig
    Session Description: SatisfIA is an ongoing project (pik-gane.githu...) with AI Safety Camp, SPAR, and interns at my lab (forum.effectiv...).
    We develop non-maximizing, aspiration-based designs for AI agents to avoid risks from maximizing misspecified reward functions. This work relates to decision theory, inner and outer alignment, agent foundations, and impact regularization.
    We mostly operate in a theoretical framework that assumes the agent will be given temporary goals specified via constraints on world states (rather than via reward functions), will use a probabilistic world model to assess the consequences of possible plans, will evaluate candidate plans against various generic safety criteria (e.g., information-theoretic impact metrics), and will use a hard-coded, non-optimizing decision algorithm to choose among the plans that achieve the goal (a rough, hypothetical sketch of such a decision step appears below).
    Our project focuses on the design of such algorithms, the curation of safety criteria, and testing in simple environments (e.g., AI safety gridworlds).
    This session reports on the project's goals, methods, outputs, and plans, and invites questions, criticisms, and suggestions. We are also still looking for collaborators!
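    A minimal, hypothetical sketch (in Python) of a non-maximizing decision step in this spirit is shown below. It is not the project's actual algorithm; the interfaces world_model.probability_goal_satisfied and safety_score are assumptions introduced only for illustration.

        import random

        def choose_plan(candidate_plans, world_model, goal, safety_score,
                        min_success_prob=0.95, min_safety=0.0):
            """Pick some admissible plan instead of the 'best' one.

            A plan counts as admissible if the (assumed) world model predicts
            it satisfies the goal constraints with high enough probability and
            its aggregated safety score clears a threshold.  Among admissible
            plans we choose uniformly at random (or by any other non-optimizing
            rule), so no quantity is being maximized.
            """
            admissible = [
                plan for plan in candidate_plans
                if world_model.probability_goal_satisfied(plan, goal) >= min_success_prob
                and safety_score(plan) >= min_safety
            ]
            if not admissible:
                return None  # no plan meets the aspiration; the goal must be revised
            return random.choice(admissible)

    The random choice in the last line is the point: any plan that satisfies the aspiration (goal constraints plus safety thresholds) is acceptable, so nothing is optimized.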
