In-Context Learning: A Case Study of Simple Function Classes
- Published 17 Aug 2023
- Gregory Valiant (Stanford University)
simons.berkeley.edu/talks/gre...
Large Language Models and Transformers
In-context learning refers to the ability of a model to learn new tasks from a sequence of input-output pairs given in a prompt. Crucially, this learning happens at inference time without any parameter updates to the model. I will discuss our empirical efforts that shed light on some basic aspects of in-context learning: To what extent can Transformers, or other models such as LSTMs, be efficiently trained to in-context learn fundamental function classes, such as linear models, sparse linear models, and small decision trees? How can one evaluate in-context learning algorithms? And what are the qualitative differences between these architectures with respect to their ability to be trained to perform in-context learning? I will also discuss subsequent work by other researchers that illuminates connections between language modeling and learning: must a good language model be able to perform in-context learning? Do large language models know how to perform regression? And are such primitives useful for language-centric tasks? This talk will be mostly based on joint work with Shivam Garg, Dimitris Tsipras, and Percy Liang.
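The following is a minimal sketch (not the authors' code) of the linear-regression setup the abstract describes: draw a fresh weight vector, build a prompt of (x, w·x) pairs, and compare an in-context prediction on a query point against the ordinary-least-squares baseline fit only on those prompt examples. The trained sequence model itself is assumed and not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_prompt = 20, 40                      # input dimension, number of in-context examples

w = rng.normal(size=d)                    # the task: a fresh linear function f(x) = w.x
xs = rng.normal(size=(n_prompt, d))       # prompt inputs
ys = xs @ w                               # prompt labels
x_query = rng.normal(size=d)              # the point the model must label in context

# OLS baseline: fit a linear model on the prompt examples only.
w_ols, *_ = np.linalg.lstsq(xs, ys, rcond=None)
y_ols = w_ols @ x_query

# A Transformer or LSTM trained on such prompts would consume the interleaved
# sequence (x_1, y_1, ..., x_n, y_n, x_query) and emit a prediction for y_query,
# with no weight updates; in-context learning is measured by how close that
# prediction is to the true value w.x_query (and to the OLS baseline).
print("true value   :", w @ x_query)
print("OLS baseline :", y_ols)
```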
49:11 I think this is the most striking part of this talk: the LSTM doesn't show numerical instability, meaning it never learns to "find the inverse matrix" the way OLS does, but the Transformer does learn it... Attention is all you need!
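One plausible reading of the "instability" in this comment, illustrated below with a small numpy sketch (my assumption, not from the talk): an OLS-style solve effectively inverts the matrix of prompt inputs, and that matrix is worst-conditioned when the number of in-context examples is close to the input dimension, so a model that has learned the inversion inherits that sensitivity while one that never learns it would not.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20                                     # input dimension
for n in (10, 15, 20, 25, 40, 80):         # number of in-context examples
    # median condition number of the n-by-d prompt matrix over random draws;
    # it spikes when n is close to d, where the least-squares system is nearly singular
    conds = [np.linalg.cond(rng.normal(size=(n, d))) for _ in range(200)]
    print(f"n={n:3d}  median cond(X) = {np.median(conds):8.1f}")
```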
In my tests, ChatGPT, Bing Chat, and Bard cannot be made to output "4 - 1 = 5", each for different reasons. Does this mean they cannot perform in-context learning, or that context cannot override the weights?
What's the Abraham Lincoln joke?
Abraham Lincoln said that "if you call a 'tail' a 'leg,' a dog still has four legs." In context, the point was that even if you call slavery by a different name, it's still slavery.