ICL-Interpretation-Analysis-Resources

Work in progress

Overview

This repo contains relevant resources from our survey paper The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis, published at EMNLP 2024. In this paper, we present a thorough and organized survey of research on the interpretation and analysis of in-context learning (ICL). As research in this area evolves, we will provide timely updates to the survey and this repository.

Theoretical Interpretation of ICL

Researchers in the theoretical category focus on interpreting the fundamental mechanism behind the ICL process through different conceptual lenses.

Mechanistic Interpretability

  • A mathematical framework for transformer circuits (Elhage et al., 2021). [Paper]
  • Attention Is All You Need (Vaswani et al., 2017). [Paper]
  • In-context Learning and Induction Heads (Olsson et al., 2022). [Paper]
  • The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains (Edelman et al., 2024). [Paper]
  • Schema-learning and rebinding as mechanisms of in-context learning and emergence (Swaminathan et al., 2023). [Paper]
  • Function Vectors in Large Language Models (Todd et al., 2024). [Paper]
  • Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection (Bai et al., 2023). [Paper]
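
Olsson et al. (2022) identify induction heads by their prefix-matching behaviour: when the current token occurred earlier in the context, the head attends to the token that immediately followed that earlier occurrence and copies it. The sketch below computes a toy prefix-matching score on a hand-built attention matrix; the scoring function and the synthetic head are our own simplification for illustration, not the exact metric used in the paper.

```python
import numpy as np

def prefix_matching_score(tokens, attn):
    """Average attention mass a head places on the token that followed an
    earlier occurrence of the current token (the prefix-matching behaviour
    used to identify induction heads in Olsson et al., 2022)."""
    T = len(tokens)
    scores = []
    for t in range(1, T):
        # positions j + 1 with tokens[j] == tokens[t], strictly-earlier matches only
        targets = [j + 1 for j in range(t - 1) if tokens[j] == tokens[t]]
        if targets:
            scores.append(attn[t, targets].sum())
    return float(np.mean(scores)) if scores else 0.0

# Toy repeated sequence A B C A B C A B, plus a hand-built "induction-like"
# head that attends to the position right after the previous occurrence of
# the current token (causal and row-normalised).
tokens = [0, 1, 2, 0, 1, 2, 0, 1]
T = len(tokens)
attn = np.full((T, T), 1e-6)
for t in range(T):
    prev = [j for j in range(t) if tokens[j] == tokens[t]]
    if prev:
        attn[t, prev[-1] + 1] = 1.0
attn = np.tril(attn)
attn /= attn.sum(axis=1, keepdims=True)

print(prefix_matching_score(tokens, attn))  # close to 1.0 for this head
```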

Regression Function Learning

  • What can transformers learn in-context? a case study of simple function classes (Garg et al., 2022). [Paper]
  • Transformers as Algorithms: Generalization and Stability in In-context Learning (Li et al., 2023). [Paper]
  • The Closeness of In-Context Learning and Weight Shifting for Softmax Regression (Li et al., 2023). [Paper]
  • What learning algorithm is in-context learning? Investigations with linear models (Akyürek et al., 2023). [Paper]
  • How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations (Guo et al., 2024). [Paper]
  • In-context Learning and Induction Heads (Olsson et al., 2022). [Paper]
  • Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection (Bai et al., 2023). [Paper]
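
Several of the entries above (Garg et al., 2022; Akyürek et al., 2023; Bai et al., 2023) study ICL on synthetic regression: a prompt contains pairs (x_i, w·x_i) from a freshly drawn w, and the trained transformer's prediction at a query point is compared with standard estimators fit on the same demonstrations. The sketch below shows only that evaluation protocol, with ordinary least squares standing in for the transformer; the dimensions and sample counts are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 16        # input dimension, number of in-context demonstrations

def sample_prompt():
    """One synthetic task: w ~ N(0, I), demonstrations (x_i, w.x_i), query x_q."""
    w = rng.normal(size=d)
    X = rng.normal(size=(k, d))
    x_q = rng.normal(size=d)
    return X, X @ w, x_q, x_q @ w

# Reference in-context "algorithm": least squares fit on the demonstrations
# only, evaluated at the query point. Papers in this line compare a trained
# transformer's query prediction against baselines of exactly this form,
# as a function of the number of demonstrations k.
errors = []
for _ in range(1000):
    X, y, x_q, y_true = sample_prompt()
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    errors.append((x_q @ w_hat - y_true) ** 2)

# Noiseless and k > d, so the error is ~0 here; with k < d or label noise
# the error curve over k becomes the quantity of interest.
print("least-squares baseline MSE:", np.mean(errors))
```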

Gradient Descent and Meta-Optimization

  • Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers (Dai et al., 2023). [Paper]
  • The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention (Irie et al., 2022). [Paper]
  • Transformers learn in-context by gradient descent (von Oswald et al., 2023). [Paper]
  • Uncovering mesa-optimization algorithms in Transformers (von Oswald et al., 2023). [Paper]
  • In-context Learning and Gradient Descent Revisited (Deutch et al., 2024). [Paper]
  • Do pretrained Transformers Learn In-Context by Gradient Descent? (Shen et al., 2024). [Paper]
  • Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models (Fu et al., 2024). [Paper]
  • Numerical Analysis (Gautschi, 2011). [Book]
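
A central construction in this line (e.g. von Oswald et al., 2023) is that a linear self-attention update can implement one step of gradient descent on an in-context least-squares objective. The snippet below is a small numerical check of that equivalence in its simplest form (zero initialisation, unnormalised linear attention); it illustrates the idea rather than reproducing the papers' full constructions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, lr = 4, 8, 0.1

X = rng.normal(size=(k, d))     # demonstration inputs x_i
y = rng.normal(size=k)          # demonstration targets y_i
x_q = rng.normal(size=d)        # query input

# (1) One step of gradient descent on L(w) = 0.5 * sum_i (w @ x_i - y_i)^2
#     from w0 = 0, then predict at the query point.
w0 = np.zeros(d)
grad = X.T @ (X @ w0 - y)
w1 = w0 - lr * grad
pred_gd = w1 @ x_q

# (2) An unnormalised linear-attention readout with keys x_i, values y_i,
#     query x_q, scaled by the same learning rate.
pred_attn = lr * sum(y_i * (x_i @ x_q) for x_i, y_i in zip(X, y))

print(pred_gd, pred_attn)       # identical up to floating-point error
```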

Bayesian Inference

  • An Explanation of In-context Learning as Implicit Bayesian Inference. (Xie et al., 2022). [Paper]
  • Statistical Inference for Probabilistic Functions of Finite State Markov Chains (Baum and Petrie, 1966). [Paper]
  • Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning (Wang et al., 2023). [Paper]
  • The Learnability of In-Context Learning (Wies et al., 2023). [Paper]
  • A Latent Space Theory for Emergent Abilities in Large Language Models (Jiang, 2023). [Paper]
  • What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization (Zhang et al., 2023). [Paper]
  • Bayesian Model Selection and Model Averaging (Wasserman, 2000). [Journal]
  • In-Context Learning through the Bayesian Prism (Panwar et al., 2024). [Paper]
  • What can transformers learn in-context? a case study of simple function classes (Garg et al., 2022). [Paper]
  • What learning algorithm is in-context learning? Investigations with linear models (Akyürek et al., 2023). [Paper]
  • An Information-Theoretic Analysis of In-Context Learning (Jeon et al., 2024). [Paper]
  • In-Context Learning Dynamics with Random Binary Sequences (Bigelow et al., 2024). [Paper]
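
The Bayesian reading of ICL (e.g. Xie et al., 2022; Zhang et al., 2023) treats the demonstrations as evidence about a latent concept and the prediction as Bayesian model averaging under the resulting posterior. The toy example below makes that picture concrete with three hand-picked Bernoulli "concepts"; the concept set, prior, and demonstrations are invented purely for illustration.

```python
import numpy as np

concepts = np.array([0.2, 0.5, 0.8])      # candidate latent concepts (Bernoulli biases)
prior = np.array([1 / 3, 1 / 3, 1 / 3])   # p(concept)
demos = [1, 1, 0, 1, 1, 1]                # in-context observations

# Posterior p(concept | demos) via Bayes' rule.
likelihood = np.array([
    np.prod([theta if x == 1 else 1 - theta for x in demos])
    for theta in concepts
])
posterior = prior * likelihood
posterior /= posterior.sum()

# Prediction by Bayesian model averaging: p(next = 1 | demos).
p_next = float(posterior @ concepts)

print("posterior over concepts:", posterior.round(3))
print("p(next = 1 | demos):    ", round(p_next, 3))
```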

Empirical Analysis of ICL

Researchers in the empirical category focus on probing the factors that influence ICL.

Pre-training Data

  • On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model (Shin et al., 2022). [Paper]
  • What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers (Kim et al., 2021). [Paper]
  • Understanding In-Context Learning via Supportive Pretraining Data (Han et al., 2023). [Paper]
  • Mauve: Measuring the gap between neural text and human text using divergence frontiers (Pillutla et al., 2023). [Paper]
  • Impact of Pretraining Term Frequencies on Few-Shot Reasoning (Razeghi et al., 2023). [Paper]
  • Large Language Models Struggle to Learn Long-Tail Knowledge (Kandpal et al., 2023). [Paper]
  • Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression (Raventós et al., 2023). [Paper]
  • Data Distributional Properties Drive Emergent In-Context Learning in Transformers (Chan et al., 2022). [Paper]
  • The Principle of Least Effort (Zipf, 1949). [Book]
  • Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (Yadlowsky et al., 2023). [Paper]
  • What can transformers learn in-context? a case study of simple function classes (Garg et al., 2022). [Paper]
  • In-Context Learning Creates Task Vectors (Hendel et al., 2023). [Paper]
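
Chan et al. attribute the emergence of ICL to distributional properties of the training data, in particular Zipfian (skewed) class frequencies and "bursty" sequences in which a sampled class recurs within a single context window. The sketch below generates data with roughly those properties; the parameters and the recipe are illustrative guesses, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, seq_len, zipf_exponent = 1000, 16, 1.0

# Zipfian marginal over classes: p(class with rank r) proportional to r^(-exponent).
ranks = np.arange(1, n_classes + 1)
class_probs = ranks ** (-zipf_exponent)
class_probs /= class_probs.sum()

def bursty_sequence(burst_size=4):
    """One training context: a sampled 'bursty' class repeated burst_size
    times, mixed with filler classes drawn from the Zipfian marginal."""
    bursty_class = rng.choice(n_classes, p=class_probs)
    filler = rng.choice(n_classes, size=seq_len - burst_size, p=class_probs)
    seq = np.concatenate([np.full(burst_size, bursty_class), filler])
    rng.shuffle(seq)
    return seq

print(bursty_sequence())
```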

Pre-training Model

  • Emergent Abilities of Large Language Models (Wei et al., 2022). [Paper]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022). [Paper]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023). [Paper]
  • UL2: Unifying Language Learning Paradigms (Tay et al., 2022). [Paper]
  • General-Purpose In-Context Learning by Meta-Learning Transformers (Kirsch et al., 2022). [Paper]
  • In-Context Language Learning: Architectures and Algorithms (Akyürek et al., 2024). [Paper]
  • In-context Learning and Induction Heads (Olsson et al., 2022). [Paper]
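
Schaeffer et al. argue that some apparently emergent abilities are artefacts of discontinuous metrics: per-token accuracy that improves smoothly with scale can look like a sudden jump when scored by exact match over a multi-token answer. The toy curves below illustrate only that arithmetic, using a made-up smooth scaling law; no real model is involved.

```python
import numpy as np

scales = np.logspace(7, 11, 9)                 # pretend parameter counts
per_token_acc = 1 - 0.9 * scales ** (-0.1)     # made-up smooth scaling law
L = 30                                         # answer length in tokens

# Exact match requires all L tokens to be correct, so it stays near zero
# until per-token accuracy is high, which can read as "emergence".
exact_match = per_token_acc ** L

for n, a, em in zip(scales, per_token_acc, exact_match):
    print(f"{n:11.2e} params  per-token {a:.3f}  exact-match {em:.3f}")
```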

Demonstration Order

  • Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity (Lu et al., 2022). [Paper]
  • Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models (Xiang et al., 2024). [Paper]
  • Calibrate Before Use: Improving Few-shot Performance of Language Models (Zhao et al., 2021). [Paper]
  • Lost in the Middle: How Language Models Use Long Contexts (Liu et al., 2024). [Paper]
  • What Makes Good In-Context Examples for GPT-3? (Liu et al., 2022). [Paper]
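
Lu et al. (2022) show that accuracy can swing from near chance to near state-of-the-art across orderings of the same demonstration set. The sketch below outlines that protocol: fix the demonstrations, enumerate their permutations, and record the accuracy spread. The demonstrations and test items are invented, and `query_model` is a hypothetical stand-in for an actual LM call; it is not implemented here.

```python
from itertools import permutations

# Toy demonstration set and test items (invented for illustration).
demonstrations = [
    ("the film is wonderful", "positive"),
    ("a complete waste of time", "negative"),
    ("an instant classic", "positive"),
    ("dull and overlong", "negative"),
]
test_set = [
    ("a moving, beautifully acted drama", "positive"),
    ("clumsy writing and flat jokes", "negative"),
]

def build_prompt(ordered_demos, test_input):
    lines = [f"Review: {x}\nSentiment: {y}" for x, y in ordered_demos]
    lines.append(f"Review: {test_input}\nSentiment:")
    return "\n\n".join(lines)

def accuracy(ordered_demos, query_model):
    # query_model(prompt) -> completion string; hypothetical LM call.
    correct = sum(
        query_model(build_prompt(ordered_demos, x)).strip().lower() == y
        for x, y in test_set
    )
    return correct / len(test_set)

def order_sensitivity(query_model):
    """Accuracy spread over all 4! = 24 orderings of the same demonstrations."""
    scores = [accuracy(list(p), query_model) for p in permutations(demonstrations)]
    return min(scores), max(scores)

# usage: worst, best = order_sensitivity(my_query_model)
```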

Input-Label Mapping

  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? (Min et al., 2022). [Paper]
  • Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations (Yoo et al., 2022). [Paper]
  • Larger language models do in-context learning differently (Wei et al., 2023). [Paper]
  • In-Context Learning Learns Label Relationships but Is Not Conventional Learning (Kossen et al., 2024). [Paper]
  • What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning (Pan et al., 2023). [Paper]
  • Dual Operating Modes of In-Context Learning (Lin et al., 2024). [Paper]
  • Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning (Tang et al., 2023). [Paper]
  • Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations (Si et al., 2023). [Paper]
  • Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning (Wang et al., 2023). [Paper]
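
A recurring probe in this line (Min et al., 2022; Yoo et al., 2022; Wei et al., 2023) keeps the demonstration inputs and label space fixed but replaces the gold labels with labels drawn uniformly at random, then measures how much ICL accuracy changes. The snippet below shows only that corruption step on a toy sentiment task; the examples are invented and the model evaluation loop is omitted.

```python
import random

label_space = ["positive", "negative"]

# Toy gold-labelled demonstrations (invented for illustration).
demos_gold = [
    ("the film is wonderful", "positive"),
    ("a complete waste of time", "negative"),
    ("an instant classic", "positive"),
    ("dull and overlong", "negative"),
]

def randomize_labels(demos, seed=0):
    """Keep inputs and label space, replace each gold label with a uniformly
    random label, as in the input-label mapping ablations cited above."""
    rng = random.Random(seed)
    return [(x, rng.choice(label_space)) for x, _ in demos]

demos_random = randomize_labels(demos_gold)
print(demos_random)
```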

Citation

Please consider citing our paper if you find our resources useful!

@inproceedings{zhou2023mystery,
  title={The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis},
  author={Yuxiang Zhou and Jiazheng Li and Yanzheng Xiang and Hanqi Yan and Lin Gui and Yulan He},
  booktitle={Proc. of EMNLP},
  year={2024},
  url={https://arxiv.org/pdf/2311.00237}
}
