Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Published in the 39th Conference on Neural Information Processing Systems, 2025.

We study the universal approximation property of Transformers through in-context learning. We find that when the context is represented by tokens from a finite set (a vocabulary), Transformers need positional encoding, which provides the density required to achieve universal approximation.
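
As a rough intuition for why a finite vocabulary is restrictive, the toy sketch below counts the distinct input vectors a model can see with and without positional encoding. It is not the paper's construction: the three-token vocabulary `VOCAB`, its 2-d embeddings, the additive sinusoidal `positional_encoding`, and the helper `distinct_inputs` are all hypothetical choices made for illustration. It shows only that the set of input vectors stays fixed at the vocabulary size without positions and grows with context length once positions are added. It does not demonstrate universal approximation.

```python
import math

# Hypothetical toy vocabulary: 3 tokens with fixed 2-d embeddings (assumed).
VOCAB = {"a": (0.0, 1.0), "b": (1.0, 0.0), "c": (1.0, 1.0)}


def positional_encoding(pos):
    """A 2-d sinusoidal encoding; one common choice, assumed here."""
    return (math.sin(pos), math.cos(pos))


def distinct_inputs(context, use_pe):
    """Return the set of distinct input vectors produced by a context."""
    seen = set()
    for pos, tok in enumerate(context):
        x = VOCAB[tok]
        if use_pe:
            p = positional_encoding(pos)
            x = (x[0] + p[0], x[1] + p[1])  # additive positional encoding
        # Round so that floating-point noise does not create spurious points.
        seen.add((round(x[0], 9), round(x[1], 9)))
    return seen


context = ["a", "b", "c"] * 20  # a context of 60 tokens
without_pe = distinct_inputs(context, use_pe=False)
with_pe = distinct_inputs(context, use_pe=True)

# Without positions the count is capped by the vocabulary size;
# with positions every (token, position) pair yields a new vector here.
print(len(without_pe), len(with_pe))
```

In this toy setting, lengthening the context never adds new input vectors without positional encoding, whereas with it the set keeps growing along the orbit of the encoding. This is only a loose analogue of the density that the result above attributes to positional encoding.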