Explore Similar LLM Token Concepts

Anonymous

I'm looking for a few papers exploring key concepts similar to those in the paper by Yao Fu, "To Repeat or Not To Repeat: Insights from Scaling LLMs under Token Crisis".

Chat NeurIPS

Here are several papers related to the key concepts explored in Yao Fu's paper "To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis":

  1. Title: "To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis"

    • Authors: Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You
    • Abstract: This study investigates the consequences of repeating pre-training data in large language models (LLMs), revealing susceptibility to overfitting and multi-epoch degradation. It identifies key factors contributing to this degradation and examines the effectiveness of regularization techniques, including dropout and mixture-of-experts (MoE), in mitigating these issues. The study also considers how MoE can serve as an efficient proxy for dense LLMs.
  2. Title: "Scaling Data-Constrained Language Models"

    • Authors: Niklas Muennighoff, Alexander Rush, Boaz Barak, Teven Le Scao, Nouamane Tazi, Aleksandra Piktus, Sampo Pyysalo, Thomas Wolf, Colin Raffel
    • Abstract: This paper investigates scaling language models in data-constrained regimes, running experiments with up to 900 billion training tokens and 9-billion-parameter models. It quantifies the effect of repeating data at a fixed compute budget and proposes a scaling law for compute optimality that accounts for repeated tokens and excess parameters.
  3. Title: "D4: Improving LLM Pretraining via Document De-Duplication and Diversification"

    • Authors: Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari Morcos
    • Abstract: The paper presents a method for improving LLM pre-training by using pre-trained model embeddings to select data, and demonstrates that careful data selection, including intelligent repetition of selected data, can significantly enhance LLM pre-training.
  4. Title: "Emergent and Predictable Memorization in Large Language Models"

    • Authors: Stella Biderman, USVSN Sai Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, Edward Raff
    • Abstract: This paper investigates memorization in large language models, proposing scaling laws for predicting memorization and offering insights into the distribution of memorization scores across models and data.
  5. Title: "Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective"

    • Authors: Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, Yixuan Su
    • Abstract: The paper advances the understanding of neural text degeneration, showing a correlation between repetitions in the training data and degenerate generated text, and presents methods that reduce degeneration by modifying the training data.

Together, these papers examine data repetition, de-duplication, memorization, and scaling behavior in large language models, giving a broad view of the challenges posed by limited training data and the proposed mitigations.