Latest in LLM Research

Anonymous

what is the latest in the papers about emergent LLM capabilities?

Chat NeurIPS

The latest research on emergent capabilities in large language models (LLMs) spans a variety of perspectives and findings. One notable paper by Schaeffer, Miranda, and Koyejo (2023), "Are Emergent Abilities of Large Language Models a Mirage?", offers an alternative explanation for emergent abilities: they may appear because of the researcher's choice of metric rather than fundamental changes in model behavior with scale. The authors argue that nonlinear or discontinuous metrics can produce apparently emergent abilities, whereas linear or continuous metrics yield smooth, predictable changes in performance. Their evidence includes tests with the InstructGPT/GPT-3 family on tasks with claimed emergent abilities and a meta-analysis on BIG-Bench, suggesting that some previously claimed emergent abilities may stem from the researchers' analyses rather than from inherent properties of scaling AI models【Schaeffer, R., Miranda, B., & Koyejo, S. (2023). Are Emergent Abilities of Large Language Models a Mirage? NeurIPS 2023. Link】.
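To make the metric argument concrete, here is a small, self-contained simulation (not taken from the paper; the scaling curve, model sizes, and answer length below are invented purely for illustration). Per-token accuracy improves smoothly with scale, yet exact match over a multi-token answer appears to "emerge" abruptly:

```python
import numpy as np

# Toy illustration (not from the paper): per-token accuracy improves
# smoothly with scale, but exact match over an L-token answer looks like
# a sudden "emergent" jump because it compounds per-token errors.

params = np.logspace(8, 12, 9)                      # hypothetical model sizes (1e8..1e12 parameters)
per_token_acc = 1 - 0.5 * (1e8 / params) ** 0.15    # smooth, invented scaling curve

L = 10                                              # answer length in tokens
exact_match = per_token_acc ** L                    # discontinuous-looking metric
token_level = per_token_acc                         # continuous metric

for n, em, tok in zip(params, exact_match, token_level):
    print(f"{n:10.1e} params | per-token acc {tok:.3f} | exact match {em:.3f}")
```

Both columns reflect the same smooth underlying improvement; only the choice of metric makes one of them look like a sharp phase transition.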

Another research area focuses on the use of web data in training LLMs. The "RefinedWeb Dataset for Falcon LLM" by Penedo et al. (2023) demonstrates that properly filtered and deduplicated web data can lead to powerful models, even outperforming those trained on curated corpora. They extracted five trillion tokens from CommonCrawl, showing that high-quality data from the web is still plentiful【Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Alobeidli, H., Cappelli, A., Pannier, B., Almazrouei, E., & Launay, J. (2023). The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only. NeurIPS 2023. Link】.
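As a rough sketch of the filter-then-deduplicate idea behind such pipelines (this is not the RefinedWeb pipeline itself; the heuristics, thresholds, and use of exact hashing are simplified placeholders, and the paper additionally relies on fuzzy deduplication at scale):

```python
import hashlib
import re

def quality_filter(doc: str, min_words: int = 50, max_symbol_ratio: float = 0.1) -> bool:
    """Crude heuristic quality filter: drop very short or symbol-heavy pages."""
    words = doc.split()
    if len(words) < min_words:
        return False
    symbols = sum(1 for ch in doc if not ch.isalnum() and not ch.isspace())
    return symbols / max(len(doc), 1) <= max_symbol_ratio

def dedup_exact(docs):
    """Exact deduplication by hashing normalized text (real pipelines add fuzzy dedup)."""
    seen, kept = set(), []
    for doc in docs:
        normalized = re.sub(r"\s+", " ", doc.lower()).strip()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

# Toy corpus: the second document is a near-duplicate of the first.
corpus = ["Example web page text about large language models.",
          "example   WEB page text about large language models."]
cleaned = dedup_exact([d for d in corpus if quality_filter(d, min_words=5)])
print(len(cleaned), "document(s) kept")
```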

Additionally, there are studies on the inference efficiency of LLMs. Narayanan et al. (2023) proposed a new metric called "idealized runtime" to compare inference efficiency across different LLMs. Their analysis provides insights into the efficiency-capability tradeoffs in LLMs developed in 2022【Narayanan, D., Santhanam, K., Henderson, P., Bommasani, R., Lee, T., & Liang, P. (2023). Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models. NeurIPS 2023. Link】.
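As a back-of-the-envelope companion to that idea (this is not the paper's idealized-runtime formula; the hardware figures and the two-bytes-per-parameter assumption are illustrative), a roofline-style bound shows why autoregressive decoding at small batch sizes is typically limited by memory bandwidth rather than compute:

```python
def per_token_latency_estimate(n_params: float,
                               batch_size: int = 1,
                               peak_flops: float = 312e12,  # assumed accelerator peak (FLOP/s)
                               mem_bw: float = 2.0e12):     # assumed memory bandwidth (bytes/s)
    """Roofline-style lower bound on decoding latency per token.

    Not the paper's idealized-runtime metric -- just an illustrative bound:
    each generated token needs roughly 2*N FLOPs per sequence, and the ~2*N
    bytes of fp16 weights must be streamed from memory at least once per step.
    """
    compute_time = 2.0 * n_params * batch_size / peak_flops
    memory_time = 2.0 * n_params / mem_bw
    return max(compute_time, memory_time)

# Example: a hypothetical 70B-parameter model decoded one sequence at a time.
print(f"{per_token_latency_estimate(70e9) * 1e3:.1f} ms/token (lower bound)")
```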

To explore these topics in more depth, you can access the full papers through the links provided with each citation.