How Large Language Models Learn: Self-Supervised Training at Internet Scale
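Large language models learn through self-supervised training: they repeatedly predict the next token (roughly, the next word or word fragment) in text drawn largely from the internet, at a scale on the order of trillions of tokens. Because the text itself supplies the correct answer at every position, no human labeling is required. This method powers models such as GPT-4, Claude 3, and Llama 3, but it comes with trade-offs in accuracy, bias, and cost.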
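To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not how any production model is built: a bigram counting model stands in for the neural network, whitespace-split words stand in for tokens, and all function names (make_pairs, train_bigram, next_token_probs, average_loss) are hypothetical. What it does share with real training is the self-supervised setup, where each training label is just the next item in the text, and the cross-entropy loss used to score predictions.

```python
import math
from collections import Counter, defaultdict


def make_pairs(tokens):
    """Build (context, next_token) pairs; the labels come from the text itself."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]


def train_bigram(pairs):
    """Count how often each token follows each one-token context.

    Assumption: counting replaces gradient-based training of a neural network.
    """
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context][target] += 1
    return counts


def next_token_probs(counts, context):
    """Convert the counts for one context into a probability distribution."""
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    return {token: n / total for token, n in followers.items()}


def average_loss(counts, pairs):
    """Mean cross-entropy: negative log probability assigned to the true next token."""
    total = 0.0
    for context, target in pairs:
        total -= math.log(next_token_probs(counts, context)[target])
    return total / len(pairs)


# Toy corpus (an assumption for illustration; real corpora are vastly larger).
tokens = "the cat sat on the mat".split()
pairs = make_pairs(tokens)
model = train_bigram(pairs)

print(next_token_probs(model, "the"))
print(average_loss(model, pairs))
```

In this toy corpus, "the" is followed once by "cat" and once by "mat", so the model splits its probability evenly between them and pays a nonzero loss on those two positions. Real models condition on long contexts rather than a single word, which is what lets them reduce that kind of uncertainty.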