Training a Model with Limited Memory using Mixed Precision and Gradient Checkpointing
Training a language model is memory-intensive, not only because the model itself is large but also because the long sequences in the training data batches. Training a model with
Read More
