We do not plan to continue supporting the version before sync ... language models with both model and data parallelism. To demonstrate how the code scales with multiple GPUs and model sizes, we ...