Reduce inference time for BERT models using neural architecture search and SageMaker Automated Model Tuning
In this post, we demonstrate how to use neural architecture search (NAS)-based structural pruning to compress a fine-tuned BERT model, improving model performance and reducing inference time. Pre-trained…