Self-Training with Noisy Student Improves ImageNet Classification

Self-training with Noisy Student is a semi-supervised learning approach that works well even when labeled data is abundant. To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo-labeled images, and we iterate this process by putting back the student as the teacher. Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning; amongst other components, it implements self-training in the context of semi-supervised learning.

Noisy Student Training is based on the self-training framework and proceeds in four simple steps:

1. Train a classifier on labeled data (the teacher).
2. Use the teacher to infer pseudo labels on a much larger unlabeled dataset.
3. Train an equal-or-larger classifier (the student) on the combination of labeled and pseudo-labeled data, while injecting noise into the student.
4. Go back to step 2, using the student as the new teacher.

During the generation of the pseudo labels, the teacher is not noised, so that the pseudo labels are as accurate as possible. We obtain unlabeled images from the JFT dataset [26, 11], which has around 300M images; due to duplications, there are only 81M unique images among the 130M images we end up using. For noise, we set the survival probability in stochastic depth to 0.8 for the final layer and follow the linear decay rule for the other layers, and the hyperparameters for these noise functions are the same for EfficientNet-B7, L0, L1 and L2. Soft pseudo labels lead to better performance for low-confidence data, whereas consistency-training methods typically need a workaround such as entropy minimization or ramping up the consistency loss.

We evaluate the best model, which achieves 87.4% top-1 accuracy, on three robustness test sets: ImageNet-A, ImageNet-C and ImageNet-P. The ImageNet-C and ImageNet-P test sets [24] include images with common corruptions and perturbations such as blurring, fogging, rotation and scaling; the top-1 accuracy of prior methods is computed from their reported corruption error on each corruption. The results also confirm that vision models can benefit from Noisy Student even without iterative training. The gains do not carry over to strong adversarial attacks, however: at ε=16, EfficientNet-L2 achieves an accuracy of only 1.1% under the stronger PGD attack with 10 iterations [43], which is far from the SOTA results. For ImageNet checkpoints trained by Noisy Student Training, please refer to the EfficientNet GitHub (links are collected at the end of this article).
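The loop can be summarized in a short sketch. The helper functions (`train_model`, `combine`) and the iteration count below are hypothetical placeholders for illustration, not part of the released TensorFlow implementation:

```python
# Sketch of the Noisy Student loop; train_model/combine are hypothetical helpers.
def noisy_student(labeled_ds, unlabeled_ds, model_sizes, num_rounds=3):
    # Step 1: train the initial teacher on labeled data only.
    teacher = train_model(size=model_sizes[0], data=labeled_ds, noised=False)

    for i in range(num_rounds):
        # Step 2: the un-noised teacher infers pseudo labels (soft or hard)
        # for the much larger unlabeled set.
        pseudo_ds = [(x, teacher.predict(x)) for x in unlabeled_ds]

        # Step 3: train an equal-or-larger student on labeled + pseudo-labeled
        # data, with input noise (RandAugment) and model noise
        # (dropout, stochastic depth) applied to the student only.
        size = model_sizes[min(i + 1, len(model_sizes) - 1)]
        student = train_model(size=size,
                              data=combine(labeled_ds, pseudo_ds),
                              noised=True)

        # Step 4: the student becomes the teacher for the next round.
        teacher = student

    return teacher
```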
Hence, EfficientNet-L0 has around the same training speed as EfficientNet-B7 but more parameters, which give it a larger capacity. An important requirement for Noisy Student to work well is that the student model is sufficiently large to fit more data (labeled and pseudo-labeled). We also find that Noisy Student is better with an additional trick: data balancing. Works based on pseudo labels [37, 31, 60, 1] are similar to self-training, but they also suffer from the same problem as consistency training, since they rely on a model that is still being trained, rather than a converged model with high accuracy, to generate pseudo labels.

The top-1 and top-5 accuracy are measured on the 200 classes that ImageNet-A includes. For instance, on the right column, as the image of the car undergoes a small rotation, the standard model changes its prediction from racing car to car wheel to fire engine. When data augmentation noise is used, the student must ensure that a translated image, for example, is still assigned the same category as the non-translated image.
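One way to realize this requirement, sketched below in PyTorch-style code (an assumption; the released code is TensorFlow-based, and `augment` here stands for any input noise such as RandAugment), is to let the un-noised teacher label the clean image while the student is optimized on an augmented view of the same image:

```python
import torch

def student_step_on_unlabeled(teacher, student, clean_images, augment):
    # The teacher labels the clean image without any noise.
    with torch.no_grad():
        teacher.eval()
        soft_targets = torch.softmax(teacher(clean_images), dim=-1)

    # The student sees a noised (e.g., translated/augmented) version of the
    # same image but is trained toward the teacher's label for the clean one.
    noisy_images = augment(clean_images)
    student_log_probs = torch.log_softmax(student(noisy_images), dim=-1)

    # Cross-entropy against the soft pseudo labels.
    loss = -(soft_targets * student_log_probs).sum(dim=-1).mean()
    return loss
```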
Reference [24] standardizes and expands the corruption robustness topic, showing which classifiers are preferable in safety-critical applications, and proposes the ImageNet-P dataset, which enables researchers to benchmark a classifier's robustness to common perturbations; please refer to [24] for details about mCE and AlexNet's error rate. On robustness test sets, Noisy Student improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. Different types of noise are applied to the student: input noise (data augmentation) and model noise (dropout and stochastic depth), and the performance consistently drops when these noise functions are removed. Our finding is consistent with similar arguments that using unlabeled data can improve adversarial robustness [8, 64, 46, 80].

Related consistency-training works constrain model predictions to be invariant to noise injected into the input, hidden states or model parameters; this invariance constraint reduces the degrees of freedom in the model. A related teacher/student pipeline (Yalniz et al.) leverages a large collection of unlabelled images to improve the performance of a given target architecture such as ResNet-50 or ResNeXt. Scaling width and resolution by a factor of c leads to c² times the training time, while scaling depth by c leads to c times the training time.
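Stated as a rough cost model (an interpretation of the sentence above, not an exact formula given in the text):

```latex
\text{training time} \;\propto\; d \cdot w^{2} \cdot r^{2},
```

where d, w and r are the depth, width and input-resolution multipliers, so scaling width or resolution by c multiplies the cost by c² while scaling depth by c multiplies it by c.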
Whether the model benefits from more unlabeled data depends on the capacity of the model, since a small model can easily saturate while a larger model can benefit from more data. The total number of images that we use for training a student model is 130M (with some duplicated images). We improve on plain self-training by adding noise to the student so that it learns beyond the teacher's knowledge. We first improved the accuracy of EfficientNet-B7 by using EfficientNet-B7 as both the teacher and the student; then, by using the improved B7 model as the teacher, we trained an EfficientNet-L0 student model. EfficientNet-L1 is in turn scaled up from EfficientNet-L0 by increasing width, and lastly we follow the idea of compound scaling [69] and scale all dimensions to obtain EfficientNet-L2. The architecture specifications of EfficientNet-L0, L1 and L2 are listed in Table 7, and the training time of EfficientNet-L2 is around 2.72 times the training time of EfficientNet-L1. Using Noisy Student (EfficientNet-L2) as the teacher leads to another 0.8% improvement on top of the improved results.

Our experiments showed that self-training with Noisy Student and EfficientNet can achieve an accuracy of 87.4%, which is 1.9% higher than without Noisy Student; notably, EfficientNet-B7 achieves an accuracy of 86.8%, which is 1.8% better than the supervised model. Overall, this simple self-training method achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. We hypothesize that part of the improvement can be attributed to SGD, which introduces stochasticity into the training process. We used the version from [47], which filtered the validation set of ImageNet. With out-of-domain unlabeled images, hard pseudo labels can hurt performance while soft pseudo labels lead to robust performance. Lastly, we will show the results of benchmarking our model on robustness datasets such as ImageNet-A, C and P, as well as on adversarial robustness.

For smaller models, we set the batch size of unlabeled images to be the same as the batch size of labeled images. Specifically, we train the student model for 350 epochs for models larger than EfficientNet-B4, including EfficientNet-L0, L1 and L2, and for 700 epochs for smaller models. The repository provides the scripts used for our ImageNet experiments, along with similar scripts to run predictions on unlabeled data, filter and balance the data, and train using the filtered data. Our experiments show that an important element for this simple method to work well at scale is that the student model should be noised during its training, while the teacher should not be noised during the generation of pseudo labels.
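As a concrete illustration of the noise configuration (using PyTorch/torchvision rather than the released TensorFlow code, so the layer choices and most values below are assumptions), the input noise and model noise could be wired up roughly like this:

```python
import torch.nn as nn
from torchvision import transforms
from torchvision.ops import StochasticDepth

# Input noise: RandAugment applied to training images of the student.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandAugment(num_ops=2, magnitude=10),  # illustrative values
    transforms.ToTensor(),
])

# Model noise: dropout before the classifier head (rate and sizes are assumptions).
classifier_head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(1280, 1000))

# Stochastic depth: survival probability 0.8 at the final block, decayed
# linearly toward 1.0 for earlier blocks (the "linear decay rule"), i.e. the
# drop probability of block i grows linearly up to 0.2 at the last block.
def stochastic_depth_layers(num_blocks, final_survival=0.8):
    drop = [(1.0 - final_survival) * (i + 1) / num_blocks for i in range(num_blocks)]
    return [StochasticDepth(p=p, mode="row") for p in drop]
```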
To noise the student, we use stochastic depth [29], dropout [63] and RandAugment [14]. Our largest model, EfficientNet-L2, needs to be trained for 3.5 days on a Cloud TPU v3 Pod, which has 2048 cores.

The main difference between our method and knowledge distillation is that knowledge distillation does not consider unlabeled data and does not aim to improve the student model; its main goal is to find a small and fast model for deployment. Also related to our work is Data Distillation [52], which ensembled predictions for an image under different transformations to teach a student network. Algorithm 1 gives an overview of self-training with Noisy Student (or Noisy Student for short). The abundance of data on the internet is vast, and we investigate the importance of noising in two scenarios with different amounts of unlabeled data and different teacher model accuracies.

As shown in Table 2, Noisy Student with EfficientNet-L2 achieves 87.4% top-1 accuracy, which is significantly better than the best previously reported accuracy on EfficientNet of 85.0%. As can be seen from the figure, our model with Noisy Student makes correct predictions for images under severe corruptions and perturbations such as snow, motion blur and fog, while the model without Noisy Student suffers greatly under these conditions. In our experiments, we also observe that soft pseudo labels are usually more stable and lead to faster convergence, especially when the teacher model has low accuracy. To balance the pseudo-labeled data, we select images whose label confidence is higher than 0.3, and for classes where we have too many images, we take the images with the highest confidence.
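A minimal sketch of that filtering-and-balancing step, written as plain Python over a hypothetical list of (image, pseudo_label, confidence) records (the per-class target count below is an assumption, not a number given in this article):

```python
from collections import defaultdict

def filter_and_balance(records, threshold=0.3, per_class=130_000):
    # Keep only confident pseudo labels, then balance the classes.
    by_class = defaultdict(list)
    for image, label, conf in records:
        if conf > threshold:                       # drop low-confidence images
            by_class[label].append((image, label, conf))

    balanced = []
    for label, items in by_class.items():
        items.sort(key=lambda r: r[2], reverse=True)
        if len(items) >= per_class:
            balanced.extend(items[:per_class])     # keep the most confident ones
        else:
            # Duplicate images so under-represented classes reach the target.
            while len(items) < per_class:
                items.extend(items[:per_class - len(items)])
            balanced.extend(items)
    return balanced
```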
A number of studies have shown that computer vision models lack robustness (see also "Do ImageNet classifiers generalize to ImageNet?"), while self-training has achieved enormous success in various semi-supervised settings. Compared to consistency training [45, 5, 74], the self-training / teacher-student framework is better suited for ImageNet because we can train a good teacher on ImageNet using labeled data. We use EfficientNets [69] as our baseline models because they provide better capacity for more data; EfficientNet-L0 is wider and deeper than EfficientNet-B7 but uses a lower resolution, which gives it more parameters to fit a large number of unlabeled images at a similar training speed. Here we use unlabeled images to improve the state-of-the-art ImageNet accuracy and show that the accuracy gain has an outsized impact on robustness. In Noisy Student, we combine these two steps into one because it simplifies the algorithm and leads to better performance in our preliminary experiments.

For simplicity, we experiment with using 1/128, 1/64, 1/32, 1/16 and 1/4 of the whole data by uniformly sampling images from the unlabeled set, though taking the images with the highest confidence leads to better results. One might argue that the improvements from using noise merely result from preventing overfitting to the pseudo labels on the unlabeled images; we verify that this is not the case when we use 130M unlabeled images, since the model does not overfit the unlabeled set according to the training loss. These significant gains in robustness on ImageNet-C and ImageNet-P are surprising because our models were not deliberately optimized for robustness (e.g., via data augmentation). Figure 1(c) shows images from ImageNet-P and the corresponding predictions, and on ImageNet-C, Noisy Student reduces the mean corruption error (mCE) from 45.7 to 31.2.
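For reference, the mCE used above follows the standard ImageNet-C protocol of [24]; stated roughly (this definition is quoted from that benchmark rather than from the present article):

```latex
\mathrm{CE}^{f}_{c} \;=\; \frac{\sum_{s=1}^{5} E^{f}_{s,c}}{\sum_{s=1}^{5} E^{\mathrm{AlexNet}}_{s,c}},
\qquad
\mathrm{mCE}^{f} \;=\; \frac{1}{|C|} \sum_{c \in C} \mathrm{CE}^{f}_{c},
```

where E^f_{s,c} is the top-1 error of classifier f on corruption type c at severity s; errors are normalized by AlexNet's errors and then averaged over the corruption set C.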
Secondly, to enable the student to learn a more powerful model, we also make the student model larger than the teacher model. The larger student model is then trained on the combination of all data and achieves better performance than the teacher by itself. We use our best model, Noisy Student with EfficientNet-L2, to teach student models with sizes ranging from EfficientNet-B0 to EfficientNet-B7.

As shown in Tables 3, 4 and 5, when compared with the previous state-of-the-art model, ResNeXt-101 WSL [44, 48] trained on 3.5B weakly labeled images, Noisy Student yields substantial gains on robustness datasets. Small changes in the input image can cause large changes to the predictions of a standard model: at the top-left image, the model without Noisy Student ignores the sea lions and mistakenly recognizes a buoy as a lighthouse, while the model with Noisy Student can recognize the sea lions. For the pseudo labels themselves, the teacher is not noised during their generation, so that they are as accurate as possible.
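A minimal sketch of that inference step (written in PyTorch as an assumption; the released code is TensorFlow-based), which also shows the soft/hard pseudo-label choice:

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(teacher, unlabeled_loader, soft=True):
    # eval() disables dropout and stochastic depth, so the teacher is un-noised;
    # the loader is assumed to apply no data augmentation either.
    teacher.eval()
    pairs = []
    for images in unlabeled_loader:
        probs = torch.softmax(teacher(images), dim=-1)
        labels = probs if soft else probs.argmax(dim=-1)  # soft vs. hard labels
        pairs.extend(zip(images, labels))
    return pairs
```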
In addition to improving state-of-the-art results, we conduct additional experiments to verify whether Noisy Student can benefit other EfficientNet models, and we will then show our results on ImageNet and compare them with state-of-the-art models. Our work is based on self-training (e.g., [59, 79, 56]). The pseudo labels can be soft (a continuous distribution) or hard (a one-hot distribution); whether soft or hard pseudo labels work better might need to be determined on a case-by-case basis. When dropout and stochastic depth are used, the teacher model behaves like an ensemble of models (when it generates the pseudo labels, dropout is not used), whereas the student behaves like a single model. We sample 1.3M images in confidence intervals. The swing in the picture is barely recognizable by a human, while the Noisy Student model still makes the correct prediction. Lastly, we apply the recently proposed technique to fix the train-test resolution discrepancy [71] for EfficientNet-L0, L1 and L2: we first perform normal training with a smaller resolution for 350 epochs, and then finetune the model with a larger resolution for 1.5 epochs on unaugmented labeled images.
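A compressed sketch of that two-stage schedule (the helper functions and the resolution values are hypothetical; only the epoch counts come from the text):

```python
# Two-stage schedule for the train-test resolution fix (sketch only).
def train_with_resolution_fix(model, labeled_ds, pseudo_ds,
                              small_res=380, large_res=475):  # illustrative sizes
    # Stage 1: normal Noisy Student training at the smaller resolution,
    # with augmentation and model noise enabled.
    train(model, combine(labeled_ds, pseudo_ds),
          resolution=small_res, epochs=350, augment=True)

    # Stage 2: brief finetuning at the larger (test-time) resolution,
    # using only unaugmented labeled images.
    train(model, labeled_ds,
          resolution=large_res, epochs=1.5, augment=False)
    return model
```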
In both cases, we gradually remove augmentation, stochastic depth and dropout for unlabeled images, while keeping them for labeled images. Lastly, we trained another EfficientNet-L2 student by using the EfficientNet-L2 model as the teacher. In terms of methodology, unlike previous studies in semi-supervised learning that use in-domain unlabeled data (e.g., CIFAR-10 images as unlabeled data for a small CIFAR-10 training set), to improve ImageNet we must use out-of-domain unlabeled data. The previous state-of-the-art model we compare against, ResNeXt-101 WSL [44, 48], comes from a study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images, which reported the highest ImageNet-1k single-crop top-1 accuracy to date at the time of its publication.

Paper: Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le. Self-Training With Noisy Student Improves ImageNet Classification. CVPR 2020. https://arxiv.org/abs/1911.04252
Code: https://github.com/google-research/noisystudent
Models: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

To cite this work:

@article{Xie2019SelfTrainingWN,
  title   = {Self-Training With Noisy Student Improves ImageNet Classification},
  author  = {Qizhe Xie and Eduard H. Hovy and Minh-Thang Luong and Quoc V. Le},
  journal = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2019}
}