It is shown that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of image-caption pairs.
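One way to make the caption-matching task concrete is a symmetric contrastive objective over a batch, where image i is paired with caption i and every other caption in the batch is a negative. The sketch below is illustrative only: it assumes the embeddings have already been produced by some image and text encoders, and the function names, the toy vectors, and the temperature value of 0.07 are assumptions made for the example, not details given in the text.

```python
import math


def normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def similarity_logits(image_embs, text_embs, temperature):
    """logits[i][j] = cosine similarity of image i and caption j, scaled."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(img, txt)) / temperature
             for txt in txts] for img in imgs]


def caption_matching_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy, assuming image i is paired with caption i.

    The temperature default is an assumed value chosen for illustration.
    """
    logits = similarity_logits(image_embs, text_embs, temperature)
    n = len(logits)
    # Image -> caption: the correct caption for image i is column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Caption -> image: same computation on the transposed matrix.
    transposed = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(transposed[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def predict_captions(image_embs, text_embs, temperature=0.07):
    """For each image, return the index of the highest-scoring caption."""
    logits = similarity_logits(image_embs, text_embs, temperature)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Hypothetical 2-D embeddings for a batch of two image-caption pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
captions_aligned = [[0.9, 0.1], [0.1, 0.9]]
captions_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = caption_matching_loss(images, captions_aligned)
loss_swapped = caption_matching_loss(images, captions_swapped)
```

With these toy vectors the loss is lower when each caption sits next to its own image than when the captions are swapped, which is the signal a training loop would use to pull matching pairs together in the embedding space.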