Does coremltools support conversion of PyTorch text-to-image models like CLIP or VQGAN?
Both of these models (CLIP and VQGAN) are built on CNN and transformer architectures, and both architectures should be supported.
In fact, here is a resolved issue about a CLIP model conversion.
Note that, depending on the details, you may have to do the text pre-processing (turning the text input into a tensor representation) outside the PyTorch model you pass to the Core ML Tools convert API. The converter operates on PyTorch models with a tensor-in, tensor-out interface.
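To make the tensor-in, tensor-out point concrete, here is a rough sketch of how the split could look. It is not taken from the CLIP issue: `TOY_VOCAB`, `CONTEXT_LENGTH`, `encode_text`, `convert_text_encoder` and the input name `token_ids` are all made up for illustration, and a real CLIP model would use its own tokenizer instead of the toy whitespace one. The only library calls assumed are `torch.jit.trace`, `ct.convert` and `ct.TensorType`.

```python
# Hypothetical toy vocabulary; a real CLIP model ships its own tokenizer.
TOY_VOCAB = {"<pad>": 0, "<unk>": 1, "a": 2, "photo": 3, "of": 4, "cat": 5}
CONTEXT_LENGTH = 8  # assumed fixed sequence length for this sketch


def encode_text(text, vocab=TOY_VOCAB, context_length=CONTEXT_LENGTH):
    """Turn a string into a fixed-length list of token ids.

    This step runs outside the model, in plain Python.
    """
    ids = [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]
    ids = ids[:context_length]  # truncate long inputs
    return ids + [vocab["<pad>"]] * (context_length - len(ids))  # pad short ones


def convert_text_encoder(text_encoder, context_length=CONTEXT_LENGTH):
    """Trace a tensor-in, tensor-out torch.nn.Module and convert it to Core ML."""
    # Imported here so the tokenizer above stays usable without these packages.
    import numpy as np
    import torch
    import coremltools as ct

    text_encoder.eval()
    # Example input with the same shape and dtype the tokenizer produces.
    example = torch.zeros((1, context_length), dtype=torch.int32)
    traced = torch.jit.trace(text_encoder, example)
    return ct.convert(
        traced,
        inputs=[ct.TensorType(name="token_ids", shape=example.shape, dtype=np.int32)],
    )
```

At prediction time the app would call something like `encode_text("a photo of cat")` itself and feed the resulting ids to the converted model as an int32 array, so the string handling never has to go through the converter.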
I’d say just give the converter a try and take a look at some of the examples on the doc page. If you run into issues, please post on the GitHub repo.