Hi, I'm excited to see more information about optimizing recent models for Core ML, including the `ane_transformers` repo. If I wanted to optimize, e.g., CLIP for the ANE, should I use code from that repo, or just try to apply the recommendations from the case study?
Yes, I think using the code from the Apple Neural Engine (ANE) Transformers repo linked at the end of the Deploying Transformers on the Apple Neural Engine article is the best way to get started, and definitely follow the recommendations in the article as well. It sounds like you are already on the right track!
The default Core ML conversion should already be quite efficient for the Neural Engine (NE) as well. With the new performance tab in Xcode 14, you can check whether the model is Neural Engine resident or not.
The article details some changes that are specific to DistilBERT, which may or may not be required for the transformer architecture in CLIP.
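One of the central recommendations in that article is to keep activations in an ANE-friendly (B, C, 1, S) layout and express linear projections as 1x1 convolutions over the channel axis. As a minimal sketch (shapes and variable names here are illustrative, not taken from the repo), the two formulations are numerically equivalent, which is why the swap is safe:

```python
import numpy as np

# Illustrative sizes: batch, sequence length, input/output channels.
B, S, C_in, C_out = 2, 5, 8, 4
rng = np.random.default_rng(0)
x_bsc = rng.standard_normal((B, S, C_in))  # usual (batch, seq, channels) layout
W = rng.standard_normal((C_out, C_in))     # weight of a linear projection

# Standard linear layer applied per token: (B, S, C_in) -> (B, S, C_out).
y_linear = x_bsc @ W.T

# ANE-friendly layout: channels on axis 1 plus a dummy height axis -> (B, C, 1, S).
# A 1x1 convolution with the same weights is then a per-position matmul
# over the channel axis.
x_bc1s = x_bsc.transpose(0, 2, 1)[:, :, None, :]
y_conv = np.einsum('oc,bchs->bohs', W, x_bc1s)

# Same numbers, different memory layout.
assert np.allclose(y_linear, y_conv[:, :, 0, :].transpose(0, 2, 1))
```

In the repo itself this corresponds to replacing `nn.Linear` modules with 1x1 `nn.Conv2d` modules; the layout change is what lets the compiler map the operations efficiently onto the NE.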
You can find more details in the A Multi-Task Neural Architecture for On-Device Scene Analysis article as well. Hope this helps!
In any case, if you find any inefficiencies after conversion, feel free to share them with us via a feedback request. We are constantly adding new converter and NE compiler optimizations that automatically detect patterns and map them efficiently to the NE, so such feedback is very valuable!