This may be a more general machine learning question. I am interested in extracting text from images and video. The Vision framework does a great job of extracting the text. One thing I would like to do is determine which category of typeface each piece of text is set in, mainly to tell a source code / monospace font from a sans serif font. Which machine learning technologies available on Apple platforms would be best suited for that? And at a high level, how might you approach it?

When the Vision framework extracts text, it also tells you the region of the image where that text is, so the first thing to do would be to crop the text out of the image.
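As a rough sketch of that cropping step: the function below runs `VNRecognizeTextRequest` on a `CGImage` and returns one crop per recognized text region. The function name `cropTextRegions` and the tuple shape are just illustrative choices, and the code assumes a recent SDK (iOS 15 / macOS 12 or later) where `request.results` is already typed as `[VNRecognizedTextObservation]`.

```swift
import Vision
import CoreGraphics

/// Runs text recognition and returns the recognized string plus a crop for each text region.
/// `cropTextRegions` is a hypothetical helper name, not a Vision API.
func cropTextRegions(from image: CGImage) throws -> [(text: String, crop: CGImage)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    var crops: [(text: String, crop: CGImage)] = []
    for observation in request.results ?? [] {
        guard let candidate = observation.topCandidates(1).first else { continue }

        // boundingBox is normalized (0...1) with a lower-left origin; convert to pixels.
        var rect = VNImageRectForNormalizedRect(observation.boundingBox,
                                                image.width,
                                                image.height)
        // CGImage cropping uses a top-left origin, so flip the y axis.
        rect.origin.y = CGFloat(image.height) - rect.maxY

        if let crop = image.cropping(to: rect.integral) {
            crops.append((text: candidate.string, crop: crop))
        }
    }
    return crops
}
```

You may want to pad each rectangle by a few pixels before cropping, since tight boxes can clip ascenders and descenders that help distinguish typefaces.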

If you have a binary image classifier, you can then pass that crop to it to work out whether it’s source code or not. The two classes could be sans serif vs serif, or “looks like source code” vs “doesn’t look like source code”; it’s worth experimenting with which definition works best, and you’d need to collect samples of each for your training set!
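One way to get such a classifier is Create ML, either through the Create ML app or from code on macOS. The snippet below is a minimal sketch of the code route, assuming a hypothetical folder layout with one subfolder per class; the paths and class names are placeholders, not something from the answer above.

```swift
import CreateML
import Foundation

// Hypothetical layout (an assumption for illustration):
//   TrainingData/SourceCode/*.png
//   TrainingData/NotSourceCode/*.png
// Each subfolder name becomes a class label.
let trainingDir = URL(fileURLWithPath: "/path/to/TrainingData")
let outputURL = URL(fileURLWithPath: "/path/to/SourceCodeClassifier.mlmodel")

do {
    // Train an image classifier from the labeled subfolders (macOS only).
    let classifier = try MLImageClassifier(trainingData: .labeledDirectories(at: trainingDir))

    // Report how well the model fits the training set.
    let accuracy = (1.0 - classifier.trainingMetrics.classificationError) * 100
    print("Training accuracy: \(accuracy)%")

    // Save the model so it can be added to an app or compiled for Core ML.
    try classifier.write(to: outputURL)
} catch {
    print("Training failed: \(error)")
}
```

Training on crops that look like what Vision will produce at runtime (single lines or words rather than whole screenshots) is likely to matter more than the exact training parameters.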

So at a high level, what I’d do is:

  • train a binary classifier to distinguish source-code from not-source-code
  • using Vision, crop out each region of the image that contains detected text
  • use your classifier to determine whether each crop is source code or not

and go from there!
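The last step might look something like the sketch below, which runs a Core ML image classifier on a single crop through Vision. The helper names `makeVisionModel` and `classify`, and the idea of loading the model from a compiled model URL, are assumptions for illustration; in an app you would more likely use the class Xcode generates for your model.

```swift
import Vision
import CoreML
import CoreGraphics
import Foundation

/// Wraps a compiled Core ML model (.mlmodelc) for use with Vision.
/// `compiledModelURL` is a placeholder for wherever your model lives.
func makeVisionModel(compiledModelURL: URL) throws -> VNCoreMLModel {
    let mlModel = try MLModel(contentsOf: compiledModelURL)
    return try VNCoreMLModel(for: mlModel)
}

/// Classifies one cropped text region and returns the top label with its confidence.
func classify(crop: CGImage,
              with model: VNCoreMLModel) throws -> (label: String, confidence: Float)? {
    let request = VNCoreMLRequest(model: model)
    // Text crops are often wide and short; scaleFit avoids cutting off the ends.
    request.imageCropAndScaleOption = .scaleFit

    let handler = VNImageRequestHandler(cgImage: crop, options: [:])
    try handler.perform([request])

    // Image classifiers produce classification observations, sorted by confidence.
    guard let top = (request.results as? [VNClassificationObservation])?.first else {
        return nil
    }
    return (label: top.identifier, confidence: top.confidence)
}
```

The returned label is whatever class name the model was trained with, so comparing it against your "source code" class name and applying a confidence threshold is left to the caller.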

You can also try out the Live Text API this year; it's able to extract text out of images and videos. However, it does not provide font-related information about the text yet. If you need that, you can file a bug with us to track the request.


From a non-Apple developer:

I created a serif/sans-serif model with CreateML. You can find it here: https://github.com/jmousseau/Mimeo
