Does Core ML have everything necessary to perform keyword extraction? How would you go about extracting keywords from articles of text?

Natural Language has a number of tools that can be useful in keyword extraction: tokenization, part-of-speech tagging, named entity recognition, gazetteers that could be used to identify stop words, and so on.
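As a rough illustration of combining these tools, here is a minimal Swift sketch that uses `NLTagger` with the lexical-class scheme to keep only nouns as candidate keywords. The `candidateTerms(in:)` function and the tiny `stopWords` set are hypothetical names made up for this example; the plain `Set` merely stands in for whatever stop-word source you prefer (a gazetteer could fill the same role), and restricting candidates to nouns is an assumption, not a recommendation.

```swift
import NaturalLanguage

// Hypothetical stop-word list for illustration only; a real list would be much larger.
let stopWords: Set<String> = ["the", "a", "an", "of", "on", "in", "and", "is"]

/// Returns the lowercased nouns in `text` that are not stop words.
func candidateTerms(in text: String) -> [String] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text

    var terms: [String] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]

    // Walk the text word by word; each word arrives with its part-of-speech tag.
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: options) { tag, range in
        let word = text[range].lowercased()
        if let tag = tag, tag == .noun, !stopWords.contains(word) {
            terms.append(word)
        }
        return true // keep enumerating
    }
    return terms
}

let candidates = candidateTerms(in: "The tokenizer splits the article into words.")
print(candidates)
```

The exact tags depend on the system's language models, so the output may vary between OS versions.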

We don't provide an implementation of any specific keyword or keyphrase extraction algorithm. Commonly used approaches score candidate terms using features such as frequency, co-occurrence statistics, and TF-IDF, all of which can be calculated from text that has been tokenized and processed using some of these tools.
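To make the TF-IDF idea concrete, here is a small sketch in plain Swift. It assumes the documents have already been tokenized and normalized (for example, with the Natural Language tools mentioned above), and it uses one basic variant of the weighting: term frequency divided by document length, times the natural log of (number of documents / number of documents containing the term). The function names `tfidfScores(for:)` and `topKeywords(_:limit:)` are invented for this example and are not part of any framework.

```swift
import Foundation

/// Computes a basic TF-IDF score for every term in every document.
/// `documents` holds already-tokenized, normalized text.
func tfidfScores(for documents: [[String]]) -> [[String: Double]] {
    // Document frequency: in how many documents does each term appear?
    var documentFrequency: [String: Int] = [:]
    for tokens in documents {
        for term in Set(tokens) {
            documentFrequency[term, default: 0] += 1
        }
    }

    let documentCount = Double(documents.count)
    return documents.map { tokens in
        // Raw term counts within this document.
        var counts: [String: Int] = [:]
        for term in tokens { counts[term, default: 0] += 1 }

        var scores: [String: Double] = [:]
        for (term, count) in counts {
            let tf = Double(count) / Double(tokens.count)
            let idf = log(documentCount / Double(documentFrequency[term] ?? 1))
            scores[term] = tf * idf
        }
        return scores
    }
}

/// Returns the `limit` highest-scoring terms, breaking ties alphabetically.
func topKeywords(_ scores: [String: Double], limit: Int) -> [String] {
    let ranked = scores.sorted { lhs, rhs in
        lhs.value != rhs.value ? lhs.value > rhs.value : lhs.key < rhs.key
    }
    return ranked.prefix(limit).map { $0.key }
}

let documents = [
    ["swift", "swift", "code"],
    ["code", "python"],
]
let scores = tfidfScores(for: documents)
print(topKeywords(scores[0], limit: 1))
```

With this variant, a term that appears in every document gets a score of zero, which is why the shared term "code" ranks below "swift" in the first document. Smoothed variants of the IDF term avoid that hard zero.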

Doing this fully unsupervised is a difficult task, though. You might be able to do better if you have some advance knowledge of the vocabulary that is relevant to the sort of text you will be working with.
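One simple way to use advance knowledge of the vocabulary is to boost the score of any candidate term that appears in a known domain word list. The sketch below is only an illustration of that idea: `boostKnownTerms`, the `boost` factor of 2.0, and the sample scores and vocabulary are all hypothetical, and the input scores could come from any scoring method.

```swift
/// Multiplies the score of every term found in `vocabulary` by `boost`.
/// Terms outside the vocabulary keep their original score.
func boostKnownTerms(_ scores: [String: Double],
                     vocabulary: Set<String>,
                     boost: Double = 2.0) -> [String: Double] {
    scores.reduce(into: [String: Double]()) { result, entry in
        let isKnown = vocabulary.contains(entry.key)
        result[entry.key] = isKnown ? entry.value * boost : entry.value
    }
}

// Hypothetical scores and domain vocabulary, for illustration only.
let rawScores: [String: Double] = ["tokenizer": 0.25, "article": 0.5]
let domainVocabulary: Set<String> = ["tokenizer", "gazetteer"]

let boosted = boostKnownTerms(rawScores, vocabulary: domainVocabulary)
print(boosted)
```

A multiplicative boost is just one option; filtering candidates down to the known vocabulary, or adding a fixed bonus, are equally plausible depending on how complete the word list is.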
