How would you recommend approaching the classification of various fine-grained subclasses of the same class? Specifically, I'm talking about different types of things made of paper. For example: "a postcard with something written on it" vs. "an empty postcard" vs. "just some piece of paper" vs. "another object". With a classifier model we were able to get very accurate results distinguishing "paper vs. some other object". However, we couldn't get accurate enough results (I think ~60% accuracy) on the more fine-grained decisions: "postcard vs. some piece of paper" and "postcard with text vs. empty postcard". The mistakes were usually on the false-positive side (in my example, identifying some piece of paper as a postcard). So how would you set up the training samples for this sort of goal? Or are we looking at the wrong option, and should we be considering some other method, or a combination of methods, instead?
Something you could try is a hierarchical approach: first detect the overall class (for example, paper vs. another object), then crop to the detected object and run a more precise classifier on the crop.
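A minimal, framework-agnostic sketch of that cascade, assuming you already have a coarse detector and one fine-grained classifier per parent label. `detect`, `crop`, `children` and `classify_hierarchically` are hypothetical names, and multiplying the per-stage confidences is only a simple heuristic (it treats the stages as independent), not something the answer prescribes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: an "image" is whatever your models accept, and a
# bounding box is (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    label: str          # coarse class, e.g. "paper" or "other object"
    confidence: float   # 0.0 ... 1.0
    box: Box            # where the object sits in the full image


# A fine-grained classifier maps a cropped region to (label, confidence).
Classifier = Callable[[Any], Tuple[str, float]]


def classify_hierarchically(
    image: Any,
    detect: Callable[[Any], Detection],
    crop: Callable[[Any, Box], Any],
    children: Dict[str, Classifier],
    min_confidence: float = 0.5,
) -> Tuple[List[str], float]:
    """Return the label path from coarse to fine plus a combined confidence."""
    det = detect(image)
    path = [det.label]
    confidence = det.confidence
    # Later stages only ever see the cropped object, not the background.
    region = crop(image, det.box)
    # Walk down the hierarchy while a more specific model exists for the
    # current label; stop early instead of committing to a weak guess.
    while path[-1] in children:
        label, conf = children[path[-1]](region)
        # Heuristic: combined score = product of per-stage confidences.
        if confidence * conf < min_confidence:
            break
        path.append(label)
        confidence *= conf
    return path, confidence
```

For example, with stub models that return ("postcard", 0.9) for "paper" and ("postcard with text", 0.8) for "postcard", a "paper" detection at 0.95 yields the path paper, postcard, postcard with text with a combined score of about 0.68. A side benefit of this layout is that each stage can be trained on a narrower dataset: the postcard-vs-paper model only ever needs paper crops.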
In addition to tuning the training, you can also tune the data you're training on, as you mention. Adding more examples of the cases where you're getting false positives (with enough breadth to cover what you expect to see in practice) might help iron out those edge cases. There are some data augmentation options in both the Create ML app and Create ML Components that could help grow your sample size and add some diversity to it without collecting monumentally more data.
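The augmentation options mentioned above live in the Create ML app and Create ML Components themselves; the sketch below is only a framework-agnostic illustration of the same two ideas in plain Python: mining the samples a model wrongly calls "postcard", then multiplying them with label-preserving augmentations. Images are toy grayscale pixel grids, and every name here (`false_positives`, `augment`, `expand_hard_negatives`) as well as the ±40 brightness range is a hypothetical choice. It assumes you mine from a separate pool rather than from the set you report accuracy on, since reusing that set would make the reported number optimistic.

```python
import random
from typing import Callable, Iterable, List, Tuple, TypeVar

# Stand-in for real image data: a grayscale image as rows of 0-255 pixels.
Image = List[List[int]]
T = TypeVar("T")


def rotate_90(img: Image) -> Image:
    """Rotate clockwise by a quarter turn."""
    return [list(row) for row in zip(*img[::-1])]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Shift every pixel by `delta`, clamped to the valid 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def augment(img: Image, rng: random.Random, n: int = 4) -> List[Image]:
    """Make `n` label-preserving variants: random rotation plus brightness."""
    variants = []
    for _ in range(n):
        variant = img
        for _ in range(rng.randrange(4)):  # 0-3 quarter turns
            variant = rotate_90(variant)
        variants.append(adjust_brightness(variant, rng.randint(-40, 40)))
    return variants


def false_positives(
    samples: Iterable[Tuple[T, str]],
    predict: Callable[[T], str],
    target: str,
) -> List[Tuple[T, str]]:
    """Samples whose true label is not `target` but the model predicts `target`."""
    return [(x, y) for x, y in samples if y != target and predict(x) == target]


def expand_hard_negatives(
    mined: Iterable[Tuple[Image, str]], rng: random.Random, n: int = 4
) -> List[Tuple[Image, str]]:
    """Augment each mined false positive, keeping its *true* label."""
    return [(v, y) for x, y in mined for v in augment(x, rng, n)]
```

Rotation and brightness are used here because they don't change what the object is; a mirror flip, by contrast, would reverse any handwriting, which the camera would never actually see.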
Hard to say whether it's feasible without trying it on your data. But detecting text definitely sounds better suited if your subclasses are always distinguished by text.
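If the "written vs. empty" split really comes down to whether text is present, a text recognizer can stand in for a learned classifier at that step. On Apple platforms, Vision's `VNRecognizeTextRequest` is one possible source of recognized strings with confidences; the sketch below assumes you have already converted its results (or another OCR engine's) into plain (string, confidence) pairs. `has_writing`, `label_postcard`, the thresholds, and the `ignore` set for template text printed on every blank card (such as a "post card" heading) are all hypothetical, and how well the recognizer picks up handwriting is something to check on your own samples.

```python
from typing import AbstractSet, Iterable, Tuple

# One OCR observation: (recognized string, confidence in 0.0 ... 1.0).
Observation = Tuple[str, float]


def has_writing(
    observations: Iterable[Observation],
    min_confidence: float = 0.5,
    min_chars: int = 3,
    ignore: AbstractSet[str] = frozenset(),
) -> bool:
    """True if confident OCR output contains enough non-template characters."""
    total = 0
    for text, conf in observations:
        # Lowercase and collapse whitespace so template matching is robust.
        normalized = " ".join(text.lower().split())
        # Skip low-confidence reads and text printed on every blank card.
        if conf < min_confidence or normalized in ignore:
            continue
        total += len(normalized.replace(" ", ""))
    return total >= min_chars


def label_postcard(observations: Iterable[Observation], **kwargs) -> str:
    """Map the text check onto the two postcard subclasses."""
    return "postcard with text" if has_writing(observations, **kwargs) else "empty postcard"
```

Requiring a minimum character count, rather than any text at all, keeps a single misread speck or smudge from turning an empty postcard into a written one.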