Vision question: does a VNRectangleObservation contain information about the shape's full outline? For example: a document scanner that needs to fully extract a page from its background. VNDetectRectanglesRequest will provide the position of each corner of the page, which allows us to clip the shape assuming the page is flat. But if the paper is curled, we can end up with bits of background in our cropped image. Is there a way to trace an accurate outline for imperfect rectangles?

For imperfect documents, you might want to look at VNDetectDocumentSegmentationRequest and then combine it with contour detection on the globalSegmentationMask that you get as a result.

https://developer.apple.com/documentation/vision/vndetectedobjectobservation/3798796-globalsegmentationmask

It is a low-res pixel buffer that represents the shape of the detected document, where each pixel holds a confidence value for whether it is part of the document.
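Here is a rough sketch of how those two steps could be chained. Treat it as a starting point under a few assumptions rather than a definitive implementation: `documentOutline(in:)` is a made-up helper name, the mask is assumed to be bright where the document is (hence `detectsDarkOnLight = false`), the mask's pixel format is assumed to be one that CIImage and Vision will accept as input, and picking the top-level contour with the most points is just a simple heuristic for "the page".

```swift
import CoreGraphics
import CoreImage
import Vision

/// Hypothetical helper (not part of Vision): returns the outline of the first
/// detected document as a path in normalized image coordinates, or nil.
@available(iOS 15.0, macOS 12.0, *)
func documentOutline(in image: CGImage) throws -> CGPath? {
    // Step 1: ML-based document segmentation on the source image.
    let segmentation = VNDetectDocumentSegmentationRequest()
    let imageHandler = VNImageRequestHandler(cgImage: image, options: [:])
    try imageHandler.perform([segmentation])

    // The low-res confidence mask hangs off the returned observation.
    guard let document = segmentation.results?.first,
          let mask = document.globalSegmentationMask?.pixelBuffer else {
        return nil
    }

    // Step 2: contour detection on the mask itself.
    let contours = VNDetectContoursRequest()
    // Assumption: document pixels are bright (high confidence) on a dark background.
    contours.detectsDarkOnLight = false

    // Assumption: wrapping the mask in a CIImage is accepted for its pixel format.
    let maskHandler = VNImageRequestHandler(ciImage: CIImage(cvPixelBuffer: mask), options: [:])
    try maskHandler.perform([contours])

    // Heuristic: take the top-level contour with the most points as the page outline.
    let outline = contours.results?.first?.topLevelContours
        .max { $0.pointCount < $1.pointCount }
    return outline?.normalizedPath
}
```

The returned path is in normalized coordinates, so you would need to scale it to the source image size before using it as a clip mask. Because the mask is low resolution, the outline will be coarser than the original image, so some feathering or smoothing of the edge may be worthwhile.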

Document segmentation is ML-based and trained on all kinds of documents, labels, papers, etc. Rectangle detection is a traditional algorithm that looks for edges that intersect to form a quad.
