With iOS 11, Apple introduced a new framework called Vision.
The Vision framework performs face and face landmark detection, text detection, barcode recognition, image registration, and general feature tracking. Vision also allows the use of custom Core ML models for tasks like classification or object detection.
https://developer.apple.com/documentation/vision
Today we will implement, in a few lines of code, one of the simplest features of this beautiful framework: an OCR reader. The text recognition and document camera APIs used below require iOS 13 or later.
OCR stands for Optical Character Recognition. If you want to learn more, Wikipedia has a good overview: https://en.wikipedia.org/wiki/Optical_character_recognition
Let’s start!
Create a new empty Xcode project and a simple interface like this beautiful one:
It is composed of a UIImageView, a UITextView, and a UIButton.
Connect the outlets (imgDocument and txtRecognizedText in the code below) and the button action, and get ready to code!
Import Vision framework
The first and simplest step is to import the frameworks. At the top of your view controller, add:
import UIKit
import Vision
import VisionKit
Show the DocumentCameraViewController
Next, as a first test, present the new VNDocumentCameraViewController, which helps you capture a document from any angle!
Attach the code to a button action (since it uses the camera, the app also needs an NSCameraUsageDescription entry in Info.plist):
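@IBAction func didScanPressed(_ sender: Any) {
    // isSupported is false where document scanning is unavailable
    guard VNDocumentCameraViewController.isSupported else { return }
    let scanVC = VNDocumentCameraViewController()
    scanVC.delegate = self
    present(scanVC, animated: true)
}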
Remember to implement the delegate methods:
extension ViewController: VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan)
    {
        guard scan.pageCount >= 1 else {
            controller.dismiss(animated: true)
            return
        }
        // Only the first scanned page is used in this tutorial
        let firstPage = scan.imageOfPage(at: 0)
        imgDocument.image = firstPage
        processImage(firstPage)
        controller.dismiss(animated: true)
    }
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error)
    {
        // Log the failure instead of dropping it silently
        print("Document scan failed: \(error.localizedDescription)")
        controller.dismiss(animated: true)
    }
    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }
}
VNDocumentCameraViewController is your new camera view controller.
It recognizes documents automatically and lets the user adjust colors, crop, resize, align, and much more… and the picture is taken automatically!
Cool… but now we want to read the text of our documents!
Implement the Text Request
Start by declaring the request as a property of the view controller:
private var ocrRequest = VNRecognizeTextRequest(completionHandler: nil)
Now we need to configure our request.
VNRecognizeTextRequest has several options you can set:
- recognitionLanguages
- customWords
- recognitionLevel
- usesLanguageCorrection
- minimumTextHeight
- etc.
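As a rough illustration of the options listed above, here is a minimal sketch of a request tuned for speed rather than accuracy. The function name makeQuickScanRequest and every value in it are assumptions chosen for the example, not settings used elsewhere in this tutorial.

```swift
import Vision

/// Builds a text request tuned for speed (hypothetical helper; all values are illustrative).
func makeQuickScanRequest() -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest(completionHandler: nil)
    // .fast trades accuracy for speed; .accurate is the other level
    request.recognitionLevel = .fast
    // Language correction must be enabled for customWords to have any effect
    request.usesLanguageCorrection = true
    // Languages in priority order
    request.recognitionLanguages = ["en-US"]
    // Extra vocabulary that supplements the language model
    request.customWords = ["VisionKit", "OCR"]
    // Ignore text shorter than 5% of the image height
    request.minimumTextHeight = 0.05
    return request
}
```

A request built this way can be passed to a VNImageRequestHandler just like the ocrRequest property; it only lacks a completion handler, so you would read its results after perform returns.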
Set up the OCR
private func setupOCR() {
    // Create the OCR request; [weak self] avoids a retain cycle, since self owns the request
    ocrRequest = VNRecognizeTextRequest { [weak self] (request, error) in
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            return
        }
        var ocrText = ""
        for observation in observations {
            // Take the best candidate; skip observations that have none
            guard let topCandidate = observation.topCandidates(1).first else {
                continue
            }
            ocrText += topCandidate.string + "\n"
        }
        DispatchQueue.main.async {
            // Show the recognized text
            self?.txtRecognizedText.text = ocrText
        }
    }
    // We want accurate recognition
    ocrRequest.recognitionLevel = .accurate
    // Correct likely misspellings
    ocrRequest.usesLanguageCorrection = true
    // Our languages, in priority order (availability depends on the OS version)
    ocrRequest.recognitionLanguages = ["it-IT", "es-ES", "en-US", "en-GB", "fr-FR", "de-DE"]
}
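Not every language in a priority list is necessarily available on every OS version or recognition level. The following is a minimal sketch of one way to keep only the usable ones. The helpers usableLanguages and applyLanguages are hypothetical names introduced for this example, and the sketch assumes the instance method supportedRecognitionLanguages(), which to my knowledge is available from iOS 15.

```swift
import Vision

/// Keeps the preferred languages that appear in `supported`, preserving priority order.
/// (Hypothetical helper, not part of Vision.)
func usableLanguages(preferred: [String], supported: [String]) -> [String] {
    let supportedSet = Set(supported)
    return preferred.filter { supportedSet.contains($0) }
}

/// Applies the filtered list to a request (hypothetical helper, not part of Vision).
@available(iOS 15.0, macOS 12.0, tvOS 15.0, *)
func applyLanguages(_ preferred: [String], to request: VNRecognizeTextRequest) {
    // Ask the request which languages it supports in its current configuration
    guard let supported = try? request.supportedRecognitionLanguages() else { return }
    let usable = usableLanguages(preferred: preferred, supported: supported)
    // Leave the request's existing languages untouched if nothing matches
    if !usable.isEmpty {
        request.recognitionLanguages = usable
    }
}
```

Set recognitionLevel before calling applyLanguages, since the supported list can differ between the fast and accurate levels.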
Process the image
Last but not least, we need to analyze the captured image.
In the delegate above we called the processImage() function; here is its code:
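private func processImage(_ image: UIImage) {
    guard let cgImage = image.cgImage else {
        return
    }
    // Clear the previous result (UI updates must run on the main queue)
    DispatchQueue.main.async {
        self.txtRecognizedText.text = ""
    }
    let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // perform(_:) is synchronous, so run it off the main thread
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            // Run the OCR request we configured in setupOCR()
            try requestHandler.perform([self.ocrRequest])
        } catch {
            print("Text recognition failed: \(error.localizedDescription)")
        }
    }
}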
We clear the text view and perform the OCR request once the image has been captured.
For this tutorial, we skip handling multiple pages and consider only the first one.
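If you do want every page, the scan object exposes them all through pageCount and imageOfPage(at:). Here is a minimal sketch of the loop, assuming you simply want the pages as an array. collectPages and allPages are hypothetical helper names, not part of VisionKit; the generic helper is split out only so the loop can be tried without a camera.

```swift
import Foundation

/// Collects `count` pages by asking `pageAt` for each index in order.
/// (Hypothetical helper, generic so it can be exercised without a real scan.)
func collectPages<Page>(count: Int, pageAt: (Int) -> Page) -> [Page] {
    return (0..<count).map { pageAt($0) }
}

#if canImport(UIKit) && canImport(VisionKit)
import UIKit
import VisionKit

/// Returns every page of a document scan as an image, in scan order.
func allPages(of scan: VNDocumentCameraScan) -> [UIImage] {
    return collectPages(count: scan.pageCount) { scan.imageOfPage(at: $0) }
}
#endif
```

You could then call processImage for each returned image. Note that the completion handler in setupOCR() replaces the text view's content on every run, so it would need to append instead if you want the text of all pages.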
Now everything is in place; remember to call setupOCR() during setup, for example in viewDidLoad().
Final result
The full text is recognized automatically, and you can read, edit, share, or do whatever you want with it:
Now add some UI, storage, and pay-per-use pricing, and you have created one of the 1,000,000 apps on the App Store that scan and read documents 😀.
Enjoy scanning!