Core ML applies a machine learning algorithm to a set of training data to create a model. You use a model to make predictions based on new input data. Models can accomplish a wide variety of tasks that would be difficult or impractical to write in code. For example, you can train a model to categorize photos, or detect specific objects within a photo directly from its pixels. After you create the model, integrate it in your app and deploy it on the user’s device. Your app uses Core ML APIs and user data to make predictions and to train or fine-tune the model.
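As a minimal sketch of what that prediction step looks like in code (assuming the Resnet50 model used later in this project has been added to Xcode, which generates a typed Resnet50 class; the classify function itself is just illustrative):

import CoreML
import CoreVideo

// Minimal sketch: classify one frame with the Xcode-generated Resnet50 class.
func classify(_ pixelBuffer: CVPixelBuffer) throws -> String {
    let model = try Resnet50(configuration: MLModelConfiguration())
    // The generated API is typed: the "image" input is a CVPixelBuffer,
    // and "classLabel" is the most likely category.
    let output = try model.prediction(image: pixelBuffer)
    return output.classLabel
}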
You can build and train a model with the Create ML app bundled with Xcode. Models trained using Create ML are in the Core ML model format and are ready to use in your app. Alternatively, you can use a wide variety of other machine learning libraries and then use Core ML Tools to convert the model into the Core ML format. Once a model is on a user's device, you can use Core ML to retrain or fine-tune it on-device with that user's data.
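As a rough sketch of the training side, the CreateML framework (the programmatic counterpart of the Create ML app, macOS only) can train an image classifier in a few lines; the paths below are placeholders:

import CreateML
import Foundation

// Sketch only: train an image classifier from a folder whose subdirectories
// are named after the class labels, then save it as a .mlmodel file.
let trainingData = MLImageClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/path/to/TrainingImages"))
let classifier = try MLImageClassifier(trainingData: trainingData)
try classifier.write(to: URL(fileURLWithPath: "/path/to/MyClassifier.mlmodel"))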
Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption. Running a model strictly on the user’s device removes any need for a network connection, which helps keep the user’s data private and your app responsive.
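You normally leave this hardware scheduling to Core ML, but MLModelConfiguration lets you constrain the compute units if you need to; a small sketch:

import CoreML

// Sketch: .all (the default) lets Core ML use CPU, GPU, and Neural Engine;
// restricting to the CPU can be handy for deterministic debugging.
let config = MLModelConfiguration()
config.computeUnits = .all
// config.computeUnits = .cpuOnly // uncomment to force CPU-only execution
let model = try? Resnet50(configuration: config)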
Core ML is the foundation for domain-specific frameworks and functionality. Core ML supports Vision for analyzing images, Natural Language for processing text, Speech for converting audio to text, and Sound Analysis for identifying sounds in audio. Core ML itself builds on top of low-level primitives like Accelerate and BNNS, as well as Metal Performance Shaders.
The Vision framework performs face and face-landmark detection, text detection, barcode recognition, image registration, and general feature tracking. Vision also allows the use of custom Core ML models for tasks like classification or object detection.
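Many of these analyses need no custom model at all; here is a sketch of one of Vision's built-in requests (the countFaces function is illustrative):

import Vision
import CoreVideo

// Sketch: count the faces in a frame with Vision's built-in face detector.
func countFaces(in pixelBuffer: CVPixelBuffer) throws -> Int {
    let request = VNDetectFaceRectanglesRequest()
    try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        .perform([request])
    return request.results?.count ?? 0
}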
Real-time camera object detection with machine learning: a basic introduction to Core ML, Vision, and ARKit.
In our project there are two important functions that we need to understand:
- loadCameraAndPreview
- captureOutput
loadCameraAndPreview
// SETTING UP THE CAMERA FOR RECOGNITION USING AVCaptureSession
private func loadCameraAndPreview() {
    let captureSession = AVCaptureSession() // Creating the capture session
    captureSession.sessionPreset = .photo // Setting the capture preset style
    guard let captureDevice = AVCaptureDevice.default(for: .video) else { return } // Default video capture device (the back camera)
    guard let input = try? AVCaptureDeviceInput(device: captureDevice) else { return } // Setting up the capture input from the device
    captureSession.addInput(input) // Adding the input to the capture session
    captureSession.startRunning() // Starting the capture session

    let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession) // Attaching the capture session to a preview layer
    view.layer.addSublayer(previewLayer) // Adding the preview layer to the view so the feed is displayed on screen
    previewLayer.frame = view.frame

    // Capturing the data from each video frame and setting the delegate.
    let dataOutput = AVCaptureVideoDataOutput()
    dataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
    captureSession.addOutput(dataOutput)
}
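Note: for setSampleBufferDelegate(self, queue:) to compile, the class containing this function must conform to AVCaptureVideoDataOutputSampleBufferDelegate, and the app's Info.plist must contain an NSCameraUsageDescription entry, otherwise iOS terminates the app as soon as the camera starts.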
captureOutput
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Image: the pixel buffer holding the current video frame
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    // Model: the Core ML model wrapped in a Vision container
    guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return }
    // Request: run the model on the frame and read the top classification
    let request = VNCoreMLRequest(model: model) { finishRequest, error in
        guard let results = finishRequest.results as? [VNClassificationObservation] else { return }
        guard let observation = results.first else { return }
        DispatchQueue.main.async {
            self.identifierLabel.text = "\(observation.identifier) \(observation.confidence * 100)"
        }
    }
    // Handler: perform the request on the pixel buffer
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}
captureOutput is a delegate method that is called every time the camera captures a frame. In this function we set up our model and the request handler for object detection.
Let's start with pixelBuffer. It stores the image we get from CMSampleBufferGetImageBuffer, which returns an image buffer containing the frame's media data. That media data comes from sampleBuffer, which is a CMSampleBuffer (an object that contains zero or more media samples of a uniform media type). sampleBuffer arrives in this delegate method because we registered the delegate with setSampleBufferDelegate(_:queue:) in loadCameraAndPreview; that call sets the delegate that accepts captured buffers and the dispatch queue on which the delegate is called.
Then we have our model, of type VNCoreMLModel, which is a container for a Core ML model to use with a Vision request. We use Resnet50, an image classification model, and I have also added SqueezeNet, another image classification model that is lighter. We will see later in this project why I added two models. 😁 Download Models
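For reference, switching models is a one-line change, because every Xcode-generated model class exposes the same .model property; a sketch with SqueezeNet:

// Sketch: swap the heavier Resnet50 for the lighter SqueezeNet.
guard let model = try? VNCoreMLModel(for: SqueezeNet().model) else { return }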
Now comes the most important part: our request, which is a VNCoreMLRequest. A VNCoreMLRequest is, in simple words, a Vision request: an image analysis request that uses a Core ML model to process images. Its completion handler provides us with finishRequest and error. We cast the results of finishRequest to [VNClassificationObservation], the object that represents the classification information an image analysis request produces. Finally we take the first observation; its identifier is the classification label identifying the type of observation.
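The results array is ordered by confidence, highest first, so reading more than the top result is easy; a small hypothetical helper you could call from the completion handler:

import Vision

// Sketch: format the top three classifications instead of only the first.
// Each VNClassificationObservation carries an identifier (the label) and
// a confidence between 0 and 1.
func topLabels(from results: [VNClassificationObservation]) -> String {
    results.prefix(3)
        .map { "\($0.identifier) \(Int($0.confidence * 100))%" }
        .joined(separator: ", ")
}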
This request is handled by VNImageRequestHandler, an object that processes one or more image analysis requests pertaining to a single image. Here we create the handler with our pixelBuffer, which is a CVPixelBuffer, and ask it to perform our Vision request.
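One performance note: as written, captureOutput builds a new VNCoreMLModel for every frame. A common refinement (a sketch, not part of the original project) is to create it once as a property of the view controller and reuse it:

// Sketch: create the Vision model once and reuse it on every frame,
// instead of rebuilding it inside captureOutput.
lazy var visionModel: VNCoreMLModel? = {
    guard let mlModel = try? Resnet50(configuration: .init()).model else { return nil }
    return try? VNCoreMLModel(for: mlModel)
}()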
Warning Solutions
If you are getting the warning "'init()' is deprecated: Use init(configuration:) instead and handle errors appropriately.", you can use this code:
guard let mlModel = try? Resnet50(configuration: .init()).model,
      let model = try? VNCoreMLModel(for: mlModel) else { return }
If you get a main-thread warning when starting the AVCaptureSession (startRunning() blocks while the session starts up), you can start it on a background queue:
let backgroundQueue = DispatchQueue(label: "background_queue", qos: .background)
backgroundQueue.async {
    captureSession.startRunning()
}
App Crash Solutions
Yes! Sometimes the app may crash with the message "Message from debugger: Terminated due to memory issue". There are two ways of solving this. First, you can use the lighter SqueezeNet model. Second, you can reduce the frame rate of the capture device. For that, you should first check the frame rates your device supports:
captureDevice.activeFormat.videoSupportedFrameRateRanges
and then you can set your frame rate with:
captureDevice.activeVideoMaxFrameDuration
captureDevice.activeVideoMinFrameDuration
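Putting these together, here is a sketch of capping the camera at roughly 15 fps; it would go in loadCameraAndPreview after the device is obtained, and the exact supported ranges vary by device, so check videoSupportedFrameRateRanges first:

// Sketch: limit the capture device to ~15 fps to reduce memory and CPU
// pressure. The device must be locked before changing its configuration.
do {
    try captureDevice.lockForConfiguration()
    let frameDuration = CMTime(value: 1, timescale: 15) // 1/15 s per frame
    captureDevice.activeVideoMinFrameDuration = frameDuration
    captureDevice.activeVideoMaxFrameDuration = frameDuration
    captureDevice.unlockForConfiguration()
} catch {
    print("Could not lock the capture device: \(error)")
}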
Thank you. Happy Learning!!!