Alternatively, you could browse a list of the people who work in our office, select one, type in your name, and the iPad would send that person a message (again through Slack) that you had indeed arrived. This was a more data-rich interaction, but it required more touch points on the display.
Touch points become a cause for concern
When COVID-19 hit, there were a lot of initial fears about the virus lingering on surfaces, which meant that touch-based input methods, like touch screens, would need to be cleaned a lot more frequently, and preferably avoided altogether.
Most of our commonly used input methods rely heavily on touch: touchscreens, keyboards, mice, keypads, joysticks, game controllers, remote controls, standalone buttons, you name it. All touch-based.
But there are other options out there as well when you put your imagination to work. The iPad has a front-facing camera, which sees the person that is interacting with it. So, we thought, let’s use an expression.
And what expression would you rather have your visitors show? Smiles.
How did we do it?
- Camera grabs frames continuously
- Images are processed to answer the following: Is there a person in the picture and do they appear to be smiling?
- If a person is smiling for approximately 2 consecutive seconds, we consider the doorbell pressed.
Grabbing the frame
Initializing the camera and grabbing the camera frames is a bit clunky, to be honest. I couldn’t find a nice Swifty API for it, so it’s all fairly legacy, but it works. There’s quite a bit of code involved, but I’ll show the CaptureManager here, which is responsible for most of it.
The CaptureManager runs its camera sample buffer handling on a dedicated DispatchQueue, set to the user-initiated QoS level to ensure it gets enough processing priority. The camera FPS is set to 30 here.
import AVFoundation
import UIKit

// Assumed definitions: the original post doesn't show the delegate protocol, the capture queue or the FPS constant
protocol CaptureManagerDelegate: AnyObject {
    func processCapturedImage(image: UIImage)
}
let CAMERA_FPS: Double = 30
let avQueue = DispatchQueue(label: "CaptureManager.avQueue", qos: .userInitiated)

// Despite singletons being evil, this class is used as a singleton
class CaptureManager: NSObject {
    internal static let shared = CaptureManager()
    weak var delegate: CaptureManagerDelegate?
    var session: AVCaptureSession?
    var device: AVCaptureDevice?

    override init() {
        super.init()
        session = AVCaptureSession()
        // Setup camera, choosing the front-facing wide-angle camera here
        if let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front) {
            self.device = device
            // Debug: log the frame rate ranges the active format supports
            print(device.activeFormat.videoSupportedFrameRateRanges)
            let input = try! AVCaptureDeviceInput(device: device)
            session?.addInput(input)
            // Initialize output, setting the appropriate buffer format
            let output = AVCaptureVideoDataOutput()
            output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
            output.setSampleBufferDelegate(self, queue: avQueue)
            session?.addOutput(output)
        }
    }

    func startSession() {
        avQueue.async {
            self.session?.startRunning()
            // set(frameRate:) is a custom AVCaptureDevice extension, sketched below
            self.device?.set(frameRate: CAMERA_FPS)
        }
    }

    func stopSession() {
        avQueue.async {
            self.session?.stopRunning()
        }
    }

    func getImageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage? {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return nil
        }
        // Lock the pixel buffer and wrap its raw BGRA bytes in a CGContext to produce a UIImage
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
        let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
        let colorSpace = CGColorSpaceCreateDeviceRGB()
        let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue)
        guard let context = CGContext(data: baseAddress, width: width, height: height, bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo.rawValue), let cgImage = context.makeImage() else {
            CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
            return nil
        }
        // Rotate to portrait, since the front camera delivers landscape-oriented buffers
        let image = UIImage(cgImage: cgImage, scale: 1, orientation: .right)
        CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
        return image
    }
}

extension CaptureManager: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // Get an image from the sample buffer and pass it to the delegate for processing
        guard let image = getImageFromSampleBuffer(sampleBuffer: sampleBuffer) else { return }
        delegate?.processCapturedImage(image: image)
    }
}
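One note on the snippet above: set(frameRate:) isn’t part of AVFoundation, so it relies on a small AVCaptureDevice extension that isn’t shown here. A minimal sketch of such an extension, assuming it simply validates the requested rate against the active format and sets the frame duration accordingly:

import AVFoundation

// Hypothetical sketch of the set(frameRate:) helper used above; the original extension isn't shown in the post
extension AVCaptureDevice {
    func set(frameRate: Double) {
        // Only apply frame rates the active format actually supports
        guard let range = activeFormat.videoSupportedFrameRateRanges.first,
              frameRate >= range.minFrameRate, frameRate <= range.maxFrameRate else {
            print("Unsupported frame rate: \(frameRate)")
            return
        }
        do {
            try lockForConfiguration()
            let duration = CMTime(value: 1, timescale: CMTimeScale(frameRate))
            activeVideoMinFrameDuration = duration
            activeVideoMaxFrameDuration = duration
            unlockForConfiguration()
        } catch {
            print("Could not lock the device for configuration: \(error)")
        }
    }
}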
Image processing
In a real-time solution, it’s essential that the image processing is done on-device. On our previous-generation iPad, we noticed that smile detection couldn’t keep up with the camera’s 30 FPS; the processing power of this iPad topped out at around 8 FPS, so we discard the rest of the frames the camera produces.
You might ask: why not just drop the camera frame rate then?
Well, in our UI experimentation we’ve been on-and-off about the idea of displaying the camera stream in the UI, and at 8 FPS that would be really clunky. So, with the current solution, we have the flexibility of showing a smooth 30 FPS stream on the screen while only processing frames at 8 Hz.
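The frame-dropping itself isn’t shown in the post; one straightforward way to do it, sketched here with hypothetical names (PROCESSING_FPS, lastProcessedAt, detectSmile) inside the processCapturedImage(image:) delegate callback, is to skip any frame that arrives less than an eighth of a second after the last one we processed:

import UIKit

// Hypothetical throttle: process at most PROCESSING_FPS frames per second,
// dropping the rest of the 30 FPS the camera delivers
let PROCESSING_FPS: Double = 8
var lastProcessedAt = Date.distantPast

func detectSmile(in image: UIImage) {
    // Runs the CIDetector-based smile check shown in the next snippet
}

func processCapturedImage(image: UIImage) {
    let now = Date()
    // Drop the frame if not enough time has passed since the last processed one
    guard now.timeIntervalSince(lastProcessedAt) >= 1.0 / PROCESSING_FPS else { return }
    lastProcessedAt = now
    detectSmile(in: image)
}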
The image processing and smile detection are done on a separate DispatchQueue to keep the user interface running smoothly during processing. The UI keeps the user updated about what’s going on with a nice animation.
The API that we use to check for faces and smiles is Apple’s own Core Image, which offers a very simple “hasSmile” boolean when the CIDetector is run with the correct parameters.
DispatchQueue.global(qos: .userInitiated).async {
    // Process the frame.
    // faceDetector is initialised as
    // CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])!
    let smileDetected = self.faceDetector
        .features(in: CIImage(cgImage: image.cgImage!), options: [CIDetectorSmile: true])
        .reduce(false, { result, face in
            // A single smiling face is enough to count the frame as "smiling"
            if let face = face as? CIFaceFeature {
                return face.hasSmile || result
            }
            return result
        })
    // Update smiley states
    if smileDetected {
        self.incrementSmile(image: image)
    } else {
        self.decrementSmile(image: image)
    }
}
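The incrementSmile and decrementSmile helpers aren’t shown here, but the idea from the steps listed earlier is a simple counter: at roughly 8 processed frames per second, about 16 consecutive smiling frames correspond to the 2-second threshold. A minimal sketch, with hypothetical names (SMILE_FRAMES_REQUIRED, ringDoorbell):

import UIKit

// Hypothetical counter for the "smile for about 2 seconds" rule.
// At roughly 8 processed frames per second, 16 consecutive smiling frames is about 2 seconds
let SMILE_FRAMES_REQUIRED = 16
var smileCounter = 0

func ringDoorbell(image: UIImage) {
    // Trigger the Slack notification described in the next section
}

func incrementSmile(image: UIImage) {
    smileCounter += 1
    if smileCounter >= SMILE_FRAMES_REQUIRED {
        smileCounter = 0
        ringDoorbell(image: image)
    }
}

func decrementSmile(image: UIImage) {
    // A non-smiling frame decays the counter so brief glances don't add up to a doorbell press
    smileCounter = max(0, smileCounter - 1)
}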
Slack notification
Finally, when we determine there’s been an intentional smile at the camera, we trigger the Slack notification.
If you’ve ever built a Slack integration that posts messages, this will probably look very familiar. There’s a custom Slack app that we’ve built that you can add to the channels you wish to use. We then use the Slack API’s files.upload and chat.postMessage endpoints to upload an image of the visitor and post a message that they have arrived.
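The request code itself isn’t shown in the post, but as a rough sketch, the chat.postMessage half could look something like this; the bot token, channel ID and function name are placeholders, and the files.upload image upload is left out for brevity:

import Foundation

// Hypothetical sketch of posting the arrival message; the token and channel are placeholders
func postArrivalMessage(completion: @escaping (Error?) -> Void) {
    var request = URLRequest(url: URL(string: "https://slack.com/api/chat.postMessage")!)
    request.httpMethod = "POST"
    request.setValue("Bearer xoxb-your-bot-token", forHTTPHeaderField: "Authorization")
    request.setValue("application/json; charset=utf-8", forHTTPHeaderField: "Content-Type")
    let payload: [String: Any] = [
        "channel": "C0123456789",
        "text": "Someone is smiling at the front door!"
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: payload)
    URLSession.shared.dataTask(with: request) { _, _, error in
        completion(error)
    }.resume()
}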
The dramatic plot twist!
Quite ironically, while COVID was one of the catalysts for this project, it also represents one of the challenges for this technology.
Until image processing gets better at detecting a smile around the eyes, masks effectively prevent the code above from detecting the visitor’s smile. So, the visitor will need to remove their mask for a moment or fall back to the touch interface.