Wednesday, 11 May 2016

CaptionBot

Want captions for your photos? Microsoft’s new tool can help

What is it?
Microsoft’s new photo-recognition website that ‘reads’ images then provides suitable captions describing the scene. You upload a photo at www.captionbot.ai, wait for Microsoft to analyse it, then mark the resulting caption out of five for accuracy. It’s the company’s latest attempt of many to demonstrate how its artificial-intelligence (AI) software can understand the content of a photo without human help.


How does it work?
By combining three of Microsoft’s application programming interfaces (APIs), which are rules that tell one piece of software how to interact with another. With CaptionBot, Microsoft’s Computer Vision API identifies the “components” of a photo (trees, people, animals and so on), then combines this result with data from the Bing Image Search API. Finally, the Emotion API determines a person’s facial expression so CaptionBot can guess how they are feeling (sad, happy etc). The APIs learn from their mistakes to improve the accuracy of the results.
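For the curious, here is a rough sketch in Python of how three such stages might be chained together. It is only an illustration: it does not call Microsoft's real services, and every name in it (Photo, vision_stage, search_stage, emotion_stage, caption) is made up for this example. The 'photo' is a list of ready-made labels rather than real pixels, and the lookup table simply stands in for an image search.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Photo:
    # Hypothetical stand-in for an uploaded image. A real service would
    # receive pixels; here the labels are supplied by hand.
    objects: List[str]
    expression: Optional[str] = None


def vision_stage(photo: Photo) -> List[str]:
    """Stand-in for a vision service: list the components of the photo."""
    return list(photo.objects)


def search_stage(components: List[str]) -> str:
    """Stand-in for an image search: match components to a known scene."""
    known_scenes = {
        ("bird", "rocks"): "a black bird standing on a rocky surface",
    }
    fallback = "a picture of " + " and ".join(components)
    return known_scenes.get(tuple(components), fallback)


def emotion_stage(photo: Photo) -> Optional[str]:
    """Stand-in for an emotion service: report a facial expression, if any."""
    return photo.expression


def caption(photo: Photo) -> str:
    """Chain the three stages into a single caption."""
    components = vision_stage(photo)
    if not components:
        # Nothing recognised, so give up politely.
        return "I really can't describe the picture"
    text = search_stage(components)
    mood = emotion_stage(photo)
    if mood:
        text += f", and they seem {mood}"
    return "I think it's " + text


print(caption(Photo(["bird", "rocks"])))
print(caption(Photo(["two people", "a shop"], expression="happy")))
print(caption(Photo([])))
```

Each stage only passes its result to the next, which is the point of an API: the captioning code does not need to know how the vision or emotion stage reaches its answer.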

How good are they at the moment?
Not bad, but far from perfect. CaptionBot’s boast that it can “understand the content of any image” has been put to the test by thousands of people online, with mixed results. There are many examples of CaptionBot getting it wrong. It thought Michelle Obama was a mobile phone, mistook a human eye for a doughnut, and gave up completely on a Dalek (“I really can’t describe the picture”). Microsoft responded by saying that it’s “early days” for CaptionBot, but we actually think it shows a lot of promise. It certainly impressed us with its captions for the photos we uploaded.

Which were...?
The three on this page. CaptionBot got off to a good start, saying that a rockhopper penguin was “a black bird standing on a rocky surface” (image 1). Chris Packham would have wanted a more specific answer, but it satisfied us (four stars). In the second photo it identified the “large body of water”, but was silent on the stunning sunset (three stars). And it finished on a high by recognising two “happy” people in a shop (five stars). We’d have given it 50 stars had it recognised the shop as a gingerbread shop in the Lake District.

What would I actually use it for?
Not much, at the moment. CaptionBot itself is little more than a party trick, albeit a fascinating one. But the technology behind it is already appearing in programs and apps that try to understand what’s in a photo. Microsoft has done a great job publicising its potential, having already released five similar tools, starting last year with ‘How old do I look?’ (www.how-old.net). Simply upload a photo of yourself and see whether Microsoft can guess your age.

In February, Microsoft broadened its scope from humans to canines with the ‘What Dog?’ tool (www.what-dog.net), which can identify over 100 breeds from photos. It’s also available as a free iOS app called Fetch: www.snipca.com/20312.

Does Microsoft keep the photos I upload?
Yes, to help CaptionBot learn from its mistakes. There’s probably nothing to worry about, but to be on the safe side perhaps you shouldn’t upload any embarrassing holiday photos. You don’t need to submit any personal details, so it’s private and fast to use.

Why is Microsoft doing all this?
Why does it ever do anything? To make money. It hopes to sell the technology to developers of programs and apps. It’s not alone: Google recently revealed PlaNet, a system that has been ‘taught’ to recognise the location of a photo anywhere in the world. These photo-analysing methods should mean that websites and software work faster in the future, as long as they don’t have to identify enemies of Doctor Who and First Ladies of the US.