Ah, remember the good old days of searching the Internet…the days when you had to type your search request into a search engine text box?  So quaint…so 2009.  Now, thanks to Google Goggles, it is possible for your mobile phone’s camera to take a picture of an object and use that image itself to search Google.

Have you seen van Gogh’s The Starry Night while visiting New York’s Museum of Modern Art?  Just snap a picture with your Android mobile phone; Google Goggles will submit it to Google, which will recognize the painting and return an array of search data on it.  Standing in front of the Eiffel Tower and want to know more?  No problem, just snap a shot and Google will identify the Paris icon and tell you all about it.

Of course, Google Goggles does not just work on famous objects.  Discovered a new local pizzeria and want to know more before eating there?  Hold your mobile phone’s camera up to the restaurant and Google will use the image and your GPS location to provide reviews, menu items and the restaurant’s phone number.  Snap a shot of a book cover or a bottle of wine and Google will provide comparison pricing information and reviews on the items.

Welcome to the world of “seeing computers,” long ago predicted and popularized in Stanley Kubrick’s 1968 film classic 2001: A Space Odyssey.  The film tells the story of a villainous computer known as the HAL 9000, which attempts to kill the protagonist astronaut.  In the epic battle between man and machine, the HAL 9000 “watches” the astronaut via its omnipresent oversized red eye and uses the visual data obtained to prevent the astronaut from disabling the murderous super-computer.

Computers that can “see” are distinct from previous image capturing devices.  While it is nice to take a picture, it is a whole other story for a digital camera to “understand” what its lens is seeing.  Pairing a digital lens with a back-end intelligent network, such as the Internet, allows for a whole new world of possibilities, whether for good or ill.   Whether the lens in question is on a camera-enabled mobile phone, an Internet-connected SLR camera or part of a closed-circuit television (CCTV) system, the pairing of digital images and video with back-end databases will undoubtedly change all facets of our society, including crime and policing.

Widespread government use of CCTV systems has been around for several decades, most particularly in the United Kingdom, where it was estimated there were at least 60,000 cameras owned or controlled by local governments alone as of 2009.  Previously, CCTV systems were “dumb” cameras that merely took pictures, which then had to be reviewed by human beings in order to discover items of interest, such as a street robbery, terrorist act or a wanted criminal.  The days of dumb cameras, however, are long behind us, and now even the cheapest of these devices, when connected via the appropriate software and back-end databases, is a powerful tool in either crime fighting or Orwellian surveillance, depending on one’s point of view.

CCTV systems are capable of doing facial recognition in an effort to target known criminals.  At Super Bowl XXXV in January 2001, police in Tampa, Florida, used Identix’s facial recognition software, FaceIt, to search for potential criminals and terrorists in attendance at the event.  The system purportedly found 19 people with pending arrest warrants.

License plates can also be tracked as vehicles move throughout cities at the rate of a car a second, even when moving at speeds up to 100 miles per hour.  As vehicles pass before smart cameras using optical character recognition (OCR) technology, their license plates are compared against police and motor vehicle records in an effort to find “wanted” autos or their drivers.
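The comparison step described above can be pictured with a short sketch. It assumes the OCR stage has already produced a text string for the plate, and it uses a made-up "hotlist" in place of real police or motor vehicle records. The function names and plate values are hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical hotlist: normalized plate -> reason it is wanted.
# Real systems would query police and motor vehicle databases instead.
HOTLIST: Dict[str, str] = {
    "ABC1234": "reported stolen",
    "XYZ987": "registered owner has an outstanding warrant",
}


def normalize_plate(raw: str) -> str:
    """Uppercase the OCR text and drop spaces, dashes and other punctuation."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def check_plate(ocr_text: str, hotlist: Dict[str, str]) -> Optional[str]:
    """Return the reason a plate is wanted, or None if it is not on the list."""
    return hotlist.get(normalize_plate(ocr_text))


if __name__ == "__main__":
    for reading in ["abc-1234", "QQQ 111"]:
        reason = check_plate(reading, HOTLIST)
        status = reason if reason else "no match"
        print(f"{reading!r}: {status}")
```

Normalizing before the lookup matters because OCR output varies in spacing and case; a dictionary lookup keeps each check fast enough for a stream of passing cars.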

As these technologies drop in price, there will be opportunities for the general public to adopt and use them as well.  Wondering if the guy walking up your driveway with the pizza box really does work at Domino’s?  Use your home security system to photograph and identify him before he even gets a chance to ring your doorbell.  Why not?  After all, you own the camera, the information is publicly available on the net and the person photographed was on your property.

When cameras can see, even your own trusted cameras, the ones in your home and on your person, can be used to report information on their rightful owners, usually without their permission or knowledge.  Most modern laptops include video cameras and it has long been difficult to find a cell phone that did not include one as well.   For years, a long list of malware has been capable of remotely activating a desktop or laptop computer’s video camera.  Such malware-compromised machines provide an open door through which any transnational criminal can walk into an end-user’s home or office and observe the victim’s full range of activities.  The criminal opportunities are noteworthy, ranging from the creation of a new 21st century “peeping Tom,” to watching people type in their banking passwords, to recording the latest confidential sales meeting in order to steal trade secrets.

Of course, such software need not only be deployed by criminal hackers.  In early 2010, much media coverage was devoted to a small school district in Pennsylvania (USA) that had installed a “remote administration tool” allowing the cameras on laptops provided to students to be activated surreptitiously, without the knowledge of the students or their parents.  The cameras were reportedly activated on at least 42 occasions and photographed a number of students without consent.  The fact that the school district officials claimed they only installed the programs in an effort to locate lost or stolen laptops did not deter a federal lawsuit against school officials and an investigation of the matter by the FBI.

Perhaps the primary “game-changer” in this realm has been the marriage between cameras and mobile phones.   Now that programs such as Google Goggles have brought object recognition to the mobile handset, how long will it be before another Google product, the Picasa digital image organizer, integrates its free facial recognition software into Android mobile handsets?  Facial recognition has also been incorporated into a host of other consumer products, including Apple’s iPhoto and, notably, the social networking site Facebook.  With over 400 million members, Facebook may be the largest commercial database of human beings on the planet.  To fully appreciate the convergence and power of these technologies, merely consider the possibilities of a mobile phone’s camera using freely available facial recognition software to conduct a query against a massive database of photos, such as those on Facebook.

So how might this technology be used?  Well, with nearly 5 billion mobile phones on the planet, cameras have become absolutely ubiquitous.  Snap a shot of somebody with your mobile and soon you will be able to identify them.  Like the woman sitting across the aisle from you on the subway?  Just take her picture with your Android phone and find out who she is.  For good or ill, that means an end to anonymity on the high street or elsewhere.

Google Goggles is not the HAL 9000.  In fact, Google Goggles is a useful system and cool technology that will likely change yet again the nature of Internet search, along with other technologies such as the semantic web.  Additional companies are of course making object recognition and “visual search” commonplace on other platforms, such as the iPhone, which supports a program known as oMoby that allows the iPhone to see just as Google Goggles does on the Android operating system.

Naturally, as all existing cameras transition from “dumb” picture-taking devices to “smart” seeing objects, the changes in society will be notable.  It is not only the ability to see that will contribute to fundamental social change, but also the vast proliferation of cameras in neighborhoods, offices, cellphones, music players, portable gaming consoles and automobiles, to name a few.  As noted above, these changes are already having an impact on crime and law enforcement.  As the proliferation of smart cameras accelerates, so too will their use by both criminals and police.

See also:

Machines That Can See

Computing: Advances in computer-vision software are begetting a host of new ways for machines to view the world

By The Economist

March 7, 2009

ENSURING that employees wear warm smiles when helping customers is good business—but no easy task, even for attentive managers. Omron Corporation, a Japanese developer of robotics software, is concocting a solution. Its software can analyse digital images, including video, to recognise and classify facial expressions. Soon the company will start selling a “smile measurement” system that will alert managers—in real time, if desired—when a cashier fails to muster an adequate grin. The software is configurable, so employers will be able to decide just how happy their employees should appear.

Using computers to measure smiles will strike many as absurd. Yet machines are learning to see in increasingly reliable and useful ways, opening up a wide range of new applications. Indeed, computer vision, also known as object recognition, has developed so rapidly over the past few years that rather than struggling to make sense of what they see, computers can now outperform humans in some cases. Curiously enough, one such category is interpreting human facial expressions.

Venu Govindaraju, a computer scientist at the University of Buffalo in New York, is designing software that helps determine the authenticity of expressions. He found that expressions that take as much time to form as to fade away are more likely to be genuine than those with unequal “onset” and “offset” durations. Detecting phoniness this way is far from fail-safe, but it is a good guide. So good, in fact, that Unilever, an Anglo-Dutch consumer-goods giant, is using expression-analysis software to pinpoint how testers react to foods. Procter & Gamble, an American competitor, is using similar technology to decipher the expressions of focus groups viewing its advertisements.
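The onset/offset idea lends itself to a small illustration. The sketch below is not Dr Govindaraju's software; it only assumes that some upstream system has already produced a per-frame "expression strength" score between 0 and 1, and it compares how long the expression takes to build with how long it takes to fade. The threshold value and function names are hypothetical.

```python
from typing import List, Optional, Tuple


def onset_offset_durations(
    intensity: List[float], threshold: float = 0.1
) -> Optional[Tuple[int, int]]:
    """Return (onset, offset) durations in frames, or None if no expression.

    Onset runs from the first frame at or above the threshold to the peak;
    offset runs from the peak to the last frame at or above the threshold.
    """
    active = [i for i, value in enumerate(intensity) if value >= threshold]
    if not active:
        return None
    start, end = active[0], active[-1]
    peak = max(range(start, end + 1), key=lambda i: intensity[i])
    return peak - start, end - peak


def symmetry_ratio(
    intensity: List[float], threshold: float = 0.1
) -> Optional[float]:
    """Ratio of the shorter duration to the longer one (1.0 = symmetric)."""
    durations = onset_offset_durations(intensity, threshold)
    if durations is None:
        return None
    onset, offset = durations
    longer = max(onset, offset)
    if longer == 0:
        return 1.0  # a single-frame expression is trivially symmetric
    return min(onset, offset) / longer


if __name__ == "__main__":
    even_smile = [0.0, 0.2, 0.5, 1.0, 0.5, 0.2, 0.0]
    abrupt_smile = [0.0, 0.9, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
    print(symmetry_ratio(even_smile))
    print(symmetry_ratio(abrupt_smile))
```

A ratio near 1.0 means the expression formed and faded over similar spans, which the article describes as more likely to be genuine; as the article also notes, this is a guide rather than a fail-safe test.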

Using computer vision to analyse how people react to advertising, combined with the ability to identify what sort of people they are, also provides new opportunities. Digital billboards—the large TV screens that display advertisements in public places—already take into account the weather (touting cold drinks when it is hot) and the time of day (promoting wine in the evening). NICTA, a media laboratory funded by the Australian government, has gone a stage further. It has developed a digital sign called TABANAR, which sports an integrated camera. When a passer-by approaches, software determines his sex, approximate age and hair growth. Shoppers can then be enticed with highly targeted advertisements: action figures for little boys, for example, or razors for beardless men. If the person begins to turn away, TABANAR launches a different ad, perhaps with dramatic music. If he comes back later, TABANAR can show yet another advertisement. “You tend to go: ‘Wow, thanks, how did you know I needed that?’,” says Rob Fitzpatrick of NICTA.

Computer vision can prevent sales, too. In Japan it recently became illegal to sell tobacco from vending machines without verifying that customers are at least 20 years old. Fujitaka, a maker of vending machines in Kyoto, promptly devised a solution: it built dispensers with artificial vision. Fujitaka’s new machines refuse to sell cigarettes if their software detects plumpness in the skin (a tell-tale sign of adolescence) around a potential customer’s eyes. Tests show that the system is slightly better at estimating people’s ages than nightclub bouncers are. Ray Chiang of Fujitaka says sales surged after the government certified the technique last year.

The elderly are also coming under scrutiny. Computer scientists at the Toronto Rehabilitation Institute in Canada have been testing a computer-vision system for monitoring people living in nursing homes or alone. A cheap camera, stuck to the ceiling, wirelessly relays images to a small computer that monitors how people move. When someone neglects to brush their teeth, flush the toilet or wash their hands, a speaker can prompt them to do so. And if a person falls over or stops moving, and fails to declare that all is well when prompted by the computer, the system calls a relative or dials an emergency number.

Watching while you work

Similar software can identify slackers in fast-food kitchens. This year HyperActive Technologies of Pittsburgh, Pennsylvania, is launching “HyperActive Bob”, a system that processes data collected by an array of cameras and alerts restaurant managers (either on site, or back at headquarters) when employees indulge in lengthy toilet breaks, or are slow to toss burgers onto the grill. The monitoring will be offered as a subscription, costing less than $200 a month for each restaurant.

Nello Zeuch, an independent consultant based in Yardley, Pennsylvania, says computer-vision systems are also being used to monitor products on assembly lines, as well as the workers assembling them. In car factories, for example, workers can be notified by vision systems if components are missing or improperly seated. In some cases, workers are warned if they reach for the wrong tool or part. In electronics factories vision technology has become a vital part of the testing process. A machine can examine a circuit board for faults almost instantly. A human would take far longer to do the same thing, and would be less accurate.

Computer vision has even advanced to the point that it can perform internet searches with an image, rather than key words, as a search term. Later this year Accenture, a consulting firm, will launch a free service, called Accenture Mobile Object-Recognition Platform (AMORP), that will enable people to use images sent from mobile phones to look things up on the web. After sending an image of, say, a Chinese delicacy, a curious foodie might receive information gleaned from AsianFoodGrocer.com, for example. Fredrik Linaker, head of the AMORP project at Accenture’s research centre in Sophia Antipolis, France, likens the project to “physical-world hyperlinking”.

Microsoft is developing a competing service, known as Lincoln, which can already recognise more than a million objects in videos or photographs. Larry Zitnick, a Microsoft researcher in Redmond, Washington, notes that searching with images is often more precise than using words. Transmitting a picture of the Eiffel Tower taken from a magazine, for example, will fetch web pages that include information about travelling to Paris. Sending video footage of the monument itself, by contrast, will return web pages that contain useful information about the tower’s opening hours, or good places to eat nearby.

Sending pictures to the internet could help robots as well as people. Jim Little of the University of British Columbia in Canada wants to make robots less clumsy. He has connected robots wirelessly to the internet, enabling them to search for pictures online so that they can quickly learn to recognise nearby objects. Curious George, one of Dr Little’s robotic creations, can identify a book, for example, by finding a picture of it on Amazon, a leading online retailer.

One of the most promising uses of computer-vision software is in combating crime. In January a company called Evolution Robotics, based in Pasadena, California, began selling shopkeepers a system called LaneHawk InCart. When a customer arrives at a supermarket checkout, an overhead camera identifies the items on the conveyor belt and anything left behind in the shopping trolley. It then rings up the correct cost of the items. The system prevents “sweethearting”—the practice by which cashiers collude in a theft, either by failing to scan an item or by entering the wrong price. It also overcomes bar-code switching, in which would-be thieves remove the original bar-code and replace it with that of a cheaper item.

Eyes of the law

Nabbing drivers who switch car number-plates is another area where computer vision promises to help. Autonomy, a British firm, sells software that can recognise the make, model and colour of moving vehicles. By analysing data from roadside cameras, the system can notify police the moment a car drives past with a number-plate registered to another vehicle.
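The number-plate check described here boils down to comparing what the camera sees with what the registration record says. The sketch below is an illustration only, not Autonomy's software: it assumes the vision system has already produced a plate string plus a make, model and colour, and it uses a made-up registry. All names and records are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Vehicle:
    make: str
    model: str
    colour: str


# Hypothetical registry: plate -> vehicle the plate is registered to.
REGISTRY: Dict[str, Vehicle] = {
    "AB12CDE": Vehicle("Ford", "Focus", "blue"),
}


def plate_mismatches(
    plate: str, observed: Vehicle, registry: Dict[str, Vehicle]
) -> List[str]:
    """List the attributes where the observed car differs from the record.

    An empty list means either a full match or an unknown plate; a real
    system would treat unknown plates as a separate case.
    """
    registered = registry.get(plate)
    if registered is None:
        return []
    return [
        field
        for field in ("make", "model", "colour")
        if getattr(registered, field).lower() != getattr(observed, field).lower()
    ]


if __name__ == "__main__":
    seen = Vehicle("Vauxhall", "Astra", "blue")
    differences = plate_mismatches("AB12CDE", seen, REGISTRY)
    if differences:
        print("Possible switched plate; differs on:", ", ".join(differences))
```

Returning the list of differing attributes, rather than a simple yes or no, would let an operator judge whether a mismatch is a likely plate switch or just a misread colour.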

Similar technology is being used by repossession companies and other firms eager to get their hands on rogue vehicles. Last September Dijital Video ve Imge Teknolojileri, a firm based in Istanbul, launched a computer-vision system that uses a small camera mounted behind a car’s windscreen. A law firm installed it in 20 cars to look out for wanted vehicles and alert the police. Within two months it had led to the arrest of 15 drivers. They were “quite surprised”, says Muhittin Gökmen, the company’s founder. “They didn’t know they could be captured like this.”

Car-mounted vision systems can be used to prevent accidents as well as crime. The system sold by Mobileye Vision Technologies in Jerusalem, for example, notifies drivers of vehicles hidden in blind spots and advises them against changing lanes if speeding or erratically moving vehicles are nearby. The company has sold more than 100,000 systems to carmakers including BMW, General Motors and Volvo. This year Mobileye will launch a new system that applies the brakes if a collision is imminent.

The Technion-Israel Institute of Technology in Haifa, meanwhile, is developing roadside vision systems for dangerous junctions. If approaching cars appear to be heading towards a collision, drivers are warned by flashing street signs. Such safety systems need not be limited to roads. DFS Deutsche Flugsicherung, a government agency responsible for air-traffic control in Germany, is about to launch vision software for airports. Using images collected by surveillance cameras, its Advanced Surface Movement Guidance and Control System will warn traffic controllers of potential collisions between taxiing aircraft and vehicles ferrying luggage and food.

Jake Aggarwal, an expert in the security implications of traffic patterns at the University of Texas at Austin, is using funds from America’s defence department to analyse footage of suspicious driving filmed from above. Understanding vehicle movements, Mr Aggarwal says, is especially helpful to intelligence and security experts in Afghanistan and Iraq. Suspect vehicles include those that drive in circles and those that go to government buildings and military facilities, especially if they stop near them.

Advances in computer vision, in short, have applications in fields from advertising and manufacturing to road safety and counter-terrorism. It is a technology worth watching closely.

Technology Quarterly