CORVALLIS, Ore. — With U.S. analysts buried under surveillance imagery and facing the mind-numbing tedium of watching hour after hour of footage, some U.S. technologists are working on software tools that would assign the dreary but vital task to computers.
Two tools in development by the Pentagon’s Defense Advanced Research Projects Agency (DARPA) and Kitware, a 65-person company based in Clifton Park, N.Y., highlight the obstacles and potential benefits of automatically scanning imagery for suspicious behavior.
The Video and Image Retrieval and Analysis Tool (VIRAT) would focus on narrow targets, such as a doorway under video surveillance. The more ambitious Persistent Stare Exploitation and Analysis System (PERSEAS) would detect indicators of suspicious behavior by searching for links between events that occur over a wide area and at different times.
U.S. officials disagree over how accurate these tools must be to be useful. The tools' designers said the software would ease the burden on analysts even if it could detect objects automatically only 50 percent of the time. Would such a partial solution be better than nothing at all?
“Probably not,” said U.S. Air Force Col. Paul Nelson, commander of the service’s 480th Intelligence, Surveillance and Reconnaissance Wing at Langley Air Force Base, Va.
Nelson said his unit doesn’t have any software now like VIRAT. “It could help us focus our efforts, but even then we would likely still be required to have our crews analyze the video, or image, to determine if it was a road repair crew or something suspicious,” he said.
VIRAT and PERSEAS are young projects. Kitware formed its computer vision group in 2007, and since 2009 has received $35 million from DARPA. Kitware won the initial $10 million VIRAT contract in 2009 and a follow-on contract worth $11 million in October. The $14 million PERSEAS contract was awarded in July.
Though it is a tiny company, Kitware has hired a bevy of big-name subcontractors, including BAE Systems, General Dynamics, Honeywell, Lockheed Martin, Northrop Grumman and Raytheon.
VIRAT relies on video matching. Show the system a video clip of whatever the user is looking for, and it will look for similar video in the ISR imagery.
“It also can learn on the fly,” said Anthony Hoogs, Kitware’s principal investigator for VIRAT and PERSEAS. “So you could indicate a behavior of interest, give the system a video example of that, and go find more like it.”
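Query-by-example video matching of this kind is typically built on comparing feature descriptors extracted from clips. The sketch below is illustrative, not VIRAT's actual method: it assumes each clip has already been reduced to a fixed-length feature vector (the clip names, toy descriptors and similarity threshold are all hypothetical), and ranks archived clips by cosine similarity to the query example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_similar_clips(query_vec, archive, threshold=0.8):
    """Rank archived clips by similarity to the query example.

    `archive` maps clip IDs to precomputed feature vectors; in a real
    system these would come from motion/appearance descriptors rather
    than the toy numbers used here.
    """
    scored = [(clip_id, cosine_similarity(query_vec, vec))
              for clip_id, vec in archive.items()]
    hits = [(cid, s) for cid, s in scored if s >= threshold]
    return sorted(hits, key=lambda t: t[1], reverse=True)

# Hypothetical toy descriptors: a "person enters doorway" query
# against three archived clips.
query = [0.9, 0.1, 0.4]
archive = {
    "clip_017": [0.85, 0.15, 0.35],  # similar behavior
    "clip_042": [0.05, 0.95, 0.10],  # unrelated activity
    "clip_108": [0.88, 0.12, 0.42],  # similar behavior
}
matches = find_similar_clips(query, archive)
# Only the two doorway-like clips clear the threshold, ranked by score.
```

"Learning on the fly" would then amount to adding analyst-confirmed matches back into the query set, tightening what the system treats as the behavior of interest.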
PERSEAS is intended for wide-area motion imagery, known as WAMI, which has fewer frames per second than full-motion video. PERSEAS would combine imagery and signals intelligence generated by events that might be kilometers and hours apart.
According to the DARPA announcement for PERSEAS, the software focuses on such objects as roads and buildings. Imagery of these entities yields "tracks" of activity by people and vehicles. But in an urban environment, these tracks can be fragmented over time and space as activities unfold throughout the day, or as surveillance contact is lost and regained. PERSEAS would use algorithms to link these disparate tracks, in hopes of identifying indicators of suspicious behavior.
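One simple way to link disparate tracks is to test whether one fragment could plausibly continue another: the gap in time must be short enough, and the implied speed across the coverage gap must be physically reasonable. The sketch below is a minimal illustration of that idea, not the PERSEAS algorithm; the track structure, thresholds and example fragments are all assumptions.

```python
import math

def can_link(track_a, track_b, max_gap_s=3600, max_speed_mps=15):
    """Heuristic: could track_b be a continuation of track_a?

    Tracks are dicts holding first/last observed (x, y) positions in
    meters and timestamps in seconds; the thresholds are illustrative.
    """
    dt = track_b["start_time"] - track_a["end_time"]
    if dt <= 0 or dt > max_gap_s:
        return False  # overlapping in time, or the gap is too long
    dx = track_b["start_pos"][0] - track_a["end_pos"][0]
    dy = track_b["start_pos"][1] - track_a["end_pos"][1]
    # The implied speed across the coverage gap must be plausible.
    return math.hypot(dx, dy) / dt <= max_speed_mps

def link_tracks(fragments):
    """Greedily chain fragments into longer per-entity tracks."""
    chains = []
    for frag in sorted(fragments, key=lambda f: f["start_time"]):
        for chain in chains:
            if can_link(chain[-1], frag):
                chain.append(frag)
                break
        else:
            chains.append([frag])
    return chains

# Hypothetical fragments: a vehicle tracked, lost under cover for
# five minutes, then reacquired nearby -- plus an unrelated track
# on the far side of the city.
vehicle_a1 = {"start_time": 0, "start_pos": (0, 0),
              "end_time": 1000, "end_pos": (100, 0)}
vehicle_a2 = {"start_time": 1300, "start_pos": (400, 50),
              "end_time": 2000, "end_pos": (900, 300)}
unrelated = {"start_time": 500, "start_pos": (5000, 5000),
             "end_time": 1500, "end_pos": (5100, 5000)}
chains = link_tracks([vehicle_a1, vehicle_a2, unrelated])
# Two chains: the vehicle's fragments are joined; `unrelated` stands alone.
```

A real system would have to score many competing link hypotheses under the uncertainty Hoogs describes, rather than applying a single greedy pass.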
One challenge is defining just what behavior should be deemed “suspicious,” especially in a hectic, crowded city.
“There are a lot of normal life issues,” Hoogs said. “Most people are just going about their daily business. But normal things have huge variability. There are a lot of detection issues just to track normal people so we don’t think they’re abnormal.”
The DARPA projects were spurred by too much imagery. Analysts might not have time to carefully examine real-time persistent surveillance imagery of a doorway, where there could be hours and hours of nothing interrupted by brief activity.
“Currently, video analysis for Predator and other aerial video surveillance platforms is very labor intensive, and limited to metadata queries, manual annotations and ‘fast forward’ examination of clips,” according to the DARPA broad agency announcement for VIRAT.
VIRAT and PERSEAS are part of the burgeoning field of computer vision, the automated analysis of video and still imagery. Wartime surveillance spurred the interest, but there also has been an explosion in imagery collected by private security systems.
The DARPA projects would not have been feasible a few years ago, Hoogs said. But there have been advances in automated tracking technology, as well as activity recognition, which involves models that compensate for inherent errors in video tracking and object recognition.
“The computer vision and machine learning communities have been developing fundamental technologies that account for uncertainty and error in underlying detection and measurements,” Hoogs said.
Officials continue to discuss the acceptable level of accuracy for detecting behavior such as planting an improvised explosive device. The VIRAT announcement calls for a detection rate of 95 percent by phase three of the project.
“What it comes down to is: Are we going to save the video analyst a lot of time?” Hoogs said. “Are we going to help him get through it efficiently, so that an analyst can go through 10 or 100 times the amount of video they can handle now? To do that, we don’t need to be perfect. I’m not sure, for certain problems, that you even need to be 50 percent accurate, because you’re giving them the ability to do something when they have essentially nothing.
“Even with a 50 percent detection rate, we can remove 90 or 95 percent of the video that an analyst has to look at otherwise,” he added.
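Hoogs' arithmetic works because real events are sparse in persistent surveillance footage: even an imperfect detector can discard the long stretches where nothing happens. The back-of-the-envelope calculation below uses illustrative numbers (the event density and false-alarm factor are assumptions, not figures from the programs):

```python
# Illustrative numbers: suppose real activity occupies 2% of a
# 1,000-hour archive, the detector catches half of it, and each
# detection flags twice its duration (to allow for false alarms
# and context around each alert).
total_hours = 1000
event_fraction = 0.02          # fraction of hours containing real activity
detection_rate = 0.50          # half of real events are flagged
false_alarm_factor = 2.0       # flagged footage per detected event-hour

flagged_hours = total_hours * event_fraction * detection_rate * false_alarm_factor
reduction = 1 - flagged_hours / total_hours
# flagged_hours == 20.0, so the analyst reviews 2% of the archive --
# a 98% reduction in viewing time, even though half the events were missed.
```

The trade-off Nelson raises remains: the missed half of the events is never surfaced for review, which is why the acceptable detection rate is still being debated.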
DARPA declined to authorize officials to discuss these projects, but one official echoed Hoogs on the need to field some capability now.
“Where an automated capability is currently nonexistent, technology with potential for 50 percent success would certainly not be considered of limited utility,” the official said.