Massive numbers of video clips are generated daily and uploaded to the Internet, but making their contents searchable is a significant challenge. Researchers at ICSI have been developing state-of-the-art techniques for using audio analysis to detect specified events in videos, based on the lower-level audio concepts that make up those events. AURORA is a multi-institution collaboration that combines a variety of approaches to automated video analysis, with the goal of building a complete video-search system.
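The idea of building event detectors on top of lower-level audio concepts can be sketched as a weighted combination of concept-detector confidences. This is a minimal illustration only; the concept names, event models, and weights below are invented for the example and are not ICSI's actual system.

```python
# Hypothetical event models: each event is a weighted mix of
# lower-level audio concepts (all names and weights invented).
EVENT_MODELS = {
    "birthday_party": {"singing": 0.5, "crowd_cheering": 0.3, "clapping": 0.2},
    "street_repair":  {"jackhammer": 0.6, "engine_noise": 0.4},
}

def score_events(concept_scores, event_models=EVENT_MODELS):
    """concept_scores: {concept: detector confidence in [0, 1]}.
    Scores each event as a weighted sum of its concept confidences."""
    return {
        event: sum(w * concept_scores.get(c, 0.0) for c, w in weights.items())
        for event, weights in event_models.items()
    }

# Concept detectors fired on a clip with singing and clapping:
scores = score_events({"singing": 0.9, "clapping": 0.7, "engine_noise": 0.1})
best = max(scores, key=scores.get)  # -> "birthday_party"
```

Real systems learn these weights from labeled video and use far richer concept inventories, but the compositional structure (events scored from concept detections) is the same.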
Researchers on the Global Inference & Online Privacy project are investigating how seemingly innocuous public information can be aggregated across multiple websites to attack privacy. Users often do not realize that personal information from one site can be correlated with information on another; we have demonstrated that cross-site inference can reveal far more about people than they realize they have disclosed. The related Teaching Privacy project aims to educate K-12 and college students about how online privacy really works.
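The core mechanism of cross-site inference can be illustrated by linking records on a shared quasi-identifier. Everything below is an invented example, not real data or the project's actual tooling: two sites each expose one harmless-looking attribute, and joining on a reused username yields a fuller profile than either site reveals alone.

```python
# Invented example data: each site alone looks innocuous.
site_a = [  # e.g., a photo-sharing site exposing usernames and cities
    {"username": "jdoe42", "city": "Berkeley"},
    {"username": "msmith", "city": "Boston"},
]
site_b = [  # e.g., a forum exposing usernames and employers
    {"username": "jdoe42", "employer": "Acme Corp"},
]

def link_profiles(records_a, records_b, key="username"):
    """Merge records from two sites that share a quasi-identifier,
    producing combined profiles neither site exposes on its own."""
    index = {r[key]: dict(r) for r in records_a}
    linked = []
    for record in records_b:
        if record[key] in index:
            merged = dict(index[record[key]])
            merged.update(record)
            linked.append(merged)
    return linked

profiles = link_profiles(site_a, site_b)
# -> [{"username": "jdoe42", "city": "Berkeley", "employer": "Acme Corp"}]
```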
Just as human location analysts use multiple sources of information to more accurately determine the geo-coordinates of the content recorded in digital media, we think approaches to automatic location estimation should be inherently multimodal. The Berkeley Multimodal Location Estimation project is developing an automatic location estimator that can use visual, acoustic, and textual cues to identify the probable recording location of an image, video, or audio track that lacks geolocation metadata.
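One common way to combine cues from several modalities is late fusion: each modality produces its own distribution over candidate locations, and the distributions are merged into a single estimate. The sketch below shows a simple log-linear (product-of-experts) fusion under that assumption; the modality names, candidate locations, and weights are illustrative, not the project's actual estimator.

```python
import math

def fuse_location_estimates(modality_probs, weights=None):
    """Combine per-modality probability distributions over candidate
    locations by weighted log-linear fusion, returning a normalized
    distribution over all candidates."""
    locations = set()
    for probs in modality_probs.values():
        locations.update(probs)
    if weights is None:
        weights = {m: 1.0 for m in modality_probs}  # equal trust in each cue
    log_scores = {}
    for loc in locations:
        score = 0.0
        for modality, probs in modality_probs.items():
            p = probs.get(loc, 1e-9)  # small floor for unseen locations
            score += weights[modality] * math.log(p)
        log_scores[loc] = score
    total = sum(math.exp(s) for s in log_scores.values())
    return {loc: math.exp(s) / total for loc, s in log_scores.items()}

# Invented per-modality estimates for a clip with no geotag:
estimates = fuse_location_estimates({
    "visual":   {"berkeley": 0.6, "boston": 0.4},
    "acoustic": {"berkeley": 0.5, "boston": 0.5},
    "textual":  {"berkeley": 0.8, "boston": 0.2},
})
best = max(estimates, key=estimates.get)  # -> "berkeley"
```

Log-linear fusion lets a confident modality (here, the textual cue) dominate while still letting the others veto implausible candidates.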
Speaker diarization consists of segmenting and clustering a speech recording into speaker-homogeneous regions, i.e., answering the question “Who spoke when?” ICSI researchers have contributed to the state of the art in speech recognition by using diarization to attack complex tasks like analyzing nonscripted multi-party interaction, locating speakers relative to each other, and automating the navigation of video recordings. Current work extends diarization methods to the analysis of non-speech sounds.
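The clustering half of the "segment and cluster" pipeline can be shown with a toy example. This is a deliberately simplified sketch, not ICSI's diarization system: fixed-length segments arrive with a feature vector each, and segments are greedily merged with the nearest existing cluster when the distance falls below a threshold.

```python
def diarize(segments, threshold=1.0):
    """Toy greedy clustering for diarization.
    segments: list of (start_sec, end_sec, feature_vector).
    Returns a list of (start_sec, end_sec, speaker_label)."""
    clusters = []  # each entry: [centroid, segment_count, label]
    timeline = []
    for start, end, feat in segments:
        best, best_dist = None, threshold
        for i, (centroid, count, label) in enumerate(clusters):
            dist = sum((a - b) ** 2 for a, b in zip(feat, centroid)) ** 0.5
            if dist < best_dist:
                best, best_dist = i, dist
        if best is None:  # no cluster close enough: a new speaker
            label = f"speaker_{len(clusters)}"
            clusters.append([list(feat), 1, label])
        else:  # merge into nearest cluster, updating its running centroid
            centroid, count, label = clusters[best]
            new = [(c * count + f) / (count + 1) for c, f in zip(centroid, feat)]
            clusters[best] = [new, count + 1, label]
        timeline.append((start, end, label))
    return timeline

# Three one-second segments; the first and third have similar features:
timeline = diarize([
    (0.0, 1.0, [0.1, 0.2]),
    (1.0, 2.0, [0.9, 0.8]),
    (2.0, 3.0, [0.12, 0.22]),
])
```

Production systems instead model segments with Gaussian mixtures over acoustic features and merge clusters using statistical criteria such as BIC, but the output has the same shape: a timeline labeled by anonymous speaker identities.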
Crowdsourcing is generally used for tasks that are easy for humans but difficult or impossible for computers. However, it is also feasible to train and qualify crowd workers to do more specialized work — i.e., to use crowdsourcing for tasks that are difficult for both humans and computers. We used crowdsourcing as part of the Berkeley Multimodal Location Estimation project, providing a human baseline against which to assess our automatic system, and are exploring further applications.
Big data experiments are computationally very intensive, so code prototyped in high-level languages often must be recoded in a low-level language, creating a costly gap between prototyping and production. Researchers at ICSI and UC Berkeley are developing SMASH, a tool that automatically generates optimized parallel implementations of large-scale multimedia content analysis algorithms from Python code. SMASH is based on PyCASP, a specialization framework for automatically mapping computations typical in content analysis.
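The specialization idea can be sketched in miniature: a high-level Python function stays readable, while a dispatcher substitutes an optimized backend for the same computation when one is available. This is only an illustration of the pattern; the decorator, registry, and function names below are hypothetical and do not reflect PyCASP's actual API, which generates parallel C/CUDA code rather than swapping Python callables.

```python
# Hypothetical registry of optimized backends, keyed by computation name.
_BACKENDS = {}

def specialize(name):
    """Decorator: run a registered optimized backend for `name` if one
    exists, otherwise fall back to the plain Python implementation."""
    def decorator(python_impl):
        def wrapper(*args, **kwargs):
            impl = _BACKENDS.get(name, python_impl)
            return impl(*args, **kwargs)
        return wrapper
    return decorator

@specialize("dot")
def dot(xs, ys):
    # Readable high-level version; a real specializer would compile
    # this pattern to optimized parallel code instead.
    return sum(x * y for x, y in zip(xs, ys))

# Register a stand-in "optimized" backend for the same computation.
_BACKENDS["dot"] = lambda xs, ys: sum(a * b for a, b in zip(xs, ys))

result = dot([1, 2, 3], [4, 5, 6])  # -> 32
```

The key property is that callers keep writing plain Python: the dispatch to faster code happens behind the high-level interface, which is the productivity gap SMASH aims to close.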