We use visual words, or visterms, as compact features for image representation. Two steps are involved in computing visterms: 1) SIFT features are extracted from raw QVGA images; 2) hierarchical clustering is used to convert the SIFT features to visterms. In the vocabulary tree, the leaf clusters are used to represent the SIFT features of database images. A 128-dimensional SIFT feature is therefore reduced to a cluster id (usually 4 bytes). We call this cluster id a visual word, or visterm. In distributed image search, we use visterms as the distinguishing features to represent images.
The vocabulary tree is the core data structure that maps SIFT features to visterms. Since the vocabulary tree is a large data structure (usually several MB), it has to be maintained on flash, and only a subset of the entire tree can be loaded into memory while converting SIFT features to visterms. We therefore developed the Buffered Vocabulary Tree to optimize this conversion. The key idea of the Buffered Vocabulary Tree is to partition the large vocabulary tree into sub-trees, and to buffer SIFT features for batch processing so that the same sub-tree does not have to be loaded from flash repeatedly.
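The batching idea can be sketched as follows, reusing the Node and quantize helpers from the sketch above: the memory-resident top levels of the tree route each descriptor to a sub-tree, descriptors are buffered per sub-tree, and each sub-tree is then loaded from flash only once per batch. The load_subtree callback and the cut level between top levels and sub-trees are assumptions for illustration.

from collections import defaultdict

def convert_batch(descriptors, top_levels, load_subtree):
    # Route each descriptor with the in-memory top levels of the tree,
    # whose "leaves" name the sub-tree the descriptor falls into.
    buffers = defaultdict(list)
    for i, d in enumerate(descriptors):
        subtree_id = quantize(d, top_levels)
        buffers[subtree_id].append((i, d))

    # Process one sub-tree at a time: each is read from flash only once
    # per batch, no matter how many buffered descriptors fall into it.
    visterms = [None] * len(descriptors)
    for subtree_id, items in buffers.items():
        subtree_root = load_subtree(subtree_id)   # single flash read
        for i, d in items:
            visterms[i] = quantize(d, subtree_root)
    return visterms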
The inverted file maps a visterm to the set of images in the local database that contain that visterm. As the number of captured images increases, the size of the inverted file grows beyond the memory limit. We therefore introduce an additive, log-like flash storage for the inverted file. The longest document lists in the inverted file are stored in this log-like storage to keep flash writing time as short as possible.
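A minimal sketch of the log-like storage idea follows, under the assumption that postings are appended as fixed-size (visterm, image id) records to a sequential log on flash instead of being rewritten in place; the file name and record format are hypothetical.

import struct

LOG_PATH = "inverted_log.bin"   # hypothetical append-only log on flash

def append_postings(visterm_to_images):
    """Append new postings sequentially; flash is only written at the tail,
    so long document lists grow without costly in-place updates."""
    with open(LOG_PATH, "ab") as log:
        for visterm, image_ids in visterm_to_images.items():
            for img in image_ids:
                log.write(struct.pack("<II", visterm, img))

def load_postings(visterm):
    """Rebuild the document list for one visterm by scanning the log."""
    postings = []
    with open(LOG_PATH, "rb") as log:
        while True:
            rec = log.read(8)
            if len(rec) < 8:
                break
            v, img = struct.unpack("<II", rec)
            if v == visterm:
                postings.append(img)
    return postings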
Global matching further refines the local search results to obtain a globally optimal ranking. The proxy gathers the top-k locally ranked results from the sensors and merges them into a global top-k list. The sensors then only need to send back the images that appear in this global ranking.
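The proxy-side merge can be sketched as follows, assuming each sensor reports its local top-k as (image id, score) pairs; the data layout and scores are illustrative.

import heapq

def global_top_k(local_results, k):
    """local_results: {sensor_id: [(image_id, score), ...]} local top-k lists.
    Returns the global top-k as (score, sensor_id, image_id), best first."""
    merged = (
        (score, sensor_id, image_id)
        for sensor_id, ranked in local_results.items()
        for image_id, score in ranked
    )
    return heapq.nlargest(k, merged)

# Example: two sensors each return their local top-2; the proxy keeps the
# global top-2 and requests only those images from the owning sensors.
results = {"sensor_A": [("imgA3", 0.91), ("imgA7", 0.64)],
           "sensor_B": [("imgB1", 0.88), ("imgB5", 0.70)]}
print(global_top_k(results, k=2))
# -> [(0.91, 'sensor_A', 'imgA3'), (0.88, 'sensor_B', 'imgB1')]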