Paper by Erik D. Demaine

Erik D. Demaine and Alejandro López-Ortiz, “A Linear Lower Bound on Index Size for Text Retrieval”, in Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2001), Washington, DC, January 7–9, 2001, pages 289–294.

Most information-retrieval systems preprocess the data to produce an auxiliary index structure. Empirically, it has been observed that there is a tradeoff between query response time and the size of the index. When indexing a large corpus, such as the web, the size of the index is an important consideration. In this case it would be ideal to produce an index that is substantially smaller than the text.

In this work we prove a linear lower bound on the size of any index that reports the location (if any) of a substring in the text in time proportional to the length of the pattern. In other words, an index supporting linear-time substring searches requires about as much space as the original text. Here “time” is measured in the number of bit probes to the text; an arbitrary amount of computation may be done on an arbitrary amount of the index. Our lower bound applies to inverted word indices as well.

The paper is 6 pages.

The paper is available in PostScript (154k) and gzipped PostScript (62k).
See information on file formats.
[Google Scholar search]

Related papers:
TextSearchLB_JAlgorithms (A Linear Lower Bound on Index Size for Text Retrieval)

See also other papers by Erik Demaine.
These pages are generated automagically from a BibTeX file.
Last updated February 26, 2024 by Erik Demaine.