Compact and hash based variants of the suffix array
Bulletin of the Polish Academy of Sciences: Technical Sciences
2017
No 4
S. Grabowski
; M. Raniszewski
General Engineering
;
Computer Networks and Communications
;
Atomic and Molecular Physics, and Optics
;
Artificial Intelligence
;
Information Systems
Nauki Techniczne
<jats:title>Abstract</jats:title><jats:p>Full-text indexing aims at building a data structure over a given text capable of efficiently finding arbitrary text patterns, and possibly requiring little space. We propose two suffix array inspired full-text indexes. One, called SA-hash, augments the suffix array with a hash table to speed up pattern searches due to significantly narrowed search interval before the binary search phase. The other, called FBCSA, is a compact data structure, similar to Mäkinen’s compact suffix array (MakCSA), but working on fixed size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2–3 times faster in pattern searches (counts) than the standard suffix array, for the price of requiring 0.2</jats:p>
Polish Academy of Sciences
2017
ISSN 0239-7528, eISSN 2300-1917
10.1515/bpasts-2017-0046