performance - Benefits and trade offs for improving text search on small data in PostgreSQL -
- i have 4 text columns of interest.
- each column 100 characters.
- the text in 3 of columns latin words. (the data biological catalog, , these names of things.)
- the data 500 rows. don't expect grow beyond 1000.
- a small number of users (under 10) have editing privileges add, update, , delete data. not expect these users put heavy load on database.
so suggests pretty small data set consider.
i need perform search on 4 columns rows @ least 1 column contains search text (case insensitive). query issued (and results served) via web application. i'm bit lost how approach it.
postgresql offers few options improving text searching speed. possible options built postgresql i've been considering are
- don't try index @ all. use
ilike
,like
onlower
, or similar. (without index?) - index pg_trgm improve search speed. assume need index concatenation somehow.
- full text searching. assume involve concatenating index also.
unfortunately, i'm not familiar expected performance of of these or benefits , trades off, it's hard know things should try first , things shouldn't consider. things have read suggest doing indexing 2 , 3 pretty slow, conflicts fact i'll having occasional modifications going on. , mixed language makes full text search seem unattractive since appears language based, unless can handle multiple languages simultaneously. expect data small, simple ilike
or maybe like
on lower
fast enough? or maybe indexing fast enough low load of modifications on data small? better off looking outside database?
granted, have benchmark these know sure what's fastest, unfortunately, don't have time project. benefits , trade offs of these methods? of these options not appropriate solving type of problem? other types of solutions (including potentially outside database) worth considering?
(i suppose might find kind of beginner's tutorial on text searching in pg useful, searches turn full text search part, don't know if it's useful me.)
i'm on pg 9.2.4, goodies pre-9.3 option.
update: i've expanded answer a detailed blog post.
rather focusing purely on speed, please consider search semantics first. define requirements.
for example, users need able differentiate based on order of terms? should
radiata pinus
find:
pinus radiata
? same rule apply words within column between columns?
are spaces word separators, or spaces within column part of search term?
do need wildcards? if so, need left-anchored wildcards (think staph%
) or need right-anchored or infix wildcards (%ccus
, p%s
)? pg_tgrm
infix wildcards. suffix wildcards can handled index on reverse()
of word, gets clumsy in practice pg_tgrm
best option there.
if you're searching discrete words , word-order isn't important, pg's full-text search to_tsvector
, to_tsquery
desirable. supports left-anchored wildcard searches, weighting, categories, etc.
if you're doing prefix searches of discrete columns simple like
queries on regular b-tree index per column way go.
so. figure out need, then how it. current uncertainty stems partly not knowing quite want.
Comments
Post a Comment