performance - Benefits and trade offs for improving text search on small data in PostgreSQL -

- January 15, 2014

i have 4 text columns of interest.
each column 100 characters.
the text in 3 of columns latin words. (the data biological catalog, , these names of things.)
the data 500 rows. don't expect grow beyond 1000.
a small number of users (under 10) have editing privileges add, update, , delete data. not expect these users put heavy load on database.

so suggests pretty small data set consider.

i need perform search on 4 columns rows @ least 1 column contains search text (case insensitive). query issued (and results served) via web application. i'm bit lost how approach it.

postgresql offers few options improving text searching speed. possible options built postgresql i've been considering are

don't try index @ all. use ilike, like on lower, or similar. (without index?)
index pg_trgm improve search speed. assume need index concatenation somehow.
full text searching. assume involve concatenating index also.

unfortunately, i'm not familiar expected performance of of these or benefits , trades off, it's hard know things should try first , things shouldn't consider. things have read suggest doing indexing 2 , 3 pretty slow, conflicts fact i'll having occasional modifications going on. , mixed language makes full text search seem unattractive since appears language based, unless can handle multiple languages simultaneously. expect data small, simple ilike or maybe like on lower fast enough? or maybe indexing fast enough low load of modifications on data small? better off looking outside database?

granted, have benchmark these know sure what's fastest, unfortunately, don't have time project. benefits , trade offs of these methods? of these options not appropriate solving type of problem? other types of solutions (including potentially outside database) worth considering?

(i suppose might find kind of beginner's tutorial on text searching in pg useful, searches turn full text search part, don't know if it's useful me.)

i'm on pg 9.2.4, goodies pre-9.3 option.

update: i've expanded answer a detailed blog post.

rather focusing purely on speed, please consider search semantics first. define requirements.

for example, users need able differentiate based on order of terms? should

radiata pinus

find:

pinus radiata

? same rule apply words within column between columns?

are spaces word separators, or spaces within column part of search term?

do need wildcards? if so, need left-anchored wildcards (think staph%) or need right-anchored or infix wildcards (%ccus, p%s)? pg_tgrm infix wildcards. suffix wildcards can handled index on reverse() of word, gets clumsy in practice pg_tgrm best option there.

if you're searching discrete words , word-order isn't important, pg's full-text search to_tsvector , to_tsquery desirable. supports left-anchored wildcard searches, weighting, categories, etc.

if you're doing prefix searches of discrete columns simple like queries on regular b-tree index per column way go.

so. figure out need, then how it. current uncertainty stems partly not knowing quite want.

Search This Blog

LAVA

performance - Benefits and trade offs for improving text search on small data in PostgreSQL -

Comments

Post a Comment

Popular posts from this blog

c++ - Linked List error when inserting for the last time -

java - activate/deactivate sonar maven plugin by profile? -

tsql - Pivot with Temp Table (definition for column must include data type) -- SQL Server 2008 -