PGroongan&nZulip¶ ↑
: author
Kouhei Sutou
: institution
ClearCode Inc.
: content-source
Zulip & PGroonga Night
: date
2017-09-06
: allotted-time
30m
: theme
.
PGroonga¶ ↑
* Pronunciation: píːzí:lúnɡά\n (('note:読み方:ぴーじーるんが')) * PostgreSQL extension\n (('note:PostgreSQLの拡張機能')) * Fast full text search\n (('note:高速全文検索機能')) * All languages are supported!\n (('note:全言語対応!'))
Fast?(('note:(高速?)'))¶ ↑
* Need to measure to confirm\n (('note:確認するには測定しないと')) * Targets(('note:(測定対象)')) * textsearch (built-in)(('note:(組み込み)')) * pg_bigm (third party)(('note:(外部プロダクト)'))
PGroona and textsearch¶ ↑
# image # src = images/search-pgroonga-textsearch.pdf # relative_height = 100
As fast as textsearchn(('note:textsearchと同じくらいの速さ'))¶ ↑
* textsearch uses word based full text search\n (('note:textsearchは単語ベースの全文検索実装')) * PostgreSQL has enough performance for the approach\n (('note:PostgreSQLはこの方法では十分な性能を出せる'))
textsearch and Japanesen(('note:textsearchと日本語'))¶ ↑
* Asian languages including Japanese aren't supported\n (('note:日本語を含むアジア圏の言語は非サポート')) * Need plugin(('note:(プラグインが必要)')) * Plugin exists but isn't maintained\n (('note:プラグインはあるがメンテナンスされていない'))
Japanese supportn(('note:日本語対応'))¶ ↑
* Need one of them(('note:(どちらかが必要)')) * N-gram approach support\n (('note:N-gramというやり方のサポート')) * Japanese specific word based approach support\n (('note:日本語を考慮した単語ベースのやり方のサポート'))
(('tag:center')) PGroonga supports both of themn 🕺
PostgreSQL and N-gramn(('note:PostgreSQLとN-gram'))¶ ↑
* PostgreSQL is slow with N-gram approach\n (('note:PostgreSQLでN-gramというやり方を使うと遅い')) * N-gram approach: * pg_trgm (contrib) * Japanese isn't supported by default\n (('note:デフォルトでは日本語に対応していない')) * pg_bigm (third-party)
PGroona and pg_bigm¶ ↑
# image # src = images/search-pgroonga-pg-bigm.pdf # relative_height = 100
PGroonga is fast stablyn(('note:PGroongaは安定して速い'))¶ ↑
* PostgreSQL needs "recheck" for N-gram approach\n (('note:PostgreSQLはN-gramのときは「recheck」が必要')) * Seq search after index search\n (('note:インデックスサーチのあとにシーケンシャルサーチ')) * PGroonga doesn't need\n(('note:PGroongaでは必要ない')) * Only index search\n (('note:インデックスサーチだけでOK'))
Wrap upn(('note:まとめ'))¶ ↑
* textsearch is fast but Asian langs aren't supported\n (('note:textsearchは速いけどアジア圏の言語を未サポート')) * pg_bigm supports Japanese but is slow for large hits\n (('note:pg_bigmは日本語対応だがヒット数が多くなると遅い')) * PGroonga is fast and supports all languages\n (('note:PGroongaは速くて全言語対応'))
FYI: textsearch, PGroonga and Groonga¶ ↑
# image # src = images/search-pgroonga-groonga-textsearch.pdf # relative_height = 100
Zulip and PGroonga¶ ↑
* Zulip uses textsearch by default\n (('note:Zulipはデフォルトでtextsearchを使用')) * Japanese isn't supported\n (('note:日本語非対応')) * Zulip supports PGroonga as option\n (('note:ZulipでPGroongaも使うこともできる')) * Implemented by me\n (('note:私が実装'))
Zulip: full text searchn(('note:Zulipと全文検索'))¶ ↑
* Zulip is chat tool\n (('note:Zulipはチャットツール')) * Latency is important for UX\n (('note:UX的にレイテンシーは大事')) * Index update is heavy\n (('note:インデックスの更新は重い')) * Delay index update\n (('note:インデックスの更新を後回しにしている'))
Delay index updaten(('note:インデックス更新を後回し'))¶ ↑
# coderay sql CREATE TABLE zerver_message ( rendered_content text, -- ... ↓Column for full text search search_tsvector tsvector ); -- ↓Index for full text search CREATE INDEX zerver_message_search_tsvector ON zerver_message USING gin (search_tsvector);
Delay index updaten(('note:インデックス更新を後回し'))¶ ↑
# coderay sql -- Execute append_to_fts_update_log() on change CREATE TRIGGER zerver_message_update_search_tsvector_async BEFORE INSERT OR UPDATE OF rendered_content ON zerver_message FOR EACH ROW EXECUTE PROCEDURE append_to_fts_update_log();
Delay index updaten(('note:インデックス更新を後回し'))¶ ↑
# coderay sql -- Insert ID to fts_update_log table CREATE FUNCTION append_to_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN INSERT INTO fts_update_log (message_id) VALUES (NEW.id); RETURN NEW; END $$;
Delay index updaten(('note:インデックス更新を後回し'))¶ ↑
# coderay sql -- Keep ID to be updated CREATE TABLE fts_update_log ( id SERIAL PRIMARY KEY, message_id INTEGER NOT NULL );
Delay index updaten(('note:インデックス更新を後回し'))¶ ↑
# coderay sql -- Execute do_notify_fts_update_log() -- on INSERT CREATE TRIGGER fts_update_log_notify AFTER INSERT ON fts_update_log FOR EACH STATEMENT EXECUTE PROCEDURE do_notify_fts_update_log();
Delay index updaten(('note:インデックス更新を後回し'))¶ ↑
# coderay sql -- NOTIFY to fts_update_log channel! CREATE FUNCTION do_notify_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN NOTIFY fts_update_log; RETURN NEW; END $$;
Delay index updaten(('note:インデックス更新を後回し'))¶ ↑
# coderay python cursor.execute("LISTEN ftp_update_log") # Wait cursor.execute("SELECT id, message_id FROM fts_update_log") ids = [] for (id, message_id) in cursor.fetchall(): cursor.execute("UPDATE zerver_message SET search_tsvector = " "to_tsvector('zulip.english_us_search', " "rendered_content) " "WHERE id = %s", (message_id,)) ids.append(id) cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)", (ids,))
PGroonga: index updaten(('note:PGroongaとインデックス更新'))¶ ↑
* PGroonga's index update is fast too\n (('note:PGroongaはインデックス更新も速い')) * PGroonga's search while index update is still fast\n (('note:PGroongaはインデックス更新中の検索も速い'))
Perf characteristicsn(('note:性能の傾向'))¶ ↑
# image # src = images/performance-charcteristic-for-constant-read-and-write.svg # relative_height = 100
Update and lockn(('note:更新とロック'))¶ ↑
* Update without ((*read*)) locks\n (('note:((*参照*))ロックなしで更新')) * ((*Write*)) locks are required\n (('note:((*書き込み*))ロックは必要'))
GIN: Read/Writen(('note:GIN:読み書き'))¶ ↑
# image # src = images/read-while-write-gin.svg # relative_height = 100
PGroonga: Read/Writen(('note:PGroonga:読み書き'))¶ ↑
# image # src = images/read-while-write-pgroonga.svg # relative_height = 100
Wrap upn(('note:まとめ'))¶ ↑
* Zulip: Low latency for UX\n (('note:ZulipはUXのために低レイテンシーをがんばっている')) * Delay index update\n (('note:インデックスの更新は後回し')) * PGroonga: Keeps fast search with update\n (('note:PGroongaは更新しながらでも高速検索を維持')) * Chat friendly characteristics\n (('note:チャット向きの特性'))
More PGroonga featuresn(('note:PGroongaの機能いろいろ'))¶ ↑
* Query expansion(('note:(クエリー展開)')) * Support synonyms(('note:(同義語検索をサポート)')) * Similar search(('note:(類似文書検索)')) * Find similar messages\n (('note:類似メッセージ検索')) * Fuzzy search(('note:(あいまい検索)')) * Stemming(('note:(ステミング)'))