PGroongan&nZulip

: author

Kouhei Sutou

: institution

ClearCode Inc.

: content-source

Zulip & PGroonga Night

: date

2017-09-06

: allotted-time

30m

: theme

.

PGroonga

* Pronunciation: píːzí:lúnɡά\n
  (('note:読み方:ぴーじーるんが'))

* PostgreSQL extension\n
  (('note:PostgreSQLの拡張機能'))
  * Fast full text search\n
    (('note:高速全文検索機能'))
  * All languages are supported!\n
    (('note:全言語対応!'))

Fast?(('note:(高速?)'))

* Need to measure to confirm\n
  (('note:確認するには測定しないと'))
* Targets(('note:(測定対象)'))
  * textsearch (built-in)(('note:(組み込み)'))
  * pg_bigm (third party)(('note:(外部プロダクト)'))

PGroona and textsearch

# image
# src = images/search-pgroonga-textsearch.pdf
# relative_height = 100

As fast as textsearchn(('note:textsearchと同じくらいの速さ'))

* textsearch uses word based full text search\n
  (('note:textsearchは単語ベースの全文検索実装'))
* PostgreSQL has enough performance for the approach\n
  (('note:PostgreSQLはこの方法では十分な性能を出せる'))

textsearch and Japanesen(('note:textsearchと日本語'))

* Asian languages including Japanese aren't supported\n
  (('note:日本語を含むアジア圏の言語は非サポート'))
  * Need plugin(('note:(プラグインが必要)'))
  * Plugin exists but isn't maintained\n
    (('note:プラグインはあるがメンテナンスされていない'))

Japanese supportn(('note:日本語対応'))

* Need one of them(('note:(どちらかが必要)'))
  * N-gram approach support\n
    (('note:N-gramというやり方のサポート'))
  * Japanese specific word based approach support\n
    (('note:日本語を考慮した単語ベースのやり方のサポート'))

(('tag:center')) PGroonga supports both of themn 🕺

PostgreSQL and N-gramn(('note:PostgreSQLとN-gram'))

* PostgreSQL is slow with N-gram approach\n
  (('note:PostgreSQLでN-gramというやり方を使うと遅い'))
* N-gram approach:
  * pg_trgm (contrib)
    * Japanese isn't supported by default\n
      (('note:デフォルトでは日本語に対応していない'))
  * pg_bigm (third-party)

PGroona and pg_bigm

# image
# src = images/search-pgroonga-pg-bigm.pdf
# relative_height = 100

PGroonga is fast stablyn(('note:PGroongaは安定して速い'))

* PostgreSQL needs "recheck" for N-gram approach\n
  (('note:PostgreSQLはN-gramのときは「recheck」が必要'))
  * Seq search after index search\n
    (('note:インデックスサーチのあとにシーケンシャルサーチ'))
* PGroonga doesn't need\n(('note:PGroongaでは必要ない'))
  * Only index search\n
    (('note:インデックスサーチだけでOK'))

Wrap upn(('note:まとめ'))

* textsearch is fast but Asian langs aren't supported\n
  (('note:textsearchは速いけどアジア圏の言語を未サポート'))
* pg_bigm supports Japanese but is slow for large hits\n
  (('note:pg_bigmは日本語対応だがヒット数が多くなると遅い'))
* PGroonga is fast and supports all languages\n
  (('note:PGroongaは速くて全言語対応'))

FYI: textsearch, PGroonga and Groonga

# image
# src = images/search-pgroonga-groonga-textsearch.pdf
# relative_height = 100

Zulip and PGroonga

* Zulip uses textsearch by default\n
  (('note:Zulipはデフォルトでtextsearchを使用'))
  * Japanese isn't supported\n
    (('note:日本語非対応'))
* Zulip supports PGroonga as option\n
  (('note:ZulipでPGroongaも使うこともできる'))
  * Implemented by me\n
    (('note:私が実装'))

Zulip: full text searchn(('note:Zulipと全文検索'))

* Zulip is chat tool\n
  (('note:Zulipはチャットツール'))
  * Latency is important for UX\n
    (('note:UX的にレイテンシーは大事'))
* Index update is heavy\n
  (('note:インデックスの更新は重い'))
  * Delay index update\n
    (('note:インデックスの更新を後回しにしている'))

Delay index updaten(('note:インデックス更新を後回し'))

# coderay sql
CREATE TABLE zerver_message (
  rendered_content text,
  -- ... ↓Column for full text search
  search_tsvector tsvector
); -- ↓Index for full text search
CREATE INDEX zerver_message_search_tsvector
  ON zerver_message
  USING gin (search_tsvector);

Delay index updaten(('note:インデックス更新を後回し'))

# coderay sql

-- Execute append_to_fts_update_log() on change
CREATE TRIGGER
  zerver_message_update_search_tsvector_async
  BEFORE INSERT OR UPDATE OF rendered_content
    ON zerver_message
  FOR EACH ROW
    EXECUTE PROCEDURE append_to_fts_update_log();

Delay index updaten(('note:インデックス更新を後回し'))

# coderay sql

-- Insert ID to fts_update_log table
CREATE FUNCTION append_to_fts_update_log()
  RETURNS trigger
  LANGUAGE plpgsql AS $$
    BEGIN
      INSERT INTO fts_update_log (message_id)
        VALUES (NEW.id);
      RETURN NEW;
    END
$$;

Delay index updaten(('note:インデックス更新を後回し'))

# coderay sql

-- Keep ID to be updated
CREATE TABLE fts_update_log (
  id SERIAL PRIMARY KEY,
  message_id INTEGER NOT NULL
);

Delay index updaten(('note:インデックス更新を後回し'))

# coderay sql

-- Execute do_notify_fts_update_log()
-- on INSERT
CREATE TRIGGER fts_update_log_notify
  AFTER INSERT ON fts_update_log
  FOR EACH STATEMENT
    EXECUTE PROCEDURE
      do_notify_fts_update_log();

Delay index updaten(('note:インデックス更新を後回し'))

# coderay sql

-- NOTIFY to fts_update_log channel!
CREATE FUNCTION do_notify_fts_update_log()
  RETURNS trigger
  LANGUAGE plpgsql AS $$
    BEGIN
      NOTIFY fts_update_log;
      RETURN NEW;
    END
  $$;

Delay index updaten(('note:インデックス更新を後回し'))

# coderay python

cursor.execute("LISTEN ftp_update_log") # Wait
cursor.execute("SELECT id, message_id FROM fts_update_log")
ids = []
for (id, message_id) in cursor.fetchall():
  cursor.execute("UPDATE zerver_message SET search_tsvector = "
                   "to_tsvector('zulip.english_us_search', "
                               "rendered_content) "
                 "WHERE id = %s", (message_id,))
  ids.append(id)
cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
               (ids,))

PGroonga: index updaten(('note:PGroongaとインデックス更新'))

* PGroonga's index update is fast too\n
  (('note:PGroongaはインデックス更新も速い'))
* PGroonga's search while index update is still fast\n
  (('note:PGroongaはインデックス更新中の検索も速い'))

Perf characteristicsn(('note:性能の傾向'))

# image
# src = images/performance-charcteristic-for-constant-read-and-write.svg
# relative_height = 100

Update and lockn(('note:更新とロック'))

* Update without ((*read*)) locks\n
  (('note:((*参照*))ロックなしで更新'))
  * ((*Write*)) locks are required\n
    (('note:((*書き込み*))ロックは必要'))

GIN: Read/Writen(('note:GIN:読み書き'))

# image
# src = images/read-while-write-gin.svg
# relative_height = 100

PGroonga: Read/Writen(('note:PGroonga:読み書き'))

# image
# src = images/read-while-write-pgroonga.svg
# relative_height = 100

Wrap upn(('note:まとめ'))

* Zulip: Low latency for UX\n
  (('note:ZulipはUXのために低レイテンシーをがんばっている'))
  * Delay index update\n
    (('note:インデックスの更新は後回し'))
* PGroonga: Keeps fast search with update\n
  (('note:PGroongaは更新しながらでも高速検索を維持'))
  * Chat friendly characteristics\n
    (('note:チャット向きの特性'))

More PGroonga featuresn(('note:PGroongaの機能いろいろ'))

* Query expansion(('note:(クエリー展開)'))
  * Support synonyms(('note:(同義語検索をサポート)'))
* Similar search(('note:(類似文書検索)'))
  * Find similar messages\n
    (('note:類似メッセージ検索'))
* Fuzzy search(('note:(あいまい検索)'))
* Stemming(('note:(ステミング)'))