A collection of utilities that split UTF-8 encoded Thai text at word
boundaries, a task also known as word tokenization or word breaking.
The utilities are built on Emacs, swath, Perl, and a C++ program that
uses the ICU library.  All of them use dictionary-based word
splitting.
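
Thai is written without spaces between words, so a splitter needs a
word list to decide where one word ends and the next begins.  As a
rough illustration of the dictionary-based idea only, here is a
minimal greedy longest-match sketch in Python.  It is not the
algorithm used by swath, ICU, or the other tools in this package; the
function name split_words and the tiny word set are hypothetical.

```python
def split_words(text, dictionary):
    """Greedy longest-match segmentation (illustrative sketch only).

    `dictionary` is a set of known words.  Characters that start no
    dictionary word are emitted one at a time.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            j = i + 1  # no dictionary hit: fall back to a single character
        words.append(text[i:j])
        i = j
    return words


# Hypothetical two-word dictionary, for illustration only.
sample_dictionary = {"ภาษา", "ไทย"}
print(" ".join(split_words("ภาษาไทย", sample_dictionary)))
```

Greedy matching is the simplest variant and can pick a wrong boundary
when a long word hides two shorter ones; production splitters usually
weigh several candidate segmentations instead.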

Also included are a merged dictionary file of Thai words, a Perl
script to grep Thai UTF-8 words, and an Emacs library that can split,
unsplit, spellcheck, and play audio for Thai words.

Homepage:
https://ftp.NetBSD.org/pub/pkgsrc/distfiles/LOCAL_PORTS/
