morphemepiece - Morpheme Tokenization
Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
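The lookup-then-fallback flow described above can be sketched in a few lines of Python. This is an illustration only, not the package's R code: the names `morpheme_tokenize`, `greedy_pieces`, `TOY_LOOKUP`, and `TOY_VOCAB` are hypothetical, the toy data is invented, and the fallback shown is plain greedy longest-match wordpiece rather than the modified variant the package uses.

```python
# Hypothetical toy data: a morpheme lookup table and a small fallback vocabulary.
TOY_LOOKUP = {"unaffordable": ["un", "afford", "able"]}
TOY_VOCAB = {"re", "##play", "play"}


def greedy_pieces(word, vocab, unk="[UNK]"):
    """Plain greedy longest-match-first split (stand-in for the real fallback)."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No substring starting at this position is in the vocabulary.
            return [unk]
    return pieces


def morpheme_tokenize(word, lookup, vocab):
    """Use the lookup table when the word is known, otherwise fall back."""
    if word in lookup:
        return list(lookup[word])
    return greedy_pieces(word, vocab)


print(morpheme_tokenize("unaffordable", TOY_LOOKUP, TOY_VOCAB))
print(morpheme_tokenize("replay", TOY_LOOKUP, TOY_VOCAB))
```

The first call is answered from the lookup table; the second word is absent from the table, so it goes through the fallback splitter.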
Last updated 3 years ago
5.04 score 11 stars 8 scripts 223 downloads
wordpiece - R Implementation of Wordpiece Tokenization
Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization conventions are used by default.
Last updated 3 years ago
4.60 score 8 stars 7 scripts 210 downloads
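For readers unfamiliar with Wordpiece tokenization, the following Python sketch shows the greedy longest-match-first splitting commonly used with BERT-style vocabularies. It is not the R package's code: the function name `wordpiece_split`, the toy vocabulary, and the parameter names are assumptions made for illustration. The `##` continuation prefix and the `[UNK]` token follow the BERT conventions.

```python
# Hypothetical toy vocabulary; a real BERT vocabulary has about 30,000 entries.
TOY_VOCAB = {"un", "afford", "##afford", "##able", "[UNK]"}


def wordpiece_split(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into wordpieces by greedy longest-match-first lookup."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # mark non-initial pieces
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


print(wordpiece_split("unaffordable", TOY_VOCAB))
```

With this toy vocabulary the word splits into an initial piece followed by two continuation pieces; a word with no matching pieces maps to the single unknown token.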