# sentencepiece

Here are 40 public repositories matching this topic...

carban / PLNWorkshops

Natural language processing workshops

twitter tokenizer python3 sentence tokenization freeling nltk-python sentencepiece

Updated Jan 6, 2021
Jupyter Notebook

RRisto / sentencepiece_experiments

sklearn classification estonian-language sentencepiece

Updated Mar 8, 2020
Jupyter Notebook

ZJaume / escape-unk

Escape unknown symbols in SentencePiece vocabularies

natural-language-processing neural-machine-translation escaping sentencepiece

Updated Jun 25, 2024
Python

burcgokden / Sentencepiece-Tokenizer-Wrapper-for-PLDR-LLM

A framework for building a SentencePiece tokenizer from a dataset

machine-learning natural-language-processing deep-learning tensorflow tokenizer keras transformer unigram bpe sentencepiece large-language-models llm pldr-llm

Updated Oct 23, 2024
Python

sagorbrur / bengali_sentencepiece

Bengali SentencePiece Model created with wiki dump data.

tokenization sentencepiece bengali-tokenization bengali-sentencepiece

Updated Dec 28, 2019

Systemcluster / sentencepiece-model

SentencePiece model parser generated from the SentencePiece protobuf definition.

nlp tokenizer sentencepiece

Updated Oct 8, 2024
Rust

anthonywu / sentencepiece

Temp fork to provide Python 3.13 macOS wheels ahead of official project releases

sentencepiece python313

Updated Nov 8, 2024
C++

leliuga / datrin

dataset, train, inference

inference dataset flax train jax sentencepiece safetensors

Updated May 19, 2024
Python

jkrukowski / swift-sentencepiece

Use SentencePiece in Swift for tokenization and detokenization.

tokenization sentencepiece

Updated Dec 1, 2024
Swift

Sid911 / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

natural-language-processing cmake sentencepiece

Updated Oct 26, 2021
C++

ReshiAdavan / Thoth

An industry-standard tokenizer built for large-scale language models such as OpenAI's GPT series.

python rust natural-language-processing tokenizer gpt-2 sentencepiece bytepairencoding gpt-4 tiktoken llama2

Updated Jun 29, 2024
Python

FloweryK / Sentencepiece-Pretrained-Models

Pretrained models and training code for SentencePiece

pretrained sentencepiece

Updated Jul 27, 2023
Python

lingvanex-mt / models

Free and open source pre-trained translation models, including Kurdish, Samoan, Xhosa, Lao, Corsican, Cebuano, Galician, Yiddish, Swahili, and Yoruba.

Updated Nov 22, 2024

kgarg8 / NMT-RNN

NMT with RNN Models: (1) in Vanilla style, (2) with Sentencepiece, (3) using Pre-trained models from FairSeq

machine-translation pytorch rnn fairseq sentencepiece

Updated Sep 19, 2021
Python

Systemcluster / kitoken

Fast and versatile tokenizer for language models with BPE, Unigram and WordPiece tokenization. Compatible with SentencePiece, Tokenizers, Tiktoken and more.

nlp tokenizer word-segmentation unigram bpe sentencepiece

Updated Nov 24, 2024
Rust

rafael-vasconcellos / sugoi-v4-space

A huggingface space for Sugoi V4

nlp api flask machine-learning natural-language-processing translation ai deep-learning backend server-side huggingface sentencepiece sentence-piece-tokenizer ctranslate2

Updated Dec 2, 2024
Python

jayden5744 / NMT_Korean_To_English

A study of natural language processing models that translate Korean into English.

translation gpu python3 pytorch seq2seq ten seq2seq-pytorch seq2seq-attention sentencepiece

Updated May 29, 2020
Jupyter Notebook

wang1ang / SentencePieceWrapper

SentencePiece C# wrapper

wrapper csharp sentencepiece

Updated Aug 6, 2019
C++

sftblw / spm_jamo_tsv

korean sentencepiece

Updated May 16, 2020
JavaScript

smafjal / bengali_tokenizer

Bengali language Tokenizer (SentencePiece)

tokenizer bengali unsupervised-learning sentencepiece bengali-natural-language-processing bengali-tokenizer

Updated Oct 20, 2019
Python
