carsonmulligan/SequoyahGPT

Sequoyah GPT

  1. First, this is an app to make me fluent in Cherokee
  2. Second, it is an app to make others fluent in Cherokee
  3. Third, it is an app to vastly increase the amount of Cherokee speech and text on the internet

Some Ideation

Large Language Models (LLMs) pick up languages in a way loosely comparable to how children do: through exposure to large amounts of the language. By creating the Cherokee syllabary, Sequoyah, the great giver of Cherokee text, laid the foundation for LLMs to begin to read and understand Cherokee.

Imagine a language program with one human student, three student-LLM agents, and four subject-oriented teacher-LLM agents that rotate through a schedule of group classes and one-on-one sessions. The program permits only the target language, and outside the classroom doors is an open world, Farmville meets Red Dead Redemption. You can ride a horse to Appalachian rivers, where you can hunt, fish, farm, trade, lead, fight, learn, write, and speak.

  1. CherokeeLLM Speaking Companion: trained on Sequoyah's books and poems so that it can recite stories, and paired with OpenAI Whisper or Whisper-adjacent tools so that it can converse with the user like an Elder. It recites Cherokee stories and also gives the user a modern use case for the language, encouraging them to create music, poems, and films in Cherokee. It gives people the ability to learn, preserve, and grow this language.
  2. Multi-Agent Cherokee Text debate sandbox: a CherokeeLLM multi-agent group-chat sandbox built on tools such as AutoGPT, MemGPT, or AutoGen. This is a longer-term moonshot project to increase the amount of valid Cherokee on the internet 100-fold or more. Multi-agent means multiple Cherokee Text LLMs with different interests, backgrounds, and points of view engaging in conversation, debate, poetry, authorship, etc. It will take experts to judge whether the outputs are valid, meaning correct Cherokee that is also culturally in the right ballpark. One approach: train on Cherokee stories, sample at a lower temperature, perhaps have the agents recite the stories verbatim, and then have them talk through the concepts together.
  3. Cherokee_Elder_GPT -- compile stories from Cherokee elders and train the model to be the ultimate Elder. The keeper of stories, the grower of language.
  4. An interactive farming game that talks with you in Cherokee, so that you learn words for the land, the things the Cherokee based their language on, in an immersive (perhaps even AR/VR) environment. I'm imagining an open-world game -- even like Red Dead Redemption, where you walk around, farm, ride, hunt, fish, speak, listen, and write. Imagine if there were a peaceful place you could go in a virtual world to be immersed in the culture.
  5. An actual Middlebury-style (or Middlebury-actual) language school for Cherokee. A school program with a language pledge, meant for adults to attend for 6-8 weeks and become fluent in a summer. Use scholarship money and do it during a time when it's nice to be in Tahlequah.
  6. Straight up what I want here -- documentaries only in Cherokee
  7. Ah, technology to auto-recognize Cherokee and produce subtitles in Syllabary -- check if this exists, but it would be a great contribution
  8. A little ambitious, but perhaps a new type of writing system, or aid to the current Phonetics-Syllabary-English trio
  9. Cherokee freestyle rap, slam poetry, music, lofi, documentary films
  10. Straight up the Cherokee version of the Chinese Text Project (www.ctext.org)
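
Idea 7 above (subtitles in Syllabary) splits into two halves: recognizing the speech, and writing out the subtitle file. The sketch below covers only the second half, assuming some recognizer has already produced timed segments of Syllabary text. The names `srt_timestamp` and `to_srt` and the `(start, end, text)` segment layout are illustrative assumptions, not part of this repo or of any Whisper API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Turn (start_seconds, end_seconds, text) tuples into SRT file content.

    The segments are assumed to come from a speech recognizer that is
    not shown here; this function only handles the file layout.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical segment: the greeting "osiyo" spoken in the first 1.5 seconds.
    greeting = chr(0x13A3) + chr(0x13CF) + chr(0x13F2)
    print(to_srt([(0.0, 1.5, greeting)]))
```

Keeping the formatting step separate from recognition means the same function would work with whatever recognizer turns out to handle Cherokee, if one exists.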

Gathering Data

Data sources include YouTube and Cherokee GitHub repos; a resources.md is planned.
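
When pulling text from mixed sources like these, one early step is separating lines that are actually written in Syllabary from English or romanized lines. Here is a minimal sketch of that filter, assuming the text is already available as Python strings. The function names and the 0.8 threshold are illustrative choices; the code point ranges are the Unicode Cherokee and Cherokee Supplement blocks.

```python
def is_cherokee_char(ch: str) -> bool:
    """True if ch falls in the Unicode Cherokee or Cherokee Supplement block."""
    code = ord(ch)
    return 0x13A0 <= code <= 0x13FF or 0xAB70 <= code <= 0xABBF


def cherokee_ratio(text: str) -> float:
    """Fraction of the alphabetic characters in text that are Cherokee."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_cherokee_char(ch) for ch in letters) / len(letters)


def keep_cherokee_lines(lines, threshold=0.8):
    """Keep lines whose letters are mostly Syllabary (threshold is a guess)."""
    return [line for line in lines if cherokee_ratio(line) >= threshold]


if __name__ == "__main__":
    greeting = chr(0x13A3) + chr(0x13CF) + chr(0x13F2)  # "osiyo" in Syllabary
    sample = [greeting, greeting + " (Osiyo) - Hello", "Hello"]
    print(keep_cherokee_lines(sample))  # only the first line is kept
```

A filter like this says nothing about whether the Cherokee is correct; it only sorts text by script, so expert review is still needed downstream.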

Resources

GitHub Repos:

Parent Directory https://github.com/CherokeeLanguage/

  1. Pronouns https://github.com/CherokeeLanguage/BoundPronouns
  2. Audio https://github.com/CherokeeLanguage/cherokee-audio-data
  3. Dictionary https://github.com/CherokeeLanguage/RavenDictionary
  4. Text-to-speech https://github.com/CherokeeLanguage/Cherokee-TTS
  5. Dictionary II https://github.com/CherokeeLanguage/cherokeedictionary
  6. Grammar https://github.com/CherokeeLanguage/Cherokee-Grammer-Notes

YouTube Videos

Overview - Cherokee Language Technology https://www.youtube.com/watch?v=xyWlmUWwvwA&list=PPSV

Native speaker interviews:

DigitalNativeMaker:
  • finalnumberipoded https://www.youtube.com/watch?v=11-Klcgii6w
  • frogmoon https://www.youtube.com/watch?v=7gnKWA7efbQ

Sample Poem

"ᏙᎢᏳᏟᏙᎠ" (Doyutlidua):

ᏙᎢᏳᏟᏙᎠ (Spring) ᏙᏓᏯ ᎤᏗᏞᎢᏗᏍᎪᎢ, ᏂᎦᏓ ᎤᏟᏴᏓᏆ, ᏥᎦᏢᎦ ᏕᎦᏟᏯᏍᏗᏏ, ᏥᏍᏆᏍᏚᏟ ᏦᎳᏍᏗᎾᏗᏏ.

Phonetic Pronunciation: Doyutlidua udileidisgoi, Nigada utliyvdakwa, Tsigatlvg degatliyasdasi, Tsiswasduhli tsolasdinadisi.

English Translation: Spring has come, All things are blooming, I will go out and see, Butterflies flying together.

Sample Sentences

  1. ᎣᏏᏲ (Osiyo) - Hello
  2. ᏙᎾᏓᎪᎲᎢ (Donadagohvi) - Goodbye
  3. ᎠᎵᎮᎵᏍᏗ (Alihelisdi) - It is good
  4. ᏙᎯᏧᏣᎸᎢ (Dohitsutsalv'i) - Where are you going?
  5. ᎠᏂᏣᎳᎩ (Anitsalagi) - They are Cherokee
  6. ᏚᎾᏙᏢᎢ (Dunadotlv'i) - He/She/It is speaking

Documentaries

Title: Dadiwonisi (We Will Speak) (2023)
Link: https://www.imdb.com/title/tt22475220/
Trailer: https://youtu.be/Z8SjUddg6lw?si=eXU_LXabkfYt7k0w
Description: A feature-length documentary chronicling the efforts of Cherokee activists, artists, and educators fighting to save the Cherokee language.
Directors: Schon Duncan, Michael McDermit

Cherokee Syllabary Guide

The Cherokee syllabary is a set of symbols used to write the Cherokee language. It consists of 85 characters, each representing a syllable. Here’s a brief guide to get you started:

  1. Vowels: The Cherokee syllabary contains six simple vowels:
  • Ꭰ (a) as in "father"
  • Ꭱ (e) as in "they"
  • Ꭲ (i) as in "machine"
  • Ꭳ (o) as in "go"
  • Ꭴ (u) as in "flute"
  • Ꭵ (v) similar to the 'u' in "urn"
  2. Consonants: The syllabary includes consonants paired with these vowels. The consonants are:
  • g, k, h, l, m, n, qu, s, d, t, dl, tl, ts, w, and y.
  3. Consonant + Vowel Combinations: Most consonants pair with each of the six vowels to form syllables, though a few combinations (for example, mv) have no character of their own. For example:
  • Ꭶ (ga), Ꭷ (ka), Ꭸ (ge), Ꭹ (gi), Ꭺ (go), Ꭻ (gu), Ꭼ (gv)

Syllabary Phonetics

a e i o u v
ga ka ge gi go gu gv
ha he hi ho hu hv
la le li lo lu lv
ma me mi mo mu
na hna nah ne ni no nu nv
qua que qui quo quu quv
sa s se si so su sv
da ta de te di ti do du dv
dla tla tle tli tlo tlu tlv
tsa tse tsi tso tsu tsv
wa we wi wo wu wv
ya ye yi yo yu yv
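
The 85 phonetic values above appear in the same order as the Unicode Cherokee block, which starts at U+13A0 with the character for "a". That makes it possible to sketch a lookup table and a simple romanizer without typing each character by hand. The names `SYLLABARY` and `romanize` are illustrative, and the output is a plain character-by-character romanization that ignores tone, vowel length, and other pronunciation details.

```python
# Phonetic values in the order listed above (85 entries).
ROMANIZATIONS = """
a e i o u v
ga ka ge gi go gu gv
ha he hi ho hu hv
la le li lo lu lv
ma me mi mo mu
na hna nah ne ni no nu nv
qua que qui quo quu quv
sa s se si so su sv
da ta de te di ti do du dv
dla tla tle tli tlo tlu tlv
tsa tse tsi tso tsu tsv
wa we wi wo wu wv
ya ye yi yo yu yv
""".split()

# First code point of the Unicode Cherokee block (the character for "a").
CHEROKEE_BLOCK_START = 0x13A0

# Map each syllabary character to its phonetic value.
SYLLABARY = {
    chr(CHEROKEE_BLOCK_START + offset): value
    for offset, value in enumerate(ROMANIZATIONS)
}


def romanize(text: str) -> str:
    """Replace each syllabary character with its phonetic value.

    Characters outside the table (spaces, punctuation, Latin letters)
    pass through unchanged.
    """
    return "".join(SYLLABARY.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(romanize("ᎣᏏᏲ"))  # osiyo
```

Running a romanizer like this over sample text is also a quick way to check that a syllabary spelling and its printed phonetics agree with each other.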

Learning Activities

Learn to Recite the Cherokee Syllabary https://www.youtube.com/watch?v=QIMZHJ3vzjE

A printable syllabary chart https://language.cherokee.org/media/2qkmorwq/syllabary_handout.pdf

Trace the Letters of the Cherokee Syllabary https://language.cherokee.org/media/2uvhommm/united_states.pdf

Play Memory Games with Cherokee Syllabary Flashcards https://language.cherokee.org/media/de4jnp2h/flash_cards.pdf

Study to Pass a Cherokee Syllabary Test https://language.cherokee.org/media/1uzopedr/syllabary_test.pdf

Watch and Discuss the movie ‘First Language – The Race to Save Cherokee’

Other Video Resources

‘Creating a Written Alphabet for the Cherokee,’ American Experience: PBS, 1:17 https://www.pbs.org/video/american-experience-creating-written-alphabet-cherokee/

‘Investigating the Cherokee Syllabary,’ Cherokee Nation, 3:10 https://www.youtube.com/watch?v=H0OkSqsZnIw&t=6s

‘First Language – The Race to Save Cherokee,’ The Language & Life Project, 56:08 https://www.youtube.com/watch?v=e9y8fDOLsO4

‘The Cherokee Syllabary,’ The Language & Life Project, 3:14 https://www.youtube.com/watch?v=r3QKRzq5M5Y

‘We’re Still Here: The Cherokee Syllabary,’ Smithsonian NMAI (National Museum of the American Indian), 1:33 https://www.youtube.com/watch?v=rSFAq3Z7a2g

About

Documenting the Cherokee Language Learning Process
