Antoine Boquet edited this page Mar 28, 2024 · 65 revisions

Summary

The purpose of this library is to handle multiple representations of a polytonic Greek string, namely beta code, polytonic Greek, and transliterated (or romanized) text.

The library aims to be as simple and flexible as possible. It provides both conversion presets that follow some of the main institutional guidelines and access to the underlying conversion parameters for granular control over the conversion process.

Conversion presets

The library provides a number of presets that follow some of the main conversion standards. The details of each preset, its potential limitations, and conversion examples are given below.

Beta code

Note

Only a subset of the large Thesaurus Linguae Graecae character set (1000+ characters) is implemented: the Greek Alphabet section and parts of the 'Additional Punctuation and Characters' and 'Additional Characters' sections (see the conversion chart).

Simple beta code

Use: Preset.SIMPLE_BC
Description: A simplified beta code style that aims to be easier to write than the canonical one.
Reference: see below

// Corresponding `IConversionOptions`

{ additionalChars: AdditionalChar.ALL }

// Examples

toBetaCode(
  'Ἐκεῖναι μὲν δὴ φυσικῆς μετὰ κινήσεως γάρ, ' +
  'αὕτη δὲ ἑτέρας, εἰ μηδεμία αὐτοῖς ἀρχὴ κοινή.',
  KeyType.GREEK, Preset.SIMPLE_BC
)

// Outputs: E)kei=nai me\n dh\ fusikh=s meta\ kinh/sews ga/r,
// au(/th de\ e(te/ras, ei) mhdemi/a au)toi=s a)rxh\ koinh/.

Reference

This beta code flavor essentially follows the guidelines defined by the Thesaurus Linguae Graecae, with these differences:

  1. capital letters are written as upper-case letters and small letters as lower-case ones, so no asterisk is needed to mark a capital;
  2. diacritical marks are always placed after the letter that carries them (the TLG style places them before a capital, e.g. *)E for Ἐ).
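The two differences can be illustrated with a minimal TypeScript sketch that rewrites TLG-style beta code into this simplified style. The function tlgToSimple is a hypothetical helper written for this page, not part of the library, and it only handles letters and the diacritics ) ( / \ = + |; other TLG codes (such as #2) are outside its scope.

```typescript
// Hypothetical helper (not part of the library): TLG beta code -> simple beta code.
// Only letters and the diacritics below are handled.
const DIACRITICS = ')(/\\=+|';

function tlgToSimple(tlg: string): string {
  let out = '';
  let i = 0;
  while (i < tlg.length) {
    const ch = tlg.charAt(i);
    if (ch === '*') {
      // TLG writes a capital's diacritics between '*' and the letter: collect them...
      let marks = '';
      i++;
      while (i < tlg.length && DIACRITICS.includes(tlg.charAt(i))) {
        marks += tlg.charAt(i);
        i++;
      }
      // ...then emit the letter in upper case, followed by its diacritics.
      if (i < tlg.length) {
        out += tlg.charAt(i).toUpperCase();
        i++;
      }
      out += marks;
    } else {
      // Any letter without a preceding '*' is a small letter.
      out += ch.toLowerCase();
      i++;
    }
  }
  return out;
}

console.log(tlgToSimple('*QOUKUDI/DHS')); // Qoukudi/dhs
console.log(tlgToSimple('*)EKEI=NAI'));   // E)kei=nai
```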

TLG

Tip

To input Thesaurus Linguae Graecae beta code, you must use the KeyType value TLG_BETA_CODE.
e.g. toGreek('*QOUKUDI/DHS', KeyType.TLG_BETA_CODE) // Θουκυδίδης

Use: Preset.TLG
Description: Thesaurus Linguae Graecae
Reference: https://stephanus.tlg.uci.edu/encoding/quickbeta.pdf

// Corresponding `IConversionOptions`

{
  betaCodeStyle: {
    useTLGStyle: true
  },
  additionalChars: AdditionalChar.ALL
}

// Examples

toBetaCode(
  'Ἐκεῖναι μὲν δὴ φυσικῆς μετὰ κινήσεως γάρ, ' +
  'αὕτη δὲ ἑτέρας, εἰ μηδεμία αὐτοῖς ἀρχὴ κοινή.',
  KeyType.GREEK, Preset.TLG
)

// Outputs: *)EKEI=NAI ME\N DH\ FUSIKH=S META\ KINH/SEWS GA/R,
// AU(/TH DE\ E(TE/RAS, EI) MHDEMI/A AU)TOI=S A)RXH\ KOINH/.

Transliteration

ALA-LC

Tip

See ALA-LC (modern) for Modern Greek.

Note

The current implementation doesn't:

  • support rules that cannot be derived from a predictable pattern, namely:
    1. adding transliterated rough breathings ('h') when they are not explicitly marked (as in all-caps strings);
    2. removing iota adscripts (which are generally not distinguished from 'Greek Small Letter Iota');
  • transliterate Greek numerals (planned for v0.15, see #5).

Use: Preset.ALA_LC
Description (scope): American Library Association – Library of Congress (Ancient and Medieval Greek, before 1454)
Reference: https://www.loc.gov/catdir/cpso/romanization/greek.pdf

// Corresponding `IConversionOptions`

{
  removeDiacritics: true,
  transliterationStyle: {
    gammaNasal_n: Preset.ALA_LC,
    rho_rh: true,
    upsilon_y: true,
    lunatesigma_s: true
  },
  additionalChars: [
    AdditionalChar.DIGAMMA,
    AdditionalChar.ARCHAIC_KOPPA,
    AdditionalChar.LUNATE_SIGMA
  ]
}

// Examples

toTransliteration(
  'Ὧν ἡ σοφία παρασκευάζεται εἰς τὴν τοῦ ὅλου βίου ' +
  'μακαριότητα πολὺ μέγιστόν ἐστιν ἡ τῆς φιλίας κτῆσις.',
  KeyType.GREEK, Preset.ALA_LC
)

// Outputs: Hōn hē sophia paraskeuazetai eis tēn tou holou biou
// makariotēta poly megiston estin hē tēs philias ktēsis.

toTransliteration(
  'ἄλαϲτα δὲ ϝέργα πάθον κακὰ μηϲαμένοι',
  KeyType.GREEK, Preset.ALA_LC
)

// Outputs: alasta de werga pathon kaka mēsamenoi

ALA-LC (modern)

Note

The limitations of the ALA-LC preset for Ancient and Medieval Greek also apply to its modern variant.

Use: Preset.ALA_LC_MODERN
Description (scope): American Library Association – Library of Congress (Modern Greek, after 1453)
Reference: https://www.loc.gov/catdir/cpso/romanization/greek.pdf

// Corresponding `IConversionOptions`

{
  removeDiacritics: true,
  transliterationStyle: {
    beta_v: true,
    gammaNasal_n: Preset.ALA_LC,
    muPi_b: true,
    nuTau_d: true,
    upsilon_y: true,
    lunatesigma_s: true
  },
  additionalChars: [
    AdditionalChar.DIGAMMA,
    AdditionalChar.ARCHAIC_KOPPA,
    AdditionalChar.LUNATE_SIGMA
  ]
}

// Examples

toTransliteration(
  'Ὧν ἡ σοφία παρασκευάζεται εἰς τὴν τοῦ ὅλου βίου ' +
  'μακαριότητα πολὺ μέγιστόν ἐστιν ἡ τῆς φιλίας κτῆσις.',
  KeyType.GREEK, Preset.ALA_LC_MODERN
)

// Outputs: Hōn hē sophia paraskeuazetai eis tēn tou holou viou
// makariotēta poly megiston estin hē tēs philias ktēsis.

toTransliteration(
  'Λασκαρίνα Μπουμπουλίνα',
  KeyType.GREEK, Preset.ALA_LC_MODERN
)

// Outputs: Laskarina Boumpoulina

BNF (adapted)

Tip

You should use the ISO 843 (1997) preset for Modern Greek.

Important

This implementation uses the alternative forms for Ancient Greek (see reference, rule 2, n. 1). While the reference defines both an 'ISO form' and a 'reference form', this implementation returns a single form.

Note

The current implementation doesn't support rules numbered 4.1.1., 4.1.2., 4.3. n. 4 & 7.

Use: Preset.BNF_ADAPTED
Description (scope): Bibliothèque nationale de France, adapted from the ISO 843 (1997) standard with particular attention to special cases (Ancient Greek)
Reference: https://kitcat.bnf.fr/consignes-catalogage/translitteration-du-grec

// Corresponding `IConversionOptions`

{
  greekStyle: {
    useGreekQuestionMark: true
  },
  transliterationStyle: {
    upsilon_y: Preset.ISO
  },
  additionalChars: [
    AdditionalChar.DIGAMMA,
    AdditionalChar.YOT,
    AdditionalChar.LUNATE_SIGMA,
    AdditionalChar.STIGMA,
    AdditionalChar.KOPPA,
    AdditionalChar.SAMPI
  ]
}

// Examples

toTransliteration(
  'Ὧν ἡ σοφία παρασκευάζεται εἰς τὴν τοῦ ὅλου βίου ' +
  'μακαριότητα πολὺ μέγιστόν ἐστιν ἡ τῆς φιλίας κτῆσις.',
  KeyType.GREEK, Preset.BNF_ADAPTED
)

// Outputs: Hō̃n hē sophía paraskeuázetai eis tḕn toũ hólou bíou
// makariótēta polỳ mégistón estin hē tē̃s philías ktē̃sis.

toTransliteration(
  'ἄλαϲτα δὲ ϝέργα πάθον κακὰ μηϲαμένοι',
  KeyType.GREEK, Preset.BNF_ADAPTED
)

// Outputs: álacta dè wérga páthon kakà mēcaménoi

ISO 843 (1997)

Use: Preset.ISO
Description (scope): ISO 843 (1997) type 1 (transliteration) (Ancient and Modern Greek)
Reference: https://transliteration.eki.ee/pdf/Greek.pdf

// Corresponding `IConversionOptions`

{
  transliterationStyle: {
    setCoronisStyle: Coronis.APOSTROPHE,
    beta_v: true,
    eta_i: true,
    phi_f: true,
    upsilon_y: Preset.ISO,
    lunatesigma_s: true
  },
  additionalChars: [
    AdditionalChar.DIGAMMA,
    AdditionalChar.YOT,
    AdditionalChar.LUNATE_SIGMA
  ]
}

// Examples

toTransliteration(
  'Ὧν ἡ σοφία παρασκευάζεται εἰς τὴν τοῦ ὅλου βίου ' +
  'μακαριότητα πολὺ μέγιστόν ἐστιν ἡ τῆς φιλίας κτῆσις.',
  KeyType.GREEK, Preset.ISO
)

// Outputs: Hō̃n hī sofía paraskeuázetai eis tī̀n toũ hólou víou
// makariótīta polỳ mégistón estin hī tī̃s filías ktī̃sis.

toTransliteration(
  'ἄλαϲτα δὲ ϝέργα πάθον κακὰ μηϲαμένοι',
  KeyType.GREEK, Preset.ISO
)

// Outputs: álasta dè wérga páthon kakà mīsaménoi

SBL

Use: Preset.SBL
Description: Society of Biblical Literature (Ancient Greek)
Reference: https://archive.org/details/sblhandbookofsty0000unse_g7i4/

// Corresponding `IConversionOptions`

{
  removeDiacritics: true,
  transliterationStyle: {
    gammaNasal_n: true,
    rho_rh: true,
    upsilon_y: true
  }
}

// Examples

toTransliteration(
  'Ὧν ἡ σοφία παρασκευάζεται εἰς τὴν τοῦ ὅλου βίου ' +
  'μακαριότητα πολὺ μέγιστόν ἐστιν ἡ τῆς φιλίας κτῆσις.',
  KeyType.GREEK, Preset.SBL
)

// Outputs: Hōn hē sophia paraskeuazetai eis tēn tou holou biou
// makariotēta poly megiston estin hē tēs philias ktēsis.

toTransliteration(
  'ἄλαϲτα δὲ ϝέργα πάθον κακὰ μηϲαμένοι',
  KeyType.GREEK, Preset.SBL
)

// Outputs: alaϲta de ϝerga pathon kaka mēϲamenoi

Conversion options

The expected behavior of each conversion option is described below.

removeDiacritics

boolean Removes diacritical marks according to input type.

const style = { removeDiacritics: true }

toGreek('ánthrōpos', KeyType.TRANSLITERATION, style) // ανθρωπος
toTransliteration('εὐδαίμων', KeyType.GREEK, style) // eudaimōn
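One common way to approximate this behavior in TypeScript is Unicode decomposition: NFD turns accents, breathings and the iota subscript into separate combining marks, which can then be dropped. stripDiacritics is a hypothetical helper, not the library's implementation, and the keepMacron flag is an assumption modeled on the second example, where the macron of 'ō' survives because it distinguishes omega from omicron.

```typescript
// Hypothetical approximation of removeDiacritics (not the library's code).
function stripDiacritics(text: string, keepMacron: boolean): string {
  return text
    .normalize('NFD') // split precomposed letters into base letter + combining marks
    .replace(/[\u0300-\u036f]/g, (mark) =>
      // U+0304 (macron) carries meaning in a transliteration (ē, ō), so it may be kept.
      keepMacron && mark === '\u0304' ? mark : ''
    )
    .normalize('NFC'); // recompose whatever is left
}

console.log(stripDiacritics('εὐδαίμων', false)); // ευδαιμων
console.log(stripDiacritics('eudaímōn', true));  // eudaimōn
```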

removeExtraWhitespace

boolean Removes extra whitespace, such as repeated spaces or repeated line breaks.

const style = { removeExtraWhitespace: true }
toGreek('ICHTHUS     ZŌNTŌN', KeyType.TRANSLITERATION, style) // ἸΧΘΥΣ ΖΩΝΤΩΝ
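A minimal sketch of this kind of clean-up, assuming that runs of spaces or tabs collapse to one space and runs of line breaks collapse to a single line break; collapseWhitespace is hypothetical and the library's exact rules may differ.

```typescript
// Hypothetical sketch of removeExtraWhitespace; the exact rules are assumptions.
function collapseWhitespace(text: string): string {
  return text
    .replace(/[^\S\n]+/g, ' ') // runs of spaces/tabs -> one space
    .replace(/ *\n */g, '\n')  // no spaces around line breaks
    .replace(/\n{2,}/g, '\n')  // runs of line breaks -> one line break
    .trim();
}

console.log(collapseWhitespace('ICHTHUS     ZŌNTŌN')); // ICHTHUS ZŌNTŌN
```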

betaCodeStyle

skipSanitization

boolean Prevents the deletion of non-beta-code characters during the normalization process.

const style = { betaCodeStyle: { skipSanitization: true } }
toBetaCode('*TO\\ ZW=|ON <τὸ ζῷον>', KeyType.TLG_BETA_CODE, style) // To\ zw=|on <τὸ ζῷον>

useTLGStyle

Tip

To input Thesaurus Linguae Graecae beta code, you must use the KeyType value TLG_BETA_CODE.
e.g. toGreek('*QOUKUDI/DHS', KeyType.TLG_BETA_CODE) // Θουκυδίδης

boolean Outputs Thesaurus Linguae Graecae beta code (Preset.TLG is a shortcut for this).

const style = { betaCodeStyle: { useTLGStyle: true } }

toBetaCode('Sōkrátēs', KeyType.TRANSLITERATION, style) // *SWKRA/THS
toBetaCode('O(pli/ths', KeyType.BETA_CODE, style) // *(OPLI/THS
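Going the other way, here is a sketch of what the TLG style does to simple beta code: everything is upper-cased, and a capital's diacritics move in front of it, after an asterisk. simpleToTlg is hypothetical. The assumption that only breathings, accents and the diaeresis ) ( / \ = + are moved (anything else, such as |, stays after the letter) should be checked against the TLG quick beta guide.

```typescript
// Hypothetical helper (not part of the library): simple beta code -> TLG style.
// Assumption: only these marks are moved in front of a capital letter.
const MOVABLE_MARKS = ')(/\\=+';

function simpleToTlg(simple: string): string {
  let out = '';
  let i = 0;
  while (i < simple.length) {
    const ch = simple.charAt(i);
    if (/[A-Z]/.test(ch)) {
      // Find the run of movable marks that follows the capital letter.
      let j = i + 1;
      while (j < simple.length && MOVABLE_MARKS.includes(simple.charAt(j))) j++;
      // TLG order: asterisk, diacritics, then the letter itself.
      out += '*' + simple.slice(i + 1, j) + ch;
      i = j;
    } else {
      // Everything else is simply upper-cased (marks are unaffected).
      out += ch.toUpperCase();
      i++;
    }
  }
  return out;
}

console.log(simpleToTlg('O(pli/ths'));   // *(OPLI/THS
console.log(simpleToTlg('Qoukudi/dhs')); // *QOUKUDI/DHS
```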

greekStyle

useBetaVariant

boolean Uses the typographic variant 'ϐ' [U+03D0] within a word, as found in some high-quality typesetting.

const style = { greekStyle: { useBetaVariant: true } }
toGreek('βιβλίον', KeyType.GREEK, style) // βιϐλίον

useGreekQuestionMark

boolean Outputs Greek question marks ';' [U+037E] rather than regular semicolons.

const style = { greekStyle: { useGreekQuestionMark: true } }
toGreek('poũ?', KeyType.TRANSLITERATION, style) // ποῦ; (U+037E)
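A caveat worth knowing when post-processing this output, based on a general Unicode fact that is independent of the library: U+037E GREEK QUESTION MARK has a canonical decomposition to the ordinary semicolon U+003B, and U+0387 GREEK ANO TELEIA to U+00B7 MIDDLE DOT, so any later NFC or NFD normalization silently undoes the substitution. The helper restoreGreekQuestionMark is hypothetical and only shows that the substitution has to be the very last step.

```typescript
// Both characters are canonically equivalent to generic punctuation.
console.log('\u037E'.normalize('NFC') === ';');      // true
console.log('\u0387'.normalize('NFC') === '\u00B7'); // true

// Hypothetical helper: normalize first, then substitute, and do not normalize again.
function restoreGreekQuestionMark(text: string): string {
  return text.normalize('NFC').replace(/;/g, '\u037E');
}

const question = restoreGreekQuestionMark('ποῦ;');
console.log(question.endsWith('\u037E'));                  // true
console.log(question.normalize('NFC').endsWith('\u037E')); // false: normalization reverts it
```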

useLunateSigma

Tip

Enabling the useLunateSigma option automatically adds the lunate sigma to the mapping.

boolean Outputs lunate sigmas 'ϲ, Ϲ' in place of regular sigmas 'σ, ς, Σ', final sigma included.

const style = { greekStyle: { useLunateSigma: true } }

toGreek('hágios', KeyType.TRANSLITERATION, style) // ἅγιοϲ
toGreek('ἅγιος', KeyType.GREEK, style) // ἅγιοϲ
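Conceptually, the substitution is a character-for-character replacement in which medial and final sigma collapse into the same lunate form. A hypothetical sketch (toLunateSigma is not a library function):

```typescript
// Hypothetical sketch of useLunateSigma (not the library's code).
const LUNATE: Record<string, string> = {
  'σ': '\u03F2', // ϲ GREEK LUNATE SIGMA SYMBOL
  'ς': '\u03F2', // final sigma maps to the same lunate form
  'Σ': '\u03F9', // Ϲ GREEK CAPITAL LUNATE SIGMA SYMBOL
};

function toLunateSigma(text: string): string {
  return text.replace(/[σςΣ]/g, (sigma) => LUNATE[sigma] ?? sigma);
}

console.log(toLunateSigma('ἅγιος')); // ἅγιοϲ
```

Because σ and ς both map to ϲ, the reverse conversion cannot be a plain lookup: it has to decide from the position in the word whether to restore σ or ς.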

useMonotonicOrthography

boolean Outputs monotonic accents (tonos, diaeresis) only.

const style = { greekStyle: { useMonotonicOrthography: true } }

toGreek('kalòs ka̓gathós', KeyType.TRANSLITERATION, style) // καλος καγαθός
toGreek('Ἄϊδα', KeyType.GREEK, style) // Άϊδα
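A rough sketch of the transformation on Greek input, modeled on the examples above, where breathings and the grave accent disappear while the acute and the diaeresis survive. toMonotonic is hypothetical; dropping the iota subscript and turning the perispomenon into an acute are assumptions, and monotonic rules such as leaving most monosyllables unaccented are not modeled.

```typescript
// Hypothetical approximation of useMonotonicOrthography (not the library's code).
function toMonotonic(text: string): string {
  return text
    .normalize('NFD')
    // Drop breathings (U+0313, U+0314), grave (U+0300) and iota subscript (U+0345).
    .replace(/[\u0313\u0314\u0300\u0345]/g, '')
    // Assumption: the perispomenon (U+0342) becomes an acute, i.e. a tonos.
    .replace(/\u0342/g, '\u0301')
    // The acute (U+0301) and the diaeresis (U+0308) are kept; recompose.
    .normalize('NFC');
}

console.log(toMonotonic('Ἄϊδα'));          // Άϊδα
console.log(toMonotonic('καλὸς κἀγαθός')); // καλος καγαθός
```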

transliterationStyle

setCoronisStyle

Coronis (defaults to Coronis.PSILI) Takes a value of the Coronis enum: PSILI | APOSTROPHE | NO.

const apostrophe = { transliterationStyle: { setCoronisStyle: Coronis.APOSTROPHE } }
const disableCoronis = { transliterationStyle: { setCoronisStyle: Coronis.NO } }

toTransliteration('κἀγώ', KeyType.GREEK) // ka̓gṓ
toTransliteration('κἀγώ', KeyType.GREEK, apostrophe) // ka’gṓ
toTransliteration('κἀγώ', KeyType.GREEK, disableCoronis) // kagṓ
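Assuming, as in the default output above, that the coronis is carried by a combining comma above (U+0313), the three styles amount to keeping that mark, replacing it with a right single quotation mark (U+2019), or dropping it. The enum CoronisStyle and the function renderCoronis are hypothetical stand-ins, not the library's Coronis enum.

```typescript
// Hypothetical stand-in for the library's Coronis enum; the string values are assumptions.
enum CoronisStyle {
  PSILI = 'PSILI',
  APOSTROPHE = 'APOSTROPHE',
  NO = 'NO',
}

// What replaces the default combining comma above (U+0313) in each style.
const CORONIS_MARK: Record<CoronisStyle, string> = {
  [CoronisStyle.PSILI]: '\u0313',      // keep the combining mark: ka̓gṓ
  [CoronisStyle.APOSTROPHE]: '\u2019', // right single quotation mark: ka’gṓ
  [CoronisStyle.NO]: '',               // drop it: kagṓ
};

function renderCoronis(translit: string, style: CoronisStyle): string {
  return translit.replace(/\u0313/g, CORONIS_MARK[style]);
}

console.log(renderCoronis('ka\u0313gṓ', CoronisStyle.APOSTROPHE)); // ka’gṓ
```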

useCxOverMacron

Warning

This option also affects the input. When converting a transliterated string to another representation, you must either write it according to the rule described below or perform a self-conversion first.

boolean Alters the mapping so that letters with a macron (such as the long vowels eta and omega) are written with a circumflex instead.

const style = { transliterationStyle: { useCxOverMacron: true } }

toTransliteration('Ὁπλίτης', KeyType.GREEK, style) // Hoplítês
toTransliteration('Hoplítēs', KeyType.TRANSLITERATION, style) // Hoplítês

// Illustration of the warning above

toGreek('Hoplítēs', KeyType.TRANSLITERATION, style) // ✗ Ὁπλίτε̄ς
toGreek('Hoplítês', KeyType.TRANSLITERATION, style) // ✓ Ὁπλίτης
toGreek(toTransliteration('Hoplítēs', KeyType.TRANSLITERATION, style), KeyType.TRANSLITERATION, style) // ✓ Ὁπλίτης
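In Unicode terms the option swaps one combining mark for another, which a short sketch can show. The swap is unambiguous because, per the conversion chart, the perispomenon is rendered with a tilde (U+0303), leaving the circumflex (U+0302) free. Both helper names are hypothetical. The second function illustrates the warning: in this mode the circumflex, not the macron, marks eta and omega, so macron-based input has to be converted first.

```typescript
// Hypothetical helpers (not the library's code): swap the combining macron
// (U+0304) and the combining circumflex (U+0302) on a decomposed string.
function macronToCircumflex(translit: string): string {
  return translit.normalize('NFD').replace(/\u0304/g, '\u0302').normalize('NFC');
}

function circumflexToMacron(translit: string): string {
  return translit.normalize('NFD').replace(/\u0302/g, '\u0304').normalize('NFC');
}

console.log(macronToCircumflex('Hoplítēs')); // Hoplítês
console.log(circumflexToMacron('Hoplítês')); // Hoplítēs
```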

beta_v, eta_i, xi_ks, phi_f, chi_kh, upsilon_y, lunatesigma_s

Warning

These options also affect the input. When converting a transliterated string to another representation, you must either write it according to the rules described below or perform a self-conversion first.

Tip

Enabling the lunatesigma_s option automatically adds the lunate sigma to the mapping.

boolean Alters the mapping so that the letter named on the left side of the option name (beta, eta, etc.) is transliterated with the value given on its right side ('v', 'i', etc.).

const style = { transliterationStyle: { beta_v: true } }

toTransliteration('βάρϐαρος', KeyType.GREEK, style) // várvaros
toTransliteration('bárbaros', KeyType.TRANSLITERATION, style) // várvaros

// Illustration of the warning above

toGreek('bárbaros', KeyType.TRANSLITERATION, style) // ✗ bάρbαρος
toGreek('várvaros', KeyType.TRANSLITERATION, style) // ✓ βάρϐαρος
toGreek(toTransliteration('bárbaros', KeyType.TRANSLITERATION, style), KeyType.TRANSLITERATION, style) // ✓ βάρϐαρος
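The warning follows from a single table serving both directions. This simplified model (lower case only, no diphthong handling, every name hypothetical) builds the Greek-to-Latin table from the flags and derives the reader's table by inverting it: once beta_v is on, 'b' simply has no entry any more.

```typescript
// Simplified, hypothetical model of the letter options (lower case only).
type LetterFlag = 'beta_v' | 'eta_i' | 'xi_ks' | 'phi_f' | 'chi_kh' | 'upsilon_y';

// Default transliterations, taken from the conversion chart.
const DEFAULT_LETTERS: Record<string, string> = {
  'β': 'b', 'η': 'ē', 'ξ': 'x', 'φ': 'ph', 'χ': 'ch', 'υ': 'u',
};

// Each flag overrides exactly one Greek letter (see the "Modified translit." column).
const OVERRIDES: Record<LetterFlag, [string, string]> = {
  beta_v: ['β', 'v'],
  eta_i: ['η', 'ī'],
  xi_ks: ['ξ', 'ks'],
  phi_f: ['φ', 'f'],
  chi_kh: ['χ', 'kh'],
  upsilon_y: ['υ', 'y'],
};

function buildLetterMap(flags: Partial<Record<LetterFlag, boolean>>): Map<string, string> {
  const map = new Map<string, string>(Object.entries(DEFAULT_LETTERS));
  for (const flag of Object.keys(OVERRIDES) as LetterFlag[]) {
    if (flags[flag]) {
      const [greek, latin] = OVERRIDES[flag];
      map.set(greek, latin);
    }
  }
  return map;
}

// The table used to read transliterated input is the same map, inverted.
function invertMap(map: Map<string, string>): Map<string, string> {
  return new Map(Array.from(map, ([greek, latin]) => [latin, greek] as [string, string]));
}

const reader = invertMap(buildLetterMap({ beta_v: true }));
console.log(reader.get('v')); // β
console.log(reader.get('b')); // undefined: 'b' is no longer recognized
```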

gammaNasal_n

boolean Outputs 'n' rather than 'g' for a nasal gamma, i.e. a gamma followed by γ, κ, ξ or χ.

const style = { transliterationStyle: { gammaNasal_n: true } }
toTransliteration('ἄγγελος', KeyType.GREEK, style) // ángelos
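A sketch of the rule on Greek input, assuming the usual definition of a nasal gamma (γ before γ, κ, ξ or χ). To keep the effect visible, nasalGammaToN (hypothetical) applies only this rule and leaves every other letter in Greek.

```typescript
// Hypothetical sketch of the gammaNasal_n rule (not the library's code).
// Only gammas are touched; everything else stays in Greek so the effect is visible.
function nasalGammaToN(greek: string): string {
  // A gamma is nasal when the next letter is γ, κ, ξ or χ (either case).
  return greek.replace(/[γΓ](?=[γκξχΓΚΞΧ])/g, (gamma) => (gamma === 'Γ' ? 'N' : 'n'));
}

console.log(nasalGammaToN('ἄγγελος')); // ἄnγελος
console.log(nasalGammaToN('σφίγξ'));   // σφίnξ
```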

muPi_b

Tip

Best used in conjunction with beta_v, so that the letter 'b' is not ambiguous.

boolean Outputs 'b' rather than 'mp' at the beginning of a word.

const style = { transliterationStyle: { muPi_b: true } }
toTransliteration('Γεώργιος Μπαμπινιώτης', KeyType.GREEK, style) // Geṓrgios Bampiniṓtēs

nuTau_d

boolean Outputs 'd̲' [U+0064, U+0332] rather than 'nt' at the beginning of a word.

const style = { transliterationStyle: { nuTau_d: true } }
toTransliteration('Ντμίτρι', KeyType.GREEK, style) // D̲mitri
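Both muPi_b and nuTau_d rewrite a digraph only at the start of a word. The hypothetical sketch below applies them to an already transliterated string, treating a position as word-initial when it is not preceded by a letter or combining mark; the library may well apply these rules on the Greek side instead, and all-caps input such as 'MP' is not handled.

```typescript
// Hypothetical post-processing sketch for muPi_b and nuTau_d (not the library's code).
const LOW_LINE = '\u0332'; // combining low line, as in d̲

function applyWordInitialDigraphs(translit: string, muPi_b: boolean, nuTau_d: boolean): string {
  let out = translit;
  if (muPi_b) {
    // Word-initial 'mp' / 'Mp' -> 'b' / 'B'.
    out = out.replace(/(^|[^\p{L}\p{M}])([Mm])p/gu, (_match: string, before: string, m: string) =>
      before + (m === 'M' ? 'B' : 'b'));
  }
  if (nuTau_d) {
    // Word-initial 'nt' / 'Nt' -> 'd̲' / 'D̲'.
    out = out.replace(/(^|[^\p{L}\p{M}])([Nn])t/gu, (_match: string, before: string, n: string) =>
      before + (n === 'N' ? 'D' : 'd') + LOW_LINE);
  }
  return out;
}

console.log(applyWordInitialDigraphs('Geṓrgios Mpampiniṓtēs', true, false)); // Geṓrgios Bampiniṓtēs
```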

rho_rh

boolean Always outputs 'rh' for a rho at the beginning of a word or 'rrh' for a double rho.

const style = { transliterationStyle: { rho_rh: true } }

toTransliteration('*RO/DOS', KeyType.TLG_BETA_CODE, style) // Rhódos
toTransliteration('polúrrizos', KeyType.TRANSLITERATION, style) // polúrrhizos
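A similar post-processing sketch for rho_rh, again working on the transliterated string; applyRhoRh is hypothetical and all-caps input is not handled.

```typescript
// Hypothetical sketch of rho_rh on a transliteration (not the library's code).
function applyRhoRh(translit: string): string {
  return translit
    // Double rho: 'rr' -> 'rrh' unless the h is already there.
    .replace(/rr(?!h)/g, 'rrh')
    // Word-initial rho: 'r' / 'R' -> 'rh' / 'Rh' unless the h is already there.
    .replace(/(^|[^\p{L}\p{M}])([Rr])(?!h)/gu, (_match: string, before: string, r: string) =>
      before + r + 'h');
}

console.log(applyRhoRh('polúrrizos')); // polúrrhizos
console.log(applyRhoRh('Ródos'));      // Rhódos
```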

additionalChars

Note

See the additional characters section below for the list of additional characters.

AdditionalChar[] | AdditionalChar Extends the default mapping with additional characters from the AdditionalChar enum. Use AdditionalChar.ALL to enable the whole set.

toGreek('A(/GIOS3', KeyType.BETA_CODE, {
  additionalChars: AdditionalChar.LUNATE_SIGMA
}) // ἍΓΙΟϹ

toBetaCode('βασιληϝος, διϳος', KeyType.GREEK, {
  additionalChars: [AdditionalChar.DIGAMMA, AdditionalChar.YOT]
}) // basilhvos, dijos

toTransliteration('ϛ, ϟ, ϡ', KeyType.GREEK, {
  additionalChars: AdditionalChar.ALL
}) // c̄, q, s̄
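The option accepts either one value or an array, with ALL standing for the whole set. The sketch below models that with a local enum whose string values are assumptions (the library's actual enum values are not documented here) and converts only the enabled characters to the beta code given in the additional characters chart; every other character is left untouched.

```typescript
// Hypothetical local model of the additionalChars option (not the library's code).
enum AdditionalChar {
  ALL = 'ALL',
  DIGAMMA = 'DIGAMMA',
  YOT = 'YOT',
  LUNATE_SIGMA = 'LUNATE_SIGMA',
  STIGMA = 'STIGMA',
  KOPPA = 'KOPPA',
  ARCHAIC_KOPPA = 'ARCHAIC_KOPPA',
  SAMPI = 'SAMPI',
}

type Extra = Exclude<AdditionalChar, AdditionalChar.ALL>;

// Greek -> beta code pairs, taken from the additional characters chart.
const BETA_CODE: Record<Extra, Array<[string, string]>> = {
  [AdditionalChar.DIGAMMA]: [['Ϝ', 'V'], ['ϝ', 'v']],
  [AdditionalChar.YOT]: [['Ϳ', 'J'], ['ϳ', 'j']],
  [AdditionalChar.LUNATE_SIGMA]: [['Ϲ', 'S3'], ['ϲ', 's3']],
  [AdditionalChar.STIGMA]: [['Ϛ', '*#2'], ['ϛ', '#2']],
  [AdditionalChar.KOPPA]: [['Ϟ', '*#1'], ['ϟ', '#1']],
  [AdditionalChar.ARCHAIC_KOPPA]: [['Ϙ', '*#3'], ['ϙ', '#3']],
  [AdditionalChar.SAMPI]: [['Ϡ', '*#5'], ['ϡ', '#5']],
};

// Accept a single value or an array, and expand ALL to every character.
function resolveAdditionalChars(option: AdditionalChar | AdditionalChar[]): Extra[] {
  const list = Array.isArray(option) ? option : [option];
  if (list.includes(AdditionalChar.ALL)) {
    return Object.keys(BETA_CODE) as Extra[];
  }
  return list.filter((c): c is Extra => c !== AdditionalChar.ALL);
}

// Converts only the enabled additional characters; everything else is kept as-is.
function additionalCharsToBetaCode(text: string, option: AdditionalChar | AdditionalChar[]): string {
  const table = new Map<string, string>();
  for (const extra of resolveAdditionalChars(option)) {
    for (const [greek, beta] of BETA_CODE[extra]) table.set(greek, beta);
  }
  return Array.from(text, (ch) => table.get(ch) ?? ch).join('');
}

console.log(additionalCharsToBetaCode('ϝ, ϳ', [AdditionalChar.DIGAMMA, AdditionalChar.YOT])); // v, j
```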

Conversion chart

The chart below shows how each character is written in every available representation of a polytonic Greek string:

Default characters

Label Greek Beta code Transliteration Modified translit. (enabled option)
Alpha Α a A a A a
Beta Β b B b B b V v (beta_v)
Gamma Γ γ G g G g
Delta Δ δ D d D d
Epsilon Ε ε E e E e
Zeta Ζ ζ Z z Z z
Eta Η η H h Ē ē Ī ī (eta_i), Ê/Î ê/î (useCxOverMacron)
Theta Θ θ Q q Th th
Iota Ι ι I i I i
Kappa Κ κ K k K k
Lambda Λ λ L l L l
Mu Μ μ M m M m
Nu Ν ν N n N n
Xi Ξ ξ C c X x Ks ks (xi_ks)
Omicron Ο ο O o O o
Pi Π π P p P p
Rho Ρ ρ R r R(h) r(h)
Sigma Σ σ/ς S s S s
Tau Τ τ T t T t
Upsilon Υ υ U u U u Y y (upsilon_y)[^1]
Phi Φ φ F f Ph ph F f (phi_f)
Chi Χ χ X x Ch ch Kh kh (chi_kh)
Psi Ψ ψ Y y Ps ps
Omega Ω ω W w Ō ō Ô ô (useCxOverMacron)
Question mark U+037E ; ; ?
Ano teleia U+0387 · : ;
Smooth breathing U+0313 ◌̓ ) [^2]
Rough breathing U+0314 ◌̔ ( H h
Acute accent ('oxia'/'tonos') U+0301 ◌́ / U+0301 ◌́
Perispomenon U+0342 ◌͂ = U+0303 ◌̃
Grave accent ('varia') U+0300 ◌̀ \ U+0300 ◌̀
Diaeresis U+0308 ◌̈ + U+0308 ◌̈
Iota subscript U+0345 ◌ͅ | U+0327 ◌̧
Dot below U+0323 ◌̣ ? U+0323 ◌̣
Macron U+0304 ◌̄ %26 U+0304 ◌̄
Breve U+0306 ◌̆ %27 U+0306 ◌̆

[^1]: Even with upsilon_y enabled, an upsilon that is part of a diphthong is transliterated U u unless it carries a diaeresis. If upsilon_y is set to Preset.ISO, only the diphthongs 'au', 'eu' and 'ou' keep their 'u'.

[^2]: Coronides are transliterated U+0313 ◌̓ by default (see the setCoronisStyle section).
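The default rows of the chart translate almost directly into lookup tables. A hypothetical Greek-to-simple-beta-code sketch (toSimpleBetaCode is not the library's function) can decompose the input with NFD, so that each breathing, accent, diaeresis or iota subscript becomes a separate combining mark, and then map letters and marks one by one. Unicode canonical ordering places the iota subscript after the other marks, which matches writing '|' last.

```typescript
// Hypothetical Greek -> simple beta code sketch built from the default chart rows.
const LETTERS: Record<string, string> = {
  'α': 'a', 'β': 'b', 'γ': 'g', 'δ': 'd', 'ε': 'e', 'ζ': 'z', 'η': 'h', 'θ': 'q',
  'ι': 'i', 'κ': 'k', 'λ': 'l', 'μ': 'm', 'ν': 'n', 'ξ': 'c', 'ο': 'o', 'π': 'p',
  'ρ': 'r', 'σ': 's', 'ς': 's', 'τ': 't', 'υ': 'u', 'φ': 'f', 'χ': 'x', 'ψ': 'y',
  'ω': 'w',
};

const MARKS: Record<string, string> = {
  '\u0313': ')',  // smooth breathing
  '\u0314': '(',  // rough breathing
  '\u0301': '/',  // acute accent
  '\u0342': '=',  // perispomenon
  '\u0300': '\\', // grave accent
  '\u0308': '+',  // diaeresis
  '\u0345': '|',  // iota subscript
};

// Under NFD, U+0387 (ano teleia) becomes U+00B7 and U+037E becomes ';'.
const PUNCTUATION: Record<string, string> = { '\u00B7': ':' };

function toSimpleBetaCode(greek: string): string {
  let out = '';
  for (const ch of greek.normalize('NFD')) {
    const lower = ch.toLowerCase();
    const letter = LETTERS[lower];
    if (letter !== undefined) {
      // Simple beta code: a capital is just an upper-case Latin letter.
      out += ch === lower ? letter : letter.toUpperCase();
    } else {
      out += MARKS[ch] ?? PUNCTUATION[ch] ?? ch;
    }
  }
  return out;
}

console.log(toSimpleBetaCode('Ἐκεῖναι μὲν δὴ')); // E)kei=nai me\n dh\
```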

Additional characters

Note

See the additionalChars section above for the use of additional characters.

Label (AdditionalChar) Greek Beta code Transliteration Modified translit. (enabled option)
Digamma (DIGAMMA) Ϝ ϝ V v W w
Yot (YOT) Ϳ ϳ J j J j
Lunate sigma (LUNATE_SIGMA) Ϲ ϲ S3 s3 C c S s (lunatesigma_s)
Stigma (STIGMA) Ϛ ϛ *#2 #2 C̄ c̄ Ĉ ĉ (useCxOverMacron)
Koppa (KOPPA) Ϟ ϟ *#1 #1 Q q
Archaic koppa (ARCHAIC_KOPPA) Ϙ ϙ *#3 #3
Sampi (SAMPI) Ϡ ϡ *#5 #5 S̄ s̄ Ŝ ŝ (useCxOverMacron)