How does dblp handle homonyms and synonyms?

Homonyms

A homonym is one of a group of words that share the same spelling but have different meanings. In the context of dblp, we call different authors with the same name homonyms. Here, "the same name" means exactly the same (Latin-1) string, taking punctuation, diacritics, and even case into account. That is, "J. Smith", "John Smith", and "Jöhn Smîth" are all considered different names, as are "O'Shea" and "O-Shea", or "Æleen" and "AEleen", or even "Gianluigi" and "GianLuigi".
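As an illustration of what "exactly the same string" implies, the following minimal Python sketch contrasts an exact comparison with a looser one that ignores case and diacritics. Both function names are hypothetical and are not part of dblp; the looser key is shown only to make clear which variants dblp does not merge.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Exact comparison: punctuation, diacritics, and case all matter."""
    return a == b


def loose_key(name: str) -> str:
    """Hypothetical looser key (NOT what dblp uses): drops combining
    marks after decomposition and ignores case."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


pairs = [
    ("J. Smith", "John Smith"),
    ("John Smith", "Jöhn Smîth"),
    ("O'Shea", "O-Shea"),
    ("Æleen", "AEleen"),
    ("Gianluigi", "GianLuigi"),
]

for a, b in pairs:
    # Under the exact rule, every pair counts as two different names.
    print(f"{a!r} vs {b!r}: exact={same_name(a, b)}, "
          f"loose={loose_key(a) == loose_key(b)}")
```

Under the exact rule, all five pairs are different names. The looser key would collapse only "John Smith"/"Jöhn Smîth" and "Gianluigi"/"GianLuigi", which shows how strict the exact rule is by comparison.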

In dblp, we try to identify and distinguish authors with the same name. Each distinct author is represented by an individual author page. Technically, each author is assigned a unique key, and authors who share a name are distinguished in our data by a unique numerical suffix appended to the name.
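The idea of a numerical suffix can be sketched as follows. This is a toy registry under stated assumptions, not dblp's actual implementation: the class name, the space separator, and the four-digit zero-padded format are all assumptions made for illustration.

```python
from collections import defaultdict


class AuthorRegistry:
    """Toy registry that hands out a distinct identifier per person,
    even when several persons share exactly the same name string."""

    def __init__(self) -> None:
        # Number of distinct persons registered so far under each exact name.
        self._count = defaultdict(int)

    def register(self, name: str) -> str:
        """Register a new person and return the name plus a numeric suffix.

        The "NNNN" suffix format is an assumption for this sketch.
        """
        self._count[name] += 1
        return f"{name} {self._count[name]:04d}"


registry = AuthorRegistry()
first = registry.register("John Smith")
second = registry.register("John Smith")   # a different person, same name
other = registry.register("john smith")    # different case, so a different name
print(first, second, other, sep="\n")
```

Because names are compared exactly, "john smith" starts its own counter rather than continuing the one for "John Smith".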

Please understand that, at the moment, the splitting of an existing dblp author page is triggered either by a request from an author who finds their publications mixed with another person's writings, or by our own strong suspicion, once we can prove it, that several persons are behind an entry. In many cases, homonyms remain undetected. If you know of such a case, you can help us by letting us know.

Synonyms

Synonyms are different words with identical or similar meanings. In dblp, there are many reasons why several author names are considered to be synonymous for a particular author: name changes, nicknames, a sporadic use of middle names, missing or abbreviated name parts, or even pseudonyms. Occasional spelling errors in the publishers' metadata complicate the matter further.

Whenever possible, we correct errors in the metadata. We also complete missing or abbreviated name parts, even if these parts are not present in the actual publication. When multiple versions of a name are frequently used on publications, we may include these names as aliases in our data set.

Due to technical limitations, dblp cannot handle names with Unicode characters beyond the Latin-1 character set. However, we sometimes provide the full Unicode name as a note on the author's page.
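To check whether a given name stays within the Latin-1 character set, a small test like the following can be used. The helper name is hypothetical; the sketch relies only on Python's built-in "latin-1" codec and says nothing about how dblp itself processes names.

```python
def fits_latin1(name: str) -> bool:
    """Return True if every character of the name is representable in Latin-1."""
    try:
        name.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


for name in ["Jöhn Smîth", "Æleen", "Paweł", "王伟"]:
    # Characters such as "ł" (U+0142) or CJK ideographs lie outside Latin-1.
    print(f"{name!r}: fits Latin-1 = {fits_latin1(name)}")
```

Names for which this check fails are the ones that would need a separate note to record their full Unicode form.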