Author name disambiguation

Process of identifying different authors referred to in the same or closely similar ways

From Wikipedia, the free encyclopedia

Author name disambiguation is the process of disambiguation and record linkage applied to the names of individual people. The process could, for example, distinguish individuals with the name "John Smith".

The author name "Li Li" might refer to a number of people, including the seven listed here.

The process may be applied to scholarly documents, where the goal is to find all mentions of the same author and cluster them together. Authors of scholarly documents often share names, which makes it hard to tell each author's work apart. Author name disambiguation therefore aims to find all publications that belong to a given author and to distinguish them from publications by other authors who share the same name.

Methods

Considerable research has been conducted into name disambiguation.[1][2][3][4][5] Typical approaches distinguish between authors using, among other things, information about the authors (such as name representation, affiliations, and email addresses) and information about the publication (such as year of publication, co-authors, and the topic of the paper). This information can be used to train a machine learning classifier that decides whether two author mentions refer to the same author.[6] Much research treats name disambiguation as a clustering problem, i.e., partitioning documents into clusters, each of which represents one author.[2][7][8] Other research treats it as a classification problem.[9] Some works construct a document graph and use the graph topology to learn document similarity.[8][10] More recently, several studies[10][11] have aimed to learn low-dimensional document representations by employing network embedding methods.[12][13]
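
As an illustration of the pairwise-comparison and clustering view described above, the following Python sketch scores pairs of author mentions and groups them into clusters. It is a minimal example under stated assumptions, not the method of any cited work: the `Mention` fields, the hand-picked weights in `same_author_score` (standing in for a trained classifier), the threshold of 0.4, and all sample records are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """One author mention on one publication (all fields are illustrative)."""
    paper_id: str
    name: str
    coauthors: frozenset
    affiliation: str
    year: int


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_author_score(m1, m2):
    """Hand-weighted stand-in for a trained pairwise classifier (assumed weights)."""
    coauthor_sim = jaccard(m1.coauthors, m2.coauthors)
    same_affiliation = 1.0 if m1.affiliation == m2.affiliation else 0.0
    close_in_time = 1.0 if abs(m1.year - m2.year) <= 5 else 0.0
    return 0.6 * coauthor_sim + 0.3 * same_affiliation + 0.1 * close_in_time


def cluster_mentions(mentions, threshold=0.4):
    """Single-link clustering: merge any pair scoring at or above the threshold."""
    parent = list(range(len(mentions)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if same_author_score(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), []).append(m.paper_id)
    return sorted(sorted(c) for c in clusters.values())


# Four fictional publications that all carry the author name "Li Li".
mentions = [
    Mention("p1", "Li Li", frozenset({"A. Chen", "B. Wu"}), "Univ X", 2010),
    Mention("p2", "Li Li", frozenset({"A. Chen"}), "Univ X", 2012),
    Mention("p3", "Li Li", frozenset({"C. Novak"}), "Institute Y", 2011),
    Mention("p4", "Li Li", frozenset({"C. Novak", "D. Ray"}), "Institute Y", 2019),
]
clusters = cluster_mentions(mentions)
print(clusters)  # [['p1', 'p2'], ['p3', 'p4']]
```

Each resulting cluster is intended to represent one author. In practice the score would typically come from a model trained on labelled pairs, and the single-link merge could be replaced by another clustering algorithm.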

Applications

Some of the ways in which authorship has been indicated for the same person

Author names can be ambiguous for several reasons. For example, an individual may publish under multiple names because of differing transliterations, misspellings, a name change due to marriage, or the use of nicknames, middle names, or initials.[14]
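
Some of these surface variations (initials, middle names, name order, and accented characters) can be reduced to a common key before any finer comparison is made. The Python sketch below is one hypothetical way to do so; the function `name_key` and its assumption that names appear either as "Given [Middle] Surname" or as "Surname, Given" are illustrative only and do not hold for all naming conventions.

```python
import unicodedata


def name_key(full_name):
    """Return (surname, first initial) as a coarse grouping key.

    Assumes 'Given [Middle] Surname' or 'Surname, Given [Middle]' order.
    """
    # Strip combining accents so that 'José' and 'Jose' agree.
    decomposed = unicodedata.normalize("NFKD", full_name)
    ascii_name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = ascii_name.replace(".", " ").lower()
    if "," in cleaned:
        surname, given = (part.strip() for part in cleaned.split(",", 1))
    else:
        tokens = cleaned.split()
        if not tokens:
            return "", ""
        surname, given = tokens[-1], " ".join(tokens[:-1])
    return surname, given[:1]


variants = ["John A. Smith", "J. Smith", "Smith, John"]
keys = {name_key(v) for v in variants}
print(keys)  # {('smith', 'j')}
```

A key of this kind deliberately over-groups: "John Smith" and "Jane Smith" receive the same key, so it only narrows down candidates and does not decide identity. It also cannot detect name changes or nicknames, which need additional evidence such as co-authors or affiliations.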

Motivations for disambiguating individuals include identifying inventors from patents and identifying researchers across different publishers, research institutions, and time periods.[15] Name disambiguation is also a cornerstone of author-centric academic search and mining systems, such as AMiner (formerly ArnetMiner).[16]

Similar issues

Author name disambiguation is only one record linkage problem in the scholarly data domain. Closely related and potentially mutually beneficial problems include organisation (affiliation) disambiguation[17] and conference or publication venue disambiguation, since data publishers often use different names or aliases for these entities.

See also

Resources

Several well-known benchmarks to evaluate author name disambiguation are listed below, each of which provides publications with some ambiguous names and their ground truths.

Source Codes

References
