Languages of science
Languages associated with scientific research
From Wikipedia, the free encyclopedia
Languages of science are vehicular languages used by one or several scientific communities for international communication. According to the science historian Michael Gordin, scientific languages are "either specific forms of a given language that are used in conducting science, or they are the set of distinct languages in which science is done."[1] These two meanings are different, since the first describes a distinct prose in a given language (i.e., scientific writing), while the second describes which languages are used in mainstream science.

Until the 19th century, classical languages—such as Latin, Classical Arabic, Sanskrit, and Classical Chinese—were commonly used across Afro-Eurasia for international scientific communication. A combination of structural factors (the emergence of nation-states in Europe, the Industrial Revolution, and the expansion of colonization) led to the global use of three European national languages: French, German, and English. Yet new languages of science, such as Russian and Italian, had started to emerge by the end of the 19th century—to the point that international scientific organizations began promoting the use of constructed languages such as Esperanto as a non-national global standard.
After the First World War, English gradually outpaced French and German; it became the leading language of science, but not the only international standard. Research in the Soviet Union (USSR) rapidly expanded in the years after the Second World War, and access to Russian journals became a major policy issue in the United States, prompting the early development of machine translation. In the last decades of the 20th century, an increasing number of scientific publications were written primarily in English, in part due to the preeminence of English-speaking scientific infrastructure, indexes, and metrics such as the Science Citation Index. Local languages remain largely relevant for science in major countries and world regions such as China, Latin America, and Indonesia. Disciplines and fields of study with a significant degree of public engagement—such as social sciences, environmental studies, and medicine—have also maintained the relevance of local languages.

The development of open science has revived the debate over linguistic diversity in science, as social and local impact has become an important objective of open science infrastructure and platforms. In 2019, 120 international research organizations cosigned the Helsinki Initiative on Multilingualism in Scholarly Communication; they also called for supporting multilingualism and the development of an "infrastructure of scholarly communication in national languages".[2] In 2021, UNESCO's Recommendation for Open Science included "linguistic diversity" as one of the core features of open science, since this diversity aims to "make multilingual scientific knowledge openly available, accessible and reusable for everyone."[3] In 2022, the Council of the European Union officially supported "initiatives to promote multilingualism" in science, such as the Helsinki Initiative.[4]
History
From classical languages to vernaculars
Until the 19th century, classical languages played an instrumental role in the diffusion of scientific knowledge across Europe, Asia, and North Africa.
In Europe, starting in the 12th century, Latin was the primary language of religion, law, and administration until the Early Modern period. It became a language of science "through its encounter with Arabic"; during the Renaissance of the 12th century, a large corpus of Arabic scholarly texts was translated into Latin, so that it would be available in the emerging network of European universities and centers of knowledge.[5] In this process, the Latin language changed and acquired the specific features of scholastic Latin through numerous lexical and even syntactic borrowings from Greek and Arabic. The use of scientific Latin persisted long after the replacement of Latin by vernacular languages in most European administrations: "Latin's status as a language of science rested on the contrast it made with the use of the vernacular in other contexts" and created "a European community of learning" entirely distinct from the local communities where the scholars lived.[6] Latin was never the sole language of science and education. Beyond local publications, vernaculars quite early attained the status of international scientific languages, which could be expected to be understood and translated across Europe. In the mid-16th century, a significant amount of printed output in France was in Italian.
In India and South Asia, Sanskrit was a leading vehicular language for science. Sanskrit was remodeled even more radically than Latin for scientific communication, as it shifted "toward ever more complex noun forms to encompass the kinds of abstractions demanded by scientific and mathematical thinking."[7] Classical Chinese held a similarly prestigious position in East Asia, being largely adopted by scientific and Buddhist communities beyond the Chinese Empire, notably in Japan and Korea.[8]
Classical languages declined throughout Eurasia during the second millennium. Sanskrit was increasingly marginalized after the 13th century.[9] Until the end of the 17th century in Europe, Latin resisted displacement by vernacular languages: although medical books in the 16th century began to use French as well, this trend was reversed after 1597, and most medical literature in France remained accessible only in Latin until the 1680s.[10] In 1670, as many books were printed in Latin as in German in the German states; in 1787, such books accounted for no more than 10% of the total.[11] At this point, Latin's decline became irreversible: since ever fewer European scholars were conversant with the language, publications using it dwindled, and there was reduced incentive to maintain linguistic training in Latin.
The emergence of scientific journals was both a symptom and a cause of the declining use of classical languages. The first two modern scientific journals were published almost simultaneously in 1665: the Journal des Sçavans in France and the Philosophical Transactions of the Royal Society in England. Both journals used the local vernacular, which "made perfect historical sense", as both the Kingdom of France and the Kingdom of England were engaged in an active policy of promoting their linguistic standard.[12]
European and auxiliary languages (1800–1920)
The gradual disuse of Latin opened an uneasy transition period, as more and more works were accessible only in local languages. Many national European languages held the potential to become a language of science within a specific research field: some scholars "took measures to learn Swedish so they could follow the work of [the Swedish chemist] Bergman and his compatriots."[13]
Language preferences and use across scientific communities were gradually consolidated into a triumvirate of dominant languages of science: French, English, and German. While each language could be expected to be understood for international scientific communication, each also followed "different functional distributions evident in various scientific fields".[14] French had been all but acknowledged as the international standard for European science in the late 18th century, and it remained "essential" throughout the 19th century.[15] German became a major scientific language during the 19th century, since it "covered portions of the physical sciences, particularly physics and chemistry, in addition to mathematics and medicine."[15] English was used largely by researchers and engineers because of the seminal contribution of English technology to the Industrial Revolution.[15]
In the years preceding the First World War, the linguistic diversity of scientific publications increased significantly. The emergence of modern nationalities and early decolonization movements created new incentives to publish scientific knowledge in one's national language.[16] Russian was one of the most successful developments as a new language of science. During the 1860s and 1870s, Russian researchers in chemistry and other physical sciences ceased publishing in German in favor of local periodicals (in Russian), following major work in adapting and creating names for scientific concepts or elements (such as chemical compounds).[17] A controversy over the meaning of Dmitri Mendeleev's periodic table contributed to acknowledging original publications in Russian in global scientific debate: the original version was deemed more authoritative than its first "imperfect" translation in German.[18]
Linguistic diversity became framed as a structural problem that ultimately limited the spread of scientific knowledge. In 1924, the linguist Roland Grubb Kent underlined that scientific communication could soon be significantly disrupted by the use of as many as "twenty" languages of science:
Today with the recrudescence of certain minor linguistic units and the increased nationalistic spirit of certain larger ones, we face a time when scientific publications of value may appear in perhaps twenty languages [and] be facing an era in which important publications will appear in Finnish, Lithuanian, Hungarian, Serbian, Irish, Turkish, Hebrew, Arabic, Hindustani, Japanese, Chinese.[19]
The definition of an auxiliary language for science became a major issue discussed in emerging international scientific institutions. On January 17, 1901, the newly established International Association of Academies created the Delegation for the Adoption of an International Auxiliary Language "with support from 310 member organizations".[20] This delegation was tasked with finding an auxiliary language that could be used for "scientific and philosophical exchanges", and it could not be any "national language".[21] In the context of increased nationalistic tensions, any of the dominant languages of science would have appeared as a partisan choice.[22] The delegation consequently had a limited set of options: these included the unlikely revival of a classical language such as Latin,[23] or a new constructed language such as Volapük, Idiom Neutral, or Esperanto.
Throughout the first part of the 20th century, Esperanto was seriously considered as a potential international language of science. As late as 1954, UNESCO passed a recommendation to promote the use of Esperanto for scientific communication.[24] In contrast with Idiom Neutral—or the simplified version of Latin, Interlingua—Esperanto was not conceived primarily as a scientific language. Yet, by the early 1900s, Esperanto was by far the most successful constructed language, with a large international community and numerous dedicated publications. Starting in 1904, the Internacia Scienca Revuo aimed to adapt Esperanto to the specific needs of scientific communication.[25] The development of a specialized technical vocabulary was a challenging task, since Esperanto's extensive derivation system made it complicated to directly import words commonly used in German, French, or English scientific publications.[26] In 1907, the Delegation for the Adoption of an International Auxiliary Language seemed close to retaining Esperanto as its preferred language. Nevertheless, significant criticism was still directed at a few remaining complexities of the language, as well as its lack of scientific purpose and technical vocabulary. Unexpectedly, the delegation supported a new variant of Esperanto, Ido, which was submitted late in the process by an unknown contributor. While this decision was framed as a compromise between the Esperantist and the anti-Esperantist factions, it ultimately disappointed all proponents of an international medium for scientific communication, and it durably harmed the adoption of constructed languages in academic circles.[27]
English, competitors, and machine translation (1920–1965)
The two world wars had a lasting impact on scientific languages. A combination of political, economic, and social factors durably weakened the triumvirate of the three main languages of science in the 19th century; this combination paved the way for the predominance of English in the latter part of the 20th century. There is ongoing debate about whether the world wars accelerated a structural tendency toward English predominance or merely created the conditions for it. Ulrich Ammon wrote that "even without the World Wars the English language community would have gained economic and, consequently, scientific superiority and, thus, preference of its language for international scientific communication."[28] By contrast, Michael Gordin emphasizes that the privileged status of English was far from settled until the 1960s.
The First World War had an immediate impact on the global use of German in academic settings.[15] For nearly a decade after this war, international scientific events boycotted German researchers. The German scientific communities had been compromised by nationalistic propaganda in favor of German science during the war, in addition to the exploitation of scientific research for war crimes. German was no longer acknowledged as a global scientific language. While the boycott did not last, its effects were long-term. In 1919, the International Research Council was created to replace the International Association of Academies, and it used only French and English as working languages.[29] In 1932, fully 98.5% of international scientific conferences admitted contributions in French, 83.5% in English, and only 60% in German.[30] At the same time, German periodicals and conferences had become increasingly local in focus, including research from non-German-speaking countries ever less frequently.[30] German never recovered its privileged status as a leading language of science in the United States; due to the lack of alternatives beyond French, American education became "increasingly monoglot" and isolationist.[31] Unaffected by the international boycott, the use of French reached "a plateau between the 1920s and 1940s"; while it did not decline, it did not profit from the marginalization of German, but instead decreased relative to the expansion of English.[15]
The rise of totalitarianism in the 1930s reinforced the status of English as the leading scientific language. In absolute terms, German publications retained some relevance, but German scientific research was structurally weakened by anti-Semitic and political purges, rejection of international collaborations, and emigration.[32] The German language was not boycotted again in international scientific conferences after the Second World War, since its use had quickly become marginal, even in Germany itself; even after the period of the occupied zone, English (in the West) and Russian (in the East) became major vehicular languages for higher education.[33]
In the two decades after the Second World War, English became the leading language of science. However, a large share of global research continued to be published in other languages, and language diversity even seemed to increase until the 1960s. Russian publications in numerous fields, especially chemistry and astronomy, had grown rapidly after the war: "in 1948, more than 33% of all technical data published in a foreign language now appeared in Russian."[34] As late as 1962, Christopher Wharton Hanson raised doubts about the future of English as the leading language in science, with Russian and Japanese rising as major languages of science, and the new decolonized states seemingly poised to favor local languages:
It seems wise to assume that in the long run the number of significant contributions to scientific knowledge by different countries will be roughly proportional to their populations, and that except where populations are very small contributions will normally be published in native languages.[35]
The expansion of Russian scientific publishing became a source of recurring tension in the United States during the early decades of the Cold War. Very few American researchers were able to read Russian, in contrast with the still-widespread familiarity with the two older languages of science, French and German. "In a 1958 survey, 49% of American scientific and technical personnel claimed they could read at least one foreign language, yet only 1.2% could handle Russian."[24] Science administrators and funders had recurring concerns about their ability to efficiently track the progress of academic research in the USSR. This ongoing anxiety became an overt crisis after the successful launch of the Sputnik 1 satellite in 1957, as the decentralized American research system seemed for a time outpaced by the efficiency of Soviet planning.
Although the Sputnik crisis was relatively brief, it had far-reaching consequences for linguistic practices in science—in particular, the development of machine translation. Research in this area emerged early: automated translation appeared as a natural extension of code-breaking, the original purpose of the first computers.[36] Leading figures in computing, such as Norbert Wiener, were initially reluctant. Nevertheless, several well-connected science administrators in the US, such as Warren Weaver and Léon Dostert, established a series of major conferences and experiments in the nascent field, out of a concern that "translation was vital to national security".[36] On January 7, 1954, Dostert coordinated the Georgetown–IBM experiment, which aimed to demonstrate that the technique was sufficiently mature, despite the significant shortcomings of computing infrastructure at the time. Some sentences from Russian scientific articles were automatically translated, using a dictionary of 250 words and six basic syntax rules.[37] It was not disclosed at the time that these sentences had been purposely selected for their suitability for automated translation. At most, Dostert argued that "scientific Russian" was easier to translate, since it was more formulaic and less grammatically diverse than everyday Russian.
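The word-for-word core of such early dictionary-based systems can be sketched in a few lines. The vocabulary below is invented for illustration; it is not the actual Georgetown–IBM lexicon, and the real experiment additionally applied its six hand-written reordering rules.

```python
# Minimal sketch of 1950s dictionary-based machine translation:
# a word-for-word lexicon, with unknown words passed through untranslated.
# The vocabulary is invented for illustration; it is not the actual
# Georgetown-IBM dictionary, which also applied six syntax rules.

LEXICON = {
    "kachestvo": "quality",
    "uglya": "of coal",
    "opredelyaetsya": "is determined",
    "kaloriynostyu": "by calorific value",
}

def translate(sentence: str) -> str:
    """Translate a transliterated Russian sentence word by word."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

print(translate("Kachestvo uglya opredelyaetsya kaloriynostyu"))
# -> quality of coal is determined by the relation... no reordering is
#    attempted, so output order simply mirrors the input order.
```

A lookup of this kind explains both the apparent success of the 1954 demonstration (on sentences chosen to fit the dictionary) and its poor generalization once the input could no longer be curated.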
Machine translation became a major priority in US federal research funding in 1956 because of an emerging arms race with Soviet researchers. While the Georgetown–IBM experiment did not have a large impact in the United States initially, it was immediately noticed in the USSR. The first articles in the field appeared in 1955; only a year later, a major conference was held that attracted 340 representatives.[38] In 1956, Léon Dostert secured significant funding with the support of the CIA, and he had enough resources to overcome the technical limitations of existing computing infrastructure. In 1957, automated translation from Russian to English could run on a vastly expanded dictionary of 24,000 words, and it could rely on hundreds of predefined syntax rules.[39] At this scale, automated translation remained costly, since it relied on numerous computer operators using thousands of punch cards.[39] Nevertheless, the quality of the output did not improve significantly: in 1964, the automated translation of the few sentences submitted during the Georgetown–IBM experiment yielded a much less readable output, since it was no longer possible to tweak the rules for a predefined corpus.[40]
English as a global standard (1965 onward)

During the 1960s and 1970s, English became not merely the majority language of science but a scientific lingua franca. The transformation had more wide-ranging consequences than the replacement of two or three main languages of science by a single language: it marked "the transition from a triumvirate that valued, at least in a limited way, the expression of identity within science, to an overwhelming emphasis on communication and thus a single vehicular language."[40] Ulrich Ammon characterizes English as an "asymmetrical lingua franca", since it is "the native tongue and the national language of the most influential segment of the global scientific community, but a foreign language for the rest of the world."[41] This paradigm is usually associated with the globalization of American and English-speaking culture in the latter part of the 20th century.[41]
No specific event accounts for the full shift, though numerous transformations highlight an accelerated conversion to English science in the later part of the 1960s. On June 11, 1965, US President Lyndon B. Johnson stated that the English language had become a lingua franca that opened "doors to scientific and technical knowledge" and whose promotion should be a "major policy" of the United States.[42] In 1969, the most prestigious collection of abstracts in chemistry in the early 20th century—the German Chemisches Zentralblatt—was discontinued. This polyglot compilation in 36 languages could no longer compete with the English-focused Chemical Abstracts, since more than 65% of publications in the field were in English.[43] By 1982, a report of the French Academy of Sciences admitted that "English is by now the international standard language of science and it could very nearly become its unique language", and that it was already the main "means of communication" in European countries with a long-standing tradition of publication in local languages, such as Germany and Italy.[44] In the European Union, the Bologna Declaration of 1999 "obliged universities throughout Europe and beyond to align their systems with that of the United Kingdom", and it created strong incentives to publish academic results in English.[45] From 1999 to 2014, the number of English-taught courses in European universities increased tenfold.[46]
Machine translation, which had been booming since 1954 thanks to Soviet-American competition, was immediately affected by the new paradigm. In 1964, the US National Science Foundation underlined that "there is no emergency in the field of translation" and that translators were easily up to the task of making foreign research accessible.[40] Funding stopped simultaneously in the United States and the Soviet Union, and machine translation did not recover from this research "winter" until the 1980s; by that time, translating scientific publications was no longer the main motivation. Research in this area was still pursued in a few countries where bilingualism was an important political and cultural issue; in Canada, for example, the METEO system was successfully established to "translate weather forecasts from English into French".[47]
English content gradually became prevalent in originally non-English journals—first as an additional language, and then as the default language. Before 1998, six leading European physics journals had published in their local languages: Acta Physica Hungarica, Anales de Física, Il Nuovo Cimento, Journal de Physique, Portugaliae Physica, and Zeitschrift für Physik. In 1998, these journals merged and became the European Physical Journal, an international journal accepting only English submissions. The same process occurred repeatedly in less prestigious publications:
The pattern has become so routine as to be almost cliché: first, a periodical publishes only in a particular ethnic language (French, German, Italian); then, it permits publication in that language and also a foreign tongue, always including English but sometimes also others; finally, the journal excludes all other languages but English and becomes purely Anglophone.[48]
Early scientific infrastructure was a leading factor in the conversion to a single vehicular language. Critical developments in applied scientific computing and information retrieval systems occurred in the United States after the 1960s.[49] The Sputnik crisis was the main incentive, since it "turned the librarians' problem of bibliographic control into a national information crisis";[50] in addition, it favored ambitious research plans such as the following:
- SCITEL—an ultimately failed proposal to create a centrally planned system of electronic publication in the early 1960s
- MEDLINE—for medicine journals
- NASA/RECON—for astronomy and engineering
By contrast with the decline of machine translation, scientific infrastructure and databases emerged as a profitable business in the 1970s. Even before the emergence of a global network such as the World Wide Web, "it was estimated in 1986 that fully 85% of the information available in worldwide networks was already in English."[51]
The predominant use of English went beyond the architecture of networks and infrastructures, and it affected the content as well. The Science Citation Index—created by Eugene Garfield in the aftermath of SCITEL—had a significant and lasting influence on the structure of global scientific publication in the last decades of the 20th century, providing its most important metrics. The journal impact factor "ultimately came to provide the metric tool needed to structure a competitive market among journals."[52] The Science Citation Index had better coverage of English-speaking journals, which gave them a stronger journal impact factor and created incentives to publish in English: "Publishing in English placed the lowest barriers toward making one's work 'detectable' to researchers."[53] Because it was convenient to deal with a monolingual corpus, Eugene Garfield called for acknowledging English as the only international language for science:
Since Current Contents has an international audience, one might say that the ideal publication would be multi-lingual, listing all titles in five languages -- one or more of which is read by most of our subscribers, including German, French, Russian and Japanese, as well as English. This is, of course, impractical since it would quadruple the size of Current Contents (…) the only reasonable solution is to publish as many contents pages in English as is economically and technically feasible. To do this we need the cooperation of publishers and authors.[54]
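The journal impact factor discussed above is, at its core, a simple two-year ratio. The sketch below illustrates the standard computation with invented counts:

```python
# Sketch of the standard two-year journal impact factor:
# citations received in year Y to items published in years Y-1 and Y-2,
# divided by the number of citable items published in those two years.
# All counts below are invented for illustration.

def impact_factor(citations: int, citable_items: int) -> float:
    """Two-year impact factor = citations / citable items."""
    return citations / citable_items

# e.g. 1,200 citations in 2024 to articles published in 2022-2023,
# which together comprised 400 citable items:
print(impact_factor(1200, 400))  # -> 3.0
```

Because the Science Citation Index covered English-language journals more thoroughly, the citation counts feeding the numerator of this ratio were systematically higher for them, producing the incentive to publish in English described above.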
Current trends
English standardization
Nearly all the scientific publications indexed by the leading commercial academic search engines are in English. As of 2022, English accounts for 95.86% of the 28,142,849 references indexed on the Web of Science platform and 84.35% of the 20,600,733 references indexed in the Scopus system.[55]
The minimal coverage of non-English languages creates a feedback loop—non-English publications can be considered less valuable because they are not indexed in international rankings and fare poorly in evaluation metrics. As many as 75,000 articles, book titles, and book reviews from Germany were excluded from Biological Abstracts between 1970 and 1996.[56] In 2009, at least 6,555 journals were published in Spanish and Portuguese on a global scale, and "only a small fraction are included in the Scopus and Web of Science indices."[57]
Criteria for inclusion in commercial databases not only favor English-language journals but also incentivize publishers to discontinue their local-language journals. They "demand that articles be in English, have abstracts in English, or at least have their references in English".[58] In 2012, the Web of Science was explicitly committed to the anglicization (and romanization) of published knowledge:
English is the universal language of science. For this reason, Thomson Reuters focuses on journals that publish full text in English, or at very least, bibliographic information in English. There are many journals covered in Web of Science that publish articles with bibliographic information in English and full text in another language. However, going forward, it is clear that the journals most important to the international research community will publish full text in English. This is especially true in the natural sciences. There are notable exceptions to this rule in the Arts & Humanities and in Social Sciences topics.[59]
This commitment to English science has a significant performative effect. The influence that commercial databases "now wield on the international stage is considerable and works very much in favor of English", since they provide a wide range of indicators of research quality.[57] They contributed to "large-scale inequality, notably between Northern and Southern countries".[60] While leading scientific publishers had initially "failed to grasp the significance of electronic publishing,"[61] they successfully pivoted to a "data analytics business" by the 2010s. Actors such as Elsevier and Springer are increasingly able to control "all aspects of the research lifecycle, from submission to publication and beyond".[62] Due to this vertical integration, commercial metrics are no longer restricted to journal article metadata, but they can include a wide range of individual and social data extracted from scientific communities.
National databases of scientific publications show that the use of English continued to expand during the 2000s and 2010s at the expense of local languages. According to a comparison of seven national databases in Europe from 2011 to 2014, in "all countries, there was a growth in the proportion of English publications".[63] In France, data from the Open Science Barometer shows that the share of publications in French shrank from 23% in 2013 to 12–16% by 2019–2020.[64]
According to Ulrich Ammon, the predominance of English has created a hierarchy and a "central-peripheral dimension" within the global scientific publication landscape, which negatively affects the reception of research published in a non-English language.[65] The exclusive use of English has discriminatory effects on scholars who are not sufficiently conversant in the language; in a survey conducted in Germany in 1991, 30% of researchers across all disciplines gave up on publishing when English was the only option.[66] In this context, the emergence of new scientific powers is no longer linked with the appearance of a new language of science, as was the case until the 1960s. China has quickly become a major player in international research, placing second after the United States in numerous rankings and disciplines.[67] Nevertheless, most of this research is published in English and abides by the linguistic norms established by commercial indexes.
The dominant position of English has also been strengthened by the "lexical deficit" accumulated during past decades by alternative languages of science; after the 1960s, "new terms were being coined in English at a much faster rate than they were being created in French."[68]
Persistence of linguistic diversity
Several languages have retained a secondary status as international languages of science, due to the extent of local scientific production or to their continued use as vehicular languages in specific contexts. These languages generally include "Chinese, French, German, Italian, Japanese, Russian, and Spanish."[65] Local languages have remained prevalent in major scientific countries: "most scientific publications are still published in Chinese in China".[69]
Empirical studies of the languages used in scientific publications have long been constrained by structural bias in the most readily accessible sources—commercial databases such as the Web of Science.[70] Studies with atypical access to large corpora not covered by global indexes have shown that multilingualism remains non-negligible, although little studied; as of 2022, there are "few examples of analyses at scale" for multilingualism in science.[71] In seven European countries with limited international reach for each local language, one third of researchers in the social sciences and the humanities publish in two or more languages; "research is international, but multilingual publishing keeps locally relevant research alive with the added potential for creating impact."[72] Because of the discrepancy between actual practices and their visibility, multilingualism has been described as a "hidden norm of academic publication".[73]
Overall, the social sciences and the humanities (SSH) have preserved more diverse linguistic practices; "while natural scientists of any linguistic background have largely shifted to English as their language of publication, social scientists and scholars of the humanities have not done so to the same extent."[74] In these disciplines, the need for global communication is balanced against their significance for local culture; "the SSH are typically collaborating with, influencing and improving culture and society. To achieve this, their scholarly publishing is partly in the native languages."[75] Nevertheless, the distinctiveness of the social sciences and the humanities in this regard was progressively reduced after 2000; by the 2010s, a large proportion of German and French articles on art and the humanities (as indexed in the Web of Science) were in English.[76] While German has been outpaced by English even in Germanic-speaking countries since the Second World War, it continues to be marginally used as a vernacular scientific language in specific disciplines or research fields (the so-called Nischenfächer or "niche disciplines").[77] Linguistic diversity is not specific to the social sciences, but its persistence may be obscured by the high prestige attached to international commercial databases; in the Earth sciences, "the proportion of English-language documents in the regional or national databases (KCI, RSCI, SciELO) was approximately 26%, whereas virtually all the documents (approximately 98%) in Scopus and WoS were in English."[78]
Beyond the general distinction between the social sciences and the natural sciences, there are finer-grained distributions of language practices. In 2018, a bibliometric analysis was performed on the publications in the social sciences and the humanities of eight European countries; it highlighted that "patterns in the language and type of SSH publications are related not only to the norms, culture, and expectations of each SSH discipline but also to each country's specific cultural and historic heritage."[79] The use of English was more prevalent in Northern Europe than in Eastern Europe, and publication in local languages remains especially significant in Poland due to a large "'local' market of academic output".[80] Local research policies may have a significant impact: a preference for international commercial databases (such as Scopus and the Web of Science) may account for a steeper decline in publications in the local language in the Czech Republic, relative to Poland.[81] Additional factors include the distribution of economic models among journals; non-commercial publications have much stronger "language diversity" than do commercial publications.[82]
Since the 2000s, the expansion of digital collections has contributed to a relative increase in linguistic diversity in academic indexes and search engines.[70] The Web of Science enhanced its regional coverage during 2005–2010, which caused the index to "increase the number of non-English papers such as Spanish papers".[83] In Portuguese research communities, there was a sharp increase in Portuguese-language papers in commercial indexes during 2007–2018, which is indicative of remaining "spaces of resilience and contestation of some hegemonic practices" and of a potential new paradigm in scientific publishing "steered towards plurilingual diversity".[84] Multilingualism as a practice and competency has also increased; as of 2022, 65% of early-career researchers in Poland had published in two or more languages, whereas only 54% of the older generations had done so.[85]
In 2022, Bianca Kramer and Cameron Neylon led a large-scale analysis of the metadata available for 122 million objects indexed with a digital object identifier (DOI) by the Crossref organization.[71] Overall, non-English publications made up "less than 20%", although this percentage could be underestimated for two reasons: a lower adoption rate for DOIs, or the use of local DOIs (for example, through the Chinese National Knowledge Infrastructure).[71] Nevertheless, multilingualism seems to have improved during the last 20 years, with a significant increase in publications in Portuguese, Spanish, and Indonesian.[71]
Machine translation
Scientific publication was the first major use case for machine translation, with early experiments dating back to 1954. Development in this area slowed after 1965, because of the increasing predominance of English, limitations in computing infrastructure, and the shortcomings of the leading approach, rule-based machine translation. By design, rule-based methods favored translation between a few major languages (English, Russian, French, and German), since a "transfer module" needed to be developed for "each pair of languages"; this led to a combinatorial explosion as more languages were considered.[86] After the 1980s, the field of machine translation was revived as it underwent a "full-scale paradigm shift": explicit rules were replaced by statistical and machine learning methods applied to large aligned corpora.[87][86] By that time, most of the demand no longer came from scientific publishing but from commercial documents such as technical and engineering manuals.[88] A second paradigm shift occurred in the 2010s with the development of deep learning methods, which can be partially trained on non-aligned corpora (i.e., "zero-shot translation"). Requiring little supervision, deep learning models make it possible to incorporate a wider diversity of languages, as well as a wider diversity of linguistic contexts within one language.[89] The results are significantly more accurate than with rule-based machine translation: after 2018, automated translation of PubMed biomedical abstracts was deemed superior to human translation for a few languages (such as Portuguese).[90] Scientific publications are an appropriate use case for neural-network translation models, since such a model works best "in restricted fields for which it has a lot of training data."[91]
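The combinatorial explosion described above can be made concrete with a simple count. A direct transfer architecture needs a separate module for each ordered pair of languages, while routing every translation through a single pivot language needs only two modules per additional language. The sketch below computes these counts only; it does not model any real system.

```python
# Module counts for two machine-translation architectures.

def transfer_modules(n_languages: int) -> int:
    """Direct transfer: one module for each ordered pair of languages."""
    return n_languages * (n_languages - 1)

def pivot_modules(n_languages: int) -> int:
    """Single pivot: one module into and one out of the pivot, per other language."""
    return 2 * (n_languages - 1)

for n in (4, 10, 24):  # 24 is roughly the number of official EU languages
    print(f"{n} languages: {transfer_modules(n)} transfer modules, "
          f"{pivot_modules(n)} with a pivot")
```

With 4 languages the gap is modest (12 versus 6 modules), but at 24 languages direct transfer requires 552 modules against 46 with a pivot, which is one reason English has often served as an intermediary.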
In 2021, there were "few in-depth studies on the efficiency of Machine Translation in social science and the humanities" since "most research in translation studies are focused on technical, commercial or law texts".[92] Uses of machine translation are especially difficult to estimate, since freely available tools such as Google Translate have become ubiquitous. "There is an emerging yet rapidly increasing need for machine translation literacy among members of the scientific research and scholarly communication communities. Yet in spite of this, there are very few resources to help these community members acquire and teach this type of literacy."[93]
In an academic setting, machine translation includes a variety of uses. Production of written translations remains constrained by a lack of accuracy and consequently efficiency, since the post-editing of an imperfect machine translation should ideally take less time than a human translation.[94] Automated translation of foreign-language text is more widespread in the context of a literature survey or "information assimilation", since the quality requirements are generally lower and a global understanding of a text is sufficient.[95] The impact of machine translation on linguistic diversity in science depends on these uses:
If machine translation for assimilation purposes makes it possible, in principle, for researchers to publish in their own language and still reach a wide audience, then machine translation for dissemination purposes could be seen to favor the opposite and to support the use of a common language for research publication.[96]
The increased use of machine translation has created concerns about "uniform multilingualism". Research in the field has largely focused on English and a few major European languages; "While we live in a multilingual world, this is paradoxically not taken into account by machine translation".[97] English has often been used as a pivot language, serving as a hidden intermediary in translation between two non-English languages.[98] Given a training corpus, probabilistic methods tend to favor the most probable translation and to rule out more unusual alternatives. "A common argument against the statistical methods in translation is that when the algorithm suggests the most probable translation, it eliminates alternative options and makes the language of the text so produced conform to well-documented modes of expression."[99] While deep learning models can deal with a wider diversity of language constructs, they can still be limited by collection bias in the original corpus; "the translation of a word can be affected by the prevailing theories or paradigms in the corpus harvested to train the AI".[92]
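The argument quoted above can be illustrated with a toy decoder: greedy argmax decoding always returns the single best-documented rendering, so rarer but equally valid alternatives never surface in the output. The candidate words and their probabilities below are invented for illustration and stand in for a model's distribution over translations of a single source word.

```python
# Toy illustration: argmax decoding collapses a distribution of valid
# translations to its single most probable member.

def argmax_translation(candidates: dict[str, float]) -> str:
    """Pick the most probable translation, discarding all alternatives."""
    return max(candidates, key=candidates.get)

# Invented probabilities for translating one hypothetical source word:
candidates = {
    "research": 0.70,     # dominant rendering in the training corpus
    "scholarship": 0.20,  # valid but less common alternative
    "inquiry": 0.10,      # rare alternative, effectively suppressed
}

print(argmax_translation(candidates))  # always "research"
```

However often the decoder is called, "scholarship" and "inquiry" never appear, which is the mechanism behind the concern that statistical translation makes output "conform to well-documented modes of expression".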
In its 2022 research assessment of open science, the Council of the European Union welcomed the "promising developments that have recently emerged in the area of automatic translation"; the council supported a more widespread use of "semi-automatic translation of scholarly publications within Europe" because of its "major potential in terms of market creation".[4]
Open science and multilingualism
Open science infrastructure
The development of open science infrastructure or "community-controlled infrastructure" has become a major policy issue in the open science movement. In the 2010s, the expansion of commercial scientific infrastructure led to broad acknowledgment of the fragility of open scholarly publishing and open archives.[100] The concept of open science infrastructure emerged in 2015 with the publication of the Principles for Open Scholarly Infrastructures. In November 2021, a UNESCO recommendation acknowledged open science infrastructure as one of the four pillars of open science, along with open science knowledge, open engagement of societal actors, and open dialogue with other knowledge systems. UNESCO called for sustained investment and funding: "open science infrastructures are often the result of community-building efforts, which are crucial for their long-term sustainability and therefore should be not-for-profit and guarantee permanent and unrestricted access to all public to the largest extent possible."[3] Examples of open science infrastructure include indexes, publishing platforms, shared databases, and computer grids.
Open infrastructures have supported linguistic diversity in science. The leading free software for scientific publishing, Open Journal Systems, is available in 50 languages;[101] it is widespread among non-commercial open-access journals.[102] A landscape study was conducted by the SPARC alliance in 2021; it shows that European open science infrastructures "provide access to a range of language content of local and international significance."[103] In 2019, leading open science infrastructures endorsed the Helsinki Initiative on Multilingualism in Scholarly Communication, and they thus committed to "protect national infrastructures for publishing locally relevant research."[2] Signatories include the Directory of Open Access Journals (DOAJ), Digital Research Infrastructure for the Arts and Humanities (DARIAH), Latindex, OpenEdition, Open Scholarly Communication in the European Research Area for Social Sciences and Humanities (OPERAS), and SPARC Europe.[104]
In contrast with commercial indexes, the DOAJ does not prescribe the use of English. As a consequence, only half of the journals indexed are published primarily in English, a sharp contrast with the overwhelming prevalence of English in commercial indexes such as the Web of Science (more than 95% in English). Six languages are represented by more than 500 journals: Spanish (2776 journals), Portuguese (1917 journals), Indonesian (1329 journals), French (993 journals), Russian (733 journals), and Italian (529 journals).[105] Most of this language diversity is due to non-commercial journals (or diamond open access); 25.7% of these publications accept contributions in Spanish, compared with only 2.4% of journals based on an article processing charge (APC).[105] In 2020–2022, "for English articles in DOAJ journals, 21% are in non-APC journals, but for articles in languages other than English, this percentage is a massive 86%."[71]
Non-English open infrastructures have experienced significant growth; as of 2022, "national repositories and databases are growing everywhere (see the databases such as Latindex in Latin America, or the new repositories in Asia, China, Russia, India)".[106] This development opens up new research opportunities for the study of multilingualism in a scientific context. It will become increasingly feasible to study the "differences between locally published research in non-English speaking contexts and English-speaking international authors".[106]
Multilingualism and social impact
Publication on open-access platforms has created new incentives for publishing in a local language. In commercial indexes, non-English publications were penalized by the lack of international reception, and they had a significantly lower impact factor.[107] Without a paywall, a local-language publication can find its specific audience among a large non-academic public who may be less proficient in English.
During the 2010s, quantitative studies began to highlight the positive impact of local languages on the reuse of open-access resources in countries and regions such as Finland, Quebec, Croatia, and Mexico.[108][109][110] A study of the Finnish platform Journal.fi shows that the audience for Finnish-language articles is significantly more diverse: "in case of the national language publications students (42%) are clearly the largest group, and besides researchers (25%), also private citizens (12%) and other experts (11%)".[108] By contrast, English-language publications attract mostly professional researchers. Because of ease of access, open science platforms in a local language can also achieve more global reach. The French-Canadian journal consortium Érudit has a primarily international audience, with less than one third of readers coming from Canada.[111]
A strong network of open science infrastructures has been developed in South America (e.g., SciELO and Redalyc) and the Iberian region; this has contributed to the resurgence of Spanish and Portuguese in international scientific communication. Regional growth may also be associated with the boom in open-access publishing. Both Portuguese and Spanish play important roles in open-access publishing, as do Brazil and Spain.[83]
Although multilingualism has been neglected or even discriminated against in commercial databases, it has been valued as central to the social impact of open science platforms and infrastructure. In 2015, Juan Pablo Alperin introduced a systematic measure of social impact that highlighted the relevance of scientific content for local communities: "By looking at a broad range of indicators of impact and reach, far beyond the typical measures of one article citing another, I argue, it is possible to gain a sense of the people that are using Latin American research, thereby opening the door for others to see how it has touched those individuals and communities."[112] In this context, new indicators for linguistic diversity have been proposed. Proposals include the PLOTE index and the Linguistic Diversity Index.[113][114] As of 2022, however, they have had "limited traction in the scholarly anglophone literature".[71] Comprehensive indicators of the local impact of research remain largely non-existent; "many aspects of research cannot be measured quantitatively, especially its sociocultural impact."[115]
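The PLOTE index and the Linguistic Diversity Index have precise definitions in the cited literature; as a hedged illustration of the general idea only, a Greenberg-style (Gini-Simpson) diversity score gives the probability that two randomly chosen publications from a corpus are written in different languages. The language shares below are invented for the example.

```python
# Illustrative sketch of a generic linguistic diversity score,
# not a reimplementation of the PLOTE index.
from collections import Counter

def language_diversity(publication_languages: list[str]) -> float:
    """Probability that two randomly drawn publications differ in language.

    0.0 means a fully monolingual corpus; values approach 1.0 as
    publications spread evenly across many languages.
    """
    counts = Counter(publication_languages)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# An English-dominated index (invented shares: 95% en, 3% es, 2% pt):
corpus = ["en"] * 95 + ["es"] * 3 + ["pt"] * 2
print(round(language_diversity(corpus), 4))  # 0.0962
```

Applied to a commercial index where English exceeds 95% of documents, such a score stays near zero, while a corpus like the DOAJ, with substantial Spanish, Portuguese, and Indonesian output, would score markedly higher.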
Policies in favor of multilingualism
A new scientific and policy debate over linguistic diversity emerged after 2015:[116] "in recent years, policies for Responsible Research and Innovation (RRI) and Open Science call for increasing access to research, interaction between science and society and public understanding of science".[117] The debate initially stemmed from wider discussions about the evaluation of open science and the limitations of commercial metrics. In 2015, the Leiden Manifesto laid out ten principles to "guide research evaluation", including a call to "protect excellence in locally relevant research".[118] Building on empirical data showing the persistence of non-English research communities in Europe, Gunnar Sivertsen theorized in 2018 the need for a balanced multilingualism "to consider all the communication purposes in all different areas of research, and all the languages needed to fulfil these purposes, in a holistic manner without exclusions or priorities."[75] Sivertsen had earlier, in 2016, contributed to the "Norwegian model" of scientific evaluation: he proposed a flat hierarchy among a few large international journals alongside a wide selection of journals that would not discriminate against local publications, and he encouraged journals in the social sciences and the humanities to favor Norwegian publications.[75]
These local initiatives developed into a new international movement in favor of multilingualism. In 2019, 120 research organizations and several hundred individual researchers cosigned the Helsinki Initiative on Multilingualism in Scholarly Communication. The initiative includes three principles:
- "Support dissemination of research results for the full benefit of the society", which implies that they should be available "in a variety of languages".
- "Protect national infrastructures for publishing locally relevant research" through specific support of the non-commercial/diamond model to "make sure not-for-profit journals and book publishers have both sufficient resources". Non-commercial journals are more likely to be published in a local language.[82]
- "Promote language diversity in research assessment, evaluation, and funding systems", in line with the third recommendation of the Leiden Manifesto.
In the aftermath of the Helsinki Initiative, multilingualism has been increasingly associated with open science. This trend was accelerated during the COVID-19 pandemic, which "saw a widespread need for multilingual scholarly communication, not only between researchers, but to enable research to reach decision-makers, professionals and citizens".[119] Multilingualism has also re-emerged as a topic of debate beyond the social sciences. In 2022, the Journal of Science Policy and Governance published a "Call to Diversify the Lingua Franca of Academic STEM Communities", which stressed that "cross-cultural solutions are necessary to prevent critical information from being missed by English-speaking researchers."[120]
In November 2021, the UNESCO Recommendation for Open Science included multilingualism as the core of its definition of open science: "For the purpose of this Recommendation, open science is defined as an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone".[3]
During the early 2020s, the European Union began to officially support language diversity in science, as a continuation of its general policies in favor of multilingualism. In December 2021, the European Commission issued an important report on the future of scientific assessment in European countries; however, this report overlooked the issue of linguistic diversity: "Multilingualism is the most notable omission".[119] In June 2022, the Council of the European Union included a detailed recommendation on the "Development of multilingualism for European scholarly publications" in its research assessment of open science. The declaration acknowledges the "important role of multilingualism in the context of science communication with society" and welcomes "initiatives to promote multilingualism, such as the Helsinki initiative on multilingualism in scholarly communication."[4] While the declaration is not binding, it invites experimentation with multilingualism "on a voluntary basis" and an assessment of the need for further action by the end of 2023.[121]