Anna's Archive

Shadow library search engine
From Wikipedia, the free encyclopedia


Anna's Archive is an open source search engine for shadow libraries (repositories of digital texts which are otherwise not readily accessible) that was launched by the pseudonymous Anna shortly after law enforcement efforts to shut down Z-Library in 2022. The site aggregates records from Z-Library, Sci-Hub, and Library Genesis (LibGen), among other sources. It calls itself "the largest truly open library in human history",[† 1] and has said it aims to "catalog all the books in existence" and "track humanity's progress toward making all these books easily available in digital form". It claims not to be responsible for downloads of copyrighted works, since the site indexes metadata but does not directly host any files, instead linking to third-party downloads. However, it has faced government blocks and legal action from copyright holders and publishing trade associations for engaging in large-scale copyright infringement.


Origins

Anna's Archive emerged out of the Pirate Library Mirror (PiLiMi) project, an anonymous effort to mirror shadow libraries that completed a full copy of Z-Library in September 2022.[1][2] PiLiMi acknowledged that it "deliberately violated the copyright law in most countries",[1][2] and its initial focus was on preservation rather than on making its data searchable.[3] Days after US law enforcement seized several Z-Library domains and arrested its alleged operators in November 2022, PiLiMi member Anna (also known as Anna Archivist) launched Anna's Archive, which initially displayed results from Z-Library and LibGen.[1][2][4][5]


Website and operations


Anna's Archive has been variously described as a search engine,[4] a metasearch engine,[1] and a shadow library itself.[2] The site does not itself host any files (which it claims makes it nonliable for downloads of copyrighted works), but it links to third-party downloads provided by anonymous partners,[† 1][6] as well as offering downloads through the IPFS protocol.[a][1][7] Its source code is dedicated to the public domain under the CC0 license.[† 3] It operates three mirrors under different top-level domains, currently .li, .se, and .org.[† 1]

The site's "source libraries" include LibGen, Sci-Hub, Z-Library, the Internet Archive, DuXiu, MagzDB, Nexus/STC, and HathiTrust; Open Library, WorldCat, and Google Books are listed as metadata-only sources.[† 4] Some of these datasets are already publicly accessible, while others are scraped or otherwise privately acquired for distribution.[† 4][8] They are then released in bulk[b] with torrent files so as to make them resilient to website takedowns.[† 1] As of June 2025, Anna's Archive includes 51,064,327 books and 98,551,617 papers,[† 1] and its unified list of torrents totals roughly 1.1 petabytes in size.[† 6]

A 2025 study comparing the coverage of conventional library databases to various alternatives (including scholarly search engines, other web-based databases, academic social networks, and piracy sites) found that Anna's Archive had among the most comprehensive full-text coverage, but criticized it for having an unintuitive interface.[9] In March 2025, it averaged over 650,000 daily downloads, roughly 10 times the estimated distribution of the New York Public Library.[10]

Finances

High-speed downloads on Anna's Archive are only available to users with a paid membership, while nonmembers must use slower options with browser verification to prevent abuse by bots. It describes itself as a nonprofit, claiming that membership fees and donations are mostly spent on server infrastructure and that none are personally used by the site's operators.[† 1] It awards memberships and monetary "bounties" to some volunteer contributors.[† 7]

Anna's Archive offers high-speed access to its full collection via SFTP to groups training large language models (LLMs) in exchange for large contributions of money or data.[11] It said it provided such access to about 30 companies as of January 2025, primarily based in China, including both LLM companies and data brokers.[12] DeepSeek's VL model was trained on data from the site.[13]

Motivation

Anna's Archive is a non-profit project with two goals:

1. Preservation: Backing up all knowledge and culture of humanity.

2. Access: Making this knowledge and culture available to anyone in the world.

Anna's Archive, FAQ[† 1]

Anna's Archive has said its objectives are to "catalog all the books in existence" and "track humanity's progress toward making all these books easily available in digital form".[4] It has been described as both continuing and greatly extending the ambitions of earlier shadow libraries with its vision of a "universal library" that preserves as many books as possible. It has been interpreted as part of an ascendant "culture of mistrust towards corporations, institutions, governments, and laws... that perhaps began with the financial collapse of 2008 and the Occupy Wall Street movements" which saw the rise of decentralizing technologies.[10]

Anna has justified their opposition to copyright on ethical grounds, stating that they "believe that preserving and hosting these files is morally right"[10] and that they and other shadow librarians believe that "information wants to be free".[14] They have suggested that copyright law must be reformed as a matter of national security, proposing that Western countries make legal carveouts for text and data mining so as to remain ahead in the AI arms race.[12]

Anna cites programmer and information activist Aaron Swartz as inspiring the project's collection of metadata.[† 1] The site recommends Swartz's writings as well as Stephen Witt's How Music Got Free and Michele Boldrin and David K. Levine's Against Intellectual Monopoly, which criticize existing copyright law and have been associated with the copyleft movement.[10]

Map of countries blocking Anna's Archive (legend: currently blocked)

United States

Since 2023, Anna's Archive domains have appeared in the annual Notorious Markets List of the Office of the United States Trade Representative, which highlights digital and physical markets allegedly involved in large-scale intellectual property infringement. These reports describe the site as related to Sci-Hub and LibGen.[15][16][17] In response to a request for comment by the Office on its 2023 List, the Association of American Publishers identified Anna's Archive as an infringing site, and analyzed its cryptocurrency wallets to find that it had received over $29,000 in funds as of July 2023.[18][19]

In response to a March 2024 lawsuit accusing Nvidia of training LLMs on data from a shadow library,[20] the company disputed the characterization of Anna's Archive and other repositories as "shadow libraries", despite Anna's own use of the term.[21][22][relevant?]

OCLC lawsuit

In October 2023, Anna's Archive was reported to have scraped the entirety of WorldCat, the world's largest bibliographic database, and made its proprietary data freely available, which Anna described as "a major milestone in mapping out all the books in the world".[8] OCLC, WorldCat's maintainer, responded by suing the site in an Ohio federal court in January 2024, claiming the scrape was achieved through cyberattacks on its servers.[6] It sought over $5 million in total damages and an injunction to stop Anna's Archive from scraping or sharing its data.[23] OCLC clarified that although its internal systems were not breached, it believes the site's actions legally constitute hacking.[24] The only named defendant denied any involvement with the scrape or Anna's Archive.[25] Technology writer Glyn Moody criticized the suit as "costly and pointless", saying it went against OCLC's stated mission of making information accessible.[26]

In July 2024, in the wake of the suit, the .org mirror of Anna's Archive was replaced with a new .gs mirror to avoid falling under US jurisdiction; however, soon afterward, the .gs domain was suspended and the mirror reverted to the original .org domain.[23][27]

In March 2025, the court deferred judgement on aspects of the case to the Supreme Court of Ohio over concerns about its legal novelty, denying both a motion for default judgement from OCLC and a motion to dismiss from the named defendant.[28] In April, OCLC reached an agreement with the named defendant to drop her from the case, focusing instead on obtaining judgement against the site itself.[29]

Meta lawsuit

In February 2025, internal emails unsealed in a California lawsuit accusing Meta of training its AI models on copyrighted works revealed that the company had downloaded over 81 terabytes of data through Anna's Archive torrents, in addition to data it had previously downloaded from LibGen. The plaintiffs in the case, a group of authors including Richard Kadrey, Sarah Silverman, and Christopher Golden, alleged that CEO Mark Zuckerberg personally authorized the use of shadow libraries. The company had argued that its use of copyrighted data in AI training constituted fair use.[30][31][32]

In June 2025, the court partially ruled in favor of Meta, finding that the AI training was "highly transformative" and therefore fair use. Vince Chhabria, the judge in the case, emphasized that the ruling did not mean that Meta's actions were in fact legitimate, but said that the plaintiffs failed to develop strong arguments. He identified "market dilution" as a convincing argument for financial harm not pursued by the plaintiffs: the idea that "by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works".[33][34][35]

Italy

In January 2024, Italy's national communications agency ordered major internet service providers (ISPs) in the country to block Anna's Archive due to a copyright complaint by the Italian Publishers Association.[36] An investigation by the Digital Services Directorate confirmed the presence of copyrighted works on the site and found that some of its servers were likely owned by a Ukrainian hosting provider, but failed to uncover the identity of its operators.[2]

Netherlands

In March 2024, the Rotterdam District Court ordered major ISPs in the Netherlands to block Anna's Archive and LibGen due to a request by advocacy group BREIN. The order was "dynamic", meaning that if the blocked sites changed domains or IP addresses in the future, ISPs would be obligated to update their blocks.[37][38][39][40]

United Kingdom

In December 2024, the UK Publishers Association won an order from the High Court of Justice requiring major ISPs to block Anna's Archive and other copyright-infringing sites, extending a list of sites blocked since 2015 under section 97A of the Copyright, Designs and Patents Act. The Association said it identified over one million records of copyrighted books and journal articles on Anna's Archive domains.[41][42]

Belgium

In July 2025, a group of organizations representing Belgian authors and copyright holders successfully petitioned the Commercial Court to issue judgement against five alleged piracy sites: Anna's Archive, LibGen, Sci-Hub, Z-Library, and OceanofPDF. The petitioners included the Association of Belgian Publishers (ADEB), the Civil Society of Multimedia Authors (La Scam), the Cooperative for the Perception and Compensation of Belgian Publishers (Copiebel), Librius, the Educational and Scientific Publishers Group (GEWU), the General Publishers Group (GAU), and the Flemish Authors' Association (VAV). The judge ordered FPS Economy's anti-piracy service to block the sites in the interim. In the event of noncompliance, the sites face fines of up to 500,000 euros.[43][44][45][46]

Other issues

As of June 2024, Anna's Archive was among the ten domains most frequently reported to Google Search in DMCA takedown notices.[47] It has been one of the most targeted sites of Dutch anti-piracy service Link-Busters, which sends takedown notices to Google and other search engines on behalf of major publishers.[48][49][50]

In January 2025, the messaging app Telegram suspended the Anna's Archive channel for copyright infringement, despite the operators reportedly taking precautions to avoid posting infringing material on the app. Z-Library's Telegram channel was suspended the same week, and neither was notified of the action. It was speculated that the removals were linked to legal action in an Indian court.[51]


Notes

  1. According to Anna's personal blog, they no longer host IPFS themselves because they believe it is not yet suitable for their purposes.[† 2]
  2. According to a post on Anna's blog, the project's data is standardized under the custom Anna's Archive Containers format to allow for incremental releases.[† 5]

References
