Academic Torrents

File-sharing website From Wikipedia, the free encyclopedia


Academic Torrents[1][2][3][4][5][6] is a platform that enables the sharing of research data using the BitTorrent protocol. Launched in November 2013, it is operated by a U.S.-based 501(c)(3) non-profit organization.[7][8] Similar to LOCKSS, Academic Torrents focuses on providing open access to research materials and supporting reproducibility in scientific studies. It does so by "offering researchers the opportunity to distribute the hosting of their papers and datasets to authors and readers, providing easy access to scholarly works and simultaneously backing them up on computers around the world."[9][10]


Mission and purpose

Academic Torrents aims to enhance the accessibility and preservation of research data by leveraging BitTorrent’s decentralized file-sharing technology. The platform supports researchers by reducing hosting costs, improving download speeds, and ensuring data redundancy across global networks. Its mission aligns with promoting open science and reproducible research, allowing academics to share datasets, papers, and other scholarly resources freely.[11]


Notable datasets


Reddit Comments and Submissions Dataset

Academic Torrents hosts a large collection of Reddit comment and submission datasets spanning June 2005 to June 2025, compiled through the Pushshift project, totaling over 3.4TB.[12][13] The dataset comprises 476 zstandard-compressed NDJSON files, including monthly submission archives such as RS_2025-06.zst (18.68 GB) and earlier files like RS_2021-06.zst (9.46 GB).[13] It supports research in social media analysis, natural language processing, and computational social science, offering a historical archive of Reddit activity.[12] Python scripts for parsing the data are available on GitHub, facilitating programmatic access.[14] Distributed via BitTorrent, the dataset ensures efficient access and long-term preservation for researchers studying online communities.[13]
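Because each archive is a zstandard-compressed NDJSON file (one JSON object per line), records can be processed one at a time without loading an entire multi-gigabyte file into memory. The sketch below shows the NDJSON half of that pipeline using only the standard library; the field names in the sample records are illustrative, not a complete Reddit schema, and the decompression step (noted in the comments) would use the third-party zstandard package.

```python
import io
import json

def iter_ndjson(lines):
    """Yield one JSON object per non-empty line (NDJSON format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical sample lines mimicking Reddit submission records.
sample = io.StringIO(
    '{"id": "1a2b3c", "subreddit": "AskDocs", "created_utc": 1623456789}\n'
    '{"id": "4d5e6f", "subreddit": "wallstreetbets", "created_utc": 1623456790}\n'
)
records = list(iter_ndjson(sample))

# For the real RS_*.zst archives, wrap the open file in a streaming
# decompressor first, e.g. with the third-party `zstandard` package:
#   reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
#   lines  = io.TextIOWrapper(reader, encoding="utf-8")
```

Streaming line by line is what makes archives like the 18.68 GB RS_2025-06.zst tractable on ordinary hardware.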

Several studies have specifically cited the dataset's availability through Academic Torrents for accessing Reddit data. For instance, Andrei (2025) analyzed hate speech trends on Reddit during Donald Trump's 2024–2025 presidential campaign, using approximately 500 GB of data from the platform to apply NLP techniques like BERT for classification and HDBSCAN for clustering targets.[15] Goyal et al. (2025) employed comments from the r/wallstreetbets subreddit, sourced via Academic Torrents, to develop sentiment-based predictive trading strategies, achieving higher returns than buy-and-hold approaches through metrics like Sentiment Volume Change.[16] Baumgartner et al. (2023) utilized the dataset to compare health-related vocabulary usage between laypeople and medical professionals on the r/AskDocs subreddit, reproducing their corpus from the bulk data available on Academic Torrents.[17] Boraske and Burns (2025) drew from the AITA subreddit data hosted on Academic Torrents to align large language models with human moral judgments, using nearly 50,000 submissions and comments to improve LLM accuracy in ethical evaluations.[18] Popoola et al. (2024) explored over 143,000 Reddit posts on computing internships, using topic modeling and sentiment analysis to identify prevalent themes like academics and career, sourced from Academic Torrents.[19]

Developing Human Connectome Project

The developing Human Connectome Project, which is related to the Human Connectome Project, distributes its data through the platform: "Researchers from three leading British institutions are using BitTorrent to share over 150 GB of unique high-resolution brain scans of unborn babies with colleagues worldwide... The researchers opted to go for the Academic Torrents tracker, which specializes in sharing research data."[20]

CrossRef metadata

The site hosts public metadata releases from Crossref, which contain more than 120 million metadata records for scholarly works, each with a DOI. This was done to allow the community to work with the entire database programmatically rather than through the Crossref API. "The sheer number of records means that, though anyone can use these records anytime, downloading them all via our APIs can be quite time-consuming. We hope this saves the research community valuable time during this crisis."[21][22]
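Working with the bulk release means iterating over a directory of compressed dump files rather than paging through API responses. The following is a minimal sketch, assuming each dump file is gzip-compressed JSON whose top-level object holds an "items" array of records (the exact file layout can differ between Crossref releases); the demo file and its contents are synthetic.

```python
import gzip
import json
import tempfile
from pathlib import Path

def iter_crossref_records(dump_dir):
    """Yield metadata records from a directory of bulk dump files.

    Assumes each *.json.gz file contains a JSON object with an
    "items" array of records, one per DOI (layout may vary by release).
    """
    for path in sorted(Path(dump_dir).glob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            yield from json.load(fh).get("items", [])

# Demo with a synthetic single-record dump file.
tmp = tempfile.mkdtemp()
with gzip.open(Path(tmp) / "0.json.gz", "wt", encoding="utf-8") as fh:
    json.dump({"items": [{"DOI": "10.1000/example",
                          "type": "journal-article"}]}, fh)

dois = [rec["DOI"] for rec in iter_crossref_records(tmp)]
```

A local scan like this is the trade-off the quotation describes: one large up-front download in exchange for unmetered, repeatable passes over the full record set.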


See also

References
