Academic Torrents

File-sharing website From Wikipedia, the free encyclopedia


Academic Torrents[1][2][3][4][5][6] is a platform that enables the sharing of research data using the BitTorrent protocol. Launched in November 2013, it is operated by a U.S.-based 501(c)(3) non-profit organization.[7][8] Similar to LOCKSS, Academic Torrents focuses on providing open access to research materials and supporting reproducibility in scientific studies. It does so by "offering researchers the opportunity to distribute the hosting of their papers and datasets to authors and readers, providing easy access to scholarly works and simultaneously backing them up on computers around the world."[9][10]


Mission and purpose

Academic Torrents aims to enhance the accessibility and preservation of research data by leveraging BitTorrent’s decentralized file-sharing technology. The platform supports researchers by reducing hosting costs, improving download speeds, and ensuring data redundancy across global networks. Its mission aligns with promoting open science and reproducible research, allowing academics to share datasets, papers, and other scholarly resources freely.[11]


Notable datasets


Reddit Comments and Submissions Dataset

Academic Torrents hosts a large collection of Reddit comment and submission datasets spanning June 2005 to June 2025, compiled through the Pushshift project, totaling over 3.4TB.[12][13] The dataset comprises 476 zstandard-compressed NDJSON files, including monthly submission archives such as RS_2025-06.zst (18.68 GB) and earlier files like RS_2021-06.zst (9.46 GB).[13] It supports research in social media analysis, natural language processing, and computational social science, offering a historical archive of Reddit activity.[12] Python scripts for parsing the data are available on GitHub, facilitating programmatic access.[14] Distributed via BitTorrent, the dataset ensures efficient access and long-term preservation for researchers studying online communities.[13]
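Because each archive is a zstandard-compressed NDJSON file (one JSON object per line), records can be processed one at a time without loading an entire multi-gigabyte file into memory. The sketch below shows the NDJSON half of that pipeline using only the standard library; the field names in the sample records are illustrative, not a complete Reddit schema, and the decompression step (noted in the comments) would use the third-party zstandard package.

```python
import io
import json

def iter_ndjson(lines):
    """Yield one JSON object per non-empty line (NDJSON format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical sample lines mimicking Reddit submission records.
sample = io.StringIO(
    '{"id": "1a2b3c", "subreddit": "AskDocs", "created_utc": 1623456789}\n'
    '{"id": "4d5e6f", "subreddit": "wallstreetbets", "created_utc": 1623456790}\n'
)
records = list(iter_ndjson(sample))

# For the real RS_*.zst archives, wrap the open file in a streaming
# decompressor first, e.g. with the third-party `zstandard` package:
#   reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
#   lines  = io.TextIOWrapper(reader, encoding="utf-8")
```

Streaming line by line is what makes archives like the 18.68 GB RS_2025-06.zst tractable on ordinary hardware.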

Several studies have specifically cited the dataset's availability through Academic Torrents for accessing Reddit data. For instance, Andrei (2025) analyzed hate speech trends on Reddit during Donald Trump's 2024–2025 presidential campaign, using approximately 500 GB of data from the platform to apply NLP techniques like BERT for classification and HDBSCAN for clustering targets.[15] Goyal et al. (2025) employed comments from the r/wallstreetbets subreddit, sourced via Academic Torrents, to develop sentiment-based predictive trading strategies, achieving higher returns than buy-and-hold approaches through metrics like Sentiment Volume Change.[16] Baumgartner et al. (2023) utilized the dataset to compare health-related vocabulary usage between laypeople and medical professionals on the r/AskDocs subreddit, reproducing their corpus from the bulk data available on Academic Torrents.[17] Boraske and Burns (2025) drew from the AITA subreddit data hosted on Academic Torrents to align large language models with human moral judgments, using nearly 50,000 submissions and comments to improve LLM accuracy in ethical evaluations.[18] Popoola et al. (2024) explored over 143,000 Reddit posts on computing internships, using topic modeling and sentiment analysis to identify prevalent themes like academics and career, sourced from Academic Torrents.[19]

Developing Human Connectome Project

The developing Human Connectome Project, which is related to the Human Connectome Project, distributes its data through the platform: "Researchers from three leading British institutions are using BitTorrent to share over 150 GB of unique high-resolution brain scans of unborn babies with colleagues worldwide... The researchers opted to go for the Academic Torrents tracker, which specializes in sharing research data."[20]

CrossRef metadata

The site hosts public metadata releases from Crossref, which contain more than 120 million metadata records for scholarly works, each with a DOI. This was done to allow the community to work with the entire database programmatically rather than through the Crossref API. "The sheer number of records means that, though anyone can use these records anytime, downloading them all via our APIs can be quite time-consuming. We hope this saves the research community valuable time during this crisis."[21][22]
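Working with the bulk release means iterating over a directory of compressed dump files rather than paging through API responses. The following is a minimal sketch, assuming each dump file is gzip-compressed JSON whose top-level object holds an "items" array of records (the exact file layout can differ between Crossref releases); the demo file and its contents are synthetic.

```python
import gzip
import json
import tempfile
from pathlib import Path

def iter_crossref_records(dump_dir):
    """Yield metadata records from a directory of bulk dump files.

    Assumes each *.json.gz file contains a JSON object with an
    "items" array of records, one per DOI (layout may vary by release).
    """
    for path in sorted(Path(dump_dir).glob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            yield from json.load(fh).get("items", [])

# Demo with a synthetic single-record dump file.
tmp = tempfile.mkdtemp()
with gzip.open(Path(tmp) / "0.json.gz", "wt", encoding="utf-8") as fh:
    json.dump({"items": [{"DOI": "10.1000/example",
                          "type": "journal-article"}]}, fh)

dois = [rec["DOI"] for rec in iter_crossref_records(tmp)]
```

A local scan like this is the trade-off the quotation describes: one large up-front download in exchange for unmetered, repeatable passes over the full record set.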


See also

References
