MercyNews

Reddit History Preserved: New Tool Archives 2.38B Posts Offline

Hacker News · 13h ago
3 min read
Key Facts

  • ✓ The tool processes the 3.28TB Pushshift torrent containing 2.38 billion Reddit posts.
  • ✓ It generates static HTML, requiring no JavaScript or external internet connection to browse.
  • ✓ Includes a full REST API with 30+ endpoints and an MCP server for AI integration.
  • ✓ Deployment options range from a simple USB drive to a Tor hidden service.
  • ✓ The project is built using Python, PostgreSQL, Jinja2, and Docker.
  • ✓ It is released into the Public Domain on GitHub.

In This Article

  1. The Digital Time Capsule
  2. How It Works
  3. Total Ownership
  4. Advanced Capabilities
  5. Deployment Options
  6. Looking Ahead

The Digital Time Capsule

Reddit's ecosystem has undergone a seismic shift in recent years. With the effective death of the public API and the disappearance of third-party applications, access to the platform's vast repository of discussions has become increasingly restricted. The Pushshift dataset, a critical resource for researchers and archivists, has faced repeated threats of being cut off, leaving the future of Reddit's collective knowledge in jeopardy.

Now, a new open-source project offers a way around those restrictions. A developer has built a tool that transforms the entire 3.28TB Pushshift torrent of Reddit history into a fully functional archive that can be browsed offline. Once the data is downloaded, it stays with the user, unaffected by corporate decisions, API keys, or internet connectivity.

How It Works

The core function of the tool is deceptively simple yet powerful. It ingests compressed data dumps from Reddit (in .zst format), as well as archives from Voat and Ruqqus, and generates static HTML files. This approach eliminates the need for complex server infrastructure or constant internet access. Users simply open the generated index.html file in any browser to navigate through posts and comments.
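The ingest-and-render step can be pictured with a minimal sketch. It is not the project's code: it assumes the .zst dump has already been decompressed into newline-delimited JSON lines (for example with a third-party zstd library), and it uses plain string formatting where the project itself uses Jinja2 templates. The function names `render_post` and `build_index` are illustrative.

```python
import html
import json
from pathlib import Path
from typing import Iterable


def render_post(post: dict) -> str:
    """Render one post as a JavaScript-free HTML fragment."""
    # Escape every field so post content cannot inject markup or scripts.
    title = html.escape(str(post.get("title", "")))
    author = html.escape(str(post.get("author", "[deleted]")))
    body = html.escape(str(post.get("selftext", "")))
    return f"<article><h2>{title}</h2><p>by {author}</p><p>{body}</p></article>"


def build_index(lines: Iterable[str], out_dir: Path) -> Path:
    """Write index.html from newline-delimited JSON post records."""
    fragments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        fragments.append(render_post(json.loads(line)))
    out_dir.mkdir(parents=True, exist_ok=True)
    index = out_dir / "index.html"
    index.write_text(
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<title>Archive</title></head><body>"
        + "\n".join(fragments)
        + "</body></html>",
        encoding="utf-8",
    )
    return index
```

The output is a single file that opens directly in a browser. A real archive of this size would paginate across many files rather than hold every fragment in one list.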

For those requiring advanced functionality, an optional Docker stack with PostgreSQL can be deployed. This remains entirely on the user's machine, providing full-text search capabilities without external requests. The system is designed for maximum flexibility and privacy:

  • No JavaScript or external tracking
  • Works on air-gapped machines
  • Serves content over a LAN (e.g., from a Raspberry Pi)
  • Can be distributed via USB drive

"Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away."

— Project Developer

Total Ownership

The primary value proposition is data sovereignty. Once the Pushshift torrent is downloaded and processed, the user owns the data. There are no API keys to manage, no rate limits to navigate, and no Terms of Service changes that can revoke access. This is a critical development for anyone relying on Reddit data for long-term projects or research.

The tool scales efficiently. The PostgreSQL backend ensures that memory usage remains constant regardless of dataset size. While a single instance can handle tens of millions of posts, the full 2.38 billion post dataset can be managed by running multiple instances segmented by topic. This architecture makes preserving the entirety of Reddit's history a feasible task for individuals and small organizations.
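How the segmentation is actually done is not described, so the following is only one way to plan it. The sketch assumes the subreddit is the unit of "topic" and that per-subreddit post counts are known in advance; `plan_instances` is a hypothetical helper that spreads subreddits across a fixed number of instances so that no single instance carries far more posts than the others.

```python
import heapq


def plan_instances(post_counts: dict[str, int], n_instances: int) -> list[list[str]]:
    """Greedily assign each subreddit to the currently least-loaded instance."""
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    # Heap entries: (total posts assigned so far, instance index).
    heap = [(0, i) for i in range(n_instances)]
    heapq.heapify(heap)
    plan: list[list[str]] = [[] for _ in range(n_instances)]
    # Placing the largest subreddits first gives a more even split.
    for name, count in sorted(post_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        plan[idx].append(name)
        heapq.heappush(heap, (load + count, idx))
    return plan
```

For example, `plan_instances({"a": 5, "b": 3, "c": 2}, 2)` places `a` alone on one instance and `b` and `c` together on the other, giving each instance five posts.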

Advanced Capabilities

Beyond simple browsing, the archive is built for integration and automation. It ships with a full REST API featuring over 30 endpoints. Users can query posts, comments, users, subreddits, and perform aggregations directly against their local database.

Perhaps most notably, the project includes a Model Context Protocol (MCP) server with 29 tools. This allows AI applications to query the local Reddit archive directly, so analysis and data mining can run without relying on cloud services. The developer built the tool with Python, PostgreSQL, Jinja2 templates, and Docker, using Claude Code as an experiment in AI-assisted development.

Deployment Options

The tool is designed to be accessible to users with varying levels of technical expertise. It supports a wide range of hosting scenarios, from the simplest to the most secure. The available self-hosting options include:

  • USB Drive / Local Folder: The most basic setup; just open the HTML files.
  • Home Server (LAN): Serve the archive to devices on the local network from a Raspberry Pi or similar hardware.
  • Tor Hidden Service: Two commands enable access via Tor without port forwarding.
  • VPS with HTTPS: Standard web hosting for public or private access.
  • GitHub Pages: Suitable for hosting smaller archives.
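Because the output is plain static HTML, the home-server option needs nothing more than a basic file server. The sketch below uses Python's standard library and is not the project's own deployment script; the directory name and port are placeholders.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Create a static file server rooted at the generated archive folder."""
    # Binding to 0.0.0.0 makes the archive reachable from other LAN devices;
    # use 127.0.0.1 instead to keep it private to this machine.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Running `make_server("archive_output").serve_forever()` on a Raspberry Pi would let other devices on the network browse the archive at the Pi's address on port 8080, where `archive_output` stands in for wherever the HTML was generated.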

A live demo of the archiver is available online, showcasing the static browsing experience. The project code is released into the Public Domain via GitHub, encouraging widespread adoption and contribution.

Looking Ahead

The release of this archiver tool represents a significant step in the preservation of digital culture. As platforms evolve and restrict access, the ability for individuals to maintain their own archives becomes increasingly valuable. This project provides a robust, scalable, and private method for ensuring that the 2.38 billion posts that constitute Reddit's history remain accessible for future generations.

By democratizing access to massive datasets, the tool empowers researchers, developers, and enthusiasts to continue their work without fear of platform instability. It stands as a testament to the open-source community's ability to respond to centralized control with decentralized solutions.
