
Hoarder (now Karakeep)

Mark

07 Sep 2025 • 3 min read

Hoarder (now Karakeep) is a self-hosted, privacy-first archive for everything you save online: links, full pages, PDFs, images, notes, and even videos. It preserves content with full-page snapshots, extracts metadata and text (including OCR), and layers on AI-assisted tagging and summaries to make large personal or team libraries easier to manage.

It’s designed for people who want control and longevity: researchers, developers, power users, and small teams that prefer keeping data on their own hardware. With browser extensions, mobile apps, RSS ingestion, and a REST API, Karakeep centralizes capture and search while remaining deployable on a home server, NAS, VPS, or Raspberry Pi via Docker.
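To make the REST API point concrete, here is a minimal sketch of what saving a link programmatically might look like. It only builds the request and does not send it. The `/api/v1/bookmarks` path, the bearer-token header, and the `{"type": "link", "url": ...}` payload are assumptions drawn from the project's public API docs rather than from this article, so check your own instance's API reference before relying on them. The host name and API key are placeholders.

```python
import json
import urllib.request


def build_save_link_request(base_url: str, api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to bookmark `url`.

    Assumptions, not confirmed by this article: the /api/v1/bookmarks path,
    bearer-token authentication, and the JSON payload shape.
    """
    payload = json.dumps({"type": "link", "url": url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/bookmarks",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder host and key; replace with your own instance details.
req = build_save_link_request(
    "https://karakeep.example.com", "YOUR_API_KEY", "https://example.com/article"
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload first, or to swap in another HTTP client later.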

Use Cases

Personal research library: Archive articles, docs, and PDFs with full-text search and AI summaries to quickly revisit findings.
Team knowledge base (on‑prem): Keep sensitive competitive research, security intel, or customer docs in a self-hosted system with SSO and API integrations.
Change tracking and compliance: Preserve full-page snapshots to guard against link rot and track evolving web content.
Automated collection from sources: Use RSS auto‑hoarding to continuously capture posts, then auto‑tag and route items with rules.
Field capture on mobile: Save photos of whiteboards or receipts and make them searchable with OCR.
Video archiving: Locally capture important talks or tutorials (e.g., via youtube‑dl); plan for higher storage and CPU usage.
Migrations and cleanup: Use bulk actions and exports to reorganize or move collections as needs change.

Strengths

Privacy and data ownership: Self-hosted and open-source; you control infrastructure and retention.
Full‑page archival: Monolith snapshots preserve content even if the live page changes or disappears.
Broad content support: Links, full pages, images, PDFs, notes, and optional video archiving.
AI tagging and summarization: Integrates with OpenAI or local models to reduce manual organization and surface key points.
Powerful search: Full-text search across archived pages, extracted metadata, and OCR results.
OCR pipeline: Makes scanned documents and screenshots searchable.
Automation and rules: Auto‑tag, categorize, and process at scale.
Capture everywhere: Browser extensions, web clipper, mobile apps, and RSS ingestion.
Integrations and API: REST API plus SSO support for broader workflows.
Docker‑friendly deployment: Runs on personal servers, NAS, VPS, or Raspberry Pi.
Active community: Ongoing development, guides, and community discussions.

Limitations

Self‑hosting overhead: You are responsible for provisioning, updates, backups, and security.
Storage and resource demands: Full-page snapshots, OCR, and video capture increase CPU and disk usage.
Opaque on‑disk asset layout: Some users report that assets are stored as generic files under UUID-named directories (e.g., f59b4fbf.../asset.bin), which complicates raw inspection and manual migrations (see GitHub issue).
UX/feature tradeoffs: Some users report gaps in the available views and in search behavior; expect a learning curve.
AI dependencies (optional): Cloud AI requires API keys and budget; local models add setup complexity.
Legal and policy considerations: Ensure video archiving and web capture comply with site terms and applicable laws.
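If the opaque asset layout gets in your way, one workaround is to identify files by their contents instead of their names. The sketch below walks a directory and guesses each file's type from its leading "magic" bytes. It assumes, based only on the example path above, that assets sit in files named `asset.bin`; the function names and the short signature table are illustrative and not part of Karakeep.

```python
from collections import Counter
from pathlib import Path

# A few well-known file signatures (magic bytes); extend as needed.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF8", "gif"),
]


def sniff_type(path: Path) -> str:
    """Guess a file's type from its first bytes, ignoring its name."""
    with path.open("rb") as fh:
        head = fh.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def summarize_assets(root: Path, filename: str = "asset.bin") -> Counter:
    """Count sniffed types for every file named `filename` under `root`.

    The default filename is an assumption taken from the example path
    in the text; pass a different one if your layout differs.
    """
    return Counter(sniff_type(p) for p in root.rglob(filename) if p.is_file())
```

Calling `summarize_assets(Path("/path/to/assets"))` (a placeholder path) returns a `Counter` of sniffed types, which gives a quick inventory before attempting any manual migration. Run it against a copy or a read-only mount, not the live data directory.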

Final Thoughts

Karakeep is a strong choice if you need long‑term, privacy‑respecting archival across mixed media with robust search and automation. It can replace a patchwork of bookmarkers and read‑later apps, provided you are comfortable running and maintaining a service. If you adopt it, a few practical steps help:

Start with Docker on a VPS or NAS; enable HTTPS and regular backups from day one.
Plan storage: Monolith snapshots and videos grow quickly; monitor disk usage and set retention policies.
Enable OCR selectively for large image/PDF batches to manage CPU load.
Decide on AI: Use OpenAI for convenience or configure local models if you need tighter data control.
Leverage rules for auto‑tagging and routing; periodically prune and re‑index for performance.
Track releases and community notes to close UX gaps or adopt new integrations as they mature.
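For the storage-planning tip, a small script run from cron can flag growth before the disk fills up. This is a minimal sketch using only the Python standard library; the function names and the 80% default threshold are illustrative choices, not Karakeep settings.

```python
import os
import shutil


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def over_threshold(root: str, max_fraction: float = 0.8) -> bool:
    """Return True if the volume holding `root` is fuller than `max_fraction`."""
    usage = shutil.disk_usage(root)
    return usage.used / usage.total > max_fraction
```

Point `dir_size_bytes` at the directory you mount as the data volume to see how much the archive itself occupies, and use `over_threshold` on the same path to decide when to prune or add storage.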