
The Digital Brain: A Self-Hosted Personal Knowledge Management System

I built a system where every note, document, bookmark, voice memo, and wiki page flows into a single AI-searchable knowledge base. Here's the architecture.

The Problem With Tools

I've used Notion, Evernote, Google Keep, Apple Notes, Bear, Roam Research, and half a dozen others. Every one of them solved part of the problem — and created a new silo. My knowledge was scattered across apps, devices, and cloud accounts.

I knew I had the answer to a question somewhere. I just couldn't find it. The information existed, but it wasn't accessible. And every new tool I adopted made the fragmentation worse.

So I stopped looking for the right tool and built the right system.

The Architecture: Four Layers

The Digital Brain isn't a single application. It's a system of connected services, each handling one job well, wired together through automation.

Layer 1: Capture

Everything that enters the system starts here. The goal: make capture so frictionless that there's no excuse not to save something.

  • Obsidian — Quick notes, daily logs, meeting notes, ideas. Synced across devices via Syncthing.
  • Paperless-ngx — Scanned documents, receipts, mail. Drop a file in the consumption folder; OCR handles the rest.
  • SolScribe — Voice memos and recordings. Record a memo and the transcript flows into the system automatically.
  • Linkwarden — Web pages and articles. One-click save from the browser with full-page archiving.
  • FreshRSS — Curated RSS feeds. The internet comes to me, filtered and organized.

Layer 2: Process

Raw information isn't knowledge. This layer transforms captures into something searchable and structured.

  • OCR — Paperless-ngx extracts text from scanned images and PDFs.
  • Transcription — WhisperX converts audio to text with speaker identification.
  • Summarization — Local LLM (Qwen3-8B) generates summaries of long documents and transcripts.
  • Embedding — nomic-embed-text converts text into 768-dimensional vectors for semantic search.
  • Classification — LLM suggests tags, categories, and document types for new content.

All processing happens through n8n workflows, triggered automatically when new content arrives.
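To make the embedding step concrete, here's a minimal sketch of what one of those workflows does: split long text into overlapping chunks, then embed each chunk with nomic-embed-text via a local Ollama instance. The endpoint is Ollama's default; the chunk sizes are assumptions I've picked for illustration, not values from my actual workflow.

```python
# Sketch: chunk text and embed it with nomic-embed-text via local Ollama.
# Chunk sizes are illustrative assumptions.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so each fits the embedding window."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap  # overlap preserves context across cuts
    return chunks


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Return the embedding vector for `text` (768 dimensions for nomic-embed-text)."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]
```

Each resulting vector goes into Qdrant with a payload pointing back at the source note or document.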

Layer 3: Store

Processed knowledge lives in purpose-specific stores, each optimized for a different access pattern.

  • Obsidian vault — Markdown files with backlinks, tags, and graph visualization. The human-readable layer.
  • Paperless-ngx — Document archive with full-text search, tags, and correspondents.
  • Qdrant — Vector embeddings for semantic search. The AI-readable layer.
  • BookStack — Structured wiki for reference material, runbooks, and procedures.
  • Linkwarden — Archived web pages with metadata and tags.

Layer 4: Retrieve

Knowledge you can't find is knowledge you don't have. Multiple retrieval methods ensure you can always find what you need.

  • Keyword search — Obsidian search, Paperless-ngx search, BookStack search. Fast, exact matches.
  • Semantic search — Qdrant finds results by meaning. "How do I fix Docker networking?" finds a note titled "Container bridge mode troubleshooting."
  • Conversational — AnythingLLM lets you ask questions about your documents in natural language, with cited sources.
  • MCP integration — Claude can search my Paperless documents, trigger n8n workflows, and query Qdrant directly through MCP servers.
  • Graph exploration — Obsidian's graph view reveals connections between notes you didn't know existed.

The Automation Layer

The magic isn't in any single service — it's in the connections between them. n8n workflows handle all data movement and processing.

New Document Pipeline

  1. File dropped into Paperless-ngx consumption folder
  2. Paperless OCRs and stores the document
  3. n8n webhook fires on new document
  4. LLM reads the content and suggests tags + document type
  5. Tags applied via Paperless API
  6. Content embedded in Qdrant (semantic search ready)
  7. ntfy sends a notification: "New document processed: [title]"

Total time: under 60 seconds. Zero manual steps.

Voice Memo Pipeline

  1. Record audio in SolScribe (browser or mobile)
  2. WhisperX transcribes with timestamps and speaker detection
  3. SolScribe webhook triggers n8n
  4. LLM summarizes the transcript into key points + action items
  5. Obsidian note created via REST API (title, summary, link to audio)
  6. Full transcript embedded in Qdrant
  7. Notification: "Voice memo processed: [generated title]"
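Step 5 renders the note before it's pushed to the vault. The note layout below is illustrative, not my exact template, and the delivery mechanism (Obsidian's Local REST API community plugin) is one option among several:

```python
# Sketch: render the Markdown body for a generated voice-memo note.
# The layout is illustrative; adapt it to your own template.
import datetime


def build_memo_note(title: str, summary: str, actions: list[str], audio_url: str) -> str:
    """Render frontmatter, summary, action items, and an audio link as Markdown."""
    lines = [
        "---",
        f"date: {datetime.date.today().isoformat()}",
        "tags: [voice-memo]",
        "---",
        f"# {title}",
        "",
        summary,
        "",
        "## Action items",
    ]
    lines += [f"- [ ] {a}" for a in actions]
    lines += ["", f"[Audio]({audio_url})"]
    return "\n".join(lines)
```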

Weekly Knowledge Digest

  1. Every Sunday at 9am, n8n collects all daily notes from the past week
  2. LLM reads them all and extracts: themes, decisions made, action items, open questions
  3. A "Weekly Review" note is created in Obsidian with the synthesis
  4. Unfinished action items are surfaced for follow-up
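Step 1 is the only fiddly part: resolving which files count as "the past week." A sketch, assuming daily notes are named YYYY-MM-DD.md inside the Daily Notes/ folder described below (the summarization call itself is omitted):

```python
# Sketch: gather the past seven daily notes for the weekly digest.
# Assumes YYYY-MM-DD.md filenames in Daily Notes/.
import datetime
from pathlib import Path


def weekly_note_paths(vault: Path, today: datetime.date) -> list[Path]:
    """Paths of the seven daily notes ending at `today`, oldest first."""
    days = [today - datetime.timedelta(days=i) for i in range(6, -1, -1)]
    return [vault / "Daily Notes" / f"{d.isoformat()}.md" for d in days]


def collect_week(vault: Path, today: datetime.date) -> str:
    """Concatenate the week's existing notes into one prompt for the LLM."""
    parts = []
    for path in weekly_note_paths(vault, today):
        if path.exists():  # skip days with no note
            parts.append(f"## {path.stem}\n{path.read_text()}")
    return "\n\n".join(parts)
```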

The Obsidian Vault Structure

Obsidian is the human-facing layer. My vault follows a modified PARA method:

  • Daily Notes/ — One note per day. Quick captures, thoughts, logs. Auto-generated from template.
  • Projects/ — Active project folders with dedicated notes, references, and task lists.
  • Areas/ — Ongoing responsibilities (homelab, finance, health, career).
  • Resources/ — Reference material organized by topic.
  • Archive/ — Completed projects and inactive areas.
  • Templates/ — 15 templates (meeting note, project kickoff, weekly review, decision log, etc.).

Every note uses consistent frontmatter (date, tags, project, status) that enables Dataview queries for dynamic dashboards.
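For concreteness, here's the shape of that frontmatter (the keys are the ones above; the values are invented) and a Dataview query that could drive an "active projects" dashboard:

```yaml
---
date: 2024-03-10
tags: [meeting, homelab]
project: digital-brain
status: active
---
```

```
TABLE project, status, date
FROM "Projects"
WHERE status = "active"
SORT date DESC
```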

Semantic Search: The Superpower

Traditional search finds documents containing specific words. Semantic search finds documents containing specific ideas.

When I embed a document in Qdrant, the embedding model converts it into a 768-dimensional vector that captures its meaning. Searching for "budget planning strategies" will find a note titled "Q2 Financial Outlook" — because the concepts are semantically related, even though the words don't match.

Three collections power the search:

  • documents — Paperless-ngx content (contracts, receipts, manuals)
  • pkm-knowledge — Obsidian notes and processed content
  • bookstack_embeddings — Wiki pages and technical documentation

A single API call to the Qdrant embedder sub-workflow searches all three collections and returns ranked results. The MCP server exposes this to Claude, so I can ask my AI assistant to search my knowledge base from any conversation.
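Unrolled from n8n into plain Python, that fan-out looks roughly like this. The URL and payload shape follow Qdrant's HTTP search endpoint; `embed()` stands in for the nomic-embed-text call:

```python
# Sketch: one query fanned out across all three Qdrant collections,
# results merged by similarity score. embed() is a hypothetical helper
# returning a 768-dimensional vector.
import json
import urllib.request

QDRANT = "http://localhost:6333"
COLLECTIONS = ["documents", "pkm-knowledge", "bookstack_embeddings"]


def qdrant_search(collection: str, vector: list[float], limit: int) -> list[dict]:
    """POST /collections/{name}/points/search against Qdrant's REST API."""
    body = json.dumps({"vector": vector, "limit": limit,
                       "with_payload": True}).encode()
    req = urllib.request.Request(
        f"{QDRANT}/collections/{collection}/points/search",
        data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]


def merge_ranked(hit_lists: list[list[dict]], top_k: int) -> list[dict]:
    """Flatten per-collection results into one list, best score first."""
    merged = [hit for hits in hit_lists for hit in hits]
    return sorted(merged, key=lambda h: h["score"], reverse=True)[:top_k]


def search_all(query: str, top_k: int = 5) -> list[dict]:
    vector = embed(query)  # hypothetical embedding helper (768 dims)
    return merge_ranked(
        [qdrant_search(c, vector, top_k) for c in COLLECTIONS], top_k)
```

One caveat worth knowing: merging raw scores across collections only works cleanly because all three use the same embedding model and distance metric.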

What This Actually Feels Like

A week into using the system, something clicked. I stopped worrying about where to put information. The answer is always: put it in. The system handles routing, processing, and retrieval.

When I need to find something, I have options. Quick keyword search in Obsidian for exact matches. Semantic search in Qdrant for conceptual matches. Conversational search via AnythingLLM when I'm not sure exactly what I'm looking for.

The weekly review digest surfaces patterns I wouldn't have noticed. The automated embeddings mean every new piece of information immediately becomes searchable. The MCP integration means Claude can reference my personal knowledge base when helping with tasks.

It's not a second brain — it's a searchable memory.

Getting Started

You don't need all of this to start. Begin with two things:

  1. Obsidian — Free, local-first, Markdown-based. Start with daily notes and a few templates. Get the habit of capturing.
  2. Paperless-ngx — One Docker container. Drag documents in. They get OCR'd and searchable. That alone eliminates paper clutter.
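If you want a starting point for Paperless-ngx, something like the compose sketch below will do (in practice Paperless also wants a Redis broker alongside the web container; volumes and the secret are placeholders, so check the official compose files for the full set of options):

```yaml
services:
  broker:
    image: redis:7
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_SECRET_KEY: change-me   # placeholder, generate your own
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume   # drop documents here
    depends_on:
      - broker
```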

Once those two are working, add Qdrant for semantic search. Then n8n to automate the connections. Then AnythingLLM for conversational access. Each layer builds on the last.

The key principle: no single tool does everything, but every tool does its one thing well. The system emerges from the connections between purpose-built components.