Automation AA-012

Keep Sync — Google Keep to Obsidian Converter

Automated converter that transforms 847 Google Keep exports into structured Obsidian markdown with YAML frontmatter, 5-category auto-classification, MD5 deduplication (18.4% duplicate rate), and bidirectional wikilink generation — packaged as a standalone .exe for zero-dependency distribution.

01 — Problem

Three Years of Thinking, Trapped in the Wrong System

I had accumulated 847 notes in Google Keep over 3 years — meeting fragments, project ideas, reading notes, half-formed architectural decisions, and the occasional grocery list. Keep excels at capture but fails at retrieval. There’s no linking between notes, no tagging taxonomy, no way to surface connections between an idea I had in March and a project I started in November. When I committed to Obsidian as my knowledge management system, the migration problem was immediate: 847 unstructured text files needed to become properly formatted markdown with YAML frontmatter, categories, cross-references, and deduplication. Doing this by hand would have taken 40+ hours.

I needed a converter that understood the implicit structure of messy notes — distinguishing a todo list from a journal entry from a technical reference — and imposed the Obsidian conventions automatically.

02 — Architecture

Parse, Classify, Deduplicate, Link

The converter operates in four stages on the Google Keep export:

Stage 1 — Export Parsing

Reads the Google Keep text export (one .txt file per note) and extracts content, creation date, and any embedded metadata. Handles encoding inconsistencies and strips Keep-specific formatting artifacts that don’t translate to markdown.

Stage 2 — Auto-Classification

Regex-based heuristics classify each note into one of 5 categories: Todo (checkbox patterns), Thoughts (reflective language, first-person statements), Sensitive (email addresses, phone numbers, API key patterns), Links (URL-heavy content), and Other. Classification drives both folder placement and frontmatter metadata in the output. The sensitive category triggers a privacy flag that excludes those notes from any cross-reference indexing.
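A classifier along these lines could look like the sketch below. The specific patterns and thresholds are assumptions for illustration; the only things taken from the text are the five category names and the precedence of Sensitive over everything else.

```python
import re

# Hypothetical patterns; the real heuristics are not published.
SENSITIVE = re.compile(
    r"[\w.+-]+@[\w-]+\.\w+"          # email addresses
    r"|\+?\d[\d\s().-]{8,}\d"        # phone-number-like digit runs
    r"|(?:api[_-]?key|sk-)\w+",      # API-key-looking tokens
    re.I,
)
TODO = re.compile(r"^\s*(?:\[[ xX]\]|- \[)", re.M)       # checkbox lines
URL = re.compile(r"https?://\S+")
THOUGHTS = re.compile(
    r"\b(I|my|me)\b.*\b(think|feel|wonder|realized|maybe)\b", re.I | re.S
)

def classify(text: str) -> str:
    """Return one of the 5 categories; Sensitive takes precedence."""
    if SENSITIVE.search(text):
        return "Sensitive"
    if TODO.search(text):
        return "Todo"
    urls = URL.findall(text)
    # "URL-heavy": URLs make up a large share of the note (threshold assumed).
    if urls and len("".join(urls)) > 0.3 * max(len(text), 1):
        return "Links"
    if THOUGHTS.search(text):
        return "Thoughts"
    return "Other"
```

Precedence ordering matters here: a note containing both a checkbox and a phone number should land in Sensitive so the privacy flag applies.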

Stage 3 — Content-Hash Deduplication

MD5 hashing on normalized content detects duplicate notes — a common artifact of Keep’s sync behavior, where the same note appears multiple times with slightly different timestamps. Deduplication reduced my 847 exports to 691 unique notes, eliminating 156 duplicates that would have polluted the Obsidian vault.
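The hashing step described above can be sketched as follows; the normalization rules (collapse whitespace, lowercase) are assumed, since near-identical sync copies typically differ only in whitespace or timestamps.

```python
import hashlib
import re

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so sync near-copies hash identically.
    return re.sub(r"\s+", " ", text).strip().lower()

def dedupe(notes: list[dict]) -> list[dict]:
    """Keep the first note per MD5 digest of normalized content."""
    seen: set[str] = set()
    unique = []
    for note in notes:
        digest = hashlib.md5(normalize(note["body"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(note)
    return unique
```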

Stage 4 — Obsidian Output Generation

Each unique note becomes a markdown file with YAML frontmatter (title, date, category, tags), auto-generated content tags extracted from keyword frequency, and bidirectional wikilinks ([[related note]]) generated by detecting shared keywords across notes. An archive-after-processing flag ensures safe re-runs on incremental exports without reprocessing previously converted notes.
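Output generation might look like this sketch. The function name `render_note`, the frontmatter field names, and the "top three frequent words" tag heuristic are illustrative assumptions; only the frontmatter fields listed in the text (title, date, category, tags) and the wikilink syntax are from the source.

```python
from collections import Counter
import re

def render_note(note: dict, related: list[str]) -> str:
    """Emit one Obsidian markdown file: frontmatter, body, wikilinks."""
    # Simplified tag extraction: the three most frequent words of 4+ letters.
    words = re.findall(r"[a-z]{4,}", note["body"].lower())
    tags = [w for w, _ in Counter(words).most_common(3)]
    frontmatter = "\n".join([
        "---",
        f"title: {note['title']}",
        f"date: {note['date']}",
        f"category: {note['category']}",
        f"tags: [{', '.join(tags)}]",
        "---",
    ])
    # Wikilinks to notes that share keywords with this one.
    wikilinks = "\n".join(f"[[{name}]]" for name in related)
    return f"{frontmatter}\n\n{note['body']}\n\n{wikilinks}".rstrip() + "\n"
```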

Key Design Decisions

Why regex-based classification instead of an LLM? The classification task is shallow — it’s distinguishing between checkboxes and paragraphs, not understanding semantic intent. Regex handles this at thousands of notes per second with deterministic results. An LLM would add latency, cost, and non-determinism to a problem that doesn’t need any of those. Sometimes the right tool is the old one.

Why package as a standalone .exe? I built this for myself, but I also wanted to share it with colleagues who use Keep but don’t have Python installed. PyInstaller produces a single executable that runs on any Windows machine without dependencies. The .exe adds 15 seconds to startup (bundled interpreter) but eliminates the “install Python, create a venv, pip install” barrier entirely.

03 — Outcomes

Measured Results

847
Notes Processed

from 3 years of Google Keep accumulation

156
Duplicates Eliminated

via MD5 content hashing — 18.4% of total exports

5
Auto-Categories

Todo, Thoughts, Sensitive, Links, and Other

<8s
Total Processing Time

for the complete 847-note export on a standard machine

04 — Reflection

Migration Is a Knowledge Archaeology Problem

The unexpected value of this project was the classification step. Forcing every note through a category heuristic revealed patterns in my own note-taking behavior: 34% were todos I’d never completed, 22% were links I’d saved and never revisited, and only 28% contained substantive thinking worth preserving. The migration wasn’t just a format conversion — it was a curation. The tool helped me distinguish signal from noise across 3 years of accumulated capture.

What I’d change: the bidirectional wikilink generation is based on keyword overlap, which produces some false connections (two notes mentioning “Python” aren’t necessarily related). A lightweight embedding-based similarity check would produce more meaningful links, though it would require adding a dependency on a sentence-transformer model — which conflicts with the .exe packaging goal. The right answer is probably offering both modes: fast keyword-based linking by default, and optional semantic linking when Python is available.
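The keyword-overlap heuristic being critiqued can be illustrated with a small sketch; the Jaccard-similarity formulation and the 0.1 threshold are assumptions, not the tool's actual parameters:

```python
import re

def keywords(text: str) -> set[str]:
    # Crude keyword set: all lowercase words of 4+ letters.
    return set(re.findall(r"[a-z]{4,}", text.lower()))

def related(a: str, b: str, threshold: float = 0.1) -> bool:
    """Jaccard overlap of keyword sets; prone to false connections,
    since one shared common word can clear a low threshold."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return False
    return len(ka & kb) / len(ka | kb) >= threshold
```

This makes the failure mode concrete: two notes that each mention "python" once will clear the threshold and get linked, even if one is about packaging and the other about scraping.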

“The value of a note-taking system isn’t how much it captures. It’s how much it lets you throw away with confidence.”

Outcomes

847 notes processed in under 8 seconds; 156 duplicates eliminated (18.4%); 5 auto-classification categories; standalone .exe with zero dependencies