Knowledge Engine

Binary-analysis corpus

Every binary, addressed by content.
Searchable, linked, similar-matched.

A hash-first knowledge engine for reverse engineering: store binaries and IDA databases by content hash, find structurally similar code across the whole corpus, and project files into buckets straight from Git.

self-hosted · one Bun service · one Postgres · no Celery / Meilisearch / Chroma / Redis

See it in ~30 seconds

From git push to a searchable, matchable corpus

Create a repo and push a binary with its IDB. KE projects both into a bucket and links them, and they are immediately ready for search and similarity matching.

A scripted illustration of the basic flow. Take the real UI tour →

Why ke-simple

What makes it different

identity

Content is identity

Files are addressed by SHA-256 and stored once — duplicates collapse automatically. Names, buckets, and locations are just mutable metadata around the hash.
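A minimal sketch of the idea, not the engine's actual storage code: bytes are keyed by their SHA-256, and a bucket/key name is only a pointer to that hash. The `ContentStore` class, its method names, and the in-memory maps are all hypothetical, chosen for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory model: blobs keyed by SHA-256, names as mutable metadata.
class ContentStore {
  private blobs = new Map<string, Uint8Array>(); // hash -> bytes, stored once
  private names = new Map<string, string>();     // "bucket/key" -> hash

  put(bucket: string, key: string, bytes: Uint8Array): string {
    const hash = createHash("sha256").update(bytes).digest("hex");
    if (!this.blobs.has(hash)) this.blobs.set(hash, bytes); // duplicates collapse
    this.names.set(`${bucket}/${key}`, hash);               // the name only points at the hash
    return hash;
  }

  getByHash(hash: string): Uint8Array | undefined {
    return this.blobs.get(hash);
  }

  resolve(bucket: string, key: string): string | undefined {
    return this.names.get(`${bucket}/${key}`);
  }

  get blobCount(): number {
    return this.blobs.size;
  }
}

// The same bytes pushed under two names yield one stored blob.
const store = new ContentStore();
const sample = new TextEncoder().encode("example binary bytes");
const h1 = store.put("malware", "a.exe", sample);
const h2 = store.put("triage", "renamed.bin", sample);
```

Renaming or moving an object in this model touches only the `names` map; the stored bytes and their hash never change.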

operate

One service, one Postgres

No Celery, Meilisearch, ChromaDB, or Redis to run. Full-text and vector search live in Postgres; bytes live on disk. Stand it up with a single command.

analyze

Similarity across the corpus

Find functions and binaries that resemble a known sample — by microcode structure, flow-graph embeddings, shared strings, or FLIRT signatures.
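To illustrate the embedding-based case only, here is a small sketch of ranking corpus entries by cosine similarity to a query vector. It assumes each function already has a numeric embedding; the `Embedded` type, `mostSimilar`, and the toy vectors are hypothetical, and the page does not say which distance metric the engine uses.

```typescript
// Hypothetical record: a function or binary identified by hash, with an embedding.
type Embedded = { sha256: string; vector: number[] };

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k corpus entries most similar to the query, best first.
function mostSimilar(query: number[], corpus: Embedded[], k: number) {
  return corpus
    .map((e) => ({ sha256: e.sha256, score: cosine(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const corpus: Embedded[] = [
  { sha256: "fn-a", vector: [1, 0] },
  { sha256: "fn-b", vector: [0, 1] },
  { sha256: "fn-c", vector: [1, 1] },
];
const top = mostSimilar([1, 0.1], corpus, 2);
```

A linear scan like this only suits small examples; with vector search living in Postgres, the ranking would presumably run in the database instead.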

ingest

Git-native projection

Push binaries and IDBs to a hosted repo; a small .ke/actions.yml maps files into buckets with tags and processing — one repo can feed many buckets.

provenance

Links that self-resolve

An IDB records the source binary it was built from. The link is keyed by content hash, so it resolves regardless of ingest order: no back-fill is needed, and tags are inherited across the link.
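One way to picture why ingest order does not matter: if a link stores only the target's hash, resolving it is a lookup at query time, so there is nothing to patch up when the binary arrives later. The sketch below is an assumption about the mechanism, not the engine's schema; `LinkIndex`, its methods, and the direction of tag inheritance (IDB inherits from its source binary) are hypothetical.

```typescript
type StoredObject = { sha256: string; tags: Set<string> };

class LinkIndex {
  private objects = new Map<string, StoredObject>();
  private sourceOf = new Map<string, string>(); // IDB hash -> source binary hash

  ingest(sha256: string, tags: string[] = []): void {
    const existing = this.objects.get(sha256);
    if (existing) tags.forEach((t) => existing.tags.add(t));
    else this.objects.set(sha256, { sha256, tags: new Set(tags) });
  }

  // Record the link by hash only; the target does not need to exist yet.
  link(idbHash: string, binaryHash: string): void {
    this.sourceOf.set(idbHash, binaryHash);
  }

  // Undefined means "pending", not "broken": the lookup succeeds once the binary is ingested.
  resolveSource(idbHash: string): StoredObject | undefined {
    const target = this.sourceOf.get(idbHash);
    return target ? this.objects.get(target) : undefined;
  }

  // Assumed inheritance direction: the IDB's own tags plus those of its resolved source.
  effectiveTags(idbHash: string): string[] {
    const own = this.objects.get(idbHash)?.tags ?? new Set<string>();
    const inherited = this.resolveSource(idbHash)?.tags ?? new Set<string>();
    const merged = new Set<string>(Array.from(own).concat(Array.from(inherited)));
    return Array.from(merged).sort();
  }
}

// The IDB arrives before the binary it was built from.
const idx = new LinkIndex();
idx.ingest("idb-hash", ["analyst:alice"]);
idx.link("idb-hash", "bin-hash");
const before = idx.resolveSource("idb-hash");
idx.ingest("bin-hash", ["family:example"]);
const after = idx.resolveSource("idb-hash");
```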

automate

Hash-addressable API

Pipelines fetch any object's exact bytes by hash over plain HTTP — no UI, no bucket/key needed. The web UI and API share one origin.
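A hedged sketch of what a pipeline client might look like. The `/objects/<sha256>` route is a placeholder, not the documented API path, and `fetchByHash` and `verifyBytes` are hypothetical helper names. The point it illustrates is that a hash-addressed fetch can check its own result: the client re-hashes the bytes it received and compares them with the hash it asked for.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Returns the bytes unchanged if they hash to the expected value, otherwise throws.
function verifyBytes(expected: string, bytes: Uint8Array): Uint8Array {
  const actual = sha256Hex(bytes);
  if (actual !== expected.toLowerCase()) {
    throw new Error(`hash mismatch: expected ${expected}, got ${actual}`);
  }
  return bytes;
}

// ASSUMPTION: "/objects/<sha256>" is an illustrative route, not the real endpoint.
async function fetchByHash(origin: string, sha256: string): Promise<Uint8Array> {
  const res = await fetch(`${origin}/objects/${sha256}`);
  if (!res.ok) throw new Error(`HTTP ${res.status} fetching ${sha256}`);
  return verifyBytes(sha256, new Uint8Array(await res.arrayBuffer()));
}
```

Because the address is the hash, a response that fails verification can be rejected without trusting the server or the transport.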

Who it's for

How it fits together

One front door, one datastore. Every ingest path funnels into a single storage service; a reactor turns changes into jobs; workers run extractors and import their results; operations answer queries.

CLI ingest / REST PUT / Git push ──► storage (sha256 + bucket/key + event)
                                        │
                                     reactor ──► jobs ──► workers ──► extractors ──► plugin tables
                                        │
                                     search · similarity · provenance
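The flow above can be sketched in a few lines. This is an illustrative model only: the `StorageEvent`, `Job`, and `Extractor` shapes, the two extractor names, and the file-extension rule are assumptions, and a real deployment would queue jobs and run workers asynchronously rather than in one loop.

```typescript
type StorageEvent = { sha256: string; bucket: string; key: string };
type Job = { extractor: string; sha256: string };
type Row = Record<string, unknown>;
type Extractor = {
  name: string;
  matches: (event: StorageEvent) => boolean;
  run: (sha256: string) => Row;
};

// Hypothetical extractors; real ones would parse the stored bytes.
const extractors: Extractor[] = [
  { name: "strings", matches: () => true, run: (h) => ({ sha256: h, strings: [] }) },
  {
    name: "idb-meta",
    matches: (e) => e.key.endsWith(".i64") || e.key.endsWith(".idb"),
    run: (h) => ({ sha256: h, kind: "idb" }),
  },
];

// Reactor: turn one storage event into a job per matching extractor.
function react(event: StorageEvent): Job[] {
  return extractors
    .filter((x) => x.matches(event))
    .map((x) => ({ extractor: x.name, sha256: event.sha256 }));
}

// Worker: run each job's extractor and import the result into that plugin's table.
function work(jobs: Job[], tables: Map<string, Row[]>): void {
  for (const job of jobs) {
    const extractor = extractors.find((x) => x.name === job.extractor);
    if (!extractor) continue; // unknown extractor: skip the job
    const rows = tables.get(job.extractor) ?? [];
    rows.push(extractor.run(job.sha256));
    tables.set(job.extractor, rows);
  }
}

const tables = new Map<string, Row[]>();
const idbJobs = react({ sha256: "h-idb", bucket: "samples", key: "sample.i64" });
const exeJobs = react({ sha256: "h-exe", bucket: "samples", key: "sample.exe" });
work(idbJobs.concat(exeJobs), tables);
```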

Read the Concepts for the model, or jump to Use cases for concrete workflows.

Start here