Hash-first identity
An object is a blob of bytes addressed by its SHA-256. That hash is the identity: lookups, downloads, de-duplication, and processing all key off it. A bucket/key is a mutable location pointing at an object — one object can have many locations, and renaming or moving a location never changes identity. Names, tags, aliases, relations, and notes are metadata attached to the hash.
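The split between identity and location can be illustrated with a toy in-memory model. This is a minimal sketch, not KE's storage code: the `Corpus` class and its `put` and `move` methods are hypothetical names chosen for illustration, and the bucket and key values are made up.

```python
import hashlib


class Corpus:
    """Toy in-memory model: objects keyed by SHA-256, locations pointing at them."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}               # sha256 -> bytes
        self.locations: dict[tuple[str, str], str] = {}   # (bucket, key) -> sha256

    def put(self, bucket: str, key: str, data: bytes) -> str:
        sha = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(sha, data)   # identical bytes are stored once
        self.locations[(bucket, key)] = sha  # the location is only a pointer
        return sha

    def move(self, src: tuple[str, str], dst: tuple[str, str]) -> None:
        # Moving a location re-points a name; the object and its hash are untouched.
        self.locations[dst] = self.locations.pop(src)


corpus = Corpus()
a = corpus.put("firmware", "v1/boot.bin", b"\x7fELF...")
b = corpus.put("samples", "incoming/unknown.bin", b"\x7fELF...")
corpus.move(("firmware", "v1/boot.bin"), ("archive", "boot-v1.bin"))
```

Both `put` calls return the same hash, only one object is held, and the move changes which name points at it without touching the hash.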
The pipeline
Every ingest path funnels into one storage service, which records identity and emits an event. A reactor turns events into jobs; workers run extractors and import their output; operations answer queries. There is no hidden orchestration — each stage is observable.
Because extraction is deduped by plugin + sha256, the same bytes under many names or
repositories are analyzed once.
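The dedup step can be sketched as a reactor that keys jobs on the (plugin, sha256) pair. This is an illustrative model only: the `store` and `react` functions, the event fields, and the in-memory queues are assumptions standing in for the real storage service, reactor, and job tables.

```python
import hashlib
from collections import deque

events: deque[dict] = deque()            # emitted by the storage service
jobs: deque[tuple[str, str]] = deque()   # (plugin, sha256) pairs awaiting a worker
seen: set[tuple[str, str]] = set()       # dedup key: plugin + sha256


def store(bucket: str, key: str, data: bytes, process: list[str]) -> str:
    """Storage service: compute identity, then emit an event."""
    sha = hashlib.sha256(data).hexdigest()
    events.append({"sha256": sha, "bucket": bucket, "key": key, "process": process})
    return sha


def react() -> None:
    """Reactor: turn events into jobs, skipping (plugin, sha256) pairs already queued."""
    while events:
        event = events.popleft()
        for plugin in event["process"]:
            job = (plugin, event["sha256"])
            if job not in seen:
                seen.add(job)
                jobs.append(job)


# The same bytes arrive under two names in two buckets.
store("idb-analysis", "a/sample.i64", b"same bytes", ["kep-bbsh", "kep-funcnames"])
store("mirror", "b/renamed.i64", b"same bytes", ["kep-bbsh"])
react()
```

After `react()` the queue holds two jobs, one per plugin, even though three plugin runs were requested across the two uploads.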
Plugins, extractors, operations
A plugin is a static package: a manifest, SQL, and one or more extractors. An
extractor is an isolated process with a manifest-in / files-out contract. It runs as
headless IDA (ida-python) or as plain Python or Node, and emits NDJSON
rows that the platform imports into plugin-owned tables. An operation is plugin-declared SQL
exposed as a query (e.g. match_resource). The platform owns storage, jobs, transactions, and
isolation; the plugin owns its analysis and its data contract.
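To make the manifest-in / files-out contract concrete, here is a minimal sketch of a plain-Python extractor. The manifest fields `input` and `output_dir`, the output name `rows.ndjson`, and the row shape are all hypothetical; the real manifest schema is defined by the platform and is not shown in this document.

```python
import json
import tempfile
from pathlib import Path


def run_extractor(manifest_path: Path) -> Path:
    """Read a manifest, analyze the input, write one JSON object per line."""
    manifest = json.loads(manifest_path.read_text())
    data = Path(manifest["input"]).read_bytes()               # hypothetical field
    out_path = Path(manifest["output_dir"]) / "rows.ndjson"   # hypothetical field and name

    # Stand-in "analysis": one row per 4-byte chunk of the input.
    with out_path.open("w", encoding="utf-8") as out:
        for offset in range(0, len(data), 4):
            row = {"offset": offset, "hex": data[offset:offset + 4].hex()}
            out.write(json.dumps(row) + "\n")
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "sample.bin").write_bytes(b"\x00\x01\x02\x03\x04\x05")
    (work / "manifest.json").write_text(
        json.dumps({"input": str(work / "sample.bin"), "output_dir": str(work)})
    )
    result = run_extractor(work / "manifest.json")
    rows = [json.loads(line) for line in result.read_text().splitlines()]
```

The extractor touches only the files named in its manifest and communicates only through its output file, which is what lets the platform run it in isolation and import the rows afterwards.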
KE Actions: Git as a source, not a store
A KE-hosted repository can include a .ke/actions.yml that declares how pushed files become
corpus objects. The repository layout stays free-form; the config declares the projection.
    version: 1
    rules:
      - id: idbs
        match: "**/*.i64"                   # glob; a file may match several rules
        bucket: idb-analysis                # target bucket (decoupled from the repo name)
        key: "{path}"                       # template: {repo} {path} {basename} {sha256} {branch} {commit}
        tags: [ida, "{branch}"]             # templated searchable labels
        process: [kep-bbsh, kep-funcnames]  # plugins to run after storing
One repo can feed many buckets, and many repos can feed one bucket — the rules decide. An invalid config is rejected at push time.
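How a rule projects a pushed file can be sketched as glob matching followed by template substitution. This is an approximation under stated assumptions: `fnmatchcase` stands in for KE's glob matcher, whose exact semantics are not specified here (under `fnmatch`, `**/*.i64` needs at least one `/` in the path, which the real matcher may not require). The `project` function, the second rule, and the context values are hypothetical.

```python
from fnmatch import fnmatchcase

RULES = [
    {
        "id": "idbs",
        "match": "**/*.i64",
        "bucket": "idb-analysis",
        "key": "{path}",
        "tags": ["ida", "{branch}"],
        "process": ["kep-bbsh", "kep-funcnames"],
    },
    {   # hypothetical second rule, to show one file matching several rules
        "id": "by-hash",
        "match": "firmware/*",
        "bucket": "all-by-hash",
        "key": "{sha256}/{basename}",
        "tags": [],
        "process": [],
    },
]


def project(rules: list[dict], ctx: dict[str, str]) -> list[dict]:
    """Return one projection per rule whose glob matches ctx['path']."""
    out = []
    for rule in rules:
        if fnmatchcase(ctx["path"], rule["match"]):
            out.append({
                "rule": rule["id"],
                "bucket": rule["bucket"],
                "key": rule["key"].format(**ctx),
                "tags": [tag.format(**ctx) for tag in rule["tags"]],
                "process": list(rule["process"]),
            })
    return out


ctx = {
    "repo": "acme/firmware",
    "path": "firmware/router/main.i64",
    "basename": "main.i64",
    "sha256": "ab" * 32,
    "branch": "main",
    "commit": "0123abc",
}
projections = project(RULES, ctx)
```

Here one pushed file lands in two buckets, once keyed by its repository path and once by its hash, which is the "rules decide" behavior described above.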
Gitea transport (optional)
For hosted repositories, KE drives a Gitea instance behind the scenes: Gitea owns repos, HTTP transport,
and storage; KE owns identity and projection. KE provisions a backing Gitea account and a per-user token,
so a user authenticates once and pushes over HTTPS as themselves rather than through a shared admin token. A push
webhook (or gitea:sync) runs the same Actions projection. Transport is HTTPS only.
One Postgres
Identity, metadata, jobs, and all plugin data live in a single Postgres (ParadeDB, which adds BM25 full-text search and pgvector). Blob bytes are stored on local disk, addressed by hash. Gitea, when used, runs alongside with its own embedded SQLite. There is no separate search server, vector server, or task broker to run.
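Storing blob bytes on disk by hash might look like the sketch below. The two-level directory fan-out and the `blob_path` and `write_blob` helpers are assumptions for illustration; the actual on-disk layout is not described in this document.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def blob_path(root: Path, sha256: str) -> Path:
    # Two-level fan-out (ab/cd/abcd...) is an assumed layout, not a documented one.
    return root / sha256[:2] / sha256[2:4] / sha256


def write_blob(root: Path, data: bytes) -> Path:
    """Store bytes under their SHA-256; writing the same bytes twice is a no-op."""
    sha = hashlib.sha256(data).hexdigest()
    dest = blob_path(root, sha)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_name, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp_root:
    root = Path(tmp_root)
    first = write_blob(root, b"hello")
    second = write_blob(root, b"hello")
    stored = first.read_bytes()
    file_count = sum(1 for p in root.rglob("*") if p.is_file())
```

Because the path is derived from the content, a second write of the same bytes resolves to the same file, so de-duplication on disk needs no extra bookkeeping.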
Glossary
| Term | Meaning |
|---|---|
| Object | Immutable bytes addressed by SHA-256. |
| Asset / version | The stored entity backed by an object, with version history. |
| Location | A mutable bucket/key reference to an asset. |
| Bucket | A queryable collection of locations. Not a Git repo. |
| Tag / alias / relation | Searchable label / alternate name / inter-object edge (e.g. idb_for). |
| Plugin / extractor / operation | Analysis package / its process / its query SQL. |
| KE Action | A rule mapping pushed paths → bucket/key/tags/processing. |