Knowledge Engine

What it's for

Real use cases

Concrete workflows a reverse-engineering or malware-analysis team runs on a shared corpus. Each one maps to features that exist today; the commands are real.

1. Build and de-duplicate a sample corpus

Point ke-simple at a folder of samples. Identical bytes are stored once (content-addressed by SHA-256), even when the same file arrives under different names or from different sources. Each upload adds a bucket/key location and keeps the original filename as an alias.

bun run ingest --bucket malware-corpus --key-prefix family-a ~/samples/*.bin
# re-uploading the same bytes is idempotent; the corpus never stores a duplicate
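
The de-duplication described above can be sketched as a small in-memory model. This is only an illustration of content addressing, not ke-simple's storage layer: the `ContentStore` class, the `StoredObject` shape, and the method names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical model: one record per unique byte sequence.
interface StoredObject {
  bytes: Uint8Array;
  locations: Set<string>; // "bucket/key" strings
  aliases: Set<string>;   // original filenames
}

class ContentStore {
  private objects = new Map<string, StoredObject>();

  // Stores the bytes under their SHA-256 hex digest and returns that digest.
  // Re-uploading identical bytes only adds a location and an alias.
  put(bytes: Uint8Array, bucket: string, key: string, filename: string): string {
    const sha256 = createHash("sha256").update(bytes).digest("hex");
    let obj = this.objects.get(sha256);
    if (!obj) {
      obj = { bytes, locations: new Set(), aliases: new Set() };
      this.objects.set(sha256, obj);
    }
    obj.locations.add(`${bucket}/${key}`);
    obj.aliases.add(filename);
    return sha256;
  }

  get(sha256: string): StoredObject | undefined {
    return this.objects.get(sha256);
  }

  // Number of unique byte sequences stored.
  get size(): number {
    return this.objects.size;
  }
}
```

Because the digest is the identity, uploading the same file twice under different keys leaves `size` at 1 while the record accumulates two locations.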

Then browse buckets, tag by family, and pivot by hash, filename, or tag from the web UI or the API.

2. Hunt for similar functions and binaries

Given a known-bad sample, find structurally similar code across the corpus, even when names and bytes differ. There are three independent angles, each a plugin; two are shown here:

# which binaries look like this one? (top 5, by shared microcode structure)
bun run operation kep-bbsh match_resource '{"bucket":"malware-corpus","key":"family-a/sample.i64","limit":5}'

# which functions elsewhere resemble the one at this address? (4198400 is 0x401000 in decimal)
bun run operation kep-flake match_function '{"bucket":"malware-corpus","key":"family-a/sample.i64","address":4198400}'

In the web UI this is one click: open an object and use Find similar (pick the algorithm) — the same engine, with results linking straight to the matched objects.

An object's metadata, a BBSH match to a repacked variant, and per-plugin processing status.

3. Identify library code with FLIRT

Store FLIRT signatures alongside binaries, match an unknown against the corpus of known libraries, and download the matching .sig to apply in IDA — so analysts stop re-reversing zlib, OpenSSL, or a vendored SDK.

bun run operation kep-flirt match_resource '{"bucket":"libs","key":"unknown.i64","limit":10}'
bun run operation kep-flirt get_signature '{"bucket":"libs","key":"libcrypto.sig"}'   # base64 .sig to apply
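
Since get_signature returns the .sig as base64, a pipeline has to decode it before IDA can load it. A minimal sketch follows, assuming the base64 string has already been pulled out of the operation's output; the response shape is not documented here, so the helper names and the way the string is obtained are hypothetical.

```typescript
import { Buffer } from "node:buffer";
import { writeFileSync } from "node:fs";

// Decode a base64-encoded signature into raw bytes.
function decodeSignature(base64Sig: string): Buffer {
  return Buffer.from(base64Sig, "base64");
}

// Hypothetical helper: write the decoded bytes to disk so the file can be
// applied in IDA. Returns the number of bytes written.
function saveSignature(base64Sig: string, path: string): number {
  const bytes = decodeSignature(base64Sig);
  writeFileSync(path, bytes);
  return bytes.length;
}
```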

4. Link IDBs to their source binaries automatically

An IDB embeds the SHA-256 of the binary it was built from. ke-simple reads it (JS-only, no IDA needed) and records an idb_for relation between the two by content identity. The link is recorded the moment the IDB is processed and resolves whenever the binary arrives, in either order and with no back-fill. The IDB also inherits the binary's tags.

This is how a corpus stays coherent when binaries and their IDBs are ingested separately, by different people, at different times.
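
One way to picture the order-independent linking is a relation keyed by hash that is resolved by lookup at query time. The sketch below is an assumption about how such a link could work, not ke-simple's implementation; `RelationIndex` and its methods are hypothetical, and tag inheritance is not modeled.

```typescript
// Hypothetical model of idb_for relations resolved by content identity.
class RelationIndex {
  private objects = new Set<string>();        // hashes present in the corpus
  private idbFor = new Map<string, string>(); // IDB hash -> embedded binary hash

  // Called whenever any object is ingested.
  addObject(sha256: string): void {
    this.objects.add(sha256);
  }

  // Called when an IDB is processed and its embedded binary hash is read.
  recordIdb(idbSha256: string, embeddedBinarySha256: string): void {
    this.addObject(idbSha256);
    this.idbFor.set(idbSha256, embeddedBinarySha256);
  }

  // Resolution is a lookup, so it succeeds as soon as the binary exists,
  // regardless of which side was ingested first. No back-fill job is needed.
  binaryFor(idbSha256: string): string | undefined {
    const target = this.idbFor.get(idbSha256);
    return target !== undefined && this.objects.has(target) ? target : undefined;
  }
}
```

In this model an IDB ingested before its binary simply has an unresolved link until the binary's hash appears in the corpus.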

5. Search function names across the whole corpus

Find every binary that defines a function of interest — a decryptor, a known-bad routine, an exported symbol — regardless of which bucket it lives in.

bun run operation kep-funcnames search_function_name '{"name":"decrypt_payload","limit":20}'

6. Scope similarity by architecture or compiler

The metadata plugin records cpu, file type, compiler, and more. Similarity matchers join against it, so you can restrict a hunt to, say, ARM PE binaries and skip cross-architecture noise.

bun run operation kep-bbsh match_resource '{"bucket":"corpus","key":"a.i64","cpu":"arm","file_type":"pe","limit":10}'
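
Conceptually, the scoping above is a join between similarity results and per-object metadata, followed by a filter and a limit. The sketch below illustrates that idea only; the `Match` and `Metadata` shapes and the `scopeMatches` function are hypothetical, and only the `cpu` and `file_type` field names come from the command above.

```typescript
interface Metadata { cpu: string; file_type: string; }
interface Match { sha256: string; score: number; }

// Keep only matches whose metadata satisfies every field set in `filter`,
// then return the top `limit` by descending score.
function scopeMatches(
  matches: Match[],
  metadata: Map<string, Metadata>,
  filter: Partial<Metadata>,
  limit: number,
): Match[] {
  return matches
    .filter((m) => {
      const meta = metadata.get(m.sha256);
      if (!meta) return false; // no metadata recorded: cannot satisfy the filter
      return (
        (filter.cpu === undefined || meta.cpu === filter.cpu) &&
        (filter.file_type === undefined || meta.file_type === filter.file_type)
      );
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```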

7. Collaborate through Git

Host repositories and push binaries and IDBs as you would to any Git host (IDBs travel efficiently via the git-ida filter). A small .ke/actions.yml in the repo decides which files become corpus objects, which buckets they land in, which tags they get, and which plugins run on them.

version: 1
rules:
  - match: "**/*.i64"
    bucket: idb-analysis
    tags: [ida, "{branch}"]
    process: [kep-bbsh, kep-funcnames]
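
To make the rule semantics concrete, here is a rough sketch of how a pushed path might be evaluated against rules like the one above. It rests on several assumptions that this page does not confirm: that the first matching rule wins, that `**/` matches zero or more directories, and that `{branch}` expands to the pushed branch name. `globToRegExp` and `planAction` are hypothetical helpers, not part of KE.

```typescript
interface Rule { match: string; bucket: string; tags: string[]; process: string[]; }
interface Action { bucket: string; tags: string[]; process: string[]; }

// Minimal glob translation: "**/" = any directories, "*" = within one path
// segment, "?" = one non-slash character. Everything else is literal.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        if (glob.charAt(i + 2) === "/") { re += "(?:.*/)?"; i += 2; }
        else { re += ".*"; i += 1; }
      } else {
        re += "[^/]*";
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}

// Assumption: the first rule whose glob matches the path decides the action.
function planAction(rules: Rule[], path: string, branch: string): Action | undefined {
  const rule = rules.find((r) => globToRegExp(r.match).test(path));
  if (!rule) return undefined;
  return {
    bucket: rule.bucket,
    // Assumption: "{branch}" is a placeholder for the pushed branch name.
    tags: rule.tags.map((t) => t.split("{branch}").join(branch)),
    process: [...rule.process],
  };
}
```

Under these assumptions, pushing `analysis/sample.i64` on branch `main` with the rule above would yield bucket `idb-analysis`, tags `ida` and `main`, and the two listed plugins.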

Or go the other way: select objects while browsing and build a ready-to-clone repository from a corpus selection — KE writes the files plus a matching .ke/actions.yml, so later pushes project straight back in. (See the walkthrough.)

8. Automate by hash

Pipelines never need the UI or even a bucket/key. Once a sample is in the corpus, fetch its exact bytes by hash, list everywhere it lives, or attach a tag — all over plain HTTP.

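# replace <sha256> with the object's SHA-256 hex digest; the quotes keep the shell from treating < and > as redirections
curl "http://127.0.0.1:3000/objects/<sha256>/content" --output sample.bin
curl "http://127.0.0.1:3000/objects/<sha256>/locations"
curl -X POST -H 'content-type: application/json' -d '{"name":"triaged"}' \
  "http://127.0.0.1:3000/objects/<sha256>/tags"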
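
The same three calls can be wrapped for use from a pipeline script. The sketch below uses only the paths shown in the curl commands above; the helper names are hypothetical, the response body formats are not assumed, and the hash check is a client-side convenience rather than documented server behavior.

```typescript
type ObjectResource = "content" | "locations" | "tags";

const SHA256_HEX = /^[0-9a-fA-F]{64}$/;

// Build /objects/<sha256>/<resource>, rejecting anything that is not a
// 64-character hex digest before it reaches the network.
function objectUrl(baseUrl: string, sha256: string, resource: ObjectResource): string {
  if (!SHA256_HEX.test(sha256)) {
    throw new Error(`not a SHA-256 hex digest: ${sha256}`);
  }
  return `${baseUrl.replace(/\/+$/, "")}/objects/${sha256}/${resource}`;
}

// Fetch the exact bytes of an object by hash.
async function fetchContent(baseUrl: string, sha256: string): Promise<Uint8Array> {
  const res = await fetch(objectUrl(baseUrl, sha256, "content"));
  if (!res.ok) throw new Error(`GET content failed: ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}

// Attach a tag by name, mirroring the curl POST above.
async function addTag(baseUrl: string, sha256: string, name: string): Promise<void> {
  const res = await fetch(objectUrl(baseUrl, sha256, "tags"), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ name }),
  });
  if (!res.ok) throw new Error(`POST tag failed: ${res.status}`);
}
```

Validating the digest up front means a truncated or mistyped hash fails fast in the pipeline instead of producing a confusing HTTP error.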

Full endpoint and CLI listings are in the Reference.