Data Formats and Schema

This project moves review data through a small set of predictable formats:

Temporary JSON files under $TMPDIR/claude (or under the system temp directory, typically /tmp/claude, if TMPDIR is not set)
Hook payloads passed over stdin/stdout
Plugin and marketplace metadata files
A local SQLite database for review history and analytics

If you are debugging a review flow, installing plugins, or querying past review data, these are the formats that matter.

At A Glance

Format	Example location	Produced by	Used by
Review snapshot JSON	`$TMPDIR/claude/pr-123-reviews.json`	`reviews fetch`	`reviews post`, `reviews store`
Pending review JSON	`$TMPDIR/claude/pr-owner-repo-123-pending-review.json`	`reviews pending-fetch`	`reviews pending-update`
Inline comment batch JSON	any file or stdin	user or slash command workflow	`pr post-comment`
Review database	`.claude/data/reviews.db`	`reviews store`	`db` commands and auto-skip logic
Hook payload JSON	stdin/stdout	Claude Code hooks	hook scripts in `scripts/`

Note: The command docs often say /tmp/claude/..., but the code actually uses $TMPDIR/claude when TMPDIR is set.

Temporary JSON Artifacts

Review snapshot: `pr-<pr_number>-reviews.json`

This is the main handoff file for the review-reply workflow. It is created by myk-claude-tools reviews fetch and groups fetched threads by reviewer source.

The file is built in myk_claude_tools/reviews/fetch.py like this:

final_output = {
    "metadata": {
        "owner": owner,
        "repo": repo,
        "pr_number": int(pr_number),
        "json_path": str(json_path),
    },
    "human": categorized["human"],
    "qodo": categorized["qodo"],
    "coderabbit": categorized["coderabbit"],
}

A real test fixture from tests/test_store_reviews_to_db.py shows the shape that reviews store accepts:

data = {
    "metadata": {
        "owner": "org",
        "repo": "repo",
        "pr_number": 1,
    },
    "human": [
        {
            "thread_id": "thread_abc",
            "node_id": "node_xyz",
            "comment_id": 12345,
            "author": "reviewer1",
            "path": "src/main.py",
            "line": 100,
            "body": "Please fix this bug",
            "priority": "HIGH",
            "status": "addressed",
            "reply": "Fixed in commit abc123",
            "skip_reason": None,
            "posted_at": "2024-01-15T10:00:00Z",
            "resolved_at": "2024-01-15T10:05:00Z",
            "type": "outside_diff_comment",
        }
    ],
    "qodo": [],
    "coderabbit": [],
}

What you can expect in each thread object:

GitHub identifiers: thread_id, node_id, comment_id
Location fields: path, line
Review text: body, reply, skip_reason
Workflow state: status, posted_at, resolved_at
Classification: source, priority, and sometimes type

When threads are enriched in process_and_categorize(), the code adds these defaults:

enriched = {
    **thread,
    "source": source,
    "priority": priority,
    "reply": thread.get("reply"),
    "status": thread.get("status", "pending"),
}

That means a freshly fetched thread usually starts with:

status: "pending"
reply: null
source: "human", "qodo", or "coderabbit"
priority: "HIGH", "MEDIUM", or "LOW"
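As a rough illustration of how this shape can be consumed, the sketch below counts threads per source and status. The function name `count_by_status` and the sample data are hypothetical; only the top-level keys and the "pending" default come from the excerpts above.

```python
from collections import Counter

SOURCES = ("human", "qodo", "coderabbit")


def count_by_status(snapshot: dict) -> dict:
    """Count threads per status for each reviewer source."""
    counts = {}
    for source in SOURCES:
        threads = snapshot.get(source, [])
        # Mirror the enrichment default: a missing status counts as "pending".
        counts[source] = Counter(t.get("status", "pending") for t in threads)
    return counts


# Hypothetical sample data in the snapshot shape.
snapshot = {
    "metadata": {"owner": "org", "repo": "repo", "pr_number": 1},
    "human": [{"status": "addressed"}, {"body": "no status yet"}],
    "qodo": [],
    "coderabbit": [{"status": "skipped"}],
}
print(count_by_status(snapshot))
```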

Special synthesized comment types

Not every review note comes from a normal GitHub review thread. CodeRabbit body-parsed comments are converted into thread-like objects with extra fields.

From myk_claude_tools/reviews/fetch.py:

threads.append({
    "thread_id": None,
    "node_id": node_id,
    "comment_id": review_id,
    "author": author,
    "path": path,
    "line": line_int,
    "end_line": end_line_int,
    "body": body,
    "category": comment.get("category", ""),
    "severity": comment.get("severity", ""),
    "replies": [],
    "type": thread_type,
    "review_id": review_id,
    "suggestion_index": idx,
})

These special type values are currently:

outside_diff_comment
nitpick_comment
duplicate_comment

Warning: These synthesized comments do not behave like normal GitHub review threads. They are handled later as consolidated PR comments rather than replied to inline.
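If you process a snapshot yourself, it may help to separate the two kinds of entries up front. This is a minimal sketch, assuming the `type` field is the only signal needed; `split_threads` is a hypothetical helper, not a function from the repo.

```python
SYNTHESIZED_TYPES = {"outside_diff_comment", "nitpick_comment", "duplicate_comment"}


def split_threads(threads: list) -> tuple:
    """Return (inline_threads, synthesized_comments) based on the type field."""
    inline, synthesized = [], []
    for thread in threads:
        if thread.get("type") in SYNTHESIZED_TYPES:
            synthesized.append(thread)
        else:
            inline.append(thread)
    return inline, synthesized
```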

Status values and resolution rules

The reply/posting step recognizes these statuses in myk_claude_tools/reviews/post.py:

Status handling:
  - addressed: Post reply and resolve thread
  - not_addressed: Post reply and resolve thread (similar to addressed)
  - skipped: Post reply (with skip reason) and resolve thread
  - pending: Skip (not processed yet)
  - failed: Retry posting

Resolution behavior by source:
  - qodo/coderabbit: Always resolve threads after replying
  - human: Only resolve if status is "addressed"; skipped/not_addressed
          threads are not resolved to allow reviewer follow-up

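The source-specific rule refines the status list above: for human threads, not_addressed and skipped get a reply but no resolution. This matters if you are reading posted_at and resolved_at later: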

AI review threads are usually both replied to and resolved.
Human review threads may be replied to without being resolved.
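The combined rule can be sketched as a small predicate. This is an illustration of the documented behavior, not the code in post.py; `should_resolve` is a hypothetical name, and returning False for pending and failed is a simplification, since the excerpt does not say how retried threads are resolved.

```python
AI_SOURCES = {"qodo", "coderabbit"}
REPLY_STATUSES = {"addressed", "not_addressed", "skipped"}


def should_resolve(source: str, status: str) -> bool:
    """Decide whether a thread would be resolved after its reply is posted."""
    if status not in REPLY_STATUSES:
        return False  # pending/failed: simplified, see the note above
    if source in AI_SOURCES:
        return True  # AI threads are always resolved after replying
    return status == "addressed"  # human threads stay open unless addressed
```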

Atomic writes and cleanup

The review snapshot is written atomically and the temp directory is created with restricted permissions:

tmp_base = Path(os.environ.get("TMPDIR") or tempfile.gettempdir())
out_dir = tmp_base / "claude"
out_dir.mkdir(parents=True, exist_ok=True, mode=0o700)

The file itself is written through a temp file and renamed into place:

fd, tmp_json_path = tempfile.mkstemp(
    prefix=f"pr-{pr_number}-reviews.json.",
    dir=str(out_dir),
)
...
os.replace(tmp_path, json_path)

The fetch module also tracks temp files and removes any orphaned .new files during cleanup.
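The write-then-rename pattern above can be sketched end to end as follows. This is a minimal standalone version, assuming a caller-supplied target path; `write_json_atomically` is a hypothetical helper and the error handling is not taken from fetch.py.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(data: dict, json_path: Path) -> None:
    """Write data to json_path so readers never observe a partial file."""
    json_path.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(prefix=json_path.name + ".", dir=str(json_path.parent))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        os.replace(tmp_name, json_path)
    except BaseException:
        # Remove the partial temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Keeping the temp file next to the target matters because a rename across filesystems is not atomic.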

Tip: reviews fetch prints the full JSON to stdout as well as saving it to disk. reviews pending-fetch behaves differently and prints only the saved file path.

Pending review snapshot: `pr-<owner>-<repo>-<pr_number>-pending-review.json`

This file is created by myk-claude-tools reviews pending-fetch. It is used for the “refine an existing draft review” workflow.

The exact output shape comes from myk_claude_tools/reviews/pending_fetch.py:

final_output: dict[str, Any] = {
    "metadata": {
        "owner": owner,
        "repo": repo,
        "pr_number": pr_number_int,
        "review_id": review_id,
        "username": username,
        "json_path": str(json_path),
    },
    "comments": comments,
    "diff": diff,
}

Each comment starts with this structure:

comment: dict[str, Any] = {
    "id": c.get("id"),
    "path": c.get("path"),
    "line": c.get("line"),
    "side": c.get("side", "RIGHT"),
    "body": c.get("body", ""),
    "diff_hunk": c.get("diff_hunk", ""),
    "refined_body": None,
    "status": "pending",
}

What each field is for:

id: the GitHub review comment ID to patch later
path, line, side: where the draft comment is attached
body: the original comment text
diff_hunk: nearby diff context
refined_body: where your edited version goes
status: workflow state, typically moved from pending to accepted

If you later run pending-update, the file may also include optional submission metadata. The module documents the expected structure like this:

Expected JSON structure:
  {
    "metadata": {
      "owner": "...",
      "repo": "...",
      "pr_number": 123,
      "review_id": 456,
      "submit_action": "COMMENT",        # optional
      "submit_summary": "Summary text"    # optional
    },
    "comments": [
      {
        "id": 789,
        "path": "src/main.py",
        "line": 42,
        "body": "original comment",
        "refined_body": "refined version",
        "status": "accepted"
      }
    ]
  }

Valid submit_action values come directly from code:

VALID_SUBMIT_ACTIONS = {"COMMENT", "APPROVE", "REQUEST_CHANGES"}

Note: reviews pending-update reads this JSON and updates GitHub comments, but it does not rewrite the local JSON file the way reviews post does.
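Before handing an edited file to pending-update, a quick local check can catch obvious mistakes. The sketch below is an assumption-laden illustration: `check_pending_review` is a hypothetical helper, and treating "accepted with a non-empty refined_body" as the set of comments to update is a guess based on the documented structure, not confirmed behavior.

```python
VALID_SUBMIT_ACTIONS = {"COMMENT", "APPROVE", "REQUEST_CHANGES"}


def check_pending_review(data: dict) -> list:
    """Validate submit_action and return the comments that look ready to update."""
    action = data.get("metadata", {}).get("submit_action")
    if action is not None and action not in VALID_SUBMIT_ACTIONS:
        raise ValueError(f"Unsupported submit_action: {action!r}")
    return [
        c for c in data.get("comments", [])
        # Assumption: only accepted comments with refined text are patched.
        if c.get("status") == "accepted" and c.get("refined_body")
    ]
```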

Batched inline comment input

myk-claude-tools pr post-comment accepts a much simpler format: a JSON array of {path, line, body} objects.

The exact example in myk_claude_tools/pr/post_comment.py is:

[
  {
    "path": "src/main.py",
    "line": 42,
    "body": "### [CRITICAL] SQL Injection\n\nDescription..."
  },
  {
    "path": "src/utils.py",
    "line": 15,
    "body": "### [WARNING] Missing error handling\n\nDescription..."
  }
]

Severity markers are parsed from the first line of body:

Severity Markers:
    - ### [CRITICAL] Title - For critical security/functionality issues
    - ### [WARNING] Title  - For important but non-critical issues
    - ### [SUGGESTION] Title - For code improvements and suggestions
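One way to read a marker from the first line is a small regular expression. The pattern and the `parse_severity` name below are assumptions for illustration; the actual parsing code in post_comment.py is not shown here and may differ.

```python
import re
from typing import Optional, Tuple

SEVERITY_RE = re.compile(r"^### \[(CRITICAL|WARNING|SUGGESTION)\]\s*(.*)$")


def parse_severity(body: str) -> Tuple[Optional[str], str]:
    """Return (severity, title) from the first line, or (None, first_line)."""
    first_line = body.split("\n", 1)[0]
    match = SEVERITY_RE.match(first_line)
    if match is None:
        return None, first_line
    return match.group(1), match.group(2).strip()
```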

One practical detail from the loader: it can recover from prepended shell or hook output by scanning for the first line that starts with [ and attempting JSON parsing from there.
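That recovery step might look roughly like the following. This is a sketch under stated assumptions: `load_comment_batch` is a hypothetical name, and retrying on later lines when an earlier candidate fails to parse (for example a log line such as "[INFO] ...") is a choice made here, not something the source confirms.

```python
import json


def load_comment_batch(raw: str) -> list:
    """Parse a JSON array, skipping any non-JSON lines printed before it."""
    lines = raw.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if not line.startswith("["):
            continue
        try:
            parsed = json.loads("".join(lines[index:]))
        except json.JSONDecodeError:
            continue  # not the start of the array, keep scanning
        if isinstance(parsed, list):
            return parsed
    raise ValueError("No JSON array found in input")
```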

Other JSON you may see: `pr diff` output

myk-claude-tools pr diff prints a JSON object to stdout rather than saving a fixed temp file. This is often used as structured input for PR review workflows.

From myk_claude_tools/pr/diff.py:

output = {
    "metadata": {
        "owner": pr_info.owner,
        "repo": pr_info.repo,
        "pr_number": pr_info.pr_number,
        "head_sha": head_sha,
        "base_ref": base_ref,
        "title": pr_title,
        "state": pr_state,
    },
    "diff": pr_diff,
    "files": files,
}

Each files entry includes:

{
    "path": f["filename"],
    "status": f["status"],
    "additions": f["additions"],
    "deletions": f["deletions"],
    "patch": f.get("patch", ""),
}

Hook Payload Expectations

Hook registration lives in settings.json. The repo uses four hook event types:

"hooks": {
  "Notification": [...],
  "PreToolUse": [...],
  "UserPromptSubmit": [...],
  "SessionStart": [...]
}

`PreToolUse`: stdin JSON in, optional deny JSON out

Both scripts/rule-enforcer.py and scripts/git-protection.py read JSON from stdin and look for tool_name plus tool_input.

From rule-enforcer.py:

input_data = json.loads(sys.stdin.read())
tool_name = input_data.get("tool_name", "")
tool_input = input_data.get("tool_input", {})

The test suite shows the expected input shape clearly:

input_data = {
    "tool_name": "Bash",
    "tool_input": {"command": "python script.py"},
}

When a command is denied, the scripts return a JSON envelope under hookSpecificOutput. From rule-enforcer.py:

output = {
    "hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "deny",
        "permissionDecisionReason": "Direct python/pip commands are forbidden.",
        "additionalContext": (
            "You attempted to run python/pip directly. Instead:\n"
            "1. Delegate Python tasks to the python-expert agent\n"
            "2. Use 'uv run script.py' to run Python scripts\n"
            "3. Use 'uvx package-name' to run package CLIs\n"
            "See: https://docs.astral.sh/uv/"
        ),
    }
}

In practice:

tool_name is usually "Bash" for these hooks
tool_input.command is the important field for command hooks
allow decisions are normally represented by exiting successfully without printing a deny payload

Warning: The two command hooks have different failure behavior. rule-enforcer.py fails open on unexpected errors, while git-protection.py fails closed and returns a deny payload if it crashes.
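The difference between the two failure modes can be sketched with a single wrapper. Everything below is illustrative: `deny`, `check_command`, and `run_hook` are hypothetical names, the python rule is a simplified stand-in for the real checks, and an empty output string stands for "allow".

```python
import json
from typing import Callable, Optional


def deny(reason: str) -> dict:
    """Build a deny envelope in the documented hookSpecificOutput shape."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }


def check_command(payload: dict) -> Optional[dict]:
    """Hypothetical rule: deny Bash commands that start with `python`."""
    if payload.get("tool_name", "") != "Bash":
        return None
    command = payload.get("tool_input", {}).get("command", "")
    if command.split()[:1] == ["python"]:
        return deny("Direct python commands are forbidden.")
    return None


def run_hook(raw_stdin: str, check: Callable[[dict], Optional[dict]], fail_closed: bool) -> str:
    """Return the text to print on stdout; an empty string means allow."""
    try:
        decision = check(json.loads(raw_stdin))
    except Exception as exc:
        # Fail closed: deny on any unexpected error. Fail open: allow.
        decision = deny(f"Hook error: {exc}") if fail_closed else None
    return json.dumps(decision) if decision else ""
```

With `fail_closed=True`, malformed stdin yields a deny payload; with `fail_closed=False`, the same input yields no output and the tool call proceeds.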

The prompt-based destructive-command gate

There is also a prompt-style PreToolUse hook in settings.json. It asks an LLM to classify destructive shell commands and requires a very small JSON response.

The configured prompt ends with this exact contract:

Respond with JSON: {"decision": "approve" or "block" or "ask", "reason": "brief explanation"}

If you are building tooling around this repo, those are the only three supported decisions for that gate:

approve
block
ask
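Tooling that consumes the gate's answer could validate it along these lines. This is a minimal sketch, assuming the response arrives as a bare JSON object; `parse_gate_response` is a hypothetical helper.

```python
import json

SUPPORTED_DECISIONS = {"approve", "block", "ask"}


def parse_gate_response(raw: str) -> tuple:
    """Return (decision, reason), rejecting anything outside the contract."""
    data = json.loads(raw)
    decision = data.get("decision")
    if decision not in SUPPORTED_DECISIONS:
        raise ValueError(f"Unsupported decision: {decision!r}")
    return decision, data.get("reason", "")
```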

`UserPromptSubmit`: stdin ignored, context JSON returned

scripts/rule-injector.py reads stdin only because the hook protocol expects it, then returns structured JSON with additional prompt context.

From the script:

output = {"hookSpecificOutput": {"hookEventName": "UserPromptSubmit", "additionalContext": rule_reminder}}

That means the payload contract is simple:

input: whatever Claude Code provides on stdin
output: JSON with hookSpecificOutput.hookEventName and hookSpecificOutput.additionalContext

`Notification`: JSON with a top-level `message`

scripts/my-notifier.sh expects JSON on stdin and reads one field:

if ! notification_message=$(echo "$input_json" | jq -r '.message' 2>&1); then
    echo "Error: Failed to parse JSON - $notification_message" >&2
    exit 1
fi

Practical rules for this hook:

message must be present
message must not be empty or null
the script does not read nested fields

A minimal valid payload looks like:

{
  "message": "Review completed"
}
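If you generate notification payloads from Python, the same rules can be checked before the payload reaches the shell script. The helper below is a hypothetical Python counterpart, not a port of my-notifier.sh; requiring the value to be a string is an assumption.

```python
import json


def extract_message(raw: str) -> str:
    """Return the top-level message, or raise if it is missing, null, or empty."""
    payload = json.loads(raw)
    message = payload.get("message") if isinstance(payload, dict) else None
    if not isinstance(message, str) or not message:
        raise ValueError("Notification payload needs a non-empty top-level 'message'")
    return message
```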

`SessionStart`: plain text, not JSON

scripts/session-start-check.sh is the outlier. It does not parse JSON input, and when it finds missing tools or plugins it prints a plain-text report.

The report starts like this:

MISSING_TOOLS_REPORT:

[AI INSTRUCTION - YOU MUST FOLLOW THIS]
Some tools required by this configuration are missing.

It then prints sections for critical and optional tools, install hints, and explicit instructions about asking the user for help installing them.

Warning: SessionStart output is plain text, not JSON. If you are consuming hook output programmatically, do not assume every hook in this repo uses the same encoding.
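A consumer that has to handle both encodings could branch on whether the output parses as a JSON object. This is a sketch of one possible approach; `parse_hook_output` is a hypothetical helper and the leading-brace check is a heuristic.

```python
import json
from typing import Union


def parse_hook_output(text: str) -> Union[dict, str]:
    """Return a dict for JSON object output, otherwise the original text."""
    stripped = text.strip()
    if stripped.startswith("{"):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            return text
        if isinstance(parsed, dict):
            return parsed
    return text
```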

Plugin And Marketplace Metadata

Marketplace manifest: `.claude-plugin/marketplace.json`

The marketplace index describes which plugins are published from this repository.

A real entry looks like this:

{
  "name": "myk-org",
  "owner": {
    "name": "myk-org"
  },
  "plugins": [
    {
      "name": "myk-github",
      "source": "./plugins/myk-github",
      "description": "GitHub operations - PR reviews, releases, review handling, CodeRabbit rate limits",
      "version": "1.7.2"
    },
    {
      "name": "myk-review",
      "source": "./plugins/myk-review",
      "description": "Local code review and review database operations",
      "version": "1.7.2"
    },
    {
      "name": "myk-acpx",
      "source": "./plugins/myk-acpx",
      "description": "Multi-agent prompt execution via acpx (Agent Client Protocol)",
      "version": "1.7.2"
    }
  ]
}

What the fields mean:

name: marketplace namespace
owner.name: display owner for the marketplace
plugins[]: published plugin entries
source: repo-relative plugin directory
version: marketplace-published version for that plugin entry

Per-plugin manifest: `plugins/<plugin>/.claude-plugin/plugin.json`

Each plugin also ships its own manifest. For example, plugins/myk-github/.claude-plugin/plugin.json:

{
  "name": "myk-github",
  "version": "1.4.3",
  "description": "GitHub operations for Claude Code - PR reviews, releases, review handling, and CodeRabbit rate limits",
  "author": {
    "name": "myk-org"
  },
  "repository": "https://github.com/myk-org/claude-code-config",
  "license": "MIT",
  "keywords": ["github", "pr-review", "refine-review", "release", "code-review", "coderabbit", "rate-limit"]
}

The manifest format used across this repo is intentionally small:

name
version
description
author.name
repository
license
keywords
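Because the marketplace entry and the per-plugin manifest each carry their own version field, the two values can differ; the excerpts above show 1.7.2 in the marketplace entry and 1.4.3 in plugin.json for myk-github. A small comparison like the one below can surface such differences. `find_version_drift` is a hypothetical helper, and it assumes you have already loaded both JSON files into dictionaries.

```python
def find_version_drift(marketplace: dict, plugin_manifests: dict) -> dict:
    """Map plugin name to (marketplace_version, manifest_version) where they differ."""
    drift = {}
    for entry in marketplace.get("plugins", []):
        name = entry["name"]
        manifest = plugin_manifests.get(name)
        if manifest is None:
            continue  # no per-plugin manifest loaded for this entry
        if manifest.get("version") != entry.get("version"):
            drift[name] = (entry.get("version"), manifest.get("version"))
    return drift
```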

Command metadata in `plugins/<plugin>/commands/<command>.md`

Each slash command is a Markdown file with YAML frontmatter. A real example from plugins/myk-github/commands/pr-review.md:

---
description: Review a GitHub PR and post inline comments on selected findings
argument-hint: [PR_NUMBER|PR_URL]
allowed-tools: Bash(myk-claude-tools:*), Bash(uv:*), Bash(git:*), Bash(gh:*), AskUserQuestion, Task
---

Those frontmatter keys are the command schema used in this repo:

description: what the command does
argument-hint: how the command should be invoked
allowed-tools: which Claude Code tools the command is allowed to use
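For quick inspection, the flat frontmatter used here can be read without a YAML library. The sketch below is deliberately simplistic: `parse_frontmatter` is a hypothetical helper that only handles single-line `key: value` pairs and returns every value as a plain string, so it is not a substitute for a real YAML parser.

```python
def parse_frontmatter(markdown: str) -> dict:
    """Return the key/value pairs between the leading '---' delimiters."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, since values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


SAMPLE = """---
description: Review a GitHub PR and post inline comments on selected findings
argument-hint: [PR_NUMBER|PR_URL]
allowed-tools: Bash(myk-claude-tools:*), Bash(uv:*), Bash(git:*), Bash(gh:*), AskUserQuestion, Task
---
Command body goes here.
"""
fields = parse_frontmatter(SAMPLE)
print(fields["allowed-tools"].split(", "))
```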

You can see the same pattern repeated across command files such as:

plugins/myk-review/commands/local.md
plugins/myk-review/commands/query-db.md
plugins/myk-github/commands/release.md
plugins/myk-acpx/commands/prompt.md

Runtime plugin metadata in `settings.json`

The checked-in settings.json also records which plugins are enabled and which extra marketplaces are known.

From the file:

"enabledPlugins": {
  "myk-review@myk-org": true,
  "myk-github@myk-org": true,
  "myk-acpx@myk-org": true
},
"extraKnownMarketplaces": {
  "cli-anything": {
    "source": {
      "source": "github",
      "repo": "HKUDS/CLI-Anything"
    }
  },
  "worktrunk": {
    "source": {
      "source": "github",
      "repo": "max-sixty/worktrunk"
    }
  }
}

This is runtime configuration rather than plugin packaging metadata, but it is still part of the repo’s plugin schema story.

SQLite Review Database Schema

Location and lifecycle

The review database lives at:

<git-root>/.claude/data/reviews.db

The storage path is set in myk_claude_tools/reviews/store.py:

db_path = project_root / ".claude" / "data" / "reviews.db"

The storage workflow is:

Read a completed review JSON file
Create the database directory if needed
Insert one row into reviews
Insert one row per comment into comments
Commit the transaction
Delete the JSON file on success

The delete step is explicit:

json_path.unlink()

Warning: reviews store is intentionally destructive for the temp artifact. After a successful import, the JSON file is removed.

Table definitions

The schema is defined directly in Python as SQL:

CREATE TABLE IF NOT EXISTS reviews (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pr_number INTEGER NOT NULL,
    owner TEXT NOT NULL,
    repo TEXT NOT NULL,
    commit_sha TEXT NOT NULL,
    created_at TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS comments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    review_id INTEGER NOT NULL REFERENCES reviews(id),
    source TEXT NOT NULL,
    thread_id TEXT,
    node_id TEXT,
    comment_id INTEGER,
    author TEXT,
    path TEXT,
    line INTEGER,
    body TEXT,
    priority TEXT,
    status TEXT,
    reply TEXT,
    skip_reason TEXT,
    posted_at TEXT,
    resolved_at TEXT,
    type TEXT DEFAULT NULL
);

Indexes are also created for the most common lookups:

CREATE INDEX IF NOT EXISTS idx_comments_review_id ON comments(review_id);
CREATE INDEX IF NOT EXISTS idx_comments_source ON comments(source);
CREATE INDEX IF NOT EXISTS idx_comments_status ON comments(status);
CREATE INDEX IF NOT EXISTS idx_reviews_pr ON reviews(owner, repo, pr_number);
CREATE INDEX IF NOT EXISTS idx_reviews_commit ON reviews(commit_sha);
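To get a feel for how the two tables relate, the following sketch builds a trimmed copy of the schema in an in-memory database and runs a typical per-PR summary. Only a subset of the comments columns is included, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pr_number INTEGER NOT NULL,
    owner TEXT NOT NULL,
    repo TEXT NOT NULL,
    commit_sha TEXT NOT NULL,
    created_at TEXT NOT NULL
);
-- Trimmed: the real comments table has more columns.
CREATE TABLE comments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    review_id INTEGER NOT NULL REFERENCES reviews(id),
    source TEXT NOT NULL,
    status TEXT
);
""")
cur = conn.execute(
    "INSERT INTO reviews (pr_number, owner, repo, commit_sha, created_at) VALUES (?, ?, ?, ?, ?)",
    (123, "org", "repo", "abc1234", "2024-01-15T10:00:00Z"),
)
review_id = cur.lastrowid
conn.executemany(
    "INSERT INTO comments (review_id, source, status) VALUES (?, ?, ?)",
    [(review_id, "human", "addressed"), (review_id, "qodo", "skipped"), (review_id, "qodo", "skipped")],
)
# Count comments per source and status for one PR.
rows = conn.execute(
    """
    SELECT c.source, c.status, COUNT(*)
    FROM comments c JOIN reviews r ON r.id = c.review_id
    WHERE r.owner = ? AND r.repo = ? AND r.pr_number = ?
    GROUP BY c.source, c.status
    ORDER BY c.source, c.status
    """,
    ("org", "repo", 123),
).fetchall()
print(rows)
```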

What the columns mean

For end users, the important columns are:

reviews.id: one stored review run
reviews.pr_number, owner, repo: which PR the review belongs to
reviews.commit_sha: the commit SHA captured at store time
reviews.created_at: when that database row was written

And in comments:

review_id: foreign key back to reviews.id
source: human, qodo, or coderabbit
thread_id, node_id, comment_id: GitHub-side identifiers
path, line: where the comment points
body: the original review text
priority: HIGH, MEDIUM, or LOW
status: pending, addressed, not_addressed, skipped, or failed
reply: the reply text posted back to GitHub
skip_reason: why something was skipped
posted_at, resolved_at: workflow timestamps
type: special synthesized comment type such as outside_diff_comment

A real test confirms that these fields round-trip through the database (skip_reason is not among the asserted values):

assert row[0] == "thread_abc"
assert row[1] == "node_xyz"
assert row[2] == 12345
assert row[3] == "reviewer1"
assert row[4] == "src/main.py"
assert row[5] == 100
assert row[6] == "Please fix this bug"
assert row[7] == "HIGH"
assert row[8] == "addressed"
assert row[9] == "Fixed in commit abc123"
assert row[10] == "2024-01-15T10:00:00Z"
assert row[11] == "2024-01-15T10:05:00Z"
assert row[12] == "outside_diff_comment"

Append-only behavior

Stored reviews are append-only. Re-running storage for the same PR creates a new reviews row instead of overwriting the old one.

That behavior is tested explicitly:

review_id1 = store_reviews.insert_review(conn, "owner", "repo", 123, "abc1234567")
review_id2 = store_reviews.insert_review(conn, "owner", "repo", 123, "def7890123")

assert review_id1 != review_id2

This means the database preserves history across multiple review passes on the same PR.

Schema migration: the `type` column

Older databases may not have comments.type. The code checks for the column and adds it automatically, in two places.

From create_tables():

cursor = conn.execute("PRAGMA table_info(comments)")
columns = {row[1] for row in cursor.fetchall()}
if "type" not in columns:
    conn.execute("ALTER TABLE comments ADD COLUMN type TEXT DEFAULT NULL")

From ReviewDB._migrate_schema():

cursor = conn.execute("PRAGMA table_info(comments)")
columns = {row[1] for row in cursor.fetchall()}
if "type" not in columns:
    conn.execute("ALTER TABLE comments ADD COLUMN type TEXT DEFAULT NULL")
    conn.commit()

Note: There is no separate migration framework in this repo for the review database. The migration is code-driven and safe to run repeatedly.
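The "safe to run repeatedly" property is easy to check against an in-memory database. The sketch below wraps the same PRAGMA check in a hypothetical helper, `ensure_type_column`, and runs it twice on an old-style table created just for the demonstration.

```python
import sqlite3


def ensure_type_column(conn: sqlite3.Connection) -> bool:
    """Add comments.type if it is missing; return True when a change was made."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(comments)")}
    if "type" in columns:
        return False
    conn.execute("ALTER TABLE comments ADD COLUMN type TEXT DEFAULT NULL")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
# Simulate an older database whose comments table has no type column.
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
first = ensure_type_column(conn)   # adds the column
second = ensure_type_column(conn)  # finds it already present, does nothing
```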

Read-only query rules

The analytics/query layer is intentionally read-only. ReviewDB.query() only accepts SELECT and WITH statements.

The key safety check is:

if not sql_upper.startswith(("SELECT", "WITH")):
    raise ValueError("Only SELECT/CTE queries are allowed for safety")

It also rejects multiple statements and blocks modifying keywords such as:

INSERT
UPDATE
DELETE
DROP
ALTER
CREATE
ATTACH
DETACH
PRAGMA

This is why myk-claude-tools db query is safe for analytics but not for schema changes.
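A guard in this spirit can be sketched as follows. Only the startswith check is taken from the excerpt; how the real ReviewDB.query() detects multiple statements and blocked keywords is not shown, so the semicolon test and the word-boundary search below are assumptions. Note that this simple version would also reject a blocked keyword that appears inside a string literal.

```python
import re

BLOCKED_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "CREATE", "ATTACH", "DETACH", "PRAGMA",
)


def check_read_only(sql: str) -> None:
    """Raise ValueError unless sql looks like a single read-only statement."""
    statement = sql.strip().rstrip(";").strip()
    sql_upper = statement.upper()
    if not sql_upper.startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT/CTE queries are allowed for safety")
    if ";" in statement:
        raise ValueError("Multiple statements are not allowed")
    for keyword in BLOCKED_KEYWORDS:
        # Word boundaries keep column names such as created_at from matching CREATE.
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Blocked keyword: {keyword}")
```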

Dismissed-comment lookups and auto-skip semantics

The database is not only for reporting. It also powers auto-skip behavior during reviews fetch.

get_dismissed_comments() deliberately includes:

all not_addressed comments
all skipped comments
only some addressed comments, when type is a special synthesized type

The SQL condition is:

AND (
    c.status IN ('not_addressed', 'skipped')
    OR (c.status = 'addressed'
        AND c.type IN ('outside_diff_comment', 'nitpick_comment', 'duplicate_comment'))
)

That rule exists because those special comment types do not map cleanly to resolvable GitHub review threads. The database becomes the only reliable place to remember that they were already handled.
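The effect of that condition can be seen on a few sample rows. In the sketch below, only the parenthesized status/type condition comes from the source; the surrounding SELECT, the owner/repo filter, the trimmed tables, and the sample data are assumptions made for illustration.

```python
import sqlite3

DISMISSED_SQL = """
SELECT c.body
FROM comments c
JOIN reviews r ON r.id = c.review_id
WHERE r.owner = ? AND r.repo = ?
AND (
    c.status IN ('not_addressed', 'skipped')
    OR (c.status = 'addressed'
        AND c.type IN ('outside_diff_comment', 'nitpick_comment', 'duplicate_comment'))
)
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (id INTEGER PRIMARY KEY, owner TEXT, repo TEXT);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY, review_id INTEGER, body TEXT, status TEXT, type TEXT
);
INSERT INTO reviews (id, owner, repo) VALUES (1, 'org', 'repo');
INSERT INTO comments (review_id, body, status, type) VALUES
    (1, 'normal thread, fixed', 'addressed', NULL),
    (1, 'nitpick, fixed', 'addressed', 'nitpick_comment'),
    (1, 'declined', 'skipped', NULL),
    (1, 'not processed', 'pending', NULL);
""")
dismissed = [row[0] for row in conn.execute(DISMISSED_SQL, ("org", "repo"))]
print(dismissed)
```

The addressed inline thread is left out because GitHub's own resolved state already covers it, while the addressed nitpick is kept because only the database remembers it.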

`db find-similar` stdin format

myk-claude-tools db find-similar reads JSON from stdin and expects a single object with path and body.

The CLI implementation does this:

input_data = json.load(sys.stdin)
path = input_data.get("path", "")
body = input_data.get("body", "")

The test suite uses this exact input:

input_json = json.dumps({"path": "path/to/file.py", "body": "Add skip option"})

Tip: Pass a single JSON object to db find-similar, not an array.

Practical Rules Of Thumb

Use reviews fetch when you need a full review snapshot grouped into human, qodo, and coderabbit.
Use reviews pending-fetch when you already have a pending GitHub review and want to refine its draft comments.
Use pr post-comment when you only need to post a simple batch of inline comments.
Treat reviews.db as append-only history, not as a scratch database.
Expect JSON for PreToolUse, UserPromptSubmit, and Notification, but plain text for SessionStart.
If a comment has type: outside_diff_comment, nitpick_comment, or duplicate_comment, expect different posting and storage behavior than a normal inline thread.