Data Formats and Schema
This project moves review data through a small set of predictable formats:
- Temporary JSON files under $TMPDIR/claude (or /tmp/claude if TMPDIR is not set)
- Hook payloads passed over stdin/stdout
- Plugin and marketplace metadata files
- A local SQLite database for review history and analytics
If you are debugging a review flow, installing plugins, or querying past review data, these are the formats that matter.
At A Glance
| Format | Example location | Produced by | Used by |
|---|---|---|---|
| Review snapshot JSON | $TMPDIR/claude/pr-123-reviews.json | reviews fetch | reviews post, reviews store |
| Pending review JSON | $TMPDIR/claude/pr-owner-repo-123-pending-review.json | reviews pending-fetch | reviews pending-update |
| Inline comment batch JSON | any file or stdin | user or slash command workflow | pr post-comment |
| Review database | .claude/data/reviews.db | reviews store | db commands and auto-skip logic |
| Hook payload JSON | stdin/stdout | Claude Code hooks | hook scripts in scripts/ |
Note: The command docs often say /tmp/claude/..., but the code actually uses $TMPDIR/claude when TMPDIR is set.
Temporary JSON Artifacts
Review snapshot: pr-<pr_number>-reviews.json
This is the main handoff file for the review-reply workflow. It is created by myk-claude-tools reviews fetch and groups fetched threads by reviewer source.
The file is built in myk_claude_tools/reviews/fetch.py like this:
final_output = {
"metadata": {
"owner": owner,
"repo": repo,
"pr_number": int(pr_number),
"json_path": str(json_path),
},
"human": categorized["human"],
"qodo": categorized["qodo"],
"coderabbit": categorized["coderabbit"],
}
A real test fixture from tests/test_store_reviews_to_db.py shows the shape that reviews store accepts:
data = {
"metadata": {
"owner": "org",
"repo": "repo",
"pr_number": 1,
},
"human": [
{
"thread_id": "thread_abc",
"node_id": "node_xyz",
"comment_id": 12345,
"author": "reviewer1",
"path": "src/main.py",
"line": 100,
"body": "Please fix this bug",
"priority": "HIGH",
"status": "addressed",
"reply": "Fixed in commit abc123",
"skip_reason": None,
"posted_at": "2024-01-15T10:00:00Z",
"resolved_at": "2024-01-15T10:05:00Z",
"type": "outside_diff_comment",
}
],
"qodo": [],
"coderabbit": [],
}
What you can expect in each thread object:
- GitHub identifiers: thread_id, node_id, comment_id
- Location fields: path, line
- Review text: body, reply, skip_reason
- Workflow state: status, posted_at, resolved_at
- Classification: source, priority, and sometimes type
When threads are enriched in process_and_categorize(), the code adds these defaults:
enriched = {
**thread,
"source": source,
"priority": priority,
"reply": thread.get("reply"),
"status": thread.get("status", "pending"),
}
That means a freshly fetched thread usually starts with:
- status: "pending"
- reply: null
- source: "human", "qodo", or "coderabbit"
- priority: "HIGH", "MEDIUM", or "LOW"
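To make the defaults concrete, here is a minimal sketch that applies the same dictionary merge to two sample threads. The helper name enrich_thread() and the sample values are hypothetical; only the merge itself mirrors the excerpt above.

```python
from typing import Any


def enrich_thread(thread: dict[str, Any], source: str, priority: str) -> dict[str, Any]:
    """Hypothetical helper: apply the enrichment defaults to one fetched thread."""
    return {
        **thread,
        "source": source,
        "priority": priority,
        # reply stays None unless the thread already carries one
        "reply": thread.get("reply"),
        # status falls back to "pending" only when the key is absent
        "status": thread.get("status", "pending"),
    }


# A thread straight from GitHub has no workflow fields yet.
fresh = enrich_thread({"thread_id": "thread_abc", "body": "Please fix this bug"}, "human", "HIGH")

# A thread that already has a status and reply keeps them.
replied = enrich_thread({"thread_id": "thread_def", "status": "addressed", "reply": "Done"}, "qodo", "LOW")
```

Because the explicit keys come after the spread, source and priority always win over anything already present in the thread, while reply and status are preserved when they exist.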
Special synthesized comment types
Not every review note comes from a normal GitHub review thread. CodeRabbit body-parsed comments are converted into thread-like objects with extra fields.
From myk_claude_tools/reviews/fetch.py:
threads.append({
"thread_id": None,
"node_id": node_id,
"comment_id": review_id,
"author": author,
"path": path,
"line": line_int,
"end_line": end_line_int,
"body": body,
"category": comment.get("category", ""),
"severity": comment.get("severity", ""),
"replies": [],
"type": thread_type,
"review_id": review_id,
"suggestion_index": idx,
})
These special type values are currently:
- outside_diff_comment
- nitpick_comment
- duplicate_comment
Warning: These synthesized comments do not behave like normal GitHub review threads. They are handled later as consolidated PR comments rather than replied to inline.
Status values and resolution rules
The reply/posting step recognizes these statuses in myk_claude_tools/reviews/post.py:
Status handling:
- addressed: Post reply and resolve thread
- not_addressed: Post reply and resolve thread (similar to addressed)
- skipped: Post reply (with skip reason) and resolve thread
- pending: Skip (not processed yet)
- failed: Retry posting
Resolution behavior by source:
- qodo/coderabbit: Always resolve threads after replying
- human: Only resolve if status is "addressed"; skipped/not_addressed
threads are not resolved to allow reviewer follow-up
That source-specific rule is important if you are reading posted_at and resolved_at later:
- AI review threads are usually both replied to and resolved.
- Human review threads may be replied to without being resolved.
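The source-specific rule can be sketched as a small predicate. This is an illustration of the "Resolution behavior by source" text above, not the code in post.py; the name should_resolve() is hypothetical, and treating pending and failed as "do not resolve" is an assumption made for the sketch.

```python
AI_SOURCES = {"qodo", "coderabbit"}
REPLIED_STATUSES = {"addressed", "not_addressed", "skipped"}


def should_resolve(source: str, status: str) -> bool:
    """Hypothetical predicate: should the thread be resolved after replying?"""
    if status not in REPLIED_STATUSES:
        # Assumption: pending/failed threads have no successful reply to resolve after.
        return False
    if source in AI_SOURCES:
        # AI reviewer threads are always resolved once a reply is posted.
        return True
    # Human threads stay open unless the feedback was actually addressed.
    return status == "addressed"
```

Under this reading, a human thread with status skipped gets a reply (posted_at set) but no resolution (resolved_at left empty), which is the combination to expect when querying stored data later.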
Atomic writes and cleanup
The review snapshot is written atomically and the temp directory is created with restricted permissions:
tmp_base = Path(os.environ.get("TMPDIR") or tempfile.gettempdir())
out_dir = tmp_base / "claude"
out_dir.mkdir(parents=True, exist_ok=True, mode=0o700)
The file itself is written through a temp file and renamed into place:
fd, tmp_json_path = tempfile.mkstemp(
prefix=f"pr-{pr_number}-reviews.json.",
dir=str(out_dir),
)
...
os.replace(tmp_path, json_path)
The fetch module also tracks temp files and removes any orphaned .new files during cleanup.
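The write-then-rename pattern can be sketched end to end as follows. The function name write_json_atomically() and the demo payload are hypothetical, and the cleanup here is simpler than the tracking the fetch module does; the standard-library calls are the same ones shown in the excerpts.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomically(data: dict[str, Any], json_path: Path) -> None:
    """Hypothetical helper: write JSON via a sibling temp file, then rename."""
    json_path.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(prefix=json_path.name + ".", dir=str(json_path.parent))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        # Readers see either the old file or the complete new one, never a partial write.
        os.replace(tmp_name, json_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        Path(tmp_name).unlink(missing_ok=True)
        raise


# Demo in a throwaway directory instead of the real $TMPDIR/claude.
with tempfile.TemporaryDirectory() as scratch:
    target = Path(scratch) / "claude" / "pr-123-reviews.json"
    write_json_atomically({"metadata": {"pr_number": 123}}, target)
    round_trip = json.loads(target.read_text(encoding="utf-8"))
    leftovers = sorted(p.name for p in target.parent.iterdir())
```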
Tip: reviews fetch prints the full JSON to stdout as well as saving it to disk. reviews pending-fetch behaves differently and prints only the saved file path.
Pending review snapshot: pr-<owner>-<repo>-<pr_number>-pending-review.json
This file is created by myk-claude-tools reviews pending-fetch. It is used for the “refine an existing draft review” workflow.
The exact output shape comes from myk_claude_tools/reviews/pending_fetch.py:
final_output: dict[str, Any] = {
"metadata": {
"owner": owner,
"repo": repo,
"pr_number": pr_number_int,
"review_id": review_id,
"username": username,
"json_path": str(json_path),
},
"comments": comments,
"diff": diff,
}
Each comment starts with this structure:
comment: dict[str, Any] = {
"id": c.get("id"),
"path": c.get("path"),
"line": c.get("line"),
"side": c.get("side", "RIGHT"),
"body": c.get("body", ""),
"diff_hunk": c.get("diff_hunk", ""),
"refined_body": None,
"status": "pending",
}
What each field is for:
- id: the GitHub review comment ID to patch later
- path, line, side: where the draft comment is attached
- body: the original comment text
- diff_hunk: nearby diff context
- refined_body: where your edited version goes
- status: workflow state, typically moved from pending to accepted
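As a rough illustration of how a comment moves through that workflow, the sketch below fills in refined_body and flips status on a copy of one comment. The helper accept_refinement() is hypothetical; the field names are the ones listed above.

```python
from typing import Any


def accept_refinement(comment: dict[str, Any], refined_body: str) -> dict[str, Any]:
    """Hypothetical helper: return a copy of the comment marked as accepted."""
    return {**comment, "refined_body": refined_body, "status": "accepted"}


draft = {
    "id": 789,
    "path": "src/main.py",
    "line": 42,
    "side": "RIGHT",
    "body": "original comment",
    "diff_hunk": "",
    "refined_body": None,
    "status": "pending",
}
accepted = accept_refinement(draft, "refined version")
```

Returning a copy keeps the original body untouched, so the file still records both the draft text and the refined text.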
If you later run pending-update, the file may also include optional submission metadata. The module documents the expected structure like this:
Expected JSON structure:
{
"metadata": {
"owner": "...",
"repo": "...",
"pr_number": 123,
"review_id": 456,
"submit_action": "COMMENT", # optional
"submit_summary": "Summary text" # optional
},
"comments": [
{
"id": 789,
"path": "src/main.py",
"line": 42,
"body": "original comment",
"refined_body": "refined version",
"status": "accepted"
}
]
}
Valid submit_action values come directly from code:
VALID_SUBMIT_ACTIONS = {"COMMENT", "APPROVE", "REQUEST_CHANGES"}
Note: reviews pending-update reads this JSON and updates GitHub comments, but it does not rewrite the local JSON file the way reviews post does.
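If you generate or edit this file yourself, it can help to check the optional submit_action before running pending-update. The following is a minimal sketch; the function read_submit_action() is hypothetical, and only the set of valid values comes from the code quoted above.

```python
from typing import Any, Optional

VALID_SUBMIT_ACTIONS = {"COMMENT", "APPROVE", "REQUEST_CHANGES"}


def read_submit_action(metadata: dict[str, Any]) -> Optional[str]:
    """Hypothetical check: return the submit action, or None when it is not set."""
    action = metadata.get("submit_action")
    if action is None:
        # submit_action is optional, so its absence is not an error.
        return None
    if action not in VALID_SUBMIT_ACTIONS:
        raise ValueError(f"unsupported submit_action: {action!r}")
    return action


def is_valid_metadata(metadata: dict[str, Any]) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        read_submit_action(metadata)
    except ValueError:
        return False
    return True
```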
Batched inline comment input
myk-claude-tools pr post-comment accepts a much simpler format: a JSON array of {path, line, body} objects.
The exact example in myk_claude_tools/pr/post_comment.py is:
[
{
"path": "src/main.py",
"line": 42,
"body": "### [CRITICAL] SQL Injection\n\nDescription..."
},
{
"path": "src/utils.py",
"line": 15,
"body": "### [WARNING] Missing error handling\n\nDescription..."
}
]
Severity markers are parsed from the first line of body:
Severity Markers:
- ### [CRITICAL] Title - For critical security/functionality issues
- ### [WARNING] Title - For important but non-critical issues
- ### [SUGGESTION] Title - For code improvements and suggestions
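A minimal sketch of reading the marker from the first line of body is shown below. The regular expression and the function parse_severity() are assumptions made for illustration; the actual pattern used in post_comment.py is not shown in this document.

```python
import re
from typing import Optional

# Assumed pattern: "### [MARKER] Title" on the first line of the body.
SEVERITY_PATTERN = re.compile(r"^###\s*\[(CRITICAL|WARNING|SUGGESTION)\]\s*(.*)$")


def parse_severity(body: str) -> tuple[Optional[str], str]:
    """Hypothetical parser: return (severity, title) from the first line of body."""
    first_line = body.split("\n", 1)[0].strip()
    match = SEVERITY_PATTERN.match(first_line)
    if match is None:
        # No recognized marker: report no severity and keep the line as the title.
        return None, first_line
    return match.group(1), match.group(2).strip()
```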
One practical detail from the loader: it can recover from prepended shell or hook output by scanning for the first line that starts with [ and attempting JSON parsing from there.
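That recovery step might look roughly like this. The function load_comment_batch() is a hypothetical stand-in for the real loader, and ignoring leading whitespace before the [ is an assumption.

```python
import json
from typing import Any


def load_comment_batch(raw: str) -> list[dict[str, Any]]:
    """Hypothetical loader: parse a JSON array, tolerating text printed before it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        lines = raw.splitlines()
        for index, line in enumerate(lines):
            # Assumption: leading whitespace before "[" is ignored.
            if line.lstrip().startswith("["):
                # Only the first candidate line is tried, as described above.
                data = json.loads("\n".join(lines[index:]))
                break
        else:
            raise ValueError("no JSON array found in input")
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of comment objects")
    return data


noisy_input = 'hook: environment ready\n[\n  {"path": "src/main.py", "line": 42, "body": "### [WARNING] Example"}\n]'
recovered = load_comment_batch(noisy_input)
```

A prepended line that itself begins with [ (for example a bracketed log prefix) would still defeat this approach, so clean input remains the safer option.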
Other JSON you may see: pr diff output
myk-claude-tools pr diff prints a JSON object to stdout rather than saving a fixed temp file. This is often used as structured input for PR review workflows.
From myk_claude_tools/pr/diff.py:
output = {
"metadata": {
"owner": pr_info.owner,
"repo": pr_info.repo,
"pr_number": pr_info.pr_number,
"head_sha": head_sha,
"base_ref": base_ref,
"title": pr_title,
"state": pr_state,
},
"diff": pr_diff,
"files": files,
}
Each files entry includes:
{
"path": f["filename"],
"status": f["status"],
"additions": f["additions"],
"deletions": f["deletions"],
"patch": f.get("patch", ""),
}
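As an example of consuming this output, the sketch below totals additions and deletions across the files array. The function summarize_files() and the sample data are hypothetical; the keys match the structure shown above.

```python
from typing import Any


def summarize_files(output: dict[str, Any]) -> dict[str, int]:
    """Hypothetical helper: total the per-file counters from pr diff output."""
    files = output.get("files", [])
    return {
        "files": len(files),
        "additions": sum(entry.get("additions", 0) for entry in files),
        "deletions": sum(entry.get("deletions", 0) for entry in files),
    }


sample_output = {
    "metadata": {"owner": "org", "repo": "repo", "pr_number": 1},
    "diff": "",
    "files": [
        {"path": "src/main.py", "status": "modified", "additions": 10, "deletions": 2, "patch": ""},
        {"path": "src/utils.py", "status": "added", "additions": 5, "deletions": 0, "patch": ""},
    ],
}
totals = summarize_files(sample_output)
```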
Hook Payload Expectations
Hook registration lives in settings.json. The repo uses four hook event types:
"hooks": {
"Notification": [...],
"PreToolUse": [...],
"UserPromptSubmit": [...],
"SessionStart": [...]
}
PreToolUse: stdin JSON in, optional deny JSON out
Both scripts/rule-enforcer.py and scripts/git-protection.py read JSON from stdin and look for tool_name plus tool_input.
From rule-enforcer.py:
input_data = json.loads(sys.stdin.read())
tool_name = input_data.get("tool_name", "")
tool_input = input_data.get("tool_input", {})
The test suite shows the expected input shape clearly:
input_data = {
"tool_name": "Bash",
"tool_input": {"command": "python script.py"},
}
When a command is denied, the scripts return a JSON envelope under hookSpecificOutput. From rule-enforcer.py:
output = {
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "deny",
"permissionDecisionReason": "Direct python/pip commands are forbidden.",
"additionalContext": (
"You attempted to run python/pip directly. Instead:\n"
"1. Delegate Python tasks to the python-expert agent\n"
"2. Use 'uv run script.py' to run Python scripts\n"
"3. Use 'uvx package-name' to run package CLIs\n"
"See: https://docs.astral.sh/uv/"
),
}
}
In practice:
- tool_name is usually "Bash" for these hooks
- tool_input.command is the important field for command hooks
- allow decisions are normally represented by exiting successfully without printing a deny payload
Warning: The two command hooks have different failure behavior. rule-enforcer.py fails open on unexpected errors, while git-protection.py fails closed and returns a deny payload if it crashes.
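Putting the input and output shapes together, a command hook can be sketched as a pure function from payload text to an optional deny envelope. The function names and the first-word matching rule are simplifications invented for this example and are not the logic in rule-enforcer.py; the envelope keys are the ones shown above.

```python
import json
from typing import Any, Optional

BLOCKED_PROGRAMS = {"python", "python3", "pip", "pip3"}  # assumed list for the sketch


def build_deny(reason: str) -> dict[str, Any]:
    """Build the deny envelope in the hookSpecificOutput shape."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }


def evaluate(payload_text: str) -> Optional[dict[str, Any]]:
    """Hypothetical hook body: return a deny payload, or None to allow."""
    payload = json.loads(payload_text)
    if payload.get("tool_name", "") != "Bash":
        return None
    words = payload.get("tool_input", {}).get("command", "").split()
    # Simplified rule: only the first word of the command is inspected.
    if words and words[0] in BLOCKED_PROGRAMS:
        return build_deny("Direct python/pip commands are forbidden.")
    return None


denied = evaluate(json.dumps({"tool_name": "Bash", "tool_input": {"command": "python script.py"}}))
allowed = evaluate(json.dumps({"tool_name": "Bash", "tool_input": {"command": "uv run script.py"}}))
```

A real hook script would print the returned payload as JSON on stdout when it is not None and otherwise exit successfully without output.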
The prompt-based destructive-command gate
There is also a prompt-style PreToolUse hook in settings.json. It asks an LLM to classify destructive shell commands and requires a very small JSON response.
The configured prompt ends with this exact contract:
Respond with JSON: {"decision": "approve" or "block" or "ask", "reason": "brief explanation"}
If you are building tooling around this repo, those are the only three supported decisions for that gate:
- approve
- block
- ask
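Tooling that consumes the gate's response could validate it with something like the sketch below. The function parse_gate_response() is hypothetical; the accepted decisions are the three named in the prompt contract.

```python
import json

VALID_DECISIONS = {"approve", "block", "ask"}


def parse_gate_response(text: str) -> tuple[str, str]:
    """Hypothetical validator: return (decision, reason) or raise ValueError."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("gate response must be a JSON object")
    decision = data.get("decision")
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unsupported decision: {decision!r}")
    # A missing reason is tolerated here and returned as an empty string.
    return decision, str(data.get("reason", ""))


def is_valid_gate_response(text: str) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        parse_gate_response(text)
    except ValueError:
        return False
    return True
```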
UserPromptSubmit: stdin ignored, context JSON returned
scripts/rule-injector.py reads stdin only because the hook protocol expects it, then returns structured JSON with additional prompt context.
From the script:
output = {"hookSpecificOutput": {"hookEventName": "UserPromptSubmit", "additionalContext": rule_reminder}}
That means the payload contract is simple:
- input: whatever Claude Code provides on stdin
- output: JSON with hookSpecificOutput.hookEventName and additionalContext
Notification: JSON with a top-level message
scripts/my-notifier.sh expects JSON on stdin and reads one field:
if ! notification_message=$(echo "$input_json" | jq -r '.message' 2>&1); then
echo "Error: Failed to parse JSON - $notification_message" >&2
exit 1
fi
Practical rules for this hook:
- message must be present
- message must not be empty or null
- the script does not read nested fields
A minimal valid payload looks like:
{
"message": "Review completed"
}
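The script itself is Bash with jq, but the same rules can be expressed in Python if you want to check a payload before sending it. This is a sketch of the rules listed above, not a port of my-notifier.sh; the function extract_message() is hypothetical, and rejecting non-object payloads is an assumption.

```python
import json


def extract_message(payload_text: str) -> str:
    """Hypothetical check: return the top-level message or raise ValueError."""
    data = json.loads(payload_text)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    message = data.get("message")
    # Missing, null, and empty messages are all rejected.
    if message is None or message == "":
        raise ValueError("payload needs a non-empty top-level 'message'")
    return str(message)


def is_valid_notification(payload_text: str) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        extract_message(payload_text)
    except ValueError:
        return False
    return True
```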
SessionStart: plain text, not JSON
scripts/session-start-check.sh is the outlier. It does not parse JSON input, and when it finds missing tools or plugins it prints a plain-text report.
The report starts like this:
MISSING_TOOLS_REPORT:
[AI INSTRUCTION - YOU MUST FOLLOW THIS]
Some tools required by this configuration are missing.
It then prints sections for critical and optional tools, install hints, and explicit instructions about asking the user for help installing them.
Warning: SessionStart output is plain text, not JSON. If you are consuming hook output programmatically, do not assume every hook in this repo uses the same encoding.
Plugin And Marketplace Metadata
Marketplace manifest: .claude-plugin/marketplace.json
The marketplace index describes which plugins are published from this repository.
A real entry looks like this:
{
"name": "myk-org",
"owner": {
"name": "myk-org"
},
"plugins": [
{
"name": "myk-github",
"source": "./plugins/myk-github",
"description": "GitHub operations - PR reviews, releases, review handling, CodeRabbit rate limits",
"version": "1.7.2"
},
{
"name": "myk-review",
"source": "./plugins/myk-review",
"description": "Local code review and review database operations",
"version": "1.7.2"
},
{
"name": "myk-acpx",
"source": "./plugins/myk-acpx",
"description": "Multi-agent prompt execution via acpx (Agent Client Protocol)",
"version": "1.7.2"
}
]
}
What the fields mean:
- name: marketplace namespace
- owner.name: display owner for the marketplace
- plugins[]: published plugin entries
- source: repo-relative plugin directory
- version: marketplace-published version for that plugin entry
Per-plugin manifest: plugins/<plugin>/.claude-plugin/plugin.json
Each plugin also ships its own manifest. For example, plugins/myk-github/.claude-plugin/plugin.json:
{
"name": "myk-github",
"version": "1.4.3",
"description": "GitHub operations for Claude Code - PR reviews, releases, review handling, and CodeRabbit rate limits",
"author": {
"name": "myk-org"
},
"repository": "https://github.com/myk-org/claude-code-config",
"license": "MIT",
"keywords": ["github", "pr-review", "refine-review", "release", "code-review", "coderabbit", "rate-limit"]
}
The manifest format used across this repo is intentionally small:
- name
- version
- description
- author.name
- repository
- license
- keywords
Command metadata in plugins/*/commands/*.md
Each slash command is a Markdown file with YAML frontmatter. A real example from plugins/myk-github/commands/pr-review.md:
---
description: Review a GitHub PR and post inline comments on selected findings
argument-hint: [PR_NUMBER|PR_URL]
allowed-tools: Bash(myk-claude-tools:*), Bash(uv:*), Bash(git:*), Bash(gh:*), AskUserQuestion, Task
---
Those frontmatter keys are the command schema used in this repo:
- description: what the command does
- argument-hint: how the command should be invoked
- allowed-tools: which Claude Code tools the command is allowed to use
You can see the same pattern repeated across command files such as:
- plugins/myk-review/commands/local.md
- plugins/myk-review/commands/query-db.md
- plugins/myk-github/commands/release.md
- plugins/myk-acpx/commands/prompt.md
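If you need to read those three frontmatter keys from a script, a line-based reader is often enough. The sketch below is deliberately not a YAML parser: it assumes flat key: value lines between two --- delimiters, and the function parse_frontmatter() is hypothetical.

```python
EXAMPLE_COMMAND = """\
---
description: Review a GitHub PR and post inline comments on selected findings
argument-hint: [PR_NUMBER|PR_URL]
allowed-tools: Bash(myk-claude-tools:*), Bash(uv:*), Bash(git:*), Bash(gh:*), AskUserQuestion, Task
---
Command body goes here.
"""


def parse_frontmatter(markdown_text: str) -> dict[str, str]:
    """Hypothetical reader: return flat frontmatter fields as raw strings."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so values such as Bash(git:*) stay intact.
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the file as having no frontmatter.
    return {}


command_fields = parse_frontmatter(EXAMPLE_COMMAND)
```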
Runtime plugin metadata in settings.json
The checked-in settings.json also records which plugins are enabled and which extra marketplaces are known.
From the file:
"enabledPlugins": {
"myk-review@myk-org": true,
"myk-github@myk-org": true,
"myk-acpx@myk-org": true
},
"extraKnownMarketplaces": {
"cli-anything": {
"source": {
"source": "github",
"repo": "HKUDS/CLI-Anything"
}
},
"worktrunk": {
"source": {
"source": "github",
"repo": "max-sixty/worktrunk"
}
}
}
This is runtime configuration rather than plugin packaging metadata, but it is still part of the repo’s plugin schema story.
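The enabledPlugins keys in the excerpt appear to follow a plugin@marketplace pattern (for example myk-review paired with the myk-org marketplace name). Assuming that reading is right, the enabled plugins can be listed with a sketch like this; the function enabled_plugins() is hypothetical.

```python
from typing import Any


def enabled_plugins(settings: dict[str, Any]) -> list[tuple[str, str]]:
    """Hypothetical helper: list (plugin, marketplace) pairs that are switched on."""
    pairs = []
    for key, enabled in settings.get("enabledPlugins", {}).items():
        if not enabled:
            continue
        # Assumption: the key is "<plugin>@<marketplace>".
        plugin, _, marketplace = key.partition("@")
        pairs.append((plugin, marketplace))
    return sorted(pairs)


sample_settings = {
    "enabledPlugins": {
        "myk-review@myk-org": True,
        "myk-github@myk-org": True,
        "myk-acpx@myk-org": False,
    }
}
active = enabled_plugins(sample_settings)
```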
SQLite Review Database Schema
Location and lifecycle
The review database lives at:
<git-root>/.claude/data/reviews.db
The storage path is set in myk_claude_tools/reviews/store.py:
db_path = project_root / ".claude" / "data" / "reviews.db"
The storage workflow is:
- Read a completed review JSON file
- Create the database directory if needed
- Insert one row into reviews
- Insert one row per comment into comments
- Commit the transaction
- Delete the JSON file on success
The delete step is explicit:
json_path.unlink()
Warning: reviews store is intentionally destructive for the temp artifact. After a successful import, the JSON file is removed.
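The storage steps above can be sketched with the standard sqlite3 module. This is an illustration under simplifying assumptions, not the code in store.py: the function store_snapshot() is hypothetical, the comments table is cut down to a few columns, and the commit SHA is passed in rather than discovered.

```python
import json
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Reduced schema for the sketch; the real comments table has more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pr_number INTEGER NOT NULL, owner TEXT NOT NULL, repo TEXT NOT NULL,
    commit_sha TEXT NOT NULL, created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS comments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    review_id INTEGER NOT NULL REFERENCES reviews(id),
    source TEXT NOT NULL, thread_id TEXT, body TEXT, status TEXT
);
"""


def store_snapshot(json_path: Path, db_path: Path, commit_sha: str) -> int:
    """Hypothetical helper: import one review snapshot, then delete the JSON file."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    meta = data["metadata"]
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        with conn:  # commits on success, rolls back if an insert raises
            cursor = conn.execute(
                "INSERT INTO reviews (pr_number, owner, repo, commit_sha, created_at)"
                " VALUES (?, ?, ?, ?, ?)",
                (meta["pr_number"], meta["owner"], meta["repo"], commit_sha,
                 datetime.now(timezone.utc).isoformat()),
            )
            review_id = cursor.lastrowid
            for source in ("human", "qodo", "coderabbit"):
                for thread in data.get(source, []):
                    conn.execute(
                        "INSERT INTO comments (review_id, source, thread_id, body, status)"
                        " VALUES (?, ?, ?, ?, ?)",
                        (review_id, source, thread.get("thread_id"),
                         thread.get("body"), thread.get("status")),
                    )
    finally:
        conn.close()
    # Reached only after a successful commit, so a failed import keeps the JSON file.
    json_path.unlink()
    return review_id


with tempfile.TemporaryDirectory() as scratch:
    snapshot = Path(scratch) / "pr-1-reviews.json"
    snapshot.write_text(json.dumps({
        "metadata": {"owner": "org", "repo": "repo", "pr_number": 1},
        "human": [{"thread_id": "thread_abc", "body": "Please fix this bug", "status": "addressed"}],
        "qodo": [],
        "coderabbit": [],
    }), encoding="utf-8")
    database = Path(scratch) / ".claude" / "data" / "reviews.db"
    new_review_id = store_snapshot(snapshot, database, "abc1234567")
    snapshot_removed = not snapshot.exists()
    check = sqlite3.connect(database)
    stored_comments = check.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    check.close()
```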
Table definitions
The schema is defined directly in Python as SQL:
CREATE TABLE IF NOT EXISTS reviews (
id INTEGER PRIMARY KEY AUTOINCREMENT,
pr_number INTEGER NOT NULL,
owner TEXT NOT NULL,
repo TEXT NOT NULL,
commit_sha TEXT NOT NULL,
created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS comments (
id INTEGER PRIMARY KEY AUTOINCREMENT,
review_id INTEGER NOT NULL REFERENCES reviews(id),
source TEXT NOT NULL,
thread_id TEXT,
node_id TEXT,
comment_id INTEGER,
author TEXT,
path TEXT,
line INTEGER,
body TEXT,
priority TEXT,
status TEXT,
reply TEXT,
skip_reason TEXT,
posted_at TEXT,
resolved_at TEXT,
type TEXT DEFAULT NULL
);
Indexes are also created for the most common lookups:
CREATE INDEX IF NOT EXISTS idx_comments_review_id ON comments(review_id);
CREATE INDEX IF NOT EXISTS idx_comments_source ON comments(source);
CREATE INDEX IF NOT EXISTS idx_comments_status ON comments(status);
CREATE INDEX IF NOT EXISTS idx_reviews_pr ON reviews(owner, repo, pr_number);
CREATE INDEX IF NOT EXISTS idx_reviews_commit ON reviews(commit_sha);
What the columns mean
For end users, the important columns are:
- reviews.id: one stored review run
- reviews.pr_number, owner, repo: which PR the review belongs to
- reviews.commit_sha: the commit SHA captured at store time
- reviews.created_at: when that database row was written
And in comments:
- review_id: foreign key back to reviews.id
- source: human, qodo, or coderabbit
- thread_id, node_id, comment_id: GitHub-side identifiers
- path, line: where the comment points
- body: the original review text
- priority: HIGH, MEDIUM, or LOW
- status: pending, addressed, not_addressed, skipped, or failed
- reply: the reply text posted back to GitHub
- skip_reason: why something was skipped
- posted_at, resolved_at: workflow timestamps
- type: special synthesized comment type such as outside_diff_comment
A real test confirms that most of these fields are stored as expected (the excerpt below does not assert on review_id, source, or skip_reason):
assert row[0] == "thread_abc"
assert row[1] == "node_xyz"
assert row[2] == 12345
assert row[3] == "reviewer1"
assert row[4] == "src/main.py"
assert row[5] == 100
assert row[6] == "Please fix this bug"
assert row[7] == "HIGH"
assert row[8] == "addressed"
assert row[9] == "Fixed in commit abc123"
assert row[10] == "2024-01-15T10:00:00Z"
assert row[11] == "2024-01-15T10:05:00Z"
assert row[12] == "outside_diff_comment"
Append-only behavior
Stored reviews are append-only. Re-running storage for the same PR creates a new reviews row instead of overwriting the old one.
That behavior is tested explicitly:
review_id1 = store_reviews.insert_review(conn, "owner", "repo", 123, "abc1234567")
review_id2 = store_reviews.insert_review(conn, "owner", "repo", 123, "def7890123")
assert review_id1 != review_id2
This means the database preserves history across multiple review passes on the same PR.
Schema migration: the type column
Older databases may not have comments.type. The code upgrades them automatically on startup.
From create_tables():
cursor = conn.execute("PRAGMA table_info(comments)")
columns = {row[1] for row in cursor.fetchall()}
if "type" not in columns:
conn.execute("ALTER TABLE comments ADD COLUMN type TEXT DEFAULT NULL")
From ReviewDB._migrate_schema():
cursor = conn.execute("PRAGMA table_info(comments)")
columns = {row[1] for row in cursor.fetchall()}
if "type" not in columns:
conn.execute("ALTER TABLE comments ADD COLUMN type TEXT DEFAULT NULL")
conn.commit()
Note: There is no separate migration framework in this repo for the review database. The migration is code-driven and safe to run repeatedly.
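The "safe to run repeatedly" property is easy to demonstrate against an in-memory database. The wrapper ensure_type_column() and the two-column starting table are made up for this sketch; the PRAGMA and ALTER TABLE statements are the ones quoted above.

```python
import sqlite3


def ensure_type_column(conn: sqlite3.Connection) -> bool:
    """Hypothetical wrapper: add comments.type if missing; return True when added."""
    # Column names are in the second field of each table_info row.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(comments)")}
    if "type" in columns:
        return False
    conn.execute("ALTER TABLE comments ADD COLUMN type TEXT DEFAULT NULL")
    conn.commit()
    return True


# Simulate an older database whose comments table predates the type column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO comments (body) VALUES ('old row')")

first_run = ensure_type_column(conn)
second_run = ensure_type_column(conn)
old_row_type = conn.execute("SELECT type FROM comments").fetchone()[0]
```

Rows written before the migration read back with type set to NULL, which matches how normal (non-synthesized) threads are stored.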
Read-only query rules
The analytics/query layer is intentionally read-only. ReviewDB.query() only accepts SELECT and WITH statements.
The key safety check is:
if not sql_upper.startswith(("SELECT", "WITH")):
raise ValueError("Only SELECT/CTE queries are allowed for safety")
It also rejects multiple statements and blocks modifying keywords such as:
- INSERT
- UPDATE
- DELETE
- DROP
- ALTER
- CREATE
- ATTACH
- DETACH
- PRAGMA
This is why myk-claude-tools db query is safe for analytics but not for schema changes.
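A guard with the same three layers (prefix check, single statement, keyword blocklist) might be sketched as follows. Only the startswith check is taken from the source; the semicolon test, the word-boundary matching, and the function names are assumptions, and the real ReviewDB.query() may differ in detail.

```python
import re

BLOCKED_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "CREATE", "ATTACH", "DETACH", "PRAGMA",
)


def check_read_only(sql: str) -> None:
    """Hypothetical guard: raise ValueError unless sql looks like one read-only query."""
    # Allow a single trailing semicolon, then normalize case for matching.
    stripped = sql.strip().rstrip(";").strip()
    sql_upper = stripped.upper()
    if not sql_upper.startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT/CTE queries are allowed for safety")
    if ";" in stripped:
        raise ValueError("Multiple statements are not allowed")
    for keyword in BLOCKED_KEYWORDS:
        # Whole-word match, so a column such as created_at does not trip CREATE.
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Keyword not allowed: {keyword}")


def is_read_only(sql: str) -> bool:
    """Convenience wrapper that reports the result instead of raising."""
    try:
        check_read_only(sql)
    except ValueError:
        return False
    return True
```

This sketch is deliberately conservative: a blocked keyword or a semicolon inside a string literal is rejected too, which trades a few false positives for a simpler check.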
Dismissed-comment lookups and auto-skip semantics
The database is not only for reporting. It also powers auto-skip behavior during reviews fetch.
get_dismissed_comments() deliberately includes:
- all not_addressed comments
- all skipped comments
- only some addressed comments, when type is a special synthesized type
The SQL condition is:
AND (
c.status IN ('not_addressed', 'skipped')
OR (c.status = 'addressed'
AND c.type IN ('outside_diff_comment', 'nitpick_comment', 'duplicate_comment'))
)
That rule exists because those special comment types do not map cleanly to resolvable GitHub review threads. The database becomes the only reliable place to remember that they were already handled.
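To see which rows the condition selects, the sketch below runs it against a tiny in-memory table. The surrounding query is not shown in this document, so the SELECT, the reduced table, and the sample rows are assumptions; only the parenthesized condition is taken from the source.

```python
import sqlite3

# Only the parenthesized condition comes from the source; the rest is scaffolding.
DISMISSED_SQL = """
SELECT c.body
FROM comments c
WHERE (
    c.status IN ('not_addressed', 'skipped')
    OR (c.status = 'addressed'
        AND c.type IN ('outside_diff_comment', 'nitpick_comment', 'duplicate_comment'))
)
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT, status TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO comments (body, status, type) VALUES (?, ?, ?)",
    [
        ("normal thread, fixed", "addressed", None),
        ("nitpick, fixed", "addressed", "nitpick_comment"),
        ("declined", "not_addressed", None),
        ("skipped", "skipped", None),
        ("still open", "pending", None),
    ],
)
dismissed = [row[0] for row in conn.execute(DISMISSED_SQL)]
```

The addressed row with a NULL type is excluded because a normal thread can be resolved on GitHub itself, while the addressed nitpick is remembered only through the database.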
db find-similar stdin format
myk-claude-tools db find-similar reads JSON from stdin and expects a single object with path and body.
The CLI implementation does this:
input_data = json.load(sys.stdin)
path = input_data.get("path", "")
body = input_data.get("body", "")
The test suite uses this exact input:
input_json = json.dumps({"path": "path/to/file.py", "body": "Add skip option"})
Tip: Pass a single JSON object to db find-similar, not an array.
Practical Rules Of Thumb
- Use reviews fetch when you need a full review snapshot grouped into human, qodo, and coderabbit.
- Use reviews pending-fetch when you already have a pending GitHub review and want to refine its draft comments.
- Use pr post-comment when you only need to post a simple batch of inline comments.
- Treat reviews.db as append-only history, not as a scratch database.
- Expect JSON for PreToolUse, UserPromptSubmit, and Notification, but plain text for SessionStart.
- If a comment has type: outside_diff_comment, nitpick_comment, or duplicate_comment, expect different posting and storage behavior than a normal inline thread.