Logging and Data Files
github-webhook-server keeps its persistent configuration, key material, and logs under a single data directory. By default that directory is /home/podman/data; set the WEBHOOK_SERVER_DATA_DIR environment variable to use a different location.
self.data_dir: str = os.environ.get("WEBHOOK_SERVER_DATA_DIR", "/home/podman/data")
self.config_path: str = os.path.join(self.data_dir, "config.yaml")
The example container setup mounts a host directory directly to that path and calls out the files that should already exist there:
volumes:
- "./webhook_server_data_dir:/home/podman/data:Z" # Should include config.yaml and webhook-server.private-key.pem
# Mount temporary directories to prevent boot ID mismatch issues
- "/tmp/podman-storage-${USER:-1000}:/tmp/storage-run-1000"
environment:
- PUID=1000
- PGID=1000
- TZ=Asia/Jerusalem
- MAX_WORKERS=50 # Defaults to 10 if not set
- WEBHOOK_SERVER_IP_BIND=0.0.0.0 # IP to listen
- WEBHOOK_SERVER_PORT=5000 # Port to listen
- WEBHOOK_SECRET=<secret> # If set verify hook is a valid hook from Github
- VERIFY_GITHUB_IPS=1 # Verify hook request is from GitHub IPs
- VERIFY_CLOUDFLARE_IPS=1 # Verify hook request is from Cloudflare IPs
- ENABLE_LOG_SERVER=true # Enable log viewer endpoints (default: false)
- ENABLE_MCP_SERVER=false # Enable MCP server for AI agent integration (default: false)
Data Directory Layout
Common files and directories you will see:
- config.yaml: the main server configuration file.
- webhook-server.private-key.pem: the GitHub App private key.
- logs/: the main logging directory.
- logs/<log-file>: the main human-readable application log.
- logs/<log-file>.1, logs/<log-file>.2, and so on: rotated text logs.
- logs/webhooks_YYYY-MM-DD.json: structured webhook log files, one file per UTC day.
- logs/logs_server.log: dedicated log viewer log when the log viewer is enabled.
- logs/mcp_server.log: dedicated MCP server log when MCP support is enabled.
The private key is read from the data directory root, not from logs/:
def get_repository_github_app_api(config_: Config, repository_name: str) -> Github | None:
    LOGGER.debug("Getting repositories GitHub app API")
    with open(os.path.join(config_.data_dir, "webhook-server.private-key.pem")) as fd:
        private_key = fd.read()
    github_app_id: int = config_.root_data["github-app-id"]
Note: The logs/ directory is created automatically when the server needs it.
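Before starting the server, it can help to confirm that the data directory holds the two files the layout above expects. The following is a minimal sketch, not part of the project: the helper name missing_data_files is hypothetical, and it only mirrors the environment-variable lookup and file names shown earlier.

```python
from __future__ import annotations

import os
from pathlib import Path

# File names taken from the data directory layout described above.
REQUIRED_FILES = ("config.yaml", "webhook-server.private-key.pem")


def missing_data_files(data_dir: str | None = None) -> list[str]:
    """Return the required files that are absent from the data directory.

    Hypothetical helper: it is not part of github-webhook-server.
    """
    # Same lookup order as the server: environment variable, then the default path.
    root = Path(data_dir or os.environ.get("WEBHOOK_SERVER_DATA_DIR", "/home/podman/data"))
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means both files are present. The logs/ directory is deliberately not checked, because the server creates it on demand.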
Configure Log Files
The example config shows the main logging settings:
log-level: INFO # Set global log level, change take effect immediately without server restart
log-file: webhook-server.log # Set global log file, change take effect immediately without server restart
mcp-log-file: mcp_server.log # Set global MCP log file, change take effect immediately without server restart
logs-server-log-file: logs_server.log # Set global Logs Server log file, change take effect immediately without server restart
mask-sensitive-data: true # Mask sensitive data in logs (default: true). Set to false for debugging (NOT recommended in production)
You can also override the log level, text log file, and masking behavior for a specific repository:
repositories:
  my-repository:
    name: my-org/my-repository
    log-level: DEBUG # Override global log-level for repository
    log-file: my-repository.log # Override global log-file for repository
    mask-sensitive-data: false # Override global setting - disable masking for debugging this specific repo (NOT recommended in production)
Relative filenames are resolved inside <data-dir>/logs/. If you give an absolute path, the server uses it as-is:
def get_log_file_path(config: Config, log_file_name: str | None) -> str | None:
    """
    Resolve the full path for a log file using the configuration data directory.

    Args:
        config: Config object containing data_dir
        log_file_name: Name of the log file (e.g., "server.log")

    Returns:
        Full path to the log file, or None if log_file_name is None
    """
    if log_file_name and not log_file_name.startswith("/"):
        log_file_path = os.path.join(config.data_dir, "logs")
        if not os.path.isdir(log_file_path):
            os.makedirs(log_file_path, exist_ok=True)
        return os.path.join(log_file_path, log_file_name)
    return log_file_name
Tip: Use repository-level overrides when you need extra visibility for one repository without changing logging for every repository.
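To see how relative and absolute log-file values resolve, here is a small side-effect-free sketch of the same rule. The function name resolve_log_path is hypothetical, it skips directory creation, and it uses posixpath so the result is the same on any platform; the project's own function uses os.path.

```python
from __future__ import annotations

import posixpath


def resolve_log_path(data_dir: str, log_file_name: str | None) -> str | None:
    """Illustrative copy of the resolution rule, without creating directories."""
    # Relative names are placed under <data-dir>/logs/.
    if log_file_name and not log_file_name.startswith("/"):
        return posixpath.join(data_dir, "logs", log_file_name)
    # Absolute paths (and None) are returned unchanged.
    return log_file_name


print(resolve_log_path("/home/podman/data", "my-repository.log"))
# /home/podman/data/logs/my-repository.log
print(resolve_log_path("/home/podman/data", "/var/log/custom.log"))
# /var/log/custom.log
```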
Rotating Text Logs
Text logs are the easiest place to read day-to-day activity. They use size-based rotation and are designed not to crash the server if rotated files have already been removed.
The project swaps in a safe rotating handler and sets a 10 MiB max file size for logger-managed text files:
# Patch simple_logger to use SafeRotatingFileHandler to prevent crashes
# when backup log files are missing during rollover
simple_logger.logger.RotatingFileHandler = SafeRotatingFileHandler
logger = get_logger(
    name=logger_cache_key,
    filename=log_file_path_resolved,
    level=log_level,
    file_max_bytes=1024 * 1024 * 10,
    mask_sensitive=mask_sensitive,
    mask_sensitive_patterns=mask_sensitive_patterns,
    console=True,  # Enable console output for docker logs with FORCE_COLOR support
)
During rollover, the handler works with the standard rotated filenames such as .1, .2, and .3, but suppresses file-operation errors so logging can continue:
if self.backupCount > 0:
    # Remove backup files that exceed backupCount, handle missing files
    for i in range(self.backupCount - 1, 0, -1):
        sfn = self.rotation_filename(f"{self.baseFilename}.{i}")
        dfn = self.rotation_filename(f"{self.baseFilename}.{i + 1}")
        if os.path.exists(sfn):
            try:
                if os.path.exists(dfn):
                    os.remove(dfn)
                os.rename(sfn, dfn)
            except FileNotFoundError:
                # File was deleted between exists check and operation - ignore
                pass
            except OSError:
                # Broad suppression intentional: logging must never crash.
                # See module docstring for full rationale.
                pass
    dfn = self.rotation_filename(f"{self.baseFilename}.1")
    try:
        if os.path.exists(dfn):
            os.remove(dfn)
    except FileNotFoundError:
        # File was deleted between exists check and remove - ignore
        pass
    except OSError:
        # Broad suppression intentional: logging must never crash.
        # See module docstring for full rationale.
        pass
    try:
        self.rotate(self.baseFilename, dfn)
    except FileNotFoundError:
        # Base file was deleted - just create a new one
        pass
    except OSError:
        # Broad suppression intentional: logging must never crash.
        # See module docstring for full rationale.
        pass
if not self.delay:
    try:
        self.stream = self._open()
    except OSError:
        # Cannot open new log file - leave stream as None.
        # FileHandler.emit() will attempt to open on next log entry.
        pass
In practice, if log-file is webhook-server.log, expect a current file like logs/webhook-server.log plus rotated siblings such as logs/webhook-server.log.1.
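When reading rotated logs by hand, it is useful to list the current file followed by its numbered siblings in rotation order. This is a rough sketch under the naming convention described above; the helper rotated_log_files is hypothetical and not part of the project.

```python
from __future__ import annotations

from pathlib import Path


def rotated_log_files(log_dir: str, base_name: str) -> list[Path]:
    """Return the current log first, then rotated siblings ordered .1, .2, ..."""
    directory = Path(log_dir)
    current = directory / base_name
    rotated: list[tuple[int, Path]] = []
    for path in directory.glob(f"{base_name}.*"):
        # Keep only purely numeric suffixes such as ".1" or ".10".
        suffix = path.name[len(base_name) + 1:]
        if suffix.isdigit():
            rotated.append((int(suffix), path))
    # Sort numerically so ".10" comes after ".2".
    rotated.sort(key=lambda item: item[0])
    files = [current] if current.exists() else []
    files.extend(path for _, path in rotated)
    return files
```

Lower numbers hold more recent entries, so the returned order runs from newest content to oldest.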
Structured JSONL Webhook Logs
The server also writes structured webhook data into daily files named webhooks_YYYY-MM-DD.json under logs/.
Despite the .json extension, these files use JSON Lines: one compact JSON object per line.
def _get_log_file_path(self, date: datetime | None = None) -> Path:
    """Get log file path for the specified date.

    Args:
        date: Date for the log file (defaults to current UTC date)

    Returns:
        Path to the log file (e.g., {log_dir}/webhooks_2026-01-05.json)
    """
    if date is None:
        date = datetime.now(UTC)
    date_str = date.strftime("%Y-%m-%d")
    return self.log_dir / f"webhooks_{date_str}.json"

def write_log(self, context: WebhookContext) -> None:
    """Write webhook context as JSONL entry to date-based log file.

    Writes a compact JSON entry (single line, no indentation) containing complete webhook execution context.
    Each entry is terminated by a newline character.
    Uses atomic write pattern (temp file + rename) with file locking for safety.

    Args:
        context: WebhookContext to serialize and write

    Note:
        Uses context.completed_at as source of truth, falls back to datetime.now(UTC)
    """
    # Prefer context.completed_at as source of truth, fall back to current time
    completed_at = context.completed_at if context.completed_at else datetime.now(UTC)
    # Get context dict and update timing locally (without mutating context)
    context_dict = context.to_dict()
    context_dict["type"] = "webhook_summary"
    if "timing" in context_dict:
        context_dict["timing"]["completed_at"] = completed_at.isoformat()
        if context.started_at:
            duration_ms = int((completed_at - context.started_at).total_seconds() * 1000)
            context_dict["timing"]["duration_ms"] = duration_ms
    # Get log file path
    log_file = self._get_log_file_path(completed_at)
    # Serialize context to JSON (compact JSONL format - single line, no indentation)
    log_entry = json.dumps(context_dict, ensure_ascii=False)
These files contain two important entry types:
- webhook_summary: one end-of-webhook summary with timing, workflow steps, success state, and errors.
- log_entry: individual log records enriched with webhook context when that context exists.
A log_entry record is built like this:
message = record.getMessage()
message = _ANSI_ESCAPE_RE.sub("", message)
exc_text: str | None = None
if record.exc_info and record.exc_info[0] is not None:
    exc_text = "".join(traceback.format_exception(*record.exc_info))
entry: dict[str, object] = {
    "type": "log_entry",
    "timestamp": datetime.fromtimestamp(record.created, tz=UTC).isoformat(),
    "level": record.levelname,
    "logger_name": record.name,
    "message": message,
}
if exc_text:
    entry["exc_info"] = exc_text
# Enrich with webhook context when available
ctx = get_context()
if ctx is not None:
    entry["hook_id"] = ctx.hook_id
    entry["event_type"] = ctx.event_type
    entry["repository"] = ctx.repository
    entry["pr_number"] = ctx.pr_number
    entry["api_user"] = ctx.api_user
A webhook_summary carries the higher-level fields you usually want when debugging a delivery:
return {
    "hook_id": self.hook_id,
    "level": self._derive_level(),
    "status": self._derive_status(),
    "event_type": self.event_type,
    "action": self.action,
    "sender": self.sender,
    "repository": self.repository,
    "repository_full_name": self.repository_full_name,
    "pr": {
        "number": self.pr_number,
        "title": self.pr_title,
        "author": self.pr_author,
    }
    if self.pr_number
    else None,
    "api_user": self.api_user,
    "timing": {
        "started_at": self.started_at.isoformat(),
        "completed_at": (self.completed_at.isoformat() if self.completed_at else None),
        "duration_ms": int((self.completed_at - self.started_at).total_seconds() * 1000)
        if self.completed_at
        else None,
    },
    "workflow_steps": self.workflow_steps,
    "token_spend": self.token_spend,
    "initial_rate_limit": self.initial_rate_limit,
    "final_rate_limit": self.final_rate_limit,
    "success": self.success,
    "error": self.error,
    "summary": self._build_summary(),
}
Note: webhooks_*.json is date-split, not size-rotated. The server creates a new file each UTC day, but it does not roll these files over by size. If you keep logs for a long time, plan your own retention or archival policy.
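A retention job can use the date embedded in each filename to pick files for archival. The sketch below only selects candidates and leaves deletion or archiving to you; the helper name expired_webhook_logs and the keep_days parameter are assumptions made for illustration, not project settings.

```python
from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path


def expired_webhook_logs(log_dir: str, keep_days: int, today: date) -> list[Path]:
    """Return webhooks_YYYY-MM-DD.json files dated before the retention cutoff."""
    cutoff = today - timedelta(days=keep_days)
    expired: list[Path] = []
    for path in sorted(Path(log_dir).glob("webhooks_*.json")):
        stamp = path.stem.removeprefix("webhooks_")
        try:
            file_date = date.fromisoformat(stamp)
        except ValueError:
            # Not a date-named file; leave it alone.
            continue
        if file_date < cutoff:
            expired.append(path)
    return expired
```

Because file names follow UTC days, pass a UTC-based date as today so the cutoff lines up with how the files are split.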
A practical detail: the server always tries to write a structured summary at the end of webhook processing, even after failures. If you need the most reliable delivery-level record, start with webhooks_*.json.
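Since each line is an independent JSON object, these files are easy to scan with a few lines of Python. The following is a sketch, not project code: the function names are hypothetical, and it assumes the success field is a boolean, which the excerpts above suggest but do not state.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator


def iter_webhook_summaries(log_dir: str) -> Iterator[dict]:
    """Yield every webhook_summary entry from the daily JSONL files."""
    for log_file in sorted(Path(log_dir).glob("webhooks_*.json")):
        with log_file.open(encoding="utf-8") as fd:
            for line in fd:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    # Skip malformed or partially written lines.
                    continue
                if isinstance(entry, dict) and entry.get("type") == "webhook_summary":
                    yield entry


def failed_hook_ids(log_dir: str) -> list:
    """Collect hook_id values for summaries whose success field is False."""
    return [e.get("hook_id") for e in iter_webhook_summaries(log_dir) if e.get("success") is False]
```

Filtering on type first keeps log_entry records out of the way when you only need one record per delivery.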
How Masking Works
Masking is enabled by default with mask-sensitive-data: true. The logger treats common secret and credential patterns as sensitive:
mask_sensitive_patterns: list[str] = [
    # Passwords and secrets
    "container_repository_password",
    "password",
    "secret",
    # Tokens and API keys
    "token",
    "apikey",
    "api_key",
    "github_token",
    "GITHUB_TOKEN",
    "pypi",
    # Authentication credentials
    "username",
    "login",
    "-u",
    "-p",
    "--username",
    "--password",
    "--creds",
    # Private keys and sensitive IDs
    "private_key",
    "private-key",
    "webhook_secret",
    "webhook-secret",
    "github-app-id",
    # Slack webhooks (contain sensitive URLs)
    "slack-webhook-url",
    "slack_webhook_url",
    "webhook-url",
    "webhook_url",
]
In practice, this means:
- Tokens, passwords, webhook secrets, and similar values are masked in log output by default.
- Command helpers redact explicitly supplied secrets before writing command lines, stdout, or stderr to the logs.
- Repository-level mask-sensitive-data can override the global setting for one repository.
Warning: Setting mask-sensitive-data: false can expose credentials in your logs. Use it only for short-lived debugging in a controlled environment.
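The idea of redacting explicitly supplied secrets before text reaches a log can be illustrated with a few lines of Python. This is only a sketch of the general technique: the function redact and its placeholder text are hypothetical, and the project's own command helpers and logger may work differently.

```python
from __future__ import annotations


def redact(text: str, secrets: list[str], placeholder: str = "***REDACTED***") -> str:
    """Replace each known secret value in text with a placeholder.

    Hypothetical helper for illustration; not the project's implementation.
    """
    # Replace longer secrets first so a secret that contains a shorter one
    # is not left partially visible. Empty strings are ignored.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, placeholder)
    return text


print(redact("podman login -u bot -p s3cr3t", ["s3cr3t"]))
# podman login -u bot -p ***REDACTED***
```

Value-based redaction like this only covers secrets the caller knows about, which is why it complements pattern-based masking rather than replacing it.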
Log Separation
This project intentionally separates logs by purpose.
- The main text log is for readable application activity.
- webhooks_*.json is for structured webhook diagnostics and analysis.
- logs_server.log is for the log viewer itself.
- mcp_server.log is for the optional MCP server.
That separation is enforced in the logger setup. The structured JSON handler is attached only to the default webhook/application logger, not to infrastructure loggers that are created with an explicit filename:
# Attach JsonLogHandler for writing log records to the webhook JSONL file.
# Only attach when:
# - A log file path is configured (skip console-only loggers)
# - The logger is for the main webhook log (log_file_name not explicitly set)
#   Infrastructure loggers (mcp_server.log, logs_server.log) must NOT write
#   to webhooks_*.json because their entries lack webhook context (hook_id,
#   event_type, etc.) and pollute the webhook log with noise entries.
# - Only once per logger instance to avoid duplicate handlers.
# Uses _config.data_dir/logs (same directory as StructuredLogWriter) instead
# of deriving from the text log file path, which may differ for absolute paths.
if log_file_path_resolved and not log_file_name:
    log_dir = os.path.join(_config.data_dir, "logs")
    with _JSON_HANDLER_LOCK:
        if not any(isinstance(h, JsonLogHandler) and h.log_dir == Path(log_dir) for h in logger.handlers):
            logger.addHandler(
                JsonLogHandler(
                    log_dir=log_dir,
                    level=getattr(logging, log_level.upper(), logging.DEBUG),
                )
            )
That last comment matters: even if you point log-file at an absolute path somewhere else, the structured webhooks_*.json files still stay under <data-dir>/logs/.
The log viewer gets its own dedicated logger:
if _log_viewer_controller_singleton is None:
    # Use global LOGGER for config operations
    config = Config(logger=LOGGER)
    logs_server_log_file = config.get_value("logs-server-log-file", return_on_none="logs_server.log")
    # Create dedicated logger for log viewer
    log_viewer_logger = get_logger_with_params(log_file_name=logs_server_log_file)
    _log_viewer_controller_singleton = LogViewerController(logger=log_viewer_logger)
The same pattern is used for MCP logging during startup:
# Configure MCP logging separation
if MCP_SERVER_ENABLED:
    mcp_log_file = root_config.get("mcp-log-file", "mcp_server.log")
    # Use get_logger_with_params to reuse existing logging configuration logic
    # (rotation, sensitive data masking, formatting)
    # This returns a logger configured for the specific file
    mcp_file_logger = get_logger_with_params(log_file_name=mcp_log_file)
    # Add the configured handler to the actual MCP logger and stop propagation
    # This ensures MCP logs go ONLY to mcp_server.log and not webhook_server.log
    if mcp_file_logger.handlers:
        for handler in mcp_file_logger.handlers:
            mcp_logger.addHandler(handler)
        mcp_logger.propagate = False
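The effect of attaching a dedicated handler and turning off propagation can be reproduced with the standard library alone. The sketch below uses only the logging module; the function name make_isolated_logger and the format string are illustrative choices, not taken from the project.

```python
import logging


def make_isolated_logger(name: str, log_path: str) -> logging.Logger:
    """Create a logger that writes only to its own file."""
    logger = logging.getLogger(name)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # Without this, records would also travel up to the root logger's handlers
    # and end up in whatever file the root logger writes to.
    logger.propagate = False
    return logger
```

With propagate left at its default of True, the same record would be written twice: once by the dedicated handler and once by any handler on an ancestor logger.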
Log Viewer Files
When enabled, the log viewer reads the same files from <data-dir>/logs/; it does not build a separate database.
It scans current text logs, rotated text logs, and structured webhook files:
# Find all log files including rotated ones and JSON files
log_files: list[Path] = []
log_files.extend(log_dir.glob("*.log"))
log_files.extend(log_dir.glob("*.log.*"))
log_files.extend(log_dir.glob("webhooks_*.json"))

# Sort log files to prioritize JSON webhook files first (primary data source),
# then other files by modification time (newest first)
# This ensures webhook data is displayed before internal log files
def sort_key(f: Path) -> tuple[int, float]:
    is_json_webhook = f.suffix == ".json" and f.name.startswith("webhooks_")
    # JSON webhook files: (0, -mtime) - highest priority, newest first
    # Other files: (1, -mtime) - lower priority, newest first
    return (0 if is_json_webhook else 1, -f.stat().st_mtime)
...
async with aiofiles.open(log_file, encoding="utf-8") as f:
    # Use appropriate parser based on file type
    if log_file.suffix == ".json":
        # JSONL files: one compact JSON object per line
        # Process both "log_entry" and "webhook_summary" entries
        # Skip infrastructure logger entries that lack webhook context
        async for line in f:
            entry = self.log_parser.parse_json_log_entry(line)
            if entry and not LogViewerController._is_infrastructure_noise(entry):
                buffer.append(entry)
    else:
        # Text log files: parse line by line
        # Skip infrastructure logger entries that lack webhook context
        async for line in f:
            entry = self.log_parser.parse_log_entry(line)
            if entry and not LogViewerController._is_infrastructure_noise(entry):
                buffer.append(entry)
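The ordering rule in sort_key can be tried without touching the filesystem by passing modification times in directly. This is a small sketch with made-up file names and timestamps; viewer_sort_key is a hypothetical stand-in that takes the mtime as an argument instead of calling stat().

```python
from pathlib import Path


def viewer_sort_key(name: str, mtime: float) -> tuple[int, float]:
    """Same ordering idea as sort_key above, with mtime supplied by the caller."""
    path = Path(name)
    is_json_webhook = path.suffix == ".json" and path.name.startswith("webhooks_")
    # Group 0 (webhook JSON files) sorts before group 1; within a group,
    # the negated mtime puts the newest file first.
    return (0 if is_json_webhook else 1, -mtime)


# Illustrative (name, mtime) pairs; the values are invented for the example.
files = [
    ("webhook-server.log", 300.0),
    ("webhooks_2026-01-04.json", 100.0),
    ("webhook-server.log.1", 200.0),
    ("webhooks_2026-01-05.json", 250.0),
]
ordered = [name for name, _ in sorted(files, key=lambda item: viewer_sort_key(*item))]
print(ordered)
# ['webhooks_2026-01-05.json', 'webhooks_2026-01-04.json', 'webhook-server.log', 'webhook-server.log.1']
```

Note that the text log is the most recently modified file here, yet both webhook files still come first because the group number is compared before the timestamp.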
It also filters out known infrastructure noise when those entries have no webhook context:
@staticmethod
def _is_infrastructure_noise(entry: LogEntry) -> bool:
    """Check if a log entry is infrastructure noise that should be excluded.

    Infrastructure loggers (MCP server, log viewer) produce high-frequency
    entries without webhook context. These are filtered out to prevent them
    from drowning actual webhook processing entries in unfiltered queries.
    Only excludes entries that have NO webhook context (hook_id is None),
    preserving any infrastructure log that happens to correlate with a webhook.

    Args:
        entry: LogEntry to check

    Returns:
        True if the entry is infrastructure noise and should be excluded
    """
    return entry.logger_name in LogViewerController._INFRASTRUCTURE_LOGGERS and entry.hook_id is None
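The filter's two conditions are easiest to see on a handful of sample entries. In this sketch the Entry dataclass stands in for the project's LogEntry, and the logger names in INFRASTRUCTURE_LOGGERS are placeholders: the real contents of _INFRASTRUCTURE_LOGGERS are not shown in this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    """Minimal stand-in for LogEntry with only the fields the filter reads."""

    logger_name: str
    hook_id: str | None = None


# Placeholder names for illustration; the project's actual set may differ.
INFRASTRUCTURE_LOGGERS = frozenset({"mcp_server", "logs_server"})


def is_infrastructure_noise(entry: Entry) -> bool:
    # Both conditions must hold: an infrastructure logger AND no webhook context.
    return entry.logger_name in INFRASTRUCTURE_LOGGERS and entry.hook_id is None


entries = [
    Entry("mcp_server"),                   # dropped: infrastructure, no hook_id
    Entry("mcp_server", hook_id="abc"),    # kept: tied to a webhook
    Entry("main"),                         # kept: not an infrastructure logger
]
kept = [e for e in entries if not is_infrastructure_noise(e)]
```

The second entry shows why the hook_id check matters: an infrastructure record that correlates with a delivery stays visible.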
What this means in practice:
- The log viewer can show current and rotated text logs together with structured webhook logs.
- Structured JSON files are the primary source for webhook summaries, workflow steps, and export data.
- Text logs provide the detailed line-by-line context that summary records intentionally do not include.
- Exports are streamed to the client on demand; they are not written back into the data directory as extra files.
Warning: The project does not add application-level authentication to the /logs endpoints. Treat the log viewer as an internal tool and protect it with trusted network placement or a reverse proxy that adds authentication.