Project Overview
docsfy is a self-hosted, AI-powered documentation generation service. It takes a Git repository, uses an AI provider to plan and write documentation pages, and publishes a fully static docs site that can be viewed in-browser or downloaded as an archive.
At runtime, it is a FastAPI web application with a built-in dashboard, status pages, authentication, role-based access, and per-project ownership/access control.
```toml
[project]
name = "docsfy"
description = "AI-powered documentation generator - generates polished static HTML docs from GitHub repos"

[project.scripts]
docsfy = "docsfy.main:run"
```
What Problem It Solves
Keeping documentation current is expensive and usually manual. docsfy addresses that by:
- Generating docs from code, config, and tests (not just top-level project docs)
- Tracking generated variants by AI provider/model
- Supporting incremental regeneration when repositories change
- Rendering polished static output ready for hosting or download
- Adding team-grade controls (auth, roles, ownership, access grants)
The prompt layer explicitly enforces source-first documentation generation:
```python
def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str:
    return f"""You are a technical documentation writer. Explore this repository to write
the "{page_title}" page for the {project_name} documentation.

Page description: {page_description}

Explore the codebase as needed. Read source files, configs, tests, and CI/CD pipelines
to write comprehensive, accurate documentation. Do NOT rely on the README.
...
"""
```
Who It Is For
docsfy is best suited for:
- Platform/DevEx teams maintaining internal docs for many repositories
- Engineering teams that want docs regenerated as code changes
- Teams comparing documentation quality across AI providers/models
- Organizations needing controlled docs access (admin/user/viewer + grants)
How docsfy Works (High-Level)
1) Intake and validation
A generation request accepts either a remote repo URL or a local repo path (admin-only), plus provider/model options:
```python
from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, Field, model_validator


class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )

    @model_validator(mode="after")
    def validate_source(self) -> GenerateRequest:
        if not self.repo_url and not self.repo_path:
            msg = "Either 'repo_url' or 'repo_path' must be provided"
            raise ValueError(msg)
        if self.repo_url and self.repo_path:
            msg = "Provide either 'repo_url' or 'repo_path', not both"
            raise ValueError(msg)
        return self
```
The route handler adds two further gates, restricting local paths to admins and rejecting unknown providers:

```python
if gen_request.repo_path and not request.state.is_admin:
    raise HTTPException(
        status_code=403,
        detail="Local repo path access requires admin privileges",
    )

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
```
2) Planning, incremental updates, and page generation
The generation pipeline:
- checks AI CLI availability
- plans doc structure
- optionally computes changed files between commits
- regenerates pages (parallelized)
- renders the final static site
```python
plan = await run_planner(
    repo_path=repo_dir,
    project_name=project_name,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
)
plan["repo_url"] = source_url

pages = await generate_all_pages(
    repo_path=repo_dir,
    plan=plan,
    cache_dir=cache_dir,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    use_cache=use_cache if use_cache else not force,
    project_name=project_name,
    owner=owner,
)

site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner)
render_site(plan=plan, pages=pages, output_dir=site_dir)
```

Incremental runs compute the set of changed files between two commits with a plain Git diff:

```python
result = subprocess.run(
    ["git", "diff", "--name-only", old_sha, new_sha],
    cwd=repo_path,
    capture_output=True,
    text=True,
    timeout=30,
)
```
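The diff output then drives regeneration: only pages whose source files changed need rewriting. A minimal sketch of that selection step, where the page-to-source mapping (`page_sources`) is an assumption for illustration, not docsfy's actual data structure:

```python
def pages_to_regenerate(diff_output: str, page_sources: dict[str, set[str]]) -> set[str]:
    """Return page slugs whose source files appear in `git diff --name-only` output."""
    changed = {line.strip() for line in diff_output.splitlines() if line.strip()}
    return {page for page, sources in page_sources.items() if sources & changed}
```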
Tip: Keep `force` disabled for normal runs. docsfy can reuse cached pages and use Git diffs to regenerate only what changed.
3) Static docs output + AI-friendly artifacts
The renderer creates both human-facing and model-friendly assets:
```python
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")

search_index = _build_search_index(valid_pages, plan)
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)

llms_txt = _build_llms_txt(plan)
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")

llms_full_txt = _build_llms_full_txt(plan, valid_pages)
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```
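The body of `_build_llms_txt` is not shown here. For illustration, a minimal builder in the spirit of the llms.txt convention (a plain-text index of pages), assuming a plan shaped like `{"project_name": ..., "pages": [{"title", "slug", "description"}, ...]}`:

```python
def build_llms_txt(plan: dict) -> str:
    """Render a plain-text index: project heading plus one line per planned page."""
    lines = [f"# {plan['project_name']}", ""]
    for page in plan["pages"]:
        lines.append(f"- [{page['title']}]({page['slug']}.md): {page['description']}")
    return "\n".join(lines) + "\n"
```

This sketch links each entry to the per-page `.md` files written alongside the HTML above.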
The generated docs UI also includes search, theme switching, code copy buttons, callout styling, and sidebar navigation.
Security and Access Model
docsfy is multi-user and role-aware, with both Bearer-token API auth and cookie-based browser sessions.
```python
# Paths that do not require authentication
_PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

...

# 1. Check Authorization header (API clients)
...
# 2. Check session cookie (browser) -- opaque session token
...

if request.url.path.startswith("/api/"):
    return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
```

Write access is denied to viewers:

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```
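The role gate is easy to exercise in isolation. Below, FastAPI's `Request` and `HTTPException` are replaced with stdlib stand-ins (hypothetical names chosen for the sketch); the check itself mirrors the snippet above:

```python
from types import SimpleNamespace


def require_write_access(request: SimpleNamespace) -> None:
    """Mirror of docsfy's viewer gate, using PermissionError in place of HTTPException."""
    if request.state.role not in ("admin", "user"):
        raise PermissionError("Write access required.")


def make_request(role: str) -> SimpleNamespace:
    # Minimal stand-in carrying only request.state.role.
    return SimpleNamespace(state=SimpleNamespace(role=role))
```

Admins and users pass through; any other role (e.g. viewer) is rejected.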
Project variants are scoped by name + provider + model + owner:
```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    ...
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```
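The composite primary key means the same project name can exist once per (provider, model, owner) combination. A quick in-memory sqlite3 illustration (the non-key columns elided in the schema above are omitted here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE projects (
           name TEXT NOT NULL,
           ai_provider TEXT NOT NULL DEFAULT '',
           ai_model TEXT NOT NULL DEFAULT '',
           owner TEXT NOT NULL DEFAULT '',
           PRIMARY KEY (name, ai_provider, ai_model, owner)
       )"""
)
# Two variants of the same project coexist under different provider/model keys...
conn.execute("INSERT INTO projects VALUES ('docsfy', 'claude', 'opus', 'alice')")
conn.execute("INSERT INTO projects VALUES ('docsfy', 'gemini', 'pro', 'alice')")
# ...but an exact duplicate of the full key is rejected.
try:
    conn.execute("INSERT INTO projects VALUES ('docsfy', 'claude', 'opus', 'alice')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```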
Access can be delegated by admins on a per-project-owner basis:
```python
@app.post("/api/admin/projects/{name}/access")
async def grant_access(request: Request, name: str) -> dict[str, str]:
    ...
    await grant_project_access(name, username, project_owner=project_owner)
```
Warning: `ADMIN_KEY` is required at startup and must be at least 16 characters; otherwise the app exits.
```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)
if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```
Configuration and Deployment
Core environment configuration comes from .env:
```bash
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# AI Configuration
AI_PROVIDER=claude
AI_MODEL=claude-opus-4-6
AI_CLI_TIMEOUT=60
```
Containerized local deployment uses /data for persistent state:
```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
```
Runtime entrypoint:
```dockerfile
ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Quality and CI/CD Posture
Quality checks are configured via pre-commit and tox:
```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
```

Tests run under tox:

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```
Note: No repository-hosted workflow files were found under `.github/workflows`; current automation is defined through local tooling and container health checks.