Project Overview
docsfy is a self-hosted, AI-powered documentation generation service. It takes a Git repository, uses an AI provider to plan and write documentation pages, and publishes a fully static docs site that can be viewed in-browser or downloaded as an archive.
At runtime, it is a FastAPI web application with a built-in dashboard, status pages, authentication, role-based access, and per-project ownership/access control.
```toml
[project]
name = "docsfy"
description = "AI-powered documentation generator - generates polished static HTML docs from GitHub repos"

[project.scripts]
docsfy = "docsfy.main:run"
```
What Problem It Solves
Keeping documentation current is expensive and usually manual. docsfy addresses that by:
- Generating docs from code, config, and tests (not just top-level project docs)
- Tracking generated variants by AI provider/model
- Supporting incremental regeneration when repositories change
- Rendering polished static output ready for hosting or download
- Adding team-grade controls (auth, roles, ownership, access grants)
The prompt layer explicitly enforces source-first documentation generation:
```python
def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str:
    return f"""You are a technical documentation writer. Explore this repository to write
the "{page_title}" page for the {project_name} documentation.

Page description: {page_description}

Explore the codebase as needed. Read source files, configs, tests, and CI/CD pipelines
to write comprehensive, accurate documentation. Do NOT rely on the README.
...
"""
```
Who It Is For
docsfy is best suited for:
- Platform/DevEx teams maintaining internal docs for many repositories
- Engineering teams that want docs regenerated as code changes
- Teams comparing documentation quality across AI providers/models
- Organizations needing controlled docs access (admin/user/viewer + grants)
How docsfy Works (High-Level)
1) Intake and validation
A generation request accepts either a remote repo URL or a local repo path (admin-only), plus provider/model options:
```python
from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, Field, model_validator


class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )

    @model_validator(mode="after")
    def validate_source(self) -> GenerateRequest:
        if not self.repo_url and not self.repo_path:
            msg = "Either 'repo_url' or 'repo_path' must be provided"
            raise ValueError(msg)
        if self.repo_url and self.repo_path:
            msg = "Provide either 'repo_url' or 'repo_path', not both"
            raise ValueError(msg)
        return self
```
The route handler adds two further gates, restricting local paths to admins and rejecting unknown providers:

```python
if gen_request.repo_path and not request.state.is_admin:
    raise HTTPException(
        status_code=403,
        detail="Local repo path access requires admin privileges",
    )

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
```
2) Planning, incremental updates, and page generation
The generation pipeline:
- checks AI CLI availability
- plans doc structure
- optionally computes changed files between commits
- regenerates pages (parallelized)
- renders the final static site
```python
plan = await run_planner(
    repo_path=repo_dir,
    project_name=project_name,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
)
plan["repo_url"] = source_url

pages = await generate_all_pages(
    repo_path=repo_dir,
    plan=plan,
    cache_dir=cache_dir,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    use_cache=use_cache if use_cache else not force,
    project_name=project_name,
    owner=owner,
)

site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner)
render_site(plan=plan, pages=pages, output_dir=site_dir)
```

Incremental runs compute the set of changed files between two commits with a plain Git diff:

```python
result = subprocess.run(
    ["git", "diff", "--name-only", old_sha, new_sha],
    cwd=repo_path,
    capture_output=True,
    text=True,
    timeout=30,
)
```
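The diff output then drives regeneration: only pages whose source files changed need rewriting. A minimal sketch of that selection step, where the page-to-source mapping (`page_sources`) is an assumption for illustration, not docsfy's actual data structure:

```python
def pages_to_regenerate(diff_output: str, page_sources: dict[str, set[str]]) -> set[str]:
    """Return page slugs whose source files appear in `git diff --name-only` output."""
    changed = {line.strip() for line in diff_output.splitlines() if line.strip()}
    return {page for page, sources in page_sources.items() if sources & changed}
```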
Tip: Keep `force` disabled for normal runs. docsfy can reuse cached pages and use Git diffs to regenerate only what changed.
3) Static docs output + AI-friendly artifacts
The renderer creates both human-facing and model-friendly assets:
```python
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")

search_index = _build_search_index(valid_pages, plan)
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)

llms_txt = _build_llms_txt(plan)
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")

llms_full_txt = _build_llms_full_txt(plan, valid_pages)
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```
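The body of `_build_llms_txt` is not shown here. For illustration, a minimal builder in the spirit of the llms.txt convention (a plain-text index of pages), assuming a plan shaped like `{"project_name": ..., "pages": [{"title", "slug", "description"}, ...]}`:

```python
def build_llms_txt(plan: dict) -> str:
    """Render a plain-text index: project heading plus one line per planned page."""
    lines = [f"# {plan['project_name']}", ""]
    for page in plan["pages"]:
        lines.append(f"- [{page['title']}]({page['slug']}.md): {page['description']}")
    return "\n".join(lines) + "\n"
```

This sketch links each entry to the per-page `.md` files written alongside the HTML above.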
The generated docs UI also includes search, theme switching, code copy buttons, callout styling, and sidebar navigation.
Security and Access Model
docsfy is multi-user and role-aware, with both Bearer-token API auth and cookie-based browser sessions.
```python
# Paths that do not require authentication
_PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

...

# 1. Check Authorization header (API clients)
...
# 2. Check session cookie (browser) -- opaque session token
...

if request.url.path.startswith("/api/"):
    return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
```

Write access is denied to viewers:

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```
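The role gate is easy to exercise in isolation. Below, FastAPI's `Request` and `HTTPException` are replaced with stdlib stand-ins (hypothetical names chosen for the sketch); the check itself mirrors the snippet above:

```python
from types import SimpleNamespace


def require_write_access(request: SimpleNamespace) -> None:
    """Mirror of docsfy's viewer gate, using PermissionError in place of HTTPException."""
    if request.state.role not in ("admin", "user"):
        raise PermissionError("Write access required.")


def make_request(role: str) -> SimpleNamespace:
    # Minimal stand-in carrying only request.state.role.
    return SimpleNamespace(state=SimpleNamespace(role=role))
```

Admins and users pass through; any other role (e.g. viewer) is rejected.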
Project variants are scoped by name + provider + model + owner:
```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    ...
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```
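The composite primary key means the same project name can exist once per (provider, model, owner) combination. A quick in-memory sqlite3 illustration (the non-key columns elided in the schema above are omitted here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE projects (
           name TEXT NOT NULL,
           ai_provider TEXT NOT NULL DEFAULT '',
           ai_model TEXT NOT NULL DEFAULT '',
           owner TEXT NOT NULL DEFAULT '',
           PRIMARY KEY (name, ai_provider, ai_model, owner)
       )"""
)
# Two variants of the same project coexist under different provider/model keys...
conn.execute("INSERT INTO projects VALUES ('docsfy', 'claude', 'opus', 'alice')")
conn.execute("INSERT INTO projects VALUES ('docsfy', 'gemini', 'pro', 'alice')")
# ...but an exact duplicate of the full key is rejected.
try:
    conn.execute("INSERT INTO projects VALUES ('docsfy', 'claude', 'opus', 'alice')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```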
Access can be delegated by admins on a per-project-owner basis:
```python
@app.post("/api/admin/projects/{name}/access")
async def grant_access(request: Request, name: str) -> dict[str, str]:
    ...
    await grant_project_access(name, username, project_owner=project_owner)
```
Warning: `ADMIN_KEY` is required at startup and must be at least 16 characters; otherwise the app exits.
```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)
if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```
Configuration and Deployment
Core environment configuration comes from .env:
```bash
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# AI Configuration
AI_PROVIDER=claude
AI_MODEL=claude-opus-4-6
AI_CLI_TIMEOUT=60
```
Containerized local deployment uses /data for persistent state:
```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
```
Runtime entrypoint:
```dockerfile
ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Quality and CI/CD Posture
Quality checks are configured via pre-commit and tox:
```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
```

Tests run under tox:

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```
Note: No repository-hosted workflow files were found under `.github/workflows`; current automation is defined through local tooling and container health checks.