# docsfy

> Self-hosted AI documentation generator that turns Git repositories into searchable static docs through a FastAPI web service.

---

Source: project-overview.md

# Project Overview

`docsfy` is a self-hosted, AI-powered documentation generation service. It takes a Git repository, uses an AI provider to plan and write documentation pages, and publishes a fully static docs site that can be viewed in-browser or downloaded as an archive.

At runtime, it is a FastAPI web application with a built-in dashboard, status pages, authentication, role-based access, and per-project ownership/access control.

```toml
[project]
name = "docsfy"
description = "AI-powered documentation generator - generates polished static HTML docs from GitHub repos"

[project.scripts]
docsfy = "docsfy.main:run"
```

## What Problem It Solves

Keeping documentation current is expensive and usually manual. `docsfy` addresses that by:

- Generating docs from code, config, and tests (not just top-level project docs)
- Tracking generated variants by AI provider/model
- Supporting incremental regeneration when repositories change
- Rendering polished static output ready for hosting or download
- Adding team-grade controls (auth, roles, ownership, access grants)

The prompt layer explicitly enforces source-first documentation generation:

```python
def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str:
    return f"""You are a technical documentation writer.

Explore this repository to write the "{page_title}" page for the {project_name} documentation.

Page description: {page_description}

Explore the codebase as needed. Read source files, configs, tests, and CI/CD pipelines to write comprehensive, accurate documentation. Do NOT rely on the README.
...
"""
```

## Who It Is For

`docsfy` is best suited for:

- Platform/DevEx teams maintaining internal docs for many repositories
- Engineering teams that want docs regenerated as code changes
- Teams comparing documentation quality across AI providers/models
- Organizations needing controlled docs access (admin/user/viewer + grants)

## How docsfy Works (High-Level)

### 1) Intake and validation

A generation request accepts either a remote repo URL or a local repo path (admin-only), plus provider/model options:

```python
class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )

    @model_validator(mode="after")
    def validate_source(self) -> GenerateRequest:
        if not self.repo_url and not self.repo_path:
            msg = "Either 'repo_url' or 'repo_path' must be provided"
            raise ValueError(msg)
        if self.repo_url and self.repo_path:
            msg = "Provide either 'repo_url' or 'repo_path', not both"
            raise ValueError(msg)
        return self
```

```python
if gen_request.repo_path and not request.state.is_admin:
    raise HTTPException(
        status_code=403,
        detail="Local repo path access requires admin privileges",
    )

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
```

### 2) Planning, incremental updates, and page generation

The generation pipeline:

- checks AI CLI availability
- plans doc structure
- optionally computes changed files between commits
- regenerates pages (parallelized)
- renders the final static site

```python
plan = await run_planner(
    repo_path=repo_dir,
    project_name=project_name,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
)
plan["repo_url"] = source_url
```

```python
pages = await generate_all_pages(
    repo_path=repo_dir,
    plan=plan,
    cache_dir=cache_dir,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    use_cache=use_cache if use_cache else not force,
    project_name=project_name,
    owner=owner,
)

site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner)
render_site(plan=plan, pages=pages, output_dir=site_dir)
```

```python
result = subprocess.run(
    ["git", "diff", "--name-only", old_sha, new_sha],
    cwd=repo_path,
    capture_output=True,
    text=True,
    timeout=30,
)
```

> **Tip:** Keep `force` disabled for normal runs. `docsfy` can reuse cached pages and use Git diffs to regenerate only what changed.

### 3) Static docs output + AI-friendly artifacts

The renderer creates both human-facing and model-friendly assets:

```python
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")

search_index = _build_search_index(valid_pages, plan)
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)

llms_txt = _build_llms_txt(plan)
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")

llms_full_txt = _build_llms_full_txt(plan, valid_pages)
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```

The generated docs UI also includes search, theme switching, code copy buttons, callout styling, and sidebar navigation.
## Security and Access Model

`docsfy` is multi-user and role-aware, with both Bearer-token API auth and cookie-based browser sessions.

```python
# Paths that do not require authentication
_PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})
...
# 1. Check Authorization header (API clients)
...
# 2. Check session cookie (browser) -- opaque session token
...
if request.url.path.startswith("/api/"):
    return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
```

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

Project variants are scoped by name + provider + model + owner:

```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    ...
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

Access can be delegated by admins on a per-project-owner basis:

```python
@app.post("/api/admin/projects/{name}/access")
async def grant_access(request: Request, name: str) -> dict[str, str]:
    ...
    await grant_project_access(name, username, project_owner=project_owner)
```

> **Warning:** `ADMIN_KEY` is required at startup and must be at least 16 characters; otherwise the app exits.
```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)
if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

## Configuration and Deployment

Core environment configuration comes from `.env`:

```env
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# AI Configuration
AI_PROVIDER=claude
AI_MODEL=claude-opus-4-6[1m]
AI_CLI_TIMEOUT=60
```

Containerized local deployment uses `/data` for persistent state:

```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
```

Runtime entrypoint:

```dockerfile
ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

## Quality and CI/CD Posture

Quality checks are configured via `pre-commit` and `tox`:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
```

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Note:** No repository-hosted workflow files were found under `.github/workflows`; current automation is defined through local tooling and container health checks.

---

Source: architecture.md

# Architecture

`docsfy` is a single FastAPI service that combines four major subsystems:

- an authenticated web/API control plane,
- a SQLite-backed metadata layer,
- an asynchronous AI documentation generation pipeline,
- a static site renderer that emits HTML, Markdown, search data, and LLM index files.
## High-Level Component Model

- **Application layer**: `src/docsfy/main.py`
- **Storage layer**: `src/docsfy/storage.py`
- **Generation pipeline**: `src/docsfy/generator.py`, `src/docsfy/repository.py`, `src/docsfy/prompts.py`, `src/docsfy/ai_client.py`, `src/docsfy/json_parser.py`
- **Static renderer**: `src/docsfy/renderer.py`, `src/docsfy/templates/*`, `src/docsfy/static/*`

End-to-end flow:

1. `POST /api/generate` receives a `GenerateRequest`.
2. Request is authorized (Bearer token or session cookie).
3. Variant metadata is stored in SQLite (`status=generating`).
4. A background `asyncio` task runs cloning/planning/page generation/rendering.
5. Output site is written to filesystem under `/data/projects/.../site`.
6. Variant status flips to `ready`.
7. Docs are served from `/docs/{project}/{provider}/{model}/...`.

## FastAPI App Architecture

The application enforces startup requirements (`ADMIN_KEY`), initializes DB state, and adds auth middleware globally:

```python
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    settings = get_settings()
    if not settings.admin_key:
        logger.error("ADMIN_KEY environment variable is required")
        raise SystemExit(1)
    if len(settings.admin_key) < 16:
        logger.error("ADMIN_KEY must be at least 16 characters long")
        raise SystemExit(1)
    _generating.clear()
    await init_db(data_dir=settings.data_dir)
    await cleanup_expired_sessions()
    yield
```

Authentication is centralized in `AuthMiddleware`:

```python
class AuthMiddleware(BaseHTTPMiddleware):
    """Authenticate every request via Bearer token or session cookie."""

    # Paths that do not require authentication
    _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

    async def dispatch(
        self, request: Request, call_next: RequestResponseEndpoint
    ) -> Response:
        if request.url.path in self._PUBLIC_PATHS:
            return await call_next(request)

        settings = get_settings()
        user = None
        is_admin = False
        username = ""

        # 1. Check Authorization header (API clients)
        auth_header = request.headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            token = auth_header[7:]
            if token == settings.admin_key:
                is_admin = True
                username = "admin"
            else:
                user = await get_user_by_key(token)
```

The generation endpoint uses a lock + in-memory task registry to prevent duplicate variant runs:

```python
gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}"
async with _gen_lock:
    if gen_key in _generating:
        raise HTTPException(
            status_code=409,
            detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated",
        )
    await save_project(
        name=project_name,
        repo_url=gen_request.repo_url or gen_request.repo_path or "",
        status="generating",
        ai_provider=ai_provider,
        ai_model=ai_model,
        owner=owner,
    )
    try:
        task = asyncio.create_task(
            _run_generation(
                repo_url=gen_request.repo_url,
                repo_path=gen_request.repo_path,
                project_name=project_name,
                ai_provider=ai_provider,
                ai_model=ai_model,
                ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout,
                force=gen_request.force,
                owner=owner,
            )
        )
        _generating[gen_key] = task
```

> **Note:** Generated docs under `/docs/...` are still protected by middleware; only `/login` and `/health` are public.
Static file serving is path-safe (prevents traversal beyond the variant site directory):

```python
file_path = site_dir / path
try:
    file_path.resolve().relative_to(site_dir.resolve())
except ValueError as exc:
    raise HTTPException(status_code=403, detail="Access denied") from exc
if not file_path.exists() or not file_path.is_file():
    raise HTTPException(status_code=404, detail="File not found")
return FileResponse(file_path)
```

## SQLite Storage Layer

The `projects` table is variant-scoped by `(name, ai_provider, ai_model, owner)`:

```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    current_stage TEXT,
    last_commit_sha TEXT,
    last_generated TEXT,
    page_count INTEGER DEFAULT 0,
    error_message TEXT,
    plan_json TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

Additional tables:

- `users` (role-based accounts, hashed API keys),
- `project_access` (per-owner access grants),
- `sessions` (hashed session tokens + expiry).

User key hashing uses HMAC with `ADMIN_KEY` as secret:

```python
def hash_api_key(key: str, hmac_secret: str = "") -> str:
    """Hash an API key with HMAC-SHA256 for storage.

    Uses ADMIN_KEY as the HMAC secret so that even if the source is read,
    keys cannot be cracked without the environment secret.
    """
    # NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
    # invalidate all existing api_key_hash values, requiring all users to
    # regenerate their API keys.
    secret = hmac_secret or os.getenv("ADMIN_KEY", "")
    if not secret:
        msg = "ADMIN_KEY environment variable is required for key hashing"
        raise RuntimeError(msg)
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()
```

Project artifact paths are computed and sanitized:

```python
def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    if not ai_provider or not ai_model:
        msg = "ai_provider and ai_model are required for project directory paths"
        raise ValueError(msg)
    # Sanitize path segments to prevent traversal
    for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]:
        if (
            "/" in segment
            or "\\" in segment
            or ".." in segment
            or segment.startswith(".")
        ):
            msg = f"Invalid {segment_name}: '{segment}'"
            raise ValueError(msg)
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

> **Warning:** Rotating `ADMIN_KEY` invalidates existing `api_key_hash` records by design.
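The hashing scheme above implies how verification works: hash the presented key under the same secret and compare digests. A minimal standalone sketch, assuming that design (the `verify_api_key` helper and the example keys are illustrative, not the project's actual code):

```python
import hashlib
import hmac


def hash_api_key(key: str, secret: str) -> str:
    """Same scheme as the source snippet: HMAC-SHA256 of the key under ADMIN_KEY."""
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()


def verify_api_key(presented: str, stored_hash: str, secret: str) -> bool:
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(hash_api_key(presented, secret), stored_hash)


admin_key = "example-admin-key-16chars-long!!"  # placeholder secret
stored = hash_api_key("user-api-key-123", admin_key)

print(verify_api_key("user-api-key-123", stored, admin_key))  # → True
print(verify_api_key("wrong-key", stored, admin_key))         # → False
# Rotating the secret invalidates the stored hash, as the warning notes:
print(verify_api_key("user-api-key-123", stored, "a-different-admin-key-xx"))  # → False
```

The last line demonstrates the documented trade-off: because the stored digest is keyed by `ADMIN_KEY`, rotating that secret silently invalidates every stored key hash.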
## AI Generation Pipeline

Provider integration is intentionally delegated to `ai-cli-runner`:

```python
from ai_cli_runner import (
    PROVIDERS,
    VALID_AI_PROVIDERS,
    ProviderConfig,
    call_ai_cli,
    check_ai_cli_available,
    get_ai_cli_timeout,
    run_parallel_with_limit,
)
```

Main staged flow (`_generate_from_path`) updates `current_stage` in DB while progressing through planning, generation, and rendering:

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="generating",
    owner=owner,
    current_stage="planning",
)
plan = await run_planner(
    repo_path=repo_dir,
    project_name=project_name,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
)
plan["repo_url"] = source_url
```

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="generating",
    owner=owner,
    current_stage="generating_pages",
    plan_json=json.dumps(plan),
)
pages = await generate_all_pages(
    repo_path=repo_dir,
    plan=plan,
    cache_dir=cache_dir,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    use_cache=use_cache if use_cache else not force,
    project_name=project_name,
    owner=owner,
)
```

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="generating",
    owner=owner,
    current_stage="rendering",
    page_count=len(pages),
)
site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner)
render_site(plan=plan, pages=pages, output_dir=site_dir)
```

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="ready",
    owner=owner,
    current_stage=None,
    last_commit_sha=commit_sha,
    page_count=page_count,
    plan_json=json.dumps(plan),
)
```

Parallel page generation is bounded (`MAX_CONCURRENT_PAGES = 5`):

```python
MAX_CONCURRENT_PAGES = 5
...
results = await run_parallel_with_limit(
    coroutines, max_concurrency=MAX_CONCURRENT_PAGES
)
```

Incremental regeneration uses git diff + AI page targeting:

```python
changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
...
pages_to_regen = await run_incremental_planner(
    repo_dir,
    project_name,
    ai_provider,
    ai_model,
    changed_files,
    existing_plan,
    ai_cli_timeout,
)
if pages_to_regen != ["all"]:
    # Delete only the cached pages that need regeneration
    for slug in pages_to_regen:
        ...
        cache_file = cache_dir / f"{slug}.md"
        ...
        if cache_file.exists():
            cache_file.unlink()
    use_cache = True
```

Prompt construction explicitly requires source/config/test exploration and README avoidance:

```python
def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str:
    return f"""You are a technical documentation writer.

Explore this repository to write the "{page_title}" page for the {project_name} documentation.

Page description: {page_description}

Explore the codebase as needed. Read source files, configs, tests, and CI/CD pipelines to write comprehensive, accurate documentation. Do NOT rely on the README.
...
"""
```

> **Tip:** Use `force=true` in `POST /api/generate` to clear cached pages and force a full rebuild.

## Static Site Renderer

Renderer converts Markdown to HTML with syntax highlighting and TOC, then sanitizes generated HTML:

```python
md = markdown.Markdown(
    extensions=["fenced_code", "codehilite", "tables", "toc"],
    extension_configs={
        "codehilite": {"css_class": "highlight", "guess_lang": False},
        "toc": {"toc_depth": "2-3"},
    },
)
content_html = _sanitize_html(md.convert(md_text))
toc_html = getattr(md, "toc", "")
```

URL attributes are allowlisted in sanitization (`http`, `https`, `#`, `/`, `mailto`):

```python
def _sanitize_url_attr(match: re.Match) -> str:  # type: ignore[type-arg]
    attr = match.group(1)  # href or src
    quote = match.group(2)  # " or '
    url = match.group(3)  # the URL value
    ...
    if clean_url.startswith(("http://", "https://", "#", "/", "mailto:")):
        return match.group(0)  # Keep as-is
    # Block everything else (javascript:, data:, vbscript:, etc.)
    return f"{attr}={quote}#{quote}"
```

Site output includes static pages and machine-readable indexes:

```python
# Prevent GitHub Pages from running Jekyll
(output_dir / ".nojekyll").touch()
...
(output_dir / "index.html").write_text(index_html, encoding="utf-8")
...
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")
...
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)
...
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```

The generated UI is enhanced client-side with static assets:

- `search.js` (Cmd/Ctrl+K modal search over `search-index.json`),
- `copy.js` (copy buttons on code blocks),
- `callouts.js` (blockquote callout classes),
- `theme.js`, `scrollspy.js`, `codelabels.js`, `github.js`.

## Configuration and Runtime

App settings (Pydantic settings model):

```python
class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )

    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev
```

Environment example:

```dotenv
ADMIN_KEY=your-secure-admin-key-here-min-16-chars
AI_PROVIDER=claude
AI_MODEL=claude-opus-4-6[1m]
AI_CLI_TIMEOUT=60
```

Container compose:

```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
```

Container entrypoint:

```dockerfile
ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

> **Note:** `ADMIN_KEY` must be set and at least 16 characters, or startup exits.

## Testing and CI/CD Posture

The repository has broad unit/integration coverage (`tests/test_main.py`, `tests/test_storage.py`, `tests/test_generator.py`, `tests/test_renderer.py`, `tests/test_auth.py`, `tests/test_integration.py`, etc.).

Local test pipeline (`tox.toml`):

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

Local quality/security checks (`.pre-commit-config.yaml`) include:

- `ruff` + `ruff-format`,
- `mypy`,
- `detect-secrets`,
- `gitleaks`,
- `flake8` (with project-specific plugin usage).

> **Warning:** No in-repo hosted workflow definitions were found (for example, no `.github/workflows`), so remote CI/CD orchestration is external to this repository.

---

Source: core-concepts.md

# Core Concepts

`docsfy` organizes generated documentation around six core entities:

- **Project**: a repository identity (derived name + metadata).
- **Variant**: one generated output for a specific AI provider/model.
- **Owner**: the authenticated user who owns that project/variant namespace.
- **Role**: authorization level (`admin`, `user`, `viewer`).
- **Session**: login state via secure cookie and DB-backed expiry.
- **Generated artifacts**: cached markdown and rendered static site files.

> **Note:** In `docsfy`, project names are repository-centric, but storage and access are owner-scoped to avoid cross-user collisions.

## 1) Projects

A generation request must include exactly one source (`repo_url` or `repo_path`), and `project_name` is derived from that source.
```10:30:src/docsfy/models.py
class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )

    @model_validator(mode="after")
    def validate_source(self) -> GenerateRequest:
        if not self.repo_url and not self.repo_path:
            msg = "Either 'repo_url' or 'repo_path' must be provided"
            raise ValueError(msg)
        if self.repo_url and self.repo_path:
            msg = "Provide either 'repo_url' or 'repo_path', not both"
            raise ValueError(msg)
        return self
```

```55:64:src/docsfy/models.py
@property
def project_name(self) -> str:
    if self.repo_url:
        name = self.repo_url.rstrip("/").split("/")[-1]
        if name.endswith(".git"):
            name = name[:-4]
        return name
    if self.repo_path:
        return Path(self.repo_path).resolve().name
    return "unknown"
```

Projects are tracked in SQLite with generation metadata (`status`, commit SHA, page count, plan JSON, timestamps).

```56:73:src/docsfy/storage.py
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    current_stage TEXT,
    last_commit_sha TEXT,
    last_generated TEXT,
    page_count INTEGER DEFAULT 0,
    error_message TEXT,
    plan_json TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

## 2) Variants

A **variant** is one `(project, provider, model, owner)` tuple. This is the real unit of generation, status, deletion, serving, and download.
```282:290:src/docsfy/storage.py
"""INSERT INTO projects (name, ai_provider, ai_model, owner, repo_url, status, updated_at)
VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
ON CONFLICT(name, ai_provider, ai_model, owner) DO UPDATE SET
    repo_url = excluded.repo_url,
    status = excluded.status,
    error_message = NULL,
    current_stage = NULL,
    updated_at = CURRENT_TIMESTAMP""",
(name, ai_provider, ai_model, owner, repo_url, status),
```

Variant-specific API/docs routes are explicit:

```1019:1041:src/docsfy/main.py
@app.get("/api/projects/{name}/{provider}/{model}")
async def get_variant_details(
    request: Request,
    name: str,
    provider: str,
    model: str,
) -> dict[str, str | int | None]:
    name = _validate_project_name(name)
    project = await _resolve_project(
        request, name, ai_provider=provider, ai_model=model
    )
    return project


@app.delete("/api/projects/{name}/{provider}/{model}")
async def delete_variant(
    request: Request,
    name: str,
    provider: str,
    model: str,
) -> dict[str, str]:
```

```1379:1386:src/docsfy/main.py
@app.get("/docs/{project}/{provider}/{model}/{path:path}")
async def serve_variant_docs(
    request: Request,
    project: str,
    provider: str,
    model: str,
    path: str = "index.html",
) -> FileResponse:
```

## 3) Owners

Owner is set from the authenticated username at generation time:

```457:484:src/docsfy/main.py
project_name = gen_request.project_name
owner = request.state.username

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
if not ai_model:
    raise HTTPException(status_code=400, detail="AI model must be specified.")

# Fix 6: Use lock to prevent race condition between check and add
gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}"
async with _gen_lock:
    if gen_key in _generating:
        raise HTTPException(
            status_code=409,
            detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated",
        )
    await save_project(
        name=project_name,
        repo_url=gen_request.repo_url or gen_request.repo_path or "",
        status="generating",
        ai_provider=ai_provider,
        ai_model=ai_model,
        owner=owner,
    )
```

Owner is also part of filesystem layout:

```501:519:src/docsfy/storage.py
def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    if not ai_provider or not ai_model:
        msg = "ai_provider and ai_model are required for project directory paths"
        raise ValueError(msg)
    # Sanitize path segments to prevent traversal
    for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]:
        if (
            "/" in segment
            or "\\" in segment
            or ".." in segment
            or segment.startswith(".")
        ):
            msg = f"Invalid {segment_name}: '{segment}'"
            raise ValueError(msg)
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

Cross-owner sharing is controlled through `project_access` and scoped by `(project_name, project_owner, username)`.

```237:243:src/docsfy/storage.py
CREATE TABLE IF NOT EXISTS project_access (
    project_name TEXT NOT NULL,
    project_owner TEXT NOT NULL DEFAULT '',
    username TEXT NOT NULL,
    PRIMARY KEY (project_name, project_owner, username)
)
```

> **Warning:** For admin users, if multiple owners have the same variant `(name/provider/model)`, owner is ambiguous and some variant routes return `409` until disambiguated.
```241:246:src/docsfy/main.py
if len(distinct_owners) > 1:
    raise HTTPException(
        status_code=409,
        detail="Multiple owners found for this variant, please specify owner",
    )
```

## 4) Roles

`docsfy` defines three roles:

- **admin**: full access, including user and access management endpoints.
- **user**: read/write project operations (generate, abort, delete) within accessible scope.
- **viewer**: read-only access (dashboard/docs/download/status), no write operations.

```609:623:src/docsfy/storage.py
VALID_ROLES = frozenset({"admin", "user", "viewer"})


async def create_user(username: str, role: str = "user") -> tuple[str, str]:
    """Create a user and return (username, raw_api_key)."""
    if username.lower() == "admin":
        msg = "Username 'admin' is reserved"
        raise ValueError(msg)
    if not re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$", username):
        msg = f"Invalid username: '{username}'. Must be 2-50 alphanumeric characters, dots, hyphens, underscores."
        raise ValueError(msg)
    if role not in VALID_ROLES:
        msg = f"Invalid role: '{role}'. Must be admin, user, or viewer."
        raise ValueError(msg)
```

```185:191:src/docsfy/main.py
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

## 5) Sessions

Authentication supports both:

- `Authorization: Bearer ...` (admin key or user API key)
- `docsfy_session` cookie (browser login flow)

```122:137:src/docsfy/main.py
# 1. Check Authorization header (API clients)
auth_header = request.headers.get("authorization", "")
if auth_header.startswith("Bearer "):
    token = auth_header[7:]
    if token == settings.admin_key:
        is_admin = True
        username = "admin"
    else:
        user = await get_user_by_key(token)

# 2. Check session cookie (browser) -- opaque session token
if not user and not is_admin:
    session_token = request.cookies.get("docsfy_session")
    if session_token:
        session = await get_session(session_token)
```

Sessions are opaque tokens, hashed at rest, and expire after 8 hours.

```21:23:src/docsfy/storage.py
SESSION_TTL_SECONDS = 28800  # 8 hours
SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600
```

```686:713:src/docsfy/storage.py
async def create_session(
    username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS
) -> str:
    """Create an opaque session token."""
    token = secrets.token_urlsafe(32)
    token_hash = _hash_session_token(token)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S")
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)",
            (token_hash, username, 1 if is_admin else 0, expires_str),
        )
        await db.commit()
    return token
```

```297:304:src/docsfy/main.py
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

> **Tip:** Keep `SECURE_COOKIES` enabled in production. Only set it to `false` for local HTTP development.
```27:28:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` ## 6) Generated Artifacts Each completed variant writes structured outputs under owner/project/provider/model: - `plan.json` (navigation plan used for rendering and status UI) - `cache/pages/*.md` (cached AI markdown for incremental regeneration) - `site/` (served static docs) Site generation includes HTML, markdown copies, search index, and LLM-friendly files: ```223:290:src/docsfy/renderer.py # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() project_name: str = plan.get("project_name", "Documentation") tagline: str = plan.get("tagline", "") navigation: list[dict[str, Any]] = plan.get("navigation", []) repo_url: str = plan.get("repo_url", "") # ... (output_dir / "index.html").write_text(index_html, encoding="utf-8") # ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") search_index = _build_search_index(valid_pages, plan) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` The orchestration layer persists the plan and final status: ```998:1015:src/docsfy/main.py site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) project_dir = get_project_dir(project_name, ai_provider, ai_model, owner) (project_dir / "plan.json").write_text(json.dumps(plan, indent=2), encoding="utf-8") page_count = len(pages) await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) 
``` Persistent storage is typically mounted to `/data`: ```1:10:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] ``` ## 7) CI/CD and Quality Gate Context This repository currently has no checked-in `.github` workflow directory, but quality checks are still codified via local/CI-capable tooling: ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```43:61:.pre-commit-config.yaml - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - id: ruff - id: ruff-format - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy ``` In practice, these concepts fit together as: 1. Authenticated user (owner + role) submits generation request. 2. Request creates/updates a project variant. 3. Background pipeline plans, generates, renders artifacts. 4. Session-scoped or bearer-scoped access controls who can view/manage each variant. 5. Static artifacts are served directly or downloaded as `.tar.gz`. --- Source: generation-lifecycle.md # Generation Lifecycle docsfy runs generation as a background task per **variant** (`owner/project/provider/model`). A variant starts in `generating`, moves through internal stages, and finishes as `ready`, `error`, or `aborted`. ## 1) Request Intake and Variant Locking Generation starts at `POST /api/generate`. The request model enforces source rules (`repo_url` XOR `repo_path`) and derives `project_name`. 
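For illustration, deriving `project_name` from a repository URL can be sketched as a standalone helper (hypothetical; the real extraction lives in `repository.py` and may behave differently):

```python
from pathlib import PurePosixPath


def derive_project_name(repo_url: str) -> str:
    """Illustrative sketch: take the last path component and strip ".git".
    Hypothetical helper -- the real extract_repo_name() may differ."""
    return PurePosixPath(repo_url.rstrip("/")).name.removesuffix(".git")
```

Under this sketch, `derive_project_name("https://github.com/org/test-repo.git")` yields `test-repo`, the form that appears in variant keys throughout this page.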
```10:64:src/docsfy/models.py class GenerateRequest(BaseModel): repo_url: str | None = Field( default=None, description="Git repository URL (HTTPS or SSH)" ) repo_path: str | None = Field(default=None, description="Local git repository path") ai_provider: Literal["claude", "gemini", "cursor"] | None = None ai_model: str | None = None ai_cli_timeout: int | None = Field(default=None, gt=0) force: bool = Field( default=False, description="Force full regeneration, ignoring cache" ) @model_validator(mode="after") def validate_source(self) -> GenerateRequest: if not self.repo_url and not self.repo_path: msg = "Either 'repo_url' or 'repo_path' must be provided" raise ValueError(msg) if self.repo_url and self.repo_path: msg = "Provide either 'repo_url' or 'repo_path', not both" raise ValueError(msg) return self ``` The API path enforces permissions, prevents duplicate in-flight generation for the same variant key, persists `status="generating"`, then starts `_run_generation()` as an async task. ```422:505:src/docsfy/main.py @app.post("/api/generate", status_code=202) async def generate(request: Request, gen_request: GenerateRequest) -> dict[str, str]: _require_write_access(request) # Fix 9: Local repo path access requires admin privileges if gen_request.repo_path and not request.state.is_admin: raise HTTPException( status_code=403, detail="Local repo path access requires admin privileges", ) # ... snip ... 
# Fix 6: Use lock to prevent race condition between check and add gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" async with _gen_lock: if gen_key in _generating: raise HTTPException( status_code=409, detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated", ) await save_project( name=project_name, repo_url=gen_request.repo_url or gen_request.repo_path or "", status="generating", ai_provider=ai_provider, ai_model=ai_model, owner=owner, ) try: task = asyncio.create_task( _run_generation( repo_url=gen_request.repo_url, repo_path=gen_request.repo_path, project_name=project_name, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout, force=gen_request.force, owner=owner, ) ) _generating[gen_key] = task except Exception: _generating.pop(gen_key, None) raise return {"project": project_name, "status": "generating"} ``` > **Note:** `repo_path` is admin-only and must point to an absolute path containing `.git`. ## 2) Clone (or Local SHA Resolution) `_run_generation()` always enters `current_stage="cloning"` first. For remote sources, docsfy performs a shallow clone (`--depth 1`) and resolves HEAD SHA. For local sources, it skips clone and reads local HEAD SHA directly. ```720:789:src/docsfy/main.py async def _run_generation( repo_url: str | None, repo_path: str | None, project_name: str, ai_provider: str, ai_model: str, ai_cli_timeout: int, force: bool = False, owner: str = "", ) -> None: gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" try: # ... snip ... 
await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="cloning", ) if repo_path: # Local repository - use directly, no cloning needed local_path, commit_sha = get_local_repo_info(Path(repo_path)) await _generate_from_path( local_path, commit_sha, repo_url or repo_path, project_name, ai_provider, ai_model, ai_cli_timeout, force, owner, ) else: # Remote repository - clone to temp dir if repo_url is None: msg = "repo_url must be provided for remote repositories" raise ValueError(msg) with tempfile.TemporaryDirectory() as tmp_dir: repo_dir, commit_sha = await asyncio.to_thread( clone_repo, repo_url, Path(tmp_dir) ) await _generate_from_path( repo_dir, commit_sha, repo_url or "", project_name, ai_provider, ai_model, ai_cli_timeout, force, owner, ) ``` ```21:45:src/docsfy/repository.py def clone_repo(repo_url: str, base_dir: Path) -> tuple[Path, str]: repo_name = extract_repo_name(repo_url) repo_path = base_dir / repo_name logger.info(f"Cloning {repo_name} to {repo_path}") result = subprocess.run( ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)], capture_output=True, text=True, timeout=300, ) if result.returncode != 0: msg = f"Clone failed: {result.stderr or result.stdout}" raise RuntimeError(msg) sha_result = subprocess.run( ["git", "rev-parse", "HEAD"], cwd=repo_path, capture_output=True, text=True, ) if sha_result.returncode != 0: msg = f"Failed to get commit SHA: {sha_result.stderr or sha_result.stdout}" raise RuntimeError(msg) commit_sha = sha_result.stdout.strip() logger.info(f"Cloned {repo_name} at commit {commit_sha[:8]}") return repo_path, commit_sha ``` ## 3) Planning After source resolution, docsfy sets `current_stage="planning"` and calls the planner prompt. The prompt explicitly tells the model to inspect source/config/tests/CI and output strict JSON. 
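Because models occasionally wrap output in markdown fences despite the instruction, a defensive parse step can be sketched like this (illustrative only; `parse_plan` is a hypothetical helper, not docsfy's actual parser):

```python
import json
import re


def parse_plan(raw: str) -> dict:
    """Parse strict-JSON planner output, tolerating a stray markdown fence.
    Hypothetical helper; docsfy's real parsing code may differ."""
    text = raw.strip()
    # Unwrap a fenced block like: backticks + optional "json" tag + body + backticks.
    fenced = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```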
```24:42:src/docsfy/prompts.py
def build_planner_prompt(project_name: str) -> str:
    return f"""You are a technical documentation planner. Explore this repository thoroughly.

Explore the source code, configuration files, tests, CI/CD pipelines, and project structure.
Do NOT rely on the README — understand the project from its code and configuration.

Then create a documentation plan as a JSON object. The plan should cover:
- Introduction and overview
- Installation / getting started
- Configuration (if applicable)
- Usage guides for key features
- API reference (if the project has an API)
- Any other sections that would help users understand and use this project

Project name: {project_name}

CRITICAL: Your response must be ONLY a valid JSON object. No text before or after. No markdown code blocks.

Output format: {PLAN_SCHEMA}"""
```

The parsed plan is stored in the DB (`plan_json`) before page generation so UI clients can show structure and progress.

## 4) Incremental Planning and Cache Decisions

When `force=true`, docsfy clears cached pages and resets `page_count` to `0`. Without force, it can short-circuit to `ready`/`up_to_date` if the commit SHA has not changed.
```832:867:src/docsfy/main.py if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") # Reset page count so API shows 0 during regeneration await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) else: existing = await get_project( project_name, ai_provider=ai_provider, ai_model=ai_model, owner=owner ) if existing and existing.get("last_generated"): old_sha = ( str(existing["last_commit_sha"]) if existing.get("last_commit_sha") else None ) if old_sha == commit_sha: logger.info( f"[{project_name}] Project is up to date at {commit_sha[:8]}" ) await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` If SHA changed and prior plan exists, docsfy runs incremental planning (`current_stage="incremental_planning"`) and removes only cached markdown files for affected slugs. ```913:955:src/docsfy/main.py await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="incremental_planning", ) pages_to_regen = await run_incremental_planner( repo_dir, project_name, ai_provider, ai_model, changed_files, existing_plan, ai_cli_timeout, ) if pages_to_regen != ["all"]: # Delete only the cached pages that need regeneration for slug in pages_to_regen: # Validate slug to prevent path traversal if ( "/" in slug or "\\" in slug or ".." 
in slug or slug.startswith(".") ): logger.warning( f"[{project_name}] Skipping invalid slug from incremental planner: {slug}" ) continue cache_file = cache_dir / f"{slug}.md" # Extra safety: ensure the resolved path is inside cache_dir try: cache_file.resolve().relative_to(cache_dir.resolve()) except ValueError: logger.warning( f"[{project_name}] Path traversal attempt in slug: {slug}" ) continue if cache_file.exists(): cache_file.unlink() use_cache = True ``` > **Tip:** Use `force: true` for a guaranteed clean rebuild when changing model/provider behavior. ## 5) Page Generation docsfy sets `current_stage="generating_pages"` and calls `generate_all_pages()` with concurrency cap `MAX_CONCURRENT_PAGES = 5`. Each page: - Validates slug safety - Uses cache if enabled - Calls AI for markdown - Writes cache file - Updates `page_count` during generation ```66:131:src/docsfy/generator.py async def generate_page( repo_path: Path, slug: str, title: str, description: str, cache_dir: Path, ai_provider: str, ai_model: str, ai_cli_timeout: int | None = None, use_cache: bool = False, project_name: str = "", owner: str = "", ) -> str: # Validate slug to prevent path traversal if "/" in slug or "\\" in slug or slug.startswith(".") or ".." in slug: msg = f"Invalid page slug: '{slug}'" raise ValueError(msg) cache_file = cache_dir / f"{slug}.md" if use_cache and cache_file.exists(): logger.debug(f"[{_label}] Using cached page: {slug}") return cache_file.read_text(encoding="utf-8") # ... AI call snip ... 
output = _strip_ai_preamble(output) cache_dir.mkdir(parents=True, exist_ok=True) cache_file.write_text(output, encoding="utf-8") # Update page count in DB if project_name provided if project_name: existing_pages = len(list(cache_dir.glob("*.md"))) await update_project_status( project_name, ai_provider, ai_model, owner=owner, status="generating", page_count=existing_pages, ) ``` ```168:201:src/docsfy/generator.py coroutines = [ generate_page( repo_path=repo_path, slug=p["slug"], title=p["title"], description=p["description"], cache_dir=cache_dir, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, use_cache=use_cache, project_name=project_name, owner=owner, ) for p in all_pages ] results = await run_parallel_with_limit( coroutines, max_concurrency=MAX_CONCURRENT_PAGES ) pages: dict[str, str] = {} for page_info, result in zip(all_pages, results): if isinstance(result, Exception): logger.warning( f"[{_label}] Page generation failed for '{page_info['slug']}': {result}" ) pages[page_info["slug"]] = ( f"# {page_info['title']}\n\n*Documentation generation failed.*" ) else: pages[page_info["slug"]] = result ``` ## 6) Rendering and Publish After markdown generation, docsfy sets `current_stage="rendering"` and renders final static output. `render_site()` recreates output, copies assets, writes both HTML and markdown pages, search index, and `llms` files. ```215:292:src/docsfy/renderer.py def render_site(plan: dict[str, Any], pages: dict[str, str], output_dir: Path) -> None: if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() # ... snip ... for idx, slug_info in enumerate(valid_slug_order): # ... snip ... 
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") search_index = _build_search_index(valid_pages, plan) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` Final publish state: ```988:1015:src/docsfy/main.py await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="rendering", page_count=len(pages), ) site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) # ... snip ... await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) ``` ## Statuses and Stages ### Statuses `storage.py` defines canonical lifecycle statuses: ```17:17:src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` | Status | Meaning | Terminal | |---|---|---| | `generating` | Task is active | No | | `ready` | Docs published (or no-op `up_to_date`) | Yes | | `error` | Generation failed | Yes | | `aborted` | Generation canceled by user/task | Yes | ### `current_stage` values used in lifecycle - `cloning` - `planning` - `incremental_planning` - `generating_pages` - `rendering` - `up_to_date` (ready/no-op) - `null` (done/aborted) > **Note:** The status page timeline UI is hardcoded to `cloning`, `planning`, `generating_pages`, and `rendering`; `incremental_planning` is a backend stage but not in the stage-order array. 
## 7) Monitoring in UI and API `/status/{name}/{provider}/{model}` computes total planned pages from `plan_json`, then the page JS polls variant details every 3 seconds. ```369:401:src/docsfy/main.py @app.get("/status/{name}/{provider}/{model}", response_class=HTMLResponse) async def project_status_page( request: Request, name: str, provider: str, model: str ) -> HTMLResponse: # ... snip ... if project.get("plan_json"): try: plan_json = json.loads(str(project["plan_json"])) for group in plan_json.get("navigation", []): total_pages += len(group.get("pages", [])) except (json.JSONDecodeError, TypeError): plan_json = None ``` ```948:1063:src/docsfy/templates/status.html var PROJECT_NAME = {{ project.name | tojson }}; var PROJECT_PROVIDER = {{ project.ai_provider | tojson }}; var PROJECT_MODEL = {{ project.ai_model | tojson }}; var POLL_INTERVAL_MS = 3000; var previousPageCount = {{ (project.page_count or 0) | tojson }}; var currentStatus = {{ project.status | tojson }}; var currentStage = {{ (project.current_stage or '') | tojson }} || null; var STAGES = ['cloning', 'planning', 'generating_pages', 'rendering']; ``` ## 8) Ready, Error, and Aborted End States ### Ready - Final state after successful render - Also used for no-op updates with `current_stage="up_to_date"` - Download endpoint requires `ready` ```1086:1091:src/docsfy/main.py if project["status"] != "ready": raise HTTPException(status_code=400, detail="Variant not ready") project_owner = str(project.get("owner", "")) site_dir = get_project_site_dir(name, provider, model, project_owner) if not site_dir.exists(): raise HTTPException(status_code=404, detail="Site not found") ``` ### Error - Set when CLI availability fails or any unhandled exception occurs - Carries `error_message` - UI shows retry controls for `error` and `aborted` ### Aborted - Variant abort endpoint cancels task, waits up to 5s, then marks `aborted` ```642:717:src/docsfy/main.py @app.post("/api/projects/{name}/{provider}/{model}/abort") async 
def abort_variant( request: Request, name: str, provider: str, model: str ) -> dict[str, str]: # ... snip ... task.cancel() try: await asyncio.wait_for(task, timeout=5.0) except asyncio.CancelledError: pass except asyncio.TimeoutError as exc: raise HTTPException( status_code=409, detail=f"Abort still in progress for '{gen_key}'. Please retry shortly.", ) from exc await update_project_status( name, provider, model, status="aborted", owner=key_owner, error_message="Generation aborted by user", current_stage=None, ) ``` > **Warning:** On server startup, any orphaned `generating` rows are automatically converted to `error` with `"Server restarted during generation"`. ```182:185:src/docsfy/storage.py # Reset orphaned "generating" projects from previous server run cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` ## 9) Storage Layout and Runtime Configuration Variant artifacts are stored under owner/project/provider/model paths: ```501:530:src/docsfy/storage.py def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: # ... snip ... 
safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model def get_project_site_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: return get_project_dir(name, ai_provider, ai_model, owner) / "site" def get_project_cache_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages" ``` Relevant runtime config: ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```1:13:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ## 10) How Lifecycle Behavior Is Validated Integration tests verify the full mocked flow (`generate -> ready -> serve -> download`), and storage tests verify restart recovery behavior. ```52:109:tests/test_integration.py async def test_full_flow_mock(client: AsyncClient, tmp_path: Path) -> None: """Test the full generate -> status -> download flow with mocked AI.""" # ... snip ... 
await _run_generation( repo_url="https://github.com/org/test-repo.git", repo_path=None, project_name="test-repo", ai_provider="claude", ai_model="opus", ai_cli_timeout=60, owner="admin", ) # Check status response = await client.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] assert len(projects) == 1 assert projects[0]["name"] == "test-repo" assert projects[0]["status"] == "ready" ``` ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** This repository does not include a checked-in `.github/workflows` directory; automation in-repo is defined via `tox` and `.pre-commit-config.yaml`. --- Source: prerequisites.md # Prerequisites Before running `docsfy`, make sure your environment has Python, `uv`, `git`, one supported AI CLI with credentials, and a valid `ADMIN_KEY`. ## Python and `uv` `docsfy` requires Python 3.12+. ```toml [project] name = "docsfy" version = "0.1.0" description = "AI-powered documentation generator - generates polished static HTML docs from GitHub repos" requires-python = ">=3.12" dependencies = [ "ai-cli-runner", "fastapi", "uvicorn", "pydantic-settings", "python-simple-logger", "aiosqlite", "jinja2", "markdown", "pygments", "python-multipart>=0.0.22", ] ``` The lock file enforces the same minimum Python version: ```toml version = 1 revision = 3 requires-python = ">=3.12" ``` The project workflow uses `uv` for install, running, and tests: ```dockerfile RUN uv sync --frozen --no-dev ``` ```dockerfile ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ```toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** `Settings` loads environment variables from `.env`, so your local config must be present there. 
```python model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) ``` ## `git` is required `docsfy` uses `git` to clone repositories and resolve commit SHAs: ```python result = subprocess.run( ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)], capture_output=True, text=True, timeout=300, ) if result.returncode != 0: msg = f"Clone failed: {result.stderr or result.stdout}" raise RuntimeError(msg) sha_result = subprocess.run( ["git", "rev-parse", "HEAD"], cwd=repo_path, capture_output=True, text=True, ) ``` Local-path generation also requires a real git repo (`.git` must exist): ```python if not (repo_p / ".git").exists(): raise HTTPException( status_code=400, detail=f"Not a git repository (no .git directory): '{gen_request.repo_path}'", ) ``` ## Supported AI providers, CLIs, and credentials Supported providers are fixed to `claude`, `gemini`, and `cursor`: ```python ai_provider: Literal["claude", "gemini", "cursor"] | None = None ``` ```python assert VALID_AI_PROVIDERS == frozenset({"claude", "gemini", "cursor"}) ``` `AI_CLI_TIMEOUT` must be greater than zero: ```python ai_cli_timeout: int = Field(default=60, gt=0) ``` The container image installs all three AI CLIs: ```dockerfile # Install Claude Code CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" # Install Cursor Agent CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" # Configure npm for non-root global installs and install Gemini CLI RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` Credential/config variables expected in `.env`: ```dotenv AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # 
ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= ``` The app checks provider CLI readiness before generation: ```python cli_flags = ["--trust"] if ai_provider == "cursor" else None available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) if not available: await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=msg, ) return ``` > **Tip:** You only need credentials for the provider selected in `AI_PROVIDER`, but that provider’s CLI must be installed and authenticated. ## Mandatory `ADMIN_KEY` setup `ADMIN_KEY` is required and must be at least 16 characters. ```dotenv # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars ``` Startup fails fast if `ADMIN_KEY` is missing or too short: ```python settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` `ADMIN_KEY` is also used as the admin login secret: ```python if username == "admin" and api_key == settings.admin_key: is_admin = True authenticated = True ``` And as the HMAC secret for API key hashing: ```python secret = hmac_secret or os.getenv("ADMIN_KEY", "") if not secret: msg = "ADMIN_KEY environment variable is required for key hashing" raise RuntimeError(msg) ``` > **Warning:** Rotating `ADMIN_KEY` invalidates existing API key hashes, and `ADMIN_KEY` users cannot rotate this through the API (`"ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead."`). 
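The keyed hashing step can be sketched as follows. Only the ADMIN_KEY-as-HMAC-secret behavior is confirmed by the snippet above; the digest algorithm and hex encoding here are assumptions:

```python
import hashlib
import hmac
import os


def hash_api_key(api_key: str, hmac_secret: str = "") -> str:
    """Sketch of keyed hashing with ADMIN_KEY as the HMAC secret.
    sha256/hex output is an assumption, not docsfy's confirmed format."""
    secret = hmac_secret or os.getenv("ADMIN_KEY", "")
    if not secret:
        raise RuntimeError("ADMIN_KEY environment variable is required for key hashing")
    return hmac.new(secret.encode(), api_key.encode(), hashlib.sha256).hexdigest()
```

Because the secret participates in every hash, rotating `ADMIN_KEY` changes all stored hashes at once, which is why rotation invalidates existing API keys.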
## Minimal `.env` baseline ```dotenv ADMIN_KEY= AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO ``` For local HTTP (non-HTTPS) development, this optional setting is available: ```dotenv # Set to false for local HTTP development # SECURE_COOKIES=false ``` `docker-compose` also expects `.env`: ```yaml services: docsfy: env_file: .env ``` --- Source: local-installation.md # Local Installation docsfy is a Python FastAPI service packaged with a `pyproject.toml` + `uv.lock` workflow. ```toml [project] name = "docsfy" version = "0.1.0" requires-python = ">=3.12" [project.scripts] docsfy = "docsfy.main:run" [project.optional-dependencies] dev = ["pytest", "pytest-asyncio", "pytest-xdist", "httpx"] ``` ## Prerequisites - Python `3.12+` - `uv` (used for dependency and runtime commands in this repo) - `git` (required for repository cloning and diffing during generation) ```python def clone_repo(repo_url: str, base_dir: Path) -> tuple[Path, str]: result = subprocess.run( ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)], capture_output=True, text=True, timeout=300, ) ``` > **Tip:** Generation supports `claude`, `gemini`, and `cursor` providers. 
```python ai_provider: Literal["claude", "gemini", "cursor"] | None = None ``` ## 1) Install dependencies From the repository root: ```bash uv sync --frozen --no-dev ``` This is the same locked install pattern used by the project container build: ```dockerfile RUN uv sync --frozen --no-dev ``` If you want to run tests later, the repo uses this dev command in `tox.toml`: ```toml commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ## 2) Configure local environment Copy the env template and create a local data directory: ```bash cp .env.example .env mkdir -p data ``` Base `.env` values come from `.env.example`: ```dotenv ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # ANTHROPIC_API_KEY= # GEMINI_API_KEY= # CURSOR_API_KEY= LOG_LEVEL=INFO # SECURE_COOKIES=false ``` Runtime defaults are defined in `src/docsfy/config.py`: ```python admin_key: str = "" ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True ``` Storage paths are derived from `DATA_DIR`: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` Recommended local overrides in `.env`: ```dotenv DATA_DIR=./data SECURE_COOKIES=false ``` > **Warning:** `ADMIN_KEY` is mandatory and must be at least 16 characters, or startup exits. ```python if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** For plain local HTTP (`http://127.0.0.1:8000`), keep `SECURE_COOKIES=false` so login sessions work in the browser. 
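With the recommended `DATA_DIR=./data` override, the storage-path derivation shown above resolves relative to the repository root (a quick sketch of the same expressions):

```python
from pathlib import Path

# Evaluating the storage-path derivation with the DATA_DIR=./data override.
data_dir = Path("./data")
db_path = data_dir / "docsfy.db"      # -> data/docsfy.db
projects_dir = data_dir / "projects"  # -> data/projects
```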
## 3) Run the service Start docsfy: ```bash uv run docsfy ``` The entrypoint behavior is: ```python reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` Common local override (bind all interfaces, custom port, reload on code changes): ```bash HOST=0.0.0.0 PORT=8800 DEBUG=true uv run docsfy ``` ## 4) Verify startup Health endpoint: ```bash curl http://127.0.0.1:8000/health ``` Expected response: ```json {"status":"ok"} ``` Open the login page: `http://127.0.0.1:8000/login` - Username: `admin` - Password: value of `ADMIN_KEY` ```python if username == "admin" and api_key == settings.admin_key: is_admin = True authenticated = True ``` API auth smoke test (Bearer token): ```bash export ADMIN_KEY="your-admin-key" curl -sS http://127.0.0.1:8000/api/status \ -H "Authorization: Bearer ${ADMIN_KEY}" ``` > **Note:** Only `/login` and `/health` are public routes by default. ```python _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ``` ## 5) Optional: generation smoke test ```bash curl -X POST http://127.0.0.1:8000/api/generate \ -H "Authorization: Bearer ${ADMIN_KEY}" \ -H "Content-Type: application/json" \ -d '{"repo_url":"https://github.com/org/repo.git"}' ``` Generation checks AI CLI availability at runtime: ```python available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) if not available: await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=msg, ) return ``` > **Note:** Install and authenticate the CLI for the provider you use (`claude`, `gemini`, or `cursor`) before running generation jobs. ## 6) Optional: run tests ```bash uv run --extra dev pytest -n auto tests ``` This matches the project’s `tox.toml` command exactly. 
--- Source: run-with-docker.md # Run with Docker This repository provides both a `Dockerfile` and a `docker-compose.yaml` to run `docsfy` as a containerized service on port `8000`. ## Prerequisites and Environment Create a local `.env` file from `.env.example`: ```bash cp .env.example .env ``` The shipped example includes required and optional runtime variables: ```env # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= # Logging LOG_LEVEL=INFO # Set to false for local HTTP development # SECURE_COOKIES=false ``` Startup enforces `ADMIN_KEY` presence and minimum length: ```python @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) _generating.clear() await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` > **Warning:** If `ADMIN_KEY` is missing or shorter than 16 characters, the container exits during startup. > **Warning:** `SECURE_COOKIES` defaults to `true`. For plain HTTP local development, set `SECURE_COOKIES=false` in `.env` or browser login cookies may not persist. --- ## Run with `docker compose` (recommended) Repository compose file: ```yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` Run it: ```bash mkdir -p data docker compose up --build ``` Detached mode: ```bash docker compose up -d --build ``` Stop and remove container/network: ```bash docker compose down ``` --- ## Run directly from `Dockerfile` The image is multi-stage (`builder` + runtime), installs dependencies with `uv`, and runs as non-root `appuser`: ```dockerfile FROM python:3.12-slim AS builder WORKDIR /app COPY --from=ghcr.io/astral-sh/uv:0.5.14 /uv /usr/local/bin/uv RUN apt-get update && apt-get install -y --no-install-recommends \ git \ && rm -rf /var/lib/apt/lists/* COPY pyproject.toml uv.lock ./ COPY src/ src/ RUN uv sync --frozen --no-dev FROM python:3.12-slim WORKDIR /app RUN apt-get update && apt-get install -y --no-install-recommends \ bash \ git \ curl \ nodejs \ npm \ && rm -rf /var/lib/apt/lists/* ``` Runtime data, health check, and entrypoint: ```dockerfile RUN useradd --create-home --shell /bin/bash -g 0 appuser \ && mkdir -p /data \ && chown appuser:0 /data \ && chmod -R g+w /data USER appuser ENV PATH="/home/appuser/.local/bin:/home/appuser/.npm-global/bin:${PATH}" ENV HOME="/home/appuser" EXPOSE 8000 HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` Build and run: ```bash docker build -t docsfy:local . mkdir -p data docker run --rm -p 8000:8000 --env-file .env -v "$(pwd)/data:/data" docsfy:local ``` > **Note:** The container listens on internal port `8000` (`ENTRYPOINT` is fixed to `--port 8000`). Change host-side port with mappings like `-p 8080:8000`. 
--- ## Mounted Data Volume (`/data`) Compose mounts host `./data` into container `/data`: ```yaml volumes: - ./data:/data ``` Application defaults also target `/data`: ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) data_dir: str = "/data" ``` Storage paths are derived from `DATA_DIR` and initialized on startup: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" async def init_db(data_dir: str = "") -> None: ... DB_PATH.parent.mkdir(parents=True, exist_ok=True) PROJECTS_DIR.mkdir(parents=True, exist_ok=True) ``` Project artifacts are organized under provider/model-specific subdirectories: ```python return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` The repository intentionally ignores local data folders: ```gitignore # Data data/ .dev/data/ ``` > **Tip:** Back up `./data` (especially `docsfy.db` and `projects/`) to preserve generated docs and metadata across container rebuilds. 
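The backup tip above can be scripted with the standard library; a sketch (the timestamped archive naming scheme is just an example, not something docsfy ships):

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir: Path, backup_dir: Path) -> Path:
    """Archive the whole data directory (docsfy.db + projects/) into a
    timestamped .tar.gz so generated docs and metadata survive rebuilds."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"docsfy-data-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname="data")
    return archive
```

Run it against the host-side `./data` directory while the container is stopped (or at least while no generation is in flight) to get a consistent SQLite snapshot.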
--- ## Health Checks Container-level health checks call the app endpoint: ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ``` Compose defines the same check: ```yaml healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` App endpoint implementation: ```python # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` Behavior is covered in tests: ```python async def test_health_is_public(unauthed_client: AsyncClient) -> None: """The /health endpoint should be accessible without authentication.""" response = await unauthed_client.get("/health") assert response.status_code == 200 assert response.json()["status"] == "ok" ``` Quick checks: ```bash curl -f http://localhost:8000/health docker compose ps ``` > **Warning:** `/health` currently reports only application liveness (`{"status":"ok"}`); it does not validate external AI credentials or downstream service readiness. --- ## CI/CD Status for Docker No CI/CD workflow files are present in this repository (no `.github/workflows`, GitLab, CircleCI, Jenkins, or Buildkite pipeline definitions), so Docker image build/run behavior documented here is currently local/manual. --- Source: first-docs-generation.md # First Documentation Run This guide walks you from first login to a generated, browsable docs site in `docsfy`. ## 1) Configure your environment `docsfy` reads settings from `.env` (`pydantic-settings` in `src/docsfy/config.py`) and requires `ADMIN_KEY` at startup. 
```bash # .env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Set to false for local HTTP development # SECURE_COOKIES=false ``` ```python # src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** If you run over plain HTTP (for example `http://localhost:8000`), set `SECURE_COOKIES=false` in `.env`. Cookies are `secure=True` by default, so login sessions will not stick on HTTP. ## 2) Start `docsfy` ### Recommended: Docker Compose ```yaml # docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` Run: ```bash docker compose up --build ``` The container image installs AI CLIs during build: ```dockerfile # Dockerfile RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` ### Local run (without Docker) `pyproject.toml` defines a CLI entry point: ```toml [project.scripts] docsfy = "docsfy.main:run" ``` So after dependency setup, you can run: ```bash uv run docsfy ``` `docsfy.main:run` defaults to `127.0.0.1:8000`. ## 3) Log in Open: `http://localhost:8000/login` The login form uses username + API key (labeled “Password” in the UI): ```html

<!-- Login form (src/docsfy/templates/login.html): admin login uses username "admin"
     with the ADMIN_KEY value in the "Password" field; regular users sign in with
     their username and API key. -->

```

Backend auth logic:

```python
# src/docsfy/main.py
if username == "admin" and api_key == settings.admin_key:
    is_admin = True
    authenticated = True
else:
    user = await get_user_by_key(api_key)
    if user and user["username"] == username:
        authenticated = True
        is_admin = user.get("role") == "admin"
```

Session cookies are set as HTTP-only, strict same-site, with an 8-hour TTL:

```python
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

> **Note:** `SESSION_TTL_SECONDS` is `28800` (8 hours) in `src/docsfy/storage.py`.

## 4) Generate your first docs site

After login, go to the dashboard (`/`) and use **Generate Documentation**. Frontend payload sent to the API:

```javascript
// src/docsfy/templates/dashboard.html
var body = { repo_url: repoUrl, ai_provider: provider, force: force };
if (model) body.ai_model = model;
fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'same-origin',
    body: JSON.stringify(body)
})
```

Server request model constraints:

```python
# src/docsfy/models.py
if not self.repo_url and not self.repo_path:
    raise ValueError("Either 'repo_url' or 'repo_path' must be provided")
if self.repo_url and self.repo_path:
    raise ValueError("Provide either 'repo_url' or 'repo_path', not both")

https_pattern = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$"
ssh_pattern = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$"
```

Generation returns `202` with the project name inferred from the repo URL:

```python
# src/docsfy/main.py
return {"project": project_name, "status": "generating"}
```

```python
# tests/test_main.py
response = await client.post("/api/generate", json={"repo_url": "https://github.com/org/repo.git"})
assert response.status_code == 202
assert response.json()["project"] == "repo"
assert response.json()["status"] == "generating"
```

> **Warning:** Admin-only restriction applies to `repo_path`.
Non-admin users get `403` for local path generation. > > **Warning:** Repo URLs resolving to localhost/private networks are rejected (`_reject_private_url` in `src/docsfy/main.py`). ## 5) Monitor generation ### From the dashboard A generating variant shows a progress bar and a status link: ```html Generating... View progress → ``` Dashboard polling behavior: ```javascript // src/docsfy/templates/dashboard.html var statusPollInterval = null; // Slow poll for status changes (10s) var progressPollInterval = null; // Fast poll for progress updates (5s) statusPollInterval = setInterval(pollStatusChanges, 10000); progressPollInterval = setInterval(pollProgressUpdates, 5000); ``` ### From the status page Status page polling behavior: ```javascript // src/docsfy/templates/status.html var POLL_INTERVAL_MS = 3000; pollTimer = setInterval(pollProject, POLL_INTERVAL_MS); ``` Generation stage updates are written by backend as: - `cloning` - `planning` - `incremental_planning` - `generating_pages` - `rendering` - `up_to_date` (when no changes) (from `_run_generation` and `_generate_from_path` in `src/docsfy/main.py`) Ready-state messaging: ```html {% if project.current_stage == 'up_to_date' %} Documentation is already up to date — no changes since last generation. {% else %} Documentation generated successfully! {% endif %} ``` > **Tip:** If you manually type URLs, always include provider/model segments from the current variant. The dashboard/status buttons build the correct URL for you. 
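The `https_pattern` / `ssh_pattern` constraints quoted in step 4 can be exercised directly; a quick sketch using the patterns as shown in `src/docsfy/models.py` (the helper name is illustrative):

```python
import re

# Patterns as quoted from src/docsfy/models.py
HTTPS_PATTERN = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$"
SSH_PATTERN = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$"


def is_valid_repo_url(url: str) -> bool:
    """True when the URL matches either the HTTPS or SSH repo shape."""
    return bool(re.match(HTTPS_PATTERN, url) or re.match(SSH_PATTERN, url))
```

Accepted: `https://github.com/org/repo.git`, `git@github.com:org/repo.git`. Rejected: URLs missing the repo segment (e.g. `https://github.com/org`) or non-HTTP(S)/SSH schemes.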
## 6) Open and download your generated docs

When status is `ready`, use **View Documentation** or **Download**:

```html
View Documentation
Download
```

Routes:

```python
# src/docsfy/main.py
@app.get("/docs/{project}/{provider}/{model}/{path:path}")  # variant-specific
@app.get("/docs/{project}/{path:path}")  # latest ready variant
```

Integration tests confirm both variant and latest routes:

```python
# tests/test_integration.py
response = await client.get("/docs/test-repo/claude/opus/index.html")
assert response.status_code == 200

response = await client.get("/docs/test-repo/index.html")
assert response.status_code == 200

response = await client.get("/api/projects/test-repo/claude/opus/download")
assert response.headers["content-type"] == "application/gzip"
```

## 7) Where generated files are stored

The storage path is owner/project/provider/model scoped:

```python
# src/docsfy/storage.py
return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

Site directory:

```python
# src/docsfy/storage.py
return get_project_dir(name, ai_provider, ai_model, owner) / "site"
```

Renderer output includes:

- `index.html`
- `<slug>.html`
- `<slug>.md`
- `search-index.json`
- `llms.txt`
- `llms-full.txt`
- `.nojekyll`
- `assets/*`

```python
# src/docsfy/renderer.py
(output_dir / "index.html").write_text(index_html, encoding="utf-8")
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")
(output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8")
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```

With Docker Compose, these are persisted under the local `./data` directory because of the `./data:/data` volume mapping.
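The owner/project/provider/model scoping above can be mirrored with `pathlib`; a sketch, not docsfy's actual `get_project_dir` (which also validates and sanitizes each segment):

```python
from pathlib import Path


def variant_site_dir(projects_dir: Path, owner: str, project: str,
                     provider: str, model: str) -> Path:
    """Compute where a variant's rendered site lives, mirroring
    PROJECTS_DIR / owner / name / provider / model / "site"."""
    return projects_dir / owner / project / provider / model / "site"
```

For example, `variant_site_dir(Path("/data/projects"), "alice", "my-repo", "claude", "opus")` resolves to `/data/projects/alice/my-repo/claude/opus/site`.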
## 8) Optional sanity check after first run Local test command defined in `tox.toml`: ```toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** This repository currently defines local quality gates (`tox`, `pre-commit`) but does not include a checked-in GitHub Actions workflow file. --- Source: environment-variables.md # Environment Variables docsfy runtime configuration is defined in code and loaded via `pydantic-settings` from `.env` (plus environment variables). ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python @lru_cache def get_settings() -> Settings: return Settings() ``` > **Tip:** `get_settings()` is cached. After changing environment variables, restart the process to apply them. ## Core Runtime Variables | Variable | Required | Default | Description | | --- | --- | --- | --- | | `ADMIN_KEY` | Yes | _(none)_ | Admin authentication secret. Required at startup, minimum length 16. Also used as HMAC secret for stored user API key hashes. | | `AI_PROVIDER` | No | `claude` | Default AI provider used by dashboard + `/api/generate` when request does not specify one. Allowed providers: `claude`, `gemini`, `cursor`. | | `AI_MODEL` | No | `claude-opus-4-6[1m]` | Default model name used when request omits `ai_model`. | | `AI_CLI_TIMEOUT` | No | `60` | Default timeout for AI CLI calls (seconds). Must be `> 0`. | | `LOG_LEVEL` | No | `INFO` | Logging level setting exposed in app config (`log_level`). | | `DATA_DIR` | No | `/data` | Base directory for SQLite DB and generated artifacts. 
| | `SECURE_COOKIES` | No | `true` | Controls `Secure` flag on session cookie. | > **Note:** `LOG_LEVEL` is present in settings and `.env.example`; repository code does not directly call `setLevel()`, so final filtering behavior depends on `python-simple-logger` configuration. ## Validation and Fallback Behavior `ADMIN_KEY` is enforced at app startup: ```python settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` Provider/model/timeout defaulting in `/api/generate`: ```python settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model ... ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout ``` Provider and model are validated before generation: ```python if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) if not ai_model: raise HTTPException(status_code=400, detail="AI model must be specified.") ``` Timeout validation is strict in both settings and request schema: ```python ai_cli_timeout: int = Field(default=60, gt=0) ``` ```python ai_cli_timeout: int | None = Field(default=None, gt=0) ``` ## AI Provider Credential Variables From the repository `.env.example`: ```env # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= ``` > **Note:** docsfy passes provider/model/timeout to `call_ai_cli(...)`; provider credential variables are expected to be present in the process environment for the installed CLIs. 
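The request-or-settings fallback shown above reduces to a simple `or` per field; sketched here with plain dicts rather than the actual Pydantic models:

```python
def resolve_generation_options(request: dict, defaults: dict) -> dict:
    """Apply the /api/generate fallback: any field omitted from the
    request falls back to the settings default."""
    return {
        "ai_provider": request.get("ai_provider") or defaults["ai_provider"],
        "ai_model": request.get("ai_model") or defaults["ai_model"],
        "ai_cli_timeout": request.get("ai_cli_timeout") or defaults["ai_cli_timeout"],
    }
```

Note the pitfall: a request that sets `ai_provider` but omits `ai_model` silently inherits the default `AI_MODEL`, which may belong to a different provider.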
## Storage and Security-Related Variables `DATA_DIR` controls persistent paths: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` `ADMIN_KEY` is also used for API key hashing: ```python secret = hmac_secret or os.getenv("ADMIN_KEY", "") if not secret: msg = "ADMIN_KEY environment variable is required for key hashing" raise RuntimeError(msg) return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest() ``` > **Warning:** Rotating `ADMIN_KEY` invalidates existing stored user API key hashes. Users must regenerate API keys after rotation. ## Cookie Security (`SECURE_COOKIES`) Session cookie flags are set from config: ```python response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` `.env.example` includes: ```env # Set to false for local HTTP development # SECURE_COOKIES=false ``` > **Tip:** For local non-HTTPS development, set `SECURE_COOKIES=false` so browsers send the session cookie over HTTP. ## Process Runtime Variables (`docsfy` CLI) When starting via the Python entrypoint (`docsfy` script), these are read: ```python reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` - `DEBUG`: enables `uvicorn` reload when set to `"true"`. - `HOST`: bind address (default `127.0.0.1`). - `PORT`: bind port (default `8000`). > **Note:** In Docker, `HOST`/`PORT`/`DEBUG` are bypassed because the container entrypoint launches `uvicorn` with fixed arguments. ```dockerfile ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ## Docker Compose Environment Snippet ```yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ## Repository `.env` Template ```env # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= # Logging LOG_LEVEL=INFO # Set to false for local HTTP development # SECURE_COOKIES=false ``` ## Runtime Constants (Not Environment-Configurable) These runtime settings exist in code but are not currently exposed as environment variables: ```python SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 ``` ```python MAX_CONCURRENT_PAGES = 5 ``` ```python await asyncio.wait_for(task, timeout=5.0) ``` --- Source: ai-provider-setup.md # AI Provider Setup `docsfy` supports three provider options: `claude`, `gemini`, and `cursor`. Provider/model are treated as a first-class variant key, so the same repo can have multiple generated doc variants side by side. 
```10:20:src/docsfy/models.py
class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )
```

```3:11:src/docsfy/ai_client.py
from ai_cli_runner import (
    PROVIDERS,
    VALID_AI_PROVIDERS,
    ProviderConfig,
    call_ai_cli,
    check_ai_cli_available,
    get_ai_cli_timeout,
    run_parallel_with_limit,
)
```

> **Note:** `docsfy` delegates provider execution to `ai_cli_runner`; credentials are expected via environment variables consumed by provider CLIs.

## Credentials and Environment Variables

Use `.env` (loaded automatically by settings) to configure both app-level defaults and provider credentials.
```10:23:.env.example # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= ``` ```10:13:src/docsfy/config.py model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) ``` Set app defaults in `.env`: ```4:8:.env.example # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` `ADMIN_KEY` is required at startup and must be at least 16 characters: ```82:89:src/docsfy/main.py settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` If you run with Docker Compose, `.env` is wired automatically: ```1:8:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ## Provider CLI Prerequisites The container image installs all three CLIs: ```26:57:Dockerfile # Install bash (needed for CLI install scripts), git (required at runtime for gitpython), curl (for Claude CLI), and nodejs/npm (for Gemini CLI) RUN apt-get update && apt-get install -y --no-install-recommends \ bash \ git \ curl \ nodejs \ npm \ && rm -rf /var/lib/apt/lists/* ... 
# Install Claude Code CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" # Install Cursor Agent CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" # Configure npm for non-root global installs and install Gemini CLI RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` ## Model Selection Behavior ### 1) Server-side fallback and validation If request values are omitted, `docsfy` falls back to settings defaults: ```454:466:src/docsfy/main.py settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model project_name = gen_request.project_name owner = request.state.username if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) if not ai_model: raise HTTPException(status_code=400, detail="AI model must be specified.") ``` Each `(project, provider, model)` is stored as a separate variant path: ```501:519:src/docsfy/storage.py def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: if not ai_provider or not ai_model: msg = "ai_provider and ai_model are required for project directory paths" raise ValueError(msg) ... 
return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` ### 2) UI suggestions and auto-fill behavior Model suggestions come from **ready** projects only: ```572:577:src/docsfy/storage.py async def get_known_models() -> dict[str, list[str]]: """Get distinct ai_model values per ai_provider from completed projects.""" async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute( "SELECT DISTINCT ai_provider, ai_model FROM projects WHERE ai_provider != '' AND ai_model != '' AND status = 'ready' ORDER BY ai_provider, ai_model" ) ``` When provider changes in the dashboard form: - if current model is invalid for that provider, UI auto-fills the first known model - if no known models exist for that provider, UI clears the model input ```1677:1697:src/docsfy/templates/dashboard.html if (providerSelect && modelDropdown) { providerSelect.addEventListener('change', function() { if (_restoring) return; var newProvider = this.value; var modelsForProvider = knownModels[newProvider] || []; // If current model is not valid for the new provider, auto-fill if (modelInput) { var currentModel = modelInput.value; if (modelsForProvider.length > 0 && modelsForProvider.indexOf(currentModel) === -1) { modelInput.value = modelsForProvider[0]; saveFormState(); } else if (modelsForProvider.length === 0) { modelInput.value = ''; modelInput.placeholder = 'Enter model name'; saveFormState(); } } filterModelOptions(modelDropdown, modelInput ? 
modelInput.value : '', newProvider); }); } ``` Generate request payload only includes `ai_model` when the input is non-empty: ```2043:2049:src/docsfy/templates/dashboard.html var body = { repo_url: repoUrl, ai_provider: provider, force: force }; if (model) body.ai_model = model; ``` Status page retry always sends the model input value: ```1367:1370:src/docsfy/templates/status.html var payload = { repo_url: repoUrl }; if (providerSelect) payload.ai_provider = providerSelect.value; if (modelInput) payload.ai_model = modelInput.value; if (forceCheckbox && forceCheckbox.checked) payload.force = true; ``` > **Warning:** If `ai_model` is blank, server fallback uses `AI_MODEL` from settings. If you switched provider and left model empty, the fallback model may not match that provider. > **Tip:** Keep `AI_PROVIDER` and `AI_MODEL` aligned in `.env`, and run one successful generation per provider/model pair to seed `known_models` suggestions. ### 3) Dynamic model list refresh `known_models` is returned by `/api/status` and refreshed in the dashboard without full reload: ```409:419:src/docsfy/main.py @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: ... known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` ```1886:1891:src/docsfy/templates/dashboard.html // Update known models from the API so new models // appear in dropdowns without a full page reload. if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` ## Cursor-Specific Behavior For `cursor`, `docsfy` always adds `--trust` when checking availability and running generation calls. 
```732:735:src/docsfy/main.py cli_flags = ["--trust"] if ai_provider == "cursor" else None available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) ``` ```41:49:src/docsfy/generator.py # Build CLI flags based on provider cli_flags = ["--trust"] if ai_provider == "cursor" else None success, output = await call_ai_cli( prompt=prompt, cwd=repo_path, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, cli_flags=cli_flags, ) ``` > **Warning:** `cursor` runs with trust mode enabled by default in this app flow; only generate docs for repositories you trust. ## Secrets Hygiene in Tooling `.env` is ignored by git, and pre-commit includes secret scanners: ```1:4:.gitignore # Environment files with secrets .env .dev/.env *.env.local ``` ```38:52:.pre-commit-config.yaml - repo: https://github.com/Yelp/detect-secrets rev: v1.5.0 hooks: - id: detect-secrets ... - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks ``` --- Source: storage-paths.md # Storage Paths and Data Layout docsfy keeps **persistent runtime state** under `DATA_DIR`, with a clear split between: - SQLite metadata (`docsfy.db`) - per-variant filesystem artifacts (`projects/...`) - generated static documentation site output (`site/...`) ## DATA_DIR Usage `DATA_DIR` is a first-class setting, defaulting to `/data`, and is wired into startup DB initialization. 
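The `DATA_DIR` wiring can be condensed into one helper; a sketch of the paths `init_db()` computes, not the actual module-level globals:

```python
from pathlib import Path


def storage_paths(data_dir: str = "/data") -> dict[str, Path]:
    """Derive the three persistent locations from DATA_DIR, as
    src/docsfy/storage.py does at init time."""
    base = Path(data_dir)
    return {
        "db_path": base / "docsfy.db",
        "data_dir": base,
        "projects_dir": base / "projects",
    }
```

Overriding `DATA_DIR` moves all three together, which is why a single volume mount at `/data` is sufficient in containers.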
```python
# src/docsfy/config.py
class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )

    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev
```

```python
# src/docsfy/main.py
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    settings = get_settings()
    if not settings.admin_key:
        logger.error("ADMIN_KEY environment variable is required")
        raise SystemExit(1)
    if len(settings.admin_key) < 16:
        logger.error("ADMIN_KEY must be at least 16 characters long")
        raise SystemExit(1)
    _generating.clear()
    await init_db(data_dir=settings.data_dir)
    await cleanup_expired_sessions()
    yield
```

```python
# src/docsfy/storage.py
DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db"
DATA_DIR = Path(os.getenv("DATA_DIR", "/data"))
PROJECTS_DIR = DATA_DIR / "projects"

async def init_db(data_dir: str = "") -> None:
    global DB_PATH, DATA_DIR, PROJECTS_DIR
    if data_dir:
        DB_PATH = Path(data_dir) / "docsfy.db"
        DATA_DIR = Path(data_dir)
        PROJECTS_DIR = DATA_DIR / "projects"
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    PROJECTS_DIR.mkdir(parents=True, exist_ok=True)
```

> **Note:** `.env.example` does not currently include `DATA_DIR`, but the app supports it via `Settings.data_dir` and `os.getenv("DATA_DIR", "/data")`.

## SQLite DB Location and Contents

SQLite DB path:

- `<DATA_DIR>/docsfy.db`

The DB is initialized in `init_db()` and includes project metadata plus auth/session data.
```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) """) ``` Additional tables created in the same function: - `users` - `project_access` - `sessions` `projects` uses a 4-part key `(name, ai_provider, ai_model, owner)`, which mirrors the on-disk variant path layout. ## Project Filesystem Layout Project artifacts are stored under `/projects/` and partitioned by owner, repo, provider, and model. ```python # src/docsfy/storage.py def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: if not ai_provider or not ai_model: msg = "ai_provider and ai_model are required for project directory paths" raise ValueError(msg) # Sanitize path segments to prevent traversal for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]: if ( "/" in segment or "\\" in segment or ".." 
in segment
        or segment.startswith(".")
    ):
        msg = f"Invalid {segment_name}: '{segment}'"
        raise ValueError(msg)

    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model


def get_project_site_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    return get_project_dir(name, ai_provider, ai_model, owner) / "site"


def get_project_cache_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages"
```

Expected tree for one variant:

```text
<DATA_DIR>/
  docsfy.db
  projects/
    <owner>/
      <project>/
        <provider>/
          <model>/
            plan.json
            cache/
              pages/
                <slug>.md
            site/
              .nojekyll
              index.html
              <slug>.html
              <slug>.md
              search-index.json
              llms.txt
              llms-full.txt
              assets/   (copied files from src/docsfy/static/)
```

Owner fallback behavior is tested:

```python
# tests/test_storage.py
path = get_project_dir("my-repo", "claude", "opus", "")
assert "_default" in str(path)
```

## Project Cache Paths

Cache files are markdown pages stored at:

- `<DATA_DIR>/projects/<owner>/<project>/<provider>/<model>/cache/pages/<slug>.md`

Write/read behavior:

```python
# src/docsfy/generator.py
cache_file = cache_dir / f"{slug}.md"
if use_cache and cache_file.exists():
    logger.debug(f"[{_label}] Using cached page: {slug}")
    return cache_file.read_text(encoding="utf-8")
...
cache_dir.mkdir(parents=True, exist_ok=True)
cache_file.write_text(output, encoding="utf-8")
```

Invalidation behavior in the generation flow:

```python
# src/docsfy/main.py
if force:
    cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
    logger.info(f"[{project_name}] Cleared cache (force=True)")
```

```python
# src/docsfy/main.py
cache_file = cache_dir / f"{slug}.md"
...
if cache_file.exists():
    cache_file.unlink()
```

- `force=true` removes the entire variant cache.
- Incremental regeneration removes only the selected cached pages.
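The two invalidation modes can be sketched side by side; a simplified helper under the cache layout described above (docsfy performs these steps inline in `src/docsfy/main.py` rather than via a named function):

```python
import shutil
from pathlib import Path


def invalidate_cache(cache_dir: Path, force: bool, stale_slugs: list[str]) -> None:
    """force=True drops the whole variant cache; otherwise only the
    cached pages for changed slugs are removed."""
    if force:
        if cache_dir.exists():
            shutil.rmtree(cache_dir)
        return
    for slug in stale_slugs:
        page = cache_dir / f"{slug}.md"
        if page.exists():
            page.unlink()
```

This mirrors the trade-off above: `force` regenerates everything from scratch, while incremental runs keep unchanged pages cached.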
## Generated Site Directories Final rendered docs are written to each variant’s `site/` directory, while `plan.json` is written in the variant root. ```python # src/docsfy/main.py site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) project_dir = get_project_dir(project_name, ai_provider, ai_model, owner) (project_dir / "plan.json").write_text(json.dumps(plan, indent=2), encoding="utf-8") ``` `render_site()` fully rebuilds the output directory and writes the final artifact set: ```python # src/docsfy/renderer.py def render_site(plan: dict[str, Any], pages: dict[str, str], output_dir: Path) -> None: if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() ... (output_dir / "index.html").write_text(index_html, encoding="utf-8") ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") ... (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) ... (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` > **Warning:** `render_site()` deletes the previous `site/` directory before writing new output. Treat `site/` as generated output only. ## Container and Runtime Path Mapping Containerized runs are explicitly wired to `/data` for persistence: ```yaml # docker-compose.yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ```dockerfile # Dockerfile RUN useradd --create-home --shell /bin/bash -g 0 appuser \ && mkdir -p /data \ && chown appuser:0 /data \ && chmod -R g+w /data ``` Generated data is intentionally not tracked in git: ```gitignore # .gitignore # Data data/ .dev/data/ ``` > **Tip:** In Docker deployments, back up the host-side `./data` directory to preserve both `docsfy.db` and generated docs artifacts. ## Ephemeral (Non-persistent) Paths Not all file activity is under `DATA_DIR`: - Remote repo cloning uses a temporary directory. - Download archives are created as temporary `.tar.gz` files and removed after streaming. ```python # src/docsfy/main.py with tempfile.TemporaryDirectory() as tmp_dir: repo_dir, commit_sha = await asyncio.to_thread( clone_repo, repo_url, Path(tmp_dir) ) ``` ```python # src/docsfy/main.py tmp = tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) tar_path = Path(tmp.name) tmp.close() ... finally: tar_path.unlink(missing_ok=True) ``` > **Note:** This repository currently has no `.github/workflows/` or `.gitlab-ci.yml`; storage behavior is defined by runtime code and container configuration. --- Source: session-cookie-settings.md # Session and Cookie Settings docsfy supports two authentication paths: Bearer tokens for API clients and cookies for browser sessions. `src/docsfy/main.py` ```python # 1. Check Authorization header (API clients) auth_header = request.headers.get("authorization", "") if auth_header.startswith("Bearer "): token = auth_header[7:] if token == settings.admin_key: is_admin = True username = "admin" else: user = await get_user_by_key(token) # 2. 
Check session cookie (browser) -- opaque session token if not user and not is_admin: session_token = request.cookies.get("docsfy_session") if session_token: session = await get_session(session_token) ``` ## Secure Cookie Defaults `SECURE_COOKIES` is enabled by default, and session cookies are set as `HttpOnly` with `SameSite=Strict`. `src/docsfy/config.py` ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` `src/docsfy/main.py` ```python response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` `src/docsfy/main.py` ```python response.delete_cookie( "docsfy_session", httponly=True, samesite="strict", secure=settings.secure_cookies, ) ``` > **Warning:** With default settings, browsers do not send `Secure` cookies over plain HTTP. If you run docsfy on `http://localhost` and keep `SECURE_COOKIES=true`, login may appear to work but follow-up requests can redirect back to `/login`. ## SameSite Behavior docsfy explicitly uses `SameSite=Strict` for session cookies, which blocks cookie sending in cross-site requests and helps reduce CSRF risk. 
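For illustration, the attribute set docsfy applies can be reproduced with the standard library's `http.cookies` (a sketch only; Starlette emits the real header, the token value here is a placeholder, and attribute ordering may differ):

```python
from http.cookies import SimpleCookie

# Build a Set-Cookie value with the same attributes docsfy uses.
# "opaque-token" is a placeholder; real tokens come from secrets.token_urlsafe(32).
cookie = SimpleCookie()
cookie["docsfy_session"] = "opaque-token"
cookie["docsfy_session"]["httponly"] = True
cookie["docsfy_session"]["samesite"] = "strict"
cookie["docsfy_session"]["secure"] = True
cookie["docsfy_session"]["max-age"] = 28800  # 8 hours, matching SESSION_TTL_SECONDS

header = cookie["docsfy_session"].OutputString()
# header contains: docsfy_session=opaque-token, Max-Age=28800,
# Secure, HttpOnly, and SameSite=strict
```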
`src/docsfy/main.py`

```python
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

`tests/test_auth.py`

```python
async def test_login_cookie_has_samesite_strict(
    unauthed_client: AsyncClient,
) -> None:
    """Login cookie should have SameSite=strict."""
    response = await unauthed_client.post(
        "/login",
        data={"username": "admin", "api_key": TEST_ADMIN_KEY},
        follow_redirects=False,
    )
    set_cookie = response.headers.get("set-cookie", "")
    assert "samesite=strict" in set_cookie.lower()
```

> **Tip:** For cross-origin integrations, use an `Authorization: Bearer <token>` header rather than relying on browser cookies.

## TTL and Session Expiration

Session lifetime is 8 hours, enforced both in the cookie (`max_age`) and in server-side session lookup (`expires_at > now`).

`src/docsfy/storage.py`

```python
SESSION_TTL_SECONDS = 28800  # 8 hours
SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600
```

`src/docsfy/storage.py`

```python
async def create_session(
    username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS
) -> str:
    """Create an opaque session token."""
    token = secrets.token_urlsafe(32)
    token_hash = _hash_session_token(token)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S")
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)",
            (token_hash, username, 1 if is_admin else 0, expires_str),
        )
```

`src/docsfy/storage.py`

```python
async def get_session(token: str) -> dict[str, str | int | None] | None:
    """Look up a session. Returns None if expired or not found."""
    token_hash = _hash_session_token(token)
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        cursor = await db.execute(
            "SELECT * FROM sessions WHERE token = ? AND expires_at > datetime('now')",
            (token_hash,),
        )
```

`src/docsfy/main.py`

```python
await cleanup_expired_sessions()
```

`src/docsfy/storage.py`

```python
async def cleanup_expired_sessions() -> None:
    """Remove expired sessions.

    NOTE: This is called during application startup (lifespan) only.
    """
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute("DELETE FROM sessions WHERE expires_at <= datetime('now')")
        await db.commit()
```

> **Note:** Expired sessions are rejected even before cleanup runs, because `get_session()` filters by `expires_at` on every lookup.

## Opaque Session Tokens (Not API Keys)

Browser cookies carry a random session token, not the raw user/admin API key.

`tests/test_auth.py`

```python
async def test_session_cookie_is_opaque_token(unauthed_client: AsyncClient) -> None:
    """The session cookie should NOT contain the raw API key."""
    response = await unauthed_client.post(
        "/login",
        data={"username": "admin", "api_key": TEST_ADMIN_KEY},
        follow_redirects=False,
    )
    assert "docsfy_session" in response.cookies
    cookie_value = response.cookies["docsfy_session"]
    assert cookie_value != TEST_ADMIN_KEY
    assert len(cookie_value) > 20
```

## Local HTTP Development Adjustments

For local non-TLS development, explicitly disable secure cookies in `.env`.

`.env.example`

```bash
# Set to false for local HTTP development
# SECURE_COOKIES=false
```

`docker-compose.yaml`

```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
```

Set this in your local `.env`:

```bash
SECURE_COOKIES=false
```

Then restart the app/container so `Settings` reloads the value.

> **Warning:** Do not use `SECURE_COOKIES=false` outside local HTTP development.

## Test/Automation Coverage for Cookie Rules

Cookie/session behavior is covered in unit tests and executed via `tox`.
`tox.toml` ```toml envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` This includes tests for: - `SameSite=strict` cookie headers - opaque session cookie values - session invalidation on logout - expired-session cleanup behavior --- Source: model-discovery-and-defaults.md # Model Discovery and Defaults docsfy builds its model picker suggestions from **real, successful generations** instead of a hardcoded model list. That keeps suggestions aligned with what has actually worked in your deployment. ## How a model becomes “known” A model is considered known only when a project variant is stored with: - non-empty `ai_provider` - non-empty `ai_model` - `status = 'ready'` ```python async def get_known_models() -> dict[str, list[str]]: """Get distinct ai_model values per ai_provider from completed projects.""" async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute( "SELECT DISTINCT ai_provider, ai_model FROM projects WHERE ai_provider != '' AND ai_model != '' AND status = 'ready' ORDER BY ai_provider, ai_model" ) rows = await cursor.fetchall() models: dict[str, list[str]] = {} for provider, model in rows: if provider not in models: models[provider] = [] if model not in models[provider]: models[provider].append(model) return models ``` > **Warning:** `get_known_models()` is instance-wide. It does not filter by owner, so the suggestion catalog is shared across users in the same docsfy instance. ## When discovery happens in the generation lifecycle Discovery is not a separate job. 
It happens naturally because variants are marked `ready`, then picked up by `get_known_models()`: ```python if old_sha == commit_sha: await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` ```python await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) ``` > **Tip:** If you want picker suggestions pre-populated for a provider/model pair, run one successful generation with that pair first. ## Default provider/model behavior Defaults come from settings (`.env` or environment), with built-in fallbacks: ```python class Settings(BaseSettings): ... ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) ``` ```bash # .env.example AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` If a generation request omits provider/model, API defaults are applied: ```python settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model ``` ## How dashboard pickers are populated The dashboard route injects both defaults and discovered models: ```python known_models = await get_known_models() ... html = template.render( grouped_projects=grouped, projects=projects, default_provider=settings.ai_provider, default_model=settings.ai_model, known_models=known_models, role=request.state.role, username=request.state.username, ) ``` The template uses those values for: - the top-level Generate form - each variant’s Regenerate controls ```html
{% for provider, models in known_models.items() %}
  {% for model in models %}
    <!-- suggestion option markup elided in this excerpt: shows model and provider -->
    {{ model }} {{ provider }}
  {% endfor %}
{% endfor %}
``` ## Picker UX rules in the browser The client receives `known_models` as JSON and enforces provider-aware filtering: ```javascript var knownModels = {{ known_models | tojson }}; providerSelect.addEventListener('change', function() { if (_restoring) return; var newProvider = this.value; var modelsForProvider = knownModels[newProvider] || []; // If current model is not valid for the new provider, auto-fill if (modelInput) { var currentModel = modelInput.value; if (modelsForProvider.length > 0 && modelsForProvider.indexOf(currentModel) === -1) { modelInput.value = modelsForProvider[0]; saveFormState(); } else if (modelsForProvider.length === 0) { modelInput.value = ''; modelInput.placeholder = 'Enter model name'; saveFormState(); } } filterModelOptions(modelDropdown, modelInput ? modelInput.value : '', newProvider); }); ``` The same provider-switch/autofill logic is also applied to per-variant regenerate controls. > **Note:** Picker suggestions are assistive, not a strict backend whitelist. Users can type a model manually; backend validation only requires a valid provider and non-empty model string. ## Live model discovery updates in running dashboards `/api/status` includes `known_models` on every poll response: ```python @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: ... known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` The dashboard polling loop updates model dropdowns without full refresh: ```javascript if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` This means newly successful variants can teach new models to active dashboard sessions. 
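For reference, the shape of the `known_models` payload carried by both the dashboard render and `/api/status` can be sketched in plain Python (the rows below are hypothetical examples of `(ai_provider, ai_model)` pairs):

```python
# Hypothetical rows, shaped like the DISTINCT (ai_provider, ai_model)
# query results that feed get_known_models()
rows = [
    ("claude", "opus-4-6"),
    ("claude", "sonnet-4-6"),
    ("gemini", "gemini-2.5-pro"),
]

known_models: dict[str, list[str]] = {}
for provider, model in rows:
    known_models.setdefault(provider, [])
    if model not in known_models[provider]:
        known_models[provider].append(model)

# known_models == {"claude": ["opus-4-6", "sonnet-4-6"],
#                  "gemini": ["gemini-2.5-pro"]}
```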
## Validation and quality signals (tests + CI entry points) Model discovery and defaults are covered by tests: ```python # tests/test_storage.py models = await get_known_models() assert "claude" in models assert "opus-4-6" in models["claude"] assert "sonnet-4-6" in models["claude"] assert "gemini" in models assert "gemini-2.5-pro" in models["gemini"] ``` ```python # tests/test_config.py assert settings.ai_provider == "claude" assert settings.ai_model == "claude-opus-4-6[1m]" assert settings.ai_cli_timeout == 60 ``` Pipeline entry points in this repo are defined via `tox` and pre-commit: ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```yaml # .pre-commit-config.yaml (excerpt) repos: - repo: https://github.com/astral-sh/ruff-pre-commit hooks: - id: ruff - id: ruff-format ``` > **Note:** No `.github/workflows` pipeline is committed in this repository; CI systems should invoke `tox` and pre-commit hooks directly. --- Source: dashboard-workflow.md # Dashboard Workflow The dashboard is a server-rendered page at `/` (`src/docsfy/templates/dashboard.html`) with live updates from `/api/status`. It is built around **project variants** (`name + ai_provider + ai_model + owner`) and presents them grouped by repository name. ## How project listing works On page load, the backend resolves visible projects based on the authenticated user role, then groups variants by repository name. 
From `src/docsfy/main.py`:

```python
@app.get("/", response_class=HTMLResponse)
async def dashboard(request: Request) -> HTMLResponse:
    settings = get_settings()
    username = request.state.username
    is_admin = request.state.is_admin
    if is_admin:
        projects = await list_projects()  # admin sees all
    else:
        accessible = await get_user_accessible_projects(username)
        projects = await list_projects(owner=username, accessible=accessible)
    known_models = await get_known_models()
    # Group by repo name
    grouped: dict[str, list[dict[str, Any]]] = {}
    for p in projects:
        name = str(p["name"])
        if name not in grouped:
            grouped[name] = []
        grouped[name].append(p)
    template = _jinja_env.get_template("dashboard.html")
    html = template.render(
        grouped_projects=grouped,
        projects=projects,  # keep for backward compat
        default_provider=settings.ai_provider,
        default_model=settings.ai_model,
        known_models=known_models,
        role=request.state.role,
        username=request.state.username,
    )
    return HTMLResponse(content=html)
```

From `src/docsfy/storage.py` (project visibility and ordering):

```python
async def list_projects(
    owner: str | None = None,
    accessible: list[tuple[str, str]] | None = None,
) -> list[dict[str, str | int | None]]:
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        if owner is not None and accessible and len(accessible) > 0:
            # Build OR conditions for each (name, owner) pair
            conditions = ["(owner = ?)"]
            params: list[str] = [owner]
            for proj_name, proj_owner in accessible:
                conditions.append("(name = ? AND owner = ?)")
                params.extend([proj_name, proj_owner])
            query = f"SELECT * FROM projects WHERE {' OR '.join(conditions)} ORDER BY updated_at DESC"
            cursor = await db.execute(query, params)
        elif owner is not None:
            cursor = await db.execute(
                "SELECT * FROM projects WHERE owner = ? ORDER BY updated_at DESC",
                (owner,),
            )
        else:
            cursor = await db.execute("SELECT * FROM projects ORDER BY updated_at DESC")
        rows = await cursor.fetchall()
        return [dict(row) for row in rows]
```

## Variant cards and status-driven actions

Each project group contains one or more variant cards. Actions change based on variant status and role.

From `src/docsfy/templates/dashboard.html`:

```html
{% for repo_name, variants in grouped_projects.items() %}
  <!-- group header markup elided: repo name and variant count -->
  {{ repo_name }}
  {{ variants|length }} variant{{ 's' if variants|length > 1 else '' }}
  {% for variant in variants %}
    {% if variant.status == 'ready' %}
      <!-- "View Docs" and "Download" links (markup elided) -->
      {% if role != 'viewer' %}<!-- write-control markup elided -->{% endif %}
      {% if role != 'viewer' %}
        {{ regen_controls(variant, repo_name, default_provider, default_model, known_models) }}
      {% endif %}
    {% elif variant.status == 'generating' %}
      <!-- "Generating..." indicator and "View progress →" link (markup elided) -->
      {% if role != 'viewer' %}<!-- write-control markup elided -->{% endif %}
    {% elif variant.status == 'error' or variant.status == 'aborted' %}
      {{ variant.error_message }}
      {% if role != 'viewer' %}
        {{ regen_controls(variant, repo_name, default_provider, default_model, known_models) }}
      {% endif %}
    {% endif %}
  {% endfor %}
{% endfor %} ``` ## Filtering and pagination Filtering and pagination are done in the browser over already-rendered project groups. From `src/docsfy/templates/dashboard.html`: ```javascript var currentPage = 1; var perPage = 10; function getVisibleGroups() { /* Get project groups that match the search filter (not hidden by search) */ return Array.from(document.querySelectorAll('.project-group')).filter(function(group) { return !group.classList.contains('search-hidden'); }); } function applyPagination() { var groups = getVisibleGroups(); var totalPages = Math.max(1, Math.ceil(groups.length / perPage)); if (currentPage > totalPages) currentPage = totalPages; var start = (currentPage - 1) * perPage; var end = start + perPage; groups.forEach(function(group, i) { group.style.display = (i >= start && i < end) ? '' : 'none'; }); var pageInfo = document.getElementById('page-info'); var prevBtn = document.getElementById('prev-page'); var nextBtn = document.getElementById('next-page'); if (pageInfo) pageInfo.textContent = 'Page ' + currentPage + ' of ' + totalPages; if (prevBtn) prevBtn.disabled = currentPage <= 1; if (nextBtn) nextBtn.disabled = currentPage >= totalPages; } ``` ```javascript var searchInput = document.getElementById('search-filter'); if (searchInput) { searchInput.addEventListener('input', function() { var query = this.value.toLowerCase().trim(); var groups = document.querySelectorAll('.project-group'); groups.forEach(function(group) { var name = group.getAttribute('data-repo').toLowerCase(); if (!query || name.indexOf(query) !== -1) { group.classList.remove('search-hidden'); } else { group.classList.add('search-hidden'); group.style.display = 'none'; } }); currentPage = 1; applyPagination(); }); } ``` > **Note:** Search matches only `data-repo` (repository name), not provider/model text, and pagination applies to visible project groups after filtering. ## Role-based UI and server enforcement The dashboard has three roles: `admin`, `user`, and `viewer`. 
- `admin`: sees all projects, admin link, owner badges, and write controls. - `user`: sees owned + granted projects and write controls (no admin panel link). - `viewer`: read-only dashboard (no generate/regenerate/delete/abort controls). From `src/docsfy/templates/dashboard.html`: ```html {% if role == 'admin' %} Admin {% endif %} {% if role != 'viewer' %}

<!-- generate form markup elided -->
Generate Documentation
...
{% endif %} {% if role == 'admin' and variant.owner %} {{ variant.owner }} {% endif %} ``` From `src/docsfy/main.py`: ```python def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ) ``` ```python @app.post("/api/generate", status_code=202) async def generate(request: Request, gen_request: GenerateRequest) -> dict[str, str]: _require_write_access(request) # Fix 9: Local repo path access requires admin privileges if gen_request.repo_path and not request.state.is_admin: raise HTTPException( status_code=403, detail="Local repo path access requires admin privileges", ) ``` From `tests/test_auth.py`: ```python async def test_viewer_can_view_dashboard(_init_db: None) -> None: ... response = await ac.get("/") assert response.status_code == 200 # Viewer should NOT see the generate form assert "Generate Documentation" not in response.text ``` ```python async def test_viewer_cannot_generate(_init_db: None) -> None: ... response = await ac.post( "/api/generate", json={ "repo_url": "https://github.com/org/repo", "project_name": "test-proj", }, ) assert response.status_code == 403 assert "Write access required" in response.json()["detail"] ``` > **Warning:** Write permissions are enforced server-side, not only hidden in the UI. Direct API calls from viewer accounts are rejected with `403`. ## Generation form behavior The generate form is shown to non-viewers and includes: - `Repository URL` (`required`, URL input) - `Provider` (`claude`, `gemini`, `cursor`) - `Model` (free text + provider-filtered combobox suggestions) - `Force` checkbox From `src/docsfy/templates/dashboard.html`: ```html
<!-- generate form markup elided: repository URL input, provider select, model combobox, Force checkbox -->
```

### Form state persistence

The form persists state in `sessionStorage` and restores it after reloads (useful because status changes may trigger auto-reloads).

```javascript
function saveFormState() {
  var repoInput = document.getElementById('gen-repo-url');
  var providerSelect = document.getElementById('gen-provider');
  var modelInput = document.getElementById('gen-model');
  var forceCheck = document.getElementById('gen-force');
  if (repoInput) sessionStorage.setItem('docsfy-repo', repoInput.value);
  if (providerSelect) sessionStorage.setItem('docsfy-provider', providerSelect.value);
  if (modelInput) sessionStorage.setItem('docsfy-model', modelInput.value);
  if (forceCheck) sessionStorage.setItem('docsfy-force', forceCheck.checked ? '1' : '0');
}
```

### Submit behavior

On submit, the UI disables the button, sends `POST /api/generate`, shows a toast with a status link, and reloads.

```javascript
form.addEventListener('submit', function(e) {
  e.preventDefault();
  var repoUrl = document.getElementById('gen-repo-url').value.trim();
  var provider = document.getElementById('gen-provider').value;
  var model = document.getElementById('gen-model').value.trim();
  var force = document.getElementById('gen-force').checked;
  var body = { repo_url: repoUrl, ai_provider: provider, force: force };
  if (model) body.ai_model = model;
  fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'same-origin',
    redirect: 'manual',
    body: JSON.stringify(body)
  })
```

### Backend request validation

From `src/docsfy/models.py`:

```python
@model_validator(mode="after")
def validate_source(self) -> GenerateRequest:
    if not self.repo_url and not self.repo_path:
        msg = "Either 'repo_url' or 'repo_path' must be provided"
        raise ValueError(msg)
    if self.repo_url and self.repo_path:
        msg = "Provide either 'repo_url' or 'repo_path', not both"
        raise ValueError(msg)
    return self
```

```python
@property
def project_name(self) -> str:
    if self.repo_url:
        name = self.repo_url.rstrip("/").split("/")[-1]
        if name.endswith(".git"):
            name = name[:-4]
        return name
    if self.repo_path:
        return Path(self.repo_path).resolve().name
    return "unknown"
```

## Generation lifecycle, duplicate protection, and `force`

### API-side orchestration

From `src/docsfy/main.py`:

```python
settings = get_settings()
ai_provider = gen_request.ai_provider or settings.ai_provider
ai_model = gen_request.ai_model or settings.ai_model
project_name = gen_request.project_name
owner = request.state.username

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
if not ai_model:
    raise HTTPException(status_code=400, detail="AI model must be specified.")

# Fix 6: Use lock to prevent race condition between check and add
gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}"
async with _gen_lock:
    if gen_key in _generating:
        raise HTTPException(
            status_code=409,
            detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated",
        )
    await save_project(
        name=project_name,
        repo_url=gen_request.repo_url or gen_request.repo_path or "",
        status="generating",
        ai_provider=ai_provider,
        ai_model=ai_model,
        owner=owner,
    )
    task = asyncio.create_task(
        _run_generation(
            repo_url=gen_request.repo_url,
            repo_path=gen_request.repo_path,
            project_name=project_name,
            ai_provider=ai_provider,
            ai_model=ai_model,
            ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout,
            force=gen_request.force,
            owner=owner,
        )
    )
    _generating[gen_key] = task
return {"project": project_name, "status": "generating"}
```

### `force` and incremental behavior

From `src/docsfy/main.py` (`_generate_from_path`):

```python
if force:
    cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
        logger.info(f"[{project_name}] Cleared cache (force=True)")
    # Reset page count so API shows 0 during regeneration
    await update_project_status(
        project_name,
        ai_provider,
        ai_model,
        status="generating",
        owner=owner,
        page_count=0,
    )
else:
    existing = await get_project(
        project_name, ai_provider=ai_provider, ai_model=ai_model, owner=owner
    )
    if existing and existing.get("last_generated"):
        old_sha = (
            str(existing["last_commit_sha"])
            if existing.get("last_commit_sha")
            else None
        )
        if old_sha == commit_sha:
            logger.info(
                f"[{project_name}] Project is up to date at {commit_sha[:8]}"
            )
            await update_project_status(
                project_name,
                ai_provider,
                ai_model,
                status="ready",
                owner=owner,
                current_stage="up_to_date",
            )
            return
```

```python
if old_sha and old_sha != commit_sha and not force and existing:
    changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
    ...
    pages_to_regen = await run_incremental_planner(
        repo_dir,
        project_name,
        ai_provider,
        ai_model,
        changed_files,
        existing_plan,
        ai_cli_timeout,
    )
    if pages_to_regen != ["all"]:
        # Delete only the cached pages that need regeneration
        for slug in pages_to_regen:
            ...
            cache_file = cache_dir / f"{slug}.md"
            ...
            if cache_file.exists():
                cache_file.unlink()
        use_cache = True
```

> **Tip:** Keep `Force` unchecked for normal runs to allow up-to-date short-circuiting and incremental regeneration from cache; use `Force` when you need a full clean rebuild.

## Polling behavior and live refresh

The dashboard uses two polling loops:

- `10s` status polling for variant state changes/new cards.
- `5s` progress polling while any variant is generating.
From `src/docsfy/templates/dashboard.html`: ```javascript var statusPollInterval = null; // Slow poll for status changes (10s) var progressPollInterval = null; // Fast poll for progress updates (5s) function startStatusPolling() { if (isStatusPolling) return; isStatusPolling = true; statusPollInterval = setInterval(pollStatusChanges, 10000); } function startProgressPolling() { if (isProgressPolling) return; isProgressPolling = true; progressPollInterval = setInterval(pollProgressUpdates, 5000); } ``` The same `/api/status` response also refreshes known model suggestions dynamically: ```javascript if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` ## Configuration relevant to dashboard workflow ### Default generation settings From `.env.example`: ```env ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # SECURE_COOKIES=false ``` From `src/docsfy/config.py`: ```python class Settings(BaseSettings): ... admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) ... secure_cookies: bool = True # Set to False for local HTTP dev ``` ### Persistence/deployment From `docker-compose.yaml`: ```yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` `./data` persists database state and generated project artifacts that drive dashboard listings/status across restarts. ## Verification references - `tests/test_dashboard.py`: dashboard rendering, empty state, and project visibility. - `tests/test_auth.py`: role behavior (admin/user/viewer), ownership scoping, access grants, and server-side permission checks. - `tests/test_main.py`: `/api/generate`, duplicate generation conflicts (`409`), and endpoint behavior. 
- `test-plans/e2e-ui-test-plan.md`: manual/E2E scenarios for search, pagination, regenerate, abort, and role-specific UI. > **Note:** No `.github/workflows` pipeline files are present in this repository; dashboard workflow correctness is primarily represented by the test suite and E2E plan. --- Source: managing-variants.md # Managing Variants A **variant** in docsfy is a generated documentation build for a specific combination of: - project name - AI provider - AI model - owner (user scope) Variants are first-class objects across API, storage, UI, and docs serving routes. ## Variant identity and storage model The `projects` table keys variants by `(name, ai_provider, ai_model, owner)`: ```python await db.execute(""" CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) """) ``` Variant artifacts are also stored in owner-scoped filesystem paths: ```python def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: if not ai_provider or not ai_model: msg = "ai_provider and ai_model are required for project directory paths" raise ValueError(msg) # Sanitize path segments to prevent traversal for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]: if ( "/" in segment or "\\" in segment or ".." 
in segment or segment.startswith(".") ): msg = f"Invalid {segment_name}: '{segment}'" raise ValueError(msg) safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` > **Note:** Owner scoping means two users can have the same `name/provider/model` variant without clobbering each other. ## Configure default provider/model docsfy defaults come from environment-backed settings: ```yaml # .env.example AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```python class Settings(BaseSettings): ... ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) data_dir: str = "/data" ``` At runtime, request values override defaults: ```python settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model project_name = gen_request.project_name owner = request.state.username ``` If you run with Docker Compose, generated variants persist under `./data`: ```yaml services: docsfy: ... env_file: .env volumes: - ./data:/data ``` ## Create a variant Creation and regeneration both use `POST /api/generate`. 
Request schema: ```python class GenerateRequest(BaseModel): repo_url: str | None = Field( default=None, description="Git repository URL (HTTPS or SSH)" ) repo_path: str | None = Field(default=None, description="Local git repository path") ai_provider: Literal["claude", "gemini", "cursor"] | None = None ai_model: str | None = None ai_cli_timeout: int | None = Field(default=None, gt=0) force: bool = Field( default=False, description="Force full regeneration, ignoring cache" ) @model_validator(mode="after") def validate_source(self) -> GenerateRequest: if not self.repo_url and not self.repo_path: msg = "Either 'repo_url' or 'repo_path' must be provided" raise ValueError(msg) if self.repo_url and self.repo_path: msg = "Provide either 'repo_url' or 'repo_path', not both" raise ValueError(msg) return self ``` Example from tests: ```python response = await client.post( "/api/generate", json={"repo_url": "https://github.com/org/repo.git", "force": True}, ) assert response.status_code == 202 ``` When generation starts, docsfy stores the variant row and starts a background task: ```python gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" async with _gen_lock: if gen_key in _generating: raise HTTPException( status_code=409, detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated", ) await save_project( name=project_name, repo_url=gen_request.repo_url or gen_request.repo_path or "", status="generating", ai_provider=ai_provider, ai_model=ai_model, owner=owner, ) ... ``` > **Warning:** `repo_path` generation is admin-only, and viewers cannot create variants. 
>
> - `Local repo path access requires admin privileges` (403)
> - `Write access required.` for viewer role (403)

## Regenerate a variant

### UI flow (dashboard + status page)

The dashboard renders per-variant controls with a Force checkbox. The Regenerate action sends a new `POST /api/generate` request:

```javascript
var body = { repo_url: repoUrl, ai_provider: provider, force: force };
if (model) body.ai_model = model;
fetch('/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  credentials: 'same-origin',
  redirect: 'manual',
  body: JSON.stringify(body)
})
```

### Non-force regeneration (`force=false`)

docsfy tries to avoid unnecessary full rebuilds:

- if the commit SHA is unchanged, it marks the variant `ready` with stage `up_to_date`
- if commits differ, it can run incremental planning and selectively invalidate cached pages
- page generation uses the cache when appropriate

```python
if existing and existing.get("last_generated"):
    old_sha = (
        str(existing["last_commit_sha"])
        if existing.get("last_commit_sha")
        else None
    )
    if old_sha == commit_sha:
        ...
        await update_project_status(
            project_name,
            ai_provider,
            ai_model,
            status="ready",
            owner=owner,
            current_stage="up_to_date",
        )
        return
...
if old_sha and old_sha != commit_sha and not force and existing:
    changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
    ...
``` ```python pages = await generate_all_pages( repo_path=repo_dir, plan=plan, cache_dir=cache_dir, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, use_cache=use_cache if use_cache else not force, project_name=project_name, owner=owner, ) ``` ### Force regeneration (`force=true`) Force mode clears the variant page cache and resets page count during regeneration: ```python if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") # Reset page count so API shows 0 during regeneration await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) ``` > **Tip:** Use Force when you want a guaranteed clean rebuild (for example after major doc structure/model changes), not just incremental page updates. ## Delete variants safely Variant deletion endpoint: - `DELETE /api/projects/{name}/{provider}/{model}` Safety behavior in backend: 1. Requires write access 2. Blocks deletion if the variant is currently generating (`409`) 3. Resolves the target variant with ownership/access rules 4. Deletes DB record 5. Deletes variant directory from disk ```python for key in _generating: parts = key.split("/", 3) if ( len(parts) == 4 and parts[1] == name and parts[2] == provider and parts[3] == model ): raise HTTPException( status_code=409, detail=f"Cannot delete '{name}/{provider}/{model}' while generation is in progress. Abort first.", ) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) project_owner = str(project.get("owner", "")) deleted = await delete_project( name, ai_provider=provider, ai_model=model, owner=project_owner ) ... 
project_dir = get_project_dir(name, provider, model, project_owner) if project_dir.exists(): shutil.rmtree(project_dir) ``` The dashboard also forces an explicit confirmation: ```javascript var confirmed = await modalConfirm('Delete Variant', 'Are you sure you want to delete "' + variantPath + '"? This will remove the generated documentation for this variant and cannot be undone.', true); if (!confirmed) return; ... fetch('/api/projects/' + encodeURIComponent(name) + '/' + encodeURIComponent(provider) + '/' + encodeURIComponent(model), { method: 'DELETE', credentials: 'same-origin', redirect: 'manual' }) ``` If the deleted variant was the last one for that project/owner pair, access grants are cleaned up: ```python # Clean up project_access if no more variants remain for this name+owner if cursor.rowcount > 0 and owner is not None: remaining = await db.execute( "SELECT COUNT(*) FROM projects WHERE name = ? AND owner = ?", (name, owner), ) row = await remaining.fetchone() if row and row[0] == 0: await db.execute( "DELETE FROM project_access WHERE project_name = ? AND project_owner = ?", (name, owner), ) ``` > **Warning:** You cannot delete an actively generating variant. Abort it first via `POST /api/projects/{name}/{provider}/{model}/abort`, then delete. 
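The abort-then-delete ordering above can be sketched as a small client-side helper. This is a hypothetical function (not part of docsfy) that just encodes the rule that a generating variant returns `409` on `DELETE`:

```python
def delete_request_sequence(status: str) -> list[str]:
    """Return the HTTP calls needed to remove a variant safely.

    A variant whose status is 'generating' returns 409 on DELETE,
    so the abort endpoint must be called first.
    """
    steps = []
    if status == "generating":
        steps.append("POST /api/projects/{name}/{provider}/{model}/abort")
    steps.append("DELETE /api/projects/{name}/{provider}/{model}")
    return steps
```

A `ready` (or `error`/`aborted`) variant deletes in a single call; only a `generating` one needs the extra abort request.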
## Variant management endpoints (quick reference) - `POST /api/generate`: create or regenerate a variant (`force` optional) - `GET /api/projects/{name}`: list all variants for a project name - `GET /api/projects/{name}/{provider}/{model}`: get one variant - `POST /api/projects/{name}/{provider}/{model}/abort`: stop active generation for one variant - `DELETE /api/projects/{name}/{provider}/{model}`: safely delete one variant - `GET /docs/{project}/{provider}/{model}/{path:path}`: serve docs for one exact variant ## Behavior verification in tests Variant lifecycle behavior is covered in tests, including force creation, duplicate protection, role restrictions, and delete flow: ```python # tests/test_main.py response = await client.post( "/api/generate", json={ "repo_url": "https://github.com/org/repo.git", "ai_provider": "claude", "ai_model": "opus", }, ) assert response.status_code == 409 ``` ```python # tests/test_auth.py response = await ac.delete("/api/projects/proj-del/claude/opus") assert response.status_code == 403 ``` ```python # tests/test_integration.py response = await client.delete("/api/projects/test-repo/claude/opus") assert response.status_code == 200 ``` Repository-level automated test command configuration: ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: status-and-progress.md # Status and Progress Monitoring The status page (`/status/{name}/{provider}/{model}`) is the per-variant monitoring view for doc generation. It combines backend state from the `projects` table with client-side polling and UI reconstruction (progress bar + activity log). ## Status Model and Data Source Status values are defined centrally and stored in the `projects` row for each variant. 
```python # src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` The status page fetches variant state from the variant API endpoint: ```python # src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}") async def get_variant_details( request: Request, name: str, provider: str, model: str, ) -> dict[str, str | int | None]: name = _validate_project_name(name) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) return project ``` Important fields used by the page: - `status`: high-level state (`generating`, `ready`, `error`, `aborted`) - `current_stage`: pipeline stage (`cloning`, `planning`, etc.) - `page_count`: generated/cached page count - `plan_json`: page plan (used to compute total pages) - `error_message`: displayed on `error`/`aborted` - `last_commit_sha`, `last_generated`: metadata updated on completion ## Polling Behavior The status page uses interval polling (not WebSockets), with overlap protection and auth-aware redirect handling. ```javascript // src/docsfy/templates/status.html var POLL_INTERVAL_MS = 3000; function startPolling() { if (pollTimer) return; pollTimer = setInterval(pollProject, POLL_INTERVAL_MS); } var _polling = false; function pollProject() { if (_polling) return; _polling = true; fetch('/api/projects/' + encodeURIComponent(PROJECT_NAME) + '/' + encodeURIComponent(PROJECT_PROVIDER) + '/' + encodeURIComponent(PROJECT_MODEL), { credentials: 'same-origin', redirect: 'manual' }) .then(function(res) { if (isAuthRedirect(res)) { handleAuthRedirect(); stopPolling(); return null; } if (!res.ok) throw new Error('Not found'); return res.json(); }) .then(function(proj) { if (!proj) return; updateFromProject(proj); }) .catch(function() { /* Silently fail; retry on next interval */ }) .finally(function() { _polling = false; }); } ``` > **Note:** Polling interval is hardcoded to `3000ms` in `status.html`; there is no environment variable for this. 
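# src/docsfy/storage.py
The same polling loop can be driven from a script. Below is a minimal sketch, assuming an HTTP helper `fetch_status` that wraps `GET /api/projects/{name}/{provider}/{model}` and returns its JSON body:

```python
import time

TERMINAL_STATUSES = {"ready", "error", "aborted"}


def poll_until_terminal(fetch_status, interval_s=3.0, max_polls=200):
    """Poll a status callable until the variant reaches a terminal state.

    `fetch_status` is an assumed callable returning the variant dict
    from the API; interval mirrors the page's 3000ms default.
    """
    for _ in range(max_polls):
        project = fetch_status()
        if project["status"] in TERMINAL_STATUSES:
            return project
        time.sleep(interval_s)
    raise TimeoutError("variant did not reach a terminal status")
```

Unlike the browser code, this sketch has no auth-redirect handling.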
Polling stops when status becomes terminal (`ready`, `error`, `aborted`) or when auth expires. ## Stage Updates (Backend Lifecycle) The backend writes stage transitions via `update_project_status(...)` as generation progresses: ```python # src/docsfy/main.py await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="cloning", ) await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="planning", ) await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="generating_pages", plan_json=json.dumps(plan), ) await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="rendering", page_count=len(pages), ) await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) ``` Up-to-date shortcut (no regeneration) is represented as `status="ready"` + `current_stage="up_to_date"`. ```python # src/docsfy/main.py if old_sha == commit_sha: await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` > **Warning:** Backend can emit `current_stage="incremental_planning"`, but the status page stage order only includes `cloning`, `planning`, `generating_pages`, and `rendering`, so that phase is shown generically. ## Activity Log Semantics The activity log is reconstructed client-side from `status`, `current_stage`, `page_count`, and `plan_json`. It is not a server-side event stream. 
```javascript // src/docsfy/templates/status.html var ICON_MAP = { done: 'icon-check', active: 'icon-spinner-sm', error: 'icon-x-circle', pending: 'icon-circle' }; var STAGES = ['cloning', 'planning', 'generating_pages', 'rendering']; ``` Behavior: - On initial load and stage transitions: `buildInitialLog()` clears and rebuilds entries. - On page count increase: the last active "Generating..." entry is converted to "Generated...", then next active page entry is appended. - On completion (`ready`): log finalizes with: - `Rendered documentation site` - `Documentation ready!` - On `up_to_date`: log is replaced with a single entry: - `Repository unchanged, docs already up to date` - On `error`/`aborted`: active entry is marked as error and terminal failure entry is appended. ## Progress Bar Semantics The status page uses `page_count` as numerator and `total_pages_from_plan` as denominator when available. ```javascript // src/docsfy/templates/status.html if (totalPagesFromPlan > 0) { var pct = Math.min(Math.round((newPageCount / totalPagesFromPlan) * 100), 100); progressBar.style.width = pct + '%'; progressCount.textContent = newPageCount + ' / ' + totalPagesFromPlan + ' pages'; } else { progressCount.textContent = newPageCount + ' pages'; } ``` `page_count` is updated during page generation from cache file count: ```python # src/docsfy/generator.py existing_pages = len(list(cache_dir.glob("*.md"))) await update_project_status( project_name, ai_provider, ai_model, owner=owner, status="generating", page_count=existing_pages, ) ``` And forced regenerations reset count to zero: ```python # src/docsfy/main.py if force: await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) ``` Progress completion behavior: - On `ready`, UI forces progress bar to `100%` and label to `Complete`. - If `plan_json` is unavailable, count shows `N pages` only; denominator and percentage are unknown. 
> **Warning:** `page_count` reflects files present in the page cache, not strictly "new pages generated in this exact run." Incremental/cached runs can appear to jump. > **Tip:** Use `force: true` when you want a fresh 0→N progress curve for reruns. ## Abort and Failure Monitoring Abort action from the status page calls the variant-specific abort endpoint: - `POST /api/projects/{name}/{provider}/{model}/abort` On successful abort, backend writes: - `status="aborted"` - `error_message="Generation aborted by user"` - `current_stage=None` The status page then: - stops polling - switches log status to `Aborted` - shows regenerate controls inline (provider/model/force + Regenerate) If the server restarts mid-generation, startup logic converts orphaned `generating` projects to `error`: ```python # src/docsfy/storage.py cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` > **Note:** This restart recovery is why a variant can move to `error` without a user-triggered abort or explicit generation exception in the live UI. --- Source: abort-and-retry.md # Abort and Retry Flows docsfy handles abort and retry/regeneration as explicit state transitions for each variant (`project/provider/model`) and owner. 
```python # src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` A generation task is keyed by owner + variant so duplicate in-flight runs are blocked: ```python # src/docsfy/main.py gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" async with _gen_lock: if gen_key in _generating: raise HTTPException( status_code=409, detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated", ) ``` ## Abort flow for active runs ### Endpoints | Endpoint | Purpose | |---|---| | `POST /api/projects/{name}/{provider}/{model}/abort` | Abort a specific variant (recommended) | | `POST /api/projects/{name}/abort` | Legacy/backward-compatible abort by project name | > **Note:** The name-only abort endpoint is explicitly marked backward-compatible and aborts the first matching active run. ```python # src/docsfy/main.py @app.post("/api/projects/{name}/abort") async def abort_generation(request: Request, name: str) -> dict[str, str]: """Abort generation for any variant of the given project name. Kept for backward compatibility. Finds the first active generation matching the project name. """ ``` ### What happens when abort is requested 1. Write access is required (`admin` or `user` role). 2. Ownership/access is verified. 3. The task is cancelled with `task.cancel()`. 4. Server waits up to 5 seconds for cancellation acknowledgment. 5. Variant status is persisted as `aborted` with an error message. ```python # src/docsfy/main.py task.cancel() try: await asyncio.wait_for(task, timeout=5.0) except asyncio.CancelledError: pass except asyncio.TimeoutError as exc: raise HTTPException( status_code=409, detail=f"Abort still in progress for '{gen_key}'. 
Please retry shortly.", ) from exc await update_project_status( name, provider, model, status="aborted", owner=key_owner, error_message="Generation aborted by user", current_stage=None, ) ``` > **Warning:** Abort can return `409` (`Abort still in progress...`) if cancellation has not completed within 5 seconds. Retrying abort shortly is expected behavior. ### UI behavior during abort On the status page and dashboard, running variants show an Abort button; the action uses a confirmation modal and calls the variant-specific abort API. ```javascript // src/docsfy/templates/status.html fetch('/api/projects/' + encodeURIComponent(PROJECT_NAME) + '/' + encodeURIComponent(PROJECT_PROVIDER) + '/' + encodeURIComponent(PROJECT_MODEL) + '/abort', { method: 'POST', credentials: 'same-origin', redirect: 'manual' }) ``` ```javascript // src/docsfy/templates/_modal.html function modalConfirm(title, body, danger) { return new Promise(function(resolve) { showModal({ title: title, body: body, danger: danger, confirmText: danger ? 'Delete' : 'Confirm', cancelText: 'Cancel', onConfirm: function() { resolve(true); }, onCancel: function() { resolve(false); }, }); }); } ``` ## Retry/regeneration flow after error or abort There is no dedicated `/retry` backend route. Retry/regeneration is implemented as a new `POST /api/generate` request, usually pre-filled from the failed/aborted variant. ```javascript // src/docsfy/templates/status.html var payload = { repo_url: repoUrl }; if (providerSelect) payload.ai_provider = providerSelect.value; if (modelInput) payload.ai_model = modelInput.value; if (forceCheckbox && forceCheckbox.checked) payload.force = true; fetch('/api/generate', { method: 'POST', headers: { 'Content-Type': 'application/json' }, credentials: 'same-origin', redirect: 'manual', body: JSON.stringify(payload) }) ``` Retry controls are only shown for `error` or `aborted` states: ```html {% if project.status == 'error' or project.status == 'aborted' %}
{% endif %} ``` > **Note:** If provider/model is changed during retry from the status page, the UI redirects to the new variant status URL. ## Force vs non-force regeneration `force=true` clears cached pages and resets page count before regeneration: ```python # src/docsfy/main.py if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) ``` Without force, docsfy can short-circuit to up-to-date if commit SHA is unchanged: ```python # src/docsfy/main.py if old_sha == commit_sha: await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` Status UI explicitly surfaces this case: ```html {% if project.current_stage == 'up_to_date' %}Documentation is already up to date — no changes since last generation.{% else %}Documentation generated successfully!{% endif %} ``` > **Tip:** Use `Force` when you need a full refresh and do not want reuse of existing cached pages. ## Incremental regeneration behavior If commit changed and previous plan exists, docsfy can ask the incremental planner which pages to regenerate. ```python # src/docsfy/generator.py if not success: logger.warning(f"[{project_name}] Incremental planner failed, regenerating all") return ["all"] result = parse_json_list_response(output) if result is None or not isinstance(result, list): return ["all"] ... 
if not result: return ["all"] ``` `current_stage` values used through generation include: - `cloning` - `planning` - `incremental_planning` (when applicable) - `generating_pages` - `rendering` - `up_to_date` (ready without rebuild) ```javascript // src/docsfy/templates/status.html var STAGES = ['cloning', 'planning', 'generating_pages', 'rendering']; ``` ## Failure recovery and post-retry path On startup, orphaned `generating` records are moved to `error`, which then enables regeneration controls. ```python # src/docsfy/storage.py cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` Cancellation and hard failures during background generation are also persisted: ```python # src/docsfy/main.py except asyncio.CancelledError: await update_project_status(... status="aborted", error_message="Generation was cancelled", current_stage=None) raise except Exception as exc: await update_project_status(... status="error", error_message=str(exc)) ``` ## Access control for abort/retry Abort and regenerate both require write access, and abort additionally enforces ownership/grant checks. ```python # src/docsfy/main.py def _require_write_access(request: Request) -> None: if request.state.role not in ("admin", "user"): raise HTTPException(status_code=403, detail="Write access required.") ``` ```python # src/docsfy/main.py async def _check_ownership(...): if request.state.is_admin: return ... 
access = await get_project_access(project_name, project_owner=project_owner) if request.state.username in access: return raise HTTPException(status_code=404, detail="Not found") ``` Test coverage confirms viewer restriction: ```python # tests/test_auth.py response = await ac.post("/api/generate", json={...}) assert response.status_code == 403 assert "Write access required" in response.json()["detail"] ``` ## Relevant configuration and automation `AI_CLI_TIMEOUT` directly impacts failure timing (and therefore how often you hit retry/regeneration paths): ```bash # .env.example AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` Automated tests are configured via `tox`: ```toml # tox.toml envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` Abort/retry expectations are also documented in end-to-end UI checks: ```markdown # test-plans/e2e-ui-test-plan.md - The status changes to `aborted` - The error message shows "Generation aborted by user" - The "Abort" button is replaced by regenerate controls (provider select, model input, force checkbox, and "Regenerate" button) ``` > **Warning:** Retry UI currently submits `repo_url` payloads. `GenerateRequest` accepts either `repo_url` or `repo_path` (not both), and `repo_url` is validated as a Git URL pattern. For local-path workflows, start a new generation with `repo_path` (admin-only) rather than relying on URL-based retry payloads. 
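The retry payload the status page builds can be mirrored in Python. A sketch (field names follow `GenerateRequest`; the helper itself is hypothetical):

```python
def build_retry_payload(variant: dict, force: bool = False) -> dict:
    """Build a POST /api/generate body from a failed/aborted variant row.

    Mirrors the status-page JavaScript: repo_url is always sent, and
    provider/model/force are included only when set.
    """
    payload = {"repo_url": variant["repo_url"]}
    if variant.get("ai_provider"):
        payload["ai_provider"] = variant["ai_provider"]
    if variant.get("ai_model"):
        payload["ai_model"] = variant["ai_model"]
    if force:
        payload["force"] = True
    return payload
```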
--- Source: docs-view-and-download.md # View and Download Generated Docs docsfy exposes four read/download endpoints for generated documentation: | Use case | Route | Resolution logic | |---|---|---| | View a specific variant | `/docs/{project}/{provider}/{model}/{path:path}` | Uses the exact `project/provider/model` variant | | View latest ready variant | `/docs/{project}/{path:path}` | Picks the most recently generated **ready** variant | | Download a specific variant | `/api/projects/{name}/{provider}/{model}/download` | Streams a `.tar.gz` for the exact variant | | Download latest ready variant | `/api/projects/{name}/download` | Streams a `.tar.gz` for the latest ready variant | > **Note:** If `path` is empty or `/`, docsfy serves `index.html`. ## Variant-specific docs route Use this when you want deterministic docs for one provider/model pair. ```1379:1403:src/docsfy/main.py @app.get("/docs/{project}/{provider}/{model}/{path:path}") async def serve_variant_docs( request: Request, project: str, provider: str, model: str, path: str = "index.html", ) -> FileResponse: if not path or path == "/": path = "index.html" project = _validate_project_name(project) proj = await _resolve_project( request, project, ai_provider=provider, ai_model=model ) # ... if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") return FileResponse(file_path) ``` Examples: - `/docs/test-repo/claude/opus/` - `/docs/test-repo/claude/opus/index.html` - `/docs/test-repo/claude/opus/introduction.html` The dashboard uses this route for **View Docs**: ```1481:1485:src/docsfy/templates/dashboard.html {% if variant.status == 'ready' %}
View Docs Download ``` > **Tip:** URL-encode `provider` and `model` path segments in scripts/clients (the UI already does this with `urlencode`). ## Latest-variant docs route Use this when you want “the newest ready docs” without specifying provider/model. ```1406:1420:src/docsfy/main.py @app.get("/docs/{project}/{path:path}") async def serve_docs( request: Request, project: str, path: str = "index.html" ) -> FileResponse: """Serve the most recently generated variant.""" if not path or path == "/": path = "index.html" project = _validate_project_name(project) if request.state.is_admin: latest = await get_latest_variant(project) else: latest = await get_latest_variant(project, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail="No docs available") ``` “Latest” is defined in storage as `status = 'ready'` ordered by `last_generated DESC`: ```552:566:src/docsfy/storage.py async def get_latest_variant( name: str, owner: str | None = None ) -> dict[str, str | int | None] | None: """Get the most recently generated ready variant for a repo.""" # ... cursor = await db.execute( "SELECT * FROM projects WHERE name = ? AND status = 'ready' ORDER BY last_generated DESC LIMIT 1", (name,), ) ``` > **Warning:** For non-admin users, latest routes are owner-scoped (`owner=request.state.username`). If a project is shared with you by access grant, use the variant-specific route instead. ## Download `.tar.gz` archives ### Download a specific variant ```1074:1112:src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}/download") async def download_variant( request: Request, name: str, provider: str, model: str, ) -> StreamingResponse: # ... if project["status"] != "ready": raise HTTPException(status_code=400, detail="Variant not ready") # ... 
with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=f"{name}-{provider}-{model}") return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={ "Content-Disposition": f'attachment; filename="{name}-{provider}-{model}-docs.tar.gz"' }, ) ``` Behavior: - Requires variant status `ready` - Returns `Content-Type: application/gzip` - Downloads as `{name}-{provider}-{model}-docs.tar.gz` - Archive root directory is `{name}-{provider}-{model}/` ### Download latest ready variant ```1158:1194:src/docsfy/main.py @app.get("/api/projects/{name}/download") async def download_project(request: Request, name: str) -> StreamingResponse: # ... if request.state.is_admin: latest = await get_latest_variant(name) else: latest = await get_latest_variant(name, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail=f"No ready variant for '{name}'") # ... with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=name) return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={"Content-Disposition": f'attachment; filename="{name}-docs.tar.gz"'}, ) ``` Behavior: - Picks latest ready variant - Downloads as `{name}-docs.tar.gz` - Archive root directory is `{name}/` ### CLI download examples ```bash # Specific variant curl -L -OJ \ -H "Authorization: Bearer ${DOCSFY_API_KEY}" \ "http://localhost:8000/api/projects/test-repo/claude/opus/download" # Latest ready variant curl -L -OJ \ -H "Authorization: Bearer ${DOCSFY_API_KEY}" \ "http://localhost:8000/api/projects/test-repo/download" ``` > **Tip:** `-OJ` tells `curl` to use the server-provided filename from `Content-Disposition`. 
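The variant archive layout described above can be reproduced locally with the standard `tarfile` module. This sketch builds a throwaway site directory and checks the `{name}-{provider}-{model}/` root (file contents are placeholders):

```python
import tarfile
import tempfile
from pathlib import Path


def archive_root(name: str, provider: str, model: str) -> str:
    # Matches the arcname used by the variant download endpoint.
    return f"{name}-{provider}-{model}"


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "site"
    site.mkdir()
    (site / "index.html").write_text("<html></html>", encoding="utf-8")

    tar_path = Path(tmp) / "docs.tar.gz"
    with tarfile.open(tar_path, mode="w:gz") as tar:
        tar.add(site, arcname=archive_root("test-repo", "claude", "opus"))

    with tarfile.open(tar_path) as tar:
        members = tar.getnames()
# All members sit under the 'test-repo-claude-opus/' root.
```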
## What is inside the archive Generated site content comes from `render_site()`, which writes static assets and pages into the variant `site` directory: ```243:290:src/docsfy/renderer.py index_html = render_index(project_name, tagline, navigation, repo_url=repo_url) (output_dir / "index.html").write_text(index_html, encoding="utf-8") # ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") # ... (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` Typical archive contents include: - `index.html` - `*.html` rendered pages - `*.md` source markdown pages - `search-index.json` - `llms.txt` and `llms-full.txt` - `assets/*` static CSS/JS - `.nojekyll` ## Auth, access, and error behavior API routes return `401` when unauthenticated; browser routes redirect to `/login`: ```151:155:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` Project names are validated before route resolution: ```73:77:src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") ``` Common responses: - `400`: - variant download when status is not ready (`"Variant not ready"`) - invalid project name - `401`: - unauthenticated API requests - `403`: - denied path traversal attempt (`"Access denied"`) - `404`: - docs file missing - no latest ready 
docs (`"No docs available"`) - no ready variant for latest download - `409`: - admin ambiguity when multiple owners have same `project/provider/model` ## Runtime configuration relevant to these routes Default container mapping serves docsfy on port `8000`: ```1:10:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` Cookie/security settings are environment-driven: ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```27:28:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` > **Note:** Models such as `claude-opus-4-6[1m]` contain characters that should be URL-encoded when used in path segments. ## Validation and test automation The integration test explicitly validates all four routes: ```124:146:tests/test_integration.py response = await client.get("/docs/test-repo/claude/opus/index.html") assert response.status_code == 200 # ... response = await client.get("/docs/test-repo/index.html") assert response.status_code == 200 # ... response = await client.get("/api/projects/test-repo/claude/opus/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" # ... response = await client.get("/api/projects/test-repo/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" ``` Repository test entrypoint: ```5:7:tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** No checked-in GitHub Actions or other CI workflow manifests are present; test automation is defined via `tox.toml`. 
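As a concrete illustration of the URL-encoding note above, the bracketed model identifier can be escaped with the standard library (the project name and route here are just examples):

```python
from urllib.parse import quote

model = "claude-opus-4-6[1m]"
encoded = quote(model, safe="")  # '[' -> '%5B', ']' -> '%5D'

docs_url = f"/docs/test-repo/claude/{encoded}/index.html"
# docs_url == '/docs/test-repo/claude/claude-opus-4-6%5B1m%5D/index.html'
```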
---
Source: generated-site-features.md

# Generated Site Features

docsfy-generated sites ship with a built-in front-end feature bundle from `src/docsfy/static/`, copied into each output site's `assets/` directory during rendering.

```python
# src/docsfy/renderer.py
if STATIC_DIR.exists():
    for static_file in STATIC_DIR.iterdir():
        if static_file.is_file():
            shutil.copy2(static_file, assets_dir / static_file.name)

search_index = _build_search_index(valid_pages, plan)
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)
```

---

## Search Modal

The site uses a client-side modal search (`Cmd/Ctrl+K`) backed by `search-index.json`.

- Opens via keyboard shortcut, top-bar Search button, or sidebar search input focus.
- Matches against page title and markdown content.
- Limits results to 10 entries.
- Supports arrow navigation and Enter to open the selected result.

```javascript
// src/docsfy/static/search.js
fetch('search-index.json').then(function(r) { return r.json(); })
  .then(function(data) { index = data; }).catch(function() {});

document.addEventListener('keydown', function(e) {
  if ((e.metaKey || e.ctrlKey) && e.key === 'k') { e.preventDefault(); openModal(); }
  if (e.key === 'Escape') closeModal();
});

var matches = index.filter(function(item) {
  return item.title.toLowerCase().includes(q) || item.content.toLowerCase().includes(q);
}).slice(0, 10);
```

```python
# src/docsfy/renderer.py
index.append(
    {
        "slug": slug,
        "title": title_map.get(slug, slug),
        "content": content[:2000],
    }
)
```

> **Tip:** Search content is truncated to the first 2000 characters per page, so placing key terms early in each page improves discoverability.

---

## Theme Toggle (Dark/Light)

Theme state is controlled through the `data-theme` attribute on the root `html` element and persisted in `localStorage` under `theme`.
```javascript
// src/docsfy/static/theme.js
var stored = getTheme();
if (stored) {
  document.documentElement.setAttribute('data-theme', stored);
} else {
  document.documentElement.setAttribute('data-theme', 'dark');
}

if (toggle) toggle.addEventListener('click', function() {
  var current = document.documentElement.getAttribute('data-theme');
  var next = current === 'dark' ? 'light' : 'dark';
  document.documentElement.setAttribute('data-theme', next);
  setTheme(next);
});
```

```css
/* src/docsfy/static/style.css */
[data-theme="dark"] .icon-sun { display: block; }
[data-theme="dark"] .icon-moon { display: none; }
```

> **Note:** Generated pages default to dark mode (`data-theme="dark"`) and switch to the saved preference when available.

---

## Callouts

Callouts are authored as markdown blockquotes with a bold first label (`Note`, `Warning`, `Tip`, etc.). A post-render script maps those labels to callout classes.

```javascript
// src/docsfy/static/callouts.js
var text = firstStrong.textContent.toLowerCase().replace(':', '').trim();
if (text === 'note' || text === 'info') { type = 'note'; }
else if (text === 'warning' || text === 'caution') { type = 'warning'; }
else if (text === 'tip' || text === 'hint') { type = 'tip'; }
else if (text === 'danger' || text === 'error') { type = 'danger'; }
else if (text === 'important') { type = 'important'; }
if (type) { bq.classList.add('callout', 'callout-' + type); }
```

```css
/* src/docsfy/static/style.css */
blockquote.callout-note { border-left: 4px solid #3b82f6; background: rgba(59, 130, 246, 0.08); }
blockquote.callout-warning { border-left: 4px solid #f59e0b; background: rgba(245, 158, 11, 0.08); }
blockquote.callout-tip { border-left: 4px solid #10b981; background: rgba(16, 185, 129, 0.08); }
```

Use the same authoring format enforced in prompt generation:

```text
# src/docsfy/prompts.py
- Notes: > **Note:** text
- Warnings: > **Warning:** text
- Tips: > **Tip:** text
```

---

## Code Copy Buttons

Every `<pre>` block gets a `Copy` button automatically at runtime.

- Uses Clipboard API when available.
- Falls back to `document.execCommand('copy')` for compatibility.
- Shows temporary feedback (`Copied!` / `Failed`).

```javascript
// src/docsfy/static/copy.js
document.querySelectorAll('pre').forEach(function(pre) {
  var btn = document.createElement('button');
  btn.className = 'copy-btn';
  btn.textContent = 'Copy';
  btn.addEventListener('click', function() {
    var code = pre.querySelector('code');
    var text = code ? code.textContent : pre.textContent;
    if (navigator.clipboard && navigator.clipboard.writeText) {
      navigator.clipboard.writeText(text).then(function() {
        btn.textContent = 'Copied!';
        setTimeout(function() { btn.textContent = 'Copy'; }, 2000);
      }).catch(function() {
        fallbackCopy(text, btn);
      });
    } else {
      fallbackCopy(text, btn);
    }
  });
  pre.style.position = 'relative';
  pre.appendChild(btn);
});
```

```css
/* src/docsfy/static/style.css */
.copy-btn { opacity: 0; }
pre:hover .copy-btn { opacity: 1; }

@media (hover: none) {
  .copy-btn { opacity: 0.7; }
}
```

---

## Table of Contents (TOC)

TOC generation is handled during markdown conversion and rendered only when headings are present.

```python
# src/docsfy/renderer.py
md = markdown.Markdown(
    extensions=["fenced_code", "codehilite", "tables", "toc"],
    extension_configs={
        "codehilite": {"css_class": "highlight", "guess_lang": False},
        "toc": {"toc_depth": "2-3"},
    },
)
content_html = _sanitize_html(md.convert(md_text))
toc_html = getattr(md, "toc", "")
```

```html

{% if toc %}

{% endif %}
```

```javascript
// src/docsfy/static/scrollspy.js
var tocLinks = document.querySelectorAll('.toc-container a');
...
current.link.classList.add('active');
```

```css
/* src/docsfy/static/style.css */
@media (min-width: 1280px) {
    .toc-sidebar { display: block; }
    .content { margin-right: 220px; }
}
.toc-container ul ul { display: none; }
```

> **Warning:** `scrollspy.js` applies `active`, while the stylesheet defines `.toc-container a.toc-active`; align class names if you want a styled active-state indicator.

---

## GitHub Metadata (Repo Link + Stars)

When `repo_url` is available in the generated plan, pages render a GitHub button and lazily fetch the star count from the GitHub API.

```python
# src/docsfy/main.py
plan["repo_url"] = source_url
```

```python
# src/docsfy/renderer.py
repo_url: str = plan.get("repo_url", "")
...
page_html = render_page(..., repo_url=repo_url)
```

```html

{% if repo_url %}

    ...
    

{% endif %}
```

```javascript
// src/docsfy/static/github.js
var match = repoUrl.match(/github\.com[/:]([^/]+)\/([^/.]+)/);
...
fetch('https://api.github.com/repos/' + owner + '/' + repo)
  .then(function(response) {
    if (!response.ok) return null;
    return response.json();
  })
  .then(function(data) {
    if (!data || typeof data.stargazers_count === 'undefined') return;
    var count = data.stargazers_count;
    var display;
    if (count >= 1000) {
      display = (count / 1000).toFixed(1).replace(/\.0$/, '') + 'k';
    } else {
      display = count.toString();
    }
    starsEl.textContent = display;
    starsEl.title = count.toLocaleString() + ' stars';
  })
  .catch(function() {
    // Silently fail - star count is a nice-to-have
  });
```
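The compact star-count formatting above translates directly to Python; a sketch with a hypothetical helper name, mirroring the `toFixed(1)` plus `".0"`-stripping behavior:

```python
def format_stars(count: int) -> str:
    # mirrors github.js: one decimal place, trailing ".0" stripped, "k" suffix
    if count >= 1000:
        compact = f"{count / 1000:.1f}"
        if compact.endswith(".0"):
            compact = compact[:-2]
        return compact + "k"
    return str(count)

print(format_stars(999), format_stars(1000), format_stars(12345))
# 999 1k 12.3k
```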

> **Note:** If `repo_url` is empty, the GitHub link and star counter are not rendered.
>
> **Tip:** The regex supports both `https://github.com/org/repo(.git)` and `git@github.com:org/repo.git` style URLs.
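The tip about supported URL styles can be verified directly; a quick check of the same pattern against both forms (`acme/docsfy` is an illustrative owner/repo pair):

```python
import re

# same pattern as src/docsfy/static/github.js
GITHUB_RE = re.compile(r"github\.com[/:]([^/]+)/([^/.]+)")

for url in (
    "https://github.com/acme/docsfy.git",   # HTTPS form
    "git@github.com:acme/docsfy.git",       # SSH form
):
    match = GITHUB_RE.search(url)
    assert match is not None
    assert match.groups() == ("acme", "docsfy")  # ".git" suffix is excluded
```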

---

## Verification Coverage (Tests + Pipeline)

The rendering pipeline has unit coverage for generated artifacts and a defined test command in `tox`.

```python
# tests/test_renderer.py
render_site(plan=plan, pages=pages, output_dir=output_dir)
assert (output_dir / "search-index.json").exists()

index = json.loads((output_dir / "search-index.json").read_text())
assert index[0]["slug"] == "intro"
assert index[0]["title"] == "Intro"
```

```toml
# tox.toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

Manual UI checks for these generated-site features are also documented in `test-plans/e2e-ui-test-plan.md` (see “Test 8: Generated Docs Quality”).


---

Source: incremental-regeneration.md

# Incremental Regeneration

## What Is Tracked

Generation metadata (including commit SHA) is stored per variant (`name`, `ai_provider`, `ai_model`, `owner`) in SQLite:

```57:73:src/docsfy/storage.py
            CREATE TABLE IF NOT EXISTS projects (
                name TEXT NOT NULL,
                ai_provider TEXT NOT NULL DEFAULT '',
                ai_model TEXT NOT NULL DEFAULT '',
                owner TEXT NOT NULL DEFAULT '',
                repo_url TEXT NOT NULL,
                status TEXT NOT NULL DEFAULT 'generating',
                current_stage TEXT,
                last_commit_sha TEXT,
                last_generated TEXT,
                page_count INTEGER DEFAULT 0,
                error_message TEXT,
                plan_json TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                PRIMARY KEY (name, ai_provider, ai_model, owner)
            )
```

> **Note:** Incremental behavior is variant-scoped, not just project-scoped. Different providers/models maintain independent commit and cache state.
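The variant scoping follows from the composite primary key; a minimal in-memory sketch (trimmed schema, illustrative values) showing two independent variants of the same project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE projects (
        name TEXT NOT NULL,
        ai_provider TEXT NOT NULL DEFAULT '',
        ai_model TEXT NOT NULL DEFAULT '',
        owner TEXT NOT NULL DEFAULT '',
        last_commit_sha TEXT,
        PRIMARY KEY (name, ai_provider, ai_model, owner)
    )"""
)
# same project name, two provider/model variants with independent commit state
conn.execute("INSERT INTO projects VALUES ('repo', 'claude', 'opus', 'alice', 'abc1234')")
conn.execute("INSERT INTO projects VALUES ('repo', 'gemini', 'pro', 'alice', 'def5678')")
rows = conn.execute(
    "SELECT ai_provider, last_commit_sha FROM projects WHERE name = 'repo' ORDER BY ai_provider"
).fetchall()
print(rows)  # [('claude', 'abc1234'), ('gemini', 'def5678')]
```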

## Commit Diff Detection

When generation starts (and `force` is not set), `docsfy` compares the stored commit SHA to the current repository SHA.

If the SHA is identical, it exits early as `up_to_date`:

```850:868:src/docsfy/main.py
        if existing and existing.get("last_generated"):
            old_sha = (
                str(existing["last_commit_sha"])
                if existing.get("last_commit_sha")
                else None
            )
            if old_sha == commit_sha:
                logger.info(
                    f"[{project_name}] Project is up to date at {commit_sha[:8]}"
                )
                await update_project_status(
                    project_name,
                    ai_provider,
                    ai_model,
                    status="ready",
                    owner=owner,
                    current_stage="up_to_date",
                )
                return
```

If SHAs differ, it computes file-level diffs using Git:

```48:73:src/docsfy/repository.py
def get_changed_files(repo_path: Path, old_sha: str, new_sha: str) -> list[str] | None:
    """Get list of files changed between two commits.

    Returns None on error (caller should fall back to full regeneration),
    or an empty list when there are no changes.
    """
    if not re.match(r"^[0-9a-fA-F]{4,64}$", old_sha) or not re.match(
        r"^[0-9a-fA-F]{4,64}$", new_sha
    ):
        logger.warning("Invalid SHA format")
        return None
    try:
        result = subprocess.run(
            ["git", "diff", "--name-only", old_sha, new_sha],
            cwd=repo_path,
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        logger.warning(f"Failed to get diff: {exc}")
        return None
    if result.returncode != 0:
        logger.warning(f"Failed to get diff: {result.stderr}")
        return None
    return [f.strip() for f in result.stdout.strip().split("\n") if f.strip()]
```
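The SHA guard above only admits bare hex strings, which keeps revision syntax and shell metacharacters out of the `git diff` invocation; a quick illustration:

```python
import re

# same validation pattern as repository.py
SHA_RE = re.compile(r"^[0-9a-fA-F]{4,64}$")

assert SHA_RE.match("abc1234")             # abbreviated hex SHA: accepted
assert SHA_RE.match("a" * 40)              # full SHA-1 length: accepted
assert not SHA_RE.match("abc")             # fewer than 4 chars: rejected
assert not SHA_RE.match("HEAD~1")          # revision syntax: rejected
assert not SHA_RE.match("abc123; rm -rf")  # metacharacters: rejected
```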

## Page-Level Cache Invalidation

After diff detection, the system runs an incremental planner and invalidates only selected cached pages (by slug), then reuses all other page caches.

```891:955:src/docsfy/main.py
    if old_sha and old_sha != commit_sha and not force and existing:
        changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
        if changed_files is None:
            # Error getting diff — fall back to full regeneration
            use_cache = False
        elif not changed_files:
            # Commits differ but tree is identical — nothing to regenerate
            await update_project_status(
                project_name,
                ai_provider,
                ai_model,
                status="ready",
                owner=owner,
                current_stage="up_to_date",
                last_commit_sha=commit_sha,
            )
            return
        elif changed_files:
            existing_plan_json = existing.get("plan_json")
            if existing_plan_json:
                try:
                    existing_plan = json.loads(str(existing_plan_json))
                    await update_project_status(
                        project_name,
                        ai_provider,
                        ai_model,
                        status="generating",
                        owner=owner,
                        current_stage="incremental_planning",
                    )
                    pages_to_regen = await run_incremental_planner(
                        repo_dir,
                        project_name,
                        ai_provider,
                        ai_model,
                        changed_files,
                        existing_plan,
                        ai_cli_timeout,
                    )
                    if pages_to_regen != ["all"]:
                        # Delete only the cached pages that need regeneration
                        for slug in pages_to_regen:
                            # Validate slug to prevent path traversal
                            if (
                                "/" in slug
                                or "\\" in slug
                                or ".." in slug
                                or slug.startswith(".")
                            ):
                                logger.warning(
                                    f"[{project_name}] Skipping invalid slug from incremental planner: {slug}"
                                )
                                continue
                            cache_file = cache_dir / f"{slug}.md"
                            # Extra safety: ensure the resolved path is inside cache_dir
                            try:
                                cache_file.resolve().relative_to(cache_dir.resolve())
                            except ValueError:
                                logger.warning(
                                    f"[{project_name}] Path traversal attempt in slug: {slug}"
                                )
                                continue
                            if cache_file.exists():
                                cache_file.unlink()
                        use_cache = True
```

Page cache entries are slug-based markdown files:

```89:114:src/docsfy/generator.py
    cache_file = cache_dir / f"{slug}.md"
    if use_cache and cache_file.exists():
        logger.debug(f"[{_label}] Using cached page: {slug}")
        return cache_file.read_text(encoding="utf-8")

    prompt = build_page_prompt(
        project_name=repo_path.name, page_title=title, page_description=description
    )
    # Build CLI flags based on provider
    cli_flags = ["--trust"] if ai_provider == "cursor" else None
    success, output = await call_ai_cli(
        prompt=prompt,
        cwd=repo_path,
        ai_provider=ai_provider,
        ai_model=ai_model,
        ai_cli_timeout=ai_cli_timeout,
        cli_flags=cli_flags,
    )
    if not success:
        logger.warning(f"[{_label}] Failed to generate page '{slug}': {output}")
        output = f"# {title}\n\n*Documentation generation failed. Please re-run.*"

    output = _strip_ai_preamble(output)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(output, encoding="utf-8")
```

Cache directory resolution:

```527:530:src/docsfy/storage.py
def get_project_cache_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages"
```

> **Tip:** Because cache is per slug (`{slug}.md`), incremental regeneration is fastest when page slugs remain stable across planner runs.
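The slug checks quoted above combine a character blocklist with a resolved-path containment test; a self-contained sketch with a hypothetical helper name and an illustrative cache path:

```python
from pathlib import Path

def is_safe_slug(slug: str, cache_dir: Path) -> bool:
    """Mirror the two-layer validation from main.py (hypothetical helper)."""
    # layer 1: reject separator characters, "..", and leading dots
    if "/" in slug or "\\" in slug or ".." in slug or slug.startswith("."):
        return False
    # layer 2: ensure the resolved cache file stays inside cache_dir
    try:
        (cache_dir / f"{slug}.md").resolve().relative_to(cache_dir.resolve())
    except ValueError:
        return False
    return True

cache = Path("/tmp/docsfy-cache/pages")  # illustrative location
assert is_safe_slug("api-reference", cache)
assert not is_safe_slug("../secrets", cache)
assert not is_safe_slug(".hidden", cache)
```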

## Fallback to Full Regeneration

There are three fallback signals in code:

1. Diff failure (`changed_files is None`)
2. Incremental planner failure / parse failure
3. Incremental planner returning unusable output

Incremental planner fallback behavior:

```229:239:src/docsfy/generator.py
    if not success:
        logger.warning(f"[{project_name}] Incremental planner failed, regenerating all")
        return ["all"]

    result = parse_json_list_response(output)
    if result is None or not isinstance(result, list):
        return ["all"]
    # Validate all items are strings
    result = [item for item in result if isinstance(item, str)]
    if not result:
        return ["all"]
```

Planner prompt contract includes both `["all"]` and `[]` outputs:

```56:63:src/docsfy/prompts.py
Which pages from the existing plan need to be regenerated based on the changed files?
Output a JSON array of page slugs that need regeneration.

CRITICAL: Output ONLY a JSON array of strings. No explanation.
Example: ["introduction", "api-reference", "configuration"]
If all pages need regeneration, output: ["all"]
If no pages need regeneration, output: []
```
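Taken together, the excerpt and the prompt contract imply a small decision table. A sketch (hypothetical helper, with `parse_json_list_response` simplified to plain `json.loads`); note that per the quoted generator code, an empty validated list also falls back to `["all"]`:

```python
import json

def planner_result(success: bool, output: str) -> list[str]:
    """Mirror the fallback rules quoted from generator.py (hypothetical helper)."""
    if not success:
        return ["all"]                      # AI CLI failure
    try:
        result = json.loads(output)
    except json.JSONDecodeError:
        result = None
    if result is None or not isinstance(result, list):
        return ["all"]                      # parse failure / wrong shape
    result = [item for item in result if isinstance(item, str)]
    if not result:
        return ["all"]                      # empty or all-non-string output
    return result

assert planner_result(False, "") == ["all"]
assert planner_result(True, "not json") == ["all"]
assert planner_result(True, "[]") == ["all"]
assert planner_result(True, '["configuration"]') == ["configuration"]
```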

Forced full regeneration is explicit and clears cache first:

```832:845:src/docsfy/main.py
    if force:
        cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
        if cache_dir.exists():
            shutil.rmtree(cache_dir)
            logger.info(f"[{project_name}] Cleared cache (force=True)")
        # Reset page count so API shows 0 during regeneration
        await update_project_status(
            project_name,
            ai_provider,
            ai_model,
            status="generating",
            owner=owner,
            page_count=0,
        )
```

And `force` is exposed at API model level:

```18:20:src/docsfy/models.py
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )
```

Dashboard sends `force` in generation requests:

```2043:2047:src/docsfy/templates/dashboard.html
                var body = {
                    repo_url: repoUrl,
                    ai_provider: provider,
                    force: force
                };
```

> **Warning:** For non-force runs, `generate_all_pages` is called with `use_cache=use_cache if use_cache else not force`, which evaluates to `True` whenever `force` is `False`. In practice, this means true “full regeneration” is guaranteed when `force=true` (cache is deleted), while automatic fallback branches depend on whether cache files were invalidated/removed first.
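The expression in the warning above can be tabulated directly; a sketch with a hypothetical wrapper function:

```python
def effective_use_cache(use_cache: bool, force: bool) -> bool:
    # the exact expression quoted in the warning (hypothetical wrapper)
    return use_cache if use_cache else not force

assert effective_use_cache(False, False) is True   # non-force run: cache reads stay on
assert effective_use_cache(True, False) is True
assert effective_use_cache(False, True) is False   # force: cache reads off
assert effective_use_cache(True, True) is True     # force already cleared the cache dir
```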

## Runtime and Deployment Impact on Cache

Cache and metadata persist when `/data` is mounted:

```7:13:docker-compose.yaml
    volumes:
      - ./data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

> **Warning:** Remote repositories are cloned shallow (`--depth 1`), which can prevent diffing against older stored SHAs if that commit is not present locally.

```25:27:src/docsfy/repository.py
    result = subprocess.run(
        ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)],
```

## Test and Pipeline Coverage

Key tests validate incremental/cache behaviors:

- Diff outcomes (`list`, `None`, empty list) in `tests/test_repository.py`
- Cache hit behavior in `tests/test_generator.py`
- Incremental planner fallback to `["all"]` in `tests/test_generator.py`

```85:124:tests/test_repository.py
def test_get_changed_files_success(tmp_path: Path) -> None:
    from docsfy.repository import get_changed_files

    with patch("docsfy.repository.subprocess.run") as mock_run:
        mock_run.return_value = MagicMock(
            returncode=0,
            stdout="src/main.py\nsrc/utils.py\nREADME.md\n",
            stderr="",
        )
        files = get_changed_files(tmp_path, "abc123", "def456")

    assert files == ["src/main.py", "src/utils.py", "README.md"]
    call_args = mock_run.call_args
    assert "diff" in call_args.args[0]
    assert "--name-only" in call_args.args[0]
    assert "abc123" in call_args.args[0]
    assert "def456" in call_args.args[0]
```

```103:123:tests/test_generator.py
async def test_generate_page_uses_cache(tmp_path: Path) -> None:
    from docsfy.generator import generate_page

    cache_dir = tmp_path / "cache"
    cache_dir.mkdir()
    cached = cache_dir / "introduction.md"
    cached.write_text("# Cached content")

    md = await generate_page(
        repo_path=tmp_path,
        slug="introduction",
        title="Introduction",
        description="Overview",
        cache_dir=cache_dir,
        ai_provider="claude",
        ai_model="opus",
        use_cache=True,
    )

    assert md == "# Cached content"
```

```144:183:tests/test_generator.py
async def test_run_incremental_planner_returns_all_on_failure(
    tmp_path: Path, sample_plan: dict
) -> None:
    from docsfy.generator import run_incremental_planner

    with patch(
        "docsfy.generator.call_ai_cli",
        return_value=(False, "AI error"),
    ):
        result = await run_incremental_planner(
            repo_path=tmp_path,
            project_name="test-repo",
            ai_provider="claude",
            ai_model="opus",
            changed_files=["src/main.py"],
            existing_plan=sample_plan,
        )

    assert result == ["all"]
```

Project automation used for CI-style validation in-repo:

```1:7:tox.toml
skipsdist = true

envlist = ["unittests"]

[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

```43:60:.pre-commit-config.yaml
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
    hooks:
      - id: ruff
      - id: ruff-format

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
    hooks:
      - id: gitleaks

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.19.1
    hooks:
      - id: mypy
        exclude: (tests/)
        additional_dependencies:
          [types-requests, types-PyYAML, types-colorama, types-aiofiles, pydantic, types-Markdown]
```


---

Source: authentication-model.md

# Authentication Model

`docsfy` uses a single middleware gate for all requests and supports two authentication mechanisms:

- **Bearer token auth** for API/automation clients
- **Session-cookie auth** for browser/UI flows

## Authentication Gate and Evaluation Order

Every request passes through `AuthMiddleware`. Only three paths bypass auth.

```python
class AuthMiddleware(BaseHTTPMiddleware):
    """Authenticate every request via Bearer token or session cookie."""

    # Paths that do not require authentication
    _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

    async def dispatch(
        self, request: Request, call_next: RequestResponseEndpoint
    ) -> Response:
        if request.url.path in self._PUBLIC_PATHS:
            return await call_next(request)

        settings = get_settings()
        user = None
        is_admin = False
        username = ""

        # 1. Check Authorization header (API clients)
        auth_header = request.headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            token = auth_header[7:]
            if token == settings.admin_key:
                is_admin = True
                username = "admin"
            else:
                user = await get_user_by_key(token)

        # 2. Check session cookie (browser) -- opaque session token
        if not user and not is_admin:
            session_token = request.cookies.get("docsfy_session")
            if session_token:
                session = await get_session(session_token)
                if session:
                    is_admin = bool(session["is_admin"])
                    username = str(session["username"])
                    # Fix 8: For DB users (not ADMIN_KEY admin), verify user still exists
                    if username != "admin":
                        user = await get_user_by_username(username)
                        if not user:
                            # User was deleted since session was created
                            if request.url.path.startswith("/api/"):
                                return JSONResponse(
                                    status_code=401, content={"detail": "Unauthorized"}
                                )
                            return RedirectResponse(url="/login", status_code=302)

        if not user and not is_admin:
            # Not authenticated
            if request.url.path.startswith("/api/"):
                return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
            return RedirectResponse(url="/login", status_code=302)
```

> **Note:** Bearer auth is checked first. If Bearer fails (or is absent), middleware falls back to `docsfy_session`.

## Bearer Token Flow

Bearer tokens are accepted from the `Authorization` header (`Authorization: Bearer <token>`):

- If the token equals `ADMIN_KEY`, the request is authenticated as the built-in admin user (`admin`).
- Otherwise, the token is treated as a user API key and looked up in the `users` table.
- User API keys are not stored raw; they are HMAC-hashed using `ADMIN_KEY` as the secret.

```python
def hash_api_key(key: str, hmac_secret: str = "") -> str:
    """Hash an API key with HMAC-SHA256 for storage.

    Uses ADMIN_KEY as the HMAC secret so that even if the source is read,
    keys cannot be cracked without the environment secret.
    """
    # NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
    # invalidate all existing api_key_hash values, requiring all users to
    # regenerate their API keys.
    secret = hmac_secret or os.getenv("ADMIN_KEY", "")
    if not secret:
        msg = "ADMIN_KEY environment variable is required for key hashing"
        raise RuntimeError(msg)
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()


async def get_user_by_key(api_key: str) -> dict[str, str | int | None] | None:
    """Look up a user by their raw API key."""
    key_hash = hash_api_key(api_key)
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        cursor = await db.execute(
            "SELECT * FROM users WHERE api_key_hash = ?", (key_hash,)
        )
        row = await cursor.fetchone()
        return dict(row) if row else None
```

> **Tip:** For scripts and CI jobs, prefer Bearer auth over login/cookies to keep requests stateless.

> **Warning:** Rotating `ADMIN_KEY` invalidates existing user API key hashes by design.
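The rotation caveat follows directly from HMAC: a different secret yields a different digest for the same key. A minimal illustration (secrets are illustrative placeholders):

```python
import hashlib
import hmac

def hash_api_key(key: str, secret: str) -> str:
    # same construction as storage: HMAC-SHA256 keyed by ADMIN_KEY
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()

same_key = "user-api-key-example"
old = hash_api_key(same_key, "old-admin-key-0123456789abcdef")
new = hash_api_key(same_key, "new-admin-key-0123456789abcdef")
assert old != new  # stored api_key_hash values no longer match after rotation
```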

## Session-Cookie Flow

Browser login uses form fields `username` + `api_key` and creates an opaque session cookie.

```python
@app.post("/login", response_model=None)
async def login(request: Request) -> RedirectResponse | HTMLResponse:
    """Authenticate with username + API key and set a session cookie."""
    form = await request.form()
    username = str(form.get("username", ""))
    api_key = str(form.get("api_key", ""))
    settings = get_settings()

    is_admin = False
    authenticated = False

    # Check admin -- username must be "admin" and key must match
    if username == "admin" and api_key == settings.admin_key:
        is_admin = True
        authenticated = True
    else:
        # Check user key -- verify username matches the key's owner
        user = await get_user_by_key(api_key)
        if user and user["username"] == username:
            authenticated = True
            is_admin = user.get("role") == "admin"

    if authenticated:
        session_token = await create_session(username, is_admin=is_admin)
        response = RedirectResponse(url="/", status_code=302)
        response.set_cookie(
            "docsfy_session",
            session_token,
            httponly=True,
            samesite="strict",
            secure=settings.secure_cookies,
            max_age=SESSION_TTL_SECONDS,
        )
        return response
```

The login UI labels this field as a password, but the backend form field name is still `api_key`.

Session tokens are opaque and stored hashed, with an 8-hour TTL:

```python
SESSION_TTL_SECONDS = 28800  # 8 hours
SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600

def _hash_session_token(token: str) -> str:
    """Hash a session token for storage."""
    return hashlib.sha256(token.encode()).hexdigest()

async def create_session(
    username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS
) -> str:
    """Create an opaque session token."""
    token = secrets.token_urlsafe(32)
    token_hash = _hash_session_token(token)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S")
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)",
            (token_hash, username, 1 if is_admin else 0, expires_str),
        )
        await db.commit()
    return token
```
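The opaque-token scheme above stores only a digest: the browser-held token never touches the database. A condensed sketch of the token/digest split:

```python
import hashlib
import secrets

SESSION_TTL_SECONDS = 28800
assert SESSION_TTL_SECONDS // 3600 == 8                   # the documented 8-hour TTL

token = secrets.token_urlsafe(32)                         # value set in the cookie
token_hash = hashlib.sha256(token.encode()).hexdigest()   # value stored in sessions table
assert len(token_hash) == 64                              # hex SHA-256 digest
assert token != token_hash                                # a DB leak does not yield usable tokens
```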

Logout clears both DB session state and browser cookie:

```python
@app.get("/logout")
async def logout(request: Request) -> RedirectResponse:
    """Clear the session cookie, delete session from DB, and redirect to login."""
    session_token = request.cookies.get("docsfy_session")
    if session_token:
        await delete_session(session_token)
    settings = get_settings()
    response = RedirectResponse(url="/login", status_code=302)
    response.delete_cookie(
        "docsfy_session",
        httponly=True,
        samesite="strict",
        secure=settings.secure_cookies,
    )
    return response
```

> **Warning:** `secure_cookies` defaults to `True`; browser session cookies will not be sent over plain HTTP.

## Public Paths

Only these paths are unauthenticated:

- `/login`
- `/login/`
- `/health`

`/health` is also used by runtime health checks:

```yaml
services:
  docsfy:
    env_file: .env
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

## Protected Endpoint Behavior

### Unauthenticated requests

Behavior is path-class dependent:

- Any protected non-API route (for example `/`, `/status/...`, `/docs/...`, `/admin`) -> `302` redirect to `/login`
- Any protected API route under `/api/*` -> `401` JSON `{ "detail": "Unauthorized" }`

Verified in tests:

```python
async def test_login_redirect_when_unauthenticated(
    unauthed_client: AsyncClient,
) -> None:
    """Browser requests to protected pages should redirect to /login."""
    response = await unauthed_client.get("/", follow_redirects=False)
    assert response.status_code == 302
    assert response.headers["location"] == "/login"


async def test_api_returns_401_when_unauthenticated(
    unauthed_client: AsyncClient,
) -> None:
    """API requests without auth should return 401."""
    response = await unauthed_client.get("/api/status")
    assert response.status_code == 401
    assert response.json()["detail"] == "Unauthorized"
```

### Role-based authorization

`docsfy` enforces role checks after authentication:

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )

def _require_admin(request: Request) -> None:
    """Raise 403 if the user is not an admin."""
    if not request.state.is_admin:
        raise HTTPException(status_code=403, detail="Admin access required")
```

- `viewer` users are read-only for write endpoints (`/api/generate`, delete/abort endpoints).
- `admin` role is required for `/admin` and `/api/admin/*`.
- `viewer` users can still rotate their own API key (the credential the login form labels as a password) via `/api/me/rotate-key` (by explicit design).

```python
# Don't call _require_write_access -- viewers should be able to change their password
if request.state.is_admin and not request.state.user:
    raise HTTPException(
        status_code=400,
        detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
    )
```

### Ownership and resource visibility

Project-scoped access checks intentionally return `404` (not `403`) when a user lacks access, to avoid leaking resource existence:

```python
async def _check_ownership(
    request: Request, project_name: str, project: dict[str, Any]
) -> None:
    """Raise 404 if the requesting user does not own the project (unless admin)."""
    if request.state.is_admin:
        return
    project_owner = str(project.get("owner", ""))
    if project_owner == request.state.username:
        return
    # Check if user has been granted access (scoped by project_owner)
    access = await get_project_access(project_name, project_owner=project_owner)
    if request.state.username in access:
        return
    raise HTTPException(status_code=404, detail="Not found")
```

```python
# GET /api/projects/{name} - returns 404 to avoid leaking existence
response = await ac.get("/api/projects/secret-proj")
assert response.status_code == 404
```

### Additional protected behavior

- Non-admin use of `repo_path` in generation is denied (`403`).
- Admin variant resolution can return `409` if multiple owners exist for same project/provider/model without disambiguation.
- If a session belongs to a deleted DB user, middleware invalidates access and returns `401` (API) or `302` (UI redirect).

> **Warning:** In-app login rate limiting is marked TODO; enforce rate limiting at reverse proxy/load balancer level.

## Configuration

Environment-level auth settings:

```env
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# Set to false for local HTTP development
# SECURE_COOKIES=false
```

Application defaults:

```python
class Settings(BaseSettings):
    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev
```

Startup hard-fails if `ADMIN_KEY` is missing or too short:

```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)

if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

> **Tip:** For local non-TLS development, set `SECURE_COOKIES=false` in `.env` so browser sessions work over `http://`.

## Test and Automation Coverage

Auth behavior is regression-tested in `tests/test_auth.py`, and the repo test command is defined in `tox.toml`:

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Note:** No dedicated GitHub/GitLab/Jenkins workflow files are present in this repository; automated auth verification currently depends on the tox/pytest path above.


---

Source: roles-and-permissions.md

# Roles and Permissions

docsfy uses role-based access control (RBAC) across both UI and API layers. Roles are stored in `src/docsfy/storage.py` and enforced in `src/docsfy/main.py`.

## Role Definitions

| Role | Intended use | Write-protected APIs | Admin APIs |
|---|---|---|---|
| `admin` | Full platform control | Allowed | Allowed |
| `user` | Normal project owner/contributor | Allowed | Denied |
| `viewer` | Read-only docs/project access | Denied | Denied |

```python
VALID_ROLES = frozenset({"admin", "user", "viewer"})
```

> **Note:** There are two admin paths in implementation:
> 1) the environment `ADMIN_KEY` account (`username == "admin"`), and  
> 2) a database user whose `role == "admin"`.

```python
# Determine the role
if is_admin:
    role = "admin"
    if not username:
        username = "admin"
else:
    assert user is not None  # guaranteed by the guard above
    role = str(user.get("role", "user"))
    username = str(user["username"])
    # Fix 6: DB user with admin role gets admin privileges
    if role == "admin":
        is_admin = True
```

## Authentication and Request Enforcement

Authentication accepts:

- `Authorization: Bearer <api_key>` header (API clients)
- `docsfy_session` cookie (browser sessions)

The only public routes are `/login` (including its trailing-slash variant) and `/health`.

```python
# Paths that do not require authentication
_PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})
...
if not user and not is_admin:
    # Not authenticated
    if request.url.path.startswith("/api/"):
        return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
    return RedirectResponse(url="/login", status_code=302)
```

So unauthenticated behavior is:

- **UI routes** → `302` redirect to `/login`
- **API routes** → `401 {"detail":"Unauthorized"}`
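
That branch can be condensed into a pure function for illustration (`decide_unauthenticated` is a hypothetical name; the real check lives in the auth middleware):

```python
PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

def decide_unauthenticated(path: str) -> tuple[int, str]:
    """Return (status, action) for an unauthenticated request to `path`."""
    if path in PUBLIC_PATHS:
        return 200, "allow"              # public routes pass through
    if path.startswith("/api/"):
        return 401, "json-unauthorized"  # API clients get a JSON 401
    return 302, "redirect:/login"        # browsers are redirected to /login
```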

## UI Capability Matrix

| UI action | admin | user | viewer | How it is enforced |
|---|---|---|---|---|
| Open dashboard (`/`) | ✅ | ✅ | ✅ | Auth middleware |
| See `Admin` link in header | ✅ | ❌ | ❌ | `dashboard.html` conditional |
| Open admin panel (`/admin`) | ✅ | ❌ | ❌ | `_require_admin()` |
| See Generate form | ✅ | ✅ | ❌ | `dashboard.html` conditional |
| Generate/regenerate/abort/delete controls | ✅ | ✅ | ❌ | UI conditional + API guard |
| View docs / download accessible variants | ✅ | ✅ | ✅ | ownership/grant resolution |
| Change own password button | ✅* | ✅ | ✅ | visible for all authenticated users |

\* `ADMIN_KEY` admin cannot rotate via `/api/me/rotate-key` (details below).

```html
{% if role == 'admin' %}
Admin
{% endif %}

{% if role != 'viewer' %}
...
{% endif %}

{% if role != 'viewer' %}
...
{% endif %}
```

## Write-Protected API Permissions

All non-admin/non-user write attempts are rejected by a shared guard:

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

### General write APIs (`admin` + `user` only)

| Endpoint | admin | user | viewer |
|---|---|---|---|
| `POST /api/generate` | ✅ | ✅ | ❌ (`403`) |
| `POST /api/projects/{name}/abort` | ✅ | ✅ | ❌ (`403`) |
| `POST /api/projects/{name}/{provider}/{model}/abort` | ✅ | ✅ | ❌ (`403`) |
| `DELETE /api/projects/{name}/{provider}/{model}` | ✅ | ✅ | ❌ (`403`) |
| `DELETE /api/projects/{name}` | ✅ | ✅ | ❌ (`403`) |

Additional restriction on generation source:

```python
# Fix 9: Local repo path access requires admin privileges
if gen_request.repo_path and not request.state.is_admin:
    raise HTTPException(
        status_code=403,
        detail="Local repo path access requires admin privileges",
    )
```

### Admin-only APIs (`admin` only)

| Endpoint | admin | user | viewer |
|---|---|---|---|
| `GET /admin` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `POST /api/admin/users` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `GET /api/admin/users` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `DELETE /api/admin/users/{username}` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `POST /api/admin/users/{username}/rotate-key` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `POST /api/admin/projects/{name}/access` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `GET /api/admin/projects/{name}/access` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `DELETE /api/admin/projects/{name}/access/{username}` | ✅ | ❌ (`403`) | ❌ (`403`) |

```python
def _require_admin(request: Request) -> None:
    """Raise 403 if the user is not an admin."""
    if not request.state.is_admin:
        raise HTTPException(status_code=403, detail="Admin access required")
```

## Ownership, Sharing, and Visibility Rules

docsfy enforces ownership boundaries plus explicit grants:

- owners can access their own variants
- admins can access all
- non-owners can access only if granted in `project_access`

```python
async def _check_ownership(
    request: Request, project_name: str, project: dict[str, Any]
) -> None:
    if request.state.is_admin:
        return
    project_owner = str(project.get("owner", ""))
    if project_owner == request.state.username:
        return
    access = await get_project_access(project_name, project_owner=project_owner)
    if request.state.username in access:
        return
    raise HTTPException(status_code=404, detail="Not found")
```

```python
if owner is not None and accessible and len(accessible) > 0:
    # Build OR conditions for each (name, owner) pair
    conditions = ["(owner = ?)"]
    ...
```

> **Warning:** Unauthorized project access intentionally returns `404` (not `403`) to avoid leaking resource existence.

### Shared-access route behavior

- Grant-aware routes use `_resolve_project()`:
  - `/api/projects/{name}/{provider}/{model}`
  - `/api/projects/{name}/{provider}/{model}/download`
  - `/docs/{project}/{provider}/{model}/{path}`
- Owner-scoped (non-admin) routes filter by `owner=request.state.username`:
  - `/api/projects/{name}`
  - `/api/projects/{name}/download`
  - `/docs/{project}/{path}`

> **Tip:** For users who received access via admin grant, prefer variant-scoped routes (`/{provider}/{model}`) for reliable access to shared projects.

## Password / API Key Rotation Semantics

Users (including `viewer`) can rotate their own key. This endpoint explicitly bypasses write-role restrictions.

```python
@app.post("/api/me/rotate-key")
async def rotate_own_key(request: Request) -> JSONResponse:
    """User rotates their own API key."""
    # Don't call _require_write_access -- viewers should be able to change their password
    if request.state.is_admin and not request.state.user:
        raise HTTPException(
            status_code=400,
            detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
        )
```

- `viewer` can rotate own key
- DB `admin` can rotate own key
- `ADMIN_KEY` admin cannot rotate through API; rotate the env var instead
- admin can rotate any user key via `/api/admin/users/{username}/rotate-key`

## Security and Configuration Snippets

`ADMIN_KEY` is mandatory and is also used for HMAC key hashing.

```bash
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# Set to false for local HTTP development
# SECURE_COOKIES=false
```

```python
admin_key: str = ""  # Required — validated at startup
secure_cookies: bool = True  # Set to False for local HTTP dev
```

Session cookie settings at login:

```python
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

Session tokens are opaque and stored hashed:

```python
token = secrets.token_urlsafe(32)
token_hash = _hash_session_token(token)
...
"INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)"
```

## Verification Coverage (Tests and Pipeline Config)

Role and permission behavior is covered in tests such as:

- `tests/test_auth.py`
- `tests/test_storage.py`
- `tests/test_main.py`

Example assertions:

```python
# Viewer is blocked from write API
response = await ac.post("/api/generate", json={"repo_url": "https://github.com/org/repo"})
assert response.status_code == 403
assert "Write access required" in response.json()["detail"]
```

```python
# Non-owner gets 404 (no resource existence leak)
response = await ac.get("/api/projects/secret-proj")
assert response.status_code == 404
```

Automated test command configured in `tox.toml`:

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Warning:** No repository workflow files were found under `.github/workflows`; if you enforce permissions checks in CI/CD, run the `tox` and pre-commit checks from your CI system explicitly.

---

Source: user-management.md

# User Management

docsfy uses API-key-based authentication with session cookies for browser workflows. User lifecycle operations (create, rotate password, delete) are admin-controlled.

## Authentication model

There are two admin paths:

1. **Environment admin**: username `admin` + `ADMIN_KEY`.
2. **Database admin user**: any username with role `admin`.

```python
# src/docsfy/main.py
# Check admin -- username must be "admin" and key must match
if username == "admin" and api_key == settings.admin_key:
    is_admin = True
    authenticated = True
else:
    # Check user key -- verify username matches the key's owner
    user = await get_user_by_key(api_key)
    if user and user["username"] == username:
        authenticated = True
        is_admin = user.get("role") == "admin"
```

> **Note:** In the UI, the login label says **Password**, but backend form/API field names use `api_key`.
## Required configuration

`ADMIN_KEY` is mandatory and must be at least 16 characters:

```python
# src/docsfy/main.py
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)

if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

```env
# .env.example
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# Set to false for local HTTP development
# SECURE_COOKIES=false
```

Session cookies are `HttpOnly`, `SameSite=strict`, and `secure` by default:

```python
# src/docsfy/main.py
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

> **Tip:** For local non-HTTPS development, set `SECURE_COOKIES=false` so browser sessions work over `http://`.

## Roles and permissions

Roles are defined in storage:

```python
# src/docsfy/storage.py
VALID_ROLES = frozenset({"admin", "user", "viewer"})
```

Write operations are blocked for `viewer`:

```python
# src/docsfy/main.py
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

Dashboard UI also hides write controls for viewers and shows the Admin link only for admins.

## Creating users

Only admins can create users (`/admin` UI or `POST /api/admin/users`).

### Admin panel workflow

1. Log in as an admin.
2. Open `/admin`.
3. Enter username and select role (`user`, `admin`, `viewer`).
4. Submit **Create User**.
5. Save the returned password immediately.

```html
<!-- form markup omitted -->
```

```javascript
// src/docsfy/templates/admin.html
const resp = await fetch("/api/admin/users", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    credentials: "same-origin",
    body: JSON.stringify({username: username, role: role})
});
const data = await resp.json();
document.getElementById("new-key-value").textContent = data.api_key;
```

```python
# src/docsfy/main.py
@app.post("/api/admin/users")
async def create_user_endpoint(request: Request) -> JSONResponse:
    _require_admin(request)
    body = await request.json()
    username = body.get("username", "")
    role = body.get("role", "user")
    username, raw_key = await create_user(username, role)
    return JSONResponse(
        content={"username": username, "api_key": raw_key, "role": role},
        headers={"Cache-Control": "no-store"},
    )
```

> **Warning:** Generated passwords are returned once (`api_key`/`new_api_key`) and are not retrievable later.

## Reserved usernames

`admin` is reserved (case-insensitive) for the environment-admin login convention.

```python
# src/docsfy/storage.py
if username.lower() == "admin":
    msg = "Username 'admin' is reserved"
    raise ValueError(msg)
```

Validation also enforces:

- length: 2-50 chars
- first char: alphanumeric
- allowed chars after first: alphanumeric, `.`, `_`, `-`

```python
# src/docsfy/storage.py
if not re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$", username):
    msg = f"Invalid username: '{username}'. Must be 2-50 alphanumeric characters, dots, hyphens, underscores."
    raise ValueError(msg)
```

Test coverage confirms case-insensitive reservation:

```python
# tests/test_auth.py
response = await admin_client.post(
    "/api/admin/users",
    json={"username": "Admin", "role": "user"},
)
assert response.status_code == 400
assert "reserved" in response.json()["detail"]
```

> **Warning:** Do not assign `admin` (any case) to regular users; creation is intentionally blocked.

## Deleting users

User deletion is admin-only and irreversible from the UI flow.

### Admin panel workflow

1. Open `/admin`.
2. Click **Delete** on the target user row.
3. Confirm in modal dialog.
4. User row is removed after successful API response.

```javascript
// src/docsfy/templates/admin.html
const resp = await fetch("/api/admin/users/" + encodeURIComponent(username), {
    method: "DELETE",
    credentials: "same-origin",
});
```

Backend self-delete guard:

```python
# src/docsfy/main.py
if username == request.state.username:
    raise HTTPException(status_code=400, detail="Cannot delete your own account")
```

Delete behavior in storage:

```python
# src/docsfy/storage.py
await db.execute("DELETE FROM sessions WHERE username = ?", (username,))
await db.execute("DELETE FROM projects WHERE owner = ?", (username,))
await db.execute("DELETE FROM project_access WHERE project_owner = ?", (username,))
await db.execute("DELETE FROM project_access WHERE username = ?", (username,))
cursor = await db.execute("DELETE FROM users WHERE username = ?", (username,))
```

When a deleted user still has an old session cookie, requests are rejected/redirected:

```python
# src/docsfy/main.py
if username != "admin":
    user = await get_user_by_username(username)
    if not user:
        if request.url.path.startswith("/api/"):
            return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
        return RedirectResponse(url="/login", status_code=302)
```

> **Warning:** Deleting a user also deletes that user’s active sessions, owned project records, and ACL entries.

## Password rotation workflows

### Admin rotates another user’s password

`POST /api/admin/users/{username}/rotate-key`

Optional JSON body: `{"new_key": "..."}` (must be at least 16 chars). Empty body auto-generates a new password.

### User rotates own password

`POST /api/me/rotate-key`

Also supports optional `new_key`, invalidates existing sessions, and clears the current session cookie.

```python
# src/docsfy/main.py
if request.state.is_admin and not request.state.user:
    raise HTTPException(
        status_code=400,
        detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
    )
```

> **Note:** `viewer` users are read-only for project writes, but they are still allowed to rotate their own password.

## Security storage notes

User API keys are not stored raw; hashes use HMAC with `ADMIN_KEY` as the secret:

```python
# src/docsfy/storage.py
# NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
# invalidate all existing api_key_hash values, requiring all users to
# regenerate their API keys.
return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()
```

> **Warning:** Rotating `ADMIN_KEY` invalidates all existing stored user API-key hashes.

## User management API quick reference

| Endpoint | Method | Access | Purpose |
|---|---|---|---|
| `/admin` | GET | admin | Admin panel UI |
| `/api/admin/users` | GET | admin | List users |
| `/api/admin/users` | POST | admin | Create user and return one-time `api_key` |
| `/api/admin/users/{username}` | DELETE | admin | Delete user |
| `/api/admin/users/{username}/rotate-key` | POST | admin | Rotate a user password |
| `/api/me/rotate-key` | POST | authenticated | Rotate own password |
| `/login` | GET/POST | public | Login page and credential submit |
| `/logout` | GET | authenticated | End session |

## Verification coverage

User management behavior is covered by tests in `tests/test_auth.py` and `tests/test_storage.py` (reserved usernames, self-delete guard, cookie/session behavior, role behavior, password rotation, ACL cleanup).

Test automation entrypoint:

```toml
# tox.toml
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

---

Source: project-access-grants.md

# Project Access Grants

`docsfy` implements project sharing as **owner-scoped ACLs**. A grant is not global to a project name; it is scoped to a `(project_name, project_owner)` pair.

## Access Model (Owner-Scoped)

Each project variant is keyed by owner, and access grants are keyed by `(project_name, project_owner, username)`.
```56:73:src/docsfy/storage.py
await db.execute("""
    CREATE TABLE IF NOT EXISTS projects (
        name TEXT NOT NULL,
        ai_provider TEXT NOT NULL DEFAULT '',
        ai_model TEXT NOT NULL DEFAULT '',
        owner TEXT NOT NULL DEFAULT '',
        repo_url TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'generating',
        current_stage TEXT,
        last_commit_sha TEXT,
        last_generated TEXT,
        page_count INTEGER DEFAULT 0,
        error_message TEXT,
        plan_json TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (name, ai_provider, ai_model, owner)
    )
""")
```

```237:244:src/docsfy/storage.py
await db.execute("""
    CREATE TABLE IF NOT EXISTS project_access (
        project_name TEXT NOT NULL,
        project_owner TEXT NOT NULL DEFAULT '',
        username TEXT NOT NULL,
        PRIMARY KEY (project_name, project_owner, username)
    )
""")
```

Because `project_access` does not include provider/model, a grant applies to **all variants** of that project for that owner.

```392:405:src/docsfy/storage.py
async def grant_project_access(
    project_name: str, username: str, project_owner: str = ""
) -> None:
    """Grant a user access to all variants of a project."""
    if not project_owner:
        msg = "project_owner is required for access grants"
        raise ValueError(msg)
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT OR IGNORE INTO project_access (project_name, project_owner, username) VALUES (?, ?, ?)",
            (project_name, project_owner, username),
        )
        await db.commit()
```

> **Note:** Project sharing is API-first. The admin HTML page in `src/docsfy/templates/admin.html` manages users, while grant/revoke flows are exercised via API calls in `test-plans/e2e-ui-test-plan.md`.

## Grant/Revoke/List APIs

All project-access APIs are admin-only.
```1203:1206:src/docsfy/main.py
def _require_admin(request: Request) -> None:
    """Raise 403 if the user is not an admin."""
    if not request.state.is_admin:
        raise HTTPException(status_code=403, detail="Admin access required")
```

```1266:1310:src/docsfy/main.py
@app.post("/api/admin/projects/{name}/access")
async def grant_access(request: Request, name: str) -> dict[str, str]:
    _require_admin(request)
    body = await request.json()
    username = body.get("username", "")
    project_owner = body.get("owner", "")
    if not username:
        raise HTTPException(status_code=400, detail="Username is required")
    if not project_owner:
        raise HTTPException(status_code=400, detail="Project owner is required")
    # Validate user exists
    user = await get_user_by_username(username)
    if not user:
        raise HTTPException(status_code=404, detail=f"User '{username}' not found")
    # Validate project exists for the specified owner
    variants = await list_variants(name, owner=project_owner)
    if not variants:
        raise HTTPException(
            status_code=404,
            detail=f"Project '{name}' not found for owner '{project_owner}'",
        )
    await grant_project_access(name, username, project_owner=project_owner)
    logger.info(
        f"[AUDIT] Admin '{request.state.username}' granted '{username}' access to '{name}' (owner: '{project_owner}')"
    )
    return {"granted": name, "username": username, "owner": project_owner}


@app.delete("/api/admin/projects/{name}/access/{username}")
async def revoke_access(request: Request, name: str, username: str) -> dict[str, str]:
    _require_admin(request)
    project_owner = request.query_params.get("owner", "")
    await revoke_project_access(name, username, project_owner=project_owner)
    logger.info(
        f"[AUDIT] Admin '{request.state.username}' revoked '{username}' access to '{name}' (owner: '{project_owner}')"
    )
    return {"revoked": name, "username": username}


@app.get("/api/admin/projects/{name}/access")
async def list_access(request: Request, name: str) -> dict[str, Any]:
    _require_admin(request)
    project_owner = request.query_params.get("owner", "")
    users = await get_project_access(name, project_owner=project_owner)
    return {"project": name, "owner": project_owner, "users": users}
```

Real API usage examples in the repo:

```1994:1994:test-plans/e2e-ui-test-plan.md
agent-browser javascript "fetch('/api/admin/projects/for-testing-only/access', { method: 'POST', headers: {'Content-Type': 'application/json'}, credentials: 'same-origin', body: JSON.stringify({username: 'testviewer-e2e', owner: 'testuser-e2e'}) }).then(r => r.json()).then(d => JSON.stringify(d))"
```

```2054:2054:test-plans/e2e-ui-test-plan.md
agent-browser eval "fetch('/api/admin/projects/for-testing-only/access?owner=testuser-e2e', {credentials:'same-origin'}).then(r => r.json())"
```

```2069:2069:test-plans/e2e-ui-test-plan.md
agent-browser eval "fetch('/api/admin/projects/for-testing-only/access/testviewer-e2e?owner=testuser-e2e', {method:'DELETE', credentials:'same-origin'}).then(r => r.status)"
```

> **Warning:** Always pass `owner` for `GET /api/admin/projects/{name}/access` and `DELETE /api/admin/projects/{name}/access/{username}`. These handlers default `owner` to `""`, so omitting it usually targets no real owner-scoped grants.
## Non-Owner Visibility Rules

For non-admin users, `docsfy` combines owned projects with explicitly granted `(name, owner)` tuples on dashboard and status APIs:

```334:345:src/docsfy/main.py
@app.get("/", response_class=HTMLResponse)
async def dashboard(request: Request) -> HTMLResponse:
    settings = get_settings()
    username = request.state.username
    is_admin = request.state.is_admin
    if is_admin:
        projects = await list_projects()  # admin sees all
    else:
        accessible = await get_user_accessible_projects(username)
        projects = await list_projects(owner=username, accessible=accessible)
```

```366:379:src/docsfy/storage.py
async def list_projects(
    owner: str | None = None,
    accessible: list[tuple[str, str]] | None = None,
) -> list[dict[str, str | int | None]]:
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        if owner is not None and accessible and len(accessible) > 0:
            # Build OR conditions for each (name, owner) pair
            conditions = ["(owner = ?)"]
            params: list[str] = [owner]
            for proj_name, proj_owner in accessible:
                conditions.append("(name = ? AND owner = ?)")
```

Visibility checks return `404` for unauthorized project access (to avoid existence leaks), not `403`:

```194:207:src/docsfy/main.py
async def _check_ownership(
    request: Request, project_name: str, project: dict[str, Any]
) -> None:
    """Raise 404 if the requesting user does not own the project (unless admin)."""
    if request.state.is_admin:
        return
    project_owner = str(project.get("owner", ""))
    if project_owner == request.state.username:
        return
    # Check if user has been granted access (scoped by project_owner)
    access = await get_project_access(project_name, project_owner=project_owner)
    if request.state.username in access:
        return
    raise HTTPException(status_code=404, detail="Not found")
```

```580:608:tests/test_auth.py
async def test_non_owner_cannot_access_project(_init_db: None) -> None:
    """Non-admin user should not see projects owned by others."""
    from docsfy.main import _generating, app
    from docsfy.storage import create_user, save_project

    _generating.clear()
    _, bob_key = await create_user("bob-noowner")
    await save_project(
        name="secret-proj",
        repo_url="https://github.com/org/secret.git",
        ai_provider="claude",
        ai_model="opus",
        owner="alice-owner2",
    )
    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
        headers={"Authorization": f"Bearer {bob_key}"},
    ) as ac:
        # GET /api/projects/{name} - returns 404 to avoid leaking existence
        response = await ac.get("/api/projects/secret-proj")
        assert response.status_code == 404

        # GET /api/projects/{name}/{provider}/{model}
        response = await ac.get("/api/projects/secret-proj/claude/opus")
        assert response.status_code == 404
```

### Route Behavior Matrix (Non-Admin)

- Grant-aware routes:
  - `/`
  - `/api/status`
  - `/status/{name}/{provider}/{model}`
  - `/api/projects/{name}/{provider}/{model}`
  - `/api/projects/{name}/{provider}/{model}/download`
  - `/docs/{project}/{provider}/{model}/{path:path}`
- Owner-only (for non-admin) routes:
  - `/api/projects/{name}`
  - `/api/projects/{name}/download`
  - `/docs/{project}/{path:path}`

Evidence for owner-only generic routes:

```1115:1123:src/docsfy/main.py
@app.get("/api/projects/{name}")
async def get_project_details(request: Request, name: str) -> dict[str, Any]:
    name = _validate_project_name(name)
    if request.state.is_admin:
        variants = await list_variants(name)
    else:
        variants = await list_variants(name, owner=request.state.username)
    if not variants:
        raise HTTPException(status_code=404, detail=f"Project '{name}' not found")
```

```1158:1165:src/docsfy/main.py
@app.get("/api/projects/{name}/download")
async def download_project(request: Request, name: str) -> StreamingResponse:
    name = _validate_project_name(name)
    if request.state.is_admin:
        latest = await get_latest_variant(name)
    else:
        latest = await get_latest_variant(name, owner=request.state.username)
```

```1406:1418:src/docsfy/main.py
@app.get("/docs/{project}/{path:path}")
async def serve_docs(
    request: Request, project: str, path: str = "index.html"
) -> FileResponse:
    """Serve the most recently generated variant."""
    if not path or path == "/":
        path = "index.html"
    project = _validate_project_name(project)
    if request.state.is_admin:
        latest = await get_latest_variant(project)
    else:
        latest = await get_latest_variant(project, owner=request.state.username)
```

> **Tip:** For shared access, use variant-specific URLs (`/docs/{project}/{provider}/{model}/...` and `/api/projects/{name}/{provider}/{model}...`) because those routes resolve owner grants via `_resolve_project`.
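
The `list_projects` filter builds one OR branch per granted `(name, owner)` tuple; here is a runnable sketch of just that condition-building step (`build_filter` is a hypothetical helper, extracted for illustration):

```python
def build_filter(owner, accessible):
    """Build (where_clause, params) matching owned or explicitly granted projects."""
    conditions = ["(owner = ?)"]        # always include the user's own projects
    params = [owner]
    for proj_name, proj_owner in accessible:
        # One branch per granted (name, owner) pair, keeping grants owner-scoped
        conditions.append("(name = ? AND owner = ?)")
        params.extend([proj_name, proj_owner])
    return " OR ".join(conditions), params

clause, params = build_filter("bob", [("docs", "alice"), ("api", "carol")])
# clause: "(owner = ?) OR (name = ? AND owner = ?) OR (name = ? AND owner = ?)"
```

Parameterized placeholders keep the query safe even though the clause itself is assembled dynamically.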
## Revocation and Cleanup Semantics

Revocation is enforced at route level, not just UI hiding:

````2207:2228:test-plans/e2e-ui-test-plan.md
**Try accessing docs directly:**
```
agent-browser eval "fetch('/docs/for-testing-only/gemini/gemini-2.5-flash/index.html', {credentials:'same-origin'}).then(r => r.status)"
```

**Try accessing status page directly:**
```
agent-browser eval "fetch('/status/for-testing-only/gemini/gemini-2.5-flash', {credentials:'same-origin'}).then(r => r.status)"
```

**Try accessing download API directly:**
```
agent-browser eval "fetch('/api/projects/for-testing-only/gemini/gemini-2.5-flash/download', {credentials:'same-origin'}).then(r => r.status)"
```

**Check:** All direct URL accesses return 404, not just hidden from the dashboard.

**Expected result:**
- Docs endpoint returns `404`
- Status page endpoint returns `404`
- Download API endpoint returns `404`
- Revocation is enforced at the route level, not just UI level
````

`docsfy` also performs ACL cleanup when data is deleted:

```453:480:src/docsfy/storage.py
async def delete_project(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str | None = None
) -> bool:
    async with aiosqlite.connect(DB_PATH) as db:
        query = (
            "DELETE FROM projects WHERE name = ? AND ai_provider = ? AND ai_model = ?"
        )
        params: list[str] = [name, ai_provider, ai_model]
        if owner is not None:
            query += " AND owner = ?"
            params.append(owner)
        cursor = await db.execute(query, params)
        # Clean up project_access if no more variants remain for this name+owner
        if cursor.rowcount > 0 and owner is not None:
            remaining = await db.execute(
                "SELECT COUNT(*) FROM projects WHERE name = ? AND owner = ?",
                (name, owner),
            )
            row = await remaining.fetchone()
            if row and row[0] == 0:
                await db.execute(
                    "DELETE FROM project_access WHERE project_name = ? AND project_owner = ?",
                    (name, owner),
                )
        await db.commit()
        return cursor.rowcount > 0
```

```646:657:src/docsfy/storage.py
async def delete_user(username: str) -> bool:
    """Delete a user by username, invalidating all their sessions and cleaning up ACLs."""
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute("DELETE FROM sessions WHERE username = ?", (username,))
        # Clean up owned projects and their access entries
        await db.execute("DELETE FROM projects WHERE owner = ?", (username,))
        await db.execute(
            "DELETE FROM project_access WHERE project_owner = ?", (username,)
        )
        # Clean up ACL entries where user was granted access
        await db.execute("DELETE FROM project_access WHERE username = ?", (username,))
```

```446:470:tests/test_storage.py
async def test_delete_project_cleans_up_access(db_path: Path) -> None:
    from docsfy.storage import (
        delete_project,
        get_project_access,
        grant_project_access,
        save_project,
    )

    await save_project(
        name="cleanup-proj",
        repo_url="https://github.com/org/repo.git",
        ai_provider="claude",
        ai_model="opus",
        owner="testuser",
    )
    await grant_project_access("cleanup-proj", "alice", project_owner="testuser")

    # Delete the only variant
    await delete_project(
        "cleanup-proj", ai_provider="claude", ai_model="opus", owner="testuser"
    )

    # Access entries should be cleaned up
    users = await get_project_access("cleanup-proj", project_owner="testuser")
    assert len(users) == 0
```

## Viewer and Read-Only Behavior with Grants

Viewers can see assigned projects, but write operations remain blocked.

```185:191:src/docsfy/main.py
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

```1481:1492:src/docsfy/templates/dashboard.html
{% if variant.status == 'ready' %}
View Docs
Download
{% if role != 'viewer' %}
{% endif %}
{% if role != 'viewer' %}
{{ regen_controls(variant, repo_name, default_provider, default_model, known_models) }}
{% endif %}
```

```668:700:tests/test_auth.py
async def test_viewer_sees_assigned_projects(_init_db: None) -> None:
    """A viewer with granted access should see assigned projects."""
    from docsfy.main import _generating, app
    from docsfy.storage import create_user, grant_project_access, save_project

    _generating.clear()
    _, viewer_key = await create_user("viewer-assigned", role="viewer")

    # Create a project owned by someone else
    await save_project(
        name="assigned-proj",
        repo_url="https://github.com/org/assigned.git",
        ai_provider="claude",
        ai_model="opus",
        owner="other-owner",
    )
    # Grant viewer access to the project (scoped by project owner)
    await grant_project_access(
        "assigned-proj", "viewer-assigned", project_owner="other-owner"
    )

    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
        headers={"Authorization": f"Bearer {viewer_key}"},
    ) as ac:
        response = await ac.get("/api/status")
        assert response.status_code == 200
        projects = response.json()["projects"]
        project_names = [p["name"] for p in projects]
        assert "assigned-proj" in project_names
```

## Required Configuration

`ADMIN_KEY` is mandatory and must be at least 16 characters.
```80:89:src/docsfy/main.py @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` ```16:22:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```6:8:docker-compose.yaml env_file: .env volumes: - ./data:/data ``` > **Warning:** Keep `SECURE_COOKIES=true` outside local HTTP development; admin APIs and grants are protected by authenticated sessions/bearer auth. ## Validation Coverage Automated tests are configured through `tox`: ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: api-key-rotation.md # API Key Rotation docsfy supports two API key rotation flows: - **Self-service rotation** for the currently authenticated user. - **Admin-initiated rotation** for any target user. In the UI, API keys are labeled as **Password**, but server-side auth and storage use API key semantics. > **Note:** Login uses `username` + `api_key`, and rotation responses return `new_api_key`. 
```163:167:src/docsfy/templates/login.html
<!-- API-key input rendered with a "Password" label (markup elided) -->
```

## Rotation Paths

| Path | Endpoint | Who can use it | `new_key` behavior | Session effect |
|---|---|---|---|---|
| Self-service | `POST /api/me/rotate-key` | Authenticated DB users (`admin`, `user`, `viewer`) | Optional; omit to auto-generate | All user sessions invalidated; current browser cookie cleared |
| Admin-initiated | `POST /api/admin/users/{username}/rotate-key` | Admin only | Optional; omit to auto-generate | All target user sessions invalidated |

### Self-Service Rotation

```1318:1353:src/docsfy/main.py
@app.post("/api/me/rotate-key")
async def rotate_own_key(request: Request) -> JSONResponse:
    """User rotates their own API key."""
    # Don't call _require_write_access -- viewers should be able to change their password
    if request.state.is_admin and not request.state.user:
        raise HTTPException(
            status_code=400,
            detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
        )
    body = await request.json()
    custom_key = body.get("new_key")  # Optional -- if provided, use it
    username = request.state.username
    try:
        new_key = await rotate_user_key(username, custom_key=custom_key)
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc)) from exc
    logger.info(f"[AUDIT] User '{username}' rotated their own API key")
    # Clear current session -- user must re-login with new key
    session_token = request.cookies.get("docsfy_session")
    if session_token:
        await delete_session(session_token)
    settings = get_settings()
    response = JSONResponse(
        content={"username": username, "new_api_key": new_key},
        headers={"Cache-Control": "no-store"},
    )
    response.delete_cookie(
        "docsfy_session",
        httponly=True,
        samesite="strict",
        secure=settings.secure_cookies,
    )
    return response
```

The dashboard calls this endpoint from the **Change Password** action:

```2432:2460:src/docsfy/templates/dashboard.html
async function rotateOwnKey() {
  var newKey = await modalPrompt('Change Password', 'Enter new password (min 16
characters), or leave empty to auto-generate:', 'Minimum 16 characters', '', 'password'); if (newKey === null) return; // cancelled var body = {}; if (newKey.trim()) { if (newKey.trim().length < 16) { await modalAlert('Invalid Password', 'Password must be at least 16 characters long.'); return; } body.new_key = newKey.trim(); } try { var resp = await fetch('/api/me/rotate-key', { method: 'POST', headers: {'Content-Type': 'application/json'}, credentials: 'same-origin', body: JSON.stringify(body), }); // ... await modalAlert('Password Changed', 'Your new password (save it now!):\n\n' + data.new_api_key + '\n\nYou will be redirected to login.'); window.location.href = '/login'; } catch (err) { await modalAlert('Error', 'Failed: ' + err.message); } } ``` > **Tip:** Leave `new_key` empty to let the server generate a strong random key (`docsfy_...`). > **Warning:** If you are authenticated via the `ADMIN_KEY` super-admin identity, self-service rotation is blocked. Rotate `ADMIN_KEY` in environment/config instead. 
### Admin-Initiated Rotation ```1356:1374:src/docsfy/main.py @app.post("/api/admin/users/{username}/rotate-key") async def admin_rotate_key(request: Request, username: str) -> JSONResponse: """Admin rotates a user's API key.""" _require_admin(request) body = await request.json() custom_key = body.get("new_key") try: new_key = await rotate_user_key(username, custom_key=custom_key) except ValueError as exc: detail = str(exc) status = 404 if "not found" in detail else 400 raise HTTPException(status_code=status, detail=detail) from exc logger.info( f"[AUDIT] Admin '{request.state.username}' rotated API key for user '{username}'" ) return JSONResponse( content={"username": username, "new_api_key": new_key}, headers={"Cache-Control": "no-store"}, ) ``` Admin UI trigger: ```584:602:src/docsfy/templates/admin.html var newKey = await modalPrompt("Change Password", "Enter new password for '" + username + "' (min 16 characters), or leave empty to auto-generate:", "Minimum 16 characters", "", "password"); if (newKey === null) return; var body = {}; if (newKey.trim()) { if (newKey.trim().length < 16) { showAlert('error', 'Password must be at least 16 characters long.'); return; } body.new_key = newKey.trim(); } fetch('/api/admin/users/' + encodeURIComponent(username) + '/rotate-key', { method: 'POST', headers: {'Content-Type': 'application/json'}, credentials: 'same-origin', redirect: 'error', body: JSON.stringify(body), }) ``` ## Validation Rules Server-side key validation is intentionally minimal and explicit: ```19:29:src/docsfy/storage.py MIN_KEY_LENGTH = 16 def validate_api_key(key: str) -> None: """Validate API key meets minimum requirements.""" if len(key) < MIN_KEY_LENGTH: msg = f"API key must be at least {MIN_KEY_LENGTH} characters long" raise ValueError(msg) ``` Startup validation for `ADMIN_KEY`: ```83:89:src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: 
logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` What this means in practice: - `new_key` is optional. - If provided, it must be **at least 16 characters**. - There is no additional server-side complexity/character-class validation. - Admin rotation for a missing user returns `404`. ## Session Invalidation Behavior Key rotation invalidates sessions in storage, then self-service rotation also clears the current browser cookie. ```724:743:src/docsfy/storage.py async def rotate_user_key(username: str, custom_key: str | None = None) -> str: """Generate or set a new API key for a user. Returns the raw new key.""" if custom_key: validate_api_key(custom_key) raw_key = custom_key else: raw_key = generate_api_key() key_hash = hash_api_key(raw_key) async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute( "UPDATE users SET api_key_hash = ? WHERE username = ?", (key_hash, username), ) if cursor.rowcount == 0: msg = f"User '{username}' not found" raise ValueError(msg) # Invalidate all existing sessions for this user await db.execute("DELETE FROM sessions WHERE username = ?", (username,)) await db.commit() return raw_key ``` Session and cookie settings relevant to post-rotation re-authentication: ```21:22:src/docsfy/storage.py SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 ``` ```297:304:src/docsfy/main.py response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` Outcome summary: - Old API key stops authenticating immediately. - Existing sessions for that user are removed from the database. - Self-service rotation removes the current `docsfy_session` cookie and forces re-login. - Admin-initiated rotation logs out the target user(s), not the acting admin. 
## Configuration ```1:2:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars ``` ```27:28:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` ```16:23:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` `docsfy` stores only HMAC hashes of API keys, not raw keys: ```588:601:src/docsfy/storage.py def hash_api_key(key: str, hmac_secret: str = "") -> str: """Hash an API key with HMAC-SHA256 for storage. Uses ADMIN_KEY as the HMAC secret so that even if the source is read, keys cannot be cracked without the environment secret. """ # NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will # invalidate all existing api_key_hash values, requiring all users to # regenerate their API keys. secret = hmac_secret or os.getenv("ADMIN_KEY", "") ``` > **Warning:** Rotating `ADMIN_KEY` changes the HMAC secret and invalidates all stored user key hashes. Plan a coordinated user key re-issuance. ## Verified Test Coverage Rotation behavior is covered by automated tests: ```709:745:tests/test_auth.py async def test_user_rotates_own_key(_init_db: None) -> None: """A user can rotate their own API key, invalidating the old one.""" # ... 
resp = await ac.post( "/api/me/rotate-key", cookies={"docsfy_session": cookie}, json={}, ) assert resp.status_code == 200 data = resp.json() assert "new_api_key" in data assert data["new_api_key"] != key # Old key should no longer work for login resp = await ac.post( "/login", data={"username": "rotatetest", "api_key": key}, follow_redirects=False, ) assert resp.status_code != 302 # login should fail ``` ```874:898:tests/test_auth.py async def test_reject_short_custom_key(_init_db: None) -> None: """A custom key shorter than 16 characters should be rejected.""" # ... resp = await ac.post( "/api/me/rotate-key", cookies={"docsfy_session": cookie}, json={"new_key": "short"}, ) assert resp.status_code == 400 assert "16 characters" in resp.json()["detail"] ``` ```770:775:tests/test_auth.py async def test_admin_rotates_nonexistent_user_key( admin_client: AsyncClient, ) -> None: """Admin rotating key for a non-existent user should return 404.""" resp = await admin_client.post("/api/admin/users/no-such-user/rotate-key", json={}) assert resp.status_code == 404 ``` Repository test runner config: ```5:7:tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: security-controls.md # Security Controls `docsfy` implements layered controls around repository input, filesystem access, rendered HTML, and operational auditability. ## SSRF checks `/api/generate` applies two validation layers before cloning remote repositories: 1. **Schema-level URL validation** (`GenerateRequest`) limits accepted formats to Git-style HTTPS/SSH URLs. 2. **Runtime SSRF guard** (`_reject_private_url`) blocks localhost/private targets, including DNS names that resolve to private IPs. 
```python # src/docsfy/models.py @field_validator("repo_url") @classmethod def validate_repo_url(cls, v: str | None) -> str | None: if v is None: return v https_pattern = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$" ssh_pattern = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$" if not re.match(https_pattern, v) and not re.match(ssh_pattern, v): msg = f"Invalid git repository URL: '{v}'" raise ValueError(msg) return v ``` ```python # src/docsfy/main.py if gen_request.repo_url: await _reject_private_url(gen_request.repo_url) # ... if hostname in ("localhost", "127.0.0.1", "::1", "0.0.0.0"): raise HTTPException( status_code=400, detail="Repository URL must not target localhost or private networks", ) # Check if hostname is an IP address in private range try: addr = ipaddress.ip_address(hostname) if not addr.is_global: raise HTTPException( status_code=400, detail="Repository URL must not target localhost or private networks", ) except ValueError: # hostname is a DNS name - resolve and check resolved = await loop.run_in_executor( None, socket.getaddrinfo, hostname, None, socket.AF_UNSPEC, socket.SOCK_STREAM ) for _family, _socktype, _proto, _canonname, sockaddr in resolved: ip_str = sockaddr[0] addr = ipaddress.ip_address(ip_str) if not addr.is_global: raise HTTPException( status_code=400, detail="Repository URL resolves to a private network address", ) ``` Test coverage includes explicit SSRF assertions: ```python # tests/test_main.py with pytest.raises(HTTPException) as exc_info: await _reject_private_url("https://evil.com/org/repo") assert exc_info.value.status_code == 400 response = await client.post( "/api/generate", json={"repo_url": "https://localhost/org/repo.git"}, ) assert response.status_code in (400, 422) ``` > **Note:** `_reject_private_url` is intentionally described in-code as **basic SSRF mitigation**; deeper controls (for example, DNS rebinding defenses) are expected at network/firewall layers. 
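As a rough illustration of the literal-host cases that guard covers, the checks can be reproduced with the standard library alone. `is_private_target` is a hypothetical helper that deliberately skips the DNS-resolution step shown above, so it is a sketch of the first half of the guard only:

```python
import ipaddress
from urllib.parse import urlparse


def is_private_target(repo_url: str) -> bool:
    """Return True if the URL's host is a literal localhost/private address.

    Sketch only: DNS names fall through (returning False) -- the real guard
    resolves them and checks every returned address as well.
    """
    host = urlparse(repo_url).hostname or ""
    if host in ("localhost", "0.0.0.0"):
        return True
    try:
        # is_global is False for loopback, RFC 1918, link-local, ULA, etc.
        return not ipaddress.ip_address(host).is_global
    except ValueError:
        return False  # not an IP literal; needs DNS resolution to judge
```

Using `is_global` (rather than only `is_private`) is what also catches loopback and link-local ranges in one check.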
--- ## Path traversal protections Path safety is enforced at multiple points, not just at route parsing. ### 1) Route/project identifier validation Project names are constrained to alphanumeric + `.` `_` `-` patterns. ```python # src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") return name ``` ### 2) Filesystem segment validation for project paths `owner`, `ai_provider`, and `ai_model` path segments are rejected if they contain traversal markers. ```python # src/docsfy/storage.py def _validate_owner(owner: str) -> str: """Validate owner segment to prevent path traversal.""" if not owner: return "_default" if "/" in owner or "\\" in owner or ".." in owner or owner.startswith("."): msg = f"Invalid owner: '{owner}'" raise ValueError(msg) return owner def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: # Sanitize path segments to prevent traversal for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]: if ( "/" in segment or "\\" in segment or ".." in segment or segment.startswith(".") ): msg = f"Invalid {segment_name}: '{segment}'" raise ValueError(msg) safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` ### 3) Canonical path boundary checks when serving docs Even with validated project names, requested file paths are resolved and forced to stay inside `site_dir`. 
```python
# src/docsfy/main.py
file_path = site_dir / path
try:
    file_path.resolve().relative_to(site_dir.resolve())
except ValueError as exc:
    raise HTTPException(status_code=403, detail="Access denied") from exc
if not file_path.exists() or not file_path.is_file():
    raise HTTPException(status_code=404, detail="File not found")
return FileResponse(file_path)
```

### 4) Slug validation before cache/file writes and deletes

Generation/render steps reject or skip path-unsafe slugs.

```python
# src/docsfy/generator.py
if "/" in slug or "\\" in slug or slug.startswith(".") or ".." in slug:
    msg = f"Invalid page slug: '{slug}'"
    raise ValueError(msg)

# src/docsfy/renderer.py
for slug, content in pages.items():
    if "/" in slug or "\\" in slug or slug.startswith(".") or ".." in slug:
        logger.warning(f"Skipping invalid slug: {slug}")
    else:
        valid_pages[slug] = content
```

```python
# src/docsfy/main.py
if (
    "/" in slug
    or "\\" in slug
    or ".." in slug
    or slug.startswith(".")
):
    logger.warning(
        f"[{project_name}] Skipping invalid slug from incremental planner: {slug}"
    )
    continue
cache_file = cache_dir / f"{slug}.md"
try:
    cache_file.resolve().relative_to(cache_dir.resolve())
except ValueError:
    logger.warning(f"[{project_name}] Path traversal attempt in slug: {slug}")
    continue
```

---

## HTML sanitization

AI-generated markdown is converted to HTML and then sanitized before rendering.

### Sanitization behavior

- Removes `<script>` tags together with their content
- Removes `iframe`, `object`, `embed`, and `form` tags
- Strips inline `on*` event-handler attributes
- Allowlists `href`/`src` URL schemes, rewriting disallowed schemes to `#`

```python
# src/docsfy/renderer.py
# Remove script tags with their content
html = re.sub(
    r"<script[^>]*>.*?</script>", "", html, flags=re.DOTALL | re.IGNORECASE
)
# Remove iframe, object, embed, form tags
for tag in ["iframe", "object", "embed", "form"]:
    html = re.sub(
        rf"<{tag}[^>]*>.*?</{tag}>", "", html, flags=re.DOTALL | re.IGNORECASE
    )
    html = re.sub(rf"<{tag}[^>]*/>", "", html, flags=re.IGNORECASE)
# Remove event handler attributes
html = re.sub(r'\s+on\w+\s*=\s*["\'][^"\']*["\']', "", html, flags=re.IGNORECASE)
html = re.sub(r"\s+on\w+\s*=\s*\S+", "", html, flags=re.IGNORECASE)
# href/src allowlist; block non-allowed schemes by rewriting to "#"
# ...
```

```python
# src/docsfy/renderer.py
def _md_to_html(md_text: str) -> tuple[str, str]:
    md = markdown.Markdown(
        extensions=["fenced_code", "codehilite", "tables", "toc"],
        extension_configs={
            "codehilite": {"css_class": "highlight", "guess_lang": False},
            "toc": {"toc_depth": "2-3"},
        },
    )
    content_html = _sanitize_html(md.convert(md_text))
    toc_html = getattr(md, "toc", "")
    return content_html, toc_html
```

`page` rendering uses `|safe` intentionally, after sanitization:

```html
{{ content | safe }}
```

Automated tests validate the sanitizer behavior:

```python
# tests/test_renderer.py
result = _sanitize_html('<a href="javascript:alert(1)">x</a>')
assert "javascript:" not in result

result = _sanitize_html('<img src="x" onerror="alert(1)">')
assert "onerror" not in result

content_html, _ = _md_to_html('# Title\n\n<script>alert(1)</script>\n\nSafe content.')
assert "<script>" not in content_html
```

> **Warning:** Sanitization is regex-based in `renderer.py`; keep dependencies and the sanitizer tests up to date, because browser HTML-parsing edge cases evolve over time.

---

## Audit logging points

Security-sensitive actions are logged with a consistent `[AUDIT]` prefix.
### Logged events | Area | Endpoint / action | Logged message pattern | |---|---|---| | Failed authentication | `POST /login` (invalid creds) | `"[AUDIT] Failed login attempt for username '...'"` | | User lifecycle | `POST /api/admin/users`, `DELETE /api/admin/users/{username}` | Admin actor + target username + role | | Access control changes | `POST /api/admin/projects/{name}/access`, `DELETE /api/admin/projects/{name}/access/{username}` | Admin actor + target user + project + owner scope | | Key rotation | `POST /api/me/rotate-key`, `POST /api/admin/users/{username}/rotate-key` | Actor + target username | ```python # src/docsfy/main.py safe_username = username.replace("\n", "").replace("\r", "")[:100] logger.info(f"[AUDIT] Failed login attempt for username '{safe_username}'") logger.info( f"[AUDIT] User '{request.state.username}' created user '{username}' with role '{role}'" ) logger.info(f"[AUDIT] User '{request.state.username}' deleted user '{username}'") logger.info( f"[AUDIT] Admin '{request.state.username}' granted '{username}' access to '{name}' (owner: '{project_owner}')" ) logger.info( f"[AUDIT] Admin '{request.state.username}' revoked '{username}' access to '{name}' (owner: '{project_owner}')" ) logger.info(f"[AUDIT] User '{username}' rotated their own API key") logger.info( f"[AUDIT] Admin '{request.state.username}' rotated API key for user '{username}'" ) ``` > **Tip:** Route `[AUDIT]` records to centralized logging/SIEM and alert on repeated failed logins, key rotations, and privilege/access changes. 
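For teams shipping these records downstream, both the log-injection guard shown above and `[AUDIT]` extraction are simple to reproduce. The helper names here are hypothetical, written as a minimal sketch for pre-SIEM filtering:

```python
import re

_AUDIT_RE = re.compile(r"\[AUDIT\]\s*(?P<event>.+)")


def sanitize_username_for_log(username: str) -> str:
    """Mirror docsfy's guard: strip CR/LF (log injection) and cap length."""
    return username.replace("\n", "").replace("\r", "")[:100]


def extract_audit_events(log_lines: list[str]) -> list[str]:
    """Pull the [AUDIT] payload out of a mixed application log stream."""
    events: list[str] = []
    for line in log_lines:
        match = _AUDIT_RE.search(line)
        if match:
            events.append(match.group("event"))
    return events
```

Stripping CR/LF before interpolation matters because a newline in an attacker-chosen username could otherwise forge additional log lines.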
--- ## Security-relevant configuration and pipeline checks ### Runtime configuration ```python # src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` ```env # .env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # Set to false for local HTTP development # SECURE_COOKIES=false ``` ### Pre-commit/CI security gates ```yaml # .pre-commit-config.yaml repos: - repo: https://github.com/pre-commit/pre-commit-hooks hooks: - id: detect-private-key - repo: https://github.com/Yelp/detect-secrets hooks: - id: detect-secrets - repo: https://github.com/gitleaks/gitleaks hooks: - id: gitleaks ``` ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```toml # .gitleaks.toml [extend] useDefault = true ``` > **Note:** No repository-hosted workflow files (`.github/workflows`, `.gitlab-ci.yml`, or `Jenkinsfile`) are present; these checks are configured for pre-commit and can be enforced by external CI orchestration. --- Source: api-authentication.md # Authentication Endpoints docsfy supports two authentication mechanisms: 1. **Bearer API key** (recommended for API clients) 2. **Session cookie** (`docsfy_session`, used by browser login flow) All routes are protected by middleware **except** `/login`, `/login/`, and `/health`. 
```108:115:src/docsfy/main.py # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) async def dispatch( self, request: Request, call_next: RequestResponseEndpoint ) -> Response: if request.url.path in self._PUBLIC_PATHS: return await call_next(request) ``` ## Endpoint Reference | Endpoint | Method | Auth Required | Purpose | Success Behavior | |---|---|---|---|---| | `/login` | `GET` | No | Render login page | `200` HTML | | `/login` | `POST` | No | Authenticate username + API key, create session | `302` redirect to `/`, sets `docsfy_session` cookie | | `/logout` | `GET` | Yes | Invalidate session and clear cookie | `302` redirect to `/login`, deletes `docsfy_session` cookie | | `/health` | `GET` | No | Liveness endpoint | `200` JSON | > **Tip:** For programmatic clients, use `/api/*` routes. Unauthenticated API calls return JSON `401`, while non-API paths redirect to `/login`. ```151:155:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` ## `POST /login` Details `POST /login` reads **form fields** (not JSON): `username` and `api_key`. ```157:167:src/docsfy/templates/login.html
<!-- login form posting "username" and "api_key" fields (markup elided) -->
```

Authentication logic:

- Admin login requires **both**:
  - `username == "admin"`
  - `api_key == ADMIN_KEY`
- User login requires:
  - `api_key` matches a stored user key
  - that key belongs to the submitted `username`

```283:305:src/docsfy/main.py
# Check admin -- username must be "admin" and key must match
if username == "admin" and api_key == settings.admin_key:
    is_admin = True
    authenticated = True
else:
    # Check user key -- verify username matches the key's owner
    user = await get_user_by_key(api_key)
    if user and user["username"] == username:
        authenticated = True
        is_admin = user.get("role") == "admin"

if authenticated:
    session_token = await create_session(username, is_admin=is_admin)
    response = RedirectResponse(url="/", status_code=302)
    response.set_cookie(
        "docsfy_session",
        session_token,
        httponly=True,
        samesite="strict",
        secure=settings.secure_cookies,
        max_age=SESSION_TTL_SECONDS,
    )
```

Failed login returns `401` with the login HTML and `"Invalid username or password"`.

## `GET /logout` Details

`GET /logout`:

1. Reads the `docsfy_session` cookie
2. Deletes the server-side session record
3. Deletes the cookie
4.
Redirects to `/login` ```317:331:src/docsfy/main.py @app.get("/logout") async def logout(request: Request) -> RedirectResponse: """Clear the session cookie, delete session from DB, and redirect to login.""" session_token = request.cookies.get("docsfy_session") if session_token: await delete_session(session_token) settings = get_settings() response = RedirectResponse(url="/login", status_code=302) response.delete_cookie( "docsfy_session", httponly=True, samesite="strict", secure=settings.secure_cookies, ) ``` ## Cookie and Session Behavior - Cookie name: `docsfy_session` - Cookie attributes on login: - `HttpOnly` - `SameSite=Strict` - `Secure` controlled by `secure_cookies` - `Max-Age=28800` (8 hours) - Session token is **opaque** and generated with `secrets.token_urlsafe(32)` - Database stores a **SHA-256 hash** of session token, not raw token - Session lookup enforces expiration (`expires_at > datetime('now')`) ```21:23:src/docsfy/storage.py SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 ``` ```681:710:src/docsfy/storage.py def _hash_session_token(token: str) -> str: """Hash a session token for storage.""" return hashlib.sha256(token.encode()).hexdigest() async def create_session( username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS ) -> str: """Create an opaque session token.""" token = secrets.token_urlsafe(32) token_hash = _hash_session_token(token) expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours) expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S") async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)", (token_hash, username, 1 if is_admin else 0, expires_str), ) async def get_session(token: str) -> dict[str, str | int | None] | None: """Look up a session. 
Returns None if expired or not found.""" token_hash = _hash_session_token(token) async with aiosqlite.connect(DB_PATH) as db: db.row_factory = aiosqlite.Row cursor = await db.execute( "SELECT * FROM sessions WHERE token = ? AND expires_at > datetime('now')", ``` > **Note:** Middleware checks `Authorization: Bearer ...` **before** checking `docsfy_session`. If both are present, Bearer token path is evaluated first. ```122:136:src/docsfy/main.py # 1. Check Authorization header (API clients) auth_header = request.headers.get("authorization", "") if auth_header.startswith("Bearer "): token = auth_header[7:] if token == settings.admin_key: is_admin = True username = "admin" else: user = await get_user_by_key(token) # 2. Check session cookie (browser) -- opaque session token if not user and not is_admin: session_token = request.cookies.get("docsfy_session") ``` ## API Client Auth Requirements For API clients, send: - `Authorization: Bearer ` Accepted tokens: - `ADMIN_KEY` (full admin access) - User API key (role-based access) Role gates: - `admin`, `user` => write endpoints allowed - `viewer` => read-only - Admin endpoints require admin privileges ```185:191:src/docsfy/main.py def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ``` ```1203:1207:src/docsfy/main.py def _require_admin(request: Request) -> None: """Raise 403 if the user is not an admin.""" if not request.state.is_admin: raise HTTPException(status_code=403, detail="Admin access required") ``` ## Configuration for Authentication `ADMIN_KEY` is mandatory and must be at least 16 characters. `SECURE_COOKIES` defaults to secure behavior. 
```1:2:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars ``` ```27:29:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` ```83:89:src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** `secure_cookies` defaults to `True`; on plain HTTP local development, browser session cookies may not be set/sent unless `SECURE_COOKIES=false` is configured. ## Code-Backed Client Examples Login via form and receive session cookie: ```101:111:tests/test_auth.py async def test_login_with_admin_key(unauthed_client: AsyncClient) -> None: """POST /login with the admin key should set a session cookie and redirect.""" response = await unauthed_client.post( "/login", data={"username": "admin", "api_key": TEST_ADMIN_KEY}, follow_redirects=False, ) assert response.status_code == 302 assert response.headers["location"] == "/" assert "docsfy_session" in response.cookies ``` Bearer auth for API access: ```157:179:tests/test_auth.py async def test_api_bearer_auth(admin_client: AsyncClient) -> None: """Requests with a valid Bearer token should succeed.""" response = await admin_client.get("/api/status") assert response.status_code == 200 assert "projects" in response.json() async def test_api_bearer_auth_user_key(_init_db: None) -> None: """Requests with a valid user Bearer token should succeed.""" from docsfy.main import _generating, app from docsfy.storage import create_user _generating.clear() _username, raw_key = await create_user("bob") ``` Unauthenticated API request behavior: ```87:93:tests/test_auth.py async def test_api_returns_401_when_unauthenticated( unauthed_client: AsyncClient, ) -> None: """API requests without auth should return 401.""" response = await 
unauthed_client.get("/api/status") assert response.status_code == 401 assert response.json()["detail"] == "Unauthorized" ``` Auth contract is continuously validated by the test suite executed via `tox`: ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: api-generation.md # Generation Endpoints `docsfy` generation is asynchronous: `POST /api/generate` accepts a request, schedules background work, and returns immediately. You then poll status endpoints until the variant reaches `ready`, `error`, or `aborted`. > **Note:** Generation is scoped by **owner + project name + provider + model**. Two different users can generate the same repo/model combination without colliding. ## Endpoint Summary | Method | Path | Purpose | |---|---|---| | `POST` | `/api/generate` | Start generation for a repo variant | | `POST` | `/api/projects/{name}/{provider}/{model}/abort` | Abort an active generation for one variant | | `POST` | `/api/projects/{name}/abort` | Legacy abort endpoint (name-only matching) | | `GET` | `/api/status` | List visible projects + `known_models` for UI suggestions | | `GET` | `/api/projects/{name}/{provider}/{model}` | Poll a single variant’s detailed status | ## Auth and Write Permissions All `/api/*` endpoints require authentication. Generation and abort endpoints also require write access (`admin` or `user` role). 
```151:191:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ) ``` ## `POST /api/generate` ### Request Schema ```10:64:src/docsfy/models.py class GenerateRequest(BaseModel): repo_url: str | None = Field( default=None, description="Git repository URL (HTTPS or SSH)" ) repo_path: str | None = Field(default=None, description="Local git repository path") ai_provider: Literal["claude", "gemini", "cursor"] | None = None ai_model: str | None = None ai_cli_timeout: int | None = Field(default=None, gt=0) force: bool = Field( default=False, description="Force full regeneration, ignoring cache" ) @model_validator(mode="after") def validate_source(self) -> GenerateRequest: if not self.repo_url and not self.repo_path: msg = "Either 'repo_url' or 'repo_path' must be provided" raise ValueError(msg) if self.repo_url and self.repo_path: msg = "Provide either 'repo_url' or 'repo_path', not both" raise ValueError(msg) return self @field_validator("repo_url") @classmethod def validate_repo_url(cls, v: str | None) -> str | None: if v is None: return v https_pattern = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$" ssh_pattern = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$" if not re.match(https_pattern, v) and not re.match(ssh_pattern, v): msg = f"Invalid git repository URL: '{v}'" raise ValueError(msg) return v @field_validator("repo_path") @classmethod def validate_repo_path(cls, v: str | None) -> str | None: if v is None: return v path = Path(v) if not path.is_absolute(): msg = "repo_path must be an absolute path" raise ValueError(msg) return v @property def project_name(self) -> 
str:
        if self.repo_url:
            name = self.repo_url.rstrip("/").split("/")[-1]
            if name.endswith(".git"):
                name = name[:-4]
            return name
        if self.repo_path:
            return Path(self.repo_path).resolve().name
        return "unknown"
```

### Field Behavior

| Field | Type | Required | Validation | Effective default |
|---|---|---|---|---|
| `repo_url` | `string \| null` | One of `repo_url` or `repo_path` is required | Must match HTTPS/HTTP or SSH git URL pattern | None |
| `repo_path` | `string \| null` | One of `repo_url` or `repo_path` is required | Must be an absolute path; endpoint also checks path exists and has `.git` | None |
| `ai_provider` | `claude \| gemini \| cursor \| null` | Optional | Literal enum in schema + server-side runtime check | `AI_PROVIDER` |
| `ai_model` | `string \| null` | Optional in body | Must be non-empty after fallback | `AI_MODEL` |
| `ai_cli_timeout` | `int \| null` | Optional | `> 0` | `AI_CLI_TIMEOUT` |
| `force` | `bool` | Optional | none | `false` |

### Actual request body shape (dashboard client)

```2043:2056:src/docsfy/templates/dashboard.html
var body = { repo_url: repoUrl, ai_provider: provider, force: force };
if (model) body.ai_model = model;
fetch('/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  credentials: 'same-origin',
  redirect: 'manual',
  body: JSON.stringify(body)
})
```

### Success response

```73:83:tests/test_main.py
async def test_generate_endpoint_starts_generation(client: AsyncClient) -> None:
    with patch("docsfy.main.asyncio.create_task") as mock_task:
        mock_task.side_effect = lambda coro: coro.close()
        response = await client.post(
            "/api/generate",
            json={"repo_url": "https://github.com/org/repo.git"},
        )
        assert response.status_code == 202
        body = response.json()
        assert body["project"] == "repo"
        assert body["status"] == "generating"
```

Response shape:

```json
{ "project": "<project-name>", "status": "generating" }
```

## Provider/Model Validation

Provider and model are resolved from the request first, then environment defaults:
```455:467:src/docsfy/main.py ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model project_name = gen_request.project_name owner = request.state.username if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) if not ai_model: raise HTTPException(status_code=400, detail="AI model must be specified.") ``` Supported providers are explicitly tested: ```14:17:tests/test_ai_client.py assert "claude" in PROVIDERS assert "gemini" in PROVIDERS assert "cursor" in PROVIDERS assert VALID_AI_PROVIDERS == frozenset({"claude", "gemini", "cursor"}) ``` > **Note:** Model names are **not** checked against a strict server-side allowlist at request time; any non-empty string can pass input validation. Real compatibility is verified later by AI CLI availability checks. ## Conflict and Error Responses ### `POST /api/generate` | HTTP | Condition | Typical detail | |---|---|---| | `202` | Accepted; generation queued | `{"project":"...","status":"generating"}` | | `400` | Runtime validation failure | Invalid provider, empty effective model, bad local repo path, SSRF-protected URL | | `401` | Missing/invalid auth for `/api/*` | `Unauthorized` | | `403` | Viewer role or non-admin using `repo_path` | `Write access required.` / `Local repo path access requires admin privileges` | | `409` | Same owner/name/provider/model already generating | `Variant 'name/provider/model' is already being generated` | | `422` | Pydantic schema validation failure | Invalid URL, both/neither `repo_url` and `repo_path`, relative `repo_path`, bad enum, timeout <= 0 | Examples verified in tests: ```68:71:tests/test_main.py async def test_generate_endpoint_invalid_url(client: AsyncClient) -> None: response = await client.post("/api/generate", json={"repo_url": "not-a-url"}) assert response.status_code == 422 ``` 
```129:145:tests/test_main.py async def test_generate_duplicate_variant(client: AsyncClient) -> None: """Test that generating the same variant twice returns 409.""" from docsfy.main import _generating # gen_key format now includes owner: "owner/name/provider/model" _generating["admin/repo/claude/opus"] = asyncio.create_task(asyncio.sleep(100)) try: response = await client.post( "/api/generate", json={ "repo_url": "https://github.com/org/repo.git", "ai_provider": "claude", "ai_model": "opus", }, ) assert response.status_code == 409 ``` ```268:275:tests/test_main.py async def test_generate_rejects_private_url(client: AsyncClient) -> None: """Test that SSRF protection rejects private/localhost URLs.""" response = await client.post( "/api/generate", json={"repo_url": "https://localhost/org/repo.git"}, ) # Should be rejected by URL validation (either Pydantic or SSRF check) assert response.status_code in (400, 422) ``` ### Abort Endpoints (`/api/projects/.../abort`) ```569:621:src/docsfy/main.py @app.post("/api/projects/{name}/abort") async def abort_generation(request: Request, name: str) -> dict[str, str]: """Abort generation for any variant of the given project name. Kept for backward compatibility. Finds the first active generation matching the project name. """ _require_write_access(request) name = _validate_project_name(name) # Find active generation keys matching this project name matching_keys = [ key for key in _generating if len(key.split("/", 3)) == 4 and key.split("/", 3)[1] == name ] if request.state.is_admin and len(matching_keys) > 1: distinct_owners = {key.split("/", 3)[0] for key in matching_keys} if len(distinct_owners) > 1: raise HTTPException( status_code=409, detail="Multiple owners found for this variant, please specify owner", ) ... except asyncio.TimeoutError as exc: logger.warning(f"[{name}] Abort requested but cancellation still in progress") raise HTTPException( status_code=409, detail=f"Abort still in progress for '{name}'. 
Please retry shortly.", ) from exc ``` ```642:699:src/docsfy/main.py @app.post("/api/projects/{name}/{provider}/{model}/abort") async def abort_variant( request: Request, name: str, provider: str, model: str ) -> dict[str, str]: _require_write_access(request) ... if not task: ... if not task: raise HTTPException( status_code=404, detail="No active generation for this variant", ) ... except asyncio.TimeoutError as exc: logger.warning( f"[{gen_key}] Abort requested but cancellation still in progress" ) raise HTTPException( status_code=409, detail=f"Abort still in progress for '{gen_key}'. Please retry shortly.", ) from exc ``` > **Warning:** The name-only abort endpoint is legacy and can become ambiguous for admins when multiple owners have active generations for the same project name. ## Async Failures and Status Polling `/api/generate` only validates/enqueues. Runtime failures are reflected later in project status. ```720:744:src/docsfy/main.py async def _run_generation( repo_url: str | None, repo_path: str | None, project_name: str, ai_provider: str, ai_model: str, ai_cli_timeout: int, force: bool = False, owner: str = "", ) -> None: gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" try: cli_flags = ["--trust"] if ai_provider == "cursor" else None available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) if not available: await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=msg, ) return ``` ```803:812:src/docsfy/main.py except Exception as exc: logger.error(f"Generation failed for {project_name}: {exc}") await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=str(exc), ) ``` ```409:419:src/docsfy/main.py @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: if request.state.is_admin: projects = await list_projects() else: accessible = await 
get_user_accessible_projects(request.state.username) projects = await list_projects( owner=request.state.username, accessible=accessible ) known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` Status values used by generation records: ```17:17:src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` > **Tip:** Use `GET /api/status` during polling and consume `known_models` to drive provider-specific model suggestions in clients. ## Configuration (Provider/Model/Timeout) Environment defaults in `.env`: ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` Application defaults when env vars are unset: ```16:22:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` `docker-compose` loads `.env` directly: ```1:8:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` --- Source: api-projects-and-variants.md # Project and Variant Endpoints docsfy exposes both project-level and variant-level endpoints: - **Project-level** endpoints use `/{name}` and either list, delete, abort, or download across variants. - **Variant-level** endpoints use `/{name}/{provider}/{model}` and target one exact variant. > **Tip:** Prefer variant-level endpoints in automation; project-level endpoints can select by owner/time and may be ambiguous in multi-owner setups. 
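When building variant-level URLs in automation, percent-encode each path segment, since model identifiers such as `claude-opus-4-6[1m]` contain characters that are not path-safe. A minimal helper sketch (the function name is illustrative):

```python
from urllib.parse import quote


def variant_path(name: str, provider: str, model: str, action: str = "") -> str:
    """Build a variant-level API path, encoding every segment so that
    characters like '[', ']', or '/' in a model name cannot change the route.
    """
    segments = [quote(s, safe="") for s in (name, provider, model)]
    path = "/api/projects/" + "/".join(segments)
    return f"{path}/{action}" if action else path
```

This mirrors what the dashboard does in JavaScript with `encodeURIComponent` before calling the abort endpoint.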
## Authentication and Access All endpoints below are protected except `/login` and `/health`. API requests without auth return `401`, and write endpoints require `admin` or `user` role (viewers are read-only). ```105:155:src/docsfy/main.py class AuthMiddleware(BaseHTTPMiddleware): """Authenticate every request via Bearer token or session cookie.""" # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ... if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` ```185:191:src/docsfy/main.py def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ) ``` ## Endpoint Matrix | Operation | Project Endpoint | Variant Endpoint | Method | |---|---|---|---| | Status list | `/api/status` | — | `GET` | | Status page (HTML) | — | `/status/{name}/{provider}/{model}` | `GET` | | Details | `/api/projects/{name}` | `/api/projects/{name}/{provider}/{model}` | `GET` | | Delete | `/api/projects/{name}` | `/api/projects/{name}/{provider}/{model}` | `DELETE` | | Abort | `/api/projects/{name}/abort` (legacy) | `/api/projects/{name}/{provider}/{model}/abort` | `POST` | | Download | `/api/projects/{name}/download` | `/api/projects/{name}/{provider}/{model}/download` | `GET` | ## Variant Data Shape and Status Values Variant payloads map directly to the `projects` table columns. 
```57:73:src/docsfy/storage.py CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) ``` ```17:17:src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` ## Status Endpoints ### `GET /api/status` Returns: - `projects`: accessible variants - `known_models`: map of provider -> known ready models For non-admin users, this includes owned variants **plus granted-access variants**. ```409:419:src/docsfy/main.py @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: if request.state.is_admin: projects = await list_projects() else: accessible = await get_user_accessible_projects(request.state.username) projects = await list_projects( owner=request.state.username, accessible=accessible ) known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` ### `GET /status/{name}/{provider}/{model}` (HTML) Variant status UI page used by the dashboard/status flow. ```369:401:src/docsfy/main.py @app.get("/status/{name}/{provider}/{model}", response_class=HTMLResponse) async def project_status_page( request: Request, name: str, provider: str, model: str ) -> HTMLResponse: name = _validate_project_name(name) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) ... 
template = _jinja_env.get_template("status.html") html = template.render( project=project, plan_json=plan_json, total_pages=total_pages, known_models=known_models, default_provider=settings.ai_provider, default_model=settings.ai_model, ) return HTMLResponse(content=html) ``` ## Details Endpoints ### `GET /api/projects/{name}` Returns `{ "name": "...", "variants": [...] }`. - Admin: all owners’ variants for that name. - Non-admin: only variants owned by `request.state.username`. ### `GET /api/projects/{name}/{provider}/{model}` Returns one resolved variant object. ```1019:1124:src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}") async def get_variant_details( request: Request, name: str, provider: str, model: str, ) -> dict[str, str | int | None]: name = _validate_project_name(name) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) return project ... @app.get("/api/projects/{name}") async def get_project_details(request: Request, name: str) -> dict[str, Any]: name = _validate_project_name(name) if request.state.is_admin: variants = await list_variants(name) else: variants = await list_variants(name, owner=request.state.username) if not variants: raise HTTPException(status_code=404, detail=f"Project '{name}' not found") return {"name": name, "variants": variants} ``` > **Warning:** Variant resolution for admin can return `409` when the same `{name}/{provider}/{model}` exists under multiple owners. ```231:246:src/docsfy/main.py # 2. 
For admin, disambiguate by owner if request.state.is_admin: all_variants = await list_variants(name) matching = [ v for v in all_variants if v.get("ai_provider") == ai_provider and v.get("ai_model") == ai_model ] if not matching: raise HTTPException(status_code=404, detail="Not found") distinct_owners = {str(v.get("owner", "")) for v in matching} if len(distinct_owners) > 1: raise HTTPException( status_code=409, detail="Multiple owners found for this variant, please specify owner", ) ``` ## Deletion Endpoints ### `DELETE /api/projects/{name}/{provider}/{model}` - Requires write access. - Rejects deletion with `409` if generation is active for that variant. - Deletes DB record and variant directory. ### `DELETE /api/projects/{name}` - Requires write access. - Rejects with `409` if any variant with that project name is still generating. - Admin deletes **all** variants for that name (across owners); non-admin deletes only own variants. ```1034:1071:src/docsfy/main.py @app.delete("/api/projects/{name}/{provider}/{model}") async def delete_variant( request: Request, name: str, provider: str, model: str, ) -> dict[str, str]: _require_write_access(request) name = _validate_project_name(name) # Check for active generation (scan all keys) for key in _generating: ... raise HTTPException( status_code=409, detail=f"Cannot delete '{name}/{provider}/{model}' while generation is in progress. Abort first.", ) ... return {"deleted": f"{name}/{provider}/{model}"} ``` ```1127:1155:src/docsfy/main.py @app.delete("/api/projects/{name}") async def delete_project_endpoint(request: Request, name: str) -> dict[str, str]: _require_write_access(request) name = _validate_project_name(name) ... if request.state.is_admin: variants = await list_variants(name) else: variants = await list_variants(name, owner=request.state.username) ... 
return {"deleted": name} ``` ## Abort Endpoints ### `POST /api/projects/{name}/abort` (legacy) Backwards-compatible endpoint that aborts the first active generation matching project name. ### `POST /api/projects/{name}/{provider}/{model}/abort` Variant-specific abort endpoint. Both endpoints: - Require write access. - Return `404` if no active generation. - Can return `409` if cancellation is still in progress. - Update status to `aborted` with `error_message="Generation aborted by user"`. ```569:639:src/docsfy/main.py @app.post("/api/projects/{name}/abort") async def abort_generation(request: Request, name: str) -> dict[str, str]: """Abort generation for any variant of the given project name. ... _require_write_access(request) ... if not task or not matching_key: raise HTTPException( status_code=404, detail=f"No active generation for '{name}'" ) ... await update_project_status( name, ai_provider, ai_model, status="aborted", owner=key_owner, error_message="Generation aborted by user", current_stage=None, ) ... return {"aborted": name} ``` ```642:717:src/docsfy/main.py @app.post("/api/projects/{name}/{provider}/{model}/abort") async def abort_variant( request: Request, name: str, provider: str, model: str ) -> dict[str, str]: _require_write_access(request) ... if not task: ... if not task: raise HTTPException( status_code=404, detail="No active generation for this variant", ) ... return {"aborted": f"{name}/{provider}/{model}"} ``` UI integration example (URL-encoding each path segment): ```2162:2176:src/docsfy/templates/dashboard.html document.addEventListener('click', async function(e) { var abortBtn = e.target.closest('[data-abort-variant]'); if (!abortBtn) return; var composite = abortBtn.getAttribute('data-abort-variant'); // composite is "name/provider/model" var parts = composite.split('/'); var name = parts[0]; var provider = parts[1]; var model = parts.slice(2).join('/'); ... 
fetch('/api/projects/' + encodeURIComponent(name) + '/' + encodeURIComponent(provider) + '/' + encodeURIComponent(model) + '/abort', { method: 'POST', credentials: 'same-origin', redirect: 'manual' }) ``` ## Download Endpoints ### `GET /api/projects/{name}/{provider}/{model}/download` - Requires variant to be `ready`, else `400 "Variant not ready"`. - Streams `application/gzip`. - Filename: `{name}-{provider}-{model}-docs.tar.gz`. ### `GET /api/projects/{name}/download` - Selects latest ready variant (`last_generated DESC`). - Streams `application/gzip`. - Filename: `{name}-docs.tar.gz`. ```1074:1112:src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}/download") async def download_variant( request: Request, name: str, provider: str, model: str, ) -> StreamingResponse: ... if project["status"] != "ready": raise HTTPException(status_code=400, detail="Variant not ready") ... return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={ "Content-Disposition": f'attachment; filename="{name}-{provider}-{model}-docs.tar.gz"' }, ) ``` ```1158:1194:src/docsfy/main.py @app.get("/api/projects/{name}/download") async def download_project(request: Request, name: str) -> StreamingResponse: ... if request.state.is_admin: latest = await get_latest_variant(name) else: latest = await get_latest_variant(name, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail=f"No ready variant for '{name}'") ... 
return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={"Content-Disposition": f'attachment; filename="{name}-docs.tar.gz"'}, ) ``` Integration test coverage confirms both download routes return gzip content: ```138:146:tests/test_integration.py # Download via variant-specific route response = await client.get("/api/projects/test-repo/claude/opus/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" # Download via latest-variant route response = await client.get("/api/projects/test-repo/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" ``` ## Validation and Common Error Cases Project name is validated before project/variant operations: ```73:77:src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") return name ``` Common errors: - `400`: invalid project name; variant download attempted before ready. - `401`: missing/invalid API auth for `/api/*`. - `403`: write action by `viewer`. - `404`: not found/not accessible/no active generation. - `409`: delete while generating; admin owner ambiguity; abort still cancelling. 
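The project-name rule above can be mirrored as a standalone predicate to see which names pass. This is a sketch of the documented regex, not the server function:

```python
import re

# Same pattern as _validate_project_name: leading alphanumeric, then
# alphanumerics, dots, underscores, or hyphens. '/' is never allowed and the
# first character cannot be '.', so traversal segments like '../' are rejected.
_PROJECT_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$")


def is_valid_project_name(name: str) -> bool:
    return bool(_PROJECT_NAME_RE.match(name))
```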
## Relevant Configuration Snippets

Auth/runtime settings affecting these endpoints:

```1:8:.env.example
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# AI Configuration
AI_PROVIDER=claude
# [1m] = 1 million token context window, this is a valid model identifier
AI_MODEL=claude-opus-4-6[1m]
AI_CLI_TIMEOUT=60
```

```27:28:.env.example
# Set to false for local HTTP development
# SECURE_COOKIES=false
```

Operational health check (separate from project status API):

```9:13:docker-compose.yaml
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

Test runner config used for endpoint coverage:

```1:7:tox.toml
skipsdist = true
envlist = ["unittests"]

[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Note:** `/api/projects/{name}/abort` is intentionally retained for backward compatibility; new clients should prefer `/api/projects/{name}/{provider}/{model}/abort`.

---

Source: api-admin.md

# Admin Endpoints

`docsfy` provides admin-only APIs for:

- user lifecycle management
- project access grants/revocations
- API key rotation (user keys via API, `ADMIN_KEY` via config)

Core route implementations live in `src/docsfy/main.py`, with persistence and validation in `src/docsfy/storage.py`.

## Authentication and Required Configuration

Admin routes require `request.state.is_admin`. Middleware sets this when auth is one of:

- `Authorization: Bearer <ADMIN_KEY>`
- `Authorization: Bearer <user-api-key>` where the DB user role is `admin`
- a valid admin `docsfy_session` cookie

> **Note:** Unauthenticated `/api/*` calls return `401` with `{"detail":"Unauthorized"}`; authenticated non-admin calls to admin routes return `403` with `{"detail":"Admin access required"}`.
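That 401/403 contract can be captured as a tiny decision function — a sketch of the documented status codes, not the actual middleware:

```python
def expected_admin_route_status(authenticated: bool, is_admin: bool) -> int:
    """Expected HTTP outcome for a call to an /api/admin/* route."""
    if not authenticated:
        return 401  # {"detail": "Unauthorized"}
    if not is_admin:
        return 403  # {"detail": "Admin access required"}
    return 200  # handler runs
```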
Environment configuration from `.env.example`: ```bash # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # Set to false for local HTTP development # SECURE_COOKIES=false ``` Container runtime wiring from `docker-compose.yaml`: ```yaml services: docsfy: env_file: .env volumes: - ./data:/data ``` ## Endpoint Index | Method | Path | Purpose | |---|---|---| | `GET` | `/admin` | Admin UI page (HTML) | | `POST` | `/api/admin/users` | Create user (returns generated API key once) | | `GET` | `/api/admin/users` | List users | | `DELETE` | `/api/admin/users/{username}` | Delete user | | `POST` | `/api/admin/projects/{name}/access` | Grant project access | | `GET` | `/api/admin/projects/{name}/access` | List project access | | `DELETE` | `/api/admin/projects/{name}/access/{username}` | Revoke project access | | `POST` | `/api/admin/users/{username}/rotate-key` | Admin rotates a user key | | `POST` | `/api/me/rotate-key` | Logged-in user rotates own key | ## User CRUD ### Create User: `POST /api/admin/users` Request JSON: - `username` (required) - `role` (optional, defaults to `user`; allowed: `admin`, `user`, `viewer`) Actual request code from `src/docsfy/templates/admin.html`: ```javascript const resp = await fetch("/api/admin/users", { method: "POST", headers: {"Content-Type": "application/json"}, credentials: "same-origin", redirect: "error", body: JSON.stringify({username: username, role: role}) }); ``` Actual success response from `src/docsfy/main.py`: ```python return JSONResponse( content={"username": username, "api_key": raw_key, "role": role}, headers={"Cache-Control": "no-store"}, ) ``` Validation behavior: - username `admin` is reserved (case-insensitive) - username regex: `^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$` - invalid role -> `400` - missing username -> `400` - DB insert failures (for example duplicate username) -> `400` ### List Users: `GET /api/admin/users` Returns: - `{"users": [...]}` Each row 
is selected as:

- `id`, `username`, `role`, `created_at`

`api_key_hash` is not returned.

### Delete User: `DELETE /api/admin/users/{username}`

Actual request code from `src/docsfy/templates/admin.html`:

```javascript
const resp = await fetch("/api/admin/users/" + encodeURIComponent(username), {
  method: "DELETE",
  credentials: "same-origin",
  redirect: "error",
});
```

Success response:

- `{"deleted":"<username>"}`

Guardrails and side effects:

- admin cannot delete their own account (`400`)
- storage cleanup deletes that user’s sessions, owned projects (DB rows), and ACL entries where they are owner or grantee

> **Note:** User management supports create/list/delete. There is no dedicated endpoint for renaming a user or changing a role in place.

## Access Grant/Revoke/List

Access is owner-scoped: grants are keyed by `project_name + project_owner + username`, so grants apply to all variants for that project name under that owner.

### Grant Access: `POST /api/admin/projects/{name}/access`

Request JSON:

- `username` (required)
- `owner` (required)

Route behavior:

- verifies user exists
- verifies project exists for that owner (`list_variants(name, owner=owner)`)
- inserts grant via `grant_project_access(...)`

Example from `test-plans/e2e-ui-test-plan.md`:

```javascript
fetch('/api/admin/projects/for-testing-only/access', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  credentials: 'same-origin',
  body: JSON.stringify({username: 'testviewer-e2e', owner: 'testuser-e2e'})
}).then(r => r.json()).then(d => JSON.stringify(d))
```

Success response shape:

- `{"granted":"<name>","username":"<username>","owner":"<owner>"}`

### List Access: `GET /api/admin/projects/{name}/access?owner=<owner>`

Example from `test-plans/e2e-ui-test-plan.md`:

```javascript
fetch('/api/admin/projects/for-testing-only/access?owner=testuser-e2e',
  {credentials:'same-origin'}).then(r => r.json())
```

Success response shape:

- `{"project":"<name>","owner":"<owner>","users":[...]}`

### Revoke Access: `DELETE /api/admin/projects/{name}/access/{username}?owner=<owner>`

Example from `test-plans/e2e-ui-test-plan.md`:

```javascript
fetch('/api/admin/projects/for-testing-only/access/testviewer-e2e?owner=testuser-e2e',
  {method:'DELETE', credentials:'same-origin'}).then(r => r.status)
```

Success response shape:

- `{"revoked":"<name>","username":"<username>"}`

> **Tip:** Always pass `owner` on revoke/list requests. The route reads the owner from query params and applies owner-scoped ACL operations.

## Key Rotation Operations

### Rotate Own Key: `POST /api/me/rotate-key`

Available to authenticated DB users (`admin`, `user`, `viewer`).

Request JSON:

- optional `new_key`
  - if omitted, the server generates a new key
  - if provided, minimum length is 16

Actual dashboard request from `src/docsfy/templates/dashboard.html`:

```javascript
var resp = await fetch('/api/me/rotate-key', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  credentials: 'same-origin',
  body: JSON.stringify(body),
});
```

Behavior:

- returns `{"username":"<username>","new_api_key":"<key>"}` with `Cache-Control: no-store`
- invalidates that user’s sessions
- deletes current `docsfy_session` cookie (forces re-login)

`ADMIN_KEY` super-admin sessions are explicitly rejected:

```python
if request.state.is_admin and not request.state.user:
    raise HTTPException(
        status_code=400,
        detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
    )
```

### Admin Rotate User Key: `POST /api/admin/users/{username}/rotate-key`

Admin-only endpoint to rotate another user’s key.
Request JSON:

- optional `new_key` (same minimum-length rule)

Actual admin panel request from `src/docsfy/templates/admin.html`:

```javascript
fetch('/api/admin/users/' + encodeURIComponent(username) + '/rotate-key', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  credentials: 'same-origin',
  redirect: 'error',
  body: JSON.stringify(body),
})
```

Behavior:

- success: `{"username":"<username>","new_api_key":"<key>"}` plus `Cache-Control: no-store`
- unknown user: `404`
- invalid custom key: `400`
- all sessions for the target user are invalidated by storage logic

### Rotating `ADMIN_KEY` Itself (Config Operation)

There is no API endpoint for rotating `ADMIN_KEY`; it is rotated by changing the environment configuration and restarting the service.

Startup guard from `src/docsfy/main.py`:

```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)
if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

HMAC linkage in `src/docsfy/storage.py`:

```python
# NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
# invalidate all existing api_key_hash values, requiring all users to
# regenerate their API keys.
secret = hmac_secret or os.getenv("ADMIN_KEY", "")
```

> **Warning:** Rotating `ADMIN_KEY` invalidates all existing DB user API keys. After restart, log in as `admin` with the new key and re-issue user keys (for example via `POST /api/admin/users/{username}/rotate-key`).

## Verification Notes

This repository currently has no `.github/workflows` directory.
Test automation entry point is `tox.toml`: ```toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` Relevant endpoint coverage is present in: - `tests/test_auth.py` (reserved username, self-delete guard, key rotation behavior) - `tests/test_storage.py` (ACL grant/revoke/list and cleanup behavior) - `test-plans/e2e-ui-test-plan.md` (end-to-end admin/access API usage examples) --- Source: api-doc-serving.md # Documentation Serving Routes `docsfy` serves generated documentation files through two authenticated `/docs` route patterns: | Route pattern | Purpose | Variant selection | |---|---|---| | `/docs/{project}/{provider}/{model}/{path}` | Serve a specific provider/model variant | Explicit (`provider` + `model`) | | `/docs/{project}/{path}` | Serve the most recently generated **ready** variant | Automatic (`last_generated DESC`, ready-only) | > **Warning:** Route declaration order matters. The variant-specific route must be registered before the generic `/docs/{project}/{path}` route, or variant URLs can be matched by the generic handler. ```1377:1435:src/docsfy/main.py # IMPORTANT: variant-specific route MUST be defined BEFORE the generic route # so FastAPI matches it first. 
@app.get("/docs/{project}/{provider}/{model}/{path:path}") async def serve_variant_docs( request: Request, project: str, provider: str, model: str, path: str = "index.html", ) -> FileResponse: if not path or path == "/": path = "index.html" project = _validate_project_name(project) proj = await _resolve_project( request, project, ai_provider=provider, ai_model=model ) proj_owner = str(proj.get("owner", "")) site_dir = get_project_site_dir(project, provider, model, proj_owner) file_path = site_dir / path try: file_path.resolve().relative_to(site_dir.resolve()) except ValueError as exc: raise HTTPException(status_code=403, detail="Access denied") from exc if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") return FileResponse(file_path) @app.get("/docs/{project}/{path:path}") async def serve_docs( request: Request, project: str, path: str = "index.html" ) -> FileResponse: """Serve the most recently generated variant.""" if not path or path == "/": path = "index.html" project = _validate_project_name(project) if request.state.is_admin: latest = await get_latest_variant(project) else: latest = await get_latest_variant(project, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail="No docs available") await _check_ownership(request, project, latest) latest_owner = str(latest.get("owner", "")) site_dir = get_project_site_dir( project, str(latest["ai_provider"]), str(latest["ai_model"]), latest_owner, ) file_path = site_dir / path try: file_path.resolve().relative_to(site_dir.resolve()) except ValueError as exc: raise HTTPException(status_code=403, detail="Access denied") from exc if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") return FileResponse(file_path) ``` ## Variant-Specific Serving (`/docs/{project}/{provider}/{model}/{path}`) This route serves files from an explicit variant directory. 
- Normalizes empty path or `/` to `index.html`. - Resolves variant with `_resolve_project(...)`. - Builds site directory with owner scoping (`get_project_site_dir(...)`). - Blocks path traversal with `resolve().relative_to(...)`. - Returns `404 File not found` if the file is missing. Variant resolution behavior: ```210:261:src/docsfy/main.py async def _resolve_project( request: Request, name: str, ai_provider: str, ai_model: str, ) -> dict[str, Any]: """Find a project variant, preferring the requesting user's owned copy. Raises 404 if not found or not accessible. """ # 1. Try owned by requesting user if not request.state.is_admin: proj = await get_project( name, ai_provider=ai_provider, ai_model=ai_model, owner=request.state.username, ) if proj: return proj # 2. For admin, disambiguate by owner if request.state.is_admin: all_variants = await list_variants(name) matching = [ v for v in all_variants if v.get("ai_provider") == ai_provider and v.get("ai_model") == ai_model ] if not matching: raise HTTPException(status_code=404, detail="Not found") distinct_owners = {str(v.get("owner", "")) for v in matching} if len(distinct_owners) > 1: raise HTTPException( status_code=409, detail="Multiple owners found for this variant, please specify owner", ) return matching[0] # 3. For non-admin, check granted access — find which owner granted access accessible = await get_user_accessible_projects(request.state.username) for proj_name, proj_owner in accessible: if proj_name == name and proj_owner: # Found a grant — look up this specific owner's variant proj = await get_project( name, ai_provider=ai_provider, ai_model=ai_model, owner=proj_owner ) if proj: return proj # 4. Not found raise HTTPException(status_code=404, detail="Not found") ``` > **Note:** Variant-specific serving can resolve owned variants and access-granted variants for non-admin users. ## Latest-Ready Serving (`/docs/{project}/{path}`) This route automatically picks the latest **ready** variant. 
- Uses `get_latest_variant(...)`. - Only considers `status='ready'`. - Orders by `last_generated DESC`. - Sets `last_generated` only when status becomes `ready`. ```295:330:src/docsfy/storage.py async def update_project_status( name: str, ai_provider: str, ai_model: str, status: str, owner: str | None = None, last_commit_sha: str | None = None, page_count: int | None = None, error_message: str | None = None, plan_json: str | None = None, current_stage: str | None | object = _UNSET, ) -> None: ... if status == "ready": fields.append("last_generated = CURRENT_TIMESTAMP") ... ``` ```552:569:src/docsfy/storage.py async def get_latest_variant( name: str, owner: str | None = None ) -> dict[str, str | int | None] | None: """Get the most recently generated ready variant for a repo.""" async with aiosqlite.connect(DB_PATH) as db: db.row_factory = aiosqlite.Row if owner is not None: cursor = await db.execute( "SELECT * FROM projects WHERE name = ? AND owner = ? AND status = 'ready' ORDER BY last_generated DESC LIMIT 1", (name, owner), ) else: cursor = await db.execute( "SELECT * FROM projects WHERE name = ? AND status = 'ready' ORDER BY last_generated DESC LIMIT 1", (name,), ) row = await cursor.fetchone() return dict(row) if row else None ``` Ordering is explicitly tested: ```378:392:tests/test_storage.py # Manually set last_generated to ensure deterministic ordering # (CURRENT_TIMESTAMP may resolve to the same second for both rows) async with aiosqlite.connect(DB_PATH) as db: await db.execute( "UPDATE projects SET last_generated = '2025-01-01 00:00:00' WHERE ai_provider = 'claude'" ) await db.execute( "UPDATE projects SET last_generated = '2025-01-02 00:00:00' WHERE ai_provider = 'gemini'" ) await db.commit() latest = await get_latest_variant("repo") assert latest is not None # gemini has a later last_generated timestamp assert latest["ai_provider"] == "gemini" ``` > **Warning:** For non-admin users, latest-route selection is owner-scoped (`owner=request.state.username`). 
If you rely on shared/access-granted projects, use the variant-specific `/docs/{project}/{provider}/{model}/...` route. ## What Files Are Served Under `/docs` The serving routes can return any generated file in the variant site directory, including HTML pages, markdown sources, search index JSON, LLM text files, and static assets. ```215:233:src/docsfy/renderer.py def render_site(plan: dict[str, Any], pages: dict[str, str], output_dir: Path) -> None: if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() ... if STATIC_DIR.exists(): for static_file in STATIC_DIR.iterdir(): if static_file.is_file(): shutil.copy2(static_file, assets_dir / static_file.name) ``` ```243:290:src/docsfy/renderer.py index_html = render_index(project_name, tagline, navigation, repo_url=repo_url) (output_dir / "index.html").write_text(index_html, encoding="utf-8") ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") search_index = _build_search_index(valid_pages, plan) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` Templates link these files with relative paths, which are compatible with `/docs/...` static serving: ```8:10:src/docsfy/templates/page.html ``` ## Authentication and Safety Docs routes are protected by the same auth middleware as the rest of the app. - `/login` and `/health` are public. - Non-authenticated browser requests are redirected to `/login`. - Non-authenticated API requests return `401`. 
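The `resolve().relative_to(...)` guard used by both serving routes can be exercised in isolation. A minimal sketch (illustrative code, not part of docsfy):

```python
from pathlib import Path

def is_safe(site_dir: Path, requested: str) -> bool:
    """Return True only if the resolved path stays inside site_dir."""
    try:
        # resolve() normalizes ".." segments and symlinks before the containment check
        (site_dir / requested).resolve().relative_to(site_dir.resolve())
    except ValueError:
        return False
    return True

site = Path("/tmp/docs-site")
print(is_safe(site, "index.html"))        # True
print(is_safe(site, "../../etc/passwd"))  # False — escapes the site directory
```

A request path like `../../etc/passwd` resolves outside `site_dir`, so `relative_to` raises `ValueError` and the route returns `403` before any filesystem read.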
```108:115:src/docsfy/main.py # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) async def dispatch( self, request: Request, call_next: RequestResponseEndpoint ) -> Response: if request.url.path in self._PUBLIC_PATHS: return await call_next(request) ``` ```151:155:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` Project and filesystem path safety checks: ```73:77:src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") return name ``` ```1396:1402:src/docsfy/main.py file_path = site_dir / path try: file_path.resolve().relative_to(site_dir.resolve()) except ValueError as exc: raise HTTPException(status_code=403, detail="Access denied") from exc if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") ``` ## URL Construction for Provider/Model Provider/model values should be URL-encoded in links. The UI templates already do this. 
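Outside the templates, the same encoding can be applied when building `/docs` links programmatically. A sketch (the helper name is illustrative, not part of docsfy):

```python
from urllib.parse import quote

def variant_docs_url(project: str, provider: str, model: str, path: str = "") -> str:
    """Percent-encode each URL segment; safe='' also escapes '/' inside a segment."""
    segments = [quote(s, safe="") for s in (project, provider, model)]
    return "/docs/" + "/".join(segments) + "/" + path

print(variant_docs_url("my-repo", "claude", "claude-opus-4-6[1m]"))
# → /docs/my-repo/claude/claude-opus-4-6%5B1m%5D/
```

The brackets in `claude-opus-4-6[1m]` become `%5B`/`%5D`, matching what `encodeURIComponent` produces in the UI templates.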
```1483:1484:src/docsfy/templates/dashboard.html View Docs Download ``` ```1188:1190:src/docsfy/templates/status.html var viewBtn = document.createElement('a'); viewBtn.href = '/docs/' + encodeURIComponent(PROJECT_NAME) + '/' + encodeURIComponent(PROJECT_PROVIDER) + '/' + encodeURIComponent(PROJECT_MODEL) + '/'; viewBtn.target = '_blank'; ``` ```16:22:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` > **Tip:** Keep route construction encoded, especially for model names containing characters like `[` and `]` (for example `claude-opus-4-6[1m]`). ## Verified Behavior and Ops Configuration Integration tests cover both serving paths: ```124:136:tests/test_integration.py # Check docs are served via variant-specific route response = await client.get("/docs/test-repo/claude/opus/index.html") assert response.status_code == 200 assert "test-repo" in response.text response = await client.get("/docs/test-repo/claude/opus/introduction.html") assert response.status_code == 200 assert "Welcome!" in response.text # Check docs are served via latest-variant route response = await client.get("/docs/test-repo/index.html") assert response.status_code == 200 assert "test-repo" in response.text ``` Deployment and test pipeline snippets relevant to `/docs` serving: ```1:13:docker-compose.yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** No repository-level GitHub/GitLab/Jenkins workflow files are present; automated validation in this repo is defined via local/CI-friendly tooling (`tox`, `pytest`, `pre-commit`) plus container health checks. --- Source: api-health-status.md # Health and Status Endpoints Use these two endpoints for runtime health checks and UI state refresh: - `GET /health`: service liveness check - `GET /api/status`: authenticated project status feed for dashboard polling ## `GET /health` `/health` is a public endpoint and returns a minimal JSON payload. From `src/docsfy/main.py`: ```python @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` From `tests/test_auth.py`: ```python async def test_health_is_public(unauthed_client: AsyncClient) -> None: """The /health endpoint should be accessible without authentication.""" response = await unauthed_client.get("/health") assert response.status_code == 200 assert response.json()["status"] == "ok" ``` From `src/docsfy/main.py` (auth middleware public paths): ```python _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ``` > **Note:** `/health` is intentionally lightweight and does not require login, Bearer token, or session cookie. 
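A monitor outside Docker can apply the same success criteria as the `curl -f` probes. A minimal check helper (a sketch, not part of the codebase; it is slightly stricter than `curl -f`, which only checks the status code):

```python
import json

def health_ok(status_code: int, body: bytes) -> bool:
    """Treat anything other than HTTP 200 + {"status": "ok"} as unhealthy."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Non-JSON or non-object payloads are also unhealthy
        return False

print(health_ok(200, b'{"status": "ok"}'))  # True
print(health_ok(503, b""))                  # False
```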
### Service-check configuration in this repository From `Dockerfile`: ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ``` From `docker-compose.yaml`: ```yaml healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` From `src/docsfy/main.py` (startup requirement): ```python if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** `/health` only confirms the app process/router is responding. It does not validate DB contents, generation state, or AI CLI availability. --- ## `GET /api/status` `/api/status` powers dashboard updates. It is authenticated and returns both project rows and model metadata. From `src/docsfy/main.py`: ```python @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: if request.state.is_admin: projects = await list_projects() else: accessible = await get_user_accessible_projects(request.state.username) projects = await list_projects( owner=request.state.username, accessible=accessible ) known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` ### Authentication and access behavior From `src/docsfy/main.py` (API auth failure path): ```python if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) ``` From `tests/test_auth.py`: ```python response = await unauthed_client.get("/api/status") assert response.status_code == 401 assert response.json()["detail"] == "Unauthorized" ``` From `src/docsfy/storage.py` (non-admin filtering logic): ```python if owner is not None and accessible and len(accessible) > 0: # Build OR conditions for each (name, owner) pair conditions = 
["(owner = ?)"] params: list[str] = [owner] for proj_name, proj_owner in accessible: conditions.append("(name = ? AND owner = ?)") params.extend([proj_name, proj_owner]) query = f"SELECT * FROM projects WHERE {' OR '.join(conditions)} ORDER BY updated_at DESC" ``` From `tests/test_auth.py` (owner filtering is enforced): ```python response = await ac.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] assert len(projects) == 1 assert projects[0]["name"] == "alice-proj" ``` From `tests/test_auth.py` (granted viewer access is included): ```python response = await ac.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] project_names = [p["name"] for p in projects] assert "assigned-proj" in project_names ``` > **Warning:** `/api/status` is not a public health endpoint; unauthenticated calls return `401 {"detail":"Unauthorized"}`. ### Response structure `/api/status` returns: - `projects`: list of project variant rows (`SELECT * FROM projects`, ordered by `updated_at DESC`) - `known_models`: provider->models map derived from completed variants From `src/docsfy/storage.py` (project schema): ```sql CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) ``` From `src/docsfy/storage.py` (valid `status` values): ```python VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` From `src/docsfy/storage.py` (`known_models` population): ```python cursor = await db.execute( "SELECT DISTINCT ai_provider, ai_model FROM projects WHERE 
ai_provider != '' AND ai_model != '' AND status = 'ready' ORDER BY ai_provider, ai_model" ) ``` From `tests/test_main.py` (empty state behavior): ```python response = await client.get("/api/status") assert response.status_code == 200 assert response.json()["projects"] == [] ``` --- ## Dashboard Polling Contract (`/api/status`) The dashboard uses `/api/status` as a polling source for both coarse status refresh and fast progress updates. From `src/docsfy/templates/dashboard.html` (poll intervals): ```javascript statusPollInterval = setInterval(pollStatusChanges, 10000); progressPollInterval = setInterval(pollProgressUpdates, 5000); ``` From `src/docsfy/templates/dashboard.html` (status poll request + payload handling): ```javascript fetch('/api/status', { credentials: 'same-origin', redirect: 'manual' }) .then(function(res) { if (checkAuthRedirect(res)) return null; if (res.type === 'opaqueredirect') { checkAuthRedirect({ redirected: true, status: 302 }); return null; } return res.json(); }) .then(function(data) { if (!data) return; var projectsList = data.projects || data; if (!Array.isArray(projectsList)) return; // Update known models from the API so new models // appear in dropdowns without a full page reload. if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` From `src/docsfy/templates/dashboard.html` (progress calculations use `page_count` + `plan_json`): ```javascript var pageCount = proj.page_count || 0; var totalPages = 0; var parsedPlan = null; if (proj.plan_json) { if (typeof proj.plan_json === 'string') { try { parsedPlan = JSON.parse(proj.plan_json); } catch(e) { parsedPlan = null; } } else { parsedPlan = proj.plan_json; } } if (parsedPlan && parsedPlan.navigation) { parsedPlan.navigation.forEach(function(group) { totalPages += (group.pages || []).length; }); } ``` > **Tip:** For local HTTP development, disable secure cookies so browser polling can send the session cookie. 
From `.env.example`: ```env # Set to false for local HTTP development # SECURE_COOKIES=false ``` From `src/docsfy/config.py`: ```python secure_cookies: bool = True # Set to False for local HTTP dev ``` --- Source: deployment-topologies.md # Deployment Topologies `docsfy` supports three practical deployment modes with the same core runtime behavior: - **Local process** (single host, direct Python runtime) - **Containerized** (Docker image + Compose orchestration) - **OpenShift-style non-root runtime** (arbitrary UID, root-group writable paths) ## Shared Runtime Contract Regardless of topology, startup and storage behavior are consistent. ```python # src/docsfy/config.py class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python # src/docsfy/main.py @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) _generating.clear() await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` ```python # src/docsfy/storage.py DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" # ... DB_PATH.parent.mkdir(parents=True, exist_ok=True) PROJECTS_DIR.mkdir(parents=True, exist_ok=True) ``` > **Warning:** `ADMIN_KEY` is mandatory and must be at least 16 characters. 
The app exits at startup if it is missing or too short. > **Warning:** The process must be able to write to `DATA_DIR` (default `/data`) to create `docsfy.db` and project artifacts. --- ## Topology 1: Local Process Deployment Use this mode for development, single-user setups, or tightly controlled internal hosts. ### Runtime entry point ```toml # pyproject.toml [project.scripts] docsfy = "docsfy.main:run" ``` ```python # src/docsfy/main.py def run() -> None: import uvicorn reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` ### Configuration pattern ```bash # .env.example ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO # Set to false for local HTTP development # SECURE_COOKIES=false ``` ### Local deployment notes - Default bind is `127.0.0.1:8000`; set `HOST=0.0.0.0` only when you intentionally expose it. - `SECURE_COOKIES` defaults to `true`; for plain HTTP local testing, set `SECURE_COOKIES=false`. - Persistent state is filesystem-based (`DATA_DIR`, SQLite file, generated project/site outputs). - Generation checks AI CLI availability before work starts (`check_ai_cli_available` in `src/docsfy/main.py`), so provider CLIs must be installed and on `PATH` in local installs. > **Tip:** Keep local data isolated by setting `DATA_DIR` to a project-local folder during development. --- ## Topology 2: Containerized Deployment (Docker / Compose) Use this mode for reproducible packaging and host portability. ### Image characteristics ```dockerfile # Dockerfile FROM python:3.12-slim AS builder # ... RUN uv sync --frozen --no-dev FROM python:3.12-slim # ... 
RUN apt-get update && apt-get install -y --no-install-recommends \ bash \ git \ curl \ nodejs \ npm \ && rm -rf /var/lib/apt/lists/* ``` ```dockerfile # Dockerfile # Install Claude Code CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" # Install Cursor Agent CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" # Configure npm for non-root global installs and install Gemini CLI RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` ```dockerfile # Dockerfile EXPOSE 8000 HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ### Compose topology in repo ```yaml # docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ### Container deployment notes - The container always serves on `0.0.0.0:8000` via `ENTRYPOINT`. - `/data` is the persistence boundary and should be mounted to durable storage. - Health is probe-ready through `/health` and both image-level and Compose-level health checks are defined. - Runtime includes AI CLIs plus `git`, matching generation dependencies. > **Note:** `docker-compose.yaml` already maps `./data` to `/data`, which aligns with the default `data_dir` in app settings. --- ## Topology 3: OpenShift-Style Non-Root Runtime This image explicitly encodes compatibility with restricted/container-security platforms that run with an arbitrary non-root UID. 
```dockerfile # Dockerfile # OpenShift runs containers as a random UID in the root group (GID 0) RUN useradd --create-home --shell /bin/bash -g 0 appuser \ && mkdir -p /data \ && chown appuser:0 /data \ && chmod -R g+w /data ``` ```dockerfile # Dockerfile # Make /app group-writable for OpenShift compatibility RUN chmod -R g+w /app # Directories need group write+execute for OpenShift's arbitrary UID (in GID 0) RUN find /home/appuser -type d -exec chmod g=u {} + \ && npm cache clean --force 2>/dev/null; \ rm -rf /home/appuser/.npm/_cacache USER appuser ENV PATH="/home/appuser/.local/bin:/home/appuser/.npm-global/bin:${PATH}" ENV HOME="/home/appuser" ``` ```dockerfile # Dockerfile # --no-sync prevents uv from attempting to modify the venv at runtime. # This is required for OpenShift where containers run as an arbitrary UID ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ### Why these settings matter - **Arbitrary UID support:** group-writable paths (`/app`, `/data`, home dirs) allow runtime writes without root. - **No passwd-entry dependency:** `HOME=/home/appuser` ensures tools can resolve user-home paths even with random UID. - **Read-only venv safety:** `uv run --no-sync` prevents runtime attempts to mutate `.venv`, which can fail under restricted permissions. - **Non-root execution:** final runtime user is `appuser`, not root. > **Warning:** Do not remove `--no-sync` from the container startup command in restricted non-root environments; runtime package sync/write attempts can fail. > **Warning:** Any mounted volume used for `/data` must permit group write compatible with GID `0` behavior. > **Note:** This repository does not include Kubernetes/OpenShift manifest files. Platform manifests should preserve the image contract above (non-root, writable `/data`, unchanged startup semantics). 
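As a starting point, a platform manifest that preserves that contract might look like the following. This is a **hypothetical** fragment (no manifests ship with the repo); all names and values are illustrative, only the contract itself (non-root, GID-0-writable `/data`, unchanged startup command, `/health` probe) is prescriptive:

```yaml
# Hypothetical Deployment fragment — not shipped with this repository.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 0              # volume writable by GID 0, matching `chown appuser:0 /data`
      containers:
        - name: docsfy
          image: docsfy:latest  # illustrative image reference
          ports:
            - containerPort: 8000
          envFrom:
            - secretRef:
                name: docsfy-env  # must provide ADMIN_KEY (>= 16 chars)
          volumeMounts:
            - name: data
              mountPath: /data
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: docsfy-data
```

Note that no `command`/`args` override is set, so the image `ENTRYPOINT` (with `--no-sync`) is preserved.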
--- ## Health, Auth, and Probe Behavior ```python # src/docsfy/main.py class AuthMiddleware(BaseHTTPMiddleware): _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ``` ```python # src/docsfy/main.py @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` - `/health` is intentionally unauthenticated and suitable for liveness/readiness checks. - Most other routes are auth-protected by middleware. - The code explicitly expects edge-level protections (for example, login rate limiting) to be handled by reverse proxy/ingress when needed. --- ## CI/CD and Verification Inputs in This Repo No GitHub Actions or GitLab CI pipeline files are present in the repository root structure. Validation is defined through local/pipeline-friendly config: ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```yaml # .pre-commit-config.yaml ci: autofix_prs: false autoupdate_commit_msg: "ci: [pre-commit.ci] pre-commit autoupdate" repos: - repo: https://github.com/pre-commit/pre-commit-hooks - repo: https://github.com/PyCQA/flake8 - repo: https://github.com/Yelp/detect-secrets - repo: https://github.com/astral-sh/ruff-pre-commit - repo: https://github.com/gitleaks/gitleaks - repo: https://github.com/pre-commit/mirrors-mypy ``` > **Tip:** In external CI/CD systems, treat `tox` + pre-commit hooks as the minimum gate before publishing container images. --- Source: database-schema-and-migrations.md # Database Schema and Migrations Docsfy uses SQLite for all metadata, auth, ACL, and session state. The schema is managed in application code (not external migration files), and migrations run automatically at startup. 
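The startup-as-migration-trigger approach can be sketched generically. This is a simplified illustration of the pattern (not docsfy's actual `init_db`):

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    """Idempotent in-code migrations: safe to run on every startup."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL)"
    )
    try:
        # Backfill-style migration: a 'duplicate column name' error is the
        # signal that this migration already ran, so it is swallowed.
        conn.execute("ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'user'")
    except sqlite3.OperationalError as exc:
        if "duplicate column name" not in str(exc).lower():
            raise

conn = sqlite3.connect(":memory:")
init_db(conn)
init_db(conn)  # second run is a no-op, not an error
print([row[1] for row in conn.execute("PRAGMA table_info(users)")])
# → ['id', 'username', 'role']
```

Because every migration is written to tolerate re-execution, no version table or separate migration runner is needed.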
## Database location and startup flow ```python # src/docsfy/storage.py DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` ```python # src/docsfy/config.py class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python # src/docsfy/main.py @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) _generating.clear() await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` ```yaml # docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` > **Tip:** Persist `/data` (or your custom `DATA_DIR`) across restarts. If storage is ephemeral, your DB and generated project metadata are lost. --- ## Schema overview Docsfy creates and maintains four tables in `init_db()`: - `projects` - `users` - `project_access` - `sessions` ### `projects` Stores generated documentation variants per project/owner/provider/model combination. 
```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) """) ``` `status` is constrained in code (not DB enum): ```python # src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` Writes are idempotent via upsert on the composite PK: ```python # src/docsfy/storage.py await db.execute( """INSERT INTO projects (name, ai_provider, ai_model, owner, repo_url, status, updated_at) VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP) ON CONFLICT(name, ai_provider, ai_model, owner) DO UPDATE SET repo_url = excluded.repo_url, status = excluded.status, error_message = NULL, current_stage = NULL, updated_at = CURRENT_TIMESTAMP""", (name, ai_provider, ai_model, owner, repo_url, status), ) ``` ### `users` Stores API-key-authenticated users and role-based access. ```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS users ( id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE NOT NULL, api_key_hash TEXT NOT NULL UNIQUE, role TEXT NOT NULL DEFAULT 'user', created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) """) ``` API keys are not stored in plaintext; they are HMAC-hashed with `ADMIN_KEY`: ```python # src/docsfy/storage.py def hash_api_key(key: str, hmac_secret: str = "") -> str: """Hash an API key with HMAC-SHA256 for storage. Uses ADMIN_KEY as the HMAC secret so that even if the source is read, keys cannot be cracked without the environment secret. """ # NOTE: ADMIN_KEY is used as the HMAC secret. 
Rotating ADMIN_KEY will # invalidate all existing api_key_hash values, requiring all users to # regenerate their API keys. secret = hmac_secret or os.getenv("ADMIN_KEY", "") if not secret: msg = "ADMIN_KEY environment variable is required for key hashing" raise RuntimeError(msg) return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest() ``` > **Note:** Username `admin` is reserved in the DB user model; environment-admin auth is handled separately via `ADMIN_KEY`. ### `project_access` Stores explicit ACL grants for sharing projects between users. ```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS project_access ( project_name TEXT NOT NULL, project_owner TEXT NOT NULL DEFAULT '', username TEXT NOT NULL, PRIMARY KEY (project_name, project_owner, username) ) """) ``` Grant semantics are project-level for a given owner (all variants under that name/owner): ```python # src/docsfy/storage.py async def grant_project_access( project_name: str, username: str, project_owner: str = "" ) -> None: """Grant a user access to all variants of a project.""" if not project_owner: msg = "project_owner is required for access grants" raise ValueError(msg) async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT OR IGNORE INTO project_access (project_name, project_owner, username) VALUES (?, ?, ?)", (project_name, project_owner, username), ) await db.commit() ``` ### `sessions` Stores browser session state (token, user, role flag, expiration). 
```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS sessions ( token TEXT PRIMARY KEY, username TEXT NOT NULL, is_admin INTEGER NOT NULL DEFAULT 0, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, expires_at TIMESTAMP NOT NULL ) """) ``` Session tokens are opaque to clients and stored hashed in DB: ```python # src/docsfy/storage.py SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 def _hash_session_token(token: str) -> str: """Hash a session token for storage.""" return hashlib.sha256(token.encode()).hexdigest() async def create_session( username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS ) -> str: """Create an opaque session token.""" token = secrets.token_urlsafe(32) token_hash = _hash_session_token(token) expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours) expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S") async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)", (token_hash, username, 1 if is_admin else 0, expires_str), ) await db.commit() return token async def get_session(token: str) -> dict[str, str | int | None] | None: """Look up a session. Returns None if expired or not found.""" token_hash = _hash_session_token(token) async with aiosqlite.connect(DB_PATH) as db: db.row_factory = aiosqlite.Row cursor = await db.execute( "SELECT * FROM sessions WHERE token = ? AND expires_at > datetime('now')", (token_hash,), ) row = await cursor.fetchone() return dict(row) if row else None ``` And cookie max-age is aligned with session TTL: ```python # src/docsfy/main.py response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` --- ## Built-in migration behavior Docsfy uses in-code, idempotent migrations inside `init_db()`. 
> **Note:** There is no migration version table and no separate migration runner. Startup is the migration trigger. ### 1) `projects` PK migration (legacy 3-column to 4-column owner-aware PK) Detection and migration are automatic: ```python # src/docsfy/storage.py # Migration: convert old 3-column PK table to 4-column PK (with owner) cursor = await db.execute("PRAGMA table_info(projects)") columns = await cursor.fetchall() col_names = [c[1] for c in columns] needs_pk_migration = False # Detect old schema: owner not in columns, or owner is nullable if "owner" not in col_names: needs_pk_migration = True elif "ai_provider" not in col_names: needs_pk_migration = True else: # Check if ai_provider is nullable (old schema) for col in columns: if col[1] == "ai_provider" and col[3] == 0: # notnull=0 means nullable needs_pk_migration = True break ``` ```python # src/docsfy/storage.py await db.execute(f""" INSERT OR IGNORE INTO projects_new (name, ai_provider, ai_model, owner, repo_url, status, current_stage, last_commit_sha, last_generated, page_count, error_message, plan_json, created_at, updated_at) SELECT {", ".join(select_cols)} FROM projects """) await db.execute("DROP TABLE projects") await db.execute("ALTER TABLE projects_new RENAME TO projects") ``` > **Warning:** This migration rewrites the table and drops the original after copy. Back up `docsfy.db` before major upgrades. > **Warning:** Data copy uses `INSERT OR IGNORE`; if legacy rows collide under the new composite key, ignored rows will not be migrated. 
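The collision case called out in the warning can be checked before an upgrade. The helper below is a hypothetical pre-upgrade audit (not part of docsfy): since legacy rows all receive the default `owner = ''`, any rows sharing the same `(name, ai_provider, ai_model)` triple would collide under the new composite key.

```python
# Hypothetical pre-upgrade audit (not shipped with docsfy): list legacy
# project rows that would collide under the 4-column PK once every legacy
# row defaults to owner = ''.
import sqlite3


def find_colliding_variants(conn: sqlite3.Connection) -> list[tuple]:
    """Return (name, ai_provider, ai_model, count) for duplicated triples."""
    return conn.execute(
        """
        SELECT name, ai_provider, ai_model, COUNT(*) AS n
        FROM projects
        GROUP BY name, ai_provider, ai_model
        HAVING COUNT(*) > 1
        ORDER BY name
        """
    ).fetchall()
```

Any triple this query reports should be resolved (renamed or deleted) before startup runs the migration, because `INSERT OR IGNORE` will silently drop all but one of the colliding rows.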
### 2) `users.role` backfill migration ```python # src/docsfy/storage.py # Migration: add role column for existing DBs try: await db.execute( "ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'user'" ) except sqlite3.OperationalError as exc: if "duplicate column name" not in str(exc).lower(): logger.exception("Migration failed while adding column") raise ``` ### 3) `users.api_key_hash` uniqueness migration ```python # src/docsfy/storage.py cursor = await db.execute("PRAGMA index_list(users)") indexes = await cursor.fetchall() has_unique_key_index = False for idx in indexes: if idx[2]: # unique=1 cursor2 = await db.execute(f"PRAGMA index_info({idx[1]})") idx_cols = await cursor2.fetchall() for ic in idx_cols: if ic[2] == "api_key_hash": has_unique_key_index = True break if has_unique_key_index: break if not has_unique_key_index: try: await db.execute( "CREATE UNIQUE INDEX IF NOT EXISTS idx_users_api_key_hash ON users (api_key_hash)" ) except sqlite3.OperationalError as exc: if "unique" not in str(exc).lower(): logger.exception("Migration failed while adding unique index") raise ``` ### 4) `project_access.project_owner` backfill migration ```python # src/docsfy/storage.py # Migration: add project_owner column to project_access try: await db.execute( "ALTER TABLE project_access ADD COLUMN project_owner TEXT NOT NULL DEFAULT ''" ) except sqlite3.OperationalError as exc: if "duplicate column name" not in str(exc).lower(): logger.exception("Migration failed while adding column") raise ``` ### 5) Startup recovery behavior (non-schema but migration-adjacent) On restart, in-progress generations are marked failed: ```python # src/docsfy/storage.py cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` Expired sessions are pruned during app startup: ```python # src/docsfy/storage.py async def cleanup_expired_sessions() -> None: """Remove expired 
sessions. NOTE: This is called during application startup (lifespan) only. Expired sessions accumulate between restarts but are harmless since get_session() filters by expires_at. For long-running deployments, consider calling this periodically (e.g., via a background task). TODO: Add periodic cleanup for long-running instances. """ async with aiosqlite.connect(DB_PATH) as db: await db.execute("DELETE FROM sessions WHERE expires_at <= datetime('now')") await db.commit() ``` --- ## Integrity model and relationships Docsfy intentionally enforces most relationships at application level (not SQLite foreign keys). - `projects.owner` logically maps to `users.username` - `project_access.username` maps to `users.username` - `project_access.(project_name, project_owner)` maps to project identity (across variants) - `sessions.username` maps to user identity (including env-admin login path) Cleanup logic is explicit in code: ```python # src/docsfy/storage.py async def delete_user(username: str) -> bool: """Delete a user by username, invalidating all their sessions and cleaning up ACLs.""" async with aiosqlite.connect(DB_PATH) as db: await db.execute("DELETE FROM sessions WHERE username = ?", (username,)) # Clean up owned projects and their access entries await db.execute("DELETE FROM projects WHERE owner = ?", (username,)) await db.execute( "DELETE FROM project_access WHERE project_owner = ?", (username,) ) # Clean up ACL entries where user was granted access await db.execute("DELETE FROM project_access WHERE username = ?", (username,)) cursor = await db.execute("DELETE FROM users WHERE username = ?", (username,)) await db.commit() return cursor.rowcount > 0 ``` ```python # src/docsfy/storage.py # Clean up project_access if no more variants remain for this name+owner if cursor.rowcount > 0 and owner is not None: remaining = await db.execute( "SELECT COUNT(*) FROM projects WHERE name = ? 
AND owner = ?", (name, owner), ) row = await remaining.fetchone() if row and row[0] == 0: await db.execute( "DELETE FROM project_access WHERE project_name = ? AND project_owner = ?", (name, owner), ) ``` > **Warning:** Because there are no DB-level foreign keys, direct/manual SQL writes can create orphaned rows that the app does not automatically reconcile unless specific cleanup paths are triggered. --- ## Test and CI coverage for schema behavior Key migration-adjacent behaviors are covered by tests: ```python # tests/test_storage.py async def test_init_db_resets_orphaned_generating(db_path: Path) -> None: from docsfy.storage import get_project, init_db, save_project await save_project( name="stuck-repo", repo_url="https://github.com/org/stuck.git", status="generating", ai_provider="claude", ai_model="opus", owner="testuser", ) # Simulate server restart by re-running init_db await init_db() project = await get_project( "stuck-repo", ai_provider="claude", ai_model="opus", owner="testuser" ) assert project is not None assert project["status"] == "error" assert "Server restarted" in project["error_message"] ``` ```python # tests/test_storage.py async def test_cleanup_expired_sessions(db_path: Path) -> None: import aiosqlite from docsfy.storage import ( DB_PATH, _hash_session_token, cleanup_expired_sessions, create_session, ) # Directly insert a session with a past expiration async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)", ("expired-token", "expired-user", 0, "2020-01-01T00:00:00"), ) await db.commit() # Create a valid session valid_token = await create_session("valid-user", ttl_hours=8) await cleanup_expired_sessions() # Check that only the valid session remains async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute("SELECT COUNT(*) FROM sessions") row = await cursor.fetchone() assert row is not None assert row[0] == 1 # Session tokens are stored as 
hashes token_hash = _hash_session_token(valid_token) cursor = await db.execute( "SELECT username FROM sessions WHERE token = ?", (token_hash,) ) row = await cursor.fetchone() assert row is not None assert row[0] == "valid-user" ``` CI test entry point: ```toml # tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** Current tests validate startup recovery, session cleanup, owner scoping, and `data_dir` initialization, but they do not include explicit fixtures for every legacy schema branch in `init_db()` (for example, a seeded pre-owner `projects` table). --- Source: backup-and-recovery.md # Backup and Recovery `docsfy` persists operational state in `DATA_DIR` and expects both SQLite metadata and generated artifacts to remain available together. ## Where data is stored `DATA_DIR` is configured via settings and passed into DB initialization at startup: ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() ... 
await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` `storage.py` resolves concrete paths from `DATA_DIR`: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` Project artifacts are namespaced by owner/project/provider/model: ```python def _validate_owner(owner: str) -> str: """Validate owner segment to prevent path traversal.""" if not owner: return "_default" if "/" in owner or "\\" in owner or ".." in owner or owner.startswith("."): msg = f"Invalid owner: '{owner}'" raise ValueError(msg) return owner def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: ... safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model def get_project_site_dir(...): return get_project_dir(name, ai_provider, ai_model, owner) / "site" def get_project_cache_dir(...): return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages" ``` Expected layout: ```text DATA_DIR/ docsfy.db projects/ / / / / plan.json cache/ pages/ *.md site/ index.html *.html *.md search-index.json llms.txt llms-full.txt .nojekyll assets/* ``` ## What to back up Back up **both**: 1. `DATA_DIR/docsfy.db` 2. 
`DATA_DIR/projects/` (all owners/projects/variants) SQLite holds project state plus auth/session/access data: ```python await db.execute(""" CREATE TABLE IF NOT EXISTS users ( id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE NOT NULL, api_key_hash TEXT NOT NULL UNIQUE, role TEXT NOT NULL DEFAULT 'user', created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) """) await db.execute(""" CREATE TABLE IF NOT EXISTS project_access ( project_name TEXT NOT NULL, project_owner TEXT NOT NULL DEFAULT '', username TEXT NOT NULL, PRIMARY KEY (project_name, project_owner, username) ) """) await db.execute(""" CREATE TABLE IF NOT EXISTS sessions ( token TEXT PRIMARY KEY, username TEXT NOT NULL, is_admin INTEGER NOT NULL DEFAULT 0, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, expires_at TIMESTAMP NOT NULL ) """) ``` Generated docs and indexes are written into each variant’s `site/` directory: ```python if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) (output_dir / ".nojekyll").touch() (output_dir / "index.html").write_text(index_html, encoding="utf-8") (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") (output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8") (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` > **Warning:** Backing up only `docsfy.db` or only `projects/` can produce mismatches (metadata points to missing files, or files exist without matching DB rows). ## Deployment persistence configuration Containerized deployments should persist `/data` externally. Example from `docker-compose.yaml`: ```yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` Local repo config also avoids committing runtime data: ```gitignore # Data data/ .dev/data/ ``` ## Recommended backup procedure 1. Quiesce writes (stop `docsfy`, or ensure no generation is in progress). 2. Snapshot/copy the **entire** `DATA_DIR` atomically if possible. 3. Store versioned backups (daily full + retention policy). 4. Test restore periodically in a non-production environment. > **Tip:** In Docker Compose setups, backing up host `./data` captures both `docsfy.db` and all generated variant artifacts because it maps directly to `/data`. ## Recovery procedure 1. Stop `docsfy`. 2. Restore `DATA_DIR` from the same backup set (`docsfy.db` + `projects/`). 3. Start `docsfy` and let startup run DB initialization/migrations. 4. Validate project status and docs serving. Startup recovery behavior includes schema migration and handling interrupted generations: ```python # Migration: convert old 3-column PK table to 4-column PK (with owner) ... logger.info( "Migrating database to 4-column PK schema (name, ai_provider, ai_model, owner)" ) ... await db.execute("ALTER TABLE projects_new RENAME TO projects") # Reset orphaned "generating" projects from previous server run cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` > **Note:** After restore/restart, variants that were `generating` when the backup was taken are intentionally transitioned to `error` with `Server restarted during generation`. ## Variant/site export (supplemental backup) `docsfy` can export rendered docs as `.tar.gz` through API endpoints: ```python @app.get("/api/projects/{name}/{provider}/{model}/download") ... with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=f"{name}-{provider}-{model}") ``` ```python @app.get("/api/projects/{name}/download") ... 
with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=name) ``` Use these as supplemental exports, not as full disaster-recovery backups. > **Note:** Download endpoints package the `site/` output only; they do **not** include SQLite metadata (`docsfy.db`), `cache/pages`, or `plan.json`. ## Destructive operations to account for Generation and delete operations remove data on disk: ```python if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") ... project_dir = get_project_dir(name, provider, model, project_owner) if project_dir.exists(): shutil.rmtree(project_dir) ``` And each render replaces the full site directory: ```python if output_dir.exists(): shutil.rmtree(output_dir) ``` > **Warning:** `DELETE` endpoints and re-render operations are destructive on disk; recovery requires restoring from backup or regenerating from source repositories. --- Source: testing-and-quality-gates.md # Testing and Quality Gates This project uses a layered quality stack: `pytest` for behavior coverage, `tox` as the test runner wrapper, and `pre-commit` for linting, type checking, and secret scanning. > **Warning:** No repository CI pipeline files are currently present (`.github/workflows`, `.gitlab-ci.yml`, `.circleci`, `Jenkinsfile`, etc.). Quality gates are defined in local tooling (`tox.toml`, `.pre-commit-config.yaml`) and any external pre-commit service integration. 
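Because enforcement is local, the two gates are typically chained in one step. The snippet below is a hypothetical convenience wrapper (not part of the repository) around the commands the rest of this page documents; the injectable `runner` parameter exists only to make the sketch testable.

```python
# Hypothetical local gate runner (not part of docsfy): chain the pre-commit
# and tox gates so a failure in either stops the run.
import subprocess

GATES = [
    ["pre-commit", "run", "--all-files"],  # lint, typing, secret scanning
    ["tox"],                               # runs the default unittests env
]


def run_gates(runner=subprocess.run) -> bool:
    """Run each gate in order; stop at the first non-zero exit code."""
    for cmd in GATES:
        if runner(cmd).returncode != 0:
            return False
    return True
```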
## Pytest Coverage Areas `pytest` is configured in `pyproject.toml`: ```toml [project.optional-dependencies] dev = ["pytest", "pytest-asyncio", "pytest-xdist", "httpx"] [tool.pytest.ini_options] asyncio_mode = "auto" testpaths = ["tests"] pythonpath = ["src"] ``` Current suite structure covers **149 tests across 13 test modules**: - **Auth, RBAC, session and access control:** `tests/test_auth.py` (36), `tests/test_main.py` (15), `tests/test_dashboard.py` (4) - **Storage and persistence behavior:** `tests/test_storage.py` (33) - **Generation/planning/parser/repository logic:** `tests/test_generator.py` (8), `tests/test_json_parser.py` (15), `tests/test_prompts.py` (3), `tests/test_repository.py` (9) - **Rendering and content safety:** `tests/test_renderer.py` (11) - **Contracts and configuration models:** `tests/test_config.py` (3), `tests/test_models.py` (9), `tests/test_ai_client.py` (2) - **End-to-end mocked flow:** `tests/test_integration.py` (1) ### Coverage Examples from Tests **SSRF hardening (private DNS/IP rejection)** from `tests/test_main.py`: ```python async def test_reject_private_url_dns(monkeypatch: pytest.MonkeyPatch) -> None: """Test that SSRF protection rejects DNS names resolving to private IPs.""" import socket from docsfy.main import _reject_private_url def mock_getaddrinfo( host: str, port: object, *args: object, **kwargs: object ) -> list[ tuple[socket.AddressFamily, socket.SocketKind, int, str, tuple[str, int]] ]: return [(socket.AF_INET, socket.SOCK_STREAM, 0, "", ("192.168.1.1", 0))] monkeypatch.setattr(socket, "getaddrinfo", mock_getaddrinfo) with pytest.raises(HTTPException) as exc_info: await _reject_private_url("https://evil.com/org/repo") assert exc_info.value.status_code == 400 ``` **Role enforcement (viewer cannot generate docs)** from `tests/test_auth.py`: ```python async def test_viewer_cannot_generate(_init_db: None) -> None: """A viewer should get 403 when trying to generate docs.""" from docsfy.main import _generating, app from 
docsfy.storage import create_user

    _generating.clear()
    _, viewer_key = await create_user("viewer-gen", role="viewer")
    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
        headers={"Authorization": f"Bearer {viewer_key}"},
    ) as ac:
        response = await ac.post(
            "/api/generate",
            json={
                "repo_url": "https://github.com/org/repo",
                "project_name": "test-proj",
            },
        )
        assert response.status_code == 403
        assert "Write access required" in response.json()["detail"]
    _generating.clear()
```

**Output sanitization (XSS vectors blocked)** from `tests/test_renderer.py`:

```python
def test_sanitize_html_unquoted_javascript() -> None:
    from docsfy.renderer import _sanitize_html

    # Payloads shown are representative unquoted-attribute XSS vectors;
    # the original fixture markup is abridged in this excerpt.
    result = _sanitize_html("<a href=javascript:alert(1)>x</a>")
    assert "javascript:" not in result
    result = _sanitize_html("<img src=javascript:alert(1)>")
    assert "javascript:" not in result
    result = _sanitize_html("<a href=data:text/html,script>alert(1)>x")
    assert "data:" not in result
    result = _sanitize_html("<object data=data:text/html;base64,AAAA>")
    assert "data:" not in result
```

**End-to-end API/docs artifact flow** from `tests/test_integration.py`:

```python
response = await client.get("/docs/test-repo/claude/opus/index.html")
assert response.status_code == 200
assert "test-repo" in response.text

response = await client.get("/api/projects/test-repo/claude/opus/download")
assert response.status_code == 200
assert response.headers["content-type"] == "application/gzip"
```

> **Note:** There is no coverage threshold gate configured (no `pytest-cov` settings in `pyproject.toml` or `tox.toml`).

## tox Usage

`tox` is configured in `tox.toml` with a single environment:

```toml
skipsdist = true
envlist = ["unittests"]

[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

What this means:

- `tox` runs the `unittests` env by default.
- Tests run through `uv` with dev extras.
- `pytest-xdist` is used with `-n auto` for parallel execution.
- Packaging/build is skipped for test runs (`skipsdist = true`).
> **Tip:** For parity with tox while debugging a single step, use the exact command from `tox.toml`: `uv run --extra dev pytest -n auto tests`. ## Pre-commit Hooks Hook orchestration is defined in `.pre-commit-config.yaml`: ```yaml repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: v6.0.0 hooks: - id: check-added-large-files - id: check-docstring-first - id: check-executables-have-shebangs - id: check-merge-conflict - id: check-symlinks - id: detect-private-key - id: mixed-line-ending - id: debug-statements - id: trailing-whitespace args: [--markdown-linebreak-ext=md] - id: end-of-file-fixer - id: check-ast - id: check-builtin-literals - id: check-toml ``` It also wires lint/type/security hooks: ```yaml # flake8 retained for RedHatQE M511 plugin; ruff handles standard linting - repo: https://github.com/PyCQA/flake8 rev: 7.3.0 hooks: - id: flake8 args: [--config=.flake8] additional_dependencies: [git+https://github.com/RedHatQE/flake8-plugins.git, flake8-mutable] - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - id: ruff - id: ruff-format - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy exclude: (tests/) ``` ## mypy Gate Type checking is strict at project level in `pyproject.toml`: ```toml [tool.mypy] check_untyped_defs = true disallow_any_generics = true disallow_incomplete_defs = true disallow_untyped_defs = true no_implicit_optional = true show_error_codes = true warn_unused_ignores = true strict_equality = true extra_checks = true warn_unused_configs = true warn_redundant_casts = true ``` In pre-commit, mypy also installs extra stubs/deps: ```yaml - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy exclude: (tests/) additional_dependencies: [types-requests, types-PyYAML, types-colorama, types-aiofiles, pydantic, types-Markdown] ``` ## Ruff Gate Ruff is enforced via pre-commit: ```yaml - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - 
id: ruff - id: ruff-format ``` There is no dedicated `[tool.ruff]` section in `pyproject.toml`, so ruff runs with defaults unless overridden by hook-level args (none currently set). ## Flake8 Compatibility Gate (M511) `flake8` is retained specifically for RedHatQE plugin checks: ```ini [flake8] select=M511 ``` This keeps `M511` enforcement while ruff handles general linting. ## Secrets Scanning Gates Secret scanning is layered in pre-commit: ```yaml - repo: https://github.com/pre-commit/pre-commit-hooks rev: v6.0.0 hooks: - id: detect-private-key - repo: https://github.com/Yelp/detect-secrets rev: v1.5.0 hooks: - id: detect-secrets - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks ``` `gitleaks` has a repo-specific allowlist in `.gitleaks.toml`: ```toml [extend] useDefault = true [allowlist] paths = [ '''tests/test_repository\.py''', ] ``` Test fixtures also use inline allowlist annotations where needed, for example in `tests/test_repository.py`: ```python assert sha == "abc123def" # pragma: allowlist secret ``` > **Warning:** Allowlisting should stay narrowly scoped to test fixtures only; broad allowlists can hide real leaks in production code. --- Source: ci-cd-integration.md # CI/CD Integration Docsfy already has strong **automation building blocks** for CI, but they are not yet wired into a repository-managed CI pipeline. The project currently relies on `tox` for tests and `pre-commit` for linting, typing, and secret scanning. > **Warning:** No CI workflow definitions are currently checked in (for example, no `.github/workflows`, `.gitlab-ci.yml`, or `Jenkinsfile`). Until a pipeline is added, enforcement depends on developers running checks locally. 
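Until a pipeline lands, a bootstrap script can at least make the gap explicit. This hypothetical check (marker names taken from the warning above) reports whether any recognized CI definition exists in a checkout:

```python
# Hypothetical helper: report whether a recognized CI pipeline definition
# is present in a repository checkout.
from pathlib import Path

CI_MARKERS = (".github/workflows", ".gitlab-ci.yml", ".circleci", "Jenkinsfile")


def has_ci_pipeline(repo_root: Path) -> bool:
    """True if any known CI marker file or directory exists under repo_root."""
    return any((repo_root / marker).exists() for marker in CI_MARKERS)
```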
## Current Automation Posture ### Test execution is defined in `tox` ```toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` - One tox environment exists: `unittests` - Tests run with `pytest-xdist` (`-n auto`) for parallel execution - `skipsdist = true` means tox does not build/install the package before testing > **Note:** With `skipsdist = true`, CI validates source-tree behavior but not wheel/sdist installability. ### Python, pytest, and mypy defaults are centralized in `pyproject.toml` ```toml [project] requires-python = ">=3.12" [project.optional-dependencies] dev = ["pytest", "pytest-asyncio", "pytest-xdist", "httpx"] [tool.pytest.ini_options] asyncio_mode = "auto" testpaths = ["tests"] pythonpath = ["src"] [tool.mypy] check_untyped_defs = true disallow_any_generics = true disallow_incomplete_defs = true disallow_untyped_defs = true no_implicit_optional = true show_error_codes = true warn_unused_ignores = true strict_equality = true extra_checks = true warn_unused_configs = true warn_redundant_casts = true ``` - CI runners should use Python 3.12+ - Async testing is first-class (`pytest-asyncio`) - Mypy is configured in strict mode ### Lint, formatting, typing, and security checks are encoded in `.pre-commit-config.yaml` ```yaml ci: autofix_prs: false autoupdate_commit_msg: "ci: [pre-commit.ci] pre-commit autoupdate" ``` ```yaml repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: v6.0.0 hooks: - id: check-added-large-files - id: check-docstring-first - id: check-executables-have-shebangs - id: check-merge-conflict - id: check-symlinks - id: detect-private-key - id: mixed-line-ending - id: debug-statements - id: trailing-whitespace args: [--markdown-linebreak-ext=md] - id: end-of-file-fixer - id: check-ast - id: check-builtin-literals - id: check-toml ``` ```yaml # flake8 retained for RedHatQE M511 plugin; ruff handles standard linting - 
repo: https://github.com/PyCQA/flake8 rev: 7.3.0 hooks: - id: flake8 args: [--config=.flake8] additional_dependencies: [git+https://github.com/RedHatQE/flake8-plugins.git, flake8-mutable] - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - id: ruff - id: ruff-format - repo: https://github.com/Yelp/detect-secrets rev: v1.5.0 hooks: - id: detect-secrets - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy exclude: (tests/) ``` - `ruff` + `ruff-format` handle general lint/format checks - `flake8` is retained for rule `M511` via plugin - `detect-secrets` and `gitleaks` provide layered secret scanning - `mypy` runs as a hook and excludes `tests/` ```ini [flake8] select=M511 ``` ```toml [extend] useDefault = true [allowlist] paths = [ '''tests/test_repository\.py''', ] ``` > **Warning:** The flake8 hook intentionally pulls `RedHatQE/flake8-plugins` from Git, so CI reproducibility depends on that upstream repository state unless you pin a commit. ## Deployment Readiness Signals Already in Code The repo already contains deploy-friendly health checks in both app and container config: ```python @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ```yaml healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ## Recommended Pipeline Stages Use the existing repository configuration as the source of truth: 1. **Setup** - Use Python 3.12 runner - Install `pre-commit`, `tox`, and `uv` 2. **Quality & Security Gate** - Run all hooks from `.pre-commit-config.yaml` - Enforces linting, formatting, type checks, and secret scanning 3. 
**Test Gate** - Run tox `unittests` env from `tox.toml` - Executes `pytest -n auto tests` through `uv` 4. **Build Gate (main/release branches)** - Build container from `Dockerfile` - Preserves runtime assumptions already encoded in the image 5. **Smoke Gate** - Start the built image and check `/health` - Fail fast before deployment if health probe fails 6. **Deploy** - Deploy only after all prior gates succeed > **Tip:** Keep CI logic thin by reusing `tox.toml` and `.pre-commit-config.yaml` directly, instead of duplicating check logic in pipeline YAML. ## Why This Works Well for Docsfy Tests are already written to run without external AI services by mocking expensive/external operations: ```python with patch.dict(os.environ, {"ADMIN_KEY": TEST_ADMIN_KEY}): get_settings.cache_clear() await storage.init_db() ... with ( patch("docsfy.main.check_ai_cli_available", return_value=(True, "")), patch("docsfy.main.clone_repo", return_value=(tmp_path / "repo", "abc123")), patch("docsfy.main.run_planner", return_value=sample_plan), patch( "docsfy.main.generate_all_pages", return_value={"introduction": "# Intro\n\nWelcome!"}, ), ): ... ``` This makes CI runs deterministic and suitable for pull-request validation without requiring real provider credentials. --- Source: repository-structure.md # Repository Structure `docsfy` is organized as a `src`-layout Python service with server-rendered UI, static site rendering utilities, and a focused async test suite. 
## Top-Level Layout ```text docsfy/ ├── src/docsfy/ # Application package │ ├── __init__.py │ ├── main.py # FastAPI app, auth middleware, API routes │ ├── config.py # Environment-backed settings │ ├── models.py # Pydantic request/plan models │ ├── storage.py # SQLite + filesystem storage + user/session auth │ ├── repository.py # git clone/diff helpers │ ├── ai_client.py # AI CLI wrapper re-exports │ ├── prompts.py # Planner/page/incremental prompt builders │ ├── json_parser.py # Robust JSON extraction from AI output │ ├── generator.py # Planner + page generation orchestration │ ├── renderer.py # Markdown-to-HTML rendering + asset/site output │ ├── templates/ # Jinja templates (app UI + generated docs pages) │ └── static/ # Frontend assets copied into generated docs ├── tests/ # Unit + integration tests ├── docs/plans/ # Design/implementation planning docs ├── test-plans/ # End-to-end/manual UI test plan ├── pyproject.toml # Packaging, deps, pytest config, script entrypoint ├── uv.lock # Locked dependency graph ├── tox.toml # Local test task runner ├── Dockerfile # Multi-stage runtime image ├── docker-compose.yaml # Local container orchestration ├── .env.example # Environment variable template ├── .pre-commit-config.yaml # Lint/type/security hooks ├── .flake8 # Flake8 plugin settings ├── .gitleaks.toml # Secret scanning config ├── .gitignore └── OWNERS ``` ## Source Modules (`src/docsfy`) ### API entrypoint and route wiring `main.py` defines app startup, authentication middleware, API endpoints, and the end-to-end generation lifecycle. 
```python app = FastAPI( title="docsfy", description="AI-powered documentation generator", version="0.1.0", lifespan=lifespan, ) app.add_middleware(AuthMiddleware) @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} @app.post("/api/generate", status_code=202) async def generate(request: Request, gen_request: GenerateRequest) -> dict[str, str]: ``` Key responsibilities: - request auth (`Bearer` token or `docsfy_session` cookie) - project/variant ownership checks - generation task scheduling and abort logic - docs serving (`/docs/...`) and archive download endpoints ### Settings and request models - `config.py` centralizes runtime settings (`ADMIN_KEY`, `AI_PROVIDER`, `AI_MODEL`, `DATA_DIR`, cookie security, timeout). - `models.py` validates generation input (`repo_url` vs `repo_path`) and doc-plan schemas (`DocPlan`, `NavGroup`, `DocPage`). ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True ``` ### Generation pipeline modules - `prompts.py`: prompt construction for planner, page generation, and incremental page selection. - `ai_client.py`: re-exports provider/runtime helpers from `ai-cli-runner`. - `json_parser.py`: resilient parsing from noisy AI output. - `generator.py`: planning + page generation, cache support, bounded concurrency. - `repository.py`: git clone and changed-file detection for incremental behavior. 
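To illustrate the `json_parser.py` responsibility described above, a resilient extractor generally prefers an explicit fenced block and falls back to the outermost brace span. This is a simplified sketch of that idea, not the module's actual implementation:

```python
# Simplified sketch of resilient JSON extraction from noisy AI output
# (illustrative only; docsfy's json_parser.py is more thorough).
import json
import re

FENCE = "`" * 3  # built dynamically to keep this example docs-friendly


def extract_json(text: str) -> dict:
    """Pull the first JSON object out of noisy AI CLI output."""
    # Prefer an explicit fenced ```json block.
    pattern = FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE
    fenced = re.search(pattern, text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Fall back to the outermost brace span.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return json.loads(text[start : end + 1])
    raise ValueError("no JSON object found in AI output")
```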
```python
success, output = await call_ai_cli(
    prompt=prompt,
    cwd=repo_path,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    cli_flags=cli_flags,
)

results = await run_parallel_with_limit(
    coroutines, max_concurrency=MAX_CONCURRENT_PAGES
)
```

### Persistence and runtime pathing

`storage.py` owns both database schema/migrations and output path conventions.

```python
DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db"
DATA_DIR = Path(os.getenv("DATA_DIR", "/data"))
PROJECTS_DIR = DATA_DIR / "projects"
```

```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    ...
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

```python
def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    ...
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

## Templates and Static Assets

### Jinja templates (`src/docsfy/templates`)

- App UI pages: `dashboard.html`, `status.html`, `login.html`, `admin.html`
- Generated docs pages: `index.html`, `page.html`
- Shared partials: `_theme.html`, `_sidebar.html`, `_modal.html`

Generated docs templates explicitly load the packaged static assets from the copied `assets/` directory.

### Static frontend assets (`src/docsfy/static`)

- `style.css`: full docs theme (layout, typography, callouts, TOC, search modal)
- `theme.js`: dark/light theme toggle + persistence
- `search.js`: `Cmd/Ctrl+K` modal search using `search-index.json`
- `copy.js`: code block copy buttons
- `callouts.js`: transforms blockquotes (`Note`, `Warning`, `Tip`, etc.)
into callouts - `scrollspy.js`: active heading sync in TOC - `codelabels.js`: inferred language badges on code blocks - `github.js`: optional GitHub stars badge hydration `renderer.py` copies these files to the generated site output and emits search/LLM artifacts: ```python if STATIC_DIR.exists(): for static_file in STATIC_DIR.iterdir(): if static_file.is_file(): shutil.copy2(static_file, assets_dir / static_file.name) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` ## Tests (`tests/`) The suite is split by module/feature area: - `test_main.py`: API route behavior and generation endpoint lifecycle - `test_auth.py`: login/session flows, role permissions (`admin`, `user`, `viewer`) - `test_storage.py`: DB CRUD, migrations, key/session management, ACL behavior - `test_repository.py`: clone/local SHA/diff helpers - `test_generator.py`: planner/page generation and incremental planner handling - `test_renderer.py`: markdown rendering and HTML sanitization behavior - `test_config.py`, `test_models.py`, `test_json_parser.py`, `test_prompts.py`, `test_ai_client.py`: focused unit tests - `test_dashboard.py`: dashboard page rendering behavior - `test_integration.py`: mocked full flow (`generate -> serve -> download -> delete`) Example integration assertion flow: ```python response = await client.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] assert len(projects) == 1 assert projects[0]["status"] == "ready" response = await client.get("/docs/test-repo/claude/opus/index.html") assert response.status_code == 200 ``` ## Runtime and Configuration Files ### Python packaging and app entrypoint ```toml [project] name = "docsfy" requires-python = ">=3.12" dependencies = [ "ai-cli-runner", "fastapi", "uvicorn", "pydantic-settings", "python-simple-logger", 
"aiosqlite", "jinja2", "markdown", "pygments", "python-multipart>=0.0.22", ] [project.scripts] docsfy = "docsfy.main:run" ``` ### Container/runtime config ```yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ### Environment template ```bash ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO # SECURE_COOKIES=false ``` ### Local quality/security tooling ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```yaml # .pre-commit-config.yaml (excerpt) - repo: https://github.com/astral-sh/ruff-pre-commit hooks: - id: ruff - id: ruff-format - repo: https://github.com/pre-commit/mirrors-mypy hooks: - id: mypy ``` ```gitignore # Data data/ .dev/data/ ``` > **Warning:** Runtime state (`/data`, SQLite DB, generated sites/cache) is intentionally untracked; do not commit generated project output. ## CI/CD and Contributor Workflow > **Note:** No hosted pipeline definitions (for example `.github/workflows/`) are currently checked into this repository. Quality gates are still defined and reproducible locally via `tox`, `pytest`, `pre-commit`, and secret scanning configs (`.gitleaks.toml`, `detect-secrets` hook). > **Tip:** Before opening a PR, run `uv run --extra dev pytest -n auto tests` and `pre-commit run --all-files`. 
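The local gate sequence from the tip above can be scripted. A minimal sketch — this helper is hypothetical, not part of the repo — runs the same two commands in order and stops at the first failure:

```python
import subprocess

# The same local quality gates named above.
CHECKS = [
    ["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"],
    ["pre-commit", "run", "--all-files"],
]


def run_gates(runner=subprocess.run) -> int:
    """Run each check in order; return the first non-zero exit code, else 0.

    `runner` is injectable so the helper can be exercised without actually
    invoking uv or pre-commit.
    """
    for cmd in CHECKS:
        result = runner(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0

# Calling run_gates() with the default runner executes the real commands.
```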
## Runtime Output Layout (Generated, Not Source-Controlled) Based on `storage.py` + `renderer.py`, generation outputs are stored under owner/provider/model-specific paths: ```text /data/ ├── docsfy.db └── projects/ └── {owner}/ └── {project}/ └── {ai_provider}/{ai_model}/ ├── plan.json ├── cache/pages/*.md └── site/ ├── .nojekyll ├── index.html ├── *.html ├── *.md ├── search-index.json ├── llms.txt ├── llms-full.txt └── assets/* ``` This separation is important for multi-user and multi-variant isolation (`name + provider + model + owner`). --- Source: extending-docsfy.md # Extending docsfy docsfy has four main extension surfaces: 1. Prompt construction (`src/docsfy/prompts.py`) 2. HTML rendering and template selection (`src/docsfy/renderer.py`, `src/docsfy/templates/`) 3. Frontend behavior and styling (`src/docsfy/static/` plus shared template partials) 4. Generation orchestration and caching (`src/docsfy/main.py`, `src/docsfy/generator.py`, `src/docsfy/repository.py`, `src/docsfy/storage.py`) > **Note:** docsfy uses two template contexts: > - **Generated docs pages**: `index.html`, `page.html`, and static assets copied to `assets/` > - **Web app UI** (dashboard/admin/status/login): Jinja templates rendered by FastAPI, many with inline CSS/JS --- ## 1) Customizing prompts All planner/page prompts are built in `src/docsfy/prompts.py`. ```python PLAN_SCHEMA = """{ "project_name": "string - project name", "tagline": "string - one-line project description", "navigation": [ { "group": "string - section group name", "pages": [ { "slug": "string - URL-friendly page identifier", "title": "string - human-readable page title", "description": "string - brief description of what this page covers" } ] } ] }""" ``` ```python def build_planner_prompt(project_name: str) -> str: return f"""You are a technical documentation planner. Explore this repository thoroughly. Explore the source code, configuration files, tests, CI/CD pipelines, and project structure. 
Do NOT rely on the README — understand the project from its code and configuration. ... Output format: {PLAN_SCHEMA}""" ``` ```python def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str: return f"""You are a technical documentation writer. Explore this repository to write the "{page_title}" page for the {project_name} documentation. ... Use these callout formats for special content: - Notes: > **Note:** text - Warnings: > **Warning:** text - Tips: > **Tip:** text ... Output ONLY the markdown content for this page. No wrapping, no explanation.""" ``` ### Prompt contract you must preserve The generation pipeline expects plan JSON with `navigation -> pages -> slug/title/description`: ```python for group in plan.get("navigation", []): for page in group.get("pages", []): slug = page.get("slug", "") title = page.get("title", slug) ``` > **Warning:** If you change prompt output shape, update all plan consumers (`generator.py`, `renderer.py`, and any tests expecting `navigation/pages`). --- ## 2) Customizing renderer templates Renderer wiring lives in `src/docsfy/renderer.py`: ```python TEMPLATES_DIR = Path(__file__).parent / "templates" STATIC_DIR = Path(__file__).parent / "static" _jinja_env = Environment( loader=FileSystemLoader(str(TEMPLATES_DIR)), autoescape=select_autoescape(["html"]), ) ``` Generated docs pages use `index.html` and `page.html`: ```python def render_page(...): env = _get_jinja_env() template = env.get_template("page.html") content_html, toc_html = _md_to_html(markdown_content) return template.render(...) def render_index(...): env = _get_jinja_env() template = env.get_template("index.html") return template.render(...) 
``` Site output assembly (`render_site`) includes static copy, HTML, markdown, search index, and LLM files: ```python if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) if STATIC_DIR.exists(): for static_file in STATIC_DIR.iterdir(): if static_file.is_file(): shutil.copy2(static_file, assets_dir / static_file.name) (output_dir / "index.html").write_text(index_html, encoding="utf-8") (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") (output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8") (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` > **Warning:** `render_site()` deletes `output_dir` before rendering. Do not place manual files there unless your extension re-creates them every run. ### Markdown-to-HTML behavior you can extend ```python md = markdown.Markdown( extensions=["fenced_code", "codehilite", "tables", "toc"], extension_configs={ "codehilite": {"css_class": "highlight", "guess_lang": False}, "toc": {"toc_depth": "2-3"}, }, ) content_html = _sanitize_html(md.convert(md_text)) ``` The sanitizer strips dangerous tags/attributes and allowlists URL schemes (`http://`, `https://`, `#`, `/`, `mailto:`). If you loosen this, update `tests/test_renderer.py`. 
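The sanitizer's allowlist approach can be sketched with the stdlib `html.parser`. This is an illustrative approximation of the behavior described above, not `_sanitize_html` itself; the allowed-tag set here is an assumption:

```python
from html.parser import HTMLParser

# Assumed allowlist for illustration; the real sanitizer defines its own.
ALLOWED_TAGS = {
    "p", "a", "code", "pre", "em", "strong", "ul", "ol", "li",
    "h1", "h2", "h3", "blockquote", "table", "thead", "tbody", "tr", "th", "td",
}
SAFE_HREF_PREFIXES = ("http://", "https://", "#", "/", "mailto:")


class Sanitizer(HTMLParser):
    """Keep allowlisted tags; drop event handlers and unsafe href schemes."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self._skip_depth = 0  # inside <script>/<style>: drop text too

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself, keep its inner text
        safe = []
        for name, value in attrs:
            if name.startswith("on"):  # onclick=, onerror=, ...
                continue
            if name == "href" and not (value or "").startswith(SAFE_HREF_PREFIXES):
                continue
            safe.append(f' {name}="{value or ""}"')
        self.out.append(f"<{tag}{''.join(safe)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.out.append(data)


def sanitize(html: str) -> str:
    parser = Sanitizer()
    parser.feed(html)
    return "".join(parser.out)
```

With this shape, `<script>` bodies disappear entirely, unknown tags are unwrapped rather than escaped, and a `javascript:` href is stripped while the link text survives.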
--- ## 3) Customizing frontend assets Generated docs pages load assets from `assets/` (copied from `src/docsfy/static/`): ```html {% include '_sidebar.html' %} ``` ### Callout behavior `src/docsfy/static/callouts.js` turns blockquotes into styled callouts based on first bold token: ```javascript if (text === 'note' || text === 'info') { type = 'note'; } else if (text === 'warning' || text === 'caution') { type = 'warning'; } else if (text === 'tip' || text === 'hint') { type = 'tip'; } else if (text === 'danger' || text === 'error') { type = 'danger'; } else if (text === 'important') { type = 'important'; } ``` This matches the prompt’s preferred syntax (`> **Note:**`, `> **Warning:**`, `> **Tip:**`) and additional aliases. ### Theme, search, and code-label hooks - `theme.js`: toggles `data-theme` and persists `localStorage["theme"]` - `search.js`: loads `search-index.json` and provides Cmd/Ctrl+K modal search - `codelabels.js`: maps `language-*` classes to human labels - `style.css`: centralized design tokens (`:root` and `[data-theme="dark"]`) > **Tip:** To add a new docs-page behavior, add a file under `src/docsfy/static/`, then include it in `src/docsfy/templates/page.html` and `src/docsfy/templates/index.html`. --- ## 4) Customizing generation logic High-level flow starts in `POST /api/generate` (`src/docsfy/main.py`) and runs `_run_generation()` / `_generate_from_path()`. Core orchestration: ```python plan = await run_planner( repo_path=repo_dir, project_name=project_name, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, ) plan["repo_url"] = source_url ... 
pages = await generate_all_pages( repo_path=repo_dir, plan=plan, cache_dir=cache_dir, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, use_cache=use_cache if use_cache else not force, project_name=project_name, owner=owner, ) site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) ``` ### Page generation parallelism `src/docsfy/generator.py` limits concurrent page jobs: ```python MAX_CONCURRENT_PAGES = 5 ... results = await run_parallel_with_limit( coroutines, max_concurrency=MAX_CONCURRENT_PAGES ) ``` ### Incremental regeneration path When commits differ, docsfy diffs changed files and asks the incremental planner which page slugs to invalidate: ```python changed_files = get_changed_files(repo_dir, old_sha, commit_sha) ... pages_to_regen = await run_incremental_planner( repo_dir, project_name, ai_provider, ai_model, changed_files, existing_plan, ai_cli_timeout, ) if pages_to_regen != ["all"]: for slug in pages_to_regen: cache_file = cache_dir / f"{slug}.md" if cache_file.exists(): cache_file.unlink() ``` ### Cache/output location model `src/docsfy/storage.py` defines project storage layout: ```python def get_project_dir(name: str, ai_provider: str = "", ai_model: str = "", owner: str = "") -> Path: ... safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model def get_project_site_dir(...): return get_project_dir(...) / "site" def get_project_cache_dir(...): return get_project_dir(...) / "cache" / "pages" ``` > **Warning:** Slug/path safety checks are enforced in both generation and rendering. If you change slug rules, update all validations (`main.py`, `generator.py`, `renderer.py`) consistently. 
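`run_parallel_with_limit` comes from `ai-cli-runner`; its bounded-concurrency behavior can be sketched with `asyncio.Semaphore`. This is an illustrative stand-in, not the library's code:

```python
import asyncio


async def run_parallel_with_limit_sketch(coroutines, max_concurrency: int):
    """Run awaitables concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(coro):
        async with semaphore:
            return await coro

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(c) for c in coroutines))


async def main():
    async def page(i: int) -> str:
        await asyncio.sleep(0.01)  # stands in for one AI page-generation call
        return f"page-{i}.md"

    return await run_parallel_with_limit_sketch([page(i) for i in range(8)], 5)


results = asyncio.run(main())
```

Raising `MAX_CONCURRENT_PAGES` trades memory and AI-provider rate-limit pressure for wall-clock time; the semaphore keeps the overflow pages queued rather than started.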
### Adding a new AI provider (beyond `claude/gemini/cursor`) Provider support is explicitly constrained in request validation and API checks: ```python ai_provider: Literal["claude", "gemini", "cursor"] | None = None ``` ```python if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) ``` Also update provider dropdowns in templates (`dashboard.html`, `status.html`) and relevant tests. --- ## 5) Configuration knobs for extension work Runtime settings come from `.env` (see `.env.example`) via `src/docsfy/config.py`. ```env ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO # SECURE_COOKIES=false ``` `config.py` defaults: ```python admin_key: str = "" ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True ``` App run-time host/port/debug toggles: ```python reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` Container/dev config (`docker-compose.yaml`): ```yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` > **Tip:** For local HTTP-only development, set `SECURE_COOKIES=false` so session cookies are accepted without TLS. --- ## 6) Tests and CI/CD status when extending The repo has strong unit/integration coverage for prompt building, generation, rendering, auth, and storage. Pytest is configured in `pyproject.toml`: ```toml [tool.pytest.ini_options] asyncio_mode = "auto" testpaths = ["tests"] pythonpath = ["src"] ``` > **Warning:** There are currently no CI/CD workflow files in this repository (`.github/workflows` and `.gitlab-ci*` are absent). 
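When wiring a new provider (a hypothetical `codex`, say), a focused unit test in the style of the existing suite keeps the validation surface honest. Sketch only: `validate_provider` here is a stand-in for the inline API check shown above, not a function in the repo:

```python
SUPPORTED_PROVIDERS = ("claude", "gemini", "cursor")  # add "codex" here when wiring it


def validate_provider(ai_provider: str) -> None:
    """Stand-in for the API-level provider check in main.py."""
    if ai_provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Invalid AI provider: '{ai_provider}'")


def test_known_providers_accepted():
    for provider in SUPPORTED_PROVIDERS:
        validate_provider(provider)  # must not raise


def test_unknown_provider_rejected():
    try:
        validate_provider("codex")  # hypothetical provider, not yet wired
    except ValueError as exc:
        assert "codex" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

Remember that the provider name also appears in the `Literal` on `GenerateRequest` and in the template dropdowns, so a test like this only covers one of the three places that must change together.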
Run tests locally after extension changes: - `pytest` - Focused suites like `pytest tests/test_generator.py tests/test_renderer.py tests/test_main.py` for generation/rendering changes ---