# docsfy

> Self-hosted AI documentation generator that turns Git repositories into searchable static docs through a FastAPI web service.

---

Source: project-overview.md

# Project Overview

`docsfy` is a self-hosted, AI-powered documentation generation service. It takes a Git repository, uses an AI provider to plan and write documentation pages, and publishes a fully static docs site that can be viewed in-browser or downloaded as an archive.

At runtime, it is a FastAPI web application with a built-in dashboard, status pages, authentication, role-based access, and per-project ownership/access control.

```toml
[project]
name = "docsfy"
description = "AI-powered documentation generator - generates polished static HTML docs from GitHub repos"

[project.scripts]
docsfy = "docsfy.main:run"
```

## What Problem It Solves

Keeping documentation current is expensive and usually manual. `docsfy` addresses that by:

- Generating docs from code, config, and tests (not just top-level project docs)
- Tracking generated variants by AI provider/model
- Supporting incremental regeneration when repositories change
- Rendering polished static output ready for hosting or download
- Adding team-grade controls (auth, roles, ownership, access grants)

The prompt layer explicitly enforces source-first documentation generation:

```python
def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str:
    return f"""You are a technical documentation writer.

Explore this repository to write the "{page_title}" page for the {project_name} documentation.

Page description: {page_description}

Explore the codebase as needed. Read source files, configs, tests, and CI/CD pipelines to write comprehensive, accurate documentation. Do NOT rely on the README.
...
"""
```

## Who It Is For

`docsfy` is best suited for:

- Platform/DevEx teams maintaining internal docs for many repositories
- Engineering teams that want docs regenerated as code changes
- Teams comparing documentation quality across AI providers/models
- Organizations needing controlled docs access (admin/user/viewer + grants)

## How docsfy Works (High-Level)

### 1) Intake and validation

A generation request accepts either a remote repo URL or a local repo path (admin-only), plus provider/model options:

```python
class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )

    @model_validator(mode="after")
    def validate_source(self) -> GenerateRequest:
        if not self.repo_url and not self.repo_path:
            msg = "Either 'repo_url' or 'repo_path' must be provided"
            raise ValueError(msg)
        if self.repo_url and self.repo_path:
            msg = "Provide either 'repo_url' or 'repo_path', not both"
            raise ValueError(msg)
        return self
```

```python
if gen_request.repo_path and not request.state.is_admin:
    raise HTTPException(
        status_code=403,
        detail="Local repo path access requires admin privileges",
    )

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
```

### 2) Planning, incremental updates, and page generation

The generation pipeline:

- checks AI CLI availability
- plans doc structure
- optionally computes changed files between commits
- regenerates pages (parallelized)
- renders the final static site

```python
plan = await run_planner(
    repo_path=repo_dir,
    project_name=project_name,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
)
plan["repo_url"] = source_url
```

```python
pages = await generate_all_pages(
    repo_path=repo_dir,
    plan=plan,
    cache_dir=cache_dir,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    use_cache=use_cache if use_cache else not force,
    project_name=project_name,
    owner=owner,
)

site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner)
render_site(plan=plan, pages=pages, output_dir=site_dir)
```

```python
result = subprocess.run(
    ["git", "diff", "--name-only", old_sha, new_sha],
    cwd=repo_path,
    capture_output=True,
    text=True,
    timeout=30,
)
```

> **Tip:** Keep `force` disabled for normal runs. `docsfy` can reuse cached pages and use Git diffs to regenerate only what changed.

### 3) Static docs output + AI-friendly artifacts

The renderer creates both human-facing and model-friendly assets:

```python
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")

search_index = _build_search_index(valid_pages, plan)
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)

llms_txt = _build_llms_txt(plan)
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")

llms_full_txt = _build_llms_full_txt(plan, valid_pages)
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```

The generated docs UI also includes search, theme switching, code copy buttons, callout styling, and sidebar navigation.
## Security and Access Model

`docsfy` is multi-user and role-aware, with both Bearer-token API auth and cookie-based browser sessions.

```python
# Paths that do not require authentication
_PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})
...
# 1. Check Authorization header (API clients)
...
# 2. Check session cookie (browser) -- opaque session token
...
if request.url.path.startswith("/api/"):
    return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
```

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

Project variants are scoped by name + provider + model + owner:

```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    ...
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

Access can be delegated by admins on a per-project-owner basis:

```python
@app.post("/api/admin/projects/{name}/access")
async def grant_access(request: Request, name: str) -> dict[str, str]:
    ...
    await grant_project_access(name, username, project_owner=project_owner)
```

> **Warning:** `ADMIN_KEY` is required at startup and must be at least 16 characters; otherwise the app exits.
```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)
if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

## Configuration and Deployment

Core environment configuration comes from `.env`:

```env
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# AI Configuration
AI_PROVIDER=claude
AI_MODEL=claude-opus-4-6[1m]
AI_CLI_TIMEOUT=60
```

Containerized local deployment uses `/data` for persistent state:

```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
```

Runtime entrypoint:

```dockerfile
ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

## Quality and CI/CD Posture

Quality checks are configured via `pre-commit` and `tox`:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
```

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Note:** No repository-hosted workflow files were found under `.github/workflows`; current automation is defined through local tooling and container health checks.

---

Source: architecture.md

# Architecture

`docsfy` is a single FastAPI service that combines four major subsystems:

- an authenticated web/API control plane,
- a SQLite-backed metadata layer,
- an asynchronous AI documentation generation pipeline,
- a static site renderer that emits HTML, Markdown, search data, and LLM index files.
## High-Level Component Model

- **Application layer**: `src/docsfy/main.py`
- **Storage layer**: `src/docsfy/storage.py`
- **Generation pipeline**: `src/docsfy/generator.py`, `src/docsfy/repository.py`, `src/docsfy/prompts.py`, `src/docsfy/ai_client.py`, `src/docsfy/json_parser.py`
- **Static renderer**: `src/docsfy/renderer.py`, `src/docsfy/templates/*`, `src/docsfy/static/*`

End-to-end flow:

1. `POST /api/generate` receives a `GenerateRequest`.
2. Request is authorized (Bearer token or session cookie).
3. Variant metadata is stored in SQLite (`status=generating`).
4. A background `asyncio` task runs cloning/planning/page generation/rendering.
5. Output site is written to filesystem under `/data/projects/.../site`.
6. Variant status flips to `ready`.
7. Docs are served from `/docs/{project}/{provider}/{model}/...`.

## FastAPI App Architecture

The application enforces startup requirements (`ADMIN_KEY`), initializes DB state, and adds auth middleware globally:

```python
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    settings = get_settings()
    if not settings.admin_key:
        logger.error("ADMIN_KEY environment variable is required")
        raise SystemExit(1)
    if len(settings.admin_key) < 16:
        logger.error("ADMIN_KEY must be at least 16 characters long")
        raise SystemExit(1)
    _generating.clear()
    await init_db(data_dir=settings.data_dir)
    await cleanup_expired_sessions()
    yield
```

Authentication is centralized in `AuthMiddleware`:

```python
class AuthMiddleware(BaseHTTPMiddleware):
    """Authenticate every request via Bearer token or session cookie."""

    # Paths that do not require authentication
    _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

    async def dispatch(
        self, request: Request, call_next: RequestResponseEndpoint
    ) -> Response:
        if request.url.path in self._PUBLIC_PATHS:
            return await call_next(request)

        settings = get_settings()
        user = None
        is_admin = False
        username = ""

        # 1. Check Authorization header (API clients)
        auth_header = request.headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            token = auth_header[7:]
            if token == settings.admin_key:
                is_admin = True
                username = "admin"
            else:
                user = await get_user_by_key(token)
```

The generation endpoint uses a lock + in-memory task registry to prevent duplicate variant runs:

```python
gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}"
async with _gen_lock:
    if gen_key in _generating:
        raise HTTPException(
            status_code=409,
            detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated",
        )
    await save_project(
        name=project_name,
        repo_url=gen_request.repo_url or gen_request.repo_path or "",
        status="generating",
        ai_provider=ai_provider,
        ai_model=ai_model,
        owner=owner,
    )
    try:
        task = asyncio.create_task(
            _run_generation(
                repo_url=gen_request.repo_url,
                repo_path=gen_request.repo_path,
                project_name=project_name,
                ai_provider=ai_provider,
                ai_model=ai_model,
                ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout,
                force=gen_request.force,
                owner=owner,
            )
        )
        _generating[gen_key] = task
```

> **Note:** Generated docs under `/docs/...` are still protected by middleware; only `/login` and `/health` are public.
Static file serving is path-safe (prevents traversal beyond the variant site directory):

```python
file_path = site_dir / path
try:
    file_path.resolve().relative_to(site_dir.resolve())
except ValueError as exc:
    raise HTTPException(status_code=403, detail="Access denied") from exc
if not file_path.exists() or not file_path.is_file():
    raise HTTPException(status_code=404, detail="File not found")
return FileResponse(file_path)
```

## SQLite Storage Layer

The `projects` table is variant-scoped by `(name, ai_provider, ai_model, owner)`:

```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    current_stage TEXT,
    last_commit_sha TEXT,
    last_generated TEXT,
    page_count INTEGER DEFAULT 0,
    error_message TEXT,
    plan_json TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

Additional tables:

- `users` (role-based accounts, hashed API keys),
- `project_access` (per-owner access grants),
- `sessions` (hashed session tokens + expiry).

User key hashing uses HMAC with `ADMIN_KEY` as secret:

```python
def hash_api_key(key: str, hmac_secret: str = "") -> str:
    """Hash an API key with HMAC-SHA256 for storage.

    Uses ADMIN_KEY as the HMAC secret so that even if the source is read,
    keys cannot be cracked without the environment secret.
    """
    # NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
    # invalidate all existing api_key_hash values, requiring all users to
    # regenerate their API keys.
    secret = hmac_secret or os.getenv("ADMIN_KEY", "")
    if not secret:
        msg = "ADMIN_KEY environment variable is required for key hashing"
        raise RuntimeError(msg)
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()
```

Project artifact paths are computed and sanitized:

```python
def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    if not ai_provider or not ai_model:
        msg = "ai_provider and ai_model are required for project directory paths"
        raise ValueError(msg)
    # Sanitize path segments to prevent traversal
    for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]:
        if (
            "/" in segment
            or "\\" in segment
            or ".." in segment
            or segment.startswith(".")
        ):
            msg = f"Invalid {segment_name}: '{segment}'"
            raise ValueError(msg)
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

> **Warning:** Rotating `ADMIN_KEY` invalidates existing `api_key_hash` records by design.
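The hashing scheme above implies how verification works: hash the presented key under the same secret and compare digests. A minimal standalone sketch, assuming that design (the `verify_api_key` helper and the example keys are illustrative, not the project's actual code):

```python
import hashlib
import hmac


def hash_api_key(key: str, secret: str) -> str:
    """Same scheme as the source snippet: HMAC-SHA256 of the key under ADMIN_KEY."""
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()


def verify_api_key(presented: str, stored_hash: str, secret: str) -> bool:
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(hash_api_key(presented, secret), stored_hash)


admin_key = "example-admin-key-16chars-long!!"  # placeholder secret
stored = hash_api_key("user-api-key-123", admin_key)

print(verify_api_key("user-api-key-123", stored, admin_key))  # → True
print(verify_api_key("wrong-key", stored, admin_key))         # → False
# Rotating the secret invalidates the stored hash, as the warning notes:
print(verify_api_key("user-api-key-123", stored, "a-different-admin-key-xx"))  # → False
```

The last line demonstrates the documented trade-off: because the stored digest is keyed by `ADMIN_KEY`, rotating that secret silently invalidates every stored key hash.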
## AI Generation Pipeline

Provider integration is intentionally delegated to `ai-cli-runner`:

```python
from ai_cli_runner import (
    PROVIDERS,
    VALID_AI_PROVIDERS,
    ProviderConfig,
    call_ai_cli,
    check_ai_cli_available,
    get_ai_cli_timeout,
    run_parallel_with_limit,
)
```

Main staged flow (`_generate_from_path`) updates `current_stage` in DB while progressing through planning, generation, and rendering:

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="generating",
    owner=owner,
    current_stage="planning",
)
plan = await run_planner(
    repo_path=repo_dir,
    project_name=project_name,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
)
plan["repo_url"] = source_url
```

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="generating",
    owner=owner,
    current_stage="generating_pages",
    plan_json=json.dumps(plan),
)
pages = await generate_all_pages(
    repo_path=repo_dir,
    plan=plan,
    cache_dir=cache_dir,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    use_cache=use_cache if use_cache else not force,
    project_name=project_name,
    owner=owner,
)
```

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="generating",
    owner=owner,
    current_stage="rendering",
    page_count=len(pages),
)
site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner)
render_site(plan=plan, pages=pages, output_dir=site_dir)
```

```python
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="ready",
    owner=owner,
    current_stage=None,
    last_commit_sha=commit_sha,
    page_count=page_count,
    plan_json=json.dumps(plan),
)
```

Parallel page generation is bounded (`MAX_CONCURRENT_PAGES = 5`):

```python
MAX_CONCURRENT_PAGES = 5
...
results = await run_parallel_with_limit(
    coroutines, max_concurrency=MAX_CONCURRENT_PAGES
)
```

Incremental regeneration uses git diff + AI page targeting:

```python
changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
...
pages_to_regen = await run_incremental_planner(
    repo_dir,
    project_name,
    ai_provider,
    ai_model,
    changed_files,
    existing_plan,
    ai_cli_timeout,
)
if pages_to_regen != ["all"]:
    # Delete only the cached pages that need regeneration
    for slug in pages_to_regen:
        ...
        cache_file = cache_dir / f"{slug}.md"
        ...
        if cache_file.exists():
            cache_file.unlink()
    use_cache = True
```

Prompt construction explicitly requires source/config/test exploration and README avoidance:

```python
def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str:
    return f"""You are a technical documentation writer.

Explore this repository to write the "{page_title}" page for the {project_name} documentation.

Page description: {page_description}

Explore the codebase as needed. Read source files, configs, tests, and CI/CD pipelines to write comprehensive, accurate documentation. Do NOT rely on the README.
...
"""
```

> **Tip:** Use `force=true` in `POST /api/generate` to clear cached pages and force a full rebuild.

## Static Site Renderer

Renderer converts Markdown to HTML with syntax highlighting and TOC, then sanitizes generated HTML:

```python
md = markdown.Markdown(
    extensions=["fenced_code", "codehilite", "tables", "toc"],
    extension_configs={
        "codehilite": {"css_class": "highlight", "guess_lang": False},
        "toc": {"toc_depth": "2-3"},
    },
)
content_html = _sanitize_html(md.convert(md_text))
toc_html = getattr(md, "toc", "")
```

URL attributes are allowlisted in sanitization (`http`, `https`, `#`, `/`, `mailto`):

```python
def _sanitize_url_attr(match: re.Match) -> str:  # type: ignore[type-arg]
    attr = match.group(1)  # href or src
    quote = match.group(2)  # " or '
    url = match.group(3)  # the URL value
    ...
    if clean_url.startswith(("http://", "https://", "#", "/", "mailto:")):
        return match.group(0)  # Keep as-is
    # Block everything else (javascript:, data:, vbscript:, etc.)
    return f"{attr}={quote}#{quote}"
```

Site output includes static pages and machine-readable indexes:

```python
# Prevent GitHub Pages from running Jekyll
(output_dir / ".nojekyll").touch()
...
(output_dir / "index.html").write_text(index_html, encoding="utf-8")
...
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")
...
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)
...
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```

The generated UI is enhanced client-side with static assets:

- `search.js` (Cmd/Ctrl+K modal search over `search-index.json`),
- `copy.js` (copy buttons on code blocks),
- `callouts.js` (blockquote callout classes),
- `theme.js`, `scrollspy.js`, `codelabels.js`, `github.js`.

## Configuration and Runtime

App settings (Pydantic settings model):

```python
class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )

    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev
```

Environment example:

```dotenv
ADMIN_KEY=your-secure-admin-key-here-min-16-chars
AI_PROVIDER=claude
AI_MODEL=claude-opus-4-6[1m]
AI_CLI_TIMEOUT=60
```

Container compose:

```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
```

Container entrypoint:

```dockerfile
ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

> **Note:** `ADMIN_KEY` must be set and at least 16 characters, or startup exits.

## Testing and CI/CD Posture

The repository has broad unit/integration coverage (`tests/test_main.py`, `tests/test_storage.py`, `tests/test_generator.py`, `tests/test_renderer.py`, `tests/test_auth.py`, `tests/test_integration.py`, etc.).

Local test pipeline (`tox.toml`):

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

Local quality/security checks (`.pre-commit-config.yaml`) include:

- `ruff` + `ruff-format`,
- `mypy`,
- `detect-secrets`,
- `gitleaks`,
- `flake8` (with project-specific plugin usage).

> **Warning:** No in-repo hosted workflow definitions were found (for example, no `.github/workflows`), so remote CI/CD orchestration is external to this repository.

---

Source: core-concepts.md

# Core Concepts

`docsfy` organizes generated documentation around six core entities:

- **Project**: a repository identity (derived name + metadata).
- **Variant**: one generated output for a specific AI provider/model.
- **Owner**: the authenticated user who owns that project/variant namespace.
- **Role**: authorization level (`admin`, `user`, `viewer`).
- **Session**: login state via secure cookie and DB-backed expiry.
- **Generated artifacts**: cached markdown and rendered static site files.

> **Note:** In `docsfy`, project names are repository-centric, but storage and access are owner-scoped to avoid cross-user collisions.

## 1) Projects

A generation request must include exactly one source (`repo_url` or `repo_path`), and `project_name` is derived from that source.
```10:30:src/docsfy/models.py
class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )

    @model_validator(mode="after")
    def validate_source(self) -> GenerateRequest:
        if not self.repo_url and not self.repo_path:
            msg = "Either 'repo_url' or 'repo_path' must be provided"
            raise ValueError(msg)
        if self.repo_url and self.repo_path:
            msg = "Provide either 'repo_url' or 'repo_path', not both"
            raise ValueError(msg)
        return self
```

```55:64:src/docsfy/models.py
@property
def project_name(self) -> str:
    if self.repo_url:
        name = self.repo_url.rstrip("/").split("/")[-1]
        if name.endswith(".git"):
            name = name[:-4]
        return name
    if self.repo_path:
        return Path(self.repo_path).resolve().name
    return "unknown"
```

Projects are tracked in SQLite with generation metadata (`status`, commit SHA, page count, plan JSON, timestamps).

```56:73:src/docsfy/storage.py
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    current_stage TEXT,
    last_commit_sha TEXT,
    last_generated TEXT,
    page_count INTEGER DEFAULT 0,
    error_message TEXT,
    plan_json TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

## 2) Variants

A **variant** is one `(project, provider, model, owner)` tuple. This is the real unit of generation, status, deletion, serving, and download.
```282:290:src/docsfy/storage.py
"""INSERT INTO projects (name, ai_provider, ai_model, owner, repo_url, status, updated_at)
VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
ON CONFLICT(name, ai_provider, ai_model, owner) DO UPDATE SET
    repo_url = excluded.repo_url,
    status = excluded.status,
    error_message = NULL,
    current_stage = NULL,
    updated_at = CURRENT_TIMESTAMP""",
(name, ai_provider, ai_model, owner, repo_url, status),
```

Variant-specific API/docs routes are explicit:

```1019:1041:src/docsfy/main.py
@app.get("/api/projects/{name}/{provider}/{model}")
async def get_variant_details(
    request: Request,
    name: str,
    provider: str,
    model: str,
) -> dict[str, str | int | None]:
    name = _validate_project_name(name)
    project = await _resolve_project(
        request, name, ai_provider=provider, ai_model=model
    )
    return project


@app.delete("/api/projects/{name}/{provider}/{model}")
async def delete_variant(
    request: Request,
    name: str,
    provider: str,
    model: str,
) -> dict[str, str]:
```

```1379:1386:src/docsfy/main.py
@app.get("/docs/{project}/{provider}/{model}/{path:path}")
async def serve_variant_docs(
    request: Request,
    project: str,
    provider: str,
    model: str,
    path: str = "index.html",
) -> FileResponse:
```

## 3) Owners

Owner is set from the authenticated username at generation time:

```457:484:src/docsfy/main.py
project_name = gen_request.project_name
owner = request.state.username

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
if not ai_model:
    raise HTTPException(status_code=400, detail="AI model must be specified.")

# Fix 6: Use lock to prevent race condition between check and add
gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}"
async with _gen_lock:
    if gen_key in _generating:
        raise HTTPException(
            status_code=409,
            detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated",
        )
    await save_project(
        name=project_name,
        repo_url=gen_request.repo_url or gen_request.repo_path or "",
        status="generating",
        ai_provider=ai_provider,
        ai_model=ai_model,
        owner=owner,
    )
```

Owner is also part of filesystem layout:

```501:519:src/docsfy/storage.py
def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    if not ai_provider or not ai_model:
        msg = "ai_provider and ai_model are required for project directory paths"
        raise ValueError(msg)
    # Sanitize path segments to prevent traversal
    for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]:
        if (
            "/" in segment
            or "\\" in segment
            or ".." in segment
            or segment.startswith(".")
        ):
            msg = f"Invalid {segment_name}: '{segment}'"
            raise ValueError(msg)
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

Cross-owner sharing is controlled through `project_access` and scoped by `(project_name, project_owner, username)`.

```237:243:src/docsfy/storage.py
CREATE TABLE IF NOT EXISTS project_access (
    project_name TEXT NOT NULL,
    project_owner TEXT NOT NULL DEFAULT '',
    username TEXT NOT NULL,
    PRIMARY KEY (project_name, project_owner, username)
)
```

> **Warning:** For admin users, if multiple owners have the same variant `(name/provider/model)`, owner is ambiguous and some variant routes return `409` until disambiguated.
```241:246:src/docsfy/main.py
if len(distinct_owners) > 1:
    raise HTTPException(
        status_code=409,
        detail="Multiple owners found for this variant, please specify owner",
    )
```

## 4) Roles

`docsfy` defines three roles:

- **admin**: full access, including user and access management endpoints.
- **user**: read/write project operations (generate, abort, delete) within accessible scope.
- **viewer**: read-only access (dashboard/docs/download/status), no write operations.

```609:623:src/docsfy/storage.py
VALID_ROLES = frozenset({"admin", "user", "viewer"})


async def create_user(username: str, role: str = "user") -> tuple[str, str]:
    """Create a user and return (username, raw_api_key)."""
    if username.lower() == "admin":
        msg = "Username 'admin' is reserved"
        raise ValueError(msg)
    if not re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$", username):
        msg = f"Invalid username: '{username}'. Must be 2-50 alphanumeric characters, dots, hyphens, underscores."
        raise ValueError(msg)
    if role not in VALID_ROLES:
        msg = f"Invalid role: '{role}'. Must be admin, user, or viewer."
        raise ValueError(msg)
```

```185:191:src/docsfy/main.py
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

## 5) Sessions

Authentication supports both:

- `Authorization: Bearer ...` (admin key or user API key)
- `docsfy_session` cookie (browser login flow)

```122:137:src/docsfy/main.py
# 1. Check Authorization header (API clients)
auth_header = request.headers.get("authorization", "")
if auth_header.startswith("Bearer "):
    token = auth_header[7:]
    if token == settings.admin_key:
        is_admin = True
        username = "admin"
    else:
        user = await get_user_by_key(token)

# 2. Check session cookie (browser) -- opaque session token
if not user and not is_admin:
    session_token = request.cookies.get("docsfy_session")
    if session_token:
        session = await get_session(session_token)
```

Sessions are opaque tokens, hashed at rest, and expire after 8 hours.

```21:23:src/docsfy/storage.py
SESSION_TTL_SECONDS = 28800  # 8 hours
SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600
```

```686:713:src/docsfy/storage.py
async def create_session(
    username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS
) -> str:
    """Create an opaque session token."""
    token = secrets.token_urlsafe(32)
    token_hash = _hash_session_token(token)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S")
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)",
            (token_hash, username, 1 if is_admin else 0, expires_str),
        )
        await db.commit()
    return token
```

```297:304:src/docsfy/main.py
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

> **Tip:** Keep `SECURE_COOKIES` enabled in production. Only set it to `false` for local HTTP development.
```27:28:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` ## 6) Generated Artifacts Each completed variant writes structured outputs under owner/project/provider/model: - `plan.json` (navigation plan used for rendering and status UI) - `cache/pages/*.md` (cached AI markdown for incremental regeneration) - `site/` (served static docs) Site generation includes HTML, markdown copies, search index, and LLM-friendly files: ```223:290:src/docsfy/renderer.py # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() project_name: str = plan.get("project_name", "Documentation") tagline: str = plan.get("tagline", "") navigation: list[dict[str, Any]] = plan.get("navigation", []) repo_url: str = plan.get("repo_url", "") # ... (output_dir / "index.html").write_text(index_html, encoding="utf-8") # ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") search_index = _build_search_index(valid_pages, plan) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` The orchestration layer persists the plan and final status: ```998:1015:src/docsfy/main.py site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) project_dir = get_project_dir(project_name, ai_provider, ai_model, owner) (project_dir / "plan.json").write_text(json.dumps(plan, indent=2), encoding="utf-8") page_count = len(pages) await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) 
``` Persistent storage is typically mounted to `/data`: ```1:10:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] ``` ## 7) CI/CD and Quality Gate Context This repository currently has no checked-in `.github` workflow directory, but quality checks are still codified via local/CI-capable tooling: ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```43:61:.pre-commit-config.yaml - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - id: ruff - id: ruff-format - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy ``` In practice, these concepts fit together as: 1. Authenticated user (owner + role) submits generation request. 2. Request creates/updates a project variant. 3. Background pipeline plans, generates, renders artifacts. 4. Session-scoped or bearer-scoped access controls who can view/manage each variant. 5. Static artifacts are served directly or downloaded as `.tar.gz`. --- Source: generation-lifecycle.md # Generation Lifecycle docsfy runs generation as a background task per **variant** (`owner/project/provider/model`). A variant starts in `generating`, moves through internal stages, and finishes as `ready`, `error`, or `aborted`. ## 1) Request Intake and Variant Locking Generation starts at `POST /api/generate`. The request model enforces source rules (`repo_url` XOR `repo_path`) and derives `project_name`. 
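For illustration, deriving `project_name` from a repository URL can be sketched as a standalone helper (hypothetical; the real extraction lives in `repository.py` and may behave differently):

```python
from pathlib import PurePosixPath


def derive_project_name(repo_url: str) -> str:
    """Illustrative sketch: take the last path component and strip ".git".
    Hypothetical helper -- the real extract_repo_name() may differ."""
    return PurePosixPath(repo_url.rstrip("/")).name.removesuffix(".git")
```

Under this sketch, `derive_project_name("https://github.com/org/test-repo.git")` yields `test-repo`, the form that appears in variant keys throughout this page.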
```10:64:src/docsfy/models.py class GenerateRequest(BaseModel): repo_url: str | None = Field( default=None, description="Git repository URL (HTTPS or SSH)" ) repo_path: str | None = Field(default=None, description="Local git repository path") ai_provider: Literal["claude", "gemini", "cursor"] | None = None ai_model: str | None = None ai_cli_timeout: int | None = Field(default=None, gt=0) force: bool = Field( default=False, description="Force full regeneration, ignoring cache" ) @model_validator(mode="after") def validate_source(self) -> GenerateRequest: if not self.repo_url and not self.repo_path: msg = "Either 'repo_url' or 'repo_path' must be provided" raise ValueError(msg) if self.repo_url and self.repo_path: msg = "Provide either 'repo_url' or 'repo_path', not both" raise ValueError(msg) return self ``` The API path enforces permissions, prevents duplicate in-flight generation for the same variant key, persists `status="generating"`, then starts `_run_generation()` as an async task. ```422:505:src/docsfy/main.py @app.post("/api/generate", status_code=202) async def generate(request: Request, gen_request: GenerateRequest) -> dict[str, str]: _require_write_access(request) # Fix 9: Local repo path access requires admin privileges if gen_request.repo_path and not request.state.is_admin: raise HTTPException( status_code=403, detail="Local repo path access requires admin privileges", ) # ... snip ... 
# Fix 6: Use lock to prevent race condition between check and add gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" async with _gen_lock: if gen_key in _generating: raise HTTPException( status_code=409, detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated", ) await save_project( name=project_name, repo_url=gen_request.repo_url or gen_request.repo_path or "", status="generating", ai_provider=ai_provider, ai_model=ai_model, owner=owner, ) try: task = asyncio.create_task( _run_generation( repo_url=gen_request.repo_url, repo_path=gen_request.repo_path, project_name=project_name, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout, force=gen_request.force, owner=owner, ) ) _generating[gen_key] = task except Exception: _generating.pop(gen_key, None) raise return {"project": project_name, "status": "generating"} ``` > **Note:** `repo_path` is admin-only and must point to an absolute path containing `.git`. ## 2) Clone (or Local SHA Resolution) `_run_generation()` always enters `current_stage="cloning"` first. For remote sources, docsfy performs a shallow clone (`--depth 1`) and resolves HEAD SHA. For local sources, it skips clone and reads local HEAD SHA directly. ```720:789:src/docsfy/main.py async def _run_generation( repo_url: str | None, repo_path: str | None, project_name: str, ai_provider: str, ai_model: str, ai_cli_timeout: int, force: bool = False, owner: str = "", ) -> None: gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" try: # ... snip ... 
await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="cloning", ) if repo_path: # Local repository - use directly, no cloning needed local_path, commit_sha = get_local_repo_info(Path(repo_path)) await _generate_from_path( local_path, commit_sha, repo_url or repo_path, project_name, ai_provider, ai_model, ai_cli_timeout, force, owner, ) else: # Remote repository - clone to temp dir if repo_url is None: msg = "repo_url must be provided for remote repositories" raise ValueError(msg) with tempfile.TemporaryDirectory() as tmp_dir: repo_dir, commit_sha = await asyncio.to_thread( clone_repo, repo_url, Path(tmp_dir) ) await _generate_from_path( repo_dir, commit_sha, repo_url or "", project_name, ai_provider, ai_model, ai_cli_timeout, force, owner, ) ``` ```21:45:src/docsfy/repository.py def clone_repo(repo_url: str, base_dir: Path) -> tuple[Path, str]: repo_name = extract_repo_name(repo_url) repo_path = base_dir / repo_name logger.info(f"Cloning {repo_name} to {repo_path}") result = subprocess.run( ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)], capture_output=True, text=True, timeout=300, ) if result.returncode != 0: msg = f"Clone failed: {result.stderr or result.stdout}" raise RuntimeError(msg) sha_result = subprocess.run( ["git", "rev-parse", "HEAD"], cwd=repo_path, capture_output=True, text=True, ) if sha_result.returncode != 0: msg = f"Failed to get commit SHA: {sha_result.stderr or sha_result.stdout}" raise RuntimeError(msg) commit_sha = sha_result.stdout.strip() logger.info(f"Cloned {repo_name} at commit {commit_sha[:8]}") return repo_path, commit_sha ``` ## 3) Planning After source resolution, docsfy sets `current_stage="planning"` and calls the planner prompt. The prompt explicitly tells the model to inspect source/config/tests/CI and output strict JSON. 
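Because models occasionally wrap output in markdown fences despite the instruction, a defensive parse step can be sketched like this (illustrative only; `parse_plan` is a hypothetical helper, not docsfy's actual parser):

```python
import json
import re


def parse_plan(raw: str) -> dict:
    """Parse strict-JSON planner output, tolerating a stray markdown fence.
    Hypothetical helper; docsfy's real parsing code may differ."""
    text = raw.strip()
    # Unwrap a fenced block like: backticks + optional "json" tag + body + backticks.
    fenced = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)
```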
```24:42:src/docsfy/prompts.py
def build_planner_prompt(project_name: str) -> str:
    return f"""You are a technical documentation planner. Explore this repository thoroughly.

Explore the source code, configuration files, tests, CI/CD pipelines, and project structure.
Do NOT rely on the README — understand the project from its code and configuration.

Then create a documentation plan as a JSON object. The plan should cover:
- Introduction and overview
- Installation / getting started
- Configuration (if applicable)
- Usage guides for key features
- API reference (if the project has an API)
- Any other sections that would help users understand and use this project

Project name: {project_name}

CRITICAL: Your response must be ONLY a valid JSON object. No text before or after. No markdown code blocks.

Output format: {PLAN_SCHEMA}"""
```

The parsed plan is stored in the DB (`plan_json`) before page generation so UI clients can show structure and progress.

## 4) Incremental Planning and Cache Decisions

When `force=true`, docsfy clears cached pages and resets `page_count` to `0`. Without force, it can short-circuit to `ready`/`up_to_date` if the commit SHA has not changed.
```832:867:src/docsfy/main.py if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") # Reset page count so API shows 0 during regeneration await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) else: existing = await get_project( project_name, ai_provider=ai_provider, ai_model=ai_model, owner=owner ) if existing and existing.get("last_generated"): old_sha = ( str(existing["last_commit_sha"]) if existing.get("last_commit_sha") else None ) if old_sha == commit_sha: logger.info( f"[{project_name}] Project is up to date at {commit_sha[:8]}" ) await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` If SHA changed and prior plan exists, docsfy runs incremental planning (`current_stage="incremental_planning"`) and removes only cached markdown files for affected slugs. ```913:955:src/docsfy/main.py await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="incremental_planning", ) pages_to_regen = await run_incremental_planner( repo_dir, project_name, ai_provider, ai_model, changed_files, existing_plan, ai_cli_timeout, ) if pages_to_regen != ["all"]: # Delete only the cached pages that need regeneration for slug in pages_to_regen: # Validate slug to prevent path traversal if ( "/" in slug or "\\" in slug or ".." 
in slug or slug.startswith(".") ): logger.warning( f"[{project_name}] Skipping invalid slug from incremental planner: {slug}" ) continue cache_file = cache_dir / f"{slug}.md" # Extra safety: ensure the resolved path is inside cache_dir try: cache_file.resolve().relative_to(cache_dir.resolve()) except ValueError: logger.warning( f"[{project_name}] Path traversal attempt in slug: {slug}" ) continue if cache_file.exists(): cache_file.unlink() use_cache = True ``` > **Tip:** Use `force: true` for a guaranteed clean rebuild when changing model/provider behavior. ## 5) Page Generation docsfy sets `current_stage="generating_pages"` and calls `generate_all_pages()` with concurrency cap `MAX_CONCURRENT_PAGES = 5`. Each page: - Validates slug safety - Uses cache if enabled - Calls AI for markdown - Writes cache file - Updates `page_count` during generation ```66:131:src/docsfy/generator.py async def generate_page( repo_path: Path, slug: str, title: str, description: str, cache_dir: Path, ai_provider: str, ai_model: str, ai_cli_timeout: int | None = None, use_cache: bool = False, project_name: str = "", owner: str = "", ) -> str: # Validate slug to prevent path traversal if "/" in slug or "\\" in slug or slug.startswith(".") or ".." in slug: msg = f"Invalid page slug: '{slug}'" raise ValueError(msg) cache_file = cache_dir / f"{slug}.md" if use_cache and cache_file.exists(): logger.debug(f"[{_label}] Using cached page: {slug}") return cache_file.read_text(encoding="utf-8") # ... AI call snip ... 
output = _strip_ai_preamble(output) cache_dir.mkdir(parents=True, exist_ok=True) cache_file.write_text(output, encoding="utf-8") # Update page count in DB if project_name provided if project_name: existing_pages = len(list(cache_dir.glob("*.md"))) await update_project_status( project_name, ai_provider, ai_model, owner=owner, status="generating", page_count=existing_pages, ) ``` ```168:201:src/docsfy/generator.py coroutines = [ generate_page( repo_path=repo_path, slug=p["slug"], title=p["title"], description=p["description"], cache_dir=cache_dir, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, use_cache=use_cache, project_name=project_name, owner=owner, ) for p in all_pages ] results = await run_parallel_with_limit( coroutines, max_concurrency=MAX_CONCURRENT_PAGES ) pages: dict[str, str] = {} for page_info, result in zip(all_pages, results): if isinstance(result, Exception): logger.warning( f"[{_label}] Page generation failed for '{page_info['slug']}': {result}" ) pages[page_info["slug"]] = ( f"# {page_info['title']}\n\n*Documentation generation failed.*" ) else: pages[page_info["slug"]] = result ``` ## 6) Rendering and Publish After markdown generation, docsfy sets `current_stage="rendering"` and renders final static output. `render_site()` recreates output, copies assets, writes both HTML and markdown pages, search index, and `llms` files. ```215:292:src/docsfy/renderer.py def render_site(plan: dict[str, Any], pages: dict[str, str], output_dir: Path) -> None: if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() # ... snip ... for idx, slug_info in enumerate(valid_slug_order): # ... snip ... 
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") search_index = _build_search_index(valid_pages, plan) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` Final publish state: ```988:1015:src/docsfy/main.py await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="rendering", page_count=len(pages), ) site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) # ... snip ... await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) ``` ## Statuses and Stages ### Statuses `storage.py` defines canonical lifecycle statuses: ```17:17:src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` | Status | Meaning | Terminal | |---|---|---| | `generating` | Task is active | No | | `ready` | Docs published (or no-op `up_to_date`) | Yes | | `error` | Generation failed | Yes | | `aborted` | Generation canceled by user/task | Yes | ### `current_stage` values used in lifecycle - `cloning` - `planning` - `incremental_planning` - `generating_pages` - `rendering` - `up_to_date` (ready/no-op) - `null` (done/aborted) > **Note:** The status page timeline UI is hardcoded to `cloning`, `planning`, `generating_pages`, and `rendering`; `incremental_planning` is a backend stage but not in the stage-order array. 
## 7) Monitoring in UI and API `/status/{name}/{provider}/{model}` computes total planned pages from `plan_json`, then the page JS polls variant details every 3 seconds. ```369:401:src/docsfy/main.py @app.get("/status/{name}/{provider}/{model}", response_class=HTMLResponse) async def project_status_page( request: Request, name: str, provider: str, model: str ) -> HTMLResponse: # ... snip ... if project.get("plan_json"): try: plan_json = json.loads(str(project["plan_json"])) for group in plan_json.get("navigation", []): total_pages += len(group.get("pages", [])) except (json.JSONDecodeError, TypeError): plan_json = None ``` ```948:1063:src/docsfy/templates/status.html var PROJECT_NAME = {{ project.name | tojson }}; var PROJECT_PROVIDER = {{ project.ai_provider | tojson }}; var PROJECT_MODEL = {{ project.ai_model | tojson }}; var POLL_INTERVAL_MS = 3000; var previousPageCount = {{ (project.page_count or 0) | tojson }}; var currentStatus = {{ project.status | tojson }}; var currentStage = {{ (project.current_stage or '') | tojson }} || null; var STAGES = ['cloning', 'planning', 'generating_pages', 'rendering']; ``` ## 8) Ready, Error, and Aborted End States ### Ready - Final state after successful render - Also used for no-op updates with `current_stage="up_to_date"` - Download endpoint requires `ready` ```1086:1091:src/docsfy/main.py if project["status"] != "ready": raise HTTPException(status_code=400, detail="Variant not ready") project_owner = str(project.get("owner", "")) site_dir = get_project_site_dir(name, provider, model, project_owner) if not site_dir.exists(): raise HTTPException(status_code=404, detail="Site not found") ``` ### Error - Set when CLI availability fails or any unhandled exception occurs - Carries `error_message` - UI shows retry controls for `error` and `aborted` ### Aborted - Variant abort endpoint cancels task, waits up to 5s, then marks `aborted` ```642:717:src/docsfy/main.py @app.post("/api/projects/{name}/{provider}/{model}/abort") async 
def abort_variant( request: Request, name: str, provider: str, model: str ) -> dict[str, str]: # ... snip ... task.cancel() try: await asyncio.wait_for(task, timeout=5.0) except asyncio.CancelledError: pass except asyncio.TimeoutError as exc: raise HTTPException( status_code=409, detail=f"Abort still in progress for '{gen_key}'. Please retry shortly.", ) from exc await update_project_status( name, provider, model, status="aborted", owner=key_owner, error_message="Generation aborted by user", current_stage=None, ) ``` > **Warning:** On server startup, any orphaned `generating` rows are automatically converted to `error` with `"Server restarted during generation"`. ```182:185:src/docsfy/storage.py # Reset orphaned "generating" projects from previous server run cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` ## 9) Storage Layout and Runtime Configuration Variant artifacts are stored under owner/project/provider/model paths: ```501:530:src/docsfy/storage.py def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: # ... snip ... 
safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model def get_project_site_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: return get_project_dir(name, ai_provider, ai_model, owner) / "site" def get_project_cache_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages" ``` Relevant runtime config: ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```1:13:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ## 10) How Lifecycle Behavior Is Validated Integration tests verify the full mocked flow (`generate -> ready -> serve -> download`), and storage tests verify restart recovery behavior. ```52:109:tests/test_integration.py async def test_full_flow_mock(client: AsyncClient, tmp_path: Path) -> None: """Test the full generate -> status -> download flow with mocked AI.""" # ... snip ... 
await _run_generation( repo_url="https://github.com/org/test-repo.git", repo_path=None, project_name="test-repo", ai_provider="claude", ai_model="opus", ai_cli_timeout=60, owner="admin", ) # Check status response = await client.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] assert len(projects) == 1 assert projects[0]["name"] == "test-repo" assert projects[0]["status"] == "ready" ``` ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** This repository does not include a checked-in `.github/workflows` directory; automation in-repo is defined via `tox` and `.pre-commit-config.yaml`. --- Source: prerequisites.md # Prerequisites Before running `docsfy`, make sure your environment has Python, `uv`, `git`, one supported AI CLI with credentials, and a valid `ADMIN_KEY`. ## Python and `uv` `docsfy` requires Python 3.12+. ```toml [project] name = "docsfy" version = "0.1.0" description = "AI-powered documentation generator - generates polished static HTML docs from GitHub repos" requires-python = ">=3.12" dependencies = [ "ai-cli-runner", "fastapi", "uvicorn", "pydantic-settings", "python-simple-logger", "aiosqlite", "jinja2", "markdown", "pygments", "python-multipart>=0.0.22", ] ``` The lock file enforces the same minimum Python version: ```toml version = 1 revision = 3 requires-python = ">=3.12" ``` The project workflow uses `uv` for install, running, and tests: ```dockerfile RUN uv sync --frozen --no-dev ``` ```dockerfile ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ```toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** `Settings` loads environment variables from `.env`, so your local config must be present there. 
```python model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) ``` ## `git` is required `docsfy` uses `git` to clone repositories and resolve commit SHAs: ```python result = subprocess.run( ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)], capture_output=True, text=True, timeout=300, ) if result.returncode != 0: msg = f"Clone failed: {result.stderr or result.stdout}" raise RuntimeError(msg) sha_result = subprocess.run( ["git", "rev-parse", "HEAD"], cwd=repo_path, capture_output=True, text=True, ) ``` Local-path generation also requires a real git repo (`.git` must exist): ```python if not (repo_p / ".git").exists(): raise HTTPException( status_code=400, detail=f"Not a git repository (no .git directory): '{gen_request.repo_path}'", ) ``` ## Supported AI providers, CLIs, and credentials Supported providers are fixed to `claude`, `gemini`, and `cursor`: ```python ai_provider: Literal["claude", "gemini", "cursor"] | None = None ``` ```python assert VALID_AI_PROVIDERS == frozenset({"claude", "gemini", "cursor"}) ``` `AI_CLI_TIMEOUT` must be greater than zero: ```python ai_cli_timeout: int = Field(default=60, gt=0) ``` The container image installs all three AI CLIs: ```dockerfile # Install Claude Code CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" # Install Cursor Agent CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" # Configure npm for non-root global installs and install Gemini CLI RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` Credential/config variables expected in `.env`: ```dotenv AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # 
ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= ``` The app checks provider CLI readiness before generation: ```python cli_flags = ["--trust"] if ai_provider == "cursor" else None available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) if not available: await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=msg, ) return ``` > **Tip:** You only need credentials for the provider selected in `AI_PROVIDER`, but that provider’s CLI must be installed and authenticated. ## Mandatory `ADMIN_KEY` setup `ADMIN_KEY` is required and must be at least 16 characters. ```dotenv # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars ``` Startup fails fast if `ADMIN_KEY` is missing or too short: ```python settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` `ADMIN_KEY` is also used as the admin login secret: ```python if username == "admin" and api_key == settings.admin_key: is_admin = True authenticated = True ``` And as the HMAC secret for API key hashing: ```python secret = hmac_secret or os.getenv("ADMIN_KEY", "") if not secret: msg = "ADMIN_KEY environment variable is required for key hashing" raise RuntimeError(msg) ``` > **Warning:** Rotating `ADMIN_KEY` invalidates existing API key hashes, and `ADMIN_KEY` users cannot rotate this through the API (`"ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead."`). 
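The keyed hashing step can be sketched as follows. Only the ADMIN_KEY-as-HMAC-secret behavior is confirmed by the snippet above; the digest algorithm and hex encoding here are assumptions:

```python
import hashlib
import hmac
import os


def hash_api_key(api_key: str, hmac_secret: str = "") -> str:
    """Sketch of keyed hashing with ADMIN_KEY as the HMAC secret.
    sha256/hex output is an assumption, not docsfy's confirmed format."""
    secret = hmac_secret or os.getenv("ADMIN_KEY", "")
    if not secret:
        raise RuntimeError("ADMIN_KEY environment variable is required for key hashing")
    return hmac.new(secret.encode(), api_key.encode(), hashlib.sha256).hexdigest()
```

Because the secret participates in every hash, rotating `ADMIN_KEY` changes all stored hashes at once, which is why rotation invalidates existing API keys.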
## Minimal `.env` baseline ```dotenv ADMIN_KEY= AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO ``` For local HTTP (non-HTTPS) development, this optional setting is available: ```dotenv # Set to false for local HTTP development # SECURE_COOKIES=false ``` `docker-compose` also expects `.env`: ```yaml services: docsfy: env_file: .env ``` --- Source: local-installation.md # Local Installation docsfy is a Python FastAPI service packaged with a `pyproject.toml` + `uv.lock` workflow. ```toml [project] name = "docsfy" version = "0.1.0" requires-python = ">=3.12" [project.scripts] docsfy = "docsfy.main:run" [project.optional-dependencies] dev = ["pytest", "pytest-asyncio", "pytest-xdist", "httpx"] ``` ## Prerequisites - Python `3.12+` - `uv` (used for dependency and runtime commands in this repo) - `git` (required for repository cloning and diffing during generation) ```python def clone_repo(repo_url: str, base_dir: Path) -> tuple[Path, str]: result = subprocess.run( ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)], capture_output=True, text=True, timeout=300, ) ``` > **Tip:** Generation supports `claude`, `gemini`, and `cursor` providers. 
```python ai_provider: Literal["claude", "gemini", "cursor"] | None = None ``` ## 1) Install dependencies From the repository root: ```bash uv sync --frozen --no-dev ``` This is the same locked install pattern used by the project container build: ```dockerfile RUN uv sync --frozen --no-dev ``` If you want to run tests later, the repo uses this dev command in `tox.toml`: ```toml commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ## 2) Configure local environment Copy the env template and create a local data directory: ```bash cp .env.example .env mkdir -p data ``` Base `.env` values come from `.env.example`: ```dotenv ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # ANTHROPIC_API_KEY= # GEMINI_API_KEY= # CURSOR_API_KEY= LOG_LEVEL=INFO # SECURE_COOKIES=false ``` Runtime defaults are defined in `src/docsfy/config.py`: ```python admin_key: str = "" ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True ``` Storage paths are derived from `DATA_DIR`: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` Recommended local overrides in `.env`: ```dotenv DATA_DIR=./data SECURE_COOKIES=false ``` > **Warning:** `ADMIN_KEY` is mandatory and must be at least 16 characters, or startup exits. ```python if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** For plain local HTTP (`http://127.0.0.1:8000`), keep `SECURE_COOKIES=false` so login sessions work in the browser. 
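With the recommended `DATA_DIR=./data` override, the storage-path derivation shown above resolves relative to the repository root (a quick sketch of the same expressions):

```python
from pathlib import Path

# Evaluating the storage-path derivation with the DATA_DIR=./data override.
data_dir = Path("./data")
db_path = data_dir / "docsfy.db"      # -> data/docsfy.db
projects_dir = data_dir / "projects"  # -> data/projects
```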
## 3) Run the service Start docsfy: ```bash uv run docsfy ``` The entrypoint behavior is: ```python reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` Common local override (bind all interfaces, custom port, reload on code changes): ```bash HOST=0.0.0.0 PORT=8800 DEBUG=true uv run docsfy ``` ## 4) Verify startup Health endpoint: ```bash curl http://127.0.0.1:8000/health ``` Expected response: ```json {"status":"ok"} ``` Open the login page: `http://127.0.0.1:8000/login` - Username: `admin` - Password: value of `ADMIN_KEY` ```python if username == "admin" and api_key == settings.admin_key: is_admin = True authenticated = True ``` API auth smoke test (Bearer token): ```bash export ADMIN_KEY="your-admin-key" curl -sS http://127.0.0.1:8000/api/status \ -H "Authorization: Bearer ${ADMIN_KEY}" ``` > **Note:** Only `/login` and `/health` are public routes by default. ```python _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ``` ## 5) Optional: generation smoke test ```bash curl -X POST http://127.0.0.1:8000/api/generate \ -H "Authorization: Bearer ${ADMIN_KEY}" \ -H "Content-Type: application/json" \ -d '{"repo_url":"https://github.com/org/repo.git"}' ``` Generation checks AI CLI availability at runtime: ```python available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) if not available: await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=msg, ) return ``` > **Note:** Install and authenticate the CLI for the provider you use (`claude`, `gemini`, or `cursor`) before running generation jobs. ## 6) Optional: run tests ```bash uv run --extra dev pytest -n auto tests ``` This matches the project’s `tox.toml` command exactly. 
--- Source: run-with-docker.md # Run with Docker This repository provides both a `Dockerfile` and a `docker-compose.yaml` to run `docsfy` as a containerized service on port `8000`. ## Prerequisites and Environment Create a local `.env` file from `.env.example`: ```bash cp .env.example .env ``` The shipped example includes required and optional runtime variables: ```env # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= # Logging LOG_LEVEL=INFO # Set to false for local HTTP development # SECURE_COOKIES=false ``` Startup enforces `ADMIN_KEY` presence and minimum length: ```python @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) _generating.clear() await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` > **Warning:** If `ADMIN_KEY` is missing or shorter than 16 characters, the container exits during startup. > **Warning:** `SECURE_COOKIES` defaults to `true`. For plain HTTP local development, set `SECURE_COOKIES=false` in `.env` or browser login cookies may not persist. --- ## Run with `docker compose` (recommended) Repository compose file: ```yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` Run it: ```bash mkdir -p data docker compose up --build ``` Detached mode: ```bash docker compose up -d --build ``` Stop and remove container/network: ```bash docker compose down ``` --- ## Run directly from `Dockerfile` The image is multi-stage (`builder` + runtime), installs dependencies with `uv`, and runs as non-root `appuser`: ```dockerfile FROM python:3.12-slim AS builder WORKDIR /app COPY --from=ghcr.io/astral-sh/uv:0.5.14 /uv /usr/local/bin/uv RUN apt-get update && apt-get install -y --no-install-recommends \ git \ && rm -rf /var/lib/apt/lists/* COPY pyproject.toml uv.lock ./ COPY src/ src/ RUN uv sync --frozen --no-dev FROM python:3.12-slim WORKDIR /app RUN apt-get update && apt-get install -y --no-install-recommends \ bash \ git \ curl \ nodejs \ npm \ && rm -rf /var/lib/apt/lists/* ``` Runtime data, health check, and entrypoint: ```dockerfile RUN useradd --create-home --shell /bin/bash -g 0 appuser \ && mkdir -p /data \ && chown appuser:0 /data \ && chmod -R g+w /data USER appuser ENV PATH="/home/appuser/.local/bin:/home/appuser/.npm-global/bin:${PATH}" ENV HOME="/home/appuser" EXPOSE 8000 HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` Build and run: ```bash docker build -t docsfy:local . mkdir -p data docker run --rm -p 8000:8000 --env-file .env -v "$(pwd)/data:/data" docsfy:local ``` > **Note:** The container listens on internal port `8000` (`ENTRYPOINT` is fixed to `--port 8000`). Change host-side port with mappings like `-p 8080:8000`. 
--- ## Mounted Data Volume (`/data`) Compose mounts host `./data` into container `/data`: ```yaml volumes: - ./data:/data ``` Application defaults also target `/data`: ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) data_dir: str = "/data" ``` Storage paths are derived from `DATA_DIR` and initialized on startup: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" async def init_db(data_dir: str = "") -> None: ... DB_PATH.parent.mkdir(parents=True, exist_ok=True) PROJECTS_DIR.mkdir(parents=True, exist_ok=True) ``` Project artifacts are organized under provider/model-specific subdirectories: ```python return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` The repository intentionally ignores local data folders: ```gitignore # Data data/ .dev/data/ ``` > **Tip:** Back up `./data` (especially `docsfy.db` and `projects/`) to preserve generated docs and metadata across container rebuilds. 
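The backup tip above can be scripted with the standard library; a sketch (the timestamped archive naming scheme is just an example, not something docsfy ships):

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir: Path, backup_dir: Path) -> Path:
    """Archive the whole data directory (docsfy.db + projects/) into a
    timestamped .tar.gz so generated docs and metadata survive rebuilds."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"docsfy-data-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname="data")
    return archive
```

Run it against the host-side `./data` directory while the container is stopped (or at least while no generation is in flight) to get a consistent SQLite snapshot.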
--- ## Health Checks Container-level health checks call the app endpoint: ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ``` Compose defines the same check: ```yaml healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` App endpoint implementation: ```python # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` Behavior is covered in tests: ```python async def test_health_is_public(unauthed_client: AsyncClient) -> None: """The /health endpoint should be accessible without authentication.""" response = await unauthed_client.get("/health") assert response.status_code == 200 assert response.json()["status"] == "ok" ``` Quick checks: ```bash curl -f http://localhost:8000/health docker compose ps ``` > **Warning:** `/health` currently reports only application liveness (`{"status":"ok"}`); it does not validate external AI credentials or downstream service readiness. --- ## CI/CD Status for Docker No CI/CD workflow files are present in this repository (no `.github/workflows`, GitLab, CircleCI, Jenkins, or Buildkite pipeline definitions), so Docker image build/run behavior documented here is currently local/manual. --- Source: first-docs-generation.md # First Documentation Run This guide walks you from first login to a generated, browsable docs site in `docsfy`. ## 1) Configure your environment `docsfy` reads settings from `.env` (`pydantic-settings` in `src/docsfy/config.py`) and requires `ADMIN_KEY` at startup. 
```bash # .env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Set to false for local HTTP development # SECURE_COOKIES=false ``` ```python # src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** If you run over plain HTTP (for example `http://localhost:8000`), set `SECURE_COOKIES=false` in `.env`. Cookies are `secure=True` by default, so login sessions will not stick on HTTP. ## 2) Start `docsfy` ### Recommended: Docker Compose ```yaml # docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` Run: ```bash docker compose up --build ``` The container image installs AI CLIs during build: ```dockerfile # Dockerfile RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` ### Local run (without Docker) `pyproject.toml` defines a CLI entry point: ```toml [project.scripts] docsfy = "docsfy.main:run" ``` So after dependency setup, you can run: ```bash uv run docsfy ``` `docsfy.main:run` defaults to `127.0.0.1:8000`. ## 3) Log in Open: `http://localhost:8000/login` The login form uses username + API key (labeled “Password” in the UI): ```html

<!-- Login form (src/docsfy/templates/login.html): admin login uses username "admin"
     with the ADMIN_KEY value in the "Password" field; regular users sign in with
     their username and API key. -->

```

Backend auth logic:

```python
# src/docsfy/main.py
if username == "admin" and api_key == settings.admin_key:
    is_admin = True
    authenticated = True
else:
    user = await get_user_by_key(api_key)
    if user and user["username"] == username:
        authenticated = True
        is_admin = user.get("role") == "admin"
```

Session cookies are set as HTTP-only, strict same-site, with an 8-hour TTL:

```python
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

> **Note:** `SESSION_TTL_SECONDS` is `28800` (8 hours) in `src/docsfy/storage.py`.

## 4) Generate your first docs site

After login, go to the dashboard (`/`) and use **Generate Documentation**. Frontend payload sent to the API:

```javascript
// src/docsfy/templates/dashboard.html
var body = { repo_url: repoUrl, ai_provider: provider, force: force };
if (model) body.ai_model = model;
fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'same-origin',
    body: JSON.stringify(body)
})
```

Server request model constraints:

```python
# src/docsfy/models.py
if not self.repo_url and not self.repo_path:
    raise ValueError("Either 'repo_url' or 'repo_path' must be provided")
if self.repo_url and self.repo_path:
    raise ValueError("Provide either 'repo_url' or 'repo_path', not both")

https_pattern = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$"
ssh_pattern = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$"
```

Generation returns `202` with the project name inferred from the repo URL:

```python
# src/docsfy/main.py
return {"project": project_name, "status": "generating"}
```

```python
# tests/test_main.py
response = await client.post("/api/generate", json={"repo_url": "https://github.com/org/repo.git"})
assert response.status_code == 202
assert response.json()["project"] == "repo"
assert response.json()["status"] == "generating"
```

> **Warning:** Admin-only restriction applies to `repo_path`.
Non-admin users get `403` for local path generation. > > **Warning:** Repo URLs resolving to localhost/private networks are rejected (`_reject_private_url` in `src/docsfy/main.py`). ## 5) Monitor generation ### From the dashboard A generating variant shows a progress bar and a status link: ```html Generating... View progress → ``` Dashboard polling behavior: ```javascript // src/docsfy/templates/dashboard.html var statusPollInterval = null; // Slow poll for status changes (10s) var progressPollInterval = null; // Fast poll for progress updates (5s) statusPollInterval = setInterval(pollStatusChanges, 10000); progressPollInterval = setInterval(pollProgressUpdates, 5000); ``` ### From the status page Status page polling behavior: ```javascript // src/docsfy/templates/status.html var POLL_INTERVAL_MS = 3000; pollTimer = setInterval(pollProject, POLL_INTERVAL_MS); ``` Generation stage updates are written by backend as: - `cloning` - `planning` - `incremental_planning` - `generating_pages` - `rendering` - `up_to_date` (when no changes) (from `_run_generation` and `_generate_from_path` in `src/docsfy/main.py`) Ready-state messaging: ```html {% if project.current_stage == 'up_to_date' %} Documentation is already up to date — no changes since last generation. {% else %} Documentation generated successfully! {% endif %} ``` > **Tip:** If you manually type URLs, always include provider/model segments from the current variant. The dashboard/status buttons build the correct URL for you. 
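The `https_pattern` / `ssh_pattern` constraints quoted in step 4 can be exercised directly; a quick sketch using the patterns as shown in `src/docsfy/models.py` (the helper name is illustrative):

```python
import re

# Patterns as quoted from src/docsfy/models.py
HTTPS_PATTERN = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$"
SSH_PATTERN = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$"


def is_valid_repo_url(url: str) -> bool:
    """True when the URL matches either the HTTPS or SSH repo shape."""
    return bool(re.match(HTTPS_PATTERN, url) or re.match(SSH_PATTERN, url))
```

Accepted: `https://github.com/org/repo.git`, `git@github.com:org/repo.git`. Rejected: URLs missing the repo segment (e.g. `https://github.com/org`) or non-HTTP(S)/SSH schemes.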
## 6) Open and download your generated docs

When status is `ready`, use **View Documentation** or **Download**:

```html
View Documentation
Download
```

Routes:

```python
# src/docsfy/main.py
@app.get("/docs/{project}/{provider}/{model}/{path:path}")  # variant-specific
@app.get("/docs/{project}/{path:path}")  # latest ready variant
```

Integration tests confirm both variant and latest routes:

```python
# tests/test_integration.py
response = await client.get("/docs/test-repo/claude/opus/index.html")
assert response.status_code == 200

response = await client.get("/docs/test-repo/index.html")
assert response.status_code == 200

response = await client.get("/api/projects/test-repo/claude/opus/download")
assert response.headers["content-type"] == "application/gzip"
```

## 7) Where generated files are stored

The storage path is owner/project/provider/model scoped:

```python
# src/docsfy/storage.py
return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

Site directory:

```python
# src/docsfy/storage.py
return get_project_dir(name, ai_provider, ai_model, owner) / "site"
```

Renderer output includes:

- `index.html`
- `<slug>.html`
- `<slug>.md`
- `search-index.json`
- `llms.txt`
- `llms-full.txt`
- `.nojekyll`
- `assets/*`

```python
# src/docsfy/renderer.py
(output_dir / "index.html").write_text(index_html, encoding="utf-8")
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")
(output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8")
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```

With Docker Compose, these are persisted under the local `./data` directory because of the `./data:/data` volume mapping.
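The owner/project/provider/model scoping above can be mirrored with `pathlib`; a sketch, not docsfy's actual `get_project_dir` (which also validates and sanitizes each segment):

```python
from pathlib import Path


def variant_site_dir(projects_dir: Path, owner: str, project: str,
                     provider: str, model: str) -> Path:
    """Compute where a variant's rendered site lives, mirroring
    PROJECTS_DIR / owner / name / provider / model / "site"."""
    return projects_dir / owner / project / provider / model / "site"
```

For example, `variant_site_dir(Path("/data/projects"), "alice", "my-repo", "claude", "opus")` resolves to `/data/projects/alice/my-repo/claude/opus/site`.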
## 8) Optional sanity check after first run Local test command defined in `tox.toml`: ```toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** This repository currently defines local quality gates (`tox`, `pre-commit`) but does not include a checked-in GitHub Actions workflow file. --- Source: environment-variables.md # Environment Variables docsfy runtime configuration is defined in code and loaded via `pydantic-settings` from `.env` (plus environment variables). ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python @lru_cache def get_settings() -> Settings: return Settings() ``` > **Tip:** `get_settings()` is cached. After changing environment variables, restart the process to apply them. ## Core Runtime Variables | Variable | Required | Default | Description | | --- | --- | --- | --- | | `ADMIN_KEY` | Yes | _(none)_ | Admin authentication secret. Required at startup, minimum length 16. Also used as HMAC secret for stored user API key hashes. | | `AI_PROVIDER` | No | `claude` | Default AI provider used by dashboard + `/api/generate` when request does not specify one. Allowed providers: `claude`, `gemini`, `cursor`. | | `AI_MODEL` | No | `claude-opus-4-6[1m]` | Default model name used when request omits `ai_model`. | | `AI_CLI_TIMEOUT` | No | `60` | Default timeout for AI CLI calls (seconds). Must be `> 0`. | | `LOG_LEVEL` | No | `INFO` | Logging level setting exposed in app config (`log_level`). | | `DATA_DIR` | No | `/data` | Base directory for SQLite DB and generated artifacts. 
| | `SECURE_COOKIES` | No | `true` | Controls `Secure` flag on session cookie. | > **Note:** `LOG_LEVEL` is present in settings and `.env.example`; repository code does not directly call `setLevel()`, so final filtering behavior depends on `python-simple-logger` configuration. ## Validation and Fallback Behavior `ADMIN_KEY` is enforced at app startup: ```python settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` Provider/model/timeout defaulting in `/api/generate`: ```python settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model ... ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout ``` Provider and model are validated before generation: ```python if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) if not ai_model: raise HTTPException(status_code=400, detail="AI model must be specified.") ``` Timeout validation is strict in both settings and request schema: ```python ai_cli_timeout: int = Field(default=60, gt=0) ``` ```python ai_cli_timeout: int | None = Field(default=None, gt=0) ``` ## AI Provider Credential Variables From the repository `.env.example`: ```env # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= ``` > **Note:** docsfy passes provider/model/timeout to `call_ai_cli(...)`; provider credential variables are expected to be present in the process environment for the installed CLIs. 
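The request-or-settings fallback shown above reduces to a simple `or` per field; sketched here with plain dicts rather than the actual Pydantic models:

```python
def resolve_generation_options(request: dict, defaults: dict) -> dict:
    """Apply the /api/generate fallback: any field omitted from the
    request falls back to the settings default."""
    return {
        "ai_provider": request.get("ai_provider") or defaults["ai_provider"],
        "ai_model": request.get("ai_model") or defaults["ai_model"],
        "ai_cli_timeout": request.get("ai_cli_timeout") or defaults["ai_cli_timeout"],
    }
```

Note the pitfall: a request that sets `ai_provider` but omits `ai_model` silently inherits the default `AI_MODEL`, which may belong to a different provider.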
## Storage and Security-Related Variables `DATA_DIR` controls persistent paths: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` `ADMIN_KEY` is also used for API key hashing: ```python secret = hmac_secret or os.getenv("ADMIN_KEY", "") if not secret: msg = "ADMIN_KEY environment variable is required for key hashing" raise RuntimeError(msg) return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest() ``` > **Warning:** Rotating `ADMIN_KEY` invalidates existing stored user API key hashes. Users must regenerate API keys after rotation. ## Cookie Security (`SECURE_COOKIES`) Session cookie flags are set from config: ```python response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` `.env.example` includes: ```env # Set to false for local HTTP development # SECURE_COOKIES=false ``` > **Tip:** For local non-HTTPS development, set `SECURE_COOKIES=false` so browsers send the session cookie over HTTP. ## Process Runtime Variables (`docsfy` CLI) When starting via the Python entrypoint (`docsfy` script), these are read: ```python reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` - `DEBUG`: enables `uvicorn` reload when set to `"true"`. - `HOST`: bind address (default `127.0.0.1`). - `PORT`: bind port (default `8000`). > **Note:** In Docker, `HOST`/`PORT`/`DEBUG` are bypassed because the container entrypoint launches `uvicorn` with fixed arguments. ```dockerfile ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ## Docker Compose Environment Snippet ```yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ## Repository `.env` Template ```env # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= # Logging LOG_LEVEL=INFO # Set to false for local HTTP development # SECURE_COOKIES=false ``` ## Runtime Constants (Not Environment-Configurable) These runtime settings exist in code but are not currently exposed as environment variables: ```python SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 ``` ```python MAX_CONCURRENT_PAGES = 5 ``` ```python await asyncio.wait_for(task, timeout=5.0) ``` --- Source: ai-provider-setup.md # AI Provider Setup `docsfy` supports three provider options: `claude`, `gemini`, and `cursor`. Provider/model are treated as a first-class variant key, so the same repo can have multiple generated doc variants side by side. 
```10:20:src/docsfy/models.py
class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )
```

```3:11:src/docsfy/ai_client.py
from ai_cli_runner import (
    PROVIDERS,
    VALID_AI_PROVIDERS,
    ProviderConfig,
    call_ai_cli,
    check_ai_cli_available,
    get_ai_cli_timeout,
    run_parallel_with_limit,
)
```

> **Note:** `docsfy` delegates provider execution to `ai_cli_runner`; credentials are expected via environment variables consumed by provider CLIs.

## Credentials and Environment Variables

Use `.env` (loaded automatically by settings) to configure both app-level defaults and provider credentials.
```10:23:.env.example # Claude - Option 1: API Key # ANTHROPIC_API_KEY= # Claude - Option 2: Vertex AI # CLAUDE_CODE_USE_VERTEX=1 # CLOUD_ML_REGION= # ANTHROPIC_VERTEX_PROJECT_ID= # Gemini # GEMINI_API_KEY= # Cursor # CURSOR_API_KEY= ``` ```10:13:src/docsfy/config.py model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) ``` Set app defaults in `.env`: ```4:8:.env.example # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` `ADMIN_KEY` is required at startup and must be at least 16 characters: ```82:89:src/docsfy/main.py settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` If you run with Docker Compose, `.env` is wired automatically: ```1:8:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ## Provider CLI Prerequisites The container image installs all three CLIs: ```26:57:Dockerfile # Install bash (needed for CLI install scripts), git (required at runtime for gitpython), curl (for Claude CLI), and nodejs/npm (for Gemini CLI) RUN apt-get update && apt-get install -y --no-install-recommends \ bash \ git \ curl \ nodejs \ npm \ && rm -rf /var/lib/apt/lists/* ... 
# Install Claude Code CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" # Install Cursor Agent CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" # Configure npm for non-root global installs and install Gemini CLI RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` ## Model Selection Behavior ### 1) Server-side fallback and validation If request values are omitted, `docsfy` falls back to settings defaults: ```454:466:src/docsfy/main.py settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model project_name = gen_request.project_name owner = request.state.username if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) if not ai_model: raise HTTPException(status_code=400, detail="AI model must be specified.") ``` Each `(project, provider, model)` is stored as a separate variant path: ```501:519:src/docsfy/storage.py def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: if not ai_provider or not ai_model: msg = "ai_provider and ai_model are required for project directory paths" raise ValueError(msg) ... 
return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` ### 2) UI suggestions and auto-fill behavior Model suggestions come from **ready** projects only: ```572:577:src/docsfy/storage.py async def get_known_models() -> dict[str, list[str]]: """Get distinct ai_model values per ai_provider from completed projects.""" async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute( "SELECT DISTINCT ai_provider, ai_model FROM projects WHERE ai_provider != '' AND ai_model != '' AND status = 'ready' ORDER BY ai_provider, ai_model" ) ``` When provider changes in the dashboard form: - if current model is invalid for that provider, UI auto-fills the first known model - if no known models exist for that provider, UI clears the model input ```1677:1697:src/docsfy/templates/dashboard.html if (providerSelect && modelDropdown) { providerSelect.addEventListener('change', function() { if (_restoring) return; var newProvider = this.value; var modelsForProvider = knownModels[newProvider] || []; // If current model is not valid for the new provider, auto-fill if (modelInput) { var currentModel = modelInput.value; if (modelsForProvider.length > 0 && modelsForProvider.indexOf(currentModel) === -1) { modelInput.value = modelsForProvider[0]; saveFormState(); } else if (modelsForProvider.length === 0) { modelInput.value = ''; modelInput.placeholder = 'Enter model name'; saveFormState(); } } filterModelOptions(modelDropdown, modelInput ? 
modelInput.value : '', newProvider); }); } ``` Generate request payload only includes `ai_model` when the input is non-empty: ```2043:2049:src/docsfy/templates/dashboard.html var body = { repo_url: repoUrl, ai_provider: provider, force: force }; if (model) body.ai_model = model; ``` Status page retry always sends the model input value: ```1367:1370:src/docsfy/templates/status.html var payload = { repo_url: repoUrl }; if (providerSelect) payload.ai_provider = providerSelect.value; if (modelInput) payload.ai_model = modelInput.value; if (forceCheckbox && forceCheckbox.checked) payload.force = true; ``` > **Warning:** If `ai_model` is blank, server fallback uses `AI_MODEL` from settings. If you switched provider and left model empty, the fallback model may not match that provider. > **Tip:** Keep `AI_PROVIDER` and `AI_MODEL` aligned in `.env`, and run one successful generation per provider/model pair to seed `known_models` suggestions. ### 3) Dynamic model list refresh `known_models` is returned by `/api/status` and refreshed in the dashboard without full reload: ```409:419:src/docsfy/main.py @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: ... known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` ```1886:1891:src/docsfy/templates/dashboard.html // Update known models from the API so new models // appear in dropdowns without a full page reload. if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` ## Cursor-Specific Behavior For `cursor`, `docsfy` always adds `--trust` when checking availability and running generation calls. 
```732:735:src/docsfy/main.py cli_flags = ["--trust"] if ai_provider == "cursor" else None available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) ``` ```41:49:src/docsfy/generator.py # Build CLI flags based on provider cli_flags = ["--trust"] if ai_provider == "cursor" else None success, output = await call_ai_cli( prompt=prompt, cwd=repo_path, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, cli_flags=cli_flags, ) ``` > **Warning:** `cursor` runs with trust mode enabled by default in this app flow; only generate docs for repositories you trust. ## Secrets Hygiene in Tooling `.env` is ignored by git, and pre-commit includes secret scanners: ```1:4:.gitignore # Environment files with secrets .env .dev/.env *.env.local ``` ```38:52:.pre-commit-config.yaml - repo: https://github.com/Yelp/detect-secrets rev: v1.5.0 hooks: - id: detect-secrets ... - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks ``` --- Source: storage-paths.md # Storage Paths and Data Layout docsfy keeps **persistent runtime state** under `DATA_DIR`, with a clear split between: - SQLite metadata (`docsfy.db`) - per-variant filesystem artifacts (`projects/...`) - generated static documentation site output (`site/...`) ## DATA_DIR Usage `DATA_DIR` is a first-class setting, defaulting to `/data`, and is wired into startup DB initialization. 
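The `DATA_DIR` wiring can be condensed into one helper; a sketch of the paths `init_db()` computes, not the actual module-level globals:

```python
from pathlib import Path


def storage_paths(data_dir: str = "/data") -> dict[str, Path]:
    """Derive the three persistent locations from DATA_DIR, as
    src/docsfy/storage.py does at init time."""
    base = Path(data_dir)
    return {
        "db_path": base / "docsfy.db",
        "data_dir": base,
        "projects_dir": base / "projects",
    }
```

Overriding `DATA_DIR` moves all three together, which is why a single volume mount at `/data` is sufficient in containers.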
```python
# src/docsfy/config.py
class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )

    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev
```

```python
# src/docsfy/main.py
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    settings = get_settings()
    if not settings.admin_key:
        logger.error("ADMIN_KEY environment variable is required")
        raise SystemExit(1)
    if len(settings.admin_key) < 16:
        logger.error("ADMIN_KEY must be at least 16 characters long")
        raise SystemExit(1)
    _generating.clear()
    await init_db(data_dir=settings.data_dir)
    await cleanup_expired_sessions()
    yield
```

```python
# src/docsfy/storage.py
DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db"
DATA_DIR = Path(os.getenv("DATA_DIR", "/data"))
PROJECTS_DIR = DATA_DIR / "projects"

async def init_db(data_dir: str = "") -> None:
    global DB_PATH, DATA_DIR, PROJECTS_DIR
    if data_dir:
        DB_PATH = Path(data_dir) / "docsfy.db"
        DATA_DIR = Path(data_dir)
        PROJECTS_DIR = DATA_DIR / "projects"
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    PROJECTS_DIR.mkdir(parents=True, exist_ok=True)
```

> **Note:** `.env.example` does not currently include `DATA_DIR`, but the app supports it via `Settings.data_dir` and `os.getenv("DATA_DIR", "/data")`.

## SQLite DB Location and Contents

SQLite DB path:

- `<DATA_DIR>/docsfy.db`

The DB is initialized in `init_db()` and includes project metadata plus auth/session data.
```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) """) ``` Additional tables created in the same function: - `users` - `project_access` - `sessions` `projects` uses a 4-part key `(name, ai_provider, ai_model, owner)`, which mirrors the on-disk variant path layout. ## Project Filesystem Layout Project artifacts are stored under `/projects/` and partitioned by owner, repo, provider, and model. ```python # src/docsfy/storage.py def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: if not ai_provider or not ai_model: msg = "ai_provider and ai_model are required for project directory paths" raise ValueError(msg) # Sanitize path segments to prevent traversal for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]: if ( "/" in segment or "\\" in segment or ".." 
in segment
        or segment.startswith(".")
    ):
        msg = f"Invalid {segment_name}: '{segment}'"
        raise ValueError(msg)

    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model


def get_project_site_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    return get_project_dir(name, ai_provider, ai_model, owner) / "site"


def get_project_cache_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages"
```

Expected tree for one variant:

```text
<DATA_DIR>/
  docsfy.db
  projects/
    <owner>/
      <project>/
        <provider>/
          <model>/
            plan.json
            cache/
              pages/
                <slug>.md
            site/
              .nojekyll
              index.html
              <slug>.html
              <slug>.md
              search-index.json
              llms.txt
              llms-full.txt
              assets/   (copied files from src/docsfy/static/)
```

Owner fallback behavior is tested:

```python
# tests/test_storage.py
path = get_project_dir("my-repo", "claude", "opus", "")
assert "_default" in str(path)
```

## Project Cache Paths

Cache files are markdown pages stored at:

- `<DATA_DIR>/projects/<owner>/<project>/<provider>/<model>/cache/pages/<slug>.md`

Write/read behavior:

```python
# src/docsfy/generator.py
cache_file = cache_dir / f"{slug}.md"
if use_cache and cache_file.exists():
    logger.debug(f"[{_label}] Using cached page: {slug}")
    return cache_file.read_text(encoding="utf-8")
...
cache_dir.mkdir(parents=True, exist_ok=True)
cache_file.write_text(output, encoding="utf-8")
```

Invalidation behavior in the generation flow:

```python
# src/docsfy/main.py
if force:
    cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
    logger.info(f"[{project_name}] Cleared cache (force=True)")
```

```python
# src/docsfy/main.py
cache_file = cache_dir / f"{slug}.md"
...
if cache_file.exists():
    cache_file.unlink()
```

- `force=true` removes the entire variant cache.
- Incremental regeneration removes only the selected cached pages.
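The two invalidation modes can be sketched side by side; a simplified helper under the cache layout described above (docsfy performs these steps inline in `src/docsfy/main.py` rather than via a named function):

```python
import shutil
from pathlib import Path


def invalidate_cache(cache_dir: Path, force: bool, stale_slugs: list[str]) -> None:
    """force=True drops the whole variant cache; otherwise only the
    cached pages for changed slugs are removed."""
    if force:
        if cache_dir.exists():
            shutil.rmtree(cache_dir)
        return
    for slug in stale_slugs:
        page = cache_dir / f"{slug}.md"
        if page.exists():
            page.unlink()
```

This mirrors the trade-off above: `force` regenerates everything from scratch, while incremental runs keep unchanged pages cached.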
## Generated Site Directories Final rendered docs are written to each variant’s `site/` directory, while `plan.json` is written in the variant root. ```python # src/docsfy/main.py site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) project_dir = get_project_dir(project_name, ai_provider, ai_model, owner) (project_dir / "plan.json").write_text(json.dumps(plan, indent=2), encoding="utf-8") ``` `render_site()` fully rebuilds the output directory and writes the final artifact set: ```python # src/docsfy/renderer.py def render_site(plan: dict[str, Any], pages: dict[str, str], output_dir: Path) -> None: if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() ... (output_dir / "index.html").write_text(index_html, encoding="utf-8") ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") ... (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) ... (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` > **Warning:** `render_site()` deletes the previous `site/` directory before writing new output. Treat `site/` as generated output only. ## Container and Runtime Path Mapping Containerized runs are explicitly wired to `/data` for persistence: ```yaml # docker-compose.yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ```dockerfile # Dockerfile RUN useradd --create-home --shell /bin/bash -g 0 appuser \ && mkdir -p /data \ && chown appuser:0 /data \ && chmod -R g+w /data ``` Generated data is intentionally not tracked in git: ```gitignore # .gitignore # Data data/ .dev/data/ ``` > **Tip:** In Docker deployments, back up the host-side `./data` directory to preserve both `docsfy.db` and generated docs artifacts. ## Ephemeral (Non-persistent) Paths Not all file activity is under `DATA_DIR`: - Remote repo cloning uses a temporary directory. - Download archives are created as temporary `.tar.gz` files and removed after streaming. ```python # src/docsfy/main.py with tempfile.TemporaryDirectory() as tmp_dir: repo_dir, commit_sha = await asyncio.to_thread( clone_repo, repo_url, Path(tmp_dir) ) ``` ```python # src/docsfy/main.py tmp = tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) tar_path = Path(tmp.name) tmp.close() ... finally: tar_path.unlink(missing_ok=True) ``` > **Note:** This repository currently has no `.github/workflows/` or `.gitlab-ci.yml`; storage behavior is defined by runtime code and container configuration. --- Source: session-cookie-settings.md # Session and Cookie Settings docsfy supports two authentication paths: Bearer tokens for API clients and cookies for browser sessions. `src/docsfy/main.py` ```python # 1. Check Authorization header (API clients) auth_header = request.headers.get("authorization", "") if auth_header.startswith("Bearer "): token = auth_header[7:] if token == settings.admin_key: is_admin = True username = "admin" else: user = await get_user_by_key(token) # 2. 
Check session cookie (browser) -- opaque session token if not user and not is_admin: session_token = request.cookies.get("docsfy_session") if session_token: session = await get_session(session_token) ``` ## Secure Cookie Defaults `SECURE_COOKIES` is enabled by default, and session cookies are set as `HttpOnly` with `SameSite=Strict`. `src/docsfy/config.py` ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` `src/docsfy/main.py` ```python response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` `src/docsfy/main.py` ```python response.delete_cookie( "docsfy_session", httponly=True, samesite="strict", secure=settings.secure_cookies, ) ``` > **Warning:** With default settings, browsers do not send `Secure` cookies over plain HTTP. If you run docsfy on `http://localhost` and keep `SECURE_COOKIES=true`, login may appear to work but follow-up requests can redirect back to `/login`. ## SameSite Behavior docsfy explicitly uses `SameSite=Strict` for session cookies, which blocks cookie sending in cross-site requests and helps reduce CSRF risk. 
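For illustration, the attribute set docsfy applies can be reproduced with the standard library's `http.cookies` (a sketch only; Starlette emits the real header, the token value here is a placeholder, and attribute ordering may differ):

```python
from http.cookies import SimpleCookie

# Build a Set-Cookie value with the same attributes docsfy uses.
# "opaque-token" is a placeholder; real tokens come from secrets.token_urlsafe(32).
cookie = SimpleCookie()
cookie["docsfy_session"] = "opaque-token"
cookie["docsfy_session"]["httponly"] = True
cookie["docsfy_session"]["samesite"] = "strict"
cookie["docsfy_session"]["secure"] = True
cookie["docsfy_session"]["max-age"] = 28800  # 8 hours, matching SESSION_TTL_SECONDS

header = cookie["docsfy_session"].OutputString()
# header contains: docsfy_session=opaque-token, Max-Age=28800,
# Secure, HttpOnly, and SameSite=strict
```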
`src/docsfy/main.py`

```python
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

`tests/test_auth.py`

```python
async def test_login_cookie_has_samesite_strict(
    unauthed_client: AsyncClient,
) -> None:
    """Login cookie should have SameSite=strict."""
    response = await unauthed_client.post(
        "/login",
        data={"username": "admin", "api_key": TEST_ADMIN_KEY},
        follow_redirects=False,
    )
    set_cookie = response.headers.get("set-cookie", "")
    assert "samesite=strict" in set_cookie.lower()
```

> **Tip:** For cross-origin integrations, use an `Authorization: Bearer <token>` header rather than relying on browser cookies.

## TTL and Session Expiration

Session lifetime is 8 hours, enforced both in the cookie (`max_age`) and in server-side session lookup (`expires_at > now`).

`src/docsfy/storage.py`

```python
SESSION_TTL_SECONDS = 28800  # 8 hours
SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600
```

`src/docsfy/storage.py`

```python
async def create_session(
    username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS
) -> str:
    """Create an opaque session token."""
    token = secrets.token_urlsafe(32)
    token_hash = _hash_session_token(token)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S")
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)",
            (token_hash, username, 1 if is_admin else 0, expires_str),
        )
```

`src/docsfy/storage.py`

```python
async def get_session(token: str) -> dict[str, str | int | None] | None:
    """Look up a session. Returns None if expired or not found."""
    token_hash = _hash_session_token(token)
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        cursor = await db.execute(
            "SELECT * FROM sessions WHERE token = ? AND expires_at > datetime('now')",
            (token_hash,),
        )
```

`src/docsfy/main.py`

```python
await cleanup_expired_sessions()
```

`src/docsfy/storage.py`

```python
async def cleanup_expired_sessions() -> None:
    """Remove expired sessions.

    NOTE: This is called during application startup (lifespan) only.
    """
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute("DELETE FROM sessions WHERE expires_at <= datetime('now')")
        await db.commit()
```

> **Note:** Expired sessions are rejected even before cleanup runs, because `get_session()` filters by `expires_at` on every lookup.

## Opaque Session Tokens (Not API Keys)

Browser cookies carry a random session token, not the raw user/admin API key.

`tests/test_auth.py`

```python
async def test_session_cookie_is_opaque_token(unauthed_client: AsyncClient) -> None:
    """The session cookie should NOT contain the raw API key."""
    response = await unauthed_client.post(
        "/login",
        data={"username": "admin", "api_key": TEST_ADMIN_KEY},
        follow_redirects=False,
    )
    assert "docsfy_session" in response.cookies
    cookie_value = response.cookies["docsfy_session"]
    assert cookie_value != TEST_ADMIN_KEY
    assert len(cookie_value) > 20
```

## Local HTTP Development Adjustments

For local non-TLS development, explicitly disable secure cookies in `.env`.

`.env.example`

```bash
# Set to false for local HTTP development
# SECURE_COOKIES=false
```

`docker-compose.yaml`

```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
```

Set this in your local `.env`:

```bash
SECURE_COOKIES=false
```

Then restart the app/container so `Settings` reloads the value.

> **Warning:** Do not use `SECURE_COOKIES=false` outside local HTTP development.

## Test/Automation Coverage for Cookie Rules

Cookie/session behavior is covered in unit tests and executed via `tox`.
`tox.toml` ```toml envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` This includes tests for: - `SameSite=strict` cookie headers - opaque session cookie values - session invalidation on logout - expired-session cleanup behavior --- Source: model-discovery-and-defaults.md # Model Discovery and Defaults docsfy builds its model picker suggestions from **real, successful generations** instead of a hardcoded model list. That keeps suggestions aligned with what has actually worked in your deployment. ## How a model becomes “known” A model is considered known only when a project variant is stored with: - non-empty `ai_provider` - non-empty `ai_model` - `status = 'ready'` ```python async def get_known_models() -> dict[str, list[str]]: """Get distinct ai_model values per ai_provider from completed projects.""" async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute( "SELECT DISTINCT ai_provider, ai_model FROM projects WHERE ai_provider != '' AND ai_model != '' AND status = 'ready' ORDER BY ai_provider, ai_model" ) rows = await cursor.fetchall() models: dict[str, list[str]] = {} for provider, model in rows: if provider not in models: models[provider] = [] if model not in models[provider]: models[provider].append(model) return models ``` > **Warning:** `get_known_models()` is instance-wide. It does not filter by owner, so the suggestion catalog is shared across users in the same docsfy instance. ## When discovery happens in the generation lifecycle Discovery is not a separate job. 
It happens naturally because variants are marked `ready`, then picked up by `get_known_models()`: ```python if old_sha == commit_sha: await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` ```python await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) ``` > **Tip:** If you want picker suggestions pre-populated for a provider/model pair, run one successful generation with that pair first. ## Default provider/model behavior Defaults come from settings (`.env` or environment), with built-in fallbacks: ```python class Settings(BaseSettings): ... ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) ``` ```bash # .env.example AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` If a generation request omits provider/model, API defaults are applied: ```python settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model ``` ## How dashboard pickers are populated The dashboard route injects both defaults and discovered models: ```python known_models = await get_known_models() ... html = template.render( grouped_projects=grouped, projects=projects, default_provider=settings.ai_provider, default_model=settings.ai_model, known_models=known_models, role=request.state.role, username=request.state.username, ) ``` The template uses those values for: - the top-level Generate form - each variant’s Regenerate controls ```html
{% for provider, models in known_models.items() %}
  {% for model in models %}
    <!-- suggestion option markup elided in this excerpt: shows model and provider -->
    {{ model }} {{ provider }}
  {% endfor %}
{% endfor %}
``` ## Picker UX rules in the browser The client receives `known_models` as JSON and enforces provider-aware filtering: ```javascript var knownModels = {{ known_models | tojson }}; providerSelect.addEventListener('change', function() { if (_restoring) return; var newProvider = this.value; var modelsForProvider = knownModels[newProvider] || []; // If current model is not valid for the new provider, auto-fill if (modelInput) { var currentModel = modelInput.value; if (modelsForProvider.length > 0 && modelsForProvider.indexOf(currentModel) === -1) { modelInput.value = modelsForProvider[0]; saveFormState(); } else if (modelsForProvider.length === 0) { modelInput.value = ''; modelInput.placeholder = 'Enter model name'; saveFormState(); } } filterModelOptions(modelDropdown, modelInput ? modelInput.value : '', newProvider); }); ``` The same provider-switch/autofill logic is also applied to per-variant regenerate controls. > **Note:** Picker suggestions are assistive, not a strict backend whitelist. Users can type a model manually; backend validation only requires a valid provider and non-empty model string. ## Live model discovery updates in running dashboards `/api/status` includes `known_models` on every poll response: ```python @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: ... known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` The dashboard polling loop updates model dropdowns without full refresh: ```javascript if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` This means newly successful variants can teach new models to active dashboard sessions. 
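For reference, the shape of the `known_models` payload carried by both the dashboard render and `/api/status` can be sketched in plain Python (the rows below are hypothetical examples of `(ai_provider, ai_model)` pairs):

```python
# Hypothetical rows, shaped like the DISTINCT (ai_provider, ai_model)
# query results that feed get_known_models()
rows = [
    ("claude", "opus-4-6"),
    ("claude", "sonnet-4-6"),
    ("gemini", "gemini-2.5-pro"),
]

known_models: dict[str, list[str]] = {}
for provider, model in rows:
    known_models.setdefault(provider, [])
    if model not in known_models[provider]:
        known_models[provider].append(model)

# known_models == {"claude": ["opus-4-6", "sonnet-4-6"],
#                  "gemini": ["gemini-2.5-pro"]}
```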
## Validation and quality signals (tests + CI entry points) Model discovery and defaults are covered by tests: ```python # tests/test_storage.py models = await get_known_models() assert "claude" in models assert "opus-4-6" in models["claude"] assert "sonnet-4-6" in models["claude"] assert "gemini" in models assert "gemini-2.5-pro" in models["gemini"] ``` ```python # tests/test_config.py assert settings.ai_provider == "claude" assert settings.ai_model == "claude-opus-4-6[1m]" assert settings.ai_cli_timeout == 60 ``` Pipeline entry points in this repo are defined via `tox` and pre-commit: ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```yaml # .pre-commit-config.yaml (excerpt) repos: - repo: https://github.com/astral-sh/ruff-pre-commit hooks: - id: ruff - id: ruff-format ``` > **Note:** No `.github/workflows` pipeline is committed in this repository; CI systems should invoke `tox` and pre-commit hooks directly. --- Source: dashboard-workflow.md # Dashboard Workflow The dashboard is a server-rendered page at `/` (`src/docsfy/templates/dashboard.html`) with live updates from `/api/status`. It is built around **project variants** (`name + ai_provider + ai_model + owner`) and presents them grouped by repository name. ## How project listing works On page load, the backend resolves visible projects based on the authenticated user role, then groups variants by repository name. 
From `src/docsfy/main.py`:

```python
@app.get("/", response_class=HTMLResponse)
async def dashboard(request: Request) -> HTMLResponse:
    settings = get_settings()
    username = request.state.username
    is_admin = request.state.is_admin
    if is_admin:
        projects = await list_projects()  # admin sees all
    else:
        accessible = await get_user_accessible_projects(username)
        projects = await list_projects(owner=username, accessible=accessible)
    known_models = await get_known_models()
    # Group by repo name
    grouped: dict[str, list[dict[str, Any]]] = {}
    for p in projects:
        name = str(p["name"])
        if name not in grouped:
            grouped[name] = []
        grouped[name].append(p)
    template = _jinja_env.get_template("dashboard.html")
    html = template.render(
        grouped_projects=grouped,
        projects=projects,  # keep for backward compat
        default_provider=settings.ai_provider,
        default_model=settings.ai_model,
        known_models=known_models,
        role=request.state.role,
        username=request.state.username,
    )
    return HTMLResponse(content=html)
```

From `src/docsfy/storage.py` (project visibility and ordering):

```python
async def list_projects(
    owner: str | None = None,
    accessible: list[tuple[str, str]] | None = None,
) -> list[dict[str, str | int | None]]:
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        if owner is not None and accessible and len(accessible) > 0:
            # Build OR conditions for each (name, owner) pair
            conditions = ["(owner = ?)"]
            params: list[str] = [owner]
            for proj_name, proj_owner in accessible:
                conditions.append("(name = ? AND owner = ?)")
                params.extend([proj_name, proj_owner])
            query = f"SELECT * FROM projects WHERE {' OR '.join(conditions)} ORDER BY updated_at DESC"
            cursor = await db.execute(query, params)
        elif owner is not None:
            cursor = await db.execute(
                "SELECT * FROM projects WHERE owner = ? ORDER BY updated_at DESC",
                (owner,),
            )
        else:
            cursor = await db.execute("SELECT * FROM projects ORDER BY updated_at DESC")
        rows = await cursor.fetchall()
        return [dict(row) for row in rows]
```

## Variant cards and status-driven actions

Each project group contains one or more variant cards. Actions change based on variant status and role.

From `src/docsfy/templates/dashboard.html`:

```html
{% for repo_name, variants in grouped_projects.items() %}
  <!-- group header markup elided: repo name and variant count -->
  {{ repo_name }}
  {{ variants|length }} variant{{ 's' if variants|length > 1 else '' }}
  {% for variant in variants %}
    {% if variant.status == 'ready' %}
      <!-- "View Docs" and "Download" links (markup elided) -->
      {% if role != 'viewer' %}<!-- write-control markup elided -->{% endif %}
      {% if role != 'viewer' %}
        {{ regen_controls(variant, repo_name, default_provider, default_model, known_models) }}
      {% endif %}
    {% elif variant.status == 'generating' %}
      <!-- "Generating..." indicator and "View progress →" link (markup elided) -->
      {% if role != 'viewer' %}<!-- write-control markup elided -->{% endif %}
    {% elif variant.status == 'error' or variant.status == 'aborted' %}
      {{ variant.error_message }}
      {% if role != 'viewer' %}
        {{ regen_controls(variant, repo_name, default_provider, default_model, known_models) }}
      {% endif %}
    {% endif %}
  {% endfor %}
{% endfor %} ``` ## Filtering and pagination Filtering and pagination are done in the browser over already-rendered project groups. From `src/docsfy/templates/dashboard.html`: ```javascript var currentPage = 1; var perPage = 10; function getVisibleGroups() { /* Get project groups that match the search filter (not hidden by search) */ return Array.from(document.querySelectorAll('.project-group')).filter(function(group) { return !group.classList.contains('search-hidden'); }); } function applyPagination() { var groups = getVisibleGroups(); var totalPages = Math.max(1, Math.ceil(groups.length / perPage)); if (currentPage > totalPages) currentPage = totalPages; var start = (currentPage - 1) * perPage; var end = start + perPage; groups.forEach(function(group, i) { group.style.display = (i >= start && i < end) ? '' : 'none'; }); var pageInfo = document.getElementById('page-info'); var prevBtn = document.getElementById('prev-page'); var nextBtn = document.getElementById('next-page'); if (pageInfo) pageInfo.textContent = 'Page ' + currentPage + ' of ' + totalPages; if (prevBtn) prevBtn.disabled = currentPage <= 1; if (nextBtn) nextBtn.disabled = currentPage >= totalPages; } ``` ```javascript var searchInput = document.getElementById('search-filter'); if (searchInput) { searchInput.addEventListener('input', function() { var query = this.value.toLowerCase().trim(); var groups = document.querySelectorAll('.project-group'); groups.forEach(function(group) { var name = group.getAttribute('data-repo').toLowerCase(); if (!query || name.indexOf(query) !== -1) { group.classList.remove('search-hidden'); } else { group.classList.add('search-hidden'); group.style.display = 'none'; } }); currentPage = 1; applyPagination(); }); } ``` > **Note:** Search matches only `data-repo` (repository name), not provider/model text, and pagination applies to visible project groups after filtering. ## Role-based UI and server enforcement The dashboard has three roles: `admin`, `user`, and `viewer`. 
- `admin`: sees all projects, admin link, owner badges, and write controls. - `user`: sees owned + granted projects and write controls (no admin panel link). - `viewer`: read-only dashboard (no generate/regenerate/delete/abort controls). From `src/docsfy/templates/dashboard.html`: ```html {% if role == 'admin' %} Admin {% endif %} {% if role != 'viewer' %}

<!-- generate form markup elided -->
Generate Documentation
...
{% endif %} {% if role == 'admin' and variant.owner %} {{ variant.owner }} {% endif %} ``` From `src/docsfy/main.py`: ```python def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ) ``` ```python @app.post("/api/generate", status_code=202) async def generate(request: Request, gen_request: GenerateRequest) -> dict[str, str]: _require_write_access(request) # Fix 9: Local repo path access requires admin privileges if gen_request.repo_path and not request.state.is_admin: raise HTTPException( status_code=403, detail="Local repo path access requires admin privileges", ) ``` From `tests/test_auth.py`: ```python async def test_viewer_can_view_dashboard(_init_db: None) -> None: ... response = await ac.get("/") assert response.status_code == 200 # Viewer should NOT see the generate form assert "Generate Documentation" not in response.text ``` ```python async def test_viewer_cannot_generate(_init_db: None) -> None: ... response = await ac.post( "/api/generate", json={ "repo_url": "https://github.com/org/repo", "project_name": "test-proj", }, ) assert response.status_code == 403 assert "Write access required" in response.json()["detail"] ``` > **Warning:** Write permissions are enforced server-side, not only hidden in the UI. Direct API calls from viewer accounts are rejected with `403`. ## Generation form behavior The generate form is shown to non-viewers and includes: - `Repository URL` (`required`, URL input) - `Provider` (`claude`, `gemini`, `cursor`) - `Model` (free text + provider-filtered combobox suggestions) - `Force` checkbox From `src/docsfy/templates/dashboard.html`: ```html
<!-- generate form markup elided: repository URL input, provider select, model combobox, Force checkbox -->
```

### Form state persistence

The form persists state in `sessionStorage` and restores it after reloads (useful because status changes may trigger auto-reloads).

```javascript
function saveFormState() {
  var repoInput = document.getElementById('gen-repo-url');
  var providerSelect = document.getElementById('gen-provider');
  var modelInput = document.getElementById('gen-model');
  var forceCheck = document.getElementById('gen-force');
  if (repoInput) sessionStorage.setItem('docsfy-repo', repoInput.value);
  if (providerSelect) sessionStorage.setItem('docsfy-provider', providerSelect.value);
  if (modelInput) sessionStorage.setItem('docsfy-model', modelInput.value);
  if (forceCheck) sessionStorage.setItem('docsfy-force', forceCheck.checked ? '1' : '0');
}
```

### Submit behavior

On submit, the UI disables the button, sends `POST /api/generate`, shows a toast with a status link, and reloads.

```javascript
form.addEventListener('submit', function(e) {
  e.preventDefault();
  var repoUrl = document.getElementById('gen-repo-url').value.trim();
  var provider = document.getElementById('gen-provider').value;
  var model = document.getElementById('gen-model').value.trim();
  var force = document.getElementById('gen-force').checked;
  var body = { repo_url: repoUrl, ai_provider: provider, force: force };
  if (model) body.ai_model = model;
  fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'same-origin',
    redirect: 'manual',
    body: JSON.stringify(body)
  })
```

### Backend request validation

From `src/docsfy/models.py`:

```python
@model_validator(mode="after")
def validate_source(self) -> GenerateRequest:
    if not self.repo_url and not self.repo_path:
        msg = "Either 'repo_url' or 'repo_path' must be provided"
        raise ValueError(msg)
    if self.repo_url and self.repo_path:
        msg = "Provide either 'repo_url' or 'repo_path', not both"
        raise ValueError(msg)
    return self
```

```python
@property
def project_name(self) -> str:
    if self.repo_url:
        name = self.repo_url.rstrip("/").split("/")[-1]
        if name.endswith(".git"):
            name = name[:-4]
        return name
    if self.repo_path:
        return Path(self.repo_path).resolve().name
    return "unknown"
```

## Generation lifecycle, duplicate protection, and `force`

### API-side orchestration

From `src/docsfy/main.py`:

```python
settings = get_settings()
ai_provider = gen_request.ai_provider or settings.ai_provider
ai_model = gen_request.ai_model or settings.ai_model
project_name = gen_request.project_name
owner = request.state.username

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
if not ai_model:
    raise HTTPException(status_code=400, detail="AI model must be specified.")

# Fix 6: Use lock to prevent race condition between check and add
gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}"
async with _gen_lock:
    if gen_key in _generating:
        raise HTTPException(
            status_code=409,
            detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated",
        )
    await save_project(
        name=project_name,
        repo_url=gen_request.repo_url or gen_request.repo_path or "",
        status="generating",
        ai_provider=ai_provider,
        ai_model=ai_model,
        owner=owner,
    )
    task = asyncio.create_task(
        _run_generation(
            repo_url=gen_request.repo_url,
            repo_path=gen_request.repo_path,
            project_name=project_name,
            ai_provider=ai_provider,
            ai_model=ai_model,
            ai_cli_timeout=gen_request.ai_cli_timeout or settings.ai_cli_timeout,
            force=gen_request.force,
            owner=owner,
        )
    )
    _generating[gen_key] = task
return {"project": project_name, "status": "generating"}
```

### `force` and incremental behavior

From `src/docsfy/main.py` (`_generate_from_path`):

```python
if force:
    cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
        logger.info(f"[{project_name}] Cleared cache (force=True)")
    # Reset page count so API shows 0 during regeneration
    await update_project_status(
        project_name,
        ai_provider,
        ai_model,
        status="generating",
        owner=owner,
        page_count=0,
    )
else:
    existing = await get_project(
        project_name, ai_provider=ai_provider, ai_model=ai_model, owner=owner
    )
    if existing and existing.get("last_generated"):
        old_sha = (
            str(existing["last_commit_sha"])
            if existing.get("last_commit_sha")
            else None
        )
        if old_sha == commit_sha:
            logger.info(
                f"[{project_name}] Project is up to date at {commit_sha[:8]}"
            )
            await update_project_status(
                project_name,
                ai_provider,
                ai_model,
                status="ready",
                owner=owner,
                current_stage="up_to_date",
            )
            return
```

```python
if old_sha and old_sha != commit_sha and not force and existing:
    changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
    ...
    pages_to_regen = await run_incremental_planner(
        repo_dir,
        project_name,
        ai_provider,
        ai_model,
        changed_files,
        existing_plan,
        ai_cli_timeout,
    )
    if pages_to_regen != ["all"]:
        # Delete only the cached pages that need regeneration
        for slug in pages_to_regen:
            ...
            cache_file = cache_dir / f"{slug}.md"
            ...
            if cache_file.exists():
                cache_file.unlink()
        use_cache = True
```

> **Tip:** Keep `Force` unchecked for normal runs to allow up-to-date short-circuiting and incremental regeneration from cache; use `Force` when you need a full clean rebuild.

## Polling behavior and live refresh

The dashboard uses two polling loops:

- `10s` status polling for variant state changes/new cards.
- `5s` progress polling while any variant is generating.
From `src/docsfy/templates/dashboard.html`: ```javascript var statusPollInterval = null; // Slow poll for status changes (10s) var progressPollInterval = null; // Fast poll for progress updates (5s) function startStatusPolling() { if (isStatusPolling) return; isStatusPolling = true; statusPollInterval = setInterval(pollStatusChanges, 10000); } function startProgressPolling() { if (isProgressPolling) return; isProgressPolling = true; progressPollInterval = setInterval(pollProgressUpdates, 5000); } ``` The same `/api/status` response also refreshes known model suggestions dynamically: ```javascript if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` ## Configuration relevant to dashboard workflow ### Default generation settings From `.env.example`: ```env ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 # SECURE_COOKIES=false ``` From `src/docsfy/config.py`: ```python class Settings(BaseSettings): ... admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) ... secure_cookies: bool = True # Set to False for local HTTP dev ``` ### Persistence/deployment From `docker-compose.yaml`: ```yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` `./data` persists database state and generated project artifacts that drive dashboard listings/status across restarts. ## Verification references - `tests/test_dashboard.py`: dashboard rendering, empty state, and project visibility. - `tests/test_auth.py`: role behavior (admin/user/viewer), ownership scoping, access grants, and server-side permission checks. - `tests/test_main.py`: `/api/generate`, duplicate generation conflicts (`409`), and endpoint behavior. 
- `test-plans/e2e-ui-test-plan.md`: manual/E2E scenarios for search, pagination, regenerate, abort, and role-specific UI. > **Note:** No `.github/workflows` pipeline files are present in this repository; dashboard workflow correctness is primarily represented by the test suite and E2E plan. --- Source: managing-variants.md # Managing Variants A **variant** in docsfy is a generated documentation build for a specific combination of: - project name - AI provider - AI model - owner (user scope) Variants are first-class objects across API, storage, UI, and docs serving routes. ## Variant identity and storage model The `projects` table keys variants by `(name, ai_provider, ai_model, owner)`: ```python await db.execute(""" CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) """) ``` Variant artifacts are also stored in owner-scoped filesystem paths: ```python def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: if not ai_provider or not ai_model: msg = "ai_provider and ai_model are required for project directory paths" raise ValueError(msg) # Sanitize path segments to prevent traversal for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]: if ( "/" in segment or "\\" in segment or ".." 
in segment or segment.startswith(".") ): msg = f"Invalid {segment_name}: '{segment}'" raise ValueError(msg) safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` > **Note:** Owner scoping means two users can have the same `name/provider/model` variant without clobbering each other. ## Configure default provider/model docsfy defaults come from environment-backed settings: ```yaml # .env.example AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```python class Settings(BaseSettings): ... ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) data_dir: str = "/data" ``` At runtime, request values override defaults: ```python settings = get_settings() ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model project_name = gen_request.project_name owner = request.state.username ``` If you run with Docker Compose, generated variants persist under `./data`: ```yaml services: docsfy: ... env_file: .env volumes: - ./data:/data ``` ## Create a variant Creation and regeneration both use `POST /api/generate`. 
Request schema: ```python class GenerateRequest(BaseModel): repo_url: str | None = Field( default=None, description="Git repository URL (HTTPS or SSH)" ) repo_path: str | None = Field(default=None, description="Local git repository path") ai_provider: Literal["claude", "gemini", "cursor"] | None = None ai_model: str | None = None ai_cli_timeout: int | None = Field(default=None, gt=0) force: bool = Field( default=False, description="Force full regeneration, ignoring cache" ) @model_validator(mode="after") def validate_source(self) -> GenerateRequest: if not self.repo_url and not self.repo_path: msg = "Either 'repo_url' or 'repo_path' must be provided" raise ValueError(msg) if self.repo_url and self.repo_path: msg = "Provide either 'repo_url' or 'repo_path', not both" raise ValueError(msg) return self ``` Example from tests: ```python response = await client.post( "/api/generate", json={"repo_url": "https://github.com/org/repo.git", "force": True}, ) assert response.status_code == 202 ``` When generation starts, docsfy stores the variant row and starts a background task: ```python gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" async with _gen_lock: if gen_key in _generating: raise HTTPException( status_code=409, detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated", ) await save_project( name=project_name, repo_url=gen_request.repo_url or gen_request.repo_path or "", status="generating", ai_provider=ai_provider, ai_model=ai_model, owner=owner, ) ... ``` > **Warning:** `repo_path` generation is admin-only, and viewers cannot create variants. 
>
> - `Local repo path access requires admin privileges` (403)
> - `Write access required.` for viewer role (403)

## Regenerate a variant

### UI flow (dashboard + status page)

The dashboard renders per-variant controls with a Force checkbox. The Regenerate action sends a new `POST /api/generate` request:

```javascript
var body = { repo_url: repoUrl, ai_provider: provider, force: force };
if (model) body.ai_model = model;
fetch('/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  credentials: 'same-origin',
  redirect: 'manual',
  body: JSON.stringify(body)
})
```

### Non-force regeneration (`force=false`)

docsfy tries to avoid unnecessary full rebuilds:

- if the commit SHA is unchanged, it marks the variant `ready` with stage `up_to_date`
- if commits differ, it can run incremental planning and selectively invalidate cached pages
- page generation uses the cache when appropriate

```python
if existing and existing.get("last_generated"):
    old_sha = (
        str(existing["last_commit_sha"])
        if existing.get("last_commit_sha")
        else None
    )
    if old_sha == commit_sha:
        ...
        await update_project_status(
            project_name,
            ai_provider,
            ai_model,
            status="ready",
            owner=owner,
            current_stage="up_to_date",
        )
        return
...
if old_sha and old_sha != commit_sha and not force and existing:
    changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
    ...
``` ```python pages = await generate_all_pages( repo_path=repo_dir, plan=plan, cache_dir=cache_dir, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, use_cache=use_cache if use_cache else not force, project_name=project_name, owner=owner, ) ``` ### Force regeneration (`force=true`) Force mode clears the variant page cache and resets page count during regeneration: ```python if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") # Reset page count so API shows 0 during regeneration await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) ``` > **Tip:** Use Force when you want a guaranteed clean rebuild (for example after major doc structure/model changes), not just incremental page updates. ## Delete variants safely Variant deletion endpoint: - `DELETE /api/projects/{name}/{provider}/{model}` Safety behavior in backend: 1. Requires write access 2. Blocks deletion if the variant is currently generating (`409`) 3. Resolves the target variant with ownership/access rules 4. Deletes DB record 5. Deletes variant directory from disk ```python for key in _generating: parts = key.split("/", 3) if ( len(parts) == 4 and parts[1] == name and parts[2] == provider and parts[3] == model ): raise HTTPException( status_code=409, detail=f"Cannot delete '{name}/{provider}/{model}' while generation is in progress. Abort first.", ) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) project_owner = str(project.get("owner", "")) deleted = await delete_project( name, ai_provider=provider, ai_model=model, owner=project_owner ) ... 
project_dir = get_project_dir(name, provider, model, project_owner) if project_dir.exists(): shutil.rmtree(project_dir) ``` The dashboard also forces an explicit confirmation: ```javascript var confirmed = await modalConfirm('Delete Variant', 'Are you sure you want to delete "' + variantPath + '"? This will remove the generated documentation for this variant and cannot be undone.', true); if (!confirmed) return; ... fetch('/api/projects/' + encodeURIComponent(name) + '/' + encodeURIComponent(provider) + '/' + encodeURIComponent(model), { method: 'DELETE', credentials: 'same-origin', redirect: 'manual' }) ``` If the deleted variant was the last one for that project/owner pair, access grants are cleaned up: ```python # Clean up project_access if no more variants remain for this name+owner if cursor.rowcount > 0 and owner is not None: remaining = await db.execute( "SELECT COUNT(*) FROM projects WHERE name = ? AND owner = ?", (name, owner), ) row = await remaining.fetchone() if row and row[0] == 0: await db.execute( "DELETE FROM project_access WHERE project_name = ? AND project_owner = ?", (name, owner), ) ``` > **Warning:** You cannot delete an actively generating variant. Abort it first via `POST /api/projects/{name}/{provider}/{model}/abort`, then delete. 
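The abort-then-delete ordering above can be sketched as a small client-side helper. This is a hypothetical function (not part of docsfy) that just encodes the rule that a generating variant returns `409` on `DELETE`:

```python
def delete_request_sequence(status: str) -> list[str]:
    """Return the HTTP calls needed to remove a variant safely.

    A variant whose status is 'generating' returns 409 on DELETE,
    so the abort endpoint must be called first.
    """
    steps = []
    if status == "generating":
        steps.append("POST /api/projects/{name}/{provider}/{model}/abort")
    steps.append("DELETE /api/projects/{name}/{provider}/{model}")
    return steps
```

A `ready` (or `error`/`aborted`) variant deletes in a single call; only a `generating` one needs the extra abort request.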
## Variant management endpoints (quick reference) - `POST /api/generate`: create or regenerate a variant (`force` optional) - `GET /api/projects/{name}`: list all variants for a project name - `GET /api/projects/{name}/{provider}/{model}`: get one variant - `POST /api/projects/{name}/{provider}/{model}/abort`: stop active generation for one variant - `DELETE /api/projects/{name}/{provider}/{model}`: safely delete one variant - `GET /docs/{project}/{provider}/{model}/{path:path}`: serve docs for one exact variant ## Behavior verification in tests Variant lifecycle behavior is covered in tests, including force creation, duplicate protection, role restrictions, and delete flow: ```python # tests/test_main.py response = await client.post( "/api/generate", json={ "repo_url": "https://github.com/org/repo.git", "ai_provider": "claude", "ai_model": "opus", }, ) assert response.status_code == 409 ``` ```python # tests/test_auth.py response = await ac.delete("/api/projects/proj-del/claude/opus") assert response.status_code == 403 ``` ```python # tests/test_integration.py response = await client.delete("/api/projects/test-repo/claude/opus") assert response.status_code == 200 ``` Repository-level automated test command configuration: ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: status-and-progress.md # Status and Progress Monitoring The status page (`/status/{name}/{provider}/{model}`) is the per-variant monitoring view for doc generation. It combines backend state from the `projects` table with client-side polling and UI reconstruction (progress bar + activity log). ## Status Model and Data Source Status values are defined centrally and stored in the `projects` row for each variant. 
```python # src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` The status page fetches variant state from the variant API endpoint: ```python # src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}") async def get_variant_details( request: Request, name: str, provider: str, model: str, ) -> dict[str, str | int | None]: name = _validate_project_name(name) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) return project ``` Important fields used by the page: - `status`: high-level state (`generating`, `ready`, `error`, `aborted`) - `current_stage`: pipeline stage (`cloning`, `planning`, etc.) - `page_count`: generated/cached page count - `plan_json`: page plan (used to compute total pages) - `error_message`: displayed on `error`/`aborted` - `last_commit_sha`, `last_generated`: metadata updated on completion ## Polling Behavior The status page uses interval polling (not WebSockets), with overlap protection and auth-aware redirect handling. ```javascript // src/docsfy/templates/status.html var POLL_INTERVAL_MS = 3000; function startPolling() { if (pollTimer) return; pollTimer = setInterval(pollProject, POLL_INTERVAL_MS); } var _polling = false; function pollProject() { if (_polling) return; _polling = true; fetch('/api/projects/' + encodeURIComponent(PROJECT_NAME) + '/' + encodeURIComponent(PROJECT_PROVIDER) + '/' + encodeURIComponent(PROJECT_MODEL), { credentials: 'same-origin', redirect: 'manual' }) .then(function(res) { if (isAuthRedirect(res)) { handleAuthRedirect(); stopPolling(); return null; } if (!res.ok) throw new Error('Not found'); return res.json(); }) .then(function(proj) { if (!proj) return; updateFromProject(proj); }) .catch(function() { /* Silently fail; retry on next interval */ }) .finally(function() { _polling = false; }); } ``` > **Note:** Polling interval is hardcoded to `3000ms` in `status.html`; there is no environment variable for this. 
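# src/docsfy/storage.py
The same polling loop can be driven from a script. Below is a minimal sketch, assuming an HTTP helper `fetch_status` that wraps `GET /api/projects/{name}/{provider}/{model}` and returns its JSON body:

```python
import time

TERMINAL_STATUSES = {"ready", "error", "aborted"}


def poll_until_terminal(fetch_status, interval_s=3.0, max_polls=200):
    """Poll a status callable until the variant reaches a terminal state.

    `fetch_status` is an assumed callable returning the variant dict
    from the API; interval mirrors the page's 3000ms default.
    """
    for _ in range(max_polls):
        project = fetch_status()
        if project["status"] in TERMINAL_STATUSES:
            return project
        time.sleep(interval_s)
    raise TimeoutError("variant did not reach a terminal status")
```

Unlike the browser code, this sketch has no auth-redirect handling.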
Polling stops when status becomes terminal (`ready`, `error`, `aborted`) or when auth expires. ## Stage Updates (Backend Lifecycle) The backend writes stage transitions via `update_project_status(...)` as generation progresses: ```python # src/docsfy/main.py await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="cloning", ) await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="planning", ) await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="generating_pages", plan_json=json.dumps(plan), ) await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, current_stage="rendering", page_count=len(pages), ) await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage=None, last_commit_sha=commit_sha, page_count=page_count, plan_json=json.dumps(plan), ) ``` Up-to-date shortcut (no regeneration) is represented as `status="ready"` + `current_stage="up_to_date"`. ```python # src/docsfy/main.py if old_sha == commit_sha: await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` > **Warning:** Backend can emit `current_stage="incremental_planning"`, but the status page stage order only includes `cloning`, `planning`, `generating_pages`, and `rendering`, so that phase is shown generically. ## Activity Log Semantics The activity log is reconstructed client-side from `status`, `current_stage`, `page_count`, and `plan_json`. It is not a server-side event stream. 
```javascript // src/docsfy/templates/status.html var ICON_MAP = { done: 'icon-check', active: 'icon-spinner-sm', error: 'icon-x-circle', pending: 'icon-circle' }; var STAGES = ['cloning', 'planning', 'generating_pages', 'rendering']; ``` Behavior: - On initial load and stage transitions: `buildInitialLog()` clears and rebuilds entries. - On page count increase: the last active "Generating..." entry is converted to "Generated...", then next active page entry is appended. - On completion (`ready`): log finalizes with: - `Rendered documentation site` - `Documentation ready!` - On `up_to_date`: log is replaced with a single entry: - `Repository unchanged, docs already up to date` - On `error`/`aborted`: active entry is marked as error and terminal failure entry is appended. ## Progress Bar Semantics The status page uses `page_count` as numerator and `total_pages_from_plan` as denominator when available. ```javascript // src/docsfy/templates/status.html if (totalPagesFromPlan > 0) { var pct = Math.min(Math.round((newPageCount / totalPagesFromPlan) * 100), 100); progressBar.style.width = pct + '%'; progressCount.textContent = newPageCount + ' / ' + totalPagesFromPlan + ' pages'; } else { progressCount.textContent = newPageCount + ' pages'; } ``` `page_count` is updated during page generation from cache file count: ```python # src/docsfy/generator.py existing_pages = len(list(cache_dir.glob("*.md"))) await update_project_status( project_name, ai_provider, ai_model, owner=owner, status="generating", page_count=existing_pages, ) ``` And forced regenerations reset count to zero: ```python # src/docsfy/main.py if force: await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) ``` Progress completion behavior: - On `ready`, UI forces progress bar to `100%` and label to `Complete`. - If `plan_json` is unavailable, count shows `N pages` only; denominator and percentage are unknown. 
> **Warning:** `page_count` reflects files present in the page cache, not strictly "new pages generated in this exact run." Incremental/cached runs can appear to jump. > **Tip:** Use `force: true` when you want a fresh 0→N progress curve for reruns. ## Abort and Failure Monitoring Abort action from the status page calls the variant-specific abort endpoint: - `POST /api/projects/{name}/{provider}/{model}/abort` On successful abort, backend writes: - `status="aborted"` - `error_message="Generation aborted by user"` - `current_stage=None` The status page then: - stops polling - switches log status to `Aborted` - shows regenerate controls inline (provider/model/force + Regenerate) If the server restarts mid-generation, startup logic converts orphaned `generating` projects to `error`: ```python # src/docsfy/storage.py cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` > **Note:** This restart recovery is why a variant can move to `error` without a user-triggered abort or explicit generation exception in the live UI. --- Source: abort-and-retry.md # Abort and Retry Flows docsfy handles abort and retry/regeneration as explicit state transitions for each variant (`project/provider/model`) and owner. 
```python # src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` A generation task is keyed by owner + variant so duplicate in-flight runs are blocked: ```python # src/docsfy/main.py gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" async with _gen_lock: if gen_key in _generating: raise HTTPException( status_code=409, detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated", ) ``` ## Abort flow for active runs ### Endpoints | Endpoint | Purpose | |---|---| | `POST /api/projects/{name}/{provider}/{model}/abort` | Abort a specific variant (recommended) | | `POST /api/projects/{name}/abort` | Legacy/backward-compatible abort by project name | > **Note:** The name-only abort endpoint is explicitly marked backward-compatible and aborts the first matching active run. ```python # src/docsfy/main.py @app.post("/api/projects/{name}/abort") async def abort_generation(request: Request, name: str) -> dict[str, str]: """Abort generation for any variant of the given project name. Kept for backward compatibility. Finds the first active generation matching the project name. """ ``` ### What happens when abort is requested 1. Write access is required (`admin` or `user` role). 2. Ownership/access is verified. 3. The task is cancelled with `task.cancel()`. 4. Server waits up to 5 seconds for cancellation acknowledgment. 5. Variant status is persisted as `aborted` with an error message. ```python # src/docsfy/main.py task.cancel() try: await asyncio.wait_for(task, timeout=5.0) except asyncio.CancelledError: pass except asyncio.TimeoutError as exc: raise HTTPException( status_code=409, detail=f"Abort still in progress for '{gen_key}'. 
Please retry shortly.", ) from exc await update_project_status( name, provider, model, status="aborted", owner=key_owner, error_message="Generation aborted by user", current_stage=None, ) ``` > **Warning:** Abort can return `409` (`Abort still in progress...`) if cancellation has not completed within 5 seconds. Retrying abort shortly is expected behavior. ### UI behavior during abort On the status page and dashboard, running variants show an Abort button; the action uses a confirmation modal and calls the variant-specific abort API. ```javascript // src/docsfy/templates/status.html fetch('/api/projects/' + encodeURIComponent(PROJECT_NAME) + '/' + encodeURIComponent(PROJECT_PROVIDER) + '/' + encodeURIComponent(PROJECT_MODEL) + '/abort', { method: 'POST', credentials: 'same-origin', redirect: 'manual' }) ``` ```javascript // src/docsfy/templates/_modal.html function modalConfirm(title, body, danger) { return new Promise(function(resolve) { showModal({ title: title, body: body, danger: danger, confirmText: danger ? 'Delete' : 'Confirm', cancelText: 'Cancel', onConfirm: function() { resolve(true); }, onCancel: function() { resolve(false); }, }); }); } ``` ## Retry/regeneration flow after error or abort There is no dedicated `/retry` backend route. Retry/regeneration is implemented as a new `POST /api/generate` request, usually pre-filled from the failed/aborted variant. ```javascript // src/docsfy/templates/status.html var payload = { repo_url: repoUrl }; if (providerSelect) payload.ai_provider = providerSelect.value; if (modelInput) payload.ai_model = modelInput.value; if (forceCheckbox && forceCheckbox.checked) payload.force = true; fetch('/api/generate', { method: 'POST', headers: { 'Content-Type': 'application/json' }, credentials: 'same-origin', redirect: 'manual', body: JSON.stringify(payload) }) ``` Retry controls are only shown for `error` or `aborted` states: ```html {% if project.status == 'error' or project.status == 'aborted' %}
{% endif %} ``` > **Note:** If provider/model is changed during retry from the status page, the UI redirects to the new variant status URL. ## Force vs non-force regeneration `force=true` clears cached pages and resets page count before regeneration: ```python # src/docsfy/main.py if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") await update_project_status( project_name, ai_provider, ai_model, status="generating", owner=owner, page_count=0, ) ``` Without force, docsfy can short-circuit to up-to-date if commit SHA is unchanged: ```python # src/docsfy/main.py if old_sha == commit_sha: await update_project_status( project_name, ai_provider, ai_model, status="ready", owner=owner, current_stage="up_to_date", ) return ``` Status UI explicitly surfaces this case: ```html {% if project.current_stage == 'up_to_date' %}Documentation is already up to date — no changes since last generation.{% else %}Documentation generated successfully!{% endif %} ``` > **Tip:** Use `Force` when you need a full refresh and do not want reuse of existing cached pages. ## Incremental regeneration behavior If commit changed and previous plan exists, docsfy can ask the incremental planner which pages to regenerate. ```python # src/docsfy/generator.py if not success: logger.warning(f"[{project_name}] Incremental planner failed, regenerating all") return ["all"] result = parse_json_list_response(output) if result is None or not isinstance(result, list): return ["all"] ... 
if not result: return ["all"] ``` `current_stage` values used through generation include: - `cloning` - `planning` - `incremental_planning` (when applicable) - `generating_pages` - `rendering` - `up_to_date` (ready without rebuild) ```javascript // src/docsfy/templates/status.html var STAGES = ['cloning', 'planning', 'generating_pages', 'rendering']; ``` ## Failure recovery and post-retry path On startup, orphaned `generating` records are moved to `error`, which then enables regeneration controls. ```python # src/docsfy/storage.py cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` Cancellation and hard failures during background generation are also persisted: ```python # src/docsfy/main.py except asyncio.CancelledError: await update_project_status(... status="aborted", error_message="Generation was cancelled", current_stage=None) raise except Exception as exc: await update_project_status(... status="error", error_message=str(exc)) ``` ## Access control for abort/retry Abort and regenerate both require write access, and abort additionally enforces ownership/grant checks. ```python # src/docsfy/main.py def _require_write_access(request: Request) -> None: if request.state.role not in ("admin", "user"): raise HTTPException(status_code=403, detail="Write access required.") ``` ```python # src/docsfy/main.py async def _check_ownership(...): if request.state.is_admin: return ... 
access = await get_project_access(project_name, project_owner=project_owner) if request.state.username in access: return raise HTTPException(status_code=404, detail="Not found") ``` Test coverage confirms viewer restriction: ```python # tests/test_auth.py response = await ac.post("/api/generate", json={...}) assert response.status_code == 403 assert "Write access required" in response.json()["detail"] ``` ## Relevant configuration and automation `AI_CLI_TIMEOUT` directly impacts failure timing (and therefore how often you hit retry/regeneration paths): ```bash # .env.example AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` Automated tests are configured via `tox`: ```toml # tox.toml envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` Abort/retry expectations are also documented in end-to-end UI checks: ```markdown # test-plans/e2e-ui-test-plan.md - The status changes to `aborted` - The error message shows "Generation aborted by user" - The "Abort" button is replaced by regenerate controls (provider select, model input, force checkbox, and "Regenerate" button) ``` > **Warning:** Retry UI currently submits `repo_url` payloads. `GenerateRequest` accepts either `repo_url` or `repo_path` (not both), and `repo_url` is validated as a Git URL pattern. For local-path workflows, start a new generation with `repo_path` (admin-only) rather than relying on URL-based retry payloads. 
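The retry payload the status page builds can be mirrored in Python. A sketch (field names follow `GenerateRequest`; the helper itself is hypothetical):

```python
def build_retry_payload(variant: dict, force: bool = False) -> dict:
    """Build a POST /api/generate body from a failed/aborted variant row.

    Mirrors the status-page JavaScript: repo_url is always sent, and
    provider/model/force are included only when set.
    """
    payload = {"repo_url": variant["repo_url"]}
    if variant.get("ai_provider"):
        payload["ai_provider"] = variant["ai_provider"]
    if variant.get("ai_model"):
        payload["ai_model"] = variant["ai_model"]
    if force:
        payload["force"] = True
    return payload
```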
--- Source: docs-view-and-download.md # View and Download Generated Docs docsfy exposes four read/download endpoints for generated documentation: | Use case | Route | Resolution logic | |---|---|---| | View a specific variant | `/docs/{project}/{provider}/{model}/{path:path}` | Uses the exact `project/provider/model` variant | | View latest ready variant | `/docs/{project}/{path:path}` | Picks the most recently generated **ready** variant | | Download a specific variant | `/api/projects/{name}/{provider}/{model}/download` | Streams a `.tar.gz` for the exact variant | | Download latest ready variant | `/api/projects/{name}/download` | Streams a `.tar.gz` for the latest ready variant | > **Note:** If `path` is empty or `/`, docsfy serves `index.html`. ## Variant-specific docs route Use this when you want deterministic docs for one provider/model pair. ```1379:1403:src/docsfy/main.py @app.get("/docs/{project}/{provider}/{model}/{path:path}") async def serve_variant_docs( request: Request, project: str, provider: str, model: str, path: str = "index.html", ) -> FileResponse: if not path or path == "/": path = "index.html" project = _validate_project_name(project) proj = await _resolve_project( request, project, ai_provider=provider, ai_model=model ) # ... if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") return FileResponse(file_path) ``` Examples: - `/docs/test-repo/claude/opus/` - `/docs/test-repo/claude/opus/index.html` - `/docs/test-repo/claude/opus/introduction.html` The dashboard uses this route for **View Docs**: ```1481:1485:src/docsfy/templates/dashboard.html {% if variant.status == 'ready' %}
View Docs Download ``` > **Tip:** URL-encode `provider` and `model` path segments in scripts/clients (the UI already does this with `urlencode`). ## Latest-variant docs route Use this when you want “the newest ready docs” without specifying provider/model. ```1406:1420:src/docsfy/main.py @app.get("/docs/{project}/{path:path}") async def serve_docs( request: Request, project: str, path: str = "index.html" ) -> FileResponse: """Serve the most recently generated variant.""" if not path or path == "/": path = "index.html" project = _validate_project_name(project) if request.state.is_admin: latest = await get_latest_variant(project) else: latest = await get_latest_variant(project, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail="No docs available") ``` “Latest” is defined in storage as `status = 'ready'` ordered by `last_generated DESC`: ```552:566:src/docsfy/storage.py async def get_latest_variant( name: str, owner: str | None = None ) -> dict[str, str | int | None] | None: """Get the most recently generated ready variant for a repo.""" # ... cursor = await db.execute( "SELECT * FROM projects WHERE name = ? AND status = 'ready' ORDER BY last_generated DESC LIMIT 1", (name,), ) ``` > **Warning:** For non-admin users, latest routes are owner-scoped (`owner=request.state.username`). If a project is shared with you by access grant, use the variant-specific route instead. ## Download `.tar.gz` archives ### Download a specific variant ```1074:1112:src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}/download") async def download_variant( request: Request, name: str, provider: str, model: str, ) -> StreamingResponse: # ... if project["status"] != "ready": raise HTTPException(status_code=400, detail="Variant not ready") # ... 
with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=f"{name}-{provider}-{model}") return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={ "Content-Disposition": f'attachment; filename="{name}-{provider}-{model}-docs.tar.gz"' }, ) ``` Behavior: - Requires variant status `ready` - Returns `Content-Type: application/gzip` - Downloads as `{name}-{provider}-{model}-docs.tar.gz` - Archive root directory is `{name}-{provider}-{model}/` ### Download latest ready variant ```1158:1194:src/docsfy/main.py @app.get("/api/projects/{name}/download") async def download_project(request: Request, name: str) -> StreamingResponse: # ... if request.state.is_admin: latest = await get_latest_variant(name) else: latest = await get_latest_variant(name, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail=f"No ready variant for '{name}'") # ... with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=name) return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={"Content-Disposition": f'attachment; filename="{name}-docs.tar.gz"'}, ) ``` Behavior: - Picks latest ready variant - Downloads as `{name}-docs.tar.gz` - Archive root directory is `{name}/` ### CLI download examples ```bash # Specific variant curl -L -OJ \ -H "Authorization: Bearer ${DOCSFY_API_KEY}" \ "http://localhost:8000/api/projects/test-repo/claude/opus/download" # Latest ready variant curl -L -OJ \ -H "Authorization: Bearer ${DOCSFY_API_KEY}" \ "http://localhost:8000/api/projects/test-repo/download" ``` > **Tip:** `-OJ` tells `curl` to use the server-provided filename from `Content-Disposition`. 
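The variant archive layout described above can be reproduced locally with the standard `tarfile` module. This sketch builds a throwaway site directory and checks the `{name}-{provider}-{model}/` root (file contents are placeholders):

```python
import tarfile
import tempfile
from pathlib import Path


def archive_root(name: str, provider: str, model: str) -> str:
    # Matches the arcname used by the variant download endpoint.
    return f"{name}-{provider}-{model}"


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "site"
    site.mkdir()
    (site / "index.html").write_text("<html></html>", encoding="utf-8")

    tar_path = Path(tmp) / "docs.tar.gz"
    with tarfile.open(tar_path, mode="w:gz") as tar:
        tar.add(site, arcname=archive_root("test-repo", "claude", "opus"))

    with tarfile.open(tar_path) as tar:
        members = tar.getnames()
# All members sit under the 'test-repo-claude-opus/' root.
```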
## What is inside the archive Generated site content comes from `render_site()`, which writes static assets and pages into the variant `site` directory: ```243:290:src/docsfy/renderer.py index_html = render_index(project_name, tagline, navigation, repo_url=repo_url) (output_dir / "index.html").write_text(index_html, encoding="utf-8") # ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") # ... (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` Typical archive contents include: - `index.html` - `*.html` rendered pages - `*.md` source markdown pages - `search-index.json` - `llms.txt` and `llms-full.txt` - `assets/*` static CSS/JS - `.nojekyll` ## Auth, access, and error behavior API routes return `401` when unauthenticated; browser routes redirect to `/login`: ```151:155:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` Project names are validated before route resolution: ```73:77:src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") ``` Common responses: - `400`: - variant download when status is not ready (`"Variant not ready"`) - invalid project name - `401`: - unauthenticated API requests - `403`: - denied path traversal attempt (`"Access denied"`) - `404`: - docs file missing - no latest ready 
docs (`"No docs available"`) - no ready variant for latest download - `409`: - admin ambiguity when multiple owners have same `project/provider/model` ## Runtime configuration relevant to these routes Default container mapping serves docsfy on port `8000`: ```1:10:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` Cookie/security settings are environment-driven: ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```27:28:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` > **Note:** Models such as `claude-opus-4-6[1m]` contain characters that should be URL-encoded when used in path segments. ## Validation and test automation The integration test explicitly validates all four routes: ```124:146:tests/test_integration.py response = await client.get("/docs/test-repo/claude/opus/index.html") assert response.status_code == 200 # ... response = await client.get("/docs/test-repo/index.html") assert response.status_code == 200 # ... response = await client.get("/api/projects/test-repo/claude/opus/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" # ... response = await client.get("/api/projects/test-repo/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" ``` Repository test entrypoint: ```5:7:tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** No checked-in GitHub Actions or other CI workflow manifests are present; test automation is defined via `tox.toml`. 
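As a concrete illustration of the URL-encoding note above, the bracketed model identifier can be escaped with the standard library (the project name and route here are just examples):

```python
from urllib.parse import quote

model = "claude-opus-4-6[1m]"
encoded = quote(model, safe="")  # '[' -> '%5B', ']' -> '%5D'

docs_url = f"/docs/test-repo/claude/{encoded}/index.html"
# docs_url == '/docs/test-repo/claude/claude-opus-4-6%5B1m%5D/index.html'
```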
---
Source: generated-site-features.md

# Generated Site Features

docsfy-generated sites ship with a built-in front-end feature bundle from `src/docsfy/static/`, copied into each output site's `assets/` directory during rendering.

```python
# src/docsfy/renderer.py
if STATIC_DIR.exists():
    for static_file in STATIC_DIR.iterdir():
        if static_file.is_file():
            shutil.copy2(static_file, assets_dir / static_file.name)

search_index = _build_search_index(valid_pages, plan)
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)
```

---

## Search Modal

The site uses a client-side modal search (`Cmd/Ctrl+K`) backed by `search-index.json`.

- Opens via keyboard shortcut, top-bar Search button, or sidebar search input focus.
- Matches against page title and markdown content.
- Limits results to 10 entries.
- Supports arrow navigation and Enter to open the selected result.

```javascript
// src/docsfy/static/search.js
fetch('search-index.json').then(function(r) { return r.json(); })
  .then(function(data) { index = data; }).catch(function() {});

document.addEventListener('keydown', function(e) {
  if ((e.metaKey || e.ctrlKey) && e.key === 'k') { e.preventDefault(); openModal(); }
  if (e.key === 'Escape') closeModal();
});

var matches = index.filter(function(item) {
  return item.title.toLowerCase().includes(q) || item.content.toLowerCase().includes(q);
}).slice(0, 10);
```

```python
# src/docsfy/renderer.py
index.append(
    {
        "slug": slug,
        "title": title_map.get(slug, slug),
        "content": content[:2000],
    }
)
```

> **Tip:** Search content is truncated to the first 2000 characters per page, so placing key terms early in each page improves discoverability.

---

## Theme Toggle (Dark/Light)

Theme state is controlled through the `data-theme` attribute on the root `html` element and persisted in `localStorage` under `theme`.
```javascript
// src/docsfy/static/theme.js
var stored = getTheme();
if (stored) {
  document.documentElement.setAttribute('data-theme', stored);
} else {
  document.documentElement.setAttribute('data-theme', 'dark');
}

if (toggle) toggle.addEventListener('click', function() {
  var current = document.documentElement.getAttribute('data-theme');
  var next = current === 'dark' ? 'light' : 'dark';
  document.documentElement.setAttribute('data-theme', next);
  setTheme(next);
});
```

```css
/* src/docsfy/static/style.css */
[data-theme="dark"] .icon-sun { display: block; }
[data-theme="dark"] .icon-moon { display: none; }
```

> **Note:** Generated pages default to dark mode (`data-theme="dark"`) and switch to the saved preference when available.

---

## Callouts

Callouts are authored as markdown blockquotes with a bold first label (`Note`, `Warning`, `Tip`, etc.). A post-render script maps those labels to callout classes.

```javascript
// src/docsfy/static/callouts.js
var text = firstStrong.textContent.toLowerCase().replace(':', '').trim();
if (text === 'note' || text === 'info') { type = 'note'; }
else if (text === 'warning' || text === 'caution') { type = 'warning'; }
else if (text === 'tip' || text === 'hint') { type = 'tip'; }
else if (text === 'danger' || text === 'error') { type = 'danger'; }
else if (text === 'important') { type = 'important'; }
if (type) { bq.classList.add('callout', 'callout-' + type); }
```

```css
/* src/docsfy/static/style.css */
blockquote.callout-note { border-left: 4px solid #3b82f6; background: rgba(59, 130, 246, 0.08); }
blockquote.callout-warning { border-left: 4px solid #f59e0b; background: rgba(245, 158, 11, 0.08); }
blockquote.callout-tip { border-left: 4px solid #10b981; background: rgba(16, 185, 129, 0.08); }
```

Use the same authoring format enforced in prompt generation:

```text
# src/docsfy/prompts.py
- Notes: > **Note:** text
- Warnings: > **Warning:** text
- Tips: > **Tip:** text
```

---

## Code Copy Buttons

Every `<pre>` block gets a `Copy` button automatically at runtime.

- Uses Clipboard API when available.
- Falls back to `document.execCommand('copy')` for compatibility.
- Shows temporary feedback (`Copied!` / `Failed`).

```javascript
// src/docsfy/static/copy.js
document.querySelectorAll('pre').forEach(function(pre) {
  var btn = document.createElement('button');
  btn.className = 'copy-btn';
  btn.textContent = 'Copy';
  btn.addEventListener('click', function() {
    var code = pre.querySelector('code');
    var text = code ? code.textContent : pre.textContent;
    if (navigator.clipboard && navigator.clipboard.writeText) {
      navigator.clipboard.writeText(text).then(function() {
        btn.textContent = 'Copied!';
        setTimeout(function() { btn.textContent = 'Copy'; }, 2000);
      }).catch(function() {
        fallbackCopy(text, btn);
      });
    } else {
      fallbackCopy(text, btn);
    }
  });
  pre.style.position = 'relative';
  pre.appendChild(btn);
});
```

```css
/* src/docsfy/static/style.css */
.copy-btn { opacity: 0; }
pre:hover .copy-btn { opacity: 1; }

@media (hover: none) {
  .copy-btn { opacity: 0.7; }
}
```

---

## Table of Contents (TOC)

TOC generation is handled during markdown conversion and rendered only when headings are present.

```python
# src/docsfy/renderer.py
md = markdown.Markdown(
    extensions=["fenced_code", "codehilite", "tables", "toc"],
    extension_configs={
        "codehilite": {"css_class": "highlight", "guess_lang": False},
        "toc": {"toc_depth": "2-3"},
    },
)
content_html = _sanitize_html(md.convert(md_text))
toc_html = getattr(md, "toc", "")
```

```html

{% if toc %}

{% endif %}
```

```javascript
// src/docsfy/static/scrollspy.js
var tocLinks = document.querySelectorAll('.toc-container a');
...
current.link.classList.add('active');
```

```css
/* src/docsfy/static/style.css */
@media (min-width: 1280px) {
    .toc-sidebar { display: block; }
    .content { margin-right: 220px; }
}
.toc-container ul ul { display: none; }
```

> **Warning:** `scrollspy.js` applies `active`, while the stylesheet defines `.toc-container a.toc-active`; align class names if you want a styled active-state indicator.

---

## GitHub Metadata (Repo Link + Stars)

When `repo_url` is available in the generated plan, pages render a GitHub button and lazily fetch the star count from the GitHub API.

```python
# src/docsfy/main.py
plan["repo_url"] = source_url
```

```python
# src/docsfy/renderer.py
repo_url: str = plan.get("repo_url", "")
...
page_html = render_page(..., repo_url=repo_url)
```

```html

{% if repo_url %}

    ...
    

{% endif %}
```

```javascript
// src/docsfy/static/github.js
var match = repoUrl.match(/github\.com[/:]([^/]+)\/([^/.]+)/);
...
fetch('https://api.github.com/repos/' + owner + '/' + repo)
  .then(function(response) {
    if (!response.ok) return null;
    return response.json();
  })
  .then(function(data) {
    if (!data || typeof data.stargazers_count === 'undefined') return;
    var count = data.stargazers_count;
    var display;
    if (count >= 1000) {
      display = (count / 1000).toFixed(1).replace(/\.0$/, '') + 'k';
    } else {
      display = count.toString();
    }
    starsEl.textContent = display;
    starsEl.title = count.toLocaleString() + ' stars';
  })
  .catch(function() {
    // Silently fail - star count is a nice-to-have
  });
```
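The compact star-count formatting above translates directly to Python; a sketch with a hypothetical helper name, mirroring the `toFixed(1)` plus `".0"`-stripping behavior:

```python
def format_stars(count: int) -> str:
    # mirrors github.js: one decimal place, trailing ".0" stripped, "k" suffix
    if count >= 1000:
        compact = f"{count / 1000:.1f}"
        if compact.endswith(".0"):
            compact = compact[:-2]
        return compact + "k"
    return str(count)

print(format_stars(999), format_stars(1000), format_stars(12345))
# 999 1k 12.3k
```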

> **Note:** If `repo_url` is empty, the GitHub link and star counter are not rendered.
>
> **Tip:** The regex supports both `https://github.com/org/repo(.git)` and `git@github.com:org/repo.git` style URLs.
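The tip about supported URL styles can be verified directly; a quick check of the same pattern against both forms (`acme/docsfy` is an illustrative owner/repo pair):

```python
import re

# same pattern as src/docsfy/static/github.js
GITHUB_RE = re.compile(r"github\.com[/:]([^/]+)/([^/.]+)")

for url in (
    "https://github.com/acme/docsfy.git",   # HTTPS form
    "git@github.com:acme/docsfy.git",       # SSH form
):
    match = GITHUB_RE.search(url)
    assert match is not None
    assert match.groups() == ("acme", "docsfy")  # ".git" suffix is excluded
```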

---

## Verification Coverage (Tests + Pipeline)

The rendering pipeline has unit coverage for generated artifacts and a defined test command in `tox`.

```python
# tests/test_renderer.py
render_site(plan=plan, pages=pages, output_dir=output_dir)
assert (output_dir / "search-index.json").exists()

index = json.loads((output_dir / "search-index.json").read_text())
assert index[0]["slug"] == "intro"
assert index[0]["title"] == "Intro"
```

```toml
# tox.toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

Manual UI checks for these generated-site features are also documented in `test-plans/e2e-ui-test-plan.md` (see “Test 8: Generated Docs Quality”).


---

Source: incremental-regeneration.md

# Incremental Regeneration

## What Is Tracked

Generation metadata (including commit SHA) is stored per variant (`name`, `ai_provider`, `ai_model`, `owner`) in SQLite:

```57:73:src/docsfy/storage.py
            CREATE TABLE IF NOT EXISTS projects (
                name TEXT NOT NULL,
                ai_provider TEXT NOT NULL DEFAULT '',
                ai_model TEXT NOT NULL DEFAULT '',
                owner TEXT NOT NULL DEFAULT '',
                repo_url TEXT NOT NULL,
                status TEXT NOT NULL DEFAULT 'generating',
                current_stage TEXT,
                last_commit_sha TEXT,
                last_generated TEXT,
                page_count INTEGER DEFAULT 0,
                error_message TEXT,
                plan_json TEXT,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                PRIMARY KEY (name, ai_provider, ai_model, owner)
            )
```

> **Note:** Incremental behavior is variant-scoped, not just project-scoped. Different providers/models maintain independent commit and cache state.
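The variant scoping follows from the composite primary key; a minimal in-memory sketch (trimmed schema, illustrative values) showing two independent variants of the same project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE projects (
        name TEXT NOT NULL,
        ai_provider TEXT NOT NULL DEFAULT '',
        ai_model TEXT NOT NULL DEFAULT '',
        owner TEXT NOT NULL DEFAULT '',
        last_commit_sha TEXT,
        PRIMARY KEY (name, ai_provider, ai_model, owner)
    )"""
)
# same project name, two provider/model variants with independent commit state
conn.execute("INSERT INTO projects VALUES ('repo', 'claude', 'opus', 'alice', 'abc1234')")
conn.execute("INSERT INTO projects VALUES ('repo', 'gemini', 'pro', 'alice', 'def5678')")
rows = conn.execute(
    "SELECT ai_provider, last_commit_sha FROM projects WHERE name = 'repo' ORDER BY ai_provider"
).fetchall()
print(rows)  # [('claude', 'abc1234'), ('gemini', 'def5678')]
```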

## Commit Diff Detection

When generation starts (and `force` is not set), `docsfy` compares the stored commit SHA to the current repository SHA.

If the SHA is identical, it exits early as `up_to_date`:

```850:868:src/docsfy/main.py
        if existing and existing.get("last_generated"):
            old_sha = (
                str(existing["last_commit_sha"])
                if existing.get("last_commit_sha")
                else None
            )
            if old_sha == commit_sha:
                logger.info(
                    f"[{project_name}] Project is up to date at {commit_sha[:8]}"
                )
                await update_project_status(
                    project_name,
                    ai_provider,
                    ai_model,
                    status="ready",
                    owner=owner,
                    current_stage="up_to_date",
                )
                return
```

If SHAs differ, it computes file-level diffs using Git:

```48:73:src/docsfy/repository.py
def get_changed_files(repo_path: Path, old_sha: str, new_sha: str) -> list[str] | None:
    """Get list of files changed between two commits.

    Returns None on error (caller should fall back to full regeneration),
    or an empty list when there are no changes.
    """
    if not re.match(r"^[0-9a-fA-F]{4,64}$", old_sha) or not re.match(
        r"^[0-9a-fA-F]{4,64}$", new_sha
    ):
        logger.warning("Invalid SHA format")
        return None
    try:
        result = subprocess.run(
            ["git", "diff", "--name-only", old_sha, new_sha],
            cwd=repo_path,
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        logger.warning(f"Failed to get diff: {exc}")
        return None
    if result.returncode != 0:
        logger.warning(f"Failed to get diff: {result.stderr}")
        return None
    return [f.strip() for f in result.stdout.strip().split("\n") if f.strip()]
```
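The SHA guard above only admits bare hex strings, which keeps revision syntax and shell metacharacters out of the `git diff` invocation; a quick illustration:

```python
import re

# same validation pattern as repository.py
SHA_RE = re.compile(r"^[0-9a-fA-F]{4,64}$")

assert SHA_RE.match("abc1234")             # abbreviated hex SHA: accepted
assert SHA_RE.match("a" * 40)              # full SHA-1 length: accepted
assert not SHA_RE.match("abc")             # fewer than 4 chars: rejected
assert not SHA_RE.match("HEAD~1")          # revision syntax: rejected
assert not SHA_RE.match("abc123; rm -rf")  # metacharacters: rejected
```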

## Page-Level Cache Invalidation

After diff detection, the system runs an incremental planner and invalidates only selected cached pages (by slug), then reuses all other page caches.

```891:955:src/docsfy/main.py
    if old_sha and old_sha != commit_sha and not force and existing:
        changed_files = get_changed_files(repo_dir, old_sha, commit_sha)
        if changed_files is None:
            # Error getting diff — fall back to full regeneration
            use_cache = False
        elif not changed_files:
            # Commits differ but tree is identical — nothing to regenerate
            await update_project_status(
                project_name,
                ai_provider,
                ai_model,
                status="ready",
                owner=owner,
                current_stage="up_to_date",
                last_commit_sha=commit_sha,
            )
            return
        elif changed_files:
            existing_plan_json = existing.get("plan_json")
            if existing_plan_json:
                try:
                    existing_plan = json.loads(str(existing_plan_json))
                    await update_project_status(
                        project_name,
                        ai_provider,
                        ai_model,
                        status="generating",
                        owner=owner,
                        current_stage="incremental_planning",
                    )
                    pages_to_regen = await run_incremental_planner(
                        repo_dir,
                        project_name,
                        ai_provider,
                        ai_model,
                        changed_files,
                        existing_plan,
                        ai_cli_timeout,
                    )
                    if pages_to_regen != ["all"]:
                        # Delete only the cached pages that need regeneration
                        for slug in pages_to_regen:
                            # Validate slug to prevent path traversal
                            if (
                                "/" in slug
                                or "\\" in slug
                                or ".." in slug
                                or slug.startswith(".")
                            ):
                                logger.warning(
                                    f"[{project_name}] Skipping invalid slug from incremental planner: {slug}"
                                )
                                continue
                            cache_file = cache_dir / f"{slug}.md"
                            # Extra safety: ensure the resolved path is inside cache_dir
                            try:
                                cache_file.resolve().relative_to(cache_dir.resolve())
                            except ValueError:
                                logger.warning(
                                    f"[{project_name}] Path traversal attempt in slug: {slug}"
                                )
                                continue
                            if cache_file.exists():
                                cache_file.unlink()
                        use_cache = True
```

Page cache entries are slug-based markdown files:

```89:114:src/docsfy/generator.py
    cache_file = cache_dir / f"{slug}.md"
    if use_cache and cache_file.exists():
        logger.debug(f"[{_label}] Using cached page: {slug}")
        return cache_file.read_text(encoding="utf-8")

    prompt = build_page_prompt(
        project_name=repo_path.name, page_title=title, page_description=description
    )
    # Build CLI flags based on provider
    cli_flags = ["--trust"] if ai_provider == "cursor" else None
    success, output = await call_ai_cli(
        prompt=prompt,
        cwd=repo_path,
        ai_provider=ai_provider,
        ai_model=ai_model,
        ai_cli_timeout=ai_cli_timeout,
        cli_flags=cli_flags,
    )
    if not success:
        logger.warning(f"[{_label}] Failed to generate page '{slug}': {output}")
        output = f"# {title}\n\n*Documentation generation failed. Please re-run.*"

    output = _strip_ai_preamble(output)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(output, encoding="utf-8")
```

Cache directory resolution:

```527:530:src/docsfy/storage.py
def get_project_cache_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages"
```

> **Tip:** Because cache is per slug (`{slug}.md`), incremental regeneration is fastest when page slugs remain stable across planner runs.
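The slug checks quoted above combine a character blocklist with a resolved-path containment test; a self-contained sketch with a hypothetical helper name and an illustrative cache path:

```python
from pathlib import Path

def is_safe_slug(slug: str, cache_dir: Path) -> bool:
    """Mirror the two-layer validation from main.py (hypothetical helper)."""
    # layer 1: reject separator characters, "..", and leading dots
    if "/" in slug or "\\" in slug or ".." in slug or slug.startswith("."):
        return False
    # layer 2: ensure the resolved cache file stays inside cache_dir
    try:
        (cache_dir / f"{slug}.md").resolve().relative_to(cache_dir.resolve())
    except ValueError:
        return False
    return True

cache = Path("/tmp/docsfy-cache/pages")  # illustrative location
assert is_safe_slug("api-reference", cache)
assert not is_safe_slug("../secrets", cache)
assert not is_safe_slug(".hidden", cache)
```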

## Fallback to Full Regeneration

There are three fallback signals in code:

1. Diff failure (`changed_files is None`)
2. Incremental planner failure / parse failure
3. Incremental planner returning unusable output

Incremental planner fallback behavior:

```229:239:src/docsfy/generator.py
    if not success:
        logger.warning(f"[{project_name}] Incremental planner failed, regenerating all")
        return ["all"]

    result = parse_json_list_response(output)
    if result is None or not isinstance(result, list):
        return ["all"]
    # Validate all items are strings
    result = [item for item in result if isinstance(item, str)]
    if not result:
        return ["all"]
```

Planner prompt contract includes both `["all"]` and `[]` outputs:

```56:63:src/docsfy/prompts.py
Which pages from the existing plan need to be regenerated based on the changed files?
Output a JSON array of page slugs that need regeneration.

CRITICAL: Output ONLY a JSON array of strings. No explanation.
Example: ["introduction", "api-reference", "configuration"]
If all pages need regeneration, output: ["all"]
If no pages need regeneration, output: []
```
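Taken together, the excerpt and the prompt contract imply a small decision table. A sketch (hypothetical helper, with `parse_json_list_response` simplified to plain `json.loads`); note that per the quoted generator code, an empty validated list also falls back to `["all"]`:

```python
import json

def planner_result(success: bool, output: str) -> list[str]:
    """Mirror the fallback rules quoted from generator.py (hypothetical helper)."""
    if not success:
        return ["all"]                      # AI CLI failure
    try:
        result = json.loads(output)
    except json.JSONDecodeError:
        result = None
    if result is None or not isinstance(result, list):
        return ["all"]                      # parse failure / wrong shape
    result = [item for item in result if isinstance(item, str)]
    if not result:
        return ["all"]                      # empty or all-non-string output
    return result

assert planner_result(False, "") == ["all"]
assert planner_result(True, "not json") == ["all"]
assert planner_result(True, "[]") == ["all"]
assert planner_result(True, '["configuration"]') == ["configuration"]
```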

Forced full regeneration is explicit and clears cache first:

```832:845:src/docsfy/main.py
    if force:
        cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
        if cache_dir.exists():
            shutil.rmtree(cache_dir)
            logger.info(f"[{project_name}] Cleared cache (force=True)")
        # Reset page count so API shows 0 during regeneration
        await update_project_status(
            project_name,
            ai_provider,
            ai_model,
            status="generating",
            owner=owner,
            page_count=0,
        )
```

And `force` is exposed at API model level:

```18:20:src/docsfy/models.py
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )
```

Dashboard sends `force` in generation requests:

```2043:2047:src/docsfy/templates/dashboard.html
                var body = {
                    repo_url: repoUrl,
                    ai_provider: provider,
                    force: force
                };
```

> **Warning:** For non-force runs, `generate_all_pages` is called with `use_cache=use_cache if use_cache else not force`, which evaluates to `True` whenever `force` is `False`. In practice, this means true “full regeneration” is guaranteed when `force=true` (cache is deleted), while automatic fallback branches depend on whether cache files were invalidated/removed first.
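The expression in the warning above can be tabulated directly; a sketch with a hypothetical wrapper function:

```python
def effective_use_cache(use_cache: bool, force: bool) -> bool:
    # the exact expression quoted in the warning (hypothetical wrapper)
    return use_cache if use_cache else not force

assert effective_use_cache(False, False) is True   # non-force run: cache reads stay on
assert effective_use_cache(True, False) is True
assert effective_use_cache(False, True) is False   # force: cache reads off
assert effective_use_cache(True, True) is True     # force already cleared the cache dir
```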

## Runtime and Deployment Impact on Cache

Cache and metadata persist when `/data` is mounted:

```7:13:docker-compose.yaml
    volumes:
      - ./data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

> **Warning:** Remote repositories are cloned shallow (`--depth 1`), which can prevent diffing against older stored SHAs if that commit is not present locally.

```25:27:src/docsfy/repository.py
    result = subprocess.run(
        ["git", "clone", "--depth", "1", "--", repo_url, str(repo_path)],
```

## Test and Pipeline Coverage

Key tests validate incremental/cache behaviors:

- Diff outcomes (`list`, `None`, empty list) in `tests/test_repository.py`
- Cache hit behavior in `tests/test_generator.py`
- Incremental planner fallback to `["all"]` in `tests/test_generator.py`

```85:124:tests/test_repository.py
def test_get_changed_files_success(tmp_path: Path) -> None:
    from docsfy.repository import get_changed_files

    with patch("docsfy.repository.subprocess.run") as mock_run:
        mock_run.return_value = MagicMock(
            returncode=0,
            stdout="src/main.py\nsrc/utils.py\nREADME.md\n",
            stderr="",
        )
        files = get_changed_files(tmp_path, "abc123", "def456")

    assert files == ["src/main.py", "src/utils.py", "README.md"]
    call_args = mock_run.call_args
    assert "diff" in call_args.args[0]
    assert "--name-only" in call_args.args[0]
    assert "abc123" in call_args.args[0]
    assert "def456" in call_args.args[0]
```

```103:123:tests/test_generator.py
async def test_generate_page_uses_cache(tmp_path: Path) -> None:
    from docsfy.generator import generate_page

    cache_dir = tmp_path / "cache"
    cache_dir.mkdir()
    cached = cache_dir / "introduction.md"
    cached.write_text("# Cached content")

    md = await generate_page(
        repo_path=tmp_path,
        slug="introduction",
        title="Introduction",
        description="Overview",
        cache_dir=cache_dir,
        ai_provider="claude",
        ai_model="opus",
        use_cache=True,
    )

    assert md == "# Cached content"
```

```144:183:tests/test_generator.py
async def test_run_incremental_planner_returns_all_on_failure(
    tmp_path: Path, sample_plan: dict
) -> None:
    from docsfy.generator import run_incremental_planner

    with patch(
        "docsfy.generator.call_ai_cli",
        return_value=(False, "AI error"),
    ):
        result = await run_incremental_planner(
            repo_path=tmp_path,
            project_name="test-repo",
            ai_provider="claude",
            ai_model="opus",
            changed_files=["src/main.py"],
            existing_plan=sample_plan,
        )

    assert result == ["all"]
```

Project automation used for CI-style validation in-repo:

```1:7:tox.toml
skipsdist = true

envlist = ["unittests"]

[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

```43:60:.pre-commit-config.yaml
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
    hooks:
      - id: ruff
      - id: ruff-format

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
    hooks:
      - id: gitleaks

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.19.1
    hooks:
      - id: mypy
        exclude: (tests/)
        additional_dependencies:
          [types-requests, types-PyYAML, types-colorama, types-aiofiles, pydantic, types-Markdown]
```


---

Source: authentication-model.md

# Authentication Model

`docsfy` uses a single middleware gate for all requests and supports two authentication mechanisms:

- **Bearer token auth** for API/automation clients
- **Session-cookie auth** for browser/UI flows

## Authentication Gate and Evaluation Order

Every request passes through `AuthMiddleware`. Only three paths bypass auth.

```python
class AuthMiddleware(BaseHTTPMiddleware):
    """Authenticate every request via Bearer token or session cookie."""

    # Paths that do not require authentication
    _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

    async def dispatch(
        self, request: Request, call_next: RequestResponseEndpoint
    ) -> Response:
        if request.url.path in self._PUBLIC_PATHS:
            return await call_next(request)

        settings = get_settings()
        user = None
        is_admin = False
        username = ""

        # 1. Check Authorization header (API clients)
        auth_header = request.headers.get("authorization", "")
        if auth_header.startswith("Bearer "):
            token = auth_header[7:]
            if token == settings.admin_key:
                is_admin = True
                username = "admin"
            else:
                user = await get_user_by_key(token)

        # 2. Check session cookie (browser) -- opaque session token
        if not user and not is_admin:
            session_token = request.cookies.get("docsfy_session")
            if session_token:
                session = await get_session(session_token)
                if session:
                    is_admin = bool(session["is_admin"])
                    username = str(session["username"])
                    # Fix 8: For DB users (not ADMIN_KEY admin), verify user still exists
                    if username != "admin":
                        user = await get_user_by_username(username)
                        if not user:
                            # User was deleted since session was created
                            if request.url.path.startswith("/api/"):
                                return JSONResponse(
                                    status_code=401, content={"detail": "Unauthorized"}
                                )
                            return RedirectResponse(url="/login", status_code=302)

        if not user and not is_admin:
            # Not authenticated
            if request.url.path.startswith("/api/"):
                return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
            return RedirectResponse(url="/login", status_code=302)
```

> **Note:** Bearer auth is checked first. If Bearer fails (or is absent), middleware falls back to `docsfy_session`.

## Bearer Token Flow

Bearer tokens are accepted from the `Authorization` header (`Authorization: Bearer <token>`):

- If the token equals `ADMIN_KEY`, the request is authenticated as the built-in admin user (`admin`).
- Otherwise, the token is treated as a user API key and looked up in the `users` table.
- User API keys are not stored raw; they are HMAC-hashed using `ADMIN_KEY` as the secret.

```python
def hash_api_key(key: str, hmac_secret: str = "") -> str:
    """Hash an API key with HMAC-SHA256 for storage.

    Uses ADMIN_KEY as the HMAC secret so that even if the source is read,
    keys cannot be cracked without the environment secret.
    """
    # NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
    # invalidate all existing api_key_hash values, requiring all users to
    # regenerate their API keys.
    secret = hmac_secret or os.getenv("ADMIN_KEY", "")
    if not secret:
        msg = "ADMIN_KEY environment variable is required for key hashing"
        raise RuntimeError(msg)
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()


async def get_user_by_key(api_key: str) -> dict[str, str | int | None] | None:
    """Look up a user by their raw API key."""
    key_hash = hash_api_key(api_key)
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        cursor = await db.execute(
            "SELECT * FROM users WHERE api_key_hash = ?", (key_hash,)
        )
        row = await cursor.fetchone()
        return dict(row) if row else None
```

> **Tip:** For scripts and CI jobs, prefer Bearer auth over login/cookies to keep requests stateless.

> **Warning:** Rotating `ADMIN_KEY` invalidates existing user API key hashes by design.
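The rotation caveat follows directly from HMAC: a different secret yields a different digest for the same key. A minimal illustration (secrets are illustrative placeholders):

```python
import hashlib
import hmac

def hash_api_key(key: str, secret: str) -> str:
    # same construction as storage: HMAC-SHA256 keyed by ADMIN_KEY
    return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()

same_key = "user-api-key-example"
old = hash_api_key(same_key, "old-admin-key-0123456789abcdef")
new = hash_api_key(same_key, "new-admin-key-0123456789abcdef")
assert old != new  # stored api_key_hash values no longer match after rotation
```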

## Session-Cookie Flow

Browser login uses form fields `username` + `api_key` and creates an opaque session cookie.

```python
@app.post("/login", response_model=None)
async def login(request: Request) -> RedirectResponse | HTMLResponse:
    """Authenticate with username + API key and set a session cookie."""
    form = await request.form()
    username = str(form.get("username", ""))
    api_key = str(form.get("api_key", ""))
    settings = get_settings()

    is_admin = False
    authenticated = False

    # Check admin -- username must be "admin" and key must match
    if username == "admin" and api_key == settings.admin_key:
        is_admin = True
        authenticated = True
    else:
        # Check user key -- verify username matches the key's owner
        user = await get_user_by_key(api_key)
        if user and user["username"] == username:
            authenticated = True
            is_admin = user.get("role") == "admin"

    if authenticated:
        session_token = await create_session(username, is_admin=is_admin)
        response = RedirectResponse(url="/", status_code=302)
        response.set_cookie(
            "docsfy_session",
            session_token,
            httponly=True,
            samesite="strict",
            secure=settings.secure_cookies,
            max_age=SESSION_TTL_SECONDS,
        )
        return response
```

The login UI labels this field as a password, but the backend form field name is still `api_key`.

Session tokens are opaque and stored hashed, with an 8-hour TTL:

```python
SESSION_TTL_SECONDS = 28800  # 8 hours
SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600

def _hash_session_token(token: str) -> str:
    """Hash a session token for storage."""
    return hashlib.sha256(token.encode()).hexdigest()

async def create_session(
    username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS
) -> str:
    """Create an opaque session token."""
    token = secrets.token_urlsafe(32)
    token_hash = _hash_session_token(token)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S")
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)",
            (token_hash, username, 1 if is_admin else 0, expires_str),
        )
        await db.commit()
    return token
```
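The opaque-token scheme above stores only a digest: the browser-held token never touches the database. A condensed sketch of the token/digest split:

```python
import hashlib
import secrets

SESSION_TTL_SECONDS = 28800
assert SESSION_TTL_SECONDS // 3600 == 8                   # the documented 8-hour TTL

token = secrets.token_urlsafe(32)                         # value set in the cookie
token_hash = hashlib.sha256(token.encode()).hexdigest()   # value stored in sessions table
assert len(token_hash) == 64                              # hex SHA-256 digest
assert token != token_hash                                # a DB leak does not yield usable tokens
```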

Logout clears both DB session state and browser cookie:

```python
@app.get("/logout")
async def logout(request: Request) -> RedirectResponse:
    """Clear the session cookie, delete session from DB, and redirect to login."""
    session_token = request.cookies.get("docsfy_session")
    if session_token:
        await delete_session(session_token)
    settings = get_settings()
    response = RedirectResponse(url="/login", status_code=302)
    response.delete_cookie(
        "docsfy_session",
        httponly=True,
        samesite="strict",
        secure=settings.secure_cookies,
    )
    return response
```

> **Warning:** `secure_cookies` defaults to `True`; browser session cookies will not be sent over plain HTTP.

## Public Paths

Only these paths are unauthenticated:

- `/login`
- `/login/`
- `/health`

`/health` is also used by runtime health checks:

```yaml
services:
  docsfy:
    env_file: .env
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

## Protected Endpoint Behavior

### Unauthenticated requests

Behavior is path-class dependent:

- Any protected non-API route (for example `/`, `/status/...`, `/docs/...`, `/admin`) -> `302` redirect to `/login`
- Any protected API route under `/api/*` -> `401` JSON `{ "detail": "Unauthorized" }`

Verified in tests:

```python
async def test_login_redirect_when_unauthenticated(
    unauthed_client: AsyncClient,
) -> None:
    """Browser requests to protected pages should redirect to /login."""
    response = await unauthed_client.get("/", follow_redirects=False)
    assert response.status_code == 302
    assert response.headers["location"] == "/login"


async def test_api_returns_401_when_unauthenticated(
    unauthed_client: AsyncClient,
) -> None:
    """API requests without auth should return 401."""
    response = await unauthed_client.get("/api/status")
    assert response.status_code == 401
    assert response.json()["detail"] == "Unauthorized"
```

### Role-based authorization

`docsfy` enforces role checks after authentication:

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )

def _require_admin(request: Request) -> None:
    """Raise 403 if the user is not an admin."""
    if not request.state.is_admin:
        raise HTTPException(status_code=403, detail="Admin access required")
```

- `viewer` users are read-only for write endpoints (`/api/generate`, delete/abort endpoints).
- `admin` role is required for `/admin` and `/api/admin/*`.
- `viewer` users can still rotate their own API key (the credential the login form labels as a password) via `/api/me/rotate-key` (by explicit design).

```python
# Don't call _require_write_access -- viewers should be able to change their password
if request.state.is_admin and not request.state.user:
    raise HTTPException(
        status_code=400,
        detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
    )
```

### Ownership and resource visibility

Project-scoped access checks intentionally return `404` (not `403`) when a user lacks access, to avoid leaking resource existence:

```python
async def _check_ownership(
    request: Request, project_name: str, project: dict[str, Any]
) -> None:
    """Raise 404 if the requesting user does not own the project (unless admin)."""
    if request.state.is_admin:
        return
    project_owner = str(project.get("owner", ""))
    if project_owner == request.state.username:
        return
    # Check if user has been granted access (scoped by project_owner)
    access = await get_project_access(project_name, project_owner=project_owner)
    if request.state.username in access:
        return
    raise HTTPException(status_code=404, detail="Not found")
```

```python
# GET /api/projects/{name} - returns 404 to avoid leaking existence
response = await ac.get("/api/projects/secret-proj")
assert response.status_code == 404
```

### Additional protected behavior

- Non-admin use of `repo_path` in generation is denied (`403`).
- Admin variant resolution can return `409` if multiple owners exist for same project/provider/model without disambiguation.
- If a session belongs to a deleted DB user, middleware invalidates access and returns `401` (API) or `302` (UI redirect).

> **Warning:** In-app login rate limiting is marked TODO; enforce rate limiting at reverse proxy/load balancer level.

## Configuration

Environment-level auth settings:

```env
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# Set to false for local HTTP development
# SECURE_COOKIES=false
```

Application defaults:

```python
class Settings(BaseSettings):
    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev
```

Startup hard-fails if `ADMIN_KEY` is missing or too short:

```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)

if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

> **Tip:** For local non-TLS development, set `SECURE_COOKIES=false` in `.env` so browser sessions work over `http://`.

## Test and Automation Coverage

Auth behavior is regression-tested in `tests/test_auth.py`, and the repo test command is defined in `tox.toml`:

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Note:** No dedicated GitHub/GitLab/Jenkins workflow files are present in this repository; automated auth verification currently depends on the tox/pytest path above.


---

Source: roles-and-permissions.md

# Roles and Permissions

docsfy uses role-based access control (RBAC) across both UI and API layers. Roles are stored in `src/docsfy/storage.py` and enforced in `src/docsfy/main.py`.

## Role Definitions

| Role | Intended use | Write-protected APIs | Admin APIs |
|---|---|---|---|
| `admin` | Full platform control | Allowed | Allowed |
| `user` | Normal project owner/contributor | Allowed | Denied |
| `viewer` | Read-only docs/project access | Denied | Denied |

```python
VALID_ROLES = frozenset({"admin", "user", "viewer"})
```

> **Note:** There are two admin paths in implementation:
> 1) the environment `ADMIN_KEY` account (`username == "admin"`), and  
> 2) a database user whose `role == "admin"`.

```python
# Determine the role
if is_admin:
    role = "admin"
    if not username:
        username = "admin"
else:
    assert user is not None  # guaranteed by the guard above
    role = str(user.get("role", "user"))
    username = str(user["username"])
    # Fix 6: DB user with admin role gets admin privileges
    if role == "admin":
        is_admin = True
```

## Authentication and Request Enforcement

Authentication accepts:

- `Authorization: Bearer <api_key>` header (API clients)
- `docsfy_session` cookie (browser sessions)

The only public routes are `/login` (including its trailing-slash variant) and `/health`.

```python
# Paths that do not require authentication
_PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})
...
if not user and not is_admin:
    # Not authenticated
    if request.url.path.startswith("/api/"):
        return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
    return RedirectResponse(url="/login", status_code=302)
```

So unauthenticated behavior is:

- **UI routes** → `302` redirect to `/login`
- **API routes** → `401 {"detail":"Unauthorized"}`
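
That branch can be condensed into a pure function for illustration (`decide_unauthenticated` is a hypothetical name; the real check lives in the auth middleware):

```python
PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"})

def decide_unauthenticated(path: str) -> tuple[int, str]:
    """Return (status, action) for an unauthenticated request to `path`."""
    if path in PUBLIC_PATHS:
        return 200, "allow"              # public routes pass through
    if path.startswith("/api/"):
        return 401, "json-unauthorized"  # API clients get a JSON 401
    return 302, "redirect:/login"        # browsers are redirected to /login
```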

## UI Capability Matrix

| UI action | admin | user | viewer | How it is enforced |
|---|---|---|---|---|
| Open dashboard (`/`) | ✅ | ✅ | ✅ | Auth middleware |
| See `Admin` link in header | ✅ | ❌ | ❌ | `dashboard.html` conditional |
| Open admin panel (`/admin`) | ✅ | ❌ | ❌ | `_require_admin()` |
| See Generate form | ✅ | ✅ | ❌ | `dashboard.html` conditional |
| Generate/regenerate/abort/delete controls | ✅ | ✅ | ❌ | UI conditional + API guard |
| View docs / download accessible variants | ✅ | ✅ | ✅ | ownership/grant resolution |
| Change own password button | ✅* | ✅ | ✅ | visible for all authenticated users |

\* `ADMIN_KEY` admin cannot rotate via `/api/me/rotate-key` (details below).

```html
{% if role == 'admin' %}
Admin
{% endif %}

{% if role != 'viewer' %}
...
{% endif %}

{% if role != 'viewer' %}
...
{% endif %}
```

## Write-Protected API Permissions

All non-admin/non-user write attempts are rejected by a shared guard:

```python
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

### General write APIs (`admin` + `user` only)

| Endpoint | admin | user | viewer |
|---|---|---|---|
| `POST /api/generate` | ✅ | ✅ | ❌ (`403`) |
| `POST /api/projects/{name}/abort` | ✅ | ✅ | ❌ (`403`) |
| `POST /api/projects/{name}/{provider}/{model}/abort` | ✅ | ✅ | ❌ (`403`) |
| `DELETE /api/projects/{name}/{provider}/{model}` | ✅ | ✅ | ❌ (`403`) |
| `DELETE /api/projects/{name}` | ✅ | ✅ | ❌ (`403`) |

Additional restriction on generation source:

```python
# Fix 9: Local repo path access requires admin privileges
if gen_request.repo_path and not request.state.is_admin:
    raise HTTPException(
        status_code=403,
        detail="Local repo path access requires admin privileges",
    )
```

### Admin-only APIs (`admin` only)

| Endpoint | admin | user | viewer |
|---|---|---|---|
| `GET /admin` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `POST /api/admin/users` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `GET /api/admin/users` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `DELETE /api/admin/users/{username}` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `POST /api/admin/users/{username}/rotate-key` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `POST /api/admin/projects/{name}/access` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `GET /api/admin/projects/{name}/access` | ✅ | ❌ (`403`) | ❌ (`403`) |
| `DELETE /api/admin/projects/{name}/access/{username}` | ✅ | ❌ (`403`) | ❌ (`403`) |

```python
def _require_admin(request: Request) -> None:
    """Raise 403 if the user is not an admin."""
    if not request.state.is_admin:
        raise HTTPException(status_code=403, detail="Admin access required")
```

## Ownership, Sharing, and Visibility Rules

docsfy enforces ownership boundaries plus explicit grants:

- owners can access their own variants
- admins can access all
- non-owners can access only if granted in `project_access`

```python
async def _check_ownership(
    request: Request, project_name: str, project: dict[str, Any]
) -> None:
    if request.state.is_admin:
        return
    project_owner = str(project.get("owner", ""))
    if project_owner == request.state.username:
        return
    access = await get_project_access(project_name, project_owner=project_owner)
    if request.state.username in access:
        return
    raise HTTPException(status_code=404, detail="Not found")
```

```python
if owner is not None and accessible and len(accessible) > 0:
    # Build OR conditions for each (name, owner) pair
    conditions = ["(owner = ?)"]
    ...
```

> **Warning:** Unauthorized project access intentionally returns `404` (not `403`) to avoid leaking resource existence.

### Shared-access route behavior

- Grant-aware routes use `_resolve_project()`:
  - `/api/projects/{name}/{provider}/{model}`
  - `/api/projects/{name}/{provider}/{model}/download`
  - `/docs/{project}/{provider}/{model}/{path}`
- Owner-scoped (non-admin) routes filter by `owner=request.state.username`:
  - `/api/projects/{name}`
  - `/api/projects/{name}/download`
  - `/docs/{project}/{path}`

> **Tip:** For users who received access via admin grant, prefer variant-scoped routes (`/{provider}/{model}`) for reliable access to shared projects.

## Password / API Key Rotation Semantics

Users (including `viewer`) can rotate their own key. This endpoint explicitly bypasses write-role restrictions.

```python
@app.post("/api/me/rotate-key")
async def rotate_own_key(request: Request) -> JSONResponse:
    """User rotates their own API key."""
    # Don't call _require_write_access -- viewers should be able to change their password
    if request.state.is_admin and not request.state.user:
        raise HTTPException(
            status_code=400,
            detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
        )
```

- `viewer` can rotate own key
- DB `admin` can rotate own key
- `ADMIN_KEY` admin cannot rotate through API; rotate the env var instead
- admin can rotate any user key via `/api/admin/users/{username}/rotate-key`

## Security and Configuration Snippets

`ADMIN_KEY` is mandatory and is also used for HMAC key hashing.

```bash
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# Set to false for local HTTP development
# SECURE_COOKIES=false
```

```python
admin_key: str = ""  # Required — validated at startup
secure_cookies: bool = True  # Set to False for local HTTP dev
```

Session cookie settings at login:

```python
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

Session tokens are opaque and stored hashed:

```python
token = secrets.token_urlsafe(32)
token_hash = _hash_session_token(token)
...
"INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)"
```

## Verification Coverage (Tests and Pipeline Config)

Role and permission behavior is covered in tests such as:

- `tests/test_auth.py`
- `tests/test_storage.py`
- `tests/test_main.py`

Example assertions:

```python
# Viewer is blocked from write API
response = await ac.post("/api/generate", json={"repo_url": "https://github.com/org/repo"})
assert response.status_code == 403
assert "Write access required" in response.json()["detail"]
```

```python
# Non-owner gets 404 (no resource existence leak)
response = await ac.get("/api/projects/secret-proj")
assert response.status_code == 404
```

Automated test command configured in `tox.toml`:

```toml
[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Warning:** No repository workflow files were found under `.github/workflows`; if you enforce permissions checks in CI/CD, run the `tox` and pre-commit checks from your CI system explicitly.

---

Source: user-management.md

# User Management

docsfy uses API-key-based authentication with session cookies for browser workflows. User lifecycle operations (create, rotate password, delete) are admin-controlled.

## Authentication model

There are two admin paths:

1. **Environment admin**: username `admin` + `ADMIN_KEY`.
2. **Database admin user**: any username with role `admin`.

```python
# src/docsfy/main.py
# Check admin -- username must be "admin" and key must match
if username == "admin" and api_key == settings.admin_key:
    is_admin = True
    authenticated = True
else:
    # Check user key -- verify username matches the key's owner
    user = await get_user_by_key(api_key)
    if user and user["username"] == username:
        authenticated = True
        is_admin = user.get("role") == "admin"
```

> **Note:** In the UI, the login label says **Password**, but backend form/API field names use `api_key`.
## Required configuration

`ADMIN_KEY` is mandatory and must be at least 16 characters:

```python
# src/docsfy/main.py
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)

if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

```env
# .env.example
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# Set to false for local HTTP development
# SECURE_COOKIES=false
```

Session cookies are `HttpOnly`, `SameSite=strict`, and `secure` by default:

```python
# src/docsfy/main.py
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```

> **Tip:** For local non-HTTPS development, set `SECURE_COOKIES=false` so browser sessions work over `http://`.

## Roles and permissions

Roles are defined in storage:

```python
# src/docsfy/storage.py
VALID_ROLES = frozenset({"admin", "user", "viewer"})
```

Write operations are blocked for `viewer`:

```python
# src/docsfy/main.py
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

Dashboard UI also hides write controls for viewers and shows the Admin link only for admins.

## Creating users

Only admins can create users (`/admin` UI or `POST /api/admin/users`).

### Admin panel workflow

1. Log in as an admin.
2. Open `/admin`.
3. Enter username and select role (`user`, `admin`, `viewer`).
4. Submit **Create User**.
5. Save the returned password immediately.

```html
<!-- form markup omitted -->
```

```javascript
// src/docsfy/templates/admin.html
const resp = await fetch("/api/admin/users", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    credentials: "same-origin",
    body: JSON.stringify({username: username, role: role})
});
const data = await resp.json();
document.getElementById("new-key-value").textContent = data.api_key;
```

```python
# src/docsfy/main.py
@app.post("/api/admin/users")
async def create_user_endpoint(request: Request) -> JSONResponse:
    _require_admin(request)
    body = await request.json()
    username = body.get("username", "")
    role = body.get("role", "user")
    username, raw_key = await create_user(username, role)
    return JSONResponse(
        content={"username": username, "api_key": raw_key, "role": role},
        headers={"Cache-Control": "no-store"},
    )
```

> **Warning:** Generated passwords are returned once (`api_key`/`new_api_key`) and are not retrievable later.

## Reserved usernames

`admin` is reserved (case-insensitive) for the environment-admin login convention.

```python
# src/docsfy/storage.py
if username.lower() == "admin":
    msg = "Username 'admin' is reserved"
    raise ValueError(msg)
```

Validation also enforces:

- length: 2-50 chars
- first char: alphanumeric
- allowed chars after first: alphanumeric, `.`, `_`, `-`

```python
# src/docsfy/storage.py
if not re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$", username):
    msg = f"Invalid username: '{username}'. Must be 2-50 alphanumeric characters, dots, hyphens, underscores."
    raise ValueError(msg)
```

Test coverage confirms case-insensitive reservation:

```python
# tests/test_auth.py
response = await admin_client.post(
    "/api/admin/users",
    json={"username": "Admin", "role": "user"},
)
assert response.status_code == 400
assert "reserved" in response.json()["detail"]
```

> **Warning:** Do not assign `admin` (any case) to regular users; creation is intentionally blocked.

## Deleting users

User deletion is admin-only and irreversible from the UI flow.

### Admin panel workflow

1. Open `/admin`.
2. Click **Delete** on the target user row.
3. Confirm in modal dialog.
4. User row is removed after successful API response.

```javascript
// src/docsfy/templates/admin.html
const resp = await fetch("/api/admin/users/" + encodeURIComponent(username), {
    method: "DELETE",
    credentials: "same-origin",
});
```

Backend self-delete guard:

```python
# src/docsfy/main.py
if username == request.state.username:
    raise HTTPException(status_code=400, detail="Cannot delete your own account")
```

Delete behavior in storage:

```python
# src/docsfy/storage.py
await db.execute("DELETE FROM sessions WHERE username = ?", (username,))
await db.execute("DELETE FROM projects WHERE owner = ?", (username,))
await db.execute("DELETE FROM project_access WHERE project_owner = ?", (username,))
await db.execute("DELETE FROM project_access WHERE username = ?", (username,))
cursor = await db.execute("DELETE FROM users WHERE username = ?", (username,))
```

When a deleted user still has an old session cookie, requests are rejected/redirected:

```python
# src/docsfy/main.py
if username != "admin":
    user = await get_user_by_username(username)
    if not user:
        if request.url.path.startswith("/api/"):
            return JSONResponse(status_code=401, content={"detail": "Unauthorized"})
        return RedirectResponse(url="/login", status_code=302)
```

> **Warning:** Deleting a user also deletes that user’s active sessions, owned project records, and ACL entries.

## Password rotation workflows

### Admin rotates another user’s password

`POST /api/admin/users/{username}/rotate-key`

Optional JSON body: `{"new_key": "..."}` (must be at least 16 chars). Empty body auto-generates a new password.

### User rotates own password

`POST /api/me/rotate-key`

Also supports optional `new_key`, invalidates existing sessions, and clears the current session cookie.

```python
# src/docsfy/main.py
if request.state.is_admin and not request.state.user:
    raise HTTPException(
        status_code=400,
        detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
    )
```

> **Note:** `viewer` users are read-only for project writes, but they are still allowed to rotate their own password.

## Security storage notes

User API keys are not stored raw; hashes use HMAC with `ADMIN_KEY` as the secret:

```python
# src/docsfy/storage.py
# NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
# invalidate all existing api_key_hash values, requiring all users to
# regenerate their API keys.
return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest()
```

> **Warning:** Rotating `ADMIN_KEY` invalidates all existing stored user API-key hashes.

## User management API quick reference

| Endpoint | Method | Access | Purpose |
|---|---|---|---|
| `/admin` | GET | admin | Admin panel UI |
| `/api/admin/users` | GET | admin | List users |
| `/api/admin/users` | POST | admin | Create user and return one-time `api_key` |
| `/api/admin/users/{username}` | DELETE | admin | Delete user |
| `/api/admin/users/{username}/rotate-key` | POST | admin | Rotate a user password |
| `/api/me/rotate-key` | POST | authenticated | Rotate own password |
| `/login` | GET/POST | public | Login page and credential submit |
| `/logout` | GET | authenticated | End session |

## Verification coverage

User management behavior is covered by tests in `tests/test_auth.py` and `tests/test_storage.py` (reserved usernames, self-delete guard, cookie/session behavior, role behavior, password rotation, ACL cleanup).

Test automation entrypoint:

```toml
# tox.toml
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

---

Source: project-access-grants.md

# Project Access Grants

`docsfy` implements project sharing as **owner-scoped ACLs**. A grant is not global to a project name; it is scoped to a `(project_name, project_owner)` pair.

## Access Model (Owner-Scoped)

Each project variant is keyed by owner, and access grants are keyed by `(project_name, project_owner, username)`.
```56:73:src/docsfy/storage.py
await db.execute("""
    CREATE TABLE IF NOT EXISTS projects (
        name TEXT NOT NULL,
        ai_provider TEXT NOT NULL DEFAULT '',
        ai_model TEXT NOT NULL DEFAULT '',
        owner TEXT NOT NULL DEFAULT '',
        repo_url TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'generating',
        current_stage TEXT,
        last_commit_sha TEXT,
        last_generated TEXT,
        page_count INTEGER DEFAULT 0,
        error_message TEXT,
        plan_json TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (name, ai_provider, ai_model, owner)
    )
""")
```

```237:244:src/docsfy/storage.py
await db.execute("""
    CREATE TABLE IF NOT EXISTS project_access (
        project_name TEXT NOT NULL,
        project_owner TEXT NOT NULL DEFAULT '',
        username TEXT NOT NULL,
        PRIMARY KEY (project_name, project_owner, username)
    )
""")
```

Because `project_access` does not include provider/model, a grant applies to **all variants** of that project for that owner.

```392:405:src/docsfy/storage.py
async def grant_project_access(
    project_name: str, username: str, project_owner: str = ""
) -> None:
    """Grant a user access to all variants of a project."""
    if not project_owner:
        msg = "project_owner is required for access grants"
        raise ValueError(msg)
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT OR IGNORE INTO project_access (project_name, project_owner, username) VALUES (?, ?, ?)",
            (project_name, project_owner, username),
        )
        await db.commit()
```

> **Note:** Project sharing is API-first. The admin HTML page in `src/docsfy/templates/admin.html` manages users, while grant/revoke flows are exercised via API calls in `test-plans/e2e-ui-test-plan.md`.

## Grant/Revoke/List APIs

All project-access APIs are admin-only.
```1203:1206:src/docsfy/main.py
def _require_admin(request: Request) -> None:
    """Raise 403 if the user is not an admin."""
    if not request.state.is_admin:
        raise HTTPException(status_code=403, detail="Admin access required")
```

```1266:1310:src/docsfy/main.py
@app.post("/api/admin/projects/{name}/access")
async def grant_access(request: Request, name: str) -> dict[str, str]:
    _require_admin(request)
    body = await request.json()
    username = body.get("username", "")
    project_owner = body.get("owner", "")
    if not username:
        raise HTTPException(status_code=400, detail="Username is required")
    if not project_owner:
        raise HTTPException(status_code=400, detail="Project owner is required")
    # Validate user exists
    user = await get_user_by_username(username)
    if not user:
        raise HTTPException(status_code=404, detail=f"User '{username}' not found")
    # Validate project exists for the specified owner
    variants = await list_variants(name, owner=project_owner)
    if not variants:
        raise HTTPException(
            status_code=404,
            detail=f"Project '{name}' not found for owner '{project_owner}'",
        )
    await grant_project_access(name, username, project_owner=project_owner)
    logger.info(
        f"[AUDIT] Admin '{request.state.username}' granted '{username}' access to '{name}' (owner: '{project_owner}')"
    )
    return {"granted": name, "username": username, "owner": project_owner}


@app.delete("/api/admin/projects/{name}/access/{username}")
async def revoke_access(request: Request, name: str, username: str) -> dict[str, str]:
    _require_admin(request)
    project_owner = request.query_params.get("owner", "")
    await revoke_project_access(name, username, project_owner=project_owner)
    logger.info(
        f"[AUDIT] Admin '{request.state.username}' revoked '{username}' access to '{name}' (owner: '{project_owner}')"
    )
    return {"revoked": name, "username": username}


@app.get("/api/admin/projects/{name}/access")
async def list_access(request: Request, name: str) -> dict[str, Any]:
    _require_admin(request)
    project_owner = request.query_params.get("owner", "")
    users = await get_project_access(name, project_owner=project_owner)
    return {"project": name, "owner": project_owner, "users": users}
```

Real API usage examples in the repo:

```1994:1994:test-plans/e2e-ui-test-plan.md
agent-browser javascript "fetch('/api/admin/projects/for-testing-only/access', { method: 'POST', headers: {'Content-Type': 'application/json'}, credentials: 'same-origin', body: JSON.stringify({username: 'testviewer-e2e', owner: 'testuser-e2e'}) }).then(r => r.json()).then(d => JSON.stringify(d))"
```

```2054:2054:test-plans/e2e-ui-test-plan.md
agent-browser eval "fetch('/api/admin/projects/for-testing-only/access?owner=testuser-e2e', {credentials:'same-origin'}).then(r => r.json())"
```

```2069:2069:test-plans/e2e-ui-test-plan.md
agent-browser eval "fetch('/api/admin/projects/for-testing-only/access/testviewer-e2e?owner=testuser-e2e', {method:'DELETE', credentials:'same-origin'}).then(r => r.status)"
```

> **Warning:** Always pass `owner` for `GET /api/admin/projects/{name}/access` and `DELETE /api/admin/projects/{name}/access/{username}`. These handlers default `owner` to `""`, so omitting it usually targets no real owner-scoped grants.
## Non-Owner Visibility Rules

For non-admin users, `docsfy` combines owned projects with explicitly granted `(name, owner)` tuples on dashboard and status APIs:

```334:345:src/docsfy/main.py
@app.get("/", response_class=HTMLResponse)
async def dashboard(request: Request) -> HTMLResponse:
    settings = get_settings()
    username = request.state.username
    is_admin = request.state.is_admin
    if is_admin:
        projects = await list_projects()  # admin sees all
    else:
        accessible = await get_user_accessible_projects(username)
        projects = await list_projects(owner=username, accessible=accessible)
```

```366:379:src/docsfy/storage.py
async def list_projects(
    owner: str | None = None,
    accessible: list[tuple[str, str]] | None = None,
) -> list[dict[str, str | int | None]]:
    async with aiosqlite.connect(DB_PATH) as db:
        db.row_factory = aiosqlite.Row
        if owner is not None and accessible and len(accessible) > 0:
            # Build OR conditions for each (name, owner) pair
            conditions = ["(owner = ?)"]
            params: list[str] = [owner]
            for proj_name, proj_owner in accessible:
                conditions.append("(name = ? AND owner = ?)")
```

Visibility checks return `404` for unauthorized project access (to avoid existence leaks), not `403`:

```194:207:src/docsfy/main.py
async def _check_ownership(
    request: Request, project_name: str, project: dict[str, Any]
) -> None:
    """Raise 404 if the requesting user does not own the project (unless admin)."""
    if request.state.is_admin:
        return
    project_owner = str(project.get("owner", ""))
    if project_owner == request.state.username:
        return
    # Check if user has been granted access (scoped by project_owner)
    access = await get_project_access(project_name, project_owner=project_owner)
    if request.state.username in access:
        return
    raise HTTPException(status_code=404, detail="Not found")
```

```580:608:tests/test_auth.py
async def test_non_owner_cannot_access_project(_init_db: None) -> None:
    """Non-admin user should not see projects owned by others."""
    from docsfy.main import _generating, app
    from docsfy.storage import create_user, save_project

    _generating.clear()
    _, bob_key = await create_user("bob-noowner")
    await save_project(
        name="secret-proj",
        repo_url="https://github.com/org/secret.git",
        ai_provider="claude",
        ai_model="opus",
        owner="alice-owner2",
    )
    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
        headers={"Authorization": f"Bearer {bob_key}"},
    ) as ac:
        # GET /api/projects/{name} - returns 404 to avoid leaking existence
        response = await ac.get("/api/projects/secret-proj")
        assert response.status_code == 404

        # GET /api/projects/{name}/{provider}/{model}
        response = await ac.get("/api/projects/secret-proj/claude/opus")
        assert response.status_code == 404
```

### Route Behavior Matrix (Non-Admin)

- Grant-aware routes:
  - `/`
  - `/api/status`
  - `/status/{name}/{provider}/{model}`
  - `/api/projects/{name}/{provider}/{model}`
  - `/api/projects/{name}/{provider}/{model}/download`
  - `/docs/{project}/{provider}/{model}/{path:path}`
- Owner-only (for non-admin) routes:
  - `/api/projects/{name}`
  - `/api/projects/{name}/download`
  - `/docs/{project}/{path:path}`

Evidence for owner-only generic routes:

```1115:1123:src/docsfy/main.py
@app.get("/api/projects/{name}")
async def get_project_details(request: Request, name: str) -> dict[str, Any]:
    name = _validate_project_name(name)
    if request.state.is_admin:
        variants = await list_variants(name)
    else:
        variants = await list_variants(name, owner=request.state.username)
    if not variants:
        raise HTTPException(status_code=404, detail=f"Project '{name}' not found")
```

```1158:1165:src/docsfy/main.py
@app.get("/api/projects/{name}/download")
async def download_project(request: Request, name: str) -> StreamingResponse:
    name = _validate_project_name(name)
    if request.state.is_admin:
        latest = await get_latest_variant(name)
    else:
        latest = await get_latest_variant(name, owner=request.state.username)
```

```1406:1418:src/docsfy/main.py
@app.get("/docs/{project}/{path:path}")
async def serve_docs(
    request: Request, project: str, path: str = "index.html"
) -> FileResponse:
    """Serve the most recently generated variant."""
    if not path or path == "/":
        path = "index.html"
    project = _validate_project_name(project)
    if request.state.is_admin:
        latest = await get_latest_variant(project)
    else:
        latest = await get_latest_variant(project, owner=request.state.username)
```

> **Tip:** For shared access, use variant-specific URLs (`/docs/{project}/{provider}/{model}/...` and `/api/projects/{name}/{provider}/{model}...`) because those routes resolve owner grants via `_resolve_project`.
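
The `list_projects` filter builds one OR branch per granted `(name, owner)` tuple; here is a runnable sketch of just that condition-building step (`build_filter` is a hypothetical helper, extracted for illustration):

```python
def build_filter(owner, accessible):
    """Build (where_clause, params) matching owned or explicitly granted projects."""
    conditions = ["(owner = ?)"]        # always include the user's own projects
    params = [owner]
    for proj_name, proj_owner in accessible:
        # One branch per granted (name, owner) pair, keeping grants owner-scoped
        conditions.append("(name = ? AND owner = ?)")
        params.extend([proj_name, proj_owner])
    return " OR ".join(conditions), params

clause, params = build_filter("bob", [("docs", "alice"), ("api", "carol")])
# clause: "(owner = ?) OR (name = ? AND owner = ?) OR (name = ? AND owner = ?)"
```

Parameterized placeholders keep the query safe even though the clause itself is assembled dynamically.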
## Revocation and Cleanup Semantics

Revocation is enforced at route level, not just UI hiding:

````2207:2228:test-plans/e2e-ui-test-plan.md
**Try accessing docs directly:**
```
agent-browser eval "fetch('/docs/for-testing-only/gemini/gemini-2.5-flash/index.html', {credentials:'same-origin'}).then(r => r.status)"
```

**Try accessing status page directly:**
```
agent-browser eval "fetch('/status/for-testing-only/gemini/gemini-2.5-flash', {credentials:'same-origin'}).then(r => r.status)"
```

**Try accessing download API directly:**
```
agent-browser eval "fetch('/api/projects/for-testing-only/gemini/gemini-2.5-flash/download', {credentials:'same-origin'}).then(r => r.status)"
```

**Check:** All direct URL accesses return 404, not just hidden from the dashboard.

**Expected result:**
- Docs endpoint returns `404`
- Status page endpoint returns `404`
- Download API endpoint returns `404`
- Revocation is enforced at the route level, not just UI level
````

`docsfy` also performs ACL cleanup when data is deleted:

```453:480:src/docsfy/storage.py
async def delete_project(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str | None = None
) -> bool:
    async with aiosqlite.connect(DB_PATH) as db:
        query = (
            "DELETE FROM projects WHERE name = ? AND ai_provider = ? AND ai_model = ?"
        )
        params: list[str] = [name, ai_provider, ai_model]
        if owner is not None:
            query += " AND owner = ?"
            params.append(owner)
        cursor = await db.execute(query, params)
        # Clean up project_access if no more variants remain for this name+owner
        if cursor.rowcount > 0 and owner is not None:
            remaining = await db.execute(
                "SELECT COUNT(*) FROM projects WHERE name = ? AND owner = ?",
                (name, owner),
            )
            row = await remaining.fetchone()
            if row and row[0] == 0:
                await db.execute(
                    "DELETE FROM project_access WHERE project_name = ? AND project_owner = ?",
                    (name, owner),
                )
        await db.commit()
        return cursor.rowcount > 0
```

```646:657:src/docsfy/storage.py
async def delete_user(username: str) -> bool:
    """Delete a user by username, invalidating all their sessions and cleaning up ACLs."""
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute("DELETE FROM sessions WHERE username = ?", (username,))
        # Clean up owned projects and their access entries
        await db.execute("DELETE FROM projects WHERE owner = ?", (username,))
        await db.execute(
            "DELETE FROM project_access WHERE project_owner = ?", (username,)
        )
        # Clean up ACL entries where user was granted access
        await db.execute("DELETE FROM project_access WHERE username = ?", (username,))
```

```446:470:tests/test_storage.py
async def test_delete_project_cleans_up_access(db_path: Path) -> None:
    from docsfy.storage import (
        delete_project,
        get_project_access,
        grant_project_access,
        save_project,
    )

    await save_project(
        name="cleanup-proj",
        repo_url="https://github.com/org/repo.git",
        ai_provider="claude",
        ai_model="opus",
        owner="testuser",
    )
    await grant_project_access("cleanup-proj", "alice", project_owner="testuser")

    # Delete the only variant
    await delete_project(
        "cleanup-proj", ai_provider="claude", ai_model="opus", owner="testuser"
    )

    # Access entries should be cleaned up
    users = await get_project_access("cleanup-proj", project_owner="testuser")
    assert len(users) == 0
```

## Viewer and Read-Only Behavior with Grants

Viewers can see assigned projects, but write operations remain blocked.

```185:191:src/docsfy/main.py
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```

```1481:1492:src/docsfy/templates/dashboard.html
{% if variant.status == 'ready' %}
View Docs
Download
{% if role != 'viewer' %}
{% endif %}
{% if role != 'viewer' %}
{{ regen_controls(variant, repo_name, default_provider, default_model, known_models) }}
{% endif %}
```

```668:700:tests/test_auth.py
async def test_viewer_sees_assigned_projects(_init_db: None) -> None:
    """A viewer with granted access should see assigned projects."""
    from docsfy.main import _generating, app
    from docsfy.storage import create_user, grant_project_access, save_project

    _generating.clear()
    _, viewer_key = await create_user("viewer-assigned", role="viewer")

    # Create a project owned by someone else
    await save_project(
        name="assigned-proj",
        repo_url="https://github.com/org/assigned.git",
        ai_provider="claude",
        ai_model="opus",
        owner="other-owner",
    )
    # Grant viewer access to the project (scoped by project owner)
    await grant_project_access(
        "assigned-proj", "viewer-assigned", project_owner="other-owner"
    )

    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
        headers={"Authorization": f"Bearer {viewer_key}"},
    ) as ac:
        response = await ac.get("/api/status")
        assert response.status_code == 200
        projects = response.json()["projects"]
        project_names = [p["name"] for p in projects]
        assert "assigned-proj" in project_names
```

## Required Configuration

`ADMIN_KEY` is mandatory and must be at least 16 characters.
```80:89:src/docsfy/main.py @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` ```16:22:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` ```6:8:docker-compose.yaml env_file: .env volumes: - ./data:/data ``` > **Warning:** Keep `SECURE_COOKIES=true` outside local HTTP development; admin APIs and grants are protected by authenticated sessions/bearer auth. ## Validation Coverage Automated tests are configured through `tox`: ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: api-key-rotation.md # API Key Rotation docsfy supports two API key rotation flows: - **Self-service rotation** for the currently authenticated user. - **Admin-initiated rotation** for any target user. In the UI, API keys are labeled as **Password**, but server-side auth and storage use API key semantics. > **Note:** Login uses `username` + `api_key`, and rotation responses return `new_api_key`. 
```163:167:src/docsfy/templates/login.html
<!-- API-key input rendered with a "Password" label (markup elided) -->
```

## Rotation Paths

| Path | Endpoint | Who can use it | `new_key` behavior | Session effect |
|---|---|---|---|---|
| Self-service | `POST /api/me/rotate-key` | Authenticated DB users (`admin`, `user`, `viewer`) | Optional; omit to auto-generate | All user sessions invalidated; current browser cookie cleared |
| Admin-initiated | `POST /api/admin/users/{username}/rotate-key` | Admin only | Optional; omit to auto-generate | All target user sessions invalidated |

### Self-Service Rotation

```1318:1353:src/docsfy/main.py
@app.post("/api/me/rotate-key")
async def rotate_own_key(request: Request) -> JSONResponse:
    """User rotates their own API key."""
    # Don't call _require_write_access -- viewers should be able to change their password
    if request.state.is_admin and not request.state.user:
        raise HTTPException(
            status_code=400,
            detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
        )
    body = await request.json()
    custom_key = body.get("new_key")  # Optional -- if provided, use it
    username = request.state.username
    try:
        new_key = await rotate_user_key(username, custom_key=custom_key)
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc)) from exc
    logger.info(f"[AUDIT] User '{username}' rotated their own API key")
    # Clear current session -- user must re-login with new key
    session_token = request.cookies.get("docsfy_session")
    if session_token:
        await delete_session(session_token)
    settings = get_settings()
    response = JSONResponse(
        content={"username": username, "new_api_key": new_key},
        headers={"Cache-Control": "no-store"},
    )
    response.delete_cookie(
        "docsfy_session",
        httponly=True,
        samesite="strict",
        secure=settings.secure_cookies,
    )
    return response
```

The dashboard calls this endpoint from the **Change Password** action:

```2432:2460:src/docsfy/templates/dashboard.html
async function rotateOwnKey() {
  var newKey = await modalPrompt('Change Password', 'Enter new password (min 16
characters), or leave empty to auto-generate:', 'Minimum 16 characters', '', 'password'); if (newKey === null) return; // cancelled var body = {}; if (newKey.trim()) { if (newKey.trim().length < 16) { await modalAlert('Invalid Password', 'Password must be at least 16 characters long.'); return; } body.new_key = newKey.trim(); } try { var resp = await fetch('/api/me/rotate-key', { method: 'POST', headers: {'Content-Type': 'application/json'}, credentials: 'same-origin', body: JSON.stringify(body), }); // ... await modalAlert('Password Changed', 'Your new password (save it now!):\n\n' + data.new_api_key + '\n\nYou will be redirected to login.'); window.location.href = '/login'; } catch (err) { await modalAlert('Error', 'Failed: ' + err.message); } } ``` > **Tip:** Leave `new_key` empty to let the server generate a strong random key (`docsfy_...`). > **Warning:** If you are authenticated via the `ADMIN_KEY` super-admin identity, self-service rotation is blocked. Rotate `ADMIN_KEY` in environment/config instead. 
### Admin-Initiated Rotation ```1356:1374:src/docsfy/main.py @app.post("/api/admin/users/{username}/rotate-key") async def admin_rotate_key(request: Request, username: str) -> JSONResponse: """Admin rotates a user's API key.""" _require_admin(request) body = await request.json() custom_key = body.get("new_key") try: new_key = await rotate_user_key(username, custom_key=custom_key) except ValueError as exc: detail = str(exc) status = 404 if "not found" in detail else 400 raise HTTPException(status_code=status, detail=detail) from exc logger.info( f"[AUDIT] Admin '{request.state.username}' rotated API key for user '{username}'" ) return JSONResponse( content={"username": username, "new_api_key": new_key}, headers={"Cache-Control": "no-store"}, ) ``` Admin UI trigger: ```584:602:src/docsfy/templates/admin.html var newKey = await modalPrompt("Change Password", "Enter new password for '" + username + "' (min 16 characters), or leave empty to auto-generate:", "Minimum 16 characters", "", "password"); if (newKey === null) return; var body = {}; if (newKey.trim()) { if (newKey.trim().length < 16) { showAlert('error', 'Password must be at least 16 characters long.'); return; } body.new_key = newKey.trim(); } fetch('/api/admin/users/' + encodeURIComponent(username) + '/rotate-key', { method: 'POST', headers: {'Content-Type': 'application/json'}, credentials: 'same-origin', redirect: 'error', body: JSON.stringify(body), }) ``` ## Validation Rules Server-side key validation is intentionally minimal and explicit: ```19:29:src/docsfy/storage.py MIN_KEY_LENGTH = 16 def validate_api_key(key: str) -> None: """Validate API key meets minimum requirements.""" if len(key) < MIN_KEY_LENGTH: msg = f"API key must be at least {MIN_KEY_LENGTH} characters long" raise ValueError(msg) ``` Startup validation for `ADMIN_KEY`: ```83:89:src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: 
logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` What this means in practice: - `new_key` is optional. - If provided, it must be **at least 16 characters**. - There is no additional server-side complexity/character-class validation. - Admin rotation for a missing user returns `404`. ## Session Invalidation Behavior Key rotation invalidates sessions in storage, then self-service rotation also clears the current browser cookie. ```724:743:src/docsfy/storage.py async def rotate_user_key(username: str, custom_key: str | None = None) -> str: """Generate or set a new API key for a user. Returns the raw new key.""" if custom_key: validate_api_key(custom_key) raw_key = custom_key else: raw_key = generate_api_key() key_hash = hash_api_key(raw_key) async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute( "UPDATE users SET api_key_hash = ? WHERE username = ?", (key_hash, username), ) if cursor.rowcount == 0: msg = f"User '{username}' not found" raise ValueError(msg) # Invalidate all existing sessions for this user await db.execute("DELETE FROM sessions WHERE username = ?", (username,)) await db.commit() return raw_key ``` Session and cookie settings relevant to post-rotation re-authentication: ```21:22:src/docsfy/storage.py SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 ``` ```297:304:src/docsfy/main.py response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` Outcome summary: - Old API key stops authenticating immediately. - Existing sessions for that user are removed from the database. - Self-service rotation removes the current `docsfy_session` cookie and forces re-login. - Admin-initiated rotation logs out the target user(s), not the acting admin. 
## Configuration ```1:2:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars ``` ```27:28:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` ```16:23:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` `docsfy` stores only HMAC hashes of API keys, not raw keys: ```588:601:src/docsfy/storage.py def hash_api_key(key: str, hmac_secret: str = "") -> str: """Hash an API key with HMAC-SHA256 for storage. Uses ADMIN_KEY as the HMAC secret so that even if the source is read, keys cannot be cracked without the environment secret. """ # NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will # invalidate all existing api_key_hash values, requiring all users to # regenerate their API keys. secret = hmac_secret or os.getenv("ADMIN_KEY", "") ``` > **Warning:** Rotating `ADMIN_KEY` changes the HMAC secret and invalidates all stored user key hashes. Plan a coordinated user key re-issuance. ## Verified Test Coverage Rotation behavior is covered by automated tests: ```709:745:tests/test_auth.py async def test_user_rotates_own_key(_init_db: None) -> None: """A user can rotate their own API key, invalidating the old one.""" # ... 
resp = await ac.post( "/api/me/rotate-key", cookies={"docsfy_session": cookie}, json={}, ) assert resp.status_code == 200 data = resp.json() assert "new_api_key" in data assert data["new_api_key"] != key # Old key should no longer work for login resp = await ac.post( "/login", data={"username": "rotatetest", "api_key": key}, follow_redirects=False, ) assert resp.status_code != 302 # login should fail ``` ```874:898:tests/test_auth.py async def test_reject_short_custom_key(_init_db: None) -> None: """A custom key shorter than 16 characters should be rejected.""" # ... resp = await ac.post( "/api/me/rotate-key", cookies={"docsfy_session": cookie}, json={"new_key": "short"}, ) assert resp.status_code == 400 assert "16 characters" in resp.json()["detail"] ``` ```770:775:tests/test_auth.py async def test_admin_rotates_nonexistent_user_key( admin_client: AsyncClient, ) -> None: """Admin rotating key for a non-existent user should return 404.""" resp = await admin_client.post("/api/admin/users/no-such-user/rotate-key", json={}) assert resp.status_code == 404 ``` Repository test runner config: ```5:7:tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: security-controls.md # Security Controls `docsfy` implements layered controls around repository input, filesystem access, rendered HTML, and operational auditability. ## SSRF checks `/api/generate` applies two validation layers before cloning remote repositories: 1. **Schema-level URL validation** (`GenerateRequest`) limits accepted formats to Git-style HTTPS/SSH URLs. 2. **Runtime SSRF guard** (`_reject_private_url`) blocks localhost/private targets, including DNS names that resolve to private IPs. 
```python # src/docsfy/models.py @field_validator("repo_url") @classmethod def validate_repo_url(cls, v: str | None) -> str | None: if v is None: return v https_pattern = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$" ssh_pattern = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$" if not re.match(https_pattern, v) and not re.match(ssh_pattern, v): msg = f"Invalid git repository URL: '{v}'" raise ValueError(msg) return v ``` ```python # src/docsfy/main.py if gen_request.repo_url: await _reject_private_url(gen_request.repo_url) # ... if hostname in ("localhost", "127.0.0.1", "::1", "0.0.0.0"): raise HTTPException( status_code=400, detail="Repository URL must not target localhost or private networks", ) # Check if hostname is an IP address in private range try: addr = ipaddress.ip_address(hostname) if not addr.is_global: raise HTTPException( status_code=400, detail="Repository URL must not target localhost or private networks", ) except ValueError: # hostname is a DNS name - resolve and check resolved = await loop.run_in_executor( None, socket.getaddrinfo, hostname, None, socket.AF_UNSPEC, socket.SOCK_STREAM ) for _family, _socktype, _proto, _canonname, sockaddr in resolved: ip_str = sockaddr[0] addr = ipaddress.ip_address(ip_str) if not addr.is_global: raise HTTPException( status_code=400, detail="Repository URL resolves to a private network address", ) ``` Test coverage includes explicit SSRF assertions: ```python # tests/test_main.py with pytest.raises(HTTPException) as exc_info: await _reject_private_url("https://evil.com/org/repo") assert exc_info.value.status_code == 400 response = await client.post( "/api/generate", json={"repo_url": "https://localhost/org/repo.git"}, ) assert response.status_code in (400, 422) ``` > **Note:** `_reject_private_url` is intentionally described in-code as **basic SSRF mitigation**; deeper controls (for example, DNS rebinding defenses) are expected at network/firewall layers. 
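As a rough illustration of the literal-host cases that guard covers, the checks can be reproduced with the standard library alone. `is_private_target` is a hypothetical helper that deliberately skips the DNS-resolution step shown above, so it is a sketch of the first half of the guard only:

```python
import ipaddress
from urllib.parse import urlparse


def is_private_target(repo_url: str) -> bool:
    """Return True if the URL's host is a literal localhost/private address.

    Sketch only: DNS names fall through (returning False) -- the real guard
    resolves them and checks every returned address as well.
    """
    host = urlparse(repo_url).hostname or ""
    if host in ("localhost", "0.0.0.0"):
        return True
    try:
        # is_global is False for loopback, RFC 1918, link-local, ULA, etc.
        return not ipaddress.ip_address(host).is_global
    except ValueError:
        return False  # not an IP literal; needs DNS resolution to judge
```

Using `is_global` (rather than only `is_private`) is what also catches loopback and link-local ranges in one check.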
--- ## Path traversal protections Path safety is enforced at multiple points, not just at route parsing. ### 1) Route/project identifier validation Project names are constrained to alphanumeric + `.` `_` `-` patterns. ```python # src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") return name ``` ### 2) Filesystem segment validation for project paths `owner`, `ai_provider`, and `ai_model` path segments are rejected if they contain traversal markers. ```python # src/docsfy/storage.py def _validate_owner(owner: str) -> str: """Validate owner segment to prevent path traversal.""" if not owner: return "_default" if "/" in owner or "\\" in owner or ".." in owner or owner.startswith("."): msg = f"Invalid owner: '{owner}'" raise ValueError(msg) return owner def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: # Sanitize path segments to prevent traversal for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]: if ( "/" in segment or "\\" in segment or ".." in segment or segment.startswith(".") ): msg = f"Invalid {segment_name}: '{segment}'" raise ValueError(msg) safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model ``` ### 3) Canonical path boundary checks when serving docs Even with validated project names, requested file paths are resolved and forced to stay inside `site_dir`. 
```python
# src/docsfy/main.py
file_path = site_dir / path
try:
    file_path.resolve().relative_to(site_dir.resolve())
except ValueError as exc:
    raise HTTPException(status_code=403, detail="Access denied") from exc
if not file_path.exists() or not file_path.is_file():
    raise HTTPException(status_code=404, detail="File not found")
return FileResponse(file_path)
```

### 4) Slug validation before cache/file writes and deletes

Generation/render steps reject or skip path-unsafe slugs.

```python
# src/docsfy/generator.py
if "/" in slug or "\\" in slug or slug.startswith(".") or ".." in slug:
    msg = f"Invalid page slug: '{slug}'"
    raise ValueError(msg)

# src/docsfy/renderer.py
for slug, content in pages.items():
    if "/" in slug or "\\" in slug or slug.startswith(".") or ".." in slug:
        logger.warning(f"Skipping invalid slug: {slug}")
    else:
        valid_pages[slug] = content
```

```python
# src/docsfy/main.py
if (
    "/" in slug
    or "\\" in slug
    or ".." in slug
    or slug.startswith(".")
):
    logger.warning(
        f"[{project_name}] Skipping invalid slug from incremental planner: {slug}"
    )
    continue
cache_file = cache_dir / f"{slug}.md"
try:
    cache_file.resolve().relative_to(cache_dir.resolve())
except ValueError:
    logger.warning(f"[{project_name}] Path traversal attempt in slug: {slug}")
    continue
```

---

## HTML sanitization

AI-generated markdown is converted to HTML and then sanitized before rendering.

### Sanitization behavior

- Removes `<script>` tags together with their content
- Removes `iframe`, `object`, `embed`, and `form` tags
- Strips inline `on*` event-handler attributes
- Allowlists `href`/`src` URL schemes, rewriting disallowed schemes to `#`

```python
# src/docsfy/renderer.py
# Remove script tags with their content
html = re.sub(
    r"<script[^>]*>.*?</script>", "", html, flags=re.DOTALL | re.IGNORECASE
)
# Remove iframe, object, embed, form tags
for tag in ["iframe", "object", "embed", "form"]:
    html = re.sub(
        rf"<{tag}[^>]*>.*?</{tag}>", "", html, flags=re.DOTALL | re.IGNORECASE
    )
    html = re.sub(rf"<{tag}[^>]*/>", "", html, flags=re.IGNORECASE)
# Remove event handler attributes
html = re.sub(r'\s+on\w+\s*=\s*["\'][^"\']*["\']', "", html, flags=re.IGNORECASE)
html = re.sub(r"\s+on\w+\s*=\s*\S+", "", html, flags=re.IGNORECASE)
# href/src allowlist; block non-allowed schemes by rewriting to "#"
# ...
```

```python
# src/docsfy/renderer.py
def _md_to_html(md_text: str) -> tuple[str, str]:
    md = markdown.Markdown(
        extensions=["fenced_code", "codehilite", "tables", "toc"],
        extension_configs={
            "codehilite": {"css_class": "highlight", "guess_lang": False},
            "toc": {"toc_depth": "2-3"},
        },
    )
    content_html = _sanitize_html(md.convert(md_text))
    toc_html = getattr(md, "toc", "")
    return content_html, toc_html
```

`page` rendering uses `|safe` intentionally, after sanitization:

```html
{{ content | safe }}
```

Automated tests validate the sanitizer behavior:

```python
# tests/test_renderer.py
result = _sanitize_html('<a href="javascript:alert(1)">x</a>')
assert "javascript:" not in result

result = _sanitize_html('<img src="x" onerror="alert(1)">')
assert "onerror" not in result

content_html, _ = _md_to_html('# Title\n\n<script>alert(1)</script>\n\nSafe content.')
assert "<script>" not in content_html
```

> **Warning:** Sanitization is regex-based in `renderer.py`; keep dependencies and the sanitizer tests up to date, because browser HTML-parsing edge cases evolve over time.

---

## Audit logging points

Security-sensitive actions are logged with a consistent `[AUDIT]` prefix.
### Logged events | Area | Endpoint / action | Logged message pattern | |---|---|---| | Failed authentication | `POST /login` (invalid creds) | `"[AUDIT] Failed login attempt for username '...'"` | | User lifecycle | `POST /api/admin/users`, `DELETE /api/admin/users/{username}` | Admin actor + target username + role | | Access control changes | `POST /api/admin/projects/{name}/access`, `DELETE /api/admin/projects/{name}/access/{username}` | Admin actor + target user + project + owner scope | | Key rotation | `POST /api/me/rotate-key`, `POST /api/admin/users/{username}/rotate-key` | Actor + target username | ```python # src/docsfy/main.py safe_username = username.replace("\n", "").replace("\r", "")[:100] logger.info(f"[AUDIT] Failed login attempt for username '{safe_username}'") logger.info( f"[AUDIT] User '{request.state.username}' created user '{username}' with role '{role}'" ) logger.info(f"[AUDIT] User '{request.state.username}' deleted user '{username}'") logger.info( f"[AUDIT] Admin '{request.state.username}' granted '{username}' access to '{name}' (owner: '{project_owner}')" ) logger.info( f"[AUDIT] Admin '{request.state.username}' revoked '{username}' access to '{name}' (owner: '{project_owner}')" ) logger.info(f"[AUDIT] User '{username}' rotated their own API key") logger.info( f"[AUDIT] Admin '{request.state.username}' rotated API key for user '{username}'" ) ``` > **Tip:** Route `[AUDIT]` records to centralized logging/SIEM and alert on repeated failed logins, key rotations, and privilege/access changes. 
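For teams shipping these records downstream, both the log-injection guard shown above and `[AUDIT]` extraction are simple to reproduce. The helper names here are hypothetical, written as a minimal sketch for pre-SIEM filtering:

```python
import re

_AUDIT_RE = re.compile(r"\[AUDIT\]\s*(?P<event>.+)")


def sanitize_username_for_log(username: str) -> str:
    """Mirror docsfy's guard: strip CR/LF (log injection) and cap length."""
    return username.replace("\n", "").replace("\r", "")[:100]


def extract_audit_events(log_lines: list[str]) -> list[str]:
    """Pull the [AUDIT] payload out of a mixed application log stream."""
    events: list[str] = []
    for line in log_lines:
        match = _AUDIT_RE.search(line)
        if match:
            events.append(match.group("event"))
    return events
```

Stripping CR/LF before interpolation matters because a newline in an attacker-chosen username could otherwise forge additional log lines.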
--- ## Security-relevant configuration and pipeline checks ### Runtime configuration ```python # src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` ```env # .env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # Set to false for local HTTP development # SECURE_COOKIES=false ``` ### Pre-commit/CI security gates ```yaml # .pre-commit-config.yaml repos: - repo: https://github.com/pre-commit/pre-commit-hooks hooks: - id: detect-private-key - repo: https://github.com/Yelp/detect-secrets hooks: - id: detect-secrets - repo: https://github.com/gitleaks/gitleaks hooks: - id: gitleaks ``` ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```toml # .gitleaks.toml [extend] useDefault = true ``` > **Note:** No repository-hosted workflow files (`.github/workflows`, `.gitlab-ci.yml`, or `Jenkinsfile`) are present; these checks are configured for pre-commit and can be enforced by external CI orchestration. --- Source: api-authentication.md # Authentication Endpoints docsfy supports two authentication mechanisms: 1. **Bearer API key** (recommended for API clients) 2. **Session cookie** (`docsfy_session`, used by browser login flow) All routes are protected by middleware **except** `/login`, `/login/`, and `/health`. 
```108:115:src/docsfy/main.py # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) async def dispatch( self, request: Request, call_next: RequestResponseEndpoint ) -> Response: if request.url.path in self._PUBLIC_PATHS: return await call_next(request) ``` ## Endpoint Reference | Endpoint | Method | Auth Required | Purpose | Success Behavior | |---|---|---|---|---| | `/login` | `GET` | No | Render login page | `200` HTML | | `/login` | `POST` | No | Authenticate username + API key, create session | `302` redirect to `/`, sets `docsfy_session` cookie | | `/logout` | `GET` | Yes | Invalidate session and clear cookie | `302` redirect to `/login`, deletes `docsfy_session` cookie | | `/health` | `GET` | No | Liveness endpoint | `200` JSON | > **Tip:** For programmatic clients, use `/api/*` routes. Unauthenticated API calls return JSON `401`, while non-API paths redirect to `/login`. ```151:155:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` ## `POST /login` Details `POST /login` reads **form fields** (not JSON): `username` and `api_key`. ```157:167:src/docsfy/templates/login.html
<!-- login form posting "username" and "api_key" fields (markup elided) -->
```

Authentication logic:

- Admin login requires **both**:
  - `username == "admin"`
  - `api_key == ADMIN_KEY`
- User login requires:
  - `api_key` matches a stored user key
  - that key belongs to the submitted `username`

```283:305:src/docsfy/main.py
# Check admin -- username must be "admin" and key must match
if username == "admin" and api_key == settings.admin_key:
    is_admin = True
    authenticated = True
else:
    # Check user key -- verify username matches the key's owner
    user = await get_user_by_key(api_key)
    if user and user["username"] == username:
        authenticated = True
        is_admin = user.get("role") == "admin"

if authenticated:
    session_token = await create_session(username, is_admin=is_admin)
    response = RedirectResponse(url="/", status_code=302)
    response.set_cookie(
        "docsfy_session",
        session_token,
        httponly=True,
        samesite="strict",
        secure=settings.secure_cookies,
        max_age=SESSION_TTL_SECONDS,
    )
```

Failed login returns `401` with the login HTML and `"Invalid username or password"`.

## `GET /logout` Details

`GET /logout`:

1. Reads the `docsfy_session` cookie
2. Deletes the server-side session record
3. Deletes the cookie
4.
Redirects to `/login` ```317:331:src/docsfy/main.py @app.get("/logout") async def logout(request: Request) -> RedirectResponse: """Clear the session cookie, delete session from DB, and redirect to login.""" session_token = request.cookies.get("docsfy_session") if session_token: await delete_session(session_token) settings = get_settings() response = RedirectResponse(url="/login", status_code=302) response.delete_cookie( "docsfy_session", httponly=True, samesite="strict", secure=settings.secure_cookies, ) ``` ## Cookie and Session Behavior - Cookie name: `docsfy_session` - Cookie attributes on login: - `HttpOnly` - `SameSite=Strict` - `Secure` controlled by `secure_cookies` - `Max-Age=28800` (8 hours) - Session token is **opaque** and generated with `secrets.token_urlsafe(32)` - Database stores a **SHA-256 hash** of session token, not raw token - Session lookup enforces expiration (`expires_at > datetime('now')`) ```21:23:src/docsfy/storage.py SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 ``` ```681:710:src/docsfy/storage.py def _hash_session_token(token: str) -> str: """Hash a session token for storage.""" return hashlib.sha256(token.encode()).hexdigest() async def create_session( username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS ) -> str: """Create an opaque session token.""" token = secrets.token_urlsafe(32) token_hash = _hash_session_token(token) expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours) expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S") async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)", (token_hash, username, 1 if is_admin else 0, expires_str), ) async def get_session(token: str) -> dict[str, str | int | None] | None: """Look up a session. 
Returns None if expired or not found.""" token_hash = _hash_session_token(token) async with aiosqlite.connect(DB_PATH) as db: db.row_factory = aiosqlite.Row cursor = await db.execute( "SELECT * FROM sessions WHERE token = ? AND expires_at > datetime('now')", ``` > **Note:** Middleware checks `Authorization: Bearer ...` **before** checking `docsfy_session`. If both are present, Bearer token path is evaluated first. ```122:136:src/docsfy/main.py # 1. Check Authorization header (API clients) auth_header = request.headers.get("authorization", "") if auth_header.startswith("Bearer "): token = auth_header[7:] if token == settings.admin_key: is_admin = True username = "admin" else: user = await get_user_by_key(token) # 2. Check session cookie (browser) -- opaque session token if not user and not is_admin: session_token = request.cookies.get("docsfy_session") ``` ## API Client Auth Requirements For API clients, send: - `Authorization: Bearer ` Accepted tokens: - `ADMIN_KEY` (full admin access) - User API key (role-based access) Role gates: - `admin`, `user` => write endpoints allowed - `viewer` => read-only - Admin endpoints require admin privileges ```185:191:src/docsfy/main.py def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ``` ```1203:1207:src/docsfy/main.py def _require_admin(request: Request) -> None: """Raise 403 if the user is not an admin.""" if not request.state.is_admin: raise HTTPException(status_code=403, detail="Admin access required") ``` ## Configuration for Authentication `ADMIN_KEY` is mandatory and must be at least 16 characters. `SECURE_COOKIES` defaults to secure behavior. 
```1:2:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars ``` ```27:29:.env.example # Set to false for local HTTP development # SECURE_COOKIES=false ``` ```83:89:src/docsfy/main.py if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** `secure_cookies` defaults to `True`; on plain HTTP local development, browser session cookies may not be set/sent unless `SECURE_COOKIES=false` is configured. ## Code-Backed Client Examples Login via form and receive session cookie: ```101:111:tests/test_auth.py async def test_login_with_admin_key(unauthed_client: AsyncClient) -> None: """POST /login with the admin key should set a session cookie and redirect.""" response = await unauthed_client.post( "/login", data={"username": "admin", "api_key": TEST_ADMIN_KEY}, follow_redirects=False, ) assert response.status_code == 302 assert response.headers["location"] == "/" assert "docsfy_session" in response.cookies ``` Bearer auth for API access: ```157:179:tests/test_auth.py async def test_api_bearer_auth(admin_client: AsyncClient) -> None: """Requests with a valid Bearer token should succeed.""" response = await admin_client.get("/api/status") assert response.status_code == 200 assert "projects" in response.json() async def test_api_bearer_auth_user_key(_init_db: None) -> None: """Requests with a valid user Bearer token should succeed.""" from docsfy.main import _generating, app from docsfy.storage import create_user _generating.clear() _username, raw_key = await create_user("bob") ``` Unauthenticated API request behavior: ```87:93:tests/test_auth.py async def test_api_returns_401_when_unauthenticated( unauthed_client: AsyncClient, ) -> None: """API requests without auth should return 401.""" response = await 
unauthed_client.get("/api/status") assert response.status_code == 401 assert response.json()["detail"] == "Unauthorized" ``` Auth contract is continuously validated by the test suite executed via `tox`: ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` --- Source: api-generation.md # Generation Endpoints `docsfy` generation is asynchronous: `POST /api/generate` accepts a request, schedules background work, and returns immediately. You then poll status endpoints until the variant reaches `ready`, `error`, or `aborted`. > **Note:** Generation is scoped by **owner + project name + provider + model**. Two different users can generate the same repo/model combination without colliding. ## Endpoint Summary | Method | Path | Purpose | |---|---|---| | `POST` | `/api/generate` | Start generation for a repo variant | | `POST` | `/api/projects/{name}/{provider}/{model}/abort` | Abort an active generation for one variant | | `POST` | `/api/projects/{name}/abort` | Legacy abort endpoint (name-only matching) | | `GET` | `/api/status` | List visible projects + `known_models` for UI suggestions | | `GET` | `/api/projects/{name}/{provider}/{model}` | Poll a single variant’s detailed status | ## Auth and Write Permissions All `/api/*` endpoints require authentication. Generation and abort endpoints also require write access (`admin` or `user` role). 
```151:191:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ) ``` ## `POST /api/generate` ### Request Schema ```10:64:src/docsfy/models.py class GenerateRequest(BaseModel): repo_url: str | None = Field( default=None, description="Git repository URL (HTTPS or SSH)" ) repo_path: str | None = Field(default=None, description="Local git repository path") ai_provider: Literal["claude", "gemini", "cursor"] | None = None ai_model: str | None = None ai_cli_timeout: int | None = Field(default=None, gt=0) force: bool = Field( default=False, description="Force full regeneration, ignoring cache" ) @model_validator(mode="after") def validate_source(self) -> GenerateRequest: if not self.repo_url and not self.repo_path: msg = "Either 'repo_url' or 'repo_path' must be provided" raise ValueError(msg) if self.repo_url and self.repo_path: msg = "Provide either 'repo_url' or 'repo_path', not both" raise ValueError(msg) return self @field_validator("repo_url") @classmethod def validate_repo_url(cls, v: str | None) -> str | None: if v is None: return v https_pattern = r"^https?://[\w.\-]+/[\w.\-]+/[\w.\-]+(\.git)?$" ssh_pattern = r"^git@[\w.\-]+:[\w.\-]+/[\w.\-]+(\.git)?$" if not re.match(https_pattern, v) and not re.match(ssh_pattern, v): msg = f"Invalid git repository URL: '{v}'" raise ValueError(msg) return v @field_validator("repo_path") @classmethod def validate_repo_path(cls, v: str | None) -> str | None: if v is None: return v path = Path(v) if not path.is_absolute(): msg = "repo_path must be an absolute path" raise ValueError(msg) return v @property def project_name(self) -> 
str:
        if self.repo_url:
            name = self.repo_url.rstrip("/").split("/")[-1]
            if name.endswith(".git"):
                name = name[:-4]
            return name
        if self.repo_path:
            return Path(self.repo_path).resolve().name
        return "unknown"
```

### Field Behavior

| Field | Type | Required | Validation | Effective default |
|---|---|---|---|---|
| `repo_url` | `string \| null` | One of `repo_url` or `repo_path` is required | Must match HTTPS/HTTP or SSH git URL pattern | None |
| `repo_path` | `string \| null` | One of `repo_url` or `repo_path` is required | Must be an absolute path; endpoint also checks path exists and has `.git` | None |
| `ai_provider` | `claude \| gemini \| cursor \| null` | Optional | Literal enum in schema + server-side runtime check | `AI_PROVIDER` |
| `ai_model` | `string \| null` | Optional in body | Must be non-empty after fallback | `AI_MODEL` |
| `ai_cli_timeout` | `int \| null` | Optional | `> 0` | `AI_CLI_TIMEOUT` |
| `force` | `bool` | Optional | none | `false` |

### Actual request body shape (dashboard client)

```2043:2056:src/docsfy/templates/dashboard.html
var body = { repo_url: repoUrl, ai_provider: provider, force: force };
if (model) body.ai_model = model;
fetch('/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  credentials: 'same-origin',
  redirect: 'manual',
  body: JSON.stringify(body)
})
```

### Success response

```73:83:tests/test_main.py
async def test_generate_endpoint_starts_generation(client: AsyncClient) -> None:
    with patch("docsfy.main.asyncio.create_task") as mock_task:
        mock_task.side_effect = lambda coro: coro.close()
        response = await client.post(
            "/api/generate",
            json={"repo_url": "https://github.com/org/repo.git"},
        )
        assert response.status_code == 202
        body = response.json()
        assert body["project"] == "repo"
        assert body["status"] == "generating"
```

Response shape:

```json
{ "project": "<project-name>", "status": "generating" }
```

## Provider/Model Validation

Provider and model are resolved from the request first, then environment defaults:
```455:467:src/docsfy/main.py ai_provider = gen_request.ai_provider or settings.ai_provider ai_model = gen_request.ai_model or settings.ai_model project_name = gen_request.project_name owner = request.state.username if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) if not ai_model: raise HTTPException(status_code=400, detail="AI model must be specified.") ``` Supported providers are explicitly tested: ```14:17:tests/test_ai_client.py assert "claude" in PROVIDERS assert "gemini" in PROVIDERS assert "cursor" in PROVIDERS assert VALID_AI_PROVIDERS == frozenset({"claude", "gemini", "cursor"}) ``` > **Note:** Model names are **not** checked against a strict server-side allowlist at request time; any non-empty string can pass input validation. Real compatibility is verified later by AI CLI availability checks. ## Conflict and Error Responses ### `POST /api/generate` | HTTP | Condition | Typical detail | |---|---|---| | `202` | Accepted; generation queued | `{"project":"...","status":"generating"}` | | `400` | Runtime validation failure | Invalid provider, empty effective model, bad local repo path, SSRF-protected URL | | `401` | Missing/invalid auth for `/api/*` | `Unauthorized` | | `403` | Viewer role or non-admin using `repo_path` | `Write access required.` / `Local repo path access requires admin privileges` | | `409` | Same owner/name/provider/model already generating | `Variant 'name/provider/model' is already being generated` | | `422` | Pydantic schema validation failure | Invalid URL, both/neither `repo_url` and `repo_path`, relative `repo_path`, bad enum, timeout <= 0 | Examples verified in tests: ```68:71:tests/test_main.py async def test_generate_endpoint_invalid_url(client: AsyncClient) -> None: response = await client.post("/api/generate", json={"repo_url": "not-a-url"}) assert response.status_code == 422 ``` 
```129:145:tests/test_main.py async def test_generate_duplicate_variant(client: AsyncClient) -> None: """Test that generating the same variant twice returns 409.""" from docsfy.main import _generating # gen_key format now includes owner: "owner/name/provider/model" _generating["admin/repo/claude/opus"] = asyncio.create_task(asyncio.sleep(100)) try: response = await client.post( "/api/generate", json={ "repo_url": "https://github.com/org/repo.git", "ai_provider": "claude", "ai_model": "opus", }, ) assert response.status_code == 409 ``` ```268:275:tests/test_main.py async def test_generate_rejects_private_url(client: AsyncClient) -> None: """Test that SSRF protection rejects private/localhost URLs.""" response = await client.post( "/api/generate", json={"repo_url": "https://localhost/org/repo.git"}, ) # Should be rejected by URL validation (either Pydantic or SSRF check) assert response.status_code in (400, 422) ``` ### Abort Endpoints (`/api/projects/.../abort`) ```569:621:src/docsfy/main.py @app.post("/api/projects/{name}/abort") async def abort_generation(request: Request, name: str) -> dict[str, str]: """Abort generation for any variant of the given project name. Kept for backward compatibility. Finds the first active generation matching the project name. """ _require_write_access(request) name = _validate_project_name(name) # Find active generation keys matching this project name matching_keys = [ key for key in _generating if len(key.split("/", 3)) == 4 and key.split("/", 3)[1] == name ] if request.state.is_admin and len(matching_keys) > 1: distinct_owners = {key.split("/", 3)[0] for key in matching_keys} if len(distinct_owners) > 1: raise HTTPException( status_code=409, detail="Multiple owners found for this variant, please specify owner", ) ... except asyncio.TimeoutError as exc: logger.warning(f"[{name}] Abort requested but cancellation still in progress") raise HTTPException( status_code=409, detail=f"Abort still in progress for '{name}'. 
Please retry shortly.", ) from exc ``` ```642:699:src/docsfy/main.py @app.post("/api/projects/{name}/{provider}/{model}/abort") async def abort_variant( request: Request, name: str, provider: str, model: str ) -> dict[str, str]: _require_write_access(request) ... if not task: ... if not task: raise HTTPException( status_code=404, detail="No active generation for this variant", ) ... except asyncio.TimeoutError as exc: logger.warning( f"[{gen_key}] Abort requested but cancellation still in progress" ) raise HTTPException( status_code=409, detail=f"Abort still in progress for '{gen_key}'. Please retry shortly.", ) from exc ``` > **Warning:** The name-only abort endpoint is legacy and can become ambiguous for admins when multiple owners have active generations for the same project name. ## Async Failures and Status Polling `/api/generate` only validates/enqueues. Runtime failures are reflected later in project status. ```720:744:src/docsfy/main.py async def _run_generation( repo_url: str | None, repo_path: str | None, project_name: str, ai_provider: str, ai_model: str, ai_cli_timeout: int, force: bool = False, owner: str = "", ) -> None: gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}" try: cli_flags = ["--trust"] if ai_provider == "cursor" else None available, msg = await check_ai_cli_available( ai_provider, ai_model, cli_flags=cli_flags ) if not available: await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=msg, ) return ``` ```803:812:src/docsfy/main.py except Exception as exc: logger.error(f"Generation failed for {project_name}: {exc}") await update_project_status( project_name, ai_provider, ai_model, status="error", owner=owner, error_message=str(exc), ) ``` ```409:419:src/docsfy/main.py @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: if request.state.is_admin: projects = await list_projects() else: accessible = await 
get_user_accessible_projects(request.state.username) projects = await list_projects( owner=request.state.username, accessible=accessible ) known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` Status values used by generation records: ```17:17:src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` > **Tip:** Use `GET /api/status` during polling and consume `known_models` to drive provider-specific model suggestions in clients. ## Configuration (Provider/Model/Timeout) Environment defaults in `.env`: ```1:8:.env.example # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # AI Configuration AI_PROVIDER=claude # [1m] = 1 million token context window, this is a valid model identifier AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 ``` Application defaults when env vars are unset: ```16:22:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` `docker-compose` loads `.env` directly: ```1:8:docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` --- Source: api-projects-and-variants.md # Project and Variant Endpoints docsfy exposes both project-level and variant-level endpoints: - **Project-level** endpoints use `/{name}` and either list, delete, abort, or download across variants. - **Variant-level** endpoints use `/{name}/{provider}/{model}` and target one exact variant. > **Tip:** Prefer variant-level endpoints in automation; project-level endpoints can select by owner/time and may be ambiguous in multi-owner setups. 
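When building variant-level URLs in automation, percent-encode each path segment, since model identifiers such as `claude-opus-4-6[1m]` contain characters that are not path-safe. A minimal helper sketch (the function name is illustrative):

```python
from urllib.parse import quote


def variant_path(name: str, provider: str, model: str, action: str = "") -> str:
    """Build a variant-level API path, encoding every segment so that
    characters like '[', ']', or '/' in a model name cannot change the route.
    """
    segments = [quote(s, safe="") for s in (name, provider, model)]
    path = "/api/projects/" + "/".join(segments)
    return f"{path}/{action}" if action else path
```

This mirrors what the dashboard does in JavaScript with `encodeURIComponent` before calling the abort endpoint.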
## Authentication and Access All endpoints below are protected except `/login` and `/health`. API requests without auth return `401`, and write endpoints require `admin` or `user` role (viewers are read-only). ```105:155:src/docsfy/main.py class AuthMiddleware(BaseHTTPMiddleware): """Authenticate every request via Bearer token or session cookie.""" # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ... if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` ```185:191:src/docsfy/main.py def _require_write_access(request: Request) -> None: """Raise 403 if user is a viewer (read-only).""" if request.state.role not in ("admin", "user"): raise HTTPException( status_code=403, detail="Write access required.", ) ``` ## Endpoint Matrix | Operation | Project Endpoint | Variant Endpoint | Method | |---|---|---|---| | Status list | `/api/status` | — | `GET` | | Status page (HTML) | — | `/status/{name}/{provider}/{model}` | `GET` | | Details | `/api/projects/{name}` | `/api/projects/{name}/{provider}/{model}` | `GET` | | Delete | `/api/projects/{name}` | `/api/projects/{name}/{provider}/{model}` | `DELETE` | | Abort | `/api/projects/{name}/abort` (legacy) | `/api/projects/{name}/{provider}/{model}/abort` | `POST` | | Download | `/api/projects/{name}/download` | `/api/projects/{name}/{provider}/{model}/download` | `GET` | ## Variant Data Shape and Status Values Variant payloads map directly to the `projects` table columns. 
```57:73:src/docsfy/storage.py CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) ``` ```17:17:src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` ## Status Endpoints ### `GET /api/status` Returns: - `projects`: accessible variants - `known_models`: map of provider -> known ready models For non-admin users, this includes owned variants **plus granted-access variants**. ```409:419:src/docsfy/main.py @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: if request.state.is_admin: projects = await list_projects() else: accessible = await get_user_accessible_projects(request.state.username) projects = await list_projects( owner=request.state.username, accessible=accessible ) known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` ### `GET /status/{name}/{provider}/{model}` (HTML) Variant status UI page used by the dashboard/status flow. ```369:401:src/docsfy/main.py @app.get("/status/{name}/{provider}/{model}", response_class=HTMLResponse) async def project_status_page( request: Request, name: str, provider: str, model: str ) -> HTMLResponse: name = _validate_project_name(name) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) ... 
template = _jinja_env.get_template("status.html") html = template.render( project=project, plan_json=plan_json, total_pages=total_pages, known_models=known_models, default_provider=settings.ai_provider, default_model=settings.ai_model, ) return HTMLResponse(content=html) ``` ## Details Endpoints ### `GET /api/projects/{name}` Returns `{ "name": "...", "variants": [...] }`. - Admin: all owners’ variants for that name. - Non-admin: only variants owned by `request.state.username`. ### `GET /api/projects/{name}/{provider}/{model}` Returns one resolved variant object. ```1019:1124:src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}") async def get_variant_details( request: Request, name: str, provider: str, model: str, ) -> dict[str, str | int | None]: name = _validate_project_name(name) project = await _resolve_project( request, name, ai_provider=provider, ai_model=model ) return project ... @app.get("/api/projects/{name}") async def get_project_details(request: Request, name: str) -> dict[str, Any]: name = _validate_project_name(name) if request.state.is_admin: variants = await list_variants(name) else: variants = await list_variants(name, owner=request.state.username) if not variants: raise HTTPException(status_code=404, detail=f"Project '{name}' not found") return {"name": name, "variants": variants} ``` > **Warning:** Variant resolution for admin can return `409` when the same `{name}/{provider}/{model}` exists under multiple owners. ```231:246:src/docsfy/main.py # 2. 
For admin, disambiguate by owner if request.state.is_admin: all_variants = await list_variants(name) matching = [ v for v in all_variants if v.get("ai_provider") == ai_provider and v.get("ai_model") == ai_model ] if not matching: raise HTTPException(status_code=404, detail="Not found") distinct_owners = {str(v.get("owner", "")) for v in matching} if len(distinct_owners) > 1: raise HTTPException( status_code=409, detail="Multiple owners found for this variant, please specify owner", ) ``` ## Deletion Endpoints ### `DELETE /api/projects/{name}/{provider}/{model}` - Requires write access. - Rejects deletion with `409` if generation is active for that variant. - Deletes DB record and variant directory. ### `DELETE /api/projects/{name}` - Requires write access. - Rejects with `409` if any variant with that project name is still generating. - Admin deletes **all** variants for that name (across owners); non-admin deletes only own variants. ```1034:1071:src/docsfy/main.py @app.delete("/api/projects/{name}/{provider}/{model}") async def delete_variant( request: Request, name: str, provider: str, model: str, ) -> dict[str, str]: _require_write_access(request) name = _validate_project_name(name) # Check for active generation (scan all keys) for key in _generating: ... raise HTTPException( status_code=409, detail=f"Cannot delete '{name}/{provider}/{model}' while generation is in progress. Abort first.", ) ... return {"deleted": f"{name}/{provider}/{model}"} ``` ```1127:1155:src/docsfy/main.py @app.delete("/api/projects/{name}") async def delete_project_endpoint(request: Request, name: str) -> dict[str, str]: _require_write_access(request) name = _validate_project_name(name) ... if request.state.is_admin: variants = await list_variants(name) else: variants = await list_variants(name, owner=request.state.username) ... 
return {"deleted": name} ``` ## Abort Endpoints ### `POST /api/projects/{name}/abort` (legacy) Backwards-compatible endpoint that aborts the first active generation matching project name. ### `POST /api/projects/{name}/{provider}/{model}/abort` Variant-specific abort endpoint. Both endpoints: - Require write access. - Return `404` if no active generation. - Can return `409` if cancellation is still in progress. - Update status to `aborted` with `error_message="Generation aborted by user"`. ```569:639:src/docsfy/main.py @app.post("/api/projects/{name}/abort") async def abort_generation(request: Request, name: str) -> dict[str, str]: """Abort generation for any variant of the given project name. ... _require_write_access(request) ... if not task or not matching_key: raise HTTPException( status_code=404, detail=f"No active generation for '{name}'" ) ... await update_project_status( name, ai_provider, ai_model, status="aborted", owner=key_owner, error_message="Generation aborted by user", current_stage=None, ) ... return {"aborted": name} ``` ```642:717:src/docsfy/main.py @app.post("/api/projects/{name}/{provider}/{model}/abort") async def abort_variant( request: Request, name: str, provider: str, model: str ) -> dict[str, str]: _require_write_access(request) ... if not task: ... if not task: raise HTTPException( status_code=404, detail="No active generation for this variant", ) ... return {"aborted": f"{name}/{provider}/{model}"} ``` UI integration example (URL-encoding each path segment): ```2162:2176:src/docsfy/templates/dashboard.html document.addEventListener('click', async function(e) { var abortBtn = e.target.closest('[data-abort-variant]'); if (!abortBtn) return; var composite = abortBtn.getAttribute('data-abort-variant'); // composite is "name/provider/model" var parts = composite.split('/'); var name = parts[0]; var provider = parts[1]; var model = parts.slice(2).join('/'); ... 
fetch('/api/projects/' + encodeURIComponent(name) + '/' + encodeURIComponent(provider) + '/' + encodeURIComponent(model) + '/abort', { method: 'POST', credentials: 'same-origin', redirect: 'manual' }) ``` ## Download Endpoints ### `GET /api/projects/{name}/{provider}/{model}/download` - Requires variant to be `ready`, else `400 "Variant not ready"`. - Streams `application/gzip`. - Filename: `{name}-{provider}-{model}-docs.tar.gz`. ### `GET /api/projects/{name}/download` - Selects latest ready variant (`last_generated DESC`). - Streams `application/gzip`. - Filename: `{name}-docs.tar.gz`. ```1074:1112:src/docsfy/main.py @app.get("/api/projects/{name}/{provider}/{model}/download") async def download_variant( request: Request, name: str, provider: str, model: str, ) -> StreamingResponse: ... if project["status"] != "ready": raise HTTPException(status_code=400, detail="Variant not ready") ... return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={ "Content-Disposition": f'attachment; filename="{name}-{provider}-{model}-docs.tar.gz"' }, ) ``` ```1158:1194:src/docsfy/main.py @app.get("/api/projects/{name}/download") async def download_project(request: Request, name: str) -> StreamingResponse: ... if request.state.is_admin: latest = await get_latest_variant(name) else: latest = await get_latest_variant(name, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail=f"No ready variant for '{name}'") ... 
return StreamingResponse( _stream_and_cleanup(), media_type="application/gzip", headers={"Content-Disposition": f'attachment; filename="{name}-docs.tar.gz"'}, ) ``` Integration test coverage confirms both download routes return gzip content: ```138:146:tests/test_integration.py # Download via variant-specific route response = await client.get("/api/projects/test-repo/claude/opus/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" # Download via latest-variant route response = await client.get("/api/projects/test-repo/download") assert response.status_code == 200 assert response.headers["content-type"] == "application/gzip" ``` ## Validation and Common Error Cases Project name is validated before project/variant operations: ```73:77:src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") return name ``` Common errors: - `400`: invalid project name; variant download attempted before ready. - `401`: missing/invalid API auth for `/api/*`. - `403`: write action by `viewer`. - `404`: not found/not accessible/no active generation. - `409`: delete while generating; admin owner ambiguity; abort still cancelling. 
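The project-name rule above can be mirrored as a standalone predicate to see which names pass. This is a sketch of the documented regex, not the server function:

```python
import re

# Same pattern as _validate_project_name: leading alphanumeric, then
# alphanumerics, dots, underscores, or hyphens. '/' is never allowed and the
# first character cannot be '.', so traversal segments like '../' are rejected.
_PROJECT_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$")


def is_valid_project_name(name: str) -> bool:
    return bool(_PROJECT_NAME_RE.match(name))
```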
## Relevant Configuration Snippets

Auth/runtime settings affecting these endpoints:

```1:8:.env.example
# REQUIRED - Admin key for user management (minimum 16 characters)
ADMIN_KEY=your-secure-admin-key-here-min-16-chars

# AI Configuration
AI_PROVIDER=claude
# [1m] = 1 million token context window, this is a valid model identifier
AI_MODEL=claude-opus-4-6[1m]
AI_CLI_TIMEOUT=60
```

```27:28:.env.example
# Set to false for local HTTP development
# SECURE_COOKIES=false
```

Operational health check (separate from project status API):

```9:13:docker-compose.yaml
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

Test runner config used for endpoint coverage:

```1:7:tox.toml
skipsdist = true
envlist = ["unittests"]

[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

> **Note:** `/api/projects/{name}/abort` is intentionally retained for backward compatibility; new clients should prefer `/api/projects/{name}/{provider}/{model}/abort`.

---

Source: api-admin.md

# Admin Endpoints

`docsfy` provides admin-only APIs for:

- user lifecycle management
- project access grants/revocations
- API key rotation (user keys via API, `ADMIN_KEY` via config)

Core route implementations live in `src/docsfy/main.py`, with persistence and validation in `src/docsfy/storage.py`.

## Authentication and Required Configuration

Admin routes require `request.state.is_admin`. Middleware sets this when auth is one of:

- `Authorization: Bearer <ADMIN_KEY>`
- `Authorization: Bearer <user-api-key>` where the DB user role is `admin`
- a valid admin `docsfy_session` cookie

> **Note:** Unauthenticated `/api/*` calls return `401` with `{"detail":"Unauthorized"}`; authenticated non-admin calls to admin routes return `403` with `{"detail":"Admin access required"}`.
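That 401/403 contract can be captured as a tiny decision function — a sketch of the documented status codes, not the actual middleware:

```python
def expected_admin_route_status(authenticated: bool, is_admin: bool) -> int:
    """Expected HTTP outcome for a call to an /api/admin/* route."""
    if not authenticated:
        return 401  # {"detail": "Unauthorized"}
    if not is_admin:
        return 403  # {"detail": "Admin access required"}
    return 200  # handler runs
```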
Environment configuration from `.env.example`: ```bash # REQUIRED - Admin key for user management (minimum 16 characters) ADMIN_KEY=your-secure-admin-key-here-min-16-chars # Set to false for local HTTP development # SECURE_COOKIES=false ``` Container runtime wiring from `docker-compose.yaml`: ```yaml services: docsfy: env_file: .env volumes: - ./data:/data ``` ## Endpoint Index | Method | Path | Purpose | |---|---|---| | `GET` | `/admin` | Admin UI page (HTML) | | `POST` | `/api/admin/users` | Create user (returns generated API key once) | | `GET` | `/api/admin/users` | List users | | `DELETE` | `/api/admin/users/{username}` | Delete user | | `POST` | `/api/admin/projects/{name}/access` | Grant project access | | `GET` | `/api/admin/projects/{name}/access` | List project access | | `DELETE` | `/api/admin/projects/{name}/access/{username}` | Revoke project access | | `POST` | `/api/admin/users/{username}/rotate-key` | Admin rotates a user key | | `POST` | `/api/me/rotate-key` | Logged-in user rotates own key | ## User CRUD ### Create User: `POST /api/admin/users` Request JSON: - `username` (required) - `role` (optional, defaults to `user`; allowed: `admin`, `user`, `viewer`) Actual request code from `src/docsfy/templates/admin.html`: ```javascript const resp = await fetch("/api/admin/users", { method: "POST", headers: {"Content-Type": "application/json"}, credentials: "same-origin", redirect: "error", body: JSON.stringify({username: username, role: role}) }); ``` Actual success response from `src/docsfy/main.py`: ```python return JSONResponse( content={"username": username, "api_key": raw_key, "role": role}, headers={"Cache-Control": "no-store"}, ) ``` Validation behavior: - username `admin` is reserved (case-insensitive) - username regex: `^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$` - invalid role -> `400` - missing username -> `400` - DB insert failures (for example duplicate username) -> `400` ### List Users: `GET /api/admin/users` Returns: - `{"users": [...]}` Each row 
is selected as:

- `id`, `username`, `role`, `created_at`

`api_key_hash` is not returned.

### Delete User: `DELETE /api/admin/users/{username}`

Actual request code from `src/docsfy/templates/admin.html`:

```javascript
const resp = await fetch("/api/admin/users/" + encodeURIComponent(username), {
  method: "DELETE",
  credentials: "same-origin",
  redirect: "error",
});
```

Success response:

- `{"deleted":"<username>"}`

Guardrails and side effects:

- admin cannot delete their own account (`400`)
- storage cleanup deletes that user’s sessions, owned projects (DB rows), and ACL entries where they are owner or grantee

> **Note:** User management supports create/list/delete. There is no dedicated endpoint for renaming a user or changing a role in place.

## Access Grant/Revoke/List

Access is owner-scoped: grants are keyed by `project_name + project_owner + username`, so grants apply to all variants for that project name under that owner.

### Grant Access: `POST /api/admin/projects/{name}/access`

Request JSON:

- `username` (required)
- `owner` (required)

Route behavior:

- verifies user exists
- verifies project exists for that owner (`list_variants(name, owner=owner)`)
- inserts grant via `grant_project_access(...)`

Example from `test-plans/e2e-ui-test-plan.md`:

```javascript
fetch('/api/admin/projects/for-testing-only/access', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  credentials: 'same-origin',
  body: JSON.stringify({username: 'testviewer-e2e', owner: 'testuser-e2e'})
}).then(r => r.json()).then(d => JSON.stringify(d))
```

Success response shape:

- `{"granted":"<name>","username":"<username>","owner":"<owner>"}`

### List Access: `GET /api/admin/projects/{name}/access?owner=<owner>`

Example from `test-plans/e2e-ui-test-plan.md`:

```javascript
fetch('/api/admin/projects/for-testing-only/access?owner=testuser-e2e',
  {credentials:'same-origin'}).then(r => r.json())
```

Success response shape:

- `{"project":"<name>","owner":"<owner>","users":[...]}`

### Revoke Access: `DELETE /api/admin/projects/{name}/access/{username}?owner=<owner>`

Example from `test-plans/e2e-ui-test-plan.md`:

```javascript
fetch('/api/admin/projects/for-testing-only/access/testviewer-e2e?owner=testuser-e2e',
  {method:'DELETE', credentials:'same-origin'}).then(r => r.status)
```

Success response shape:

- `{"revoked":"<name>","username":"<username>"}`

> **Tip:** Always pass `owner` on revoke/list requests. The route reads the owner from query params and applies owner-scoped ACL operations.

## Key Rotation Operations

### Rotate Own Key: `POST /api/me/rotate-key`

Available to authenticated DB users (`admin`, `user`, `viewer`).

Request JSON:

- optional `new_key`
  - if omitted, the server generates a new key
  - if provided, minimum length is 16

Actual dashboard request from `src/docsfy/templates/dashboard.html`:

```javascript
var resp = await fetch('/api/me/rotate-key', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  credentials: 'same-origin',
  body: JSON.stringify(body),
});
```

Behavior:

- returns `{"username":"<username>","new_api_key":"<key>"}` with `Cache-Control: no-store`
- invalidates that user’s sessions
- deletes current `docsfy_session` cookie (forces re-login)

`ADMIN_KEY` super-admin sessions are explicitly rejected:

```python
if request.state.is_admin and not request.state.user:
    raise HTTPException(
        status_code=400,
        detail="ADMIN_KEY users cannot rotate keys. Change the ADMIN_KEY env var instead.",
    )
```

### Admin Rotate User Key: `POST /api/admin/users/{username}/rotate-key`

Admin-only endpoint to rotate another user’s key.
Request JSON:

- optional `new_key` (same minimum-length rule)

Actual admin panel request from `src/docsfy/templates/admin.html`:

```javascript
fetch('/api/admin/users/' + encodeURIComponent(username) + '/rotate-key', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  credentials: 'same-origin',
  redirect: 'error',
  body: JSON.stringify(body),
})
```

Behavior:

- success: `{"username":"<username>","new_api_key":"<key>"}` plus `Cache-Control: no-store`
- unknown user: `404`
- invalid custom key: `400`
- all sessions for the target user are invalidated by storage logic

### Rotating `ADMIN_KEY` Itself (Config Operation)

There is no API endpoint for rotating `ADMIN_KEY`; it is rotated by changing the environment configuration and restarting the service.

Startup guard from `src/docsfy/main.py`:

```python
if not settings.admin_key:
    logger.error("ADMIN_KEY environment variable is required")
    raise SystemExit(1)
if len(settings.admin_key) < 16:
    logger.error("ADMIN_KEY must be at least 16 characters long")
    raise SystemExit(1)
```

HMAC linkage in `src/docsfy/storage.py`:

```python
# NOTE: ADMIN_KEY is used as the HMAC secret. Rotating ADMIN_KEY will
# invalidate all existing api_key_hash values, requiring all users to
# regenerate their API keys.
secret = hmac_secret or os.getenv("ADMIN_KEY", "")
```

> **Warning:** Rotating `ADMIN_KEY` invalidates all existing DB user API keys. After restart, log in as `admin` with the new key and re-issue user keys (for example via `POST /api/admin/users/{username}/rotate-key`).

## Verification Notes

This repository currently has no `.github/workflows` directory.
Test automation entry point is `tox.toml`: ```toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` Relevant endpoint coverage is present in: - `tests/test_auth.py` (reserved username, self-delete guard, key rotation behavior) - `tests/test_storage.py` (ACL grant/revoke/list and cleanup behavior) - `test-plans/e2e-ui-test-plan.md` (end-to-end admin/access API usage examples) --- Source: api-doc-serving.md # Documentation Serving Routes `docsfy` serves generated documentation files through two authenticated `/docs` route patterns: | Route pattern | Purpose | Variant selection | |---|---|---| | `/docs/{project}/{provider}/{model}/{path}` | Serve a specific provider/model variant | Explicit (`provider` + `model`) | | `/docs/{project}/{path}` | Serve the most recently generated **ready** variant | Automatic (`last_generated DESC`, ready-only) | > **Warning:** Route declaration order matters. The variant-specific route must be registered before the generic `/docs/{project}/{path}` route, or variant URLs can be matched by the generic handler. ```1377:1435:src/docsfy/main.py # IMPORTANT: variant-specific route MUST be defined BEFORE the generic route # so FastAPI matches it first. 
@app.get("/docs/{project}/{provider}/{model}/{path:path}") async def serve_variant_docs( request: Request, project: str, provider: str, model: str, path: str = "index.html", ) -> FileResponse: if not path or path == "/": path = "index.html" project = _validate_project_name(project) proj = await _resolve_project( request, project, ai_provider=provider, ai_model=model ) proj_owner = str(proj.get("owner", "")) site_dir = get_project_site_dir(project, provider, model, proj_owner) file_path = site_dir / path try: file_path.resolve().relative_to(site_dir.resolve()) except ValueError as exc: raise HTTPException(status_code=403, detail="Access denied") from exc if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") return FileResponse(file_path) @app.get("/docs/{project}/{path:path}") async def serve_docs( request: Request, project: str, path: str = "index.html" ) -> FileResponse: """Serve the most recently generated variant.""" if not path or path == "/": path = "index.html" project = _validate_project_name(project) if request.state.is_admin: latest = await get_latest_variant(project) else: latest = await get_latest_variant(project, owner=request.state.username) if not latest: raise HTTPException(status_code=404, detail="No docs available") await _check_ownership(request, project, latest) latest_owner = str(latest.get("owner", "")) site_dir = get_project_site_dir( project, str(latest["ai_provider"]), str(latest["ai_model"]), latest_owner, ) file_path = site_dir / path try: file_path.resolve().relative_to(site_dir.resolve()) except ValueError as exc: raise HTTPException(status_code=403, detail="Access denied") from exc if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") return FileResponse(file_path) ``` ## Variant-Specific Serving (`/docs/{project}/{provider}/{model}/{path}`) This route serves files from an explicit variant directory. 
- Normalizes empty path or `/` to `index.html`. - Resolves variant with `_resolve_project(...)`. - Builds site directory with owner scoping (`get_project_site_dir(...)`). - Blocks path traversal with `resolve().relative_to(...)`. - Returns `404 File not found` if the file is missing. Variant resolution behavior: ```210:261:src/docsfy/main.py async def _resolve_project( request: Request, name: str, ai_provider: str, ai_model: str, ) -> dict[str, Any]: """Find a project variant, preferring the requesting user's owned copy. Raises 404 if not found or not accessible. """ # 1. Try owned by requesting user if not request.state.is_admin: proj = await get_project( name, ai_provider=ai_provider, ai_model=ai_model, owner=request.state.username, ) if proj: return proj # 2. For admin, disambiguate by owner if request.state.is_admin: all_variants = await list_variants(name) matching = [ v for v in all_variants if v.get("ai_provider") == ai_provider and v.get("ai_model") == ai_model ] if not matching: raise HTTPException(status_code=404, detail="Not found") distinct_owners = {str(v.get("owner", "")) for v in matching} if len(distinct_owners) > 1: raise HTTPException( status_code=409, detail="Multiple owners found for this variant, please specify owner", ) return matching[0] # 3. For non-admin, check granted access — find which owner granted access accessible = await get_user_accessible_projects(request.state.username) for proj_name, proj_owner in accessible: if proj_name == name and proj_owner: # Found a grant — look up this specific owner's variant proj = await get_project( name, ai_provider=ai_provider, ai_model=ai_model, owner=proj_owner ) if proj: return proj # 4. Not found raise HTTPException(status_code=404, detail="Not found") ``` > **Note:** Variant-specific serving can resolve owned variants and access-granted variants for non-admin users. ## Latest-Ready Serving (`/docs/{project}/{path}`) This route automatically picks the latest **ready** variant. 
- Uses `get_latest_variant(...)`. - Only considers `status='ready'`. - Orders by `last_generated DESC`. - Sets `last_generated` only when status becomes `ready`. ```295:330:src/docsfy/storage.py async def update_project_status( name: str, ai_provider: str, ai_model: str, status: str, owner: str | None = None, last_commit_sha: str | None = None, page_count: int | None = None, error_message: str | None = None, plan_json: str | None = None, current_stage: str | None | object = _UNSET, ) -> None: ... if status == "ready": fields.append("last_generated = CURRENT_TIMESTAMP") ... ``` ```552:569:src/docsfy/storage.py async def get_latest_variant( name: str, owner: str | None = None ) -> dict[str, str | int | None] | None: """Get the most recently generated ready variant for a repo.""" async with aiosqlite.connect(DB_PATH) as db: db.row_factory = aiosqlite.Row if owner is not None: cursor = await db.execute( "SELECT * FROM projects WHERE name = ? AND owner = ? AND status = 'ready' ORDER BY last_generated DESC LIMIT 1", (name, owner), ) else: cursor = await db.execute( "SELECT * FROM projects WHERE name = ? AND status = 'ready' ORDER BY last_generated DESC LIMIT 1", (name,), ) row = await cursor.fetchone() return dict(row) if row else None ``` Ordering is explicitly tested: ```378:392:tests/test_storage.py # Manually set last_generated to ensure deterministic ordering # (CURRENT_TIMESTAMP may resolve to the same second for both rows) async with aiosqlite.connect(DB_PATH) as db: await db.execute( "UPDATE projects SET last_generated = '2025-01-01 00:00:00' WHERE ai_provider = 'claude'" ) await db.execute( "UPDATE projects SET last_generated = '2025-01-02 00:00:00' WHERE ai_provider = 'gemini'" ) await db.commit() latest = await get_latest_variant("repo") assert latest is not None # gemini has a later last_generated timestamp assert latest["ai_provider"] == "gemini" ``` > **Warning:** For non-admin users, latest-route selection is owner-scoped (`owner=request.state.username`). 
If you rely on shared/access-granted projects, use the variant-specific `/docs/{project}/{provider}/{model}/...` route. ## What Files Are Served Under `/docs` The serving routes can return any generated file in the variant site directory, including HTML pages, markdown sources, search index JSON, LLM text files, and static assets. ```215:233:src/docsfy/renderer.py def render_site(plan: dict[str, Any], pages: dict[str, str], output_dir: Path) -> None: if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) # Prevent GitHub Pages from running Jekyll (output_dir / ".nojekyll").touch() ... if STATIC_DIR.exists(): for static_file in STATIC_DIR.iterdir(): if static_file.is_file(): shutil.copy2(static_file, assets_dir / static_file.name) ``` ```243:290:src/docsfy/renderer.py index_html = render_index(project_name, tagline, navigation, repo_url=repo_url) (output_dir / "index.html").write_text(index_html, encoding="utf-8") ... (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") search_index = _build_search_index(valid_pages, plan) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) # Generate llms.txt files llms_txt = _build_llms_txt(plan) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") llms_full_txt = _build_llms_full_txt(plan, valid_pages) (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` Templates link these files with relative paths, which are compatible with `/docs/...` static serving: ```8:10:src/docsfy/templates/page.html ``` ## Authentication and Safety Docs routes are protected by the same auth middleware as the rest of the app. - `/login` and `/health` are public. - Non-authenticated browser requests are redirected to `/login`. - Non-authenticated API requests return `401`. 
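The `resolve().relative_to(...)` guard used by both serving routes can be exercised in isolation. A minimal sketch (illustrative code, not part of docsfy):

```python
from pathlib import Path

def is_safe(site_dir: Path, requested: str) -> bool:
    """Return True only if the resolved path stays inside site_dir."""
    try:
        # resolve() normalizes ".." segments and symlinks before the containment check
        (site_dir / requested).resolve().relative_to(site_dir.resolve())
    except ValueError:
        return False
    return True

site = Path("/tmp/docs-site")
print(is_safe(site, "index.html"))        # True
print(is_safe(site, "../../etc/passwd"))  # False — escapes the site directory
```

A request path like `../../etc/passwd` resolves outside `site_dir`, so `relative_to` raises `ValueError` and the route returns `403` before any filesystem read.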
```108:115:src/docsfy/main.py # Paths that do not require authentication _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) async def dispatch( self, request: Request, call_next: RequestResponseEndpoint ) -> Response: if request.url.path in self._PUBLIC_PATHS: return await call_next(request) ``` ```151:155:src/docsfy/main.py if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) return RedirectResponse(url="/login", status_code=302) ``` Project and filesystem path safety checks: ```73:77:src/docsfy/main.py def _validate_project_name(name: str) -> str: """Validate project name to prevent path traversal.""" if not _re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", name): raise HTTPException(status_code=400, detail=f"Invalid project name: '{name}'") return name ``` ```1396:1402:src/docsfy/main.py file_path = site_dir / path try: file_path.resolve().relative_to(site_dir.resolve()) except ValueError as exc: raise HTTPException(status_code=403, detail="Access denied") from exc if not file_path.exists() or not file_path.is_file(): raise HTTPException(status_code=404, detail="File not found") ``` ## URL Construction for Provider/Model Provider/model values should be URL-encoded in links. The UI templates already do this. 
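Outside the templates, the same encoding can be applied when building `/docs` links programmatically. A sketch (the helper name is illustrative, not part of docsfy):

```python
from urllib.parse import quote

def variant_docs_url(project: str, provider: str, model: str, path: str = "") -> str:
    """Percent-encode each URL segment; safe='' also escapes '/' inside a segment."""
    segments = [quote(s, safe="") for s in (project, provider, model)]
    return "/docs/" + "/".join(segments) + "/" + path

print(variant_docs_url("my-repo", "claude", "claude-opus-4-6[1m]"))
# → /docs/my-repo/claude/claude-opus-4-6%5B1m%5D/
```

The brackets in `claude-opus-4-6[1m]` become `%5B`/`%5D`, matching what `encodeURIComponent` produces in the UI templates.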
```1483:1484:src/docsfy/templates/dashboard.html View Docs Download ``` ```1188:1190:src/docsfy/templates/status.html var viewBtn = document.createElement('a'); viewBtn.href = '/docs/' + encodeURIComponent(PROJECT_NAME) + '/' + encodeURIComponent(PROJECT_PROVIDER) + '/' + encodeURIComponent(PROJECT_MODEL) + '/'; viewBtn.target = '_blank'; ``` ```16:22:src/docsfy/config.py admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` > **Tip:** Keep route construction encoded, especially for model names containing characters like `[` and `]` (for example `claude-opus-4-6[1m]`). ## Verified Behavior and Ops Configuration Integration tests cover both serving paths: ```124:136:tests/test_integration.py # Check docs are served via variant-specific route response = await client.get("/docs/test-repo/claude/opus/index.html") assert response.status_code == 200 assert "test-repo" in response.text response = await client.get("/docs/test-repo/claude/opus/introduction.html") assert response.status_code == 200 assert "Welcome!" in response.text # Check docs are served via latest-variant route response = await client.get("/docs/test-repo/index.html") assert response.status_code == 200 assert "test-repo" in response.text ``` Deployment and test pipeline snippets relevant to `/docs` serving: ```1:13:docker-compose.yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ```1:7:tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** No repository-level GitHub/GitLab/Jenkins workflow files are present; automated validation in this repo is defined via local/CI-friendly tooling (`tox`, `pytest`, `pre-commit`) plus container health checks. --- Source: api-health-status.md # Health and Status Endpoints Use these two endpoints for runtime health checks and UI state refresh: - `GET /health`: service liveness check - `GET /api/status`: authenticated project status feed for dashboard polling ## `GET /health` `/health` is a public endpoint and returns a minimal JSON payload. From `src/docsfy/main.py`: ```python @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` From `tests/test_auth.py`: ```python async def test_health_is_public(unauthed_client: AsyncClient) -> None: """The /health endpoint should be accessible without authentication.""" response = await unauthed_client.get("/health") assert response.status_code == 200 assert response.json()["status"] == "ok" ``` From `src/docsfy/main.py` (auth middleware public paths): ```python _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ``` > **Note:** `/health` is intentionally lightweight and does not require login, Bearer token, or session cookie. 
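A monitor outside Docker can apply the same success criteria as the `curl -f` probes. A minimal check helper (a sketch, not part of the codebase; it is slightly stricter than `curl -f`, which only checks the status code):

```python
import json

def health_ok(status_code: int, body: bytes) -> bool:
    """Treat anything other than HTTP 200 + {"status": "ok"} as unhealthy."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Non-JSON or non-object payloads are also unhealthy
        return False

print(health_ok(200, b'{"status": "ok"}'))  # True
print(health_ok(503, b""))                  # False
```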
### Service-check configuration in this repository From `Dockerfile`: ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ``` From `docker-compose.yaml`: ```yaml healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` From `src/docsfy/main.py` (startup requirement): ```python if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) ``` > **Warning:** `/health` only confirms the app process/router is responding. It does not validate DB contents, generation state, or AI CLI availability. --- ## `GET /api/status` `/api/status` powers dashboard updates. It is authenticated and returns both project rows and model metadata. From `src/docsfy/main.py`: ```python @app.get("/api/status") async def status(request: Request) -> dict[str, Any]: if request.state.is_admin: projects = await list_projects() else: accessible = await get_user_accessible_projects(request.state.username) projects = await list_projects( owner=request.state.username, accessible=accessible ) known_models = await get_known_models() return {"projects": projects, "known_models": known_models} ``` ### Authentication and access behavior From `src/docsfy/main.py` (API auth failure path): ```python if not user and not is_admin: # Not authenticated if request.url.path.startswith("/api/"): return JSONResponse(status_code=401, content={"detail": "Unauthorized"}) ``` From `tests/test_auth.py`: ```python response = await unauthed_client.get("/api/status") assert response.status_code == 401 assert response.json()["detail"] == "Unauthorized" ``` From `src/docsfy/storage.py` (non-admin filtering logic): ```python if owner is not None and accessible and len(accessible) > 0: # Build OR conditions for each (name, owner) pair conditions = 
["(owner = ?)"] params: list[str] = [owner] for proj_name, proj_owner in accessible: conditions.append("(name = ? AND owner = ?)") params.extend([proj_name, proj_owner]) query = f"SELECT * FROM projects WHERE {' OR '.join(conditions)} ORDER BY updated_at DESC" ``` From `tests/test_auth.py` (owner filtering is enforced): ```python response = await ac.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] assert len(projects) == 1 assert projects[0]["name"] == "alice-proj" ``` From `tests/test_auth.py` (granted viewer access is included): ```python response = await ac.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] project_names = [p["name"] for p in projects] assert "assigned-proj" in project_names ``` > **Warning:** `/api/status` is not a public health endpoint; unauthenticated calls return `401 {"detail":"Unauthorized"}`. ### Response structure `/api/status` returns: - `projects`: list of project variant rows (`SELECT * FROM projects`, ordered by `updated_at DESC`) - `known_models`: provider->models map derived from completed variants From `src/docsfy/storage.py` (project schema): ```sql CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) ``` From `src/docsfy/storage.py` (valid `status` values): ```python VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` From `src/docsfy/storage.py` (`known_models` population): ```python cursor = await db.execute( "SELECT DISTINCT ai_provider, ai_model FROM projects WHERE 
ai_provider != '' AND ai_model != '' AND status = 'ready' ORDER BY ai_provider, ai_model" ) ``` From `tests/test_main.py` (empty state behavior): ```python response = await client.get("/api/status") assert response.status_code == 200 assert response.json()["projects"] == [] ``` --- ## Dashboard Polling Contract (`/api/status`) The dashboard uses `/api/status` as a polling source for both coarse status refresh and fast progress updates. From `src/docsfy/templates/dashboard.html` (poll intervals): ```javascript statusPollInterval = setInterval(pollStatusChanges, 10000); progressPollInterval = setInterval(pollProgressUpdates, 5000); ``` From `src/docsfy/templates/dashboard.html` (status poll request + payload handling): ```javascript fetch('/api/status', { credentials: 'same-origin', redirect: 'manual' }) .then(function(res) { if (checkAuthRedirect(res)) return null; if (res.type === 'opaqueredirect') { checkAuthRedirect({ redirected: true, status: 302 }); return null; } return res.json(); }) .then(function(data) { if (!data) return; var projectsList = data.projects || data; if (!Array.isArray(projectsList)) return; // Update known models from the API so new models // appear in dropdowns without a full page reload. if (data.known_models) { knownModels = data.known_models; rebuildModelDropdownOptions(); } ``` From `src/docsfy/templates/dashboard.html` (progress calculations use `page_count` + `plan_json`): ```javascript var pageCount = proj.page_count || 0; var totalPages = 0; var parsedPlan = null; if (proj.plan_json) { if (typeof proj.plan_json === 'string') { try { parsedPlan = JSON.parse(proj.plan_json); } catch(e) { parsedPlan = null; } } else { parsedPlan = proj.plan_json; } } if (parsedPlan && parsedPlan.navigation) { parsedPlan.navigation.forEach(function(group) { totalPages += (group.pages || []).length; }); } ``` > **Tip:** For local HTTP development, disable secure cookies so browser polling can send the session cookie. 
From `.env.example`: ```env # Set to false for local HTTP development # SECURE_COOKIES=false ``` From `src/docsfy/config.py`: ```python secure_cookies: bool = True # Set to False for local HTTP dev ``` --- Source: deployment-topologies.md # Deployment Topologies `docsfy` supports three practical deployment modes with the same core runtime behavior: - **Local process** (single host, direct Python runtime) - **Containerized** (Docker image + Compose orchestration) - **OpenShift-style non-root runtime** (arbitrary UID, root-group writable paths) ## Shared Runtime Contract Regardless of topology, startup and storage behavior are consistent. ```python # src/docsfy/config.py class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python # src/docsfy/main.py @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) _generating.clear() await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` ```python # src/docsfy/storage.py DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" # ... DB_PATH.parent.mkdir(parents=True, exist_ok=True) PROJECTS_DIR.mkdir(parents=True, exist_ok=True) ``` > **Warning:** `ADMIN_KEY` is mandatory and must be at least 16 characters. 
The app exits at startup if it is missing or too short. > **Warning:** The process must be able to write to `DATA_DIR` (default `/data`) to create `docsfy.db` and project artifacts. --- ## Topology 1: Local Process Deployment Use this mode for development, single-user setups, or tightly controlled internal hosts. ### Runtime entry point ```toml # pyproject.toml [project.scripts] docsfy = "docsfy.main:run" ``` ```python # src/docsfy/main.py def run() -> None: import uvicorn reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` ### Configuration pattern ```bash # .env.example ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO # Set to false for local HTTP development # SECURE_COOKIES=false ``` ### Local deployment notes - Default bind is `127.0.0.1:8000`; set `HOST=0.0.0.0` only when you intentionally expose it. - `SECURE_COOKIES` defaults to `true`; for plain HTTP local testing, set `SECURE_COOKIES=false`. - Persistent state is filesystem-based (`DATA_DIR`, SQLite file, generated project/site outputs). - Generation checks AI CLI availability before work starts (`check_ai_cli_available` in `src/docsfy/main.py`), so provider CLIs must be installed and on `PATH` in local installs. > **Tip:** Keep local data isolated by setting `DATA_DIR` to a project-local folder during development. --- ## Topology 2: Containerized Deployment (Docker / Compose) Use this mode for reproducible packaging and host portability. ### Image characteristics ```dockerfile # Dockerfile FROM python:3.12-slim AS builder # ... RUN uv sync --frozen --no-dev FROM python:3.12-slim # ... 
RUN apt-get update && apt-get install -y --no-install-recommends \ bash \ git \ curl \ nodejs \ npm \ && rm -rf /var/lib/apt/lists/* ``` ```dockerfile # Dockerfile # Install Claude Code CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://claude.ai/install.sh | bash" # Install Cursor Agent CLI (installs to ~/.local/bin) RUN /bin/bash -o pipefail -c "curl -fsSL https://cursor.com/install | bash" # Configure npm for non-root global installs and install Gemini CLI RUN mkdir -p /home/appuser/.npm-global \ && npm config set prefix '/home/appuser/.npm-global' \ && npm install -g @google/gemini-cli ``` ```dockerfile # Dockerfile EXPOSE 8000 HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ### Compose topology in repo ```yaml # docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ### Container deployment notes - The container always serves on `0.0.0.0:8000` via `ENTRYPOINT`. - `/data` is the persistence boundary and should be mounted to durable storage. - Health is probe-ready through `/health` and both image-level and Compose-level health checks are defined. - Runtime includes AI CLIs plus `git`, matching generation dependencies. > **Note:** `docker-compose.yaml` already maps `./data` to `/data`, which aligns with the default `data_dir` in app settings. --- ## Topology 3: OpenShift-Style Non-Root Runtime This image explicitly encodes compatibility with restricted/container-security platforms that run with an arbitrary non-root UID. 
```dockerfile # Dockerfile # OpenShift runs containers as a random UID in the root group (GID 0) RUN useradd --create-home --shell /bin/bash -g 0 appuser \ && mkdir -p /data \ && chown appuser:0 /data \ && chmod -R g+w /data ``` ```dockerfile # Dockerfile # Make /app group-writable for OpenShift compatibility RUN chmod -R g+w /app # Directories need group write+execute for OpenShift's arbitrary UID (in GID 0) RUN find /home/appuser -type d -exec chmod g=u {} + \ && npm cache clean --force 2>/dev/null; \ rm -rf /home/appuser/.npm/_cacache USER appuser ENV PATH="/home/appuser/.local/bin:/home/appuser/.npm-global/bin:${PATH}" ENV HOME="/home/appuser" ``` ```dockerfile # Dockerfile # --no-sync prevents uv from attempting to modify the venv at runtime. # This is required for OpenShift where containers run as an arbitrary UID ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ### Why these settings matter - **Arbitrary UID support:** group-writable paths (`/app`, `/data`, home dirs) allow runtime writes without root. - **No passwd-entry dependency:** `HOME=/home/appuser` ensures tools can resolve user-home paths even with random UID. - **Read-only venv safety:** `uv run --no-sync` prevents runtime attempts to mutate `.venv`, which can fail under restricted permissions. - **Non-root execution:** final runtime user is `appuser`, not root. > **Warning:** Do not remove `--no-sync` from the container startup command in restricted non-root environments; runtime package sync/write attempts can fail. > **Warning:** Any mounted volume used for `/data` must permit group write compatible with GID `0` behavior. > **Note:** This repository does not include Kubernetes/OpenShift manifest files. Platform manifests should preserve the image contract above (non-root, writable `/data`, unchanged startup semantics). 
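As a starting point, a platform manifest that preserves that contract might look like the following. This is a **hypothetical** fragment (no manifests ship with the repo); all names and values are illustrative, only the contract itself (non-root, GID-0-writable `/data`, unchanged startup command, `/health` probe) is prescriptive:

```yaml
# Hypothetical Deployment fragment — not shipped with this repository.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 0              # volume writable by GID 0, matching `chown appuser:0 /data`
      containers:
        - name: docsfy
          image: docsfy:latest  # illustrative image reference
          ports:
            - containerPort: 8000
          envFrom:
            - secretRef:
                name: docsfy-env  # must provide ADMIN_KEY (>= 16 chars)
          volumeMounts:
            - name: data
              mountPath: /data
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: docsfy-data
```

Note that no `command`/`args` override is set, so the image `ENTRYPOINT` (with `--no-sync`) is preserved.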
--- ## Health, Auth, and Probe Behavior ```python # src/docsfy/main.py class AuthMiddleware(BaseHTTPMiddleware): _PUBLIC_PATHS = frozenset({"/login", "/login/", "/health"}) ``` ```python # src/docsfy/main.py @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` - `/health` is intentionally unauthenticated and suitable for liveness/readiness checks. - Most other routes are auth-protected by middleware. - The code explicitly expects edge-level protections (for example, login rate limiting) to be handled by reverse proxy/ingress when needed. --- ## CI/CD and Verification Inputs in This Repo No GitHub Actions or GitLab CI pipeline files are present in the repository root structure. Validation is defined through local/pipeline-friendly config: ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```yaml # .pre-commit-config.yaml ci: autofix_prs: false autoupdate_commit_msg: "ci: [pre-commit.ci] pre-commit autoupdate" repos: - repo: https://github.com/pre-commit/pre-commit-hooks - repo: https://github.com/PyCQA/flake8 - repo: https://github.com/Yelp/detect-secrets - repo: https://github.com/astral-sh/ruff-pre-commit - repo: https://github.com/gitleaks/gitleaks - repo: https://github.com/pre-commit/mirrors-mypy ``` > **Tip:** In external CI/CD systems, treat `tox` + pre-commit hooks as the minimum gate before publishing container images. --- Source: database-schema-and-migrations.md # Database Schema and Migrations Docsfy uses SQLite for all metadata, auth, ACL, and session state. The schema is managed in application code (not external migration files), and migrations run automatically at startup. 
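The startup-as-migration-trigger approach can be sketched generically. This is a simplified illustration of the pattern (not docsfy's actual `init_db`):

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    """Idempotent in-code migrations: safe to run on every startup."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL)"
    )
    try:
        # Backfill-style migration: a 'duplicate column name' error is the
        # signal that this migration already ran, so it is swallowed.
        conn.execute("ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'user'")
    except sqlite3.OperationalError as exc:
        if "duplicate column name" not in str(exc).lower():
            raise

conn = sqlite3.connect(":memory:")
init_db(conn)
init_db(conn)  # second run is a no-op, not an error
print([row[1] for row in conn.execute("PRAGMA table_info(users)")])
# → ['id', 'username', 'role']
```

Because every migration is written to tolerate re-execution, no version table or separate migration runner is needed.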
## Database location and startup flow ```python # src/docsfy/storage.py DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` ```python # src/docsfy/config.py class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python # src/docsfy/main.py @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() if not settings.admin_key: logger.error("ADMIN_KEY environment variable is required") raise SystemExit(1) if len(settings.admin_key) < 16: logger.error("ADMIN_KEY must be at least 16 characters long") raise SystemExit(1) _generating.clear() await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` ```yaml # docker-compose.yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` > **Tip:** Persist `/data` (or your custom `DATA_DIR`) across restarts. If storage is ephemeral, your DB and generated project metadata are lost. --- ## Schema overview Docsfy creates and maintains four tables in `init_db()`: - `projects` - `users` - `project_access` - `sessions` ### `projects` Stores generated documentation variants per project/owner/provider/model combination. 
```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS projects ( name TEXT NOT NULL, ai_provider TEXT NOT NULL DEFAULT '', ai_model TEXT NOT NULL DEFAULT '', owner TEXT NOT NULL DEFAULT '', repo_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'generating', current_stage TEXT, last_commit_sha TEXT, last_generated TEXT, page_count INTEGER DEFAULT 0, error_message TEXT, plan_json TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (name, ai_provider, ai_model, owner) ) """) ``` `status` is constrained in code (not DB enum): ```python # src/docsfy/storage.py VALID_STATUSES = frozenset({"generating", "ready", "error", "aborted"}) ``` Writes are idempotent via upsert on the composite PK: ```python # src/docsfy/storage.py await db.execute( """INSERT INTO projects (name, ai_provider, ai_model, owner, repo_url, status, updated_at) VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP) ON CONFLICT(name, ai_provider, ai_model, owner) DO UPDATE SET repo_url = excluded.repo_url, status = excluded.status, error_message = NULL, current_stage = NULL, updated_at = CURRENT_TIMESTAMP""", (name, ai_provider, ai_model, owner, repo_url, status), ) ``` ### `users` Stores API-key-authenticated users and role-based access. ```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS users ( id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE NOT NULL, api_key_hash TEXT NOT NULL UNIQUE, role TEXT NOT NULL DEFAULT 'user', created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) """) ``` API keys are not stored in plaintext; they are HMAC-hashed with `ADMIN_KEY`: ```python # src/docsfy/storage.py def hash_api_key(key: str, hmac_secret: str = "") -> str: """Hash an API key with HMAC-SHA256 for storage. Uses ADMIN_KEY as the HMAC secret so that even if the source is read, keys cannot be cracked without the environment secret. """ # NOTE: ADMIN_KEY is used as the HMAC secret. 
Rotating ADMIN_KEY will # invalidate all existing api_key_hash values, requiring all users to # regenerate their API keys. secret = hmac_secret or os.getenv("ADMIN_KEY", "") if not secret: msg = "ADMIN_KEY environment variable is required for key hashing" raise RuntimeError(msg) return hmac.new(secret.encode(), key.encode(), hashlib.sha256).hexdigest() ``` > **Note:** Username `admin` is reserved in the DB user model; environment-admin auth is handled separately via `ADMIN_KEY`. ### `project_access` Stores explicit ACL grants for sharing projects between users. ```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS project_access ( project_name TEXT NOT NULL, project_owner TEXT NOT NULL DEFAULT '', username TEXT NOT NULL, PRIMARY KEY (project_name, project_owner, username) ) """) ``` Grant semantics are project-level for a given owner (all variants under that name/owner): ```python # src/docsfy/storage.py async def grant_project_access( project_name: str, username: str, project_owner: str = "" ) -> None: """Grant a user access to all variants of a project.""" if not project_owner: msg = "project_owner is required for access grants" raise ValueError(msg) async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT OR IGNORE INTO project_access (project_name, project_owner, username) VALUES (?, ?, ?)", (project_name, project_owner, username), ) await db.commit() ``` ### `sessions` Stores browser session state (token, user, role flag, expiration). 
```python # src/docsfy/storage.py await db.execute(""" CREATE TABLE IF NOT EXISTS sessions ( token TEXT PRIMARY KEY, username TEXT NOT NULL, is_admin INTEGER NOT NULL DEFAULT 0, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, expires_at TIMESTAMP NOT NULL ) """) ``` Session tokens are opaque to clients and stored hashed in DB: ```python # src/docsfy/storage.py SESSION_TTL_SECONDS = 28800 # 8 hours SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600 def _hash_session_token(token: str) -> str: """Hash a session token for storage.""" return hashlib.sha256(token.encode()).hexdigest() async def create_session( username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS ) -> str: """Create an opaque session token.""" token = secrets.token_urlsafe(32) token_hash = _hash_session_token(token) expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours) expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S") async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)", (token_hash, username, 1 if is_admin else 0, expires_str), ) await db.commit() return token async def get_session(token: str) -> dict[str, str | int | None] | None: """Look up a session. Returns None if expired or not found.""" token_hash = _hash_session_token(token) async with aiosqlite.connect(DB_PATH) as db: db.row_factory = aiosqlite.Row cursor = await db.execute( "SELECT * FROM sessions WHERE token = ? AND expires_at > datetime('now')", (token_hash,), ) row = await cursor.fetchone() return dict(row) if row else None ``` And cookie max-age is aligned with session TTL: ```python # src/docsfy/main.py response.set_cookie( "docsfy_session", session_token, httponly=True, samesite="strict", secure=settings.secure_cookies, max_age=SESSION_TTL_SECONDS, ) ``` --- ## Built-in migration behavior Docsfy uses in-code, idempotent migrations inside `init_db()`. 
> **Note:** There is no migration version table and no separate migration runner. Startup is the migration trigger. ### 1) `projects` PK migration (legacy 3-column to 4-column owner-aware PK) Detection and migration are automatic: ```python # src/docsfy/storage.py # Migration: convert old 3-column PK table to 4-column PK (with owner) cursor = await db.execute("PRAGMA table_info(projects)") columns = await cursor.fetchall() col_names = [c[1] for c in columns] needs_pk_migration = False # Detect old schema: owner not in columns, or owner is nullable if "owner" not in col_names: needs_pk_migration = True elif "ai_provider" not in col_names: needs_pk_migration = True else: # Check if ai_provider is nullable (old schema) for col in columns: if col[1] == "ai_provider" and col[3] == 0: # notnull=0 means nullable needs_pk_migration = True break ``` ```python # src/docsfy/storage.py await db.execute(f""" INSERT OR IGNORE INTO projects_new (name, ai_provider, ai_model, owner, repo_url, status, current_stage, last_commit_sha, last_generated, page_count, error_message, plan_json, created_at, updated_at) SELECT {", ".join(select_cols)} FROM projects """) await db.execute("DROP TABLE projects") await db.execute("ALTER TABLE projects_new RENAME TO projects") ``` > **Warning:** This migration rewrites the table and drops the original after copy. Back up `docsfy.db` before major upgrades. > **Warning:** Data copy uses `INSERT OR IGNORE`; if legacy rows collide under the new composite key, ignored rows will not be migrated. 
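The collision case called out in the warning can be checked before an upgrade. The helper below is a hypothetical pre-upgrade audit (not part of docsfy): since legacy rows all receive the default `owner = ''`, any rows sharing the same `(name, ai_provider, ai_model)` triple would collide under the new composite key.

```python
# Hypothetical pre-upgrade audit (not shipped with docsfy): list legacy
# project rows that would collide under the 4-column PK once every legacy
# row defaults to owner = ''.
import sqlite3


def find_colliding_variants(conn: sqlite3.Connection) -> list[tuple]:
    """Return (name, ai_provider, ai_model, count) for duplicated triples."""
    return conn.execute(
        """
        SELECT name, ai_provider, ai_model, COUNT(*) AS n
        FROM projects
        GROUP BY name, ai_provider, ai_model
        HAVING COUNT(*) > 1
        ORDER BY name
        """
    ).fetchall()
```

Any triple this query reports should be resolved (renamed or deleted) before startup runs the migration, because `INSERT OR IGNORE` will silently drop all but one of the colliding rows.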
### 2) `users.role` backfill migration ```python # src/docsfy/storage.py # Migration: add role column for existing DBs try: await db.execute( "ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'user'" ) except sqlite3.OperationalError as exc: if "duplicate column name" not in str(exc).lower(): logger.exception("Migration failed while adding column") raise ``` ### 3) `users.api_key_hash` uniqueness migration ```python # src/docsfy/storage.py cursor = await db.execute("PRAGMA index_list(users)") indexes = await cursor.fetchall() has_unique_key_index = False for idx in indexes: if idx[2]: # unique=1 cursor2 = await db.execute(f"PRAGMA index_info({idx[1]})") idx_cols = await cursor2.fetchall() for ic in idx_cols: if ic[2] == "api_key_hash": has_unique_key_index = True break if has_unique_key_index: break if not has_unique_key_index: try: await db.execute( "CREATE UNIQUE INDEX IF NOT EXISTS idx_users_api_key_hash ON users (api_key_hash)" ) except sqlite3.OperationalError as exc: if "unique" not in str(exc).lower(): logger.exception("Migration failed while adding unique index") raise ``` ### 4) `project_access.project_owner` backfill migration ```python # src/docsfy/storage.py # Migration: add project_owner column to project_access try: await db.execute( "ALTER TABLE project_access ADD COLUMN project_owner TEXT NOT NULL DEFAULT ''" ) except sqlite3.OperationalError as exc: if "duplicate column name" not in str(exc).lower(): logger.exception("Migration failed while adding column") raise ``` ### 5) Startup recovery behavior (non-schema but migration-adjacent) On restart, in-progress generations are marked failed: ```python # src/docsfy/storage.py cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` Expired sessions are pruned during app startup: ```python # src/docsfy/storage.py async def cleanup_expired_sessions() -> None: """Remove expired 
sessions. NOTE: This is called during application startup (lifespan) only. Expired sessions accumulate between restarts but are harmless since get_session() filters by expires_at. For long-running deployments, consider calling this periodically (e.g., via a background task). TODO: Add periodic cleanup for long-running instances. """ async with aiosqlite.connect(DB_PATH) as db: await db.execute("DELETE FROM sessions WHERE expires_at <= datetime('now')") await db.commit() ``` --- ## Integrity model and relationships Docsfy intentionally enforces most relationships at application level (not SQLite foreign keys). - `projects.owner` logically maps to `users.username` - `project_access.username` maps to `users.username` - `project_access.(project_name, project_owner)` maps to project identity (across variants) - `sessions.username` maps to user identity (including env-admin login path) Cleanup logic is explicit in code: ```python # src/docsfy/storage.py async def delete_user(username: str) -> bool: """Delete a user by username, invalidating all their sessions and cleaning up ACLs.""" async with aiosqlite.connect(DB_PATH) as db: await db.execute("DELETE FROM sessions WHERE username = ?", (username,)) # Clean up owned projects and their access entries await db.execute("DELETE FROM projects WHERE owner = ?", (username,)) await db.execute( "DELETE FROM project_access WHERE project_owner = ?", (username,) ) # Clean up ACL entries where user was granted access await db.execute("DELETE FROM project_access WHERE username = ?", (username,)) cursor = await db.execute("DELETE FROM users WHERE username = ?", (username,)) await db.commit() return cursor.rowcount > 0 ``` ```python # src/docsfy/storage.py # Clean up project_access if no more variants remain for this name+owner if cursor.rowcount > 0 and owner is not None: remaining = await db.execute( "SELECT COUNT(*) FROM projects WHERE name = ? 
AND owner = ?", (name, owner), ) row = await remaining.fetchone() if row and row[0] == 0: await db.execute( "DELETE FROM project_access WHERE project_name = ? AND project_owner = ?", (name, owner), ) ``` > **Warning:** Because there are no DB-level foreign keys, direct/manual SQL writes can create orphaned rows that the app does not automatically reconcile unless specific cleanup paths are triggered. --- ## Test and CI coverage for schema behavior Key migration-adjacent behaviors are covered by tests: ```python # tests/test_storage.py async def test_init_db_resets_orphaned_generating(db_path: Path) -> None: from docsfy.storage import get_project, init_db, save_project await save_project( name="stuck-repo", repo_url="https://github.com/org/stuck.git", status="generating", ai_provider="claude", ai_model="opus", owner="testuser", ) # Simulate server restart by re-running init_db await init_db() project = await get_project( "stuck-repo", ai_provider="claude", ai_model="opus", owner="testuser" ) assert project is not None assert project["status"] == "error" assert "Server restarted" in project["error_message"] ``` ```python # tests/test_storage.py async def test_cleanup_expired_sessions(db_path: Path) -> None: import aiosqlite from docsfy.storage import ( DB_PATH, _hash_session_token, cleanup_expired_sessions, create_session, ) # Directly insert a session with a past expiration async with aiosqlite.connect(DB_PATH) as db: await db.execute( "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)", ("expired-token", "expired-user", 0, "2020-01-01T00:00:00"), ) await db.commit() # Create a valid session valid_token = await create_session("valid-user", ttl_hours=8) await cleanup_expired_sessions() # Check that only the valid session remains async with aiosqlite.connect(DB_PATH) as db: cursor = await db.execute("SELECT COUNT(*) FROM sessions") row = await cursor.fetchone() assert row is not None assert row[0] == 1 # Session tokens are stored as 
hashes token_hash = _hash_session_token(valid_token) cursor = await db.execute( "SELECT username FROM sessions WHERE token = ?", (token_hash,) ) row = await cursor.fetchone() assert row is not None assert row[0] == "valid-user" ``` CI test entry point: ```toml # tox.toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` > **Note:** Current tests validate startup recovery, session cleanup, owner scoping, and `data_dir` initialization, but they do not include explicit fixtures for every legacy schema branch in `init_db()` (for example, a seeded pre-owner `projects` table). --- Source: backup-and-recovery.md # Backup and Recovery `docsfy` persists operational state in `DATA_DIR` and expects both SQLite metadata and generated artifacts to remain available together. ## Where data is stored `DATA_DIR` is configured via settings and passed into DB initialization at startup: ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" # Required — validated at startup ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" # [1m] = 1 million token context window ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True # Set to False for local HTTP dev ``` ```python @asynccontextmanager async def lifespan(app: FastAPI) -> AsyncIterator[None]: settings = get_settings() ... 
await init_db(data_dir=settings.data_dir) await cleanup_expired_sessions() yield ``` `storage.py` resolves concrete paths from `DATA_DIR`: ```python DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db" DATA_DIR = Path(os.getenv("DATA_DIR", "/data")) PROJECTS_DIR = DATA_DIR / "projects" ``` Project artifacts are namespaced by owner/project/provider/model: ```python def _validate_owner(owner: str) -> str: """Validate owner segment to prevent path traversal.""" if not owner: return "_default" if "/" in owner or "\\" in owner or ".." in owner or owner.startswith("."): msg = f"Invalid owner: '{owner}'" raise ValueError(msg) return owner def get_project_dir( name: str, ai_provider: str = "", ai_model: str = "", owner: str = "" ) -> Path: ... safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model def get_project_site_dir(...): return get_project_dir(name, ai_provider, ai_model, owner) / "site" def get_project_cache_dir(...): return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages" ``` Expected layout: ```text DATA_DIR/ docsfy.db projects/ / / / / plan.json cache/ pages/ *.md site/ index.html *.html *.md search-index.json llms.txt llms-full.txt .nojekyll assets/* ``` ## What to back up Back up **both**: 1. `DATA_DIR/docsfy.db` 2. 
`DATA_DIR/projects/` (all owners/projects/variants) SQLite holds project state plus auth/session/access data: ```python await db.execute(""" CREATE TABLE IF NOT EXISTS users ( id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE NOT NULL, api_key_hash TEXT NOT NULL UNIQUE, role TEXT NOT NULL DEFAULT 'user', created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) """) await db.execute(""" CREATE TABLE IF NOT EXISTS project_access ( project_name TEXT NOT NULL, project_owner TEXT NOT NULL DEFAULT '', username TEXT NOT NULL, PRIMARY KEY (project_name, project_owner, username) ) """) await db.execute(""" CREATE TABLE IF NOT EXISTS sessions ( token TEXT PRIMARY KEY, username TEXT NOT NULL, is_admin INTEGER NOT NULL DEFAULT 0, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, expires_at TIMESTAMP NOT NULL ) """) ``` Generated docs and indexes are written into each variant’s `site/` directory: ```python if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) (output_dir / ".nojekyll").touch() (output_dir / "index.html").write_text(index_html, encoding="utf-8") (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") (output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8") (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` > **Warning:** Backing up only `docsfy.db` or only `projects/` can produce mismatches (metadata points to missing files, or files exist without matching DB rows). ## Deployment persistence configuration Containerized deployments should persist `/data` externally. Example from `docker-compose.yaml`: ```yaml services: docsfy: build: . 
ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` Local repo config also avoids committing runtime data: ```gitignore # Data data/ .dev/data/ ``` ## Recommended backup procedure 1. Quiesce writes (stop `docsfy`, or ensure no generation is in progress). 2. Snapshot/copy the **entire** `DATA_DIR` atomically if possible. 3. Store versioned backups (daily full + retention policy). 4. Test restore periodically in a non-production environment. > **Tip:** In Docker Compose setups, backing up host `./data` captures both `docsfy.db` and all generated variant artifacts because it maps directly to `/data`. ## Recovery procedure 1. Stop `docsfy`. 2. Restore `DATA_DIR` from the same backup set (`docsfy.db` + `projects/`). 3. Start `docsfy` and let startup run DB initialization/migrations. 4. Validate project status and docs serving. Startup recovery behavior includes schema migration and handling interrupted generations: ```python # Migration: convert old 3-column PK table to 4-column PK (with owner) ... logger.info( "Migrating database to 4-column PK schema (name, ai_provider, ai_model, owner)" ) ... await db.execute("ALTER TABLE projects_new RENAME TO projects") # Reset orphaned "generating" projects from previous server run cursor = await db.execute( "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'" ) ``` > **Note:** After restore/restart, variants that were `generating` when the backup was taken are intentionally transitioned to `error` with `Server restarted during generation`. ## Variant/site export (supplemental backup) `docsfy` can export rendered docs as `.tar.gz` through API endpoints: ```python @app.get("/api/projects/{name}/{provider}/{model}/download") ... with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=f"{name}-{provider}-{model}") ``` ```python @app.get("/api/projects/{name}/download") ... 
with tarfile.open(tar_path, mode="w:gz") as tar: tar.add(str(site_dir), arcname=name) ``` Use these as supplemental exports, not as full disaster-recovery backups. > **Note:** Download endpoints package the `site/` output only; they do **not** include SQLite metadata (`docsfy.db`), `cache/pages`, or `plan.json`. ## Destructive operations to account for Generation and delete operations remove data on disk: ```python if force: cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner) if cache_dir.exists(): shutil.rmtree(cache_dir) logger.info(f"[{project_name}] Cleared cache (force=True)") ... project_dir = get_project_dir(name, provider, model, project_owner) if project_dir.exists(): shutil.rmtree(project_dir) ``` And each render replaces the full site directory: ```python if output_dir.exists(): shutil.rmtree(output_dir) ``` > **Warning:** `DELETE` endpoints and re-render operations are destructive on disk; recovery requires restoring from backup or regenerating from source repositories. --- Source: testing-and-quality-gates.md # Testing and Quality Gates This project uses a layered quality stack: `pytest` for behavior coverage, `tox` as the test runner wrapper, and `pre-commit` for linting, type checking, and secret scanning. > **Warning:** No repository CI pipeline files are currently present (`.github/workflows`, `.gitlab-ci.yml`, `.circleci`, `Jenkinsfile`, etc.). Quality gates are defined in local tooling (`tox.toml`, `.pre-commit-config.yaml`) and any external pre-commit service integration. 
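Because enforcement is local, the two gates are typically chained in one step. The snippet below is a hypothetical convenience wrapper (not part of the repository) around the commands the rest of this page documents; the injectable `runner` parameter exists only to make the sketch testable.

```python
# Hypothetical local gate runner (not part of docsfy): chain the pre-commit
# and tox gates so a failure in either stops the run.
import subprocess

GATES = [
    ["pre-commit", "run", "--all-files"],  # lint, typing, secret scanning
    ["tox"],                               # runs the default unittests env
]


def run_gates(runner=subprocess.run) -> bool:
    """Run each gate in order; stop at the first non-zero exit code."""
    for cmd in GATES:
        if runner(cmd).returncode != 0:
            return False
    return True
```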
## Pytest Coverage Areas `pytest` is configured in `pyproject.toml`: ```toml [project.optional-dependencies] dev = ["pytest", "pytest-asyncio", "pytest-xdist", "httpx"] [tool.pytest.ini_options] asyncio_mode = "auto" testpaths = ["tests"] pythonpath = ["src"] ``` Current suite structure covers **149 tests across 13 test modules**: - **Auth, RBAC, session and access control:** `tests/test_auth.py` (36), `tests/test_main.py` (15), `tests/test_dashboard.py` (4) - **Storage and persistence behavior:** `tests/test_storage.py` (33) - **Generation/planning/parser/repository logic:** `tests/test_generator.py` (8), `tests/test_json_parser.py` (15), `tests/test_prompts.py` (3), `tests/test_repository.py` (9) - **Rendering and content safety:** `tests/test_renderer.py` (11) - **Contracts and configuration models:** `tests/test_config.py` (3), `tests/test_models.py` (9), `tests/test_ai_client.py` (2) - **End-to-end mocked flow:** `tests/test_integration.py` (1) ### Coverage Examples from Tests **SSRF hardening (private DNS/IP rejection)** from `tests/test_main.py`: ```python async def test_reject_private_url_dns(monkeypatch: pytest.MonkeyPatch) -> None: """Test that SSRF protection rejects DNS names resolving to private IPs.""" import socket from docsfy.main import _reject_private_url def mock_getaddrinfo( host: str, port: object, *args: object, **kwargs: object ) -> list[ tuple[socket.AddressFamily, socket.SocketKind, int, str, tuple[str, int]] ]: return [(socket.AF_INET, socket.SOCK_STREAM, 0, "", ("192.168.1.1", 0))] monkeypatch.setattr(socket, "getaddrinfo", mock_getaddrinfo) with pytest.raises(HTTPException) as exc_info: await _reject_private_url("https://evil.com/org/repo") assert exc_info.value.status_code == 400 ``` **Role enforcement (viewer cannot generate docs)** from `tests/test_auth.py`: ```python async def test_viewer_cannot_generate(_init_db: None) -> None: """A viewer should get 403 when trying to generate docs.""" from docsfy.main import _generating, app from 
docsfy.storage import create_user

    _generating.clear()
    _, viewer_key = await create_user("viewer-gen", role="viewer")
    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
        headers={"Authorization": f"Bearer {viewer_key}"},
    ) as ac:
        response = await ac.post(
            "/api/generate",
            json={
                "repo_url": "https://github.com/org/repo",
                "project_name": "test-proj",
            },
        )
        assert response.status_code == 403
        assert "Write access required" in response.json()["detail"]
    _generating.clear()
```

**Output sanitization (XSS vectors blocked)** from `tests/test_renderer.py`:

```python
def test_sanitize_html_unquoted_javascript() -> None:
    from docsfy.renderer import _sanitize_html

    # Payloads shown are representative unquoted-attribute XSS vectors;
    # the original fixture markup is abridged in this excerpt.
    result = _sanitize_html("<a href=javascript:alert(1)>x</a>")
    assert "javascript:" not in result
    result = _sanitize_html("<img src=javascript:alert(1)>")
    assert "javascript:" not in result
    result = _sanitize_html("<a href=data:text/html,script>alert(1)>x")
    assert "data:" not in result
    result = _sanitize_html("<object data=data:text/html;base64,AAAA>")
    assert "data:" not in result
```

**End-to-end API/docs artifact flow** from `tests/test_integration.py`:

```python
response = await client.get("/docs/test-repo/claude/opus/index.html")
assert response.status_code == 200
assert "test-repo" in response.text

response = await client.get("/api/projects/test-repo/claude/opus/download")
assert response.status_code == 200
assert response.headers["content-type"] == "application/gzip"
```

> **Note:** There is no coverage threshold gate configured (no `pytest-cov` settings in `pyproject.toml` or `tox.toml`).

## tox Usage

`tox` is configured in `tox.toml` with a single environment:

```toml
skipsdist = true
envlist = ["unittests"]

[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```

What this means:

- `tox` runs the `unittests` env by default.
- Tests run through `uv` with dev extras.
- `pytest-xdist` is used with `-n auto` for parallel execution.
- Packaging/build is skipped for test runs (`skipsdist = true`).
> **Tip:** For parity with tox while debugging a single step, use the exact command from `tox.toml`: `uv run --extra dev pytest -n auto tests`. ## Pre-commit Hooks Hook orchestration is defined in `.pre-commit-config.yaml`: ```yaml repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: v6.0.0 hooks: - id: check-added-large-files - id: check-docstring-first - id: check-executables-have-shebangs - id: check-merge-conflict - id: check-symlinks - id: detect-private-key - id: mixed-line-ending - id: debug-statements - id: trailing-whitespace args: [--markdown-linebreak-ext=md] - id: end-of-file-fixer - id: check-ast - id: check-builtin-literals - id: check-toml ``` It also wires lint/type/security hooks: ```yaml # flake8 retained for RedHatQE M511 plugin; ruff handles standard linting - repo: https://github.com/PyCQA/flake8 rev: 7.3.0 hooks: - id: flake8 args: [--config=.flake8] additional_dependencies: [git+https://github.com/RedHatQE/flake8-plugins.git, flake8-mutable] - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - id: ruff - id: ruff-format - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy exclude: (tests/) ``` ## mypy Gate Type checking is strict at project level in `pyproject.toml`: ```toml [tool.mypy] check_untyped_defs = true disallow_any_generics = true disallow_incomplete_defs = true disallow_untyped_defs = true no_implicit_optional = true show_error_codes = true warn_unused_ignores = true strict_equality = true extra_checks = true warn_unused_configs = true warn_redundant_casts = true ``` In pre-commit, mypy also installs extra stubs/deps: ```yaml - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy exclude: (tests/) additional_dependencies: [types-requests, types-PyYAML, types-colorama, types-aiofiles, pydantic, types-Markdown] ``` ## Ruff Gate Ruff is enforced via pre-commit: ```yaml - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - 
id: ruff - id: ruff-format ``` There is no dedicated `[tool.ruff]` section in `pyproject.toml`, so ruff runs with defaults unless overridden by hook-level args (none currently set). ## Flake8 Compatibility Gate (M511) `flake8` is retained specifically for RedHatQE plugin checks: ```ini [flake8] select=M511 ``` This keeps `M511` enforcement while ruff handles general linting. ## Secrets Scanning Gates Secret scanning is layered in pre-commit: ```yaml - repo: https://github.com/pre-commit/pre-commit-hooks rev: v6.0.0 hooks: - id: detect-private-key - repo: https://github.com/Yelp/detect-secrets rev: v1.5.0 hooks: - id: detect-secrets - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks ``` `gitleaks` has a repo-specific allowlist in `.gitleaks.toml`: ```toml [extend] useDefault = true [allowlist] paths = [ '''tests/test_repository\.py''', ] ``` Test fixtures also use inline allowlist annotations where needed, for example in `tests/test_repository.py`: ```python assert sha == "abc123def" # pragma: allowlist secret ``` > **Warning:** Allowlisting should stay narrowly scoped to test fixtures only; broad allowlists can hide real leaks in production code. --- Source: ci-cd-integration.md # CI/CD Integration Docsfy already has strong **automation building blocks** for CI, but they are not yet wired into a repository-managed CI pipeline. The project currently relies on `tox` for tests and `pre-commit` for linting, typing, and secret scanning. > **Warning:** No CI workflow definitions are currently checked in (for example, no `.github/workflows`, `.gitlab-ci.yml`, or `Jenkinsfile`). Until a pipeline is added, enforcement depends on developers running checks locally. 
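Until a pipeline lands, a bootstrap script can at least make the gap explicit. This hypothetical check (marker names taken from the warning above) reports whether any recognized CI definition exists in a checkout:

```python
# Hypothetical helper: report whether a recognized CI pipeline definition
# is present in a repository checkout.
from pathlib import Path

CI_MARKERS = (".github/workflows", ".gitlab-ci.yml", ".circleci", "Jenkinsfile")


def has_ci_pipeline(repo_root: Path) -> bool:
    """True if any known CI marker file or directory exists under repo_root."""
    return any((repo_root / marker).exists() for marker in CI_MARKERS)
```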
## Current Automation Posture ### Test execution is defined in `tox` ```toml skipsdist = true envlist = ["unittests"] [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` - One tox environment exists: `unittests` - Tests run with `pytest-xdist` (`-n auto`) for parallel execution - `skipsdist = true` means tox does not build/install the package before testing > **Note:** With `skipsdist = true`, CI validates source-tree behavior but not wheel/sdist installability. ### Python, pytest, and mypy defaults are centralized in `pyproject.toml` ```toml [project] requires-python = ">=3.12" [project.optional-dependencies] dev = ["pytest", "pytest-asyncio", "pytest-xdist", "httpx"] [tool.pytest.ini_options] asyncio_mode = "auto" testpaths = ["tests"] pythonpath = ["src"] [tool.mypy] check_untyped_defs = true disallow_any_generics = true disallow_incomplete_defs = true disallow_untyped_defs = true no_implicit_optional = true show_error_codes = true warn_unused_ignores = true strict_equality = true extra_checks = true warn_unused_configs = true warn_redundant_casts = true ``` - CI runners should use Python 3.12+ - Async testing is first-class (`pytest-asyncio`) - Mypy is configured in strict mode ### Lint, formatting, typing, and security checks are encoded in `.pre-commit-config.yaml` ```yaml ci: autofix_prs: false autoupdate_commit_msg: "ci: [pre-commit.ci] pre-commit autoupdate" ``` ```yaml repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: v6.0.0 hooks: - id: check-added-large-files - id: check-docstring-first - id: check-executables-have-shebangs - id: check-merge-conflict - id: check-symlinks - id: detect-private-key - id: mixed-line-ending - id: debug-statements - id: trailing-whitespace args: [--markdown-linebreak-ext=md] - id: end-of-file-fixer - id: check-ast - id: check-builtin-literals - id: check-toml ``` ```yaml # flake8 retained for RedHatQE M511 plugin; ruff handles standard linting - 
repo: https://github.com/PyCQA/flake8 rev: 7.3.0 hooks: - id: flake8 args: [--config=.flake8] additional_dependencies: [git+https://github.com/RedHatQE/flake8-plugins.git, flake8-mutable] - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.15.2 hooks: - id: ruff - id: ruff-format - repo: https://github.com/Yelp/detect-secrets rev: v1.5.0 hooks: - id: detect-secrets - repo: https://github.com/gitleaks/gitleaks rev: v8.30.0 hooks: - id: gitleaks - repo: https://github.com/pre-commit/mirrors-mypy rev: v1.19.1 hooks: - id: mypy exclude: (tests/) ``` - `ruff` + `ruff-format` handle general lint/format checks - `flake8` is retained for rule `M511` via plugin - `detect-secrets` and `gitleaks` provide layered secret scanning - `mypy` runs as a hook and excludes `tests/` ```ini [flake8] select=M511 ``` ```toml [extend] useDefault = true [allowlist] paths = [ '''tests/test_repository\.py''', ] ``` > **Warning:** The flake8 hook intentionally pulls `RedHatQE/flake8-plugins` from Git, so CI reproducibility depends on that upstream repository state unless you pin a commit. ## Deployment Readiness Signals Already in Code The repo already contains deploy-friendly health checks in both app and container config: ```python @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} ``` ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ```yaml healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 ``` ## Recommended Pipeline Stages Use the existing repository configuration as the source of truth: 1. **Setup** - Use Python 3.12 runner - Install `pre-commit`, `tox`, and `uv` 2. **Quality & Security Gate** - Run all hooks from `.pre-commit-config.yaml` - Enforces linting, formatting, type checks, and secret scanning 3. 
**Test Gate** - Run tox `unittests` env from `tox.toml` - Executes `pytest -n auto tests` through `uv` 4. **Build Gate (main/release branches)** - Build container from `Dockerfile` - Preserves runtime assumptions already encoded in the image 5. **Smoke Gate** - Start the built image and check `/health` - Fail fast before deployment if health probe fails 6. **Deploy** - Deploy only after all prior gates succeed > **Tip:** Keep CI logic thin by reusing `tox.toml` and `.pre-commit-config.yaml` directly, instead of duplicating check logic in pipeline YAML. ## Why This Works Well for Docsfy Tests are already written to run without external AI services by mocking expensive/external operations: ```python with patch.dict(os.environ, {"ADMIN_KEY": TEST_ADMIN_KEY}): get_settings.cache_clear() await storage.init_db() ... with ( patch("docsfy.main.check_ai_cli_available", return_value=(True, "")), patch("docsfy.main.clone_repo", return_value=(tmp_path / "repo", "abc123")), patch("docsfy.main.run_planner", return_value=sample_plan), patch( "docsfy.main.generate_all_pages", return_value={"introduction": "# Intro\n\nWelcome!"}, ), ): ... ``` This makes CI runs deterministic and suitable for pull-request validation without requiring real provider credentials. --- Source: repository-structure.md # Repository Structure `docsfy` is organized as a `src`-layout Python service with server-rendered UI, static site rendering utilities, and a focused async test suite. 
## Top-Level Layout ```text docsfy/ ├── src/docsfy/ # Application package │ ├── __init__.py │ ├── main.py # FastAPI app, auth middleware, API routes │ ├── config.py # Environment-backed settings │ ├── models.py # Pydantic request/plan models │ ├── storage.py # SQLite + filesystem storage + user/session auth │ ├── repository.py # git clone/diff helpers │ ├── ai_client.py # AI CLI wrapper re-exports │ ├── prompts.py # Planner/page/incremental prompt builders │ ├── json_parser.py # Robust JSON extraction from AI output │ ├── generator.py # Planner + page generation orchestration │ ├── renderer.py # Markdown-to-HTML rendering + asset/site output │ ├── templates/ # Jinja templates (app UI + generated docs pages) │ └── static/ # Frontend assets copied into generated docs ├── tests/ # Unit + integration tests ├── docs/plans/ # Design/implementation planning docs ├── test-plans/ # End-to-end/manual UI test plan ├── pyproject.toml # Packaging, deps, pytest config, script entrypoint ├── uv.lock # Locked dependency graph ├── tox.toml # Local test task runner ├── Dockerfile # Multi-stage runtime image ├── docker-compose.yaml # Local container orchestration ├── .env.example # Environment variable template ├── .pre-commit-config.yaml # Lint/type/security hooks ├── .flake8 # Flake8 plugin settings ├── .gitleaks.toml # Secret scanning config ├── .gitignore └── OWNERS ``` ## Source Modules (`src/docsfy`) ### API entrypoint and route wiring `main.py` defines app startup, authentication middleware, API endpoints, and the end-to-end generation lifecycle. 
```python app = FastAPI( title="docsfy", description="AI-powered documentation generator", version="0.1.0", lifespan=lifespan, ) app.add_middleware(AuthMiddleware) @app.get("/health") async def health() -> dict[str, str]: return {"status": "ok"} @app.post("/api/generate", status_code=202) async def generate(request: Request, gen_request: GenerateRequest) -> dict[str, str]: ``` Key responsibilities: - request auth (`Bearer` token or `docsfy_session` cookie) - project/variant ownership checks - generation task scheduling and abort logic - docs serving (`/docs/...`) and archive download endpoints ### Settings and request models - `config.py` centralizes runtime settings (`ADMIN_KEY`, `AI_PROVIDER`, `AI_MODEL`, `DATA_DIR`, cookie security, timeout). - `models.py` validates generation input (`repo_url` vs `repo_path`) and doc-plan schemas (`DocPlan`, `NavGroup`, `DocPage`). ```python class Settings(BaseSettings): model_config = SettingsConfigDict( env_file=".env", env_file_encoding="utf-8", extra="ignore", ) admin_key: str = "" ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True ``` ### Generation pipeline modules - `prompts.py`: prompt construction for planner, page generation, and incremental page selection. - `ai_client.py`: re-exports provider/runtime helpers from `ai-cli-runner`. - `json_parser.py`: resilient parsing from noisy AI output. - `generator.py`: planning + page generation, cache support, bounded concurrency. - `repository.py`: git clone and changed-file detection for incremental behavior. 
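To illustrate the `json_parser.py` responsibility described above, a resilient extractor generally prefers an explicit fenced block and falls back to the outermost brace span. This is a simplified sketch of that idea, not the module's actual implementation:

```python
# Simplified sketch of resilient JSON extraction from noisy AI output
# (illustrative only; docsfy's json_parser.py is more thorough).
import json
import re

FENCE = "`" * 3  # built dynamically to keep this example docs-friendly


def extract_json(text: str) -> dict:
    """Pull the first JSON object out of noisy AI CLI output."""
    # Prefer an explicit fenced ```json block.
    pattern = FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE
    fenced = re.search(pattern, text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Fall back to the outermost brace span.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return json.loads(text[start : end + 1])
    raise ValueError("no JSON object found in AI output")
```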
```python
success, output = await call_ai_cli(
    prompt=prompt,
    cwd=repo_path,
    ai_provider=ai_provider,
    ai_model=ai_model,
    ai_cli_timeout=ai_cli_timeout,
    cli_flags=cli_flags,
)

results = await run_parallel_with_limit(
    coroutines, max_concurrency=MAX_CONCURRENT_PAGES
)
```

### Persistence and runtime pathing

`storage.py` owns both database schema/migrations and output path conventions.

```python
DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db"
DATA_DIR = Path(os.getenv("DATA_DIR", "/data"))
PROJECTS_DIR = DATA_DIR / "projects"
```

```sql
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    ...
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```

```python
def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    ...
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```

## Templates and Static Assets

### Jinja templates (`src/docsfy/templates`)

- App UI pages: `dashboard.html`, `status.html`, `login.html`, `admin.html`
- Generated docs pages: `index.html`, `page.html`
- Shared partials: `_theme.html`, `_sidebar.html`, `_modal.html`

Generated docs templates explicitly load the packaged static assets from the copied `assets/` directory.

### Static frontend assets (`src/docsfy/static`)

- `style.css`: full docs theme (layout, typography, callouts, TOC, search modal)
- `theme.js`: dark/light theme toggle + persistence
- `search.js`: `Cmd/Ctrl+K` modal search using `search-index.json`
- `copy.js`: code block copy buttons
- `callouts.js`: transforms blockquotes (`Note`, `Warning`, `Tip`, etc.)
into callouts - `scrollspy.js`: active heading sync in TOC - `codelabels.js`: inferred language badges on code blocks - `github.js`: optional GitHub stars badge hydration `renderer.py` copies these files to the generated site output and emits search/LLM artifacts: ```python if STATIC_DIR.exists(): for static_file in STATIC_DIR.iterdir(): if static_file.is_file(): shutil.copy2(static_file, assets_dir / static_file.name) (output_dir / "search-index.json").write_text( json.dumps(search_index), encoding="utf-8" ) (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` ## Tests (`tests/`) The suite is split by module/feature area: - `test_main.py`: API route behavior and generation endpoint lifecycle - `test_auth.py`: login/session flows, role permissions (`admin`, `user`, `viewer`) - `test_storage.py`: DB CRUD, migrations, key/session management, ACL behavior - `test_repository.py`: clone/local SHA/diff helpers - `test_generator.py`: planner/page generation and incremental planner handling - `test_renderer.py`: markdown rendering and HTML sanitization behavior - `test_config.py`, `test_models.py`, `test_json_parser.py`, `test_prompts.py`, `test_ai_client.py`: focused unit tests - `test_dashboard.py`: dashboard page rendering behavior - `test_integration.py`: mocked full flow (`generate -> serve -> download -> delete`) Example integration assertion flow: ```python response = await client.get("/api/status") assert response.status_code == 200 projects = response.json()["projects"] assert len(projects) == 1 assert projects[0]["status"] == "ready" response = await client.get("/docs/test-repo/claude/opus/index.html") assert response.status_code == 200 ``` ## Runtime and Configuration Files ### Python packaging and app entrypoint ```toml [project] name = "docsfy" requires-python = ">=3.12" dependencies = [ "ai-cli-runner", "fastapi", "uvicorn", "pydantic-settings", "python-simple-logger", 
"aiosqlite", "jinja2", "markdown", "pygments", "python-multipart>=0.0.22", ] [project.scripts] docsfy = "docsfy.main:run" ``` ### Container/runtime config ```yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` ```dockerfile HEALTHCHECK --interval=30s --timeout=10s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 ENTRYPOINT ["uv", "run", "--no-sync", "uvicorn", "docsfy.main:app", "--host", "0.0.0.0", "--port", "8000"] ``` ### Environment template ```bash ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO # SECURE_COOKIES=false ``` ### Local quality/security tooling ```toml # tox.toml [env.unittests] deps = ["uv"] commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]] ``` ```yaml # .pre-commit-config.yaml (excerpt) - repo: https://github.com/astral-sh/ruff-pre-commit hooks: - id: ruff - id: ruff-format - repo: https://github.com/pre-commit/mirrors-mypy hooks: - id: mypy ``` ```gitignore # Data data/ .dev/data/ ``` > **Warning:** Runtime state (`/data`, SQLite DB, generated sites/cache) is intentionally untracked; do not commit generated project output. ## CI/CD and Contributor Workflow > **Note:** No hosted pipeline definitions (for example `.github/workflows/`) are currently checked into this repository. Quality gates are still defined and reproducible locally via `tox`, `pytest`, `pre-commit`, and secret scanning configs (`.gitleaks.toml`, `detect-secrets` hook). > **Tip:** Before opening a PR, run `uv run --extra dev pytest -n auto tests` and `pre-commit run --all-files`. 
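The local gate sequence from the tip above can be scripted. A minimal sketch — this helper is hypothetical, not part of the repo — runs the same two commands in order and stops at the first failure:

```python
import subprocess

# The same local quality gates named above.
CHECKS = [
    ["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"],
    ["pre-commit", "run", "--all-files"],
]


def run_gates(runner=subprocess.run) -> int:
    """Run each check in order; return the first non-zero exit code, else 0.

    `runner` is injectable so the helper can be exercised without actually
    invoking uv or pre-commit.
    """
    for cmd in CHECKS:
        result = runner(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0

# Calling run_gates() with the default runner executes the real commands.
```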
## Runtime Output Layout (Generated, Not Source-Controlled) Based on `storage.py` + `renderer.py`, generation outputs are stored under owner/provider/model-specific paths: ```text /data/ ├── docsfy.db └── projects/ └── {owner}/ └── {project}/ └── {ai_provider}/{ai_model}/ ├── plan.json ├── cache/pages/*.md └── site/ ├── .nojekyll ├── index.html ├── *.html ├── *.md ├── search-index.json ├── llms.txt ├── llms-full.txt └── assets/* ``` This separation is important for multi-user and multi-variant isolation (`name + provider + model + owner`). --- Source: extending-docsfy.md # Extending docsfy docsfy has four main extension surfaces: 1. Prompt construction (`src/docsfy/prompts.py`) 2. HTML rendering and template selection (`src/docsfy/renderer.py`, `src/docsfy/templates/`) 3. Frontend behavior and styling (`src/docsfy/static/` plus shared template partials) 4. Generation orchestration and caching (`src/docsfy/main.py`, `src/docsfy/generator.py`, `src/docsfy/repository.py`, `src/docsfy/storage.py`) > **Note:** docsfy uses two template contexts: > - **Generated docs pages**: `index.html`, `page.html`, and static assets copied to `assets/` > - **Web app UI** (dashboard/admin/status/login): Jinja templates rendered by FastAPI, many with inline CSS/JS --- ## 1) Customizing prompts All planner/page prompts are built in `src/docsfy/prompts.py`. ```python PLAN_SCHEMA = """{ "project_name": "string - project name", "tagline": "string - one-line project description", "navigation": [ { "group": "string - section group name", "pages": [ { "slug": "string - URL-friendly page identifier", "title": "string - human-readable page title", "description": "string - brief description of what this page covers" } ] } ] }""" ``` ```python def build_planner_prompt(project_name: str) -> str: return f"""You are a technical documentation planner. Explore this repository thoroughly. Explore the source code, configuration files, tests, CI/CD pipelines, and project structure. 
Do NOT rely on the README — understand the project from its code and configuration. ... Output format: {PLAN_SCHEMA}""" ``` ```python def build_page_prompt(project_name: str, page_title: str, page_description: str) -> str: return f"""You are a technical documentation writer. Explore this repository to write the "{page_title}" page for the {project_name} documentation. ... Use these callout formats for special content: - Notes: > **Note:** text - Warnings: > **Warning:** text - Tips: > **Tip:** text ... Output ONLY the markdown content for this page. No wrapping, no explanation.""" ``` ### Prompt contract you must preserve The generation pipeline expects plan JSON with `navigation -> pages -> slug/title/description`: ```python for group in plan.get("navigation", []): for page in group.get("pages", []): slug = page.get("slug", "") title = page.get("title", slug) ``` > **Warning:** If you change prompt output shape, update all plan consumers (`generator.py`, `renderer.py`, and any tests expecting `navigation/pages`). --- ## 2) Customizing renderer templates Renderer wiring lives in `src/docsfy/renderer.py`: ```python TEMPLATES_DIR = Path(__file__).parent / "templates" STATIC_DIR = Path(__file__).parent / "static" _jinja_env = Environment( loader=FileSystemLoader(str(TEMPLATES_DIR)), autoescape=select_autoescape(["html"]), ) ``` Generated docs pages use `index.html` and `page.html`: ```python def render_page(...): env = _get_jinja_env() template = env.get_template("page.html") content_html, toc_html = _md_to_html(markdown_content) return template.render(...) def render_index(...): env = _get_jinja_env() template = env.get_template("index.html") return template.render(...) 
``` Site output assembly (`render_site`) includes static copy, HTML, markdown, search index, and LLM files: ```python if output_dir.exists(): shutil.rmtree(output_dir) output_dir.mkdir(parents=True, exist_ok=True) assets_dir = output_dir / "assets" assets_dir.mkdir(exist_ok=True) if STATIC_DIR.exists(): for static_file in STATIC_DIR.iterdir(): if static_file.is_file(): shutil.copy2(static_file, assets_dir / static_file.name) (output_dir / "index.html").write_text(index_html, encoding="utf-8") (output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8") (output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8") (output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8") (output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8") (output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8") ``` > **Warning:** `render_site()` deletes `output_dir` before rendering. Do not place manual files there unless your extension re-creates them every run. ### Markdown-to-HTML behavior you can extend ```python md = markdown.Markdown( extensions=["fenced_code", "codehilite", "tables", "toc"], extension_configs={ "codehilite": {"css_class": "highlight", "guess_lang": False}, "toc": {"toc_depth": "2-3"}, }, ) content_html = _sanitize_html(md.convert(md_text)) ``` The sanitizer strips dangerous tags/attributes and allowlists URL schemes (`http://`, `https://`, `#`, `/`, `mailto:`). If you loosen this, update `tests/test_renderer.py`. 
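The sanitizer's allowlist approach can be sketched with the stdlib `html.parser`. This is an illustrative approximation of the behavior described above, not `_sanitize_html` itself; the allowed-tag set here is an assumption:

```python
from html.parser import HTMLParser

# Assumed allowlist for illustration; the real sanitizer defines its own.
ALLOWED_TAGS = {
    "p", "a", "code", "pre", "em", "strong", "ul", "ol", "li",
    "h1", "h2", "h3", "blockquote", "table", "thead", "tbody", "tr", "th", "td",
}
SAFE_HREF_PREFIXES = ("http://", "https://", "#", "/", "mailto:")


class Sanitizer(HTMLParser):
    """Keep allowlisted tags; drop event handlers and unsafe href schemes."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self._skip_depth = 0  # inside <script>/<style>: drop text too

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself, keep its inner text
        safe = []
        for name, value in attrs:
            if name.startswith("on"):  # onclick=, onerror=, ...
                continue
            if name == "href" and not (value or "").startswith(SAFE_HREF_PREFIXES):
                continue
            safe.append(f' {name}="{value or ""}"')
        self.out.append(f"<{tag}{''.join(safe)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.out.append(data)


def sanitize(html: str) -> str:
    parser = Sanitizer()
    parser.feed(html)
    return "".join(parser.out)
```

With this shape, `<script>` bodies disappear entirely, unknown tags are unwrapped rather than escaped, and a `javascript:` href is stripped while the link text survives.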
--- ## 3) Customizing frontend assets Generated docs pages load assets from `assets/` (copied from `src/docsfy/static/`): ```html {% include '_sidebar.html' %} ``` ### Callout behavior `src/docsfy/static/callouts.js` turns blockquotes into styled callouts based on first bold token: ```javascript if (text === 'note' || text === 'info') { type = 'note'; } else if (text === 'warning' || text === 'caution') { type = 'warning'; } else if (text === 'tip' || text === 'hint') { type = 'tip'; } else if (text === 'danger' || text === 'error') { type = 'danger'; } else if (text === 'important') { type = 'important'; } ``` This matches the prompt’s preferred syntax (`> **Note:**`, `> **Warning:**`, `> **Tip:**`) and additional aliases. ### Theme, search, and code-label hooks - `theme.js`: toggles `data-theme` and persists `localStorage["theme"]` - `search.js`: loads `search-index.json` and provides Cmd/Ctrl+K modal search - `codelabels.js`: maps `language-*` classes to human labels - `style.css`: centralized design tokens (`:root` and `[data-theme="dark"]`) > **Tip:** To add a new docs-page behavior, add a file under `src/docsfy/static/`, then include it in `src/docsfy/templates/page.html` and `src/docsfy/templates/index.html`. --- ## 4) Customizing generation logic High-level flow starts in `POST /api/generate` (`src/docsfy/main.py`) and runs `_run_generation()` / `_generate_from_path()`. Core orchestration: ```python plan = await run_planner( repo_path=repo_dir, project_name=project_name, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, ) plan["repo_url"] = source_url ... 
pages = await generate_all_pages( repo_path=repo_dir, plan=plan, cache_dir=cache_dir, ai_provider=ai_provider, ai_model=ai_model, ai_cli_timeout=ai_cli_timeout, use_cache=use_cache if use_cache else not force, project_name=project_name, owner=owner, ) site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner) render_site(plan=plan, pages=pages, output_dir=site_dir) ``` ### Page generation parallelism `src/docsfy/generator.py` limits concurrent page jobs: ```python MAX_CONCURRENT_PAGES = 5 ... results = await run_parallel_with_limit( coroutines, max_concurrency=MAX_CONCURRENT_PAGES ) ``` ### Incremental regeneration path When commits differ, docsfy diffs changed files and asks the incremental planner which page slugs to invalidate: ```python changed_files = get_changed_files(repo_dir, old_sha, commit_sha) ... pages_to_regen = await run_incremental_planner( repo_dir, project_name, ai_provider, ai_model, changed_files, existing_plan, ai_cli_timeout, ) if pages_to_regen != ["all"]: for slug in pages_to_regen: cache_file = cache_dir / f"{slug}.md" if cache_file.exists(): cache_file.unlink() ``` ### Cache/output location model `src/docsfy/storage.py` defines project storage layout: ```python def get_project_dir(name: str, ai_provider: str = "", ai_model: str = "", owner: str = "") -> Path: ... safe_owner = _validate_owner(owner) return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model def get_project_site_dir(...): return get_project_dir(...) / "site" def get_project_cache_dir(...): return get_project_dir(...) / "cache" / "pages" ``` > **Warning:** Slug/path safety checks are enforced in both generation and rendering. If you change slug rules, update all validations (`main.py`, `generator.py`, `renderer.py`) consistently. 
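`run_parallel_with_limit` comes from `ai-cli-runner`; its bounded-concurrency behavior can be sketched with `asyncio.Semaphore`. This is an illustrative stand-in, not the library's code:

```python
import asyncio


async def run_parallel_with_limit_sketch(coroutines, max_concurrency: int):
    """Run awaitables concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(coro):
        async with semaphore:
            return await coro

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(c) for c in coroutines))


async def main():
    async def page(i: int) -> str:
        await asyncio.sleep(0.01)  # stands in for one AI page-generation call
        return f"page-{i}.md"

    return await run_parallel_with_limit_sketch([page(i) for i in range(8)], 5)


results = asyncio.run(main())
```

Raising `MAX_CONCURRENT_PAGES` trades memory and AI-provider rate-limit pressure for wall-clock time; the semaphore keeps the overflow pages queued rather than started.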
### Adding a new AI provider (beyond `claude/gemini/cursor`) Provider support is explicitly constrained in request validation and API checks: ```python ai_provider: Literal["claude", "gemini", "cursor"] | None = None ``` ```python if ai_provider not in ("claude", "gemini", "cursor"): raise HTTPException( status_code=400, detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.", ) ``` Also update provider dropdowns in templates (`dashboard.html`, `status.html`) and relevant tests. --- ## 5) Configuration knobs for extension work Runtime settings come from `.env` (see `.env.example`) via `src/docsfy/config.py`. ```env ADMIN_KEY=your-secure-admin-key-here-min-16-chars AI_PROVIDER=claude AI_MODEL=claude-opus-4-6[1m] AI_CLI_TIMEOUT=60 LOG_LEVEL=INFO # SECURE_COOKIES=false ``` `config.py` defaults: ```python admin_key: str = "" ai_provider: str = "claude" ai_model: str = "claude-opus-4-6[1m]" ai_cli_timeout: int = Field(default=60, gt=0) log_level: str = "INFO" data_dir: str = "/data" secure_cookies: bool = True ``` App run-time host/port/debug toggles: ```python reload = os.getenv("DEBUG", "").lower() == "true" host = os.getenv("HOST", "127.0.0.1") port = int(os.getenv("PORT", "8000")) uvicorn.run("docsfy.main:app", host=host, port=port, reload=reload) ``` Container/dev config (`docker-compose.yaml`): ```yaml services: docsfy: build: . ports: - "8000:8000" env_file: .env volumes: - ./data:/data ``` > **Tip:** For local HTTP-only development, set `SECURE_COOKIES=false` so session cookies are accepted without TLS. --- ## 6) Tests and CI/CD status when extending The repo has strong unit/integration coverage for prompt building, generation, rendering, auth, and storage. Pytest is configured in `pyproject.toml`: ```toml [tool.pytest.ini_options] asyncio_mode = "auto" testpaths = ["tests"] pythonpath = ["src"] ``` > **Warning:** There are currently no CI/CD workflow files in this repository (`.github/workflows` and `.gitlab-ci*` are absent). 
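When wiring a new provider (a hypothetical `codex`, say), a focused unit test in the style of the existing suite keeps the validation surface honest. Sketch only: `validate_provider` here is a stand-in for the inline API check shown above, not a function in the repo:

```python
SUPPORTED_PROVIDERS = ("claude", "gemini", "cursor")  # add "codex" here when wiring it


def validate_provider(ai_provider: str) -> None:
    """Stand-in for the API-level provider check in main.py."""
    if ai_provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Invalid AI provider: '{ai_provider}'")


def test_known_providers_accepted():
    for provider in SUPPORTED_PROVIDERS:
        validate_provider(provider)  # must not raise


def test_unknown_provider_rejected():
    try:
        validate_provider("codex")  # hypothetical provider, not yet wired
    except ValueError as exc:
        assert "codex" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

Remember that the provider name also appears in the `Literal` on `GenerateRequest` and in the template dropdowns, so a test like this only covers one of the three places that must change together.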
Run tests locally after extension changes: - `pytest` - Focused suites like `pytest tests/test_generator.py tests/test_renderer.py tests/test_main.py` for generation/rendering changes ---