# Core Concepts
docsfy organizes generated documentation around six core entities:
- Project: a repository identity (derived name + metadata).
- Variant: one generated output for a specific AI provider/model.
- Owner: the authenticated user who owns that project/variant namespace.
- Role: authorization level (`admin`, `user`, `viewer`).
- Session: login state via secure cookie and DB-backed expiry.
- Generated artifacts: cached markdown and rendered static site files.
> **Note:** In `docsfy`, project names are repository-centric, but storage and access are owner-scoped to avoid cross-user collisions.
## 1) Projects
A generation request must include exactly one source (`repo_url` or `repo_path`), and `project_name` is derived from that source.
```10:30:src/docsfy/models.py
class GenerateRequest(BaseModel):
    repo_url: str | None = Field(
        default=None, description="Git repository URL (HTTPS or SSH)"
    )
    repo_path: str | None = Field(default=None, description="Local git repository path")
    ai_provider: Literal["claude", "gemini", "cursor"] | None = None
    ai_model: str | None = None
    ai_cli_timeout: int | None = Field(default=None, gt=0)
    force: bool = Field(
        default=False, description="Force full regeneration, ignoring cache"
    )

    @model_validator(mode="after")
    def validate_source(self) -> GenerateRequest:
        if not self.repo_url and not self.repo_path:
            msg = "Either 'repo_url' or 'repo_path' must be provided"
            raise ValueError(msg)
        if self.repo_url and self.repo_path:
            msg = "Provide either 'repo_url' or 'repo_path', not both"
            raise ValueError(msg)
        return self
```
```55:64:src/docsfy/models.py
    @property
    def project_name(self) -> str:
        if self.repo_url:
            name = self.repo_url.rstrip("/").split("/")[-1]
            if name.endswith(".git"):
                name = name[:-4]
            return name
        if self.repo_path:
            return Path(self.repo_path).resolve().name
        return "unknown"
```
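The two excerpts above can be combined into one standalone function for experimentation. This is a minimal sketch, not the project's code: `derive_project_name` is a hypothetical helper that reproduces the exactly-one-source rule and the name derivation without depending on pydantic.

```python
from __future__ import annotations

from pathlib import Path


def derive_project_name(
    repo_url: str | None = None, repo_path: str | None = None
) -> str:
    """Hypothetical helper: validate the source, then derive the project name."""
    # Exactly one of the two sources must be supplied.
    if not repo_url and not repo_path:
        raise ValueError("Either 'repo_url' or 'repo_path' must be provided")
    if repo_url and repo_path:
        raise ValueError("Provide either 'repo_url' or 'repo_path', not both")

    if repo_url:
        # Last path segment of the URL, minus a trailing ".git".
        name = repo_url.rstrip("/").split("/")[-1]
        return name[:-4] if name.endswith(".git") else name

    # Local path: the final directory name after resolving the path.
    return Path(repo_path).resolve().name


print(derive_project_name(repo_url="https://github.com/example/docsfy.git"))
print(derive_project_name(repo_url="git@github.com:example/docsfy.git"))
```

Both calls print `docsfy`, which shows why the name alone cannot identify a project across users: two different repositories with the same final path segment map to the same name.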
Projects are tracked in SQLite with generation metadata (status, commit SHA, page count, plan JSON, timestamps).
```56:73:src/docsfy/storage.py
CREATE TABLE IF NOT EXISTS projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    current_stage TEXT,
    last_commit_sha TEXT,
    last_generated TEXT,
    page_count INTEGER DEFAULT 0,
    error_message TEXT,
    plan_json TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
```
## 2) Variants
A **variant** is one `(project, provider, model, owner)` tuple.
This is the real unit of generation, status, deletion, serving, and download.
```282:290:src/docsfy/storage.py
"""INSERT INTO projects (name, ai_provider, ai_model, owner, repo_url, status, updated_at)
VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
ON CONFLICT(name, ai_provider, ai_model, owner) DO UPDATE SET
    repo_url = excluded.repo_url,
    status = excluded.status,
    error_message = NULL,
    current_stage = NULL,
    updated_at = CURRENT_TIMESTAMP""",
(name, ai_provider, ai_model, owner, repo_url, status),
```
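To see how the composite key behaves, the upsert can be tried against an in-memory database. The sketch below is an illustration under stated assumptions: it uses the synchronous stdlib `sqlite3` module rather than the project's `aiosqlite`, a trimmed-down table, and invented sample values (`widgets`, `model-a`, `alice`, `bob`).

```python
from __future__ import annotations

import sqlite3

# Trimmed copy of the projects table: only the columns this demo needs.
SCHEMA = """
CREATE TABLE projects (
    name TEXT NOT NULL,
    ai_provider TEXT NOT NULL DEFAULT '',
    ai_model TEXT NOT NULL DEFAULT '',
    owner TEXT NOT NULL DEFAULT '',
    repo_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'generating',
    error_message TEXT,
    PRIMARY KEY (name, ai_provider, ai_model, owner)
)
"""

UPSERT = """
INSERT INTO projects (name, ai_provider, ai_model, owner, repo_url, status)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT(name, ai_provider, ai_model, owner) DO UPDATE SET
    repo_url = excluded.repo_url,
    status = excluded.status,
    error_message = NULL
"""


def demo() -> list[tuple[str, str, str]]:
    """Insert four rows and return (owner, ai_model, status) per stored variant."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    url = "https://example.com/acme/widgets.git"  # illustrative value
    rows = [
        ("widgets", "claude", "model-a", "alice", url, "generating"),
        ("widgets", "claude", "model-a", "alice", url, "ready"),       # same key: update
        ("widgets", "claude", "model-b", "alice", url, "generating"),  # new model: new row
        ("widgets", "claude", "model-a", "bob", url, "generating"),    # new owner: new row
    ]
    for row in rows:
        db.execute(UPSERT, row)
    result = db.execute(
        "SELECT owner, ai_model, status FROM projects ORDER BY owner, ai_model"
    ).fetchall()
    db.close()
    return result


print(demo())
```

Four writes leave three rows: the second write hits the existing `(widgets, claude, model-a, alice)` key and only updates its status, while a different model or a different owner each create a separate variant.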
Variant-specific API/docs routes are explicit:
```1019:1041:src/docsfy/main.py
@app.get("/api/projects/{name}/{provider}/{model}")
async def get_variant_details(
    request: Request,
    name: str,
    provider: str,
    model: str,
) -> dict[str, str | int | None]:
    name = _validate_project_name(name)
    project = await _resolve_project(
        request, name, ai_provider=provider, ai_model=model
    )
    return project


@app.delete("/api/projects/{name}/{provider}/{model}")
async def delete_variant(
    request: Request,
    name: str,
    provider: str,
    model: str,
) -> dict[str, str]:
```
```1379:1386:src/docsfy/main.py
@app.get("/docs/{project}/{provider}/{model}/{path:path}")
async def serve_variant_docs(
    request: Request,
    project: str,
    provider: str,
    model: str,
    path: str = "index.html",
) -> FileResponse:
```
## 3) Owners
Owner is set from the authenticated username at generation time:
```457:484:src/docsfy/main.py
project_name = gen_request.project_name
owner = request.state.username

if ai_provider not in ("claude", "gemini", "cursor"):
    raise HTTPException(
        status_code=400,
        detail=f"Invalid AI provider: '{ai_provider}'. Must be claude, gemini, or cursor.",
    )
if not ai_model:
    raise HTTPException(status_code=400, detail="AI model must be specified.")

# Fix 6: Use lock to prevent race condition between check and add
gen_key = f"{owner}/{project_name}/{ai_provider}/{ai_model}"
async with _gen_lock:
    if gen_key in _generating:
        raise HTTPException(
            status_code=409,
            detail=f"Variant '{project_name}/{ai_provider}/{ai_model}' is already being generated",
        )

    await save_project(
        name=project_name,
        repo_url=gen_request.repo_url or gen_request.repo_path or "",
        status="generating",
        ai_provider=ai_provider,
        ai_model=ai_model,
        owner=owner,
    )
```
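The check-then-add pattern guarded by a lock can be isolated as follows. This is a sketch under assumptions: `GenerationRegistry` and `AlreadyGeneratingError` are hypothetical names, and the in-flight collection is modelled as a plain set, which may differ from how `_generating` is actually stored.

```python
from __future__ import annotations

import asyncio


class AlreadyGeneratingError(RuntimeError):
    """Hypothetical stand-in for the HTTP 409 raised in the excerpt above."""


class GenerationRegistry:
    """Tracks in-flight variants; the lock makes check-and-add atomic."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._generating: set[str] = set()  # assumption: a set of keys

    async def claim(self, owner: str, project: str, provider: str, model: str) -> str:
        gen_key = f"{owner}/{project}/{provider}/{model}"
        async with self._lock:
            # Without the lock, two requests could both pass this check
            # before either one records its key.
            if gen_key in self._generating:
                raise AlreadyGeneratingError(gen_key)
            self._generating.add(gen_key)
        return gen_key

    async def release(self, gen_key: str) -> None:
        async with self._lock:
            self._generating.discard(gen_key)


async def main() -> None:
    registry = GenerationRegistry()
    key = await registry.claim("alice", "widgets", "claude", "model-a")
    try:
        await registry.claim("alice", "widgets", "claude", "model-a")
    except AlreadyGeneratingError as exc:
        print(f"rejected duplicate: {exc}")
    # A different owner yields a different key, so this claim succeeds.
    await registry.claim("bob", "widgets", "claude", "model-a")
    await registry.release(key)


asyncio.run(main())
```

Because the owner is the first segment of the key, two users can generate the same repository with the same provider and model at the same time without blocking each other.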
Owner is also part of filesystem layout:
```501:519:src/docsfy/storage.py
def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    if not ai_provider or not ai_model:
        msg = "ai_provider and ai_model are required for project directory paths"
        raise ValueError(msg)
    # Sanitize path segments to prevent traversal
    for segment_name, segment in [("ai_provider", ai_provider), ("ai_model", ai_model)]:
        if (
            "/" in segment
            or "\\" in segment
            or ".." in segment
            or segment.startswith(".")
        ):
            msg = f"Invalid {segment_name}: '{segment}'"
            raise ValueError(msg)
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model
```
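The resulting layout and the traversal checks can be exercised in isolation. The sketch below makes several assumptions: `PROJECTS_DIR` is set to an illustrative `/data/projects`, `variant_dir` is a hypothetical function, and the same segment check is applied to the owner and project name because `_validate_owner` and `_validate_name` are not shown here and may enforce different rules.

```python
from __future__ import annotations

from pathlib import Path

PROJECTS_DIR = Path("/data/projects")  # assumption: illustrative base directory


def _is_safe_segment(segment: str) -> bool:
    """Reject empty segments, path separators, '..', and leading dots."""
    return bool(segment) and not (
        "/" in segment
        or "\\" in segment
        or ".." in segment
        or segment.startswith(".")
    )


def variant_dir(owner: str, name: str, ai_provider: str, ai_model: str) -> Path:
    """Hypothetical helper: build owner/project/provider/model under PROJECTS_DIR."""
    segments = [
        ("owner", owner),
        ("name", name),
        ("ai_provider", ai_provider),
        ("ai_model", ai_model),
    ]
    for label, segment in segments:
        if not _is_safe_segment(segment):
            raise ValueError(f"Invalid {label}: '{segment}'")
    return PROJECTS_DIR / owner / name / ai_provider / ai_model


print(variant_dir("alice", "widgets", "claude", "model-a"))
```

A call such as `variant_dir("alice", "widgets", "claude", "../../etc")` raises `ValueError` instead of producing a path outside the base directory.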
Cross-owner sharing is controlled through `project_access` and scoped by `(project_name, project_owner, username)`.
```237:243:src/docsfy/storage.py
CREATE TABLE IF NOT EXISTS project_access (
    project_name TEXT NOT NULL,
    project_owner TEXT NOT NULL DEFAULT '',
    username TEXT NOT NULL,
    PRIMARY KEY (project_name, project_owner, username)
)
```
> **Warning:** For admin users, if multiple owners have the same variant `(name/provider/model)`, owner is ambiguous and some variant routes return `409` until disambiguated.
```241:246:src/docsfy/main.py
if len(distinct_owners) > 1:
    raise HTTPException(
        status_code=409,
        detail="Multiple owners found for this variant, please specify owner",
    )
```
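The ambiguity rule can be sketched as a small lookup function. Everything here is illustrative: `resolve_variant`, the exception classes, and the optional `owner` argument are hypothetical, and how a caller actually supplies the owner to the real routes is not shown in this section.

```python
from __future__ import annotations


class VariantNotFoundError(LookupError):
    """Hypothetical: no matching variant (would map to a 404)."""


class AmbiguousOwnerError(Exception):
    """Hypothetical: several owners match (would map to the 409 above)."""

    status_code = 409


def resolve_variant(rows: list[dict[str, str]], owner: str | None = None) -> dict[str, str]:
    """Pick one variant row, refusing to guess between owners."""
    candidates = [row for row in rows if owner is None or row["owner"] == owner]
    if not candidates:
        raise VariantNotFoundError("variant not found")
    distinct_owners = {row["owner"] for row in candidates}
    if len(distinct_owners) > 1:
        raise AmbiguousOwnerError(
            "Multiple owners found for this variant, please specify owner"
        )
    return candidates[0]


rows = [
    {"name": "widgets", "owner": "alice"},
    {"name": "widgets", "owner": "bob"},
]
print(resolve_variant(rows, owner="bob"))
```

Calling `resolve_variant(rows)` without an owner raises `AmbiguousOwnerError`, which mirrors the admin case where both owners' rows are visible.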
## 4) Roles
docsfy defines three roles:
- admin: full access, including user and access management endpoints.
- user: read/write project operations (generate, abort, delete) within accessible scope.
- viewer: read-only access (dashboard/docs/download/status), no write operations.
```609:623:src/docsfy/storage.py
VALID_ROLES = frozenset({"admin", "user", "viewer"})


async def create_user(username: str, role: str = "user") -> tuple[str, str]:
    """Create a user and return (username, raw_api_key)."""
    if username.lower() == "admin":
        msg = "Username 'admin' is reserved"
        raise ValueError(msg)
    if not re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$", username):
        msg = f"Invalid username: '{username}'. Must be 2-50 alphanumeric characters, dots, hyphens, underscores."
        raise ValueError(msg)
    if role not in VALID_ROLES:
        msg = f"Invalid role: '{role}'. Must be admin, user, or viewer."
        raise ValueError(msg)
```
```185:191:src/docsfy/main.py
def _require_write_access(request: Request) -> None:
    """Raise 403 if user is a viewer (read-only)."""
    if request.state.role not in ("admin", "user"):
        raise HTTPException(
            status_code=403,
            detail="Write access required.",
        )
```
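The username, role, and write-access rules above can be tried without a database or web framework. In this sketch, `check_new_user` and `can_write` are hypothetical helpers that reuse the constants and regular expression from the excerpts; they illustrate the rules and are not the project's API.

```python
from __future__ import annotations

import re

VALID_ROLES = frozenset({"admin", "user", "viewer"})
WRITE_ROLES = frozenset({"admin", "user"})
# First character alphanumeric, then 1-49 more from the allowed set (2-50 total).
USERNAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]{1,49}$")


def check_new_user(username: str, role: str = "user") -> None:
    """Hypothetical helper: raise ValueError if the username or role is rejected."""
    if username.lower() == "admin":
        raise ValueError("Username 'admin' is reserved")
    if not USERNAME_RE.match(username):
        raise ValueError(f"Invalid username: '{username}'")
    if role not in VALID_ROLES:
        raise ValueError(f"Invalid role: '{role}'")


def can_write(role: str) -> bool:
    """Viewers are read-only; admins and users may generate, abort, and delete."""
    return role in WRITE_ROLES


for candidate in ("alice", "a", ".bob", "Admin"):
    try:
        check_new_user(candidate)
        print(f"{candidate!r}: accepted")
    except ValueError as exc:
        print(f"{candidate!r}: rejected ({exc})")
```

Only `alice` is accepted: `a` is too short, `.bob` starts with a dot, and `Admin` matches the reserved name case-insensitively.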
## 5) Sessions
Authentication supports both:
- `Authorization: Bearer ...` (admin key or user API key)
- `docsfy_session` cookie (browser login flow)
```122:137:src/docsfy/main.py
# 1. Check Authorization header (API clients)
auth_header = request.headers.get("authorization", "")
if auth_header.startswith("Bearer "):
    token = auth_header[7:]
    if token == settings.admin_key:
        is_admin = True
        username = "admin"
    else:
        user = await get_user_by_key(token)

# 2. Check session cookie (browser) -- opaque session token
if not user and not is_admin:
    session_token = request.cookies.get("docsfy_session")
    if session_token:
        session = await get_session(session_token)
```
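The lookup order (bearer token first, session cookie second) can be modelled with plain dictionaries. This is a sketch under assumptions: `resolve_identity` is hypothetical, the `api_keys` and `sessions` mappings stand in for the database lookups, and header keys are assumed to be lowercase because a plain `dict` is not case-insensitive the way a real request's headers are.

```python
from __future__ import annotations

from typing import Mapping, Optional, Tuple

Identity = Tuple[str, bool]  # (username, is_admin)


def resolve_identity(
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    *,
    admin_key: str,
    api_keys: Mapping[str, str],       # assumption: raw key -> username
    sessions: Mapping[str, Identity],  # assumption: session token -> identity
) -> Optional[Identity]:
    # 1. Authorization header (API clients)
    auth_header = headers.get("authorization", "")
    if auth_header.startswith("Bearer "):
        token = auth_header[7:]
        if token == admin_key:
            return ("admin", True)
        username = api_keys.get(token)
        if username is not None:
            return (username, False)

    # 2. Session cookie (browser), only reached if the header did not resolve
    session_token = cookies.get("docsfy_session")
    if session_token:
        return sessions.get(session_token)
    return None


identity = resolve_identity(
    {"authorization": "Bearer key-123"},
    {},
    admin_key="admin-secret",
    api_keys={"key-123": "alice"},
    sessions={},
)
print(identity)
```

The example prints `('alice', False)`. A request carrying an unknown bearer token still falls through to the cookie check, matching the `if not user and not is_admin` condition in the excerpt.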
Sessions are opaque tokens, hashed at rest, and expire after 8 hours.
```21:23:src/docsfy/storage.py
SESSION_TTL_SECONDS = 28800  # 8 hours
SESSION_TTL_HOURS = SESSION_TTL_SECONDS // 3600
```
```686:713:src/docsfy/storage.py
async def create_session(
    username: str, is_admin: bool = False, ttl_hours: int = SESSION_TTL_HOURS
) -> str:
    """Create an opaque session token."""
    token = secrets.token_urlsafe(32)
    token_hash = _hash_session_token(token)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    expires_str = expires_at.strftime("%Y-%m-%d %H:%M:%S")
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            "INSERT INTO sessions (token, username, is_admin, expires_at) VALUES (?, ?, ?, ?)",
            (token_hash, username, 1 if is_admin else 0, expires_str),
        )
        await db.commit()
    return token
```
```297:304:src/docsfy/main.py
response.set_cookie(
    "docsfy_session",
    session_token,
    httponly=True,
    samesite="strict",
    secure=settings.secure_cookies,
    max_age=SESSION_TTL_SECONDS,
)
```
> **Tip:** Keep `SECURE_COOKIES` enabled in production. Only set it to `false` for local HTTP development.
```27:28:.env.example
# Set to false for local HTTP development
SECURE_COOKIES=false
```
## 6) Generated Artifacts
Each completed variant writes structured outputs under `owner/project/provider/model`:
- `plan.json` (navigation plan used for rendering and status UI)
- `cache/pages/*.md` (cached AI markdown for incremental regeneration)
- `site/` (served static docs)
Site generation includes HTML, markdown copies, search index, and LLM-friendly files:
```223:290:src/docsfy/renderer.py
# Prevent GitHub Pages from running Jekyll
(output_dir / ".nojekyll").touch()
project_name: str = plan.get("project_name", "Documentation")
tagline: str = plan.get("tagline", "")
navigation: list[dict[str, Any]] = plan.get("navigation", [])
repo_url: str = plan.get("repo_url", "")
# ...
(output_dir / "index.html").write_text(index_html, encoding="utf-8")
# ...
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")
search_index = _build_search_index(valid_pages, plan)
(output_dir / "search-index.json").write_text(
    json.dumps(search_index), encoding="utf-8"
)
# Generate llms.txt files
llms_txt = _build_llms_txt(plan)
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")
llms_full_txt = _build_llms_full_txt(plan, valid_pages)
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```
The orchestration layer persists the plan and final status:
```998:1015:src/docsfy/main.py
site_dir = get_project_site_dir(project_name, ai_provider, ai_model, owner)
render_site(plan=plan, pages=pages, output_dir=site_dir)

project_dir = get_project_dir(project_name, ai_provider, ai_model, owner)
(project_dir / "plan.json").write_text(json.dumps(plan, indent=2), encoding="utf-8")

page_count = len(pages)
await update_project_status(
    project_name,
    ai_provider,
    ai_model,
    status="ready",
    owner=owner,
    current_stage=None,
    last_commit_sha=commit_sha,
    page_count=page_count,
    plan_json=json.dumps(plan),
)
```
Persistent storage is typically mounted to `/data`:
```1:10:docker-compose.yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
```
## 7) CI/CD and Quality Gate Context
This repository currently has no checked-in `.github` workflow directory, but quality checks are still codified via local/CI-capable tooling:
```1:7:tox.toml
skipsdist = true

envlist = ["unittests"]

[env.unittests]
deps = ["uv"]
commands = [["uv", "run", "--extra", "dev", "pytest", "-n", "auto", "tests"]]
```
```43:61:.pre-commit-config.yaml
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
    hooks:
      - id: ruff
      - id: ruff-format

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
    hooks:
      - id: gitleaks

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.19.1
    hooks:
      - id: mypy
```
In practice, these concepts fit together as:
- Authenticated user (owner + role) submits generation request.
- Request creates/updates a project variant.
- Background pipeline plans, generates, renders artifacts.
- Session-scoped or bearer-scoped access controls who can view/manage each variant.
- Static artifacts are served directly or downloaded as `.tar.gz`.
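The download step in the last bullet amounts to packing a variant's `site/` directory into a gzip-compressed tarball. The sketch below shows one way to do that with the standard library; `archive_site` is a hypothetical helper, and the archive name and internal layout are assumptions rather than what the download endpoint actually produces.

```python
from __future__ import annotations

import tarfile
import tempfile
from pathlib import Path


def archive_site(site_dir: Path, archive_path: Path) -> Path:
    """Hypothetical helper: pack site_dir into a .tar.gz at archive_path."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative ("site/index.html")
        # instead of embedding the absolute filesystem path.
        tar.add(site_dir, arcname=site_dir.name)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "site"
    site.mkdir()
    (site / "index.html").write_text("<h1>Docs</h1>", encoding="utf-8")
    (site / "llms.txt").write_text("# Docs\n", encoding="utf-8")

    # The archive is written next to the site directory, not inside it.
    archive = archive_site(site, Path(tmp) / "widgets-docs.tar.gz")
    with tarfile.open(archive, "r:gz") as tar:
        print(sorted(tar.getnames()))
```

Writing the archive outside the directory being packed avoids the tarball trying to include itself.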