Backup and Recovery

docsfy persists operational state under DATA_DIR and expects the SQLite metadata and the generated artifacts to remain available together; they should be backed up and restored as one unit.

Where data is stored

DATA_DIR is configured via settings and passed into DB initialization at startup:

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )

    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    settings = get_settings()
    ...
    await init_db(data_dir=settings.data_dir)
    await cleanup_expired_sessions()
    yield

storage.py resolves concrete paths from DATA_DIR:

DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db"
DATA_DIR = Path(os.getenv("DATA_DIR", "/data"))
PROJECTS_DIR = DATA_DIR / "projects"

Project artifacts are namespaced by owner/project/provider/model:

def _validate_owner(owner: str) -> str:
    """Validate owner segment to prevent path traversal."""
    if not owner:
        return "_default"
    if "/" in owner or "\\" in owner or ".." in owner or owner.startswith("."):
        msg = f"Invalid owner: '{owner}'"
        raise ValueError(msg)
    return owner

def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    ...
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model

def get_project_site_dir(...):
    return get_project_dir(name, ai_provider, ai_model, owner) / "site"

def get_project_cache_dir(...):
    return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages"

Expected layout:

DATA_DIR/
  docsfy.db
  projects/
    <owner-or-_default>/
      <project-name>/
        <ai-provider>/
          <ai-model>/
            plan.json
            cache/
              pages/
                *.md
            site/
              index.html
              *.html
              *.md
              search-index.json
              llms.txt
              llms-full.txt
              .nojekyll
              assets/*

What to back up

Back up both:

  1. DATA_DIR/docsfy.db
  2. DATA_DIR/projects/ (all owners/projects/variants)

SQLite holds project state plus auth/session/access data:

await db.execute("""
    CREATE TABLE IF NOT EXISTS users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT UNIQUE NOT NULL,
        api_key_hash TEXT NOT NULL UNIQUE,
        role TEXT NOT NULL DEFAULT 'user',
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

await db.execute("""
    CREATE TABLE IF NOT EXISTS project_access (
        project_name TEXT NOT NULL,
        project_owner TEXT NOT NULL DEFAULT '',
        username TEXT NOT NULL,
        PRIMARY KEY (project_name, project_owner, username)
    )
""")

await db.execute("""
    CREATE TABLE IF NOT EXISTS sessions (
        token TEXT PRIMARY KEY,
        username TEXT NOT NULL,
        is_admin INTEGER NOT NULL DEFAULT 0,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        expires_at TIMESTAMP NOT NULL
    )
""")

Generated docs and indexes are written into each variant’s site/ directory:

if output_dir.exists():
    shutil.rmtree(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
assets_dir = output_dir / "assets"
assets_dir.mkdir(exist_ok=True)

(output_dir / ".nojekyll").touch()
(output_dir / "index.html").write_text(index_html, encoding="utf-8")
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")
(output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8")
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")

Warning: Backing up only docsfy.db or only projects/ can produce mismatches (metadata points to missing files, or files exist without matching DB rows).
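A consistency check can catch such mismatches by diffing the projects table against the directory tree. This is a sketch against a minimal stand-in schema — the real projects table has more columns, but its 4-column key (name, ai_provider, ai_model, owner) matches the source; an empty owner column maps to the "_default" directory segment:

```python
import sqlite3
import tempfile
from pathlib import Path

def find_mismatches(db_path: Path, projects_dir: Path):
    """Return (rows missing on disk, directories missing in the DB)."""
    con = sqlite3.connect(db_path)
    rows = {
        (owner or "_default", name, provider, model)  # '' owner -> '_default' dir
        for name, provider, model, owner in con.execute(
            "SELECT name, ai_provider, ai_model, owner FROM projects"
        )
    }
    con.close()
    dirs = {
        (d.parents[2].name, d.parents[1].name, d.parent.name, d.name)
        for d in projects_dir.glob("*/*/*/*") if d.is_dir()
    }
    return rows - dirs, dirs - rows

# Demo: one DB row, no matching directory on disk.
data_dir = Path(tempfile.mkdtemp())
(data_dir / "projects").mkdir()
con = sqlite3.connect(data_dir / "docsfy.db")
con.execute("CREATE TABLE projects (name TEXT, ai_provider TEXT, ai_model TEXT, owner TEXT)")
con.execute("INSERT INTO projects VALUES ('demo', 'claude', 'model-a', '')")
con.commit(); con.close()

missing_on_disk, missing_in_db = find_mismatches(data_dir / "docsfy.db", data_dir / "projects")
print(missing_on_disk)
```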

Deployment persistence configuration

Containerized deployments should persist /data externally. Example from docker-compose.yaml:

services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data

The repository's ignore rules also keep runtime data out of version control:

# Data
data/
.dev/data/

Backup procedure

  1. Quiesce writes (stop docsfy, or ensure no generation is in progress).
  2. Snapshot/copy the entire DATA_DIR atomically if possible.
  3. Store versioned backups (daily full + retention policy).
  4. Test restore periodically in a non-production environment.
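The steps above can be sketched as a small script. It uses Python's sqlite3 online-backup API for a consistent copy of docsfy.db and a plain copytree for projects/; quiescing writes (step 1) still matters so the artifact tree does not drift mid-copy:

```python
import shutil
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def backup_data_dir(data_dir: Path, backup_root: Path) -> Path:
    """Copy docsfy.db (via the SQLite online-backup API) and projects/ together."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = backup_root / f"docsfy-backup-{stamp}"
    dest.mkdir(parents=True)
    # Connection.backup() yields a consistent DB copy even if the file is open.
    src = sqlite3.connect(data_dir / "docsfy.db")
    dst = sqlite3.connect(dest / "docsfy.db")
    src.backup(dst)
    src.close()
    dst.close()
    shutil.copytree(data_dir / "projects", dest / "projects")
    return dest

# Demo against a throwaway DATA_DIR:
data_dir = Path(tempfile.mkdtemp())
(data_dir / "projects" / "_default").mkdir(parents=True)
con = sqlite3.connect(data_dir / "docsfy.db")
con.execute("CREATE TABLE t (x)")
con.commit(); con.close()
dest = backup_data_dir(data_dir, Path(tempfile.mkdtemp()))
```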

Tip: In Docker Compose setups, backing up host ./data captures both docsfy.db and all generated variant artifacts because it maps directly to /data.

Recovery procedure

  1. Stop docsfy.
  2. Restore DATA_DIR from the same backup set (docsfy.db + projects/).
  3. Start docsfy and let startup run DB initialization/migrations.
  4. Validate project status and docs serving.
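Before step 2, it is worth verifying that the backup set is complete, since restoring only one half reintroduces exactly the mismatch warned about earlier. A minimal sketch:

```python
import tempfile
from pathlib import Path

def check_backup_set(backup_dir: Path) -> list[str]:
    """Return problems that would make a restore inconsistent."""
    problems = []
    if not (backup_dir / "docsfy.db").is_file():
        problems.append("missing docsfy.db (SQLite metadata)")
    if not (backup_dir / "projects").is_dir():
        problems.append("missing projects/ (generated artifacts)")
    return problems

# Demo: a backup containing only the DB fails the check.
incomplete = Path(tempfile.mkdtemp())
(incomplete / "docsfy.db").touch()
print(check_backup_set(incomplete))
```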

Startup recovery behavior includes schema migration and handling interrupted generations:

# Migration: convert old 3-column PK table to 4-column PK (with owner)
...
logger.info(
    "Migrating database to 4-column PK schema (name, ai_provider, ai_model, owner)"
)
...
await db.execute("ALTER TABLE projects_new RENAME TO projects")

# Reset orphaned "generating" projects from previous server run
cursor = await db.execute(
    "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'"
)

Note: After restore/restart, variants that were generating when the backup was taken are intentionally transitioned to status 'error' with the message 'Server restarted during generation'.

Variant/site export (supplemental backup)

docsfy can export rendered docs as .tar.gz through API endpoints:

@app.get("/api/projects/{name}/{provider}/{model}/download")
...
with tarfile.open(tar_path, mode="w:gz") as tar:
    tar.add(str(site_dir), arcname=f"{name}-{provider}-{model}")

@app.get("/api/projects/{name}/{provider}/{model}/download")
...
with tarfile.open(tar_path, mode="w:gz") as tar:
    tar.add(str(site_dir), arcname=name)

Use these as supplemental exports, not as full disaster-recovery backups.

Note: Download endpoints package the site/ output only; they do not include SQLite metadata (docsfy.db), cache/pages, or plan.json.
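For scripted exports, the per-variant download URLs can be derived from any variant listing. A sketch using the endpoint path shown above (host and port are illustrative; authentication, if configured, must be supplied separately):

```python
def export_urls(base_url: str, variants) -> list[str]:
    """Build download URLs for the per-variant export endpoint."""
    return [
        f"{base_url}/api/projects/{name}/{provider}/{model}/download"
        for (name, provider, model) in variants
    ]

urls = export_urls("http://localhost:8000", [("demo", "claude", "model-a")])
print(urls[0])
```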

Destructive operations to account for

Generation and delete operations remove data on disk:

if force:
    cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
        logger.info(f"[{project_name}] Cleared cache (force=True)")
...
project_dir = get_project_dir(name, provider, model, project_owner)
if project_dir.exists():
    shutil.rmtree(project_dir)

And each render replaces the full site directory:

if output_dir.exists():
    shutil.rmtree(output_dir)

Warning: DELETE endpoints and re-render operations are destructive on disk; recovery requires restoring from backup or regenerating from source repositories.
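Before invoking a destructive delete or forced re-render, a pre-operation snapshot of the variant directory offers a cheap local fallback. A sketch mirroring the tarfile usage shown above:

```python
import tarfile
import tempfile
from pathlib import Path

def snapshot_variant(project_dir: Path, out_dir: Path) -> Path:
    """Archive a variant directory before a destructive delete or re-render."""
    tar_path = out_dir / f"{project_dir.name}-snapshot.tar.gz"
    with tarfile.open(tar_path, mode="w:gz") as tar:
        tar.add(str(project_dir), arcname=project_dir.name)
    return tar_path

# Demo: snapshot a throwaway variant dir; after this, rmtree is recoverable.
variant = Path(tempfile.mkdtemp()) / "model-a"
(variant / "site").mkdir(parents=True)
(variant / "site" / "index.html").write_text("<html></html>")
tar_path = snapshot_variant(variant, Path(tempfile.mkdtemp()))
```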