# Backup and Recovery

docsfy persists operational state in `DATA_DIR` and expects both the SQLite metadata and the generated artifacts to remain available together.
## Where data is stored

`DATA_DIR` is configured via settings and passed into DB initialization at startup:
```python
class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )

    admin_key: str = ""  # Required — validated at startup
    ai_provider: str = "claude"
    ai_model: str = "claude-opus-4-6[1m]"  # [1m] = 1 million token context window
    ai_cli_timeout: int = Field(default=60, gt=0)
    log_level: str = "INFO"
    data_dir: str = "/data"
    secure_cookies: bool = True  # Set to False for local HTTP dev
```

```python
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    settings = get_settings()
    ...
    await init_db(data_dir=settings.data_dir)
    await cleanup_expired_sessions()
    yield
```
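For reference, these settings can come from the environment or from the `.env` file pydantic reads; a minimal sketch with illustrative values (pydantic maps field names to environment variables case-insensitively):

```ini
# .env (illustrative values; admin_key is required at startup)
ADMIN_KEY=change-me
DATA_DIR=/data
SECURE_COOKIES=true
```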
`storage.py` resolves concrete paths from `DATA_DIR`:

```python
DB_PATH = Path(os.getenv("DATA_DIR", "/data")) / "docsfy.db"
DATA_DIR = Path(os.getenv("DATA_DIR", "/data"))
PROJECTS_DIR = DATA_DIR / "projects"
```
Project artifacts are namespaced by owner/project/provider/model:

```python
def _validate_owner(owner: str) -> str:
    """Validate owner segment to prevent path traversal."""
    if not owner:
        return "_default"
    if "/" in owner or "\\" in owner or ".." in owner or owner.startswith("."):
        msg = f"Invalid owner: '{owner}'"
        raise ValueError(msg)
    return owner


def get_project_dir(
    name: str, ai_provider: str = "", ai_model: str = "", owner: str = ""
) -> Path:
    ...
    safe_owner = _validate_owner(owner)
    return PROJECTS_DIR / safe_owner / _validate_name(name) / ai_provider / ai_model


def get_project_site_dir(...):
    return get_project_dir(name, ai_provider, ai_model, owner) / "site"


def get_project_cache_dir(...):
    return get_project_dir(name, ai_provider, ai_model, owner) / "cache" / "pages"
```
Expected layout:

```text
DATA_DIR/
  docsfy.db
  projects/
    <owner-or-_default>/
      <project-name>/
        <ai-provider>/
          <ai-model>/
            plan.json
            cache/
              pages/
                *.md
            site/
              index.html
              *.html
              *.md
              search-index.json
              llms.txt
              llms-full.txt
              .nojekyll
              assets/*
```
## What to back up

Back up both:

1. `DATA_DIR/docsfy.db`
2. `DATA_DIR/projects/` (all owners/projects/variants)
SQLite holds project state plus auth/session/access data:

```python
await db.execute("""
    CREATE TABLE IF NOT EXISTS users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT UNIQUE NOT NULL,
        api_key_hash TEXT NOT NULL UNIQUE,
        role TEXT NOT NULL DEFAULT 'user',
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")
await db.execute("""
    CREATE TABLE IF NOT EXISTS project_access (
        project_name TEXT NOT NULL,
        project_owner TEXT NOT NULL DEFAULT '',
        username TEXT NOT NULL,
        PRIMARY KEY (project_name, project_owner, username)
    )
""")
await db.execute("""
    CREATE TABLE IF NOT EXISTS sessions (
        token TEXT PRIMARY KEY,
        username TEXT NOT NULL,
        is_admin INTEGER NOT NULL DEFAULT 0,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        expires_at TIMESTAMP NOT NULL
    )
""")
```
Generated docs and indexes are written into each variant's `site/` directory:

```python
if output_dir.exists():
    shutil.rmtree(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
assets_dir = output_dir / "assets"
assets_dir.mkdir(exist_ok=True)
(output_dir / ".nojekyll").touch()

(output_dir / "index.html").write_text(index_html, encoding="utf-8")
(output_dir / f"{slug}.html").write_text(page_html, encoding="utf-8")
(output_dir / f"{slug}.md").write_text(md_content, encoding="utf-8")
(output_dir / "search-index.json").write_text(json.dumps(search_index), encoding="utf-8")
(output_dir / "llms.txt").write_text(llms_txt, encoding="utf-8")
(output_dir / "llms-full.txt").write_text(llms_full_txt, encoding="utf-8")
```
Warning: Backing up only `docsfy.db` or only `projects/` can produce mismatches (metadata points to missing files, or files exist without matching DB rows).
## Deployment persistence configuration

Containerized deployments should persist `/data` externally. Example from `docker-compose.yaml`:

```yaml
services:
  docsfy:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./data:/data
```
The repository's ignore rules also keep runtime data out of version control:

```gitignore
# Data
data/
.dev/data/
```
## Recommended backup procedure

- Quiesce writes (stop docsfy, or ensure no generation is in progress).
- Snapshot/copy the entire `DATA_DIR` atomically if possible.
- Store versioned backups (daily full + retention policy).
- Test restore periodically in a non-production environment.
Tip: In Docker Compose setups, backing up the host `./data` directory captures both `docsfy.db` and all generated variant artifacts, because it maps directly to `/data`.
## Recovery procedure

- Stop docsfy.
- Restore `DATA_DIR` from the same backup set (`docsfy.db` + `projects/`).
- Start docsfy and let startup run DB initialization/migrations.
- Validate project status and docs serving.
Startup recovery behavior includes schema migration and handling interrupted generations:

```python
# Migration: convert old 3-column PK table to 4-column PK (with owner)
...
logger.info(
    "Migrating database to 4-column PK schema (name, ai_provider, ai_model, owner)"
)
...
await db.execute("ALTER TABLE projects_new RENAME TO projects")

# Reset orphaned "generating" projects from previous server run
cursor = await db.execute(
    "UPDATE projects SET status = 'error', error_message = 'Server restarted during generation', current_stage = NULL WHERE status = 'generating'"
)
```
Note: After restore/restart, variants that were `generating` when the backup was taken are intentionally transitioned to `error` with the message `Server restarted during generation`.
## Variant/site export (supplemental backup)

docsfy can export rendered docs as `.tar.gz` archives through API endpoints:

```python
@app.get("/api/projects/{name}/{provider}/{model}/download")
...
with tarfile.open(tar_path, mode="w:gz") as tar:
    tar.add(str(site_dir), arcname=f"{name}-{provider}-{model}")


@app.get("/api/projects/{name}/download")
...
with tarfile.open(tar_path, mode="w:gz") as tar:
    tar.add(str(site_dir), arcname=name)
```
Use these as supplemental exports, not as full disaster-recovery backups.
Note: Download endpoints package the `site/` output only; they do not include SQLite metadata (`docsfy.db`), `cache/pages`, or `plan.json`.
## Destructive operations to account for

Generation and delete operations remove data on disk:

```python
if force:
    cache_dir = get_project_cache_dir(project_name, ai_provider, ai_model, owner)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
    logger.info(f"[{project_name}] Cleared cache (force=True)")
...
project_dir = get_project_dir(name, provider, model, project_owner)
if project_dir.exists():
    shutil.rmtree(project_dir)
```
And each render replaces the full `site/` directory:

```python
if output_dir.exists():
    shutil.rmtree(output_dir)
```
Warning: `DELETE` endpoints and re-render operations are destructive on disk; recovery requires restoring from backup or regenerating from source repositories.