
Future Plans

Planned work across aaiclick, ordered by priority.


Medium Priority

ClickHouse Migration Framework

aaiclick has no migration system for the ClickHouse side. Alembic manages the SQL schema (jobs, tasks, dependencies, registered_jobs, table_registry, …), but the ClickHouse tables created via the ChClient (operation_log and all p_* / t_* / j_* data tables produced at runtime) are created with CREATE TABLE IF NOT EXISTS in aaiclick/oplog/models.py plus a column-existence validator. There are no versions, no history, and no upgrade path.

The consequence: any DDL change in the Python source that would need to alter an existing table is silently a no-op on installs that already have it. Today this has bitten the operation_log ORDER BY change; it will keep biting every time anything structural changes on the CH side. Column types, new required columns, MergeTree key changes, TTL clauses, materialized projections, etc. all need a coordinated server-side update that the current setup cannot perform.

Also relevant: ClickHouse's own ALTER TABLE is limited. MODIFY ORDER BY can only append freshly added columns to the sort key; existing key columns cannot be reshaped without rebuilding the table. So even a "real" migration framework has to handle per-change execution strategies (pure ALTER, shadow-table rebuild, or drop-and-recreate with a manual data move), not just act as a linear script runner.

What a minimal framework would look like:

  • A schema_version table in ClickHouse tracked per-database.
  • Versioned DDL scripts under aaiclick/oplog/migrations/ (or a broader aaiclick/ch_migrations/) applied in order by init_oplog_tables() on startup.
  • Each script declares its own execution strategy — inline ALTER, shadow-table rewrite, or a Python callable for data-move logic.
  • A --dry-run mode for operators.
  • Column validator (_validate_schema) grows a version check and surfaces a clear error ("your table is at v3, code expects v5, run aaiclick migrate").
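The pieces listed above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not aaiclick's implementation: Strategy, Migration, FakeClient (a stand-in for ChClient that only records statements), pending, apply_migrations, and check_version are all hypothetical names, and the version is held in memory instead of a real schema_version table.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Strategy(Enum):
    ALTER = "alter"            # plain ALTER TABLE statements
    SHADOW_REBUILD = "shadow"  # create a new table, copy rows, swap names
    PYTHON = "python"          # callable that performs the data move itself


@dataclass(frozen=True)
class Migration:
    version: int
    strategy: Strategy
    statements: tuple = ()  # DDL strings for ALTER / SHADOW_REBUILD
    run: Optional[Callable[[FakeClient], None]] = None  # used by PYTHON only


class FakeClient:
    """Hypothetical stand-in for ChClient: records SQL instead of executing it."""

    def __init__(self, version: int = 0) -> None:
        self.version = version
        self.executed: list[str] = []

    def command(self, sql: str) -> None:
        self.executed.append(sql)


def pending(migrations: list[Migration], current: int) -> list[Migration]:
    """Migrations newer than the current version, in ascending order."""
    newer = [m for m in migrations if m.version > current]
    return sorted(newer, key=lambda m: m.version)


def apply_migrations(
    client: FakeClient, migrations: list[Migration], dry_run: bool = False
) -> list[int]:
    """Apply pending migrations in order; return the versions applied (or planned)."""
    planned = []
    for migration in pending(migrations, client.version):
        planned.append(migration.version)
        if dry_run:
            continue  # report the plan without touching the server
        if migration.strategy is Strategy.PYTHON:
            if migration.run is None:
                raise ValueError(f"migration {migration.version} has no callable")
            migration.run(client)
        else:
            for sql in migration.statements:
                client.command(sql)
        # A real runner would persist this in the schema_version table.
        client.version = migration.version
    return planned


def check_version(current: int, expected: int) -> None:
    """Validator hook: fail loudly when the table lags behind the code."""
    if current != expected:
        raise RuntimeError(
            f"your table is at v{current}, code expects v{expected}, "
            "run aaiclick migrate"
        )
```

Keeping the strategy on each migration, instead of in the runner, lets a single ordered list mix cheap ALTERs with full rebuilds, and the dry-run path reuses the same ordering logic so operators see exactly what would run.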

Alternatives to building a framework:

  • Release-notes recipe — document a maintenance step per release. Zero code, high operator burden, easy to miss.
  • Per-change maintenance CLIs: aaiclick maintenance rebuild-oplog, etc. Works but doesn't scale past a handful of changes.

No action today — fresh installs keep working, existing installs degrade gracefully at worst. Revisit once there is a third structural CH-side change (which makes the per-change CLI approach untenable) or once a change actually breaks (not just slows down) an existing install.

API Server Authentication + start_worker

The shared I/O layer (docs/api_server.md) shipped Phases 1–4 — view models, internal_api, REST routers, and the MCP surface. Two independent tracks remain; the full contract lives in docs/api_server.md under Authentication and Spawning workers — POST /api/v0/workers.

Track A — bearer-token auth. Gate /api/v0/* and /mcp/* behind a shared AAICLICK_API_TOKEN:

  • Unauthorized / Forbidden in aaiclick/internal_api/errors.py; add UNAUTHORIZED, FORBIDDEN, WORKER_SPAWN_FAILED to ProblemCode.
  • aaiclick/server/auth.py — a require_bearer dependency using hmac.compare_digest, setting WWW-Authenticate: Bearer on 401.
  • Wire Depends(require_bearer) per include_router call, plus an ASGI middleware on the /mcp mount (Depends does not cross mount boundaries). Token unset → open-server mode with a single startup WARNING. /health and the OpenAPI/docs routes stay open.
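The core decision inside a require_bearer dependency can be sketched without any web framework. The function below is illustrative only: check_bearer and its return convention (an HTTP status code) are hypothetical, and a real dependency would raise the Unauthorized error and set WWW-Authenticate: Bearer instead of returning 401.

```python
import hmac
from typing import Optional


def check_bearer(authorization: Optional[str], expected_token: Optional[str]) -> int:
    """Return the HTTP status an auth gate would produce: 200 or 401."""
    if not expected_token:
        # Token unset: open-server mode, every request passes.
        return 200
    if not authorization:
        return 401
    scheme, _, presented = authorization.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return 401
    # compare_digest runs in time independent of where the inputs differ,
    # which avoids leaking the token through response timing.
    matches = hmac.compare_digest(
        presented.strip().encode("utf-8"), expected_token.encode("utf-8")
    )
    return 200 if matches else 401
```

Encoding both sides to bytes before comparing sidesteps the restriction that hmac.compare_digest only accepts ASCII when given str arguments.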

Track B — POST /api/v0/workers. The one CLI verb that did not graduate to HTTP in Phase 3:

  • StartWorkerRequest(max_tasks: int | None = None) in aaiclick/view_models.py.
  • internal_api.workers.start_worker(request) — raise Invalid if is_local(); spawn python -m aaiclick worker start [--max-tasks N] via create_subprocess_exec(start_new_session=True); map exec failure to Conflict(WORKER_SPAWN_FAILED).
  • Router returns 202 with a relative Location: /api/v0/workers header; matching start_worker MCP tool.
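A rough sketch of the spawn step described above, assuming a POSIX host. worker_argv, spawn_detached, and WorkerSpawnFailed are hypothetical names; the sketch uses sys.executable where the plan says python, and it leaves out the is_local() check and the mapping to Conflict(WORKER_SPAWN_FAILED).

```python
import asyncio
import sys
from typing import Optional


class WorkerSpawnFailed(RuntimeError):
    """Hypothetical error; the plan maps this case to Conflict(WORKER_SPAWN_FAILED)."""


def worker_argv(max_tasks: Optional[int] = None) -> list:
    """Build the command line for `python -m aaiclick worker start`."""
    argv = [sys.executable, "-m", "aaiclick", "worker", "start"]
    if max_tasks is not None:
        argv += ["--max-tasks", str(max_tasks)]
    return argv


async def spawn_detached(argv: list) -> int:
    """Start the process in its own session and return its PID."""
    try:
        # start_new_session=True detaches the child from the server's
        # process group, so signals sent to the server do not reach it.
        proc = await asyncio.create_subprocess_exec(*argv, start_new_session=True)
    except OSError as exc:
        # Raised when the executable is missing or not runnable.
        raise WorkerSpawnFailed(str(exc)) from exc
    return proc.pid
```

Separating argv construction from the spawn keeps the part that depends on the request (max_tasks) trivially testable without starting a process.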

Out of scope (separate future work): DB-backed tokens with scopes, process supervision for HTTP-spawned workers, OAuth / OIDC.


Deferred

Items deferred until preconditions are met.

Object.export() HTML Format

.html extension → ClickHouse HTML output format. The format is supported by upstream ClickHouse but the chdb build that aaiclick ships against rejects it with UNKNOWN_FORMAT (chdb appears to omit the HTML output handler). Add an .html / HTML entry to FORMATS in aaiclick/data/formats.py and the corresponding test once chdb's build includes it, or once aaiclick gains a way to fall back to clickhouse-connect for formats chdb doesn't ship.

SSE /events Endpoint + LISTEN/NOTIFY Fanout

v0 uses 2 s refetchInterval polling. The designed real-time path is:

  1. GET /api/v0/events → text/event-stream (one connection per UI session).
  2. Workers emit NOTIFY job_events in the same commit as every status write.
  3. FastAPI holds one LISTEN connection per backend and forwards notifications onto an in-process pub/sub bus.
  4. The SSE endpoint subscribes and streams typed events (job.updated, task.updated, task.log) to the browser.
  5. The browser calls queryClient.invalidateQueries(...) and lets REST fetch authoritative state — events are signals, not payloads.
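Steps 3 and 4 above could be sketched as a small in-process bus that hands pre-formatted SSE frames to per-connection queues. EventBus, format_sse, the queue size, and the payload shape are assumptions for illustration; the LISTEN adapter and the FastAPI streaming response are not shown.

```python
import asyncio
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame (terminated by a blank line)."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


class EventBus:
    """Fan one stream of notifications out to many SSE connections."""

    def __init__(self, max_queue: int = 100) -> None:
        self._max_queue = max_queue
        self._subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        queue = asyncio.Queue(maxsize=self._max_queue)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, event: str, data: dict) -> None:
        frame = format_sse(event, data)
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(frame)
            except asyncio.QueueFull:
                # Events are signals, not payloads: a slow client can miss
                # one and still recover state through its next REST fetch.
                pass
```

Dropping frames for a full queue is only safe because of step 5: the browser treats every event as a hint to refetch, so no state lives solely in the stream.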

SQLite local mode: poll + snapshot diff every 2 s (same latency as current polling, but avoids N×M HTTP requests from N browser tabs).
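The snapshot diff for SQLite local mode could be as small as the sketch below, which compares two polls of job status and emits one event per change. diff_snapshots and the job-id-to-status dictionary shape are hypothetical, and removed jobs are deliberately ignored here.

```python
def diff_snapshots(previous: dict, current: dict) -> list:
    """Return (event, job_id) pairs for jobs that are new or changed status."""
    events = []
    for job_id, status in current.items():
        if previous.get(job_id) != status:
            events.append(("job.updated", job_id))
    return events
```

One server-side poll feeding the bus replaces a poll per browser tab, which is where the N×M saving comes from.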

When to revisit: when polling overhead is measurable (many tabs or many concurrent jobs), or when sub-2 s latency matters for operators.

Cross-Host Log Access

task.log_path stores the filesystem path written by the worker process. In local mode (single process), get_task_logs in aaiclick/internal_api/tasks.py reads the file directly. In distributed / Docker mode the log file lives on the worker host's filesystem and is not accessible to the API server.

Solution when it lands: either (a) workers stream log lines into a DB column or a dedicated log table as they write, or (b) a sidecar log-shipping agent uploads completed log files to object storage (S3 / GCS) and get_task_logs redirects to a presigned URL.

When to revisit: when Docker or multi-host distributed runs become the primary deployment mode and operators need task logs in the UI.

SSE Cross-Host Fanout (Redis)

The v0 SSE pipeline (docs/frontend.md) feeds deltas onto a single in-process bus inside one FastAPI process — Postgres LISTEN/NOTIFY for distributed mode, polling for SQLite local mode. That works for any deployment where there is exactly one API process per host that clients can connect to.

Once we run multiple FastAPI workers across machines (e.g. behind a load balancer for horizontal scale), a notification arriving on host A's LISTEN connection won't reach an SSE client connected to host B. LISTEN/NOTIFY can't cheaply solve cross-host fanout — every host would need its own LISTEN, which doesn't scale and amplifies DB load.

Solution when it lands: Redis Pub/Sub. Workers (or the LISTEN adapter) publish to a Redis channel; every FastAPI host subscribes and forwards onto its in-process bus. The in-process bus and SSE delivery layer don't change — only the feeder gets a third option.

When to revisit: when we horizontally scale the API server beyond a single host, or when the single-process bus becomes a measurable bottleneck for connection count or fan-out throughput.

Frontend Unit Tests

The SPA (docs/frontend.md) ships with no unit-test layer in v0 — only TypeScript's static type check (tsc --noEmit) and Playwright e2e coverage in test_e2e/web/. Add Vitest + React Testing Library when component logic grows enough that e2e feedback is too coarse to localize regressions: typically when a single component owns enough branching behavior (form validation, derived state, conditional rendering paths) that an e2e failure can't tell you which branch broke.

Work when revisited:

  • Add vitest, @testing-library/react, jsdom to package.json dev deps.
  • npm test script + vitest.config.ts reusing the Vite config.
  • Co-locate tests next to the component (Foo.tsx → Foo.test.tsx), matching the Python convention of test files alongside the modules they test.
  • Add an npm test step to the CI workflow that runs the SPA gates.

OpenAPI Codegen

src/api/types.ts is hand-written to mirror the pydantic view models. When the API surface grows, generate it from GET /api/v0/openapi.json using openapi-typescript or similar — run as a pre-build step so the TypeScript types always match the server schema.

Work when revisited: add openapi-typescript dev dep, npm run gen-types script, CI check that the generated file is up to date (commit the output; fail if dirty after re-gen).

Operator UI Auth

The v0 server is unauthenticated (localhost-only intent). When the UI is exposed beyond localhost, add an auth layer:

  • Simple option: HTTP Basic via a reverse proxy (nginx / Caddy).
  • Integrated option: cookie session with a configurable password via a FastAPI middleware; the SPA sends the cookie on every request.
  • Enterprise option: OAuth2 / OIDC via an identity provider.

When to revisit: when the server is intentionally exposed on a network interface accessible to untrusted clients.

Comparison Page

docs/comparison.md — feature matrix comparing aaiclick vs Pandas, Spark, and Dask. Defer until the project has enough real-world usage to make meaningful claims.

Changelog

docs/changelog.md: version history in Keep a Changelog format. Introduce with the v1.0.0 release.