Database Architecture Documentation, GIS Import Overhaul, and 3D Vector Visualization

5 min read
Creator, Parthenon
AI Development Assistant

A massive day across the Parthenon platform. We shipped a comprehensive database architecture documentation suite (complete with a live /db report and a db:audit command), overhauled the GIS data import subsystem with a new schema and permission model, and replaced Chroma Studio's 2D scatter plot with a full 3D WebGL point-cloud visualization powered by Three.js and a server-side PCA→UMAP projection pipeline.


Database Architecture Documentation & Audit Tooling

The biggest documentation push in recent memory landed today. Parthenon now has a first-class database architecture guide, published live at /db, covering everything from domain entity-relationship diagrams to the Solr acceleration layer.

The centerpiece is the new db:audit Artisan command, which introspects the live schema and produces a structured report. It is useful for catching drift between migrations and documentation, and for onboarding new developers who need to understand how the roughly 40 tables across the clinical, GIS, and administrative domains relate to one another.
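The core of the drift check can be sketched as a set comparison between what the live schema reports and what the documentation claims. This is a minimal, hypothetical illustration (the real command is a Laravel Artisan command that introspects PostgreSQL; the table names and helper below are made up for the example):

```python
def audit_schema(live_tables: set, documented_tables: set) -> dict:
    """Report drift between the live schema and the documented one."""
    return {
        "undocumented": live_tables - documented_tables,  # in the DB, missing from docs
        "stale": documented_tables - live_tables,         # in the docs, gone from the DB
    }

# Hypothetical example: a table was renamed in the DB but not in the docs.
live = {"gis_imports", "clinical_cohorts", "users"}
documented = {"gis_datasets", "clinical_cohorts", "users"}

report = audit_schema(live, documented)
print(report["undocumented"])  # {'gis_imports'}
print(report["stale"])         # {'gis_datasets'}
```

The same diff shape extends naturally to columns, indexes, and foreign keys, which is where most real drift hides.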

The documentation itself went through several meaningful iterations today:

  • An initial design spec (feat: add database architecture design spec) was drafted, then reviewed internally and updated to address feedback before being formally published.
  • Domain ERDs were added as part of the architecture guide, giving a visual map of foreign key relationships within each bounded context (clinical cohorts, GIS, user/tenant management, etc.).
  • A dedicated section on the Solr acceleration layer was folded into the spec, documenting how full-text and faceted search queries bypass PostgreSQL for latency-sensitive workloads.

The live report at /db means any team member — not just engineers — can pull up a current snapshot of the database topology without needing to run migrations or grep through schema files. This is a pattern we'll likely replicate for other architectural layers.


GIS Import Subsystem Overhaul

The GIS module received a significant structural upgrade today, driven by the need for better global dataset support and more granular permission control over import workflows.

Schema Changes

The gis_datasets table has been renamed to gis_imports and extended with import-tracking columns (provenance, status, and timestamps) alongside the existing spatial metadata. A companion ALTER script (schema v2) handles the migration for existing deployments and adds the structural columns needed for global support (coordinate reference system normalization, source URL tracking, etc.).

This rename is a breaking change for any code that referenced gis_datasets directly, but the migration script handles the rename atomically. Future import workflows will be built against gis_imports exclusively.
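The shape of the migration can be illustrated with an in-memory SQLite stand-in for PostgreSQL. The rename and the new columns run inside one transaction so the change lands atomically; the exact column names beyond provenance, status, and timestamps are assumptions for the sketch:

```python
import sqlite3

# In-memory SQLite stand-in for the real PostgreSQL migration.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE gis_datasets (id INTEGER PRIMARY KEY, name TEXT, geometry TEXT)")

# One transaction: the rename and the structural additions commit together.
with con:
    cur.execute("ALTER TABLE gis_datasets RENAME TO gis_imports")
    cur.execute("ALTER TABLE gis_imports ADD COLUMN provenance TEXT")
    cur.execute("ALTER TABLE gis_imports ADD COLUMN status TEXT DEFAULT 'pending'")
    cur.execute("ALTER TABLE gis_imports ADD COLUMN imported_at TEXT")
    cur.execute("ALTER TABLE gis_imports ADD COLUMN source_url TEXT")   # assumed name
    cur.execute("ALTER TABLE gis_imports ADD COLUMN crs TEXT")          # assumed name

cols = [row[1] for row in cur.execute("PRAGMA table_info(gis_imports)")]
print(cols)  # existing columns plus the new import-tracking ones
```

In production the same pattern runs as a Laravel migration against PostgreSQL, where transactional DDL gives the same all-or-nothing guarantee.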

Permissions

Two new Spatie permissions were registered today:

  • gis.import — grants the ability to trigger an import job
  • gis.import.manage — grants the ability to view, retry, cancel, and delete import records

This separates the "can kick off a job" concern from the "can administrate the import queue" concern, which matters in multi-tenant deployments where data stewards need visibility without full administrative access.
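The split can be modeled as two distinct permission strings checked independently. This is a toy Python illustration of the gating logic only; in Parthenon the real checks go through Spatie's laravel-permission package, and the role names here are hypothetical:

```python
# Hypothetical role-to-permission mapping illustrating the two-tier split.
ROLE_PERMISSIONS = {
    "data-steward": {"gis.import"},                       # may start import jobs only
    "gis-admin": {"gis.import", "gis.import.manage"},     # full control of the queue
}

def can(role: str, permission: str) -> bool:
    return permission in ROLE_PERMISSIONS.get(role, set())

print(can("data-steward", "gis.import"))         # True
print(can("data-steward", "gis.import.manage"))  # False
print(can("gis-admin", "gis.import.manage"))     # True
```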

A related fix also landed: protected roles can now receive permission updates without triggering a rename guard. Previously, any permission mutation on a protected role (e.g., super-admin) would throw a validation error because the role-update codepath checked for name changes before checking for permission-only updates. That's now handled correctly.
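The essence of that fix is an ordering change: only invoke the rename guard when the name actually changes, so permission-only updates pass through. A sketch of the corrected logic (function and field names are hypothetical; the real code lives in the Laravel role-update path):

```python
class ProtectedRoleError(Exception):
    pass

PROTECTED_ROLES = {"super-admin"}

def update_role(role, new_name=None, new_permissions=None):
    # Fixed ordering: the rename guard fires only on an actual name change,
    # so a permission-only update on a protected role no longer errors out.
    if new_name is not None and new_name != role["name"]:
        if role["name"] in PROTECTED_ROLES:
            raise ProtectedRoleError("protected roles cannot be renamed")
        role["name"] = new_name
    if new_permissions is not None:
        role["permissions"] = sorted(new_permissions)
    return role

role = {"name": "super-admin", "permissions": ["gis.import"]}
update_role(role, new_permissions=["gis.import", "gis.import.manage"])  # now succeeds
print(role["permissions"])
```

Previously the equivalent of the rename guard ran unconditionally, which is why any mutation on a protected role failed validation.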

UX

The GIS map panel got a fullscreen expand button, making large dataset review significantly more usable. The expand triggers a CSS-based fullscreen mode rather than a native browser fullscreen API call, keeping it composable within the existing layout system.


Chroma Studio — 3D Vector Explorer

Chroma Studio's Semantic Map tab has been completely rebuilt around a 3D WebGL point cloud, replacing the previous 2D SVG scatter plot that struggled beyond a few thousand points.

Backend Projection Pipeline

A new projection.py service handles the heavy lifting server-side:

  • PCA → UMAP dimensionality reduction: 768-dimensional embeddings are first compressed to 50d via PCA, then to 3d via UMAP. This two-stage approach is significantly faster than running UMAP on raw 768d vectors.
  • Auto-k K-means clustering via silhouette score sweep (k=2..10), so clusters are data-driven rather than hardcoded.
  • Quality detection: isolation forest outliers (5% contamination), cosine similarity duplicates (>0.98, capped at 100 pairs for collections ≤5K), and orphan detection (points >2σ from their nearest centroid).
  • Results are cached in-process with a 10-minute TTL and use a SHA-256 seeded RNG for deterministic sampling, meaning the same collection always produces the same projection layout.
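The pipeline above can be sketched with scikit-learn. Two caveats: the production second stage is UMAP (umap-learn), which a second PCA stands in for here to keep the sketch dependency-light, and the seeding helper is an assumption about how the SHA-256 determinism is wired:

```python
import hashlib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def collection_seed(name: str) -> int:
    # SHA-256 of the collection name -> stable 32-bit seed, so the same
    # collection always samples and projects identically.
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")

def project(embeddings: np.ndarray, name: str):
    seed = collection_seed(name)
    # Stage 1: PCA 768d -> 50d captures most variance cheaply.
    stage1 = PCA(n_components=50, random_state=seed).fit_transform(embeddings)
    # Stage 2: UMAP 50d -> 3d in production; a second PCA stands in here.
    coords = PCA(n_components=3, random_state=seed).fit_transform(stage1)

    # Auto-k: sweep k=2..10 and keep the best silhouette score.
    best_k, best_score, best_labels = 2, -1.0, None
    for k in range(2, 11):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)
        score = silhouette_score(coords, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return coords, best_k, best_labels

rng = np.random.default_rng(0)
# Two well-separated synthetic 768d clusters so auto-k has structure to find.
emb = np.vstack([rng.normal(0, 1, (100, 768)), rng.normal(4, 1, (100, 768))])
coords, k, labels = project(emb, "demo-collection")
print(coords.shape, k)
```

Running UMAP on 50 PCA components rather than raw 768d vectors is the standard speed trick: PCA strips noise dimensions cheaply, and UMAP's nearest-neighbor search then operates in a much smaller space.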

The new POST /collections/{name}/project endpoint is proxied through a Laravel ChromaStudioController with a 120-second timeout — UMAP on large collections isn't fast, and we'd rather wait than time out.

Frontend — 12 New Components

The visualization layer is built around ThreeScene.tsx, which uses Three.js InstancedMesh to render 50K+ points in a single draw call, with per-frame color updates and raycasting for hover/click interactions. Supporting components include:

  • PointInspector: sidebar with selected-point detail, including quality flags
  • ColorLegend: per-cluster visibility toggle
  • MetadataColorPicker: overrides the color-by field
  • SampleSlider: discrete sample steps (1K / 5K / 15K / All)

A useVectorExplorer hook manages projection state, the AbortController lifecycle, a 500ms debounce, and a umap-js client-side fallback for small collections.


What's Next

  • Wire gis.import.manage into the UI permission gates for the import queue table
  • Stress-test the UMAP projection endpoint against collections >50K documents
  • Expand db:audit output to flag tables missing soft-delete columns or audit timestamps
  • Continue building out the domain ERDs with annotation layers for OHDSI CDM alignment