Layout Engine

Zindex ships with a built-in layout engine. Agents describe what the diagram contains - nodes, edges, the relationships between them - and the engine figures out where each element should sit, how each edge should route around obstacles, and where each label should land.

The engine is implemented from scratch with zero external layout dependencies. It uses a Sugiyama-style layered layout pipeline (the same family of algorithms behind tools like Graphviz and Mermaid) and runs deterministically: identical input always produces identical output.

Building an agent integration? Fetch the canonical agent front door with Accept: text/markdown for the ready-to-paste system prompt that covers the recommended workflow, including when to use auto-layout vs explicit positioning. The same content is bundled into the @zindex-ai/mcp server’s instructions field.

Why this exists

Most diagram tools assume the human knows where every box and line goes. That assumption breaks down for agents. An LLM generating an architecture diagram knows the services and the connections, but asking it to also produce pixel-perfect coordinates is asking for a worse diagram and a worse use of tokens.

The layout engine inverts the contract: agents describe the graph, the engine handles the geometry. Agents that want pixel control can still provide explicit layout on individual elements - the engine respects user-supplied positions and only fills in the gaps. This mixed mode is the most common pattern in practice: pin the critical elements, let the engine arrange the rest.

What you get

No coordinates required. Set layoutStrategy: { algorithm: "hierarchical", direction: "TB" } and the engine positions every node automatically.
Mixed mode. Some nodes have explicit layout, others don’t - the engine fills in the gaps and respects the fixed nodes as anchors.
Smart edge routing. Edges enter target nodes through centered faces wherever possible, route around obstacles via step paths when an L-shaped route would collide, and never cut through unrelated nodes.
Non-overlapping labels. Each placed label reserves space so subsequent labels keep visible breathing room.
Clean output. Sub-pixel artifacts are stripped from edge paths so arrow markers orient correctly.
Deterministic. The same scene always produces the same layout - layouts are diffable and versionable.

What auto-layout handles and what it doesn’t

Auto-layout is powerful but has a specific scope. Understanding the boundary prevents frustration:

Auto-layout handles (no coordinates needed):

Node positioning - the planner assigns x/y/width/height based on graph topology
Edge routing - orthogonal paths with obstacle avoidance, centered entry faces, step-path fallbacks
Edge label placement - non-overlapping, longest-segment-biased positioning
Crossing reduction - minimises edge crossings via median heuristic
Aesthetic scoring - evaluates 5 candidates and picks the best

Auto-layout also handles frames and containers:

Frames auto-size from children. Omit layout from a frame and the engine computes its bounds from the union of its children’s positions, plus padding and title reserve.
Pools with lanes auto-split. A pool with laneDirection: "vertical" and lane children automatically divides its content area into horizontal bands.
Activation bars auto-position. In sequence diagrams, sequence.activation nodes without explicit positions are auto-placed on the most active lifeline, spanning the y-range of messages on that lifeline.
Combined fragments auto-size. Fragment frames in sequence diagrams auto-size from their child messages.
Nested frames. Frames inside frames are resolved leaf-to-root via topological sort.
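
The frame auto-sizing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code: the `Rect` type, the `frame_bounds` name, and the padding and title-reserve values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def frame_bounds(children, padding=16.0, title_reserve=24.0):
    """Union of the child rects, grown by padding, with extra room on top for a title.

    padding and title_reserve are illustrative values, not the engine's defaults.
    """
    if not children:
        raise ValueError("frame has no positioned children")
    left = min(c.x for c in children)
    top = min(c.y for c in children)
    right = max(c.x + c.width for c in children)
    bottom = max(c.y + c.height for c in children)
    return Rect(
        x=left - padding,
        y=top - padding - title_reserve,
        width=(right - left) + 2 * padding,
        height=(bottom - top) + 2 * padding + title_reserve,
    )


children = [Rect(100, 100, 80, 40), Rect(220, 160, 80, 40)]
frame = frame_bounds(children)
```

For nested frames, the same function would be applied to the innermost frames first, so that each computed frame rect can serve as a child of its parent.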

What still needs explicit coordinates:

Multi-region spatial layouts - diagrams where nodes must be grouped into specific visual regions by semantic meaning (e.g. “frontend on the left, backend on the right”). The planner positions by graph topology, not by semantic grouping.
Custom spatial relationships beyond what the constraint solver handles. Note: the engine supports hard ordering constraints via type: "order" (with relations above, below, leftOf, rightOf, sameRank) that agents can use to control rank and position ordering during auto-layout. For relationships not expressible as pairwise ordering, explicit coordinates are still needed.

The rule of thumb: omit coordinates by default. The engine auto-positions nodes, edges, frames, activation bars, and fragments. Provide explicit layout only when you need pixel-precise control or a specific spatial arrangement the planner can’t infer from the graph structure.

The pipeline

Every render runs the scene through five phases. Each phase has well-defined inputs and outputs and can be inspected in isolation.

1. Measurement

Every node and every edge label is measured before any positioning happens. Node sizes come from intrinsic measurements (text width, padding, fixed shape constraints). Edge labels are measured to the pixel so the planner can leave room for them between ranks.

2. Layer assignment

Nodes are assigned to ranks (rows for TB, columns for LR) based on the directional flow of the graph. Cycles are detected and broken using a feedback arc set heuristic, so the algorithm always produces a finite layering even on cyclic input.
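
As a rough illustration of this phase, the Python sketch below breaks cycles by reversing the back edges found during a depth-first search (one simple feedback-arc-set heuristic) and then assigns ranks by longest path. The engine's actual heuristic and ranking rule are not specified here, so both choices, and the `assign_ranks` name, are assumptions.

```python
def assign_ranks(nodes, edges):
    """Return {node: rank} for a directed graph that may contain cycles."""
    edges = [(u, v) for u, v in edges if u != v]  # self-loops carry no rank information
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)

    # 1. Depth-first search in input order; an edge into a node that is still
    #    on the DFS stack closes a cycle and is recorded as a back edge.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}
    back_edges = set()

    def visit(u):
        colour[u] = GREY
        for v in succ[u]:
            if colour[v] == GREY:
                back_edges.add((u, v))
            elif colour[v] == WHITE:
                visit(v)
        colour[u] = BLACK

    for n in nodes:
        if colour[n] == WHITE:
            visit(n)

    # 2. Reversing every back edge leaves an acyclic graph.
    dag = [(v, u) if (u, v) in back_edges else (u, v) for u, v in edges]

    # 3. Longest-path layering over a topological order (Kahn's algorithm).
    outs = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for u, v in dag:
        outs[u].append(v)
        indegree[v] += 1
    rank = {n: 0 for n in nodes}
    queue = [n for n in nodes if indegree[n] == 0]
    while queue:
        u = queue.pop(0)
        for v in outs[u]:
            rank[v] = max(rank[v], rank[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return rank


ranks = assign_ranks(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")],  # c -> a closes a cycle
)
```

Iterating nodes and edges in input order keeps the result deterministic, which matches the behaviour the rest of this page describes.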

3. Crossing reduction

Within each layer, nodes are reordered to minimise the number of edges that cross. The algorithm uses the median heuristic: each node’s preferred position is the median index of its neighbours in the adjacent layer. The down sweep aligns each layer with its predecessors and the up sweep aligns each layer with its successors. The sweeps run for 12 iterations, which is enough for typical graphs to converge.

When two nodes have similar median positions (within 0.5), the algorithm tiebreaks by input order. This preserves natural reading order: if you declare api1 before api2, they’ll appear in that order even when the heuristic could swap them to save a single crossing. Substantial median differences still trust the heuristic, so the tiebreak only affects near-ties: it gives up at most a marginal crossing saving in exchange for preventing counter-intuitive reorderings.
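
One sweep step of the median heuristic, including the input-order tiebreak, might look like the following Python sketch. The function and parameter names are hypothetical, and treating "within 0.5" as an inclusive window measured from the first node of each near-tie run is an assumption.

```python
from statistics import median


def order_layer(layer, neighbours, adjacent_pos, tie_window=0.5):
    """Reorder one layer by the median position of each node's neighbours.

    layer        -- node ids in input (declaration) order
    neighbours   -- {node: [ids of neighbours in the adjacent, already-ordered layer]}
    adjacent_pos -- {neighbour id: index within the adjacent layer}
    """
    input_index = {node: i for i, node in enumerate(layer)}

    def median_of(node):
        positions = [adjacent_pos[n] for n in neighbours.get(node, ()) if n in adjacent_pos]
        # Nodes without neighbours keep their input slot (an assumption of this sketch).
        return median(positions) if positions else float(input_index[node])

    med = {node: median_of(node) for node in layer}
    ranked = sorted(layer, key=lambda n: (med[n], input_index[n]))

    # Walk the ranked list; nodes whose medians sit within tie_window of the
    # start of the current run are treated as tied and re-sorted by input order.
    result, run = [], []
    for node in ranked:
        if run and med[node] - med[run[0]] > tie_window:
            result.extend(sorted(run, key=input_index.get))
            run = []
        run.append(node)
    result.extend(sorted(run, key=input_index.get))
    return result


adjacent = {"p0": 0, "p1": 1, "p2": 2}
near_tie = order_layer(
    ["api1", "api2", "worker"],
    {"api1": ["p1"], "api2": ["p0", "p1"], "worker": ["p2"]},
    adjacent,
)
clear_win = order_layer(["x", "y"], {"x": ["p2"], "y": ["p0"]}, adjacent)
```

In the first call api2 has the smaller median (0.5 against 1), but the gap is inside the window, so declaration order wins. In the second call the medians differ by 2, so the heuristic's ordering is used.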

4. Coordinate assignment

Once layer order is fixed, each node gets an exact position along its layer using iterative barycentric refinement with PAVA projection:

Each node’s “target” position is the mean of its neighbours’ positions in the adjacent layers.
Targets are then projected onto a feasible non-overlapping arrangement using Pool Adjacent Violators (PAVA), which computes the L2-optimal monotonic sequence under linear spacing constraints in linear time.
The down + up sweep is repeated 24 times. After each full pass the global centroid is normalised to zero, preventing drift.

The result: chains of single-parent / single-child nodes end up dead straight (a worker connected only to a job queue sits exactly underneath it), branching points are centered above or below their children, and node spacing always satisfies the configured nodeSpacing.
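
The projection step can be illustrated with a short Python sketch. It relies on a standard reduction: subtracting each node's cumulative minimum offset turns the spacing constraints into a plain non-decreasing constraint, which PAVA solves exactly. The function names, the unit weights, and the centre-to-centre spacing formula are assumptions of this sketch rather than details taken from the engine.

```python
def pava(values):
    """Least-squares non-decreasing fit (unit weights) via Pool Adjacent Violators."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def place_layer(targets, widths, node_spacing):
    """Project target centres onto positions that respect minimum spacing.

    Required centre-to-centre distance between neighbours i and i+1 is assumed
    to be (w_i + w_{i+1}) / 2 + node_spacing.
    """
    offsets = [0.0]
    for left, right in zip(widths, widths[1:]):
        offsets.append(offsets[-1] + (left + right) / 2 + node_spacing)
    shifted = [t - o for t, o in zip(targets, offsets)]
    return [y + o for y, o in zip(pava(shifted), offsets)]


crowded = place_layer([100, 100, 300], [80, 80, 80], 30)
already_ok = place_layer([0, 200], [80, 80], 30)
```

In the crowded case the first two nodes both want x = 100, so they are pushed symmetrically apart to 45 and 155, exactly the required 110 units, while the third node keeps its target. When the targets are already feasible, the projection returns them unchanged.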

5. Edge routing

After every node has a position, edges are routed. The router walks a ranked list of anchor candidates for each edge:

The visually preferred centered entry: leave the source perpendicular to the line connecting the centers, enter the target through a centered N/S or E/W face. The L’s terminal leg lands on the target’s center column / row.
If the centered L collides with an obstacle, try a 5-point step path that routes through the gap between source and target while still entering the target perpendicular to a centered face.
If both fail, fall back to the same-axis pair (E↔W or N↔S) so the bend hugs the target’s edge - never ideal, but always valid.
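
The ranked walk over candidates can be sketched as below. The candidate paths are passed in precomputed because the exact anchor geometry is not spelled out here; the function names, the `(x, y, w, h)` box tuples, and the example coordinates are all hypothetical.

```python
def segment_hits(p, q, box):
    """True if the axis-aligned segment p-q passes through the interior of box."""
    bx, by, bw, bh = box
    lo_x, hi_x = min(p[0], q[0]), max(p[0], q[0])
    lo_y, hi_y = min(p[1], q[1]), max(p[1], q[1])
    return lo_x < bx + bw and hi_x > bx and lo_y < by + bh and hi_y > by


def collides(path, obstacles):
    """True if any segment of the path crosses any obstacle box."""
    return any(
        segment_hits(p, q, box)
        for p, q in zip(path, path[1:])
        for box in obstacles
    )


def pick_route(candidates, obstacles):
    """candidates: ordered list of (name, path), most preferred first.

    The last candidate is the fallback and is accepted without a collision test.
    """
    for name, path in candidates[:-1]:
        if not collides(path, obstacles):
            return name, path
    return candidates[-1]


# Source box at (0, 0, 100, 50), target box at (300, 200, 100, 50).
centered_l = [(100, 25), (350, 25), (350, 200)]
step = [(100, 25), (150, 25), (150, 125), (350, 125), (350, 200)]
same_axis = [(100, 25), (290, 25), (290, 225), (300, 225)]
candidates = [("centered-L", centered_l), ("step", step), ("same-axis", same_axis)]

blocker = (200, 0, 60, 50)  # an unrelated node sitting on the L's first leg
chosen_blocked = pick_route(candidates, [blocker])
chosen_clear = pick_route(candidates, [])
```

Obstacles passed to `pick_route` should exclude the edge's own source and target, and, as described below, frames and groups.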

Frames (pools, lanes, fragments) and groups are containers, not obstacles - an edge between two children of a swimlane will never be reported as colliding with the lane itself. Frames and groups are also valid edge endpoints: when an edge’s from or to references a frame ID, the router computes anchors against the frame’s bounds and the arrow terminates at the frame’s border, exactly as it would for a node target. This is the platform mechanism behind the edges-to-a-group-as-a-whole authoring convention.

After routing, every path is run through a simplification pass that snaps near-equal coordinates, drops consecutive duplicates, and removes colinear midpoints. This eliminates the sub-pixel wiggles that come from planner-driven width mismatches and keeps arrow markers oriented cleanly.
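
A simplification pass of this kind can be sketched in Python as three small loops. The `simplify` name and the 0.5-unit snapping tolerance are assumptions; the engine's actual tolerance is not stated here.

```python
def simplify(path, eps=0.5):
    """Snap near-equal coordinates, drop duplicate points, remove collinear midpoints."""
    if not path:
        return []

    # 1. Snap: if a coordinate is within eps of the previous point's, reuse it.
    snapped = [path[0]]
    for x, y in path[1:]:
        px, py = snapped[-1]
        snapped.append((px if abs(x - px) <= eps else x,
                        py if abs(y - py) <= eps else y))

    # 2. Drop consecutive duplicates.
    deduped = [snapped[0]]
    for point in snapped[1:]:
        if point != deduped[-1]:
            deduped.append(point)

    # 3. Remove midpoints that lie on the line between their neighbours
    #    (zero cross product means the three points are collinear).
    out = [deduped[0]]
    for i in range(1, len(deduped) - 1):
        (ax, ay), (bx, by), (cx, cy) = out[-1], deduped[i], deduped[i + 1]
        cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
        if cross != 0:
            out.append(deduped[i])
    if len(deduped) > 1:
        out.append(deduped[-1])
    return out


wobbly = [(0, 0), (100, 0.2), (100, 50), (100.3, 120)]
clean = simplify(wobbly)
```

Note that this sketch snaps forward from the first point, so the final point can move by up to the tolerance. An implementation that must keep endpoints fixed would snap interior points only.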

A channel routing refinement then detects interior edge segments that overlap or run in parallel within the same corridor (from different edges) and spreads them into distinct tracks. This prevents multiple edges from visually overlapping when they share the same horizontal or vertical segment. Track spacing is constrained by the available gap between nearby nodes.
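
As a rough illustration of how track spacing can be bounded by the available gap, the Python sketch below computes symmetric offsets for a group of overlapping segments. The `track_offsets` name, the preferred spacing of 12 units, and the `gap / (n + 1)` bound are assumptions, not the engine's actual rule.

```python
def track_offsets(n_segments, gap, preferred_spacing=12.0):
    """Offsets from the shared corridor line for n overlapping segments.

    Spacing shrinks when the corridor between nearby nodes is too narrow
    to fit the preferred spacing.
    """
    if n_segments <= 1:
        return [0.0] * n_segments
    spacing = min(preferred_spacing, gap / (n_segments + 1))
    middle = (n_segments - 1) / 2
    return [(i - middle) * spacing for i in range(n_segments)]


roomy = track_offsets(3, gap=100)   # the preferred spacing fits
tight = track_offsets(3, gap=20)    # spacing is squeezed to fit the gap
```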

Canvas-aware spread

When scene.canvas is set and the laid-out content uses substantially less space than the canvas allows, the planner scales node positions and rank centers around their centroid so the layout fills the available area. Minimum node and rank spacing remain hard floors - auto-spread only increases gaps, never compresses them. This means agents who set a larger canvas get a layout that uses the space, instead of content clustered into one corner. Spread is capped (3x maximum scale factor) to prevent absurd spacing on tiny graphs that happen to live in a generous canvas.
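
The scaling rule can be sketched per axis in Python. This is a simplified illustration: `spread_axis` and `available_span` are hypothetical names, the sketch works on node centres only, and it omits the "substantially less space" threshold that decides whether spread applies at all.

```python
def spread_axis(centers, available_span, max_scale=3.0):
    """Scale positions around their centroid to fill available_span.

    The scale factor never drops below 1 (gaps only grow) and never exceeds
    max_scale (the 3x cap described above).
    """
    span = max(centers) - min(centers)
    if span <= 0:
        return list(centers)
    scale = min(max_scale, max(1.0, available_span / span))
    centroid = sum(centers) / len(centers)
    return [centroid + (c - centroid) * scale for c in centers]


filled = spread_axis([100, 200, 300], available_span=400)    # scale 2
capped = spread_axis([100, 200, 300], available_span=2000)   # would be 10x, capped at 3
unchanged = spread_axis([100, 200, 300], available_span=100) # never compresses
```

A real implementation would also translate the result so the content sits inside the canvas; this sketch shows only the scale computation.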

Canvas size is a minimum, with per-axis tightening on over-declared canvases

Declared scene.canvas dimensions act as a lower bound: when content exceeds them the engine auto-extends the canvas to fit (and emits a CANVAS_AUTO_EXTENDED info diagnostic). When the declared canvas is larger than what node spacing requires, the engine auto-spreads (above) to fill the available area.

When the declared canvas on either axis exceeds the comfortable centred-content size by more than 30% and every node was auto-laid-out, the engine tightens that axis to content × 1.4 to avoid shipping with excessive slack. Width and height are tightened independently. The engine emits one CANVAS_AUTO_TIGHTENED info diagnostic per tightened axis (the data.axis field carries "width" or "height"). This is the only path where the engine reduces a declared canvas dimension. Scenes with any user-positioned content are left untouched - the agent’s declared canvas size is treated as part of that positioning intent.

Summary: bigger declared canvas = more breathing room (engine spreads to fill); smaller declared canvas = tighter layout up to the auto-extension floor; auto-laid-out scenes with a much-too-large declared canvas on either axis = tightened to a comfortable margin per axis. nodeSpacing / rankSpacing remain hard floors regardless.

For per-diagnostic recovery prose (when each fires, what data is in the payload, what to do about it), see Rendering reference: Render diagnostics.

Bonus phase: label placement

For each labeled edge, the placer samples positions along every segment of the routed path. Each candidate’s bounding box is tested against a spatial index of nodes and previously-placed labels - if anything intersects, the candidate is rejected. The remaining candidates are scored by combining segment-centeredness (prefer mid-segment) with a longest-segment preference (prefer the most prominent visual run of the edge). The chosen position is then reserved as an obstacle for subsequent placements, so neighbouring labels keep visible breathing room.
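
The placement loop can be sketched as follows. Several details are assumptions made for illustration: the function names, five samples per segment, an equal-weight sum of the two score terms, and a linear scan standing in for the spatial index.

```python
import math


def overlaps(a, b):
    """Axis-aligned overlap test for (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_labels(edges, node_boxes, samples=5):
    """edges: list of (path, (label_w, label_h)). Returns one label centre per edge, or None."""
    occupied = list(node_boxes)  # nodes plus every label placed so far
    placed = []
    for path, (lw, lh) in edges:
        segments = list(zip(path, path[1:]))
        longest = max((math.dist(p, q) for p, q in segments), default=0.0) or 1.0
        best = None
        for p, q in segments:
            length = math.dist(p, q)
            for k in range(1, samples + 1):
                t = k / (samples + 1)
                cx = p[0] + (q[0] - p[0]) * t
                cy = p[1] + (q[1] - p[1]) * t
                box = (cx - lw / 2, cy - lh / 2, lw, lh)
                if any(overlaps(box, other) for other in occupied):
                    continue  # rejected: intersects a node or an earlier label
                # Prefer mid-segment positions and the longest segment.
                score = (1 - abs(t - 0.5) * 2) + length / longest
                if best is None or score > best[0]:
                    best = (score, (cx, cy), box)
        if best is None:
            placed.append(None)
        else:
            occupied.append(best[2])  # reserve the spot for later labels
            placed.append(best[1])
    return placed


path = [(0, 0), (200, 0), (200, 40)]
centres = place_labels([(path, (40, 10)), (path, (40, 10))], node_boxes=[])
```

Both edges share the same path here, so the first label takes the middle of the long segment and the second is pushed to a position that does not overlap it.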

Aesthetic scoring

The engine doesn’t just run the pipeline once and hope for the best. It evaluates 5 layout candidates with different planner parameters and picks the one with the lowest aesthetic penalty score. This happens automatically whenever a scene is laid out.

The 5 candidates vary two parameters that have the biggest impact on layout quality: node/rank spacing (how dense or spacious the layout is) and crossing-reduction thoroughness (how many optimisation passes the planner runs). One candidate always uses the scene’s own defaults, so the worst case is identical to a single-pass layout.

Each candidate is scored on:

Edge crossings - the single most impactful quality signal. Fewer crossings = more readable.
Edge bends - fewer 90-degree bends = cleaner orthogonal routes.
Total edge length - prefer compact layouts over sprawling ones (weak signal).
Label overlaps - should be zero. Heavily penalised when not.
Area utilisation - prefer layouts that fill roughly 40% of the canvas, not too sparse and not too packed.
Canvas overflow - layouts whose content exceeds the canvas in either axis are heavily penalised, so the best-of-N selector rejects them. Combined with the canvas-aware spread step above, this means agents reliably get a layout that fits the canvas they specified.

The winner is cached, so subsequent renders of the same scene return it instantly without re-running the candidates. The entire process is deterministic: same scene always produces the same layout.
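
The best-of-N selection can be sketched as a weighted penalty plus a `min`. The signals come from the list above, but every weight below is invented for illustration, as are the `Metrics` and `pick_best` names; only the relative ordering (crossings matter most, edge length is weak, label overlaps and overflow are heavily penalised, utilisation targets roughly 40%) follows the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    crossings: int
    bends: int
    edge_length: float
    label_overlaps: int
    content_area: float
    canvas_area: float
    overflows: bool


def penalty(m):
    """Lower is better. All weights are hypothetical."""
    utilisation = m.content_area / m.canvas_area
    return (
        100.0 * m.crossings
        + 5.0 * m.bends
        + 0.01 * m.edge_length
        + 1000.0 * m.label_overlaps
        + 200.0 * abs(utilisation - 0.40)
        + (10_000.0 if m.overflows else 0.0)
    )


def pick_best(candidates):
    """candidates: list of (name, Metrics). Ties go to the earliest candidate."""
    return min(candidates, key=lambda c: penalty(c[1]))


candidates = [
    ("default", Metrics(2, 6, 1200.0, 0, 240_000, 600_000, False)),
    ("spacious", Metrics(1, 8, 1500.0, 0, 700_000, 600_000, True)),
    ("dense", Metrics(1, 7, 1000.0, 0, 240_000, 600_000, False)),
]
winner = pick_best(candidates)
```

Because `min` returns the first minimum it encounters, evaluating candidates in a fixed order keeps the selection deterministic even when two candidates score the same.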

For agents, this is invisible - you describe the graph, the engine does the rest. The scoring is the mechanism behind the “always produces a clean diagram” guarantee.

Configuration

The layout engine reads scene.layoutStrategy:

{
  "layoutStrategy": {
    "algorithm": "hierarchical",
    "direction": "TB",
    "nodeSpacing": 60,
    "rankSpacing": 100
  }
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `algorithm` | string | `"hierarchical"` | Layout algorithm. Currently `"hierarchical"` is the production-ready choice. |
| `direction` | `"TB"` \| `"BT"` \| `"LR"` \| `"RL"` | depends on family | Primary flow direction. |
| `nodeSpacing` | number | `30` | Minimum pixels between nodes in the same rank. |
| `rankSpacing` | number | `80` | Minimum pixels between adjacent ranks. |

When omitted, sensible defaults are used per diagram family. The hierarchical pipeline is used by architecture, workflow, entityRelationship, uiflow, and the default fallback strategy. Org charts use a tidy-tree planner instead, and network topologies use a force-directed layout (since they have no inherent direction). Sequence diagrams use a dedicated time-ordered resolver.

Mixed mode in practice

The most common production pattern is to fix some nodes and let the engine position the rest:

{
  "schemaVersion": "0.1",
  "scene": { "id": "mixed-arch", "canvas": { "width": 1000, "height": 600 } },
  "layoutStrategy": { "algorithm": "hierarchical", "direction": "LR" },
  "elements": [
    {
      "id": "gateway",
      "kind": "node",
      "nodeType": "service",
      "label": "API Gateway",
      "layout": { "mode": "absolute", "x": 50, "y": 250, "width": 160, "height": 80 }
    },
    { "id": "auth", "kind": "node", "nodeType": "service", "label": "Auth" },
    { "id": "users", "kind": "node", "nodeType": "service", "label": "Users" },
    { "id": "billing", "kind": "node", "nodeType": "service", "label": "Billing" },
    { "id": "db", "kind": "node", "nodeType": "database", "label": "Postgres" },
    { "id": "e1", "kind": "edge", "from": { "elementId": "gateway" }, "to": { "elementId": "auth" } },
    { "id": "e2", "kind": "edge", "from": { "elementId": "gateway" }, "to": { "elementId": "users" } },
    { "id": "e3", "kind": "edge", "from": { "elementId": "gateway" }, "to": { "elementId": "billing" } },
    { "id": "e4", "kind": "edge", "from": { "elementId": "users" }, "to": { "elementId": "db" } },
    { "id": "e5", "kind": "edge", "from": { "elementId": "billing" }, "to": { "elementId": "db" } }
  ]
}

The gateway is pinned where the diagram’s user wants it. Auth, Users, Billing, and Postgres have no layout - the engine positions them automatically and routes the edges around the gateway.
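
Before submitting a mixed-mode scene, an agent may want to confirm which nodes it has pinned and which it is leaving to the engine. The Python helper below is a hypothetical client-side check, not part of Zindex; it relies only on the document shape shown in the example above (an `elements` array whose nodes may carry a `layout` object).

```python
def split_by_layout(scene_document):
    """Return (pinned_ids, auto_ids) for the nodes in a scene document."""
    nodes = [e for e in scene_document["elements"] if e.get("kind") == "node"]
    pinned = [n["id"] for n in nodes if "layout" in n]
    auto = [n["id"] for n in nodes if "layout" not in n]
    return pinned, auto


document = {
    "elements": [
        {"id": "gateway", "kind": "node",
         "layout": {"mode": "absolute", "x": 50, "y": 250, "width": 160, "height": 80}},
        {"id": "auth", "kind": "node"},
        {"id": "db", "kind": "node"},
        {"id": "e1", "kind": "edge",
         "from": {"elementId": "gateway"}, "to": {"elementId": "auth"}},
    ]
}
pinned, auto = split_by_layout(document)
```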

Determinism guarantees

The engine is fully deterministic. Given the same scene document, the same layoutStrategy, and the same set of diagram-family rules, the output is bit-identical across runs. This means:

Diagrams can be checked into git and diffed meaningfully.
Visual regression tests don’t need to tolerate pixel jitter.
Two agents inspecting the same scene see the same picture.

There is no random initialization, no time-based heuristic, and no external service call. Every algorithm in the pipeline is purely a function of the input.
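
One way to exploit this in a regression test is to fingerprint the resolved layout and compare fingerprints across runs. The sketch below uses only the Python standard library; the shape of the layout dictionary is an assumption, since the render output format is not described on this page.

```python
import hashlib
import json


def layout_fingerprint(layout):
    """Stable SHA-256 over a canonical JSON encoding of the layout."""
    canonical = json.dumps(layout, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical resolved positions from two renders of the same scene.
run_a = {"gateway": {"x": 50, "y": 250}, "auth": {"x": 310, "y": 120}}
run_b = {"auth": {"y": 120, "x": 310}, "gateway": {"y": 250, "x": 50}}
drifted = {"gateway": {"x": 50, "y": 250}, "auth": {"x": 311, "y": 120}}

same = layout_fingerprint(run_a) == layout_fingerprint(run_b)
changed = layout_fingerprint(run_a) != layout_fingerprint(drifted)
```

Sorting keys makes the fingerprint independent of dictionary ordering, so only a real change in geometry changes the hash.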