LinxISA v0.4 Rendering Kernel Authoring Guide

This document defines the live authoring guidance for rendering-related kernels and PTO-style building blocks in canonical v0.4.

It is written to stay consistent with the current workload split of the ISA:

  • general-purpose computing remains a first-class target across the broader block-structured ISA,
  • AI-oriented acceleration primarily composes CUBE + VEC + TMA,
  • rendering-oriented acceleration primarily composes VEC + TMA + TAU.

This page covers the rendering subset only. It does not redefine the canonical TEPL selector set or the rendering command contract.

Scope

  • rendering-related kernel authoring under workloads/pto_kernels/,
  • tile-shape and data-layout conventions for rendering-oriented work,
  • how rendering kernels should compose VEC, TMA, and TAU under canonical v0.4,
  • what to avoid until a later catalog revision freezes additional rendering operations.

Authoring Location

Rendering-related PTO kernels live under:

  • workloads/pto_kernels/

This submodule is the canonical workspace for PTO kernel sources, public PTO headers, and host-side validation flows. Rendering examples and experiments should follow the same source and test conventions already used by the existing PTO kernels there.

Workload Alignment

Rendering kernels are one workload profile within a broader ISA that targets multiple workload classes.

  • General-purpose computing is not a side case. The ISA remains a block-structured general-purpose architecture and may use scalar, vector, and tile features as appropriate to the workload.
  • AI-oriented acceleration primarily composes CUBE + VEC + TMA.
      • CUBE handles the current matrix and accumulator engine path.
      • VEC handles programmable compute and fallback execution, including loop-based stages such as softmax.
      • TMA handles explicit tile-memory movement and layout plumbing.
  • Rendering-oriented acceleration primarily composes VEC + TMA + TAU.
      • VEC handles programmable shader-style computation, general SIMT loop execution, and required fallback execution.
      • TMA handles explicit tile transport and global-memory staging.
      • TAU hosts tile-to-tile hardened rendering-oriented work under the current rendering profile.

This split is architectural intent. It does not authorize new op assignments beyond the already-frozen canonical catalogs.

Pipeline-Mode Neutrality

Rendering authoring under canonical v0.4 must stay compatible with both of the supported pipeline strategies:

  • immediate-style rendering flows,
  • tile-based rendering flows.

This requirement affects authoring in a specific way:

  • treat tiles as general intermediate-state carriers, not as proof that one particular screen-space pipeline structure has been chosen,
  • keep programmable stages expressible as standalone VEC kernels even if a later pipeline chooses to batch or bin work differently,
  • keep stage boundaries explicit enough that BCC-led software scaffolding can own a stage early and later hand it off to VEC or TAU-backed execution without changing the public contract,
  • avoid source assumptions that only make sense for one pipeline style unless that assumption is already frozen in a canonical profile.
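
As a minimal host-side sketch of the stage-boundary point above: if a stage is described only by its inputs and outputs, the implementation behind it can change without touching callers. Everything here is hypothetical and for illustration only: `StageFn`, `shade_scalar`, `shade_chunked`, and `run_stage` are not PTO or Linx APIs, and the squaring operation is a placeholder for real shading work.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stage contract: read `count` inputs, write `count` outputs.
// Nothing in the signature says which engine or pipeline style runs it.
using StageFn = void (*)(const float* in, float* out, std::size_t count);

// Stand-in for early software scaffolding: a plain scalar loop.
inline void shade_scalar(const float* in, float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = in[i] * in[i];
    }
}

// Stand-in for a later wide implementation: fixed-size chunks plus a tail.
inline void shade_chunked(const float* in, float* out, std::size_t count) {
    constexpr std::size_t kChunk = 8;
    std::size_t i = 0;
    for (; i + kChunk <= count; i += kChunk) {
        for (std::size_t lane = 0; lane < kChunk; ++lane) {
            out[i + lane] = in[i + lane] * in[i + lane];
        }
    }
    for (; i < count; ++i) {  // remaining elements that do not fill a chunk
        out[i] = in[i] * in[i];
    }
}

// Callers depend only on the contract, so the owner of the stage can change.
inline std::vector<float> run_stage(StageFn stage, const std::vector<float>& in) {
    std::vector<float> out(in.size());
    stage(in.data(), out.data(), in.size());
    return out;
}
```

Calling `run_stage(shade_scalar, input)` and `run_stage(shade_chunked, input)` yields the same result, which is the property a hand-off between owners needs to preserve.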

Tile And Data Conventions

Rendering kernels should follow the first frozen rendering profile:

  • use 4KB tile carriers,
  • use the canonical 1024x1 row-major conceptual profile for rendering-state transport,
  • prefer SoA layouts for fragment and state payloads,
  • keep pack and unpack behavior explicit instead of hiding it behind undocumented conventions.
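
The size arithmetic behind this profile can be checked at compile time. The sketch below is a plain C++ stand-in, not the PTO tile type: `TileSketch`, `StateTile1024x1`, `VecTile32x32`, and `FragmentColorSoA` are hypothetical names, and the 4-byte element type is an assumption inferred from 4096 / 1024 = 4.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a tile carrier; NOT the real PTO tile type.
template <typename T, std::size_t Rows, std::size_t Cols>
struct TileSketch {
    static constexpr std::size_t kBytes = Rows * Cols * sizeof(T);
    static_assert(kBytes == 4096, "rendering tile carriers are 4KB");

    std::array<T, Rows * Cols> data{};

    // Row-major addressing: element (r, c) lives at r * Cols + c.
    T& at(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
    const T& at(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// 1024x1 with an assumed 4-byte element: 1024 * 1 * 4 = 4096 bytes.
using StateTile1024x1 = TileSketch<float, 1024, 1>;
// A 32x32 shape with the same element type has the same footprint.
using VecTile32x32 = TileSketch<float, 32, 32>;

// SoA payload: one tile per channel instead of interleaved RGBA records.
struct FragmentColorSoA {
    VecTile32x32 r;
    VecTile32x32 g;
    VecTile32x32 b;
    VecTile32x32 a;
};

constexpr std::size_t fragment_payload_bytes() {
    return 4 * VecTile32x32::kBytes;  // four channel planes
}
```

With this layout each channel plane is a full tile on its own, so a stage that only needs one channel can move a single 4KB carrier.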

In PTO source terms, that usually means:

  • continue using the existing PTO tile abstractions that result in 4096 bytes per tile,
  • prefer one tile per logical channel or state plane where practical,
  • use explicit format-conversion and layout-conversion steps instead of overloading one tile type with multiple implicit meanings.
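
To illustrate an explicit format-conversion step, the following sketch packs four float channel planes into one plane of 32-bit pixels and unpacks a channel again. It is an assumption-laden example, not a canonical format definition: the function names are hypothetical, and the 8-bit unsigned-normalized encoding with R in the low byte is chosen only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 1024;  // 1024 x 4 bytes = 4096 bytes per plane
using FloatPlane = std::array<float, kLanes>;
using PackedPlane = std::array<std::uint32_t, kLanes>;

// Clamp to [0, 1] and round to the nearest 8-bit unsigned-normalized value.
inline std::uint32_t to_unorm8(float v) {
    const float clamped = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint32_t>(std::lround(clamped * 255.0f));
}

// Assumed byte order for this sketch: R in bits 0-7, A in bits 24-31.
inline std::uint32_t pack_rgba8(float r, float g, float b, float a) {
    return to_unorm8(r) | (to_unorm8(g) << 8) | (to_unorm8(b) << 16) |
           (to_unorm8(a) << 24);
}

// The pack step is a named, visible pass over SoA planes.
inline void pack_planes(const FloatPlane& r, const FloatPlane& g,
                        const FloatPlane& b, const FloatPlane& a,
                        PackedPlane& out) {
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[i] = pack_rgba8(r[i], g[i], b[i], a[i]);
    }
}

// Inverse step for one channel (0 = R, 1 = G, 2 = B, 3 = A).
inline float unpack_channel(std::uint32_t pixel, unsigned channel) {
    return static_cast<float>((pixel >> (8u * channel)) & 0xFFu) / 255.0f;
}
```

Because the conversion is its own function, the float planes keep a single meaning and the packed plane keeps another, rather than one tile type carrying both implicitly.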

For tile-based rendering flows specifically:

  • treat tiles as the working-set carrier and inter-stage exchange medium, not as proof that one exact screen-space tile size is frozen,
  • keep binning, tile-list construction, and resolve boundaries explicit in the surrounding block stream and host/runtime logic,
  • keep tile-local shading and related parallel-loop work expressible as VEC kernels,
  • keep final writeback explicit through architected TMA or .brg movement rather than an undocumented rendering-private path.

Basic Source-Level Style

Follow the same source-level style already used by the existing PTO kernels:

  • use Tile<Location::Vec, T, 32, 32, ...> or equivalent PTO tile forms that preserve the 4096-byte tile contract (32 x 32 elements total 4096 bytes when T is a 4-byte type),
  • use global_tensor<T, RowMajor<H,W>> and global_iterator<...> for linear global buffers where appropriate,
  • structure kernels as explicit tile load, compute, and store phases,
  • keep dataflow obvious in source form so host and QEMU parity tools can reason about it.

Typical pattern:

  • build iterators,
  • load one or more tiles,
  • perform tile-local or VEC computation,
  • store or forward the resulting tile explicitly.

Kernel Signatures

Rendering stage kernels should take explicit pointers and state references for the resources they operate on.

Typical inputs:

  • pointers to global buffers such as vertex buffers, index buffers, and attachments,
  • pointers to descriptor or state tables,
  • optional scratch pointers,
  • explicit dimensions or batch-shape parameters where the source type system does not already carry them.
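
One possible shape for such an argument list is sketched below. The struct, its field names, and the `args_are_well_formed` check are hypothetical and exist only to show every resource being passed explicitly; they are not taken from the PTO headers.

```cpp
#include <cstdint>

// Hypothetical descriptor record; real descriptor layouts are defined elsewhere.
struct AttachmentDesc {
    std::uint32_t width;
    std::uint32_t height;
};

// Every resource the stage touches appears in the argument list.
struct StageArgs {
    const float* vertex_buffer;         // global buffer
    const std::uint32_t* index_buffer;  // global buffer
    float* color_attachment;            // attachment written by the stage
    const AttachmentDesc* state_table;  // descriptor or state table
    float* scratch;                     // optional; may be nullptr
    std::uint32_t index_count;          // explicit shape not carried by the types
};

// Host-side sanity check: required pointers present, dimensions non-zero.
inline bool args_are_well_formed(const StageArgs& args) {
    if (args.vertex_buffer == nullptr || args.index_buffer == nullptr ||
        args.color_attachment == nullptr || args.state_table == nullptr) {
        return false;
    }
    return args.state_table->width != 0 && args.state_table->height != 0 &&
           args.index_count != 0;
}
```

The scratch pointer is deliberately excluded from the required set, matching its optional status in the list above.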

When lowered to Linx execution:

  • global or descriptor-table pointers map through explicit scalar imports such as ri* via B.IOR,
  • tile movement remains explicit through TLOAD, TSTORE, or other TMA-backed plumbing,
  • state that is hot or repeatedly reused may be staged into StateTiles rather than re-fetched implicitly.

VEC, TMA, TAU, And CUBE Usage Rules

Rendering authoring under current v0.4 should follow these rules:

  • Use VEC for programmable shading and for any rendering step that does not already have a canonical hardened tile op.
  • Use TMA for explicit tile and global-memory movement.
  • Use TAU only through carriers and selectors that are already canonical under the rendering PTO contract.
  • Do not treat CUBE as the primary rendering carrier. CUBE remains the canonical AI-oriented matrix and accumulator path, even though some existing tile and conversion primitives may still be useful as generic building blocks.

For rendering specifically:

  • BSTART.TEPL is the only canonical generic rendering PTO carrier today.
  • BSTART.FIXP is not a canonical rendering PTO carrier today and must not be used as if it were.
  • If a desired rendering operation does not already have an assigned canonical TEPL selector, implement it through composition of existing canonical primitives or through an MPAR fallback kernel.
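
The last rule amounts to a simple decision, sketched here in host-side C++. The `Carrier` enum, `choose_carrier`, and the idea of a string-keyed set of assigned operations are all hypothetical; the real set of assigned TEPL selectors is defined only by the canonical catalog and is not reproduced here.

```cpp
#include <set>
#include <string>

// Hypothetical labels for the two outcomes the rule allows.
enum class Carrier {
    TeplSelector,            // op already has an assigned canonical selector
    ComposedOrMparFallback   // compose canonical primitives or use an MPAR kernel
};

// `assigned_tepl_ops` stands in for the frozen catalog; the caller supplies it.
// This function never invents an assignment for an op that is not listed.
inline Carrier choose_carrier(const std::string& op,
                              const std::set<std::string>& assigned_tepl_ops) {
    return assigned_tepl_ops.count(op) != 0 ? Carrier::TeplSelector
                                            : Carrier::ComposedOrMparFallback;
}
```

For example, with a placeholder catalog containing only `"placeholder_op_a"`, a request for `"placeholder_op_b"` resolves to `Carrier::ComposedOrMparFallback` rather than to a new selector.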

Fallback Expectations

Rendering kernels must preserve the current fallback architecture:

  • MPAR VEC kernels remain the required functional fallback for rendering PTO work,
  • software-backed rendering paths remain valid references during bring-up,
  • any hardened or TEPL-backed rendering path should be comparable against a VEC or software reference path.

This comparability is required for AVS, emulator closure, and staged hardening.
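
A comparison between a reference path and a candidate path might look like the sketch below. It is an illustrative host-side helper, not part of the existing validation flows: `ParityReport`, `compare_paths`, and the absolute-tolerance criterion are assumptions made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ParityReport {
    std::size_t mismatches = 0;
    float max_abs_error = 0.0f;
    bool passed() const { return mismatches == 0; }
};

// Compare a candidate (for example a hardened path) against a reference
// (for example a VEC or software path), element by element.
inline ParityReport compare_paths(const std::vector<float>& reference,
                                  const std::vector<float>& candidate,
                                  float tolerance) {
    ParityReport report;
    if (reference.size() != candidate.size()) {
        // A shape mismatch fails outright; report the larger element count.
        report.mismatches = std::max(reference.size(), candidate.size());
        return report;
    }
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const float err = std::fabs(reference[i] - candidate[i]);
        report.max_abs_error = std::max(report.max_abs_error, err);
        if (!(err <= tolerance)) {  // written this way so NaN counts as a mismatch
            ++report.mismatches;
        }
    }
    return report;
}
```

A tolerance of zero demands bit-exact agreement on finite values; a non-zero tolerance is only appropriate where the two paths are allowed to differ in rounding.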

What To Avoid

Until later canonical revisions freeze more rendering-specific ops, avoid the following:

  • do not assign new rendering semantics to unassigned BSTART.TEPL selectors,
  • do not route rendering PTOs through BSTART.FIXP,
  • do not introduce hidden global-memory effects inside TAU-facing work,
  • do not bind tiles to only one rendering pipeline style; the same tile model must remain compatible with both immediate-style and tile-based strategies,
  • do not overfit to one pixel format when explicit pack or unpack steps would keep the contract clearer,
  • do not rely on undocumented implicit state outside descriptors, referenced records, and StateTiles.

Relationship To Other Canonical Pages

  • workload-wide architectural scope is defined in docs/architecture/v0.4-architecture-contract.md,
  • workload-to-engine mapping is defined in docs/architecture/v0.4-workload-engine-model.md,
  • rendering PTO carrier legality is defined in docs/architecture/v0.4-rendering-pto-contract.md,
  • command lowering and submission ownership are defined in docs/architecture/v0.4-rendering-command-contract.md,
  • rendering hardening and fallback policy are defined in docs/architecture/v0.4-hardening-policy.md.