LinxISA v0.4 Rendering Kernel Authoring Guide¶
This document defines the live authoring guidance for rendering-related kernels
and PTO-style building blocks in canonical v0.4.
It is written to stay consistent with the current workload split of the ISA:
- general-purpose computing remains a first-class target across the broader block-structured ISA,
- AI-oriented acceleration primarily composes CUBE + VEC + TMA,
- rendering-oriented acceleration primarily composes VEC + TMA + TAU.
This page covers the rendering subset only. It does not redefine the canonical TEPL selector set or the rendering command contract.
Scope¶
- rendering-related kernel authoring under workloads/pto_kernels/,
- tile-shape and data-layout conventions for rendering-oriented work,
- how rendering kernels should compose VEC, TMA, and TAU under canonical v0.4,
- what to avoid until a later catalog revision freezes additional rendering operations.
Authoring Location¶
Rendering-related PTO kernels live under:
workloads/pto_kernels/
This submodule is the canonical workspace for PTO kernel sources, public PTO headers, and host-side validation flows. Rendering examples and experiments should follow the same source and test conventions already used by the existing PTO kernels there.
Workload Alignment¶
Rendering kernels are one workload profile within a broader ISA that targets multiple workload classes.
- General-purpose computing is not a side case. The ISA remains a block-structured general-purpose architecture and may use scalar, vector, and tile features as appropriate to the workload.
- AI-oriented acceleration primarily composes CUBE + VEC + TMA.
  - CUBE handles the current matrix and accumulator engine path.
  - VEC handles programmable compute and fallback execution, including loop-based stages such as softmax.
  - TMA handles explicit tile-memory movement and layout plumbing.
- Rendering-oriented acceleration primarily composes VEC + TMA + TAU.
  - VEC handles programmable shader-style computation, general SIMT loop execution, and required fallback execution.
  - TMA handles explicit tile transport and global-memory staging.
  - TAU hosts tile-to-tile hardened rendering-oriented work under the current rendering profile.
This split is architectural intent. It does not authorize new op assignments beyond the already-frozen canonical catalogs.
Pipeline-Mode Neutrality¶
Rendering authoring under canonical v0.4 must stay compatible with both of
the supported pipeline strategies:
- immediate-style rendering flows,
- tile-based rendering flows.
This requirement affects authoring in a specific way:
- treat tiles as general intermediate-state carriers, not as proof that one particular screen-space pipeline structure has been chosen,
- keep programmable stages expressible as standalone VEC kernels even if a later pipeline chooses to batch or bin work differently,
- keep stage boundaries explicit enough that BCC-led software scaffolding can own a stage early and later hand it off to VEC or TAU-backed execution without changing the public contract,
- avoid source assumptions that only make sense for one pipeline style unless that assumption is already frozen in a canonical profile.
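One way to read the stage-boundary rule is as a fixed stage signature whose backing implementation can change over time. The sketch below is host-side C++ only and does not use the real PTO headers; `TileF32`, `StageFn`, `halve_stage_software`, `halve_stage_handoff`, and `run_stage` are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side stand-in for a 4 KB tile: 1024 x f32 = 4096 bytes.
using TileF32 = std::array<float, 1024>;
static_assert(sizeof(TileF32) == 4096, "tile carrier must stay 4 KB");

// Hypothetical public stage contract: one input tile, one output tile.
using StageFn = void (*)(const TileF32& in, TileF32& out);

// Early owner of the stage: plain software scaffolding.
inline void halve_stage_software(const TileF32& in, TileF32& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * 0.5f;
    }
}

// Later owner of the stage: stands in for a VEC or TAU-backed path.
// It walks the tile in groups of four lanes but keeps the same signature
// and produces the same values.
inline void halve_stage_handoff(const TileF32& in, TileF32& out) {
    for (std::size_t base = 0; base < in.size(); base += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane) {
            out[base + lane] = in[base + lane] * 0.5f;
        }
    }
}

// Callers depend on the contract only, not on which backend owns the stage.
inline void run_stage(StageFn stage, const TileF32& in, TileF32& out) {
    stage(in, out);
}
```

Because both implementations share `StageFn`, swapping the owner of a stage is a one-argument change at the call site, which is the property the public contract is meant to preserve.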
Tile And Data Conventions¶
Rendering kernels should follow the first frozen rendering profile:
- use 4KB tile carriers,
- use the canonical 1024x1 row-major conceptual profile for rendering-state transport,
- prefer SoA layouts for fragment and state payloads,
- keep pack and unpack behavior explicit instead of hiding it behind undocumented conventions.
In PTO source terms, that usually means:
- continue using the existing PTO tile abstractions that result in 4096 bytes per tile,
- prefer one tile per logical channel or state plane where practical,
- use explicit format-conversion and layout-conversion steps instead of overloading one tile type with multiple implicit meanings.
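The size arithmetic behind these conventions can be checked at compile time. The sketch below assumes 4-byte elements, which is what a 1024x1 or 32x32 shape needs in order to fill 4096 bytes; `kTileBytes`, `tile_bytes`, `elements_per_tile`, and `FragmentPlanesSoA` are hypothetical names and not part of the PTO headers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Size of the tile carrier in bytes, as stated in the rendering profile.
constexpr std::size_t kTileBytes = 4096;

// Bytes occupied by a rows x cols tile of element type T.
template <typename T>
constexpr std::size_t tile_bytes(std::size_t rows, std::size_t cols) {
    return rows * cols * sizeof(T);
}

// Number of elements of type T that fit in one tile carrier.
template <typename T>
constexpr std::size_t elements_per_tile() {
    return kTileBytes / sizeof(T);
}

// Both shapes used in this guide fill one carrier for 4-byte elements.
static_assert(tile_bytes<float>(32, 32) == kTileBytes, "32x32 f32 tile");
static_assert(tile_bytes<std::uint32_t>(1024, 1) == kTileBytes, "1024x1 u32 tile");

// Hypothetical SoA fragment payload: one 1024-element plane per channel,
// so each plane maps onto exactly one tile carrier.
struct FragmentPlanesSoA {
    std::array<float, 1024> r;
    std::array<float, 1024> g;
    std::array<float, 1024> b;
    std::array<float, 1024> a;
};
```

Keeping one plane per channel means a stage that only needs, say, alpha can move a single tile instead of unpacking an interleaved record.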
For tile-based rendering flows specifically:
- treat tiles as the working-set carrier and inter-stage exchange medium, not as proof that one exact screen-space tile size is frozen,
- keep binning, tile-list construction, and resolve boundaries explicit in the surrounding block stream and host/runtime logic,
- keep tile-local shading and related parallel-loop work expressible as VEC kernels,
- keep final writeback explicit through architected TMA or .brg movement rather than an undocumented rendering-private path.
Basic Source-Level Style¶
Follow the same source-level style already used by the existing PTO kernels:
- use Tile<Location::Vec, T, 32, 32, ...> or equivalent PTO tile forms that preserve the 4096-byte tile contract,
- use global_tensor<T, RowMajor<H,W>> and global_iterator<...> for linear global buffers where appropriate,
- structure kernels as explicit tile load, compute, and store phases,
- keep dataflow obvious in source form so host and QEMU parity tools can reason about it.
Typical pattern:
- build iterators,
- load one or more tiles,
- perform tile-local or VEC computation,
- store or forward the resulting tile explicitly.
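The four steps above can be mocked on the host to show the shape of the dataflow. This is a minimal sketch, not PTO source: `TileF32`, `load_tile`, `compute_tile`, `store_tile`, and `clamp_kernel` are hypothetical stand-ins, and the clamp is only a placeholder for real tile-local work.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kTileElems = 1024;        // 4096 bytes of f32
using TileF32 = std::array<float, kTileElems>;  // hypothetical stand-in for a PTO tile

// Explicit load: copy one tile's worth of data out of the linear global buffer.
inline void load_tile(const std::vector<float>& src, std::size_t tile_idx, TileF32& t) {
    std::copy_n(src.begin() + tile_idx * kTileElems, kTileElems, t.begin());
}

// Tile-local compute: clamp every element to [0, 1] (placeholder operation).
inline void compute_tile(TileF32& t) {
    for (float& v : t) {
        v = std::min(std::max(v, 0.0f), 1.0f);
    }
}

// Explicit store: write the tile back to its slot in the destination buffer.
inline void store_tile(const TileF32& t, std::size_t tile_idx, std::vector<float>& dst) {
    std::copy(t.begin(), t.end(), dst.begin() + tile_idx * kTileElems);
}

// Assumes src.size() is a multiple of kTileElems and dst.size() == src.size().
inline void clamp_kernel(const std::vector<float>& src, std::vector<float>& dst) {
    const std::size_t num_tiles = src.size() / kTileElems;  // iteration space
    for (std::size_t i = 0; i < num_tiles; ++i) {
        TileF32 t;
        load_tile(src, i, t);
        compute_tile(t);
        store_tile(t, i, dst);
    }
}
```

Each phase is a separate call, so a reader or a parity tool can see where data enters and leaves the tile without inferring it from side effects.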
Kernel Signatures¶
Rendering stage kernels should take explicit pointers and state references for the resources they operate on.
Typical inputs:
- pointers to global buffers such as vertex buffers, index buffers, and attachments,
- pointers to descriptor or state tables,
- optional scratch pointers,
- explicit dimensions or batch-shape parameters where the source type system does not already carry them.
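As an illustration of the inputs listed above, a stage's arguments could be gathered into one explicit bundle. The struct and field names below are hypothetical and do not come from the PTO headers; the sketch only shows that every resource and dimension is passed in rather than found implicitly.

```cpp
#include <cstdint>

// Hypothetical argument bundle for one rendering stage kernel.
struct StageArgs {
    const float*         vertex_buffer;  // global vertex data
    const std::uint32_t* index_buffer;   // global index data
    std::uint32_t*       attachment;     // output attachment
    const void*          state_table;    // descriptor or state table
    void*                scratch;        // optional; may be nullptr
    std::uint32_t        index_count;    // explicit batch shape
    std::uint32_t        width;          // explicit attachment width
    std::uint32_t        height;         // explicit attachment height
};

// Returns true when every required resource is present and the attachment
// dimensions are non-zero. Scratch is optional and therefore not checked.
inline bool args_are_explicit(const StageArgs& a) {
    return a.vertex_buffer != nullptr && a.index_buffer != nullptr &&
           a.attachment != nullptr && a.state_table != nullptr &&
           a.width > 0 && a.height > 0;
}
```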
When lowered to Linx execution:
- global or descriptor-table pointers map through explicit scalar imports such as ri* via B.IOR,
- tile movement remains explicit through TLOAD, TSTORE, or other TMA-backed plumbing,
- state that is hot or repeatedly reused may be staged into StateTiles rather than re-fetched implicitly.
VEC, TMA, TAU, And CUBE Usage Rules¶
Rendering authoring under current v0.4 should follow these rules:
- Use VEC for programmable shading and for any rendering step that does not already have a canonical hardened tile op.
- Use TMA for explicit tile and global-memory movement.
- Use TAU only through carriers and selectors that are already canonical under the rendering PTO contract.
- Do not treat CUBE as the primary rendering carrier. CUBE remains the canonical AI-oriented matrix and accumulator path, even though some existing tile and conversion primitives may still be useful as generic building blocks.
For rendering specifically:
- BSTART.TEPL is the only canonical generic rendering PTO carrier today.
- BSTART.FIXP is not a canonical rendering PTO carrier today and must not be used as if it were.
- If a desired rendering operation does not already have an assigned canonical TEPL selector, implement it through composition of existing canonical primitives or through an MPAR fallback kernel.
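The selector rule amounts to a simple decision: take the TEPL path only when the catalog already assigns a selector, otherwise fall back. The sketch below is purely illustrative; the operation ids, the selector value, and the lookup function are hypothetical placeholders and do not reflect the actual canonical catalog.

```cpp
#include <cstdint>

// Which execution path a rendering operation takes.
enum class Path { Tepl, MparFallback };

// Hypothetical operation ids, used only to exercise both branches.
enum class RenderOp : std::uint8_t { ExampleAssigned, ExampleUnassigned };

// Hypothetical catalog lookup: writes the selector and returns true when the
// operation has an assigned selector, returns false otherwise.
inline bool lookup_selector(RenderOp op, std::uint16_t& selector_out) {
    switch (op) {
        case RenderOp::ExampleAssigned:
            selector_out = 1;  // placeholder value, not a real selector
            return true;
        default:
            return false;
    }
}

// Never invents a selector: anything without an assignment goes to fallback.
inline Path choose_path(RenderOp op) {
    std::uint16_t selector = 0;
    return lookup_selector(op, selector) ? Path::Tepl : Path::MparFallback;
}
```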
Fallback Expectations¶
Rendering kernels must preserve the current fallback architecture:
- MPAR VEC kernels remain the required functional fallback for rendering PTO work,
- software-backed rendering paths remain valid references during bring-up,
- any hardened or TEPL-backed rendering path should be comparable against a VEC or software reference path.
A comparable reference path is required for AVS, emulator closure, and staged hardening.
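Comparing a hardened path against its reference can be as simple as an element-wise tile check. The helper below is a host-side sketch under assumed f32 tiles; `TileF32` and `first_mismatch` are hypothetical names, and the tolerance policy is an assumption rather than something this guide specifies.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using TileF32 = std::array<float, 1024>;  // hypothetical 4 KB f32 tile stand-in

// Returns the index of the first element whose absolute difference exceeds
// tol, or ref.size() when the tiles agree everywhere. Any comparison that
// involves NaN is reported as a mismatch.
inline std::size_t first_mismatch(const TileF32& ref, const TileF32& got, float tol) {
    for (std::size_t i = 0; i < ref.size(); ++i) {
        if (!(std::fabs(ref[i] - got[i]) <= tol)) {
            return i;
        }
    }
    return ref.size();
}
```

Returning the index rather than a plain boolean makes it easier to map a failure back to a lane or fragment when a hardened path diverges from the VEC or software reference.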
What To Avoid¶
Until later canonical revisions freeze more rendering-specific ops, avoid the following:
- do not assign new rendering semantics to unassigned BSTART.TEPL selectors,
- do not route rendering PTOs through BSTART.FIXP,
BSTART.FIXP, - do not introduce hidden global-memory effects inside TAU-facing work,
- do not bind tiles to one rendering pipeline style only; the same tile model must remain compatible with both immediate-style and tile-based strategies,
- do not overfit to one pixel format when explicit pack or unpack steps would keep the contract clearer,
- do not rely on undocumented implicit state outside descriptors, referenced records, and StateTiles.
Relationship To Other Canonical Pages¶
- workload-wide architectural scope is defined in docs/architecture/v0.4-architecture-contract.md,
- workload-to-engine mapping is defined in docs/architecture/v0.4-workload-engine-model.md,
- rendering PTO carrier legality is defined in docs/architecture/v0.4-rendering-pto-contract.md,
- command lowering and submission ownership are defined in docs/architecture/v0.4-rendering-command-contract.md,
- rendering hardening and fallback policy are defined in docs/architecture/v0.4-hardening-policy.md.