SoTA Harvester & Synthesis
Pattern G.2 · Stable · Architectural (A) · Normative (unless explicitly marked informative) Part G - Discipline SoTA Patterns Kit
Type: Architectural (A) Status: Stable Normativity: Normative (unless explicitly marked informative)
Purpose. Provide a repeatable, auditable way to discover, triage, and synthesize state‑of‑the‑art (SoTA) across competing Tradition lineages before minting CHR/CAL/LOG assets for a CG‑Frame. The primary output is a SoTA Synthesis Pack@CG‑Frame that feeds:
- naming/publication (UTS),
- CHR authoring (G.3),
- CAL authoring (G.4),
- method/generator registries and dispatch (G.5).
Scope note. This pattern owns the harvesting + synthesis generator in Part G. Shipping ownership is in G.10, refresh orchestration ownership is in G.11.
Terminology note (normative). In normative clauses below, Tradition refers to the Tech token Tradition (a plural lineage with internally coherent commitments). Plain “tradition” is allowed only as a 1:1 synonym.
Keywords
- SoTA harvest
- synthesis
- SoTA Synthesis Pack@CG-Frame
- SoTA_Set@CG-Frame
- SoTAPaletteDescription
- Tradition
- ClaimSheets
- CorpusLedger
- PRISMA Flow Record
- BridgeMatrix
- describedEntity
- micro-examples
- hand-off manifests
- RSCRTriggerKindId.
Problem frame
A team extends FPF into a new CG‑Frame. The relevant literature is typically:
- plural (multiple Tradition lineages with incompatible commitments),
- context‑sensitive (results depend on U.BoundedContext and declared describedEntity),
- method‑heterogeneous (different evidence styles, operator sets, and validity regions),
- time‑sensitive (rapid drift post‑2015; frequent benchmark/protocol shifts).
Downstream Part‑G work (CHR/CAL/selection/shipping/refresh) depends on the team producing consumable, citation‑ready artefacts without collapsing semantic boundaries across contexts or planes.
Problem
How can we systematically assemble a SoTA view that is:
- pluralist but comparable (plurality preserved; comparability is achieved only via explicit crossings),
- evidence‑addressable (claims cite auditable evidence surfaces and anchors),
- actionable (produces inventories and cards that G.3/G.4/G.5 can consume),
- refreshable (editions/policies/windows are pinned so RSCR/refresh can re‑audit and re‑run without semantic drift)?
Forces
- Pluralism vs. consolidation. Consolidation is valuable, but unqualified fusion destroys meaning.
- Breadth vs. load‑bearing depth. Too broad becomes shallow; too deep misses rival lineages.
- Recency vs. stability. Freshness matters, yet durable “backbone” claims must be identified and kept visible.
- Pedagogy vs. rigour. Outputs must be teachable enough to support review, while remaining audit‑ready.
- Authoring vs. operations. This pattern lives in the authoring plane; operational runs and decisions belong to Work planes and to owner patterns.
Solution
G.Core linkage (normative)
Builds on: G.Core (Part‑G core invariants; routing hub)
GCoreLinkageManifest (normative).
(Canonical form, Nil‑elision, and Expansion rule are defined in G.Core.)
`GCoreLinkageManifest := ⟨
  CoreConformanceProfileIds := { GCoreConformanceProfileId.PartG.AuthoringBase, GCoreConformanceProfileId.PartG.UTSWhenPublicIdsMinted },
  RSCRTriggerSetIds := { GCoreTriggerSetId.SoTAHarvestSynthesis },
  CorePinSetIds := { GCorePinSetId.PartG.CrossingVisibilityPins },
  CorePinsRequired := {
    // Scope pins (G.2‑specific)
    CG-FrameContext, Tradition[], describedEntity := ⟨GroundingHolon, ReferencePlane⟩, SoTA_SetId, SoTAPaletteDescriptionId,
  },
  DefaultsConsumed := ∅,
  TriggerAliasMapRef := ∅
⟩`
(RSCR payload pins: ClaimSheetId[], SoTA_SetId, SoTAPaletteDescriptionId, BridgeMatrixId?, GammaEpistSynthId[]?, UTSRowId[]?, DistanceDefRef.edition?, HarvestPolicyRef?, InclusionCriteriaId?, ScreeningRubricId?, PathId/PathSliceId? when path‑citable evidence or a stable freshness window is pinned.)
Pattern‑local default rules (owned by this pattern; not a Part‑G‑wide DefaultId).
FamilyCoverageFloorK := 3 (unless explicitly overridden by HarvestPolicyRef and recorded in FlowRecord)
Kit: SoTA Synthesis Pack@CG‑Frame (pattern‑owned surface)
A conforming G.2 publication produces a notation‑independent pack whose internal organisation is free, but whose exported named components / views are stable and citable:
Each named component is addressable via a stable pack‑local identifier (e.g., CorpusLedgerId, ClaimSheetId, FlowRecordId) for citation and RSCR scoping. If any component is minted/evolved as a public id, it is published and cited via UTSRowId[] per CC‑GCORE‑UTS‑1 (delegation).
- SoTA_Set@CG‑Frame (export view; “M2 output” consumed downstream)
  A read‑optimised view over the harvested candidate set that downstream generator/selector work treats as the “harvester output set”.
  Constraint (normative): SoTA_Set@CG‑Frame MUST be reconstructible from pack components by id (no “hidden extra set”).
- G.2a CorpusLedger. Ledger of candidate sources with Context and triage status (e.g., include / park / retire) and explicit rationale hooks.
- G.2b ClaimSheets[Tradition]. Typed Claim Sheets per Tradition, each with:
  - explicit home context and describedEntity,
  - explicit evidence anchors/citations (A.10 and/or EvidenceGraph refs when available),
  - explicit freshness window notes and risk/trust cues (cite B.3 owners when using trust/decay language).
- G.2c OperatorAndObjectInventory. Inventory of candidate CHR terms (characteristics/scales/coordinates) and candidate CAL operators/flows as stubs for downstream authoring.
- G.2d BridgeMatrix. A citable alignment/divergence surface across Tradition × Tradition, with explicit losses and row scopes. If any row asserts cross‑source / cross‑Tradition substitution or fusion, the pack MUST attach a GammaEpistSynthId record (alias: G.2‑F) per G.2:Ext.GammaEpistSynthesis (no silent fusion).
- G.2e MicroExamples. Worked micro‑examples for load‑bearing claims, each citing A.10 carriers, declaring context + describedEntity, and annotating assurance type(s) (TA/VA/LA, where applicable).
- G.2f UTSProposals. Draft Name Cards + Minimal Definitional Sheets (MDS) + alias proposals (incl. concept‑set linkage where applicable), with the required publication pins.
- G.2g describedEntity Map. Map from key terms/claims/public ids to GroundingHolon, ReferencePlane, and minimal reference cues for later CHR/CAL authoring.
- G.2h PRISMA Flow Record. A screening/eligibility trail for how sources entered the pack (method‑profile is allowed; see Extensions). (Name is historical; the artefact remains notation‑independent.)
- G.2i SoSIndicatorFamilies. Indicator families as variants (windows/constraints/assumptions) with explicit Acceptance branches per variant (branch ids/labels only; threshold semantics belong to CAL owners).
- G.2j MethodFamilyCards. Candidate method families with a shared signature and a plurality of implementations, each with validity regions, cost/complexity notes, and known failure modes. When the pack targets downstream registry/dispatch, MethodFamily cards SHOULD include wiring stubs needed by G.5 (eligibility predicate refs, assurance profile cues, and the pack ids that justify the family).
- G.2k GeneratorFamilyCards (if applicable). Candidate generator families for environment/task generation with declared validity regions and transfer hooks.
- G.2l Annexes (optional; owner‑routed; see Extensions). For example: QD/NQD annexes, discipline‑specific indicator annexes, interop forms.
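The reconstructibility constraint on SoTA_Set@CG‑Frame can be illustrated with a small check. This is a minimal sketch under a stated assumption: the export view is taken to be exactly the CorpusLedger entries with triage status "include". The pattern itself only requires that the view be derivable from pack components by id, so the derivation rule, the dictionary layout, and the function names here are hypothetical.

```python
def reconstruct_sota_set(corpus_ledger):
    """Derive the export view from ledger entries alone.

    Assumption (not from the pattern): a source belongs to the view
    iff its ledger triage status is "include".
    """
    return sorted(
        source_id
        for source_id, entry in corpus_ledger.items()
        if entry["status"] == "include"
    )


def hidden_members(exported_view, corpus_ledger):
    """Return ids in the exported view that the ledger cannot account for."""
    derivable = set(reconstruct_sota_set(corpus_ledger))
    return sorted(set(exported_view) - derivable)


# Hypothetical ledger: two included sources, one parked.
ledger = {
    "src-1": {"status": "include"},
    "src-2": {"status": "park"},
    "src-3": {"status": "include"},
}

# "src-9" appears in the view but has no ledger entry: a "hidden extra set".
extras = hidden_members(["src-1", "src-3", "src-9"], ledger)
```

A non-empty result from `hidden_members` would indicate that the exported view contains members with no pack-local provenance, which is the situation the constraint rules out.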
SoTAPaletteDescription (export view; required downstream)
A view‑friendly description object (pack‑local SoTAPaletteDescriptionId) that binds together:
- the SoTA_Set@CG‑Frame view,
- ClaimSheetId[],
- OperatorAndObjectInventory,
- BridgeMatrixId?,
- SoSIndicatorFamilies (with variant/branch structure),
- MethodFamilyCards / GeneratorFamilyCards?,
- MicroExamples,
- UTSProposals,
- and the describedEntity Map for citation and later CHR/CAL authoring.
Note (normative intent): this is the primary “consumable surface” for G.3/G.4/G.5; it prevents downstream patterns from scraping free prose.
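One way to picture SoTAPaletteDescription is as an id-only binding object whose references must all resolve inside the pack. The sketch below assumes a flat set of pack-local ids (`pack_index`) and snake_case field names; both are illustrative choices, not part of the pattern. Components marked optional ("?") in the list above default to empty or None.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class SoTAPaletteDescription:
    # Field names are hypothetical; the pattern only names the bound components.
    palette_id: str
    sota_set_id: str
    claim_sheet_ids: Tuple[str, ...]
    inventory_id: str
    described_entity_map_id: str
    sos_indicator_family_ids: Tuple[str, ...] = ()
    method_family_card_ids: Tuple[str, ...] = ()
    micro_example_ids: Tuple[str, ...] = ()
    uts_proposal_ids: Tuple[str, ...] = ()
    bridge_matrix_id: Optional[str] = None          # "BridgeMatrixId?"
    generator_family_card_ids: Tuple[str, ...] = ()  # "GeneratorFamilyCards?"

    def referenced_ids(self) -> List[str]:
        """Collect every pack-local id this description cites."""
        ids = [self.sota_set_id, self.inventory_id, self.described_entity_map_id]
        ids += [
            *self.claim_sheet_ids,
            *self.sos_indicator_family_ids,
            *self.method_family_card_ids,
            *self.micro_example_ids,
            *self.uts_proposal_ids,
            *self.generator_family_card_ids,
        ]
        if self.bridge_matrix_id is not None:
            ids.append(self.bridge_matrix_id)
        return ids


def dangling_refs(palette: SoTAPaletteDescription, pack_index: Set[str]) -> List[str]:
    """Return cited ids that do not exist in the pack."""
    return sorted(i for i in palette.referenced_ids() if i not in pack_index)


palette = SoTAPaletteDescription(
    palette_id="pal-1",
    sota_set_id="set-1",
    claim_sheet_ids=("cs-a", "cs-b"),
    inventory_id="inv-1",
    described_entity_map_id="dem-1",
)
pack_index = {"set-1", "cs-a", "inv-1", "dem-1"}
missing = dangling_refs(palette, pack_index)  # "cs-b" is cited but absent
```

Keeping the object id-only reflects the stated intent that downstream patterns cite components rather than scrape prose.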
Editorial template: 1‑page “SoTA Sheet” per Tradition (informative).
When authoring ClaimSheets[Tradition], teams often benefit from a single‑page template: scope + claims + evidence anchors + validity region + failure modes + freshness window + cross‑Tradition reuse notes + pointers to micro‑examples.
Harvester loop (conceptual choreography; pattern‑owned)
A conforming G.2 work product is built by iterating the following conceptual loop until the declared gates are satisfied:
- Declare scope and plurality. Declare CG-FrameContext, the initial Tradition set, and the describedEntity surface for each intended claim region. Record these declarations in the pack pins (not as implicit assumptions).
- Discover and triage sources (ledger‑first). Populate CorpusLedger via:
  - seed sources,
  - expansion via citation chaining and keyword family exploration,
  - pruning using load‑bearing relevance tests tied to the declared CG‑Frame scope.
- Distill claims per Tradition. For each Tradition, author a Claim Sheet that preserves internal commitments and cites evidence anchors. Do not fuse cross‑Tradition claims at this stage.
- Inventory operators/objects for downstream authoring. Extract candidate measurement terms and operator stubs for later CHR/CAL authoring (without asserting legality or thresholds locally).
- Build alignment/divergence surfaces. Where reuse across Tradition is desired, author Bridge‑backed alignment records and explicit loss notes in BridgeMatrix. Any consolidation is explicitly marked as requiring alignment proof.
- (Alias: G.2‑F) Produce Γ_epist synthesis records when fusion/substitution is asserted. If a work product asserts cross‑source / cross‑Tradition fusion or substitution (beyond mere “parallel divergent claims”), it MUST emit GammaEpistSynthId records per G.2:Ext.GammaEpistSynthesis (provenance union + explicit object alignment refs + assurance tuple refs), and it MUST keep penalties routed to R_eff only by delegation (CC‑GCORE‑PEN‑1).
- Publish teachable micro‑groundings. Attach worked micro‑examples to load‑bearing claims, each tied to A.10 carriers and declaring context + describedEntity.
- Apply gates and record repairs. Enforce FamilyCoverageFloorK (and any optional diversity‑by‑distance gate). If a gate fails, the pack MUST:
  - record the failure and the repair iteration in FlowRecord and CorpusLedger,
  - pin the updated HarvestPolicyRef / criteria ids (if changed),
  - iterate the loop rather than silently weakening the gate.
- Emit hand‑off manifests and export views. Produce explicit manifests to G.3 (CHR authoring), G.4 (CAL authoring), and G.5 (registry/dispatch), so that downstream work can cite pack components by id rather than re‑authoring them. The pack MUST also export SoTA_Set@CG‑Frame and SoTAPaletteDescription as the default downstream consumption surfaces (ids pinned).
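The gate step of the loop can be sketched as follows. The pattern does not define how coverage is counted, so this example assumes that FamilyCoverageFloorK is a minimum number of distinct families among included ledger entries. The `family` label, the event dictionaries, and the function name are hypothetical, and for brevity the failure is recorded only in the flow record (the pattern also requires a CorpusLedger entry).

```python
from dataclasses import dataclass, field

DEFAULT_FAMILY_COVERAGE_FLOOR_K = 3  # pattern-local default stated in this section


@dataclass
class LedgerEntry:
    source_id: str
    family: str  # hypothetical label used for coverage counting
    status: str  # "include" | "park" | "retire"


@dataclass
class FlowRecord:
    events: list = field(default_factory=list)


def check_family_coverage(ledger, flow_record, policy_override_k=None):
    """Return True if the floor holds; otherwise record a gate failure."""
    k = DEFAULT_FAMILY_COVERAGE_FLOOR_K
    if policy_override_k is not None:
        # An override must be visible in the flow record, never silent.
        k = policy_override_k
        flow_record.events.append({"kind": "floor_override", "k": k})
    covered = {e.family for e in ledger if e.status == "include"}
    if len(covered) >= k:
        return True
    flow_record.events.append({
        "kind": "gate_failure",
        "gate": "FamilyCoverageFloorK",
        "required": k,
        "covered": sorted(covered),
    })
    return False


ledger = [
    LedgerEntry("s1", "A", "include"),
    LedgerEntry("s2", "B", "include"),
    LedgerEntry("s3", "C", "park"),  # parked sources do not count
]
record = FlowRecord()
first_pass = check_family_coverage(ledger, record)

# Repair iteration: harvest one more family, then re-run the gate.
ledger.append(LedgerEntry("s4", "C", "include"))
second_pass = check_family_coverage(ledger, record)
```

The design point is that the repair path adds evidence and re-runs the gate; lowering `k` is possible only through an explicit, recorded override.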
Interfaces (minimal I/O Standard)
Note: Orchestration of re‑runs is owned by G.11; this pattern only defines what a conforming (re)harvest produces and what pins it must expose.
Extensions (pattern‑scoped; non‑core)
Extensions are pattern‑scoped modules. They do not introduce Part‑G‑wide norms; they provide wiring/pins and cite semantic owners.
GPatternExtension: GammaEpistSynthesis
PatternScopeId: G.2:Ext.GammaEpistSynthesis
GPatternExtensionId: GammaEpistSynthesis
GPatternExtensionKind: GeneratorSpecific
SemanticOwnerPatternId: G.2 (this pattern owns synthesis semantics; module exists for modularity + later extraction)
Uses: {G.Core, B.3, F.9, G.6} (penalty routing + trust/decay cues + bridges/CL + evidence path citation when used)
⊑/⊑⁺: ∅
RequiredPins/EditionPins/PolicyPins (minimum):
- GammaEpistSynthId[] (pack‑local ids of synthesis records; emitted iff fusion/substitution is asserted)
- EvidenceAnchorRef[] (provenance union; A.10 carriers)
- BridgeMatrixId and BridgeCardId[] (explicit object alignment references when crossing is involved)
- CL/CL^plane + Φ/Ψ/Φ_plane policy-ids (ids only; semantics routed via owners; penalties → R_eff only by delegation)
- PathId/PathSliceId? (only when citing via G.6)
RSCRTriggerKindIds: {RSCRTriggerKindId.EvidenceSurfaceEdit, RSCRTriggerKindId.CrossingBundleEdit, RSCRTriggerKindId.ReferencePlaneEdit, RSCRTriggerKindId.PenaltyPolicyEdit, RSCRTriggerKindId.PolicyPinChange, RSCRTriggerKindId.EditionPinChange}
Notes (normative intent; duplication‑avoidant):
- Γ_epist^synth is an auditable record that binds: (i) provenance union, (ii) explicit object alignment refs, (iii) assurance tuple refs (via existing owners) for each asserted fusion/substitution.
- This module does not redefine Γ‑fold, Φ, or penalty semantics; it only requires the pins/refs needed for replayability and auditability (see G.Core delegations).
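The "no silent fusion" rule lends itself to a mechanical check. The sketch below assumes a simplified BridgeMatrix row carrying a boolean `asserts_fusion` flag and an optional synthesis id; these field names and the completeness rule (all three reference lists non-empty) are illustrative assumptions, not the normative record shape.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BridgeRow:
    row_id: str
    left_tradition: str
    right_tradition: str
    asserts_fusion: bool             # True if the row claims fusion or substitution
    synth_id: Optional[str] = None   # GammaEpistSynthId, expected when asserts_fusion


@dataclass
class GammaEpistSynth:
    synth_id: str
    evidence_anchor_refs: List[str]  # (i) provenance union
    alignment_refs: List[str]        # (ii) explicit object alignment refs
    assurance_tuple_refs: List[str]  # (iii) assurance tuple refs


def silent_fusions(rows, synth_records):
    """Return ids of rows that assert fusion without a complete synthesis record."""
    by_id = {s.synth_id: s for s in synth_records}
    offenders = []
    for row in rows:
        if not row.asserts_fusion:
            continue  # parallel divergent claims need no synthesis record
        rec = by_id.get(row.synth_id)
        complete = (
            rec is not None
            and bool(rec.evidence_anchor_refs)
            and bool(rec.alignment_refs)
            and bool(rec.assurance_tuple_refs)
        )
        if not complete:
            offenders.append(row.row_id)
    return offenders


rows = [
    BridgeRow("r1", "T-A", "T-B", asserts_fusion=False),
    BridgeRow("r2", "T-A", "T-C", asserts_fusion=True, synth_id="ges-1"),
    BridgeRow("r3", "T-B", "T-C", asserts_fusion=True),  # no record attached
]
synths = [GammaEpistSynth("ges-1", ["a10:x"], ["bc-7"], ["at-2"])]
offenders = silent_fusions(rows, synths)
```

The check inspects only the presence of references; the semantics of alignment, Φ, and penalties stay with their owners, as the notes above require.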
GPatternExtension: HarvestProtocols
PatternScopeId: G.2:Ext.HarvestProtocols
GPatternExtensionId: HarvestProtocols
GPatternExtensionKind: Phase3Seed
SemanticOwnerPatternId: owner TBD (Phase‑3 seed: harvesting protocol taxonomy not yet extracted into a dedicated owner)
Uses: {B.3, A.10} (for freshness/decay and provenance anchors, when protocol requires them explicitly)
⊑/⊑⁺: ∅
RequiredPins/EditionPins/PolicyPins (minimum):
- HarvestPolicyRef (declares the chosen protocol family and its parameters)
- FlowRecordId (protocol‑specific profile id or rubric id may be attached here)
- InclusionCriteriaId / ScreeningRubricId (ids only; semantics remain local to the protocol family)
RSCRTriggerKindIds: {RSCRTriggerKindId.PolicyPinChange, RSCRTriggerKindId.EditionPinChange, RSCRTriggerKindId.FreshnessOrDecayEvent}
Notes (wiring‑only):
- This module binds a declared protocol profile to the pack’s FlowRecord without redefining evidence semantics.
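Because protocol changes are meant to surface as typed RSCR causes, a pin comparison between two harvest runs is a natural illustration. This sketch assumes pins are held as a flat name-to-value mapping and uses a naming heuristic (names ending in ".edition" are edition pins, everything else is a policy pin). The heuristic and the function name are assumptions for illustration; the trigger kind names are the ones listed for this extension.

```python
def pin_change_triggers(old_pins, new_pins):
    """Map differences between two pin sets to RSCR trigger kind ids."""
    triggers = set()
    for name in old_pins.keys() | new_pins.keys():
        if old_pins.get(name) == new_pins.get(name):
            continue  # pin unchanged, no trigger
        # Assumed heuristic: ".edition" suffix marks an edition pin.
        kind = "EditionPinChange" if name.endswith(".edition") else "PolicyPinChange"
        triggers.add("RSCRTriggerKindId." + kind)
    return sorted(triggers)


previous = {
    "HarvestPolicyRef": "hp-1",
    "InclusionCriteriaId": "ic-1",
    "DistanceDefRef.edition": "2024.1",
}
current = {
    "HarvestPolicyRef": "hp-2",        # protocol parameters changed
    "InclusionCriteriaId": "ic-1",
    "DistanceDefRef.edition": "2024.1",
}
triggers = pin_change_triggers(previous, current)
```

A pin that is added or removed between runs also counts as a change here, since `dict.get` returns None for the missing side.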
GPatternExtension: DHCAlignmentHooks
PatternScopeId: G.2:Ext.DHCAlignmentHooks
GPatternExtensionId: DHCAlignmentHooks
GPatternExtensionKind: DisciplineSpecific
SemanticOwnerPatternId: C.21 (DHC semantics are owned by C.21)
Uses: {C.21, G.6, G.7} (DHC series + evidence path citations + bridge/CL regimes when alignment density is claimed)
⊑/⊑⁺: ∅
RequiredPins/EditionPins/PolicyPins (minimum):
- DHCMethodRef.edition
- WindowRef? (if the DHC series is windowed)
- DHCSenseCellId[] (pack‑local ids for emitted DHC SenseCells; if any are public, cite via UTSRowId[])
- UTSRowId[]? (only if any DHC SenseCells / series ids are minted/evolved as public ids)
- PathId[] / PathSliceId[] (when alignment summaries cite evidence paths via G.6)
RSCRTriggerKindIds: {RSCRTriggerKindId.EditionPinChange, RSCRTriggerKindId.EvidenceSurfaceEdit, RSCRTriggerKindId.TelemetryDelta}
Notes (wiring‑only):
- If DHC alignment summaries are emitted, this module ensures the DHC method edition and the cited evidence paths are visible.
- Units/constraints (semantic owner: C.21) must be pinned, not redefined here (e.g., bridges_per_100_DHC_SenseCells, CL_min = 2 for cross‑Context counting, and the “CL=3 implies free substitution” interpretation when used).
GPatternExtension: NQDAnnex
PatternScopeId: G.2:Ext.NQDAnnex
GPatternExtensionId: NQDAnnex
GPatternExtensionKind: MethodSpecific
SemanticOwnerPatternId: C.18 (NQD‑CAL semantics owned by C.18; explore/exploit logging by C.19 when used)
Uses: {C.18, C.19}
⊑/⊑⁺: ∅
RequiredPins/EditionPins/PolicyPins (minimum):
- DescriptorMapRef.edition
- DistanceDefRef.edition
- InsertionPolicyRef (policy‑id/ref)
- EmitterPolicyRef (policy‑id/ref)
- TaskSignatureRef? (when QD mode is trait‑gated)
RSCRTriggerKindIds: {RSCRTriggerKindId.EditionPinChange, RSCRTriggerKindId.PolicyPinChange, RSCRTriggerKindId.TelemetryDelta, RSCRTriggerKindId.FreshnessOrDecayEvent}
Notes (wiring‑only):
- This module only pins the required references for replayability; it does not redefine QD semantics, dominance, or acceptance rules.
GPatternExtension: InteropForms
PatternScopeId: G.2:Ext.InteropForms
GPatternExtensionId: InteropForms
GPatternExtensionKind: InteropSpecific
SemanticOwnerPatternId: G.13
Uses: {G.13}
⊑/⊑⁺: ∅
RequiredPins/EditionPins/PolicyPins (minimum):
- ExternalIndexRef.edition
- ClaimMapperRef.edition
- MappingPolicyRef (policy‑id/ref)
- UTSRowId[] (for published external ids/aliases where relevant)
RSCRTriggerKindIds: {RSCRTriggerKindId.EditionPinChange, RSCRTriggerKindId.PolicyPinChange, RSCRTriggerKindId.TokenizationOrNameChange, RSCRTriggerKindId.EvidenceSurfaceEdit}
Notes (wiring‑only):
- Interop affects only representation and citation routes; it must not introduce alternate legality gates or acceptance semantics.
Archetypal Grounding (System / Episteme)
Bias-Annotation (informative)
Lenses tested: Gov, Arch, Onto/Epist, Prag, Did.
- Selection bias (Gov/Onto). Any harvesting protocol can over‑represent certain venues, languages, or evidence styles. Mitigation: pluralism floor + explicit CorpusLedger + explicit protocol pins.
- Consolidation bias (Onto/Epist). Pressure to “merge” lineages can erase incompatible commitments. Mitigation: keep Claim Sheets disjoint by default; require explicit alignment proof for fusion; preserve loss notes.
- Recency bias (Prag). Overweighting newest papers can hide durable backbone results; underweighting them misses SoTA drift. Mitigation: publish freshness windows and make them RSCR‑relevant.
- Didactic bias (Did). Micro‑examples can steer interpretation toward familiar domains. Mitigation: require heterogeneous substrates and explicit A.10 anchors.
Conformance Checklist (normative) — CC‑G2
Common Anti‑Patterns and How to Avoid Them
- AP‑G2‑1: “One true SoTA score.” Avoid: selecting a single unqualified scalar metric as “the” SoTA. Do instead: represent evaluation constructs as families/variants; keep partial orders set‑returning (delegated).
- AP‑G2‑2: Fusion without explicit alignment proof. Avoid: merging rival Tradition claims into one statement “by common sense.” Do instead: preserve parallel Claim Sheets; if consolidation is required, publish explicit alignment proof or keep a divergence record.
- AP‑G2‑3: Hidden protocol drift. Avoid: changing the harvesting protocol (inclusion criteria, windowing, screening rubric) without pins. Do instead: pin harvesting policy/profile ids and treat changes as RSCR‑relevant.
- AP‑G2‑4: Unanchored pedagogy. Avoid: micro‑examples without carriers (they become folklore). Do instead: bind micro‑examples to A.10 anchors and declare describedEntity.
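AP‑G2‑4 in particular can be caught by a simple lint before publication. The sketch below assumes micro‑examples are held as dictionaries with hypothetical keys `anchor_refs` (A.10 carrier references) and `described_entity`; the key names and the report shape are illustrative only.

```python
def unanchored_micro_examples(examples):
    """Report, per example id, which required declarations are missing."""
    problems = {}
    for ex in examples:
        missing = []
        if not ex.get("anchor_refs"):       # no A.10 carriers cited
            missing.append("anchor_refs")
        if not ex.get("described_entity"):  # describedEntity not declared
            missing.append("described_entity")
        if missing:
            problems[ex["id"]] = missing
    return problems


examples = [
    {"id": "mx-1", "anchor_refs": ["a10:c1"], "described_entity": "holon-1/plane-1"},
    {"id": "mx-2", "anchor_refs": [], "described_entity": "holon-2/plane-1"},
    {"id": "mx-3"},
]
report = unanchored_micro_examples(examples)
```

An empty report would mean every micro‑example carries at least one anchor and a declared describedEntity; it says nothing about whether the anchors themselves are sound.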
Consequences
- Positive: Downstream CHR/CAL/dispatch work becomes faster and less ambiguous because the pack is citable and structured.
- Positive: Plurality is preserved while still enabling disciplined comparability through explicit crossings.
- Positive: Refresh becomes tractable because pins and typed causes exist.
- Negative: Adds authoring overhead (ledger, flow record, micro‑examples, explicit pins).
- Negative: Requires governance discipline to prevent the pack from becoming an uncontrolled “everything bucket”.
Rationale
SoTA synthesis is a bottleneck for new CG‑Frame work: without a disciplined harvest, downstream formalization (CHR/CAL) and operational selection (G.5) either (i) inherit hidden semantic collisions, or (ii) re‑invent incompatible “mini‑standards.”
G.2 resolves this by treating SoTA work as a publishable kit: explicit plurality, explicit crossings, explicit evidence anchors, and explicit hand‑offs.
SoTA-Echoing (informative)
This pattern aligns its method options (via Extensions and authoring practice) with widely used post‑2015 SoTA practices, while keeping FPF’s semantics stable and id‑based:
- PRISMA 2020 reporting discipline (Page et al., 2021). Status: Adopt (adapted). We adopt the idea of a transparent screening trail as FlowRecord, but keep it notation‑independent and concept‑level.
- Living systematic reviews (Elliott et al., 2017 and subsequent living‑review practice). Status: Adopt (as optional protocol family). The “living” stance is expressed as a harvesting protocol profile (Extension), with explicit freshness windows and RSCR‑relevant change causes.
- AMSTAR 2 critical appraisal (Shea et al., 2017). Status: Adapt. We adapt the idea of structured quality appraisal into Claim Sheet evidence cues, without turning it into a single scalar rating.
- Science of Science synthesis (Fortunato et al., 2018). Status: Adopt (as content discipline). SoS indicators are treated as families/variants and wired as citable artefacts, not as a single “score”.
- Disruption / team‑structure indicators (Wu, Wang & Evans, 2019 and follow‑on work). Status: Adopt (as exemplar family). Useful as an example of a SoS‑indicator family with strong dependence on windowing and corpus definition.
- Quality‑Diversity and open‑ended generation (e.g., Fontaine et al., 2020 for CMA‑ME; Wang et al., 2019 for POET). Status: Adopt (as optional annex wiring). When QD/OEE is relevant for the CG‑Frame, we include generator/method family cards and pin the required edition/policy surfaces via G.2:Ext.NQDAnnex, without embedding those semantics into the core pack.
Relations
- Builds on:
  - G.Core (core invariants, typed RSCR causes, default ownership routing)
  - E.8 (pattern template discipline)
  - E.10 (lexical/ontological rules; strict distinction; kind‑suffix discipline)
  - E.19 (conformance discipline)
  - A.10 (provenance anchors / carriers)
  - B.3 (trust, freshness/decay as cited owners)
  - F.9 (bridges and CL as cited owners)
  - F.17 (UTS publication discipline; via delegation)
  - G.0 (CG‑Spec legality gate; cited when legality surfaces are referenced)
  - G.6 (EvidenceGraph / path citation surfaces when used)
- Used by:
  - G.1 (generator chassis consumes harvested SoTA sets)
  - G.3 (CHR authoring consumes operator/object inventory and claim sheets)
  - G.4 (CAL authoring consumes operator stubs, acceptance branch scaffolding)
  - G.5 (registry/dispatch consumes MethodFamily/GeneratorFamily cards)
  - G.10 (shipping cites the pack as payload)
  - G.11 (refresh orchestration can re‑invoke harvest via typed causes)
- Relates to:
  - G.13 (interop surfaces when external indices are used)