Methodology — PartSentinel

THE CANONICAL FRAMEWORK

Three pillars. Six metrics. Six deliverables.

Everything PartSentinel produces maps to this framework. Pillars set the score weighting. Metrics are the units of measurement. Deliverables are what the client receives.

PILLAR 1 · ACCURACY

Does AI describe your SKU correctly?

Reality Gap — divergence between AI output and catalogue truth.

PILLAR 2 · INFERENCE

What can AI infer that you never disclosed?

Inference Risk — sensitivity of what AI can reconstruct. Key differentiator.

PILLAR 3 · VISIBILITY

Who owns the AI answer in your category?

Competitive Visibility — Share of Answer + Category Authority.

SIX METRICS

AI Visibility Score — SKU mention frequency in AI answers
Reality Gap Score — divergence between AI output and catalogue truth
Inference Risk Score — sensitivity of what AI can reconstruct
Competitive Share of Answer — your presence vs competitors
Substitution Risk Index — competitor-substitution frequency
Category Authority Score — AI-perceived category authority
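
As a rough illustration of how two of these metrics might be computed from labeled answers, the sketch below treats Competitive Share of Answer as the fraction of answers that mention your brand, and the Substitution Risk Index as the fraction of answers that recommend a competitor part. Both definitions, the `Answer` record, and its field names are assumptions made for this example; the actual formulas are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """One AI answer, already labeled (hypothetical structure)."""
    mentions_own_brand: bool      # the answer names your brand or SKU
    recommends_competitor: bool   # the answer steers the user to a competitor part


def share_of_answer(answers: list[Answer]) -> float:
    """Fraction of answers in which your brand appears (assumed definition)."""
    if not answers:
        return 0.0
    return sum(a.mentions_own_brand for a in answers) / len(answers)


def substitution_risk(answers: list[Answer]) -> float:
    """Fraction of answers that recommend a competitor (assumed definition)."""
    if not answers:
        return 0.0
    return sum(a.recommends_competitor for a in answers) / len(answers)


answers = [
    Answer(mentions_own_brand=True, recommends_competitor=False),
    Answer(mentions_own_brand=True, recommends_competitor=True),
    Answer(mentions_own_brand=False, recommends_competitor=True),
    Answer(mentions_own_brand=False, recommends_competitor=False),
]
print(share_of_answer(answers), substitution_risk(answers))
```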

SIX DELIVERABLES

AI SKU Exposure Report — structured view of AI outputs
SKU Reality Gap Matrix — AI vs catalogue truth comparison
Inference Risk Register — sensitive inferences classification
AI Competitive Visibility Map — competitors by category
Substitution Risk Matrix — AI demand redirection
Control & Remediation Roadmap — action plan without publishing the catalogue

REVEAL · v0.x

Diagnostic on 50–100 SKUs. Mode 1 (Blind) or Mode 2 (Full).

MAP · v1.x

Extension across families, markets, languages, competitors.

CONTROL · v2.x

Continuous monitoring + remediation + Negative Knowledge Layer.

PRINCIPLES

Five operating principles

Reference-level, not brand-level

Brand monitoring tells you what people say about your company. PartSentinel measures what AI says about each individual SKU, OE number, or technical reference in your catalogue.

Multi-model, deterministic

We query 8–12 large language models in parallel with deterministic seeds, fixed temperatures, and rate-limited concurrency. The same audit run twice produces the same result within 1 score point.
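
The reproducibility claim can be pictured as a simple tolerance check between two runs that share the same configuration. This is a minimal sketch: the `RunConfig` fields, the reference codes, and the choice to apply the 1-point tolerance per reference (rather than to an aggregate) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Settings pinned for an audit run (hypothetical field names)."""
    seed: int
    temperature: float
    max_concurrency: int


def reproducible(run_a: dict[str, float], run_b: dict[str, float],
                 tolerance: float = 1.0) -> bool:
    """True if both runs cover the same references and every score
    differs by at most `tolerance` points."""
    if run_a.keys() != run_b.keys():
        return False
    return all(abs(run_a[ref] - run_b[ref]) <= tolerance for ref in run_a)


config = RunConfig(seed=1234, temperature=0.0, max_concurrency=4)
first = {"REF-001": 72.4, "REF-002": 55.0}    # scores from run 1
second = {"REF-001": 72.9, "REF-002": 54.3}   # scores from run 2, same config
print(reproducible(first, second))
```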

Calibrated per vertical

Aftermarket, electrotechnical, aerospace, chemicals — each vertical has its own prompt template, scoring weights, and ground-truth schema. No one-size-fits-all.

Ground truth comes from you

We never invent the right answer. The reference truth is your BMEcat / ETIM / PIM / product reference data. We measure deviation, not opinion.

Auditable, not magic

Every Sentinel Score is reconstructible from the raw model responses, the calibration set, and the scoring formula, all of which are disclosed to the customer in the signed audit dossier.

PIPELINE

The audit pipeline

From your catalogue to a regulator-ready dossier.

01 / INGEST

Catalogue ingestion

BMEcat, ETIM, CSV, JSON, or PIM API (SAP, Inriver, Akeneo, Pimcore). Native parsing — no manual mapping for standard formats.

02 / SAMPLE

Stratified sampling

We stratify references by vertical, age, OE coverage, and revenue contribution. Default audit: 50–500 references; full-catalogue: every reference.
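
A seeded, proportional stratified sample could look like the sketch below. It stratifies on a single key for brevity; the function name, the record layout, and the proportional allocation rule are assumptions, not the pipeline's actual sampler.

```python
import random
from collections import defaultdict


def stratified_sample(references, key, sample_size, seed=0):
    """Draw a proportional sample from each stratum using a fixed seed.

    Rounding means the result can differ slightly from `sample_size`;
    every non-empty stratum contributes at least one reference.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for ref in references:
        strata[key(ref)].append(ref)

    total = len(references)
    sample = []
    for name in sorted(strata):  # sorted so the draw order is stable
        members = strata[name]
        quota = max(1, round(sample_size * len(members) / total))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical catalogue: 80 brake references and 20 filter references.
catalogue = [{"ref": f"BR-{i:03d}", "vertical": "brakes"} for i in range(80)]
catalogue += [{"ref": f"FI-{i:03d}", "vertical": "filters"} for i in range(20)]

picked = stratified_sample(catalogue, key=lambda r: r["vertical"], sample_size=10, seed=42)
print(len(picked))
```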

03 / PROBE

Multi-model probing

Each reference is queried against 8–12 LLMs with calibrated, vertical-specific prompts. Per-reference: ~32 prompts × N models.
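
To give a sense of scale, the call volume implied by these figures is simple arithmetic. The sketch below takes the approximate 32 prompts per reference at face value and uses the ends of the stated ranges; `probe_calls` is a hypothetical helper.

```python
def probe_calls(references: int, prompts_per_reference: int = 32, models: int = 8) -> int:
    """Total model calls for one audit: references x prompts x models."""
    return references * prompts_per_reference * models


smallest = probe_calls(50, models=8)     # 50 references on an 8-model panel
largest = probe_calls(500, models=12)    # 500 references on a 12-model panel
print(smallest, largest)                 # 12800 192000
```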

04 / EXTRACT

Structured extraction

Free-form responses are parsed into a structured schema (identifier, application, cross-references, specs, procedural depth) using a deterministic extractor.

05 / COMPARE

Ground-truth alignment

Extracted facts are aligned to your authoritative data. Each fact is labeled accurate / partial / hallucinated / leaked / obsolete.
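
One way to picture the five labels is as a small decision procedure over sets of values. The rules and their precedence below are illustrative assumptions (the actual alignment logic is not described here), and the parameter names are hypothetical.

```python
from enum import Enum


class Label(Enum):
    ACCURATE = "accurate"
    PARTIAL = "partial"
    HALLUCINATED = "hallucinated"
    LEAKED = "leaked"
    OBSOLETE = "obsolete"


def label_fact(claimed: set[str], current: set[str],
               superseded: set[str], confidential: set[str]) -> Label:
    """Label one extracted fact against ground truth (assumed rules).

    claimed      -- values the model stated
    current      -- values that are true in the current catalogue
    superseded   -- values that were true in an older catalogue
    confidential -- true values that were never publicly disclosed
    """
    if claimed & confidential:
        return Label.LEAKED        # the model surfaced undisclosed data
    if claimed and claimed == current:
        return Label.ACCURATE      # exact match with current truth
    if claimed & current:
        return Label.PARTIAL       # some overlap, but missing or extra values
    if claimed & superseded:
        return Label.OBSOLETE      # matches only outdated data
    return Label.HALLUCINATED      # no support in any ground-truth set


print(label_fact({"A", "X"}, current={"A", "B"}, superseded=set(), confidential=set()))
```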

06 / SCORE

Per-reference and aggregate scoring

We compute the Sentinel Score (0–100) for each reference and roll it up by vertical, brand, model, and time.
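
The roll-up step can be sketched as a group-by over per-reference scores. An unweighted mean is assumed here purely for illustration; the row layout, brand names, and model names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def roll_up(rows: list[dict], dimension: str) -> dict[str, float]:
    """Average the per-reference scores within each value of `dimension`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row["score"])
    return {name: round(mean(scores), 1) for name, scores in groups.items()}


rows = [
    {"ref": "REF-001", "vertical": "aftermarket", "brand": "BrandA", "model": "model-x", "score": 80.0},
    {"ref": "REF-002", "vertical": "aftermarket", "brand": "BrandB", "model": "model-x", "score": 60.0},
    {"ref": "REF-003", "vertical": "aerospace", "brand": "BrandA", "model": "model-y", "score": 90.0},
]
print(roll_up(rows, "vertical"))
print(roll_up(rows, "brand"))
```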

07 / REPORT

Multi-format delivery

Excel raw export, executive PDF, drilldown dashboard, and AI Act dossier (Article 53(1)(d) compliant) — all under signed checksums.

MODELS

The model panel

Refreshed quarterly. Last refresh: 2026-04-22.

ANTHROPIC

Claude Opus 4.7 (1M)

Reasoning

ANTHROPIC

Claude Sonnet 4.6

Recall

OPENAI

GPT-5

Reasoning

OPENAI

GPT-5 mini

Recall

GOOGLE

Gemini 2.5 Pro

Multimodal

MISTRAL

Mistral Large 3

EU sovereign

Prompt design

Each prompt is calibrated to elicit a single, schema-conformant fact. Free-form prose is rejected. Examples are versioned and published.

# Vertical: automotive_aftermarket
# Schema: identifier_v3
You are answering a single question about an automotive aftermarket
reference. Reply ONLY with the JSON schema below — no prose.

REFERENCE: "{{ref_code}}"

{
  "identifier_confidence": 0.0–1.0,
  "applications": ["{vehicle make/model/year}"],
  "cross_references": ["{competing OE/IAM codes}"],
  "specs": { "{spec_name}": "{value}" },
  "source_hints": ["{public_url_or_null}"]
}
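
A reply to the template above could be checked along these lines before it enters the pipeline. This is a minimal sketch: the key set is taken from the template, but the exact validation rules (strict key match, type checks, the 0.0 to 1.0 range check) and the sample values are assumptions.

```python
import json
from typing import Optional

REQUIRED_KEYS = {
    "identifier_confidence",
    "applications",
    "cross_references",
    "specs",
    "source_hints",
}


def parse_response(raw: str) -> Optional[dict]:
    """Return the parsed object if it matches the schema, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free-form prose or malformed JSON is rejected
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        return None
    conf = data["identifier_confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return None
    if not 0.0 <= conf <= 1.0:
        return None
    for key in ("applications", "cross_references", "source_hints"):
        if not isinstance(data[key], list):
            return None
    if not isinstance(data["specs"], dict):
        return None
    return data


good = ('{"identifier_confidence": 0.8, "applications": ["ExampleMake ExampleModel 2015"], '
        '"cross_references": [], "specs": {"diameter_mm": "280"}, "source_hints": [null]}')
print(parse_response(good) is not None)                    # True
print(parse_response("This part fits several vehicles."))  # None
```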

SCORING

Scoring

The Sentinel Score is a calibrated, weighted aggregate. Each dimension is scored on 0–100. Per-vertical weights and leak-penalty constants are disclosed inside the signed audit dossier under NDA.

WEIGHT · core

Identification

Does the model know the reference exists and what it is?

WEIGHT · core

Cross-references

Does it correctly map to OE / IAM / competing codes?

WEIGHT · core

Application

Does it correctly state where the part fits (vehicle, machine, system)?

WEIGHT · supporting

Spec depth

Does it know the technical specifications, not just the marketing copy?

WEIGHT · supporting

Freshness

Is the information current — or stuck on a 2019 catalogue?

score = Σ_i ( w_i × dim_i ) − leakPenalty(leak_count), where i ∈ { id, xref, app, spec, fresh }

w_i and leakPenalty are calibrated per vertical and disclosed under NDA.
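
A worked example of the formula, using made-up numbers: the real weights and leak-penalty constants are disclosed only under NDA, so the weights, the 5-points-per-leak penalty capped at 20, and the clamp to the 0–100 range below are all assumptions for illustration.

```python
# Hypothetical weights: core dimensions weigh more than supporting ones.
WEIGHTS = {"id": 0.30, "xref": 0.25, "app": 0.25, "spec": 0.10, "fresh": 0.10}


def leak_penalty(leak_count: int, per_leak: float = 5.0, cap: float = 20.0) -> float:
    """Assumed penalty shape: linear in the number of leaks, capped."""
    return min(cap, per_leak * leak_count)


def sentinel_score(dims: dict[str, float], leak_count: int) -> float:
    """Weighted sum of the 0-100 dimension scores minus the leak penalty."""
    weighted = sum(WEIGHTS[name] * dims[name] for name in WEIGHTS)
    return round(max(0.0, min(100.0, weighted - leak_penalty(leak_count))), 1)


dims = {"id": 90, "xref": 70, "app": 80, "spec": 50, "fresh": 60}
# Weighted sum: 27 + 17.5 + 20 + 5 + 6 = 75.5; two leaks cost 10 points.
print(sentinel_score(dims, leak_count=2))
```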

GROUND TRUTH

Ground-truth alignment

We never declare a model wrong without your authoritative data. Three sources are accepted:

● Your PIM / MDM authoritative data (preferred).
● Industry standards (BMEcat, ETIM, ATA) when you certify them.
● Public regulator-validated data (EUR-Lex, ECHA, EASA) for safety/regulatory facts.

GOVERNANCE

Governance & reproducibility

Every audit run produces an immutable manifest: prompt versions, model versions, calibration constants, and seed values. Reproducibility is the contract.

→ All prompt templates are versioned (semver) and signed.
→ Each delivery includes a SHA-256 manifest of all inputs and outputs.
→ Recalibration events are publicly logged in the changelog.
→ Customers can request the raw model responses for any audit.
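
The SHA-256 manifest mentioned above can be sketched with Python's standard `hashlib`. The artifact names, the manifest layout, and the idea of hashing a canonical JSON form of the manifest are assumptions; signing is out of scope for this example.

```python
import hashlib
import json


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Map each artifact name to the SHA-256 hex digest of its content."""
    return {name: hashlib.sha256(content).hexdigest()
            for name, content in sorted(artifacts.items())}


def manifest_digest(manifest: dict[str, str]) -> str:
    """Digest of the manifest itself, computed over a canonical JSON form."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical inputs and outputs of one audit run.
artifacts = {
    "prompts/identifier_v3.txt": b"REFERENCE: {{ref_code}}",
    "responses/REF-001.json": b'{"identifier_confidence": 0.8}',
}
manifest = build_manifest(artifacts)
print(manifest_digest(manifest))
```

Any change to a single byte of an input or output changes its digest, and therefore the manifest digest, which is what makes a run verifiable after delivery.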

FAQ

Can I use PartSentinel without sharing my catalogue?

Yes. We support an on-premise mode where the audit pipeline runs inside your VPC and only the aggregated scorecard leaves the perimeter.

How often should I re-audit?

Quarterly is the default. Verticals with rapid catalogue rotation (electrotechnical, automotive aftermarket) benefit from monthly delta audits.

What about confidentiality of cross-references?

We never publish or train on customer data. Leaks are only ever flagged to the customer, never disclosed externally.

Why these specific models?

Coverage of the LLMs your customers actually use. Panel is updated quarterly to follow market share, with prior-quarter overlap for trend continuity.

How PartSentinel audits AI knowledge of your catalogue.


Audit your catalogue with this exact protocol.