Methodology

We're showing you our work.

AWI measures the gap between how web systems present themselves and how they actually behave. Full transparency on how intelligence is produced.

What AWI measures

Every digital system has a declared version (APIs, documentation, metadata, capability manifests) and an observed version (runtime behavior, real pricing, hidden constraints, failure modes).

AWI quantifies this delta with verifiable evidence. When a source declares open access but blocks agent traffic, when an MCP server claims limited permissions but requests filesystem access, when a skill package's behavior diverges from its documentation — these gaps are measured, scored, and reported.

Analysis pipeline

Four analysis layers, from content acquisition to behavioral testing. Each layer adds depth and confidence.

L0Content Acquisition

Fetches content using agent-realistic HTTP configurations. Captures response headers, redirects, status codes, and raw content for downstream analysis.

L1Pattern Matching

Rule-based detection of structural indicators: prompt injection patterns, hidden text directives, metadata manipulation, and data exfiltration vectors. Each rule includes falsification criteria for transparency.

L2Structural Analysis

Eight analysis modules evaluate the web environment: block page detection, content quality assessment, embedded resource analysis, format compliance, redirect chain analysis, robots.txt evaluation, readiness probes, and agent interoperability signals.

L3Canary Testing

Behavioral probe that compares agent responses with and without source content to detect manipulation attempts. Measures behavioral delta to identify adversarial influence on agent decision-making.

Entity-type evaluation

Each interaction pattern requires different evaluation strategies. The analysis pipeline adapts based on entity type.

Fetch & Read

Content sources evaluated for injection risks, content quality, staleness, and metadata integrity. Focus: is this content what it claims to be for an agent to consume?

Web pages, documentation, APIs, data feeds

Call & Execute

Tool servers evaluated for declared-vs-observed behavior, permission scope, response integrity, and capability transparency. Focus: does this tool do what it says and only what it says?

MCP servers, REST APIs, code execution services

Load & Run

Installable packages evaluated for supply chain integrity, declaration file conformance, and runtime behavior. Focus: is this package what it declares in the agent runtime?

Agent skills, plugins, extensions

Intelligence summary production

Every entity receives a structured intelligence response with five components:

Observations

Factual statements derived from assessment evidence. What AWI has actually observed.

Risk Signals

Impact-framed interpretations of observations, classified by severity and impact domain (security, reliability, compliance, quality).

Coverage

Four dimensions of assessment completeness — content, behavior, network, and history. Tells agents how much AWI actually knows.

Unknowns

Explicit gaps in AWI's knowledge. What hasn't been observed. Equal prominence to observations — honest boundaries.

Recommendation

Soft guidance based on all signals. Never prescriptive — agents and operators apply their own policies.

Assessment score

A universal composite score from 0.00 to 1.00, computed from multiple assessment dimensions. Coverage discount applies: low coverage reduces confidence in the score. Semantic anchoring maps numeric scores to meaningful interpretations.

Score	Confidence	Posture	Meaning
0.85 – 1.00	High confidence	safe	Well-understood entity. Consistent behavior across evaluations. Low unknowns.
0.65 – 0.84	Moderate confidence	proceed_with_caution	Meaningful observations exist, but gaps remain. Some inconsistencies noted.
0.40 – 0.64	Low confidence	restrict	Limited coverage. Significant unknowns. Observations may indicate concern.
0.00 – 0.39	Minimal confidence	avoid	Insufficient data or concerning signals detected. Agent interaction not recommended.

What each tier provides

Free

Full intelligence response: assessment score, posture, lifecycle state, observations, risk signals, coverage, unknowns, recommendation. 25 lookups/day, 5 evaluations/day.

Pro ($5/mo)

Everything in Free with unlimited lookups and evaluations. Plus: batch lookup (up to 20/req), watchlist alerts, usage analytics, assessment score history.

Framework references

OWASP Top 10 for LLM Applications (2025)

LLM01 (Prompt Injection), LLM05 (Supply Chain), LLM07 (Insecure Plugins)

OWASP AI Agent Security Guidelines

Agent control risk, tool compromise, instruction injection

NIST AI 600-1 — Generative AI Profile

Content integrity, provenance, measurement methodology

NIST Cybersecurity Framework 2.0

Identify, Protect, Detect functions applied to agent interactions

MITRE ATLAS

Adversarial ML techniques mapped to agent interaction patterns

Continuous evaluation

AWI doesn't blanket-crawl the web. Evaluation is demand-driven: when agents query for a domain, that query creates demand signal. Hot sources (frequently queried) are re-evaluated weekly. Warm sources monthly. Cold sources quarterly.

When an agent queries a domain AWI hasn't seen, an on-demand analysis is triggered automatically. The agent gets intelligence on the first request.

The landscape →Explore the index