SearchrAInk · Technical Paper · IEEE Format
A Multi-Model Algorithm for AI Visibility Measurement
Design, Analysis, and Convergence Properties of the SearchrAInk Framework
Hasse Muller
Independent Researcher · The Netherlands
Abstract
The emergence of large language models (LLMs) has transformed information retrieval from ranked search results to synthesised, conversational outputs. This shift introduces challenges in measuring brand visibility, as exposure is implicit and context-dependent. This paper presents a formalised algorithmic framework for SearchrAInk, a system that quantifies brand visibility across AI-generated responses. The approach integrates prompt orchestration, multi-model querying, semantic feature extraction, and weighted aggregation into a unified scoring pipeline. We introduce a five-dimensional evaluation model and provide formal proofs of boundedness, unbiasedness, consistency, and convergence. The framework enables reproducible benchmarking in AI-mediated discovery environments.
Index Terms
Generative AI, LLM Evaluation, Information Retrieval, Generative Engine Optimisation (GEO), Multi-Model Systems, Statistical Consistency, Hoeffding Bounds.
I. Introduction
Large language models (LLMs) have shifted information retrieval from explicit ranking to implicit synthesis. Unlike traditional search engines, which return lists of ranked documents, LLMs generate narrative outputs in which brand exposure is mediated by context, prompt framing, and latent model preferences. As a consequence, visibility becomes difficult to quantify with legacy rank-based metrics.
This paper formalises the SearchrAInk framework, which evaluates brand visibility across multiple LLMs and aggregates results into a unified, bounded metric with provable statistical guarantees.
II. System Model
Let:
- \(Q = \{q_1, \dots, q_n\}\) denote prompts drawn from a distribution \(\mathcal{D}_Q\);
- \(M = \{m_1, \dots, m_K\}\) denote the set of LLMs queried;
- \(R_{i,j} = m_j(q_i)\) denote the response of model \(m_j\) to prompt \(q_i\).
For each response we extract a feature vector
\[ f_{i,j} = \phi(R_{i,j}) \in \mathbb{R}^{d}, \]
and define dimension scores
\[ X_{i,j}^{(k)} = g_k(f_{i,j}) \in [0,1], \qquad k = 1, \dots, 5, \]
one for each axis of the five-dimensional evaluation model. The empirical per-dimension score aggregates over prompts and models:
\[ \hat{S}_k = \frac{1}{nK} \sum_{i=1}^{n} \sum_{j=1}^{K} X_{i,j}^{(k)}, \]
and the overall visibility score is
\[ \hat{V} = \sum_{k=1}^{5} w_k \hat{S}_k, \qquad w_k \ge 0, \quad \sum_{k=1}^{5} w_k = 1. \]
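As a purely illustrative instance (the numbers here are hypothetical, not measured), take equal weights \(w_k = 0.2\) and empirical dimension scores \((\hat{S}_1, \dots, \hat{S}_5) = (0.8, 0.6, 0.7, 0.5, 0.9)\); then
\[ \hat{V} = 0.2\,(0.8 + 0.6 + 0.7 + 0.5 + 0.9) = 0.2 \times 3.5 = 0.70, \]
which lies in \([0,1]\), as Theorem 1 below guarantees.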
III. Algorithm
- Initialise \(R \leftarrow \emptyset\)
- for each \(q_i \in Q\) do
- for each \(m_j \in M\) do
- \(r \leftarrow m_j(q_i)\)
- \(f \leftarrow \mathrm{ExtractFeatures}(r)\)
- append \((i,j,f)\) to \(R\)
- end for
- end for
- for each dimension \(k\) do
- \(\hat{S}_k \leftarrow \tfrac{1}{|R|} \sum_{(i,j,f) \in R} g_k(f)\)
- end for
- \(\hat{V} \leftarrow \sum_k w_k \hat{S}_k\)
- return \(\hat{V}\)
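As a minimal sketch, the pipeline above can be implemented as follows. Here `models`, `extract_features`, and the dimension scorers `g` are hypothetical stand-ins (the paper does not specify their implementations), and each scorer is assumed to return a value in \([0,1]\):

```python
from typing import Callable, Sequence

def searchraink_score(
    prompts: Sequence[str],
    models: Sequence[Callable[[str], str]],    # m_j: prompt -> response text
    extract_features: Callable[[str], dict],   # hypothetical semantic extractor
    g: Sequence[Callable[[dict], float]],      # dimension scorers g_k, values in [0, 1]
    weights: Sequence[float],                  # w_k >= 0 with sum(w_k) == 1
) -> float:
    """Visibility estimate V_hat = sum_k w_k * S_hat_k over all prompt-model pairs."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    # Query every model with every prompt; keep one feature vector per response.
    features = [extract_features(m(q)) for q in prompts for m in models]
    # Empirical per-dimension scores S_hat_k, averaged over all n*K responses.
    s_hat = [sum(g_k(f) for f in features) / len(features) for g_k in g]
    # Weighted aggregation into the overall score V_hat.
    return sum(w * s for w, s in zip(weights, s_hat))
```

Because each \(g_k\) is bounded in \([0,1]\) and the weights form a convex combination, the returned value inherits the \([0,1]\) bound of Theorem 1.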
IV. Mathematical Analysis
A. Assumptions
- Scores are bounded: \(X_{i,j}^{(k)} \in [0,1]\).
- Samples across prompts and models are independent and identically distributed (i.i.d.).
- All dimension scores have finite expectation and variance.
Theorem 1 — Boundedness.
The final score satisfies \(0 \le \hat{V} \le 1\).
Proof. Since \(\hat{S}_k \in [0,1]\) and \(\sum_k w_k = 1\) with \(w_k \ge 0\), the weighted sum remains in \([0,1]\). ◻
Theorem 2 — Unbiasedness.
\(\mathbb{E}[\hat{S}_k] = S_k^{\star}\).
Proof. Writing \(S_k^{\star} = \mathbb{E}\big[X_{i,j}^{(k)}\big]\) for the common mean, linearity of expectation gives \(\mathbb{E}[\hat{S}_k] = \frac{1}{nK} \sum_{i,j} \mathbb{E}\big[X_{i,j}^{(k)}\big] = S_k^{\star}\). ◻
Theorem 3 — Strong Consistency.
\(\hat{S}_k \xrightarrow{\text{a.s.}} S_k^{\star}\).
Proof. A direct application of the Strong Law of Large Numbers to the i.i.d. samples \(X_{i,j}^{(k)}\). ◻
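As a quick numerical illustration of this convergence (the Beta(2, 3) distribution below is a synthetic stand-in for a real dimension score, not part of the paper's pipeline):

```python
import random

# Synthetic check of Theorem 3: for i.i.d. scores in [0, 1], the empirical mean
# S_hat approaches the population mean S_star as the sample count nK grows.
# Beta(2, 3) draws have population mean 2 / (2 + 3) = 0.4.
random.seed(0)
for nk in (100, 10_000, 1_000_000):
    s_hat = sum(random.betavariate(2, 3) for _ in range(nk)) / nk
    print(f"nK = {nk:>9,}: S_hat = {s_hat:.4f}   (S_star = 0.4000)")
```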
Theorem 4 — Convergence of the Final Score.
The aggregate score converges almost surely to its population counterpart:
\[ \hat{V} \xrightarrow{\text{a.s.}} V^{\star} = \sum_{k=1}^{5} w_k S_k^{\star}. \]
Proof. Immediate from Theorem 3, since \(\hat{V}\) is a finite weighted sum of the \(\hat{S}_k\). ◻
Theorem 5 — Hoeffding Concentration.
For every \(\epsilon > 0\),
\[ \Pr\!\left( \left| \hat{S}_k - S_k^{\star} \right| \ge \epsilon \right) \le 2 \exp\!\left( -2 n K \epsilon^{2} \right). \]
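Setting the right-hand side equal to a failure probability \(\delta\) yields a sample-size rule of thumb (a direct rearrangement of the bound, not stated explicitly in the paper):
\[ nK \ge \frac{\ln(2/\delta)}{2\epsilon^{2}}. \]
For example, guaranteeing \(|\hat{S}_k - S_k^{\star}| < 0.05\) with probability at least \(0.95\) requires \(nK \ge \ln(40)/0.005 \approx 738\) prompt-model samples.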
Theorem 6 — Aggregate Error Bound.
If \(|\hat{S}_k - S_k^{\star}| \le \epsilon_k\) for each \(k\), then
\[ \left| \hat{V} - V^{\star} \right| \le \sum_{k=1}^{5} w_k \epsilon_k. \]
Proof. By the triangle inequality, \(|\hat{V} - V^{\star}| \le \sum_k w_k \, |\hat{S}_k - S_k^{\star}| \le \sum_k w_k \epsilon_k\). ◻
Theorem 7 — Asymptotic Normality.
As \(nK \to \infty\),
\[ \sqrt{nK} \left( \hat{S}_k - S_k^{\star} \right) \xrightarrow{d} \mathcal{N}\!\left( 0, \sigma_k^{2} \right), \qquad \sigma_k^{2} = \operatorname{Var}\!\left( X_{i,j}^{(k)} \right). \]
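Theorem 7 makes approximate interval estimates possible; a standard large-sample \((1-\alpha)\) confidence interval (a textbook consequence, not derived in the paper) is
\[ \hat{S}_k \pm z_{1-\alpha/2}\, \frac{\hat{\sigma}_k}{\sqrt{nK}}, \]
where \(\hat{\sigma}_k\) is the empirical standard deviation of the per-response scores and \(z_{1-\alpha/2}\) the standard normal quantile (e.g. \(z_{0.975} \approx 1.96\)).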
V. Discussion
The framework exhibits three properties:
- Robustness. Multi-model aggregation smooths per-model idiosyncrasies.
- Semantic awareness. Feature extraction captures nuance beyond token matching.
- Statistical consistency. The estimator converges and concentrates exponentially in \(nK\).
Limitations include:
- Model drift. Underlying LLMs evolve; benchmarks must be re-run periodically.
- Prompt bias. The prompt distribution \(\mathcal{D}_Q\) shapes outcomes and must be curated carefully.
- Non-stationarity. LLM responses are not strictly i.i.d. over time, so the guarantees of Theorems 3–7 hold only within piecewise-stationary regimes.
VI. Conclusion
We presented a formal algorithm for measuring AI visibility across multiple LLMs. The resulting estimator is unbiased and strongly consistent, admits sub-Gaussian concentration, and satisfies a weighted-error decomposition that is convenient for interpretability. Together, these properties make SearchrAInk suitable for reproducible benchmarking of brand presence in generative AI systems.
Acknowledgment
The author thanks the broader AI research community for ongoing work on LLM evaluation and statistical NLP.