Technical

Methodology

How InstantFocus generates, calibrates, and evaluates synthetic consumer panels. Every design decision is grounded in published research.

Overview

Three-stage pipeline

Three stages, executed per-request. No pre-generated data. Every panel is fresh.

Panel generation

Synthetic personas are generated with census-aligned demographics and IPIP-NEO Big Five personality traits. Each persona is a unique, internally consistent profile.

Per-persona evaluation

Each persona independently evaluates your stimulus via an LLM call, responding as their calibrated profile. Batched 10/batch with 5 concurrent batches.

Statistical aggregation

Individual responses are aggregated into distributions, scores, themes, and representative verbatims. Study-type-specific post-processing produces the final structured output.

Personality Model

IPIP-NEO Big Five

Each synthetic persona is assigned a Big Five personality profile — Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism — sampled from population-level norms.

Data source

IPIP-NEO normative dataset (Srivastava, John, Gosling & Potter, 2003). N=132,515 across ages 10–80+. Provides age-stratified means and standard deviations for each Big Five trait and its six sub-facets.

Calibration process

Look up the age-stratified mean and standard deviation for each Big Five trait from the IPIP-NEO normative table
Apply gender adjustments for Agreeableness (+0.5 SD for women) and Neuroticism (+0.4 SD for women) based on published sex differences
Sample each trait from a truncated normal distribution (clamped to 1–100 percentile range)
Enforce internal consistency: e.g., high Openness + Innovator tech adoption are positively correlated
Store the full personality vector as part of the persona profile for use in evaluation prompting

Big Five traits

Trait	High end	Low end	Relevance
Openness	Creative, curious, novelty-seeking	Conventional, practical, routine-oriented	Drives receptivity to new products and concepts
Conscientiousness	Organized, detail-oriented, disciplined	Flexible, spontaneous, less structured	Affects price sensitivity and quality expectations
Extraversion	Outgoing, enthusiastic, assertive	Reserved, independent, introspective	Influences social proof sensitivity and sharing intent
Agreeableness	Cooperative, trusting, empathetic	Competitive, skeptical, direct	Drives response positivity bias — important to account for
Neuroticism	Anxious, risk-averse, emotionally reactive	Calm, resilient, emotionally stable	Affects risk perception and concern identification

Demographics

Census-aligned distributions

Age

Distribution from U.S. Census American Community Survey (2023). Age-adjusted personality norms applied after demographic assignment.

Gender

Male/Female/Nonbinary distribution with gender-adjusted personality trait offsets for Agreeableness and Neuroticism.

Income

Five brackets mapped to Census income percentiles: low (<$30K), middle ($30–60K), upper-middle ($60–120K), high ($120K+).

Education

Three levels from Current Population Survey: high school, bachelor's degree, graduate/professional degree.

Region

US regional distributions (Northeast, Midwest, South, West) with planned expansion to UK, EU, and global panels.

Tech adoption

Rogers' diffusion curve: 2.5% innovators, 13.5% early adopters, 34% early/late majority, 16% laggards. Correlated with age and Openness.

Evaluation Engine

LLM-powered response generation

Model

OpenAI gpt-4o-mini with structured JSON output mode. Selected for speed/cost balance at scale — full panel of 100 completes in ~3–5 seconds.

Temperature

0.7 — produces meaningful response diversity within persona constraints while maintaining coherence. Lower temperatures tested but produced too-uniform panels.

Prompt architecture

Study-type-specific system prompts inject the full persona profile (demographics + Big Five vector) as character context. The stimulus is presented as user content.

Batching

10 personas per batch, 5 concurrent batches. Balances throughput against rate limits. Full 100-persona panel completes in ~10 parallel rounds.

Limitations

What this is not

Synthetic panels are not a replacement for real consumer research. They provide directional signal, not statistical proof.
LLM-generated responses may reflect training data biases that do not match real population sentiment on specific topics.
Published research shows 75–85% directional accuracy against real survey data (NORC, NNGroup). This means 15–25% of the time, the synthetic panel will disagree with reality.
Cultural nuance, regional dialect, and highly specialized domain knowledge are areas where LLM simulation is weakest.
Results should be treated as hypothesis generators, not hypothesis validators. Always follow up with real research for high-stakes decisions.