SQL Testing And Selection

ThothAI does not stop at generating candidate SQL. It precomputes tests, validates candidate output, and then selects the best query.

For the detailed developer view of main_test_generation.py, main_evaluation.py, sql_selection.py, and the escalation logic in main_generation_phases.py, see Evaluation And Selection.

Precomputed Tests

The current pipeline runs _precompute_tests_phase() before candidate generation.

That phase:

  • generates tests from schema and evidence
  • deduplicates them
  • stores them on the request state
  • marks evidence-critical tests when special grounding is required

This is important because the selection stage is not purely stylistic. It is evidence-aware.
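The precompute phase above can be sketched as a small routine. This is an illustrative reconstruction, not the actual implementation: the `RequestState` container, the normalization rule, and the substring-based evidence check are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class RequestState:
    # Hypothetical stand-in for the pipeline's request state.
    tests: list = field(default_factory=list)
    evidence_critical: set = field(default_factory=set)

def precompute_tests(raw_tests, evidence_terms, state):
    """Deduplicate generated tests and mark the evidence-grounded ones."""
    seen = set()
    for test in raw_tests:
        key = test.strip().lower()
        if key in seen:
            continue  # drop duplicate phrasings of the same requirement
        seen.add(key)
        state.tests.append(test)
        # Assumed rule: a test is evidence-critical if it mentions an evidence term.
        if any(term.lower() in key for term in evidence_terms):
            state.evidence_critical.add(test)
    return state

state = precompute_tests(
    ["Result must include order_date",
     "result must include order_date",
     "Use fiscal_year from the evidence"],
    ["fiscal_year"],
    RequestState(),
)
```

After this runs, `state.tests` holds two deduplicated tests and the `fiscal_year` test is flagged as evidence-critical, which is what makes the later selection stage evidence-aware rather than purely stylistic.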

How Tests Are Actually Generated

The implementation in main_test_generation.py uses test_gen_agent_1 as the fixed generator and runs multiple generations in parallel, spreading temperatures between 0.5 and 1.0.

For each batch it:

  • regenerates a dynamic mschema
  • injects directives, evidence, and the SQL candidate list into the test template
  • gathers structured outputs from the test generation agent

The outputs are then deduplicated and stored so later phases evaluate against a consolidated set of tests rather than isolated one-off answers.
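The batching described above can be sketched as follows. The `generate_tests` stub stands in for the real agent call (which would render the template with directives, evidence, and the SQL candidate list); the thread pool, the evenly spaced temperature spread, and the order-preserving merge are assumptions for illustration.

```python
import concurrent.futures
import random

def generate_tests(temperature, seed):
    # Stub for a test generation agent call at the given temperature.
    rng = random.Random(seed)
    pool = ["non-empty result", "one row per customer", "dates in ISO format"]
    return rng.sample(pool, k=2)

def run_test_batches(n_batches=4, t_lo=0.5, t_hi=1.0):
    """Run test generation batches in parallel, then consolidate them."""
    temps = [t_lo + i * (t_hi - t_lo) / (n_batches - 1) for i in range(n_batches)]
    with concurrent.futures.ThreadPoolExecutor() as pool:
        batches = list(pool.map(generate_tests, temps, range(n_batches)))
    # Later phases evaluate against one deduplicated set, not per-batch answers.
    merged = []
    for batch in batches:
        for test in batch:
            if test not in merged:
                merged.append(test)
    return merged

tests = run_test_batches()
```

The consolidation step is the key property: duplicates across batches collapse, so the evaluator sees each requirement once.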

Candidate Generation

_generate_sql_candidates_phase() runs SQL generation agents in parallel and records timing and status metrics in the execution state.

The agent manager typically builds:

  • one Basic SQL generator
  • one Advanced SQL generator
  • one Expert SQL generator
  • multiple test generators
  • one evaluator

How Candidate Diversity Is Produced

The runtime does not depend on one static prompt.

generate_sql_units(...) produces diversity through:

  • three different generation methods:
      • query_plan
      • step_by_step
      • divide_and_conquer
  • a spread of temperature values
  • repeated parallel generations with the agent selected by functionality_level

That means quality comes from controlled prompt and temperature variation within the current BASIC / ADVANCED / EXPERT tiers.
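The variation grid can be illustrated as a cross product of method and temperature. This is a sketch: the agent naming scheme (`sql_basic` etc.) and the particular temperature values are assumptions, while the three method names come from the source.

```python
import itertools

METHODS = ("query_plan", "step_by_step", "divide_and_conquer")

def plan_generation_units(functionality_level, temperatures=(0.2, 0.5, 0.8)):
    """Cross the three prompting methods with a temperature spread so each
    parallel generation runs under a distinct configuration."""
    # Assumed naming: the agent is picked by the current functionality tier.
    agent = f"sql_{functionality_level.lower()}"
    return [
        {"agent": agent, "method": m, "temperature": t}
        for m, t in itertools.product(METHODS, temperatures)
    ]

units = plan_generation_units("BASIC")
```

With three methods and three temperatures this yields nine distinct generation units for the BASIC tier, which is how diversity arises from controlled variation rather than from a single static prompt.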

Validators

The validation layer includes:

  • SQL validators in agents/validators/sql_validators.py
  • test validators in agents/validators/test_validators.py
  • explanation validators for SQL explanation outputs

The SQL validators are responsible for safe execution checks and compatibility cleanup before final selection.
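A minimal sketch of a safe execution check, assuming a SQLite backend: reject anything that is not a read-only statement, then dry-run the query with `EXPLAIN` so syntax and schema errors surface before final selection. The actual validators in sql_validators.py may work differently.

```python
import sqlite3

def validate_sql(sql, conn):
    """Return (ok, message) for a candidate query without executing it fully."""
    # Assumed safety rule: only read-only statements are allowed through.
    if not sql.lstrip().lower().startswith(("select", "with")):
        return False, "only read-only queries are allowed"
    try:
        # EXPLAIN compiles the statement, so bad syntax or unknown
        # columns raise here without touching any data.
        conn.execute(f"EXPLAIN {sql}")
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
ok, _ = validate_sql("SELECT id FROM orders", conn)
bad, _ = validate_sql("DROP TABLE orders", conn)
```

Running the candidate through the database's own planner catches errors that string inspection would miss, while the prefix check keeps mutating statements from ever reaching the engine.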

Evaluation And Final Selection

_evaluate_and_select_phase() scores the candidates and updates:

  • final SQL status
  • evaluation case
  • selected SQL complexity
  • failure metadata when no acceptable query survives

How Evaluation Works In Code

evaluate_sql_candidates(...):

  1. collects all generated test answers
  2. deduplicates them
  3. optionally runs semantic filtering through TestReducer
  4. evaluates each SQL candidate against the resulting test basis

The evaluation stage therefore operates on a merged and cleaned set of requirements, not on whichever test batch happened to be generated first.
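Steps 1-2 above (collect and deduplicate) can be sketched as a whitespace-insensitive merge; a real run would then optionally pass the result through the semantic TestReducer (step 3), which is not modeled here. The normalization key is an assumption.

```python
def consolidate_tests(batches):
    """Merge test batches into one deduplicated evaluation basis."""
    merged, seen = [], set()
    for batch in batches:
        for test in batch:
            # Assumed key: case- and whitespace-insensitive comparison.
            key = " ".join(test.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(test)
    return merged

basis = consolidate_tests([
    ["Rows sorted by date", "Totals are positive"],
    ["rows sorted  by date", "Exactly one row per customer"],
])
```

The two phrasings of the sorting test collapse into one entry, so every candidate is scored against the same three requirements regardless of which batch produced them.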

How Final Selection Works

select_best_sql(...) parses evaluator output per SQL candidate, computes pass rates, records failure reasons, and compares the outcome to the workspace threshold.

The selector does not rely on vague model confidence. It uses structured pass/fail evidence from the evaluation output.

The execution state uses status values such as:

  • GOLD
  • SILVER
  • FAILED
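The selection rule can be sketched as follows. The status names come from the source; the grading rule (GOLD when every test passes, SILVER when the pass rate clears the workspace threshold) and the report shape are assumptions for illustration.

```python
def select_best_sql(evaluations, threshold=0.8):
    """Pick the candidate with the highest pass rate and grade it.

    `evaluations` maps each SQL string to {"passed": int, "total": int},
    a hypothetical summary of the structured evaluator output.
    """
    best_sql, best_rate = None, None
    for sql, report in evaluations.items():
        rate = report["passed"] / report["total"]
        if best_rate is None or rate > best_rate:
            best_sql, best_rate = sql, rate
    if best_rate is None or best_rate < threshold:
        return None, "FAILED"  # triggers failure metadata / escalation
    return best_sql, "GOLD" if best_rate == 1.0 else "SILVER"

choice, status = select_best_sql({
    "SELECT a FROM t": {"passed": 8, "total": 10},
    "SELECT a, b FROM t": {"passed": 10, "total": 10},
})
```

Because the decision rests on counted pass/fail evidence, the selector never has to interpret free-form model confidence.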

Escalation Rules

If no candidate satisfies the current threshold, the runtime can escalate:

  • BASIC -> ADVANCED
  • ADVANCED -> EXPERT

The code mutates request.functionality_level, clears previous generation artefacts, and re-runs generation plus evaluation.

Once EXPERT is exhausted, the request stops with failure.
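The escalation loop can be sketched like this. The mutation of `functionality_level`, the clearing of prior artefacts, and the stop-on-EXPERT-failure behavior come from the source; the dict-based request and the `generate_and_evaluate` callback are stand-ins.

```python
TIERS = ["BASIC", "ADVANCED", "EXPERT"]

def run_with_escalation(request, generate_and_evaluate):
    """Retry at the next tier until a candidate survives or EXPERT fails.

    `generate_and_evaluate` stands in for the generation + evaluation
    phases; it returns the selected SQL or None.
    """
    while True:
        result = generate_and_evaluate(request["functionality_level"])
        if result is not None:
            return result
        tier = TIERS.index(request["functionality_level"])
        if tier + 1 >= len(TIERS):
            return None  # EXPERT exhausted: the request stops with failure
        request["functionality_level"] = TIERS[tier + 1]
        request["candidates"] = []  # clear previous generation artefacts

req = {"functionality_level": "BASIC", "candidates": ["stale candidate"]}
answer = run_with_escalation(
    req, lambda level: "SELECT 1" if level == "EXPERT" else None
)
```

Here BASIC and ADVANCED both fail, so the request climbs to EXPERT with a clean slate and succeeds there.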

Feedback Loop

The SQL Generator also exposes save_sql_feedback, which stores successful SQL back into the vector database as reusable retrieval material.

That closes the loop between:

  • user requests
  • generated SQL
  • validated outcomes
  • future retrieval quality
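A minimal sketch of that loop, with `difflib` string similarity standing in for the vector database's embedding search (which the real save_sql_feedback would use):

```python
import difflib

def save_sql_feedback(question, sql, store):
    # Stand-in for writing an embedding record to the vector database.
    store.append({"question": question, "sql": sql})

def retrieve_similar(question, store, k=1):
    """Approximate vector search with string similarity for this sketch."""
    ranked = sorted(
        store,
        key=lambda e: difflib.SequenceMatcher(None, question, e["question"]).ratio(),
        reverse=True,
    )
    return ranked[:k]

store = []
save_sql_feedback(
    "total sales per region",
    "SELECT region, SUM(total) FROM sales GROUP BY region",
    store,
)
hits = retrieve_similar("sales total by region", store)
```

Each validated success enriches the store, so a later, similarly phrased request retrieves proven SQL as few-shot material instead of starting from scratch.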