Pipeline

The current SQL generation pipeline is implemented in frontend/sql_generator/main.py and the helpers/main_helpers/ modules.

For a code-level walkthrough of the orchestration layer, state mutations, and helper boundaries, continue with the developer pages under Technical Architecture > Developer Pipeline.

Request Contract

The main request model includes:

question
workspace_id
functionality_level
flags
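The request contract could be modeled as a Python dataclass, as in the rough illustration below. The class name GenerateSQLRequest, the field types, the empty-dict default for flags, and the example values are all assumptions. The actual model may be a Pydantic model with its own validation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GenerateSQLRequest:
    """Hypothetical shape of the main request model; field types are assumptions."""

    question: str
    workspace_id: int  # assumed integer identifier
    functionality_level: str  # normalized to uppercase later, during initialization
    # Assumed to be a free-form mapping of feature switches.
    flags: dict[str, Any] = field(default_factory=dict)


request = GenerateSQLRequest(
    question="How many orders were placed last month?",
    workspace_id=42,
    functionality_level="basic",
    flags={"show_sql": True},
)
```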

Phase Order

The runtime orchestrates the request in this sequence:

request initialization
question validation and translation
keyword extraction
context retrieval
test precomputation
SQL candidate generation
evaluation and selection
final response preparation
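The ordering above can be pictured as a loop over phase handlers that stops at the first phase that fails. This is a hypothetical sketch. The names run_phases, PHASE_ORDER, and PhaseHandler, along with the handler signature, are illustrative and are not the runtime's API. The real orchestrator in main.py also streams progress to the frontend, which this sketch omits.

```python
from typing import Callable

# The eight phases, in the documented order.
PHASE_ORDER: list[str] = [
    "request initialization",
    "question validation and translation",
    "keyword extraction",
    "context retrieval",
    "test precomputation",
    "SQL candidate generation",
    "evaluation and selection",
    "final response preparation",
]

# Hypothetical handler: mutates shared state and returns False to stop the request.
PhaseHandler = Callable[[dict], bool]


def run_phases(handlers: dict[str, PhaseHandler], state: dict) -> list[str]:
    """Run phases strictly in order and return the names of those that completed."""
    completed: list[str] = []
    for phase in PHASE_ORDER:
        if not handlers[phase](state):
            # Record where the pipeline stopped; later phases never run.
            state["stopped_at"] = phase
            break
        completed.append(phase)
    return completed


if __name__ == "__main__":
    handlers = {phase: (lambda s: True) for phase in PHASE_ORDER}
    handlers["keyword extraction"] = lambda s: False  # simulate a missing agent
    demo_state: dict = {}
    print(run_phases(handlers, demo_state), demo_state)
```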

Request Initialization In Detail

The pipeline begins in _initialize_request_state().

This step does more than validate the request's shape:

it normalizes functionality_level to uppercase
it builds RequestContext
it calls _setup_dbmanager_and_agents()
it resolves workspace scope and target language
it verifies that both dbmanager and vdbmanager are ready
it fetches full_schema from the target SQL database

The request cannot proceed if the schema fetch fails, because every later phase depends on full_schema.
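The checks above can be sketched as a standalone function. This is not _initialize_request_state() itself. The names initialize_request and SchemaUnavailableError are hypothetical, and so are the RequestContext fields shown here. The injected callables and manager arguments stand in for _setup_dbmanager_and_agents() and the real schema fetch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RequestContext:
    """Hypothetical per-request context; the real fields may differ."""

    workspace_id: int
    functionality_level: str
    language: str


class SchemaUnavailableError(RuntimeError):
    """Hypothetical error: full_schema could not be fetched, so nothing later can run."""


def initialize_request(
    workspace_id: int,
    functionality_level: str,
    resolve_language: Callable[[int], str],  # stands in for workspace scope resolution
    dbmanager: Optional[object],
    vdbmanager: Optional[object],
    fetch_schema: Callable[[], Optional[dict]],
) -> tuple[RequestContext, dict]:
    # Normalize the level to uppercase, as the documented step does.
    level = functionality_level.upper()
    context = RequestContext(workspace_id, level, resolve_language(workspace_id))
    # Both the SQL database manager and the vector database manager must be ready.
    if dbmanager is None or vdbmanager is None:
        raise RuntimeError("dbmanager and vdbmanager must both be initialized")
    schema = fetch_schema()
    if not schema:
        # Every later phase depends on full_schema, so fail here.
        raise SchemaUnavailableError("full_schema could not be fetched")
    return context, schema
```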

Validation And Translation In Detail

Validation is controlled by SystemState.run_question_validation_with_translation().

The implemented methodology is:

use the validator as the control agent
register the translator as a callable tool on that validator
let the validator decide whether translation is required
return either a validation failure or a valid question, possibly translated into the configured workspace language

If the specialized agents are unavailable, the code falls back to _run_question_validation().
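The following minimal sketch shows the control-agent pattern. It assumes a simplified, rule-based validator that receives the detected language as an argument. ValidatorAgent, register_tool, ValidationResult, Translator, and validate_with_translation are hypothetical names. In the real runtime, the validator is an agent that decides on translation itself. The fallback argument stands in for _run_question_validation().

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical translator tool signature: (question, target_language) -> translated question.
Translator = Callable[[str, str], str]


@dataclass
class ValidationResult:
    is_valid: bool
    question: str
    reason: str = ""


class ValidatorAgent:
    """Hypothetical control agent; a real validator would be LLM-driven."""

    def __init__(self, workspace_language: str) -> None:
        self.workspace_language = workspace_language
        self.tools: dict[str, Translator] = {}

    def register_tool(self, name: str, tool: Translator) -> None:
        # The translator becomes a tool the validator may choose to call.
        self.tools[name] = tool

    def run(self, question: str, detected_language: str) -> ValidationResult:
        if not question.strip():
            return ValidationResult(False, question, "empty question")
        # The validator, not the caller, decides whether translation is needed.
        if detected_language != self.workspace_language and "translate" in self.tools:
            question = self.tools["translate"](question, self.workspace_language)
        return ValidationResult(True, question)


def validate_with_translation(
    validator: Optional[ValidatorAgent],
    translator: Optional[Translator],
    question: str,
    detected_language: str,
    fallback: Callable[[str], ValidationResult],
) -> ValidationResult:
    # Fall back to plain validation when either specialized agent is missing.
    if validator is None or translator is None:
        return fallback(question)
    validator.register_tool("translate", translator)
    return validator.run(question, detected_language)
```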

Keyword Extraction In Detail

Keyword extraction is mandatory in the current runtime.

If keyword_extraction_agent is missing, _extract_keywords_phase() emits a critical error and the request stops.

That strictness exists because the next retrieval steps depend on state.keywords for:

evidence search
similar SQL retrieval
LSH schema matching
semantic schema enrichment
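The fail-fast check and the shared dependency on state.keywords might look like the following sketch. The names extract_keywords_phase, CriticalPipelineError, and retrieval_queries are hypothetical. The sketch uses a plain dict for state and a callable in place of keyword_extraction_agent.

```python
from typing import Callable, Optional

# Hypothetical agent signature: question -> list of keywords.
KeywordAgent = Callable[[str], list[str]]

# The four retrieval steps that consume state.keywords.
RETRIEVAL_STEPS = (
    "evidence search",
    "similar SQL retrieval",
    "LSH schema matching",
    "semantic schema enrichment",
)


class CriticalPipelineError(RuntimeError):
    """Hypothetical error type that stops the request."""


def extract_keywords_phase(agent: Optional[KeywordAgent], question: str, state: dict) -> None:
    # No agent means no keywords, so stop instead of degrading silently.
    if agent is None:
        raise CriticalPipelineError("keyword_extraction_agent is not configured")
    state["keywords"] = agent(question)


def retrieval_queries(state: dict) -> dict[str, list[str]]:
    """Every retrieval step is keyed on the same keyword list."""
    keywords = state["keywords"]  # would raise KeyError without the early stop above
    return {step: list(keywords) for step in RETRIEVAL_STEPS}
```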

Sequence Diagram

sequenceDiagram
    participant FE as Frontend
    participant API as SQL Generator
    participant PRE as Preprocessing
    participant GEN as SQL Generation
    participant EVAL as Evaluation

    FE->>API: POST /generate-sql
    API->>PRE: validate, translate, extract keywords
    PRE->>PRE: retrieve evidence, SQL examples, schema context
    API->>GEN: precompute tests and generate SQL candidates
    API->>EVAL: score, validate, and select SQL
    API-->>FE: stream progress and final result

Failure Model

The pipeline can stop early when:

the workspace cannot be initialized
the validator rejects the question
the keyword extraction agent is missing
the vector database is unavailable
SQL candidate generation fails critically

Partial retrieval failures are streamed as warnings, and the request continues as long as the runtime still has enough context to proceed.
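The split between fatal errors and streamed warnings can be sketched as a generator of events. Event, FatalError, and run_retrieval are hypothetical names, and the event kinds are assumptions as well. So is the rule that "enough context" means at least one retrieved item. This page does not describe the runtime's actual threshold or event format.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Event:
    kind: str  # hypothetical kinds: "progress", "warning", "error", "result"
    message: str


class FatalError(Exception):
    """Hypothetical: conditions that stop the request, e.g. the vector database is down."""


def run_retrieval(steps: dict[str, Callable[[], list[str]]]) -> Iterator[Event]:
    context: list[str] = []
    for name, step in steps.items():
        try:
            items = step()
        except FatalError as exc:
            # Fatal: report and stop streaming immediately.
            yield Event("error", f"{name}: {exc}")
            return
        except Exception as exc:
            # Partial retrieval failure: warn and keep going.
            yield Event("warning", f"{name} failed: {exc}")
            continue
        context.extend(items)
        yield Event("progress", f"{name}: {len(items)} items")
    # Assumed threshold: at least one context item is "enough" to continue.
    if not context:
        yield Event("error", "no context retrieved; cannot proceed")
        return
    yield Event("result", f"{len(context)} context items")
```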

Developer Reading Order

To trace the code path end to end, read these files in order:

main.py
main_request_initialization.py
main_preprocessing_phases.py
main_schema_link_strategy.py
main_sql_generation.py
main_generation_phases.py
main_evaluation.py
sql_selection.py