Pipeline
The current SQL generation pipeline is implemented in frontend/sql_generator/main.py and the helpers/main_helpers/ modules.
For a code-level walkthrough of the orchestration layer, state mutations, and helper boundaries, continue with the developer pages under Technical Architecture > Developer Pipeline.
Request Contract
The main request model includes:
- question
- workspace_id
- functionality_level
- flags
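The request shape can be sketched as a plain dataclass. This is a minimal sketch, not the real model: the field types, the `GenerateSQLRequest` name, the `"advanced"` level value, and the `example_flag` key are all hypothetical placeholders. The real model may be a Pydantic class with extra validation. Note that the raw level is stored as received; uppercasing happens later, during request initialization.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GenerateSQLRequest:
    """Hypothetical mirror of the request contract; field types are assumptions."""

    question: str
    workspace_id: int
    functionality_level: str  # stored as received; normalized to uppercase later
    flags: dict[str, Any] = field(default_factory=dict)  # free-form feature switches


request = GenerateSQLRequest(
    question="How many orders were shipped last month?",
    workspace_id=42,
    functionality_level="advanced",  # placeholder level value
    flags={"example_flag": True},  # hypothetical flag key
)
print(request)
```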
Phase Order
The runtime orchestrates the request in this sequence:
- request initialization
- question validation and translation
- keyword extraction
- context retrieval
- test precomputation
- SQL candidate generation
- evaluation and selection
- final response preparation
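The phase ordering above amounts to a strict sequence with early exit. The sketch below assumes each phase is a function that mutates a shared state dict and returns `False` to stop the request. `run_pipeline` and the boolean return convention are hypothetical simplifications. The real runtime streams progress and uses its own error handling.

```python
from typing import Callable

# Hypothetical phase signature: mutate shared state, return False to stop early.
Phase = Callable[[dict], bool]

PHASE_NAMES = [
    "request initialization",
    "question validation and translation",
    "keyword extraction",
    "context retrieval",
    "test precomputation",
    "SQL candidate generation",
    "evaluation and selection",
    "final response preparation",
]


def run_pipeline(state: dict, phases: list[tuple[str, Phase]]) -> dict:
    """Run phases strictly in order, stopping at the first one that fails."""
    state.setdefault("completed", [])
    for name, phase in phases:
        if not phase(state):
            state["stopped_at"] = name  # later phases never run
            break
        state["completed"].append(name)
    return state


def ok(state: dict) -> bool:
    # Placeholder phase that always succeeds.
    return True


result = run_pipeline({}, [(name, ok) for name in PHASE_NAMES])
print(result["completed"])
```

Swapping any phase for one that returns `False` leaves `completed` holding only the phases before it, which mirrors how a failure in an early phase prevents every later one from running.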
Request Initialization In Detail
The pipeline begins in _initialize_request_state().
This step does more than validate the request shape:
- it normalizes functionality_level to uppercase
- it builds RequestContext
- it calls _setup_dbmanager_and_agents()
- it resolves workspace scope and target language
- it verifies that both dbmanager and vdbmanager are ready
- it fetches full_schema from the target SQL database
The request cannot proceed if the schema fetch fails, because every later phase depends on full_schema.
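The initialization steps can be sketched as one function that follows the same order. This is a simplified stand-in for `_initialize_request_state()`, not its implementation. The `RequestContext` fields shown, the `FakeDbManager` class, its `get_full_schema()` method, `SchemaFetchError`, the workspace-to-language dict, and passing the setup routine in as a callable that takes a workspace id are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RequestContext:
    # Hypothetical subset of fields; the real RequestContext may carry more.
    workspace_id: int
    functionality_level: str
    target_language: Optional[str] = None


class SchemaFetchError(RuntimeError):
    """Raised when full_schema cannot be loaded; no later phase can run without it."""


class FakeDbManager:
    # Stand-in for the SQL database manager; get_full_schema() is a hypothetical name.
    def get_full_schema(self) -> dict:
        return {"orders": ["id", "shipped_at", "customer_id"]}


def initialize_request_state(
    request: dict,
    setup_dbmanager_and_agents: Callable[[int], tuple],
    workspace_languages: dict,
) -> dict:
    # 1. Normalize functionality_level to uppercase.
    level = request["functionality_level"].upper()
    # 2. Build the request context.
    ctx = RequestContext(workspace_id=request["workspace_id"], functionality_level=level)
    # 3. Set up the managers (stand-in for _setup_dbmanager_and_agents()).
    dbmanager, vdbmanager = setup_dbmanager_and_agents(ctx.workspace_id)
    # 4. Resolve workspace scope and target language.
    if ctx.workspace_id not in workspace_languages:
        raise RuntimeError(f"workspace {ctx.workspace_id} cannot be initialized")
    ctx.target_language = workspace_languages[ctx.workspace_id]
    # 5. Both managers must be ready before any retrieval happens.
    if dbmanager is None or vdbmanager is None:
        raise RuntimeError("dbmanager and vdbmanager must both be ready")
    # 6. Fetch full_schema; every later phase depends on it, so failure is fatal.
    try:
        full_schema = dbmanager.get_full_schema()
    except Exception as exc:
        raise SchemaFetchError("cannot proceed without full_schema") from exc
    return {"context": ctx, "full_schema": full_schema}


state = initialize_request_state(
    {"question": "How many orders shipped?", "workspace_id": 7, "functionality_level": "advanced"},
    setup_dbmanager_and_agents=lambda ws: (FakeDbManager(), object()),
    workspace_languages={7: "Italian"},
)
print(state["context"])
```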
Validation And Translation In Detail
Validation is controlled by SystemState.run_question_validation_with_translation().
The implemented methodology is:
- use the validator as the control agent
- register the translator as a callable tool on that validator
- let the validator decide whether translation is required
- return either:
  - a validation failure
  - or a valid question, possibly translated into the configured workspace language
If the specialized agents are unavailable, the code falls back to _run_question_validation().
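The control pattern above can be sketched with plain functions: the validator owns the decision and calls the translator only when it judges translation necessary, and a fallback is used when the agents are missing. This is a sketch of the pattern, not the runtime's code. The real validator and translator are agents, replaced here by a toy language detector and a lookup-table translator. `ValidatorAgent`, `register_tool`, `ValidationResult`, and the function signature are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ValidationResult:
    is_valid: bool
    question: str
    message: str = ""


@dataclass
class ValidatorAgent:
    """Hypothetical control agent: it owns the decision and may call registered tools."""

    detect_language: Callable[[str], str]
    tools: dict = field(default_factory=dict)

    def register_tool(self, name: str, fn: Callable) -> None:
        self.tools[name] = fn

    def run(self, question: str, workspace_language: str) -> ValidationResult:
        if not question.strip():
            return ValidationResult(False, question, "empty question")
        # The validator, not the caller, decides whether translation is required.
        if self.detect_language(question) != workspace_language and "translate" in self.tools:
            question = self.tools["translate"](question, workspace_language)
        return ValidationResult(True, question)


def run_question_validation_with_translation(question, workspace_language, validator, translator, fallback):
    # Fall back to plain validation when the specialized agents are unavailable.
    if validator is None or translator is None:
        return fallback(question)
    validator.register_tool("translate", translator)  # translator becomes a tool
    return validator.run(question, workspace_language)


def toy_detect(text: str) -> str:
    return "Italian" if "quanti" in text.lower() else "English"


def toy_translate(text: str, target: str) -> str:
    return {"Quanti ordini?": "How many orders?"}.get(text, text)


def fallback(q: str) -> ValidationResult:
    return ValidationResult(bool(q.strip()), q, "fallback")


result = run_question_validation_with_translation(
    "Quanti ordini?", "English", ValidatorAgent(toy_detect), toy_translate, fallback
)
print(result)
```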
Keyword Extraction In Detail
Keyword extraction is mandatory in the current runtime.
If keyword_extraction_agent is missing, _extract_keywords_phase() emits a critical error and the request stops.
That strictness exists because the next retrieval steps depend on state.keywords for:
- evidence search
- similar SQL retrieval
- LSH schema matching
- semantic schema enrichment
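The strictness rule and the shared dependency on state.keywords can be sketched as follows. This assumes a simple `emit(level, message)` callback for streamed messages. `CriticalPipelineError`, `retrieval_inputs`, and the toy agent (which just keeps words longer than four characters) are hypothetical illustrations, not the real extractor.

```python
from typing import Callable, Optional


class CriticalPipelineError(RuntimeError):
    """Hypothetical exception type used here to stop the request."""


def extract_keywords_phase(
    state: dict,
    keyword_extraction_agent: Optional[Callable[[str], list]],
    emit: Callable[[str, str], None],
) -> list:
    # Keyword extraction is mandatory: a missing agent is critical, not a warning.
    if keyword_extraction_agent is None:
        emit("critical", "keyword_extraction_agent is missing")
        raise CriticalPipelineError("keyword extraction is mandatory")
    state["keywords"] = keyword_extraction_agent(state["question"])
    return state["keywords"]


def retrieval_inputs(state: dict) -> dict:
    # All four downstream retrieval steps read the same state["keywords"] list.
    keywords = state["keywords"]
    return {
        "evidence_search": keywords,
        "similar_sql_retrieval": keywords,
        "lsh_schema_matching": keywords,
        "semantic_schema_enrichment": keywords,
    }


def toy_agent(question: str) -> list:
    # Toy stand-in for the real agent: keep words longer than four characters.
    return [w.strip("?").lower() for w in question.split() if len(w) > 4]


events = []
state = {"question": "How many orders shipped last month?"}
extract_keywords_phase(state, toy_agent, emit=lambda level, msg: events.append((level, msg)))
print(retrieval_inputs(state)["lsh_schema_matching"])
```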
Sequence Diagram
```mermaid
sequenceDiagram
    participant FE as Frontend
    participant API as SQL Generator
    participant PRE as Preprocessing
    participant GEN as SQL Generation
    participant EVAL as Evaluation
    FE->>API: POST /generate-sql
    API->>PRE: validate, translate, extract keywords
    PRE->>PRE: retrieve evidence, SQL examples, schema context
    API->>GEN: precompute tests and generate SQL candidates
    API->>EVAL: score, validate, and select SQL
    API-->>FE: stream progress and final result
```
Failure Model
The pipeline can stop early when:
- the workspace cannot be initialized
- the validator rejects the question
- the keyword extraction agent is missing
- the vector database is unavailable
- SQL candidate generation fails critically
Warnings are streamed for partial retrieval failures, but the request can continue when the runtime still has enough context to proceed.
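The split between critical failures and streamed warnings can be sketched as a generator of `(kind, payload)` events. This is an illustration under stated assumptions, not the runtime's streaming code. The event kinds, `retrieve_context`, the retriever names, and the "enough context" rule (at least `min_sources` retrievers succeeded) are hypothetical. An unavailable vector database ends the stream immediately, while a single failing retriever only produces a warning.

```python
from typing import Callable, Iterator


def retrieve_context(
    keywords: list[str],
    retrievers: dict[str, Callable[[list[str]], list]],
    vdb_ready: bool,
    min_sources: int = 1,
) -> Iterator[tuple[str, object]]:
    # Yields (kind, payload) events, mimicking a streamed progress channel.
    if not vdb_ready:
        yield ("error", "vector database unavailable")  # critical: stop early
        return
    context = {}
    for name, retrieve in retrievers.items():
        try:
            context[name] = retrieve(keywords)
            yield ("progress", f"{name}: ok")
        except Exception as exc:
            # Partial failure: warn and keep going with what we have.
            yield ("warning", f"{name} failed: {exc}")
    # Hypothetical sufficiency rule: at least min_sources retrievers succeeded.
    if len(context) < min_sources:
        yield ("error", "not enough context to continue")
        return
    yield ("result", context)


def failing(_keywords: list[str]) -> list:
    raise TimeoutError("similar SQL index timed out")


events = list(
    retrieve_context(
        ["orders", "shipped"],
        {"evidence": lambda kw: ["evidence about " + kw[0]], "similar_sql": failing},
        vdb_ready=True,
    )
)
for kind, payload in events:
    print(kind, payload)
```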
Developer Reading Order
If you want to trace the code path end to end, start here:
- main.py
- main_request_initialization.py
- main_preprocessing_phases.py
- main_schema_link_strategy.py
- main_sql_generation.py
- main_generation_phases.py
- main_evaluation.py
- sql_selection.py