System State And Contexts

SystemState is the center of the SQL Generator runtime. The key design choice is that it is no longer a flat bag of fields. Instead, it coordinates a set of smaller context objects and exposes their fields through bridge properties for backward compatibility.

The main implementation is in frontend/sql_generator/model/system_state.py.

Context Decomposition

flowchart LR
    RC["RequestContext"] --> SS["SystemState"]
    DC["DatabaseContext"] --> SS
    SC["SemanticContext"] --> SS
    SD["SchemaDerivations"] --> SS
    GR["GenerationResults"] --> SS
    EX["ExecutionState"] --> SS
    ES["ExternalServices"] --> SS

Why The Decomposition Matters

The runtime still reads like a classic mutable state machine, but the data is grouped by concern:

  • request identity and user intent
  • database configuration and authoritative schema
  • retrieval artefacts such as keywords, evidence, and SQL examples
  • derived schemas and mschema variants
  • generated tests and SQL candidates
  • execution telemetry, timings, statuses, and escalation metadata
  • external service handles such as dbmanager, vdbmanager, and the agent manager

That decomposition lets the orchestration helpers mutate state without having to know where every field is physically stored.
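A minimal sketch of the composition, assuming plain dataclasses. It shows only four of the seven contexts and a handful of their fields. The attribute names request, database, semantic, and execution follow the paths used elsewhere on this page; the types, defaults, and the use of dataclasses at all are assumptions, not the real system_state.py.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


# Hypothetical, heavily trimmed context classes. Field names come from this
# page; the dataclass style, types, and defaults are assumptions.
@dataclass
class RequestContext:
    question: str
    username: str = ""
    functionality_level: str = "BASIC"


@dataclass
class DatabaseContext:
    full_schema: dict = field(default_factory=dict)
    dbmanager: Optional[Any] = None
    treat_empty_result_as_error: bool = False


@dataclass
class SemanticContext:
    keywords: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)


@dataclass
class ExecutionState:
    timings: dict = field(default_factory=dict)
    schema_link_strategy: Optional[str] = None


@dataclass
class SystemState:
    # One object per concern instead of dozens of flat fields.
    request: RequestContext
    database: DatabaseContext = field(default_factory=DatabaseContext)
    semantic: SemanticContext = field(default_factory=SemanticContext)
    execution: ExecutionState = field(default_factory=ExecutionState)


state = SystemState(request=RequestContext(question="How many orders shipped in May?"))
state.semantic.keywords.append("orders")
```

Using default_factory gives every SystemState its own context instances, so mutating one request's keywords never leaks into another request.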

What Each Context Owns

RequestContext

Owns the data that identifies the request:

  • question
  • original question
  • username
  • workspace id and name
  • functionality level
  • target language
  • scope
  • original language when translation happened

It validates that functionality_level is one of BASIC, ADVANCED, or EXPERT.
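One way such a check could look, sketched with a dataclass __post_init__ hook. The real class may use a different validation mechanism (for example a pydantic validator); only the three allowed values come from this page.

```python
from dataclasses import dataclass

# The allowed levels listed on this page.
ALLOWED_LEVELS = {"BASIC", "ADVANCED", "EXPERT"}


@dataclass
class RequestContext:
    question: str
    functionality_level: str = "BASIC"

    def __post_init__(self) -> None:
        # Reject unknown levels as soon as the context is built, so later
        # phases never see an invalid value.
        if self.functionality_level not in ALLOWED_LEVELS:
            raise ValueError(
                f"functionality_level must be one of {sorted(ALLOWED_LEVELS)}, "
                f"got {self.functionality_level!r}"
            )
```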

DatabaseContext

Owns authoritative database information and database behavior flags:

  • full_schema
  • dbmanager
  • directives
  • treat_empty_result_as_error
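This page does not say how treat_empty_result_as_error is consumed. Going by its name, one plausible reading is sketched below: a query that runs cleanly but returns no rows is counted as a failure only when the flag is set. The function is_failed_execution and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def is_failed_execution(
    rows: Sequence[Tuple],
    error: Optional[str],
    treat_empty_result_as_error: bool,
) -> bool:
    """Hypothetical check: does this SQL execution count as a failure?"""
    # A database error is always a failure.
    if error is not None:
        return True
    # An empty result set is a failure only when the flag asks for it.
    return treat_empty_result_as_error and len(rows) == 0
```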

SemanticContext

Owns retrieval-time semantic artefacts:

  • keywords
  • evidence
  • formatted evidence for prompt templates
  • SQL shots
  • SQL documents

SchemaDerivations

Owns the schema variants generated during preprocessing:

  • similar_columns
  • schema_with_examples
  • schema_from_vector_db
  • enriched_schema
  • filtered_schema
  • full_mschema
  • reduced_mschema
  • used_mschema

GenerationResults

Owns intermediate and final generation artefacts:

  • generated SQL candidates
  • generated tests
  • serialized JSON forms of both
  • evaluation results
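The page does not say whether the serialized JSON forms are stored or derived. A minimal sketch, assuming they are derived on demand with json.dumps; the property names generated_sqls_json and generated_tests_json are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class GenerationResults:
    generated_sqls: List[str] = field(default_factory=list)
    generated_tests: List[str] = field(default_factory=list)
    evaluation_results: dict = field(default_factory=dict)

    # Hypothetical derived views: computed from the lists on every access,
    # so they can never drift out of sync with the underlying data.
    @property
    def generated_sqls_json(self) -> str:
        return json.dumps(self.generated_sqls)

    @property
    def generated_tests_json(self) -> str:
        return json.dumps(self.generated_tests)
```

If the real class stores the JSON strings as separate fields instead, every writer has to keep them updated by hand, which is one more thing to check when the two disagree.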

ExecutionState

Owns runtime metadata:

  • timings for every major phase
  • schema_link_strategy
  • available_context_tokens
  • full_schema_tokens_count
  • final SQL status
  • evaluation case
  • escalation attempts and flags
  • failure messages
  • telemetry for retries and relevance-guard events
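A sketch of how per-phase timings could be written into ExecutionState, assuming a timings dictionary keyed by phase name. The context manager timed_phase and the phase names are hypothetical; the real code may record timings differently.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class ExecutionState:
    timings: Dict[str, float] = field(default_factory=dict)


@contextmanager
def timed_phase(execution: ExecutionState, phase: str) -> Iterator[None]:
    # Record the wall-clock duration of one phase in seconds. The finally
    # block makes sure a timing is stored even if the phase raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        execution.timings[phase] = time.perf_counter() - start


execution = ExecutionState()
with timed_phase(execution, "keyword_extraction"):
    sum(range(1000))
```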

ExternalServices

Owns integration handles and workspace-level settings:

  • vdbmanager
  • agents_and_tools
  • SQL database config
  • workspace config
  • number_of_tests_to_generate
  • number_of_sql_to_generate
  • filtered request flags

Property Bridges

The most important implementation detail is that SystemState exposes bridge properties.

Examples:

  • state.question maps to submitted_question
  • state.full_schema maps to database.full_schema
  • state.keywords maps to semantic.keywords
  • state.filtered_schema maps to schemas.filtered_schema
  • state.generated_sqls maps to generation.generated_sqls
  • state.schema_link_strategy maps to execution.schema_link_strategy
  • state.vdbmanager maps to services.vdbmanager

This lets older helper code keep using flat attribute access such as state.keywords, while the data itself lives in a structured internal model.
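Bridge properties like these are typically built with Python's property descriptor. A minimal sketch covering two of the bridges listed above; the real SystemState may generate them differently, and the trimmed context classes here are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatabaseContext:
    full_schema: dict = field(default_factory=dict)


@dataclass
class GenerationResults:
    generated_sqls: List[str] = field(default_factory=list)


class SystemState:
    def __init__(self) -> None:
        self.database = DatabaseContext()
        self.generation = GenerationResults()

    # Bridge: state.full_schema reads and writes database.full_schema.
    @property
    def full_schema(self) -> dict:
        return self.database.full_schema

    @full_schema.setter
    def full_schema(self, value: dict) -> None:
        self.database.full_schema = value

    # Bridge: state.generated_sqls reads and writes generation.generated_sqls.
    @property
    def generated_sqls(self) -> List[str]:
        return self.generation.generated_sqls

    @generated_sqls.setter
    def generated_sqls(self, value: List[str]) -> None:
        self.generation.generated_sqls = value


state = SystemState()
state.generated_sqls = ["SELECT 1"]
state.generated_sqls.append("SELECT 2")
```

Because the getter returns the underlying object itself, in-place mutations such as append also land in the owning context, not in a copy.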

Bridge Model

flowchart TD
    A["orchestration helper"] --> B["state.generated_sqls"]
    B --> C["GenerationResults.generated_sqls"]

    D["orchestration helper"] --> E["state.full_schema"]
    E --> F["DatabaseContext.full_schema"]

    G["orchestration helper"] --> H["state.schema_link_strategy"]
    H --> I["ExecutionState.schema_link_strategy"]

Mutable Vs Immutable Data

The runtime treats different parts of the state differently.

Mostly stable after initialization

  • workspace_id
  • workspace_name
  • scope
  • language
  • dbmanager
  • vdbmanager
  • full_schema

Mutated repeatedly during the pipeline

  • submitted_question
  • translated_question
  • keywords
  • schema_with_examples
  • schema_from_vector_db
  • enriched_schema
  • filtered_schema
  • used_mschema
  • generated_tests
  • generated_sqls
  • evaluation_results
  • execution timing fields
  • escalation fields

State Mutation Timeline

flowchart LR
    INIT["initialization"] --> Q["question or translated_question"]
    Q --> K["keywords"]
    K --> RET["evidence, sql_documents, schema fragments"]
    RET --> SCH["enriched_schema or filtered_schema"]
    SCH --> MS["used_mschema"]
    MS --> SQL["generated_sqls"]
    SQL --> TEST["generated_tests"]
    TEST --> EVAL["evaluation_results and selection_metrics"]
    EVAL --> FINAL["execution.sql_status and last_SQL"]

submitted_question Is The Working Question

This is a subtle but important behavior.

  • RequestContext.question captures the request payload as received.
  • submitted_question is the working text the pipeline should use.
  • state.question is just a bridge to submitted_question.

When translation happens, the runtime does not simply overwrite the original request object. Instead, it updates state.translated_question and then sets state.submitted_question to the translated value.

That is why most later phases read state.question rather than state.request.question.
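The flow above can be sketched as follows. The helper apply_translation and the language code "it" are hypothetical; the point is that request.question keeps the payload as received while state.question follows the working text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    question: str                           # payload as received
    original_language: Optional[str] = None


class SystemState:
    def __init__(self, request: RequestContext) -> None:
        self.request = request
        self.submitted_question = request.question   # working text
        self.translated_question: Optional[str] = None

    @property
    def question(self) -> str:
        # Bridge: later phases read state.question, i.e. the working text.
        return self.submitted_question


def apply_translation(state: SystemState, translated: str, source_language: str) -> None:
    # Hypothetical helper: record the translation and switch the working
    # question, leaving request.question untouched.
    state.request.original_language = source_language
    state.translated_question = translated
    state.submitted_question = translated


state = SystemState(RequestContext("Quanti ordini sono stati spediti a maggio?"))
apply_translation(state, "How many orders were shipped in May?", "it")
```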

Schema Operations Live On SystemState

The orchestration layer calls methods such as:

  • create_enriched_schema()
  • create_filtered_schema()
  • extract_schema_via_lsh()
  • extract_schema_from_vectordb()
  • run_question_validation_with_translation()

That means SystemState is not only storage. It is also the facade through which orchestration helpers trigger domain logic.

Important Developer Caveat

There are two different implementations of filtered-schema logic in the codebase:

  • SystemState.create_filtered_schema()
  • helpers/main_helpers/main_generate_mschema.py::create_filtered_schema(...)

The runtime path in _retrieve_context_phase() calls state.create_filtered_schema(). If you are debugging actual behavior, use the SystemState method as the source of truth for the live orchestration path.

ExecutionState Is Not Just Logging

It is easy to treat ExecutionState as passive telemetry, but the code uses it actively for control and observability:

  • timing metrics are written during each phase
  • schema-link strategy is persisted there
  • escalation attempts are counted there
  • SQL status is finalized there
  • model retry and relevance-guard events are accumulated there

This means ExecutionState is part of the behavior contract, not just reporting.
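To make the "behavior contract" point concrete, here is a sketch in which the escalation counter in ExecutionState bounds a retry loop and the loop finalizes sql_status. The function run_with_escalation, the attempt callable, the default budget of 2, and the status strings "SUCCESS" and "FAILED" are all hypothetical; the real escalation logic is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ExecutionState:
    escalation_attempts: int = 0
    sql_status: Optional[str] = None
    failure_messages: List[str] = field(default_factory=list)


def run_with_escalation(
    execution: ExecutionState,
    attempt: Callable[[int], Optional[str]],
    max_escalations: int = 2,
) -> Optional[str]:
    """Call attempt(level) until it returns SQL or the escalation budget is spent."""
    while True:
        # The counter stored in ExecutionState is the loop's control variable.
        sql = attempt(execution.escalation_attempts)
        if sql is not None:
            execution.sql_status = "SUCCESS"   # hypothetical status value
            return sql
        execution.failure_messages.append(
            f"attempt at escalation level {execution.escalation_attempts} produced no SQL"
        )
        if execution.escalation_attempts >= max_escalations:
            execution.sql_status = "FAILED"    # hypothetical status value
            return None
        execution.escalation_attempts += 1
```

If something resets or skips the counter, the loop itself changes behavior, which is why this state cannot be treated as passive telemetry.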

Practical Debugging Heuristic

When the runtime behaves unexpectedly, first ask which context should have been mutated by the previous phase.

For example:

  • missing retrieval quality usually means SemanticContext or SchemaDerivations did not get populated
  • missing SQL candidates usually means GenerationResults never received valid generated_sqls
  • wrong final status usually means ExecutionState was finalized with the wrong selection or escalation data

That approach is usually faster than reading the pipeline linearly from top to bottom.
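The heuristic can be turned into a small debugging helper that walks the mutation timeline and reports the first field still empty. The helper first_missing and its checkpoint list are hypothetical. The checkpoints follow the timeline on this page but skip fields that may legitimately stay empty, such as evidence, or that are alternatives, such as enriched_schema versus filtered_schema.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple

# (context attribute, field name) in the order of the mutation timeline.
CHECKPOINTS: List[Tuple[str, str]] = [
    ("semantic", "keywords"),
    ("schemas", "used_mschema"),
    ("generation", "generated_sqls"),
    ("generation", "generated_tests"),
    ("generation", "evaluation_results"),
    ("execution", "sql_status"),
]


def first_missing(state: Any) -> Optional[str]:
    """Return 'context.field' for the first checkpoint that is still empty."""
    for context_name, field_name in CHECKPOINTS:
        context = getattr(state, context_name, None)
        value = getattr(context, field_name, None)
        if not value:  # None, "", [] and {} all count as "not populated"
            return f"{context_name}.{field_name}"
    return None


# Usage with a stand-in state object: generation never produced SQL.
stalled = SimpleNamespace(
    semantic=SimpleNamespace(keywords=["orders"]),
    schemas=SimpleNamespace(used_mschema="orders(id, shipped_at)"),
    generation=SimpleNamespace(generated_sqls=[], generated_tests=[], evaluation_results={}),
    execution=SimpleNamespace(sql_status=None),
)
```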