SQL Server

VAI Flux supports extracting tables from Microsoft SQL Server into the bronze layer (Parquet in S3) using dlt_extract.

SQL Server connections currently support SQL Authentication (username/password) only. Windows Authentication and Azure AD are not supported yet.

Prerequisites

ODBC Driver: Install Microsoft ODBC Driver 17 or 18 for SQL Server on every machine/container that runs extraction workers.
Network access: Ensure the SQL Server host/port is reachable from the VAI Flux worker environment.

Connection Fields

Field	Required	Default	Notes
`host`	Yes	—	SQL Server hostname or IP
`port`	No	`1433`	TCP port
`database`	Yes	—	Database name
`username`	Yes	—	SQL login username
`password`	Yes	—	SQL login password
`encrypt`	No	`false`	Enable TLS encryption
`trust_server_certificate`	No	`false`	Skip cert validation (use only if needed)
`connection_timeout`	No	`30`	Connection timeout (seconds)

Table Selection

Tables are specified manually using schema.table identifiers.

Examples:

dbo.orders
sales.customers

If you omit the schema (e.g. orders), it defaults to dbo.

Bracketed identifiers are supported for special characters:

[sales].[order items]

UI: Selecting Tables

When configuring a dlt_extract action in the web UI:

Select your SQL Server connection
Use the Tables picker to choose which tables to extract (searchable)
If discovery fails, you can still type schema.table and press Enter

Incremental Extraction

Incremental extraction is configured globally via:

cursor_column (recommended: datetime or int columns)
start_date (used as the initial cursor value on first run)

On subsequent runs, dlt resumes from saved state automatically.

Example: dlt_extract Action

{
  "kind": "dlt_extract",
  "connection": { "kind": "by_slug", "slug": "my-sqlserver" },
  "source": "sqlserver",
  "source_config": {
    "tables": ["dbo.orders", "sales.customers"],
    "cursor_column": "updated_at",
    "start_date": "2026-01-01",
    "full_resync": false
  }
}

Bronze Layer Data Model

Data extracted from SQL Server is organized by connection, not by action:

s3://bronze/tenant_id={id}/source=sqlserver/{schema.table}/year=YYYY/month=MM/day=DD/

Multiple actions extracting the same table from the same connection write to the same S3 path. This is by design: bronze represents the canonical raw data from each source.

Key points:

Avoid creating conflicting actions (e.g., one with cursor_column for incremental, another with full_resync=true on the same tables)
If you need different extraction strategies for the same tables, create separate connections

Best Practice: One action per (connection, table-set) combination.

Stored Procedures

You can extract result sets from read-only stored procedures alongside tables.

Only use this for stored procedures that return data (like parameterized views). SPs that perform INSERT, UPDATE, or other side effects should not be used here.

Configuration

{
  "kind": "dlt_extract",
  "source": "sqlserver",
  "source_config": {
    "tables": ["dbo.orders"],
    "stored_procedures": [
      {
        "name": "dbo.GetOrderHistory",
        "params": { "start_date": "2026-01-01", "customer_id": 123 },
        "resource_name": "order_history"
      }
    ]
  }
}

Fields

Field	Required	Description
`name`	Yes	Full SP name with schema (e.g., `dbo.GetOrderHistory`)
`params`	No	Parameters to pass to the SP
`resource_name`	Yes	Name for the resource in bronze (e.g., `order_history`)

Limitations

Single result set only: If the SP returns multiple result sets, only the first is captured
Read-only only: No side-effect validation; use responsibly
No incremental support: SP results always use write_disposition='replace'
Parameters must be JSON-serializable: Complex types like binary are not supported

Example: SP-Only Extraction

You can extract from stored procedures without any tables:

{
  "source_config": {
    "stored_procedures": [
      { "name": "dbo.GetDailyStats", "resource_name": "daily_stats" },
      { "name": "reports.CustomerSummary", "params": { "year": 2026 }, "resource_name": "customer_summary" }
    ]
  }
}

Progress Estimation

Before extracting each table, VAI Flux runs SELECT COUNT(*) to estimate row counts. This provides:

Estimated rows: Approximate rows to be extracted (filtered by watermark for incremental)
Estimated chunks: Based on chunk_size (default 100,000 rows per chunk)

This appears in the action run progress for visibility into long-running extractions.

Progress estimation is not available for stored procedures (row count cannot be determined before execution).

Troubleshooting

Authentication failed: Verify username/password and that SQL Authentication is enabled on the server.
Database not found / access denied: Confirm database name and grant the user permissions.
Permission denied: The user must be allowed to connect and run SELECT queries.
TLS handshake errors: Toggle encrypt and/or trust_server_certificate, and verify the server certificate chain.
Driver errors: Confirm the ODBC driver is installed and available to the worker runtime.

SQL Server

On this page