VirtuousAI

SQL Server

Connect SQL Server for dlt extraction

SQL Server

VAI Flux supports extracting tables from Microsoft SQL Server into the bronze layer (Parquet in S3) using dlt_extract.

SQL Server connections currently support SQL Authentication (username/password) only. Windows Authentication and Azure AD are not supported yet.

Prerequisites

  • ODBC Driver: Install Microsoft ODBC Driver 17 or 18 for SQL Server on every machine/container that runs extraction workers.
  • Network access: Ensure the SQL Server host/port is reachable from the VAI Flux worker environment.

Connection Fields

FieldRequiredDefaultNotes
hostYesSQL Server hostname or IP
portNo1433TCP port
databaseYesDatabase name
usernameYesSQL login username
passwordYesSQL login password
encryptNofalseEnable TLS encryption
trust_server_certificateNofalseSkip cert validation (use only if needed)
connection_timeoutNo30Connection timeout (seconds)

Table Selection

Tables are specified manually using schema.table identifiers.

Examples:

  • dbo.orders
  • sales.customers

If you omit the schema (e.g. orders), it defaults to dbo.

Bracketed identifiers are supported for special characters:

  • [sales].[order items]

UI: Selecting Tables

When configuring a dlt_extract action in the web UI:

  1. Select your SQL Server connection
  2. Use the Tables picker to choose which tables to extract (searchable)
  3. If discovery fails, you can still type schema.table and press Enter

Incremental Extraction

Incremental extraction is configured globally via:

  • cursor_column (recommended: datetime or int columns)
  • start_date (used as the initial cursor value on first run)

On subsequent runs, dlt resumes from saved state automatically.

Example: dlt_extract Action

{
  "kind": "dlt_extract",
  "connection": { "kind": "by_slug", "slug": "my-sqlserver" },
  "source": "sqlserver",
  "source_config": {
    "tables": ["dbo.orders", "sales.customers"],
    "cursor_column": "updated_at",
    "start_date": "2026-01-01",
    "full_resync": false
  }
}

Bronze Layer Data Model

Data extracted from SQL Server is organized by connection, not by action:

s3://bronze/tenant_id={id}/source=sqlserver/{schema.table}/year=YYYY/month=MM/day=DD/

Multiple actions extracting the same table from the same connection write to the same S3 path. This is by design: bronze represents the canonical raw data from each source.

Key points:

  • Avoid creating conflicting actions (e.g., one with cursor_column for incremental, another with full_resync=true on the same tables)
  • If you need different extraction strategies for the same tables, create separate connections

Best Practice: One action per (connection, table-set) combination.

Stored Procedures

You can extract result sets from read-only stored procedures alongside tables.

Only use this for stored procedures that return data (like parameterized views). SPs that perform INSERT, UPDATE, or other side effects should not be used here.

Configuration

{
  "kind": "dlt_extract",
  "source": "sqlserver",
  "source_config": {
    "tables": ["dbo.orders"],
    "stored_procedures": [
      {
        "name": "dbo.GetOrderHistory",
        "params": { "start_date": "2026-01-01", "customer_id": 123 },
        "resource_name": "order_history"
      }
    ]
  }
}

Fields

FieldRequiredDescription
nameYesFull SP name with schema (e.g., dbo.GetOrderHistory)
paramsNoParameters to pass to the SP
resource_nameYesName for the resource in bronze (e.g., order_history)

Limitations

  • Single result set only: If the SP returns multiple result sets, only the first is captured
  • Read-only only: No side-effect validation; use responsibly
  • No incremental support: SP results always use write_disposition='replace'
  • Parameters must be JSON-serializable: Complex types like binary are not supported

Example: SP-Only Extraction

You can extract from stored procedures without any tables:

{
  "source_config": {
    "stored_procedures": [
      { "name": "dbo.GetDailyStats", "resource_name": "daily_stats" },
      { "name": "reports.CustomerSummary", "params": { "year": 2026 }, "resource_name": "customer_summary" }
    ]
  }
}

Progress Estimation

Before extracting each table, VAI Flux runs SELECT COUNT(*) to estimate row counts. This provides:

  • Estimated rows: Approximate rows to be extracted (filtered by watermark for incremental)
  • Estimated chunks: Based on chunk_size (default 100,000 rows per chunk)

This appears in the action run progress for visibility into long-running extractions.

Progress estimation is not available for stored procedures (row count cannot be determined before execution).

Troubleshooting

  • Authentication failed: Verify username/password and that SQL Authentication is enabled on the server.
  • Database not found / access denied: Confirm database name and grant the user permissions.
  • Permission denied: The user must be allowed to connect and run SELECT queries.
  • TLS handshake errors: Toggle encrypt and/or trust_server_certificate, and verify the server certificate chain.
  • Driver errors: Confirm the ODBC driver is installed and available to the worker runtime.

On this page