Google Drive

Connect Google Drive folders using a GCP service account

Google Drive connections use GCP service account credentials (no OAuth) to read files from a shared Drive folder.

There are two main ways to use this connection:

  • Tabular ingest: dlt_extract + source: "google_drive" for CSV, Parquet, or JSONL files → bronze layer
  • File sync: file_sync + source: "google_drive" for documents and other files (PDF, DOCX, PPTX, XLSX, MD, etc.) → artifacts bucket (raw bytes plus optional extracted text)

Prerequisites

  1. Create a service account in Google Cloud.
  2. Enable the Google Drive API for the project.
  3. Create a service account key (JSON).
  4. Share the target Google Drive folder with the service account client email (Viewer is enough).

This connector is service-account only. There is no OAuth flow.

Creating a Connection

  1. Go to Connections → Add Connection
  2. Select Google Drive
  3. Enter the credential fields from your service account key JSON
  4. Click Create, then Verify

Credential Fields

| Field | Description |
| --- | --- |
| client_email | Service account email (from the key JSON) |
| private_key | Service account private key (from the key JSON) |
| project_id | GCP project ID (from the key JSON) |

If you paste the private key with escaped newlines (the literal characters \n rather than real line breaks), VAI normalizes it automatically.

The Key ID shown in the GCP console is not the private key. You must download a JSON key and copy the private_key value from that file.
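As a rough illustration of which values to copy, the sketch below reads a downloaded key JSON and returns only the three fields the connection form asks for. The function name extract_credential_fields is hypothetical, and the newline replacement is only a guess at what "normalizing" means; it is not VAI's actual implementation.

```python
import json

# The three fields the connection form asks for.
REQUIRED_FIELDS = ("client_email", "private_key", "project_id")


def extract_credential_fields(key_json_text: str) -> dict:
    """Return the connection fields from the text of a service account key JSON."""
    key = json.loads(key_json_text)
    missing = [name for name in REQUIRED_FIELDS if not key.get(name)]
    if missing:
        raise ValueError(f"key JSON is missing fields: {', '.join(missing)}")

    # Copy only the required fields; private_key_id (the Key ID) is left out.
    fields = {name: key[name] for name in REQUIRED_FIELDS}

    # Assumption: normalization means turning literal backslash-n pairs
    # into real line breaks.
    fields["private_key"] = fields["private_key"].replace("\\n", "\n")
    return fields
```

Reading the file with open(path).read() and passing the text to this function would give the values to paste into the form.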

Verification

Verification checks that the credentials can access the Drive API and can list a folder.

Common failures:

  • 401 Unauthorized: credentials are invalid (wrong project, email, or key)
  • 403 Forbidden: folder is not shared with the service account email
  • 429 Rate limited: retry after a short delay
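The failure list above can be turned into a small lookup for scripts that automate verification. This is a minimal sketch: the helper name diagnose_verification_failure is hypothetical, and how VAI reports the status code to a caller is not specified here, so the integer input is an assumption.

```python
# Hypothetical helper: maps the documented status codes to a next step.
# Each value is (suggested next step, whether retrying unchanged may succeed).
VERIFICATION_HINTS = {
    401: ("Re-copy client_email, private_key, and project_id from the key JSON.", False),
    403: ("Share the folder with the service account email (Viewer is enough).", False),
    429: ("Wait briefly, then run Verify again.", True),
}


def diagnose_verification_failure(status_code: int) -> tuple[str, bool]:
    """Return a hint and a retry flag for a failed verification."""
    return VERIFICATION_HINTS.get(
        status_code,
        (f"Unexpected status {status_code}; not covered by the documented failures.", False),
    )
```

Only 429 is marked as retryable, because the other two failures need a change to the credentials or the folder sharing before a retry can succeed.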

Usage with Actions

Tabular ingest (CSV/Parquet/JSONL)

Use dlt_extract with source: "google_drive" when you want to ingest tabular files from a Drive folder:

{
  "kind": "dlt_extract",
  "connection": { "kind": "by_slug", "slug": "my-google-drive" },
  "source": "google_drive",
  "source_config": {
    "folder_id": "1AbCDefGhIJkLmNoPqRsTuVwXyZ",
    "file_glob": "**/*.csv",
    "file_type": "csv"
  }
}

Notes

  • folder_id is required and must refer to a folder that is shared with the service account.
  • The only available resource is files.
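If you generate these payloads from code, a small builder can catch a missing folder_id before the action is submitted. The sketch below is illustrative only: build_tabular_ingest is a hypothetical name, the allowed file types are taken from the formats listed at the top of this page, and deriving file_glob from file_type is an assumption rather than a documented rule.

```python
import json

# Formats this page lists for tabular ingest.
TABULAR_FILE_TYPES = ("csv", "parquet", "jsonl")


def build_tabular_ingest(slug: str, folder_id: str, file_type: str) -> dict:
    """Build a dlt_extract payload shaped like the example above."""
    if not folder_id:
        raise ValueError("folder_id is required")
    if file_type not in TABULAR_FILE_TYPES:
        raise ValueError(f"file_type must be one of {TABULAR_FILE_TYPES}")
    return {
        "kind": "dlt_extract",
        "connection": {"kind": "by_slug", "slug": slug},
        "source": "google_drive",
        "source_config": {
            "folder_id": folder_id,
            # Assumption: the glob extension mirrors file_type.
            "file_glob": f"**/*.{file_type}",
            "file_type": file_type,
        },
    }


if __name__ == "__main__":
    payload = build_tabular_ingest("my-google-drive", "1AbCDefGhIJkLmNoPqRsTuVwXyZ", "csv")
    print(json.dumps(payload, indent=2))
```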

Multi-select (folders/files) with per-item cursors

Use multi-select when you want to explicitly sync a set of folders/files by Drive item ID (IDs survive renames/moves).

{
  "kind": "dlt_extract",
  "connection": { "kind": "by_slug", "slug": "my-google-drive" },
  "source": "google_drive",
  "source_config": {
    "folder_id": "root",
    "selected_items": ["0Bxx123FolderId", "0Bxx456FileId"],
    "item_cursors": {
      "0Bxx123FolderId": "modification_date",
      "0Bxx456FileId": null
    },
    "native_export_formats": {
      "docs": "application/pdf",
      "sheets": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
      "slides": "application/pdf"
    }
  }
}

Selection semantics

  • selected_items can include both folder IDs and file IDs.
  • Selecting a folder includes all descendant files (recursively).
  • item_cursors controls incremental behavior per selected item:
    • "modification_date" → incremental
    • null → full resync for that selected item
  • Google-native file types (Docs/Sheets/Slides) are downloaded via the Drive export API using native_export_formats.
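To make the per-item cursor rules concrete, the sketch below groups selected items by the behavior their cursor implies. The function plan_item_sync is hypothetical. What happens to a selected item with no item_cursors entry is not stated on this page, so such items are reported separately instead of being assigned a behavior.

```python
def plan_item_sync(selected_items: list, item_cursors: dict) -> dict:
    """Group selected Drive item IDs by the sync behavior their cursor implies."""
    plan = {"incremental": [], "full_resync": [], "unspecified": []}
    for item_id in selected_items:
        if item_id not in item_cursors:
            # Not covered by the documented semantics; flag it for review.
            plan["unspecified"].append(item_id)
        elif item_cursors[item_id] is None:
            plan["full_resync"].append(item_id)
        elif item_cursors[item_id] == "modification_date":
            plan["incremental"].append(item_id)
        else:
            raise ValueError(
                f"unsupported cursor for {item_id}: {item_cursors[item_id]!r}"
            )
    return plan
```

With the example configuration above, the folder ID would land in the incremental group and the file ID in the full resync group.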

Hard limits

  • 1,000 files max after folder expansion
  • 3 folder levels max (counted from folder_id)
  • 50 GB max total size
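A pre-flight check along these lines can flag a selection that is likely to exceed the limits before a run starts. It is only a sketch: check_hard_limits is a hypothetical helper, the caller supplies each file's folder depth and size, and both the way levels are counted and the use of binary gigabytes (1024³ bytes) are assumptions, since this page does not define them.

```python
MAX_FILES = 1000
MAX_FOLDER_LEVELS = 3
MAX_TOTAL_BYTES = 50 * 1024**3  # Assumption: 1 GB = 1024**3 bytes.


def check_hard_limits(files: list) -> list:
    """Check (folder_level, size_bytes) pairs against the limits.

    Returns a list of human-readable problems; an empty list means no
    limit was exceeded under the assumptions stated above.
    """
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the {MAX_FILES} file limit")

    deepest = max((level for level, _ in files), default=0)
    if deepest > MAX_FOLDER_LEVELS:
        problems.append(f"folder level {deepest} exceeds the {MAX_FOLDER_LEVELS} level limit")

    total_bytes = sum(size for _, size in files)
    if total_bytes > MAX_TOTAL_BYTES:
        problems.append(f"{total_bytes} bytes exceeds the {MAX_TOTAL_BYTES} byte limit")
    return problems
```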

File Sync (Docs + Files)

Use file_sync when you want to sync common Drive documents as raw bytes plus extracted text (when supported).

{
  "kind": "file_sync",
  "connection": { "kind": "by_slug", "slug": "my-google-drive" },
  "source": "google_drive",
  "roots": [{ "kind": "folder", "id": "1AbCDefGhIJkLmNoPqRsTuVwXyZ" }],
  "include_globs": ["**/*"],
  "exclude_globs": ["**/.trash/**"],
  "include_extensions": ["pdf", "md", "docx", "pptx", "xlsx", "csv", "txt"],
  "max_file_size_mb": 50,
  "extract_text": true,
  "dry_run": false,
  "export_google_native": {
    "docs": "text/plain",
    "sheets": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "slides": "application/pdf"
  }
}

Selection semantics

  • roots: one or more folder IDs and/or file IDs
  • include_globs / exclude_globs: matched against a computed path like Root/Subfolder/File.ext
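To preview which computed paths a set of filters would keep, something like the following can help. It is an approximation only: is_selected is a hypothetical helper, and it uses Python's fnmatch, in which * also matches /, so VAI's own glob semantics may differ at the edges. Treating include_extensions as case-insensitive is also an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_selected(path: str, include_globs: list, exclude_globs: list,
                include_extensions: list) -> bool:
    """Approximate whether a computed path like Root/Subfolder/File.ext is kept."""
    # Exclusions win over inclusions.
    if any(fnmatchcase(path, pattern) for pattern in exclude_globs):
        return False
    if not any(fnmatchcase(path, pattern) for pattern in include_globs):
        return False
    # Compare the extension without its leading dot, ignoring case.
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    return extension in {ext.lower() for ext in include_extensions}


if __name__ == "__main__":
    for candidate in ("Root/Subfolder/File.pdf", "Root/.trash/old.pdf", "Root/image.png"):
        print(candidate, is_selected(candidate, ["**/*"], ["**/.trash/**"], ["pdf", "md"]))
```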
