Data Connectors
Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.
Supported Data Connectors include:
| Name | Description | Status | Protocol/Format | 
|---|---|---|---|
| postgres | PostgreSQL, Amazon Redshift | Stable | PostgreSQL wire protocol | 
| mysql | MySQL | Stable | |
| s3 | S3 | Stable | Parquet, CSV, JSON | 
| file | File | Stable | Parquet, CSV, JSON | 
| duckdb | DuckDB | Stable | Embedded | 
| dremio | Dremio | Stable | Arrow Flight | 
| spice.ai | Spice.ai OSS & Cloud | Stable | Arrow Flight | 
| databricks (mode: delta_lake) | Databricks | Stable | S3/Delta Lake | 
| delta_lake | Delta Lake | Stable | Delta Lake | 
| github | GitHub | Stable | GitHub API | 
| graphql | GraphQL | Release Candidate | JSON | 
| databricks (mode: spark_connect) | Databricks | Beta | Spark Connect | 
| flightsql | FlightSQL | Beta | Arrow Flight SQL | 
| mssql | Microsoft SQL Server | Beta | Tabular Data Stream (TDS) | 
| odbc | ODBC | Beta | ODBC | 
| snowflake | Snowflake | Beta | Arrow | 
| spark | Spark | Beta | Spark Connect | 
| iceberg | Apache Iceberg | Beta | Parquet | 
| abfs | Azure BlobFS | Alpha | Parquet, CSV, JSON | 
| ftp,sftp | FTP/SFTP | Alpha | Parquet, CSV, JSON | 
| glue | Glue | Alpha | Iceberg, Parquet, CSV | 
| http,https | HTTP(s) | Alpha | Parquet, CSV, JSON | 
| imap | IMAP | Alpha | IMAP Emails | 
| localpod | Local dataset replication | Alpha | |
| oracle | Oracle | Alpha | Oracle ODPI-C | 
| sharepoint | Microsoft SharePoint | Alpha | Unstructured UTF-8 documents | 
| clickhouse | ClickHouse | Alpha | |
| debezium | Debezium CDC | Alpha | Kafka + JSON | 
| kafka | Kafka | Alpha | Kafka + JSON | 
| dynamodb | DynamoDB | Alpha | |
| mongodb | MongoDB | Alpha | |
| elasticsearch | Elasticsearch | Roadmap | |
Object Store File Formats
For data connectors that are object store compatible, if a folder is provided, the file format must be specified with params.file_format.
If a file is provided, the file format will be inferred, and params.file_format is unnecessary.
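As a minimal sketch of the two cases, the spicepod fragment below defines one dataset from a folder and one from a single file. The bucket name, paths, and dataset names are hypothetical, and the s3:// form of the from field is an assumption that should be checked against the S3 Data Connector documentation.

```yaml
datasets:
  # Folder source: the format cannot be inferred, so file_format is required.
  # (Hypothetical bucket and path.)
  - name: events
    from: s3://my-bucket/events/
    params:
      file_format: parquet

  # Single-file source: the format is inferred from the file itself,
  # so params.file_format is omitted. (Hypothetical local path.)
  - name: report
    from: file:data/report.csv
```

Setting file_format on the single-file dataset as well should be harmless, but it is unnecessary per the rule above.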
File formats currently supported are:
| Name | Parameter | Supported | Is Document Format | 
|---|---|---|---|
| Apache Parquet | file_format: parquet | ✅ | ❌ | 
| CSV | file_format: csv | ✅ | ❌ | 
| Apache Iceberg | file_format: iceberg | Roadmap | ❌ | 
| JSON | file_format: json | Roadmap | ❌ | 
| Microsoft Excel | file_format: xlsx | Roadmap | ❌ | 
| Markdown | file_format: md | ✅ | ✅ | 
| Text | file_format: txt | ✅ | ✅ | 
| PDF | file_format: pdf | Alpha | ✅ | 
| Microsoft Word | file_format: docx | Alpha | ✅ | 
File formats accept additional parameters in params (such as csv_has_header), which are described in File Formats.
If a format is a document format, each file is treated as a document, as described in Document Support below.
Document formats in Alpha (e.g. pdf, docx) may not parse all structure or text from the underlying documents correctly.
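For illustration, a format-specific parameter sits alongside file_format in the same params block. The sketch below uses csv_has_header, the parameter named above; the dataset name, the folder path, and the value shown are assumptions, and the full list of accepted parameters and values is in File Formats.

```yaml
datasets:
  - name: sales_csv             # hypothetical dataset name
    from: file:data/sales/      # hypothetical folder of CSV files
    params:
      file_format: csv          # required because the source is a folder
      csv_has_header: true      # assumed value: treat the first row as column names
```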
Document Support
If a Data Connector supports documents and a document file format is specified (see above), each file is treated as a row in the table, with the file's contents in the content column. Additional columns exist, depending on the data connector.
Example
Consider a local filesystem:
>>> ls -la
total 232
drwxr-sr-x@ 22 jeadie  staff    704 30 Jul 13:12 .
drwxr-sr-x@ 18 jeadie  staff    576 30 Jul 13:12 ..
-rw-r--r--@  1 jeadie  staff   1329 15 Jan  2024 DR-000-Template.md
-rw-r--r--@  1 jeadie  staff   4966 11 Aug  2023 DR-001-Dremio-Architecture.md
-rw-r--r--@  1 jeadie  staff   2307 28 Jul  2023 DR-002-Data-Completeness.md
And the spicepod:
datasets:
  - name: my_documents
    from: file:docs/decisions/
    params:
      file_format: md
A document table will be created, with one row per file:
>>> SELECT * FROM my_documents LIMIT 3
+----------------------------------------------------+--------------------------------------------------+
| location                                           | content                                          |
+----------------------------------------------------+--------------------------------------------------+
| Users/docs/decisions/DR-000-Template.md            | # DR-000: DR Template                            |
|                                                    | **Date:** <>                                     |
|                                                    | **Decision Makers:**                             |
|                                                    | - @<>                                            |
|                                                    | - @<>                                            |
|                                                    | ...                                              |
| Users/docs/decisions/DR-001-Dremio-Architecture.md | # DR-001: Add "Cached" Dremio Dataset            |
|                                                    |                                                  |
|                                                    | ## Context                                       |
|                                                    |                                                  |
|                                                    | We use [Dremio](https://www.dremio.com/) to p... |
| Users/docs/decisions/DR-002-Data-Completeness.md   | # DR-002: Append-Only Data Completeness          |
|                                                    |                                                  |
|                                                    | ## Context                                       |
|                                                    |                                                  |
|                                                    | Our Ethereum append-only dataset is incomple...  |
+----------------------------------------------------+--------------------------------------------------+
Data Connector Docs
Redshift Data Connector: Connect to Amazon Redshift using the PostgreSQL connector in Spice.
Azure BlobFS Data Connector
ClickHouse Data Connector
Databricks Data Connector
Debezium Data Connector
Delta Lake Data Connector
Dremio Data Connector
DuckDB Data Connector
DynamoDB Data Connector
File Data Connector
Flight SQL Data Connector
FTP/SFTP Data Connector
GitHub Data Connector
Glue Data Connector
GraphQL Data Connector
HTTP(s) Data Connector
Iceberg Data Connector: Connect to and query Apache Iceberg tables.
IMAP Data Connector
Kafka Data Connector
Localpod Data Connector
Memory Data Connector
MongoDB Data Connector
Microsoft SQL Server Data Connector
MySQL Data Connector
ODBC Data Connector
Oracle Data Connector
PostgreSQL Data Connector
S3 Data Connector
SharePoint Data Connector
Snowflake Data Connector
Apache Spark Connector
Spice.ai Data Connector
