Creating a New Source Integration in Go
This guide walks through building a source integration in Go using the CloudQuery SDK. As a running example, we reference the xkcd integration which fetches comic data from the xkcd API.
Before starting, make sure you’re familiar with CloudQuery core concepts and have completed the Getting Started guide.
Prerequisites:
- Go installed (Go Tutorial, A Tour of Go)
- CloudQuery CLI installed
Scaffold a New Integration
The cq-scaffold tool generates a new Go source integration with all the boilerplate. Download it from the releases page, or install via Homebrew on macOS:
brew install cloudquery/tap/scaffold

Create a new integration (replace <org> and <name> with your GitHub org and integration name):
cq-scaffold source <org> <name>
cd cq-source-<name>
go mod tidy

The scaffold tool only generates Go source integrations. For other languages, see the Python, JavaScript, or Java guides.
Project Structure
Here’s the structure of the xkcd integration, which is representative of a typical Go source integration:
plugins/source/xkcd/
├── main.go # Entry point
├── go.mod # SDK dependency (plugin-sdk/v4)
├── plugin/
│ └── plugin.go # Name, version, kind, team constants
├── client/
│ ├── client.go # Client struct (implements schema.ClientMeta)
│ ├── spec.go # Configuration spec
│ └── testing.go # Mock client constructor used in table unit tests
├── internal/xkcd/
│ └── xkcd.go # HTTP client + Comic struct
└── resources/
├── plugin/
│ ├── plugin.go # Creates plugin via plugin.NewPlugin()
│ └── client.go # Configure function, Sync, Tables
└── services/
├── comic.go # Table definition + resolver
└── comic_test.go # TestsA CloudQuery integration has several distinct components. Here’s what each part does and how they fit together:
- main.go: the entry point. It creates the integration and starts serving it over gRPC. You rarely need to modify this file.
- plugin/: defines constants like the integration name, version, team, and kind (source). These identify your integration on the CloudQuery Hub.
- client/: the Client is a struct that stores everything your resolvers need: an authenticated API client, configuration values, a logger, etc. Every resolver receives the Client so it can make API calls. The Spec is a struct matching the user’s YAML configuration. It defines what settings your integration accepts (API keys, endpoints, concurrency, etc.).
- internal/xkcd/ (or internal/<api_name>/): your raw API client code. This is where you make HTTP calls to the third-party API, handle authentication headers, parse responses, and define the response structs. Keeping this separate from the CloudQuery-specific code means you can test and reuse it independently.
- resources/plugin/: the Configure function lives here. It’s called once when a sync starts: it parses the user’s spec, creates the API client, sets up the scheduler, and returns a plugin.Client that the SDK uses to run the sync.
- resources/services/: one file per table. Each file defines a table (the name, columns, and how they map to your API response struct) and a resolver (the function that actually calls the API and sends results back to CloudQuery). The resolver is the heart of each table: it’s where you make API calls, handle pagination, and stream results to the destination.
How It All Connects
Before reading the code, it helps to understand the flow of what happens when a user runs cloudquery sync:
- The CLI starts your integration as a separate process (or connects to it over gRPC if you’re running it locally)
- Your main.go creates the integration and starts the gRPC server
- The CLI sends the user’s spec configuration to your Configure function
- Configure parses the spec, validates it, creates an authenticated API client, and returns a plugin.Client
- The CLI asks your integration for its list of tables, then for each table, calls the table’s resolver
- Each resolver fetches data from the API and sends results over a channel. The SDK handles writing them to the destination.
This flow means your main implementation work is in two places: the Configure function (parsing configuration and creating the API client) and the resolvers (fetching data from the API).
Entry Point
The main.go creates and serves the integration. This is boilerplate that you rarely need to modify. It wires together the serve package and your integration:
package main
import (
"context"
"log"
"github.com/cloudquery/plugin-sdk/v4/serve"
plugin "github.com/<org>/cq-source-<name>/resources/plugin"
)
func main() {
p := serve.Plugin(plugin.Plugin())
if err := p.Serve(context.Background()); err != nil {
log.Fatalf("failed to serve plugin: %v", err)
}
}

Note that main.go imports from resources/plugin — the package that creates the full integration with Sync, Tables, and Close methods. The top-level plugin/ directory only holds name, version, and kind constants. This is a common point of confusion in the project layout.
Integration Setup
Both resources/plugin/plugin.go and resources/plugin/client.go live in the same Go package (package plugin). The plugin.go file creates the integration; client.go holds the SDK-facing client struct and all the methods the SDK calls at sync time.
resources/plugin/plugin.go wires the constants from plugin/ to the SDK:
// resources/plugin/plugin.go
package plugin
import (
internalPlugin "github.com/<org>/cq-source-<name>/plugin"
"github.com/cloudquery/plugin-sdk/v4/plugin"
)
func Plugin() *plugin.Plugin {
return plugin.NewPlugin(
internalPlugin.Name,
internalPlugin.Version,
Configure,
plugin.WithKind(internalPlugin.Kind),
plugin.WithTeam(internalPlugin.Team),
)
}

resources/plugin/client.go defines the SDK-facing client struct and the three methods the SDK calls during a sync. This is distinct from client/client.go (which implements schema.ClientMeta and is used inside resolvers):
// resources/plugin/client.go
package plugin
import (
"context"
"encoding/json"
"fmt"
"github.com/cloudquery/plugin-sdk/v4/message"
"github.com/cloudquery/plugin-sdk/v4/plugin"
"github.com/cloudquery/plugin-sdk/v4/scheduler"
"github.com/cloudquery/plugin-sdk/v4/schema"
"github.com/cloudquery/plugin-sdk/v4/state"
"github.com/cloudquery/plugin-sdk/v4/transformers"
"github.com/rs/zerolog"
"github.com/<org>/cq-source-<name>/client"
"github.com/<org>/cq-source-<name>/internal/yourapi" // placeholder for your internal API client package
"github.com/<org>/cq-source-<name>/resources/services"
)
// Client implements plugin.Client for a source integration.
// It embeds plugin.UnimplementedDestination to satisfy the full
// plugin.Client interface without providing write methods.
type Client struct {
logger zerolog.Logger
config client.Spec
tables schema.Tables
scheduler *scheduler.Scheduler
services *yourapi.Client
plugin.UnimplementedDestination
}
func (c *Client) Sync(ctx context.Context, options plugin.SyncOptions, res chan<- message.SyncMessage) error {
tt, err := c.tables.FilterDfs(options.Tables, options.SkipTables, options.SkipDependentTables)
if err != nil {
return err
}
stateClient, err := state.NewConnectedClient(ctx, options.BackendOptions)
if err != nil {
return err
}
defer stateClient.Close()
schedulerClient := client.New(c.logger, c.config, c.services, stateClient)
if err := c.scheduler.Sync(ctx, schedulerClient, tt, res,
scheduler.WithSyncDeterministicCQID(options.DeterministicCQID)); err != nil {
return fmt.Errorf("failed to sync: %w", err)
}
return stateClient.Flush(ctx)
}
func (c *Client) Tables(_ context.Context, options plugin.TableOptions) (schema.Tables, error) {
return c.tables.FilterDfs(options.Tables, options.SkipTables, options.SkipDependentTables)
}
func (*Client) Close(_ context.Context) error { return nil }

getTables() builds the table list once at startup, applies transformers, and injects the standard CloudQuery columns (_cq_id, _cq_source_name, _cq_sync_time):
func getTables() schema.Tables {
tables := []*schema.Table{
services.ComicsTable(),
}
if err := transformers.TransformTables(tables); err != nil {
panic(err)
}
for _, t := range tables {
schema.AddCqIDs(t)
}
return tables
}

Configuration & Authentication
The SDK passes the user’s spec block from their YAML configuration as raw JSON bytes to your Configure function. Define a Spec struct and unmarshal it:
// client/spec.go
package client
import "fmt"
type Spec struct {
AccessToken string `json:"access_token"`
Concurrency int `json:"concurrency"`
}
func (s *Spec) SetDefaults() {
if s.Concurrency == 0 {
s.Concurrency = 100
}
}
func (s *Spec) Validate() error {
if s.AccessToken == "" {
return fmt.Errorf("access_token is required")
}
return nil
}

Configure is the constructor the SDK calls once per sync. It has two distinct code paths: a fast path for when the CLI only needs metadata (no live connection), and the normal path that creates a real API client and scheduler:
// resources/plugin/client.go (continued)
func Configure(_ context.Context, logger zerolog.Logger, specBytes []byte, opts plugin.NewClientOptions) (plugin.Client, error) {
if opts.NoConnection {
// Called when the CLI needs table schema without connecting,
// e.g. for documentation generation or --no-migrate.
return &Client{
logger: logger.With().Str("module", "<name>").Logger(),
tables: getTables(),
}, nil
}
var spec client.Spec
if err := json.Unmarshal(specBytes, &spec); err != nil {
return nil, fmt.Errorf("failed to unmarshal spec: %w", err)
}
spec.SetDefaults()
if err := spec.Validate(); err != nil {
return nil, err
}
apiClient, err := yourapi.NewClient(spec.AccessToken)
if err != nil {
return nil, fmt.Errorf("failed to create API client: %w", err)
}
return &Client{
logger: logger.With().Str("module", "<name>").Logger(),
config: spec,
scheduler: scheduler.NewScheduler(
scheduler.WithLogger(logger),
scheduler.WithConcurrency(spec.Concurrency),
),
services: apiClient,
tables: getTables(),
}, nil
}

Users configure authentication in their YAML file. The CLI automatically resolves environment variable references:
spec:
access_token: "${YOUR_API_TOKEN}"
concurrency: 50

For public APIs that don’t require authentication (like xkcd), omit access_token from the Spec struct and remove its validation from Validate().
Define a Table
A table in CloudQuery represents a collection of related data, typically one API resource type. In Go, you define a table as a function returning a *schema.Table. Each table needs three things: a name (which becomes the database table name), a transformer (which maps your Go struct fields to columns), and a resolver (the function that fetches data from the API).
Rather than listing columns manually, the SDK can auto-map fields from a Go struct using transformers.TransformWithStruct. If an existing Go SDK already provides a struct for the API response, you can use it directly. Otherwise, define your own struct matching the API’s JSON response. Here’s the actual xkcd comics table:
package services
import (
"github.com/cloudquery/plugin-sdk/v4/schema"
"github.com/cloudquery/plugin-sdk/v4/transformers"
// plus your internal package defining Comic, e.g.:
// "github.com/<org>/cq-source-xkcd/internal/xkcd"
)
func ComicsTable() *schema.Table {
return &schema.Table{
Name: "xkcd_comics",
Resolver: fetchComics,
Transform: transformers.TransformWithStruct(
&xkcd.Comic{},
transformers.WithPrimaryKeys("Num"),
),
}
}

Notice that we don’t list individual columns. TransformWithStruct inspects the Comic struct and creates a column for each exported field. The WithPrimaryKeys("Num") option marks the Num field as the primary key. The final table name xkcd_comics will appear directly as a database table when synced.
The Comic struct defines the columns (from internal/xkcd/xkcd.go):
type Comic struct {
Month string `json:"month"`
Num int `json:"num"`
Link string `json:"link"`
Year string `json:"year"`
News string `json:"news"`
SafeTitle string `json:"safe_title"`
Transcript string `json:"transcript"`
Alt string `json:"alt"`
Img string `json:"img"`
Title string `json:"title"`
Day string `json:"day"`
}Each struct field becomes a column in the destination table. The SDK maps Go types to appropriate database types (e.g. string → text, int → integer). The json tags determine how the struct is serialized but don’t affect column names. Column names are derived from the Go field names, converted to snake_case.
Write a Table Resolver
The resolver is the heart of your integration: it’s the function that actually calls the third-party API and sends results back to CloudQuery. The resolver signature has four arguments, each serving a specific purpose:
func fetchComics(ctx context.Context, meta schema.ClientMeta, parent *schema.Resource, res chan<- any) error

- ctx: a standard Go context, used for cancellation. If a user stops a sync, this context is cancelled, so your resolver should respect it in long-running loops.
- meta: your Client struct (cast it with meta.(*client.Client)). This gives you access to the API client, credentials, and any shared state.
- parent: for top-level tables, this is nil. For child tables (e.g. fetching commits for a specific repository), this contains the parent row so you can extract the parent’s ID.
- res: a channel where you send your results. Each item you send becomes a row in the destination table.
Here’s the xkcd resolver. It fetches the latest comic to determine the total count, then iterates through all comics by ID:
func fetchComics(ctx context.Context, meta schema.ClientMeta, parent *schema.Resource, res chan<- any) error {
c := meta.(*client.Client)
latest, err := c.XKCD.GetLatestComic(ctx)
if err != nil {
return err
}
res <- latest
for i := 1; i < latest.Num; i++ {
comic, err := c.XKCD.GetComic(ctx, i)
if err != nil {
return err
}
res <- comic
}
return nil
}

A few important things to note about this code:
- We send each comic to the res channel as soon as we get it. This is important. The SDK streams items to the destination immediately, so don’t collect everything into a slice first. Streaming keeps memory usage low and gets data to the user’s database faster.
- You can send items one at a time or as a slice. The SDK handles both.
- If an API call fails, return the error. The SDK will log it and report it to the user. Any items you’ve already sent to res before the error are still written to the destination, so partial results are preserved.
- The struct you send (Comic) must match the struct used in TransformWithStruct. That’s how the SDK knows which fields map to which columns.
The Client
The Client struct is where you store everything that resolvers need to access: API clients, credentials, configuration, and any other shared state. Every resolver receives it via the meta argument. The Client lives in client/client.go and must implement the schema.ClientMeta interface, which requires an ID() method:
type Client struct {
Logger zerolog.Logger
XKCD *xkcd.Client
Backend state.Client
}
func (c *Client) ID() string {
return "xkcd"
}

The ID() method serves two purposes: it identifies the client in log messages, and the SDK uses it internally to track which multiplexed client is running. For a small integration like xkcd, a static string is fine. For multiplexed integrations (e.g. one that syncs multiple AWS accounts), you’d include the account name so each client has a unique ID.
The Client is created inside the Configure function and passed to the SDK, which then provides it to every resolver. This is the main way your integration’s initialization code communicates with its resolvers.
Test Locally
Start the integration as a gRPC server for debugging:
go run main.go serve

Or build and run as a local binary:
go build
./cq-source-<name> serve

Then sync using the appropriate registry. See Testing Locally for configuration examples and Running Locally for full details.
Advanced: Column Resolvers
Most of the time, TransformWithStruct handles column mapping automatically. But sometimes you need a column that doesn’t come directly from the API response: maybe it’s derived from other fields, or requires an additional API call. In these cases, you can add extra columns with their own resolver functions.
For example, imagine we want to add an is_good boolean column to the xkcd comics table that doesn’t exist in the API response. We add it to the Columns field alongside the auto-generated columns from Transform:
func ComicsTable() *schema.Table {
return &schema.Table{
Name: "xkcd_comics",
Resolver: fetchComics,
Transform: transformers.TransformWithStruct(&xkcd.Comic{}),
Columns: []schema.Column{
{
Name: "is_good",
Type: arrow.FixedWidthTypes.Boolean,
Resolver: resolveComicIsGood,
},
},
}
}
func resolveComicIsGood(ctx context.Context, meta schema.ClientMeta, resource *schema.Resource, c schema.Column) error {
comic := resource.Item.(xkcd.Comic)
return resource.Set(c.Name, strings.Contains(comic.Title, "xkcd"))
}

The column resolver receives the current row via resource.Item. You cast it to your struct type, compute the value, and set it with resource.Set(). As big fans of meta-jokes, we define only comics with "xkcd" in the title to be good. These custom columns appear alongside the auto-generated columns from TransformWithStruct.
Advanced: Multiplexing
For our xkcd integration, multiplexing isn’t necessary. There’s only one xkcd API with no accounts or organizations. But many real-world integrations need to fetch data for multiple entities. For example, a GitHub integration that syncs repositories for multiple organizations needs to make separate API calls per org. Without multiplexing, these would run sequentially. With multiplexing, they run in parallel.
A multiplexer is a function that takes the base client and returns a slice of clients, one per entity. The SDK calls your table resolver once for each client in the slice:
func AccountMultiplex(meta schema.ClientMeta) []schema.ClientMeta {
client := meta.(*Client)
l := make([]schema.ClientMeta, 0, len(client.accounts))
for _, acc := range client.accounts {
l = append(l, client.WithAccount(acc))
}
return l
}

Then set Multiplex: client.AccountMultiplex on tables that need it. Make sure the client’s ID() method returns a unique value per multiplexed entity:
func (c *Client) ID() string {
return fmt.Sprintf("myplugin:%s", c.Account)
}

Inside the resolver, you can then access the current account via client.Account to make the right API calls.
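Stripped of the SDK types, multiplexing is just cloning the base client once per entity and giving each clone a unique ID. A self-contained sketch (the Client fields and the WithAccount helper are illustrative, not SDK code):

```go
package main

import "fmt"

// Client carries an Account so each multiplexed copy is distinct.
type Client struct {
	accounts []string
	Account  string
}

// WithAccount returns a shallow copy bound to one account.
func (c *Client) WithAccount(acc string) *Client {
	child := *c
	child.Account = acc
	return &child
}

// ID must be unique per multiplexed client so the SDK can
// schedule and log each one independently.
func (c *Client) ID() string {
	return fmt.Sprintf("myplugin:%s", c.Account)
}

// AccountMultiplex fans the base client out into one client
// per account; the SDK would call the resolver once per client.
func AccountMultiplex(base *Client) []*Client {
	out := make([]*Client, 0, len(base.accounts))
	for _, acc := range base.accounts {
		out = append(out, base.WithAccount(acc))
	}
	return out
}

func main() {
	base := &Client{accounts: []string{"dev", "prod"}}
	for _, c := range AccountMultiplex(base) {
		fmt.Println(c.ID())
	}
	// myplugin:dev
	// myplugin:prod
}
```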
Advanced: Incremental Tables
By default, every sync fetches all data from scratch. For small APIs this is fine, but for APIs with millions of records that rarely change, re-fetching everything is wasteful. Incremental tables solve this by remembering where the last sync left off (using a cursor) and only fetching new data on subsequent syncs.
To make a table incremental, you need to mark it as such and designate one column as the incremental key (the cursor):
func Items() *schema.Table {
return &schema.Table{
Name: "hackernews_items",
Resolver: fetchItems,
IsIncremental: true,
Transform: transformers.TransformWithStruct(&hackernews.Item{}),
Columns: []schema.Column{
{
Name: "id",
Type: arrow.PrimitiveTypes.Int64,
PrimaryKey: true,
IncrementalKey: true,
},
},
}
}

In the resolver, use the state backend to persist the cursor:
func fetchItems(ctx context.Context, meta schema.ClientMeta, _ *schema.Resource, res chan<- any) error {
c := meta.(*client.Client)
tableName := Items().Name
// Load cursor from last sync (empty string on the first sync)
value, err := c.Backend.GetKey(ctx, tableName)
if err != nil {
return err
}
// ... fetch data starting from the cursor in `value` ...
// Save the new cursor after processing
if err := c.Backend.SetKey(ctx, tableName, strconv.Itoa(newCursor)); err != nil {
return err
}
return c.Backend.Flush(ctx) // Must flush to persist
}

See Managing Incremental Tables for the full guide, and the Hacker News integration for the complete working example.
Troubleshooting
go run main.go serve fails with a compile error
The most common cause is an import path mismatch. Make sure main.go imports from resources/plugin (the package with the Plugin() function), not the top-level plugin/ directory (which only holds constants). See the Entry Point section for the correct import path.
cloudquery sync fails with connection refused
Your integration isn’t running, or the port in your config doesn’t match. Make sure go run main.go serve is running in a separate terminal, and that the address in your YAML (localhost:7777) matches the address shown in the server output.
failed to validate spec error on sync
Your Spec’s Validate() method returned an error. Check that all required fields are present in your YAML config and that environment variable references like ${MY_API_TOKEN} are set in the shell where you’re running cloudquery sync.
Resolver is never called / zero rows synced
The table is probably filtered out. Check that the table name listed in your YAML tables: field matches the name returned by your ComicsTable() function exactly. If you’re using tables: ["*"], make sure the table is included in the list returned by getTables() in resources/plugin/client.go.
Resolver runs but sends no rows to the res channel
Add a log statement inside the resolver to verify it’s executing and that your API call is returning data. Check that you’re actually sending to res — a common mistake is building a slice and forgetting to send its elements.
Sync works locally but re-fetches everything on the second run
You have an incremental table but forgot to call c.Backend.Flush(ctx) after saving the cursor. Without Flush, the cursor is never persisted and each sync starts from scratch. See Advanced: Incremental Tables.
Common Pitfalls
Avoid these common mistakes when building Go integrations:
- Don’t batch results in memory. Send items to the res channel as soon as they’re available. Don’t collect all pages into a slice and send them at the end. This wastes memory and delays writes to the destination.
- Fetch concurrently when the API allows it. A sequential loop is the simplest starting point, but for large datasets use golang.org/x/sync/errgroup with a concurrency limit so you don’t overwhelm the API. See the xkcd integration for a working example.
- Always call Backend.Flush(ctx) for incremental tables. If you skip this, your cursor won’t persist and the next sync will re-fetch everything.
- Make ID() unique per multiplexed client. If two multiplexed clients return the same ID(), the SDK won’t parallelize them correctly.
- Return errors from resolvers. Don’t silently swallow API errors. Return them so the SDK can log them and surface them to the user.
- Respect context cancellation. Check ctx.Done() in long-running loops so the user can cancel a sync cleanly.
Publishing
Visit Publishing an Integration to the Hub for release instructions.
Real-World Examples
- xkcd: starter integration referenced in this tutorial
- Hacker News: incremental tables with state backend
- Kubernetes: large-scale integration with many tables and mock tests
- PostgreSQL Destination: “unmanaged” destination that handles batching itself
- BigQuery Destination: “managed” destination with per-table batching
- All integrations
Next Steps
Once your integration is working locally:
- Publish to the Hub: make your integration available to others
- Add tests: see comic_test.go in the xkcd integration for a testing pattern
- Add incremental tables: use the state backend for large datasets that don’t change much between syncs
- Add multiplexing: parallelize fetching if your integration supports multiple accounts or regions
- Build a destination: see the Go Destination guide to write an integration that receives and stores data
Resources
- CloudQuery Community
- Go SDK Source Code (plugin-sdk/v4)
- How to Write a CloudQuery Source Integration (video; may reference older SDK patterns, so use this guide for current code)