Documentation ¶
Overview ¶
Package avroio contains transforms for reading and writing Avro files.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Read ¶
Read reads a set of files and returns their records as a PCollection<elem>, decoded according to the Avro schema embedded in each file. To decode records into your own type, pass reflect.TypeOf(YourType{}) for a struct type with JSON tags. To receive each record as a raw JSON string instead, pass reflect.TypeOf("").
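The two element types described above can be illustrated without a pipeline. The following is a minimal sketch using only the standard library; it does not call avroio and is not a description of how Read is implemented. The User type, its fields, and the decode helper are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// User is a hypothetical record type. The JSON tags map record field
// names onto struct fields.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// The two kinds of type argument described for Read.
var (
	userType   = reflect.TypeOf(User{})
	stringType = reflect.TypeOf("")
)

// decode is a hypothetical helper, not part of avroio. Given one record
// rendered as JSON, it returns the raw string when t is the string type,
// and otherwise unmarshals the record into a new value of type t.
func decode(raw string, t reflect.Type) (interface{}, error) {
	if t.Kind() == reflect.String {
		return raw, nil
	}
	ptr := reflect.New(t) // pointer to a zero value of t
	if err := json.Unmarshal([]byte(raw), ptr.Interface()); err != nil {
		return nil, err
	}
	return ptr.Elem().Interface(), nil
}

func main() {
	record := `{"name":"ada","age":36}`

	typed, err := decode(record, userType)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", typed)

	raw, err := decode(record, stringType)
	if err != nil {
		panic(err)
	}
	fmt.Println(raw)
}
```

The struct form is convenient when downstream transforms need typed fields; the string form avoids committing to a Go type when the records are only passed through.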
func Write ¶
func Write(s beam.Scope, prefix, schema string, col beam.PCollection, opts ...WriteOption)
Write writes a PCollection<string> to one or more Avro files. Each element must be a JSON string that matches the supplied Avro schema. The process fails if an element does not match the schema.
Parameters:
prefix: File path prefix (e.g., "gs://bucket/output")
suffix: File extension (e.g., ".avro"), set with WithSuffix
numShards: Number of output files (0 or 1 for a single file), set with WithNumShards
schema: Avro schema as a JSON string
Files are named <prefix>-<shard>-of-<numShards><suffix>.
Example: output-00000-of-00010.avro
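The naming pattern can be reproduced with a format string. This is a sketch only: shardFileName is a hypothetical helper, not part of avroio, and the five-digit zero padding is an assumption inferred from the documented example.

```go
package main

import "fmt"

// shardFileName is a hypothetical helper. It formats a name following the
// documented pattern <prefix>-<shard>-of-<numShards><suffix>, assuming
// five-digit zero padding as in the documented example.
func shardFileName(prefix string, shard, numShards int, suffix string) string {
	return fmt.Sprintf("%s-%05d-of-%05d%s", prefix, shard, numShards, suffix)
}

func main() {
	// List the names a three-shard write would be expected to produce.
	for shard := 0; shard < 3; shard++ {
		fmt.Println(shardFileName("output", shard, 3, ".avro"))
	}
}
```

Shard indices start at zero, so the last file of an N-shard write carries index N-1.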
Examples:
Write(s, "gs://bucket/output", schema, col) // output-00000-of-00001.avro (defaults)
Write(s, "gs://bucket/output", schema, col, WithSuffix(".avro")) // output-00000-of-00001.avro (explicit)
Write(s, "gs://bucket/output", schema, col, WithNumShards(10)) // output-00000-of-00010.avro (10 shards)
Write(s, "gs://bucket/output", schema, col, WithSuffix(".avro"), WithNumShards(10)) // full control
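Because Write takes elements as JSON strings, a typed value has to be serialized before it reaches the transform. The sketch below shows one way to prepare such a string with the standard library. The userSchema constant, the User type, and the toRecord helper are hypothetical; the code does not call avroio and does not check the record against the schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// userSchema is a hypothetical Avro record schema written as a JSON string,
// the form the schema parameter expects.
const userSchema = `{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}`

// User is a hypothetical type whose JSON tags match the schema field names.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// toRecord is a hypothetical helper that renders a User as a JSON string.
func toRecord(u User) (string, error) {
	b, err := json.Marshal(u)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Only confirms the schema is syntactically valid JSON.
	if !json.Valid([]byte(userSchema)) {
		panic("schema is not valid JSON")
	}
	rec, err := toRecord(User{Name: "ada", Age: 36})
	if err != nil {
		panic(err)
	}
	fmt.Println(rec)
}
```

Keeping the struct tags aligned with the schema field names is what makes the serialized string line up with the schema.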
Types ¶
type WriteOption ¶ added in v2.71.0
type WriteOption func(*writeConfig)
func WithNumShards ¶ added in v2.71.0
func WithNumShards(numShards int) WriteOption
WithNumShards sets the number of output shards (default: 1)
func WithSuffix ¶ added in v2.71.0
func WithSuffix(suffix string) WriteOption
WithSuffix sets the file suffix (default: ".avro")
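WriteOption follows the functional options pattern: each option is a function that mutates a configuration value. The sketch below re-creates the pattern in isolation to show how defaults and overrides combine. The writeConfig field names and the resolve helper are assumptions for illustration; only the option signatures and the documented defaults (".avro", 1) come from this page.

```go
package main

import "fmt"

// writeConfig stands in for the unexported type named in the WriteOption
// signature. Its field names are assumptions.
type writeConfig struct {
	suffix    string
	numShards int
}

// WriteOption mutates a writeConfig, matching the documented signature.
type WriteOption func(*writeConfig)

// WithSuffix returns an option that sets the file suffix.
func WithSuffix(suffix string) WriteOption {
	return func(c *writeConfig) { c.suffix = suffix }
}

// WithNumShards returns an option that sets the number of output shards.
func WithNumShards(numShards int) WriteOption {
	return func(c *writeConfig) { c.numShards = numShards }
}

// resolve is a hypothetical helper: it starts from the documented defaults
// and applies each option in order, so later options win.
func resolve(opts ...WriteOption) writeConfig {
	cfg := writeConfig{suffix: ".avro", numShards: 1}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", resolve())
	fmt.Printf("%+v\n", resolve(WithNumShards(10)))
}
```

The pattern lets Write gain new settings without changing its signature, which is why existing calls that pass no options keep compiling.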