When you use min_ram or cpu_count resource hints for pipeline steps that don't require accelerators, Auto VM Selection (Instance Flexibility) is enabled automatically. With Auto VM Selection, workers are provisioned from a curated list of machine types that meet your RAM and CPU requirements. For more information, see Auto VM Selection for worker machine types.
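As a rough illustration, the two hints named above can be supplied as pipeline options. The sketch below assumes the Apache Beam Python SDK's `--resource_hints` option and its `name=value` format; the helper `resource_hint_flags` and the values `16GB` and `4` are hypothetical and chosen only for the example. These flags apply to the whole pipeline; per-step hints are set on individual transforms in the SDK and are not shown here.

```python
def resource_hint_flags(min_ram=None, cpu_count=None):
    """Build --resource_hints flags (assumed Beam Python option format)."""
    flags = []
    if min_ram is not None:
        flags.append(f"--resource_hints=min_ram={min_ram}")
    if cpu_count is not None:
        flags.append(f"--resource_hints=cpu_count={cpu_count}")
    return flags


# Hypothetical values; pick ones that match your workload.
flags = resource_hint_flags(min_ram="16GB", cpu_count=4)
print(" ".join(flags))
```

The resulting list can be appended to the other command-line arguments used to launch the job.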
Dataflow support for the C4A machine series of Arm processors is now generally available. Arm-based VMs are optimized for power efficiency and can provide improved price-performance for many workloads. For more information, see Use Arm VMs on Dataflow.
Dataflow Managed I/O now supports rolling upgrades for streaming jobs. With this feature, Dataflow upgrades your Managed I/O connectors in running pipelines as new connector versions become available. For more information, see Automatic upgrades.
Dataflow is available in the Bangkok (asia-southeast3) region. Learn more about Google Cloud locations.
Dataflow now provides a notice before the Dataflow Runner v2 container image of a streaming pipeline is upgraded. To use a new image and avoid the scheduled maintenance, launch a replacement job before the upgrade. For more information, see Runner v2 harness update.
Dataflow now supports speculative execution for batch pipelines. This feature mitigates the impact of slow-running tasks (stragglers) by launching a redundant execution of these tasks. The first task to finish is used, and the other is canceled, which can improve the overall completion time of your pipeline. This feature is generally available. For more information, see Use speculative execution to avoid stragglers.
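The idea of speculative execution can be sketched outside Dataflow with Python's standard library. This is a conceptual illustration only, not how the Dataflow service implements the feature: the function names, the `straggler_after` threshold, and the sleep durations are all hypothetical, and the two copies are given different durations solely to simulate one worker being slow.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulated_task(seconds):
    """Stand-in for a unit of work: sleeps, then returns its duration."""
    time.sleep(seconds)
    return seconds


def run_speculatively(task, primary_arg, backup_arg, straggler_after):
    """Run task; if it is still running after `straggler_after` seconds,
    start a redundant copy and use whichever copy finishes first."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        primary = pool.submit(task, primary_arg)
        done, _ = wait([primary], timeout=straggler_after)
        if done:
            return "primary", primary.result()
        # The primary looks like a straggler: launch a redundant execution.
        backup = pool.submit(task, backup_arg)
        done, pending = wait([primary, backup], return_when=FIRST_COMPLETED)
        for future in pending:
            future.cancel()  # best effort; a running thread is not interrupted
        winner = primary if primary in done else backup
        return ("primary" if winner is primary else "backup"), winner.result()
    finally:
        pool.shutdown(wait=False)


# The primary copy is slow (0.6 s); the backup simulates a healthy worker.
winner, elapsed = run_speculatively(simulated_task, 0.6, 0.05, straggler_after=0.1)
print(winner, elapsed)
```

Because both copies perform the same work, this pattern is only safe when the task is idempotent or its output is committed exactly once.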
For jobs that use GPUs, Dataflow now supports the flex-start provisioning model. The flex-start provisioning model can improve your ability to get access to constrained GPU resources for short-duration workloads. This feature is available in Preview and is for batch pipelines only. For more information, see Configure a provisioning model.
Dataflow now supports using secure tags to set firewall rules on worker VMs. For more information, see Use secure tags with Dataflow.
Dataflow supports TPUs, Google's custom-designed AI accelerators that are optimized for large-scale AI/ML workloads. This feature lets you accelerate inference workloads on frameworks like PyTorch, JAX, and TensorFlow. This feature is generally available with an allowlist. For more information, see Dataflow support for TPUs.
Dataflow supports specifically targeted reservations for pipelines using accelerators (GPUs or TPUs). This functionality is generally available with an allowlist. For more information, see Use Compute Engine reservations with Dataflow.
Dataflow supports NVIDIA® H100 and NVIDIA® H100 Mega GPU types. For more information, see Dataflow support for GPUs.
Dataflow Runner v2 fixes an issue that could cause data discrepancies when using splittable DoFns, particularly when processing large datasets as side inputs. This fix ensures that all data is accurately processed and transmitted within the pipeline. This improvement is available in recent Dataflow service releases, and is automatically enabled when using Dataflow Runner v2.
Note: After this fix, pipelines that previously experienced data loss due to this issue might consume more resources (such as CPU, memory, and processing time) because more data is being processed. This increase in resource usage is expected and reflects the correct behavior of the pipeline.
Dataflow now automatically detects performance bottlenecks in streaming jobs. You can see the cause of the bottleneck in the Step Info panel to help with troubleshooting.
For more information, see Troubleshoot bottlenecks.
Dataflow now supports an automated parallel update workflow for streaming jobs. This feature helps minimize disruption by launching a new replacement job that runs in parallel with the existing job. After a duration that you specify, the old job is automatically drained.
For more information, see Run parallel pipelines.
Dataflow now supports right fitting for streaming jobs. Right fitting lets you specify resource requirements for an entire pipeline or for specific pipeline steps. Previously, right fitting was only supported for batch pipelines. For more information, see Streaming right fitting.
StreamingMode is added (2f22244)
bugs is added to message .google.dataflow.v1beta3.SdkVersion (2f22244)
data_sampling is added to message .google.dataflow.v1beta3.DebugOptions (2f22244)
default_streaming_mode is added to message .google.dataflow.v1beta3.TemplateMetadata (2f22244)
default_value is added to message .google.dataflow.v1beta3.ParameterMetadata (2f22244)
disk_size_gb is added to message .google.dataflow.v1beta3.RuntimeEnvironment (2f22244)
dynamic_destinations is added to message .google.dataflow.v1beta3.PubsubLocation (2f22244)
enable_launcher_vm_serial_port_logging is added to message .google.dataflow.v1beta3.FlexTemplateRuntimeEnvironment (2f22244)
enum_options is added to message .google.dataflow.v1beta3.ParameterMetadata (2f22244)
group_name is added to message .google.dataflow.v1beta3.ParameterMetadata (2f22244)
hidden_ui is added to message .google.dataflow.v1beta3.ParameterMetadata (2f22244)
image_repository_cert_path is added to message .google.dataflow.v1beta3.ContainerSpec (2f22244)
image_repository_password_secret_id is added to message .google.dataflow.v1beta3.ContainerSpec (2f22244)
image_repository_username_secret_id is added to message .google.dataflow.v1beta3.ContainerSpec (2f22244)
name is added to message .google.dataflow.v1beta3.ListJobsRequest (2f22244)
parent_name is added to message .google.dataflow.v1beta3.ParameterMetadata (2f22244)
parent_trigger_values is added to message .google.dataflow.v1beta3.ParameterMetadata (2f22244)
runtime_updatable_params is added to message .google.dataflow.v1beta3.Job (2f22244)
satisfies_pzi is added to message .google.dataflow.v1beta3.Job (2f22244)
service_resources is added to message .google.dataflow.v1beta3.Job (2f22244)
step_names_hash is added to message .google.dataflow.v1beta3.PipelineDescription (2f22244)
straggler_info is added to message .google.dataflow.v1beta3.WorkItemDetails (2f22244)
straggler_summary is added to message .google.dataflow.v1beta3.StageSummary (2f22244)
streaming_mode is added to message .google.dataflow.v1beta3.Environment (2f22244)
streaming_mode is added to message .google.dataflow.v1beta3.FlexTemplateRuntimeEnvironment (2f22244)
streaming_mode is added to message .google.dataflow.v1beta3.RuntimeEnvironment (2f22244)
streaming is added to message .google.dataflow.v1beta3.TemplateMetadata (2f22244)
supports_at_least_once is added to message .google.dataflow.v1beta3.TemplateMetadata (2f22244)
supports_exactly_once is added to message .google.dataflow.v1beta3.TemplateMetadata (2f22244)
trie is added to message .google.dataflow.v1beta3.MetricUpdate (2f22244)
update_mask is added to message .google.dataflow.v1beta3.UpdateJobRequest (2f22244)
use_streaming_engine_resource_based_billing is added to message .google.dataflow.v1beta3.Environment (2f22244)
user_display_properties is added to message .google.dataflow.v1beta3.JobMetadata (2f22244)
DataSamplingConfig is added (2f22244)
HotKeyDebuggingInfo is added (2f22244)
ParameterMetadataEnumOption is added (2f22244)
RuntimeUpdatableParams is added (2f22244)
SdkBug is added (2f22244)
ServiceResources is added (2f22244)
Straggler is added (2f22244)
StragglerInfo is added (2f22244)
StragglerSummary is added (2f22244)
StreamingStragglerInfo is added (2f22244)
job,update_mask is added to method UpdateJob in service JobsV1Beta3 (2f22244)
BIGQUERY_TABLE is added to enum ParameterType (2f22244)
BOOLEAN is added to enum ParameterType (2f22244)
ENUM is added to enum ParameterType (2f22244)
GO is added to enum Language (2f22244)
JAVASCRIPT_UDF_FILE is added to enum ParameterType (2f22244)
KAFKA_READ_TOPIC is added to enum ParameterType (2f22244)
KAFKA_TOPIC is added to enum ParameterType (2f22244)
KAFKA_WRITE_TOPIC is added to enum ParameterType (2f22244)
KMS_KEY_NAME is added to enum ParameterType (2f22244)
MACHINE_TYPE is added to enum ParameterType (2f22244)
NUMBER is added to enum ParameterType (2f22244)
SERVICE_ACCOUNT is added to enum ParameterType (2f22244)
WORKER_REGION is added to enum ParameterType (2f22244)
WORKER_ZONE is added to enum ParameterType (2f22244)
JobState is changed (2f22244)
WorkerIPAddressConfiguration is changed (2f22244)
JOB_VIEW_ALL in enum JobView is changed (2f22244)
additional_experiments in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
additional_user_labels in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
bypass_temp_dir_validation in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
capabilities in message .google.dataflow.v1beta3.SdkHarnessContainerImage is changed (2f22244)
current_state in message .google.dataflow.v1beta3.Job is changed (2f22244)
dataset in message .google.dataflow.v1beta3.Environment is changed (2f22244)
debug_options in message .google.dataflow.v1beta3.Environment is changed (2f22244)
dump_heap_on_oom in message .google.dataflow.v1beta3.FlexTemplateRuntimeEnvironment is changed (2f22244)
dynamic_template in message .google.dataflow.v1beta3.LaunchTemplateRequest is changed (2f22244)
enable_hot_key_logging in message .google.dataflow.v1beta3.DebugOptions is changed (2f22244)
enable_streaming_engine in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
environment in message .google.dataflow.v1beta3.Job is changed (2f22244)
flex_resource_scheduling_goal in message .google.dataflow.v1beta3.Environment is changed (2f22244)
gcs_path in message .google.dataflow.v1beta3.DynamicTemplateLaunchParams is changed (2f22244)
gcs_path in message .google.dataflow.v1beta3.LaunchTemplateRequest is changed (2f22244)
id in message .google.dataflow.v1beta3.Job is changed (2f22244)
ip_configuration in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
job_name in message .google.dataflow.v1beta3.LaunchTemplateParameters is changed (2f22244)
kms_key_name in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
launch_parameters in message .google.dataflow.v1beta3.LaunchTemplateRequest is changed (2f22244)
location in message .google.dataflow.v1beta3.Job is changed (2f22244)
machine_type in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
max_workers in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
name in message .google.dataflow.v1beta3.Job is changed (2f22244)
network in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
num_workers in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
project_id in message .google.dataflow.v1beta3.Job is changed (2f22244)
requested_state in message .google.dataflow.v1beta3.Job is changed (2f22244)
save_heap_dumps_to_gcs_path in message .google.dataflow.v1beta3.FlexTemplateRuntimeEnvironment is changed (2f22244)
service_account_email in message .google.dataflow.v1beta3.Environment is changed (2f22244)
service_account_email in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
service_kms_key_name in message .google.dataflow.v1beta3.Environment is changed (2f22244)
service_options in message .google.dataflow.v1beta3.Environment is changed (2f22244)
set in message .google.dataflow.v1beta3.MetricUpdate is changed (2f22244)
subnetwork in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
temp_location in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
transform_name_mapping in message .google.dataflow.v1beta3.Job is changed (2f22244)
type in message .google.dataflow.v1beta3.Job is changed (2f22244)
worker_region in message .google.dataflow.v1beta3.Environment is changed (2f22244)
worker_region in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
worker_zone in message .google.dataflow.v1beta3.Environment is changed (2f22244)
worker_zone in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
zone in message .google.dataflow.v1beta3.RuntimeEnvironment is changed (2f22244)
DynamicTemplateLaunchParams is changed (2f22244)
Job is changed (2f22244)
JobExecutionStageInfo is changed (2f22244)
JobMetrics is changed (2f22244)
LaunchTemplateParameters is changed (2f22244)
MetricUpdate is changed (2f22244)
SdkHarnessContainerImage is changed (2f22244)
Step is changed (2f22244)
AggregatedListJobs in service JobsV1Beta3 is changed (2f22244)
CreateJob in service JobsV1Beta3 is changed (2f22244)
CreateJobFromTemplate in service TemplatesService is changed (2f22244)
GetTemplate in service TemplatesService is changed (2f22244)
LaunchTemplate in service TemplatesService is changed (2f22244)
ListJobs in service JobsV1Beta3 is changed (2f22244)
FlexTemplatesService is changed (2f22244)
Dataflow now supports data lineage. Data lineage lets you track how data moves through your systems. This feature is generally available (GA). For more information, see Use data lineage in Dataflow.
Dataflow is now available in Stockholm (europe-north2).
Managed I/O now supports automatic upgrades for supported I/O connectors. Using this feature, Dataflow pipelines automatically use the latest reliable version of the connector. This feature is generally available (GA). For more information, see Dataflow managed I/O.
Dataflow is available in Queretaro, Mexico (northamerica-south1). Learn more about Google Cloud locations.
You can now use the Dataflow job builder UI to create and run Dataflow pipelines in the Google Cloud console, without writing any code. This feature is generally available (GA).
The remote code execution vulnerability, CVE-2024-6387, in OpenSSH has been mitigated. A patched Dataflow VM image that includes an updated OpenSSH is available. For more information about how to apply mitigations, see the GCP-2024-040 security bulletin.
A remote code execution vulnerability, CVE-2024-6387, was recently discovered in OpenSSH. Dataflow jobs might create VMs that use an OS image with versions of OpenSSH that are vulnerable to CVE-2024-6387. For more information, see the GCP-2024-040 security bulletin.
Dataflow batch jobs are now cancelled after 10 days. Previously, they were cancelled after 30 days. For more information, see Quotas and limits.
Dataflow SQL is deprecated. As of July 31, 2024, you can't access Dataflow SQL in the Google Cloud console. As of January 31, 2025, you can't use Dataflow SQL in the Google Cloud CLI. As a replacement, use Beam SQL.
Iceberg read/write support is available through the new Managed I/O Java API. For more information, see Dataflow managed I/O.
You can now use Metrics Explorer to find individual DoFns that cause latencies in streaming jobs. These metrics are available in streaming pipelines that use Apache Beam 2.53.0 and later versions. The following new metrics are available:
job/dofn_latency_average
job/dofn_latency_max
job/dofn_latency_min
job/dofn_latency_num_messages
job/oldest_active_message_age
job/dofn_latency_total
For more information about Dataflow metrics, see Google Cloud metrics.
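To query these metrics programmatically rather than in Metrics Explorer, you would typically build a Cloud Monitoring filter string. The sketch below is an assumption-laden illustration: the `dataflow.googleapis.com/` metric prefix, the `dataflow_job` resource type, and the `job_name` resource label are assumed from Cloud Monitoring naming conventions rather than taken from this note, and the helper `metric_filter` and the job name `my-streaming-job` are hypothetical.

```python
# Metric names as listed in the release note above.
DOFN_METRICS = [
    "job/dofn_latency_average",
    "job/dofn_latency_max",
    "job/dofn_latency_min",
    "job/dofn_latency_num_messages",
    "job/oldest_active_message_age",
    "job/dofn_latency_total",
]

# Assumed prefix for Dataflow metric types in Cloud Monitoring.
METRIC_PREFIX = "dataflow.googleapis.com/"


def metric_filter(metric, job_name=None):
    """Build a Monitoring filter string for one metric (assumed syntax)."""
    parts = [
        f'metric.type = "{METRIC_PREFIX}{metric}"',
        'resource.type = "dataflow_job"',
    ]
    if job_name is not None:
        parts.append(f'resource.labels.job_name = "{job_name}"')
    return " AND ".join(parts)


for name in DOFN_METRICS:
    print(metric_filter(name, job_name="my-streaming-job"))
```

Each printed string is intended for use as the filter of a time series query; verify the metric type and label names against the metrics list before relying on them.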
Dataflow no longer supports the NVIDIA Tesla K80 GPU type. For a list of supported GPU types, see Dataflow support for GPUs.