
SetNoDataValue performance regression #568

@pomadchin


While preparing #567 (notebook), I noticed the following:

```python
import pyrasterframes  # adds spark.read.raster and DataFrameWriter.geotiff
from pyrasterframes.rasterfunctions import rf_hillshade, rf_with_no_data

# Assumes `spark` is a RasterFrames-enabled session and `assets` is a catalog
# DataFrame with a "band" column (from the #567 notebook); any DataFrame with rasters works
rs = spark.read.raster(assets.limit(1), tile_dimensions=(512, 512), buffer_size=2, catalog_col_names=["band"])

# Set the Landsat 8 NoData value to zero
rsnd = rs.select(rf_with_no_data(rs.band, 0).alias("band"))

# Save a hillshade raster to disk as a GeoTIFF (NoData set via rf_with_no_data)
rsnd \
  .limit(1) \
  .select(rf_hillshade(rsnd.band, azimuth=315, altitude=45, z_factor=1, target="data")) \
  .write.geotiff("lc8-hillshade.tiff", "EPSG:32718")

# Save a hillshade raster to disk as a GeoTIFF (without NoData set)
rs \
  .limit(1) \
  .select(rf_hillshade(rs.band, azimuth=315, altitude=45, z_factor=1)) \
  .write.geotiff("lc8-hillshade-all.tiff", "EPSG:32718")
```

Using rf_with_no_data makes the computation roughly 600× slower than running the same pipeline without it.

With rf_with_no_data: [screenshot not preserved]

Vanilla (without rf_with_no_data): [screenshot not preserved]
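To make the ~600× figure easier to reproduce, a small timing harness could compare the two pipelines directly. This is a minimal sketch, not part of RasterFrames: `median_runtime` and `slowdown_ratio` are hypothetical helpers, and it assumes each pipeline is wrapped in a zero-argument callable that triggers a Spark action (for example, one of the `write.geotiff` calls above, writing to a fresh output path on each run). Taking the median over several runs reduces the influence of one-off warm-up costs such as JVM start-up or the first read of the source files.

```python
import statistics
import time
from typing import Callable


def median_runtime(action: Callable[[], object], repeats: int = 3) -> float:
    """Run `action` `repeats` times and return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()  # e.g. a Spark action such as a geotiff write (assumption)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slowdown_ratio(baseline: Callable[[], object],
                   candidate: Callable[[], object],
                   repeats: int = 3) -> float:
    """Return how many times slower `candidate` is than `baseline` (median of each)."""
    base = median_runtime(baseline, repeats)
    cand = median_runtime(candidate, repeats)
    return cand / base


if __name__ == "__main__":
    # Stand-in workloads; replace them with the vanilla and rf_with_no_data pipelines.
    fast = lambda: sum(range(10_000))
    slow = lambda: sum(range(1_000_000))
    print(f"slowdown: {slowdown_ratio(fast, slow):.1f}x")
```

A ratio computed this way only reflects end-to-end wall-clock time; the Spark UI stage timings remain the better place to see which stage accounts for the difference.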
