
Check that an upgrade can be performed on an existing cluster without data loss (cycling demo) #752

@razvan

Description

Test the upgrade of the cycling demo from 25.3 to 25.7 works and the data is preserved.

Summary

  • The demo needs to be patched to give the HBase master more memory.
  • Upgrading the SDP works.
  • HBase fails to start because the 25.3 image and the 25.7 configurations are incompatible.
  • Deleting and recreating the HBase cluster fixes it.
  • The cycling-tripdata table contains the same data.
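The "same data" check could be made repeatable by dumping the table before and after the upgrade and comparing the dumps. The following is a minimal sketch, not the procedure used in this test: the file names `before.txt` and `after.txt` and the helper names are hypothetical, and the capture command in the comment assumes the HBase shell's non-interactive mode is available in the image.

```shell
# Sketch only: file names and the capture method are assumptions.
# A dump could be captured with something like:
#   echo "scan 'cycling-tripdata'" | \
#     kubectl exec -i hbase-master-default-0 -- bin/hbase shell -n > before.txt

# normalize_dump FILE: drop the timing line ("Took ... seconds") and sort the rest,
# so that timing and row order do not affect the comparison.
normalize_dump() {
  grep -v '^Took ' "$1" | sort
}

# same_dump BEFORE AFTER: succeeds when both dumps contain the same lines.
same_dump() {
  [ "$(normalize_dump "$1" | cksum)" = "$(normalize_dump "$2" | cksum)" ]
}
```

Usage would be `same_dump before.txt after.txt && echo "data preserved"`.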

Protocol

Test SDP release upgrade with the cycling demo

Install the 25.3 demo version

❯ stackablectl demo install --release 25.3 hbase-hdfs-load-cycling-data

HBase shell errors out

❯ kubectl exec -it hbase-master-default-0 -- bin/hbase shell
command terminated with exit code 137

Fixed by increasing the HBase memory limit from 1Gi to 2Gi, after which it worked:

hbase:001:0> describe 'cycling-tripdata'
Table cycling-tripdata is ENABLED
cycling-tripdata, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}
COLUMN FAMILIES DESCRIPTION
...
{NAME => 'started_at', INDEX_BLOCK_ENCODING => 'NONE', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536 B (64KB)'}

12 row(s)
Quota is disabled
Took 0.7271 seconds
hbase:002:0>
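Exit code 137 is 128 + 9, meaning the process received SIGKILL, which is consistent with the container exceeding its memory limit. The memory increase described above could be applied with a merge patch along these lines. This is a sketch under assumptions: the cluster resource name `hbase` is inferred from the pod name, the field path `spec.masters.config.resources.memory.limit` is assumed to match the HbaseCluster CRD in use, and `memory_patch` is a hypothetical helper.

```shell
# memory_patch LIMIT: print a JSON merge patch that sets the master memory limit.
# The field path is an assumption about the HbaseCluster CRD; verify it with
# "kubectl explain hbasecluster.spec.masters.config.resources" before applying.
memory_patch() {
  printf '{"spec":{"masters":{"config":{"resources":{"memory":{"limit":"%s"}}}}}}\n' "$1"
}

# Possible application (resource name "hbase" is assumed):
#   kubectl patch hbasecluster hbase --type merge -p "$(memory_patch 2Gi)"
```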

Uninstall the 25.3 operators

demos on  main [$] took 1m42s
❯ stackablectl release uninstall 25.3

Uninstalled release "25.3"

Use "stackablectl release list" to list available releases.

Patch the CRDs (commands copied from the release notes)

Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/airflow-operator/25.7.0/deploy/helm/airflow-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "airflowclusters.airflow.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/authenticationclasses.authentication.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/s3connections.s3.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/s3buckets.s3.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/druid-operator/25.7.0/deploy/helm/druid-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "druidclusters.druid.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/hbaseclusters.hbase.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/hdfsclusters.hdfs.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/hive-operator/25.7.0/deploy/helm/hive-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "hiveclusters.hive.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/kafka-operator/25.7.0/deploy/helm/kafka-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "kafkaclusters.kafka.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/listenerclasses.listeners.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/listeners.listeners.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/podlisteners.listeners.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/nifi-operator/25.7.0/deploy/helm/nifi-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "nificlusters.nifi.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/opa-operator/25.7.0/deploy/helm/opa-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "opaclusters.opa.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/secretclasses.secrets.stackable.tech replaced
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/secret-operator/25.7.0/deploy/helm/secret-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "truststores.secrets.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/truststores.secrets.stackable.tech created
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/stackabletech/secret-operator/25.7.0/deploy/helm/secret-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "secretclasses.secrets.stackable.tech" already exists
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/25.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "sparkapplications.spark.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/25.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "sparkhistoryservers.spark.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/25.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "sparkconnectservers.spark.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/sparkapplications.spark.stackable.tech created
customresourcedefinition.apiextensions.k8s.io/sparkhistoryservers.spark.stackable.tech created
customresourcedefinition.apiextensions.k8s.io/sparkconnectservers.spark.stackable.tech created
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/superset-operator/25.7.0/deploy/helm/superset-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "supersetclusters.superset.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/superset-operator/25.7.0/deploy/helm/superset-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "druidconnections.superset.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/trino-operator/25.7.0/deploy/helm/trino-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "trinoclusters.trino.stackable.tech" not found
Error from server (NotFound): error when replacing "https://raw.githubusercontent.com/stackabletech/trino-operator/25.7.0/deploy/helm/trino-operator/crds/crds.yaml": customresourcedefinitions.apiextensions.k8s.io "trinocatalogs.trino.stackable.tech" not found
customresourcedefinition.apiextensions.k8s.io/zookeeperclusters.zookeeper.stackable.tech replaced
customresourcedefinition.apiextensions.k8s.io/zookeeperznodes.zookeeper.stackable.tech replaced
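With this much output it is easy to miss which CRDs were actually replaced or created. A small tally over a saved copy of the output can help with triage. This is only a sketch: `summarize_crd_log` is a hypothetical helper, and it classifies lines purely by the text patterns visible in the output above.

```shell
# summarize_crd_log: read kubectl output on stdin and count outcomes per kind.
summarize_crd_log() {
  awk '
    / replaced$/        { replaced++ }
    / created$/         { created++ }
    /\(NotFound\)/      { notfound++ }
    /\(AlreadyExists\)/ { exists++ }
    END {
      printf "replaced=%d created=%d notfound=%d alreadyexists=%d\n",
             replaced, created, notfound, exists
    }
  '
}
```

Usage would be `summarize_crd_log < crd-patch.log`, where `crd-patch.log` is an assumed file holding the saved output.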

Install SDP release 25.7

stackablectl release install 25.7

The HBase pods crash-loop.

Master logs

2025-07-24T09:43:36,934 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.NumberFormatException: For input string: "${HBASE_SERVICE_PORT}"
    at java.lang.NumberFormatException.forInputString(Unknown Source) ~[?:?]
    at java.lang.Integer.parseInt(Unknown Source) ~[?:?]
    at java.lang.Integer.parseInt(Unknown Source) ~[?:?]
    at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1534) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:1270) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.MasterRpcServices.<init>(MasterRpcServices.java:424) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:737) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:670) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:474) ~[hbase-server-2.6.1.jar:2.6.1]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:?]
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3403) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:248) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3423) ~[hbase-server-2.6.1.jar:2.6.1]
2025-07-24T09:43:37,018 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3412) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:248) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140) ~[hbase-server-2.6.1.jar:2.6.1]
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3423) ~[hbase-server-2.6.1.jar:2.6.1]
Caused by: java.lang.NumberFormatException: For input string: "${HBASE_SERVICE_PORT}"
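The exception shows that the literal string `${HBASE_SERVICE_PORT}` reached `Integer.parseInt`, so the placeholder was never substituted before HBase read its configuration. One way to spot such leftovers is to search the rendered configuration files for `${...}` patterns. The helper below is a hypothetical sketch, and the path in the usage comment is an assumption about where the configuration is mounted.

```shell
# find_unresolved FILE...: print lines (with line numbers) that still contain
# a ${NAME} style placeholder.
find_unresolved() {
  grep -n '\${[A-Za-z_][A-Za-z0-9_]*}' "$@"
}

# Possible usage inside the pod (path is assumed, adjust as needed):
#   find_unresolved /stackable/conf/hbase-site.xml
```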

Patched the HBase StatefulSet to update the image, but got this error:

+ HBASE_ROLE_NAME=master
+ HBASE_ROLE_SERVICE_PORT=hbase-master-default.default.svc.cluster.local
+ HBASE_PORT_NAME=16000
/stackable/hbase/bin/hbase-entrypoint.sh: line 19: $4: unbound variable
stream closed EOF for default/hbase-master-default-0 (hbase) 
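The trace suggests the script runs with `set -u` and was called with fewer positional arguments than it expects; the values assigned above also look shifted by one position (a hostname in the port variable, a port number in the port-name variable). This would fit a StatefulSet that still passes the argument list of the old release to the new image, though that is an inference from the log. The mechanism can be reproduced with a stand-in function; the argument names below are hypothetical and not taken from the real entrypoint.

```shell
# Hypothetical stand-in for an entrypoint that expects four positional arguments.
# The body runs in a subshell so that a failure does not exit the calling shell.
entrypoint() (
  set -u                # treat references to unset parameters as errors
  role="$1"
  service_host="$2"
  port="$3"
  port_name="$4"        # aborts here when only three arguments are passed
  echo "$role $service_host $port $port_name"
)

# entrypoint master some-host 16000 master   -> succeeds
# entrypoint master some-host 16000          -> fails on the unset fourth argument
```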

Deleting the stacklet and recreating it with the same HBase version worked.
