Skip to content

Conversation

@GutoVeronezi
Copy link
Contributor

Description

On StatsCollector, it limits secondary storages capacity to 90%; therefore, operators do not have option to choose which threshold they want/need.

This PR intends to externalize the secondary storage capacity threshold configuration on StatsCollector (secondary.storage.capacity.threshold), validating if it is between 0 and 100% (0.00 and 1.00).

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functioanlity)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

It has been tested locally on a test lab.

  1. I enabled a zone;
  2. I created a template and observed the log. It will log messages according to the configuration set.

Comment on lines 1634 to 1639
s_logger.warn(String.format("Invalid [%s] configuration: value set [%s] is [%s]. Assuming %s as secondary storage capacity threshold.",
secondaryStorageCapacityThreshold.key(),
thresholdConfig,
thresholdConfig < MIN_STORAGE_SECONDARY_CAPACITY_THRESHOLD ? String.format("lower than '%s'", MIN_STORAGE_SECONDARY_CAPACITY_THRESHOLD) : String.format("bigger than '%s'", MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD),
MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD));
return MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes you wish for a bounds checker on entry, 👍

thresholdConfig,
thresholdConfig < MIN_STORAGE_SECONDARY_CAPACITY_THRESHOLD ? String.format("lower than '%s'", MIN_STORAGE_SECONDARY_CAPACITY_THRESHOLD) : String.format("bigger than '%s'", MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD),
MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD));
return MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should revert to default, i.e. 0.9

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it too and came to a possible resolution:

  • When the config is bigger than max, I think we should return max (as it is top limit);
  • When the config is lower than min, then we use default;

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sound like a plan

Copy link
Contributor

@sureshanaparti sureshanaparti May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GutoVeronezi these value limits are controlled/validated in the configuration manager. you can add this new config to 'weightBasedParametersForValidation' in configuration manager. check here:

private void weightBasedParametersForValidation() {

you can simply return threshold config value here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@GutoVeronezi GutoVeronezi force-pushed the externalize-secondary-storage-capacity-threshold branch from 3fc2702 to 3ec2ed6 Compare March 11, 2021 18:36
Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cltgm

Copy link
Member

@GabrielBrascher GabrielBrascher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. I did not test it, though.

@GabrielBrascher
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 440

@GutoVeronezi
Copy link
Contributor Author

@DaanHoogland @GabrielBrascher is there anything else to do?

@rohityadavcloud
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 88

return true;
}

s_logger.warn(String.format("Image storage [%s] has not enough capacity. Capacity: total=[%s], used=[%s], threshold=[%s%%].", imageStoreId,readableTotalCapacity, readableUsedCapacity, threshold * 100));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s_logger.warn(String.format("Image storage [%s] has not enough capacity. Capacity: total=[%s], used=[%s], threshold=[%s%%].", imageStoreId,readableTotalCapacity, readableUsedCapacity, threshold * 100));
s_logger.warn(String.format("Image storage [%s] has not enough capacity.", imageStoreId));

capacity/used bytes and threshold already logged above, may not be required here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be interesting to log here too to not depend on another log line; Also we could lose the information in some situations.

"URI to send StatsCollector statistics to. The collector is defined on the URI scheme. Example: graphite://graphite-hostaddress:port or influxdb://influxdb-hostaddress/dbname. Note that the port is optional, if not added the default port for the respective collector (graphite or influxdb) will be used. Additionally, the database name '/dbname' is also optional; default db name is 'cloudstack'. You must create and configure the database if using influxdb.",
true);
private static final ConfigKey<Double> secondaryStorageCapacityThreshold = new ConfigKey<Double>("Advanced", Double.class, "secondary.storage.capacity.threshold", "0.90",
"Secondary storage capacity threshold (1 = 100%).", true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GutoVeronezi can you move this config to 'CapacityManager' ? and use it wherever applicable.

also, update StorageOrchestrator class with this config, it is hardcoded here:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private static final String INFLUXDB_HOST_MEASUREMENT = "host_stats";
private static final String INFLUXDB_VM_MEASUREMENT = "vm_stats";

private static final double MIN_STORAGE_SECONDARY_CAPACITY_THRESHOLD = 0.00;
Copy link
Contributor

@sureshanaparti sureshanaparti May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static final double MIN_STORAGE_SECONDARY_CAPACITY_THRESHOLD = 0.00;
private static final double MIN_SECONDARY_STORAGE_CAPACITY_THRESHOLD = 0.00;

Note: this is not required, If you use configuration framework.

private static final String INFLUXDB_VM_MEASUREMENT = "vm_stats";

private static final double MIN_STORAGE_SECONDARY_CAPACITY_THRESHOLD = 0.00;
private static final double MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD = 1.00;
Copy link
Contributor

@sureshanaparti sureshanaparti May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static final double MAX_STORAGE_SECONDARY_CAPACITY_THRESHOLD = 1.00;
private static final double MAX_SECONDARY_STORAGE_CAPACITY_THRESHOLD = 1.00;

Note: this is not required, If you use configuration framework.

private static final ConfigKey<String> statsOutputUri = new ConfigKey<String>("Advanced", String.class, "stats.output.uri", "",
"URI to send StatsCollector statistics to. The collector is defined on the URI scheme. Example: graphite://graphite-hostaddress:port or influxdb://influxdb-hostaddress/dbname. Note that the port is optional, if not added the default port for the respective collector (graphite or influxdb) will be used. Additionally, the database name '/dbname' is also optional; default db name is 'cloudstack'. You must create and configure the database if using influxdb.",
true);
private static final ConfigKey<Double> secondaryStorageCapacityThreshold = new ConfigKey<Double>("Advanced", Double.class, "secondary.storage.capacity.threshold", "0.90",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description for most of the percentage based configs clear indicate that values should be between 0 and 1, and same is being validated in the configuration manager. can you search code with string 'Percentage (as a value between 0 and 1) ' and update the config description as appropriate. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@GutoVeronezi
Copy link
Contributor Author

GutoVeronezi commented May 31, 2021

@sureshanaparti Thank you for the review.

As you suggested, I added the config to the validation in ConfigurationManagerImpl. However, I had to change the config type from Double to Float (not that it is a problem) because the validation only works on Float configs.
I tested some of they and realize that those with Double type are not validated. I think we need some refactors in this validation, in another PR. What do you think?

@DaanHoogland
Copy link
Contributor

@sureshanaparti are you satisfied?
do we need more testing? cc @borisstoyanov @vladimirpetrov

@GutoVeronezi
Copy link
Contributor Author

@sureshanaparti @DaanHoogland @GabrielBrascher is there anything else to do?

1 similar comment
@GutoVeronezi
Copy link
Contributor Author

@sureshanaparti @DaanHoogland @GabrielBrascher is there anything else to do?

@sureshanaparti
Copy link
Contributor

@sureshanaparti Thank you for the review.

As you suggested, I added the config to the validation in ConfigurationManagerImpl. However, I had to change the config type from Double to Float (not that it is a problem) because the validation only works on Float configs.
I tested some of they and realize that those with Double type are not validated. I think we need some refactors in this validation, in another PR. What do you think?

@GutoVeronezi agree, these validations can be refactored in another PR.

@sureshanaparti
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@sureshanaparti a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 533

@sureshanaparti
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-1250)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47948 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4790-t1250-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 87 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_invalid_upgrade_kubernetes_cluster Failure 3615.86 test_kubernetes_clusters.py
test_02_deploy_and_upgrade_kubernetes_cluster Failure 3616.17 test_kubernetes_clusters.py
test_03_deploy_and_scale_kubernetes_cluster Failure 0.06 test_kubernetes_clusters.py
test_04_basic_lifecycle_kubernetes_cluster Failure 0.04 test_kubernetes_clusters.py
test_05_delete_kubernetes_cluster Failure 0.04 test_kubernetes_clusters.py
test_07_deploy_kubernetes_ha_cluster Failure 0.04 test_kubernetes_clusters.py
test_08_deploy_and_upgrade_kubernetes_ha_cluster Failure 0.04 test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster Failure 0.04 test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown Error 94.79 test_kubernetes_clusters.py

@DaanHoogland DaanHoogland added this to the 4.16.0.0 milestone Jul 16, 2021
@DaanHoogland DaanHoogland merged commit cbe380a into apache:main Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants