Skip to content

Conversation

@GabrielBrascher
Copy link
Member

Description

Numbers are great for computers, however, we (humans) deal better with names. Logging host's ID is good in terms of short messages; however, admins must know all hosts by their ID or look via DB/API queries to figure out which host the log message is pointing to.

This PR proposes an admin friendly approach for log messages by logging the host name instead of its ID.

An example of how it works currently log messages can be painful in large environments:

Operation Failed vm's original host id: 334 new host id: 300 host id before state transition: 127
Vs.
Operation Failed vm's original host: node23.clusterA new host: node09.clusterA host before state transition: node01.clusterA

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

@GabrielBrascher GabrielBrascher added this to the 4.16.0.0 milestone Jan 8, 2021
@GabrielBrascher GabrielBrascher self-assigned this Jan 8, 2021
@GabrielBrascher GabrielBrascher changed the title Enhance log messages with hostName Enhance log messages with host name Jan 8, 2021
@GabrielBrascher GabrielBrascher marked this pull request as draft January 12, 2021 00:48
Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GabrielBrascher can we use something like host-name(ID: host_uuid)? Personally I found it easier at times to debug issues with uuid in logs

@blueorangutan package

@GabrielBrascher
Copy link
Member Author

GabrielBrascher commented Jan 14, 2021

@shwstppr That is something that I can add. Maybe this is something that we can discuss (on the mailing list or a Github issue) in order to find conventions that best fit CloudStack log messages.

Nowadays we do not have any convention.

@RodrigoDLopez
Copy link
Contributor

Hey @shwstppr what do you suggest?

I totally agree with @GabrielBrascher, normally when we are talking about a host we use their names to refer to it, I never met someone who uses UUID for this purpose.
Maybe, in some cases, the UUID turn easy to find these hosts on the database, but with a simple change on the query, we have the same result.
And for me, something like

Operation Failed vm's original host: node23.clusterA (ID: 98c37930-599f-11eb-ae93-0242ac130002) new host: node09.clusterA (ID: bdb22ea8-599f-11eb-ae93-0242ac130002) host before state transition: node01.clusterA (ID: bdb22598-599f-11eb-ae93-0242ac130002)

it is too verbose and does not add anything to the debug factor.
Presenting only the hostname in the log makes it easy to identify, as it is the way we use it to reference it daily. And with the name, we can find the UUID if necessary.
Don't you agree?

@shwstppr
Copy link
Contributor

@RodrigoDLopez Hi, I don't have any issue in using just the names of host if we have consensus on that.
I liked the idea of having a discussion on ML as that will allow forming a convention project wide

@ravening
Copy link
Member

@GabrielBrascher can we use something like host-name(ID: host_uuid)? Personally I found it easier at times to debug issues with uuid in logs

@blueorangutan package

But this will display uuid to users also right which doesnt make sense to them

@blueorangutan
Copy link

@ravening a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2553

@sureshanaparti
Copy link
Contributor

@GabrielBrascher can we use something like host-name(ID: host_uuid)? Personally I found it easier at times to debug issues with uuid in logs
@blueorangutan package

But this will display uuid to users also right which doesnt make sense to them

For logging (and debugging), I think it is better to use either ID / UUID (with/without name) as the name is not unique (multiple hosts can have the same name, across the cloudstack env).

@rafaelweingartner
Copy link
Member

What about adding all of them? I mean, ID, Name, and UUID. This would for sure help debugging. It would have the ID of the DB already there, and the UUID of the element that is used in the API. Moreover, the name to make the log entry more friendly.

Something such as:

s_logger.debug(String.format("Create ClusteredDirectAgentAttache for host [id=%s, uuid=%s, name=%s] ",  host.getId(), host.getUuid(), host.getName()));

@GabrielBrascher
Copy link
Member Author

@rafaelweingartner @sureshanaparti having ID, UUID, and the name sounds like a good way. Might be a bit verbose, but it serves well all the preferences.

@GabrielBrascher GabrielBrascher marked this pull request as ready for review February 19, 2021 13:31
@GabrielBrascher
Copy link
Member Author

Applied changes in order to use the toString() method from HostVO. These changes consider the proposal from PR #4706 where toString() would present a "JSON like" String.

  • How it is today: Host[-123-Routing] (yep ... nowadays logs that use the toString have - before the ID)
  • Proposal from Improve logs on KVMHostActivityChecker #4706: Host [{id: "123", name: "hostName", uuid: "1e374e50-g231-4345-d6h3-123abcdrfas143", type="Routing"}]

The idea is that it would be good for:

  1. Debug via DB filtering with the host ID
  2. Run commands on API/cloudmonkey by copying the uuid presented at the log
  3. Identify the host quickly as name makes logs human-friendly
  4. Name also tends to be resolved to the host address; therefore, admins can jump into the host (SSH) by copying the name from the log.

Ping for discussion/review: @rafaelweingartner @sureshanaparti @shwstppr @RodrigoDLopez @DaanHoogland @rhtyd @wido @ustcweizhou @svenvogel @andrijapanicsb and others ...

@weizhouapache
Copy link
Member

@GabrielBrascher it would be good to add host name in the output.

ip is not very important as we normally ssh into server using hostname, not ip.

diff --git a/engine/schema/src/main/java/com/cloud/host/HostVO.java b/engine/schema/src/main/java/com/cloud/host/HostVO.java
index 1c3d7c4267..a78b940142 100644
--- a/engine/schema/src/main/java/com/cloud/host/HostVO.java
+++ b/engine/schema/src/main/java/com/cloud/host/HostVO.java
@@ -677,7 +677,7 @@ public class HostVO implements Host {

     @Override
     public String toString() {
-        return new StringBuilder("Host[").append("-").append(id).append("-").append(type).append("]").toString();
+        return new StringBuilder("Host[").append(name).append("-").append(id).append("-").append(type).append("]").toString();
     }

     public void setHypervisorType(HypervisorType hypervisorType) {

@GabrielBrascher
Copy link
Member Author

@weizhouapache Sorry if I was not clear, but PR #4706 is adding the host's name indeed, IP address would not be interesting indeed.

The idea is exactly what you mentioned. We will then log host ID, UUID, and Name (Type will also be logged the host.toString is used for non-routing hosts as well).

This PR also uses the toString in order to have a standard Host log. As PR #4706 already proposes an interesting approach, I added it on this proposal as well.

@GabrielBrascher
Copy link
Member Author

@blueorangutan package

@GabrielBrascher
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@nvazquez
Copy link
Contributor

nvazquez commented Jul 1, 2021

@blueorangutan package

@blueorangutan
Copy link

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 442

@nvazquez
Copy link
Contributor

nvazquez commented Jul 1, 2021

@blueorangutan test

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Copy link
Contributor

@GutoVeronezi GutoVeronezi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job @GabrielBrascher.

CLGTM, I raised one last point about toString.

@blueorangutan
Copy link

Trillian test result (tid-1188)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 58897 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4575-t1188-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_storage_policy.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_usage.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_router_nics.py
Smoke tests completed. 81 look OK, 7 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestVMWareStoragePolicies>:setup Error 0.00 test_storage_policy.py
test_02_create_template_with_checksum_sha1 Error 65.36 test_templates.py
test_03_create_template_with_checksum_sha256 Error 65.36 test_templates.py
test_04_create_template_with_checksum_md5 Error 65.38 test_templates.py
test_05_create_template_with_no_checksum Error 65.38 test_templates.py
test_04_extract_template Failure 128.34 test_templates.py
ContextSuite context=TestISOUsage>:setup Error 0.00 test_usage.py
test_01_volume_usage Failure 786.45 test_usage.py
test_10_attachAndDetach_iso Failure 1510.13 test_vm_life_cycle.py
test_06_download_detached_volume Failure 426.80 test_volumes.py
ContextSuite context=TestVPCRedundancy>:setup Error 0.00 test_vpc_redundant.py
ContextSuite context=TestVPCNics>:setup Error 0.00 test_vpc_router_nics.py

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 457

@blueorangutan
Copy link

Trillian test result (tid-1194)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33154 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4575-t1194-kvm-centos7.zip
Smoke tests completed. 88 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@nvazquez
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@nvazquez
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✖️ debian. SL-JID 522

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 523

@nvazquez
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian Build Failed (tid-1237)

@nvazquez
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-1238)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42963 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4575-t1238-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_diagnostics.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 87 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 536.06 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 464.38 test_vpc_redundant.py

Copy link
Contributor

@nvazquez nvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nvazquez
Copy link
Contributor

Merging based on approvals and test results

@nvazquez nvazquez merged commit ca78f5b into apache:main Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.