Update eventlogging image for MWCLI
Closed, ResolvedPublic

Description

The eventlogging image for MWCLI seems to be out of date? When visiting pages that have to send events to eventlogging, I get an error like:

{"name":"eventgate-wikimedia","hostname":"364e002c55ca","pid":1,"level":50,"err":{"message":"Failed loading schema at /analytics/mediawiki/accountcreation/account_conversion/1.1.0","name":"EventSchemaLoadError","stack":"EventSchemaLoadError: Failed loading schema at /analytics/mediawiki/accountcreation/account_conversion/1.1.0\n    at /srv/service/node_modules/eventgate/lib/EventValidator.js:232:23\n    at tryCatcher (/srv/service/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (/srv/service/node_modules/bluebird/js/release/promise.js:547:31)\n    at Promise._settlePromise (/srv/service/node_modules/bluebird/js/release/promise.js:604:18)\n    at Promise._settlePromise0 (/srv/service/node_modules/bluebird/js/release/promise.js:649:10)\n    at Promise._settlePromises (/srv/service/node_modules/bluebird/js/release/promise.js:725:18)\n    at _drainQueueStep (/srv/service/node_modules/bluebird/js/release/async.js:93:12)\n    at _drainQueue (/srv/service/node_modules/bluebird/js/release/async.js:86:9)\n    at Async._drainQueues (/srv/service/node_modules/bluebird/js/release/async.js:102:5)\n    at Async.drainQueues [as _onImmediate] (/srv/service/node_modules/bluebird/js/release/async.js:15:14)\n    at process.processImmediate (node:internal/timers:476:21)","originalError":{"name":"HTTPError","message":"connect ENETUNREACH 2a02:ec80:600:ed1a::1:443 - Local (:::0)","status":504,"headers":{"content-type":"application/problem+json"},"body":{"type":"internal_http_error","detail":"connect ENETUNREACH 2a02:ec80:600:ed1a::1:443 - Local (:::0)","internalStack":"Error: connect ENETUNREACH 2a02:ec80:600:ed1a::1:443 - Local (:::0)\n    at internalConnect (node:net:1101:16)\n    at defaultTriggerAsyncIdScope (node:internal/async_hooks:462:18)\n    at emitLookup (node:net:1375:9)\n    at /srv/service/node_modules/dnscache/lib/index.js:125:28\n    at /srv/service/node_modules/dnscache/lib/cache.js:116:13\n    at RawTask.call 
(/srv/service/node_modules/asap/asap.js:40:19)\n    at flush (/srv/service/node_modules/asap/raw.js:50:29)\n    at processTicksAndRejections (node:internal/process/task_queues:77:11)\n    at runNextTicks (node:internal/process/task_queues:64:3)\n    at process.processImmediate (node:internal/timers:447:9)","internalURI":"https://schema.wikimedia.org/repositories/secondary/jsonschema/analytics/mediawiki/accountcreation/account_conversion/1.1.0","internalErr":"connect ENETUNREACH 2a02:ec80:600:ed1a::1:443 - Local (:::0)","internalMethod":"get"}},"uri":"/analytics/mediawiki/accountcreation/account_conversion/1.1.0"},"msg":"event encountered an error: Failed loading schema at /analytics/mediawiki/accountcreation/account_conversion/1.1.0","time":"2025-10-03T10:58:04.090Z","v":0}

{"name":"eventgate-wikimedia-dev","hostname":"364e002c55ca","pid":1,"level":50,"levelPath":"error/events","request_id":"3b2f89a48474092ff7942658","request":{"url":"/v1/events","headers":{"x-request-id":"3b2f89a48474092ff7942658","content-type":"application/json","user-agent":"MediaWiki/1.45.0-alpha","content-length":"541"},"method":"POST","params":{"0":"/v1/events"},"query":{},"remoteAddress":"10.0.0.6","remotePort":46748},"msg":"1 out of 1 events had failures and were not accepted. (0 invalid and 1 errored).","time":"2025-10-03T10:58:04.092Z","v":0}

Not sure why it can’t load the schema. Does the failure have anything to do with T400119? I ran into this issue some weeks ago, but back then the error was about the UA.
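The log lines above are long, and the underlying cause is buried inside `err.originalError`. As a rough aid for reading them, here is a minimal sketch that pulls out the fields that matter. The function name `summarize_eventgate_error` is hypothetical, and the sample record is a shortened copy of the first log line above that keeps only the fields the function reads.

```python
import json


def summarize_eventgate_error(line: str) -> dict:
    """Extract the underlying cause from one eventgate JSON log line."""
    # docker compose may prefix lines with "eventlogging-1  | ";
    # drop anything before the first "{".
    record = json.loads(line[line.index("{"):])
    err = record.get("err", {})
    original = err.get("originalError", {})
    body = original.get("body", {})
    return {
        "error": err.get("name"),
        "schema": err.get("uri"),
        "cause": original.get("message"),
        "fetched_from": body.get("internalURI"),
    }


# Shortened sample with the same field layout as the log line above.
schema = "/analytics/mediawiki/accountcreation/account_conversion/1.1.0"
sample = "eventlogging-1  | " + json.dumps({
    "name": "eventgate-wikimedia",
    "level": 50,
    "err": {
        "message": "Failed loading schema at " + schema,
        "name": "EventSchemaLoadError",
        "originalError": {
            "name": "HTTPError",
            "message": "connect ENETUNREACH 2a02:ec80:600:ed1a::1:443 - Local (:::0)",
            "status": 504,
            "body": {
                "internalURI": "https://schema.wikimedia.org/repositories/secondary/jsonschema" + schema,
            },
        },
        "uri": schema,
    },
})

for key, value in summarize_eventgate_error(sample).items():
    print(f"{key}: {value}")
```

For this record the cause is a network-level failure (ENETUNREACH to an IPv6 address) while fetching the schema, not a missing schema.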

I ran mw docker update to make sure I’m using the latest docker images:

INFO Updating 9 services
[+] Pulling 9/9
 ✔ dps Pulled                                                                                                                                                                                                    2.5s
 ✔ phpmyadmin Pulled                                                                                                                                                                                             2.5s
 ✔ mysql Pulled                                                                                                                                                                                                  2.5s
 ✔ memcached Pulled                                                                                                                                                                                              2.5s
 ✔ nginx-proxy Pulled                                                                                                                                                                                            2.5s
 ✔ mediawiki Pulled                                                                                                                                                                                              1.7s
 ✔ mediawiki-web Pulled                                                                                                                                                                                          1.7s
 ✔ eventlogging Pulled                                                                                                                                                                                           1.7s
 ✔ mailhog Pulled                                                                                                                                                                                                2.5s
[+] Running 10/10
 ✔ Container mwcli-mwdd-default-memcached-1                    Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-dps-1                          Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-eventlogging-1                 Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-nginx-proxy-1                  Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-mailhog-1                      Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-phpmyadmin-1                   Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-mediawiki-web-1                Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-mediawiki-1                    Running                                                                                                                                           0.0s
 ✔ Container mwcli-mwdd-default-mysql-1                        Running                                                                                                                                           0.0s

And my mw version:

You are already on the latest version: 0.28.0

Details

Related Changes in GitLab:
Title: Bump and change eventgate image
Reference: repos/releng/cli!634
Author: addshore
Source Branch: eventgate-image-bump
Dest Branch: main

Event Timeline

Hi,
I'm having a similar error when trying to send an event locally.
I tried setting EVENTLOGGING_IMAGE=docker-registry.wikimedia.org/wikimedia/eventgate-wikimedia:ffd68c0de41e3395e2f8ba9422fbe8824c2a49ff (the latest from here) in .config/mwcli/mwdd/default, since eventlogging.yml otherwise falls back to an old image name (it would be good to update it in eventlogging.yml).

Still, the schema cannot be loaded (even though it does exist at https://schema.wikimedia.org/repositories/secondary/jsonschema/analytics/product_metrics/web/base/1.5.0 ). This time the error is "unable to get local issuer certificate"

eventlogging-1  | {"name":"eventgate-wikimedia","hostname":"XXX","pid":1,"level":50,"err":{"message":"Failed loading schema at /analytics/product_metrics/web/base/1.5.0","name":"EventSchemaLoadError","stack":"EventSchemaLoadError: Failed loading schema at /analytics/product_metrics/web/base/1.5.0\n    at loadSchema.catch (/srv/service/node_modules/eventgate/lib/EventValidator.js:229:23)\n    at tryCatcher (/srv/service/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (/srv/service/node_modules/bluebird/js/release/promise.js:547:31)\n    at Promise._settlePromise (/srv/service/node_modules/bluebird/js/release/promise.js:604:18)\n    at Promise._settlePromise0 (/srv/service/node_modules/bluebird/js/release/promise.js:649:10)\n    at Promise._settlePromises (/srv/service/node_modules/bluebird/js/release/promise.js:725:18)\n    at _drainQueueStep (/srv/service/node_modules/bluebird/js/release/async.js:93:12)\n    at _drainQueue (/srv/service/node_modules/bluebird/js/release/async.js:86:9)\n    at Async._drainQueues (/srv/service/node_modules/bluebird/js/release/async.js:102:5)\n    at Immediate.Async.drainQueues [as _onImmediate] (/srv/service/node_modules/bluebird/js/release/async.js:15:14)\n    at runCallback (timers.js:705:18)\n    at tryOnImmediate (timers.js:676:5)\n    at processImmediate (timers.js:658:5)","originalError":{"name":"HTTPError","message":"unable to get local issuer certificate","status":504,"headers":{"content-type":"application/problem+json"},"body":{"type":"internal_http_error","detail":"unable to get local issuer certificate","internalStack":"Error: unable to get local issuer certificate\n    at TLSSocket.onConnectSecure (_tls_wrap.js:1055:34)\n    at TLSSocket.emit (events.js:189:13)\n    at TLSSocket._finishInit (_tls_wrap.js:633:8)","internalURI":"https://schema.wikimedia.org/repositories/secondary/jsonschema/analytics/product_metrics/web/base/1.5.0","internalErr":"unable to get local issuer 
certificate","internalMethod":"get"}},"uri":"/analytics/product_metrics/web/base/1.5.0"},"msg":"event encountered an error: Failed loading schema at /analytics/product_metrics/web/base/1.5.0","time":"2025-11-21T14:59:48.168Z","v":0}

I guess the error is raised here: https://gitlab.wikimedia.org/repos/data-engineering/eventgate/-/blob/master/lib/EventValidator.js#L229

@Addshore do you know of something else that needs to be updated here? Thanks!

If the new image doesn't work out of the box, that is unfortunate.

It looks like the image is missing a cert that is needed for the HTTPS call, though?

Perhaps this image, when used in WMF production, gets additional certs from somewhere?

n one find the latest deployed container images for various services these days? TLDR is I'm trying to find out what version of https://docker-registry.wikimedia.org//repos/data-engineering/eventgate-wikimedia/tags/ is actively used right now / if this image is actually used or if there is a different one for eventgate
10:54 AM <Lucas_WMDE> Lucas Werkmeister
addshore: my guess would’ve been deployment-charts but it looks like https://gerrit.wikimedia.org/g/operations/deployment-charts/+/5f57cf991e/charts/eventgate/values.yaml#92 isn’t pinned to a specific version
10:54 AM
— Lucas_WMDE might be misunderstanding a lot of things
10:56 AM <addshore>
i was confused when it wasn't just showing up in codesearch :D but yes, this already seems like the pointer I need, which is that I'm looking at and/or using the wrong image :D
10:57 AM
it changed here it seems https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/6834d5331aaebccbb069f10d1e2250c2281c1be4%5E%21/#F1
10:57 AM
Lucas_WMDE: tyvm

So yes, it would appear that mwcli is set to use the wrong image now
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/6834d5331aaebccbb069f10d1e2250c2281c1be4%5E%21/#F1

https://docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia/tags/ looks to be correct, and thus likely docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia:v1.26.0

Looking encouraging

wdev dev dc logs eventlogging
eventlogging-1  | {"@timestamp":"2025-12-01T10:02:21.336Z","ecs.version":"8.10.0","log.level":"info","message":"responseTimeMetric","service":{"name":"eventgate-wikimedia-dev"}}
eventlogging-1  | {"@timestamp":"2025-12-01T10:02:21.346Z","ecs.version":"8.10.0","labels":{"route":"events"},"log.level":"info","message":"Instantiating EventGate from eventgate-wikimedia-dev.js","service":{"name":"eventgate-wikimedia-dev"}}
eventlogging-1  | {"@timestamp":"2025-12-01T10:02:21.421Z","ecs.version":"8.10.0","log.level":"info","message":"Will look for relative schema_uris in https://schema.wikimedia.org/repositories/primary/jsonschema,https://schema.wikimedia.org/repositories/secondary/jsonschema","service":{"name":"eventgate-wikimedia-dev"}}
eventlogging-1  | {"@timestamp":"2025-12-01T10:02:21.454Z","ecs.version":"8.10.0","log.level":"info","message":"No stream_config_uri was set; events of any $schema will be allowed in any stream.","service":{"name":"eventgate-wikimedia-dev"}}
eventlogging-1  | {"@timestamp":"2025-12-01T10:02:21.455Z","ecs.version":"8.10.0","log.level":"info","message":"Writing valid events to stdout","service":{"name":"eventgate-wikimedia-dev"}}
eventlogging-1  | {"@timestamp":"2025-12-01T10:02:21.603Z","ecs.version":"8.10.0","log.level":"info","message":"Worker 1 listening on 0.0.0.0:8192","service":{"name":"eventgate-wikimedia-dev"}}

So https://gitlab.wikimedia.org/repos/releng/cli/-/merge_requests/634 should fix this, but it would be great if @SuzanneWood-WMDE or @xSavitar could validate it first with the env var override!

EVENTLOGGING_IMAGE=docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia:v1.26

After some local testing, I noticed the following:

  • Setting EVENTLOGGING_IMAGE=docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia:v1.26 (as recommended in your comment above) gives an error when I try to recreate the event logging container:
mwcli [master] mw docker eventlogging create
[+] Running 1/1
 ✘ eventlogging Error manifest for docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia:v1.26 not found: manifest unknown: manifest unknown                                                  4.1s
Error response from daemon: manifest for docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia:v1.26 not found: manifest unknown: manifest unknown
ERROR exit status 1

That was because the image version is v1.26.0, not v1.26. Verified here: https://docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia/tags/
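The v1.26 versus v1.26.0 mix-up can be caught before pulling. A minimal sketch, assuming you have copied the list of published tags from the tags page by hand: the helper names `split_image_ref` and `closest_published_tag` are hypothetical, and the tag list below contains only the one tag confirmed in this task.

```python
import difflib
from typing import List, Optional, Tuple


def split_image_ref(ref: str) -> Tuple[str, str]:
    """Split 'registry/path/name:tag' into (repository, tag)."""
    # Only a colon in the last path segment separates the tag;
    # a colon earlier in the string would be a registry port.
    head, _, last = ref.rpartition("/")
    name, sep, tag = last.partition(":")
    repo = f"{head}/{name}" if head else name
    return repo, (tag if sep else "latest")


def closest_published_tag(ref: str, published_tags: List[str]) -> Optional[str]:
    """Return the tag if it is published, else the closest published tag, else None."""
    _, tag = split_image_ref(ref)
    if tag in published_tags:
        return tag
    matches = difflib.get_close_matches(tag, published_tags, n=1)
    return matches[0] if matches else None


# Only the tag confirmed in this task; the full list is on the tags page.
published = ["v1.26.0"]
ref = "docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia:v1.26"

print(split_image_ref(ref)[1])                # v1.26
print(closest_published_tag(ref, published))  # v1.26.0
```

Digest references (name@sha256:...) are not handled here; the sketch only covers name:tag references like the ones used in this task.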

  • But after changing to v1.26.0, creating the container worked
mwcli [master] mw docker env set EVENTLOGGING_IMAGE docker-registry.wikimedia.org/repos/data-engineering/eventgate-wikimedia:v1.26.0
mwcli [master] mw docker eventlogging create
[+] Running 8/8
 ✔ eventlogging Pulled                                                                                                                                                                                          96.6s
   ✔ 233900dcc2a1 Pull complete                                                                                                                                                                                 18.2s
   ✔ faa35bfe3cc3 Pull complete                                                                                                                                                                                 31.4s
   ✔ 1c5a7497c1b0 Pull complete                                                                                                                                                                                 31.4s
   ✔ 63590cccf304 Pull complete                                                                                                                                                                                 31.4s
   ✔ a09be86c9167 Pull complete                                                                                                                                                                                 31.4s
   ✔ 21eacdff7bd1 Pull complete                                                                                                                                                                                 91.4s
   ✔ 4f4fb700ef54 Pull complete                                                                                                                                                                                 91.4s
[+] Running 2/2
 ✔ Container mwcli-mwdd-default-eventlogging-1                                                                                                                 Started                                           1.8s
  • Logging in to my local wiki (Special:Userlogin) now emits events in the eventlogging container:
{"@timestamp":"2025-12-03T08:41:37.652Z","ecs.version":"8.10.0","log.level":"info","message":"Loaded schema at /analytics/mediawiki/accountcreation/account_conversion/1.1.0","name":"EventValidator","service":{"name":"eventgate-wikimedia-dev"}}
{
  "http": {
    "request_headers": {
      "user-agent": "(redacted)"
    }
  },
  "meta": {
    "domain": "auth.mediawiki.mwdd.localhost",
    "stream": "mediawiki.accountcreation.login",
    "id": "a32e72cd-2b3e-4572-842b-81f6e14f2777",
    "dt": "2025-12-03T08:41:37.667Z"
  },
  "dt": "2025-12-03T08:41:35Z",
  "$schema": "/analytics/mediawiki/accountcreation/account_conversion/1.1.0",
  "event_type": "impression",
  "performer": {
    "user_id": 0,
    "user_text": "(redacted)",
    "is_temp": false
  },
  "source_wiki": "metawiki",
  "sul3_enabled": true,
  "page_title": "UserLogin",
  "page_namespace": -1
}
{
  "http": {
    "request_headers": {
      "user-agent": "(redacted)"
    }
  },
  "meta": {
    "domain": "auth.mediawiki.mwdd.localhost",
    "stream": "mediawiki.accountcreation.login",
    "id": "41c8b4c9-1987-4ff6-9235-1ee4a08a6f74",
    "dt": "2025-12-03T08:41:56.344Z"
  },
  "dt": "2025-12-03T08:41:56Z",
  "$schema": "/analytics/mediawiki/accountcreation/account_conversion/1.1.0",
  "event_type": "success",
  "performer": {
    "user_id": 0,
    "user_text": "10.0.0.7",
    "is_temp": false
  },
  "source_wiki": "metawiki",
  "sul3_enabled": true,
  "page_title": "UserLogin",
  "page_namespace": -1
}
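When checking output like the above, it can help to count accepted events per stream instead of reading each object. A minimal sketch, assuming the container output has been saved as text with any "eventlogging-1  | " prefix already stripped: the function names are hypothetical, and the sample is an abbreviated copy of the output above. It relies on `json.JSONDecoder.raw_decode` to read back-to-back JSON objects, whether single-line or pretty-printed.

```python
import json
from collections import Counter


def iter_json_objects(text: str):
    """Yield each top-level JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_events(text: str) -> Counter:
    """Count (stream, event_type) pairs, ignoring log records that are not events."""
    return Counter(
        (obj["meta"]["stream"], obj.get("event_type"))
        for obj in iter_json_objects(text)
        if isinstance(obj, dict) and "meta" in obj
    )


# Abbreviated copy of the container output above.
sample = """
{"log.level":"info","message":"Loaded schema at /analytics/mediawiki/accountcreation/account_conversion/1.1.0"}
{
  "meta": {"stream": "mediawiki.accountcreation.login"},
  "event_type": "impression"
}
{
  "meta": {"stream": "mediawiki.accountcreation.login"},
  "event_type": "success"
}
"""

for (stream, event_type), n in count_events(sample).items():
    print(stream, event_type, n)
```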

So I think this works now. Thank you, @Addshore, for fixing. 🙏🏽 I’ll let @SuzanneWood-WMDE confirm, and then we can resolve this ticket.

Great, I'll look at merging this and including it in the next release too :)

Thank you for helping us fix this @Addshore. Looking forward to the next release.

Do you want to resolve this?