
Add a Link: Link Suggestions Code Review and rollout planning
Closed, Resolved · Public · 5 Estimated Story Points

Description

User story & summary:

As a new editor, I want access to easy edit suggestions, so that I can successfully start editing.
As a mobile editor, I want access to suggestions that are easy to edit from a mobile device, so that I can successfully edit.

Related:

Background & research:

This task is important because Add a Link helps new account holders get started:

Documentation:

https://wikitech.wikimedia.org/wiki/Add_Link#Enabling_on_a_new_wiki

Acceptance Criteria:

Details

Other Assignee
Sgs
Related Changes in Gerrit:

Related Objects

Event Timeline

Hello @KStoller-WMF ,

  • I know the inference API currently supports the wikis here.
  • I also have the list of wikis that are below/above the release threshold.
  • However, I'm missing information about which wikis currently have tasks enabled. Can you share this? Have we already enabled tasks for all the wikis here? I can look into usage data if this is not easy to find.
  • Since we want to enable tasks for wikis, I think we should rely on the list of wikis that currently have tasks enabled, rather than the list of wikis currently being served. They might be the same, but I just want to make sure.

@OKarakaya-WMF If I understand the config correctly, the enabled wikis would be in the mediawiki-config repo, in the ext-GrowthExperiments.php config.
I think the relevant variables are wgGENewcomerTasksLinkRecommendationsEnabled and wgGELinkRecommendationsFrontendEnabled, so we can extract the list of wikis from there. Someone from the Growth team, please correct me if I'm wrong.
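
For reference, a rough way to pull those lists out of the config could look like the sketch below. This is a quick illustration, not production tooling: it assumes the usual per-wiki array shape ('wgGE...' => [ 'default' => false, 'cswiki' => true, ... ]), uses a fragile regex rather than a real PHP parser, and the helper name is made up.

```python
import re

# Hypothetical helper: list the wikis for which a given config variable is
# set to true in wmf-config/ext-GrowthExperiments.php. Assumes the usual
# per-wiki array shape; regex parsing of PHP is fragile, so treat this as
# a quick sketch only.
def enabled_wikis(config_text: str, variable: str) -> set[str]:
    # Grab the array literal that follows the variable name.
    match = re.search(
        re.escape(variable) + r"'\s*=>\s*\[(.*?)\]", config_text, re.DOTALL
    )
    if not match:
        return set()
    wikis = set()
    for dbname, value in re.findall(r"'([^']+)'\s*=>\s*(true|false)", match.group(1)):
        if value == "true" and dbname != "default":
            wikis.add(dbname)
    return wikis

with open("ext-GrowthExperiments.php") as f:
    text = f.read()

backend = enabled_wikis(text, "wgGENewcomerTasksLinkRecommendationsEnabled")
frontend = enabled_wikis(text, "wgGELinkRecommendationsFrontendEnabled")
print(sorted(backend & frontend))  # wikis enabled in both
```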

KStoller-WMF updated the task description. (Show Details)
KStoller-WMF moved this task from Inbox to Up Next (estimated tasks) on the Growth-Team board.
KStoller-WMF added a subscriber: Trizek-WMF.

Hello good morning,

Thank you both for creating this task.

Related to the recent updates in the deployment plan section:
Within the scope of this task, do we want to enable add-a-link onboarding tasks on the wikis that currently do not have them (e.g. enwiki would not be in scope)?
Or is this task about releasing all v2 models above the release threshold (e.g. enwiki would be in scope)?

Thank you!

Regarding the release to new wikis that are above the release threshold in v2 and do not yet have add-a-link onboarding tasks:
below is the list of wikis filtered by those criteria (47 in total).
The wikis are sorted roughly by size.

Group 1:
klwiki
tiwiki
pwnwiki

Group 2:
chwiki
ikwiki
altwiki
gurwiki
gucwiki
mnwwiki
jbowiki
kcgwiki
dvwiki
madwiki
pcmwiki
awawiki
guwwiki
kgwiki
niawiki
novwiki
krcwiki
blkwiki
szywiki
taywiki

Group 3:
dtywiki
ganwiki
nrmwiki
bowiki
smnwiki
shnwiki
dagwiki
mniwiki
hywwiki
avkwiki
shiwiki
bugwiki
skrwiki

Group 4:
urwiki
ltwiki
mywiki
bpywiki
lldwiki
wuuwiki
jawiki
fywiki
diqwiki
nvwiki
zhwiki

I've split the wikis into 4 groups so that we can release each group separately, e.g. on different days.
The advantage of releasing small wikis first is that there is less impact if something goes wrong.
The disadvantage is that problems are easier to spot on large wikis, since they get more use.

Please feel free to suggest another plan or share your comments.

Option 1:
Group 1 -> Group 4 -> Group 3 -> Group 2

Option 2:
Group 1 -> Group 2 -> Group 3 -> Group 4

So far, we are in favor of Option 1, so we can see the impact as soon as possible after verifying the deployment with Group 1.
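
To make the two options concrete, here is a tiny sketch of the proposed orders expressed as data (names are illustrative, not an actual deployment script; group contents are the wiki lists above):

```python
# The two proposed rollout orders, expressed as plain data. Group contents
# are the wiki lists above; only group 1 is spelled out here for brevity.
GROUPS = {
    "group1": ["klwiki", "tiwiki", "pwnwiki"],
    # "group2": [...], "group3": [...], "group4": [...] as listed above
}

OPTION_1 = ["group1", "group4", "group3", "group2"]  # verify on small wikis, then largest first
OPTION_2 = ["group1", "group2", "group3", "group4"]  # smallest to largest
```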


Mh, maybe that first group will be useful for trying the new process out in production (i.e., making sure that there are no fatal errors and such). But otherwise, those wikis seem to be pretty close to dead:

  • klwiki had 5 main namespace edits in the last 30 days
  • tiwiki had 6 main namespace edits in the last 30 days
  • pwnwiki had 3 main namespace edits in the last 30 days

So, I would expect essentially zero genuine newcomer interaction from these wikis (a quick way to double-check such numbers is sketched below). OTOH, this might be another example of why it could be useful to enable the homepage by default for auto-created accounts.
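
For anyone re-checking activity numbers like the ones above, here is a quick sketch using the public Wikimedia Analytics REST API. The "content" page-type approximates the main namespace, so counts may differ slightly from other tools; the helper name is made up.

```python
import requests
from datetime import date, timedelta

# Quick sketch: count content-namespace edits over the last N days via the
# public Wikimedia Analytics edits API.
def recent_content_edits(project: str, days: int = 30) -> int:
    end = date.today()
    start = end - timedelta(days=days)
    url = (
        "https://wikimedia.org/api/rest_v1/metrics/edits/aggregate/"
        f"{project}/all-editor-types/content/daily/{start:%Y%m%d}/{end:%Y%m%d}"
    )
    resp = requests.get(url, headers={"User-Agent": "rollout-sanity-check/0.1"})
    resp.raise_for_status()
    return sum(r["edits"] for r in resp.json()["items"][0]["results"])

for project in ("kl.wikipedia.org", "ti.wikipedia.org", "pwn.wikipedia.org"):
    print(project, recent_content_edits(project))
```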

KStoller-WMF updated the task description. (Show Details)
KStoller-WMF set the point value for this task to 5.

@OKarakaya-WMF Sorry, I had some conflicting details in the task description.
This task is meant to just be about coordinating with you, completing the necessary code review, and creating subtasks for each rollout group (once there is consensus).

My assumption is that we first want to focus on wikis that don't yet have access to Growth's "Add a Link" task, and after that work is done we can then focus on wikis that already have the task (moving from V1 to V2 of the model).

I think that can all be covered in this parent task, but I imagine we should create subtasks for each rollout group.

I'm open to other ways to approach this or organize the work though!

Hello @KStoller-WMF,
I totally agree 💯. All clear, thank you!

Regarding community engagement, group 4 will need a specific effort, as it gathers the biggest wikis on the list.

I successfully built a local version of the image (I had some issues with Docker running out of memory for some reason). For simplicity, I pushed the image I created to gitlab.fit.cvut.cz:5050/urbanm42/mwaddlink-rebuild:0c448e78b so that it is easier to return to it later. This image is based on https://gerrit.wikimedia.org/r/c/research/mwaddlink/+/1183685.

From there, I attempted to load the new cswiki datasets into an empty database (from https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/):

somebody@fd0fcfbfb587:/srv/app$ export ANALYTICS_BASE_URL=https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/
somebody@fd0fcfbfb587:/srv/app$ python3 load-datasets.py --wiki-id cswiki --path /srv/app/data --download
== Initializing ==
   [general] Ensuring checksum table exists...[OK]
   [general] Ensuring model table exists...[OK]
   [cswiki] Ensuring anchors table exists...[OK]
   [cswiki] Ensuring redirects table exists...[OK]
   [cswiki] Ensuring pageids table exists...[OK]
   [cswiki] Ensuring w2vfiltered table exists...[OK]
   [cswiki] Ensuring model table exists...[OK]
   Beginning process to load datasets for cswiki
== Attempting to download datasets (anchors, redirects, pageids, w2vfiltered, model) for cswiki ==
   No checksum found for anchors in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/cswiki/lr_cswiki_anchors.sql.gz...[OK]
   No checksum found for redirects in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/cswiki/lr_cswiki_redirects.sql.gz...[OK]
   No checksum found for pageids in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/cswiki/lr_cswiki_pageids.sql.gz...[OK]
   No checksum found for w2vfiltered in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/cswiki/lr_cswiki_w2vfiltered.sql.gz...[OK]
   No checksum found for model in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/cswiki/cswiki.linkmodel.json...[OK]
== Importing datasets (anchors, redirects, pageids, w2vfiltered, model) for cswiki ==
   Verifying file and checksum exists for anchors...[OK]
   Verifying checksum for anchors...   Verifying file and checksum exists for redirects...[OK]
   Verifying checksum for redirects...   Verifying file and checksum exists for pageids...[OK]
   Verifying checksum for pageids...   Verifying file and checksum exists for w2vfiltered...[OK]
   Verifying checksum for w2vfiltered...   Verifying file and checksum exists for model...[OK]
   Verifying checksum for model...   Processing dataset: anchors
     Deleting all values from lr_cswiki_anchors...[OK]
     Inserting content into lr_cswiki_anchors...[OK]
       846101 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: redirects
     Deleting all values from lr_cswiki_redirects...[OK]
     Inserting content into lr_cswiki_redirects...[OK]
       344361 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: pageids
     Deleting all values from lr_cswiki_pageids...[OK]
     Inserting content into lr_cswiki_pageids...[OK]
       595079 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: w2vfiltered
     Deleting all values from lr_cswiki_w2vfiltered...[OK]
     Inserting content into lr_cswiki_w2vfiltered...[OK]
       571767 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: model
     Inserting link model...[OK]
     Updating stored checksum...[OK]
   Committing...[OK]
   Finished importing for cswiki!
Finished importing datasets for cswiki
somebody@fd0fcfbfb587:/srv/app$

Querying the service via the API works, and I do get results that appear to be believable.
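
For completeness, the kind of query I ran looks roughly like the sketch below. The route and the query parameter are assumptions based on the service's general shape, not verified against the current mwaddlink API; check the repo's README for the exact endpoint.

```python
import requests

# Illustrative only: ask a locally running mwaddlink instance for link
# suggestions. The route and parameter names here are hypothetical;
# consult the mwaddlink README for the actual API.
BASE = "http://localhost:8000"  # assumed local dev port
resp = requests.get(
    f"{BASE}/v1/linkrecommendations/wikipedia/cs/Praha",  # hypothetical route
    params={"max_recommendations": 5},  # hypothetical parameter
)
resp.raise_for_status()
print(resp.json())
```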

Then, I tried loading the old frwiki dataset:

somebody@fd0fcfbfb587:/srv/app$ unset ANALYTICS_BASE_URL
somebody@fd0fcfbfb587:/srv/app$ python3 load-datasets.py --wiki-id frwiki --download --path /srv/app/data
== Initializing ==
   [general] Ensuring checksum table exists...[OK]
   [general] Ensuring model table exists...[OK]
   [frwiki] Ensuring anchors table exists...[OK]
   [frwiki] Ensuring redirects table exists...[OK]
   [frwiki] Ensuring pageids table exists...[OK]
   [frwiki] Ensuring w2vfiltered table exists...[OK]
   [frwiki] Ensuring model table exists...[OK]
   Beginning process to load datasets for frwiki
== Attempting to download datasets (anchors, redirects, pageids, w2vfiltered, model) for frwiki ==
   No checksum found for anchors in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/frwiki/lr_frwiki_anchors.sql.gz...[OK]
   No checksum found for redirects in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/frwiki/lr_frwiki_redirects.sql.gz...[OK]
   No checksum found for pageids in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/frwiki/lr_frwiki_pageids.sql.gz...[OK]
   No checksum found for w2vfiltered in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/frwiki/lr_frwiki_w2vfiltered.sql.gz...[OK]
   No checksum found for model in local database, will attempt to download
   Downloading dataset https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/frwiki/frwiki.linkmodel.json...[OK]
== Importing datasets (anchors, redirects, pageids, w2vfiltered, model) for frwiki ==
   Verifying file and checksum exists for anchors...[OK]
   Verifying checksum for anchors...   Verifying file and checksum exists for redirects...[OK]
   Verifying checksum for redirects...   Verifying file and checksum exists for pageids...[OK]
   Verifying checksum for pageids...   Verifying file and checksum exists for w2vfiltered...[OK]
   Verifying checksum for w2vfiltered...   Verifying file and checksum exists for model...[OK]
   Verifying checksum for model...   Processing dataset: anchors
     Deleting all values from lr_frwiki_anchors...[OK]
     Inserting content into lr_frwiki_anchors...[OK]
       2992265 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: redirects
     Deleting all values from lr_frwiki_redirects...[OK]
     Inserting content into lr_frwiki_redirects...[OK]
       1770633 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: pageids
     Deleting all values from lr_frwiki_pageids...[OK]
     Inserting content into lr_frwiki_pageids...[OK]
       2650853 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: w2vfiltered
     Deleting all values from lr_frwiki_w2vfiltered...[OK]
     Inserting content into lr_frwiki_w2vfiltered...[OK]
       2269099 rows inserted
     Updating stored checksum...[OK]
   Processing dataset: model
     Inserting link model...[OK]
     Updating stored checksum...[OK]
   Committing...[OK]
   Finished importing for frwiki!
Finished importing datasets for frwiki
somebody@fd0fcfbfb587:/srv/app$

Querying recommendations worked reliably all the way. Switching between older and newer models worked as I'd expect it to.

I noticed in docker image ls that the image seems to be almost twice as large. I'm not sure if I'm interpreting the numbers correctly, but that seems unexpected? Other than that, I left a couple of review notes inline, and I'm willing to merge the change afterwards.

I approved @OKarakaya-WMF's code in https://gerrit.wikimedia.org/r/c/research/mwaddlink/+/1183685. We will meet next week to deploy it to production and make sure it works there as well. Fingers crossed!

The updated version of the service is now deployed to production. @OKarakaya-WMF is now working on updating load-datasets.py to work with two different sources of models.

This should give us the flexibility to load whichever models are appropriate at the same time. Once ML completes that work, Growth (myself) will need to review it and assist with the production deployment.
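
Based on the two sessions above, the source selection presumably boils down to choosing a base URL per run. A minimal sketch of that logic follows; the URLs are taken from the logs in this task, but the function and flag names are illustrative, not the actual load-datasets.py implementation.

```python
import os

# The v2 location and the legacy one-off location, as seen in the import
# sessions above.
V2_BASE = "https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/"
LEGACY_BASE = (
    "https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/"
)

def dataset_base_url(use_v2: bool) -> str:
    # An explicit env override (as used in the sessions above) wins.
    override = os.environ.get("ANALYTICS_BASE_URL")
    if override:
        return override
    return V2_BASE if use_v2 else LEGACY_BASE
```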

Hey @Urbanecm_WMF

I've started a patch to deploy new models here: https://gerrit.wikimedia.org/r/c/research/mwaddlink/+/1199815
It's WIP but I think it should be ready to review tomorrow. I'll let you know.

I'll also create a new goal for model deployments, so that we can discuss enabling tasks on wikis separately.

Change #1199815 had a related patch set uploaded (by Ozge; author: Ozge):

[research/mwaddlink@main] feat: loads v2 models

https://gerrit.wikimedia.org/r/1199815

Change #1199815 merged by jenkins-bot:

[research/mwaddlink@main] feat: loads v2 models

https://gerrit.wikimedia.org/r/1199815

KStoller-WMF updated the task description. (Show Details)
KStoller-WMF renamed this task from "Add a Link: Rollout 'Add a Link' Structured Task to Wikipedias that are supported by V2 model" to "Add a Link: Link Suggestions Code Review and rollout planning". Tue, Nov 18, 11:03 PM
KStoller-WMF updated the task description. (Show Details)