Deploy "add a link" to 18th round of wikis (en.wp and de.wp)
Open, Low | Public | 1 Estimated Story Points

Description


https://wikitech.wikimedia.org/wiki/Add_Link#Enabling_on_a_new_wiki


English Wikipedia's specificities

English Wikipedia strictly enforces its guideline on what generally should not be linked: https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking#What_generally_should_not_be_linked.
Folly Mox gave us a very nice summary of how things are done at that wiki.

A script exists to track and remove what is considered overlinking. Users get a button in their toolbar that automatically removes links to common terms. They can also connect that script to AWB.

This script could be used to improve the model.

Event Timeline

The training pipelines of the two biggest wikis ran for a very long time and got stuck a couple of times, but they have finally completed and generated models for the 18th round.

Model evaluation has been completed and below are the backtesting results:

wiki      [email protected]    [email protected]
dewiki    0.79               0.48
enwiki    0.81               0.45

All languages have passed the evaluation and will be deployed.

kevinbazira added a subscriber: kostajh.

@kostajh, we published datasets for all models that passed the evaluation in this round.

Trizek-WMF renamed this task from Deploy "add a link" to 18th round of wikis to Deploy "add a link" to 18th round of wikis (en.wp and de.wp). Oct 5 2023, 8:26 AM
Trizek-WMF updated the task description.

@kevinbazira, I just learned that English Wikipedia has a script to track and remove what is considered overlinking. This script could be used to improve the model so that it fits the community's common practices. It would help the deployment at this wiki a lot. More details at:

@Trizek-WMF, thank you so much for sharing this script that helps to curb overlinking. I am looping in @MGerlach: since he will work on improving add-a-link model performance, this might interest him.

We will work on this task at the beginning of 2024.

We will work on this task at the beginning of 2024.

I thought we were aiming to enable these wikis before the end of 2023. Is there a particular reason to do it in early 2024?

We have to do proper community engagement first, which is not doable at the moment as I'm working on T346108: [EPIC] IP Masking: StructuredDiscussions (Flow)/LiquidThreads Community discussion.

KStoller-WMF updated the task description.
KStoller-WMF updated the task description.
KStoller-WMF set the point value for this task to 1.
Trizek-WMF set Due Date to May 21 2024, 4:00 PM.
Trizek-WMF raised the priority of this task from Medium to High. May 14 2024, 2:12 PM

We aren't ready to run the script to populate the suggestions. Given our backlog, we are moving this task to our next sprint.

Trizek-WMF changed Due Date from May 21 2024, 4:00 PM to Jun 4 2024, 4:00 PM. May 17 2024, 5:23 PM
Trizek-WMF updated the task description.
Trizek-WMF moved this task from Inbox to Up Next on the Growth-Team board.

Change #1033889 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] [Growth] enwiki: Enable AddLink backend

https://gerrit.wikimedia.org/r/1033889

As far as I can see, this task asks for a "stealth" (or "dark mode") deployment of Add Link to enwiki/dewiki, so that the quality of the recommendations can be reviewed. This means we will be maintaining a task pool of recommendations without necessarily showing those recommendations to any users.

I reviewed this task today, in order to determine whether this is possible without any code changes. As of now, we have the following two variables in GrowthExperiments:

  • GENewcomerTasksLinkRecommendationsEnabled: which turns on the backend (and ensures the task pool is ready for the users to use),
  • GELinkRecommendationsFrontendEnabled: which makes the task available to users (assuming it is also enabled via CommunityConfiguration, of course).

If we turn on the first one, but keep the second one turned off, then the task pool should get populated (and maintained), but the task will not be visible to the end user. This is what we normally do as part of our preparation of a deployment, to ensure the task pool is ready when the first users visit their homepage after the task gets enabled.
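The interaction of the two flags can be sketched as a small Python model. This is only an illustration under stated assumptions, not GrowthExperiments code: the two flag names come from the list above, while the `AddLinkState` class, the `add_link_state` function, the `community_config_enabled` parameter, and the rule that nothing is shown when the backend is off are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddLinkState:
    """Hypothetical summary of what a wiki ends up with."""
    task_pool_maintained: bool  # recommendations are populated and refreshed
    visible_to_users: bool      # the task is actually offered to users


def add_link_state(
    backend_enabled: bool,           # models GENewcomerTasksLinkRecommendationsEnabled
    frontend_enabled: bool,          # models GELinkRecommendationsFrontendEnabled
    community_config_enabled: bool,  # hypothetical stand-in for the CommunityConfiguration switch
) -> AddLinkState:
    # The backend flag alone decides whether the task pool is kept ready.
    pool = backend_enabled
    # Assumption: the task is only shown when the server-side frontend flag
    # and the community switch are both on, and there is a pool to draw from.
    visible = backend_enabled and frontend_enabled and community_config_enabled
    return AddLinkState(task_pool_maintained=pool, visible_to_users=visible)


# "Stealth" state described above: backend on, frontend off.
stealth = add_link_state(
    backend_enabled=True,
    frontend_enabled=False,
    community_config_enabled=True,
)
print(stealth)
```

In this model, the stealth state keeps the pool maintained while the task stays hidden, regardless of what the community switch says, which matches the "Disabled in site configuration" behaviour described for a server-side frontend flag that is off.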

It turns out that on dewiki, this is already the case. The backend of Add Link is enabled there (and the task pool gets refreshed periodically as well), but in Community configuration, it appears as "Disabled in site configuration", which appears to be what this task asks for. Assuming we want enwiki to be in the same state as dewiki (task pool maintained, but unused), then that should be easy to do. If we do that, then the Growth team would need to be involved for showing Add Link to users (to set the frontend flag on), but the involvement would be minimal (~30 mins for an engineer).

If the goal is to ensure admins can enable Add Link at those two wikis at any time (w/o any further involvement from our team), then things get a little bit more tricky. When dewiki asked for Add Link to be turned off on their wiki, we originally did that via CommunityConfiguration, but for some reason, users were still getting it (and saving tasks). Related conversation about this is at T288420 and T294712.

Since this was nearly 3 years ago, I'm not sure what exactly happened off the top of my head. If we do want to make it possible for admins to enable AddLink via CommunityConfiguration at any time, we would need to figure that out (or at least, figure out whether it is going to be a problem again). Theoretically, whatever happened back then might not be a problem at this point, as it happened when the A/B testing code for structured AddLink was in place (see removal patch from Nov 2021).

Back then, enablement of AddLink depended not only on site configuration, but also on the user in question (due to the A/B testing). Since we no longer have the A/B testing code in place, availability of structured AddLink should depend only on site configuration (whether server-side or in CommunityConfiguration) at this point. If the bug that affected us in 2021 was in the A/B testing code for Add Link, then the bug should no longer appear. But that is something we would need to figure out (and test).

Summary: As of now, it is very easy to enable the backend (as requested here), but leave the frontend switch off on the server side. This will make later deployment of Add Link much easier (as the pool would already be ready), but it wouldn't be possible to enable Add Link via CommunityConfiguration (we would still need to be involved). If we want admins to be able to self-serve this, we would need to do the investigation I described above, and release only after the investigation is done. Because we need to enable the backend either way, I'll do that part, and leave the rest for later.

@Trizek-WMF @KStoller-WMF Would you mind clarifying what would be the intended end result here?

Change #1033889 merged by jenkins-bot:

[operations/mediawiki-config@master] [Growth] enwiki: Enable AddLink backend

https://gerrit.wikimedia.org/r/1033889

Mentioned in SAL (#wikimedia-operations) [2024-05-20T08:02:45Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:1033889|[Growth] enwiki: Enable AddLink backend (T308144)]]

Mentioned in SAL (#wikimedia-operations) [2024-05-20T08:05:14Z] <urbanecm@deploy1002> urbanecm: Backport for [[gerrit:1033889|[Growth] enwiki: Enable AddLink backend (T308144)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-05-20T08:19:52Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:1033889|[Growth] enwiki: Enable AddLink backend (T308144)]] (duration: 17m 07s)

Summary: As of now, it is very easy to enable the backend (as requested here), but leave the frontend switch off on the server side. This will make later deployment of Add Link much easier (as the pool would already be ready), but it wouldn't be possible to enable Add Link via CommunityConfiguration (we would still need to be involved). If we want admins to be able to self-serve this, we would need to do the investigation I described above, and release only after the investigation is done. I started filling the task pool, and I'll hold off on any further action until I get clarification.

@Trizek-WMF @KStoller-WMF Would you mind clarifying what would be the intended course of action here?

Thanks for the detailed explanation!

Ideally, we want communities to be able to easily disable and enable this task independently, with no involvement from WMF.
However, given that we are nearing the CommunityConfiguration Extension release, I don't think it makes sense to investigate the underlying issue in Special:EditGrowthConfig now. Instead, we should wait until the new extension is released and then take the time to follow up.

I've attempted to summarize the issue and next steps with the enwiki community here.

Wikimedia Deutschland is handling the communication with dewiki, so I'll also reach out to our contacts there with a summary of the current state of this release.

Trizek-WMF lowered the priority of this task from High to Low. Jun 12 2024, 1:16 PM

I'm keeping this task open while the two communities discuss the topic. I'm lowering my engagement there to "react to new responses and opportunities".