The Wikimedia Foundation’s Editing team is working on a set of improvements for the visual editor to help new volunteers understand and follow some of the policies necessary to make constructive changes to Wikipedia projects.
This work is guided by the Wikimedia Foundation Annual Plan, specifically by Wiki Experiences 1.2: “Widespread deployment of interventions shown to collectively cause a 10% relative increase (y-o-y) on mobile web and a 25% relative increase (y-o-y) on iOS of newcomers who publish ≥1 constructive edit in the main namespace on a mobile device, as measured by controlled experiments.”
In this AB test, we are evaluating the impact of showing multiple Reference Checks within a single editing session. An editing session is defined as a period of activity starting with a contributor clicking an edit button and ending when they publish or abandon the edit. The Reference Check invites users who have added more than 50 new characters to a page in the article namespace to include a reference with the edit they’re making if they have not already done so at the time they indicate their intent to save.
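As a rough illustration, the trigger condition described above can be sketched as a simple predicate. The function and argument names below are hypothetical, not the actual VisualEditor/Edit Check implementation:

```r
# Illustrative sketch only -- names are hypothetical, not the real implementation.
should_show_reference_check <- function(new_chars_added,
                                        already_added_reference,
                                        is_article_namespace) {
  is_article_namespace &&
    new_chars_added > 50 &&          # more than 50 new characters added
    !already_added_reference         # no reference added by the user themselves
}

should_show_reference_check(120, FALSE, TRUE)  # TRUE: check is shown
should_show_reference_check(30, FALSE, TRUE)   # FALSE: too little new content
```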
In the current default experience, a single Reference Check is presented even in cases when the edit someone is attempting may warrant multiple references (e.g. adding new sentences in separate sections). The Multi-Check references experience removes this constraint and allows multiple Reference Checks to be presented in a single edit when the edit they are attempting warrants them.
The findings from this A/B test will be relevant for the near-term future where multiple Edit Checks of the same and/or different types (e.g. Peacock Check, Paste Check, etc.) have the potential to become activated within a single edit session.
You can find more information about features of this tool and project updates on the project page.
Methodology
The team ran an AB test from 25 March 2025 through 15 May 2025 to determine the impact of presenting multiple Reference Checks within a single session.
Specifically, we wanted to learn what changes, if any, in edit quality and completion we observe when people have the potential to see multiple Reference Checks within a single edit. More details on the measurement plan and decision scenarios are documented in the task description.
During this experiment, 50% of users editing a desktop or mobile main namespace page using Visual Editor were randomly assigned to the test group and could be shown multiple Reference Checks if their edit met the specified requirements during their edit, and 50% were randomly assigned to the control group and only shown one Reference Check in an editing session even if their edit warranted multiple references (the default editing experience at partner wikis).
The test included all mobile web and desktop contributors (both registered and unregistered) to the 12 participating wikis that started an edit with Visual Editor (see the full list of participating Wikipedias in the task description). Users remained in the same test group for the duration of the test. We also limited the analysis to edits completed by unregistered users and users with 100 or fewer edits, as those are the users that would be shown Reference Check under the default configuration settings. Edits completed with Visual Editor account for about 48% of all main namespace edits by these users at the partner wikis.
Figure 1: Multi-Check AB Test Bucketing Overview
As shown in Figure 1, not all edits bucketed in the AB test met the requirements for being shown one or more Reference Checks. A Reference Check was only shown if the contributor met the specified requirements at the time they indicated their intent to save by clicking the pre-publish button. If the user was in the test group, they could be shown multiple checks if their edit warranted multiple references.
Reference Check was presented at least once in 6,435 published new content edits in the control group and 6,277 published new content edits in the test group.
In the test group, multiple Reference Checks were presented in a single editing session in 27% (1,697) of all published new content edits where Reference Check was activated.
Of the edits shown multiple checks, the majority (72.8%) were shown between 2 and 5 reference checks within a single session. 5% of these multi-check edits (87 new content edits) were shown over 16 reference checks within a single session.
For each key performance metric and secondary metric, we reviewed the following dimensions:
overall by experiment group (test and control),
by platform (mobile web or desktop),
by user experience and status,
and by partner Wikipedia.
We also compared edits shown multiple Reference Checks in a single session in the test group to edits that were only presented a single Reference Check. For edits presented more than one Reference Check, we reviewed a split by the number of checks shown to determine if there was a significant metric change at a certain number of checks presented.
Summary of Results
KPI 1: Proportion of new content edits with a reference. Users are more likely to include at least one reference with their new content edits when multi-check (references) is available. We confirmed a statistically significant 5.9% increase in the proportion of all new content edits with a reference in the test group where multi-check was available compared to all new content edits completed in the control group. We observed similar increases for edits published on desktop and mobile web. Edits that were shown multiple Reference Checks in a session are 1.3 times more likely to include at least one new reference in the final published edit compared to sessions shown a single Reference Check.
KPI 2: Revert Rate: Overall, we did not identify any significant changes in new content edit revert rate between the control and test groups. However, there was a 34.7% decrease in revert rate when directly comparing edits presented multiple checks to edits presented a single Reference Check. This decrease is likely in part because the types of edits that warrant multiple Reference Checks are less likely to be reverted than the types of edits that warrant only a single check.
Secondary Metric: Constructive Activation: We did not identify any significant changes in overall constructive activation rates on desktop or mobile web due to the introduction of multi-check. Overall, the constructive activation rate was 17.7% in the control group and 17.6% in the test group.
Secondary Metric: Proportion of users that publish at least one new content edit with a reference: Overall, there was a 5.5% increase in the proportion of distinct users who published a new content edit with a reference when multi-check was available. Users were also more likely to publish a new content edit with a reference on desktop than on mobile web when multi-check is available: we observed a 6.2% increase in the proportion of distinct users that published a new content edit with a reference on desktop, compared to no statistically significant increase on mobile web.
Guardrail Summary:
We did not observe any significant changes in the identified guardrails to indicate that the introduction of multi-check is negatively impacting the user’s editing experience. There were no decreases in edit completion rate for up to 5 reference checks being presented in a single session (which accounts for the majority of multi-check edits). We also did not identify any increases in revert rate at any number of reference checks presented.
Key Performance Indicator 1: Proportion of published new content edits that include a reference
Hypothesis: The quality of new content edits users make in the main namespace will increase because a greater percentage of these edits will include a reference.
Methodology: We reviewed the proportion of published new content edits (editcheck-newcontent) where people were shown at least one Reference Check and included at least one net new reference (editcheck-newreference). Please see the edit tag mediawiki page for more details on how these tags are applied.
# Set fields and factor levels to assess number of checks shown.
# Note: limited to 1 sidebar open as we're looking for cases where multiple
# checks were presented in a single sidebar at save attempt (vs the user going
# back and forth between save attempt moments).
edit_check_publish_data <- edit_check_publish_data %>%
  mutate(
    multiple_checks_shown = ifelse(
      n_checks_shown > 1 & n_sidebar_opens < 2,
      "multiple checks shown", "single check shown"
    ),
    multiple_checks_shown = factor(
      multiple_checks_shown,
      levels = c("single check shown", "multiple checks shown")
    )
  )

# Note: these buckets can be adjusted as needed based on the distribution of the data.
edit_check_publish_data <- edit_check_publish_data %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ "0",
      n_checks_shown == 1 | (n_checks_shown > 1 & n_sidebar_opens >= 2) ~ "1",
      n_checks_shown == 2 & n_sidebar_opens < 2 ~ "2",
      n_checks_shown > 2 & n_checks_shown <= 5 & n_sidebar_opens < 2 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 & n_sidebar_opens < 2 ~ "6-10",
      n_checks_shown > 10 & n_checks_shown <= 15 & n_sidebar_opens < 2 ~ "11-15",
      n_checks_shown > 15 & n_checks_shown <= 20 & n_sidebar_opens < 2 ~ "16-20",
      n_checks_shown > 20 & n_sidebar_opens < 2 ~ "over 20"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "11-15", "16-20", "over 20")
    )
  )
Overall by Experiment Group
Code
published_edits_reference_overall <- edit_check_publish_data %>%
  # limit to new content edits where reference check was shown
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%
  group_by(test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # count new content edits that included a new reference
    n_edits_wref = n_distinct(editing_session[included_new_reference == 1])
  ) %>%
  mutate(prop_edits = paste0(round(n_edits_wref / n_edits * 100, 1), "%"))
Code
published_edits_reference_overall_table <- published_edits_reference_overall %>%
  gt() %>%
  tab_header(
    title = "New content edits where reference check was shown and that include a new reference"
  ) %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_edits = "Number of new content edits shown reference check",
    n_edits_wref = "Number of new content edits with new reference",
    prop_edits = "Proportion of new content edits with a new reference"
  ) %>%
  tab_source_note(
    gt::md("Limited to new content edits where at least one reference check was shown")
  )

display_html(as_raw_html(published_edits_reference_overall_table))
New content edits where reference check was shown and that include a new reference

| Experiment Group | Number of new content edits shown reference check | Number of new content edits with new reference | Proportion of new content edits with a new reference |
|---|---|---|---|
| control (single check) | 6,435 | 2,623 | 40.8% |
| test (multiple checks) | 6,277 | 2,712 | 43.2% |

Limited to new content edits where at least one reference check was shown
Code
dodge <- position_dodge(width = 0.9)

p <- published_edits_reference_overall %>%
  ggplot(aes(x = test_group, y = n_edits_wref / n_edits, fill = test_group)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(
    aes(label = paste(prop_edits, "\n", n_edits_wref, "edits"), fontface = 2),
    vjust = 1.2, size = 10, color = "white"
  ) +
  labs(
    y = "Percent of new content edits",
    x = "Experiment Group",
    title = "New content edits that include a new reference",
    caption = "Limited to new content edits where at least one reference check was shown"
  ) +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment Group") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    legend.position = "none",
    axis.line = element_line(colour = "black")
  )

p
There was a statistically significant 5.9% increase (2.4 percentage points) in the proportion of new content edits with a reference in the test group, where multi-check was available to users. This includes all new content edits where at least one reference check was shown.
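As a sanity check, the headline comparison can be reproduced from the counts in the table above with a standard two-sample proportion test (stats::prop.test); the exact test used for the report may differ:

```r
# Counts from the table above (control vs test)
with_ref <- c(control = 2623, test = 2712)
shown    <- c(control = 6435, test = 6277)

rates <- with_ref / shown                     # ~40.8% vs ~43.2%
relative_increase <- (rates[["test"]] - rates[["control"]]) / rates[["control"]]
relative_increase                             # ~0.06 (5.9% when computed from rounded percentages)

prop.test(with_ref, shown)$p.value < 0.05     # significant at the 0.05 level
```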
By whether multiple reference checks were shown
Code
published_edits_reference_ifmultiple <- edit_check_publish_data %>%
  # limit to new content edits where reference check was shown
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%
  group_by(test_group, multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # count new content edits that included a new reference
    n_edits_wref = n_distinct(editing_session[included_new_reference == 1])
  ) %>%
  mutate(prop_edits = paste0(round(n_edits_wref / n_edits * 100, 1), "%"))
Code
published_edits_reference_ifmultiple_table <- published_edits_reference_ifmultiple %>%
  gt() %>%
  tab_header(title = "New content edits that include a new reference") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    multiple_checks_shown = "Multiple checks shown",
    n_edits = "Number of new content edits shown reference check",
    n_edits_wref = "Number of new content edits with new reference",
    prop_edits = "Proportion of new content edits with a new reference"
  ) %>%
  tab_source_note(
    gt::md("Limited to new content edits where at least one reference check was shown")
  )

display_html(as_raw_html(published_edits_reference_ifmultiple_table))
New content edits that include a new reference

| Experiment Group | Multiple checks shown | Number of new content edits shown reference check | Number of new content edits with new reference | Proportion of new content edits with a new reference |
|---|---|---|---|---|
| control (single check) | single check shown | 6,435 | 2,623 | 40.8% |
| test (multiple checks) | single check shown | 4,580 | 1,788 | 39% |
| test (multiple checks) | multiple checks shown | 1,697 | 924 | 54.4% |

Limited to new content edits where at least one reference check was shown
Code
p <- published_edits_reference_ifmultiple %>%
  ggplot(aes(x = multiple_checks_shown, y = n_edits_wref / n_edits, fill = test_group)) +
  geom_col(position = position_dodge2(preserve = "single")) +
  # facet_grid(~multiple_checks_shown) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(
    aes(label = paste(prop_edits, "\n", n_edits_wref, "edits"), fontface = 2),
    position = position_dodge(width = 1), vjust = 1.2, size = 8, color = "white"
  ) +
  labs(
    y = "Percent of new content edits",
    x = "Experiment Group",
    title = "New content edits that include a new reference \n by if multiple reference checks were shown",
    caption = "Limited to new content edits where at least one reference check was shown"
  ) +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    legend.position = "bottom",
    axis.line = element_line(colour = "black")
  )

p
We also compared editing sessions where multiple Reference Checks were shown to sessions where only a single Reference Check was presented. Edits presented a single Reference Check in the control and test groups add references at the same rate, as those experiences are identical.
Edits shown multiple checks are 1.3 times more likely to include at least one new reference in the final published edit compared to sessions shown a single reference check.
By number of checks shown
Code
published_edits_reference_nchecks <- edit_check_publish_data %>%
  # limit to new content edits in the test group where reference check was shown
  filter(was_edit_check_shown == 1 & is_new_content == 1 &
           test_group == "test (multiple checks)") %>%
  group_by(checks_shown_bucket) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # count new content edits that included a new reference
    n_edits_wref = n_distinct(editing_session[included_new_reference == 1])
  ) %>%
  mutate(prop_edits = paste0(round(n_edits_wref / n_edits * 100, 1), "%")) %>%
  # sanitize small counts per data publication guidelines
  mutate(
    n_edits_sanitized = ifelse(n_edits < 50, "<50", n_edits),
    n_edits_wref_sanitized = ifelse(n_edits_wref < 50, "<50", n_edits_wref)
  )
Code
dodge <- position_dodge(width = 0.9)

p <- published_edits_reference_nchecks %>%
  ggplot(aes(x = checks_shown_bucket, y = n_edits_wref / n_edits)) +
  geom_col(position = "dodge", fill = "dodgerblue4") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(
    aes(label = paste(prop_edits, "\n", n_edits_wref_sanitized, "edits"), fontface = 2),
    vjust = 1.2, size = 7, color = "white"
  ) +
  labs(
    y = "Percent of new content edits",
    x = "Number of reference checks shown",
    title = "New content edits that include a new reference \n by number of reference checks shown",
    caption = "Limited to new content edits where at least one new reference check was shown"
  ) +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 20),
    legend.position = "bottom",
    axis.line = element_line(colour = "black")
  )

p
The proportion of edits with a new reference generally increases with an increasing number of checks shown. 57% of edits presented between 6 to 10 Reference Checks included a new reference compared to 39% of edits presented a single Reference Check.
The rate of increase appears to start to diminish around 11 to 15 checks; however, there was also a limited number of edits (107 edits) where more than 10 Reference Checks were presented in a single editing session.
By Platform
Code
published_edits_reference_platform <- edit_check_publish_data %>%
  # limit to new content edits where reference check was shown
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # count new content edits that included a new reference
    n_edits_wref = n_distinct(editing_session[included_new_reference == 1])
  ) %>%
  mutate(prop_edits = paste0(round(n_edits_wref / n_edits * 100, 1), "%"))
Code
published_edits_reference_platform_table <- published_edits_reference_platform %>%
  gt() %>%
  tab_header(title = "New content edits that include a new reference by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    platform = "Platform",
    n_edits = "Number of new content edits shown reference check",
    n_edits_wref = "Number of new content edits with new reference",
    prop_edits = "Proportion of new content edits with a new reference"
  ) %>%
  tab_source_note(
    gt::md("Limited to new content edits where at least one reference check was shown")
  )

display_html(as_raw_html(published_edits_reference_platform_table))
New content edits that include a new reference by platform

| Platform | Experiment Group | Number of new content edits shown reference check | Number of new content edits with new reference | Proportion of new content edits with a new reference |
|---|---|---|---|---|
| Desktop | control (single check) | 3,911 | 1,902 | 48.6% |
| Desktop | test (multiple checks) | 3,821 | 1,971 | 51.6% |
| Mobile Web | control (single check) | 2,524 | 721 | 28.6% |
| Mobile Web | test (multiple checks) | 2,456 | 741 | 30.2% |

Limited to new content edits where at least one reference check was shown
Code
dodge <- position_dodge(width = 0.9)

p <- published_edits_reference_platform %>%
  ggplot(aes(x = test_group, y = n_edits_wref / n_edits, fill = test_group)) +
  geom_col(position = "dodge") +
  facet_grid(~platform) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(
    aes(label = paste(prop_edits, "\n", n_edits_wref, "edits"), fontface = 2),
    vjust = 1.2, size = 8, color = "white"
  ) +
  labs(
    y = "Percent of new content edits",
    x = "Experiment Group",
    title = "New content edits that include a new reference \n by platform",
    caption = "Limited to new content edits where at least one reference check was shown"
  ) +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    legend.position = "bottom",
    axis.line = element_line(colour = "black")
  )

p
There were similar increases in new content edits with a reference on desktop and mobile web:
Desktop: 6.2% increase in the proportion of new content edits with a reference.
Mobile Web: 5.6% increase in the proportion of new content edits with a reference.
Overall, edits completed on mobile web are less likely to include a new reference compared to edits completed on desktop.
By User Experience
Code
published_edits_reference_userexp <- edit_check_publish_data %>%
  # limit to new content edits where reference check was shown
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%
  group_by(experience_level_group, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # count new content edits that included a new reference
    n_edits_wref = n_distinct(editing_session[included_new_reference == 1])
  ) %>%
  mutate(prop_edits = paste0(round(n_edits_wref / n_edits * 100, 1), "%"))
Code
dodge <- position_dodge(width = 0.9)

p <- published_edits_reference_userexp %>%
  ggplot(aes(x = test_group, y = n_edits_wref / n_edits, fill = test_group)) +
  geom_col(position = "dodge") +
  facet_grid(~experience_level_group) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(
    aes(label = paste(prop_edits, "\n", n_edits_wref, "edits"), fontface = 2),
    vjust = 1.2, size = 7, color = "white"
  ) +
  labs(
    y = "Percent of new content edits",
    x = "Experiment Group",
    title = "New content edits that include a new reference \n by user experience",
    caption = "Limited to new content edits where at least one reference check was shown"
  ) +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    text = element_text(size = 20),
    legend.position = "bottom",
    axis.line = element_line(colour = "black")
  )

p
We also observed similar increases across all reviewed user types (unregistered contributors, newcomers, and Junior Contributors).
By Partner Wikipedia
Code
published_edits_reference_wiki <- edit_check_publish_data %>%
  # limit to new content edits where reference check was shown
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%
  group_by(wiki, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # count new content edits that included a new reference
    n_edits_wref = n_distinct(editing_session[included_new_reference == 1])
  ) %>%
  mutate(prop_edits = paste0(round(n_edits_wref / n_edits * 100, 1), "%"))
Code
dodge <- position_dodge(width = 0.9)

p <- published_edits_reference_wiki %>%
  # remove wikis with insufficient events
  filter(!wiki %in% c("Afrikaans Wikipedia", "Igbo Wikipedia",
                      "Swahili Wikipedia", "Yoruba Wikipedia")) %>%
  ggplot(aes(x = test_group, y = n_edits_wref / n_edits, fill = test_group)) +
  geom_col(position = "dodge") +
  facet_wrap(~wiki, nrow = 2) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(prop_edits), fontface = 2), vjust = 1.2, size = 6, color = "white") +
  labs(
    y = "Percent of new content edits",
    x = "Experiment Group",
    title = "New content edits that include a new reference \n by partner Wikipedia",
    caption = "Includes all new content edits where at least one reference check was shown. \n Excludes smaller wikis where insufficient events were logged."
  ) +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    text = element_text(size = 24),
    legend.position = "bottom",
    axis.line = element_line(colour = "black")
  )

p
Results vary by partner Wikipedia. We observed increases in the proportion of new content edits that include a reference across half the partner Wikipedias with the highest increase observed at Spanish Wikipedia (24.5% increase [8 percentage points]).
Modeling the impact of multi-check on whether a new content edit includes a reference
We next explored different models to infer the impact of offering multi-check on the likelihood that a new content edit will include a reference, while also accounting for variability across different wikis and users. This allows us to confirm whether the observed increase above is statistically significant (i.e., unlikely to have occurred by random chance).
We used a Bayesian hierarchical regression model to capture this structure. For this model, we used whether at least one new reference was included as the response variable, the user’s assigned test group as the predictor variable, and the user and Wikipedia as random effects.
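The model structure described above can be written as an R mixed-effects formula: a fixed effect for the assigned test group, plus random intercepts for user and wiki. The variable names below are assumptions based on the fields used elsewhere in this analysis, not necessarily those in the fitted model:

```r
# Sketch of the model structure: response, fixed effect, and random intercepts.
# Variable names are assumptions, not necessarily those used in the fitted model.
model_formula <- included_new_reference ~ test_group + (1 | user) + (1 | wiki)

inherits(model_formula, "formula")  # TRUE
```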
Code
# limit to new content edits where reference check was shown
edit_check_publish_data_model <- edit_check_publish_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1)
Code
# redefine including a reference as a factor for use in the model
edit_check_publish_data_model$included_new_reference <- factor(
  edit_check_publish_data_model$included_new_reference,
  levels = c(0, 1)
)
Code
priors <- c(
  set_prior(prior = "std_normal()", class = "b"),
  set_prior("cauchy(0, 5)", class = "sd")
)
Since the model parameters are on the log-odds scale, we needed to apply the following transformations to make sense of them.
We used the “divide-by-4” rule suggested by Gelman, Hill, and Vehtari 2021 1 to approximate the maximum increase in the probability of success corresponding to which experience (single check or multi-check) was presented. Using the Bayesian model, we can also directly calculate the average lift.
Since the model parameters are on the log-odds scale, we need to exponentiate the effect (exp(β1)) to determine the multiplicative effect on the odds of an edit including at least one new reference.
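Both transformations are simple arithmetic on the fitted coefficient. Below, beta is a hypothetical log-odds coefficient chosen only to illustrate the scale of the reported estimates (the actual posterior estimate is not reproduced here):

```r
beta <- 0.215  # hypothetical log-odds effect of the test group (illustrative only)

exp(beta)  # multiplicative effect on the odds: ~1.24 (report: ~1.2 times)
beta / 4   # "divide-by-4" upper bound on the probability increase: ~0.054 (5.4%)
```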
Based on estimates from the model, we found that edits where multi-check is available are 1.2 times more likely to include a new reference in their new content edit.
We also found there is an average 5.1% increase (maximum 5.4% increase) in the probability of an edit including a new reference when switching from the single check experience to the multi-check experience. We can confirm statistical significance at the 0.05 level for all of these estimates (as indicated by credible intervals that exclude the null effect).
Key Insights
There was a 5.9% increase in the proportion of new content edits with a reference in the test group, where multi-check was available to users.
Sessions shown multiple checks are more likely to include at least one new reference in the final published edit compared to sessions shown a single reference check. For edits where multiple checks were presented, there was a 36% increase in the proportion of new content edits with a reference compared to edits where only one check was presented. It’s worth noting that edits that would warrant multiple reference checks are likely larger and also more likely to include at least one new reference.
The proportion of edits with a new reference generally increases with the number of checks shown. 57% of edits presented between 6 and 10 Reference Checks included a new reference, compared to 39% of edits presented a single Reference Check.
Increases were observed across all reviewed user types and platforms, and at half of the partner wikis. We observed similar increases on desktop and mobile web.
KPI 2: Proportion of published edits that add new content and are reverted within 48 hours
We also reviewed revert rate to determine the impact of introducing multi-check on the quality of edits being published.
Hypothesis: The quality of new content edits newcomers and Junior Contributors make in the main namespace will increase because a greater percentage of these edits will include a reference or an explicit acknowledgement as to why these edits lack references.
Methodology: We reviewed the proportion of all new content edits in the control and test groups that were reverted within 48 hours. We limited the analysis to new content edits where at least one reference check was shown.
Code
published_edits_reverted_overall_table <- published_edits_reverted_overall %>%
  gt() %>%
  tab_header(title = "New content edit revert rate of edits shown Reference Check") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    n_content_edits = "Number of new content edits",
    n_reverted_edits = "Number of new content edits reverted",
    prop_edits = "Proportion of new content edits reverted"
  ) %>%
  tab_source_note(
    gt::md("Limited to published new content edits where at least one reference check was shown")
  )

display_html(as_raw_html(published_edits_reverted_overall_table))
New content edit revert rate of edits shown reference check

| Experiment group | Number of new content edits | Number of new content edits reverted | Proportion of new content edits reverted |
|---|---|---|---|
| control (single check) | 6,435 | 1,448 | 22.5% |
| test (multiple checks) | 6,277 | 1,481 | 23.6% |

Limited to published new content edits where at least one reference check was shown
Code
dodge <- position_dodge(width = 0.9)

p <- published_edits_reverted_overall %>%
  ggplot(aes(x = test_group, y = n_reverted_edits / n_content_edits, fill = test_group)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(
    aes(label = paste(prop_edits, "\n", n_reverted_edits, "reverted edits"), fontface = 2),
    vjust = 1.2, size = 8, color = "white"
  ) +
  labs(
    y = "Percent of new content edits reverted",
    x = "Experiment Group",
    title = "New content edit revert rate of edits shown reference check",
    caption = "Limited to published new content edits where at least one reference check was shown"
  ) +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment Group") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 20),
    legend.position = "none",
    axis.line = element_line(colour = "black")
  )

p
Overall, across all new content edits where Reference Check was shown, we observed a slight 5% increase (1 percentage point) in the revert rate of new content edits shown at least one reference check when multiple reference checks were available to eligible edits (test group). These results are not statistically significant (p-value of 0.0719).
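For reference, a standard proportion test on the counts in the table above gives a similar picture. The reported p-value (0.0719) suggests a one-sided test was used; the exact method in the report may differ:

```r
# Counts from the revert-rate table above
reverted <- c(control = 1448, test = 1481)
edits    <- c(control = 6435, test = 6277)

# Two-sided test: the difference is not significant at the 0.05 level
prop.test(reverted, edits)$p.value > 0.05                 # TRUE

# One-sided test of an increase in the test group (control rate < test rate)
prop.test(reverted, edits, alternative = "less")$p.value
```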
We observed slight increases for edits published both with and without a new reference, suggesting that the observed overall increase was due to chance.
Code
published_edits_reverted_ifmultiple_table <- published_edits_reverted_ifmultiple %>%
  gt() %>%
  tab_header(
    title = "New content edit revert rate of edits shown reference check by if multiple checks shown"
  ) %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    multiple_checks_shown = "Multiple checks shown",
    n_content_edits = "Number of new content edits",
    n_reverted_edits = "Number of new content edits reverted",
    prop_edits = "Proportion of new content edits reverted"
  ) %>%
  tab_source_note(
    gt::md("Limited to published new content edits where at least one reference check was shown")
  )

display_html(as_raw_html(published_edits_reverted_ifmultiple_table))
New content edit revert rate of edits shown reference check by if multiple checks shown

| Experiment group | Multiple checks shown | Number of new content edits | Number of new content edits reverted | Proportion of new content edits reverted |
|---|---|---|---|---|
| control (single check) | single check shown | 6,435 | 1,448 | 22.5% |
| test (multiple checks) | single check shown | 4,580 | 1,213 | 26.5% |
| test (multiple checks) | multiple checks shown | 1,697 | 268 | 15.8% |

Limited to published new content edits where at least one reference check was shown
Code
dodge <- position_dodge(width = 0.9)

p <- published_edits_reverted_ifmultiple %>%
  ggplot(aes(x = multiple_checks_shown, y = n_reverted_edits / n_content_edits, fill = test_group)) +
  geom_col(position = position_dodge2(preserve = "single")) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(
    aes(label = paste(prop_edits, "\n", n_reverted_edits, "\n reverted edits"), fontface = 2),
    position = position_dodge(width = 1), vjust = 1.2, size = 6.5, color = "white"
  ) +
  labs(
    y = "Percent of new content edits reverted",
    x = "Experiment Group",
    title = "New content edit revert rate by \n if multiple reference checks were shown",
    caption = "Limited to new content edits where at least one reference check was shown"
  ) +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0.5),
    text = element_text(size = 24),
    axis.title.x = element_blank(),
    legend.position = "bottom",
    axis.line = element_line(colour = "black")
  )

p
There was a 34.7% relative decrease in revert rate when directly comparing edits presented multiple checks with edits presented a single reference check. This decrease is likely in part because the types of edits that warrant multiple reference checks are less likely to be reverted than the types of edits that warrant only a single check.
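As a check on the arithmetic (an illustrative sketch in Python, not the report's own R code), comparing the revert rate of multiple-check edits (268 of 1697) against the pooled single-check edits from both experiment groups reproduces a roughly 35% relative decrease:

```python
# Hypothetical recomputation of the relative revert-rate change from the table above.
multi_reverted, multi_edits = 268, 1697
single_reverted = 1448 + 1213   # reverted single-check edits, control + test
single_edits = 6435 + 4580      # all single-check edits, control + test

multi_rate = multi_reverted / multi_edits       # ~15.8%
single_rate = single_reverted / single_edits    # ~24.2%
relative_change = multi_rate / single_rate - 1  # ~ -0.35

print(f"{relative_change:+.1%}")
```

The exact figure depends on whether the single-check baseline pools both groups or uses the test group alone; the pooled version shown here matches the reported number most closely.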
By number of reference checks shown
Code
published_edits_reverted_nchecks <- edit_check_publish_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%  # limit to new content edits
  group_by(checks_shown_bucket) %>%
  summarise(n_content_edits = n_distinct(editing_session),
            n_reverted_edits = n_distinct(editing_session[was_reverted == 1])) %>%  # look at reverts
  mutate(prop_edits = paste0(round(n_reverted_edits / n_content_edits * 100, 1), "%")) %>%
  mutate(n_content_edits_sanitized = ifelse(n_content_edits < 50, "<50", n_content_edits),
         n_reverted_edits_sanitized = ifelse(n_reverted_edits < 50, "<50", n_reverted_edits))  # sanitizing per data publication guidelines
Code
dodge <- position_dodge(width = 0.9)
p <- published_edits_reverted_nchecks %>%
  ggplot(aes(x = checks_shown_bucket, y = n_reverted_edits / n_content_edits)) +
  geom_col(position = 'dodge', fill = 'dodgerblue4') +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(prop_edits, "\n", n_reverted_edits_sanitized, "\n reverted edits"), fontface = 2),
            vjust = 1.2, size = 5, color = "white") +
  labs(y = "Percent of new content edits reverted",
       x = "Number of reference checks shown",
       title = "New content edit revert rate by number of Reference Checks shown",
       caption = "Limited to published new content edits where at least one reference check was shown") +
  theme(panel.grid.minor = element_blank(), panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5), text = element_text(size = 24),
        legend.position = "bottom", axis.line = element_line(colour = "black"))
p
We observed that the revert rate decreases as the number of checks presented increases. The revert rate of edits presented 6 to 10 reference checks is 11%, compared with a revert rate of 24% for edits presented a single check.
We did not identify any significant increases in the revert rate at any number of checks presented; however, there is a limited sample of edits presented over 10 reference checks, so more data would be needed to confirm these trends.
published_edits_reverted_platform_table <- published_edits_reverted_platform %>%
  gt() %>%
  tab_header(title = "New content edit revert rate of edits shown reference check by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    platform = "Platform",
    n_content_edits = "Number of new content edits",
    n_reverted_edits = "Number of new content edits reverted",
    prop_edits = "Proportion of new content edits reverted"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits where at least one reference check was shown'))

display_html(as_raw_html(published_edits_reverted_platform_table))
New content edit revert rate of edits shown reference check by platform

| Platform | Experiment group | Number of new content edits | Number of new content edits reverted | Proportion of new content edits reverted |
|---|---|---|---|---|
| Desktop | control (single check) | 3911 | 594 | 15.2% |
| Desktop | test (multiple checks) | 3821 | 620 | 16.2% |
| Mobile Web | control (single check) | 2524 | 854 | 33.8% |
| Mobile Web | test (multiple checks) | 2456 | 861 | 35.1% |

Limited to published new content edits where at least one reference check was shown
Code
dodge <- position_dodge(width = 0.9)
p <- published_edits_reverted_platform %>%
  ggplot(aes(x = test_group, y = n_reverted_edits / n_content_edits, fill = test_group)) +
  geom_col(position = 'dodge') +
  facet_grid(~platform) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(prop_edits, "\n", n_reverted_edits, "\n reverted edits"), fontface = 2),
            vjust = 1.2, size = 7, color = "white") +
  labs(y = "Percent of new content edits reverted",
       x = "Experiment Group",
       title = "New content edit revert rate by platform",
       caption = "Limited to published new content edits where at least one reference check was shown") +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(panel.grid.minor = element_blank(), panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5), text = element_text(size = 20),
        axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        legend.position = "bottom", axis.line = element_line(colour = "black"))
p
There were no statistically significant increases in revert rate by experiment group on either desktop or mobile web.
dodge <- position_dodge(width = 0.9)
p <- published_edits_reverted_userexp %>%
  ggplot(aes(x = test_group, y = n_reverted_edits / n_content_edits, fill = test_group)) +
  geom_col(position = 'dodge') +
  facet_grid(~experience_level_group) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(prop_edits, "\n", n_reverted_edits, "\n reverted edits"), fontface = 2),
            vjust = 1.2, size = 6, color = "white") +
  labs(y = "Percent of new content edits reverted",
       x = "Experiment Group",
       title = "New content edit revert rate by user experience",
       caption = "Limited to published new content edits where at least one reference check was shown") +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(panel.grid.minor = element_blank(), panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5), text = element_text(size = 24),
        axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        legend.position = "bottom", axis.line = element_line(colour = "black"))
p
Results vary slightly based on the type of user completing the edit, but none of the observed changes were statistically significant.
published_edits_reverted_wiki_table <- published_edits_reverted_wiki %>%
  ungroup() %>%
  mutate(n_content_edits = ifelse(n_content_edits < 50, "<50", n_content_edits),
         n_reverted_edits = ifelse(n_reverted_edits < 50, "<50", n_reverted_edits)) %>%  # sanitizing per data publication guidelines
  group_by(wiki) %>%
  gt() %>%
  tab_header(title = "New content edit revert rate of edits shown reference check by partner Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    wiki = "Wikipedia",
    n_content_edits = "Number of new content edits",
    n_reverted_edits = "Number of new content edits reverted",
    prop_edits = "Proportion of new content edits reverted"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits where at least one reference check was shown. Excludes wikis where sufficient events were not logged'))

display_html(as_raw_html(published_edits_reverted_wiki_table))
New content edit revert rate of edits shown reference check by partner Wikipedia

| Wikipedia | Experiment group | Number of new content edits | Number of new content edits reverted | Proportion of new content edits reverted |
|---|---|---|---|---|
| Arabic Wikipedia | control (single check) | 444 | 68 | 15.3% |
| Arabic Wikipedia | test (multiple checks) | 365 | 70 | 19.2% |
| Chinese Wikipedia | control (single check) | 243 | <50 | 16% |
| Chinese Wikipedia | test (multiple checks) | 224 | <50 | 18.8% |
| French Wikipedia | control (single check) | 1848 | 384 | 20.8% |
| French Wikipedia | test (multiple checks) | 1769 | 390 | 22% |
| Italian Wikipedia | control (single check) | 1419 | 329 | 23.2% |
| Italian Wikipedia | test (multiple checks) | 1413 | 354 | 25.1% |
| Japanese Wikipedia | control (single check) | 543 | <50 | 7% |
| Japanese Wikipedia | test (multiple checks) | 560 | <50 | 6.8% |
| Portuguese Wikipedia | control (single check) | 474 | 83 | 17.5% |
| Portuguese Wikipedia | test (multiple checks) | 468 | 99 | 21.2% |
| Spanish Wikipedia | control (single check) | 1299 | 475 | 36.6% |
| Spanish Wikipedia | test (multiple checks) | 1303 | 457 | 35.1% |
| Vietnamese Wikipedia | control (single check) | 141 | <50 | 21.3% |
| Vietnamese Wikipedia | test (multiple checks) | 139 | <50 | 21.6% |

Limited to published new content edits where at least one reference check was shown. Excludes wikis where sufficient events were not logged
Modeling the impact of multi-check on whether a new content edit is reverted
As our second KPI, we also used a Bayesian hierarchical regression model to infer the impact of offering multi-check on the likelihood of a new content edit being reverted within 48 hours.
Code
# redefine revert status as a factor for use in the model
edit_check_publish_data_model$was_reverted <- factor(
  edit_check_publish_data_model$was_reverted,
  levels = c(0, 1)
)
Code
priors <- c(
  set_prior(prior = "std_normal()", class = "b"),
  set_prior("cauchy(0, 5)", class = "sd")
)
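The full model specification is not shown in the report. A plausible brms-style hierarchical logistic regression consistent with the priors above (standard normal on coefficients, half-Cauchy on group-level standard deviations) would be the following sketch; the multi-check indicator and per-wiki intercepts are assumptions, not confirmed details:

```latex
\text{was\_reverted}_i \sim \mathrm{Bernoulli}(p_i), \qquad
\operatorname{logit}(p_i) = \beta_0 + \beta_1\,\text{multicheck}_i + u_{\text{wiki}[i]}
```

```latex
u_j \sim \mathcal{N}(0, \sigma^2), \qquad
\beta \sim \mathcal{N}(0, 1), \qquad
\sigma \sim \mathrm{Cauchy}^{+}(0, 5)
```

Under this structure, the posterior for $\beta_1$ concentrating around zero corresponds to the report's conclusion that an impact of multi-check on revert likelihood could not be confirmed.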
Based on estimates from the model, we are not able to confirm the impact of multi-check on the overall revert rate of new content edits.
Key Insights
Overall, there was no significant difference in the revert rate of new content edits between the control and the test group for editing sessions where at least one reference check was shown.
In the test group, there was a 34.7% relative decrease in revert rate when directly comparing edits presented multiple checks with edits presented a single reference check. This decrease is likely in part because the types of edits that warrant multiple reference checks are less likely to be reverted than the types of edits that warrant only a single check.
Revert rate decreases as the number of checks presented increases. The revert rate of edits presented 6 to 10 reference checks is 11%, compared with a revert rate of 24% for edits presented a single check. We did not identify any significant increases in the revert rate at any number of checks presented.
There were no statistically significant changes in revert rate by platform, user type, or at any of the partner Wikipedias.
Secondary Metric 1: Constructive Activation
Hypothesis: New account holders will be more likely to publish an unreverted edit to the main namespace within 24 hours of creating an account because they will be made aware of the need to accompany new text they’re attempting to publish with a reference, when they don’t first think/know to do so themselves.
For WE 1.2 KR, we defined constructive activation as: “The percentage of newcomers making at least one edit to an article in the main namespace of a Wikipedia project on a mobile device within 24 hours of registration (also on a mobile device) and that edit not being reverted within 48 hours of being published.”
There were 53,137 users that created an account on either desktop or mobile web at one of the partner Wikipedias during the AB test timeframe. While we don’t assign a user to a bucket until they begin an edit, we calculated the number of accounts available to be activated within each group as half the total number of users that created an account at one of the partner wikis while the AB test was deployed (based on the 50/50 split used in the AB test).
Code
# load data for assessing activations
all_users_edit_data <- read.csv(
  file = 'data/all_users_edit_data_final.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
# calculate the total number of new account holders for each test group
experiment_group_n_accounts <- round(length(unique(all_users_edit_data$user_id)) * 0.50, 0)
There were no significant changes to the overall constructive activation rates.
Reference Check is not presented to newcomers until they attempt to save an edit, requiring them to move through several stages of the editing funnel after creating an account before reaching that point. During the reviewed timeframe, Reference Check was shown to about 6.7% of all newcomers that created an account.
Constructive edits by newcomers
To help isolate the impact of this intervention on newcomers, we also reviewed changes in overall constructive edit rates. This limits the analysis to newcomers that successfully published an edit where reference check was shown.
For this analysis, we’re defining constructive edits as the proportion of all edits completed by newcomers within 24 hours that are not reverted within 48 hours. This is limited to users that were shown at least one reference check within 24 hours after registering.
Code
# constructive edits
constructive_edits_editcheck <- all_users_edit_data %>%
  filter(num_article_edits_24hrs_editcheck > 0) %>%  # limit to users where ref check was shown at least once
  group_by(test_group) %>%
  summarise(num_article_edits_total = sum(num_article_edits_24hrs_all),
            num_article_reverts_total = sum(num_article_reverts_24hrs_all)) %>%
  mutate(pct_const = paste0(round((num_article_edits_total - num_article_reverts_total) / num_article_edits_total * 100, 1), "%")) %>%
  gt() %>%
  opt_stylize(5) %>%
  tab_header(title = "Constructive edits completed by newcomers shown Reference Check at least once") %>%
  cols_label(
    test_group = "Experiment Group",
    num_article_edits_total = "Total number of edits published",
    num_article_reverts_total = "Total number of edits reverted",
    pct_const = "Constructive Edit Rate"
  ) %>%
  tab_footnote(footnote = "Defined as the proportion of all published edits that are not reverted within 48 hours",
               locations = cells_column_labels(columns = "pct_const"))

display_html(as_raw_html(constructive_edits_editcheck))
Constructive edits completed by newcomers shown Reference Check at least once

| Experiment Group | Total number of edits published | Total number of edits reverted | Constructive Edit Rate¹ |
|---|---|---|---|
| control (single check) | 65485 | 17766 | 72.9% |
| test (multiple checks) | 118100 | 23407 | 80.2% |

¹ Defined as the proportion of all published edits that are not reverted within 48 hours
We observed a +10% increase in the proportion of constructive edits by users in the test group; however, this change is not statistically significant.
Note: There is also a significant increase in the number of edits published by newcomers in the test group. This trend needs to be investigated further to be confirmed.
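As an illustrative recomputation (Python rather than the report's R): the constructive edit rate is 1 minus the reverted share of published edits, and the relative change between groups works out to roughly +10%.

```python
# Hypothetical recomputation of the constructive edit rates from the table above.
control_published, control_reverted = 65485, 17766
test_published, test_reverted = 118100, 23407

control_rate = 1 - control_reverted / control_published  # ~72.9%
test_rate = 1 - test_reverted / test_published           # ~80.2%
relative_change = test_rate / control_rate - 1           # ~ +10%

print(f"control {control_rate:.1%}, test {test_rate:.1%}, change {relative_change:+.1%}")
```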
Mobile Web Constructive Activation Rates
There were 22,996 users that created an account on mobile web at one of the partner Wikipedias during the AB test timeframe.
Code
# load mobile web data for assessing activations
mobile_users_edit_data <- read.csv(
  file = 'data/mobile_users_edit_data_final.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
# calculate number of account holders for each test group
mobile_experiment_group_n_accounts <- round(length(unique(mobile_users_edit_data$user_id)) * 0.50, 0)
Desktop Constructive Activation Rates By Experiment Group

| Test Group | Number of newcomers | Number of users constructively activated | Constructive Activation Rates |
|---|---|---|---|
| control (single check) | 15070 | 2744 | 18.2% |
| test (multiple checks) | 15070 | 2791 | 18.5% |
There were no significant changes in constructive activation rates on desktop.
Key Insights
There were no significant changes in constructive activation rates when reviewing overall edits or by platform. Overall, the constructive activation rate was 17.7% in the control group and 17.6% in the test group. Note: Activation rates for both mobile web and desktop seem slightly lower than typical rates observed on each platform during the AB test timeframe. This might be due to the required join to EditAttemptStep, which may have caused a loss of some edits that were not instrumented correctly. See T394961.
Reference Check is not presented to newcomers until they attempt to save an edit, requiring them to move through several stages of the editing funnel after creating an account before reaching that point. During the reviewed timeframe, Reference Check was shown to about 6% of all newcomers that created an account.
To help isolate the impact of this intervention on newcomers, we also reviewed changes in overall constructive edit rates, defined as the proportion of edits published by newcomers that are not reverted within 48 hours. We observed a +10% increase in the proportion of constructive edits by newcomers in the test group; however, this change was not statistically significant.
Secondary Metric 2: Increase in the proportion of users that publish at least one new content edit that includes a reference.
Hypothesis: Unregistered users and users with 100 or fewer edits will be more aware of the need to add a reference when contributing new content because the visual editor will prompt them to do so in cases where they have not done so themselves.
Methodology:
This metric is similar to KPI 1 except that it looks at the proportion of distinct editors rather than distinct edits. There were no significant differences from the results reported in KPI 1, as the majority of users posted just one new content edit during the reviewed time period. See overall results below.
Overall by experiment group
Code
published_users_reference_overall <- edit_check_publish_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%  # limit to new content edits where reference check shown
  group_by(test_group) %>%
  summarise(n_users = n_distinct(user_id),
            n_users_wref = n_distinct(user_id[included_new_reference == 1])) %>%  # users that included a new reference
  mutate(prop_users = paste0(round(n_users_wref / n_users * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Proportion of users that publish at least one new content edit with a reference") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_users = "Number of distinct users",
    n_users_wref = "Number of users that include a new reference",
    prop_users = "Proportion of users that include a new reference"
  ) %>%
  tab_source_note(gt::md('Limited to users shown reference check and that published at least one new content edit'))

display_html(as_raw_html(published_users_reference_overall))
Proportion of users that publish at least one new content edit with a reference

| Experiment Group | Number of distinct users | Number of users that include a new reference | Proportion of users that include a new reference |
|---|---|---|---|
| control (single check) | 4960 | 2120 | 42.7% |
| test (multiple checks) | 4899 | 2200 | 44.9% |

Limited to users shown reference check and that published at least one new content edit
There was a 5.5% increase in the proportion of distinct users that published a new content edit with a reference when multi-check was available. This increase is statistically significant (p-value = 0.0151).
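The report does not show the test it used; a standard one-sided two-proportion z-test on the user counts above (an illustrative sketch, not the report's own code) yields a p-value close to the one reported:

```python
# Hypothetical one-sided two-proportion z-test on the user counts in the table above.
from math import erfc, sqrt

x1, n1 = 2200, 4899   # test (multiple checks): users who included a reference
x2, n2 = 2120, 4960   # control (single check)

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_one_sided = 0.5 * erfc(z / sqrt(2))  # upper-tail p-value

print(f"z = {z:.2f}, one-sided p = {p_one_sided:.4f}")
```

The match to the reported 0.0151 suggests a one-sided test was used; the two-sided p-value would be roughly double.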
By Platform
Code
published_users_reference_byplatform <- edit_check_publish_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%  # limit to new content edits where reference check shown
  group_by(platform, test_group) %>%
  summarise(n_users = n_distinct(user_id),
            n_users_wref = n_distinct(user_id[included_new_reference == 1])) %>%  # users that included a new reference
  mutate(prop_users = paste0(round(n_users_wref / n_users * 100, 1), "%"))
Code
dodge <- position_dodge(width = 0.9)
p <- published_users_reference_byplatform %>%
  ggplot(aes(x = test_group, y = n_users_wref / n_users, fill = test_group)) +
  geom_col(position = 'dodge') +
  facet_grid(~platform) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(prop_users, "\n", n_users_wref, "\n users"), fontface = 2),
            vjust = 1.2, size = 7, color = "white") +
  labs(y = "Percent of users",
       title = "Users that published a new content edit with a reference by platform",
       caption = "Limited to users shown reference check and that published at least one new content edit") +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(panel.grid.minor = element_blank(), panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5), text = element_text(size = 20),
        axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        legend.position = "bottom", axis.line = element_line(colour = "black"))
p
Distinct users were more likely to publish a new content edit with a reference on desktop than on mobile web. We observed a 6.2% increase on desktop in the proportion of distinct users that published a new content edit with a reference, compared to a 1% increase on mobile web. The slight increase on mobile web is not statistically significant.
By User Experience
Code
published_users_reference_byuserexp <- edit_check_publish_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%  # limit to new content edits where reference check shown
  group_by(experience_level_group, test_group) %>%
  summarise(n_users = n_distinct(user_id),
            n_users_wref = n_distinct(user_id[included_new_reference == 1])) %>%  # users that included a new reference
  mutate(prop_users = paste0(round(n_users_wref / n_users * 100, 1), "%"))
Code
dodge <- position_dodge(width = 0.9)
p <- published_users_reference_byuserexp %>%
  ggplot(aes(x = test_group, y = n_users_wref / n_users, fill = test_group)) +
  geom_col(position = 'dodge') +
  facet_grid(~experience_level_group) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(prop_users, "\n", n_users_wref, "\n users"), fontface = 2),
            vjust = 1.2, size = 7, color = "white") +
  labs(y = "Percent of users",
       title = "Users that published a new content edit with a reference by user type",
       caption = "Limited to users shown reference check and that published at least one new content edit") +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(panel.grid.minor = element_blank(), panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5), text = element_text(size = 20),
        axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        legend.position = "bottom", axis.line = element_line(colour = "black"))
p
By Partner Wikipedia
Code
published_users_reference_bywiki <- edit_check_publish_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>%  # limit to new content edits where reference check shown
  group_by(wiki, test_group) %>%
  summarise(n_users = n_distinct(user_id),
            n_users_wref = n_distinct(user_id[included_new_reference == 1 & was_reverted == 0])) %>%  # users that included a new reference in an unreverted edit
  mutate(prop_users = paste0(round(n_users_wref / n_users * 100, 1), "%")) %>%
  filter(!wiki %in% c('Afrikaans Wikipedia', 'Igbo Wikipedia', 'Swahili Wikipedia', 'Yoruba Wikipedia'))  # remove wikis with insufficient events
Code
dodge <- position_dodge(width = 0.9)
p <- published_users_reference_bywiki %>%
  ggplot(aes(x = test_group, y = n_users_wref / n_users, fill = test_group)) +
  geom_col(position = 'dodge') +
  facet_wrap(~wiki, nrow = 2) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste(prop_users, "\n", n_users_wref, "\n users"), fontface = 2),
            vjust = 1.2, size = 7, color = "white") +
  labs(y = "Percent of users",
       x = "Experiment Group",
       title = "Users that published a new content edit \n with a reference by partner Wikipedia",
       caption = "Limited to users shown reference check and that published at least one new content edit. \n Excludes wikis where sufficient events were not logged") +
  scale_fill_manual(values = c("#999999", "dodgerblue4"), name = "Experiment group") +
  theme(panel.grid.minor = element_blank(), panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5), text = element_text(size = 20),
        axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        legend.position = "bottom", axis.line = element_line(colour = "black"))
p
Key Insights
There was a 5.5% increase in the proportion of distinct users who published a new content edit with a reference when multi-check was available.
Distinct users were more likely to publish a new content edit with a reference on desktop than on mobile web: a 6.2% increase on desktop in the proportion of distinct users that published a new content edit with a reference, compared to a 1% increase on mobile web.
Increases were observed in the test group across all user types. Results vary by partner Wikipedia.
Secondary Metric 3a: Constructive Retention Rate
Hypothesis: Newcomers and Junior Contributors will be more likely to return to publish a new content edit in the future that includes a reference because Edit Check will have caused them to realize references are required when contributing new content to Wikipedia.
First, we reviewed the proportion of newcomers and Junior Contributors that publish an edit where Reference Check was shown and then return to make an unreverted edit to a main namespace. We reviewed the following retention timeframes: returns between 2 to 7 days (7-day retention) and 2 to 30 days (30-day retention).
seven_day_retention_overall_table <- seven_day_retention_overall %>%
  gt() %>%
  tab_header(title = "Constructive seven day retention rate") %>%
  cols_label(
    test_group = "Experiment group",
    return_editors = "Number of editors that returned",
    editors = "Number of eligible editors",
    retention_rate = "Retention rate"
  ) %>%
  opt_stylize(5) %>%
  tab_footnote(footnote = "Limited to users shown at least one reference check and that made an unreverted edit",
               locations = cells_column_labels(columns = 'retention_rate'))

display_html(as_raw_html(seven_day_retention_overall_table))

Constructive seven day retention rate

| Experiment group | Number of editors that returned | Number of eligible editors | Retention rate¹ |
|---|---|---|---|
| control (single check) | 170 | 6445 | 2.6% |
| test (multiple checks) | 164 | 6329 | 2.6% |

¹ Limited to users shown at least one reference check and that made an unreverted edit
thirty_day_retention_overall_table <- thirty_day_retention_overall %>%
  gt() %>%
  tab_header(title = "Constructive thirty day retention rate") %>%
  cols_label(
    test_group = "Experiment group",
    return_editors = "Number of editors that returned",
    editors = "Number of eligible editors",
    retention_rate = "Retention rate"
  ) %>%
  opt_stylize(5) %>%
  tab_footnote(footnote = "Limited to users shown at least one reference check and that made an unreverted edit",
               locations = cells_column_labels(columns = 'retention_rate'))

display_html(as_raw_html(thirty_day_retention_overall_table))

Constructive thirty day retention rate

| Experiment group | Number of editors that returned | Number of eligible editors | Retention rate¹ |
|---|---|---|---|
| control (single check) | 138 | 6445 | 2.1% |
| test (multiple checks) | 129 | 6329 | 2% |

¹ Limited to users shown at least one reference check and that made an unreverted edit
Secondary Metric 3b: Constructive Retention Rate with Reference Included
We also reviewed the proportion of users that publish an edit where reference check was shown and return to make a new content edit with a reference to a main namespace.
seven_day_retention_overall_table_wref <- seven_day_retention_overall_ref %>%
  gt() %>%
  tab_header(title = "Constructive seven day retention rate with reference included") %>%
  cols_label(
    test_group = "Experiment group",
    return_editors = "Number of editors that returned",
    editors = "Number of eligible editors",
    retention_rate = "Retention rate"
  ) %>%
  opt_stylize(5) %>%
  tab_footnote(footnote = "Limited to users shown at least one reference check and that made an unreverted edit",
               locations = cells_column_labels(columns = 'retention_rate'))

display_html(as_raw_html(seven_day_retention_overall_table_wref))

Constructive seven day retention rate with reference included

| Experiment group | Number of editors that returned | Number of eligible editors | Retention rate¹ |
|---|---|---|---|
| control (single check) | 170 | 6445 | 2.6% |
| test (multiple checks) | 164 | 6329 | 2.6% |

¹ Limited to users shown at least one reference check and that made an unreverted edit
thirty_day_retention_overall_table_wref <- thirty_day_retention_overall_wref %>%
  gt() %>%
  tab_header(title = "Constructive thirty day retention rate with reference included") %>%
  cols_label(
    test_group = "Experiment group",
    return_editors = "Number of editors that returned",
    editors = "Number of eligible editors",
    retention_rate = "Retention rate"
  ) %>%
  opt_stylize(5) %>%
  tab_footnote(footnote = "Limited to users shown at least one reference check and that made an unreverted edit",
               locations = cells_column_labels(columns = 'retention_rate'))

display_html(as_raw_html(thirty_day_retention_overall_table_wref))

Constructive thirty day retention rate with reference included

| Experiment group | Number of editors that returned | Number of eligible editors | Retention rate¹ |
|---|---|---|---|
| control (single check) | 138 | 6445 | 2.1% |
| test (multiple checks) | 129 | 6329 | 2% |

¹ Limited to users shown at least one reference check and that made an unreverted edit
Key Insights
We did not observe any statistically significant changes in seven-day or thirty-day constructive retention rates during the reviewed timeframe.
To review impacts to this metric in future experiments, we could extend experiment durations to obtain a larger sample size and review longer retention timeframes, such as second-month retention.
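To make the "larger sample size" point concrete, here is a rough per-group sample-size sketch (Python; the 20% relative lift is an assumed minimum detectable effect for illustration, not a figure from the report), using the standard two-proportion formula:

```python
# Hypothetical power calculation: per-group sample size needed to detect an
# assumed 20% relative lift in a 2.6% retention rate (alpha = 0.05 two-sided, 80% power).
from math import sqrt

p1 = 0.026                      # baseline retention rate, from the tables above
p2 = p1 * 1.20                  # assumed minimum detectable effect: +20% relative
z_alpha, z_beta = 1.96, 0.8416  # standard normal quantiles for alpha/2 and power

n_per_group = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
print(round(n_per_group))
```

With roughly 6,400 eligible editors per group in this test, detecting a lift of that size in a ~2.6% retention rate would require a substantially larger sample, consistent with the suggestion to extend experiment durations.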
Appendix: Guardrails
We reviewed a set of metrics to make sure that the introduction of multi-check was not negatively impacting users’ editing experience. Identified guardrails include: edit completion rate, user block rate after being shown reference check, and false positive rates.
Note: We also monitored edit revert rate and confirmed there were no significant increases in revert rate overall or at any number of reference checks presented. Please see the KPI 2 section above for the revert rate results.
Edit Completion Rate
While introducing multiple reference checks adds extra steps to the publishing workflow that may cause some decrease in edit completion rate, we want to ensure it does not cause significant disruption to contributors.
Methodology: We reviewed the proportion of edits by users that were shown Reference Check during their edit session and successfully published their edit (action = saveSuccess). The analysis is limited to edits that reached the point where Reference Check was presented at least once after the contributor indicated their intent to save (action = saveIntent).
Code
# load data for assessing edit completion rate
edit_completion_rates_data <- read.csv(
  file = 'data/edit_completion_rates_data_final.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
# Set fields and factor levels to assess number of checks shown
# Note: limited to 1 sidebar open as we're looking for cases where multiple checks
# were presented in a single sidebar (vs the user going back and forth)
edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(multiple_checks_shown = ifelse(n_checks_shown > 1 & n_sidebar_opens < 2, "multiple checks shown", "one check shown"),
         multiple_checks_shown = factor(multiple_checks_shown, levels = c("one check shown", "multiple checks shown")))

# note these buckets can be adjusted as needed based on distribution of data
edit_completion_rates_data <- edit_completion_rates_data %>%
  mutate(checks_shown_bucket = case_when(
           is.na(n_checks_shown) ~ '0',
           n_checks_shown == 1 | (n_checks_shown > 1 & n_sidebar_opens >= 2) ~ '1',
           n_checks_shown == 2 & n_sidebar_opens < 2 ~ '2',
           n_checks_shown > 2 & n_checks_shown <= 5 & n_sidebar_opens < 2 ~ "3-5",
           n_checks_shown > 5 & n_checks_shown <= 10 & n_sidebar_opens < 2 ~ "6-10",
           n_checks_shown > 10 & n_checks_shown <= 15 & n_sidebar_opens < 2 ~ "11-15",
           n_checks_shown > 15 & n_checks_shown <= 20 & n_sidebar_opens < 2 ~ "16-20",
           n_checks_shown > 20 & n_sidebar_opens < 2 ~ "over 20"
         ),
         checks_shown_bucket = factor(checks_shown_bucket,
                                      levels = c("0", "1", "2", "3-5", "6-10", "11-15", "16-20", "over 20")))
Code
# Remove two abnormal instances of multiple checks being shown within the control group
edit_completion_rates_data <- edit_completion_rates_data %>%
  filter(!(test_group == 'control (single check)' & multiple_checks_shown == "multiple checks shown"))

# two abnormal instances of ref checks being shown but no sidebar being logged as opened
edit_completion_rates_data <- edit_completion_rates_data %>%
  filter(!(ref_check_shown == 1 & is.na(multiple_checks_shown)))
Overall by experiment group
Code
edit_completion_rate_overall <- edit_completion_rates_data %>%
  filter(ref_check_shown == 1) %>%  # limit to sessions where reference check was shown
  group_by(test_group) %>%
  summarise(n_edits = n_distinct(editing_session),
            n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by experiment group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(gt::md('Limited to edit attempts shown at least one reference check'))

display_html(as_raw_html(edit_completion_rate_overall))
Edit completion rate by experiment group

| Experiment Group | Number of edit attempts shown reference check | Number of published edits | Proportion of edits saved |
|---|---|---|---|
| control (single check) | 11317 | 8559 | 75.6% |
| test (multiple checks) | 11244 | 8372 | 74.5% |

Limited to edit attempts shown at least one reference check
By whether multiple checks were shown
Code
edit_completion_rate_bymulti <- edit_completion_rates_data %>%
  filter(ref_check_shown == 1) %>%
  group_by(test_group, multiple_checks_shown) %>%
  summarise(n_edits = n_distinct(editing_session),
            n_saves = n_distinct(editing_session[saved_edit > 0])) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by if multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment group",
    multiple_checks_shown = "Multiple checks shown",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(gt::md('Limited to edit attempts shown at least one reference check'))

display_html(as_raw_html(edit_completion_rate_bymulti))
Edit completion rate by whether multiple checks were shown
Multiple checks shown
Number of edit attempts shown reference check
Number of published edits
Proportion of edits saved
control (single check)
one check shown
11317
8559
75.6%
test (multiple checks)
one check shown
8139
6062
74.5%
multiple checks shown
3105
2310
74.4%
Limited to edit attempts shown at least one reference check
By Number of Checks Shown
Code
edit_completion_rate_bynchecks <- edit_completion_rates_data %>%
  filter(ref_check_shown == 1) %>%
  group_by(checks_shown_bucket) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_saves = ifelse(n_saves < 50, "<50", n_saves)
  ) %>% # sanitizing per data publication guidelines
  select(-2) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by the number of reference checks shown") %>%
  opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of checks shown",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(gt::md('Limited to edits shown at least one reference check'))

display_html(as_raw_html(edit_completion_rate_bynchecks))
Edit completion rate by the number of reference checks shown
Number of checks shown
Number of published edits
Proportion of edits saved
1
14621
75.1%
2
821
80.7%
3-5
844
76.4%
6-10
406
69%
11-15
129
69.7%
16-20
<50
54.3%
over 20
66
50.8%
Limited to edits shown at least one reference check
By Platform
Code
edit_completion_rate_byplatform <- edit_completion_rates_data %>%
  filter(ref_check_shown == 1) %>%
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by experiment group and platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    platform = "Platform",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(gt::md('Limited to edit attempts shown at least one reference check'))

display_html(as_raw_html(edit_completion_rate_byplatform))
Edit completion rate by experiment group and platform
Experiment Group
Number of edit attempts shown reference check
Number of published edits
Proportion of edits saved
Desktop
control (single check)
6542
5239
80.1%
test (multiple checks)
6608
5154
78%
Mobile Web
control (single check)
4775
3320
69.5%
test (multiple checks)
4636
3218
69.4%
Limited to edit attempts shown at least one reference check
By User Experience
Code
edit_completion_rate_byuserstatus <- edit_completion_rates_data %>%
  filter(ref_check_shown == 1) %>%
  group_by(experience_level_group, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by experiment group and editor experience") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    experience_level_group = "Experience Level",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(gt::md('Limited to edit attempts shown at least one reference check'))

display_html(as_raw_html(edit_completion_rate_byuserstatus))
Edit completion rate by experiment group and editor experience
Experiment Group
Number of edit attempts shown reference check
Number of published edits
Proportion of edits saved
Unregistered
control (single check)
6337
4533
71.5%
test (multiple checks)
6303
4424
70.2%
Newcomer
control (single check)
1530
1128
73.7%
test (multiple checks)
1493
1100
73.7%
Junior Contributor
control (single check)
3450
2898
84%
test (multiple checks)
3448
2848
82.6%
Limited to edit attempts shown at least one reference check
By Partner Wikipedia
Code
edit_completion_rate_bywiki <- edit_completion_rates_data %>%
  filter(ref_check_shown == 1) %>%
  group_by(wiki, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    n_saves = n_distinct(editing_session[saved_edit > 0])
  ) %>%
  mutate(completion_rate = paste0(round(n_saves / n_edits * 100, 1), "%")) %>%
  filter(n_saves >= 100) %>%
  gt() %>%
  tab_header(title = "Edit completion rate by partner Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    wiki = "Wikipedia",
    n_edits = "Number of edit attempts shown reference check",
    n_saves = "Number of published edits",
    completion_rate = "Proportion of edits saved"
  ) %>%
  tab_source_note(gt::md('Limited to wikis with at least 100 published edits'))

display_html(as_raw_html(edit_completion_rate_bywiki))
Edit completion rate by partner Wikipedia
Test Group
Number of edit attempts shown reference check
Number of published edits
Proportion of edits saved
Arabic Wikipedia
control (single check)
1029
598
58.1%
test (multiple checks)
903
489
54.2%
Chinese Wikipedia
control (single check)
404
337
83.4%
test (multiple checks)
375
285
76%
French Wikipedia
control (single check)
2966
2449
82.6%
test (multiple checks)
3069
2491
81.2%
Italian Wikipedia
control (single check)
2439
1887
77.4%
test (multiple checks)
2419
1850
76.5%
Japanese Wikipedia
control (single check)
1004
733
73%
test (multiple checks)
940
718
76.4%
Portuguese Wikipedia
control (single check)
894
623
69.7%
test (multiple checks)
881
616
69.9%
Spanish Wikipedia
control (single check)
2329
1726
74.1%
test (multiple checks)
2359
1706
72.3%
Vietnamese Wikipedia
control (single check)
220
177
80.5%
test (multiple checks)
253
177
70%
Limited to wikis with at least 100 published edits
Key Insights
We did not observe any significant decreases in edit completion rate for users presented with multiple reference checks in a session. The edit completion rate for users presented multiple reference checks was 74% compared to 75% for users presented a single reference check.
Edit completion rates stay around 75% or higher for up to 5 checks shown within a single session. Beyond that, the edit completion rate decreases to about 69% for editing sessions shown between 6 and 15 checks. There were about 200 edit attempts where 16 or more reference checks were logged. Further investigation of these edits would help provide insight into the types of edits that cause a high number of checks to be presented.
We also did not observe any significant differences in edit completion rate by platform, editor experience level, or partner Wikipedia.
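As a quick illustration of this kind of comparison, a two-proportion test can be run directly on the counts reported in the platform table above. Note that prop.test is our illustrative choice here; the report does not state which test was used for its significance statements.

```r
# Two-proportion test of save rates on mobile web, using the counts from the
# platform table above (control: 3320 of 4775 saved; test: 3218 of 4636 saved).
# prop.test is an illustrative choice, not necessarily the test behind the
# significance statements in this report.
res <- prop.test(x = c(3320, 3218), n = c(4775, 4636))
res$p.value  # well above 0.05, consistent with no detectable platform difference
```

The same call can be repeated for any pair of rows in the tables above, since each comparison reduces to two save counts and two attempt counts.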
False Positive Rate
Methodology:
As an indicator of false positive rates, we reviewed the proportion of published new content edits that met all of the following requirements:
* People elected to dismiss adding a new reference. This was determined by edits where the user explicitly declined to add a reference at least once in a session (event.feature = 'editCheck-addReference' AND event.action = 'action-reject').
* No new reference was included in the final published new content edit (edits with the revision tag editcheck-newreference).
* The edit was not reverted within 48 hours.
Note: It’s possible that these edits should be reverted due to lack of citation but were not within 48 hours.
We also reviewed the proportion of checks presented (event.feature = 'editCheck-addReference' AND event.action = 'check-shown-presave') that were dismissed by the user, to understand the rate of reference check dismissal. This was determined by edits where the user declined to add a reference by explicitly selecting the decline option (event.feature = 'editCheck-addReference' AND event.action = 'action-reject').
Code
# load data for assessing edit reject frequency
edit_check_reject_data <- read.csv(
  file = 'data/edit_check_reject_data_final.tsv',
  header = TRUE,
  sep = "\t",
  stringsAsFactors = FALSE
)
# Set fields and factor levels to assess the number of checks shown.
# Note: limited to 1 sidebar open as we're looking for cases where multiple checks
# were presented in a single sidebar (vs the user going back and forth).
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    multiple_checks_shown = ifelse(
      n_checks_shown > 1 & n_sidebar_opens < 2,
      "multiple checks shown", "single check shown"
    ),
    multiple_checks_shown = factor(
      multiple_checks_shown,
      levels = c("single check shown", "multiple checks shown")
    )
  )

# Note: these buckets can be adjusted as needed based on the distribution of the data
edit_check_reject_data <- edit_check_reject_data %>%
  mutate(
    checks_shown_bucket = case_when(
      is.na(n_checks_shown) ~ '0',
      n_checks_shown == 1 | (n_checks_shown > 1 & n_sidebar_opens >= 2) ~ '1',
      n_checks_shown == 2 & n_sidebar_opens < 2 ~ '2',
      n_checks_shown > 2 & n_checks_shown <= 5 & n_sidebar_opens < 2 ~ "3-5",
      n_checks_shown > 5 & n_checks_shown <= 10 & n_sidebar_opens < 2 ~ "6-10",
      n_checks_shown > 10 & n_checks_shown <= 15 & n_sidebar_opens < 2 ~ "11-15",
      n_checks_shown > 15 & n_checks_shown <= 20 & n_sidebar_opens < 2 ~ "16-20",
      n_checks_shown > 20 & n_sidebar_opens < 2 ~ "over 20"
    ),
    checks_shown_bucket = factor(
      checks_shown_bucket,
      levels = c("0", "1", "2", "3-5", "6-10", "11-15", "16-20", "over 20")
    )
  )
Code
# Remove some small occurrences of abnormal data. Will investigate, but this is
# <0.001% of the data at the moment so it won't impact results.

# Remove one abnormal instance of multiple checks being shown within the control group
edit_check_reject_data <- edit_check_reject_data %>%
  filter(!(test_group == 'control (single check)' & multiple_checks_shown == "multiple checks shown"))

# Remove one abnormal instance of multiple reject actions being logged with no
# instances of checks being shown; relabel the n_rejects option
edit_check_reject_data <- edit_check_reject_data %>%
  filter(!(is.na(n_checks_shown) & n_rejects > 0)) %>%
  mutate(n_rejects = ifelse(n_checks_shown > 0 & is.na(n_rejects), 0, n_rejects))

# Two abnormal instances of ref checks being shown but no sidebar being logged as opened
edit_check_reject_data <- edit_check_reject_data %>%
  filter(!(was_edit_check_shown == 1 & is.na(multiple_checks_shown)))
Overall by experiment group
Proportion of all reference checks that are dismissed
Code
edit_check_dismissal_overall <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(test_group) %>%
  summarise(
    # Note: there are NAs for sessions that don't select; need to replace with 0
    n_checks_shown = sum(n_checks_shown),
    n_rejects = sum(n_rejects)
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_checks_shown * 100, 1), "%")) %>%
  gt() %>%
  opt_stylize(5) %>%
  tab_header(title = "Overall Reference Check dismissal rate") %>%
  cols_label(
    test_group = "Experiment Group",
    n_checks_shown = "Number of checks shown",
    n_rejects = "Number of reference checks dismissed",
    dismissal_rate = "Proportion of reference checks dismissed"
  ) %>%
  tab_source_note(gt::md('Limited to published edits'))

display_html(as_raw_html(edit_check_dismissal_overall))
Overall Reference Check dismissal rate
Experiment Group
Number of checks shown
Number of reference checks dismissed
Proportion of reference checks dismissed
control (single check)
9804
6496
66.3%
test (multiple checks)
24333
13343
54.8%
Limited to published edits
Proportion of new content edits where no reference is included and are not reverted
Code
# Method 2
edit_check_fp_overall <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # limit to new content edits without a reference
    n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0 & was_reverted == 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Unreverted new content edits where no reference was added") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits where at least one reference check was shown'))

display_html(as_raw_html(edit_check_fp_overall))
Unreverted new content edits where no reference was added
Experiment Group
Number of edits shown reference check
Number of edits that did not add at least one new reference
Proportion of edits where people elected to not add a reference
control (single check)
8448
3501
41.4%
test (multiple checks)
8258
3160
38.3%
Limited to published new content edits where at least one reference check was shown
By whether multiple checks were shown
Proportion of all reference checks that are dismissed
Code
edit_check_dismissal_ifmultiple <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(test_group, multiple_checks_shown) %>%
  summarise(
    # Note: there are NAs for sessions that don't select; need to replace with 0
    n_checks_shown = sum(n_checks_shown),
    n_rejects = sum(n_rejects)
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_checks_shown * 100, 1), "%")) %>%
  gt() %>%
  opt_stylize(5) %>%
  tab_header(title = "Overall Reference Check dismissal rate by whether multiple checks were shown") %>%
  cols_label(
    multiple_checks_shown = "Multiple checks shown",
    n_checks_shown = "Number of checks shown",
    n_rejects = "Number of reference checks dismissed",
    dismissal_rate = "Proportion of reference checks dismissed"
  ) %>%
  tab_source_note(gt::md('Limited to published edits'))

display_html(as_raw_html(edit_check_dismissal_ifmultiple))
Overall Reference Check dismissal rate by whether multiple checks were shown
Multiple checks shown
Number of checks shown
Number of reference checks dismissed
Proportion of reference checks dismissed
control (single check)
single check shown
9804
6496
66.3%
test (multiple checks)
single check shown
11737
5754
49%
multiple checks shown
12596
7589
60.2%
Limited to published edits
Proportion of new content edits where no reference is included and are not reverted
Code
edit_check_fp_bymultiple <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(test_group, multiple_checks_shown) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # limit to new content edits without a reference
    n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0 & was_reverted == 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Unreverted new content edits where no reference was added by whether multiple checks were shown") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    multiple_checks_shown = "Multiple Checks",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits where at least one reference check was shown'))

display_html(as_raw_html(edit_check_fp_bymultiple))
Unreverted new content edits without a reference by whether multiple checks were shown
Multiple Checks
Number of edits shown reference check
Number of edits that did not add at least one new reference
Proportion of edits where people elected to not add a reference
control (single check)
single check shown
8448
3501
41.4%
test (multiple checks)
single check shown
5988
2388
39.9%
multiple checks shown
2270
772
34%
Limited to published new content edits where at least one reference check was shown
By number of Reference Checks Shown
Proportion of all reference checks that are dismissed
Code
edit_check_dismissal_nchecks <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(checks_shown_bucket) %>%
  summarise(
    # Note: there are NAs for sessions that don't select; need to replace with 0
    n_checks_shown = sum(n_checks_shown),
    n_rejects = sum(n_rejects)
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_checks_shown * 100, 1), "%")) %>%
  gt() %>%
  opt_stylize(5) %>%
  tab_header(title = "Reference Check dismissal rate by number of checks shown") %>%
  cols_label(
    checks_shown_bucket = "Number of checks shown per edit",
    n_checks_shown = "Number of checks shown",
    n_rejects = "Number of reference checks dismissed",
    dismissal_rate = "Proportion of reference checks dismissed"
  ) %>%
  tab_source_note(gt::md('Limited to published edits'))

display_html(as_raw_html(edit_check_dismissal_nchecks))
Overall Reference Check dismissal rate by number of checks shown
Number of checks shown per edit
Number of checks shown
Number of reference checks dismissed
Proportion of reference checks dismissed
1
21541
12250
56.9%
2
1618
1013
62.6%
3-5
3108
1858
59.8%
6-10
2946
1546
52.5%
11-15
1598
916
57.3%
16-20
788
479
60.8%
over 20
2538
1777
70%
Limited to published edits
Proportion of new content edits where no reference is included and are not reverted
Code
edit_check_fp_bynchecks <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1 & n_sidebar_opens < 2) %>% # limit to where shown
  group_by(checks_shown_bucket) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # limit to new content edits without a reference
    n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0 & was_reverted == 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  ungroup() %>%
  mutate(
    n_edits = ifelse(n_edits < 50, "<50", n_edits),
    n_rejects = ifelse(n_rejects < 50, "<50", n_rejects)
  ) %>% # sanitizing per data publication guidelines
  gt() %>%
  tab_header(title = "Unreverted new content edits where no reference was added by number of checks shown") %>%
  opt_stylize(5) %>%
  cols_label(
    checks_shown_bucket = "Number of reference checks shown",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits where at least one reference check was shown'))

display_html(as_raw_html(edit_check_fp_bynchecks))
Unreverted new content edits without a reference by number of checks shown
Number of reference checks shown
Number of edits shown reference check
Number of edits that did not add at least one new reference
Proportion of edits where people elected to not add a reference
1
12924
5393
41.7%
2
809
299
37%
3-5
831
278
33.5%
6-10
396
133
33.6%
11-15
125
<50
24%
16-20
<50
<50
29.5%
over 20
65
<50
29.2%
Limited to published new content edits where at least one reference check was shown
By Platform
Proportion of all reference checks that are dismissed
Code
edit_check_dismissal_byplatform <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(platform, test_group) %>%
  summarise(
    # Note: there are NAs for sessions that don't select; need to replace with 0
    n_checks_shown = sum(n_checks_shown),
    n_rejects = sum(n_rejects)
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_checks_shown * 100, 1), "%")) %>%
  gt() %>%
  opt_stylize(5) %>%
  tab_header(title = "Reference Check dismissal rate by platform") %>%
  cols_label(
    test_group = "Experiment Group",
    n_checks_shown = "Number of checks shown",
    n_rejects = "Number of reference checks dismissed",
    dismissal_rate = "Proportion of reference checks dismissed"
  ) %>%
  tab_source_note(gt::md('Limited to published edits'))

display_html(as_raw_html(edit_check_dismissal_byplatform))
Reference Check dismissal rate by platform
Experiment Group
Number of checks shown
Number of reference checks dismissed
Proportion of reference checks dismissed
desktop
control (single check)
6068
3866
63.7%
test (multiple checks)
17191
9660
56.2%
phone
control (single check)
3736
2630
70.4%
test (multiple checks)
7142
3683
51.6%
Limited to published edits
Proportion of new content edits where no reference is included and are not reverted
Code
edit_check_fp_byplatform <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(platform, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # limit to new content edits without a reference
    n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0 & was_reverted == 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Unreverted new content edits where no reference was added by platform") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    platform = "Platform",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits where at least one reference check was shown'))

display_html(as_raw_html(edit_check_fp_byplatform))
Unreverted new content edits where no reference was added by platform
Experiment Group
Number of edits shown reference check
Number of edits that did not add at least one new reference
Proportion of edits where people elected to not add a reference
desktop
control (single check)
5165
2085
40.4%
test (multiple checks)
5075
1884
37.1%
phone
control (single check)
3283
1416
43.1%
test (multiple checks)
3183
1276
40.1%
Limited to published new content edits where at least one reference check was shown
By User Experience
Proportion of new content edits where no reference is included and are not reverted
Code
edit_check_fp_byuserstatus <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(experience_level_group, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # limit to new content edits without a reference
    n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0 & was_reverted == 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  gt() %>%
  tab_header(title = "Unreverted new content edits where no reference was added by user experience") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    experience_level_group = "User Status",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
  tab_source_note(gt::md('Limited to published new content edits where at least one reference check was shown'))

display_html(as_raw_html(edit_check_fp_byuserstatus))
Unreverted new content edits without a reference by user experience
Experiment Group
Number of edits shown reference check
Number of edits that did not add at least one new reference
Proportion of edits where people elected to not add a reference
Unregistered
control (single check)
4498
2058
45.8%
test (multiple checks)
4385
1877
42.8%
Newcomer
control (single check)
1104
382
34.6%
test (multiple checks)
1081
350
32.4%
Junior Contributor
control (single check)
2846
1061
37.3%
test (multiple checks)
2792
933
33.4%
Limited to published new content edits where at least one reference check was shown
By Partner Wikipedia
Proportion of new content edits where no reference is included and are not reverted
Code
edit_check_dismissal_bywiki <- edit_check_reject_data %>%
  filter(was_edit_check_shown == 1 & is_new_content == 1) %>% # limit to where shown
  group_by(wiki, test_group) %>%
  summarise(
    n_edits = n_distinct(editing_session),
    # limit to new content edits without a reference
    n_rejects = n_distinct(editing_session[n_rejects > 0 & included_new_reference == 0 & was_reverted == 0])
  ) %>%
  mutate(dismissal_rate = paste0(round(n_rejects / n_edits * 100, 1), "%")) %>%
  # remove wikis with insufficient events
  filter(!wiki %in% c('Afrikaans Wikipedia', 'Igbo Wikipedia', 'Swahili Wikipedia', 'Yoruba Wikipedia')) %>%
  filter(n_rejects > 65) %>% # remove wikis with too few edits
  gt() %>%
  tab_header(title = "Unreverted new content edits where no reference was added by partner Wikipedia") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Experiment Group",
    wiki = "Wikipedia",
    n_edits = "Number of edits shown reference check",
    n_rejects = "Number of edits that did not add at least one new reference",
    dismissal_rate = "Proportion of edits where people elected to not add a reference"
  ) %>%
  tab_source_note(gt::md('Limited to published edits where at least one reference check was shown. Excludes wikis where insufficient events were logged'))

display_html(as_raw_html(edit_check_dismissal_bywiki))
Unreverted new content edits where no reference was added by partner Wikipedia
Experiment Group
Number of edits shown reference check
Number of edits that did not add at least one new reference
Proportion of edits where people elected to not add a reference
Arabic Wikipedia
control (single check)
586
214
36.5%
test (multiple checks)
474
140
29.5%
Chinese Wikipedia
control (single check)
333
150
45%
test (multiple checks)
279
125
44.8%
French Wikipedia
control (single check)
2417
1050
43.4%
test (multiple checks)
2469
955
38.7%
Italian Wikipedia
control (single check)
1875
861
45.9%
test (multiple checks)
1834
813
44.3%
Japanese Wikipedia
control (single check)
725
378
52.1%
test (multiple checks)
703
344
48.9%
Portuguese Wikipedia
control (single check)
609
155
25.5%
test (multiple checks)
603
163
27%
Spanish Wikipedia
control (single check)
1698
605
35.6%
test (multiple checks)
1682
528
31.4%
Vietnamese Wikipedia
control (single check)
176
70
39.8%
test (multiple checks)
175
75
42.9%
Limited to published edits where at least one reference check was shown. Excludes wikis where insufficient events were logged
Key Insights
We did not identify any increases in false positive rate.
In the test group, users declined to add a reference in 38% of unreverted new content edits, compared to 41% in the control group. Additionally, there was no increase in the proportion of reference checks dismissed by users who published a new content edit: 66% of reference checks in the control group were dismissed compared to 55% in the test group.
When limited to edits presented multiple checks, the proportion of individual checks dismissed is higher: in the test group, 60% of reference checks presented in multi-check sessions were dismissed, compared to 49% of checks presented in single-check sessions. However, we observed a decrease in the proportion of unreverted new content edits where users declined to add a reference among users shown multiple checks.
There were no significant increases in false positive rates or reference check dismissal rates by user type, platform, or partner Wikipedia.
Block Rates
Methodology: We gathered all edits where an edit check was shown from the mediawiki_revision_change_tag table and joined them with mediawiki_private_cu_changes to gather user name information. We then reviewed both global and local blocks made within 6 hours of a published edit where a reference check was shown, as identified in the logging table.
Note: At the time of this analysis, block data for May was unavailable, so the analysis is limited to blocks that occurred between 25 March 2025 and 30 April 2025.
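The 6-hour block window described above can be sketched as follows. The published code only loads a pre-computed extract, so the toy data frames and column names here (user, edit_ts, block_ts) are hypothetical stand-ins; the real analysis queries mediawiki_revision_change_tag and mediawiki_private_cu_changes.

```r
# Illustrative sketch of the block-rate join described above, on toy data.
# All column names are hypothetical; the real analysis runs against the
# mediawiki_revision_change_tag and mediawiki_private_cu_changes tables.
edits <- data.frame(
  user    = c("A", "B"),
  edit_ts = as.POSIXct(c("2025-04-01 10:00", "2025-04-01 11:00"), tz = "UTC")
)
blocks <- data.frame(
  user     = c("A", "B"),
  block_ts = as.POSIXct(c("2025-04-01 12:00", "2025-04-02 11:00"), tz = "UTC")
)

# A user counts as blocked if a block lands within 6 hours after their edit
joined <- merge(edits, blocks, by = "user")
blocked_within_6h <- subset(
  joined,
  block_ts >= edit_ts &
    as.numeric(difftime(block_ts, edit_ts, units = "hours")) <= 6
)

blocked_within_6h$user  # only user A's block falls inside the 6-hour window
```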
Code
# load data for assessing blocks
edit_check_blocks <- read.csv(
  file = 'data/edit_check_eligible_users_blocked.csv',
  header = TRUE,
  sep = ",",
  stringsAsFactors = FALSE
)
Code
# rename the experiment field to clarify
edit_check_blocks <- edit_check_blocks %>%
  mutate(
    test_group = factor(
      bucket,
      levels = c("2025-03-editcheck-multicheck-reference-control", "2025-03-editcheck-multicheck-reference-test"),
      labels = c("control (single check)", "test (multiple checks)")
    )
  )
Code
edit_check_local_blocks_overall <- edit_check_blocks %>%
  group_by(test_group) %>%
  summarise( # look at blocks
    blocked_users = n_distinct(cuc_ip[is_local_blocked == 'True' | is_global_blocked == 'True']),
    all_users = n_distinct(cuc_ip)
  ) %>%
  mutate(prop_blocks = paste0(round(blocked_users / all_users * 100, 1), "%")) %>%
  select(-c(2, 3)) %>% # removing granular data columns
  gt() %>%
  tab_header(title = "Proportion of users blocked by experiment group") %>%
  opt_stylize(5) %>%
  cols_label(
    test_group = "Test Group",
    prop_blocks = "Proportion of users blocked"
  ) %>%
  tab_source_note(gt::md('Limited to users blocked within 6 hours of publishing an edit where a reference check was shown'))

display_html(as_raw_html(edit_check_local_blocks_overall))
Proportion of users blocked by experiment group
Test Group
Proportion of users blocked
control (single check)
2.9%
test (multiple checks)
3.7%
Limited to users blocked within 6 hours of publishing an edit where a reference check was shown
Key Insights
3.3% of users were blocked within 6 hours of publishing an edit where at least one reference check was shown. By experiment group, 3.7% of users were blocked in the test group compared to 2.9% in the control group. This difference is not statistically significant, and in both groups the blocks were limited to edits by unregistered users.
No global blocks were issued to any users that published an edit where at least one reference check was shown.