Estimate usage trends for MediaWiki
Open, Medium, Public

Description

How did the number of MediaWiki installations and their size and user base change over time?

Event Timeline

First attempt, based on WikiApiary data:

tgr@pub2:~$ echo -e "date\tsites\tpages\t\tarticles\tusers\tactive users\tviews"; for i in {0..10}; do echo -ne "`date -d "-$i years" +'%Y-%m-%d'`\t"; sudo mysql apiary -sse "select count(*), sum(cast(pages as unsigned)), sum(cast(articles as unsigned)), sum(cast(users as unsigned)), sum(activeusers), sum(cast(views as unsigned)) from statistics join (select website_id, max(capture_date) capture_date from statistics where capture_date < now() - interval $i year group by website_id) latest using (website_id, capture_date);"; done
date	sites	pages		articles	users	active users	views
2018-02-07	29103	9464173880548373900	9246875197677622383	9235334728272838330	17419973	9223372124763959384
2017-02-07	28811	9464173880466980087	9246875197620677257	9235334727757979941	16959587	9223372126081911322
2016-02-07	28329	959049823	178053412	3279723137	16707148	87911590818
2015-02-07	27589	865377743	162263307	4976054630	15543342	88137137067
2014-02-07	13599	520884470	119218721	3299564804	9452508	52891855356
2013-02-07	767	127965022	34847710	108402622	567830	4695668932
2012-02-07	0	NULL	NULL	NULL	NULL	NULL
2011-02-07	0	NULL	NULL	NULL	NULL	NULL
2010-02-07	0	NULL	NULL	NULL	NULL	NULL
2009-02-07	0	NULL	NULL	NULL	NULL	NULL
2008-02-07	0	NULL	NULL	NULL	NULL	NULL

The site numbers look reasonable (of course these are only the sites WikiApiary knew about, not the total number, but the 2015+ data looks plausible); the other columns, not so much.

Note that the site count doesn't match the number shown on the WikiApiary front page (currently 26K).
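
The 2017 and 2018 totals for views sit just above 2^63 - 1 = 9223372036854775807, the largest signed 64-bit integer. One possible reading (an assumption, not verified against the apiary database) is that a single captured row holding that value entered the sum after 2016. The sketch below subtracts 2^63 - 1 from the reported totals to see what would be left. The helper name `residual` is made up for illustration, and it subtracts only the low 12 digits because shell and awk arithmetic cannot hold the full values exactly.

```shell
# Assumption: one captured row equal to 2^63 - 1 entered sum(views) after 2016.
int64_max=9223372036854775807

# residual TOTAL: print TOTAL - int64_max.
# Both values must be 19 digits long and share their first 7 digits, so
# only the remaining 12 digits are subtracted; that stays well inside the
# range awk can represent exactly.
residual() {
    awk -v total="$1" -v max="$int64_max" 'BEGIN {
        if (length(total) != 19 || substr(total, 1, 7) != substr(max, 1, 7)) exit 1
        printf "%.0f\n", substr(total, 8) - substr(max, 8)
    }'
}

residual 9223372124763959384   # views total reported for 2018-02-07
residual 9223372126081911322   # views total reported for 2017-02-07
```

This prints 87909183577 and 89227135515, both in the same range as the 2016 total of 87911590818, so the views column would be consistent with one such row. The same subtraction leaves residuals on the order of 10^16 or more for pages, articles and users, so this is at best a partial explanation.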

Pingback:

tgr@stat1006:~$ echo -e "date\tsites"; for i in {0..12}; do echo -ne "`date -d "-$i months" +'%Y-%m-%d'`\t"; mysql -hanalytics-slave.eqiad.wmnet log -Asse "select count(*) from MediaWikiPingback_15781718 join (select max(id) id from MediaWikiPingback_15781718 where timestamp < date_format(now() - interval $i month, '%Y%m%d%H%i%s') group by wiki) latest using (id);"; done
date	sites
2018-02-07	40096
2018-01-07	35308
2017-12-07	31257
2017-11-07	26451
2017-10-07	22130
2017-09-07	18337
2017-08-07	14252
2017-07-07	10462
2017-06-07	6837
2017-05-07	4401
2017-04-07	1025
2017-03-07	0
2017-02-07	0

Not really sure what to make of this. Pingback is off by default in existing installations and the setting is only exposed in the web installer; does that mean 40K new MediaWiki installs in the last year? Or maybe some wikifarms enabled it on their old installs? Also, the cutoff date is weird: Pingback was released with 1.28 (November 2016).
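
To judge whether the growth looks like a steady stream of new installs or a one-off bulk enablement, it may help to look at the month-over-month increase rather than the running total. A minimal sketch, assuming the table above is fed in newest-first; `monthly_increase` is a hypothetical helper, not part of any existing tooling:

```shell
# monthly_increase: read "date sites" rows (newest first, as in the output
# above) and print each date with the increase over the row that follows it.
monthly_increase() {
    awk '{ date[NR] = $1; sites[NR] = $2 }
         END {
             for (i = 1; i < NR; i++)
                 printf "%s\t%d\n", date[i], sites[i] - sites[i + 1]
         }'
}

monthly_increase <<'EOF'
2018-02-07 40096
2018-01-07 35308
2017-12-07 31257
2017-11-07 26451
2017-10-07 22130
2017-09-07 18337
2017-08-07 14252
2017-07-07 10462
2017-06-07 6837
2017-05-07 4401
2017-04-07 1025
2017-03-07 0
EOF
```

On these figures the increase is between roughly 2,400 and 4,800 sites per month from May 2017 onward, which reads more like steady accumulation than a single bulk event, though that is only one interpretation.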

Pingback includes things like setting up MediaWiki on a developer machine, which might be a reason for the inflated number. Number of sites with a given number of pings (MediaWiki version upgrades):

tgr@stat1006:/srv/home/tgr$ echo -e "pings\tsites"; for i in {1..10}; do echo -ne "$i\t"; mysql -hanalytics-slave.eqiad.wmnet log -Asse "select count(*) from (select wiki, count(*) pings from MediaWikiPingback_15781718  group by wiki) wikis where pings = $i;"; done
pings   sites
1       38708
2       1067
3       244
4       125
5       59
6       24
7       7
8       3
9       2
10      1

Pingback was released in 1.28 (the current stable release is 1.30, the last LTS release is 1.27, the oldest supported non-LTS release is 1.29) so the sites with one ping could well be proper ones that installed 1.29 or 1.30 and saw no reason to update since then... we'll need a few more years for the data to be really useful for identifying "stable" sites.
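
The distribution above can be condensed into one figure: the share of sites that have only ever sent a single ping. A rough sketch, assuming the ten rows above as input (sites with more than 10 pings are not in that table and so are ignored); `ping_summary` is a name made up for this example:

```shell
# ping_summary: read "pings sites" rows and print the total number of sites,
# the number with exactly one ping, and that share as a percentage.
ping_summary() {
    awk '{ total += $2; if ($1 == 1) single = $2 }
         END {
             if (total == 0) exit 1
             printf "%d %d %.1f\n", total, single, 100 * single / total
         }'
}

ping_summary <<'EOF'
1 38708
2 1067
3 244
4 125
5 59
6 24
7 7
8 3
9 2
10 1
EOF
```

For the table above this prints `40240 38708 96.2`: about 96% of the listed sites have pinged exactly once, leaving 1,532 sites with more than one ping.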

Most installations are stable versions (not master, which would be typical for a developer setup):

MariaDB [log]> select sum(event_MediaWiki rlike '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+$') stable, sum(event_MediaWiki rlike '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+-alpha$') dev, sum(event_MediaWiki rlike '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+-wmf\.[[:digit:]]+$') wmf, sum(event_MediaWiki rlike '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+-rc\.[[:digit:]]+$') rc from MediaWikiPingback_15781718;
+--------+------+------+------+
| stable | dev  | wmf  | rc   |
+--------+------+------+------+
|  43541 |  957 |  141 |  101 |
+--------+------+------+------+
1 row in set (0.09 sec)
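
The same four patterns can be tried outside the database, for instance to check how a particular version string would be bucketed. A small sketch, assuming the regular expressions from the query above; the function name `classify_version` and the sample version strings are illustrative only:

```shell
# classify_version VERSION: print the bucket the query above would count
# VERSION in, or "other" if none of its four patterns match.
classify_version() {
    v=$1
    if printf '%s\n' "$v" | grep -Eq '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+$'; then
        echo stable
    elif printf '%s\n' "$v" | grep -Eq '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+-alpha$'; then
        echo dev
    elif printf '%s\n' "$v" | grep -Eq '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+-wmf\.[[:digit:]]+$'; then
        echo wmf
    elif printf '%s\n' "$v" | grep -Eq '^[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+-rc\.[[:digit:]]+$'; then
        echo rc
    else
        echo other
    fi
}

# Hypothetical sample inputs, one per bucket.
for v in 1.30.0 1.31.0-alpha 1.31.0-wmf.20 1.30.0-rc.0 1.30; do
    printf '%s\t%s\n' "$v" "$(classify_version "$v")"
done
```

Two caveats about the query itself: a version string matching none of the patterns is not counted in any of the four columns, and the sums count rows in the table (pings) rather than distinct wikis.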

Pingback:
...
Also, the cutoff date is weird: Pingback was released with 1.28 (November 2016).

The older data is in a different table: MediaWikiPingback_15781718_15423246. Querying that table for the missing months gives:

2017-04-03	12622
2017-03-03	9776
2017-02-03	6577
2017-01-03	3315
2016-12-03	688
2016-11-03	192
2016-10-03	139
2016-09-03	53
2016-08-03	13
2016-07-03	0

@Tgr Is this a permanent task or can it be closed?

I don't think it's permanent, but it should be redone with more data:

Pingback was released in 1.28 (the current stable release is 1.30, the last LTS release is 1.27, the oldest supported non-LTS release is 1.29) so the sites with one ping could well be proper ones that installed 1.29 or 1.30 and saw no reason to update since then... we'll need a few more years for the data to be really useful for identifying "stable" sites.