Commit c2080a5

RickiJay-WMDE and rti authored
Very Active Users (#113)
* Add Columns
* Populate Columns
* DB Columns
* Tests
* More Columns
* Output
* Tests!
* Documentation
* Lint
* test: the active user/bot counting (#114)
* Aggregate
* Correction
* Lint

---------

Co-authored-by: Robert Timm <rti@users.noreply.github.com>
1 parent 41bf76a commit c2080a5

24 files changed

+575 -144 lines changed
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+"""active_users
+
+Revision ID: 9cbd76e2a3af
+Revises: 24affa01347e
+Create Date: 2025-09-03 18:26:01.647408
+
+"""
+
+from typing import Sequence, Union
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision: str = "9cbd76e2a3af"
+down_revision: Union[str, None] = "24affa01347e"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table("wikibase_log_observation_month") as batch_op:
+        batch_op.add_column(
+            sa.Column("user_count_five_plus", sa.Integer(), nullable=True)
+        )
+        batch_op.add_column(
+            sa.Column("user_count_no_bot_five_plus", sa.Integer(), nullable=True)
+        )
+    with op.batch_alter_table("wikibase_log_observation_month_type") as batch_op:
+        batch_op.add_column(
+            sa.Column("user_count_five_plus", sa.Integer(), nullable=True)
+        )
+        batch_op.add_column(
+            sa.Column("user_count_no_bot_five_plus", sa.Integer(), nullable=True)
+        )
+    with op.batch_alter_table("wikibase_log_observation_month_user") as batch_op:
+        batch_op.add_column(
+            sa.Column("user_count_five_plus", sa.Integer(), nullable=True)
+        )
+    with op.batch_alter_table("wikibase_recent_changes_observation") as batch_op:
+        batch_op.add_column(
+            sa.Column("human_change_user_count_five_plus", sa.Integer(), nullable=True)
+        )
+        batch_op.add_column(
+            sa.Column("bot_change_user_count_five_plus", sa.Integer(), nullable=True)
+        )
+    # ### end Alembic commands ###
+
+
+def downgrade() -> None:
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table("wikibase_recent_changes_observation") as batch_op:
+        batch_op.drop_column("bot_change_user_count_five_plus")
+        batch_op.drop_column("human_change_user_count_five_plus")
+    with op.batch_alter_table("wikibase_log_observation_month_user") as batch_op:
+        batch_op.drop_column("user_count_five_plus")
+    with op.batch_alter_table("wikibase_log_observation_month_type") as batch_op:
+        batch_op.drop_column("user_count_no_bot_five_plus")
+        batch_op.drop_column("user_count_five_plus")
+    with op.batch_alter_table("wikibase_log_observation_month") as batch_op:
+        batch_op.drop_column("user_count_no_bot_five_plus")
+        batch_op.drop_column("user_count_five_plus")
+    # ### end Alembic commands ###
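
Review note: every column added in `upgrade()` is `nullable=True`, so rows written before this revision carry no value for them. A minimal sketch of that effect, using only the standard-library `sqlite3` module rather than Alembic itself. The table name comes from the migration; the `id` and `log_count` columns and the sample row are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the real one. The schema is a
# hypothetical, cut-down version of wikibase_log_observation_month.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wikibase_log_observation_month "
    "(id INTEGER PRIMARY KEY, log_count INTEGER)"
)
conn.execute("INSERT INTO wikibase_log_observation_month (log_count) VALUES (12)")

# Plain-SQL counterpart of the two add_column calls for this table.
for column in ("user_count_five_plus", "user_count_no_bot_five_plus"):
    conn.execute(
        f"ALTER TABLE wikibase_log_observation_month ADD COLUMN {column} INTEGER"
    )

# The pre-existing row has no value for the new columns.
old_row = conn.execute(
    "SELECT log_count, user_count_five_plus, user_count_no_bot_five_plus "
    "FROM wikibase_log_observation_month"
).fetchone()
print(old_row)  # (12, None, None)
```

Anything reading these columns should presumably tolerate NULL for older observations until they are collected again or backfilled.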

data/wikibase-data.db

0 Bytes
Binary file not shown.

data/wikibase-test-data.db

0 Bytes
Binary file not shown.

docs/data-list.md

Lines changed: 193 additions & 56 deletions
@@ -207,17 +207,16 @@ Data abbreviated for brevity.
 
 ### Log Observations:
 
-Using the Action API, we query for the first log and the last 30 days'.
-
-- First Log:
-  - Date
-- Last Log:
-  - Date
-  - User Type: Bot, Missing, None, User
-- Last Month:
-  - All Users: Count distinct users
-  - Human Users: Count distinct (probably) human users
-  - Log Count
+Using the Action API, we query separately for the first month of logs and the last 30 days'. We update the first month annually and the last month approximately monthly.
+
+We calculate the following:
+
+- `logCount`: Number of logs in range (if any)
+- Date of first and last logs in range (if any); user type (bot or human) of the last log
+- number of users with at least 1 (`allUsers`) and at least 5 (`activeUsers`) records in the logs
+- number of users we think are human with at least 1 (`humanUsers`) and at least 5 (`activeHumanUsers`) records in the logs
+- We categorize the logs into types and save similar information broken out by log type. There are currently 76 log types, plus an UNCLASSIFIED catch-all, ranging from ITEM_CREATE to ACHIEVEMENT_BADGE
+- We categorize the users into BOT, MISSING, HUMAN, or NONE, and save similar information broken out by user type
 
 #### Example:
 
@@ -227,55 +226,187 @@ Query:
 query MyQuery {
   wikibase(wikibaseId: 10) {
     logObservations {
-      mostRecent {
-        id
-        observationDate
-        returnedData
-        firstLog {
-          date
-        }
-        lastLog {
-          date
-          userType
+      firstMonth {
+        mostRecent {
+          ...WikibaseLogMonthFragment
         }
-        lastMonth {
-          allUsers
-          humanUsers
-          logCount
+      }
+      lastMonth {
+        mostRecent {
+          ...WikibaseLogMonthFragment
         }
       }
     }
   }
 }
+
+fragment WikibaseLogMonthFragment on WikibaseLogMonth {
+  id
+  observationDate
+  returnedData
+  firstLog {
+    date
+  }
+  lastLog {
+    date
+    userType
+  }
+  logCount
+  allUsers
+  activeUsers
+  humanUsers
+  activeHumanUsers
+  logTypeRecords {
+    id
+    logType
+    logCount
+    firstLogDate
+    lastLogDate
+    allUsers
+    activeUsers
+    humanUsers
+    activeHumanUsers
+  }
+  userTypeRecords {
+    id
+    userType
+    logCount
+    firstLogDate
+    lastLogDate
+    allUsers
+    activeUsers
+  }
+}
 ```
 
 Result:
 
 ```json
 {
-    "data": {
-        "wikibase": {
-            "logObservations": {
-                "mostRecent": {
-                    "id": "39",
-                    "observationDate": "2024-07-03T21:18:08",
-                    "returnedData": true,
-                    "firstLog": {
-                        "date": "2021-03-19T09:20:21"
-                    },
-                    "lastLog": {
-                        "date": "2024-07-03T14:45:01",
-                        "userType": "BOT"
-                    },
-                    "lastMonth": {
-                        "allUsers": 2,
-                        "humanUsers": 1,
-                        "logCount": 387
-                    }
-                }
-            }
-        }
-    }
+  "data": {
+    "wikibase": {
+      "logObservations": {
+        "firstMonth": {
+          "mostRecent": {
+            "id": "229",
+            "observationDate": "2024-08-08T11:29:46",
+            "returnedData": true,
+            "firstLog": {
+              "date": "2021-03-19T09:20:21"
+            },
+            "lastLog": {
+              "date": "2021-04-13T15:07:57",
+              "userType": null
+            },
+            "logCount": 12,
+            "allUsers": 4,
+            "activeUsers": null,
+            "humanUsers": 3,
+            "activeHumanUsers": null,
+            "logTypeRecords": [
+              ...
+              {
+                "id": "600",
+                "logType": "ITEM_CREATE",
+                "logCount": 1,
+                "firstLogDate": "2021-03-20T10:36:45",
+                "lastLogDate": "2021-03-20T10:36:45",
+                "allUsers": 1,
+                "activeUsers": 0,
+                "humanUsers": 0,
+                "activeHumanUsers": 0
+              },
+              {
+                "id": "601",
+                "logType": "PROPERTY_CREATE",
+                "logCount": 5,
+                "firstLogDate": "2021-03-22T19:56:32",
+                "lastLogDate": "2021-04-13T15:07:57",
+                "allUsers": 1,
+                "activeUsers": 1,
+                "humanUsers": 0,
+                "activeHumanUsers": 0
+              },
+              {
+                "id": "602",
+                "logType": "USER_CREATE",
+                "logCount": 4,
+                "firstLogDate": "2021-03-19T09:20:21",
+                "lastLogDate": "2021-03-24T08:35:41",
+                "allUsers": 3,
+                "activeUsers": 0,
+                "humanUsers": 3,
+                "activeHumanUsers": 0
+              }
+            ],
+            "userTypeRecords": [
+              {
+                "id": "152",
+                "userType": "BOT",
+                "logCount": 6,
+                "firstLogDate": "2021-03-20T10:36:45",
+                "lastLogDate": "2021-04-13T15:07:57",
+                "allUsers": 1,
+                "activeUsers": 1
+              },
+              {
+                "id": "153",
+                "userType": "USER",
+                "logCount": 6,
+                "firstLogDate": "2021-03-19T09:20:21",
+                "lastLogDate": "2021-04-07T11:37:37",
+                "allUsers": 3,
+                "activeUsers": 0
+              }
+            ]
+          }
+        },
+        "lastMonth": {
+          "mostRecent": {
+            "id": "230",
+            "observationDate": "2024-08-08T11:29:46",
+            "returnedData": true,
+            "firstLog": {
+              "date": "2024-07-09T11:20:15"
+            },
+            "lastLog": {
+              "date": "2024-07-26T11:30:36",
+              "userType": "BOT"
+            },
+            "logCount": 81,
+            "allUsers": 1,
+            "activeUsers": 1,
+            "humanUsers": 0,
+            "activeHumanUsers": 0,
+            "logTypeRecords": [
+              {
+                "id": "603",
+                "logType": "ITEM_CREATE",
+                "logCount": 81,
+                "firstLogDate": "2024-07-09T11:20:15",
+                "lastLogDate": "2024-07-26T11:30:36",
+                "allUsers": 1,
+                "activeUsers": 1,
+                "humanUsers": 0,
+                "activeHumanUsers": 0
+              }
+            ],
+            "userTypeRecords": [
+              {
+                "id": "154",
+                "userType": "BOT",
+                "logCount": 81,
+                "firstLogDate": "2024-07-09T11:20:15",
+                "lastLogDate": "2024-07-26T11:30:36",
+                "allUsers": 1,
+                "activeUsers": 1
+              }
+            ]
+          }
+        }
+      }
+    }
+  }
 }
 ```
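
Review note: the Log Observations text in this file defines `allUsers` / `humanUsers` as users with at least 1 record and `activeUsers` / `activeHumanUsers` as users with at least 5. A minimal sketch of that counting rule, not the project's actual implementation; the `(user name, is_bot)` record shape, the `count_users` helper and the sample data are all hypothetical.

```python
from collections import Counter

# Hypothetical log records as (user name, is_bot) pairs; the real
# pipeline's record type is not shown in this commit.
logs = (
    [("ImportBot", True)] * 6
    + [("Alice", False)] * 3
    + [("Bob", False)] * 2
    + [("Carol", False)]
)

ACTIVE_THRESHOLD = 5  # "at least 5 records", per the documentation text


def count_users(records, minimum=1):
    """Count distinct users with at least `minimum` records."""
    per_user = Counter(name for name, _ in records)
    return sum(1 for n in per_user.values() if n >= minimum)


human_logs = [record for record in logs if not record[1]]

summary = {
    "logCount": len(logs),
    "allUsers": count_users(logs),
    "activeUsers": count_users(logs, ACTIVE_THRESHOLD),
    "humanUsers": count_users(human_logs),
    "activeHumanUsers": count_users(human_logs, ACTIVE_THRESHOLD),
}
print(summary)
```

With this made-up data the bot is the only user reaching the threshold, so `activeUsers` is 1 while `activeHumanUsers` is 0.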

@@ -1374,14 +1505,16 @@ Result:
 
 Get the list of recent changes from a Wikibase instance. The [Recent Changes MediaWiki API is documented here](https://www.mediawiki.org/wiki/API:RecentChanges). A Recent Changes Observation is always calculated for the past 30 days. All change types are included; which are: `edit`, `new`, `log` and `categorize`. An Observation contains the following fields:
 
-| Field | Description |
-| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `human_change_count` | Number of changes made by humans as reported by the MediaWiki Recent Changes API when called with the `!bot` flag. |
-| `human_change_user_count` | Number of unique users found in changes requested with `!bot` flag, derived from all usernames, IP addresses for anonymous edits as well as userid in the "userhidden case". |
-| `bot_change_count` | Number of changes made by bots as reported by the MediaWiki Recent Changes API when called with the `bot` flag. |
-| `bot_change_user_count` | Number of unique bots found in changes requested with `bot` flag, derived from all bot/usernames. |
-| `first_change_date` | Date of first change, no matter if it was made by a human or bot. |
-| `last_change_date` | Date of last change, no matter if it was made by a human or bot. |
+| Field | Description |
+| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `human_change_count` | Number of changes made by humans as reported by the MediaWiki Recent Changes API when called with the `!bot` flag. |
+| `human_change_user_count` | Number of unique users found in changes requested with `!bot` flag, derived from all usernames, IP addresses for anonymous edits as well as userid in the "userhidden case". |
+| `human_change_active_user_count` | Number of unique users with at least 5 records found in changes requested with `!bot` flag, derived from all usernames, IP addresses for anonymous edits as well as userid in the "userhidden case". |
+| `bot_change_count` | Number of changes made by bots as reported by the MediaWiki Recent Changes API when called with the `bot` flag. |
+| `bot_change_user_count` | Number of unique bots found in changes requested with `bot` flag, derived from all bot/usernames. |
+| `bot_change_active_user_count` | Number of unique bots with at least 5 records found in changes requested with `bot` flag, derived from all bot/usernames. |
+| `first_change_date` | Date of first change, no matter if it was made by a human or bot. |
+| `last_change_date` | Date of last change, no matter if it was made by a human or bot. |
 
 #### Example:
 
@@ -1396,8 +1529,10 @@ query MyQuery {
 observationDate
 humanChangeCount
 humanChangeUserCount
+humanChangeActiveUserCount
 botChangeCount
 botChangeUserCount
+botChangeActiveUserCount
 firstChangeDate
 lastChangeDate
 returnedData
@@ -1419,8 +1554,10 @@ Result:
 "observationDate": "2025-07-29T13:36:00",
 "humanChangeCount": 2302,
 "humanChangeUserCount": 1,
+"humanChangeActiveUserCount": 1,
 "botChangeCount": 4,
 "botChangeUserCount": 1,
+"botChangeActiveUserCount": 1,
 "firstChangeDate": "2025-07-03T08:31:51",
 "lastChangeDate": "2025-07-25T15:41:47",
 "returnedData": true
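
Review note: the field table describes the user counts as derived from usernames, IP addresses for anonymous edits, and the user id in the "userhidden" case. A rough sketch of how such an identity key and the two counts could be derived. The record shape (dicts with optional `user`, `userid` and `userhidden` entries), the helper names and the sample data are assumptions, not code from this commit.

```python
from collections import Counter


def user_key(change):
    """Pick an identity for one change record.

    Assumed record shape (hypothetical): a dict with optional "user",
    "userid" and "userhidden" entries. Anonymous edits are assumed to
    carry the IP address in "user".
    """
    if "userhidden" in change:
        # The name is suppressed, so fall back to the numeric id.
        return ("id", change.get("userid"))
    return ("name", change["user"])


def user_counts(changes, active_threshold=5):
    """Return (distinct users, users with at least `active_threshold` changes)."""
    per_user = Counter(user_key(c) for c in changes)
    active = sum(1 for n in per_user.values() if n >= active_threshold)
    return len(per_user), active


# Made-up sample: one named user, one anonymous IP, one hidden user.
changes = (
    [{"user": "Alice", "userid": 1}] * 5
    + [{"user": "203.0.113.7", "userid": 0}] * 2
    + [{"userhidden": "", "userid": 42}]
)
user_count, active_user_count = user_counts(changes)
print(user_count, active_user_count)  # 3 1
```

Tagging each key with its kind (`"id"` or `"name"`) keeps a numeric user id from colliding with a username that happens to look the same.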
