T403500 Add DAG for calculating Wikipedia template usage via flag table
Contributor checklist
- I have written tests for this DAG that will be merged into data-engineering/airflow-dags/tests/wmde
- I have run the above tests and code quality checks locally or with Docker as outlined in the tests section of the Airflow DAGs project readme
- I have tested the jobs for this DAG in my local database using the process defined in wmde/analytics/hql/airflow_jobs/wikipedia_template/_test_monthly
- I have tested the included DAGs using the process outlined in TEST_AIRFLOW_DAGS.md and the test variable files provided for each DAG
- All Hive tables that are needed by the included DAG jobs have been created and are accessible by the analytics-wmde Airflow user
- All changes from the main branch have been rebased into this branch
Description
- T403500: The DAG runs HQL queries on a monthly basis to derive which pages have infoboxes, databoxes and Listeria Wikidata lists, and then calculates the total usages.
  - DAG ID: wikipedia_template_monthly
  - Destination: wmde.wikipedia_template_flags_monthly
  - Destination: wmde.wikipedia_template_usage_monthly
Note: This also removes leftover instances of VariableProperties from similar DAGs. I realized they remained from a prior method we were using to derive the path to wmf_raw tables.
Test outputs
Destination table summary
wmde.wikipedia_template_flags_monthly
| month | wikipedia | page_id | infobox_found | databox_found | listeria_found |
|---|---|---|---|---|---|
| DATE | STRING | BIGINT | BOOLEAN | BOOLEAN | BOOLEAN |
wmde.wikipedia_template_usage_monthly
| month | wikipedia | total_infobox_pages | total_databox_pages | total_listeria_pages |
|---|---|---|---|---|
| DATE | STRING | BIGINT | BIGINT | BIGINT |
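For reviewers, the relationship between the two destination tables can be sketched as a per-month, per-wiki aggregation over the flag columns. This is only a minimal illustration, not the HQL in this merge request: it assumes the usage totals are plain counts of pages with each flag set, uses an in-memory SQLite database as a stand-in for Hive, drops the `wmde.` database prefix, and fills the flag table with made-up sample rows.

```python
import sqlite3

# Stand-in for Hive: an in-memory SQLite database (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Mirrors the columns of wmde.wikipedia_template_flags_monthly shown above.
conn.execute(
    """
    CREATE TABLE wikipedia_template_flags_monthly (
        month TEXT,
        wikipedia TEXT,
        page_id INTEGER,
        infobox_found BOOLEAN,
        databox_found BOOLEAN,
        listeria_found BOOLEAN
    )
    """
)

# Made-up sample rows; booleans are stored as 1/0 in SQLite.
sample_flags = [
    ("2025-08-01", "enwiki", 1, 1, 0, 0),
    ("2025-08-01", "enwiki", 2, 1, 1, 0),
    ("2025-08-01", "dewiki", 3, 0, 0, 1),
]
conn.executemany(
    "INSERT INTO wikipedia_template_flags_monthly VALUES (?, ?, ?, ?, ?, ?)",
    sample_flags,
)

# Count flagged pages per month and wiki, matching the columns of
# wmde.wikipedia_template_usage_monthly.
USAGE_QUERY = """
SELECT
    month,
    wikipedia,
    SUM(CASE WHEN infobox_found THEN 1 ELSE 0 END) AS total_infobox_pages,
    SUM(CASE WHEN databox_found THEN 1 ELSE 0 END) AS total_databox_pages,
    SUM(CASE WHEN listeria_found THEN 1 ELSE 0 END) AS total_listeria_pages
FROM wikipedia_template_flags_monthly
GROUP BY month, wikipedia
ORDER BY month, wikipedia
"""

usage_rows = conn.execute(USAGE_QUERY).fetchall()
for row in usage_rows:
    print(row)
```

The `SUM(CASE WHEN ... END)` form is used here because it reads the same for boolean columns in most SQL dialects; the actual job may compute the totals differently.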
Test screenshots
DAG_ID
SCREENSHOT_OF_EACH_COMPLETED_DAG_GRAPH