Commit 340fa3e

Fix #6
ironholds committed
1 parent 8b0da9a, commit 340fa3e

6 files changed: +222 -4 lines changed

NEWS

Lines changed: 7 additions & 0 deletions
@@ -1,3 +1,10 @@
+Version 1.2.1 [WIP]
+------------------------------------------------------------------------------
+
+BUG FIXES
+
+* `limit` introduced as an argument to pages_in_category - thanks to Ben Marwick for finding the bug.
+
 Version 1.2.0
 ------------------------------------------------------------------------------
 

R/categories.R

Lines changed: 5 additions & 2 deletions
@@ -105,6 +105,9 @@ categories_in_page <- function(language = NULL, project = NULL, domain = NULL,
 #'@param clean_response whether to do some basic sanitising of the resulting data structure.
 #'Set to FALSE by default.
 #'
+#'@param limit The maximum number of members to retrieve for each category. Set
+#'to 50 by default.
+#'
 #'@param ... further arguments to pass to httr's GET().
 #'
 #'@section warnings:
@@ -123,7 +126,7 @@ categories_in_page <- function(language = NULL, project = NULL, domain = NULL,
 #'@export
 pages_in_category <- function(language = NULL, project = NULL, domain = NULL, categories,
                               properties = c("title","ids","sortkey","sortkeyprefix","type","timestamp"),
-                              type = c("page","subcat","file"), clean_response = FALSE,
+                              type = c("page","subcat","file"), clean_response = FALSE, limit = 50,
                               ...){
 
   #Format and check
@@ -136,7 +139,7 @@ pages_in_category <- function(language = NULL, project = NULL, domain = NULL, ca
 
   #Construct URL
   url <- url_gen(language, project, domain, "&action=query&list=categorymembers&cmtitle=",
-                 categories, "&cmprop=", properties, "&cmtype=",type)
+                 categories, "&cmprop=", properties, "&cmtype=",type, "&cmlimit=", limit)
 
   #Query and return
   content <- query(url, "catpages", clean_response, ...)
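
The last hunk above appends `&cmlimit=` and the new `limit` argument to the request URL. The following is a minimal sketch of the query string that results, not the package's implementation: `build_category_url` is a hypothetical helper, and because `url_gen` is internal and not shown in this commit, the base URL handling and the `Category:` prefix are assumptions. The MediaWiki API separates multiple values with `|`, so the vectors are collapsed that way here.

```r
# Hypothetical helper (not part of WikipediR): assembles a categorymembers
# query string with an explicit cmlimit, roughly as the patched call does.
build_category_url <- function(base_url, category,
                               properties = c("title", "ids"),
                               type = c("page", "subcat", "file"),
                               limit = 50) {
  # cmlimit must be a single positive whole number
  stopifnot(is.numeric(limit), length(limit) == 1,
            limit >= 1, limit == floor(limit))
  paste0(base_url,
         "?format=json&action=query&list=categorymembers",
         # Assumption: the caller passes a bare category name
         "&cmtitle=", utils::URLencode(paste0("Category:", category), reserved = TRUE),
         "&cmprop=", paste(properties, collapse = "|"),
         "&cmtype=", paste(type, collapse = "|"),
         # as.integer() avoids scientific notation such as "1e+05"
         "&cmlimit=", as.integer(limit))
}

url <- build_category_url("https://en.wikipedia.org/w/api.php", "Physics", limit = 100)
```

Coercing the limit to an integer is a defensive choice in this sketch: pasting a large double such as `1e5` directly into a string yields `1e+05`, which is unlikely to be accepted as a limit value.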

WikipediR.Rproj

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@ LaTeX: pdfLaTeX
 BuildType: Package
 PackageUseDevtools: Yes
 PackageInstallArgs: --no-multiarch --with-keep.source
-PackageRoxygenize: rd,collate,namespace
+PackageRoxygenize: rd,collate,namespace,vignette

man/pages_in_category.Rd

Lines changed: 4 additions & 1 deletion
@@ -7,7 +7,7 @@
 pages_in_category(language = NULL, project = NULL, domain = NULL,
   categories, properties = c("title", "ids", "sortkey", "sortkeyprefix",
   "type", "timestamp"), type = c("page", "subcat", "file"),
-  clean_response = FALSE, ...)
+  clean_response = FALSE, limit = 50, ...)
 }
 \arguments{
 \item{language}{The language code of the project you wish to query,
@@ -36,6 +36,9 @@ options are any permutation of "page" (pages), "subcat" (subcategories) and "fil
 \item{clean_response}{whether to do some basic sanitising of the resulting data structure.
 Set to FALSE by default.}
 
+\item{limit}{The maximum number of members to retrieve for each category. Set
+to 50 by default.}
+
 \item{...}{further arguments to pass to httr's GET().}
 }
 \description{

vignettes/WikipediR.html

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+<!DOCTYPE html>
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
+
+<title>Retrieving content</title>
+
+<script type="text/javascript">
+window.onload = function() {
+  var imgs = document.getElementsByTagName('img'), i, img;
+  for (i = 0; i < imgs.length; i++) {
+    img = imgs[i];
+    // center an image if it is the only element of its parent
+    if (img.parentElement.childElementCount === 1)
+      img.parentElement.style.textAlign = 'center';
+  }
+};
+</script>
+
+
+
+
+
+<style type="text/css">
+body, td {
+   font-family: sans-serif;
+   background-color: white;
+   font-size: 13px;
+}
+
+body {
+  max-width: 800px;
+  margin: auto;
+  padding: 1em;
+  line-height: 20px;
+}
+
+tt, code, pre {
+   font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace;
+}
+
+h1 {
+   font-size:2.2em;
+}
+
+h2 {
+   font-size:1.8em;
+}
+
+h3 {
+   font-size:1.4em;
+}
+
+h4 {
+   font-size:1.0em;
+}
+
+h5 {
+   font-size:0.9em;
+}
+
+h6 {
+   font-size:0.8em;
+}
+
+a:visited {
+   color: rgb(50%, 0%, 50%);
+}
+
+pre, img {
+  max-width: 100%;
+}
+pre {
+  overflow-x: auto;
+}
+pre code {
+   display: block; padding: 0.5em;
+}
+
+code {
+  font-size: 92%;
+  border: 1px solid #ccc;
+}
+
+code[class] {
+  background-color: #F8F8F8;
+}
+
+table, td, th {
+  border: none;
+}
+
+blockquote {
+   color:#666666;
+   margin:0;
+   padding-left: 1em;
+   border-left: 0.5em #EEE solid;
+}
+
+hr {
+   height: 0px;
+   border-bottom: none;
+   border-top-width: thin;
+   border-top-style: dotted;
+   border-top-color: #999999;
+}
+
+@media print {
+   * {
+      background: transparent !important;
+      color: black !important;
+      filter:none !important;
+      -ms-filter: none !important;
+   }
+
+   body {
+      font-size:12pt;
+      max-width:100%;
+   }
+
+   a, a:visited {
+      text-decoration: underline;
+   }
+
+   hr {
+      visibility: hidden;
+      page-break-before: always;
+   }
+
+   pre, blockquote {
+      padding-right: 1em;
+      page-break-inside: avoid;
+   }
+
+   tr, img {
+      page-break-inside: avoid;
+   }
+
+   img {
+      max-width: 100% !important;
+   }
+
+   @page :left {
+      margin: 15mm 20mm 15mm 10mm;
+   }
+
+   @page :right {
+      margin: 15mm 10mm 15mm 20mm;
+   }
+
+   p, h2, h3 {
+      orphans: 3; widows: 3;
+   }
+
+   h2, h3 {
+      page-break-after: avoid;
+   }
+}
+</style>
+
+
+
+</head>
+
+<body>
+<!--
+%\VignetteEngine{knitr::knitr}
+%\VignetteIndexEntry{WikipediR}
+-->
+
+<h1>WikipediR: A MediaWiki API client library</h1>
+<p>Many websites run on MediaWiki, most prominently Wikipedia and its sister sites. WikipediR is an API client library that lets you conveniently request content and associated metadata from MediaWiki instances.</p>
+
+<h2>Retrieving content</h2>
+
+<p>&ldquo;Content&rdquo; can mean many different things, but here it mostly means the text of an article, either its current version or a previous one. Current versions can be retrieved using <code>page_content</code>, which offers both HTML and wikitext as output formats. Older, individual revisions can be retrieved with <code>revision_content</code>. Both functions also return a range of metadata about the revisions or articles in question.</p>
+
+<p>Diffs between revisions can be generated using <code>revision_diff</code>, while individual <em>elements</em> of a page&#39;s content, particularly links, can be extracted using <code>page_links</code>, <code>page_backlinks</code>, and <code>page_external_links</code>. If the interest is in changes to content rather than the content itself, <code>recent_changes</code> can be used to grab a slice of a project&#39;s Special:RecentChanges feed.</p>
+
+<h2>Retrieving metadata</h2>
+
+<p>Page-related information can be accessed using <code>page_info</code>, while the categories a page belongs to can be retrieved with <code>categories_in_page</code>. The inverse operation (which pages are in a particular category?) uses <code>pages_in_category</code>.</p>
+
+<p>User-related information can be accessed with <code>user_information</code>, while <code>user_contributions</code> gives access to recent contributions by a particular user. This can be combined with <code>revision_content</code>, mentioned above, to retrieve the content of the last N edits by a particular editor, or metadata about those edits.</p>
+
+</body>
+
+</html>

vignettes/WikipediR.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+<!--
+%\VignetteEngine{knitr::knitr}
+%\VignetteIndexEntry{WikipediR}
+-->
+
+# WikipediR: A MediaWiki API client library
+Many websites run on MediaWiki, most prominently Wikipedia and its sister sites. WikipediR is an API client library that lets you conveniently request content and associated metadata from MediaWiki instances.
+
+## Retrieving content
+"Content" can mean many different things, but here it mostly means the text of an article, either its current version or a previous one. Current versions can be retrieved using <code>page\_content</code>, which offers both HTML and wikitext as output formats. Older, individual revisions can be retrieved with <code>revision\_content</code>. Both functions also return a range of metadata about the revisions or articles in question.
+
+Diffs between revisions can be generated using <code>revision\_diff</code>, while individual *elements* of a page's content, particularly links, can be extracted using <code>page\_links</code>, <code>page\_backlinks</code>, and <code>page\_external\_links</code>. If the interest is in changes to content rather than the content itself, <code>recent\_changes</code> can be used to grab a slice of a project's Special:RecentChanges feed.
+
+## Retrieving metadata
+Page-related information can be accessed using <code>page\_info</code>, while the categories a page belongs to can be retrieved with <code>categories\_in\_page</code>. The inverse operation (which pages are in a particular category?) uses <code>pages\_in\_category</code>.
+
+User-related information can be accessed with <code>user\_information</code>, while <code>user\_contributions</code> gives access to recent contributions by a particular user. This can be combined with <code>revision\_content</code>, mentioned above, to retrieve the content of the last N edits by a particular editor, or metadata about those edits.
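
The vignette names `pages_in_category` without showing a call, so here is a hedged sketch of how the `limit` argument added in this commit might be used. It assumes a WikipediR version that includes this change is installed and that the network is reachable. `fetch_category_members` is a hypothetical wrapper, and the category name is only an illustrative value.

```r
# Hypothetical wrapper around the patched function; the argument names follow
# the signature shown in this commit's diff of R/categories.R.
fetch_category_members <- function(category, limit = 50) {
  WikipediR::pages_in_category(language = "en", project = "wikipedia",
                               categories = category, limit = limit)
}

# Only attempt the request when the package is available; try() keeps a
# network failure from stopping the script.
if (requireNamespace("WikipediR", quietly = TRUE)) {
  members <- try(fetch_category_members("New Age artists", limit = 100),
                 silent = TRUE)
}
```

The wrapper keeps the package's default of 50 so that callers who omit `limit` see the documented behaviour.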
