convert to use rest_v1 api plus a bit more cli arg checking #8

apergos · 2017-01-17T12:14:04Z

This makes the existing code work with the not-so-new-now rest_v1 API.

d00rman

Some comments and thoughts inline.

d00rman · 2017-01-27T19:26:50Z

bin/dump_wiki

    process.exit(1);
 }
-
+if (!argv.dataBase && !argv.saveDir) {


Hm, I would argue that running with neither dataBase nor saveDir actually makes sense, because it allows one to go over the whole dataset without a run taking up a lot of storage. For example, we use the htmldumper script in production when we want to force a refresh of part of the content.

+1. Running a dump without storing the results is an important use case that we should continue to support.

When you say 'refresh part of the content', refresh it where?

@apergos, in RESTBase. The htmldumper script only spiders the contents in that case, and throws away the result. Another use case is load testing in the staging environment.
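To make the spider-only use case concrete, here is a minimal sketch of how the argument check could pick an output mode instead of exiting when neither option is given. The function name `chooseOutputMode` and the mode strings are hypothetical and not part of htmldumper; only `dataBase` and `saveDir` come from the diff above, and the precedence between them is an assumption.

```javascript
'use strict';

// Hypothetical helper, not htmldumper's actual code: decide where the
// fetched HTML should go based on the parsed command-line arguments.
function chooseOutputMode(argv) {
    // Assumption: if both are given, the database takes precedence.
    if (argv.dataBase) {
        return 'database';
    }
    if (argv.saveDir) {
        return 'directory';
    }
    // Neither given: spider only. Each page is still requested (which is
    // what forces the refresh or generates load), but the body is dropped.
    return 'discard';
}

module.exports = { chooseOutputMode };
```

With something like this, `dump_wiki` invoked with neither option would still walk every title and simply throw the responses away.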

d00rman · 2017-01-27T20:29:04Z

lib/htmldump.js

    this.articles = [];
    this.waiters = [];
    this.done = false;
+    this.contentHost = 'http://' + options.prefix + '/api/rest_v1'


I think it might make more sense to have contentHost as a command-line argument that replaces both host and prefix. That way one can specify exactly what one needs. For example, outside of prod that would be https://en.wikipedia.org/api/rest_v1/, while inside prod we can set it to http://restbase.svc.eqiad.wmnet/en.wikipedia.org/v1/

prefix is used by itself though for the Host: header. Or are you suggesting parsing the value of the new argument to get the hostname out, for that purpose?

Maybe there needs to be a host arg and a contentUrl arg?

Good point, we still need to set the Host header. Perhaps we could then change the meaning of prefix to be what is now contentHost and keep host as a command-line arg? So we'd have both prefix and host, but prefix would be the URI prefix to use and host would be used for the header and the domain in the requests. The latter could be optional, so that if it's not specified, we can simply url.parse() it out of prefix.
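A rough sketch of that proposal, to check that we mean the same thing: `prefix` is the full URI prefix, and `host` is optional and derived from `prefix` when absent. The helper name `resolveTarget` is hypothetical, and the sketch uses the WHATWG `URL` class rather than `url.parse()`; either would give the host component here.

```javascript
'use strict';

// Hypothetical helper, not part of htmldumper: normalise the two options
// discussed above. `prefix` is the URI prefix for content requests and
// `host` (optional) is the value to send in the Host header.
function resolveTarget(options) {
    // Drop trailing slashes so paths can be appended with a single '/'.
    const prefix = options.prefix.replace(/\/+$/, '');
    // If no host was given, take the host component of the prefix.
    const host = options.host || new URL(prefix).host;
    return { prefix: prefix, host: host };
}

module.exports = { resolveTarget };
```

Outside of prod, `resolveTarget({ prefix: 'https://en.wikipedia.org/api/rest_v1/' })` would yield `en.wikipedia.org` as the host. Inside prod one would pass the restbase.svc.eqiad.wmnet prefix together with an explicit `host` of `en.wikipedia.org`.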

monperrus · 2017-03-15T09:35:49Z

Cross-linking with phabricator: this PR seems to be a blocker for https://phabricator.wikimedia.org/T17017

apergos · 2017-04-07T20:36:51Z

I've made the changes I think you want, untested though. Tell me if I've understood you right?

d00rman

This PR partially overlaps with the issues addressed in #9. I think we should work on that one first, as it allows users to specify the params on the command line, which makes it more versatile (the aim is to be able to use htmldumper from both inside and outside of production).

d00rman · 2017-04-18T16:38:23Z

lib/htmldump.js

        headers: {
            'user-agent': options.userAgent,
-            host: options.prefix
+            host: options.host


This should be the domain. When sending requests to the appservers, the Host header indicates the actual domain to fetch info for, not the originating host.
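To illustrate the distinction, a minimal sketch of request options where the URI points at an internal service but the Host header carries the wiki domain. `buildRequestOptions` and the `uri` key are assumptions made for illustration; the two header names are the ones in the diff above.

```javascript
'use strict';

// Hypothetical sketch: the machine we connect to is taken from `uri`,
// while the Host header names the domain whose content we want.
function buildRequestOptions(uri, domain, userAgent) {
    return {
        uri: uri,
        headers: {
            'user-agent': userAgent,
            // The wiki domain (e.g. en.wikipedia.org), not the machine
            // the request is physically sent to.
            host: domain
        }
    };
}

module.exports = { buildRequestOptions };
```

So a request sent to a URI under http://restbase.svc.eqiad.wmnet/ would still carry `host: 'en.wikipedia.org'`.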

d00rman · 2017-04-18T16:38:46Z

lib/htmldump.js

            }
-            var url = options.host + '/' + options.prefix
-                        + '/v1/page/html/' + encodeURIComponent(title) + '/' + oldid;
+            var url = options.prefix + '/page/html/'


This problem is addressed in #9

convert to use rest_v1 api plus a bit more cli arg checking

43e6971

d00rman suggested changes Jan 27, 2017


improve cli arg fixups

affe9fd

d00rman suggested changes Apr 18, 2017



gwicke commented Jan 30, 2017 (edited)


4 participants
