Commit 708fa34

Makefiles: Use wget instead of httrack
HTTrack does not name downloaded files consistently, and it occasionally misses files altogether. Wget has had the necessary filtering features since 1.15, so nothing prevents using it anymore.
1 parent faaf83f commit 708fa34

3 files changed: 12 additions and 90 deletions

Makefile

Lines changed: 12 additions & 17 deletions
@@ -169,22 +169,17 @@ source:
 	mkdir "reference"

 	pushd "reference" > /dev/null; \
-	httrack http://en.cppreference.com/w/ -k --near --include-query-string \
-	-* +en.cppreference.com/* +upload.cppreference.com/* -*index.php\?* \
-	-*/Special:* -*/Talk:* -*/Help:* -*/File:* -*/Cppreference:* -*/WhatLinksHere:* \
-	-*/Template:* -*/Category:* -*action=* -*printable=* \
-	-en.cppreference.com/book/* -en.cppreference.com/book \
-	+*MediaWiki:Common.css* +*MediaWiki:Print.css* +*MediaWiki:Vector.css* \
-	-*MediaWiki:Geshi.css* "+*title=-&action=raw*" --timeout=180 --retries=10 ;\
+	regex=".*index\\.php.*|.*/Special:.*|.*/Talk:.*" \
+	regex+="|.*/Help:.*|.*/File:.*|.*/Cppreference:.*" \
+	regex+="|.*/WhatLinksHere:.*|.*/Template:.*|.*/Category:.*" \
+	regex+="|.*action=.*|.*printable=.*|.*en.cppreference.com/book.*" ; \
+	echo $$regex ; \
+	wget --adjust-extension --page-requisites --convert-links \
+	--force-directories --recursive --level=15 \
+	--span-hosts --domains=en.cppreference.com,upload.cppreference.com \
+	--reject-regex $$regex \
+	--timeout=180 --no-verbose \
+	--retry-connrefused --waitretry=1 --read-timeout=20 \
+	http://en.cppreference.com/w/ ; \
 	popd > /dev/null

-	#delete useless files
-	rm -rf "reference/hts-cache"
-	rm -f "reference/backblue.gif"
-	rm -f "reference/fade.gif"
-	rm -f "reference/hts-log.txt"
-	rm -f "reference/index.html"
-
-	#download files that httrack has forgotten
-	./httrack-workarounds.py
-

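The reject pattern assembled by the `regex=` and `regex+=` lines above can be checked offline. The following is a minimal sketch in Python, assuming that Python's `re` module behaves like wget's regex engine for these simple patterns (they use only `.`, `*`, `|` and an escaped dot). The helper `is_rejected` and the sample URLs are hypothetical and are not part of the commit.

```python
import re

# Alternatives copied from the regex= lines in the Makefile hunk.
# In the Makefile, "\\." inside double quotes reaches wget as "\.".
REJECT_PARTS = [
    r".*index\.php.*", r".*/Special:.*", r".*/Talk:.*",
    r".*/Help:.*", r".*/File:.*", r".*/Cppreference:.*",
    r".*/WhatLinksHere:.*", r".*/Template:.*", r".*/Category:.*",
    r".*action=.*", r".*printable=.*", r".*en.cppreference.com/book.*",
]

# The shell concatenates the pieces into one alternation; do the same here.
REJECT_REGEX = re.compile("|".join(REJECT_PARTS))


def is_rejected(url: str) -> bool:
    """Return True if the URL matches the reject pattern (hypothetical helper)."""
    return REJECT_REGEX.search(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs, chosen only to exercise the pattern.
    samples = [
        "http://en.cppreference.com/w/cpp/container/vector",
        "http://en.cppreference.com/w/Talk:cpp/container/vector",
        "http://en.cppreference.com/mwiki/index.php?title=cpp&action=edit",
        "http://en.cppreference.com/book/intro",
        "http://upload.cppreference.com/mwiki/images/x.png",
    ]
    for url in samples:
        print("reject" if is_rejected(url) else "keep  ", url)
```

Because every alternative begins and ends with `.*`, it makes no difference here whether the pattern is applied as a search or as a match against the whole URL.
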
httrack-workarounds.py

Lines changed: 0 additions & 70 deletions
This file was deleted.

preprocess.py

Lines changed: 0 additions & 3 deletions
@@ -75,9 +75,6 @@
     for filename in fnmatch.filter(filenames, '*.css'):
         css_files.append(os.path.join(root, filename))

-#
-r1 = re.compile('<!-- Added by HTTrack -->.*?<!-- \/Added by HTTrack -->')
-r2 = re.compile('<!-- Mirrored from .*?-->')

 #temporary fix
 r3 = re.compile('<style[^<]*?<[^<]*?MediaWiki:Geshi\.css[^<]*?<\/style>', re.MULTILINE)
