
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

</head><body>

<h2>Description</h2>

The PlotDevice Web library offers a collection of services to retrieve content from the

internet. You can use the library to query <a href="http://www.yahoo.com">Yahoo!</a> for links,

images, news and spelling suggestions, to read RSS and Atom newsfeeds, to retrieve articles

from <a href="http://en.wikipedia.org">Wikipedia</a>, to collect quality images from <a href="http://www.morguefile.com">morgueFile</a> or <a href="http://www.flickr.com">Flickr</a>, to

get color themes from <a href="http://kuler.adobe.com">kuler</a> or <a href="http://www.colr.org">Colr</a>, to browse through HTML documents, to clean up HTML, to validate

URLs, to create GIF images from math equations using <a href="http://www.forkosh.com/mimetex.html">mimeTeX</a>, and to get ironic word definitions from <a href="http://www.urbandictionary.com">Urban Dictionary</a>.

The PlotDevice Web library works with a caching mechanism that stores things you download from

the web, so they can be retrieved faster the next time. Many of the services also work

asynchronously. This means you can use the library in an animation that keeps on running while

new content is downloaded in the background.

The library bundles Leonard Richardson’s <a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a> to parse HTML, Mark Pilgrim’s

<a href="http://www.feedparser.org/">Universal Feed Parser</a> for newsfeeds, a connection to

John Forkosh’s mathTeX server (thanks Cedric Foellmi), Leif K-Brooks’s entity replacement algorithm,

<a href="http://code.google.com/p/simplejson/">simplejson</a>, and patches for Debian from the

people at <a href="http://indywiki.sourceforge.net/">Indywiki</a>.

<h2>Download</h2>

<tbody>

<tr>

</td><td><a href="http://plotdevice.io/extras/web.zip">web.zip</a> (390KB)

Last updated for NodeBox 1.9.4.6

Licensed under GPL Author: Tom De Smedt

</td></tr></tbody></table>

<h2>Documentation</h2>

<ul>

<li><a href="#library">How to get the library up and running</a>

</li><li><a href="#validation">Validating web content</a>

</li><li><a href="#url">Working with URLs</a>

</li><li><a href="#html">Working with HTML</a>

</li><li><a href="#yahoo">Querying Yahoo! for links, images and news</a>

</li><li><a href="#yahoocontextual">Improving Yahoo! results with a contextual search</a>

</li><li><a href="#yahoospelling">Using Yahoo! to suggest spelling corrections</a>

</li><li><a href="#yahoosort">Using Yahoo! to sort associatively</a>

</li><li><a href="#google">Querying Google</a>

</li><li><a href="#newsfeed">Reading newsfeeds</a>

</li><li><a href="#wikipedia">Retrieving articles from Wikipedia</a>

</li><li><a href="#wikipediahelpers">Some helper commands to draw Wikipedia content in PlotDevice</a>

</li><li><a href="#morguefile">Querying morgueFile for images</a>

</li><li><a href="#flickr">Querying Flickr for images</a>

</li><li><a href="#kuler">Querying kuler for color themes</a>

</li><li><a href="#colr">Querying Colr for color schemes</a>

</li><li><a href="#math">Creating GIF images from math equations</a>

</li><li><a href="#urbandictionary">Word definitions from Urban Dictionary</a>

</li><li><a href="#asynchronous">Working with asynchronous downloads</a>

</li><li><a href="#cache">Clearing the cache</a>

</li><li><a href="#json">Reading JSON</a>

</li></ul>

<hr/>

<h2><a id="library" name="library" title="library"></a>How to get the library up and

running</h2>

Put the web library folder in the same folder as your script so PlotDevice can find the

library.

You can also put it in ~/Library/Application Support/PlotDevice/.

<pre>web = ximport("web")

</pre>

Outside of PlotDevice you can also just do import web.

Proxy servers

If you are behind a proxy server the library may not be able to connect to the internet.

In that case you need to inform the library with the set_proxy() command:

<pre>web.set_proxy("https://www.myproxyserver.com:80", type="https")

</pre>
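For comparison, a proxy can also be configured with Python's standard library alone. The sketch below is written for Python 3 and is only an illustration: it assumes nothing about how set_proxy() works internally, and the proxy address is the same placeholder used above.

```python
import urllib.request

# Placeholder proxy address, as in the set_proxy() example above.
proxy = urllib.request.ProxyHandler({"https": "https://www.myproxyserver.com:80"})

# An opener that routes HTTPS requests through the proxy.
# Nothing is sent over the network until opener.open(url) is called.
opener = urllib.request.build_opener(proxy)
```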

<hr/>

<h2><a id="validation" name="validation" title="validation"></a>Validating web content</h2>

Web content is accessed with a URL, the address you use to connect to a place on the

internet. The library has a number of commands to find out what type of content (e.g. web page,

image, ...) is associated with a given URL.

The most basic command, is_url(), checks whether a given string is a well-formed
URL (e.g. http://nodebox.net but not htp://nodebox.net). It takes a wait

parameter indicating the number of seconds after which the library should stop connecting to

the internet and give up.

<pre>web.is_url(url, wait=10)

</pre>
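To make the idea of a well-formed URL concrete, here is a minimal sketch using only Python 3's standard library. It is not the library's own is_url() implementation; the function name looks_like_url() and the list of accepted schemes are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def looks_like_url(string, schemes=("http", "https", "ftp")):
    """Return True if the string has a known scheme and a host part."""
    try:
        parts = urlsplit(string)
    except ValueError:
        return False
    return parts.scheme in schemes and bool(parts.netloc)

looks_like_url("http://nodebox.net")   # True
looks_like_url("htp://nodebox.net")    # False: "htp" is not a known scheme
```

A check like this only looks at the shape of the string. It says nothing about whether the address actually exists.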

Even if a URL is valid, it might not refer to actual content on the internet. We can check
whether the content behind a URL is missing with the url.not_found() command:

<pre>web.url.not_found(url, wait=10)

</pre>

The following commands are useful to find out what content is associated with the URL. We

can discern between HTML web pages which we can parse with <a href="#html">page.parse()</a>, newsfeeds which we can parse with <a href="#newsfeed">newsfeed.parse()</a>, images, audio and video etc. which we can download with

<a href="#url">url.retrieve()</a>.

<pre>web.url.is_webpage(url, wait=10)

</pre>

<pre>web.url.is_stylesheet(url, wait=10)

</pre>

<pre>web.url.is_plaintext(url, wait=10)

</pre>

<pre>web.url.is_pdf(url, wait=10)

</pre>

<pre>web.url.is_newsfeed(url, wait=10)

</pre>

<pre>web.url.is_image(url, wait=10)

</pre>

<pre>web.url.is_audio(url, wait=10)

</pre>

<pre>web.url.is_video(url, wait=10)

</pre>

<pre>web.url.is_archive(url, wait=10)

</pre>
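As a rough offline approximation of these checks, the content category can be guessed from the file extension in the URL. The sketch below is written for Python 3 and uses the standard mimetypes module; guess_kind() is a hypothetical helper, not part of the Web library, and a real check would presumably ask the server for the content type instead.

```python
import mimetypes
from urllib.parse import urlsplit

def guess_kind(url):
    """Guess a content category from the file extension in the URL path."""
    mime, _ = mimetypes.guess_type(urlsplit(url).path)
    if mime is None:
        return "unknown"
    if mime == "text/html":
        return "webpage"
    if mime == "text/css":
        return "stylesheet"
    if mime == "text/plain":
        return "plaintext"
    if mime == "application/pdf":
        return "pdf"
    # For image/*, audio/* and video/* the main type is the category.
    main = mime.split("/")[0]
    return main if main in ("image", "audio", "video") else "other"

kind = guess_kind("http://nodebox.net/code/data/media/twisted-final.jpg")
```

Extension-based guessing fails for URLs without a file extension, which is one reason to prefer the library's own commands when a connection is available.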

<hr/>

<h2><a id="url" name="url" title="url"></a>Working with URLs</h2>

A <a href="http://en.wikipedia.org/wiki/Url">URL</a> is the address you use to connect

to a page on the internet, for example: http://nodebox.net. The PlotDevice Web library can

do three different things with a URL: download the content associated with it, parse it (find

out which parts make up the URL) and construct it from scratch (a simple way to create a URL

with HTTP GET or HTTP POST data).

<pre>web.download(url, wait=60, cache=None, type=".html")

</pre>

<pre>web.save(url, path="", wait=60)

</pre>

<pre>web.url.retrieve(url, wait=60, asynchronous=False, cache=None, type=".html")

</pre>

<pre>web.url.parse(url)

</pre>

<pre>web.url.create(url="", method="get")

</pre>

The download() command returns the content associated with the given web address. The

command has an optional parameter wait that determines how long to wait for a

download. If the time is exceeded, the download is aborted.

The last two parameters can be used to cache downloaded content locally, so it

doesn’t have to be downloaded again in the future. The cache parameter is a string with the

name of a subfolder in /cache where to store content. The type parameter is the file

extension of the downloaded content.
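The caching idea can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the library's actual cache: the names cache_path() and cached_download(), the use of an MD5 hash as file name, and the fetch argument (a stand-in for the real download function) are all hypothetical.

```python
import hashlib
import os
import tempfile

def cache_path(folder, url, extension=".html"):
    """Map a URL to a file name inside the cache folder."""
    name = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(folder, name + extension)

def cached_download(url, fetch, folder, extension=".html"):
    """Return cached bytes if present, otherwise fetch and store them."""
    path = cache_path(folder, url, extension)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(url)
    with open(path, "wb") as f:
        f.write(data)
    return data

# Usage with a stand-in fetch function, so no network access is needed.
calls = []
def fake_fetch(url):
    calls.append(url)
    return b"<html>hello</html>"

folder = tempfile.mkdtemp()
first = cached_download("http://nodebox.net", fake_fetch, folder)
second = cached_download("http://nodebox.net", fake_fetch, folder)  # read from disk
```

The second call never reaches fake_fetch(), which is the whole point of the cache: the content comes from the local file instead.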

The save() command stores the URL’s content at the given local path. If no path is

given it will attempt to extract a filename from the URL and store that in the current working

directory. The path to the saved file is returned.
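Extracting a filename from a URL could look roughly like the following Python 3 sketch. The helper filename_from_url() and its fallback name index.html are assumptions for illustration; the library may handle this differently.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def filename_from_url(url, default="index.html"):
    """Return the last path segment of the URL, or a default if there is none."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or default

name = filename_from_url("http://nodebox.net/code/data/media/twisted-final.jpg")
# name == "twisted-final.jpg"
```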

The Web library also has easier ways to deal with specific web content like HTML

(page.parse() command) or Wikipedia articles (wikipedia.search() command) for

example.

The download() command is actually an alias of the url.retrieve() command.

This command has an additional asynchronous parameter useful to download stuff in the

background while an animation keeps on running. We’ll see about asynchronous downloads

later on. The url.retrieve() command returns an object with a data property

containing a string with the downloaded content. If you don’t need anything that complicated

just use the easy download() command:

<pre># Download an image from the PlotDevice Gallery.
url = "http://nodebox.net/code/data/media/twisted-final.jpg"
img = web.download(url)

# Display the image data in PlotDevice.
image(None, 0, 0, data=img)

# Write the image data to a file (binary mode, since it is not text).
file = open("twisted.jpg", "wb")
file.write(img)
file.close()

</pre>
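The idea behind an asynchronous download, an object that fills in its data property in the background while the calling script keeps running, can be sketched with a thread. This Python 3 example is an assumption-laden illustration: the class BackgroundDownload, its done flag and the fetch argument are hypothetical and do not describe the object url.retrieve() actually returns.

```python
import threading

class BackgroundDownload:
    """Runs fetch(url) in a background thread and stores the result in .data."""

    def __init__(self, url, fetch):
        self.url = url
        self.data = None
        self.done = False
        self._thread = threading.Thread(target=self._run, args=(fetch,))
        self._thread.daemon = True
        self._thread.start()

    def _run(self, fetch):
        self.data = fetch(self.url)
        self.done = True

    def join(self, timeout=None):
        """Block until the download finishes (or the timeout expires)."""
        self._thread.join(timeout)

# Usage with a stand-in fetch function instead of a real network request.
download = BackgroundDownload("http://nodebox.net", lambda url: b"content")
download.join(5)
```

In an animation you would not call join(). Instead, each frame would check the done flag and start using data once it is True.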

The url.parse() command splits a given URL into its components. The returned object

has the following properties:

<ul>

<li>url.protocol: the type of internet service, usually http

</li><li>url.domain: the domain name, for example, nodebox.net

</li><li>url.username: a username for a secure connection

</li><li>url.password: a password for a secure connection

</li><li>url.port: the port number at the host

</li><li>url.path: the subdirectory at the server, for example /code/index.php/

</li><li>url.page: the name of the document, for example search

</li><li>url.anchor: named anchor on the page

</li><li>url.query: a dictionary of query string values, for example { ‘q’: ‘pixels’ }

</li><li>url.method: the query string method, either ‘get’ or ‘post’

</li></ul>
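The components listed above can be illustrated with Python 3's standard urllib.parse module. This is a sketch, not the implementation of url.parse(): the function split_url() is hypothetical, it returns a plain dictionary rather than an object with properties, and the example URL is made up.

```python
from urllib.parse import urlsplit, parse_qsl

def split_url(url):
    """Split a URL into the components described in the list above."""
    parts = urlsplit(url)
    # Everything up to the last slash is the path, the rest is the page.
    path, _, page = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "username": parts.username,
        "password": parts.password,
        "port": parts.port,
        "path": path + "/",
        "page": page,
        "anchor": parts.fragment,
        "query": dict(parse_qsl(parts.query)),
    }

info = split_url("http://nodebox.net/code/index.php/search?q=pixels#results")
# info["path"] == "/code/index.php/", info["page"] == "search"
# info["query"] == {"q": "pixels"}
```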

In the same way the url.create() command returns an object with these properties.

This command is useful to, for example, construct URLs with a POST query and pass that to
url.retrieve() or page.parse().
For example, this script retrieves the first 10 forum pages from PlotDevice:

<pre>url = web.url.create("http://nodebox.net/code/index.php/Share")
url.query["p"] = 1
for i in range(10):
    print web.url.retrieve(url).data
    url.query["p"] += 1

</pre>
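For a GET query, the addresses requested by a loop like this can be built by hand with Python 3's urlencode(). The sketch below is illustrative only; page_urls() is a hypothetical helper and does not show how url.create() assembles its URLs.

```python
from urllib.parse import urlencode

def page_urls(base, count, start=1):
    """Build one URL per page number, using p as the query parameter."""
    return ["%s?%s" % (base, urlencode({"p": page}))
            for page in range(start, start + count)]

urls = page_urls("http://nodebox.net/code/index.php/Share", 10)
# urls[0] == "http://nodebox.net/code/index.php/Share?p=1"
```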

<hr/>

<h2><a id="html" name="html" title="html"></a>Working with HTML</h2>

The PlotDevice Web library uses Leonard Richardson’s <a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a> to parse HTML content. This

means you can search and loop through all the tags in the HTML. For example, you might want to

download a HTML page from the internet, find all the links in it and then download all those

links. Or find all the image tags in the source and then retrieve all those images with

download().

The page.parse() command takes a URL as input and returns a Beautiful Soup object.

The optional cached parameter determines if downloaded pages should be cached locally

for faster retrieval.

<pre>web.page.parse(url, wait=10, cached=True)

</pre>

You can get the meta information in the HTML header with the returned object’s title,

description and keywords properties:

<pre>html = web.page.parse("http://nodebox.net")

print html.description

>>> PlotDevice is a Mac OS X application that lets you create 2D visuals

>>> (static, animated or interactive) using Python programming code

>>> and export them as a PDF or a QuickTime movie. PlotDevice is free

>>> and well-documented.

print html.keywords

>>> [u'PlotDevice', u'Home']

</pre>

You can easily get all the links in the HTML with the links() method. It takes an

optional external parameter which, when True, returns only links to other

domains/websites.

<pre>html = web.page.parse("http://nodebox.net/code/index.php/About")

print html.links()

>>> [u'http://research.nodebox.net',

>>> u'http://www.opensource.org/licenses/mit-license.php',

>>> u'http://www.python.org/', u'http://bert.debruijn.be/kgp/',

>>> u'http://diveintopython.org/xml_processing/',

>>> u'http://processing.org',

>>> u'http://www.freelists.org/archives/the_posthumans/',

>>> u'http://research.nodebox.net'

>>> ]

</pre>
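To show what collecting links and filtering for external ones involves, here is a small Python 3 sketch built on the standard html.parser module. It is a stand-in for illustration, not the Beautiful Soup based code the library uses; the names LinkCollector and collect_links() and the sample HTML are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def collect_links(html, base_url, external=False):
    """Return all links, or only those pointing to another host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    if not external:
        return collector.links
    home = urlsplit(base_url).hostname
    return [url for url in collector.links if urlsplit(url).hostname != home]

sample = '<a href="/code">Code</a> <a href="http://www.python.org/">Python</a>'
all_links = collect_links(sample, "http://nodebox.net/")
outside = collect_links(sample, "http://nodebox.net/", external=True)
```

Here "external" is decided by comparing host names, which is one reasonable reading of "other domains/websites".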

The find_all() method returns a list of specific tags. The find() method just

returns the first tag:

<pre>html = web.page.parse("http://nodebox.net/")
print html.find("title").string
>>> PlotDevice | Home

titles = html.find_all("h2")
for title in titles:
    print title.string
>>> News
>>> Current projects
>>> Gallery favorites

content = html.find(id="content")
print web.html.plain(content)
>>> Welcome to PlotDevice PlotDevice is a Mac OS X application
>>> that lets you create 2D visuals (static, animated or interactive)
>>> using Python programming code and export them as a PDF or
>>> a QuickTime movie. PlotDevice is free and well-documented.
>>> Read more
>>>
>>> Download PlotDevice for Mac OS X (version 1.8.5) Universal Binary
>>>
>>> Latest updates:
>>>
>>> * Interactivity
>>> * Stop running scripts by hitting command-dot.
>>> ...

</pre>

As you can see you can supply tag names or attribute-value pairs to the find methods.

The find() and find_all() methods return Tag objects that each have

find() and find_all() too. Alternatively, you can also find tags more directly,
for example: html.body.p returns the first p tag.

<pre>html = web.page.parse("http://nodebox.net/")

list = html.find(id="content").find("ul")

print web.html.plain(list)

>>> * Interactivity

>>> * Stop running scripts by hitting command-dot.

>>> * Transparent PDFs with the background() command.

>>> * Fast, integrated path mathematics .

>>> * Store libaries centrally in the Application Support folder.

# These statements retrieve exactly the same.

list = html.body(id="content")[0].ul

list = html.body.ul

</pre>

If you need to retrieve tags by their CSS classname, use the find_class() method.

To get attributes from a tag, address it as a dictionary:

<pre>html = web.page.parse("http://nodebox.net/")

html.body.a["href"] # the first link's href attribute

>>> Home

</pre>

<hr/>

The PlotDevice Web library has various commands to clean up HTML.

<pre># Replaces HTML special characters by readable characters.

web.html.replace_entities(unicode, placeholder=" ")

</pre>

<pre># Removes all tags from HTML except those in the exclude list.

web.html.strip_tags(html, exclude=[], linebreaks=False,

blocks="\n", breaks="\n", columns="\n")

</pre>

<pre>web.html.strip_javascript(html)

</pre>

<pre>web.html.strip_inline_css(html)

</pre>

<pre>web.html.strip_comments(html)

</pre>

<pre>web.html.strip_forms(html)

</pre>

<pre># If there are 10 consecutive spaces, 9 of them are removed.

web.html.collapse_spaces(str)

</pre>

<pre># Allow only a maximum of max linebreaks to build up.

web.html.collapse_linebreaks(str, max=2)

</pre>

<pre># Converts tabs to spaces, optionally leaving the left indentation unmodified.

web.html.collapse_tabs(str, indent=False)

</pre>

<pre># Combines all of the above.

web.html.plain(html)

</pre>
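A few of these clean-up steps are simple enough to sketch with Python 3's standard library. The functions below borrow the names used above but are independent illustrations, assuming behavior inferred from the comments. They are not the library's code and ignore the placeholder and indent options.

```python
import html
import re

def replace_entities(string):
    """Replace HTML entities such as &amp; by readable characters."""
    return html.unescape(string)

def collapse_spaces(string):
    """Reduce every run of consecutive spaces to a single space."""
    return re.sub(r" {2,}", " ", string)

def collapse_linebreaks(string, max=2):
    """Allow at most max consecutive linebreaks."""
    return re.sub(r"\n{%d,}" % (max + 1), "\n" * max, string)

text = collapse_spaces("a" + " " * 10 + "b")
# text == "a b": of the 10 consecutive spaces, 9 were removed
```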

<hr/>

<h2><a id="yahoo" name="yahoo" title="yahoo"></a>Querying Yahoo! for links, images and

news</h2>

Before you can use the Web library to query the Yahoo! search engine, you need to obtain

a license key: <a href="http://developer.yahoo.com/search/">http://developer.yahoo.com/search/</a>

Click ‘get an application ID’. Fill out the form and you’ll end up with a long string

of numbers and characters which is your Yahoo! license key. It entitles you to 5000 queries a

day. Now register your license key in your PlotDevice script:

<pre>web.yahoo.license_key("myID")

print web.yahoo.license_key()

>>> myID

</pre>

Note that you can query Yahoo! without setting a license key, in which case you are using a

default key that you share with all the other PlotDevice users who work with the Web library.

Use the yahoo.search(), yahoo.search_images() and yahoo.search_news()

commands to query Yahoo! for links to relevant webpages/images/news:

<pre>web.yahoo.search(q, start=1, count=10, context=None, cached=False)

</pre>

<pre>web.yahoo.search_images(q, start=1, count=10, cached=False)

</pre>

<pre>web.yahoo.search_news(q, start=1, count=10, cached=False)

</pre>

The commands take a q query parameter, and optional start, count and

cached parameters. The start parameter defines the first link to return,

count defines the total number of links to return. The cached parameter defines
if Yahoo! queries are cached locally (so they can be retrieved faster in the future).

<pre>results = web.yahoo.search("plotdevice", start=1, count=5, cached=False)
for item in results:
    print item.title
>>> PlotDevice | Home
>>> PlotDevice | Features
>>> visualcomplexity.com | PlotDevice
>>> PlotDevice - Wikipedia, the free encyclopedia
>>> Nodebox - SWiK

</pre>

Each item in the list of results is a YahooResult object with the following

properties:

<ul>

<li>result.url: the URL of the linked page

</li><li>result.title: the title of the linked page

</li><li>result.description: a short description for the page

</li><li>result.type: the <a href="http://en.wikipedia.org/wiki/Mimetype">MIME-type</a> of

the linked page

</li><li>result.date: the modification date of the linked page

</li><li>result.width: for images, the width in pixels

</li><li>result.height: for images, the height in pixels

</li><li>result.source: for news items, the source of the article

</li><li>result.language: for news items, the language used

</li></ul>

The list of results has a total property containing the total number of results

Yahoo! has for your query:

<pre>print results.total

>>> 37200

</pre>

<hr/>

<h2><a id="yahoocontextual" name="yahoocontextual" title="yahoocontextual"></a>Improving Yahoo!

results with a contextual search</h2>

Suppose you are querying Yahoo! for apple. Most likely Yahoo! will return links to

pages relating to the Apple Computer company. But perhaps you wanted links to apples as in

fruit. The yahoo.search() command has an optional context parameter with which
you can supply a description of what exactly you mean by apple:

<pre>ctx = '''
The apple tree was perhaps the earliest tree to be cultivated,
and apples have remained an important food in all cooler climates.
To a greater degree than other tree fruit, except possibly citrus,
apples store for months while still retaining their nutritive value.
We are not looking for a company named Apple.
'''
results = web.yahoo.search("apple", count=5, context=ctx)
print results.total
for item in results:
    print item.title
>>> Apple - Wikipedia, the free encyclopedia
>>> Apple Core
>>> Apple - Free Encyclopedia
>>> Apple : Apple (fruit)
>>> About the Apple -- Fruit

</pre>

<hr/>

<h2><a id="yahoospelling" name="yahoospelling" title="yahoospelling"></a>Using Yahoo! to

suggest spelling corrections</h2>

To get spelling suggestions from Yahoo!, use the yahoo.suggest_spelling() command. It

returns a list of suggested spelling corrections:

<pre>corrections = web.yahoo.suggest_spelling("amazoon", cached=False)

print corrections

>>> ['amazon']

</pre>

<hr/>

<h2><a id="yahoosort" name="yahoosort" title="yahoosort"></a>Using Yahoo! to sort

associatively</h2>

You can use Yahoo! to sort concepts according to association. Is sky more

green, red or blue?

<pre>colors = ["green", "blue", "red"]
ranked = web.yahoo.sort(colors, "sky", strict=False, cached=True)
for word, weight in ranked:
    print word, weight
>>> blue sky 0.396039408425
>>> red sky 0.33604773392
>>> green sky 0.267912857655

</pre>

In this example, Yahoo! is queried for green sky, blue sky and red sky.

The result is a sorted list of (query, weight) tuples. We learn that blue

is the color best associated with sky.

The <a href="http://nodebox.net/code/index.php/Prism">Prism</a> algorithm roughly works in

this way.
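The weights in the example add up to roughly 1, which suggests (this is an inference, not something stated here) that each weight is the query's share of the combined result count. Under that assumption the sorting step could be sketched as follows in Python 3; the function sort_by_association() and the result counts are made up for illustration.

```python
def sort_by_association(counts):
    """Turn result counts per query into weights that sum to 1, highest first."""
    total = float(sum(counts.values()))
    if total == 0:
        return []
    weighted = [(query, n / total) for query, n in counts.items()]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)

# Made-up result counts, not real search engine data.
counts = {"green sky": 268, "blue sky": 396, "red sky": 336}
ranked = sort_by_association(counts)
# The first item is the query with the largest share: "blue sky".
```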

<h2><a id="google" name="google" title="google"></a>Querying Google</h2>

You can run <a href="http://www.google.com">Google</a> queries in the same way as <a href="#yahoo">querying Yahoo!</a>.

The library has the following commands available:

<pre>web.google.search(q, start=0, cached=False)

</pre>

<pre>web.google.search_images(q, start=0, size="", cached=False)

</pre>

<pre>web.google.search_news(q, start=0, cached=False)

</pre>

<pre>web.google.search_blogs(q, start=0, cached=False)

</pre>

<pre>web.google.sort(words, context="", strict=True, cached=False)

</pre>

The search commands return a list of results. This list has an additional total

property. Each item in the list is a GoogleResult object with the following properties:

<ul>

<li>result.url: the URL of the linked page

</li><li>result.title: the title of the linked page

</li><li>result.description: a short description for the page

</li><li>result.date: for news and blogs search

</li><li>result.author: for news and blogs search.

</li><li>result.location: for news search.

</li></ul>

Per search, a list of 8 items is returned from the given start item. Google will only

ever return the first 32 results, so the maximum value for start is 24.
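The arithmetic behind that limit: with 8 results per search and 32 results at most, there are four possible pages. A short Python sketch (the constant names are chosen here for illustration):

```python
PAGE_SIZE = 8      # results per search, as stated above
MAX_RESULTS = 32   # no more than the first 32 results are returned

# Every valid value for the start parameter.
starts = list(range(0, MAX_RESULTS, PAGE_SIZE))
# starts == [0, 8, 16, 24]
```

Looping over these values and passing each one as start would collect all 32 results.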

When searching for images, the results can be filtered for image size with the

optional size parameter.

Acceptable values are ‘small’, ‘medium’, ‘large’ and ‘wallpaper’.

<hr/>

<h2><a id="newsfeed" name="newsfeed" title="newsfeed"></a>Reading newsfeeds</h2>

The newsfeed.parse() command returns information from <a href="http://en.wikipedia.org/wiki/Rss">RSS</a> or <a href="http://en.wikipedia.org/wiki/Atom_%28standard%29">Atom</a> newsfeeds as a collection of news

items with a title, link, description, and more.

<pre>web.newsfeed.parse(url, wait=10, cached=True)

</pre>

The returned newsfeed object has the following properties:

<ul>

<li>newsfeed.title: the title of the newsfeed

</li><li>newsfeed.description: a short description for the newsfeed

</li><li>newsfeed.link: a link to the homepage of the news channel

</li><li>newsfeed.date: a publication date of the news channel

</li><li>newsfeed.encoding: the text encoding used (usually Unicode)

</li><li>newsfeed.items: a list of news items

</li></ul>

The items property is a list in which each item object has properties of its own:

<ul>

<li>item.title: the title of the news item

</li><li>item.link: a link to the full article online

</li><li>item.description: a short description of the news item

</li><li>item.date: the publication date of the news item

</li><li>item.author: the author of the news item

</li></ul>

<pre>newsfeed = web.newsfeed.parse("http://www.whitehouse.gov/rss/news.xml")
print "Channel:", newsfeed.title
print "Channel description:", newsfeed.description
for item in newsfeed.items:
    print "Title:", item.title
    print "Link:", item.link
    print "Description:", item.description
    print "Date:", item.date
    print "Author:", item.author

</pre>
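To show where the channel and item properties come from, the following Python 3 sketch reads a tiny RSS 2.0 document with the standard xml.etree module. It is a simplified stand-in, not the Universal Feed Parser that the library bundles: it handles only plain RSS 2.0, and the feed content and the function read_rss() are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document, used instead of a live feed.
RSS = """<rss version="2.0">
  <channel>
    <title>Example channel</title>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

def read_rss(xml_text):
    """Return the channel title, description and items as plain dictionaries."""
    channel = ET.fromstring(xml_text).find("channel")
    items = []
    for node in channel.findall("item"):
        items.append({
            "title": node.findtext("title"),
            "link": node.findtext("link"),
        })
    return {
        "title": channel.findtext("title"),
        "description": channel.findtext("description"),
        "items": items,
    }

feed = read_rss(RSS)
for item in feed["items"]:
    print("Title: " + item["title"])
    print("Link: " + item["link"])
```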

There are other properties as well, like item.date_parsed, item.author_detail and

item.author_detail.email. See the <a href="http://www.feedparser.org/">Universal Feed

Parser</a> documentation for more information.

The address of some well-known newsfeeds can be found in the newsfeed.favorites

dictionary or with the newsfeed.favorite_url() command:

<pre>print web.newsfeed.favorite_url("apple")

>>> http://images.apple.com/main/rss/hotnews/hotnews.rss

</pre>

<hr/>

<h2><a id="wikipedia" name="wikipedia" title="wikipedia"></a>Retrieving articles from

Wikipedia</h2>

<a href="http://en.wikipedia.org/">Wikipedia</a> is a multilingual, web-based, free content

encyclopedia project. Wikipedia is written collaboratively by volunteers; the vast majority of

its articles can be edited by anyone with access to the Internet.

The wikipedia.search() command retrieves articles from Wikipedia. It parses an

article corresponding to the given query into lists of related articles and paragraphs of plain

text without any HTML or other markup in it:

<pre>web.wikipedia.search(q, language="en", light=False, wait=10, cached=True)

</pre>

The command takes a q query parameter and, optionally, the language the

article should be written in. When light is True, only a title, links to

other articles, categories and disambiguation will be parsed from the article

(it’s faster than a full parse).

Note that the q parameter is case-insensitive, which gives the best chance of

retrieving an article.

If you do want case-sensitivity use search(q, case_sensitive=True).

The return value is an article object with the following properties:

<ul>

<li>article.title: the title of the article

</li><li>article.links: a list of titles of related articles

</li><li>article.categories: a list of categories this article belongs to

</li><li>article.disambiguation: a list of article titles describing other

interpretations of the query

</li><li>article.paragraphs: a list of paragraph objects

(WikipediaParagraph)

</li><li>article.images: a list of image objects (WikipediaImage)

</li><li>article.tables: a list of table objects (WikipediaTable)

</li><li>article.references: a list of reference objects (WikipediaReference)

</li><li>article.translations: a dictionary of language keys linking to title

translations

</li><li>article.important: important phrases that appear in bold in the online

article

</li><li>article.markup: the source text of the article in <a href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> markup

</li></ul>

<hr/>

Article links

Each of the titles in the article.links list can be supplied to

wikipedia.search() to get more information on that topic.

<pre>article = web.wikipedia.search("plotdevice")

print article.title

print article.links

>>> PlotDevice

>>> [u'2007', u'2D computer graphics', u'Adobe Photoshop', u'Animation',

>>> u'CMYK', u'Computer animation', u'Core Image', u'DrawBot',

>>> u'February 27', u'Graphic design', u'HSV color space',

>>> u'MIT License', u'Mac OS X', u'OpenGL', u'Portable Document Format',

>>> u'PostScript', u'Processing (programming language)',

>>> u'Python (programming language)', u'QuickTime', u'RGB', u'SVG',

>>> u'WordNet', u'alpha transparency', u'artificial intelligence',

>>> u'collage', u'graphic design'

>>> ]

</pre>

<hr/>

Article paragraphs

Each paragraph object in article.paragraphs is a list of plain text blocks.

Furthermore, a paragraph has a number of properties. This code snippet loops through all

the paragraphs:

<pre>article = web.wikipedia.search("plotdevice")
for paragraph in article.paragraphs:
    # A paragraph's depth determines
    # if it's a subparagraph or a top-level paragraph.
    if paragraph.depth <= 1:
        print "="*100
    else:
        print "-"*100
    print paragraph.title + "\n"
    # Each paragraph is a list of separate blocks of text:
    for textblock in paragraph:
        print textblock + "\n"

</pre>

To display a paragraph with the <a href="../ref/Primitives.html#text()">text()</a> command

you can loop over all the textblocks individually:

<pre>fontsize(10)
x, y, w = 20, 20, 300
for textblock in article.paragraphs[0]:
    text(textblock, x, y, width=w)
    y += textheight(textblock, width=w) + 10

</pre>

Or simply use the Python str() command on the entire list:

<pre>text(str(article.paragraphs[0]), 20, 20, width=300)

</pre>

A paragraph object has the following properties:

<ul>

<li>paragraph.title: the title or heading of this paragraph

</li><li>paragraph.depth: the depth of the paragraph

</li><li>paragraph.parent: a WikipediaParagraph object containing this subparagraph

</li><li>paragraph.children: a list of sub-WikipediaParagraph objects

</li><li>paragraph.main: a list of article titles whose contents describe this paragraph in

detail

</li><li>paragraph.related: more article titles that have related contents

</li><li>paragraph.tables: a list of WikipediaTable objects found in this

paragraph

</li></ul>
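How title, depth, parent and children fit together can be sketched with a small class. This Python example is a hypothetical model, not the library's WikipediaParagraph: the class name Paragraph, the choice to start counting depth at 0, and the sample titles are assumptions.

```python
class Paragraph(list):
    """A list of text blocks that also knows its place in the outline."""

    def __init__(self, title, parent=None):
        list.__init__(self)
        self.title = title
        self.parent = parent
        self.children = []
        # Assumption: a paragraph without a parent has depth 0.
        self.depth = 0 if parent is None else parent.depth + 1
        if parent is not None:
            parent.children.append(self)

intro = Paragraph("Computer")
history = Paragraph("History", parent=intro)
history.append("The Jacquard loom was one of the first programmable devices.")
```

Because the paragraph is itself a list, looping over it yields its text blocks, while the extra attributes describe its position among the other paragraphs.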

<hr/>

Article images

Image objects in the article.images list have properties (like a description) that

can help in discerning what is being displayed in the image:

<pre>article = web.wikipedia.search("computer")
for img in article.images:
    print img.description
>>> The NASA Columbia Supercomputer.
>>> A computer in a wristwatch.
>>> The Jacquard loom was one of the first programmable devices.
>>> ...

</pre>

An image object has the following properties:

<ul>

<li>image.path: the image’s filename

</li><li>image.description: a description parsed from the source content

</li><li>image.links: a list of related article titles

</li><li>image.properties: a list of properties parsed from the source (e.g. left,
thumb, ...)

</li></ul>

Finally, here is a little web mash-up to draw article images in PlotDevice:

<pre># 1) Get the image filename from the article.
article = web.wikipedia.search("computer")
img = article.images[0]
# 2) Get the HTML for the Wikipedia page displaying the full-size image
img = img.path.replace(" ", "_")
html = web.page.parse("http://en.wikipedia.org/wiki/Image:"+img)
# 3) Find the link in the HTML pointing to the image file.
for a in html.find_all("a"):
    if a.has_key("href") and a["href"].endswith(img):
        img = a["href"]
        break
# 4) Download the image link.
# Pass the downloaded data to the image() command.
img = web.download(img)
image(None, 0, 0, data=img)
</pre>

<hr/>

Article tables

Tables parsed from an article can be accessed from the article.tables list or from
article.paragraphs[i].tables. A table object is a list of rows. Each row is a list of
cells:

<pre>article = web.wikipedia.search("computer")
table = article.tables[0]
print table.paragraph.title
print table.title, "("+table.properties+")"
for row in table:
    print "-"*50
    print row.properties
    for cell in row:
        print cell, "("+cell.properties+")"
</pre>

As you can see, tables, rows and cells all have properties. Tables also have a

title property and a paragraph property linking to the paragraph object in which

this table was found.

<hr/>

Article references

Text blocks in a Wikipedia paragraph can contain numerous references to websites, journals
and footnotes. They are marked as a number between square brackets, e.g. [15].

For example:

<pre>>>> A key component common to all CPUs is the program counter,

>>> a special memory cell (a register) that keeps track of which

>>> location in memory the next instruction is to be read from. [11]

</pre>

corresponds to article.references[10] - keeping in mind that list indices start from

0:

<pre>print article.references[10]

>>> Instructions often occupy more than one memory address,

>>> so the program counters usually increases by the number of

>>> memory locations required to store one instruction

</pre>
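The off-by-one mapping between a marker and its list index can be automated with a regular expression. This is a minimal sketch: reference_indices is a hypothetical helper, and it assumes markers always appear as plain digits between square brackets.

```python
import re

# Matches markers such as [11] and captures the number.
MARKER = re.compile(r"\[(\d+)\]")

def reference_indices(textblock):
    """Return the zero-based list index for every [n] marker in a text block."""
    return [int(n) - 1 for n in MARKER.findall(textblock)]

sample = "location in memory the next instruction is to be read from. [11]"
indices = reference_indices(sample)
```

Each returned index could then be used to look up the matching entry in article.references.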

Each reference object in the article.references list has a number of properties. In
the worst case all of the information is stored in reference.note; in the best case the
reference has data for all of the following properties:

<ul>

<li>reference.title: a title of a publication

</li><li>reference.url: a link to a web page

</li><li>reference.author: the author of a publication

</li><li>reference.first: the author’s first name

</li><li>reference.last: the author’s last name

</li><li>reference.journal: the journal in which the article is published

</li><li>reference.publisher: the journal’s publisher

</li><li>reference.date: publication date

</li><li>reference.year: publication year

</li><li>reference.id: an ISBN book number

</li><li>reference.note: footnotes and descriptions

</li></ul>
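Since a reference may carry structured fields or only a note, code that prints citations needs a fallback. The sketch below shows one way to do that. Ref is a hypothetical stand-in for a reference object, cite is not a library function, and the code assumes that missing properties are empty or None, which the source does not confirm.

```python
from collections import namedtuple

# Hypothetical stand-in modelling a subset of the properties listed above.
Ref = namedtuple("Ref", ["author", "title", "journal", "year", "note"])

def cite(ref):
    """Join the structured fields that are present; fall back to the note."""
    fields = ("author", "title", "journal", "year")
    parts = [getattr(ref, name, "") for name in fields]
    parts = [str(p) for p in parts if p]
    if parts:
        return ", ".join(parts)
    return getattr(ref, "note", "") or ""

# Made-up sample data for illustration only.
full = Ref("Doe", "Sample Title", "Sample Journal", "2001", "")
bare = Ref("", "", "", "", "See the footnote text.")
```

Here cite(full) joins the four structured fields, while cite(bare) returns the note unchanged.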

<hr/>

Article translations

Here’s an example script showing how to get the translated version of an article:

<pre>article = web.wikipedia.search("computer")
language = "fr"
if article.translations.has_key(language):
    translation = article.translations[language]
    article = web.wikipedia.search(translation, language)
    print article.title
>>> Ordinateur
</pre>

The dictionary of all languages used in Wikipedia:

<pre>print web.wikipedia.languages

</pre>

<hr/>

<h2><a id="wikipediahelpers" name="wikipediahelpers" title="wikipediahelpers"></a>Some helper

commands to draw Wikipedia content in PlotDevice</h2>

A number of commands in the library can help you find out how to display content from

Wikipedia in PlotDevice. There are commands to draw lists, math equations and tables.

<pre># Returns True if a given text block str in a paragraph

# is preformatted text, e.g. a programming code example.

web.wikipedia.is_preformatted(str)

</pre>

<pre># Returns True if a text block in a paragraph is a (numbered) list.

web.wikipedia.is_list(str)

</pre>

<pre># Returns True if a text block in a paragraph is a math equation.

web.wikipedia.is_math(str)

</pre>

<pre># Draws a list text block at x, y coordinates in PlotDevice.

web.wikipedia.draw_list(str, x, y, w, padding=5, callback=None)

</pre>

<pre># Use mimeTeX to draw an image of a math equation at x, y.

web.wikipedia.draw_math(str, x, y, alpha=1.0)

</pre>

<pre># Draws WikipediaTable object in PlotDevice; works very poorly.

web.wikipedia.draw_table(table, x, y, w, padding=5)

</pre>

<hr/>

<h2><a id="morguefile" name="morguefile" title="morguefile"></a>Querying morgueFile for

images</h2>

<a href="http://www.morguefile.com">morgueFile</a> contains photographs freely contributed

by many artists to be used in creative projects by visitors to the site.

The morguefile.search() command returns a list of images on morgueFile that

correspond to a given query. It has an optional parameter max specifying the maximum

number of images to return:

<pre>web.morguefile.search(q, max=100, wait=10, cached=True)

</pre>

<pre>web.morguefile.search_by_author(q, max=100, wait=10, cached=True)

</pre>

Each image object in the returned list has the following properties:

<ul>

<li>img.id: the unique morgueFile ID for the image

</li><li>img.category: the category the image belongs to

</li><li>img.author: the name of the author

</li><li>img.name: the image’s name

</li><li>img.url: the URL of the image thumbnail

</li><li>img.width: the image width in pixels

</li><li>img.height: the image height in pixels

</li><li>img.date: the date the image was added to morgueFile

</li></ul>

<pre>images = web.morguefile.search("leaf", max=10)
for img in images:
    print img.name
>>> IMG_1662_d.JPG
>>> fedegrafo_100_0221.jpg
>>> cha827.jpg
>>> Filiford_P1010003.JPG
>>> IMG_8336.jpg
>>> CIMG0012_s.JPG
>>> Target_spot_disease_on_maple.JPG
>>> IMG_1664.JPG
>>> bumpy_leaf.JPG
>>> Aztec_Grass.JPG
</pre>

Each image object in the list has a download() method that stores the image file

locally in cache. It has an optional parameter size, which can be set to ‘small’ (image

thumbnail) or ‘large’ (default):

<pre>img = images[0]

img.download(size="large", wait=60)

image(img.path, 0, 0, width=img.width, height=img.height)

print img.author, img.path

>>> Filiford cache/morguefile/240060eb1e1a0628ae32aff811b167ef.JPG

</pre>

<h2><a id="flickr" name="flickr" title="flickr"></a>Querying Flickr for images</h2>

<a href="http://www.flickr.com">Flickr</a> is an online photo management and sharing

application.

You can query it for images in the same way as <a href="#morguefile">querying morgueFile</a>.

<pre>web.flickr.search(q, start=1, count=100, wait=10, cached=True)

</pre>

<pre>web.flickr.recent(start=1, count=100, wait=10, cached=True)

</pre>

The flickr.search() command has two optional parameters: sort and

match. The sort order can be set either to ‘interest’, ‘relevance’ (default) or ‘date’.

The match parameter can be either ‘all’ (image tags must include all of the search

words) or ‘any’ (default).

Each image object in the returned list has a download() method like the morgueFile

interface. The download() method has an optional size parameter that selects which
version of the image to download: ‘square’, ‘small’, ‘medium’, ‘large’ or ‘wallpaper’.

<hr/>

<h2><a id="kuler" name="kuler" title="kuler"></a>Querying kuler for color themes</h2>

<a href="http://kuler.adobe.com/">kuler</a> is an Adobe web-application that allows users to

construct and share color themes.

The kuler.search() command returns a list of color themes on kuler that correspond to

a given query. It has an optional page parameter defining the starting page (each page

has 30 themes).

<pre>web.kuler.search(q, page=0, wait=10, cached=True)

</pre>

<pre>web.kuler.search_by_id(id, page=0, wait=10, cached=True)

</pre>

<pre>web.kuler.search_by_popularity(page=0, wait=10, cached=True)

</pre>

<pre>web.kuler.search_by_rating(page=0, wait=10, cached=True)

</pre>
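With 30 themes per page, the page that holds a given result follows from integer division. A minimal sketch, assuming pages are numbered from 0 (as the default page=0 suggests) and that results keep a stable order across pages, which the source does not state; page_for is a hypothetical helper.

```python
THEMES_PER_PAGE = 30  # page size stated in the text above

def page_for(rank):
    """Return (page, offset) for a zero-based result rank."""
    return divmod(rank, THEMES_PER_PAGE)

# The 46th result overall has zero-based rank 45.
page, offset = page_for(45)
```

The page value would be passed as the page argument of a search command, and offset would index into the list that comes back.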

Each theme object in the returned list has the following properties:

<ul>

<li>theme.id: the unique kuler id for the theme

</li><li>theme.author: the name of the author

</li><li>theme.label: the title of the theme

</li><li>theme.tags: a list of keywords for a theme found with

kuler.search_by_id()

</li><li>theme.darkest: a tuple of (R, G, B) values for the darkest color in the theme

</li><li>theme.lightest: a tuple of (R, G, B) values for the lightest color in the theme

</li></ul>

You can loop through a theme object as a list. Each item in the theme is a tuple of (R, G,

B) values, which you can supply to <a href="../ref/Line+Color.html#fill()">fill()</a> or

<a href="../ref/Line+Color.html#stroke()">stroke()</a> in PlotDevice.

<pre>themes = web.kuler.search("banana")
for r, g, b in themes[0]:
    print r, g, b
# The kuler.preview() command gives you an idea of the theme's colors.
web.kuler.preview(themes[0])
</pre>

Each theme also has a draw() method that displays the colors in the theme in PlotDevice:

<pre>themes = web.kuler.search("banana")

themes[0].draw(50, 50, w=40, h=120)

</pre>
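Because a theme behaves like a list of (R, G, B) tuples, its colors can be ranked by brightness with plain Python. The sketch below uses the common Rec. 601 luma weights. This is an assumption for illustration: the library may compute theme.darkest and theme.lightest differently. The sample values assume channels in the 0.0 to 1.0 range.

```python
def luminance(rgb):
    """Approximate perceived brightness using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def darkest_and_lightest(colors):
    """Return the (darkest, lightest) tuples from a list of RGB tuples."""
    ordered = sorted(colors, key=luminance)
    return ordered[0], ordered[-1]

# Made-up theme for illustration only.
sample = [(1.0, 0.9, 0.2), (0.4, 0.25, 0.1), (0.95, 0.95, 0.85)]
dark, light = darkest_and_lightest(sample)
```

Sorting the whole list this way also gives a dark-to-light ordering that could be used for gradients.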

<h2><a id="colr" name="colr" title="colr"></a>Querying Colr for color themes</h2>

<a href="http://www.colr.org">Colr</a> is an online tool by Lloyd Dalton to let people

fiddle around with colors.

You can query it for color themes in the same way as querying <a href="#kuler">kuler</a>.

<pre>web.colr.search(q, page=0, wait=10, cached=True)

</pre>

<pre>web.colr.search_by_id(id, page=0, wait=10, cached=True)

</pre>

<pre>web.colr.latest(page=0, wait=10, cached=True)

</pre>

<pre>web.colr.random(page=0, wait=10, cached=True)

</pre>

You can manipulate each theme object in the returned list as with the kuler interface.

<hr/>

<h2><a id="math" name="math" title="math"></a> Creating PNG or GIF images from math

equations</h2>

John Forkosh has a <a href="http://www.forkosh.com/mathtex.html">mathTeX server</a> that

creates PNG or GIF images from math equations.

<pre>web.mathtex.png(equation, dpi=120, color="")
</pre>
<pre>web.mathtex.gif(equation, dpi=120, color="")

</pre>

The optional dpi parameter sets the image resolution, while color can be the
name of a primary color (e.g. blue, green, red, ...).

<pre>equation = r"E = hf = \frac{hc}{\lambda} \,\! "

img = web.mathtex.gif(equation)

image(img, 0, 0)

</pre>

<hr/>

<h2><a id="urbandictionary" name="urbandictionary" title="urbandictionary"></a>Word definitions

from Urban Dictionary</h2>

<a href="http://www.urbandictionary.com">Urban Dictionary</a> is a slang dictionary with
user-defined descriptions for words. You can often get some humorous (or cruel and childish)
results from it.

The urbandictionary.search() command returns a list of definitions for a given word.

<pre>web.urbandictionary.search(q, cached=True)

</pre>

Each definition object in the returned list has the following properties:

<ul>

<li>definition.description: a description of the given word

</li><li>definition.example: example usage of the word in a sentence

</li><li>definition.author: the author who came up with the definition

</li><li>definition.links: a list of words linked to the definition

</li><li>definition.url: the web page where the definition can be found

</li></ul>

<pre>definitions = web.urbandictionary.search("life")

print definitions[0].description

>>> A sexually-transmitted, terminal disease.

</pre>

<hr/>

<h2><a id="asynchronous" name="asynchronous" title="asynchronous"></a>Working with asynchronous

downloads</h2>

Downloading content from the internet may take a moment to complete. When running an
animation in PlotDevice, you don’t want it to halt while PlotDevice waits for the download to
complete. The Web library lets you download asynchronously: PlotDevice continues running
while the download takes place in the background. Once it is done you can start
manipulating the retrieved data.

The url.retrieve() command has some optional parameters to do asynchronous downloads:

<pre>web.url.retrieve(url, wait=60, asynchronous=False, cache=None, type=".html")

</pre>

With asynchronous set to True, the download will occur in the background. The

returned object has a done property which is True when downloading has terminated. The

object’s data property then contains the source data.

You can also set wait to the maximum number of seconds PlotDevice will spend on the
connection. When that limit is exceeded before the data has been fully retrieved, the
returned object’s error property is set. The error property is also set when something
else goes wrong; it usually holds a URLTimeout, an HTTPError or an HTTP404NotFound
exception.
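The polling pattern implied by the done, data and error properties can be tried out without a network connection. The sketch below is only an illustration: FakeDownload is a hypothetical stand-in that mimics those three properties with a background thread, and poll simulates animation frames with short sleeps. Neither is part of the Web library.

```python
import threading
import time

class FakeDownload(object):
    """Hypothetical stand-in exposing done, data and error like the returned object."""

    def __init__(self, payload, delay=0.05):
        self.done = False
        self.data = None
        self.error = None
        worker = threading.Thread(target=self._run, args=(payload, delay))
        worker.daemon = True
        worker.start()

    def _run(self, payload, delay):
        try:
            time.sleep(delay)  # pretend to wait on the network
            self.data = payload
        except Exception as e:
            self.error = e
        finally:
            self.done = True  # set last, so data is ready once done is True

def poll(download, frames=200, frame_time=0.01):
    """Check download.done once per simulated frame; return the frames waited."""
    waited = 0
    while not download.done and waited < frames:
        time.sleep(frame_time)  # an animation would draw a frame here instead
        waited += 1
    return waited

job = FakeDownload("<html>...</html>")
frames_waited = poll(job)
result = job.data if job.done and job.error is None else None
```

In an animation the same check would presumably live in the draw() function, testing done on every frame and switching to the retrieved data once it is True.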
