Merged
2 changes: 1 addition & 1 deletion docs/scenarios/admin.rst
@@ -104,7 +104,7 @@ The following command lists all available minions running CentOS using the grain system:

Salt also provides a state system. States can be used to configure the minion hosts.

For example, when a minion host is ordered to read the following state file, will install
For example, when a minion host is ordered to read the following state file, it will install
and start the Apache server:

.. code-block:: yaml
2 changes: 1 addition & 1 deletion docs/scenarios/cli.rst
@@ -6,4 +6,4 @@ Command Line Applications
Clint
-----

.. todo:: Write about Clint
.. todo:: Write about Clint
6 changes: 6 additions & 0 deletions docs/scenarios/client.rst
@@ -41,3 +41,9 @@ messaging library aimed at use in scalable distributed or concurrent
applications. It provides a message queue, but unlike message-oriented
middleware, a ØMQ system can run without a dedicated message broker. The
library is designed to have a familiar socket-style API.
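
As a rough sketch of that socket-style API (the port number and the echo behaviour below are arbitrary choices for this example, not anything prescribed by ØMQ), a minimal reply socket using the ``pyzmq`` binding looks like this:

.. code-block:: python

    import zmq

    context = zmq.Context()

    # The API mirrors plain sockets: create a socket, bind it, then recv/send.
    socket = context.socket(zmq.REP)
    socket.bind('tcp://127.0.0.1:5555')  # arbitrary port for this sketch

    while True:
        message = socket.recv()            # wait for the next request
        socket.send(b'Echo: ' + message)   # reply directly, no broker involved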

RabbitMQ
--------

.. todo:: Write about RabbitMQ

3 changes: 1 addition & 2 deletions docs/scenarios/db.rst
@@ -30,7 +30,6 @@ Django ORM
The Django ORM is the interface used by `Django <http://www.djangoproject.com>`_
to provide database access.

It's based on the idea of models, an abstraction that makes it easier to
It's based on the idea of `models <https://docs.djangoproject.com/en/1.3/#the-model-layer>`_, an abstraction that makes it easier to
manipulate data in Python.

Documentation can be found `here <https://docs.djangoproject.com/en/1.3/#the-model-layer>`_
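
As a rough illustration (the ``Book`` model and its fields are made up for this sketch, not part of Django itself), a model is declared as a Python class in an application's ``models.py`` and then queried through its manager rather than with raw SQL:

.. code-block:: python

    from django.db import models

    class Book(models.Model):
        # Each attribute becomes a database column.
        title = models.CharField(max_length=100)
        pub_date = models.DateField()

    # Queries go through the model's manager, e.g.:
    # Book.objects.filter(pub_date__year=2012).order_by('title')
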
6 changes: 3 additions & 3 deletions docs/scenarios/gui.rst
@@ -41,7 +41,7 @@ Gtk
PyGTK provides Python bindings for the GTK+ toolkit. Like the GTK+ library
itself, it is currently licensed under the GNU LGPL. It is worth noting that
PyGTK only currently supports the Gtk-2.X API (NOT Gtk-3.0). It is currently
recommended that PyGTK is not used for new projects and existing applications
recommended that PyGTK not be used for new projects and existing applications
be ported from PyGTK to PyGObject.

Tk
@@ -60,10 +60,10 @@ available on the `Python Wiki <http://wiki.python.org/moin/TkInter>`_.

Kivy
----
Kivy is a Python library for development of multi-touch enabled media rich applications. The aim is to allow for quick and easy interaction design and rapid prototyping, while making your code reusable and deployable.
`Kivy <http://kivy.org>`_ is a Python library for development of multi-touch enabled media rich applications. The aim is to allow for quick and easy interaction design and rapid prototyping, while making your code reusable and deployable.

Kivy is written in Python, based on OpenGL and supports different input devices such as: Mouse, Dual Mouse, TUIO, WiiMote, WM_TOUCH, HIDtouch, Apple's products and so on.

Kivy is actively being developed by a community and free to use. It operates on all major platforms (Linux, OSX, Windows, Android).

The main resource for information is the website: http://kivy.org
The main resource for information is the website: http://kivy.org
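
As a small taste of how compact a Kivy application can be (the button text below is arbitrary), a complete app only needs an ``App`` subclass whose ``build`` method returns the root widget:

.. code-block:: python

    from kivy.app import App
    from kivy.uix.button import Button

    class HelloApp(App):
        def build(self):
            # The widget returned here becomes the root of the window.
            return Button(text='Hello, Kivy')

    if __name__ == '__main__':
        HelloApp().run()
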
4 changes: 2 additions & 2 deletions docs/scenarios/imaging.rst
@@ -12,15 +12,15 @@ The `Python Imaging Library <http://www.pythonware.com/products/pil/>`_, or PIL
for short, is *the* library for image manipulation in Python.

It works with Python 1.5.2 and above, including 2.5, 2.6 and 2.7. Unfortunately,
it doesn't work with 3.0+ yet.
it doesn't work with 3.0+ yet.
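
As a very small taste of the API (the file names below are placeholders, and depending on how PIL was installed the import may be ``import Image`` rather than ``from PIL import Image``), opening, resizing and saving an image takes only a few lines:

.. code-block:: python

    from PIL import Image

    im = Image.open('photo.jpg')       # placeholder file name
    im.thumbnail((128, 128))           # shrink in place, preserving aspect ratio
    im.save('photo_thumbnail.jpg')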

Installation
~~~~~~~~~~~~

PIL has a reputation of not being very straightforward to install. Listed below
are installation notes on various systems.

Also, there's a fork named `Pillow <http://pypi.python.org/pypi/Pillow>`_ which is easier
Also, there's a fork named `Pillow <http://pypi.python.org/pypi/Pillow>`_ which is easier
to install. It has good setup instructions for all platforms.

Installing on Linux
6 changes: 3 additions & 3 deletions docs/scenarios/network.rst
@@ -6,19 +6,19 @@ Twisted

`Twisted <http://twistedmatrix.com/trac/>`_ is an event-driven networking engine. It can be
used to build applications around many different networking protocols, including http servers
and clients, applications using SMTP, POP3, IMAP or SSH protocols, instant messaging and
and clients, applications using SMTP, POP3, IMAP or SSH protocols, instant messaging and
`many more <http://twistedmatrix.com/trac/wiki/Documentation>`_.
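
To give a feel for the event-driven style, here is a minimal sketch of a TCP echo server (the port number is an arbitrary choice for this example); Twisted's reactor calls ``dataReceived`` whenever a client sends data:

.. code-block:: python

    from twisted.internet import protocol, reactor

    class Echo(protocol.Protocol):
        def dataReceived(self, data):
            # Called by the reactor each time the client sends data.
            self.transport.write(data)

    class EchoFactory(protocol.Factory):
        def buildProtocol(self, addr):
            return Echo()

    reactor.listenTCP(8000, EchoFactory())  # arbitrary port
    reactor.run()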

PyZMQ
-----

`PyZMQ <http://zeromq.github.com/pyzmq/>`_ is the Python binding for `ZeroMQ <http://www.zeromq.org/>`_,
which is a high-performance asynchronous messaging library. One great advantage is that ZeroMQ
can be used for message queuing without message broker. The basic patterns for this are:
can be used for message queuing without a message broker. The basic patterns for this are:

- request-reply: connects a set of clients to a set of services. This is a remote procedure call
and task distribution pattern.
- publish-subscribe: connects a set of publishers to a set of subscribers. This is a data
- publish-subscribe: connects a set of publishers to a set of subscribers. This is a data
distribution pattern.
- push-pull (or pipeline): connects nodes in a fan-out / fan-in pattern that can have multiple
steps, and loops. This is a parallel task distribution and collection pattern.
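
As a minimal sketch of the first pattern (both ends are shown in one script purely for illustration; normally they would run in separate processes, and the port is an arbitrary choice):

.. code-block:: python

    import zmq

    context = zmq.Context()

    # The "service" end of a request-reply pair.
    server = context.socket(zmq.REP)
    server.bind('tcp://127.0.0.1:5557')

    # The "client" end.
    client = context.socket(zmq.REQ)
    client.connect('tcp://127.0.0.1:5557')

    client.send(b'ping')        # queued and delivered once the connection is up
    print server.recv()         # 'ping'
    server.send(b'pong')
    print client.recv()         # 'pong'
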
13 changes: 9 additions & 4 deletions docs/scenarios/scientific.rst
@@ -35,6 +35,10 @@ people who only need the basic requirements can just use NumPy.

NumPy is compatible with Python versions 2.4 through to 2.7.2 and 3.1+.

Numba
-----
.. todo:: Write about Numba

SciPy
-----

@@ -60,8 +64,9 @@ Resources

Installation of scientific Python packages can be troublesome. Many of these
packages are implemented as Python C extensions which need to be compiled.
This section lists various so-called Python distributions which provide precompiled and
easy-to-install collections of scientific Python packages.
This section lists various so-called scientific Python distributions which
provide precompiled and easy-to-install collections of scientific Python
packages.

Unofficial Windows Binaries for Python Extension Packages
---------------------------------------------------------
@@ -91,6 +96,6 @@ Anaconda
Python Distribution <https://store.continuum.io/cshop/anaconda>`_ which
includes all the common scientific python packages and additionally many
packages related to data analytics and big data. Anaconda comes in two
flavours, a paid for version and a completely free and open source community
flavors, a paid for version and a completely free and open source community
edition, Anaconda CE, which contains a slightly reduced feature set. Free
licences for the paid-for version are available for academics and researchers.
licenses for the paid-for version are available for academics and researchers.
200 changes: 101 additions & 99 deletions docs/scenarios/scrape.rst
@@ -1,99 +1,101 @@
HTML Scraping
=============

Web Scraping
------------

Web sites are written using HTML, which means that each web page is a
structured document. Sometimes it would be great to obtain some data from
them and preserve the structure while we're at it. Web sites provide
don't always provide their data in comfortable formats such as ``.csv``.

This is where web scraping comes in. Web scraping is the practice of using a
computer program to sift through a web page and gather the data that you need
in a format most useful to you while at the same time preserving the structure
of the data.

lxml and Requests
-----------------

`lxml <http://lxml.de/>`_ is a pretty extensive library written for parsing
XML and HTML documents really fast. It even handles messed up tags. We will
also be using the `Requests <http://docs.python-requests.org/en/latest/>`_ module instead of the already built-in urlib2
due to improvements in speed and readability. You can easily install both
using ``pip install lxml`` and ``pip install requests``.

Lets start with the imports:

.. code-block:: python

from lxml import html
import requests

Next we will use ``requests.get`` to retrieve the web page with our data
and parse it using the ``html`` module and save the results in ``tree``:

.. code-block:: python

page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.text)

``tree`` now contains the whole HTML file in a nice tree structure which
we can go over two different ways: XPath and CSSSelect. In this example, I
will focus on the former.

XPath is a way of locating information in structured documents such as
HTML or XML documents. A good introduction to XPath is on `W3Schools <http://www.w3schools.com/xpath/default.asp>`_ .

There are also various tools for obtaining the XPath of elements such as
FireBug for Firefox or if you're using Chrome you can right click an
element, choose 'Inspect element', highlight the code and then right
click again and choose 'Copy XPath'.

After a quick analysis, we see that in our page the data is contained in
two elements - one is a div with title 'buyer-name' and the other is a
span with class 'item-price':

::

<div title="buyer-name">Carson Busses</div>
<span class="item-price">$29.95</span>

Knowing this we can create the correct XPath query and use the lxml
``xpath`` function like this:

.. code-block:: python

#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')

Lets see what we got exactly:

.. code-block:: python

print 'Buyers: ', buyers
print 'Prices: ', prices

::

Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes',
'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff',
'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup',
'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire',
'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']

Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25',
'$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11',
'$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68',
'$15.00', '$114.07', '$10.09']

Congratulations! We have successfully scraped all the data we wanted from
a web page using lxml and Requests. We have it stored in memory as two
lists. Now we can do all sorts of cool stuff with it: we can analyze it
using Python or we can save it a file and share it with the world.

A cool idea to think about is modifying this script to iterate through
the rest of the pages of this example dataset or rewriting this
application to use threads for improved speed.
HTML Scraping
=============

Web Scraping
------------

Web sites are written using HTML, which means that each web page is a
structured document. Sometimes it would be great to obtain some data from
them and preserve the structure while we're at it. Web sites don't always
provide their data in comfortable formats such as ``csv`` or ``json``.

This is where web scraping comes in. Web scraping is the practice of using a
computer program to sift through a web page and gather the data that you need
in a format most useful to you while at the same time preserving the structure
of the data.

lxml and Requests
-----------------

`lxml <http://lxml.de/>`_ is a pretty extensive library written for parsing
XML and HTML documents really fast. It even handles messed up tags. We will
also be using the `Requests <http://docs.python-requests.org/en/latest/>`_
module instead of the built-in ``urllib2`` due to improvements in speed and
readability. You can easily install both using ``pip install lxml`` and
``pip install requests``.

Let's start with the imports:

.. code-block:: python

from lxml import html
import requests

Next we will use ``requests.get`` to retrieve the web page with our data
and parse it using the ``html`` module and save the results in ``tree``:

.. code-block:: python

page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.text)

``tree`` now contains the whole HTML file in a nice tree structure which
we can go over in two different ways: XPath and CSSSelect. In this example, I
will focus on the former.

XPath is a way of locating information in structured documents such as
HTML or XML documents. A good introduction to XPath is on
`W3Schools <http://www.w3schools.com/xpath/default.asp>`_ .

There are also various tools for obtaining the XPath of elements such as
FireBug for Firefox or the Chrome Inspector. If you're using Chrome, you
can right click an element, choose 'Inspect element', highlight the code,
right click again and choose 'Copy XPath'.

After a quick analysis, we see that in our page the data is contained in
two elements - one is a div with title 'buyer-name' and the other is a
span with class 'item-price':

::

<div title="buyer-name">Carson Busses</div>
<span class="item-price">$29.95</span>

Knowing this we can create the correct XPath query and use the lxml
``xpath`` function like this:

.. code-block:: python

#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')

Let's see what we got exactly:

.. code-block:: python

print 'Buyers: ', buyers
print 'Prices: ', prices

::

Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes',
'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff',
'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup',
'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire',
'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']

Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25',
'$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11',
'$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68',
'$15.00', '$114.07', '$10.09']

Congratulations! We have successfully scraped all the data we wanted from
a web page using lxml and Requests. We have it stored in memory as two
lists. Now we can do all sorts of cool stuff with it: we can analyze it
using Python or we can save it to a file and share it with the world.

A cool idea to think about is modifying this script to iterate through
the rest of the pages of this example dataset or rewriting this
application to use threads for improved speed.
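
As a starting point for the first of those ideas, here is a sketch that walks over a handful of pages, assuming (and this is only an assumption) that the rest of the dataset keeps the same ``001.html``, ``002.html``, ... naming scheme:

.. code-block:: python

    from lxml import html
    import requests

    all_buyers, all_prices = [], []
    for n in range(1, 6):
        # Assumption: pages are numbered 001.html, 002.html, ... like the one above.
        url = 'http://econpy.pythonanywhere.com/ex/%03d.html' % n
        page = requests.get(url)
        tree = html.fromstring(page.text)
        all_buyers.extend(tree.xpath('//div[@title="buyer-name"]/text()'))
        all_prices.extend(tree.xpath('//span[@class="item-price"]/text()'))

    print 'Total buyers: ', len(all_buyers)
    print 'Total prices: ', len(all_prices)
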
12 changes: 7 additions & 5 deletions docs/scenarios/speed.rst
@@ -42,7 +42,7 @@ The GIL

`The GIL`_ (Global Interpreter Lock) is how Python allows multiple threads to
operate at the same time. Python's memory management isn't entirely thread-safe,
so the GIL is required to prevents multiple threads from running the same
so the GIL is required to prevent multiple threads from running the same
Python code at once.
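
A quick, unscientific way to see the effect is to time a CPU-bound function run in two threads against the same work done sequentially; because only one thread executes Python bytecode at a time, the threaded version is usually no faster:

.. code-block:: python

    import threading
    import time

    def count(n):
        # A purely CPU-bound loop, with no I/O that would release the GIL.
        while n > 0:
            n -= 1

    start = time.time()
    threads = [threading.Thread(target=count, args=(10**7,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print 'two threads:', time.time() - start

    start = time.time()
    count(10**7)
    count(10**7)
    print 'sequential: ', time.time() - start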

David Beazley has a great `guide`_ on how the GIL operates. He also covers the
@@ -58,8 +58,8 @@ C Extensions
The GIL
-------

`Special care`_ must be taken when writing C extensions to make sure you r
egister your threads with the interpreter.
`Special care`_ must be taken when writing C extensions to make sure you
register your threads with the interpreter.

C Extensions
::::::::::::
Expand All @@ -76,7 +76,9 @@ Pyrex
Shedskin?
---------


Numba
-----
.. todo:: Write about Numba and the autojit compiler for NumPy

Threading
:::::::::
@@ -86,7 +88,7 @@ Threading
---------


Spanwing Processes
Spawning Processes
------------------

