Sven R. Kunze

UI Changes of Ubuntu 17.10

2017-11-02T00:36:00.001+01:00

A Timeline Of Discovery

Dist-Upgrade went through without an issue. Check! ✓
Up-to-date packages installed. They seem to work just fine. Check! ✓
Old stuff removed. Check! ✓
Internals basically work the same except some minor systemd changes. Check! ✓
A lot of UI changes. 🗙

UI Changes

1. First things first: login now requires an additional keystroke (Enter key) to see/activate the user's login form. Meeeh! 🗙
2. Starting applications from the dock results in a strange icon animation. Meeeh! 🗙
3. Pinning applications is not called “pinning” anymore but “Add to Favorites”. Meeeh for beginners – pinning is a well-known concept!
4. Missing link to Desktop in Files. Meeeh!
5. No hot corners for closing a maximized application by mouse. Meeeh! 🗙
6. Trash not in dock, polluting desktop – disabled trash icon altogether. Meeeh! 🗙
7. Top bar does not merge with window bar when maximized – wasting a lot of space. Meeeh! 🗙
8. Ancient-looking icons with random arrows on the top right. Meeeh! 🗙
9. Aggregation of unrelated concepts makes it harder to find the right quick settings and requires a second click to open menus with arrows. Meeeh! 🗙
10. Clock in the middle of the top bar. Actually no issue, but it is a reason for issues 5 and 7, so meeeh!
11. Space-wasting buttons in the top-left corner, another reason for issues 5 and 7. Meeeh! 🗙
12. Apps-floating-around animation when clicking on the Sudoku icon on the bottom left. Like kindergarten, meeeh! 🗙
13. No known shortcut for Sudoku icon. Meeeh! 🗙
14. Cannot see all settings in the Settings app at once, needs scrolling now – space wasting again. Meeeh! 🗙
15. Single settings dialog looks clean. Nice! ✓
16. Super-key search doesn't work reliably – depending on the mouse position (windows steal focus). Super annoying. Meeeh! 🗙
17. Super-key search is not as forgiving of typos as in former Ubuntu versions (e.g. geidt -> gedit). Meeeh! 🗙
18. Lock screen image disappears when entering password - displaying gray-noise background. Looks 10 years older! Meeeh! 🗙
19. Long-pressing super key does not reveal dock numbers of applications. Meeeh! 🗙
20. Long-pressing super key does not reveal system shortcuts. Meeeh! 🗙
21. In preparation of this post: taking screenshots - they aren’t in the clipboard by default but stored in Pictures as an image file. Couldn't find a setting so far. Meeeh! 🗙
22. I don't get the conceptual difference (thus intended usage) between the Sudoku icon search and the Super-key search. Both let you search for and start applications but they look different for unknown reasons. 🗙
23. Plugged in my phone. Why isn't it in the dock? I discovered it after closing all apps (and finishing the task). Hmm. 🗙

Conclusion

Upgrade from 17.04 works fine!
Internals and drivers work fine as well!
UI has changed a lot; for a desktop OS, graphical design is important:
- wasting a lot of space – especially bad for smaller screens such as the laptop I use to write this post
- aged look :-(
- a lot of useless bells and whistles

Despite all the meeehs, thanks a lot, Ubuntu team, for the hard work you put into this release. I am very eager to see the next iteration of this product!

Cheers,
Sven

PS: you might wonder why I nitpick about so many tiny details of the new Gnome-based UI. I care because I build my workflows around these tiny details, for which I now have no viable alternative (e.g. the clipboard one - issue 21) or no time-efficient one (hot corners for closing apps by mouse).

Drinking Games with PostgreSQL: GIN and RUM

2017-10-28T10:50:00.000+02:00

Fulltext support is a very important feature of most modern relational databases. For one, it enables fast information retrieval and for another, it simply allows application designers and operators to remain inside the known relational world. No need for a second database-like system, no need for additional maintenance, no need for yet another library, no need for a fundamentally different query language. You get the point.

Starting with version 8.3, PostgreSQL supports the tsvector and tsquery constructs and, with them, indexing support for the matching operator. This post covers more of that indexing technology, which we can use to accelerate regular expressions, LIKE queries, fulltext and even JSON-related queries.

Juniper-Flavored Spirit

GIN stands for Generalized Inverted Index and has been the de facto standard index for tsvectors since PostgreSQL 8.3, thus accelerating subsequent tsqueries.

Here is how it works:

a document (basically text) is split into normalized words
GIN is, simply speaking, a BTree of words mapping to document IDs

With this strategy, it is possible to index a lot of documents and retrieve them efficiently via the matching operator @@. If you want to read about it in detail, have a look here and here.
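To make the idea a bit more tangible, here is a minimal Python sketch of an inverted index. It is not how PostgreSQL implements GIN: the names normalize, build_index and match_all are made up for this post, normalization is plain lowercasing (no stemming, no stop words), and a sorted dict merely stands in for the B-tree of words.

```python
import re
from collections import defaultdict

def normalize(text):
    """Split text into lowercase words (a crude stand-in for real normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word in normalize(text):
            postings[word].add(doc_id)
    # Sorting the words mimics the ordered lookup structure of the index.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}

def match_all(index, query):
    """Return IDs of documents that contain every word of the query."""
    words = normalize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

documents = {
    1: "Gin is a juniper-flavored spirit",
    2: "Rum is a distilled alcoholic beverage",
    3: "A spirit distilled from juniper",
}
index = build_index(documents)
print(match_all(index, "juniper spirit"))  # [1, 3]
```

The lookup never touches the documents themselves; it only intersects the ID lists stored per word, which is what makes this kind of index fast for matching.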

In recent years, fulltext has become more and more important, so GIN has seen improvements, e.g. performance work in PostgreSQL 9.4. Reducing the index size and speeding up multi-key lookups (rare + frequent) did the trick to accelerate the @@ operator even more for larger amounts of documents.

I can recommend this resource and this one, if you need more good reads.

BUT there's still room for improvement, and that's where more beverages come into play.

Distilled Alcoholic Beverage

It seems like the folks over at Postgres Professional have a preference for alcoholic beverages (please mind the vodka at the end of the presentation referred to above). Good beverages appear to help push boundaries even further and build a brand-new index infrastructure. They called it:

RUM

Why do we need yet another index implementation? The following list shows the items missing efficient/general indexing support:

phrase operator (<->, <n>)
ranking (ts_rank, ts_rank_cd)
inverse fulltext search
inverse regular expression
position in JSON arrays

I didn't find out whether or not RUM is an acronym, but the basic idea is to store additional data alongside the usual GIN data [1][2]. This allows associating ordering information for efficient ranking, distance determination or positioning in JSON arrays. This approach increases the size of the index marginally but speeds up real-world queries by several orders of magnitude.
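As a rough illustration of why that additional data helps, here is a small Python sketch that stores word positions next to the document IDs. This is my own toy model and not RUM's actual storage format; build_positional_index and phrase_match are hypothetical names.

```python
from collections import defaultdict

def build_positional_index(documents):
    """Map word -> {doc_id: [positions]}; the positions are the extra data."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index

def phrase_match(index, first, second, distance=1):
    """Return doc IDs where `second` occurs exactly `distance` words after `first`."""
    hits = []
    for doc_id, first_positions in index.get(first, {}).items():
        second_positions = set(index.get(second, {}).get(doc_id, []))
        if any(p + distance in second_positions for p in first_positions):
            hits.append(doc_id)
    return sorted(hits)

documents = {
    1: "dark rum and ginger beer",
    2: "rum is dark and sweet",
}
index = build_positional_index(documents)
print(phrase_match(index, "dark", "rum"))              # [1]
print(phrase_match(index, "rum", "dark", distance=2))  # [2]
```

With IDs only, both documents would match "dark" and "rum" and each one would have to be fetched and re-checked. With positions in the index, the phrase question is answered from the index alone.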

In order to make the best use of RUM, they also gave birth to a new ranking function: ts_score, which is a middle ground between the well-known ranking functions ts_rank and ts_rank_cd.
The slides are pretty clear about the shortcomings of those functions [1][2]:

ts_rank doesn't support logical operators
ts_rank_cd works poorly with OR queries

I don't know exactly how they plan to integrate this into PostgreSQL trunk, if at all, but personally, I would love to see these features in PostgreSQL 11. They solve many real-world use-cases. So, keep up the good work!

Btw. if you want to give them a hand or review, stop by at their GitHub repository.

Enjoy the drinks 😉,
Sven

[1] https://pgconf.ru/media/2017/04/03/20170316H3_Korotkov-rum.pdf
[2] http://www.sai.msu.su/~megera/postgres/talks/pgopen-2016-rum.pdf

PostgreSQL: production-ready Hash Indexes

2017-06-27T23:44:00.000+02:00

Up to PostgreSQL 9.6, hash indexes were second class. Using them in production was not recommended; they weren't crash-safe. That's going to change with PostgreSQL 10. And here's what's gonna change.

Write Ahead Logging for Hash Indexes
Commit Fest Entry and the relevant threads (1) + (2)

The actual problem with hash indexes was their inability to use the WAL. Quoting the docs:

Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes.

With PostgreSQL 10, this drawback is gone. The community, with Amit Kapila leading the way, worked hard to remove this restriction and made the hash index a first-class citizen of PostgreSQL.

What's in it for you, you might wonder. According to Amit Kapila, hash indexes tend to perform better than B-trees if the column values are unique. So, it can be beneficial to use hash indexes instead of B-trees. Performance FTW!
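To picture the difference between the two lookup strategies, here is a toy Python sketch. It says nothing about PostgreSQL's on-disk structures or actual timings; SortedIndex and HashIndex are names I made up, with a sorted list plus binary search standing in for the B-tree.

```python
import bisect

class SortedIndex:
    """Ordered index: binary search, supports equality and range lookups."""

    def __init__(self, pairs):
        self._pairs = sorted(pairs)
        self._keys = [key for key, _ in self._pairs]

    def lookup(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._pairs[i][1]
        return None

    def range(self, low, high):
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [row for _, row in self._pairs[lo:hi]]

class HashIndex:
    """Hash index: one bucket computation, equality lookups only."""

    def __init__(self, pairs, buckets=8):
        self._buckets = [[] for _ in range(buckets)]
        for key, row in pairs:
            self._buckets[hash(key) % buckets].append((key, row))

    def lookup(self, key):
        for candidate, row in self._buckets[hash(key) % len(self._buckets)]:
            if candidate == key:
                return row
        return None

pairs = [(i, "row-%d" % i) for i in range(100)]
print(SortedIndex(pairs).lookup(42), HashIndex(pairs).lookup(42))
```

The hash variant jumps straight to one bucket for an equality lookup but cannot answer range queries, which is the trade-off to keep in mind before swapping index types.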

Given the production-readiness of hash indexes, it now makes sense to improve the long-neglected index type in general. It's hard to say if "WAL for Hash Indexes" was the trigger here but it makes subsequent patches to hash indexes usable in the first place. So, I picked the following two interesting ones.

Microvacuum Hash Indexes
Commit Fest Entry and the relevant thread

This patch reduces the index size by factors of 2 to 4. It does that by increasing the re-usability of hash index pages and avoiding page splits.
A small but nice side effect: while reviewing the patch, occasional deadlocks caught the community's eye and eventually led to improved testing code for other index types.

A Better Way to Expand Hash Indexes.
Commit Fest Entry and the relevant thread

Hash indexes need to grow when more and more items are inserted. Usually, there's a so-called bucket increase which, by this patch, is chopped into smaller pieces and allows for a more gradual way of increasing the size of hash indexes.

Given those improvements, I can imagine a lot of people now considering hash indexes as a viable alternative to the venerable B-tree. What about you?

Regards,
Sven

CI Done Right

2017-06-10T15:00:00.002+02:00

Broken.

Recently, some people have had some serious issues with a broken setuptools release [1, 2, 3]. One particular complaint was about broken CI systems just because of this central package. Replying to those reactions, I had a conversation via Twitter about certain design decisions of continuous integration [4, 5].

In this post, I want to share some of my personal experience from building such systems over the last couple of years. The result should be a cozy little guideline for those setting up their CI and CD systems within a corporate environment.

So, let's start with those rough steps:

Define what is to be tested.
Define what will happen in case of success and failure of those tests.

Neither of those steps is easy and setting it up overnight won't help. CI systems differ from environment to environment. As usual, it depends. That said, let's dig into the details.

What is to be tested?

We can break down this question into those subpoints:

Define the build environment, including

system binaries
system configs
directory structures
running services

Define the test environment, including

your source-code
your packages
3rd-party packages
non-source-code data

Define this stuff. Don't even start without having put any thought into it. Usually, we want to test our code and its integration in 3rd-party code but not the 3rd-party code itself.

And there is more to it. Software usually comes in the form of releases. In practice, release A does almost the same things as release B. That's why we have a constant package name with changing versions attached to it.

However, on an abstract level, each release of a package is actually a different package in itself. There is a diff, isn't there? So, we need to bother with package releases before implementing our CI system, which basically boils down to:

Dependency Update Cadence

You guessed it - we need to define it as well for everything already defined above (usually dependencies to your code): system binaries, 3rd-party packages, your packages, etc.

How often do you update the OS environment?
How often do you update 3rd-party dependencies? Which ones?
How often do you update your packages?

Manually?
On a regular basis like monthly?
Do you really need to be bleeding edge? Or will a more stable version do?

Ever updated one of your important frameworks (e.g. Django)? Don't tell me you could fix the incompatibilities in the few million lines of legacy source code within a day - all of them.

Some packages require bleeding-edge versions of different dependencies (as in the case of virtualenv installing the latest setuptools). Using a private package server and some cronjobs is one way to answer those questions in a sensible and enterprise-friendly way.

Pinning dependencies is another way. Use it to define versions, and define a (preferably automated) process of updating those according to your needs.

This will give you a proper test environment, where you can rely on what complements your code while being tested. Test results aren't meaningful otherwise.
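One cheap way to enforce pinning is to let the CI job itself complain about loose requirements. Below is a minimal sketch, assuming a pip-style requirements file with one requirement per line; unpinned_requirements and the sample file content are made up for illustration, and environment markers, hashes and URLs are deliberately not handled.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+$")

def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned

# Hypothetical file content, only here to exercise the check.
requirements = """
# hypothetical requirements.txt
Django==1.11.2
setuptools
requests>=2.0
psycopg2==2.7.1  # pinned
""".splitlines()

print(unpinned_requirements(requirements))  # ['setuptools', 'requests>=2.0']
```

Failing the build when this list is non-empty turns "we pin our dependencies" from a convention into something the pipeline checks on every run.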

Accounting for dependency versions enables you to change their update cadence quickly when the need arises.

It seems some of the people who were using the bleeding-edge version of setuptools had no way to change their setup quickly enough. So, they depended on how fast the team around setuptools could fix it. That is unacceptable in corporate software engineering.

What happens in case of test success and test failure?

As mentioned at the beginning, every CI/CD setup is different and it should serve its creators' needs. So, here's a collection of things people might want to do after a successful test run (YMMV but usually that means zero failures):

merge code
create a release aka version tagging
build a package of this release
trigger jobs for dependent packages
update/deploy QA (re-running tests there?) and production systems with these releases

Here we go with another list in case of test failures (usually that means at least a single test failure):

notify developers
re-run tests under certain circumstances
do nothing ;-)

For the sake of completeness: all of these items should be triggered and executed automatically with no supervision required except in case of errors of the CI/CD machinery. That's a design aspect you need to care about deeply.

Considering all those points should help you build CI/CD systems while minimizing wasted enterprise resources (aka your time) and increasing acceptance within your team.

Happy coding,

Sven

[1] https://www.reddit.com/r/Python/comments/6elcaa/psa_setuptools_broken_release_36_dont_use_it/
[2] https://github.com/pypa/setuptools/issues/1042
[3] https://github.com/pypa/setuptools/pull/1043

[4] https://twitter.com/kunsv/status/870399225883836417
[5] https://twitter.com/lucaswiman/status/870682012675211265

PostgreSQL 10 is on its way

2017-05-03T22:45:00.001+02:00

A multitude of features and fascinating details.

The next major release of PostgreSQL is on its way. It's a huge release in terms of features, improvements and bugfixes.

First things first

Versioning has been changed to a simpler scheme using only two numbers: major and minor version. So, this will be PostgreSQL 10 followed by 10.1 and 10.2 and so on. Okay, now let's get to the real deal 😉

Logical Replication

The inclusion of pglogical as logical replication has been completed. So, PostgreSQL 10 will feature a publish-subscribe replication system which finally supports use-cases like zero-downtime upgrades, data consolidation à la data warehousing and scaling out read load; using one database technology only. I personally find this an important feat for the opensource project!

Parallel Support

Parallel support has been extended far beyond the initial parallel-enabling architecture (e.g. flagging parallel-safe functions) and its first implementation for sequential scans with hash join, nested loops and aggregate functions. Now PostgreSQL includes parallel btree index scans, parallel bitmap heap scans, parallel merge joins and parallel uncorrelated subqueries as well.

Hash Indexes

Finally, hash indexes are going to be fully featured citizens of PostgreSQL, thus the age-old warning against their usage in production is going to be removed.

Fulltext meets JSON

Other improvements will be fulltext searches performed on JSON datasets, plus more utility functions for JSON data. Basically the cornerstone of what almost all modern non-trivial websites need to do to provide a streamlined user experience combined with short development cycles.

In a series of upcoming posts, I am going to cover the most interesting aspects in detail by looking at specific commit sets.

Sven

PostgreSQL 9.6 Released

2016-09-29T23:38:00.000+02:00

We are finally there. There is a new major version of PostgreSQL - 9.6.

Here is the news, so go ahead and read what they've done for the community. It's just great!

Cheers!

PostgreSQL: Optimizing Aggregates

2016-09-07T23:00:00.002+02:00

Aggregation of vast energy: our sun.

It's time to write about PostgreSQL 9.6 again. With the Release Candidate 1 out in the wild, we slowly approach the end of a very interesting development cycle. Here I would like to talk about the development in the field of aggregates. So, two commits, 9/552 and 9/435, will improve the performance of queries using aggregate functions and GROUP BY clauses:

Title Combine Aggs: Serialize/Deserialize Internal aggregate states
Topic Server Features
Created 2016-02-29 04:08:46
Last modified 2016-04-08 17:49:52 (5 months ago)
Latest email 2016-04-09 10:13:21 (5 months ago)
Status 2016-03: Committed
Authors David Rowley (davidrowley)
Reviewers Robert Haas (rhaas)
Committer Robert Haas (rhaas)
(full discussion)

Title Remove Functionally Dependent GROUP BY Columns
Topic Performance
Created 2015-12-02 10:23:32
Last modified 2016-02-11 22:52:25 (6 months, 4 weeks ago)
Latest email 2016-02-14 21:16:35 (6 months, 3 weeks ago)
Status
2016-03: Committed
2016-01: Moved to next CF
Authors David Rowley (davidrowley)
Reviewers Julien Rouhaud (rjuju)
Committer Tom Lane (tgl)
(full discussion)

The former commit basically improves the performance of queries that look like this:

SELECT AVG(c), VARIANCE(c), SUM(c) FROM my_table;

Here they basically reduce the amount of work to a single calculation for each row, where previously each result column would redo the same calculations. This speedup is due to sharing the internal state and the transition function that those aggregate functions have in common.

Another extra is that sharing independent state variables across different aggregates allows parallelizing the computation of these variables. In case of the functions above, which share [count, sum, sum of squares], all three are independent.
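Here is a small Python sketch of that shared state, just to show the principle. It is not PostgreSQL's implementation; transition and finalize are names I picked, and plain floats replace whatever internal types the server uses.

```python
def transition(state, value):
    """Advance the shared state (count, sum, sum of squares) by one row."""
    count, total, total_sq = state
    return (count + 1, total + value, total_sq + value * value)

def finalize(state):
    """Derive SUM, AVG and sample VARIANCE from the one shared state.

    Needs at least two rows, otherwise the variance is undefined.
    """
    count, total, total_sq = state
    average = total / count
    variance = (total_sq - total * total / count) / (count - 1)
    return {"sum": total, "avg": average, "variance": variance}

rows = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
state = (0, 0.0, 0.0)
for c in rows:  # a single pass over the rows feeds all three aggregates
    state = transition(state, c)
result = finalize(state)
print(result["sum"], result["avg"])  # 40.0 5.0
```

Each row is touched once, and the three result columns only differ in how the final values are derived from the same three numbers.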

The discussion around this topic started at the end of 2014 and ended in spring 2016. Robert Haas committed the patch and noted "Man, that was a lot of work", indicating the difficulties in implementing this infrastructure correctly.

The latter commit basically removes useless GROUP BY arguments, i.e. those that functionally depend on other arguments, so the GROUP BY clause can be simplified. The PostgreSQL planner does this in 9.6 now. There are some quite impressive performance gains (up to 50%) with some rather small performance losses due to increased planning overhead.

Its main use-case is users migrating from database systems which do not allow omitting redundant GROUP BY arguments when these appear in the SELECT clause.
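Why dropping such a column is safe can be seen in a few lines of Python. This is only a toy model of grouping, with made-up rows and a hypothetical group_count helper; it assumes customer_id is a primary key, so name is determined by it.

```python
from collections import defaultdict

def group_count(rows, keys):
    """Count rows per distinct combination of the given key columns."""
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[k] for k in keys)] += 1
    return groups

# Hypothetical join result: customer_id is a primary key, so name depends on it.
rows = [
    {"customer_id": 1, "name": "Ada", "order_id": 10},
    {"customer_id": 1, "name": "Ada", "order_id": 11},
    {"customer_id": 2, "name": "Bob", "order_id": 12},
]

by_both = group_count(rows, ["customer_id", "name"])
by_key = group_count(rows, ["customer_id"])

# Same groups and counts; the second column only adds comparison work.
print(sorted(by_both.values()) == sorted(by_key.values()))  # True
```

Grouping by the key alone produces exactly the same groups, so every extra dependent column is just more data to hash or sort per row.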

That's it for now.

Sven

Followup: systemd user instances

2016-09-06T19:45:00.002+02:00

In my last post, I wrote about how to fix systemd user instances for older/broken systemd versions. Here, I'd like to explain how we rolled out the solution to, say, more than a single host, where you can't do those changes by hand (at least not while keeping yourself sane and your customers happy).

In order to keep track here, we need to do the following:

generate an SSH key for root if missing
deploy that SSH key for the users in question if missing
deploy our-user@.service on the host
enable our-user@.service for the users in question

For that purpose, we built an RPM package (I guess the same will be possible with your favorite package management system as well), whose service unit looks like this:

[Unit]
Description=Alternative User Manager for %I
After=sshd.service

[Service]
ExecStartPre=/bin/bash -c "test -e ~/.ssh/id_rsa || ssh-keygen
                                                     -t rsa -N '' -f ~/.ssh/id_rsa"
ExecStartPre=/bin/bash -c "/usr/bin/ssh -oBatchMode=yes %I@localhost /usr/bin/echo
   || cat ~/.ssh/id_rsa.pub | /usr/bin/su -l %I -c 'tee -a ~/.ssh/authorized_keys'"

ExecStart=/usr/bin/ssh %I@localhost /usr/lib/systemd/systemd --user

Restart=on-failure

[Install]
WantedBy=default.target

NOTE: don't break the lines.

The noticeable difference to our first version is the addition of two ExecStartPre options which perform (1) and (2) of our TODO list. Especially (2) turned out to be very tricky, requiring all sorts of shell magic.
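The "if missing" part of step (2) is really about idempotency: running it twice must not add the key twice. Our unit checks that by trying an SSH login first. Another way to express the same idea is to look at the file itself, as in this Python sketch. It is not what the unit above does, ensure_authorized is a made-up helper, and it ignores the strict file permissions that sshd expects.

```python
from pathlib import Path

def ensure_authorized(authorized_keys: Path, public_key: str) -> bool:
    """Append public_key unless it is already listed; return True if appended."""
    key = public_key.strip()
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False  # already deployed, nothing to do
    authorized_keys.parent.mkdir(parents=True, exist_ok=True)
    with authorized_keys.open("a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # do not glue the key onto an unterminated line
        handle.write(key + "\n")
    return True
```

Calling it a second time with the same key returns False and leaves the file untouched, which is the behaviour you want from something that runs on every service start.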

The remaining points (3) and (4) require remote execution, for which we use SaltStack (also on GitHub). This way, we can deploy our additional package, and with it the service unit, hassle-free on all affected hosts. Enabling and starting the service (via salt) also performs steps (1) and (2) and we are all set!

Best,
Sven

What do you do when you need more systemd instances?

2016-08-23T21:24:00.000+02:00

You wanna make it there? You need to go there!

Today, we had a very interesting problem. We needed to have systemd run additional instances of itself to manage custom daemons. This works as follows.

You would need to enable "lingering" for the corresponding users:

loginctl enable-linger <user> # only once via root

# alternative: touch /var/lib/systemd/linger/<user>

After this, you can happily put your service file here:

mkdir -p ~/.config/systemd/user/
vim ~/.config/systemd/user/test.service

[Unit]
Description=test

[Service]
Type=simple
ExecStart=/bin/sleep 10000
Restart=on-abort

[Install]
WantedBy=default.target

Then, enabling/starting/stopping your brand-new service (and everything else you would expect from a proper task management) works like a charm, also after reboots.

systemctl --user enable test.service
systemctl --user start test.service
systemctl --user status test.service

(Please note the --user argument; run this under a non-root user.)

Well, life could be so easy if everything ran on Ubuntu 16.04 (or on any recent distribution for that matter). In fact, not all our production servers do. There's a big portion of openSUSE 12.3 servers, which need special handling.

Once we brought our test setup described above to said servers, we noticed that

systemctl --user

fails with

Failed to issue method call: Process /bin/false exited with status 1

Not very helpful indeed, so we dug deeper. The core issue here is that there is simply no user instance of systemd running. This is what /lib/systemd/system/user@.service is good for. On my Ubuntu 16.04, user@1000.service is enabled and running, thus maintaining a second systemd instance just for my login user in addition to root's system systemd - commonly known as pid 1. If you stop user@1000.service, you'll notice that systemctl --user also fails.

In short, on openSUSE 12.3, the mechanism to start user instances of systemd is simply broken. Starting the user@.service results in a failure.

How to fix it for openSUSE 12.3?

The prospect of updating all production servers made the following solutions unacceptable (due to the possibility of errors or failures while updating and the necessary reboots; basically headaches on steroids):

update systemd (will it reboot at all?)
change the PAM config (will we authenticate again?)
even deeper changes in Linux (no please)

So, we decided to stick with what we have and what is known to work properly.

How does it work anyway?

If we pay closer attention to the user@.service unit, we see what it actually does. Let's do the same in a shell:

user:~$ systemd --user

It works and systemctl works as well! Hooray. :)
So, it should be a no-brainer for root, right?

root:~# su - user -c '/usr/lib/systemd/systemd --user'

Failed to create root cgroup hierarchy: Permission denied
Failed to allocate manager object: Permission denied

Uh, that's odd. Maybe, that's the reason why systemd cannot create another instance of itself. But what's the difference here? When does it work and when not?

Use SSH!

In the first try, we connected to the server via ssh. There, the authentication and session creation process is successfully finished and thoroughly tested. So, we decided to use ssh when writing the following our-user@.service unit file:

[Unit]
Description=Alternative User Manager for %I
After=sshd.service

[Service]
ExecStart=/usr/bin/ssh %I@localhost /usr/lib/systemd/systemd --user
Restart=on-failure

[Install]
WantedBy=default.target

Enabling this made the whole systemd user instance magic work again on openSUSE 12.3.

Cheers,
Sven

Further readings about system vs user instances of systemd:
https://www.freedesktop.org/software/systemd/man/systemd.html#--system
https://www.freedesktop.org/software/systemd/man/systemctl.html#--user

PostgreSQL: Index-Only Scans with Partial Indexes

2016-07-15T17:29:00.000+02:00

Partial sun lurking out of the water.

Another post in my PostgreSQL 9.6 series. This time, I am talking about commit 9/299. The complete discussion can be found here.

Title index-only scans with partial indexes
Topic Performance
Created 2015-07-10 18:32:48
Last modified 2016-03-31 22:00:55 (2 months, 4 weeks ago)
Latest email 2016-04-01 02:39:49 (2 months, 4 weeks ago)
Status
2016-03: Committed
2016-01: Moved to next CF
2015-11: Moved to next CF
2015-09: Moved to next CF
Authors Tomas Vondra, Kyotaro Horiguchi
Reviewers Kyotaro Horiguchi, Kevin Grittner, Konstantin Knizhnik
Committer Tom Lane (tgl)

This commit basically allows partial indexes to participate in index-only scans. Index-only scans, as the name suggests, use only the corresponding index to sift through the data, thus being much faster than going back and forth to the related table.

However, some indexes, namely partial indexes, cannot be used that easily for this kind of optimization. Partial indexes are indexes which include a WHERE clause in order to index only a slice of the data.

Quoting Tomas Vondra:

In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).
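To show what such an index and query look like, here is a small sketch using Python's sqlite3 module. Please note that SQLite is only a stand-in here so the example runs anywhere; it is not PostgreSQL, its planner behaves differently, and the table and index names are made up. The point is merely the shape: the predicate column (shipped) is not part of the index, and the query only needs the indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, shipped INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, shipped) VALUES (?, ?)",
    [(i % 10, int(i % 4 == 0)) for i in range(100)],
)

# Partial index: only unshipped orders are indexed, and 'shipped' itself
# is deliberately not one of the indexed columns.
conn.execute("CREATE INDEX orders_unshipped ON orders (customer) WHERE shipped = 0")

# The query repeats the index predicate and reads nothing but 'customer'.
query = "SELECT customer FROM orders WHERE shipped = 0 AND customer = 3"
rows = conn.execute(query).fetchall()
print(len(rows))  # 10

# The reported plan depends on the SQLite version, so it is printed, not assumed.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

In PostgreSQL terms, this is the case the patch is about: everything the query returns lives in the index, and the predicate is already guaranteed by the index definition.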

Initial discussion on this topic concerned the increased complexity and runtime of the planning phase, which is usually the price when the planner needs to act more intelligently. Properly discussed and delayed three times, the patch was merged into the development branch of PostgreSQL by Tom Lane on the 31st of March 2016.

Thanks again for another improvement on the performance of the PostgreSQL-Server.

Sven

PostgreSQL: Parallel Aggregate

2016-06-28T18:56:00.000+02:00

With PostgreSQL 9.6 looming on the horizon, I went out to sift through some of PostgreSQL's commitfests to find some interesting bits and pieces. This post is the start of a series covering commits of the next generation of the venerable database management system.

Outflow made parallel.

We all fancy performance improvements and concurrency, especially these days when servers tend to grow to upper double-digit numbers of execution cores. As suspected earlier, we will now see a lot of movement in that direction as the foundation for parallel execution has been introduced. So, let's start with the following: Parallel Aggregate. See its key data below, from which you can see that it's already been integrated successfully into the main branch of development:

Title parallel aggregate
Topic Server Features
Created 2016-02-29 00:15:35
Last modified 2016-03-21 13:36:54
Latest email 2016-03-22 05:47:25
Status 2016-03: Committed
Authors David Rowley, Haribabu Kommi
Reviewers Robert Haas (rhaas)
Committer Robert Haas (rhaas)

Robert Haas initially created an infrastructure for parallel execution in PostgreSQL (described in detail there) by adding a Gather node which spawns a number of workers to solve a parallelizable workload of an SQL execution plan. David Rowley and Haribabu Kommi extended this idea to aggregation which also allows parallel execution in certain situations.
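The general pattern can be sketched in Python: workers reduce their share of the rows to a partial state, and a final step gathers and combines those states. This is only an analogy; threads stand in for PostgreSQL's worker processes, and partial_aggregate, combine and parallel_avg are names I invented.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_aggregate(chunk):
    """Worker step: reduce one chunk of rows to a partial state (count, sum)."""
    return (len(chunk), sum(chunk))

def combine(left, right):
    """Combine step: merge two partial states."""
    return (left[0] + right[0], left[1] + right[1])

def parallel_avg(values, workers=4):
    """Split rows across workers, gather the partial states, finalize AVG.

    Expects a non-empty list of values.
    """
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    count, total = 0, 0
    for state in partials:  # the "gather" side merges what the workers produced
        count, total = combine((count, total), state)
    return total / count

print(parallel_avg(list(range(1, 101))))  # 50.5
```

This only works because the aggregate has a combine step, which is why an aggregate needs to indicate parallel support before it can take part.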

First drafts can be found here. As noted there, the aggregate needs to indicate parallel support and to keep the implementation simple, they only implemented the most basic bits first. As such, most of the potential of parallelism still lies ahead of us. In the course of the review and the future development, some difficulties arose and some related issues needed to be handled. But in the end the commit went through and you can make use of it now.

That said, I can only say a huge "Thank You!" to the PostgreSQL team.

Best,
Sven

What is a path?

2016-03-31T16:33:00.000+02:00

Wisecracker.

pathlib is a provisional stdlib module. However, as the current threads (here, here, here and here) on python-ideas show, it is not as easy to work with as originally intended. Once you have a Path object, it's quite easy to use what Path offers which is a lot.

One big problem, though, is the interaction of Path objects with existing stdlib functions. Most of the latter are string-consuming functions whereas the former are no strings at all. As far as I can see, this is one reason why pathlib lacks broader adoption, and many agree. This situation leads to the following possible resolutions:

make Path objects compatible with strings (basically make them inherit from strings)
make existing stdlib functions accept Path objects (basically make them accept both and convert if needed)
do both but that seems superfluous

Solution 2 would also affect third-party libraries as noted here.
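To give an impression of what resolution 2 means for a single function, here is a minimal sketch. It is my own illustration and not what the python-ideas threads settled on; as_path_string and file_extension are hypothetical helpers.

```python
import os.path
from pathlib import PurePath, PurePosixPath

def as_path_string(path):
    """Accept a str or a pathlib path and return a plain string."""
    if isinstance(path, PurePath):
        return str(path)
    if isinstance(path, str):
        return path
    raise TypeError("expected str or pathlib path, got %s" % type(path).__name__)

def file_extension(path):
    """A string-consuming helper that now tolerates Path objects as well."""
    return os.path.splitext(as_path_string(path))[1]

print(file_extension("/tmp/report.txt"))                 # .txt
print(file_extension(PurePosixPath("/tmp/report.txt")))  # .txt
```

The conversion itself is trivial; the hard part is that every string-consuming function, in the stdlib and in third-party libraries, would need this kind of entry point.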

In order to decide appropriately, it becomes necessary to answer the following question:

What is a path in the first place?

Brett Cannon made a good point of why PEP 428 (that's the one which introduced pathlib as a stdlib module) deliberately chose Path not to inherit from string. I pondered over it for a while and saw that from his perspective a path is actually not a string but rather a complex data-structure consisting of parts with distinct meanings: literally the steps (of which a path consists) to a resource. The string which represents a path as most people know it is just that: a representation of a more complex object, just like a dict or a list. Let me make this a bit clearer.

I think we agree on the following: writing down 21 characters in a row is a string, right? So, what about these 21 characters?

{1: 11, 2: 12, 3: 13}

If you see that in a Python program (and presumably in many other modern programming languages), you associate that with a dictionary, mapping, hash, etc. So, these 21 characters are a mere representation of a complex object with a very rich functionality.
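You can check that in the interpreter. The snippet only uses built-ins; the variable names are arbitrary.

```python
text = "{1: 11, 2: 12, 3: 13}"
print(len(text))              # 21

mapping = {1: 11, 2: 12, 3: 13}
print(repr(mapping) == text)  # True: the string is the dict's representation
print(mapping[2])             # 12: only the dict can be asked for a key
```

The string and the dict print identically, yet only one of them knows what a key is.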

The following paragraphs summarize what makes the discussion about paths and strings so hard. Depending on whom you ask, there are different interpretations of what a path actually is.

Paths as complex objects and strings as their representation

Let's put this analogy to work with paths. If you come from a language that treats strings as file paths (like Python), you can imagine and categorize the facilities of pathlib like so:

pure path - operating on the path string
concrete path - operating on files corresponding to the given path

The classic "extract the file extension" issue is done easily with the pure path methods. Writing to a file is also easily done with concrete path operations. So, it seems paths are pretty complex objects with some internal structure and a lot of functionality.
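Both flavors side by side, as a short sketch (the file names are made up, and a temporary directory keeps the example from touching anything real):

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure path: works on the path text only, nothing on disk is touched.
pure = PurePosixPath("/home/me/notes/todo.txt")
print(pure.suffix)  # .txt
print(pure.parent)  # /home/me/notes
print(pure.parts)   # ('/', 'home', 'me', 'notes', 'todo.txt')

# Concrete path: the same kind of object, but it can reach the file system.
with tempfile.TemporaryDirectory() as directory:
    concrete = Path(directory) / "todo.txt"
    concrete.write_text("buy rum\n")
    content = concrete.read_text()
    print(content)
```

Note how the pure path happily answers questions about a file that does not exist, while the concrete path needs a real location to write to.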

Paths as monolithic object for addressing resources

The previous interpretation is not the only one. Despite all the fine functionality of extracting file extensions, concatenating parts to a larger path, etc., building a path is not an end in itself. When you have a path, it addresses a resource on a machine. When using it to read or write that resource, you actually don't care about whether the path consists of parts or not. To you, it's a monolithic structure, an address.

But, you might say, each part of a path represents a directory in hierarchical file systems. Sure, that is true for many file systems but not for all. Moreover, how often do you really care about the underlying directory structure? It needs to be there to make things work, of course. When it's there, you mostly don't care. How often do you need to create a subtree in a directory in order to create a single config file? I encounter this once in a while and to be honest: it sucks.

touch /home/me/on/your/ssd.conf will fail if the directory "on/your/" has not been created by somebody before me.
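The same restriction shows up in Python. The following sketch uses a temporary directory instead of /home/me, so the exact location is an assumption made for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    target = Path(root) / 'on' / 'your' / 'ssd.conf'

    try:
        target.touch()               # the parents do not exist yet
        failed = False
    except FileNotFoundError:
        failed = True

    # Create the missing subtree first, then the file itself.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch()
    created = target.exists()
```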

Especially for me, as a Web developer, it's quite hard to understand what purpose this restriction serves. Within a Web application, the hierarchy of URLs is an emergent property, not a prerequisite.

Users of git are accustomed to not committing directories. Why? Because it's unnecessary and the directory structure again emerges from the file names themselves (aka from the content).

This said, it's rather cumbersome to attribute semantics to the parts of a string that happen to be separated by "/" or "\". At least to me, a path is made of one piece.

What about security then?

One can further argue that Web development and git repositories are different here. There is a clear boundary where a path can lead. A URL path cannot address a foreign resource on another domain. git file paths are contained within the repository root.

See the common theme? There is a container from where the path of a resource cannot escape.

If you have a complete file system available at your fingertips, a lot of harm can be done when malicious user input is concatenated unchecked as a subpath: it is meant to address a resource within a container but is misused to gain access to the complete file system.

I cannot say if the container pattern would work for everybody but it's definitely worth exploring as there are some prominent working examples out there.
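To make the container idea concrete, here is a minimal sketch of such a check. The function name stays_inside and the example paths are hypothetical; it only shows one possible way to test containment with pathlib and makes no claim to being a complete security measure.

```python
from pathlib import Path

def stays_inside(container, user_input):
    """Return True if container/user_input does not escape the container."""
    root = Path(container).resolve()
    candidate = (root / user_input).resolve()   # collapses '..' and symlinks
    try:
        candidate.relative_to(root)             # ValueError if outside root
    except ValueError:
        return False
    return True
```

With a container of '/srv/data', an input like 'reports/a.txt' passes, whereas '../../etc/passwd' or an absolute '/etc/passwd' is rejected.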

Conclusion

I really like pathlib since it solves many frequently asked questions in the right way once and for all.

But I don't like using it as an argument against inheriting paths from strings by saying paths have internal structure in contrast to strings. At least to me, they do not. That, on the other hand, does not necessarily mean inheriting path from string is a good idea, but it makes it no worse an idea than it was before.

Best,
Sven

p-strings

2016-03-31T13:24:00.003+02:00

Currently, there is an interesting debate on python-ideas on the topic of "Would we like to add so-called p-strings to Python?". The p-string idea basically extends the f-string syntax which will be released in the upcoming Python 3.6.

The "p" in p-string stands for path, and one of the alternative proposals is to add the following syntactic sugar to Python:

p'/someroot/{myvariable}/file.ext'

This is basically supposed to create a pathlib.Path object, which offers all sorts of convenience methods like extracting the extension or iterating over the parts of the path and more.
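Since p-strings are only a proposal, the following sketch spells out what such a literal might roughly correspond to in today's Python. The exact semantics are an assumption, the value of myvariable is made up, and PurePosixPath is used instead of pathlib.Path only to keep the result identical on every platform.

```python
from pathlib import PurePosixPath

myvariable = 'projects'   # hypothetical value, for illustration only

# What p'/someroot/{myvariable}/file.ext' might roughly desugar to
# (an assumption; the proposal did not settle on exact semantics).
path = PurePosixPath('/someroot/{}/file.ext'.format(myvariable))

extension = path.suffix   # '.ext'
parts = path.parts        # ('/', 'someroot', 'projects', 'file.ext')
```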

Despite the usability improvement, there are of course reservations about the idea. Mainly these are:

path vs. str (I will cover this in another post)
tying a syntax to a stdlib module (in this case pathlib)
security concerns about including user input (same discussion arose with f-strings back then)

If you find the proposal useful or have something else to contribute to the discussion, we look forward to seeing you on the mailing list. :)

Best,
Sven

Python makes you a worse programmer

2016-03-29T11:50:00.000+02:00

Thanks Luke for this interesting read: http://lukeplant.me.uk/blog/posts/why-learning-haskell-python-makes-you-a-worse-programmer/

It reminds me of English, as it is substantially simpler than most other languages. What I've heard (from native speakers themselves) is that most native English speakers are not easily motivated to learn a second language, even though they know all the corresponding advantages like healthier brains, a better first language, more interesting traveling, etc.

A good article about why to learn a second language: http://www.omniglot.com/language/articles/benefitsoflearningalanguage.htm

Safe Cache Invalidation

2016-03-22T22:22:00.001+01:00

Caches - as fragile as bubbles.

There are only two hard things in Computer Science: cache invalidation and naming things.

– Phil Karlton

And right he is. Both are true for the package that I would like to present in this post. Based on functools.lru_cache, it allows you to specify when the caches should be invalidated. In the absence of a proper name for this kind of functionality, I called it xcache, analogous to xheap and xfork.

You can find the source at github and the pre-built package on PyPI.

What's in the package?

The purpose of xcache can be explained best using an example. Imagine you have a function like the following:

from functools import lru_cache

@lru_cache()
def math_func(a, b, c):
    return ...

Let's assume this is one of those proper mathematical functions you know from school or something built upon those. That means, besides all parameters being single letters, the same input yields the same output, now and forever. It further means that an LRU cache can be used to speed up this function enormously without compromising readability.

But, since there is an eternal battle going on between mathematicians and computer scientists, you don't write all your functions in this manner. Almost certainly, most of the functions you've written and are about to write will have side effects, which will inevitably lead to wrong results the longer you cache them.

In short, the associated LRU caches should be invalidated once in a while to maintain the proper output of not-so-mathematical functions. This is where xcache comes in. It allows you to invalidate caches in two ways:

using automatic memory management (aka garbage collection)
using context managers (aka with or @)

The following examples illustrate the use cases by attaching LRU caches to the lifespan of a Web request. Normally each request is handled within its own transaction, so most of the data used can be considered constant while handling the request. As soon as the request is finished, the transaction ends (commit or rollback); thus the caches should be invalidated since another concurrently executed request might have changed the underlying data.

Invalidation via Memory Management

some preparation

from xcache import cached_gen

request_cache = cached_gen(lambda: request) # create new cache wrapper

@request_cache()
def check_permission(user, obj):
    return ...

invalidation happens magically

request = ...  # wherever you get your request from

objs = ...  # list of some objects
if any(check_permission(request.user, obj) for obj in objs):
    print(result_success)
else:
    print(result_deny)

request = ...  # another request; all request caches are invalidated

NOTE: we generally attach the request object to some thread-local object, so cached_gen can access it regardless of context.
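The idea of tying a cache to an object's lifetime can be sketched with the standard library alone. This is not xcache's implementation; the names Request, cached_for and _caches are hypothetical, and the cache is a plain unbounded dict rather than an LRU cache.

```python
import weakref

class Request:
    """Hypothetical stand-in for a Web framework's request object."""
    def __init__(self, user):
        self.user = user

# One result dict per owner object; an entry vanishes together with its owner.
_caches = weakref.WeakKeyDictionary()

def cached_for(get_owner):
    """Cache results of the decorated function per owner from get_owner()."""
    def decorator(func):
        def wrapper(*args):
            results = _caches.setdefault(get_owner(), {})
            key = (func.__name__, args)
            if key not in results:
                results[key] = func(*args)
            return results[key]
        return wrapper
    return decorator

calls = []
current = Request('alice')

@cached_for(lambda: current)
def check_permission(user, obj):
    calls.append((user, obj))        # record real evaluations
    return True

check_permission('alice', 'doc')
check_permission('alice', 'doc')     # served from the cache
current = Request('bob')             # new owner, so a fresh cache is used
check_permission('alice', 'doc')     # evaluated again
```

Because the dictionary only holds weak references to the owners, the results of a finished request can be reclaimed by the garbage collector without any explicit clean-up call.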

Invalidation via Context Manager

preparation again

from xcache import cached

@cached()
def check_permission(user, obj):
    return ...

explicit invalidation

from xcache import clean_caches

for request in request_list:
    with clean_caches():        # start with empty caches
        objs = .... # list of some objects
        if any(check_permission(request.user, obj) for obj in objs):
            print(result_success)
        else:
            print(result_deny)  # after this line caches are empty as well

NOTE: using clean_caches you can even specify which object the caches should be attached to.
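For comparison, the context-manager style can be approximated with functools.lru_cache and contextlib. Again, this is only a sketch of the general technique and not how xcache works internally; registered_cache, fresh_caches and _registered are hypothetical names.

```python
from contextlib import contextmanager
from functools import lru_cache

_registered = []   # every cache that the context manager should clear

def registered_cache(maxsize=128):
    """Like lru_cache, but remembers the wrapper for later invalidation."""
    def decorator(func):
        wrapper = lru_cache(maxsize=maxsize)(func)
        _registered.append(wrapper)
        return wrapper
    return decorator

@contextmanager
def fresh_caches():
    """Start with empty caches and leave empty caches behind."""
    for wrapper in _registered:
        wrapper.cache_clear()
    try:
        yield
    finally:
        for wrapper in _registered:
            wrapper.cache_clear()

@registered_cache()
def square(x):
    return x * x

with fresh_caches():
    square(3)                                   # miss
    square(3)                                   # hit
    hits_inside = square.cache_info().hits
size_after = square.cache_info().currsize
```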

Conclusion

LRU caches are very useful, as is cache invalidation. Thus, you might find xcache to be a low-overhead addition to your caching libs. Check out the docs for more options and use cases. You can plug all lru_cache-compatible cache implementations into xcache, cf. cachetools.

Best,
Sven

Even Faster Heaps

2016-03-08T23:19:00.001+01:00

An ambulance rushing by.

Heaps are about performance. So, it is time to make xheap faster again. After realizing that the actual slowdown of RemovalHeap and XHeap does not simply stem from the general overhead but from NOT using the C implementation at all, I decided to change that.

Here's an update of the benchmark. Compared to its predecessor, the change was quite a success. The removal capability now accounts for a 4x slowdown compared to its prior 50x. Furthermore, I could improve the test suite considerably and, as expected, I then needed to fix some bugs.

The faster and better version of xheap can be obtained from PyPI and the source can be found at github.

operation	implementation	1,000 items	10,000 items	100,000 items	1,000,000 items

init	heapq	0.03 (1.00x)	0.41 (1.00x)	4.45 (1.00x)	64.37 (1.00x)
	Heap	0.03 (1.02x)	0.42 (1.03x)	4.46 (1.00x)	64.36 (1.00x)
	RemovalHeap	0.05 (1.57x)	0.62 (1.53x)	7.94 (1.79x)	101.16 (1.57x)

pop	heapq	0.01 (1.00x)	0.07 (1.00x)	0.85 (1.00x)	10.32 (1.00x)
	Heap	0.01 (1.46x)	0.10 (1.38x)	1.13 (1.33x)	13.20 (1.28x)
	RemovalHeap	0.02 (4.18x)	0.24 (3.51x)	2.61 (3.07x)	28.37 (2.75x)

push	heapq	0.00 (1.00x)	0.03 (1.00x)	0.34 (1.00x)	3.58 (1.00x)
	Heap	0.01 (1.82x)	0.06 (1.85x)	0.61 (1.81x)	6.38 (1.78x)
	RemovalHeap	0.01 (2.83x)	0.09 (2.91x)	0.97 (2.86x)	9.81 (2.74x)

init	heapq	0.14 (1.00x)	1.60 (1.00x)	24.00 (1.00x)	271.42 (1.00x)
	OrderHeap	0.17 (1.26x)	1.88 (1.18x)	26.80 (1.12x)	299.55 (1.10x)
	XHeap	0.19 (1.42x)	2.10 (1.31x)	30.14 (1.26x)	332.56 (1.23x)

pop	heapq	0.01 (1.00x)	0.15 (1.00x)	1.83 (1.00x)	22.37 (1.00x)
	OrderHeap	0.02 (1.80x)	0.24 (1.60x)	2.73 (1.50x)	31.36 (1.40x)
	XHeap	0.03 (2.66x)	0.33 (2.25x)	3.69 (2.02x)	41.03 (1.83x)

push	heapq	0.00 (1.00x)	0.04 (1.00x)	0.54 (1.00x)	5.67 (1.00x)
	OrderHeap	0.01 (3.97x)	0.15 (3.70x)	1.62 (3.03x)	16.46 (2.90x)
	XHeap	0.01 (3.28x)	0.12 (3.06x)	1.38 (2.58x)	13.94 (2.46x)

remove	RemovalHeap	0.02 (1.00x)	0.19 (1.00x)	2.18 (1.00x)	22.62 (1.00x)
	XHeap	0.02 (0.90x)	0.17 (0.89x)	1.72 (0.79x)	17.60 (0.78x)

Kudos to Srinivas who proposed the mark&sweep approach in the first place, especially the sweeping condition which allows an amortized runtime of O(log n) for pop, push and remove.

Best,
Sven

Raymond Tomlinson, the inventor of email, died

2016-03-06T22:40:00.003+01:00

Raymond Tomlinson invented one of the most famous technologies of today: email.

He died on Friday.

Read more about him on Ars: http://arstechnica.com/business/2016/03/e-mail-inventor-ray-tomlinson-who-popularized-symbol-dies-at-74/

LRU Caches

2016-03-02T22:06:00.000+01:00

Precious little pieces preserving the balance of nature.

Python features LRU caches. For this purpose, the decorator @functools.lru_cache is provided. You can configure the size of the cache as well as whether equal arguments of different types should be distinguished.

LRU stands for "least recently used", i.e. if the maximum size of the cache has been reached and a new item is to be inserted, the item with the oldest access timestamp will be discarded to make room for the new resident. The cache size can be unlimited, which is especially useful for short-running scripts.

Let's get our hands dirty:

from time import time

def fib(n):
    return fib(n-1) + fib(n-2) if n > 1 else 1

for j in range(0, 40, 5):
    a = time()
    f = fib(j)
    b = time()
    print('{t:10.8f} fib({j})={f}'.format(t=b-a, j=j, f=f))

Which results in:

0.00000048 fib(0)=1
0.00000238 fib(5)=8
0.00001502 fib(10)=89
0.00017858 fib(15)=987
0.00217247 fib(20)=10946
0.01955438 fib(25)=121393
0.21021986 fib(30)=1346269
2.30364680 fib(35)=14930352

Runtime increases dramatically.

What if you really need fib(100)? It seems you are screwed then, right? Let's see how lru_cache remedies the situation here:

from functools import lru_cache
from time import time

@lru_cache(maxsize=None)  # unlimited cache size
def fib(n):
    return fib(n-1) + fib(n-2) if n > 1 else 1

for j in range(0, 1000, 100):
    a = time()
    f = fib(j)
    b = time()
    print('{t:10.8f} fib({j})={f:e}'.format(t=b-a, j=j, f=f))

print(fib.cache_info())

Et voilà:

0.00000381 fib(0)=1.000000e+00
0.00016809 fib(100)=5.731478e+20
0.00013471 fib(200)=4.539737e+41
0.00013041 fib(300)=3.595793e+62
0.00014305 fib(400)=2.848123e+83
0.00013804 fib(500)=2.255915e+104
0.00013733 fib(600)=1.786845e+125
0.00017452 fib(700)=1.415308e+146
0.00013733 fib(800)=1.121024e+167
0.00014663 fib(900)=8.879303e+187
CacheInfo(hits=907, misses=901, maxsize=None, currsize=901)

Handsome execution times even for very large Fibonacci numbers.

Conclusion

performance gained - yay
intention of the implementation preserved - yay
recursion depth issue not solved - try range(0, 10000, 1000)
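One common workaround for the recursion depth issue, sketched here under the assumption that filling the cache bottom-up is acceptable, is to call the cached function for increasing arguments first. The helper name fib_warm is made up for this example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unlimited cache size
def fib(n):
    return fib(n-1) + fib(n-2) if n > 1 else 1

def fib_warm(n):
    """Fill the cache bottom-up so each call finds its two predecessors cached."""
    for i in range(n):
        fib(i)
    return fib(n)

big = fib_warm(10000)   # would exceed the default recursion limit if called cold
```

Every call inside the loop only needs two cached values, so the recursion never goes deeper than one extra level.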

That's it for now. There'll be another post covering automatic cache invalidation. Stay tuned.

Best,
Sven

DROWN—Yet Another Vulnerability of TLS

2016-03-01T21:31:00.003+01:00

Yet another vulnerability of TLS has been discovered, even affecting the latest version 1.2, as Ars wrote:

http://arstechnica.com/security/2016/03/more-than-13-million-https-websites-imperiled-by-new-decryption-attack/

Best,
Sven

Designing xfork

2016-03-01T20:51:00.002+01:00

Recently, I got to know a small team working on a problem which they tried to solve using threads. As expected, problems soon popped up and development slowed down considerably. So, based on the previous post, I would like to lay out my intentions and design decisions regarding xfork, a module I wrote, and actively maintain, while analyzing the newly introduced async/await syntax.

Concurrency is a hard engineering problem.

Take it seriously and even consider not being concurrent a valid option.

Design Assumptions

I created xfork from the following observations based on my own experience. Developers usually:

don't understand 100% of the problem's domain.
need a simple approach to get things right.
understand code written in a sequential style.
don't know what environment their code is running on such as:

How many cores does the target system have?
How many processes are allowed on the target system, and how many would wrestle it down?
How much memory does the target system have?
How often will the code be re-used and re-executed?

Each observation is addressed in one of the following sections.

Background Tasks

Observation 1 stems from a pretty basic human property. So, let me put it bluntly:

we don't want processes
we don't want threads
we don't want coroutines

What we really want is faster execution. Parallel (or at least concurrent) execution is just a means to an end here. In turn, processes, threads and coroutines are just a means to parallel execution. So, we'd better build some abstraction which is actually closer to the developer's problem: faster execution.

Let's start by calling units of execution which can run independently "background tasks" or simply "tasks".

Task Hierarchy

In order to address observation 2, something to structure a collection of tasks is needed.

A software developer is just a normal guy who needs simple solutions for his job. Something that has emerged several times throughout human history is the hierarchy. As far as humans are concerned, they understand hierarchies pretty well. Most companies are structured this way, your system of files and folders is probably a hierarchical one, as is your governmental system or the process tree of your computer, tablet or smartphone.

To put it simply, a hierarchy is a layered system (so you only care about the layer above and below you) in which each layer is represented by a single representative (so you greatly simplify the communication to the layers above and below). These two properties have made hierarchies quite successful so far.

This said, we go with a hierarchical system for now when it comes to concurrency. That means, there is one task managing a bunch of independent and similar tasks. Managing basically subsumes task creation, result collection and result processing.

Functions as Tasks

xfork has been designed to address observation 3 and to take the warning from the beginning seriously. A main goal was to make hopping back and forth from sequential to concurrent style of programming as easy as possible.

The most basic concept developers usually understand is the function. Thus, functions act as a kind of bridge between the two worlds. A function can be executed either by waiting for its result (sequential style) or by submitting it to a background worker and requesting its result at a later point (concurrent style).
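The two styles can be shown side by side with the standard library. This sketch uses concurrent.futures rather than xfork's own interface, and word_count is just a made-up example function.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    """An ordinary function; it knows nothing about concurrency."""
    return len(text.split())

texts = ['a b c', 'd e', 'f']

# Sequential style: call the function and wait for each result.
sequential = [word_count(t) for t in texts]

# Concurrent style: submit to a background worker, ask for results later.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(word_count, t) for t in texts]
    in_background = [f.result() for f in futures]
```

The function itself is untouched in both cases; only the way it is invoked changes.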

This becomes especially clear when working with a large legacy code base. You might finally consider using concurrent approaches to speed things up, but a complete rewrite is out of the question. One does not simply throw away a large collection of already working functions.

Task Management

This job should be done for you by xfork. It should take care of the question whether to create a thread or a process for a background task. Moreover, the number of processes and threads needs to be managed without developer interaction by creating and closing them down for you on the fly and according to the machine's capabilities.

When should a task be a process, when should it be a thread and when should it be implemented as a coroutine running in an event loop? The last post gives a pretty simple explanation for this. Processes utilize the multicore architectures of today's computers, so they are suitable for CPU-bound tasks. Coroutines are designed to wait for I/O efficiently, so I/O-bound tasks are their use case. Threads sit somewhere in the middle, especially given the GIL of CPython. So right now, they fit the I/O-bound side of tasks.

This said, the main exercise for a developer using xfork is actually thinking of whether their function is I/O-bound or CPU-bound and whether it is thread-safe or not.
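How the two answers could translate into a worker type is sketched below. The function pick_executor is hypothetical and not part of xfork, and sending non-thread-safe functions to processes is an assumption made here for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def pick_executor(cpu_bound, thread_safe):
    """Map the two questions a developer has to answer onto a worker type."""
    if cpu_bound or not thread_safe:
        # Separate processes can use several cores and share no state.
        return ProcessPoolExecutor
    # I/O-bound and thread-safe: threads are cheaper to start.
    return ThreadPoolExecutor
```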

Conclusion

All observations being addressed, I think it's time to take a break. The next post will investigate the current implementation of xfork.

Best,

Sven

Concurrency in Python

2016-02-23T19:57:00.000+01:00

More speed by having two rails. Not always true.

Last year, PEP 0492 got accepted, which introduced coroutines and async/await to Python. Around that time, I subscribed to some Python mailing lists and have participated in discussions ever since. I wondered how ordinary Python developers can write code that can be executed in parallel or at least concurrently. Specifically regarding asyncio (coroutines) and concurrency in general, we compiled a survey, which I want to record here.

Improving Performance by Running Independent Tasks Concurrently - A Survey

	Processes	Threads	Coroutines
purpose	cpu-bound tasks	cpu- & i/o-bound tasks	i/o-bound tasks
customizable	no	no	yes
controllable	no	no	yes

managed by	os scheduler	os scheduler + interpreter	event loop
parallelism	yes	no	no
switching	at any time	after any bytecode	at user-defined points
shared state	no	yes	yes

startup time	biggest/medium*	medium	smallest
CPU overhead**	biggest	medium	smallest
memory overhead	biggest	medium	smallest

pool class	multiprocessing.Pool	multiprocessing.dummy.Pool	asyncio.BaseEventLoop
solo class	multiprocessing.Process	threading.Thread	asyncio.coroutine

* biggest - on Windows and if using 'spawn' ('fork'+'exec'); medium - if using 'fork' alone
** due to context switching
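As a small illustration of the "pool class" row, the thread-backed pool from multiprocessing.dummy can be used like this. The function fetch_length is a made-up stand-in for an I/O-bound task.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool

def fetch_length(name):
    """Hypothetical stand-in for an I/O-bound task such as a download."""
    return len(name)

with ThreadPool(4) as pool:
    lengths = pool.map(fetch_length, ['a', 'bb', 'ccc'])
```

Since multiprocessing.dummy mirrors the multiprocessing interface, importing Pool from multiprocessing instead switches the same code to processes; in that case the pool should be created under an `if __name__ == '__main__':` guard.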

I started this little survey out of curiosity and professional needs to speed up our Python production systems. What I can tell from the experience gained in that field is that you basically need to ask yourself two questions:

Does this code run faster with concurrency?
Is the code cpu-bound or i/o-bound?

Writing and maintaining concurrent code always makes many brains hurt. So, if you don't have any significant improvement, leave your code alone. If you still want to, you then either need to settle for processes (having cpu-bound tasks) or threads/coroutines (having i/o-bound tasks). As usual, your mileage may vary (also considering the table).

In the course of the thread, Steve Dower responded very ingeniously (here and here) which was my main driver to extend the survey. It explains certain values in the table quite easily, so I am going to quote him here for your convenience.

Steve Dower's "Let's Bake a Cake"

Let's say you are making a cake. There are three high-level steps involved:

Gather all the ingredients

Mix all the ingredients

Bake it in the oven

You are personally required to do steps 1 and 2 ("hands-on"). They take all of your time and attention and you can't do anything else simultaneously.

For step 3, you hand off the work to the oven. While the oven is baking, you are basically free to do other things.

In this analogy, "you" are the main thread and the oven is another thread. (Thread and process are interchangeable here in the general sense - the GIL in Python is a practicality that makes processes preferable, but that doesn't affect the concepts.) Steps 1 and 2 are CPU bound (as far as "you" the main thread are concerned), and step 3 is IO bound from "your" (the main thread's) point-of-view.

Step 3 requires you to wait until it is complete:

You can do a synchronous wait, by sitting and staring at the oven until it's done.

You can poll, by occasionally interrupting yourself to walk over to the oven and see if it's done yet.

You can use a signal/interrupt, where the oven interrupts you when it is ready, regardless of whether you are ready to handle the interruption (but note: you know that the oven is done without having to walk over and check it).

Or you can use asyncio, where you occasionally interrupt yourself and, when you do, the oven will make some noise if it has finished. (and if you never interrupt yourself, the oven never makes a sound)

This last option is most efficient for you, because you aren't interrupted at awkward times (i.e. greatly reduced need for locking on shared state) but you also don't have to walk all the way over to the oven to check whether it is done. You pause, listen, and get straight back to work if the oven is still going. That's the core feature of asyncio - not the networking or subprocess support - the ability to be notified efficiently that a task is complete without being interrupted by that notification.

Now let's expand this to making 3 cakes in parallel to see how "parallelism" works. Since there's so much going on, we'll create a TODO list:

Make cake #1

Make cake #2

Make cake #3

(This means we've added three tasks to the current event loop. It's likely these are three external requests from clients, such as HTTP requests. It is possible, though not common in my experience, for production software to explicitly start with multiple tasks like this. More common is to have one task and a UI event loop that injects UI events as necessary.)

Task 1 is the obvious place to start, so we take that off the TODO list and start working on it. The steps to make cake #1 are:

Gather ingredients for cake #1

Mix ingredients for cake #1

Bake cake #1

Gathering ingredients is a synchronous operation (`def gather_ingredients()`) so we do that until we've gathered everything.

Mixing ingredients is a long, interruptible operation (`async def mix_ingredients()`, with occasional explicit `await yield()` or whatever syntax was chosen for this), so we start mixing and then pause. When we pause, we put our current task on the TODO list:

Make cake #2

Make cake #3

Continue mixing cake #1

We see that our next task is to make cake #2, so we repeat the steps above and eventually pause while we're mixing. Now the TODO list looks like:

Make cake #3

Continue mixing cake #1

Continue mixing cake #2

And this continues. (Note that selecting which task to continue with is a detail of the event loop you're using. Check the spec to see whether some tasks have a higher priority or what order tasks are continued in. And bear in mind that so far, we've only used explicit yields - "I'm ready to do something else now if something needs doing".)

Eventually we will finish mixing one of the cakes, let's say it's cake #1. We will put it in the oven (`await put_in_oven()`) and then check the TODO list for what we should do next. There's nothing for us to do with cake #1, so our TODO list looks like:

Continue mixing cake #2

Continue mixing cake #3

Eventually, the oven will finish baking cake #1 and will add its own item to the TODO list:

Continue mixing cake #2

Continue mixing cake #3

Cake #1 is ready

When we take a break from mixing cake #2, we will continue mixing cake #3 (again, depending on your event loop's policy with regards to prioritisation). When we take a break from mixing cake #3, "Cake #1 is ready" will be the top of our TODO list and so we will continue with the statement following where we awaited it (it probably looked like `await put_in_oven(); remove_from_oven()` or maybe `baked_cake = await put_in_oven(mixed_ingredients)`).

Eventually our TODO list will be empty, and so we will sit there waiting for something to appear on it (such as another incoming request, or an oven adding a "remove cake" item).

Processes and threads only really enter into asyncio as a "thing that can post messages back to my TODO list/event loop", while asyncio provides an efficient mechanism for interleaving (not parallelising) multiple tasks throughout an entire application (or a very significant self-contained piece of it). The parallelism only comes when all the main thread has to do for a particular task is wait, because another thread/process/service/device/etc. is doing the actual work.

"But I still have a question: why can't we use threads for the cakes? (1 cake = 1 thread)."

Because that is the wrong equality - it's really 1 baker = 1 thread.

Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time), and you will waste some of your own time coordinating them (interthread communication).

You also only have one set of baking equipment (the GIL), buying another bakery is expensive (another process) and fitting more equipment into the current one is very complicated (subinterpreters).

So you either pay a high price for 2 bakers = 2 cakes, or you accept 2 bakers = 1.5 cakes (in the same amount of time). It turns out that often 1 baker can do 1.5 cakes in the same time as well, and it's much easier to reason about and implement correctly.
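The cake analogy can be turned into a small asyncio program. This is my own sketch, not part of Steve's explanation: the function names follow his wording, asyncio.sleep(0) plays the role of the explicit pause while mixing, and a short sleep stands in for the oven.

```python
import asyncio

def gather_ingredients(n):
    # Synchronous: nothing else happens until this returns.
    return 'ingredients for cake #{}'.format(n)

async def mix_ingredients(ingredients):
    for _ in range(3):
        await asyncio.sleep(0)     # explicit pause: let other tasks run
    return 'mixed ' + ingredients

async def put_in_oven(mixed):
    await asyncio.sleep(0.01)      # the oven works, the baker is free
    return 'baked ' + mixed

async def make_cake(n):
    ingredients = gather_ingredients(n)
    mixed = await mix_ingredients(ingredients)
    return await put_in_oven(mixed)

async def bake_all():
    # The TODO list: three tasks interleaved on one event loop.
    return await asyncio.gather(*(make_cake(n) for n in (1, 2, 3)))

cakes = asyncio.run(bake_all())
```

All three cakes are handled by a single baker (one thread); the only real overlap happens while the ovens are running.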

I further want to thank everybody who participated in discussing the matter and thus improved the survey enormously with a lot of patient explanations and insightful details.

There will be another post covering a module I wrote back then and still actively maintain. It was and still is a way of digesting the concurrency matter in an attempt to improve usability and reduce boilerplate in Python.

Best,
Sven

My Python IDE Journey

2016-02-19T17:02:00.000+01:00

Pick one.

This post is not intended as advertising but to illustrate my journey to my currently used Python IDE. I tried several in recent years due to educational and professional needs as well as to satisfy my curiosity.

First Stop

Everything starts with gedit, nano and vim, right? Not quite full IDEs but it's a start. You can at least write code and have some syntax highlighting available. To this day, a colleague of mine uses vim with tons of plugins featuring "go to definition", "find usages", "code completion", "project nav tree", etc. So, it's quite possible to work with simple editors and enhance them indefinitely.

As you can imagine, I was looking for something else which goes beyond the venerable terminal. So, I started looking for an alternative with the following properties (in order of priority):

out-of-the-box experience
mouse usage where appropriate
fewer keystrokes and mouse clicks
faster search without manual indexing
configurable executions (run unittests, run scripts, etc.)
debugging with the most important things at a glance
introspection

Second Stop

I made my second stop at Spyder. It is an open-source project providing almost all the basic needs described above. In general it feels like Eclipse but is much simpler, cleaner and more thought-out to my taste.

Spyder stands for Scientific PYthon Development EnviRonment and as expected is best suited for scientific tasks such as researching. So, it handles small and informal scripts quite well for performing data transformations, evaluation and plotting diagrams.

Spyder works reliably given you've installed all necessary third-party dependencies such as pyflakes and rope. An up-to-date list can be obtained from here. Furthermore, if you are inclined to use numpy and scipy to live up to Spyder's name, you also need compiling tools available (at least my machine, running Ubuntu 14.04, does). So, the out-of-the-box experience, though somewhat impaired, is still way above most multi-purpose editors.

Further nice-to-have features that I did not require are available: an integrated profiler, static code analysis, custom color/font schemes and an object inspector (aka "show me the doc string").

After a year or so working with Spyder, the journey resumed to satisfy also the following emerging requirements:

integration of version control (such as local history and svn+git)
better usability and more convenience
integrated bash

Third Stop

Another colleague of mine showed me that JetBrains (the maker of ReSharper) had open-sourced the community edition of PyCharm. So, since this was another requirement for me, I gave it a try and fell in love with it instantly. The people at JetBrains just know their craft. PyCharm is a beautiful IDE with literally tons of features. It can handle professional workloads with massive amounts of files, yet is usable and convenient.

It definitely brings you a solid out-of-the-box experience. Thus, if you don't care or don't want to bother with installing any third-party library just to get a decent IDE, PyCharm is your choice.

In fact, the first-mentioned colleague got so inspired by PyCharm that he set out to give every possible vim plugin a try to replicate PyCharm's productivity features. He has not given up yet, but his efforts brought him a massive increase in productivity even though he works in a regular terminal session.

This one also solved the missing SCM and local code history integration. Furthermore, its usability is beyond good and evil and anything I've seen so far. Using it makes me feel free and I have to admit not using it makes me feel a lot slower. But don't forget about what I wrote about IDEs in general last time. Last but not least, if you ever need a terminal, it's right there at your fingertips running in the correct directory.

Fourth Stop

Quite recently, in an attempt at lifelong learning, I made a short trip to an interesting piece of technology: IPython Notebook. It's not a traditional Python IDE but it can improve your productivity given the right workload.

Just think of it as an interactive Python session (where you can also execute blobs of Python code) which you are able to resume later. The notebook stores the Python source and its corresponding output (prints, tables, diagrams, ...) once executed. Thus you can examine the results later or re-execute the code if there have been some changes to the data.

This way, it makes it a perfect tool for scientific usage and because of its simplicity it's even more suitable than Spyder in my opinion. If you want to give it a try and see if it would complement your current workflows, have a look at this pandas cookbook for some wild number crunching experience.

See you next time around,
Sven

PS: After writing this post, I felt inclined to install Spyder again to see how far the project has come. It turns out it is actively maintained and the folks behind Spyder are making great progress. Still, the installation procedure feels brittle and quite manual, aka error-prone. So, I wish them good luck and hope we will see some serious competition in the Python IDE area.

Let's go down the rabbit hole!

2016-02-18T18:23:00.000+01:00

Things can be topsy-turvy when considered upside down.

As mentioned in the previous post, there is an interesting and at the same time weird piece of code duplication in RemovalHeap and XHeap that is necessary to make them work properly. This post will cover this oddity in depth.

Imagine you want to count the number of items being set in a list. So, instead of providing a native list object, you write your own class like this:

class MyList(list):
    count = 0
    def __setitem__(self, key, value):
        self.count += 1
        super(MyList, self).__setitem__(key, value)

ml = MyList([0])
for i in range(10):
    ml[0] = -i

print(ml.count) # print 10

Sounds good, right? Now, RemovalHeap and XHeap do exactly this when they keep track of the index of an item. Instead of counting, though, they store the new index in a dictionary. This is necessary for fast item removal, i.e. runtime of O(log n).
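
The index bookkeeping described above can be sketched as follows. This is a minimal illustration and not the actual xheap source: the class name IndexTrackingList and the attribute index_of are made up for this example, and it assumes hashable, unique items.

```python
class IndexTrackingList(list):
    """List that remembers the latest index of every item assigned via lst[i] = x."""

    def __init__(self, iterable=()):
        super().__init__(iterable)
        # Hypothetical bookkeeping: item -> current index (items hashable and unique).
        self.index_of = {item: i for i, item in enumerate(self)}

    def __setitem__(self, key, value):
        self.index_of[value] = key      # record where the item lives now
        super().__setitem__(key, value)


tracked = IndexTrackingList(['a', 'b', 'c'])
tracked[0], tracked[2] = tracked[2], tracked[0]   # swap first and last item
print(tracked)            # ['c', 'b', 'a']
print(tracked.index_of)   # {'a': 2, 'b': 1, 'c': 0}
```

With such a dictionary, finding the position of an item is a single lookup instead of a linear scan through the list.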

So far so good and this could be the end of the story if that were the only necessity for fast item removal. However, it turned out it isn't.

When we now apply heappop to our counting class, the counting stops working:

from heapq import heappop

ml = MyList(range(10))
heappop(ml)
print(ml.count) # print 0

No changes at all? That cannot be right.

It seems we don't jump back into Python code. You might say this is due to the underlying C implementation of heappop, but have a look. Let's copy the Python source of heappop and call it my_heappop:

def my_heappop(heap):
    from heapq import _siftup
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt

As you can see, it just delegates work to _siftup and that will do the trick:

ml = MyList(range(10))
my_heappop(ml)
print(ml.count) # print 6

Now, there have been 6 changes to our heap when popping off the first item.

What's wrong here? As one can see, Python does in fact hop back and forth between intertwined Python and C code. Why the public API of heapq doesn't work as expected whereas the private one does is unclear to me. This peculiar behavior of heappop also holds for heappush, heapreplace and heappushpop. As usual, I am open to suggestions and explanations.
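
A quick way to probe this is to check which of the functions involved are C builtins and which are plain Python functions. The snippet below is only a diagnostic sketch. It assumes CPython, where the public heapq functions are usually replaced by a C accelerator while the private _siftup helper stays pure Python. If that holds, it would be one possible explanation: the C functions may write to the list storage directly and never call an overridden __setitem__.

```python
import heapq
import inspect

# True when heappop comes from the C accelerator module.
print(inspect.isbuiltin(heapq.heappop))

# True when _siftup is the pure-Python helper that assigns via heap[pos] = item.
print(inspect.isfunction(heapq._siftup))
```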

In any case, that is the reason why RemovalHeap and XHeap duplicate some parts of heapq and now you know.

Best,
Sven

NOTE: I could reproduce this behavior for Python 2.7.6 and Python 3.4.3.

The xheap Benchmark

2016-02-16T21:16:00.000+01:00

These are the inlets of a steam engine. That means, it's time to perform some serious measurements!

We are going to compare xheap and heapq. The benchmark suite can be found right by the source.

The Competitors

heapq - collection of heap functions from the Python stdlib, written in C
xheap - object-oriented wrappers for heapq

The Benchmark of Runtime

In order to make things comparable, I split the benchmark up into two cases. They differ in that customizing the ordering of a heap always requires more work; the second case takes that into account.

Both cases run heapify, pop and push 10,000 times and calculate the minimum of these runtimes (in milliseconds), which you see presented below. Depending on which item is popped or pushed, runtimes can vary. Because of that, we start with a heap of size X and then pop or push X/32 items.

Each case further has a baseline, which is heapq obviously. I used timeit as suggested here; the benchmark ran on Python 3.4.
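
The measuring approach described above can be sketched like this. It is not the actual benchmark suite: the function name time_pops and the small repeat count are assumptions chosen to keep the example quick, and the timed part also includes copying the list.

```python
import heapq
import random
import timeit


def time_pops(size, repeat=5):
    """Return the best (minimum) runtime in milliseconds for popping size // 32 items."""
    items = [random.random() for _ in range(size)]
    heapq.heapify(items)

    def run():
        heap = list(items)              # a copy of a heap is still a valid heap
        for _ in range(size // 32):
            heapq.heappop(heap)

    # timeit.repeat returns one total per repetition; the minimum is the least noisy.
    return min(timeit.repeat(run, number=1, repeat=repeat)) * 1000


for size in (1000, 10000):
    print(size, round(time_pops(size), 2))
```

Taking the minimum rather than the average filters out runs that were slowed down by other activity on the machine.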

1) heapq vs Heap vs RemovalHeap

Runtime in ms (ratio to the heapq baseline in parentheses):

operation variant          1,000 items     10,000 items    100,000 items  1,000,000 items

init      heapq          0.03 ( 1.00x)    0.41 ( 1.00x)    4.50 ( 1.00x)   64.83 ( 1.00x)
          Heap           0.03 ( 1.02x)    0.42 ( 1.02x)    4.47 ( 0.99x)   64.70 ( 1.00x)
          RemovalHeap    0.11 ( 3.35x)    1.32 ( 3.23x)   17.88 ( 3.98x)  222.94 ( 3.44x)

pop       heapq          0.01 ( 1.00x)    0.07 ( 1.00x)    0.86 ( 1.00x)   10.05 ( 1.00x)
          Heap           0.01 ( 1.45x)    0.09 ( 1.33x)    1.08 ( 1.26x)   12.73 ( 1.27x)
          RemovalHeap    0.27 (51.73x)    3.68 (52.28x)   44.35 (51.80x)  517.98 (51.54x)

push      heapq          0.00 ( 1.00x)    0.03 ( 1.00x)    0.34 ( 1.00x)    3.65 ( 1.00x)
          Heap           0.01 ( 1.81x)    0.06 ( 1.83x)    0.61 ( 1.82x)    6.46 ( 1.77x)
          RemovalHeap    0.08 (23.66x)    0.66 (19.65x)    6.48 (19.20x)   67.46 (18.46x)

As expected, the bare C implementation outperforms any wrapper lib written in Python. However, the performance penalty is quite small compared to what we gain through cleaner code and better maintainability when using Heap.

The current implementation of RemovalHeap definitely runs slower as it needs to keep track of changing indexes. Perhaps that is a good reason to rethink its implementation and switch to a mark-and-sweep approach. So, this benchmark is a good starting point for the future. Additionally, this shows that if you don't really need removal, you had better stick to Heap.
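
To make the mark-and-sweep idea more concrete, here is a minimal sketch. It is not part of xheap: the class name LazyRemovalHeap is hypothetical, and the sketch assumes hashable, unique items that are not pushed again after being removed.

```python
import heapq


class LazyRemovalHeap:
    """Hypothetical mark-and-sweep heap: remove() only marks, pop() sweeps."""

    def __init__(self, items=()):
        self._heap = list(items)
        heapq.heapify(self._heap)
        self._removed = set()           # items marked for deletion

    def push(self, item):
        heapq.heappush(self._heap, item)

    def remove(self, item):
        # O(1): only mark the item; it stays in the list until it reaches the top.
        # No check is made that the item is actually in the heap.
        self._removed.add(item)

    def pop(self):
        while True:
            item = heapq.heappop(self._heap)   # raises IndexError when empty
            if item in self._removed:
                self._removed.remove(item)     # sweep the marked item
                continue
            return item


heap = LazyRemovalHeap([5, 3, 8, 1])
heap.remove(3)
print(heap.pop(), heap.pop(), heap.pop())   # 1 5 8
```

The trade-off is that marked items keep occupying memory until they are swept, so push and pop work on a larger heap than strictly necessary.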

2) heapq + tuples vs OrderHeap vs XHeap

Runtime in ms (ratio to the heapq baseline in parentheses):

operation variant          1,000 items     10,000 items    100,000 items  1,000,000 items

init      heapq          0.14 ( 1.00x)    1.60 ( 1.00x)   24.16 ( 1.00x)  272.64 ( 1.00x)
          OrderHeap      0.17 ( 1.26x)    1.90 ( 1.19x)   27.20 ( 1.13x)  302.80 ( 1.11x)
          XHeap          0.29 ( 2.16x)    3.18 ( 1.99x)   50.32 ( 2.08x)  553.61 ( 2.03x)

pop       heapq          0.01 ( 1.00x)    0.15 ( 1.00x)    1.82 ( 1.00x)   21.85 ( 1.00x)
          OrderHeap      0.02 ( 1.75x)    0.23 ( 1.58x)    2.72 ( 1.49x)   30.59 ( 1.40x)
          XHeap          0.28 (25.25x)    3.84 (25.98x)   46.32 (25.45x)  540.32 (24.73x)

push      heapq          0.00 ( 1.00x)    0.04 ( 1.00x)    0.54 ( 1.00x)    5.69 ( 1.00x)
          OrderHeap      0.01 ( 4.00x)    0.15 ( 3.68x)    1.65 ( 3.08x)   16.50 ( 2.90x)
          XHeap          0.04 ( 9.55x)    0.35 ( 8.79x)    3.85 ( 7.16x)   39.49 ( 6.94x)

Generally speaking, the relative overhead of OrderHeap and XHeap for providing custom orders diminishes when we hit bigger and bigger sizes. This makes sense, as the overhead basically consists of creating and unpacking tuple values on the fly and of super calls, both of which happen only once per operation; thus it is constant.

The Benchmark of Removal

This particular feature cannot be compared using heapq, Heap and OrderHeap since they simply don't feature removal. For future reference and for providing guidance, I still present the benchmark results comparing RemovalHeap and XHeap to each other. As an attempt to equalize item comparison, RemovalHeap will be fed with tuples this time. RemovalHeap is the baseline.

operation variant          1,000 items     10,000 items    100,000 items  1,000,000 items

remove    RemovalHeap    0.12 ( 1.00x)    1.26 ( 1.00x)   13.28 ( 1.00x)  149.50 ( 1.00x)
          XHeap          0.12 ( 0.94x)    1.19 ( 0.95x)   12.16 ( 0.92x)  124.02 ( 0.83x)

The Benchmark of Comparisons

Regarding heapify, pop and push, xheap does not perform even a single item comparison more than heapq. As you can infer from the source, Heap and OrderHeap leave it completely to heapq for efficiency reasons.

RemovalHeap and XHeap, on the other hand, have a peculiarity that requires them to perform at least some of the item comparisons on their own. However, the number of comparisons stays the same as the source is a simple copy from heapq. Another post will cover why this is necessary.
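
Counting item comparisons can be done with a small wrapper class whose __lt__ increments a counter. The sketch below is an assumption about how one might measure this, not the benchmark's actual code; the class name Counted is made up for the example. It relies on heapq comparing items with the less-than operator only.

```python
import heapq


class Counted:
    """Wraps a value and counts every '<' comparison made on any instance."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


heap = [Counted(v) for v in range(1000, 0, -1)]
heapq.heapify(heap)
after_heapify = Counted.comparisons
heapq.heappop(heap)
after_pop = Counted.comparisons

# Comparisons used by heapify, and by a single pop afterwards.
print(after_heapify, after_pop - after_heapify)
```

Feeding the same items to another heap implementation and comparing the counters shows whether it performs extra comparisons.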

Conclusion

As suspected in the previous post, both convenience and features come at a price. So, in order to preserve your initial goals when using heaps (= speed), your best choice is the feature-poorest variant. Most of the overhead is constant per operation since it consists of wrapper, super and descriptor calls. Given the current optimization efforts of CPython, this kind of overhead can be reduced even further without changing xheap itself.

I would like to express my gratitude to the Python community for providing heapq. Without this amazing library, the development of xheap would not have been possible.

As usual, I am open to suggestions on how to improve the benchmark. To see how xheap performs on your machine, you can simply execute test_xheap_time.py from the xheap repository.

A beautiful steam engine running at full speed. You can read more about it here.

Best,
Sven

Fast Object-Oriented Heap Implementation

2016-01-30T21:17:00.004+01:00

There comes light to the darkness of your heaps.

This is the third post of a series of heap-related ones. See here and here for the back story.

In the last post, we found the heapq module lacking important features. Average Joe Dev doesn't want to clutter up his source code and re-implement the same features all over the place to rectify the shortcomings of heapq. Understandably, Python core devs don't want to compromise on the performance of heapq either; being fast is the mission of a heap.

One issue with all proposals and implementations I've seen so far is either their reduced feature set or their lack of performance. So, I set out to fix that once again and (I hope) for all.

What's different this time? Not much, I suppose, except that I made the following observation. Past implementations provided a single feature-enhanced and object-oriented implementation. So, you get object orientation but with an unnecessary slowdown. One solution would be to utilize heapq alongside such a feature-rich heap class. Then again, using different interfaces for almost the same thing usually produces headaches in production. So, this approach isn't quite optimal.

If you need a feature, you will implement the corresponding logic (and slowness) anyway. From what I can tell, cluttering your source does not make you and your program faster in the long run. Thus, hiding complexity in a feature-specific class makes sense. Taking into account the performance penalty produced by more features, it makes further sense to split things up into different implementations.

So, I came to the conclusion that this problem cannot be solved by a single heap implementation but by a feature-complete heap suite: different classes with the same base interface for different use-cases.

xheap is such a heap suite. I further provided you with a repo at GitHub to be used as an issue tracker. The remainder of this post will examine each class provided by xheap; the next post will investigate their performance.

Heap

The heap interface is pretty standard these days:

peek - get first item
push - insert an item
pop - remove first item
pushpop - push then pop; but faster
poppush/replace - pop then push; but faster

You somehow also expect it to work like an array/list (cf. heap invariant). All variants of Heap will support that interface. Some demo code:

from xheap import Heap

heap = Heap([5, 4, 3, 1])  # make heap from list
heap.push(2)
heap.pop()              # returns 1
heap.peek()             # returns 2
heap[0]                 # returns 2
heap.pushpop(6)         # returns 2
heap.replace(7)         # returns 3

Benefit? It's object oriented! And as fast as heapq since it's a thin wrapper.

You can replace it with a more feature-rich (and potentially slower) version if needed. And by the time that happens, the only thing you need to change is using another heap class that provides the wanted features.
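
To illustrate what "thin wrapper" means here, the following sketch delegates every operation to heapq. It is not the xheap source: the class name SketchHeap and its internal layout are assumptions, and it uses composition instead of subclassing list to keep the example simple.

```python
import heapq


class SketchHeap:
    """Hypothetical thin wrapper: each method is a single call into heapq."""

    def __init__(self, iterable=()):
        self._items = list(iterable)
        heapq.heapify(self._items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def peek(self):
        return self._items[0]

    def push(self, item):
        heapq.heappush(self._items, item)

    def pop(self):
        return heapq.heappop(self._items)

    def pushpop(self, item):
        return heapq.heappushpop(self._items, item)

    def replace(self, item):
        return heapq.heapreplace(self._items, item)


heap = SketchHeap([5, 4, 3, 1])
heap.push(2)
print(heap.pop(), heap.peek(), heap.pushpop(6), heap.replace(7))   # 1 2 2 3
```

Since each method adds just one extra Python call on top of the C function, the overhead per operation stays small and constant.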

OrderHeap

You guessed it: you can specify how items are compared. The key-parameter (analogous to builtin functions max or sorted) is specified during heap initialization:

from xheap import OrderHeap

heap = OrderHeap(key=lambda x: -ord(x))  # define the order
heap.push('a')
heap.push('z')
heap.push('t')
heap.pop()       # returns z

Benefit? Auto-wrapping new items into tuples by which the heap is sorted internally. You would have to re-implement it anyway in all places (without making a mistake btw.), so the heap class is the location where such logic naturally belongs.
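
For comparison, this is roughly what the manual tuple wrapping looks like with plain heapq. It is a sketch of the do-it-yourself variant, not of OrderHeap's internals.

```python
import heapq

key = lambda x: -ord(x)          # same ordering as in the OrderHeap demo

heap = []
for item in ('a', 'z', 't'):
    heapq.heappush(heap, (key(item), item))   # wrap: (sort key, item)

_, first = heapq.heappop(heap)                # unwrap on the way out
print(first)                                  # z
```

Every push and pop in the code base needs this wrapping and unwrapping, which is exactly the kind of repetition a heap class can hide.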

Why not set the key each time you push an item? Right now, I can't imagine such a use-case. From what I can tell, the key is always derived from the item itself. So, a tuple might suffice for now; if you feel that's not quite true for your project, let me know so I can tweak OrderHeap to your needs.

Btw. pull requests are welcome as well. ;-)

RemovalHeap

Pretty obvious as well: you can remove an item from anywhere in the heap without manually keeping track of indexes.

Benefit? Keeping track of indexes is not implemented easily, especially outside of a class. It further requires special tweaking of Heap.pop to enable removing an item from the middle of the heap.

There is an alternative approach using periodic sweeps but hey, the concrete implementation is encapsulated within the heap class. So, if I ever feel like changing it for a better approach, nobody will notice.

It's demo time:

from xheap import RemovalHeap

heap = RemovalHeap(['z', 'u', 'd', 'a'])
heap.remove('d')
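
Removing an item from the middle of a heap once its index is known can be sketched with plain heapq as follows. This is not the RemovalHeap implementation: the helper name remove_at is made up, and the sketch relies on the private CPython helpers heapq._siftup and heapq._siftdown, which may change between versions.

```python
import heapq


def remove_at(heap, index):
    """Remove and return heap[index] while keeping the heap invariant."""
    last = heap.pop()
    if index == len(heap):              # the removed item was the last one
        return last
    removed = heap[index]
    heap[index] = last                  # fill the hole with the former last item
    heapq._siftup(heap, index)          # let the filler sink into its subtree ...
    heapq._siftdown(heap, 0, index)     # ... or rise towards the root if needed
    return removed


heap = ['z', 'u', 'd', 'a']
heapq.heapify(heap)
remove_at(heap, heap.index('d'))        # list.index is O(n); an index dict avoids that
print(sorted(heap))                     # ['a', 'u', 'z']
```

The linear scan of list.index is what an index dictionary replaces, which brings the whole removal down to O(log n).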

XHeap

For the problem described here, I need both properties. Thus, XHeap is basically a conjunction of OrderHeap and RemovalHeap: you can remove items AND you can define an arbitrary ordering.

What's left is investigating how fast or slow each class is compared to Python's original heapq module. To be continued …

Best,
Sven