masnun.rocks() http://masnun.rocks/tags/python/index.xml Recent content on masnun.rocks() Hugo -- gohugo.io en-us Interfaces in Python: Protocols and ABCs http://masnun.rocks/2017/04/15/interfaces-in-python-protocols-and-abcs/ Sat, 15 Apr 2017 15:55:18 +0600 http://masnun.rocks/2017/04/15/interfaces-in-python-protocols-and-abcs/ <p>The idea of an interface is really simple - it is the description of how an object behaves. An interface tells us what an object can do to play its role in a system. In object-oriented programming, an interface is a set of publicly accessible methods on an object which can be used by other parts of the program to interact with that object. Interfaces set clear boundaries and help us organize our code better. In some languages like Java, interfaces are part of the language syntax and strictly enforced. However, in Python, things are a little different. In this post, we will explore how interfaces can be implemented in Python.</p> <h2 id="informal-interfaces-protocols-duck-typing">Informal Interfaces: Protocols / Duck Typing</h2> <p>There&rsquo;s no <code>interface</code> keyword in Python. The Java / C# way of using interfaces is not available here. In the dynamic language world, things are more implicit. We&rsquo;re more focused on how an object behaves, rather than its type/class.</p> <blockquote> <p>If it talks and walks like a duck, then it is a duck</p> </blockquote> <p>So if we have an object that can fly and quack like a duck, we consider it a duck. This is called &ldquo;Duck Typing&rdquo;. At runtime, instead of checking the type of an object, we try to invoke a method we expect the object to have. If it behaves the way we expect, we&rsquo;re fine and move along. But if it doesn&rsquo;t, things might blow up. 
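</p> <p>As a quick illustration (the <code>Duck</code> and <code>Robot</code> classes below are made up for this sketch, not part of any library), we simply call the method we expect and hope for the best:</p> <pre><code class="language-python">class Duck:
    def quack(self):
        return 'Quack!'


class Robot:
    pass


def make_it_quack(thing):
    # No type check at all - we just invoke the method we expect
    return thing.quack()


print(make_it_quack(Duck()))  # works fine
# make_it_quack(Robot()) would raise:
# AttributeError: 'Robot' object has no attribute 'quack'
</code></pre> <p>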
To be safe, we often handle the exceptions in a <code>try..except</code> block or use <code>hasattr</code> to check if an object has the specific method.</p> <p>In the Python world, we often hear &ldquo;file-like object&rdquo; or &ldquo;an iterable&rdquo; - if an object has a <code>read</code> method, it can be treated as a file-like object; if it has an <code>__iter__</code> magic method, it is an iterable. So any object, regardless of its class/type, can conform to a certain interface just by implementing the expected behavior (methods). These informal interfaces are termed <strong>protocols</strong>. Since they are informal, they cannot be formally enforced. They are mostly illustrated in the documentation or defined by convention. All the cool magic methods you have heard about - <code>__len__</code>, <code>__contains__</code>, <code>__iter__</code> - they all help an object conform to some sort of protocol.</p> <pre><code class="language-python">class Team:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

    def __contains__(self, member):
        return member in self.__members


justice_league_fav = Team([&quot;batman&quot;, &quot;wonder woman&quot;, &quot;flash&quot;])

# Sized protocol
print(len(justice_league_fav))

# Container protocol
print(&quot;batman&quot; in justice_league_fav)
print(&quot;superman&quot; in justice_league_fav)
print(&quot;cyborg&quot; not in justice_league_fav)
</code></pre> <p>In our above example, by implementing the <code>__len__</code> and <code>__contains__</code> methods, we can now directly use the <code>len</code> function on a <code>Team</code> instance and check for membership using the <code>in</code> operator. 
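</p> <p>The iterable protocol is just one more method away. Here is a sketch of the same <code>Team</code> class with <code>__iter__</code> added:</p> <pre><code class="language-python">class Team:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

    def __contains__(self, member):
        return member in self.__members

    def __iter__(self):
        # Iterable protocol: delegate to the underlying list's iterator
        return iter(self.__members)
</code></pre> <p>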
If we add the <code>__iter__</code> method to implement the iterable protocol, we would even be able to do something like:</p> <pre><code class="language-python">for member in justice_league_fav:
    print(member)
</code></pre> <p>Without implementing the <code>__iter__</code> method, if we try to iterate over the team, we will get an error like:</p> <pre><code>TypeError: 'Team' object is not iterable
</code></pre> <p>So we can see that protocols are like informal interfaces. We can implement a protocol by implementing the methods expected by it.</p> <h2 id="formal-interfaces-abcs">Formal Interfaces: ABCs</h2> <p>While protocols work fine in many cases, there are situations where informal interfaces or duck typing in general can cause confusion. For example, a <code>Bird</code> and an <code>Aeroplane</code> both can <code>fly()</code>. But they are not the same thing, even if they implement the same interfaces / protocols. <strong>Abstract Base Classes</strong> or <strong>ABCs</strong> can help solve this issue.</p> <p>The concept behind ABCs is simple - we define base classes which are abstract in nature. We mark certain methods on these base classes as abstract methods. Any class deriving from these base classes is then forced to implement those methods. And since we&rsquo;re using base classes, if we see that an object has our class as a base class, we can say that this object implements the interface. That is, we can now use types to tell whether an object implements a certain interface. Let&rsquo;s see an example.</p> <pre><code class="language-python">import abc


class Bird(abc.ABC):
    @abc.abstractmethod
    def fly(self):
        pass
</code></pre> <p>The <code>abc</code> module has a metaclass named <code>ABCMeta</code>. ABCs are created from this metaclass. 
So we can either use it directly as the metaclass of our ABC (something like this - <code>class Bird(metaclass=abc.ABCMeta):</code>) or we can subclass from the <code>abc.ABC</code> class, which has <code>abc.ABCMeta</code> as its metaclass already.</p> <p>Then we have to use the <code>abc.abstractmethod</code> decorator to mark our methods abstract. Now if any class derives from our base <code>Bird</code> class, it must implement the <code>fly</code> method too. The following code would fail:</p> <pre><code class="language-python">class Parrot(Bird):
    pass


p = Parrot()
</code></pre> <p>We see the following error:</p> <pre><code>TypeError: Can't instantiate abstract class Parrot with abstract methods fly
</code></pre> <p>Let&rsquo;s fix that:</p> <pre><code class="language-python">class Parrot(Bird):
    def fly(self):
        print(&quot;Flying&quot;)


p = Parrot()
</code></pre> <p>Also note:</p> <pre><code class="language-python">&gt;&gt;&gt; isinstance(p, Bird)
True
</code></pre> <p>Since our parrot is recognized as an instance of the <code>Bird</code> ABC, we can be sure from its type that it definitely implements our desired interface.</p> <p>Now let&rsquo;s define another ABC named <code>Aeroplane</code> like this:</p> <pre><code class="language-python">class Aeroplane(abc.ABC):
    @abc.abstractmethod
    def fly(self):
        pass


class Boeing(Aeroplane):
    def fly(self):
        print(&quot;Flying!&quot;)


b = Boeing()
</code></pre> <p>Now if we compare:</p> <pre><code class="language-python">&gt;&gt;&gt; isinstance(p, Aeroplane)
False
&gt;&gt;&gt; isinstance(b, Bird)
False
</code></pre> <p>We can see that even though both objects have the same <code>fly</code> method, we can now easily differentiate which one implements the <code>Bird</code> interface and which implements the <code>Aeroplane</code> interface.</p> <p>We saw how we can create our own, custom ABCs. But creating custom ABCs is often discouraged; it is usually better to use or subclass the built-in ones. 
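</p> <p>For instance, the <code>Sized</code> and <code>Container</code> ABCs from <code>collections.abc</code> already recognize a class like our earlier <code>Team</code> purely from its methods, with no subclassing or registration (the class is redefined here so the sketch stays self-contained):</p> <pre><code class="language-python">from collections.abc import Container, Sized


class Team:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

    def __contains__(self, member):
        return member in self.__members


team = Team(['batman', 'flash'])

# These ABCs use __subclasshook__ to recognize any class
# that defines the right methods
print(isinstance(team, Sized))      # True
print(isinstance(team, Container))  # True
</code></pre> <p>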
The Python standard library has many useful ABCs that we can easily reuse. We can get a list of useful built-in ABCs in the <code>collections.abc</code> module - <a href="https://docs.python.org/3/library/collections.abc.html#module-collections.abc">https://docs.python.org/3/library/collections.abc.html#module-collections.abc</a>. Before writing your own, please do check if there&rsquo;s an ABC for the same purpose in the standard library.</p> <h2 id="abcs-and-virtual-subclass">ABCs and Virtual Subclass</h2> <p>We can also register a class as a <em>virtual subclass</em> of an ABC. In that case, even if that class doesn&rsquo;t subclass our ABC, it will still be treated as a subclass of the ABC (and thus accepted to have implemented the interface). An example will demonstrate this better:</p> <pre><code class="language-python">@Bird.register
class Robin:
    pass


r = Robin()
</code></pre> <p>And then:</p> <pre><code class="language-python">&gt;&gt;&gt; issubclass(Robin, Bird)
True
&gt;&gt;&gt; isinstance(r, Bird)
True
&gt;&gt;&gt;
</code></pre> <p>In this case, even though <code>Robin</code> does not subclass our ABC or define the abstract method, we can <code>register</code> it as a <code>Bird</code>. <code>issubclass</code> and <code>isinstance</code> behavior can be overloaded by adding two relevant magic methods. 
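</p> <p>As a taste of this mechanism, an ABC can define the <code>__subclasshook__</code> classmethod (a helper that ABCs consult during these checks) to make the check structural - any class with a <code>fly</code> method passes. The <code>CanFly</code> and <code>Drone</code> names below are invented for this sketch:</p> <pre><code class="language-python">import abc


class CanFly(abc.ABC):
    @abc.abstractmethod
    def fly(self):
        pass

    @classmethod
    def __subclasshook__(cls, subclass):
        # Treat any class that defines a callable fly() as a subclass
        if cls is CanFly:
            return callable(getattr(subclass, 'fly', None))
        return NotImplemented


class Drone:
    def fly(self):
        return 'buzzing along'


print(issubclass(Drone, CanFly))    # True, without subclassing or register()
print(isinstance(Drone(), CanFly))  # True
</code></pre> <p>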
Read more on that here - <a href="https://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass">https://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass</a></p> <h2 id="further-reading">Further reading</h2> <ul> <li><a href="https://www.python.org/dev/peps/pep-3119/">PEP 3119 &ndash; Introducing Abstract Base Classes</a></li> <li><a href="https://pymotw.com/3/abc/">abc module on PyMOTW</a></li> <li><a href="https://docs.python.org/3/library/abc.html">abc module docs</a></li> </ul> Django REST Framework: Using the request object http://masnun.rocks/2017/03/27/django-rest-framework-using-request-object/ Mon, 27 Mar 2017 12:49:01 +0600 http://masnun.rocks/2017/03/27/django-rest-framework-using-request-object/ <p>While working with <a href="http://www.django-rest-framework.org/">Django REST Framework</a> aka DRF, we often wonder how to customize our response based on request parameters. Maybe we want to check something against the logged-in user (<code>request.user</code>)? Or maybe we want to modify part of our response based on a certain request parameter? How do we do that? We will discuss a few use cases below.</p> <h2 id="modelviewset-filtering-based-on-request">ModelViewSet - Filtering based on <code>request</code></h2> <p>This is very often required while using <code>ModelViewSet</code>s. We have many <code>Item</code>s in our database. 
But when listing them, we only want to display the items belonging to the current logged-in user.</p> <pre><code class="language-python">from rest_framework.permissions import IsAuthenticated
from rest_framework.viewsets import ModelViewSet

# Item and ItemSerializer come from your own app


class ItemViewSet(ModelViewSet):
    permission_classes = (IsAuthenticated,)
    serializer_class = ItemSerializer

    def get_queryset(self):
        # the request is available as self.request here
        queryset = Item.objects.all().filter(user=self.request.user)
        another_param = self.request.GET.get('another_param')
        if another_param:
            queryset = queryset.filter(another_field=another_param)
        return queryset
</code></pre> <p>If you are using the awesome <code>ModelViewSet</code>, you can override the <code>get_queryset</code> method. Inside it, you can access the <code>request</code> object as <code>self.request</code>. In the above example, we are only listing the items which have our current user set as their <code>user</code> field. At the same time, we are also filtering the queryset based on another parameter. Basically you have the queryset and <code>self.request</code> available to you; feel free to use your imagination to craft all the queries you need!</p> <h2 id="serializers-modifying-response-based-on-request">Serializers - Modifying Response based on <code>request</code></h2> <p>What if we don&rsquo;t want to display <code>item_count</code> for the users by default? What if we only want to display that field when a request parameter, <code>show_count</code>, is set? We can override the serializer to do that.</p> <pre><code class="language-python">from rest_framework.serializers import IntegerField, ModelSerializer


class UserSerializer(ModelSerializer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        context = kwargs.get('context', None)
        if context:
            request = context['request']
            show_count = request.GET.get('show_count')
            if show_count:
                # source defaults to the field name; passing
                # source=&quot;item_count&quot; explicitly would be redundant
                # (DRF raises an assertion error for it)
                self.fields['item_count'] = IntegerField()
</code></pre> <p>When a serializer is constructed by DRF, it gets the <code>request</code> in its <code>context</code>. So we should always check if it exists and use it as needed. 
We can override the serializer fields by accessing <code>self.fields</code>.</p> <p>Please note: the <code>request</code> object will be passed only if DRF constructs the serializer for you, for example when you just pass the <code>serializer_class</code> to a <code>ModelViewSet</code>. But if you are using the serializer in your custom views, please do remember to pass the request manually, otherwise it won&rsquo;t work.</p> <pre><code class="language-python">item_serializer = ItemSerializer(item, context={&quot;request&quot;: request})
</code></pre> <p>In our case we have just used <code>IntegerField</code>. You can of course use another serializer to embed the full data of a related field.</p> <h2 id="using-request-in-serializer-fields">Using <code>request</code> in Serializer Fields</h2> <p>Serializer fields have <code>context</code> too!</p> <pre><code class="language-python">from rest_framework.serializers import ReadOnlyField


class ShortURLField(ReadOnlyField):
    def to_representation(self, value):
        return self.context['request'].build_absolute_uri(value)
</code></pre> <p>and here&rsquo;s the serializer:</p> <pre><code class="language-python">class URLSerializer(ModelSerializer):
    short_url = ShortURLField()

    class Meta:
        model = URL
        fields = &quot;__all__&quot;
</code></pre> <p>In the <code>URL</code> model, there is a method named <code>short_url</code> that returns a slug for that URL. In our custom <code>ShortURLField</code>, we have customized the <code>to_representation</code> method to use the <code>build_absolute_uri(value)</code> method on the current request for creating the full URL from the slug.</p> Django Admin: Expensive COUNT(*) Queries http://masnun.rocks/2017/03/20/django-admin-expensive-count-all-queries/ Mon, 20 Mar 2017 22:43:59 +0600 http://masnun.rocks/2017/03/20/django-admin-expensive-count-all-queries/ <p>If you are a Django developer, it is very likely that you use the Django Admin regularly. 
And if you have maintained a website with a huge amount of data, you probably already know that the Django Admin can become very slow when the database table gets very large. If you log the SQL queries (either using Django logging or using Django Debug Toolbar), you will notice a very expensive SQL query, something like this:</p> <pre><code class="language-SQL">SELECT COUNT(*) AS &quot;__count&quot; FROM &quot;table_name&quot;
</code></pre> <p>With the default settings, you will actually notice this query twice. If you use Django Debug Toolbar, it will tell you that the query was duplicated 2 times.</p> <h3 id="issue-1">Issue - 1</h3> <p>By default <code>ModelAdmin</code> has <code>show_full_result_count = True</code>, which shows the full result count in the admin interface. This is the source of one of the <code>count(*)</code> queries.</p> <p>To fix that, we just need to set this on our <code>ModelAdmin</code>:</p> <pre><code class="language-Python">show_full_result_count = False
</code></pre> <h3 id="issue-2">Issue - 2</h3> <p>Even after switching <code>show_full_result_count</code> off, we still notice a <code>count(*)</code> query in the log. That&rsquo;s because the Django Paginator does a count itself.</p> <p>The solution is to somehow bypass the expensive query while still returning a number so the pagination works as expected. 
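</p> <p>One such inexpensive lookup (assuming a PostgreSQL backend; the <code>approximate_count</code> helper below is a hypothetical sketch, not from the original post) is the row estimate the query planner keeps in <code>pg_class</code>:</p> <pre><code class="language-python">def approximate_count(cursor, table_name):
    # Read PostgreSQL's planner estimate instead of running COUNT(*).
    # reltuples is maintained by VACUUM / ANALYZE, so it is approximate.
    cursor.execute(
        'SELECT reltuples FROM pg_class WHERE relname = %s', [table_name]
    )
    row = cursor.fetchone()
    return int(row[0]) if row else 0
</code></pre> <p>With Django, this could be called with a cursor from <code>django.db.connection</code>. Since the estimate only refreshes on <code>VACUUM</code>/<code>ANALYZE</code>, it can lag behind the real count.</p> <p>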
We can cache the count value, or even run a raw SQL query to fetch an approximate value through a rather inexpensive lookup somewhere else.</p> <p>Here&rsquo;s a quick example of a paginator that runs the expensive query once and then caches the result:</p> <pre><code class="language-Python">from django.core.paginator import Paginator
from django.core.cache import cache


# Modified version of a gist found in a Stack Overflow thread
class CachingPaginator(Paginator):
    def _get_count(self):
        if not hasattr(self, &quot;_count&quot;):
            self._count = None

        if self._count is None:
            try:
                key = &quot;adm:{0}:count&quot;.format(hash(str(self.object_list.query)))
                self._count = cache.get(key, -1)
                if self._count == -1:
                    self._count = super().count
                    cache.set(key, self._count, 3600)
            except Exception:
                self._count = len(self.object_list)
        return self._count

    count = property(_get_count)
</code></pre> <p>Now on our <code>ModelAdmin</code> we just need to use this paginator:</p> <pre><code class="language-Python">paginator = CachingPaginator
</code></pre> <p>Once we have done that, it will be slow the first time we load the page and faster afterwards. We can also fetch and cache this value from time to time. This solution might not get us the exact count and thus mess up pagination sometimes, but in most cases that would not be much of a problem.</p> Django Channels: Using Custom Channels http://masnun.rocks/2016/11/27/django-channels-using-custom-channels/ Sun, 27 Nov 2016 07:48:51 +0600 http://masnun.rocks/2016/11/27/django-channels-using-custom-channels/ <p>In my earlier blog post - <a href="http://masnun.rocks/2016/09/25/introduction-to-django-channels/">Introduction to Django Channels</a> - I mentioned that we can create our own channels for various purposes. In this blog post, we will discuss where custom channels can be useful, what the challenges might be, and of course we will see some code examples. But before we begin, please make sure you are familiar with the concepts of Django Channels. 
I would recommend going through the above-mentioned post and the official docs to familiarize yourself with the basics.</p> <h3 id="our-use-case">Our Use Case</h3> <p>A channel is just a queue which has consumers (workers) listening to it. With that concept in mind, we can think of many innovative use cases a queue could have. But in our example, we will keep the idea simple. We are going to use Channels as a means of background task processing.</p> <p>We will create our own channels for different tasks. There will be consumers waiting for messages on these channels. When we want to do something in the background, we pass it on to the appropriate channel and the workers take care of the task. For example, when we want to create a thumbnail of a user-uploaded photo, we pass it to the <code>thumbnails</code> channel. When we want to send a confirmation email, we send it to the <code>welcome_email</code> channel. Like that. If you are familiar with Celery or Python RQ, this will sound pretty familiar to you.</p> <p>Now here&rsquo;s my use case - in one of the projects I am working on, we&rsquo;re building APIs for mobile applications. We use BrainTree for payment integration. The mobile application sends a <code>nonce</code> - it&rsquo;s like a token that we can use to initiate the actual transaction. The transaction has two steps - first we initiate it using the nonce and get back a transaction id; then we query whether the transaction succeeded or failed. I felt it would be a good idea to process this in the background. We already have a websocket endpoint implemented using Channels. So I thought it would be great to leverage the existing setup instead of introducing something new into the stack.</p> <h3 id="challenges">Challenges</h3> <p>It has so far worked pretty well. But we have to remember that Channels does not guarantee delivery of the messages, and there is no retrying if a message fails. 
So we wrote a custom management command that checks the orders for any records that have the nonce set but no transaction id, or have a transaction id but no final result stored. We then scheduled this command to run at a certain interval and queue up the unfinished/incomplete orders again. In our case, it doesn&rsquo;t hurt if the orders take some 5 to 10 minutes to process.</p> <p>But if we were working on a product where message delivery was time-critical for our business, we probably would have considered Celery for the background processing part.</p> <h3 id="let-s-see-the-codes">Let&rsquo;s see the code!</h3> <p>First we needed to write a handler. The handler receives the messages on the subscribed channel and processes them. Here&rsquo;s the handler:</p> <pre><code class="language-python">def braintree_process(message):
    order_data = message.content.get('order')
    order_id = message.content.get('order_id')
    order_instance = Order.objects.get(pk=order_id)

    if order_data:
        nonce = order_data.get(&quot;braintree_nonce&quot;)
        if nonce:
            # [snipped]
            TRANSACTION_SUCCESS_STATUSES = [
                braintree.Transaction.Status.Authorized,
                braintree.Transaction.Status.Authorizing,
                braintree.Transaction.Status.Settled,
                braintree.Transaction.Status.SettlementConfirmed,
                braintree.Transaction.Status.SettlementPending,
                braintree.Transaction.Status.Settling,
                braintree.Transaction.Status.SubmittedForSettlement,
            ]

            result = braintree.Transaction.sale({
                'amount': str(order_data.get('total')),
                'payment_method_nonce': nonce,
                'options': {
                    &quot;submit_for_settlement&quot;: True
                }
            })

            if result.is_success or result.transaction:
                transaction = braintree.Transaction.find(result.transaction.id)
                if transaction.status in TRANSACTION_SUCCESS_STATUSES:
                    pass  # [snipped]
                else:
                    pass  # [snipped]
            else:
                errors = []
                for x in result.errors.deep_errors:
                    errors.append(str(x.code))
                # [snipped]
</code></pre> <p>Then we needed to define a routing so that messages on a certain channel are passed on to this handler. 
So in our channel routing, we added this:</p> <pre><code class="language-python">from channels.routing import route

from .channel_handlers import braintree_process

channel_routing = [
    route(&quot;braintree_process&quot;, braintree_process),
    # [snipped] ...
]
</code></pre> <p>We now have a routing set and a handler ready to accept messages. So we&rsquo;re ready! All we need to do is to start passing data to this channel.</p> <p>When the API receives a <code>nonce</code>, it just passes the order details to this channel:</p> <pre><code class="language-python">Channel(&quot;braintree_process&quot;).send({
    &quot;order&quot;: data,
    &quot;order_id&quot;: order.id
})
</code></pre> <p>And then the workers start working. They accept the message and then start processing the payment request.</p> <p>In our case, we already had the workers running (since they were serving our websocket requests). If you don&rsquo;t have any workers running, don&rsquo;t forget to run them.</p> <pre><code>python manage.py runworker
</code></pre> <p>If you are wondering about how to deploy channels, I have you covered - <a href="http://masnun.rocks/2016/11/02/deploying-django-channels-using-daphne/">Deploying Django Channels using Daphne</a></p> <h3 id="prioritizing-scaling-channels">Prioritizing / Scaling Channels</h3> <p>In our project, Django Channels does two things - handling websocket connections for realtime communication and processing delayed jobs in the background. As you can probably guess, the realtime part is more important. In our current setup, the running workers handle both types of requests as they come. But we want to dedicate more workers to the websocket and perhaps just one worker should keep processing the payments.</p> <p>Luckily, we can limit our workers to certain channels using the <code>--only-channels</code> flag. 
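</p> <p>For example, a dedicated payment worker could be started like this (using the <code>braintree_process</code> channel name from our routing above; adjust for your own setup):</p> <pre><code># A worker that only consumes the payment processing channel
python manage.py runworker --only-channels=braintree_process
</code></pre> <p>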
Or alternatively, we can exclude certain channels by using the <code>--exclude-channels</code> flag.</p> <h3 id="concluding-thoughts">Concluding Thoughts</h3> <p>I personally find the design of channels very straightforward, simple and easy to reason about. When Channels gets merged into Django, it&rsquo;s going to be quite useful, not just for implementing http/2 or websockets, but also as a way to process background tasks with ease and without introducing third party libraries.</p> Exploring Asyncio - uvloop, sanic and motor http://masnun.rocks/2016/11/17/exploring-asyncio-uvloop-sanic-motor/ Thu, 17 Nov 2016 03:33:38 +0600 http://masnun.rocks/2016/11/17/exploring-asyncio-uvloop-sanic-motor/ <p>The <code>asyncio</code> package was introduced to the standard library in Python 3.4. The package is still provisional, which means backward compatibility can be broken by future changes. However, the Python community is pretty excited about it, and I know personally that many people have started using it in production. So, I too decided to try it out. I built a rather simple microservice using the excellent <code>sanic</code> framework and <code>motor</code> (for accessing mongodb). <code>uvloop</code> is an alternative event loop implementation, written in Cython on top of libuv, that can be used as a drop-in replacement for asyncio&rsquo;s event loop. Sanic uses <code>uvloop</code> behind the scenes to go fast.</p> <p>In this blog post, I will quickly introduce the technologies involved and then walk through some sample code with relevant explanations.</p> <h3 id="what-is-asyncio-why-should-i-care">What is Asyncio? Why Should I Care?</h3> <p>In an earlier blog post - <a href="http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/">Async Python: The Different Forms of Concurrency</a> - I tried to elaborate on the different ways to achieve concurrency in the Python land. 
In the last part of that post, I tried to explain what new things asyncio brings to the table.</p> <p>Asyncio allows us to write asynchronous, concurrent programs running on a single thread, using an event loop to schedule tasks and multiplexing I/O over sockets (and other resources). The one-line explanation might be a little complex to comprehend at a glance, so I will break it down. In asyncio, everything runs on a single thread. We use coroutines, which can be treated as small units of work that we can pause and resume. Then there is I/O multiplexing - when our tasks are busy waiting for I/O, the event loop pauses them and allows other tasks to run. When the paused tasks finish their I/O, the event loop resumes them. This way even a single thread can handle/serve a large number of connections/clients by effectively juggling between &ldquo;active&rdquo; tasks and tasks that are waiting for some sort of I/O.</p> <p>In the usual synchronous style - for example, when we&rsquo;re using thread-based concurrency - each client occupies a thread, and when we have a large number of connections, we soon run out of threads, even though not all of those threads are active at a given time; some might simply be waiting for I/O, doing nothing. Asyncio helps us solve this problem and provides an efficient solution to the concurrency problem.</p> <p>While Twisted, Tornado and many other solutions have existed in the past, NodeJS brought huge attention to this kind of solution. And with asyncio being in the standard library, I believe it will become the standard way of doing async I/O in the Python world over time.</p> <h3 id="what-about-uvloop">What about uvloop?</h3> <p>We talked about the event loop above. It schedules the tasks and deals with various events. It also manages the I/O multiplexing using the various options offered by the operating system. In simple words - the event loop is the critical, central part of the whole asyncio operation. 
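</p> <p>The pause/resume behaviour described above can be seen in a tiny self-contained example (<code>asyncio.sleep</code> stands in for real I/O here):</p> <pre><code class="language-python">import asyncio


async def fetch(name, delay):
    # await pauses this coroutine; the event loop runs others meanwhile
    await asyncio.sleep(delay)
    return name


async def main():
    # Both &quot;requests&quot; run concurrently on a single thread, so the
    # whole thing takes about 0.2s, not 0.3s - the waits overlap
    return await asyncio.gather(fetch('first', 0.2), fetch('second', 0.1))


loop = asyncio.new_event_loop()
print(loop.run_until_complete(main()))  # ['first', 'second']
loop.close()
</code></pre> <p>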
The <code>asyncio</code> package ships with an event loop by default. But we can also swap it for a custom implementation if we need/prefer. <code>uvloop</code> is one such event loop, and it is very, very fast. The key to its success can be partially attributed to Cython. Cython allows us to write code in Python-like syntax that performs like C. <code>uvloop</code> was written in Cython and it uses the famous <code>libuv</code> library (also used by NodeJS).</p> <p>If you are wondering whether <code>uvloop</code>&rsquo;s performance is a good enough reason to swap out the default event loop, you may want to read this article - <a href="https://magic.io/blog/uvloop-blazing-fast-python-networking/">uvloop: Blazing fast Python networking</a> - or you can just look at this following chart taken from that blog post:</p> <p><img src="http://i.imgur.com/0iMUePy.png" /></p> <p>Yes, it can go faster than NodeJS and catch up to Golang. Convinced yet? Let&rsquo;s talk about Sanic!</p> <h3 id="sanic-gotta-go-fast">Sanic - Gotta go fast!</h3> <p>Sanic was inspired by the article I mentioned above. It uses <code>uvloop</code> and <code>httptools</code> too (referenced in the article). The framework provides a nice, Flask-like syntax along with the <code>async / await</code> syntax from Python 3.5.</p> <p><strong>Please Note:</strong> <code>uvloop</code> still doesn&rsquo;t work on Windows properly. Sanic uses the default asyncio event loop if uvloop is not available. But this probably doesn&rsquo;t matter because in most cases we deploy to Linux machines anyway. Just in case you want to try out the performance gains on Windows, I recommend you use a VM to test it inside a Linux machine.</p> <h3 id="motor">Motor</h3> <p>Motor started off as an async mongodb driver for Tornado. Motor = <strong>Mo</strong>ngodb + <strong>Tor</strong>nado. But Motor now has pretty nice support for asyncio. 
And of course we can use the <code>async / await</code> syntax too.</p> <p>I guess we have had brief introductions to the technologies we are going to use. So let&rsquo;s get started with the actual work.</p> <h3 id="setting-up">Setting Up</h3> <p>We need to install <code>sanic</code> and <code>motor</code> using <code>pip</code>.</p> <pre><code>pip install sanic
pip install motor
</code></pre> <p>Sanic should also install its dependencies, including <code>uvloop</code> and <code>ujson</code>, along with others.</p> <h3 id="set-uvloop-as-the-event-loop">Set <code>uvloop</code> as the event loop</h3> <p>We will swap out the default event loop and use <code>uvloop</code> instead.</p> <pre><code class="language-python">import asyncio
import uvloop

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
</code></pre> <p>Simple as that. We import asyncio and uvloop. We set the event loop policy to uvloop&rsquo;s event loop policy and we&rsquo;re done. Now asyncio will use uvloop as the default event loop.</p> <h3 id="connecting-to-mongodb">Connecting to Mongodb</h3> <p>We will be using <code>motor</code> to connect to our mongodb. Just like this:</p> <pre><code class="language-python">from motor.motor_asyncio import AsyncIOMotorClient

mongo_connection = AsyncIOMotorClient(&quot;&lt;mongodb connection string&gt;&quot;)
contacts = mongo_connection.mydatabase.contacts
</code></pre> <p>We import the <code>AsyncIOMotorClient</code> and pass our mongodb connection string to it. We also point a variable at our target collection so that we can easily (and directly) use that collection later. Here <code>mydatabase</code> is the db name and <code>contacts</code> is the collection name.</p> <h3 id="request-handlers">Request Handlers</h3> <p>Now we will dive right in and write our request handlers. For our demo application, I will create two routes: one for listing the contacts and one for creating new ones. 
But first we must instantiate sanic.</p> <pre><code class="language-python">from sanic import Sanic
from sanic.response import json

app = Sanic(__name__)
</code></pre> <p>Flask-like, remember? Now that we have the <code>app</code> instance, let&rsquo;s add routes to it.</p> <pre><code class="language-python">@app.route(&quot;/&quot;)
async def list(request):
    data = await contacts.find().to_list(20)
    for x in data:
        x['id'] = str(x['_id'])
        del x['_id']
    return json(data)


@app.route(&quot;/new&quot;)
async def new(request):
    contact = request.json
    insert = await contacts.insert_one(contact)
    return json({&quot;inserted_id&quot;: str(insert.inserted_id)})
</code></pre> <p>The routes are simple and, for the sake of brevity, I haven&rsquo;t written any error handling code. The <code>list</code> function is <code>async</code>. Inside it we <code>await</code> our contacts arriving from the database, as a list of 20 entries. In a sync style, we would use the <code>find</code> method directly, but now we <code>await</code> it.</p> <p>After we have the results, we quickly iterate over the documents, add an <code>id</code> key and remove the <code>_id</code> key. The <code>_id</code> key is an instance of <code>ObjectId</code>, which would require the <code>bson</code> package for serialization. To avoid complexity here, we just convert the id to a string and then delete the ObjectId instance. The rest of the document is the usual string-based key-value pairs (<code>dict</code>), so it should serialize fine.</p> <p>In the <code>new</code> function, we grab the incoming json payload and pass it to the <code>insert_one</code> method directly. <code>request.json</code> contains the <code>dict</code> representation of the json request. Check out <a href="https://github.com/channelcat/sanic/blob/master/docs/request_data.md">this page</a> for the other request data available to you. Here, we again <code>await</code> the <code>insert_one</code> call. 
When the response is available, we take the <code>inserted_id</code> and send a response back.</p> <h3 id="running-the-app">Running the App</h3> <p>Let&rsquo;s see the code first:</p> <pre><code class="language-python">loop = asyncio.get_event_loop() app.run(host=&quot;0.0.0.0&quot;, port=8000, workers=3, debug=True, loop=loop) </code></pre> <p>Here we get the default event loop and pass it to <code>app.run</code> along with other obvious options. With the <code>workers</code> argument, we can set how many workers we want to use. This allows us to spin up multiple workers and take advantage of multiple CPU cores. On a single core machine, we can just set it to 1 or skip it entirely.</p> <p>The <code>loop</code> is optional as well. If we do not pass the loop, sanic will create a new one and set it as the default loop. But in our case, we connected to MongoDB using motor before the <code>app.run</code> function could actually run. Motor is already using the default event loop. If we don&rsquo;t pass that same loop to sanic, sanic will initialize a new event loop. Our database access and sanic server will be on two different event loops and we won&rsquo;t be able to make database calls. That is why we use the <code>get_event_loop</code> function to retrieve the current default event loop and pass it to sanic. This is also why we set <code>uvloop</code> as the default event loop at the top of the file. Otherwise we would end up with the default loop (that comes with asyncio) and sanic would also have to use that. Initializing <code>uvloop</code> at the beginning makes sure everyone uses it.</p> <h3 id="final-code">Final Code</h3> <p>So here&rsquo;s the final code. We probably should clean up the imports and bring them up on top. But to relate to the different steps, I kept them as is. Also as mentioned earlier, the code has no error handling. 
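</p> <p>To give an idea of what that handling could look like, here is a minimal, hedged sketch - the <code>guarded</code> decorator and the error payload shape are my own invention for illustration, not part of sanic:</p>

```python
import asyncio
import functools

def guarded(handler):
    """Wrap an async handler so unexpected exceptions become an error
    payload instead of crashing the request (illustrative shape only)."""
    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        try:
            return await handler(*args, **kwargs)
        except Exception as exc:  # real code would catch narrower exceptions
            return {"error": str(exc), "status": 500}
    return wrapper

@guarded
async def new(payload):
    # stand-in for the real handler body; a real one would call insert_one
    if not payload:
        raise ValueError("empty payload")
    return {"inserted": payload}

loop = asyncio.new_event_loop()
print(loop.run_until_complete(new({})))
# -> {'error': 'empty payload', 'status': 500}
```

<p>In real handlers, the returned dict would go through <code>sanic.response.json</code> as before. 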
We should write proper error handling code in all serious projects.</p> <pre><code class="language-python">import asyncio import uvloop asyncio.set_event_loop_policy(uvloop.EventLoopPolicy()) from motor.motor_asyncio import AsyncIOMotorClient mongo_connection = AsyncIOMotorClient(&quot;&lt;connection string&gt;&quot;) contacts = mongo_connection.mydatabase.contacts from sanic import Sanic from sanic.response import json app = Sanic(__name__) @app.route(&quot;/&quot;) async def list(request): data = await contacts.find().to_list(20) for x in data: x['id'] = str(x['_id']) del x['_id'] return json(data) @app.route(&quot;/new&quot;) async def new(request): contact = request.json insert = await contacts.insert_one(contact) return json({&quot;inserted_id&quot;: str(insert.inserted_id)}) loop = asyncio.get_event_loop() app.run(host=&quot;0.0.0.0&quot;, port=8000, workers=3, debug=True, loop=loop) </code></pre> <p>Now let&rsquo;s try it out?</p> <h3 id="trying-out">Trying Out</h3> <p>I have saved the above code as <code>main.py</code>. So let&rsquo;s run it.</p> <pre><code class="language-sh">python main.py </code></pre> <p>Now we can use <code>curl</code> to try it out. Let&rsquo;s first add a contact:</p> <pre><code class="language-sh">curl -X POST -H &quot;Content-Type: application/json&quot; -d '{&quot;name&quot;: &quot;masnun&quot;}' &quot;http://localhost:8000/new&quot; </code></pre> <p>We should see something like:</p> <pre><code>{&quot;inserted_id&quot;:&quot;582ceb772c608731477f5384&quot;} </code></pre> <p>Let&rsquo;s verify by checking <code>/</code> -</p> <pre><code>curl -X GET &quot;http://localhost:8000/&quot; </code></pre> <p>If everything goes right, we should see something like:</p> <pre><code>[{&quot;id&quot;:&quot;582ceb772c608731477f5384&quot;,&quot;name&quot;:&quot;masnun&quot;}] </code></pre> <p>I hope it works for you too! :-)</p> <p>If you have any feedback or suggestions, please feel free to share it in the comments section. 
I would love to disqus :-)</p> Deploying Django Channels using Daphne http://masnun.rocks/2016/11/02/deploying-django-channels-using-daphne/ Wed, 02 Nov 2016 07:07:09 +0600 http://masnun.rocks/2016/11/02/deploying-django-channels-using-daphne/ <p>In one of my <a href="http://masnun.rocks/2016/09/25/introduction-to-django-channels/">earlier posts</a>, we saw an overview of how Django Channels works and how it helps us build cool stuff. However, in that post, we only covered deployment briefly. So here in this post, we shall go over deployment again, in a little more detail and of course with code samples.</p> <h3 id="what-do-we-need">What do we need?</h3> <p>For running Django Channels, we would use the following setup:</p> <ul> <li>nginx as the proxy</li> <li>daphne as the interface server</li> <li>redis as the backend</li> </ul> <p>Let&rsquo;s get started.</p> <h3 id="setup-redis-and-configure-app">Setup Redis and Configure App</h3> <p>We need to set up Redis if it&rsquo;s not installed already. Here&rsquo;s how to do it on Ubuntu:</p> <pre><code>sudo apt-get install redis-server </code></pre> <p>If we want to use the Redis backend, we also need to install <code>asgi-redis</code>.</p> <pre><code>pip install asgi_redis </code></pre> <p>In your <code>settings.py</code> file, make sure you use Redis as the backend and set the host correctly.</p> <p>Here&rsquo;s a demo:</p> <pre><code>CHANNEL_LAYERS = { &quot;default&quot;: { &quot;BACKEND&quot;: &quot;asgi_redis.RedisChannelLayer&quot;, &quot;CONFIG&quot;: { &quot;hosts&quot;: [(&quot;localhost&quot;, 6379)], }, &quot;ROUTING&quot;: &quot;realtime.routing.channel_routing&quot;, }, } </code></pre> <h3 id="starting-daphne">Starting Daphne</h3> <p>If you have installed <code>channels</code> from pip, you should have the <code>daphne</code> command available already. 
In the very unlikely case you don&rsquo;t have it installed, here&rsquo;s the command:</p> <pre><code>pip install daphne </code></pre> <p>To run daphne, we use the following command:</p> <pre><code>daphne -b 0.0.0.0 -p 8001 &lt;app&gt;.asgi:channel_layer </code></pre> <p>Daphne will bind to <code>0.0.0.0</code> and use <code>8001</code> as the port.</p> <p>Here <code>&lt;app&gt;</code> is our app name / the module that contains the <code>asgi.py</code> file. Please refer to the previous blog post to know what we put in the <code>asgi.py</code> file.</p> <p>We now need to make sure <code>daphne</code> is automatically started at system launch and restarted when it crashes. In this example, I would stick to my old upstart script. But you would probably want to explore excellent projects like <code>circus</code> or <code>supervisor</code> or at least <code>systemd</code>.</p> <p>Here&rsquo;s the upstart script I use:</p> <pre><code>start on runlevel [2345] stop on runlevel [016] respawn script cd /home/ubuntu/&lt;app home&gt; export DJANGO_SETTINGS_MODULE=&quot;&lt;app&gt;.production_settings&quot; exec daphne -b 0.0.0.0 -p 8001 &lt;app&gt;.asgi:channel_layer end script </code></pre> <h3 id="running-workers">Running Workers</h3> <p>We need at least one running worker before daphne can start processing requests. To run a worker, we use the following command:</p> <pre><code>python manage.py runworker </code></pre> <p>The <code>runworker</code> command spawns one worker with one thread. We should have more than one ideally. 
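</p> <p>With supervisord (one of the tools mentioned above), launching several workers is a single config section. A hypothetical fragment, reusing the placeholders from this post:</p>

```ini
[program:channels_worker]
directory=/home/ubuntu/&lt;app home&gt;
environment=DJANGO_SETTINGS_MODULE="&lt;app&gt;.production_settings"
command=python3 manage.py runworker
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
```

<p>The <code>numprocs</code> / <code>process_name</code> pair is what makes supervisord spawn several worker processes from one section. 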
It is recommended to run <code>n</code> workers, where <code>n</code> is the number of available CPU cores.</p> <p>Here&rsquo;s a simple upstart script to keep the worker running:</p> <pre><code>start on runlevel [2345] stop on runlevel [016] respawn script cd /home/ubuntu/&lt;app home&gt; export DJANGO_SETTINGS_MODULE=&quot;&lt;app&gt;.production_settings&quot; exec python3 manage.py runworker end script </code></pre> <p>It would be much easier to launch multiple workers if you use supervisord or circus.</p> <h3 id="nginx-conf">Nginx Conf</h3> <p>Finally, here&rsquo;s the nginx conf I use. Please note that I handle all incoming requests with daphne, which is probably not ideal. You can keep using <code>uwsgi</code> for your existing, non real time parts and only handle the real time part with daphne. Since setting up wsgi is common knowledge, I will just focus on what we need for daphne.</p> <pre><code>server { listen 80; client_max_body_size 20M; location /static { alias /home/ubuntu/&lt;app home&gt;/static; } location /media { alias /home/ubuntu/&lt;app home&gt;/media; } location / { proxy_pass http://0.0.0.0:8001; proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection &quot;upgrade&quot;; proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Host $server_name; } } </code></pre> <p>We have our daphne server running on port <code>8001</code>, so we set up a proxy to that URL. Now if daphne and a worker are running, we should be able to see our webpage when we visit the URL.</p> Getting Help Interactively in Python http://masnun.rocks/2016/11/01/getting-help-interactively-in-python/ Tue, 01 Nov 2016 17:00:51 +0600 http://masnun.rocks/2016/11/01/getting-help-interactively-in-python/ <p>Working with a module that you&rsquo;re not familiar with? No internet? Somehow the docs are not accessible? 
Or simply feeling adventurous? Python has you covered. There are a few ways to get help interactively. In this post, we will try a few of them.</p> <h3 id="the-dir-built-in">The <code>dir</code> built-in</h3> <p>The <code>dir</code> built-in is a very helpful one. If you call it without any arguments, that is just <code>dir()</code>, it will return the names available in the current scope. When passed an argument, it displays the available attributes of that object (inherited or its own).</p> <pre><code class="language-python">&gt;&gt;&gt; import os &gt;&gt;&gt; dir(os) ['CLD_CONTINUED', 'CLD_DUMPED', 'CLD_EXITED', 'CLD_TRAPPED', 'EX_CANTCREAT', 'EX_CONFIG', 'EX_DATAERR', 'EX_IOERR', 'EX_NOHOST', 'EX_NOINPUT', 'EX_NOPERM', 'EX_NOUSER', 'EX_OK', 'EX_OSERR', 'EX_OSFILE', 'EX_PROTOCOL', 'EX_SOFTWARE', 'EX_TEMPFAIL', 'EX_UNAVAILABLE', 'EX_USAGE', 'F_LOCK', 'F_OK', 'F_TEST', 'F_TLOCK', 'F_ULOCK', 'MutableMapping', 'NGROUPS_MAX', 'O_ACCMODE', 'O_APPEND', 'O_ASYNC', 'O_CLOEXEC', 'O_CREAT', 'O_DIRECTORY', 'O_DSYNC', 'O_EXCL', 'O_EXLOCK', 'O_NDELAY', 'O_NOCTTY', 'O_NOFOLLOW', 'O_NONBLOCK', 'O_RDONLY', 'O_RDWR', 'O_SHLOCK', 'O_SYNC', 'O_TRUNC', 'O_WRONLY', 'PRIO_PGRP', 'PRIO_PROCESS', 'PRIO_USER', 'P_ALL', 'P_NOWAIT', 'P_NOWAITO', 'P_PGID', 'P_PID', 'P_WAIT', 'RTLD_GLOBAL', 'RTLD_LAZY', 'RTLD_LOCAL', 'RTLD_NODELETE', 'RTLD_NOLOAD', 'RTLD_NOW', 'R_OK', 'SCHED_FIFO', 'SCHED_OTHER', 'SCHED_RR', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'ST_NOSUID', 'ST_RDONLY', 'TMP_MAX', 'WCONTINUED', 'WCOREDUMP', 'WEXITED', 'WEXITSTATUS', 'WIFCONTINUED', 'WIFEXITED', 'WIFSIGNALED', 'WIFSTOPPED', 'WNOHANG', 'WNOWAIT', 'WSTOPPED', 'WSTOPSIG', 'WTERMSIG', 'WUNTRACED', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_execvpe', '_exists', '_exit', '_fwalk', '_get_exports_list', '_putenv', '_spawnvef', '_unsetenv', '_wrap_close', 'abort', 'access', 'altsep', 'chdir', 'chflags', 
'chmod', 'chown', 'chroot', 'close', 'closerange', 'confstr', 'confstr_names', 'cpu_count', 'ctermid', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'environb', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fchdir', 'fchmod', 'fchown', 'fdopen', 'fork', 'forkpty', 'fpathconf', 'fsdecode', 'fsencode', 'fstat', 'fstatvfs', 'fsync', 'ftruncate', 'fwalk', 'get_blocking', 'get_exec_path', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getegid', 'getenv', 'getenvb', 'geteuid', 'getgid', 'getgrouplist', 'getgroups', 'getloadavg', 'getlogin', 'getpgid', 'getpgrp', 'getpid', 'getppid', 'getpriority', 'getsid', 'getuid', 'initgroups', 'isatty', 'kill', 'killpg', 'lchflags', 'lchmod', 'lchown', 'linesep', 'link', 'listdir', 'lockf', 'lseek', 'lstat', 'major', 'makedev', 'makedirs', 'minor', 'mkdir', 'mkfifo', 'mknod', 'name', 'nice', 'open', 'openpty', 'pardir', 'path', 'pathconf', 'pathconf_names', 'pathsep', 'pipe', 'popen', 'pread', 'putenv', 'pwrite', 'read', 'readlink', 'readv', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sched_get_priority_max', 'sched_get_priority_min', 'sched_yield', 'sendfile', 'sep', 'set_blocking', 'set_inheritable', 'setegid', 'seteuid', 'setgid', 'setgroups', 'setpgid', 'setpgrp', 'setpriority', 'setregid', 'setreuid', 'setsid', 'setuid', 'spawnl', 'spawnle', 'spawnlp', 'spawnlpe', 'spawnv', 'spawnve', 'spawnvp', 'spawnvpe', 'st', 'stat', 'stat_float_times', 'stat_result', 'statvfs', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sync', 'sys', 'sysconf', 'sysconf_names', 'system', 'tcgetpgrp', 'tcsetpgrp', 'terminal_size', 'times', 'times_result', 'truncate', 'ttyname', 'umask', 'uname', 'uname_result', 'unlink', 'unsetenv', 'urandom', 'utime', 'wait', 'wait3', 'wait4', 'waitpid', 'walk', 'write', 
'writev'] &gt;&gt;&gt; </code></pre> <p>Coupled with <code>getattr</code>, you can actually write your own custom utilities to better inspect objects.</p> <h3 id="the-help-built-in">The <code>help</code> built-in</h3> <p>I guess I don&rsquo;t have to tell you how <code>help</code>-ful this one can be?</p> <blockquote> <p>Did you know the <code>help</code> built-in is based on <code>pydoc.help</code>?</p> </blockquote> <p>If you just call <code>help</code> without any arguments, it will launch an interactive help prompt where you can just type in names and it will display help for that. Here&rsquo;s an example:</p> <pre><code class="language-python">&gt;&gt;&gt; help() Welcome to Python 3.5's help utility! If this is your first time using Python, you should definitely check out the tutorial on the Internet at http://docs.python.org/3.5/tutorial/. Enter the name of any module, keyword, or topic to get help on writing Python programs and using Python modules. To quit this help utility and return to the interpreter, just type &quot;quit&quot;. To get a list of available modules, keywords, symbols, or topics, type &quot;modules&quot;, &quot;keywords&quot;, &quot;symbols&quot;, or &quot;topics&quot;. Each module also comes with a one-line summary of what it does; to list the modules whose name or summary contain a given string such as &quot;spam&quot;, type &quot;modules spam&quot;. help&gt; list help&gt; </code></pre> <p>When you type in <code>list</code> and hit enter, it will show you the docs for the <code>list</code> built-in. To quit, press <code>q</code>. As described in the text above, typing in &ldquo;modules&rdquo;, &ldquo;keywords&rdquo; etc. will list what is available.</p> <p>Interestingly, the help functionality is built on top of <code>pydoc</code>, so it can help you with most of the installed modules (even the third party ones) as long as the modules have docstrings available. 
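</p> <p>The same applies to our own code - <code>help</code> (via <code>pydoc</code>) picks up whatever docstrings we write. A tiny sketch:</p>

```python
import pydoc

def greet(name):
    """Return a friendly greeting for the given name."""
    return "Hello, {}!".format(name)

# help(greet) would page this interactively; render_doc returns the same text
doc_text = pydoc.render_doc(greet)
print("Return a friendly greeting" in doc_text)  # -> True
```

<p>Anything we document ourselves is immediately browsable the same way. 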
Brilliant, no?</p> <p>Now if you call the <code>help</code> callable with an argument, it will display help for that item. The above example for viewing the docs for <code>list</code> can be done this way too:</p> <pre><code class="language-python">&gt;&gt;&gt; help(list) </code></pre> <p>Neat, huh?</p> <h3 id="using-the-pydoc-module">Using the <code>pydoc</code> Module</h3> <p>In the previous section, we mentioned <code>pydoc</code>. From the name, you can probably guess what it does. Just to be certain, let&rsquo;s try this:</p> <pre><code class="language-python">&gt;&gt;&gt; import pydoc &gt;&gt;&gt; help(pydoc) </code></pre> <p>As you can read in there, the <code>pydoc</code> module generates documentation in HTML or text format for interactive use (as in the previous section). It can read Python source files, parse the docstrings and generate helpful information for us. The <code>pydoc</code> module comes with your Python installation, so it is always available to you.</p> <p>There are some interesting use cases for this module. You can run it from the command line. Just use <code>pydoc &lt;name&gt;</code> where <code>&lt;name&gt;</code> is the name of a function, module, class etc. It will display the same generated docs we get from <code>help(&lt;name&gt;)</code>.</p> <p>And then <code>pydoc -k &lt;keyword&gt;</code> searches for the keyword in the available modules&rsquo; synopses.</p> <p>If you would like to browse the docs in a web browser, you can run <code>pydoc -b</code> and it will start a server and open your browser, pointing to the address of the server. If you would like to set the port yourself, use <code>pydoc -p &lt;port&gt;</code> and then, in the prompt, type &ldquo;b&rdquo; to open the browser. You can browse the docs and search as needed.</p> <h3 id="the-inspect-module">The <code>inspect</code> Module</h3> <p>The <code>inspect</code> module has some interesting use cases too. 
It can help us know more about different objects at runtime.</p> <p>The following functions check for object types:</p> <ul> <li><code>ismodule()</code></li> <li><code>isclass()</code></li> <li><code>ismethod()</code></li> <li><code>isfunction()</code></li> <li><code>isgeneratorfunction()</code></li> <li><code>isgenerator()</code></li> <li><code>istraceback()</code></li> <li><code>isframe()</code></li> <li><code>iscode()</code></li> <li><code>isbuiltin()</code></li> <li><code>isroutine()</code></li> </ul> <p>We can use the <code>getmembers()</code> function to get all the members of an object, class or module. We can filter the members by passing one of the above functions as the second argument.</p> <pre><code class="language-python">&gt;&gt;&gt; len(inspect.getmembers(os)) 284 &gt;&gt;&gt; len(inspect.getmembers(os, inspect.isclass)) 9 &gt;&gt;&gt; </code></pre> <p>The <code>getdoc</code> function can be used to retrieve available documentation from an object.</p> <pre><code class="language-python">&gt;&gt;&gt; inspect.getdoc(list) &quot;list() -&gt; new empty list\nlist(iterable) -&gt; new list initialized from iterable's items&quot; </code></pre> <p>The inspect module has some other cool functions too. Do check them out. And of course, you know how! ;-)</p> <pre><code class="language-python">&gt;&gt;&gt; import inspect &gt;&gt;&gt; help(inspect) </code></pre> Async Python: The Different Forms of Concurrency http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/ Thu, 06 Oct 2016 12:10:03 +0600 http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/ <p>With the advent of Python 3, we&rsquo;re hearing a lot of buzz about &ldquo;async&rdquo; and &ldquo;concurrency&rdquo;, and one might simply assume that Python only recently introduced these concepts/capabilities. But that would be quite far from the truth. We have had async and concurrent operations for quite some time now. 
Many beginners may also think that <code>asyncio</code> is the only/best way to do async/concurrent operations. In this post we shall explore the different ways we can achieve concurrency and their benefits and drawbacks.</p> <h3 id="defining-the-terms">Defining The Terms</h3> <p>Before we dive into the technical aspects, it is essential to have some basic understanding of the terms frequently used in this context.</p> <h4 id="sync-vs-async">Sync vs Async</h4> <p>In synchronous operations, tasks are executed one after another. In asynchronous operations, tasks may start and complete independently of each other. One async task may start and continue running while the execution moves on to a new task. Async tasks don&rsquo;t block (that is, make the execution wait for their completion) other operations and usually run in the background.</p> <p>For example, say you have to call a travel agency to book your next vacation. And you need to send an email to your boss before you go on the tour. In synchronous fashion, you would first call the travel agency, and if they put you on hold for a moment, you keep waiting and waiting. Once it&rsquo;s done, you start writing the email to your boss. Here you complete one task after another. But if you are clever, then while you are waiting on hold you can start writing up the email; when they talk to you, you pause writing the email, talk to them and then resume the email writing. You could also ask a friend to make the call while you finish that email. This is asynchronicity. Tasks don&rsquo;t block one another.</p> <h4 id="concurrency-and-parallelism">Concurrency and Parallelism</h4> <p>Concurrency implies that two tasks make progress together. In our previous example, when we considered the async example, we were making progress on both the call with the travel agent and writing the email. 
This is concurrency.</p> <p>When we talked about asking a friend to help with the call, both tasks would be running in parallel.</p> <p>Parallelism is in fact a form of concurrency. But parallelism is hardware dependent. For example, if there&rsquo;s only one core in the CPU, two operations can&rsquo;t really run in parallel. They just share time slices from the same core. This is concurrency but not parallelism. But when we have multiple cores, we can actually run two or more operations (depending on the number of cores) in parallel.</p> <h4 id="quick-recap">Quick Recap</h4> <p>So this is what we have realized so far:</p> <ul> <li> <b>Sync:</b> Blocking operations.</li> <li> <b>Async:</b> Non blocking operations.</li> <li> <b>Concurrency:</b> Making progress together.</li> <li> <b>Parallelism:</b> Making progress in parallel.</li> </ul> <p><br/></p> <p><center> <em>Parallelism implies Concurrency. But Concurrency doesn&rsquo;t always mean Parallelism.</em> </center></p> <p><br/></p> <h3 id="threads-processes">Threads &amp; Processes</h3> <p>Python has had <strong>Threads</strong> for a very long time. Threads allow us to run our operations concurrently. But because of the <strong>Global Interpreter Lock (GIL)</strong>, threading cannot provide true parallelism. However, with <strong>multiprocessing</strong>, it is now possible to leverage multiple cores with Python.</p> <h4 id="threads">Threads</h4> <p>Let&rsquo;s see a quick example. 
In the following code, the <code>worker</code> function will be run on multiple threads, asynchronously and concurrently.</p> <pre><code class="language-python">import threading import time import random def worker(number): sleep = random.randrange(1, 10) time.sleep(sleep) print(&quot;I am Worker {}, I slept for {} seconds&quot;.format(number, sleep)) for i in range(5): t = threading.Thread(target=worker, args=(i,)) t.start() print(&quot;All Threads are queued, let's see when they finish!&quot;) </code></pre> <p>Here&rsquo;s a sample output from a run on my machine:</p> <pre><code class="language-text">$ python thread_test.py All Threads are queued, let's see when they finish! I am Worker 1, I slept for 1 seconds I am Worker 3, I slept for 4 seconds I am Worker 4, I slept for 5 seconds I am Worker 2, I slept for 7 seconds I am Worker 0, I slept for 9 seconds </code></pre> <p>So you can see that we start 5 threads and they make progress together; starting the threads (and thus executing the worker function) does not make execution wait for them to complete before moving on to the next print statement. So this is an async operation.</p> <p>In our example, we passed a function to the <code>Thread</code> constructor. But if we wanted, we could also subclass it and implement the code as a method (in a more OOP way).</p> <p><strong>Further Reading:</strong></p> <p>To know about threads in detail, you can follow these resources:</p> <ul> <li><a href="https://pymotw.com/3/threading/index.html">https://pymotw.com/3/threading/index.html</a></li> </ul> <h4 id="global-interpreter-lock-gil">Global Interpreter Lock (GIL)</h4> <p>The Global Interpreter Lock aka GIL was introduced to make CPython&rsquo;s memory handling easier and to allow better integration with C (for example, the extensions). The GIL is a locking mechanism that ensures the Python interpreter runs only one thread at a time. That is, only one thread can execute Python bytecode at any given time. 
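</p> <p>We can see the effect with a small experiment - a CPU bound function gains nothing from being split across two threads. The timings below are machine dependent, so treat them as illustrative only:</p>

```python
import threading
import time

def count_down(n):
    # pure Python bytecode, CPU bound - exactly what the GIL serializes
    while n:
        n -= 1

N = 5000000

start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython, the threaded run takes about as long as (often longer than)
# the sequential one, because only one thread executes bytecode at a time.
print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential, threaded))
```

<p>Swap <code>count_down</code> for an I/O bound function (say, <code>time.sleep</code>) and the threaded version does win, which is why threads remain useful for I/O.</p> <p>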
This GIL makes sure that multiple threads <strong>DO NOT</strong> run in parallel.</p> <p>Quick facts about the GIL:</p> <ul> <li>One thread can run at a time.</li> <li>The Python Interpreter switches between threads to allow concurrency.</li> <li>The GIL is only applicable to CPython (the de facto implementation). Other implementations like Jython, IronPython don&rsquo;t have a GIL.</li> <li>GIL makes single threaded programs fast.</li> <li>For I/O bound operations, GIL usually doesn&rsquo;t harm much.</li> <li>GIL makes it easy to integrate non thread safe C libraries; thanks to the GIL, we have many high performance extensions/modules written in C.</li> <li>For CPU bound tasks, the interpreter checks every <code>N</code> ticks and switches threads, so one thread does not block others.</li> </ul> <p>Many people see the <code>GIL</code> as a weakness. I see it as a blessing since it has made libraries like NumPy, SciPy possible, which have earned Python a unique position in the scientific community.</p> <p><strong>Further Reading:</strong></p> <p>These resources can help dive deeper into the GIL:</p> <ul> <li><a href="http://www.dabeaz.com/python/UnderstandingGIL.pdf">http://www.dabeaz.com/python/UnderstandingGIL.pdf</a></li> </ul> <h4 id="processes">Processes</h4> <p>To get parallelism, Python introduced the <code>multiprocessing</code> module, which provides APIs that will feel very familiar if you have used threading before.</p> <p>In fact, we will just go and change our previous example. 
Here&rsquo;s the modified version that uses <code>Process</code> instead of <code>Thread</code>.</p> <pre><code class="language-python"> import multiprocessing import time import random def worker(number): sleep = random.randrange(1, 10) time.sleep(sleep) print(&quot;I am Worker {}, I slept for {} seconds&quot;.format(number, sleep)) for i in range(5): t = multiprocessing.Process(target=worker, args=(i,)) t.start() print(&quot;All Processes are queued, let's see when they finish!&quot;) </code></pre> <p>So what&rsquo;s changed? I just imported the <code>multiprocessing</code> module instead of <code>threading</code>. And then instead of <code>Thread</code>, I used <code>Process</code>. That&rsquo;s it, really! Now instead of multi threading, we are using multiple processes which are running on different cores of your CPU (assuming you have multiple cores).</p> <p>With the <code>Pool</code> class, we can also distribute one function execution across multiple processes for different input values. If we take the example from the official docs:</p> <pre><code class="language-python">from multiprocessing import Pool def f(x): return x*x if __name__ == '__main__': p = Pool(5) print(p.map(f, [1, 2, 3])) </code></pre> <p>Here, instead of iterating over the list of values and calling <code>f</code> on them one by one, we are actually running the function on different processes. One process executes <code>f(1)</code>, another runs <code>f(2)</code> and another runs <code>f(3)</code>. Finally, the results are aggregated back into a list. 
This would allow us to break down heavy computations into smaller parts and run them in parallel for faster calculation.</p> <p><strong>Further Reading:</strong></p> <ul> <li><a href="https://pymotw.com/3/multiprocessing/index.html">https://pymotw.com/3/multiprocessing/index.html</a></li> </ul> <h4 id="the-concurrent-futures-module">The <code>concurrent.futures</code> module</h4> <p>The <code>concurrent.futures</code> module packs some really great stuff for writing async code easily. My favorites are the <code>ThreadPoolExecutor</code> and the <code>ProcessPoolExecutor</code>. These executors maintain a pool of threads or processes. We submit our tasks to the pool and it runs the tasks in an available thread/process. A <code>Future</code> object is returned, which we can use to query and get the result when the task has completed.</p> <p>Here&rsquo;s an example of <code>ThreadPoolExecutor</code>:</p> <pre><code class="language-python">from concurrent.futures import ThreadPoolExecutor from time import sleep def return_after_5_secs(message): sleep(5) return message pool = ThreadPoolExecutor(3) future = pool.submit(return_after_5_secs, (&quot;hello&quot;)) print(future.done()) sleep(5) print(future.done()) print(future.result()) </code></pre> <p>I have a blog post on the <code>concurrent.futures</code> module here: <a href="http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html">http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html</a> which might be helpful for exploring the module deeper.</p> <p><strong>Further Reading:</strong></p> <ul> <li><a href="https://pymotw.com/3/concurrent.futures/">https://pymotw.com/3/concurrent.futures/</a></li> </ul> <p><br/></p> <h3 id="asyncio-why-what-and-how">Asyncio - Why, What and How?</h3> <p>You probably have the question many people in the Python community have - what does asyncio bring to the table that is new? Why did we need one more way to do async I/O? 
Did we not have threads and processes already? Let&rsquo;s see!</p> <h4 id="why-do-we-need-asyncio">Why do we need asyncio?</h4> <p>Processes are costly to spawn, so threads are largely chosen for I/O. We know that I/O depends on external stuff - slow disks or nasty network lags often make I/O unpredictable. Now, let&rsquo;s assume that we are using threads for I/O bound operations. Three threads are doing different I/O tasks. The interpreter would need to switch between the concurrent threads and give each of them some time in turns. Let&rsquo;s call the threads - <code>T1</code>, <code>T2</code> and <code>T3</code>. The three threads have started their I/O operation. <code>T3</code> completes it first. <code>T2</code> and <code>T1</code> are still waiting for I/O. The Python interpreter switches to <code>T1</code> but it&rsquo;s still waiting. Fine, so it moves to <code>T2</code>, it&rsquo;s still waiting and then it moves to <code>T3</code> which is ready and executes the code. Do you see the problem here?</p> <p><code>T3</code> was ready but the interpreter switched between <code>T2</code> and <code>T1</code> first - that incurred switching costs which we could have avoided if the interpreter had first moved to <code>T3</code>, right?</p> <h4 id="what-is-asyncio">What is asyncio?</h4> <p>Asyncio provides us with an event loop along with other good stuff. The event loop tracks different I/O events, switches to tasks which are ready and pauses the ones which are waiting on I/O. Thus we don&rsquo;t waste time on tasks which are not ready to run right now.</p> <p>The idea is very simple. There&rsquo;s an event loop. And we have functions that run async I/O operations. We give our functions to the event loop and ask it to run those for us. The event loop gives us back a <code>Future</code> object - it&rsquo;s like a promise that we will get something back in the <em>future</em>. 
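</p> <p>In code, that promise looks roughly like this - a minimal sketch where <code>fetch_answer</code> stands in for any real I/O coroutine:</p>

```python
import asyncio

async def fetch_answer():
    await asyncio.sleep(0.1)  # stand-in for a real I/O operation
    return 42

loop = asyncio.new_event_loop()
future = asyncio.ensure_future(fetch_answer(), loop=loop)
print(future.done())  # -> False, the work has not run yet
loop.run_until_complete(future)
print(future.done(), future.result())  # -> True 42
```

<p>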
We hold on to the promise, check from time to time whether it has a value (when we feel impatient), and finally, when the future has a value, we use it in some other operations.</p> <p>Asyncio uses generators and coroutines to pause and resume tasks. You can read these posts for more details:</p> <ul> <li><a href="http://masnun.com/2015/11/20/python-asyncio-future-task-and-the-event-loop.html">http://masnun.com/2015/11/20/python-asyncio-future-task-and-the-event-loop.html</a></li> <li><a href="http://masnun.com/2015/11/13/python-generators-coroutines-native-coroutines-and-async-await.html">http://masnun.com/2015/11/13/python-generators-coroutines-native-coroutines-and-async-await.html</a></li> </ul> <h4 id="how-do-we-use-asyncio">How do we use asyncio?</h4> <p>Before we begin, let&rsquo;s see some example code:</p> <pre><code class="language-python">import asyncio import datetime import random async def my_sleep_func(): await asyncio.sleep(random.randint(0, 5)) async def display_date(num, loop): end_time = loop.time() + 50.0 while True: print(&quot;Loop: {} Time: {}&quot;.format(num, datetime.datetime.now())) if (loop.time() + 1.0) &gt;= end_time: break await my_sleep_func() loop = asyncio.get_event_loop() asyncio.ensure_future(display_date(1, loop)) asyncio.ensure_future(display_date(2, loop)) loop.run_forever() </code></pre> <p>Please note that the <code>async/await</code> syntax is Python 3.5+ only. If we walk through the code:</p> <ul> <li>We have an async function <code>display_date</code> which takes a number (as an identifier) and the event loop as parameters.</li> <li>The function has an infinite loop that breaks after 50 seconds. But during this 50 second period, it repeatedly prints out the time and takes a nap. 
The <code>await</code> keyword can wait on other async functions (coroutines) to complete.</li> <li>We pass the function to the event loop (using the <code>ensure_future</code> function).</li> <li>We start running the event loop.</li> </ul> <p>Whenever the <code>await</code> call is made, asyncio understands that the function is probably going to need some time. So it pauses the execution, starts monitoring any I/O events related to it and allows other tasks to run. When asyncio notices that the paused function&rsquo;s I/O is ready, it resumes the function.</p> <h3 id="making-the-right-choice">Making the Right Choice</h3> <p>We have walked through the most popular forms of concurrency. But the question remains - when should we choose which one? It really depends on the use cases. From my experience (and reading), I tend to follow this pseudo code:</p> <pre><code class="language-python">if io_bound:
    if io_very_slow:
        print(&quot;Use Asyncio&quot;)
    else:
        print(&quot;Use Threads&quot;)
else:
    print(&quot;Multi Processing&quot;)
</code></pre> <ul> <li>CPU Bound =&gt; Multi Processing</li> <li>I/O Bound, Fast I/O, Limited Number of Connections =&gt; Multi Threading</li> <li>I/O Bound, Slow I/O, Many connections =&gt; Asyncio</li> </ul> Creating an executable file using Cython http://masnun.rocks/2016/10/01/creating-an-executable-file-using-cython/ Sat, 01 Oct 2016 17:27:23 +0600 http://masnun.rocks/2016/10/01/creating-an-executable-file-using-cython/ <hr /> <p><strong>Disclaimer</strong>: I am quite new to Cython; if you find any part of this post incorrect or there are better ways to do something, I would really appreciate your feedback. Please do feel free to leave your thoughts in the comments section :)</p> <hr /> <p>I know Cython is supposed to be used for building extensions, but I was wondering if we can by any chance compile a Python file into an executable binary using Cython?
I searched on Google and found this <a target="_blank" href="http://stackoverflow.com/questions/5105482/compile-main-python-program-using-cython">StackOverflow</a> question. There is a detailed answer to this question which is very helpful. I tried to follow the instructions and, after (finding and) fixing some paths, I managed to do it. I am going to write down my experience here in case someone else finds it useful as well.</p> <h3 id="embedding-the-python-interpreter">Embedding the Python Interpreter</h3> <p>Cython compiles the Python or the Cython files into C and then compiles the C code to create the extensions. Interestingly, Cython has a CLI switch <code>--embed</code> which can generate a <code>main</code> function. This main function embeds the Python interpreter for us. So we can just compile the C file and get our single binary executable.</p> <h3 id="getting-started">Getting Started</h3> <p>First we need to have a Python (<code>.py</code>) or Cython (<code>.pyx</code>) file ready for compilation. Let&rsquo;s start with a plain old &ldquo;Hello World&rdquo; example.</p> <pre><code class="language-python">print(&quot;Hello World!&quot;)
</code></pre> <p>Let&rsquo;s convert this Python file to a C source file with an embedded Python interpreter.</p> <pre><code class="language-bash">cython --embed -o hello_world.c hello_world.py
</code></pre> <p>It should generate a file named <code>hello_world.c</code> in the current directory. We now compile it to an executable.</p> <pre><code class="language-bash">gcc -v -Os -I /Users/masnun/.pyenv/versions/3.5.1/include/python3.5m -L /usr/local/Frameworks/Python.framework/Versions/3.5/lib -o hello_world hello_world.c -lpython3.5 -lpthread -lm -lutil -ldl
</code></pre> <p>Please note that you must have the Python headers and dynamic libraries available in order to compile it successfully. I am on OSX and I use PyEnv.
So I passed the appropriate paths and it compiled fine.</p> <p>Now I have an executable file, which I can run:</p> <pre><code class="language-bash">$ ./hello_world
Hello World!
</code></pre> <h3 id="dynamic-linking">Dynamic Linking</h3> <p>In this case, the executable we produce is dynamically linked to our specified Python version. So this may not be fully portable (the libraries will need to be available on target machines). But this should work fine if we compile against common versions (for example the default version of Python or a version easily obtainable via the package manager).</p> <h3 id="including-other-modules">Including Other Modules</h3> <p>Up until now, I haven&rsquo;t found an easy way to include other 3rd party pure Python modules (e.g. <code>requests</code>) directly compiled into the binary. However, if I want to split my code into multiple files, I can create other <code>.pyx</code> files and use the <code>include</code> statement with those.</p> <p>For example, here&rsquo;s <code>hello.pyx</code>:</p> <pre><code class="language-cython">cdef struct Person:
    char *name
    int age

cdef say():
    cdef Person masnun = Person(name=&quot;masnun&quot;, age=20)
    print(&quot;Hello {}, you are {} years old!&quot;.format(masnun.name.decode('utf8'), masnun.age))
</code></pre> <p>And here&rsquo;s my main file - <code>test.pyx</code> -</p> <pre><code class="language-cython">include &quot;hello.pyx&quot;

say()
</code></pre> <p>Now if I compile <code>test.pyx</code> just like the above example, it will also include the code in <code>hello.pyx</code> and I can call the <code>say</code> function as if it were in <code>test.pyx</code> itself.</p> <p>However, libraries that ship as shared libraries, like PyQt, pose no such issues - we can use them as is. So basically we can take any PyQt code example and compile it with Cython - it should work fine!</p> Can Cython make Python Great in Programming Contests?
http://masnun.rocks/2016/09/28/can-cython-make-python-great-in-programming-contests/ Wed, 28 Sep 2016 08:00:30 +0600 http://masnun.rocks/2016/09/28/can-cython-make-python-great-in-programming-contests/ <p>Python is getting very popular as the first programming language, both at home and abroad. I know many of the Bangladeshi universities have started using Python to introduce beginners to the wonderful world of programming. This also seems to be <a target="_blank" href="http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-u-s-universities/fulltext">the case</a> in the US. I have talked to a few friends from other countries and they agree that Python is quickly becoming the language people learn first. A quick <a target="_blank" href="http://bfy.tw/7v1B">google search</a> could explain why Python is getting so popular among learners.</p> <h3 id="python-in-programming-contests">Python in Programming Contests</h3> <p>Recently Python has been included in the ICPC; before that, Python usually had less visibility / presence in programming contests. And of course there are valid reasons behind that. The de facto implementation of Python - &ldquo;CPython&rdquo; - is quite slow. It&rsquo;s a dynamic language and that costs in terms of execution speed. C / C++ / Java are way faster than Python, and programming contests are all about speed / performance. Python would allow you to solve problems in fewer lines of code, but you may often hit the time limit. Despite the limitation, people have continuously chosen Python to learn programming and solve problems on numerous programming related websites. This might have convinced the authorities to include Python in the ICPC. But we do not yet know which flavor (read: implementation) and version of Python will be available to the ICPC contestants.
From <a target="_blank" href="https://www.quora.com/What-do-you-think-about-the-induction-of-Python-in-ACM-ICPC-2017">different</a> <a target="_blank" href="http://codeforces.com/blog/entry/44899">sources</a> I gather that Python will be supported but the time limit issue remains - it is not guaranteed that a problem can be solved within the time limit using Python. That makes me wonder, can Cython help in such cases?</p> <h3 id="introduction-to-cython">Introduction to Cython</h3> <p>From the <a target="_blank" href="http://cython.org/">official website</a>:</p> <blockquote> <p>Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.</p> </blockquote> <p>With Cython, we can add type hints to our existing Python programs and compile them to make them run faster. But what is more awesome is the <code>Cython</code> language - it is a superset of Python and allows us to write Python-like code which performs like C.</p> <p>Don&rsquo;t just take my word for it - see for yourself in the <a target="_blank" href="http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html">Tutorial</a> and <a target="_blank" href="http://docs.cython.org/en/latest/src/userguide/language_basics.html#language-basics">Cython Language Basics</a>.</p> <h3 id="cython-is-fast">Cython is Fast</h3> <p>When I say fast, I really mean <strong>very very</strong> fast.</p> <p><center> <img src="http://masnun.rocks/images/cython-vs-c.png" alt="cython vs c" /></p> <p>Image Source: <a target="_blank" href="http://ibm.co/20XSZ4F">http://ibm.co/20XSZ4F</a></p> <p></center></p> <p>The above image is taken from an article on IBM Developer Works which shows how Cython compares to C in terms of speed.</p> <p>You can also check out these links for random benchmarks from different people:</p> <ul> <li><a target="_blank"
href="http://www.matthiaskauer.com/2014/02/a-speed-comparison-of-python-cython-and-c/">Cython beating C++</a></li> <li><a target="_blank" href="http://prabhuramachandran.blogspot.com/2008/09/python-vs-cython-vs-d-pyd-vs-c-swig.html">Cython being 30% faster than C++</a></li> <li><a target="_blank" href="http://aroberge.blogspot.com/2010/01/python-cython-faster-than-c.html">Another Benchmark</a></li> </ul> <p>And finally, do try it yourself - benchmark Cython against C++ and see how it performs!</p> <p>Bonus article &ndash; <a href="https://magic.io/blog/uvloop-blazing-fast-python-networking/">Blazing fast Python networking</a> :-)</p> <h3 id="cython-is-easy-to-setup">Cython is easy to Setup</h3> <p>OK, so is it easy to make Cython available in the contest environments? Yes, it is! The <strong>only</strong> requirement of Cython is that you must have a <strong>C compiler</strong> installed on your system along with Python. Any computer used for contest programming is supposed to have a C compiler installed anyway.</p> <p>We just need one command to install Cython:</p> <pre><code class="language-bash">pip install Cython
</code></pre> <p><strong>PS:</strong> Many scientific distributions of Python (e.g. Anaconda) already ship Cython.</p> <h3 id="cython-in-programming-contests">Cython in Programming Contests</h3> <p>Since we saw that Cython is super fast and easy to set up, programming contests can make Cython available along with CPython to allow the contestants to make their programs faster and keep up with Java / C++. It will make Python an attractive choice for serious problem solving.</p> <p>I know the <code>Cython</code> language is not exactly Python. It is a superset of the Python language. So beginners might not be familiar with the language and that&rsquo;s alright. Beginners can start with Python and start solving the easier problems with Python.
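<p>To give a flavour of what &ldquo;adding types&rdquo; looks like, here is a tiny sketch (a made-up <code>fib</code> function, not taken from any of the benchmarks above). The <code>cdef</code> declarations let Cython compile the loop down to plain C arithmetic:</p>

```cython
# fib.pyx - the typed variables let the loop run at C speed
def fib(int n):
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a
```

<p>Build it with <code>cythonize -i fib.pyx</code> (or a small <code>setup.py</code>) and import it from regular Python. The same function without the type declarations is already valid Python, which is what makes this kind of incremental optimization so approachable.</p>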
When they start competitive programming and start hitting the time limits, Cython is one of the options they can choose to make their code run faster. Of course Cython needs some understanding of how C works - that&rsquo;s fine too, because Cython still feels more productive than writing plain old C or C++.</p> <h3 id="final-words">Final words</h3> <p>PyPy is already quite popular in the Python community. Dropbox and Microsoft are also working on their Python JITs. I believe that someday Python JITs will be as fast as Java / C++. Today, Python is making programming fun for many beginners. I hope that with Cython, we can worry less about the time limits and accept Python as a fitting tool in our competitive programming contests!</p> Introduction to Django Channels http://masnun.rocks/2016/09/25/introduction-to-django-channels/ Sun, 25 Sep 2016 21:27:34 +0600 http://masnun.rocks/2016/09/25/introduction-to-django-channels/ <p>Django is a brilliant web framework. In fact, it is my favourite one for various reasons. A year and a half ago, I switched to Python and Django for all my web development. I am a big fan of the ecosystem and the many third party packages. In particular, I use Django REST Framework whenever I need to create APIs. Having said that, Django was more than good enough for basic HTTP requests. But the web has changed. We now have HTTP/2 and web sockets. Django could not support them well in the past. For the web socket part, I usually had to rely on Tornado or NodeJS (with the excellent Socket.IO library). They are good technologies, but with most of my web apps being in Django, I really wished there were something that could work with Django itself. And then we had <strong>Channels</strong>.
The project is meant to allow Django to support HTTP/2, websockets or other protocols with ease.</p> <h3 id="concepts">Concepts</h3> <p>The underlying concept is really simple - there are <code>channels</code> and there are <code>messages</code>, there are <code>producers</code> and there are <code>consumers</code> - the whole system is based on passing messages on to channels and consuming/responding to those messages.</p> <p>Let&rsquo;s look at the core components of Django Channels first:</p> <ul> <li><code>channel</code> - A channel is a FIFO queue-like data structure. We can have many channels depending on our needs.<br /></li> <li><code>message</code> - A message contains meaningful data for the consumers. Messages are passed on to the channels.</li> <li><code>consumer</code> - A consumer is usually a function that consumes a message and takes action.</li> <li><code>interface server</code> - The interface server knows how to handle different protocols. It works as a translator or a bridge between Django and the outside world.</li> </ul> <h3 id="how-does-it-work">How does it work?</h3> <p>An http request first comes to the <code>Interface Server</code>, which knows how to deal with a specific type of request. For example, for websockets and http, <strong>Daphne</strong> is a popular interface server. When a new http/websocket request comes to the interface server (Daphne in our case), it accepts the request and transforms it into a <code>message</code>. Then it passes the <code>message</code> to the appropriate <code>channel</code>. There are predefined channels for specific types. For example, all http requests are passed to the <code>http.request</code> channel. For incoming websocket messages, there is <code>websocket.receive</code>.
So these channels receive the messages when the corresponding types of requests come in to the interface server.</p> <p>Now that we have <code>channels</code> getting filled with <code>messages</code>, we need a way to process these messages and take actions (if necessary), right? Yes! For that we write some consumer functions and register them to the channels we want. When messages come to these channels, the consumers are called with the message. They can read the message and act on it.</p> <p>So far, we have seen how we can <strong>read</strong> an incoming request. But like all web applications, we should <strong>write</strong> something back too, no? How do we do that? As it happens, the interface server is quite clever. While transforming the incoming request into a message, it creates a <code>reply</code> channel for that particular client request and registers itself to that channel. Then it passes the reply channel along with the message. When our consumer function reads the incoming message, it can pass a response to the <code>reply channel</code> attached to the message. Our interface server is listening to that reply channel, remember? So when a response is sent back to the reply channel, the interface server grabs the message, transforms it into an http response and sends it back to the client. Simple, no?</p> <h3 id="writing-a-websocket-echo-server">Writing a Websocket Echo Server</h3> <p>Enough with the theory, let&rsquo;s get our hands dirty and build a simple echo server. The concept is simple: the server accepts websocket connections, the client writes something to us, and we just echo it back. A plain and simple example.</p> <h5 id="install-django-channels">Install Django &amp; Channels</h5> <pre><code class="language-bash">pip install channels
</code></pre> <p>That should do the trick and install Django + Channels.
Channels has Django as a dependency, so when you install channels, Django comes with it.</p> <h5 id="create-an-app">Create An App</h5> <p>Next we create a new django project and app -</p> <pre><code class="language-bash">django-admin.py startproject djchan
</code></pre> <pre><code class="language-bash">cd djchan
</code></pre> <pre><code class="language-bash">python manage.py startapp realtime
</code></pre> <h5 id="configure-installed-apps">Configure <code>INSTALLED_APPS</code></h5> <p>We have our Django app ready. We need to add <code>channels</code> and our django app (<code>realtime</code>) to the <code>INSTALLED_APPS</code> list under <code>settings.py</code>. Let&rsquo;s do that:</p> <pre><code class="language-python">INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    &quot;channels&quot;,
    &quot;realtime&quot;
]
</code></pre> <h5 id="write-our-consumer">Write our Consumer</h5> <p>After that, we need to write a consumer function that will process the incoming websocket messages and send back the response:</p> <pre><code class="language-python"># consumers.py
def websocket_receive(message):
    text = message.content.get('text')
    if text:
        message.reply_channel.send({&quot;text&quot;: &quot;You said: {}&quot;.format(text)})
</code></pre> <p>The code is simple enough. We receive a message, get its text content (we&rsquo;re expecting that the websocket connection will send only text data for this example) and then push it back to the <code>reply_channel</code> - just like we planned.</p> <h5 id="channels-routing">Channels Routing</h5> <p>We have our consumer function ready; now we need to tell Django how to route messages to our consumer.
Just like URL routing, we need to define our channel routings.</p> <pre><code class="language-python"># routing.py
from channels.routing import route
from .consumers import websocket_receive

channel_routing = [
    route(&quot;websocket.receive&quot;, websocket_receive, path=r&quot;^/chat/&quot;),
]
</code></pre> <p>The code should be self explanatory. We have a list of <code>route</code> objects. Here we select the channel name (<code>websocket.receive</code> =&gt; for receiving websocket messages), pass the consumer function and then configure the optional <code>path</code>. The path is an interesting bit. If we didn&rsquo;t pass a value for it, the consumer would get all the messages in the <code>websocket.receive</code> channel on any URL. So if someone created a websocket connection to <code>/</code> or <code>/private</code> or <code>/user/1234</code> - regardless of the url path, we would get all incoming messages. But that&rsquo;s not our intention, right? So we restrict the <code>path</code> to <code>/chat</code> so only connections made to that url are handled by the consumer. Please note the leading <code>/</code> - unlike url routing, in channels we have to include it.</p> <h5 id="configuring-the-channel-layers">Configuring The Channel Layers</h5> <p>We have defined a consumer and added it to a routing table. We&rsquo;re more or less ready. There&rsquo;s just a final bit of configuration we need to do. We need to tell channels two things - which backend we want to use and where it can find our channel routing.</p> <p>Let&rsquo;s briefly talk about the backend. The messages and the channels - Django needs some sort of data store or message queue to back this system. By default Django can use an in-memory backend which keeps these things in memory, but if you consider a distributed app, for scaling out, you need something else. Redis is a popular and proven piece of technology for these kinds of scenarios.
In our case we would use the Redis backend.</p> <p>So let&rsquo;s install that:</p> <pre><code class="language-sh">pip install asgi_redis
</code></pre> <p>And now we put this in our <code>settings.py</code>:</p> <pre><code class="language-python">CHANNEL_LAYERS = {
    &quot;default&quot;: {
        &quot;BACKEND&quot;: &quot;asgi_redis.RedisChannelLayer&quot;,
        &quot;CONFIG&quot;: {
            &quot;hosts&quot;: [(&quot;localhost&quot;, 6379)],
        },
        &quot;ROUTING&quot;: &quot;realtime.routing.channel_routing&quot;,
    },
}
</code></pre> <h5 id="running-the-servers">Running The Servers</h5> <p>Make sure that Redis is running (usually <code>redis-server</code> should run it). Now run the django app:</p> <pre><code class="language-sh">python manage.py runserver
</code></pre> <p>In a local environment, when you do <code>runserver</code>, Django launches both the interface server and the necessary background workers (to run the consumer functions in the background). But in production, we should run the workers separately. We will get to that soon.</p> <h5 id="trying-it-out">Trying it Out!</h5> <p>Once our dev server starts up, let’s open up the web app. If you haven’t added any django views, no worries; you should still see the “It Worked!” welcome page of Django and that should be fine for now. We need to test our websocket, and we are smart enough to do that from the dev console. Open up your Chrome Devtools (or Firefox | Safari | any other browser’s dev tools) and navigate to the JS console. Paste the following JS code:</p> <pre><code class="language-javascript">socket = new WebSocket(&quot;ws://&quot; + window.location.host + &quot;/chat/&quot;);

socket.onmessage = function(e) {
    alert(e.data);
}

socket.onopen = function() {
    socket.send(&quot;hello world&quot;);
}
</code></pre> <p>If everything worked, you should get an alert with the message we sent. Since we defined a path, the websocket connection works only on <code>/chat/</code>.
Try modifying the JS code and send a message to some other url to see how it doesn’t work. Also remove the path from our route and see how you can catch all websocket messages from all the websocket connections, regardless of which url they were connected to. Cool, no?</p> <h5 id="our-custom-channels">Our Custom Channels</h5> <p>We have seen that certain protocols have predefined channels for various purposes. But we are not limited to those. We can create our own channels. We don&rsquo;t need to do anything fancy to initialize a new channel; we just need to mention a name and send some messages to it. Django will create the channel for us.</p> <pre><code class="language-python">Channel(&quot;thumbnailer&quot;).send({
    &quot;image_id&quot;: image.id
})
</code></pre> <p>Of course we need corresponding workers to be listening to those channels. Otherwise nothing will happen. Please note that besides working with new protocols, Channels also allows us to create some sort of message based task queue. We create channels for certain tasks and our workers listen to those channels. Then we pass the data to those channels and the workers process them. So for simpler tasks, this could be a nice solution.</p> <h3 id="scaling-production-systems">Scaling Production Systems</h3> <h5 id="running-workers-seperately">Running Workers Separately</h5> <p>In a production environment, we would want to run the workers separately (since we would not run <code>runserver</code> on production anyway). To run the background workers, we have to run this command:</p> <pre><code class="language-sh">python manage.py runworker
</code></pre> <h5 id="asgi-daphne">ASGI &amp; Daphne</h5> <p>In our local environment, the <code>runserver</code> command took care of launching the interface server and background workers. But now we have to run the interface server ourselves. We mentioned <strong>Daphne</strong> already.
It works with the <code>ASGI</code> standard (which is commonly used for HTTP/2 and websockets). Just like <code>wsgi.py</code>, we now need to create an <code>asgi.py</code> module and configure it.</p> <pre><code class="language-python">import os

from channels.asgi import get_channel_layer

os.environ.setdefault(&quot;DJANGO_SETTINGS_MODULE&quot;, &quot;djchan.settings&quot;)

channel_layer = get_channel_layer()
</code></pre> <p>Now we can run the server:</p> <pre><code class="language-bash">daphne djchan.asgi:channel_layer
</code></pre> <p>If everything goes right, the interface server should start running!</p> <h5 id="asgi-or-wsgi">ASGI or WSGI</h5> <p>ASGI is still new, while WSGI is battle tested for http. So you might still want to keep using wsgi for your http-only parts and asgi for the parts where you need channels-specific features.</p> <p>The popular recommendation is that you should use <code>nginx</code> or any other reverse proxy in front and route the urls to asgi or uwsgi depending on the url or the <code>Upgrade: WebSocket</code> header.</p> <h5 id="retries-and-celery">Retries and Celery</h5> <p>The Channels system does not guarantee delivery. If there are tasks which need that certainty, it is highly recommended to use a system like Celery for those parts. Or we can also roll our own checks and retry logic if we feel like it.</p>
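<p>As a sketch of that reverse proxy recommendation (the port numbers and upstream names below are made up for illustration - adjust them to your deployment), an nginx config could route the websocket urls to the ASGI interface server and everything else to a WSGI server:</p>

```nginx
# hypothetical layout: daphne (ASGI) on port 8001, a WSGI server on port 8000
upstream wsgi_app { server 127.0.0.1:8000; }
upstream asgi_app { server 127.0.0.1:8001; }

server {
    listen 80;

    # websocket endpoints go to daphne
    location /chat/ {
        proxy_pass http://asgi_app;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;  # forward the Upgrade header
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }

    # plain http keeps going through WSGI
    location / {
        proxy_pass http://wsgi_app;
    }
}
```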