masnun.rocks() http://masnun.rocks/tags/python/index.xml Recent content on masnun.rocks() Hugo -- gohugo.io en-us Interfaces in Python: Protocols and ABCs http://masnun.rocks/2017/04/15/interfaces-in-python-protocols-and-abcs/ Sat, 15 Apr 2017 15:55:18 +0600 http://masnun.rocks/2017/04/15/interfaces-in-python-protocols-and-abcs/ <p>The idea of an interface is really simple - it is the description of how an object behaves. An interface tells us what an object can do to play its role in a system. In object-oriented programming, an interface is a set of publicly accessible methods on an object which can be used by other parts of the program to interact with that object. Interfaces set clear boundaries and help us organize our code better. In some languages like Java, interfaces are part of the language syntax and strictly enforced. However, in Python, things are a little different. In this post, we will explore how interfaces can be implemented in Python.</p> <h2 id="informal-interfaces-protocols-duck-typing">Informal Interfaces: Protocols / Duck Typing</h2> <p>There&rsquo;s no <code>interface</code> keyword in Python. The Java / C# way of using interfaces is not available here. In the dynamic language world, things are more implicit. We&rsquo;re more focused on how an object behaves, rather than its type/class.</p> <blockquote> <p>If it talks and walks like a duck, then it is a duck</p> </blockquote> <p>So if we have an object that can fly and quack like a duck, we consider it a duck. This is called &ldquo;Duck Typing&rdquo;. At runtime, instead of checking the type of an object, we try to invoke a method we expect the object to have. If it behaves the way we expect, we&rsquo;re fine and move along. But if it doesn&rsquo;t, things might blow up. 
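</p> <p>As a quick illustration (the <code>Duck</code> and <code>Robot</code> classes below are made up for this sketch, not part of any library), we simply call the method we expect and hope for the best:</p> <pre><code class="language-python">class Duck:
    def quack(self):
        return 'Quack!'


class Robot:
    pass


def make_it_quack(thing):
    # No type check at all - we just invoke the method we expect
    return thing.quack()


print(make_it_quack(Duck()))  # works fine
# make_it_quack(Robot()) would raise:
# AttributeError: 'Robot' object has no attribute 'quack'
</code></pre> <p>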
To be safe, we often handle the exceptions in a <code>try..except</code> block or use <code>hasattr</code> to check if an object has the specific method.</p> <p>In the Python world, we often hear &ldquo;file-like object&rdquo; or &ldquo;an iterable&rdquo; - if an object has a <code>read</code> method, it can be treated as a file-like object; if it has an <code>__iter__</code> magic method, it is an iterable. So any object, regardless of its class/type, can conform to a certain interface just by implementing the expected behavior (methods). These informal interfaces are termed <strong>protocols</strong>. Since they are informal, they cannot be formally enforced. They are mostly illustrated in the documentation or defined by convention. All the cool magic methods you have heard about - <code>__len__</code>, <code>__contains__</code>, <code>__iter__</code> - they all help an object conform to some sort of protocol.</p> <pre><code class="language-python">class Team:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

    def __contains__(self, member):
        return member in self.__members


justice_league_fav = Team([&quot;batman&quot;, &quot;wonder woman&quot;, &quot;flash&quot;])

# Sized protocol
print(len(justice_league_fav))

# Container protocol
print(&quot;batman&quot; in justice_league_fav)
print(&quot;superman&quot; in justice_league_fav)
print(&quot;cyborg&quot; not in justice_league_fav)
</code></pre> <p>In our above example, by implementing the <code>__len__</code> and <code>__contains__</code> methods, we can now directly use the <code>len</code> function on a <code>Team</code> instance and check for membership using the <code>in</code> operator. 
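</p> <p>The iterable protocol is just one more method away. Here is a sketch of the same <code>Team</code> class with <code>__iter__</code> added:</p> <pre><code class="language-python">class Team:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

    def __contains__(self, member):
        return member in self.__members

    def __iter__(self):
        # Iterable protocol: delegate to the underlying list's iterator
        return iter(self.__members)
</code></pre> <p>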
If we add the <code>__iter__</code> method to implement the iterable protocol, we would even be able to do something like:</p> <pre><code class="language-python">for member in justice_league_fav:
    print(member)
</code></pre> <p>Without implementing the <code>__iter__</code> method, if we try to iterate over the team, we will get an error like:</p> <pre><code>TypeError: 'Team' object is not iterable
</code></pre> <p>So we can see that protocols are like informal interfaces. We can implement a protocol by implementing the methods expected by it.</p> <h2 id="formal-interfaces-abcs">Formal Interfaces: ABCs</h2> <p>While protocols work fine in many cases, there are situations where informal interfaces or duck typing in general can cause confusion. For example, a <code>Bird</code> and an <code>Aeroplane</code> both can <code>fly()</code>. But they are not the same thing, even if they implement the same interfaces / protocols. <strong>Abstract Base Classes</strong> or <strong>ABCs</strong> can help solve this issue.</p> <p>The concept behind ABCs is simple - we define base classes which are abstract in nature. We mark certain methods on these base classes as abstract methods. Any class deriving from these base classes is then forced to implement those methods. And since we&rsquo;re using base classes, if we see that an object has our class as a base class, we can say that this object implements the interface. That is, we can now use types to tell whether an object implements a certain interface. Let&rsquo;s see an example.</p> <pre><code class="language-python">import abc


class Bird(abc.ABC):
    @abc.abstractmethod
    def fly(self):
        pass
</code></pre> <p>The <code>abc</code> module has a metaclass named <code>ABCMeta</code>. ABCs are created from this metaclass. 
So we can either use it directly as the metaclass of our ABC (something like this - <code>class Bird(metaclass=abc.ABCMeta):</code>) or we can subclass from the <code>abc.ABC</code> class, which has <code>abc.ABCMeta</code> as its metaclass already.</p> <p>Then we have to use the <code>abc.abstractmethod</code> decorator to mark our methods abstract. Now if any class derives from our base <code>Bird</code> class, it must implement the <code>fly</code> method too. The following code would fail:</p> <pre><code class="language-python">class Parrot(Bird):
    pass


p = Parrot()
</code></pre> <p>We see the following error:</p> <pre><code>TypeError: Can't instantiate abstract class Parrot with abstract methods fly
</code></pre> <p>Let&rsquo;s fix that:</p> <pre><code class="language-python">class Parrot(Bird):
    def fly(self):
        print(&quot;Flying&quot;)


p = Parrot()
</code></pre> <p>Also note:</p> <pre><code class="language-python">&gt;&gt;&gt; isinstance(p, Bird)
True
</code></pre> <p>Since our parrot is recognized as an instance of the <code>Bird</code> ABC, we can be sure from its type that it definitely implements our desired interface.</p> <p>Now let&rsquo;s define another ABC named <code>Aeroplane</code> like this:</p> <pre><code class="language-python">class Aeroplane(abc.ABC):
    @abc.abstractmethod
    def fly(self):
        pass


class Boeing(Aeroplane):
    def fly(self):
        print(&quot;Flying!&quot;)


b = Boeing()
</code></pre> <p>Now if we compare:</p> <pre><code class="language-python">&gt;&gt;&gt; isinstance(p, Aeroplane)
False
&gt;&gt;&gt; isinstance(b, Bird)
False
</code></pre> <p>We can see that even though both objects have the same <code>fly</code> method, we can now easily differentiate which one implements the <code>Bird</code> interface and which implements the <code>Aeroplane</code> interface.</p> <p>We saw how we can create our own, custom ABCs. But creating custom ABCs is often discouraged; it is usually better to use or subclass the built-in ones. 
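</p> <p>For instance, the <code>Sized</code> and <code>Container</code> ABCs from <code>collections.abc</code> already recognize a class like our earlier <code>Team</code> purely from its methods, with no subclassing or registration (the class is redefined here so the sketch stays self-contained):</p> <pre><code class="language-python">from collections.abc import Container, Sized


class Team:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

    def __contains__(self, member):
        return member in self.__members


team = Team(['batman', 'flash'])

# These ABCs use __subclasshook__ to recognize any class
# that defines the right methods
print(isinstance(team, Sized))      # True
print(isinstance(team, Container))  # True
</code></pre> <p>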
The Python standard library has many useful ABCs that we can easily reuse. We can get a list of useful built-in ABCs in the <code>collections.abc</code> module - <a href="https://docs.python.org/3/library/collections.abc.html#module-collections.abc">https://docs.python.org/3/library/collections.abc.html#module-collections.abc</a>. Before writing your own, please do check if there&rsquo;s an ABC for the same purpose in the standard library.</p> <h2 id="abcs-and-virtual-subclass">ABCs and Virtual Subclass</h2> <p>We can also register a class as a <em>virtual subclass</em> of an ABC. In that case, even if that class doesn&rsquo;t subclass our ABC, it will still be treated as a subclass of the ABC (and thus accepted to have implemented the interface). An example will demonstrate this better:</p> <pre><code class="language-python">@Bird.register
class Robin:
    pass


r = Robin()
</code></pre> <p>And then:</p> <pre><code class="language-python">&gt;&gt;&gt; issubclass(Robin, Bird)
True
&gt;&gt;&gt; isinstance(r, Bird)
True
&gt;&gt;&gt;
</code></pre> <p>In this case, even though <code>Robin</code> does not subclass our ABC or define the abstract method, we can <code>register</code> it as a <code>Bird</code>. <code>issubclass</code> and <code>isinstance</code> behavior can be overloaded by adding two relevant magic methods. 
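</p> <p>As a taste of this mechanism, an ABC can define the <code>__subclasshook__</code> classmethod (a helper that ABCs consult during these checks) to make the check structural - any class with a <code>fly</code> method passes. The <code>CanFly</code> and <code>Drone</code> names below are invented for this sketch:</p> <pre><code class="language-python">import abc


class CanFly(abc.ABC):
    @abc.abstractmethod
    def fly(self):
        pass

    @classmethod
    def __subclasshook__(cls, subclass):
        # Treat any class that defines a callable fly() as a subclass
        if cls is CanFly:
            return callable(getattr(subclass, 'fly', None))
        return NotImplemented


class Drone:
    def fly(self):
        return 'buzzing along'


print(issubclass(Drone, CanFly))    # True, without subclassing or register()
print(isinstance(Drone(), CanFly))  # True
</code></pre> <p>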
Read more on that here - <a href="https://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass">https://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass</a></p> <h2 id="further-reading">Further reading</h2> <ul> <li><a href="https://www.python.org/dev/peps/pep-3119/">PEP 3119 &ndash; Introducing Abstract Base Classes</a></li> <li><a href="https://pymotw.com/3/abc/">abc module on PyMOTW</a></li> <li><a href="https://docs.python.org/3/library/abc.html">abc module docs</a></li> </ul> Django REST Framework: Using the request object http://masnun.rocks/2017/03/27/django-rest-framework-using-request-object/ Mon, 27 Mar 2017 12:49:01 +0600 http://masnun.rocks/2017/03/27/django-rest-framework-using-request-object/ <p>While working with <a href="http://www.django-rest-framework.org/">Django REST Framework</a> aka DRF, we often wonder how to customize our response based on request parameters. Maybe we want to check something against the logged-in user (<code>request.user</code>)? Or maybe we want to modify part of our response based on a certain request parameter? How do we do that? We will discuss a few use cases below.</p> <h2 id="modelviewset-filtering-based-on-request">ModelViewSet - Filtering based on <code>request</code></h2> <p>This is very often required while using <code>ModelViewSet</code>s. We have many <code>Item</code>s in our database. 
But when listing them, we only want to display the items belonging to the current logged-in user.</p> <pre><code class="language-python">from rest_framework.permissions import IsAuthenticated
from rest_framework.viewsets import ModelViewSet

# Item and ItemSerializer come from your own app


class ItemViewSet(ModelViewSet):
    permission_classes = (IsAuthenticated,)
    serializer_class = ItemSerializer

    def get_queryset(self):
        # the request is available as self.request here
        queryset = Item.objects.all().filter(user=self.request.user)
        another_param = self.request.GET.get('another_param')
        if another_param:
            queryset = queryset.filter(another_field=another_param)
        return queryset
</code></pre> <p>If you are using the awesome <code>ModelViewSet</code>, you can override the <code>get_queryset</code> method. Inside it, you can access the <code>request</code> object as <code>self.request</code>. In the above example, we are only listing the items which have our current user set as their <code>user</code> field. At the same time, we are also filtering the queryset based on another parameter. Basically you have the queryset and <code>self.request</code> available to you; feel free to use your imagination to craft all the queries you need!</p> <h2 id="serializers-modifying-response-based-on-request">Serializers - Modifying Response based on <code>request</code></h2> <p>What if we don&rsquo;t want to display <code>item_count</code> for the users by default? What if we only want to display that field when a request parameter, <code>show_count</code>, is set? We can override the serializer to do that.</p> <pre><code class="language-python">from rest_framework.serializers import IntegerField, ModelSerializer


class UserSerializer(ModelSerializer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        context = kwargs.get('context', None)
        if context:
            request = context['request']
            show_count = request.GET.get('show_count')
            if show_count:
                # source defaults to the field name; passing
                # source=&quot;item_count&quot; explicitly would be redundant
                # (DRF raises an assertion error for it)
                self.fields['item_count'] = IntegerField()
</code></pre> <p>When a serializer is constructed by DRF, it gets the <code>request</code> in its <code>context</code>. So we should always check if it exists and use it as needed. 
We can override the serializer fields by accessing <code>self.fields</code>.</p> <p>Please note: the <code>request</code> object will be passed only if DRF constructs the serializer for you, for example when you just pass the <code>serializer_class</code> to a <code>ModelViewSet</code>. But if you are using the serializer in your custom views, please do remember to pass the request manually, otherwise it won&rsquo;t work.</p> <pre><code class="language-python">item_serializer = ItemSerializer(item, context={&quot;request&quot;: request})
</code></pre> <p>In our case we have just used <code>IntegerField</code>. You can of course use another serializer to embed the full data of a related field.</p> <h2 id="using-request-in-serializer-fields">Using <code>request</code> in Serializer Fields</h2> <p>Serializer fields have <code>context</code> too!</p> <pre><code class="language-python">from rest_framework.serializers import ReadOnlyField


class ShortURLField(ReadOnlyField):
    def to_representation(self, value):
        return self.context['request'].build_absolute_uri(value)
</code></pre> <p>and here&rsquo;s the serializer:</p> <pre><code class="language-python">class URLSerializer(ModelSerializer):
    short_url = ShortURLField()

    class Meta:
        model = URL
        fields = &quot;__all__&quot;
</code></pre> <p>In the <code>URL</code> model, there is a method named <code>short_url</code> that returns a slug for that URL. In our custom <code>ShortURLField</code>, we have customized the <code>to_representation</code> method to use the <code>build_absolute_uri(value)</code> method on the current request for creating the full URL from the slug.</p> Django Admin: Expensive COUNT(*) Queries http://masnun.rocks/2017/03/20/django-admin-expensive-count-all-queries/ Mon, 20 Mar 2017 22:43:59 +0600 http://masnun.rocks/2017/03/20/django-admin-expensive-count-all-queries/ <p>If you are a Django developer, it is very likely that you use the Django Admin regularly. 
And if you have maintained a website with a huge amount of data, you probably already know that the Django Admin can become very slow when the database table gets very large. If you log the SQL queries (either using Django logging or using Django Debug Toolbar), you will notice a very expensive SQL query, something like this:</p> <pre><code class="language-SQL">SELECT COUNT(*) AS &quot;__count&quot; FROM &quot;table_name&quot;
</code></pre> <p>With the default settings, you will actually notice this query twice. If you use Django Debug Toolbar, it will tell you that the query was duplicated 2 times.</p> <h3 id="issue-1">Issue - 1</h3> <p>By default <code>ModelAdmin</code> has <code>show_full_result_count = True</code>, which shows the full result count in the admin interface. This is the source of one of the <code>count(*)</code> queries.</p> <p>To fix that, we just need to set this on our <code>ModelAdmin</code>:</p> <pre><code class="language-Python">show_full_result_count = False
</code></pre> <h3 id="issue-2">Issue - 2</h3> <p>Even after switching <code>show_full_result_count</code> off, we still notice a <code>count(*)</code> query in the log. That&rsquo;s because the Django Paginator does a count itself.</p> <p>The solution is to somehow bypass the expensive query while still returning a number so the pagination works as expected. 
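</p> <p>One such inexpensive lookup (assuming a PostgreSQL backend; the <code>approximate_count</code> helper below is a hypothetical sketch, not from the original post) is the row estimate the query planner keeps in <code>pg_class</code>:</p> <pre><code class="language-python">def approximate_count(cursor, table_name):
    # Read PostgreSQL's planner estimate instead of running COUNT(*).
    # reltuples is maintained by VACUUM / ANALYZE, so it is approximate.
    cursor.execute(
        'SELECT reltuples FROM pg_class WHERE relname = %s', [table_name]
    )
    row = cursor.fetchone()
    return int(row[0]) if row else 0
</code></pre> <p>With Django, this could be called with a cursor from <code>django.db.connection</code>. Since the estimate only refreshes on <code>VACUUM</code>/<code>ANALYZE</code>, it can lag behind the real count.</p> <p>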
We can cache the count value, or even run a raw SQL query to fetch an approximate value through a rather inexpensive lookup somewhere else.</p> <p>Here&rsquo;s a quick example of a paginator that runs the expensive query once and then caches the result:</p> <pre><code class="language-Python">from django.core.paginator import Paginator
from django.core.cache import cache


# Modified version of a gist found in a Stack Overflow thread
class CachingPaginator(Paginator):
    def _get_count(self):
        if not hasattr(self, &quot;_count&quot;):
            self._count = None

        if self._count is None:
            try:
                key = &quot;adm:{0}:count&quot;.format(hash(str(self.object_list.query)))
                self._count = cache.get(key, -1)
                if self._count == -1:
                    self._count = super().count
                    cache.set(key, self._count, 3600)
            except Exception:
                self._count = len(self.object_list)
        return self._count

    count = property(_get_count)
</code></pre> <p>Now on our <code>ModelAdmin</code> we just need to use this paginator:</p> <pre><code class="language-Python">paginator = CachingPaginator
</code></pre> <p>Once we have done that, it will be slow the first time we load the page and faster afterwards. We can also fetch and cache this value from time to time. This solution might not get us the exact count and thus mess up pagination sometimes, but in most cases that would not be much of a problem.</p> Django Channels: Using Custom Channels http://masnun.rocks/2016/11/27/django-channels-using-custom-channels/ Sun, 27 Nov 2016 07:48:51 +0600 http://masnun.rocks/2016/11/27/django-channels-using-custom-channels/ <p>In my earlier blog post - <a href="http://masnun.rocks/2016/09/25/introduction-to-django-channels/">Introduction to Django Channels</a> - I mentioned that we can create our own channels for various purposes. In this blog post, we will discuss where custom channels can be useful, what the challenges might be, and of course we will see some code examples. But before we begin, please make sure you are familiar with the concepts of Django Channels. 
I would recommend going through the above-mentioned post and the official docs to familiarize yourself with the basics.</p> <h3 id="our-use-case">Our Use Case</h3> <p>A channel is just a queue which has consumers (workers) listening to it. With that concept in mind, we can think of many innovative use cases a queue could have. But in our example, we will keep the idea simple. We are going to use Channels as a means of background task processing.</p> <p>We will create our own channels for different tasks. There will be consumers waiting for messages on these channels. When we want to do something in the background, we pass it on to the appropriate channel and the workers take care of the task. For example, when we want to create a thumbnail of a user-uploaded photo, we pass it to the <code>thumbnails</code> channel. When we want to send a confirmation email, we send it to the <code>welcome_email</code> channel. Like that. If you are familiar with Celery or Python RQ, this will sound pretty familiar to you.</p> <p>Now here&rsquo;s my use case - in one of the projects I am working on, we&rsquo;re building APIs for mobile applications. We use BrainTree for payment integration. The mobile application sends a <code>nonce</code> - it&rsquo;s like a token that we can use to initiate the actual transaction. The transaction has two steps - first we initiate it using the nonce and get back a transaction id; then we query whether the transaction succeeded or failed. I felt it would be a good idea to process this in the background. We already have a websocket endpoint implemented using Channels. So I thought it would be great to leverage the existing setup instead of introducing something new into the stack.</p> <h3 id="challenges">Challenges</h3> <p>It has so far worked pretty well. But we have to remember that Channels does not guarantee delivery of the messages, and there is no retrying if a message fails. 
So we wrote a custom management command that checks the orders for any records that have the nonce set but no transaction id, or have a transaction id but no final result stored. We then scheduled this command to run at a certain interval and queue up the unfinished/incomplete orders again. In our case, it doesn&rsquo;t hurt if the orders take some 5 to 10 minutes to process.</p> <p>But if we were working on a product where message delivery was time-critical for our business, we probably would have considered Celery for the background processing part.</p> <h3 id="let-s-see-the-codes">Let&rsquo;s see the code!</h3> <p>First we needed to write a handler. The handler receives the messages on the subscribed channel and processes them. Here&rsquo;s the handler:</p> <pre><code class="language-python">def braintree_process(message):
    order_data = message.content.get('order')
    order_id = message.content.get('order_id')
    order_instance = Order.objects.get(pk=order_id)

    if order_data:
        nonce = order_data.get(&quot;braintree_nonce&quot;)
        if nonce:
            # [snipped]
            TRANSACTION_SUCCESS_STATUSES = [
                braintree.Transaction.Status.Authorized,
                braintree.Transaction.Status.Authorizing,
                braintree.Transaction.Status.Settled,
                braintree.Transaction.Status.SettlementConfirmed,
                braintree.Transaction.Status.SettlementPending,
                braintree.Transaction.Status.Settling,
                braintree.Transaction.Status.SubmittedForSettlement,
            ]

            result = braintree.Transaction.sale({
                'amount': str(order_data.get('total')),
                'payment_method_nonce': nonce,
                'options': {
                    &quot;submit_for_settlement&quot;: True
                }
            })

            if result.is_success or result.transaction:
                transaction = braintree.Transaction.find(result.transaction.id)
                if transaction.status in TRANSACTION_SUCCESS_STATUSES:
                    pass  # [snipped]
                else:
                    pass  # [snipped]
            else:
                errors = []
                for x in result.errors.deep_errors:
                    errors.append(str(x.code))
                # [snipped]
</code></pre> <p>Then we needed to define a routing so that messages on a certain channel are passed on to this handler. 
So in our channel routing, we added this:</p> <pre><code class="language-python">from channels.routing import route

from .channel_handlers import braintree_process

channel_routing = [
    route(&quot;braintree_process&quot;, braintree_process),
    # [snipped] ...
]
</code></pre> <p>We now have a routing set and a handler ready to accept messages. So we&rsquo;re ready! All we need to do is to start passing data to this channel.</p> <p>When the API receives a <code>nonce</code>, it just passes the order details to this channel:</p> <pre><code class="language-python">Channel(&quot;braintree_process&quot;).send({
    &quot;order&quot;: data,
    &quot;order_id&quot;: order.id
})
</code></pre> <p>And then the workers start working. They accept the message and then start processing the payment request.</p> <p>In our case, we already had the workers running (since they were serving our websocket requests). If you don&rsquo;t have any workers running, don&rsquo;t forget to run them.</p> <pre><code>python manage.py runworker
</code></pre> <p>If you are wondering about how to deploy channels, I have you covered - <a href="http://masnun.rocks/2016/11/02/deploying-django-channels-using-daphne/">Deploying Django Channels using Daphne</a></p> <h3 id="prioritizing-scaling-channels">Prioritizing / Scaling Channels</h3> <p>In our project, Django Channels does two things - handling websocket connections for realtime communication and processing delayed jobs in the background. As you can probably guess, the realtime part is more important. In our current setup, the running workers handle both types of requests as they come. But we want to dedicate more workers to the websocket and perhaps just one worker should keep processing the payments.</p> <p>Luckily, we can limit our workers to certain channels using the <code>--only-channels</code> flag. 
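</p> <p>For example, a dedicated payment worker could be started like this (using the <code>braintree_process</code> channel name from our routing above; adjust for your own setup):</p> <pre><code># A worker that only consumes the payment processing channel
python manage.py runworker --only-channels=braintree_process
</code></pre> <p>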
Or alternatively, we can exclude certain channels by using the <code>--exclude-channels</code> flag.</p> <h3 id="concluding-thoughts">Concluding Thoughts</h3> <p>I personally find the design of channels very straightforward, simple and easy to reason about. When Channels gets merged into Django, it&rsquo;s going to be quite useful, not just for implementing http/2 or websockets, but also as a way to process background tasks with ease and without introducing third party libraries.</p> Exploring Asyncio - uvloop, sanic and motor http://masnun.rocks/2016/11/17/exploring-asyncio-uvloop-sanic-motor/ Thu, 17 Nov 2016 03:33:38 +0600 http://masnun.rocks/2016/11/17/exploring-asyncio-uvloop-sanic-motor/ <p>The <code>asyncio</code> package was introduced to the standard library in Python 3.4. The package is still provisional, which means backward compatibility can be broken by future changes. However, the Python community is pretty excited about it, and I know personally that many people have started using it in production. So, I too decided to try it out. I built a rather simple microservice using the excellent <code>sanic</code> framework and <code>motor</code> (for accessing mongodb). <code>uvloop</code> is an alternative event loop implementation, written in Cython on top of libuv, that can be used as a drop-in replacement for asyncio&rsquo;s event loop. Sanic uses <code>uvloop</code> behind the scenes to go fast.</p> <p>In this blog post, I will quickly introduce the technologies involved and then walk through some sample code with relevant explanations.</p> <h3 id="what-is-asyncio-why-should-i-care">What is Asyncio? Why Should I Care?</h3> <p>In an earlier blog post - <a href="http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/">Async Python: The Different Forms of Concurrency</a> - I tried to elaborate on the different ways to achieve concurrency in the Python land. 
In the last part of that post, I tried to explain what new things asyncio brings to the table.</p> <p>Asyncio allows us to write asynchronous, concurrent programs running on a single thread, using an event loop to schedule tasks and multiplexing I/O over sockets (and other resources). The one-line explanation might be a little complex to comprehend at a glance, so I will break it down. In asyncio, everything runs on a single thread. We use coroutines, which can be treated as small units of work that we can pause and resume. Then there is I/O multiplexing - when our tasks are busy waiting for I/O, the event loop pauses them and allows other tasks to run. When the paused tasks finish their I/O, the event loop resumes them. This way even a single thread can handle/serve a large number of connections/clients by effectively juggling between &ldquo;active&rdquo; tasks and tasks that are waiting for some sort of I/O.</p> <p>In the usual synchronous style - for example, when we&rsquo;re using thread-based concurrency - each client occupies a thread, and when we have a large number of connections, we soon run out of threads, even though not all of those threads are active at a given time; some might simply be waiting for I/O, doing nothing. Asyncio helps us solve this problem and provides an efficient solution to the concurrency problem.</p> <p>While Twisted, Tornado and many other solutions have existed in the past, NodeJS brought huge attention to this kind of solution. And with asyncio being in the standard library, I believe it will become the standard way of doing async I/O in the Python world over time.</p> <h3 id="what-about-uvloop">What about uvloop?</h3> <p>We talked about the event loop above. It schedules the tasks and deals with various events. It also manages the I/O multiplexing using the various options offered by the operating system. In simple words - the event loop is the critical, central part of the whole asyncio operation. 
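</p> <p>The pause/resume behaviour described above can be seen in a tiny self-contained example (<code>asyncio.sleep</code> stands in for real I/O here):</p> <pre><code class="language-python">import asyncio


async def fetch(name, delay):
    # await pauses this coroutine; the event loop runs others meanwhile
    await asyncio.sleep(delay)
    return name


async def main():
    # Both &quot;requests&quot; run concurrently on a single thread, so the
    # whole thing takes about 0.2s, not 0.3s - the waits overlap
    return await asyncio.gather(fetch('first', 0.2), fetch('second', 0.1))


loop = asyncio.new_event_loop()
print(loop.run_until_complete(main()))  # ['first', 'second']
loop.close()
</code></pre> <p>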
The <code>asyncio</code> package ships with an event loop by default. But we can also swap it for a custom implementation if we need/prefer. <code>uvloop</code> is one such event loop, and it is very, very fast. The key to its success can be partially attributed to Cython. Cython allows us to write code in Python-like syntax that performs like C. <code>uvloop</code> was written in Cython and it uses the famous <code>libuv</code> library (also used by NodeJS).</p> <p>If you are wondering whether <code>uvloop</code>&rsquo;s performance is a good enough reason to swap out the default event loop, you may want to read this article - <a href="https://magic.io/blog/uvloop-blazing-fast-python-networking/">uvloop: Blazing fast Python networking</a> - or you can just look at this following chart taken from that blog post:</p> <p><img src="http://i.imgur.com/0iMUePy.png" /></p> <p>Yes, it can go faster than NodeJS and catch up to Golang. Convinced yet? Let&rsquo;s talk about Sanic!</p> <h3 id="sanic-gotta-go-fast">Sanic - Gotta go fast!</h3> <p>Sanic was inspired by the article I mentioned above. It uses <code>uvloop</code> and <code>httptools</code> too (referenced in the article). The framework provides a nice, Flask-like syntax along with the <code>async / await</code> syntax from Python 3.5.</p> <p><strong>Please Note:</strong> <code>uvloop</code> still doesn&rsquo;t work on Windows properly. Sanic uses the default asyncio event loop if uvloop is not available. But this probably doesn&rsquo;t matter because in most cases we deploy to Linux machines anyway. Just in case you want to try out the performance gains on Windows, I recommend you use a VM to test it inside a Linux machine.</p> <h3 id="motor">Motor</h3> <p>Motor started off as an async mongodb driver for Tornado. Motor = <strong>Mo</strong>ngodb + <strong>Tor</strong>nado. But Motor now has pretty nice support for asyncio. 
And of course we can use the <code>async / await</code> syntax too.</p> <p>I guess we have had brief introductions to the technologies we are going to use. So let&rsquo;s get started with the actual work.</p> <h3 id="setting-up">Setting Up</h3> <p>We need to install <code>sanic</code> and <code>motor</code> using <code>pip</code>.</p> <pre><code>pip install sanic
pip install motor
</code></pre> <p>Sanic should also install its dependencies, including <code>uvloop</code> and <code>ujson</code>, along with others.</p> <h3 id="set-uvloop-as-the-event-loop">Set <code>uvloop</code> as the event loop</h3> <p>We will swap out the default event loop and use <code>uvloop</code> instead.</p> <pre><code class="language-python">import asyncio
import uvloop

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
</code></pre> <p>Simple as that. We import asyncio and uvloop. We set the event loop policy to uvloop&rsquo;s event loop policy and we&rsquo;re done. Now asyncio will use uvloop as the default event loop.</p> <h3 id="connecting-to-mongodb">Connecting to Mongodb</h3> <p>We will be using <code>motor</code> to connect to our mongodb. Just like this:</p> <pre><code class="language-python">from motor.motor_asyncio import AsyncIOMotorClient

mongo_connection = AsyncIOMotorClient(&quot;&lt;mongodb connection string&gt;&quot;)
contacts = mongo_connection.mydatabase.contacts
</code></pre> <p>We import the <code>AsyncIOMotorClient</code> and pass our mongodb connection string to it. We also point a variable at our target collection so that we can easily (and directly) use that collection later. Here <code>mydatabase</code> is the db name and <code>contacts</code> is the collection name.</p> <h3 id="request-handlers">Request Handlers</h3> <p>Now we will dive right in and write our request handlers. For our demo application, I will create two routes: one for listing the contacts and one for creating new ones. 
But first we must instantiate sanic.</p> <pre><code class="language-python">from sanic import Sanic
from sanic.response import json

app = Sanic(__name__)
</code></pre> <p>Flask-like, remember? Now that we have the <code>app</code> instance, let&rsquo;s add routes to it.</p> <pre><code class="language-python">@app.route(&quot;/&quot;)
async def list(request):
    data = await contacts.find().to_list(20)
    for x in data:
        x['id'] = str(x['_id'])
        del x['_id']
    return json(data)


@app.route(&quot;/new&quot;)
async def new(request):
    contact = request.json
    insert = await contacts.insert_one(contact)
    return json({&quot;inserted_id&quot;: str(insert.inserted_id)})
</code></pre> <p>The routes are simple and, for the sake of brevity, I haven&rsquo;t written any error handling code. The <code>list</code> function is <code>async</code>. Inside it we <code>await</code> our contacts arriving from the database, as a list of 20 entries. In a sync style, we would use the <code>find</code> method directly, but now we <code>await</code> it.</p> <p>After we have the results, we quickly iterate over the documents, add an <code>id</code> key and remove the <code>_id</code> key. The <code>_id</code> key is an instance of <code>ObjectId</code>, which would require the <code>bson</code> package for serialization. To avoid complexity here, we just convert the id to a string and then delete the ObjectId instance. The rest of the document is the usual string-based key-value pairs (<code>dict</code>), so it should serialize fine.</p> <p>In the <code>new</code> function, we grab the incoming json payload and pass it to the <code>insert_one</code> method directly. <code>request.json</code> contains the <code>dict</code> representation of the json request. Check out <a href="https://github.com/channelcat/sanic/blob/master/docs/request_data.md">this page</a> for the other request data available to you. Here, we again <code>await</code> the <code>insert_one</code> call. 
When the response is available, we take the <code>inserted_id</code> and send a response back.</p> <h3 id="running-the-app">Running the App</h3> <p>Let&rsquo;s see the code first:</p> <pre><code class="language-python">loop = asyncio.get_event_loop() app.run(host=&quot;0.0.0.0&quot;, port=8000, workers=3, debug=True, loop=loop) </code></pre> <p>Here we get the default event loop and pass it to <code>app.run</code> along with other obvious options. With the <code>workers</code> argument, we can set how many workers we want to use. This allows us to spin up multiple workers and take advantage of multiple CPU cores. On a single core machine, we can just set it to 1 or skip it entirely.</p> <p>The <code>loop</code> is optional as well. If we do not pass the loop, sanic will create a new one and set it as the default loop. But in our case, we connected to MongoDB using motor before the <code>app.run</code> function could actually run. Motor is already using the default event loop. If we don&rsquo;t pass that same loop to sanic, sanic will initialize a new event loop. Our database access and sanic server will be on two different event loops and we won&rsquo;t be able to make database calls. That is why we use the <code>get_event_loop</code> function to retrieve the current default event loop and pass it to sanic. This is also why we set <code>uvloop</code> as the default event loop at the top of the file. Otherwise we would end up with the default loop (that comes with asyncio) and sanic would also have to use that. Initializing <code>uvloop</code> at the beginning makes sure everyone uses it.</p> <h3 id="final-code">Final Code</h3> <p>So here&rsquo;s the final code. We probably should clean up the imports and bring them up on top. But to relate to the different steps, I kept them as is. Also as mentioned earlier, the code has no error handling. 
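</p> <p>To give an idea of what that handling could look like, here is a minimal, hedged sketch - the <code>guarded</code> decorator and the error payload shape are my own invention for illustration, not part of sanic:</p>

```python
import asyncio
import functools

def guarded(handler):
    """Wrap an async handler so unexpected exceptions become an error
    payload instead of crashing the request (illustrative shape only)."""
    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        try:
            return await handler(*args, **kwargs)
        except Exception as exc:  # real code would catch narrower exceptions
            return {"error": str(exc), "status": 500}
    return wrapper

@guarded
async def new(payload):
    # stand-in for the real handler body; a real one would call insert_one
    if not payload:
        raise ValueError("empty payload")
    return {"inserted": payload}

loop = asyncio.new_event_loop()
print(loop.run_until_complete(new({})))
# -> {'error': 'empty payload', 'status': 500}
```

<p>In real handlers, the returned dict would go through <code>sanic.response.json</code> as before. 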
We should write proper error handling code in all serious projects.</p> <pre><code class="language-python">import asyncio import uvloop asyncio.set_event_loop_policy(uvloop.EventLoopPolicy()) from motor.motor_asyncio import AsyncIOMotorClient mongo_connection = AsyncIOMotorClient(&quot;&lt;connection string&gt;&quot;) contacts = mongo_connection.mydatabase.contacts from sanic import Sanic from sanic.response import json app = Sanic(__name__) @app.route(&quot;/&quot;) async def list(request): data = await contacts.find().to_list(20) for x in data: x['id'] = str(x['_id']) del x['_id'] return json(data) @app.route(&quot;/new&quot;) async def new(request): contact = request.json insert = await contacts.insert_one(contact) return json({&quot;inserted_id&quot;: str(insert.inserted_id)}) loop = asyncio.get_event_loop() app.run(host=&quot;0.0.0.0&quot;, port=8000, workers=3, debug=True, loop=loop) </code></pre> <p>Now let&rsquo;s try it out?</p> <h3 id="trying-out">Trying Out</h3> <p>I have saved the above code as <code>main.py</code>. So let&rsquo;s run it.</p> <pre><code class="language-sh">python main.py </code></pre> <p>Now we can use <code>curl</code> to try it out. Let&rsquo;s first add a contact:</p> <pre><code class="language-sh">curl -X POST -H &quot;Content-Type: application/json&quot; -d '{&quot;name&quot;: &quot;masnun&quot;}' &quot;http://localhost:8000/new&quot; </code></pre> <p>We should see something like:</p> <pre><code>{&quot;inserted_id&quot;:&quot;582ceb772c608731477f5384&quot;} </code></pre> <p>Let&rsquo;s verify by checking <code>/</code> -</p> <pre><code>curl -X GET &quot;http://localhost:8000/&quot; </code></pre> <p>If everything goes right, we should see something like:</p> <pre><code>[{&quot;id&quot;:&quot;582ceb772c608731477f5384&quot;,&quot;name&quot;:&quot;masnun&quot;}] </code></pre> <p>I hope it works for you too! :-)</p> <p>If you have any feedback or suggestions, please feel free to share it in the comments section. 
I would love to disqus :-)</p> Deploying Django Channels using Daphne http://masnun.rocks/2016/11/02/deploying-django-channels-using-daphne/ Wed, 02 Nov 2016 07:07:09 +0600 http://masnun.rocks/2016/11/02/deploying-django-channels-using-daphne/ <p>In one of my <a href="http://masnun.rocks/2016/09/25/introduction-to-django-channels/">earlier posts</a>, we saw an overview of how Django Channels works and how it helps us build cool stuff. However, in that post, we only covered deployment briefly. So here in this post, we shall go over deployment again, in a little more detail and of course with code samples.</p> <h3 id="what-do-we-need">What do we need?</h3> <p>For running Django Channels, we would use the following setup:</p> <ul> <li>nginx as the proxy</li> <li>daphne as the interface server</li> <li>redis as the backend</li> </ul> <p>Let&rsquo;s get started.</p> <h3 id="setup-redis-and-configure-app">Setup Redis and Configure App</h3> <p>We need to set up Redis if it&rsquo;s not installed already. Here&rsquo;s how to do it on Ubuntu:</p> <pre><code>sudo apt-get install redis-server </code></pre> <p>If we want to use the Redis backend, we also need to install <code>asgi-redis</code>.</p> <pre><code>pip install asgi_redis </code></pre> <p>In your <code>settings.py</code> file, make sure you use Redis as the backend and set the host correctly.</p> <p>Here&rsquo;s a demo:</p> <pre><code>CHANNEL_LAYERS = { &quot;default&quot;: { &quot;BACKEND&quot;: &quot;asgi_redis.RedisChannelLayer&quot;, &quot;CONFIG&quot;: { &quot;hosts&quot;: [(&quot;localhost&quot;, 6379)], }, &quot;ROUTING&quot;: &quot;realtime.routing.channel_routing&quot;, }, } </code></pre> <h3 id="starting-daphne">Starting Daphne</h3> <p>If you have installed <code>channels</code> from pip, you should have the <code>daphne</code> command available already. 
In the very unlikely case you don&rsquo;t have it installed, here&rsquo;s the command:</p> <pre><code>pip install daphne </code></pre> <p>To run daphne, we use the following command:</p> <pre><code>daphne -b 0.0.0.0 -p 8001 &lt;app&gt;.asgi:channel_layer </code></pre> <p>Daphne will bind to <code>0.0.0.0</code> and use <code>8001</code> as the port.</p> <p>Here <code>&lt;app&gt;</code> is our app name / the module that contains the <code>asgi.py</code> file. Please refer to the previous blog post to know what we put in the <code>asgi.py</code> file.</p> <p>We now need to make sure <code>daphne</code> is automatically started at system launch and restarted when it crashes. In this example, I would stick to my old upstart script. But you would probably want to explore excellent projects like <code>circus</code> or <code>supervisor</code> or at least <code>systemd</code>.</p> <p>Here&rsquo;s the upstart script I use:</p> <pre><code>start on runlevel [2345] stop on runlevel [016] respawn script cd /home/ubuntu/&lt;app home&gt; export DJANGO_SETTINGS_MODULE=&quot;&lt;app&gt;.production_settings&quot; exec daphne -b 0.0.0.0 -p 8001 &lt;app&gt;.asgi:channel_layer end script </code></pre> <h3 id="running-workers">Running Workers</h3> <p>We need at least one running worker before daphne can start processing requests. To run a worker, we use the following command:</p> <pre><code>python manage.py runworker </code></pre> <p>The <code>runworker</code> command spawns one worker with one thread. We should have more than one ideally. 
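</p> <p>With supervisord (one of the tools mentioned above), launching several workers is a single config section. A hypothetical fragment, reusing the placeholders from this post:</p>

```ini
[program:channels_worker]
directory=/home/ubuntu/&lt;app home&gt;
environment=DJANGO_SETTINGS_MODULE="&lt;app&gt;.production_settings"
command=python3 manage.py runworker
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
```

<p>The <code>numprocs</code> / <code>process_name</code> pair is what makes supervisord spawn several worker processes from one section. 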
It is recommended to run <code>n</code> workers, where <code>n</code> is the number of available CPU cores.</p> <p>Here&rsquo;s a simple upstart script to keep the worker running:</p> <pre><code>start on runlevel [2345] stop on runlevel [016] respawn script cd /home/ubuntu/&lt;app home&gt; export DJANGO_SETTINGS_MODULE=&quot;&lt;app&gt;.production_settings&quot; exec python3 manage.py runworker end script </code></pre> <p>It would be much easier to launch multiple workers if you use supervisord or circus.</p> <h3 id="nginx-conf">Nginx Conf</h3> <p>Finally, here&rsquo;s the nginx conf I use. Please note that I handle all incoming requests with daphne, which is probably not ideal. You can keep using <code>uwsgi</code> for your existing, non real time parts and only handle the real time part with daphne. Since setting up wsgi is common knowledge, I will just focus on what we need for daphne.</p> <pre><code>server { listen 80; client_max_body_size 20M; location /static { alias /home/ubuntu/&lt;app home&gt;/static; } location /media { alias /home/ubuntu/&lt;app home&gt;/media; } location / { proxy_pass http://0.0.0.0:8001; proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection &quot;upgrade&quot;; proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Host $server_name; } } </code></pre> <p>We have our daphne server running on port <code>8001</code>, so we set up a proxy to that URL. Now if daphne and a worker are running, we should be able to see our webpage when we visit the URL.</p> Getting Help Interactively in Python http://masnun.rocks/2016/11/01/getting-help-interactively-in-python/ Tue, 01 Nov 2016 17:00:51 +0600 http://masnun.rocks/2016/11/01/getting-help-interactively-in-python/ <p>Working with a module that you&rsquo;re not familiar with? No internet? Somehow the docs are not accessible? 
Or simply feeling adventurous? Python has you covered. There are a few ways to get help interactively. In this post, we will try a few of them.</p> <h3 id="the-dir-built-in">The <code>dir</code> built-in</h3> <p>The <code>dir</code> built-in is a very helpful one. If you call it without any arguments, that is just <code>dir()</code>, it will return the names available in the current scope. When passed an argument, it displays the available attributes of that object (inherited or its own).</p> <pre><code class="language-python">&gt;&gt;&gt; import os &gt;&gt;&gt; dir(os) ['CLD_CONTINUED', 'CLD_DUMPED', 'CLD_EXITED', 'CLD_TRAPPED', 'EX_CANTCREAT', 'EX_CONFIG', 'EX_DATAERR', 'EX_IOERR', 'EX_NOHOST', 'EX_NOINPUT', 'EX_NOPERM', 'EX_NOUSER', 'EX_OK', 'EX_OSERR', 'EX_OSFILE', 'EX_PROTOCOL', 'EX_SOFTWARE', 'EX_TEMPFAIL', 'EX_UNAVAILABLE', 'EX_USAGE', 'F_LOCK', 'F_OK', 'F_TEST', 'F_TLOCK', 'F_ULOCK', 'MutableMapping', 'NGROUPS_MAX', 'O_ACCMODE', 'O_APPEND', 'O_ASYNC', 'O_CLOEXEC', 'O_CREAT', 'O_DIRECTORY', 'O_DSYNC', 'O_EXCL', 'O_EXLOCK', 'O_NDELAY', 'O_NOCTTY', 'O_NOFOLLOW', 'O_NONBLOCK', 'O_RDONLY', 'O_RDWR', 'O_SHLOCK', 'O_SYNC', 'O_TRUNC', 'O_WRONLY', 'PRIO_PGRP', 'PRIO_PROCESS', 'PRIO_USER', 'P_ALL', 'P_NOWAIT', 'P_NOWAITO', 'P_PGID', 'P_PID', 'P_WAIT', 'RTLD_GLOBAL', 'RTLD_LAZY', 'RTLD_LOCAL', 'RTLD_NODELETE', 'RTLD_NOLOAD', 'RTLD_NOW', 'R_OK', 'SCHED_FIFO', 'SCHED_OTHER', 'SCHED_RR', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'ST_NOSUID', 'ST_RDONLY', 'TMP_MAX', 'WCONTINUED', 'WCOREDUMP', 'WEXITED', 'WEXITSTATUS', 'WIFCONTINUED', 'WIFEXITED', 'WIFSIGNALED', 'WIFSTOPPED', 'WNOHANG', 'WNOWAIT', 'WSTOPPED', 'WSTOPSIG', 'WTERMSIG', 'WUNTRACED', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_execvpe', '_exists', '_exit', '_fwalk', '_get_exports_list', '_putenv', '_spawnvef', '_unsetenv', '_wrap_close', 'abort', 'access', 'altsep', 'chdir', 'chflags', 
'chmod', 'chown', 'chroot', 'close', 'closerange', 'confstr', 'confstr_names', 'cpu_count', 'ctermid', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'environb', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fchdir', 'fchmod', 'fchown', 'fdopen', 'fork', 'forkpty', 'fpathconf', 'fsdecode', 'fsencode', 'fstat', 'fstatvfs', 'fsync', 'ftruncate', 'fwalk', 'get_blocking', 'get_exec_path', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getegid', 'getenv', 'getenvb', 'geteuid', 'getgid', 'getgrouplist', 'getgroups', 'getloadavg', 'getlogin', 'getpgid', 'getpgrp', 'getpid', 'getppid', 'getpriority', 'getsid', 'getuid', 'initgroups', 'isatty', 'kill', 'killpg', 'lchflags', 'lchmod', 'lchown', 'linesep', 'link', 'listdir', 'lockf', 'lseek', 'lstat', 'major', 'makedev', 'makedirs', 'minor', 'mkdir', 'mkfifo', 'mknod', 'name', 'nice', 'open', 'openpty', 'pardir', 'path', 'pathconf', 'pathconf_names', 'pathsep', 'pipe', 'popen', 'pread', 'putenv', 'pwrite', 'read', 'readlink', 'readv', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sched_get_priority_max', 'sched_get_priority_min', 'sched_yield', 'sendfile', 'sep', 'set_blocking', 'set_inheritable', 'setegid', 'seteuid', 'setgid', 'setgroups', 'setpgid', 'setpgrp', 'setpriority', 'setregid', 'setreuid', 'setsid', 'setuid', 'spawnl', 'spawnle', 'spawnlp', 'spawnlpe', 'spawnv', 'spawnve', 'spawnvp', 'spawnvpe', 'st', 'stat', 'stat_float_times', 'stat_result', 'statvfs', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sync', 'sys', 'sysconf', 'sysconf_names', 'system', 'tcgetpgrp', 'tcsetpgrp', 'terminal_size', 'times', 'times_result', 'truncate', 'ttyname', 'umask', 'uname', 'uname_result', 'unlink', 'unsetenv', 'urandom', 'utime', 'wait', 'wait3', 'wait4', 'waitpid', 'walk', 'write', 
'writev'] &gt;&gt;&gt; </code></pre> <p>Coupled with <code>getattr</code>, you can actually write your own custom utilities to better inspect objects.</p> <h3 id="the-help-built-in">The <code>help</code> built-in</h3> <p>I guess I don&rsquo;t have to tell you how <code>help</code>-ful this one can be?</p> <blockquote> <p>Did you know the <code>help</code> built-in is based on <code>pydoc.help</code>?</p> </blockquote> <p>If you just call <code>help</code> without any arguments, it will launch an interactive help prompt where you can just type in names and it will display help for that. Here&rsquo;s an example:</p> <pre><code class="language-python">&gt;&gt;&gt; help() Welcome to Python 3.5's help utility! If this is your first time using Python, you should definitely check out the tutorial on the Internet at http://docs.python.org/3.5/tutorial/. Enter the name of any module, keyword, or topic to get help on writing Python programs and using Python modules. To quit this help utility and return to the interpreter, just type &quot;quit&quot;. To get a list of available modules, keywords, symbols, or topics, type &quot;modules&quot;, &quot;keywords&quot;, &quot;symbols&quot;, or &quot;topics&quot;. Each module also comes with a one-line summary of what it does; to list the modules whose name or summary contain a given string such as &quot;spam&quot;, type &quot;modules spam&quot;. help&gt; list help&gt; </code></pre> <p>When you type in <code>list</code> and hit enter, it will show you the docs for the <code>list</code> built-in. To quit, press <code>q</code>. As described in the text above, typing in &ldquo;modules&rdquo;, &ldquo;keywords&rdquo; etc. will list what is available.</p> <p>Interestingly, the help functionality is built on top of <code>pydoc</code>, so it can help you with most of the installed modules (even the third party ones) as long as the modules have docstrings available. 
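</p> <p>The same applies to our own code - <code>help</code> (via <code>pydoc</code>) picks up whatever docstrings we write. A tiny sketch:</p>

```python
import pydoc

def greet(name):
    """Return a friendly greeting for the given name."""
    return "Hello, {}!".format(name)

# help(greet) would page this interactively; render_doc returns the same text
doc_text = pydoc.render_doc(greet)
print("Return a friendly greeting" in doc_text)  # -> True
```

<p>Anything we document ourselves is immediately browsable the same way. 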
Brilliant, no?</p> <p>Now if you call the <code>help</code> callable with an argument, it will display help for that item. The above example for viewing the docs for <code>list</code> can be done this way too:</p> <pre><code class="language-python">&gt;&gt;&gt; help(list) </code></pre> <p>Neat, huh?</p> <h3 id="using-the-pydoc-module">Using the <code>pydoc</code> Module</h3> <p>In the previous section, we mentioned <code>pydoc</code>. From the name, you can probably guess what it does. Just to be certain, let&rsquo;s try this:</p> <pre><code class="language-python">&gt;&gt;&gt; import pydoc &gt;&gt;&gt; help(pydoc) </code></pre> <p>As you can read in there, the <code>pydoc</code> module generates documentation in HTML or text format for interactive use (as in the previous section). It can read Python source files, parse the docstrings and generate helpful information for us. The <code>pydoc</code> module comes with your Python installation, so it is always available to you.</p> <p>There are some interesting use cases for this module. You can run it from the command line. Just use <code>pydoc &lt;name&gt;</code> where <code>&lt;name&gt;</code> is the name of a function, module, class etc. It will display the same generated docs we get from <code>help(&lt;name&gt;)</code>.</p> <p>And then <code>pydoc -k &lt;keyword&gt;</code> searches for the keyword in the available modules&rsquo; synopses.</p> <p>If you would like to browse the docs in a web browser, you can run <code>pydoc -b</code> and it will start a server and open your browser, pointing to the address of the server. If you would like to set the port yourself, use <code>pydoc -p &lt;port&gt;</code> and then, in the prompt, type &ldquo;b&rdquo; to open the browser. You can browse the docs and search as needed.</p> <h3 id="the-inspect-module">The <code>inspect</code> Module</h3> <p>The <code>inspect</code> module has some interesting use cases too. 
It can help us know more about different objects at runtime.</p> <p>The following functions check for object types:</p> <ul> <li><code>ismodule()</code></li> <li><code>isclass()</code></li> <li><code>ismethod()</code></li> <li><code>isfunction()</code></li> <li><code>isgeneratorfunction()</code></li> <li><code>isgenerator()</code></li> <li><code>istraceback()</code></li> <li><code>isframe()</code></li> <li><code>iscode()</code></li> <li><code>isbuiltin()</code></li> <li><code>isroutine()</code></li> </ul> <p>We can use the <code>getmembers()</code> function to get all the members of an object, class or module. We can filter the members by passing one of the above functions as the second argument.</p> <pre><code class="language-python">&gt;&gt;&gt; len(inspect.getmembers(os)) 284 &gt;&gt;&gt; len(inspect.getmembers(os, inspect.isclass)) 9 &gt;&gt;&gt; </code></pre> <p>The <code>getdoc</code> function can be used to retrieve available documentation from an object.</p> <pre><code class="language-python">&gt;&gt;&gt; inspect.getdoc(list) &quot;list() -&gt; new empty list\nlist(iterable) -&gt; new list initialized from iterable's items&quot; </code></pre> <p>The inspect module has some other cool functions too. Do check them out. And of course, you know how! ;-)</p> <pre><code class="language-python">&gt;&gt;&gt; import inspect &gt;&gt;&gt; help(inspect) </code></pre> Async Python: The Different Forms of Concurrency http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/ Thu, 06 Oct 2016 12:10:03 +0600 http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/ <p>With the advent of Python 3, we&rsquo;re hearing a lot of buzz about &ldquo;async&rdquo; and &ldquo;concurrency&rdquo;, and one might simply assume that Python only recently introduced these concepts/capabilities. But that would be quite far from the truth. We have had async and concurrent operations for quite some time now. 
Many beginners may also think that <code>asyncio</code> is the only/best way to do async/concurrent operations. In this post we shall explore the different ways we can achieve concurrency and their benefits and drawbacks.</p> <h3 id="defining-the-terms">Defining The Terms</h3> <p>Before we dive into the technical aspects, it is essential to have some basic understanding of the terms frequently used in this context.</p> <h4 id="sync-vs-async">Sync vs Async</h4> <p>In synchronous operations, tasks are executed one after another. In asynchronous operations, tasks may start and complete independently of each other. One async task may start and continue running while the execution moves on to a new task. Async tasks don&rsquo;t block (that is, make the execution wait for their completion) other operations and usually run in the background.</p> <p>For example, say you have to call a travel agency to book your next vacation. And you need to send an email to your boss before you go on the tour. In synchronous fashion, you would first call the travel agency, and if they put you on hold for a moment, you keep waiting and waiting. Once it&rsquo;s done, you start writing the email to your boss. Here you complete one task after another. But if you are clever, then while you are waiting on hold you can start writing up the email; when they talk to you, you pause writing the email, talk to them and then resume the email writing. You could also ask a friend to make the call while you finish that email. This is asynchronicity. Tasks don&rsquo;t block one another.</p> <h4 id="concurrency-and-parallelism">Concurrency and Parallelism</h4> <p>Concurrency implies that two tasks make progress together. In our previous example, when we considered the async example, we were making progress on both the call with the travel agent and writing the email. 
This is concurrency.</p> <p>When we talked about asking a friend to help with the call, both tasks would be running in parallel.</p> <p>Parallelism is in fact a form of concurrency. But parallelism is hardware dependent. For example, if there&rsquo;s only one core in the CPU, two operations can&rsquo;t really run in parallel. They just share time slices from the same core. This is concurrency but not parallelism. But when we have multiple cores, we can actually run two or more operations (depending on the number of cores) in parallel.</p> <h4 id="quick-recap">Quick Recap</h4> <p>So this is what we have realized so far:</p> <ul> <li> <b>Sync:</b> Blocking operations.</li> <li> <b>Async:</b> Non blocking operations.</li> <li> <b>Concurrency:</b> Making progress together.</li> <li> <b>Parallelism:</b> Making progress in parallel.</li> </ul> <p><br/></p> <p><center> <em>Parallelism implies Concurrency. But Concurrency doesn&rsquo;t always mean Parallelism.</em> </center></p> <p><br/></p> <h3 id="threads-processes">Threads &amp; Processes</h3> <p>Python has had <strong>Threads</strong> for a very long time. Threads allow us to run our operations concurrently. But because of the <strong>Global Interpreter Lock (GIL)</strong>, threading cannot provide true parallelism. However, with <strong>multiprocessing</strong>, it is now possible to leverage multiple cores with Python.</p> <h4 id="threads">Threads</h4> <p>Let&rsquo;s see a quick example. 
In the following code, the <code>worker</code> function will be run on multiple threads, asynchronously and concurrently.</p> <pre><code class="language-python">import threading import time import random def worker(number): sleep = random.randrange(1, 10) time.sleep(sleep) print(&quot;I am Worker {}, I slept for {} seconds&quot;.format(number, sleep)) for i in range(5): t = threading.Thread(target=worker, args=(i,)) t.start() print(&quot;All Threads are queued, let's see when they finish!&quot;) </code></pre> <p>Here&rsquo;s a sample output from a run on my machine:</p> <pre><code class="language-text">$ python thread_test.py All Threads are queued, let's see when they finish! I am Worker 1, I slept for 1 seconds I am Worker 3, I slept for 4 seconds I am Worker 4, I slept for 5 seconds I am Worker 2, I slept for 7 seconds I am Worker 0, I slept for 9 seconds </code></pre> <p>So you can see that we start 5 threads and they make progress together; starting the threads (and thus executing the worker function) does not make execution wait for them to complete before moving on to the next print statement. So this is an async operation.</p> <p>In our example, we passed a function to the <code>Thread</code> constructor. But if we wanted, we could also subclass it and implement the code as a method (in a more OOP way).</p> <p><strong>Further Reading:</strong></p> <p>To know about threads in detail, you can follow these resources:</p> <ul> <li><a href="https://pymotw.com/3/threading/index.html">https://pymotw.com/3/threading/index.html</a></li> </ul> <h4 id="global-interpreter-lock-gil">Global Interpreter Lock (GIL)</h4> <p>The Global Interpreter Lock aka GIL was introduced to make CPython&rsquo;s memory handling easier and to allow better integration with C (for example, the extensions). The GIL is a locking mechanism that ensures the Python interpreter runs only one thread at a time. That is, only one thread can execute Python bytecode at any given time. 
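</p> <p>We can see the effect with a small experiment - a CPU bound function gains nothing from being split across two threads. The timings below are machine dependent, so treat them as illustrative only:</p>

```python
import threading
import time

def count_down(n):
    # pure Python bytecode, CPU bound - exactly what the GIL serializes
    while n:
        n -= 1

N = 5000000

start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython, the threaded run takes about as long as (often longer than)
# the sequential one, because only one thread executes bytecode at a time.
print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential, threaded))
```

<p>Swap <code>count_down</code> for an I/O bound function (say, <code>time.sleep</code>) and the threaded version does win, which is why threads remain useful for I/O.</p> <p>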
This GIL makes sure that multiple threads <strong>DO NOT</strong> run in parallel.</p> <p>Quick facts about the GIL:</p> <ul> <li>One thread can run at a time.</li> <li>The Python Interpreter switches between threads to allow concurrency.</li> <li>The GIL is only applicable to CPython (the de facto implementation). Other implementations like Jython, IronPython don&rsquo;t have a GIL.</li> <li>GIL makes single threaded programs fast.</li> <li>For I/O bound operations, GIL usually doesn&rsquo;t harm much.</li> <li>GIL makes it easy to integrate non thread safe C libraries; thanks to the GIL, we have many high performance extensions/modules written in C.</li> <li>For CPU bound tasks, the interpreter checks every <code>N</code> ticks and switches threads, so one thread does not block others.</li> </ul> <p>Many people see the <code>GIL</code> as a weakness. I see it as a blessing since it has made libraries like NumPy, SciPy possible, which have earned Python a unique position in the scientific community.</p> <p><strong>Further Reading:</strong></p> <p>These resources can help dive deeper into the GIL:</p> <ul> <li><a href="http://www.dabeaz.com/python/UnderstandingGIL.pdf">http://www.dabeaz.com/python/UnderstandingGIL.pdf</a></li> </ul> <h4 id="processes">Processes</h4> <p>To get parallelism, Python introduced the <code>multiprocessing</code> module, which provides APIs that will feel very familiar if you have used threading before.</p> <p>In fact, we will just go and change our previous example. 
Here&rsquo;s the modified version that uses <code>Process</code> instead of <code>Thread</code>.</p> <pre><code class="language-python"> import multiprocessing import time import random def worker(number): sleep = random.randrange(1, 10) time.sleep(sleep) print(&quot;I am Worker {}, I slept for {} seconds&quot;.format(number, sleep)) for i in range(5): t = multiprocessing.Process(target=worker, args=(i,)) t.start() print(&quot;All Processes are queued, let's see when they finish!&quot;) </code></pre> <p>So what&rsquo;s changed? I just imported the <code>multiprocessing</code> module instead of <code>threading</code>. And then instead of <code>Thread</code>, I used <code>Process</code>. That&rsquo;s it, really! Now instead of multi threading, we are using multiple processes which are running on different cores of your CPU (assuming you have multiple cores).</p> <p>With the <code>Pool</code> class, we can also distribute one function execution across multiple processes for different input values. If we take the example from the official docs:</p> <pre><code class="language-python">from multiprocessing import Pool def f(x): return x*x if __name__ == '__main__': p = Pool(5) print(p.map(f, [1, 2, 3])) </code></pre> <p>Here, instead of iterating over the list of values and calling <code>f</code> on them one by one, we are actually running the function on different processes. One process executes <code>f(1)</code>, another runs <code>f(2)</code> and another runs <code>f(3)</code>. Finally, the results are aggregated back into a list. 
This would allow us to break down heavy computations into smaller parts and run them in parallel for faster calculation.</p> <p><strong>Further Reading:</strong></p> <ul> <li><a href="https://pymotw.com/3/multiprocessing/index.html">https://pymotw.com/3/multiprocessing/index.html</a></li> </ul> <h4 id="the-concurrent-futures-module">The <code>concurrent.futures</code> module</h4> <p>The <code>concurrent.futures</code> module packs some really great stuff for writing async code easily. My favorites are the <code>ThreadPoolExecutor</code> and the <code>ProcessPoolExecutor</code>. These executors maintain a pool of threads or processes. We submit our tasks to the pool and it runs the tasks in an available thread/process. A <code>Future</code> object is returned, which we can use to query and get the result when the task has completed.</p> <p>Here&rsquo;s an example of <code>ThreadPoolExecutor</code>:</p> <pre><code class="language-python">from concurrent.futures import ThreadPoolExecutor from time import sleep def return_after_5_secs(message): sleep(5) return message pool = ThreadPoolExecutor(3) future = pool.submit(return_after_5_secs, (&quot;hello&quot;)) print(future.done()) sleep(5) print(future.done()) print(future.result()) </code></pre> <p>I have a blog post on the <code>concurrent.futures</code> module here: <a href="http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html">http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html</a> which might be helpful for exploring the module deeper.</p> <p><strong>Further Reading:</strong></p> <ul> <li><a href="https://pymotw.com/3/concurrent.futures/">https://pymotw.com/3/concurrent.futures/</a></li> </ul> <p><br/></p> <h3 id="asyncio-why-what-and-how">Asyncio - Why, What and How?</h3> <p>You probably have the question many people in the Python community have - what does asyncio bring to the table that is new? Why did we need one more way to do async I/O? 
Did we not have threads and processes already? Let&rsquo;s see!</p> <h4 id="why-do-we-need-asyncio">Why do we need asyncio?</h4> <p>Processes are costly to spawn, so threads are largely chosen for I/O. We know that I/O depends on external stuff - slow disks or nasty network lags often make I/O unpredictable. Now, let&rsquo;s assume that we are using threads for I/O bound operations. Three threads are doing different I/O tasks. The interpreter would need to switch between the concurrent threads and give each of them some time in turns. Let&rsquo;s call the threads - <code>T1</code>, <code>T2</code> and <code>T3</code>. The three threads have started their I/O operation. <code>T3</code> completes it first. <code>T2</code> and <code>T1</code> are still waiting for I/O. The Python interpreter switches to <code>T1</code> but it&rsquo;s still waiting. Fine, so it moves to <code>T2</code>, it&rsquo;s still waiting and then it moves to <code>T3</code> which is ready and executes the code. Do you see the problem here?</p> <p><code>T3</code> was ready but the interpreter switched between <code>T2</code> and <code>T1</code> first - that incurred switching costs which we could have avoided if the interpreter had first moved to <code>T3</code>, right?</p> <h4 id="what-is-asyncio">What is asyncio?</h4> <p>Asyncio provides us with an event loop along with other good stuff. The event loop tracks different I/O events, switches to tasks which are ready and pauses the ones which are waiting on I/O. Thus we don&rsquo;t waste time on tasks which are not ready to run right now.</p> <p>The idea is very simple. There&rsquo;s an event loop. And we have functions that run async I/O operations. We give our functions to the event loop and ask it to run those for us. The event loop gives us back a <code>Future</code> object - it&rsquo;s like a promise that we will get something back in the <em>future</em>. 
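</p> <p>In code, that promise looks roughly like this - a minimal sketch where <code>fetch_answer</code> stands in for any real I/O coroutine:</p>

```python
import asyncio

async def fetch_answer():
    await asyncio.sleep(0.1)  # stand-in for a real I/O operation
    return 42

loop = asyncio.new_event_loop()
future = asyncio.ensure_future(fetch_answer(), loop=loop)
print(future.done())  # -> False, the work has not run yet
loop.run_until_complete(future)
print(future.done(), future.result())  # -> True 42
```

<p>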
We hold on to the promise, check from time to time whether it has a value (when we feel impatient), and finally, when the future has a value, we use it in some other operations.</p> <p>Asyncio uses generators and coroutines to pause and resume tasks. You can read these posts for more details:</p> <ul> <li><a href="http://masnun.com/2015/11/20/python-asyncio-future-task-and-the-event-loop.html">http://masnun.com/2015/11/20/python-asyncio-future-task-and-the-event-loop.html</a></li> <li><a href="http://masnun.com/2015/11/13/python-generators-coroutines-native-coroutines-and-async-await.html">http://masnun.com/2015/11/13/python-generators-coroutines-native-coroutines-and-async-await.html</a></li> </ul> <h4 id="how-do-we-use-asyncio">How do we use asyncio?</h4> <p>Before we begin, let&rsquo;s see some example code:</p> <pre><code class="language-python">import asyncio import datetime import random async def my_sleep_func(): await asyncio.sleep(random.randint(0, 5)) async def display_date(num, loop): end_time = loop.time() + 50.0 while True: print(&quot;Loop: {} Time: {}&quot;.format(num, datetime.datetime.now())) if (loop.time() + 1.0) &gt;= end_time: break await my_sleep_func() loop = asyncio.get_event_loop() asyncio.ensure_future(display_date(1, loop)) asyncio.ensure_future(display_date(2, loop)) loop.run_forever() </code></pre> <p>Please note that the <code>async/await</code> syntax is Python 3.5+ only. If we walk through the code:</p> <ul> <li>We have an async function <code>display_date</code> which takes a number (as an identifier) and the event loop as parameters.</li> <li>The function has an infinite loop that breaks after 50 seconds. But during this 50 second period, it repeatedly prints out the time and takes a nap. 
The <code>await</code> keyword can wait on other async functions (coroutines) to complete.</li> <li>We pass the function to the event loop (using the <code>ensure_future</code> function).</li> <li>We start running the event loop.</li> </ul> <p>Whenever the <code>await</code> call is made, asyncio understands that the function is probably going to need some time. So it pauses the execution, starts monitoring any I/O events related to it and allows other tasks to run. When asyncio notices that the paused function&rsquo;s I/O is ready, it resumes the function.</p> <h3 id="making-the-right-choice">Making the Right Choice</h3> <p>We have walked through the most popular forms of concurrency. But the question remains - when should we choose which one? It really depends on the use cases. From my experience (and reading), I tend to follow this pseudo code:</p> <pre><code class="language-python">if io_bound:
    if io_very_slow:
        print(&quot;Use Asyncio&quot;)
    else:
        print(&quot;Use Threads&quot;)
else:
    print(&quot;Multi Processing&quot;)
</code></pre> <ul> <li>CPU Bound =&gt; Multi Processing</li> <li>I/O Bound, Fast I/O, Limited Number of Connections =&gt; Multi Threading</li> <li>I/O Bound, Slow I/O, Many connections =&gt; Asyncio</li> </ul> Creating an executable file using Cython http://masnun.rocks/2016/10/01/creating-an-executable-file-using-cython/ Sat, 01 Oct 2016 17:27:23 +0600 http://masnun.rocks/2016/10/01/creating-an-executable-file-using-cython/ <hr /> <p><strong>Disclaimer</strong>: I am quite new to Cython; if you find any part of this post incorrect or there are better ways to do something, I would really appreciate your feedback. Please do feel free to leave your thoughts in the comments section :)</p> <hr /> <p>I know Cython is supposed to be used for building extensions, but I was wondering if we can by any chance compile a Python file into an executable binary using Cython?
I searched on Google and found this <a target="_blank" href="http://stackoverflow.com/questions/5105482/compile-main-python-program-using-cython">StackOverflow</a> question. There is a detailed answer to this question which is very helpful. I tried to follow the instructions and, after (finding and) fixing some paths, I managed to do it. I am going to write down my experience here in case someone else finds it useful as well.</p> <h3 id="embedding-the-python-interpreter">Embedding the Python Interpreter</h3> <p>Cython compiles the Python or the Cython files into C and then compiles the C code to create the extensions. Interestingly, Cython has a CLI switch <code>--embed</code> which can generate a <code>main</code> function. This main function embeds the Python interpreter for us. So we can just compile the C file and get our single binary executable.</p> <h3 id="getting-started">Getting Started</h3> <p>First we need to have a Python (<code>.py</code>) or Cython (<code>.pyx</code>) file ready for compilation. Let&rsquo;s start with a plain old &ldquo;Hello World&rdquo; example.</p> <pre><code class="language-python">print(&quot;Hello World!&quot;)
</code></pre> <p>Let&rsquo;s convert this Python file to a C source file with an embedded Python interpreter.</p> <pre><code class="language-bash">cython --embed -o hello_world.c hello_world.py
</code></pre> <p>It should generate a file named <code>hello_world.c</code> in the current directory. We now compile it to an executable.</p> <pre><code class="language-bash">gcc -v -Os -I /Users/masnun/.pyenv/versions/3.5.1/include/python3.5m -L /usr/local/Frameworks/Python.framework/Versions/3.5/lib -o hello_world hello_world.c -lpython3.5 -lpthread -lm -lutil -ldl
</code></pre> <p>Please note that you must have the Python headers and dynamic libraries available in order to compile it successfully. I am on OSX and I use PyEnv.
So I passed the appropriate paths and it compiled fine.</p> <p>Now I have an executable file, which I can run:</p> <pre><code class="language-bash">$ ./hello_world
Hello World!
</code></pre> <h3 id="dynamic-linking">Dynamic Linking</h3> <p>In this case, the executable we produce is dynamically linked to our specified Python version. So this may not be fully portable (the libraries will need to be available on target machines). But this should work fine if we compile against common versions (for example the default version of Python or a version easily obtainable via the package manager).</p> <h3 id="including-other-modules">Including Other Modules</h3> <p>Up until now, I haven&rsquo;t found an easy way to include other 3rd party pure Python modules (e.g. <code>requests</code>) directly compiled into the binary. However, if I want to split my code into multiple files, I can create other <code>.pyx</code> files and use the <code>include</code> statement with those.</p> <p>For example, here&rsquo;s <code>hello.pyx</code>:</p> <pre><code class="language-cython">cdef struct Person:
    char *name
    int age

cdef say():
    cdef Person masnun = Person(name=&quot;masnun&quot;, age=20)
    print(&quot;Hello {}, you are {} years old!&quot;.format(masnun.name.decode('utf8'), masnun.age))
</code></pre> <p>And here&rsquo;s my main file - <code>test.pyx</code> -</p> <pre><code class="language-cython">include &quot;hello.pyx&quot;

say()
</code></pre> <p>Now if I compile <code>test.pyx</code> just like the above example, it will also include the code in <code>hello.pyx</code> and I can call the <code>say</code> function as if it were in <code>test.pyx</code> itself.</p> <p>However, libraries that ship as shared libraries, like PyQt, pose no such issues - we can use them as is. So basically we can take any PyQt code example and compile it with Cython - it should work fine!</p> Can Cython make Python Great in Programming Contests?
http://masnun.rocks/2016/09/28/can-cython-make-python-great-in-programming-contests/ Wed, 28 Sep 2016 08:00:30 +0600 http://masnun.rocks/2016/09/28/can-cython-make-python-great-in-programming-contests/ <p>Python is getting very popular as the first programming language, both at home and abroad. I know many of the Bangladeshi universities have started using Python to introduce beginners to the wonderful world of programming. This also seems to be <a target="_blank" href="http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-u-s-universities/fulltext">the case</a> in the US. I have talked to a few friends from other countries and they agree that Python is quickly becoming the language people learn first. A quick <a target="_blank" href="http://bfy.tw/7v1B">google search</a> could explain why Python is getting so popular among learners.</p> <h3 id="python-in-programming-contests">Python in Programming Contests</h3> <p>Recently Python has been included in the ICPC; before that, Python usually had less visibility / presence in programming contests. And of course there are valid reasons behind that. The de facto implementation of Python - &ldquo;CPython&rdquo; - is quite slow. It&rsquo;s a dynamic language and that costs in terms of execution speed. C / C++ / Java are way faster than Python, and programming contests are all about speed / performance. Python would allow you to solve problems in fewer lines of code, but you may often hit the time limit. Despite the limitation, people have continuously chosen Python to learn programming and solve problems on numerous programming related websites. This might have convinced the authorities to include Python in the ICPC. But we do not yet know which flavor (read: implementation) and version of Python will be available to the ICPC contestants.
From <a target="_blank" href="https://www.quora.com/What-do-you-think-about-the-induction-of-Python-in-ACM-ICPC-2017">different</a> <a target="_blank" href="http://codeforces.com/blog/entry/44899">sources</a> I gather that Python will be supported but the time limit issue remains - it is not guaranteed that a problem can be solved within the time limit using Python. That makes me wonder, can Cython help in such cases?</p> <h3 id="introduction-to-cython">Introduction to Cython</h3> <p>From the <a target="_blank" href="http://cython.org/">official website</a>:</p> <blockquote> <p>Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.</p> </blockquote> <p>With Cython, we can add type hints to our existing Python programs and compile them to make them run faster. But what is more awesome is the <code>Cython</code> language - it is a superset of Python and allows us to write Python-like code which performs like C.</p> <p>Don&rsquo;t just take my word for it - see for yourself in the <a target="_blank" href="http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html">Tutorial</a> and <a target="_blank" href="http://docs.cython.org/en/latest/src/userguide/language_basics.html#language-basics">Cython Language Basics</a>.</p> <h3 id="cython-is-fast">Cython is Fast</h3> <p>When I say fast, I really mean <strong>very very</strong> fast.</p> <p><center> <img src="http://masnun.rocks/images/cython-vs-c.png" alt="cython vs c" /></p> <p>Image Source: <a target="_blank" href="http://ibm.co/20XSZ4F">http://ibm.co/20XSZ4F</a></p> <p></center></p> <p>The above image is taken from an article on IBM Developer Works which shows how Cython compares to C in terms of speed.</p> <p>You can also check out these links for random benchmarks from different people:</p> <ul> <li><a target="_blank"
href="http://www.matthiaskauer.com/2014/02/a-speed-comparison-of-python-cython-and-c/">Cython beating C++</a></li> <li><a target="_blank" href="http://prabhuramachandran.blogspot.com/2008/09/python-vs-cython-vs-d-pyd-vs-c-swig.html">Cython being 30% faster than C++</a></li> <li><a target="_blank" href="http://aroberge.blogspot.com/2010/01/python-cython-faster-than-c.html">Another Benchmark</a></li> </ul> <p>And finally, do try it yourself - benchmark Cython against C++ and see how it performs!</p> <p>Bonus article &ndash; <a href="https://magic.io/blog/uvloop-blazing-fast-python-networking/">Blazing fast Python networking</a> :-)</p> <h3 id="cython-is-easy-to-setup">Cython is easy to Setup</h3> <p>OK, so is it easy to make Cython available in the contest environments? Yes, it is! The <strong>only</strong> requirement of Cython is that you must have a <strong>C compiler</strong> installed on your system along with Python. Any computer used for contest programming is supposed to have a C compiler installed anyway.</p> <p>We just need one command to install Cython:</p> <pre><code class="language-bash">pip install Cython
</code></pre> <p><strong>PS:</strong> Many scientific distributions of Python (e.g. Anaconda) already ship Cython.</p> <h3 id="cython-in-programming-contests">Cython in Programming Contests</h3> <p>Since we saw that Cython is super fast and easy to set up, programming contests can make Cython available along with CPython to allow the contestants to make their programs faster and keep up with Java / C++. It will make Python an attractive choice for serious problem solving.</p> <p>I know the <code>Cython</code> language is not exactly Python. It is a superset of the Python language. So beginners might not be familiar with the language and that&rsquo;s alright. Beginners can start with Python and start solving the easier problems with Python.
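<p>To give a flavour of what &ldquo;adding types&rdquo; looks like, here is a tiny sketch (a made-up <code>fib</code> function, not taken from any of the benchmarks above). The <code>cdef</code> declarations let Cython compile the loop down to plain C arithmetic:</p>

```cython
# fib.pyx - the typed variables let the loop run at C speed
def fib(int n):
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a
```

<p>Build it with <code>cythonize -i fib.pyx</code> (or a small <code>setup.py</code>) and import it from regular Python. The same function without the type declarations is already valid Python, which is what makes this kind of incremental optimization so approachable.</p>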
When they start competitive programming and start hitting the time limits, Cython is one of the options they can choose to make their code run faster. Of course Cython needs some understanding of how C works - that&rsquo;s fine too, because Cython still feels more productive than writing plain old C or C++.</p> <h3 id="final-words">Final words</h3> <p>PyPy is already quite popular in the Python community. Dropbox and Microsoft are also working on their Python JITs. I believe that someday Python JITs will be as fast as Java / C++. Today, Python is making programming fun for many beginners. I hope that with Cython, we can worry less about the time limits and accept Python as a fitting tool in our competitive programming contests!</p> Introduction to Django Channels http://masnun.rocks/2016/09/25/introduction-to-django-channels/ Sun, 25 Sep 2016 21:27:34 +0600 http://masnun.rocks/2016/09/25/introduction-to-django-channels/ <p>Django is a brilliant web framework. In fact, it is my favourite one for various reasons. A year and a half ago, I switched to Python and Django for all my web development. I am a big fan of the ecosystem and the many third party packages. In particular, I use Django REST Framework whenever I need to create APIs. Having said that, Django was more than good enough for basic HTTP requests. But the web has changed. We now have HTTP/2 and web sockets. Django could not support them well in the past. For the web socket part, I usually had to rely on Tornado or NodeJS (with the excellent Socket.IO library). They are good technologies, but with most of my web apps being in Django, I really wished there were something that could work with Django itself. And then we had <strong>Channels</strong>.
The project is meant to allow Django to support HTTP/2, websockets or other protocols with ease.</p> <h3 id="concepts">Concepts</h3> <p>The underlying concept is really simple - there are <code>channels</code> and there are <code>messages</code>, there are <code>producers</code> and there are <code>consumers</code> - the whole system is based on passing messages on to channels and consuming/responding to those messages.</p> <p>Let&rsquo;s look at the core components of Django Channels first:</p> <ul> <li><code>channel</code> - A channel is a FIFO queue-like data structure. We can have many channels depending on our needs.<br /></li> <li><code>message</code> - A message contains meaningful data for the consumers. Messages are passed on to the channels.</li> <li><code>consumer</code> - A consumer is usually a function that consumes a message and takes action.</li> <li><code>interface server</code> - The interface server knows how to handle different protocols. It works as a translator or a bridge between Django and the outside world.</li> </ul> <h3 id="how-does-it-work">How does it work?</h3> <p>An http request first comes to the <code>Interface Server</code>, which knows how to deal with a specific type of request. For example, for websockets and http, <strong>Daphne</strong> is a popular interface server. When a new http/websocket request comes to the interface server (Daphne in our case), it accepts the request and transforms it into a <code>message</code>. Then it passes the <code>message</code> to the appropriate <code>channel</code>. There are predefined channels for specific types. For example, all http requests are passed to the <code>http.request</code> channel. For incoming websocket messages, there is <code>websocket.receive</code>.
So these channels receive the messages when the corresponding types of requests come in to the interface server.</p> <p>Now that we have <code>channels</code> getting filled with <code>messages</code>, we need a way to process these messages and take actions (if necessary), right? Yes! For that we write some consumer functions and register them to the channels we want. When messages come to these channels, the consumers are called with the message. They can read the message and act on it.</p> <p>So far, we have seen how we can <strong>read</strong> an incoming request. But like all web applications, we should <strong>write</strong> something back too, no? How do we do that? As it happens, the interface server is quite clever. While transforming the incoming request into a message, it creates a <code>reply</code> channel for that particular client request and registers itself to that channel. Then it passes the reply channel along with the message. When our consumer function reads the incoming message, it can pass a response to the <code>reply channel</code> attached to the message. Our interface server is listening to that reply channel, remember? So when a response is sent back to the reply channel, the interface server grabs the message, transforms it into an http response and sends it back to the client. Simple, no?</p> <h3 id="writing-a-websocket-echo-server">Writing a Websocket Echo Server</h3> <p>Enough with the theory, let&rsquo;s get our hands dirty and build a simple echo server. The concept is simple: the server accepts websocket connections, the client writes something to us, and we just echo it back. A plain and simple example.</p> <h5 id="install-django-channels">Install Django &amp; Channels</h5> <pre><code class="language-bash">pip install channels
</code></pre> <p>That should do the trick and install Django + Channels.
Channels has Django as a dependency, so when you install channels, Django comes with it.</p> <h5 id="create-an-app">Create An App</h5> <p>Next we create a new django project and app -</p> <pre><code class="language-bash">django-admin.py startproject djchan
</code></pre> <pre><code class="language-bash">cd djchan
</code></pre> <pre><code class="language-bash">python manage.py startapp realtime
</code></pre> <h5 id="configure-installed-apps">Configure <code>INSTALLED_APPS</code></h5> <p>We have our Django app ready. We need to add <code>channels</code> and our django app (<code>realtime</code>) to the <code>INSTALLED_APPS</code> list under <code>settings.py</code>. Let&rsquo;s do that:</p> <pre><code class="language-python">INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    &quot;channels&quot;,
    &quot;realtime&quot;
]
</code></pre> <h5 id="write-our-consumer">Write our Consumer</h5> <p>After that, we need to write a consumer function that will process the incoming websocket messages and send back the response:</p> <pre><code class="language-python"># consumers.py
def websocket_receive(message):
    text = message.content.get('text')
    if text:
        message.reply_channel.send({&quot;text&quot;: &quot;You said: {}&quot;.format(text)})
</code></pre> <p>The code is simple enough. We receive a message, get its text content (we&rsquo;re expecting that the websocket connection will send only text data for this example) and then push it back to the <code>reply_channel</code> - just like we planned.</p> <h5 id="channels-routing">Channels Routing</h5> <p>We have our consumer function ready; now we need to tell Django how to route messages to our consumer.
Just like URL routing, we need to define our channel routings.</p> <pre><code class="language-python"># routing.py
from channels.routing import route
from .consumers import websocket_receive

channel_routing = [
    route(&quot;websocket.receive&quot;, websocket_receive, path=r&quot;^/chat/&quot;),
]
</code></pre> <p>The code should be self explanatory. We have a list of <code>route</code> objects. Here we select the channel name (<code>websocket.receive</code> =&gt; for receiving websocket messages), pass the consumer function and then configure the optional <code>path</code>. The path is an interesting bit. If we didn&rsquo;t pass a value for it, the consumer would get all the messages in the <code>websocket.receive</code> channel on any URL. So if someone created a websocket connection to <code>/</code> or <code>/private</code> or <code>/user/1234</code> - regardless of the url path, we would get all incoming messages. But that&rsquo;s not our intention, right? So we restrict the <code>path</code> to <code>/chat</code> so only connections made to that url are handled by the consumer. Please note the leading <code>/</code> - unlike url routing, in channels we have to include it.</p> <h5 id="configuring-the-channel-layers">Configuring The Channel Layers</h5> <p>We have defined a consumer and added it to a routing table. We&rsquo;re more or less ready. There&rsquo;s just a final bit of configuration we need to do. We need to tell channels two things - which backend we want to use and where it can find our channel routing.</p> <p>Let&rsquo;s briefly talk about the backend. The messages and the channels - Django needs some sort of data store or message queue to back this system. By default Django can use an in-memory backend which keeps these things in memory, but if you consider a distributed app, for scaling out, you need something else. Redis is a popular and proven piece of technology for these kinds of scenarios.
In our case we would use the Redis backend.</p> <p>So let&rsquo;s install that:</p> <pre><code class="language-sh">pip install asgi_redis
</code></pre> <p>And now we put this in our <code>settings.py</code>:</p> <pre><code class="language-python">CHANNEL_LAYERS = {
    &quot;default&quot;: {
        &quot;BACKEND&quot;: &quot;asgi_redis.RedisChannelLayer&quot;,
        &quot;CONFIG&quot;: {
            &quot;hosts&quot;: [(&quot;localhost&quot;, 6379)],
        },
        &quot;ROUTING&quot;: &quot;realtime.routing.channel_routing&quot;,
    },
}
</code></pre> <h5 id="running-the-servers">Running The Servers</h5> <p>Make sure that Redis is running (usually <code>redis-server</code> should run it). Now run the django app:</p> <pre><code class="language-sh">python manage.py runserver
</code></pre> <p>In a local environment, when you do <code>runserver</code>, Django launches both the interface server and the necessary background workers (to run the consumer functions in the background). But in production, we should run the workers separately. We will get to that soon.</p> <h5 id="trying-it-out">Trying it Out!</h5> <p>Once our dev server starts up, let’s open up the web app. If you haven’t added any django views, no worries; you should still see the “It Worked!” welcome page of Django and that should be fine for now. We need to test our websocket, and we are smart enough to do that from the dev console. Open up your Chrome Devtools (or Firefox | Safari | any other browser’s dev tools) and navigate to the JS console. Paste the following JS code:</p> <pre><code class="language-javascript">socket = new WebSocket(&quot;ws://&quot; + window.location.host + &quot;/chat/&quot;);

socket.onmessage = function(e) {
    alert(e.data);
}

socket.onopen = function() {
    socket.send(&quot;hello world&quot;);
}
</code></pre> <p>If everything worked, you should get an alert with the message we sent. Since we defined a path, the websocket connection works only on <code>/chat/</code>.
Try modifying the JS code and send a message to some other url to see how it doesn’t work. Also remove the path from our route and see how you can catch all websocket messages from all the websocket connections, regardless of which url they were connected to. Cool, no?</p> <h5 id="our-custom-channels">Our Custom Channels</h5> <p>We have seen that certain protocols have predefined channels for various purposes. But we are not limited to those. We can create our own channels. We don&rsquo;t need to do anything fancy to initialize a new channel; we just need to mention a name and send some messages to it. Django will create the channel for us.</p> <pre><code class="language-python">Channel(&quot;thumbnailer&quot;).send({
    &quot;image_id&quot;: image.id
})
</code></pre> <p>Of course we need corresponding workers to be listening to those channels. Otherwise nothing will happen. Please note that besides working with new protocols, Channels also allows us to create some sort of message based task queue. We create channels for certain tasks and our workers listen to those channels. Then we pass the data to those channels and the workers process them. So for simpler tasks, this could be a nice solution.</p> <h3 id="scaling-production-systems">Scaling Production Systems</h3> <h5 id="running-workers-seperately">Running Workers Separately</h5> <p>In a production environment, we would want to run the workers separately (since we would not run <code>runserver</code> on production anyway). To run the background workers, we have to run this command:</p> <pre><code class="language-sh">python manage.py runworker
</code></pre> <h5 id="asgi-daphne">ASGI &amp; Daphne</h5> <p>In our local environment, the <code>runserver</code> command took care of launching the interface server and background workers. But now we have to run the interface server ourselves. We mentioned <strong>Daphne</strong> already.
It works with the <code>ASGI</code> standard (which is commonly used for HTTP/2 and websockets). Just like <code>wsgi.py</code>, we now need to create an <code>asgi.py</code> module and configure it.</p> <pre><code class="language-python">import os

from channels.asgi import get_channel_layer

os.environ.setdefault(&quot;DJANGO_SETTINGS_MODULE&quot;, &quot;djchan.settings&quot;)

channel_layer = get_channel_layer()
</code></pre> <p>Now we can run the server:</p> <pre><code class="language-bash">daphne djchan.asgi:channel_layer
</code></pre> <p>If everything goes right, the interface server should start running!</p> <h5 id="asgi-or-wsgi">ASGI or WSGI</h5> <p>ASGI is still new, while WSGI is battle tested for http. So you might still want to keep using wsgi for your http-only parts and asgi for the parts where you need channels-specific features.</p> <p>The popular recommendation is that you should use <code>nginx</code> or any other reverse proxy in front and route the urls to asgi or uwsgi depending on the url or the <code>Upgrade: WebSocket</code> header.</p> <h5 id="retries-and-celery">Retries and Celery</h5> <p>The Channels system does not guarantee delivery. If there are tasks which need that certainty, it is highly recommended to use a system like Celery for those parts. Or we can also roll our own checks and retry logic if we feel like it.</p>
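<p>As a sketch of that reverse proxy recommendation (the port numbers and upstream names below are made up for illustration - adjust them to your deployment), an nginx config could route the websocket urls to the ASGI interface server and everything else to a WSGI server:</p>

```nginx
# hypothetical layout: daphne (ASGI) on port 8001, a WSGI server on port 8000
upstream wsgi_app { server 127.0.0.1:8000; }
upstream asgi_app { server 127.0.0.1:8001; }

server {
    listen 80;

    # websocket endpoints go to daphne
    location /chat/ {
        proxy_pass http://asgi_app;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;  # forward the Upgrade header
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }

    # plain http keeps going through WSGI
    location / {
        proxy_pass http://wsgi_app;
    }
}
```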