Blog Archive, July 2016


Comparison of HTML5 Parsers: Gumbo vs html5lib

July 29, 2016

When developing a content plugin for the Kodi media center, the most important question is where to get the content from. One possible way is to scrape websites that host multimedia content. Yes, the legality of that content is another question, but legal matters are beyond the scope of this post.

In the Python world, the BeautifulSoup library (BS for short) combined with the html5lib parser is a popular choice. However, according to the BeautifulSoup documentation, html5lib is the slowest, albeit the most reliable, of the HTML parsers it supports. So I googled for alternatives and found Gumbo, a parser made by Google itself. According to its description, it is fully HTML5-compliant and written in pure C99 with no external dependencies. It also has Python bindings compatible with popular Python HTML parsing libraries, including BeautifulSoup. The BeautifulSoup binding was written for BS 3, but making it compatible with BS 4 was relatively easy, so I did that and submitted a pull request on GitHub (which seems to have been ignored by the repo maintainers). (...)
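A parser comparison like this comes down to timing the same document through each parser. Below is a minimal sketch of such a timing harness, not the benchmark used in the post. To stay self-contained it times only the standard-library `html.parser`. The names `LinkCollector`, `parse_with_stdlib`, `time_parser`, and `SAMPLE_HTML` are hypothetical and exist only for illustration.

```python
import timeit
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags; a stand-in for a real scraping task."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def parse_with_stdlib(html):
    """Parse the document with the standard-library parser and return the links."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def time_parser(parse, html, repeat=5, number=20):
    """Return the best average time in seconds for one call of parse(html)."""
    timings = timeit.repeat(lambda: parse(html), repeat=repeat, number=number)
    return min(timings) / number


# Synthetic page with 200 links (hypothetical test data, not a real site).
SAMPLE_HTML = "<html><body>" + "".join(
    '<p><a href="/video/{0}">Video {0}</a></p>'.format(i) for i in range(200)
) + "</body></html>"

if __name__ == "__main__":
    seconds = time_parser(parse_with_stdlib, SAMPLE_HTML)
    print("html.parser: {:.3f} ms per document".format(seconds * 1000))
```

Any other parser can be measured by passing a different callable to `time_parser`. With BeautifulSoup 4 and html5lib installed, that could be `lambda html: BeautifulSoup(html, "html5lib")`.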

Read post



Wsgi-Boost-Server: A Python WSGI Server Written in C++

July 15, 2016

At last I found some time to write about my recent project, WsgiBoostServer. I started it to learn C++ and, specifically, how to write binary extension modules for Python using Boost.Python. As the name implies, this is a WSGI server, that is, an HTTP server for Python web applications. In addition to Python applications, WsgiBoostServer can also serve static files, which makes it suitable for serving standalone Python micro-services together with all their static content.
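For readers unfamiliar with WSGI, here is a minimal sketch of the kind of application that any WSGI server hosts. It uses only the standard-library `wsgiref` module as a stand-in server and does not show the WsgiBoostServer API. The names `application`, `call_app`, and `serve`, as well as the host and port, are hypothetical.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI application: a callable taking the request environ and start_response."""
    path = environ.get("PATH_INFO", "/")
    body = "Hello from {}\n".format(path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # The return value is an iterable of byte strings.
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app directly, without a network socket, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(app, host="127.0.0.1", port=8000):
    """Serve the app with the reference server; blocks until interrupted."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve(application)` starts the reference server. Because the interface is standardized, the same `application` callable can be handed to any WSGI-compliant server without changes.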

Because WsgiBoostServer is written in C++ using the Boost.Asio library, it is faster than pure-Python WSGI servers like Waitress or CherryPy. And since it can be used as a regular (although binary) Python module, it does not require complex set-up and can be included in any Python application. More info about WsgiBoostServer and its source code can be found in my GitHub repository. It's MIT-licensed, so feel free to use it as you like if you find this little side project of mine interesting.

Update: Unfortunately, deeper testing revealed serious problems. WsgiBoostServer works fine with pure-Python WSGI applications but crashes because of memory corruption if I add binary Python modules to the mix. My guess is that Boost.Asio does not work well inside a Python interpreter, which does its own memory management. Since diagnosing such arcane memory problems is way over my head, I had to abandon this project.

Read post