Building Docker images with Ansible

Docker has been really hot recently. It lets you run your software in Linux containers easily. It is actually OS-level isolation rather than hardware or kernel emulation, so you don't pay much of a performance penalty yet still get pretty nice virtual-machine-like features. I really like the analogy used by the Docker community: shipping software should be easier, and Docker plays the same role the standard container plays in the shipping industry.

Building Docker images is not hard, but ...

Although Docker provides an easy way to deliver and run your software in Linux containers, there is still no obvious and easy way to build a Docker image for a big project. To build an image for a large and complex project, you would probably need to

  • Clone your private software repo into the build folder
  • Ensure base images are built before your project image
  • Generate some files dynamically, such as the current git commit revision
  • Generate templates
  • Upload the image with your credentials

With a Dockerfile, you can only have static build steps; it was obviously not designed for doing any of the things listed above. And since Docker uses a layered file system that works much like git commits, you probably don't want to put your GitHub credentials into the container and pull the repo inside it: once a layer is committed, it is hard to remove it from the history. So you definitely want to do these things outside the container and then put everything together.

My first solution - Crane

With these requirements in mind, it felt pretty similar to the experience I had with Omnibus - a tool for packaging your software into a standalone deb package. So I built a simple tool in Python for building Docker images, named Crane. It lets you define steps for building the image, and it also provides template generation with Jinja2.

The final solution - Ansible

Crane was working fine, but I don't like reinventing a wheel and maintaining it when there is an obviously better solution. After playing with Ansible for a while, I realized it is actually a much better solution for building Docker images. So what is Ansible, you may ask? Well, it's yet another deployment tool, like Puppet, Chef or SaltStack.

Wait, what? Why are you using a deployment tool to build Docker images? It may sound odd at first, but Ansible is not just yet another deployment tool. Its design is quite different from its predecessors: it pushes commands to target machines over SSH, while the other tools are all pull based. It also provides many modules for different operations, including creating instances on EC2 and other cloud computing providers. Most importantly, it makes orchestration easy.

Of course, it meets all the requirements we mentioned before:

  • Clone git repo? Check.
  • Build base image? Check.
  • Generate dynamic file? Check.
  • Generate templates? Check.
  • Upload images? Check.

Moreover, with Ansible you can launch an EC2 instance, build the image inside it, and run a series of tests before you publish the image. Or you can simply build the image in your Vagrant machine or on your local machine. It makes building software extremely flexible, since you can run the build process anywhere you want, as long as the machine can be reached via SSH. You can also provision the whole build environment, or even drive a fleet of machines in the cloud for building. That's pretty awesome, isn't it?

Show me the code

Okay, enough talking, let's see the code. The tasks/main.yml looks like this:

- assert:
    that: 
      - 'hellobaby_version != ""'

- name: install apt packages
  apt: "name='{{ item }}' state=present"
  with_items:
    - git
    - python-pip

- name: install docker-py
  pip: name=docker-py version=0.3.1

- name: create build directory
  file: >
    dest="{{ hellobaby_build_dir }}"
    state=directory

- name: clone hellobaby git repo
  git: >
    repo="{{ hellobaby_repo }}"
    dest="{{ hellobaby_build_dir }}/hellobaby"
    version="v{{ hellobaby_version }}"
  register: hellobaby_repo

- name: remove git deploy key
  file: dest=/tmp/github_deploy_key state=absent

- name: archive repo
  shell: >
    cd "{{ hellobaby_build_dir }}/{{ item }}" &&
    git archive -o ../{{ item }}.tar HEAD
  with_items:
    - hellobaby

- name: generate templates
  template: >
    src="{{ item.src }}"
    dest="{{ hellobaby_build_dir }}/{{ item.dest }}"
  with_items:
    - { src: "Dockerfile", dest: "Dockerfile" }
    - { src: "runapp.sh", dest: "runapp.sh" }

- name: build image
  docker_image: >
    name="{{ hellobaby_image_name }}"
    tag="{{ hellobaby_image_tag }}"
    path="{{ hellobaby_build_dir }}"
    state=build

- name: tag
  command: >
    docker tag -f
    {{ hellobaby_image_name }}:{{ hellobaby_image_tag }}
    {{ hellobaby_image_name }}:{{ hellobaby_extra_tag }}
  when: hellobaby_extra_tag != ''

and the playbook looks like this

---
- name: Build Hello baby app image
  hosts: all
  sudo: yes
  vars_prompt:
    - name: hellobaby_version
      prompt: "hellobaby_version"
      default: "1.0.0"
      private: no
    - name: hellobaby_iteration 
      prompt: "hellobaby_iteration"
      default: 1
      private: no
  roles:
    - Ansibles.apt
    - hellobaby_image

So, to build with Vagrant, you can run something like this:

ansible-playbook \
  -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory \
  -u vagrant --private-key=~/.vagrant.d/insecure_private_key \
  playbooks/hellobaby.yml

You can find the complete example here - ansible-docker-demo.

A tool for deployment but also amazing for building software

Although Ansible was not designed for building software, that doesn't mean you cannot do it, and surprisingly, it does the job very well. With its machine provisioning and orchestration capabilities, you can integrate building and deployment easily. The build environment itself can also be provisioned before building the software, and cloud computing resources can be leveraged as well. I feel there are a lot more interesting things that can be done with Ansible, and I'm looking forward to seeing how people use it not just for deployment but also for building software :P

Auto post-commit PEP8 correction

It's always a hateful job to correct PEP8 warnings manually.

$ flake8 billy --ignore=E501,W293
billy/tests/integration/test_basic.py:401:45: W291 trailing whitespace
billy/models/processors/balanced_payments.py:116:44: W291 trailing whitespace
billy/models/processors/balanced_payments.py:133:30: W291 trailing whitespace

I bet you don't like this either. Today I couldn't take it anymore. I was wondering: why should I do a job that a machine should do? So I looked for solutions on the Internet and found a helpful article - Syntax+pep8 checking before committing in git. The basic idea is to add a pre-commit hook script to git that checks PEP8 before each commit. With that in place, you can no longer commit code with PEP8 warnings; when you try, you see errors like this

$ git commit
billy/tests/integration/test_basic.py:401:45: W291 trailing whitespace
billy/models/processors/balanced_payments.py:116:44: W291 trailing whitespace
billy/models/processors/balanced_payments.py:133:30: W291 trailing whitespace

That's great, but you still need to correct these issues manually. I figured there must be something that can do this boring job for you, and yes, I found autopep8. I could use autopep8 in the pre-commit hook to correct PEP8 issues for me. However, I don't think it is a good idea to let a code formatting tool modify your code silently while committing; I want to know what the tool changed. So here is another solution: I use a post-commit hook instead, which compares the latest commit with the previous one:

#!/bin/sh

FILES=$(git diff HEAD^ HEAD --name-only --diff-filter=ACM | grep -e '\.py$')
 
for f in $FILES
do
    # auto pep8 correction

    autopep8 --in-place --ignore=E309,E501,W293 $f
done

All you need to do is install autopep8

pip install autopep8

and put the post-commit script at .git/hooks/post-commit. This way, every time I make a commit, it corrects PEP8 issues for me. I can review what was modified and make a separate PEP8 correction commit. With this, you can finally enjoy coding rather than wasting time removing trailing whitespace everywhere :D
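
By the way, if you prefer the blocking pre-commit approach from the article mentioned above, a hypothetical Python version of that hook could look something like this (just a sketch, reusing the flake8 flags from the command above):

#!/usr/bin/env python
# Hypothetical sketch of a pre-commit hook: run flake8 on the staged .py
# files and abort the commit if it reports any problem. Save it as
# .git/hooks/pre-commit and make it executable. Note that it checks the
# working-tree copies of the staged files.
import subprocess
import sys

output = subprocess.check_output(
    ['git', 'diff', '--cached', '--name-only', '--diff-filter=ACM']).decode()
files = [f for f in output.splitlines() if f.endswith('.py')]

if files:
    # a non-zero exit code from this script makes git abort the commit
    sys.exit(subprocess.call(['flake8', '--ignore=E501,W293'] + files))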

Slow down, it's faster, let's build Python 2.8

I really love and enjoy programming in Python; it is one of my favorite programming languages, and I love to recommend it to other developers or to people who are about to learn their very first programming language. However, there is always one awkward moment - when they ask which of Python 2 or Python 3 they should use. My answer would be:

Ah... you should learn Python 2 first; major libraries support Python 2 only. It will take about one or two years before these third-party resources catch up with Python 3.

Sadly, five years have passed since Python 3 was released, but still only about 3% of Python developers are using it. If people ask me the same question now, I really don't know how to answer, and I have even started to wonder whether this is the end of the programming language I like so much. These days, I have read a lot of articles talking about the Python 2 and Python 3 crisis:

About Python 3

Re: About Python 3

Google+ Post by Ian Bicking

the Gravity of Python 2

More About Unicode in Python 2 and 3

There are many different opinions. Some say you should kick people harder so that they will start using Python 3; some say you should make Python 3 even better so that people will start using it. Me? I don't believe you can kick people hard enough to make them jump to the moon, and I also don't believe people can jump to the moon simply because you put treasure on it and say "come and get it if you can". Five years have passed - why are we still not on the moon?

I think the problem is simple: the goal is too far away, and the community is too eager. It reminds me of a story I heard when I was a child.

There was a young man carrying all kinds of goods, and a ship was about to leave. He asked a local old man,

Can I make it in time?

The old man took a glance at him and said,

If you walk slowly, take it easy, you can make it.

The young man didn't take the advice; he ran to the port as fast as he could. Unfortunately, he fell on the road, his goods scattered everywhere, and he didn't make it in time.

The Python community is a little bit like the eager young man in the story: in a hurry to build so many fancy advanced features, but what is the point if nobody is using them? I think maybe it is time to slow down, so that we can go farther.

Interestingly, there are calls for Python 2.8 in these discussions, and personally, I also believe Python 2.8 could be the missing bridge from Python 2 to Python 3. If necessary, maybe there should be a Python 2.9 and a Python 2.10 as well. I know it is in the nature of developers to discard old stuff and to be eager to build and embrace awesome new things. But in the real software world, you don't get to awesome suddenly; you keep getting better gradually. So, let's stop blaming anyone and build Python 2.8 :)

Technical debt

When you buy a house, you take out a mortgage. When you buy a car, you take out an auto loan. Maybe you are rich enough to clear all your debts at once, but either way, we all live with debt, more or less. In fact, as software developers we all live with debt as well - so-called technical debt. I really like the debt analogy; technical debt is similar to debt in real life. Funny enough, most people know how real debt works, but technical debt is not well understood by developers. That makes sense, though: people have lived with the idea of debt for maybe thousands of years, but computers have only a few decades of history.

Financial debt

It's really nice to have an accurate analogy; it lets me explain things by borrowing some mature experience from the financial world. I am not an expert in finance, but when I saw a cash flow diagram, I realized it is exactly the right diagram for explaining technical debt. Let's see what the cash flow diagram looks like when you take out a loan from a bank.

As the name implies, it's all about the flow of cash. Green arrows above the axis are income; red arrows below the axis are costs. When you take out a loan, you have an immediate income at the beginning, but it doesn't come for free: you need to pay interest to the bank periodically, and eventually you need to repay the principal (not shown in the diagram). There are various other situations - for example, you might rent the house out for recurring income, or sell the house when the price rises and pay off the mortgage. Nevertheless, we are not teaching finance here; the point is that we can borrow this diagram for visualizing technical debt.

Taking on technical debt

In software development, I see productivity as the income, and reduced productivity or extra time spent as the cost. So how do you take on technical debt, you may ask? Well, the fact is, you don't have to do anything - there is plenty of built-in debt in the software development process.

Let's look at an example. Say you are developing a system. At first there is only one feature in it, and every time you modify the code, you have to check that this feature still works correctly. The code you write is the income, and the time spent testing is the cost. Over time, you fix bugs and improve the code, and you always need to make sure the feature still works. This is a very common built-in technical debt in software development: not writing automated tests is the debt itself. By skipping them you save some development time - an immediate income (productivity gain) - but you have to pay interest every time you modify the code. The diagram would look like this

Things get even worse when new features are added to the system: you have more features to test each time you modify the code. The diagram for a growing system looks like this

You might say, hey, why not just skip the testing - those features won't break that easily. That could be true; however, when you save time by not testing, the cost of testing turns into risk, and your customers or end users end up testing those undetected issues for you.

Moreover, when the system grows really big, you may find yourself constantly testing countless functions, with endless bugs to fix. That simply means the debt is too high: your productivity is all eaten by the interest, and you are never going to deliver the product unless you pay off the debt.

In the same scenario, you still have more and more features in the system, but you spend time on automated testing from the beginning. It is just like paying off the debt up front, which keeps the interest under control. When you add a new feature, all you have to do is write the corresponding tests for it. This way, you can make real progress.

technical_debt03.png

Sources of debt

Unlike real debt, there is no balance sheet telling you how much you owe; technical debt is sometimes just not obvious. Nevertheless, we know some common sources of debt, such as

  • Bad coding style
  • Bad design
  • No automated testing
  • No documentation
  • No proper comments in code
  • No version control system
  • Dirty hacks

There are surely other debts not listed above, but the point is that they all have a similar effect: you or your team members pay interest whenever you work on that code. Take a badly written function - every time a developer reads it, they need extra time to understand it; that is the interest. Interestingly, although you always pay the interest, not every debt buys you a big boost in development, and some debts can be avoided easily: experienced developers can usually produce code with good style and design at little extra cost.

Debt is not all that bad

So far we have talked as if debt were pure evil, a demon you should never deal with. But the truth is that taking on debt can sometimes be a good thing, because it buys you time. Even in the real financial world, debt can be a key to success - when a company has no debt at all, investors may see it as inefficient. So this is about trade-offs: experienced developers not only produce code with less debt than inexperienced ones, they also know when to take on debt and how much.

For example, if you are running a startup and you don't even know whether your product is going to work, it is fine to do some dirty hacks to make things work and pay off the technical debt later, after you survive.

Nice, I am not the one who pays the bill

People love a free lunch - it's really nice not to have to pay the bill, isn't it? Developers like it too. There are many situations where you are not the one paying interest on the technical debt. For example, you take on a software development contract and you are pretty sure that once you deliver the project, you will never see it again. In that case, many developers just don't care; they are not the ones paying the bill, so why should they?

This is an actual moral hazard. Funnily enough, it also happens in the real financial world - Wall Street bankers know taxpayers are going to pay the bill, so why should they care about risk? Unfortunately, unlike the bankers, ignoring moral hazard won't earn you billions of dollars; it only earns you curses from the next developer. And sometimes you simply have no choice: the deadline is right ahead, and all you can say is

screw it!

Besides the situations where you have no choice, sometimes you can take on as much technical debt as you like without worrying about it. For instance, if you are writing a run-once, throw-away script, then go ahead and borrow all you want.

Summary

In software development, it is important to understand technical debt. There is no easy or accurate way to measure it, but you can tell from experience. My debt analogy here may not be 100% precise, but it surely gives you a feel for it. To build successful software, keep the idea of technical debt in mind, and control it rather than letting it control you.

Keep a readable Git history

Readability of code is important. There is a well-known saying:

Code is read more than it is written

This is so true - even while you are writing code, you need to read the current code base again and again. Back in the ancient era of programming, people were still using a forbidden black magic - the goto statement. Powered by that black magic, a demon called spaghetti code appeared, and it looks like this

Spaghetti.jpg

(From http://en.wikipedia.org/wiki/File:Spaghetti.jpg under Creative Commons 2.0 license)

Spaghetti code is code that is hard to read and maintain, and it was killing countless developers. Then brave developers invented a new form of magic - structured programming - and eventually the demon was defeated and the black magic has been forbidden ever since.

This story tells us readability is important - but what about the readability of Git commit history? There are times when we need to look into the development history: what is the corresponding ticket for these commits? Who is the author? When were the changes introduced? Although there are tools to help, sometimes you still need to read the history itself, and it can be unreadable and hard to understand. I bet you have seen much worse history than this one

1ced9f18-ffeb-11e2-854a-21d3b661b672.png

It makes reading painful. Although you read development history less often than code, it is still very helpful to have a clean, readable, linear history. Today, I am going to share some experience on keeping the history readable.

Use SourceTree

It is never pleasant to use a command line tool when there is a nice GUI alternative. I hate ASCII git history graphs; they are just ugly. Luckily, we have an awesome free GUI tool to use - SourceTree.

eae918a6-ffec-11e2-88a4-d6cd5858afd6.png

Always create a new branch for a ticket

When you are working on a ticket or issue, you should always create a branch for it.

add07604-fff0-11e2-92b6-1ac3af04ae5f.png

You should try to keep the commits in the branch focused on resolving that ticket. It is okay to include some typo corrections or other minor changes, but if you put unrelated major changes into the branch, other developers cannot easily tell that there are off-topic changes in it. By keeping each branch to a single purpose, you get the following benefits:

  • It is easier to understand what the branch is for
  • It is easier to revert the changes introduced by the branch

So you are working on a new branch, and you can commit to it

2ef071bc-fff1-11e2-94f7-e9e99d61f194.png

After a while, you have several commits and they are ready to merge

b9358abe-fff2-11e2-8bc0-45c240afd901.png

We want to keep the branch visible in the history, so remember to use a non-fast-forward merge: check the Do not fast-forward when merging, always create commit option

7d245a0a-fff2-11e2-8be5-9289e6d86d8d.png

It's time to merge. First, right click on the master branch and click Checkout; you should be on master now. Then right click the new-feature branch and click Merge.

73d9eafe-fff3-11e2-8086-799dd1e77edb.png

Remember to check Commit merged changes immediately to make a new commit directly.

Whoa, here we are - a beautiful linear history that still keeps the branch information.

068000bc-fff6-11e2-9e81-641ba0e09083.png

Always rebase before merging

Now you are writing the next awesome feature - foobar-2000!

c67a7e4c-fff6-11e2-8d70-9bf3ad080a51.png

Things go well; however, in the meantime, a new branch from another guy's repo gets merged. Oh my god, foobar-3000! Awesome!

f6953c1a-fff7-11e2-94c3-2676b2064b0a.png

Okay, let's see what it looks like to merge it directly

3b3b1592-fff8-11e2-889c-858a6f3c65c0.png

Ugly. Let's try something better - rebase. First, right click on foobar-2000 and click Checkout. Then right click on master and click Rebase

0e721cbc-fff9-11e2-99c9-4b248a38a6ea.png

This is better! And we can merge it like before

72b1754c-fff9-11e2-8754-a50629b3bd95.png

Rebase and force push

As usual, you keep working on this nice and beautiful linear history. However, you won't feel safe leaving your commits only on your local machine, will you? We always push the working branch to GitHub to keep it safe and to get reviews and feedback from others

01230738-fffe-11e2-8a02-9ea6e80c50a7.png

Yes, again - you may hate this - another branch has been merged into master.

d6860ede-fffe-11e2-9406-05781418eec3.png

Okay, you say, this is not a big deal, I can rebase and merge as usual. So you rebase

12799f14-ffff-11e2-9a47-e421f30877c9.png

Well, the branch is still under development; you want to push it to your fork, but not merge it yet. Then you push, and oops!

44715750-ffff-11e2-8d0e-417ae238e6e6.png

So what just happened?

As you can see, there is a branch origin/foobar-bugfix - that is the head of the branch in your origin remote, which is your GitHub fork. When you push your rebased local foobar-bugfix to the fork, the remote branch has to be overwritten. Overwriting a branch head in a Git repo is a little bit dangerous, so Git doesn't allow you to do it by default.

Again, it is risky, so you need to be sure about what you are doing (the overwritten commits are still stored in the repo, but without a ref pointing at them you cannot find them easily; you would need some low-level operations to get them back). In this case, we just want to rebase our commits onto master and push to our own fork, which is not a big problem in most cases. It appears SourceTree doesn't support force pushing, so you need to click the Terminal button and type

git push origin foobar-bugfix -f

This forces git to push your local branch and overwrite the remote one. Let's see

$ git push origin foobar-bugfix -f
Counting objects: 11, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (9/9), 809 bytes, done.
Total 9 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (9/9), done.
To /Users/VictorLin/foobar_fork
 + 178c9a4...cc5d760 foobar-bugfix -> foobar-bugfix (forced update)

Here we are

c32d8b08-0000-11e3-9350-cd8bc9cb7ca4.png

(Tip: you can click Repository and Refresh Remote Status to update the remote status shown in the UI)

Notice: when you are the only one working on the branch, it is fine to do a force push; otherwise, be careful. For more details, please refer to
http://git-scm.com/book/ch3-6.html#The-Perils-of-Rebasing

Always rebase your working branch when there are new commits

As you know, conflicts can happen when you merge or rebase. The more new commits pile up on the master branch, the more likely you are to end up with a ton of conflicts. So it is good practice to rebase your working branch onto master whenever there are new commits on it.

4727a568-0003-11e3-994d-fe37950d731b.png

However, sometimes you have work in progress on the branch that is not committed yet, and you don't want to commit something half-finished. But when you rebase, Git won't allow you to have modified files in the working copy. In this case, you can use Stash: click Repository and Stash Changes.

f1415576-0003-11e3-9654-16fb89934ea4.png

Then you can see your stash appear in the sidebar

260587e6-0004-11e3-99f2-32412bf3eae6.png

After you finish the rebase, you can right click on the stash and click Apply Stash, and there you go - your saved changes are back in the working copy.

Again, happy ending :D

0ca5aa82-0005-11e3-9678-ee0b9f7b4354.png

Use interactive rebase to clean up dirty commits

People make mistakes. Sometimes there are small commits just for formatting or fixing typos, and these commits sit on top of your own newly created commits.

fdd95d30-0006-11e3-8ba1-bccb8edc39f1.png

In this case, you might wonder: wouldn't it be nice to use some black magic to make those silly mistakes disappear? Well, yes, there is such magic. You can use interactive rebase to squash commits into the previous one. Now, while on the awesome branch, right click on the master branch, then click Rebase children of xxx interactively. You will see an interface like this

8b2c2532-0007-11e3-9da6-cb3b965a02eb.png

Select the mistake-fixing commits and click Squash with previous. You will see the commits being put together, and you can click Edit message to modify the commit message of the squashed commit.

f4c735d6-0007-11e3-84d7-b385c85671ad.png

Then press OK and look at this!

233957d2-0008-11e3-889d-7b7148e46f65.png

Just like they say in Mad Men: nothing happened!

Interactive rebase is actually even more powerful: you can reorder commits, edit commit messages, and delete specific commits. But be careful - as Spider-Man was told,

With great power comes great responsibility

It is a kind of history rewriting. It is fine to use it on a branch only you are working on, but you should not use it on a shared branch unless you know exactly what you are doing.

The benefits of readable history

That's it - the history is linear, and it's another wonderful day.

35d7c65c-0009-11e3-9dc1-12f77d3dc843.png

A readable history doesn't just look beautiful; it gives you an easy-to-understand development history. All team members on the project can follow the development progress easily, and when something goes wrong, it is also easier to track down the problem - especially when you need to fix it as soon as possible.

Fall in love with continuous testing and integration - Travis-CI

You know, testing is important for software development. With good continuous integration and testing in place, you can be confident that your software has a certain level of quality. That doesn't mean your software is perfect, but when things break, you can catch and fix them. Jenkins is a pretty awesome and easy-to-use open source continuous integration tool, but for my own hobby open source projects I just don't want to rent a server and run Jenkins. So I kept wondering: wouldn't it be nice to have something like CI as a service? I could just put my code there, and it would do the rest for me.

Meet Travis-CI

Recently, I met an awesome service which is exactly what I wanted - Travis-CI. It has GitHub integration, so all you have to do is grant some GitHub permissions to Travis-CI and write a .travis.yml configuration file like this:

language: python
python:
  - "2.7"
env:
  - PYRAMID_VERSION="pyramid>=1.2,<1.3"
  - PYRAMID_VERSION="pyramid>=1.3,<1.4"
  - PYRAMID_VERSION="pyramid>=1.4,<1.5"
# command to install dependencies

install: 
  - "pip install flake8 --use-mirrors"
  - "pip install $PYRAMID_VERSION"
  - "pip install -r requirements.txt --use-mirrors"
  - "pip install -r test_requirements.txt --use-mirrors"
  - "python setup.py develop"
before_script: "flake8 pyramid_genshi --ignore=E501,W291,W293"
# command to run tests

script: "python setup.py nosetests"

From https://github.com/victorlin/pyramid_genshi/blob/master/.travis.yml

Then it works like a charm. You can see the build results here: https://travis-ci.org/victorlin/pyramid_genshi

The best part is that it is totally free for testing open source projects. I really love it!

Testing Chef code on it - a dream within a dream

Currently, I am working on an open source Chef project for deployment. I thought it would be neat to set up Travis-CI for testing my Chef code, so I tried to run Vagrant with VirtualBox on it. However, it turns out the Travis-CI testing environment is already running in a virtual machine based on OpenVZ, which is actually a container rather than hardware virtualization, and I could not find a way to install VirtualBox on Travis-CI. Sadly, this is not Inception - I cannot have a dream within a dream.

Inception-Wallpaper-inception-2010-12396931-1440-900 (1).jpg

Fine, I changed my mind then: it is already a virtual environment, so why not just run my Chef code against the Travis-CI environment itself?

The missing feature - interactive debugging

Okay, so it appears to be a better idea to run the Chef code against the Travis-CI instance instead of dreaming within a dream. Nevertheless, it is still a pain to make my Chef code work on Travis-CI. You never get it right the first time, and you always have to push a commit to kick off a build, which results in a painful trial-and-error loop that looks like this

travis-ci-failures.PNG

In the process, you will see error output like this

  • The PostgreSQL server failed to start. Please check the log output. ...fail!

Okay... check the log output? But how? I could add a "cat /path/to/log_file" to the .travis.yml and push a commit to make it run again, but that would only be another painful wait. I tried to reproduce the testing environment with Vagrant on my own machine, but I could only find some outdated information, and some important Vagrant boxes are missing.

As Niko says in GTA IV,

This no touching rule is killing me!

This no-touching rule on Travis-CI is also killing me. It would be nice to have a chance to interact with the CI environment after something goes wrong. Fortunately, I contacted support, and they said they are working on it.

Green dots make you happy

Once I had set up Travis-CI for one project and realized how easy it is, I just couldn't wait to set it up for most of my open source projects. When there are red dots, you really want to erase them all. But when it is all green, like this

Traivs-CI-repo-list

It's really pleasant to see an all-green list in Travis-CI. If you are also an open source lover, just try it now; you will love it :)

Zero-downtime Internet process migration on the same host

While running my website, something has been troubling me: when there is a bug in the production servers, I need to modify the code and restart them. Sounds fine, right? Most web servers are stateless, so it is not a big deal to restart them whenever you want - but that is not true for me, because mine are realtime audio streaming servers. When you restart a realtime streaming server, the audience connected to it gets interrupted. Here is a diagram that shows the problem:

server_diagram.png

You can see there are some gaps in the plot; those are caused by server restarts. Of course, that is definitely a bad experience for users. So I have been thinking about how to solve this problem recently. Before we go into the design, let's look at the reasons for restarting a server first.

  • To deploy a new version of the program
  • To fix bugs
  • The process is using too much memory
  • To reload the environment, for example ulimit -n (the limit on the number of open file descriptors in a unix-like environment)
  • To migrate from host A to host B

For simply deploying a new version of the program, we can use Python's reload function to reload modules. But there are problems: reload only re-runs the module, and already-created instances are still there (if they were copied into some namespace), so it might work only if the change is minor. Reloading also can't solve the memory usage problem or the process environment problem. And there is the final reason, migrating the service from host A to host B - it is hard to avoid downtime for that kind of migration, so here we only focus on migration within the same host.
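
As an aside, here is a hypothetical sketch of the reload limitation mentioned above (my_module and Foo are made-up names):

# Hypothetical sketch of the reload() limitation (Python 2, where reload is
# a builtin): my_module and Foo are made-up names.
import my_module

obj = my_module.Foo()        # an instance of the "old" Foo
reload(my_module)            # re-executes the module, rebinding my_module.Foo

# obj still references the old class object, so it is not an instance of
# the newly created my_module.Foo
print isinstance(obj, my_module.Foo)   # False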

The solution

The biggest challenge is: how do you migrate the existing connections? I did some research and came up with an idea - create a new process, transfer the connections (socket file descriptors) to the new process, and shut the old one down. The following diagrams illustrate the solution.

mig_011.png

The master is a process in charge of managing the migration and receiving commands, and process A is the one running the service.

mig_02.png

Before we perform the migration, the master spawns process B and waits for it to say "I'm ready".

mig_03.png

When process B says "Hey! I'm ready", the master tells process A to send the connection state and socket descriptor to process B. Process B receives them and takes over the responsibility of running the service.

mig_04.png

Finally, once process B has taken over the service, the master tells process A "You are done", and process A shuts itself down.

That's it - the service is migrated from one process to the other without any downtime.

The problem – socket transfer

The idea sounds good, right? But we still have a technical problem to solve: how do you transfer a socket (file descriptor) from one process to another? I did some study and eventually found solutions.

Child process

On most unix-like OSes, child processes inherit file descriptors from their parent. Of course, we can use this feature to migrate our service, but it has a limitation: you can only transfer file descriptors from a parent to its child process.
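
For example, here is a minimal sketch (not from the original demo) of handing a listening socket over to a child process with fork:

import os
import socket

# Minimal sketch: a child process created by fork() inherits the parent's
# file descriptors, including this listening socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 5566))
s.listen(1)

pid = os.fork()
if pid == 0:
    # child: the inherited listening socket is fully usable here
    conn, addr = s.accept()
    conn.send('Hello, process %d is serving' % os.getpid())
    conn.close()
else:
    # parent: close its copy of the socket and let the child keep serving
    s.close()
    os.waitpid(pid, 0)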

Sendmsg

Another way to achieve the same goal is to use sendmsg over a unix domain socket to send the file descriptors. With sendmsg, you can transfer file descriptors to almost any process you like, which is much more flexible.

A simple demonstration

To simplify the example, we only implement process A and process B here; two processes are quite enough to demonstrate the migration. Before we go into the details, there is another problem to solve: sendmsg is not a standard function in Python. Fortunately, there is a third-party package, sendmsg, that provides it. To install sendmsg, just type

easy_install sendmsg

And there you are. Okay, the following are the two programs.

a.py

import os
import socket
import sendmsg
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 5566))
s.listen(1)
conn, addr = s.accept()
conn.send('Hello, process %d is serving' % os.getpid())
print 'Accept inet connection', conn
us = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
us.bind('mig.sock')
us.listen(1)
uconn, addr = us.accept()
print 'Accept unix connection', uconn
payload = struct.pack('i', conn.fileno())
sendmsg.sendmsg(
    uconn.fileno(), '', 0, (socket.SOL_SOCKET, sendmsg.SCM_RIGHTS, payload))
print 'Sent socket', conn.fileno()
print 'Done.'

b.py

import os
import socket
import sendmsg
import struct

us = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
us.connect('mig.sock')
print 'Make unix connection', us

result = sendmsg.recvmsg(us.fileno())
identifier, flags, [(level, type, data)] = result
print identifier, flags, [(level, type, data)]
fd = struct.unpack('i', data)[0]
print 'Get fd', fd

conn = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
os.close(fd)
conn.send('Hello, process %d is serving' % os.getpid())
raw_input()

a.py accepts an inet connection and opens a unix domain socket, then waits for b.py to take over the service. Then we run b.py: it connects to a.py, receives the fd of the socket, takes it over, and continues serving.

The result

螢幕快照-2011-02-04-下午4.07.07.png

As the result shows, there is no downtime while the Internet service migrates between two processes on the same host.

This trick can be very useful for Internet programs that need to keep connections alive. You can even migrate connections from a Python program to a C/C++ program, or vice versa. Also, to keep memory usage low, you can periodically migrate the service to a fresh process running the same program.

However, although it works in our demo program, it would be very difficult to implement a migration mechanism like this for a real-life, complex server program. You need to dump the connection state completely in one process and restore it in the other, and because the internal state of connections can be very complex, that could be impractical.

DreamOnline

I started an MMORPG project named DreamOnline ten years ago when I was a kid, but I never found the time to finish it. Recently, I restarted the project, and I hope I can finish it this time. :)

Screenshot:

client_screenshot_01.png

You can visit DreamOnline.tw to download the very basic demo.

Treeview v0.1 is released

To make it easy to manage folders and items in web-based applications, I developed a new open source project, Treeview.

螢幕快照-2013-02-18-上午11.25.56.png

It makes folders and items in a web-based tree draggable. You can see the online demo. It's under the MIT license; feel free to use it or fork it.

Good logging practice in Python

In the real world, logging is important. When you transfer money, there are transfer records. When an airplane is flying, the black box (flight data recorder) records everything. If something goes wrong, people can read the log and have a chance to figure out what happened. Likewise, logging is important for developing, debugging and running systems. When a program crashes and there is no logging record, you have little chance of understanding what happened. For example, logging is essential when you are writing a server. The following screenshot shows the log file of an EZComet.com server.

ezcomet_logging.png

Without the log, I could hardly know what went wrong when a service went down. And it's not just servers; logging is also important for desktop GUI applications. For instance, when your program crashes on a customer's PC, you can ask them to send you the log files, and you may be able to figure out why. Trust me, you will never know what kind of strange issues exist in different PC environments. I once received an error log report like this

2011-08-22 17:52:54,828 - root - ERROR - [Errno 10104] getaddrinfo failed
Traceback (most recent call last):
  File "<string>", line 124, in main
  File "<string>", line 20, in __init__
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/wx._core", line 7978, in __init__
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/wx._core", line 7552, in _BootstrapApp
  File "<string>", line 84, in OnInit
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/twisted.internet.wxreactor", line 175, in install
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/twisted.internet._threadedselect", line 106, in __init__
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/twisted.internet.base", line 488, in __init__
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/twisted.internet.posixbase", line 266, in installWaker
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/twisted.internet.posixbase", line 74, in __init__
  File "h:\workspace\project\build\pyi.win32\mrdj\outPYZ1.pyz/socket", line 224, in meth
gaierror: [Errno 10104] getaddrinfo failed

And I eventually figured out that the customer's PC was infected by a virus which made calls to gethostname fail. See, how could you even know this if there were no log to read?

print is not a good idea

Although logging is important, not all developers know how to do it correctly. I have seen developers insert print statements while developing and remove them when they are finished. It may look like this

print 'Start reading database'
records = model.read_records()
print '# records', records
print 'Updating record ...'
model.update_records(records)
print 'done'

It works when the program is a simple script, but for complex systems you'd better not use this approach. First of all, you cannot keep only the important messages in the log; you may see lots of garbage messages and not be able to find anything useful. You also cannot control those print statements without modifying the code, and you may forget to remove the unused ones. Finally, all printed messages go to stdout, which is bad when you have actual data to output to stdout. Of course you can print messages to stderr instead, but still, it is not good practice to use print for logging.

Use the Python standard logging module

So how do you do logging correctly? It's easy: use the standard Python logging module. Thanks to the Python community, logging is a standard module that is well designed to be easy to use and very flexible. You can use the logging system like this

import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.info('Start reading database')
# read database here

records = {'john': 55, 'tom': 66}
logger.debug('Records: %s', records)
logger.info('Updating records ...')
# update records here

logger.info('Finish updating records')

You can run it and see

INFO:__main__:Start reading database
INFO:__main__:Updating records ...
INFO:__main__:Finish updating records

What's the difference from the "print" approach, you ask? Well, of course there are benefits:

  • You can control the message level and filter out unimportant messages
  • You can decide where and how to output messages later

There are different importance levels you can use: debug, info, warning, error and critical. By giving a different level to the logger or handler, you can write only error messages to a specific log file, or record debug details only when debugging. Let's change the logger level to DEBUG and see the output again

logging.basicConfig(level=logging.DEBUG)

The output:

INFO:__main__:Start reading database
DEBUG:__main__:Records: {'john': 55, 'tom': 66}
INFO:__main__:Updating records ...
INFO:__main__:Finish updating records

As you can see, after we adjust the logger level to DEBUG, debug records appear in the output. You can also decide how these messages are processed. For example, you can use a FileHandler to write records to a file:

import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# create a file handler

handler = logging.FileHandler('hello.log')
handler.setLevel(logging.INFO)

# create a logging format

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)

# add the handlers to the logger

logger.addHandler(handler)

logger.info('Hello baby')

There are many different handlers; you can also send records to your mailbox or even to a remote server, and you can write your own custom logging handler. I'm not going to cover the details here; please refer to the official documents: Basic Tutorial, Advanced Tutorial and Logging Cookbook.
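
For instance, here is a minimal sketch (not from the original article; the SMTP host and addresses are placeholders) of emailing error records with the standard SMTPHandler:

import logging
import logging.handlers

# Minimal sketch: email ERROR records with the standard SMTPHandler
# (smtp.example.com and the addresses below are placeholders).
logger = logging.getLogger(__name__)

mail_handler = logging.handlers.SMTPHandler(
    mailhost='smtp.example.com',
    fromaddr='server@example.com',
    toaddrs=['ops@example.com'],
    subject='Application error',
)
mail_handler.setLevel(logging.ERROR)
logger.addHandler(mail_handler)

logger.error('Something went wrong')  # this record would be emailed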

Write logging records everywhere at the proper level

With the flexibility of the logging module, you can write logging records everywhere at the proper level and configure them later. What is the proper level to use, you may ask? Here I share my experience.

In most cases, you don't want to read too many details in the log file. Therefore, the debug level should only be enabled when you are debugging. I use the debug level only for detailed debugging information, especially when the data is big or the frequency is high, such as records of an algorithm's internal state changes inside a for loop.

def complex_algorithm(items):
    for i, item in enumerate(items):
        # do some complex algorithm computation

        logger.debug('%s iteration, item=%s', i, item)

I use the info level for routine events, for example, handling a request or a change in server state.

def handle_request(request):
    logger.info('Handling request %s', request)
    # handle request here

    result = 'result'
    logger.info('Return result: %s', result)

def start_service():
    logger.info('Starting service at port %s ...', port)
    service.start()
    logger.info('Service is started')

I use the warning level when something is important but not an error, for example, when a user attempts to log in with a wrong password or when the connection is slow.

def authenticate(user_name, password, ip_address):
    if user_name != USER_NAME or password != PASSWORD:
        logger.warn('Login attempt to %s from IP %s', user_name, ip_address)
        return False
    # do authentication here

I use the error level when something is wrong, for example, when an exception is thrown, an IO operation fails, or there is a connectivity issue.

def get_user_by_id(user_id):
    user = db.read_user(user_id)
    if user is None:
        logger.error('Cannot find user with user_id=%s', user_id)
        return user
    return user

I seldom use critical; you can use it when something really bad happens, for example, running out of memory, the disk being full, or a nuclear meltdown (I hope that never happens :S).

Use __name__ as the logger name

You don't have to set the logger name to __name__, but doing so brings us some benefits. In Python, the variable __name__ is the current module name. For example, if you call logging.getLogger(__name__) in the module "foo.bar.my_module", it is the same as logging.getLogger("foo.bar.my_module"). When you need to configure the logger, you can configure "foo", and then all modules in the "foo" package share the same configuration. You can also tell which module a message came from when reading the log.
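
As a quick sketch of how the hierarchy works (the package names here are made up):

import logging

# Minimal sketch with made-up package names: logger names form a dotted
# hierarchy, so the level set on "foo" is inherited by "foo.bar.my_module".
logging.basicConfig()
logging.getLogger('foo').setLevel(logging.WARNING)

child = logging.getLogger('foo.bar.my_module')
child.info('filtered out by the level set on "foo"')
child.warning('this one is shown')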

Capture exceptions and record them with traceback

It is always good practice to record when something goes wrong, but it won't be helpful if there is no traceback. You should capture exceptions and record them with the traceback. Here is an example:

try:
    open('/path/to/does/not/exist', 'rb')
except (SystemExit, KeyboardInterrupt):
    raise
except Exception, e:
    logger.error('Failed to open file', exc_info=True)

By calling logger methods with the exc_info=True parameter, the traceback is dumped to the logger. You can see the result:

ERROR:__main__:Failed to open file
Traceback (most recent call last):
  File "example.py", line 6, in <module>
    open('/path/to/does/not/exist', 'rb')
IOError: [Errno 2] No such file or directory: '/path/to/does/not/exist'

You can also call logger.exception(msg, *args); it is equivalent to calling logger.error(msg, exc_info=True, *args).

Do not get a logger at the module level unless disable_existing_loggers is False

You can see lots of examples out there (including in this article; I did it just to keep the examples short) that get the logger at the module level. It looks harmless, but there is a pitfall: the Python logging module disables all loggers created before the configuration is loaded from a file. If you get the logger at the module level like this

my_module.py

import logging

logger = logging.getLogger(__name__)

def foo():
    logger.info('Hi, foo')

class Bar(object):
    def bar(self):
        logger.info('Hi, bar')

main.py

import logging
import logging.config

# load my module

import my_module

# load the logging configuration

logging.config.fileConfig('logging.ini')

my_module.foo()
bar = my_module.Bar()
bar.bar()

logging.ini

[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=

You expect to see the records appear in the log, but you will see nothing. Why? Because you created the logger at the module level, and you imported the module before loading the logging configuration from the file. logging.config.fileConfig and logging.config.dictConfig disable existing loggers by default, so the settings in the file are not applied to your logger. It's better to get the logger when you need it - creating or getting a logger is cheap. You can write the code like this:

import logging

def foo():
    logger = logging.getLogger(__name__)
    logger.info('Hi, foo')

class Bar(object):
    def __init__(self, logger=None):
        self.logger = logger or logging.getLogger(__name__)

    def bar(self):
        self.logger.info('Hi, bar')

This way, the loggers are created after you load the configuration, and the settings are applied correctly.

Since Python 2.7, a new argument named "disable_existing_loggers" has been added to fileConfig (and as a key in the dictConfig schema); by setting it to False, the problem mentioned above can be solved. For example:

import logging
import logging.config

logger = logging.getLogger(__name__)

# load config from file 

# logging.config.fileConfig('logging.ini', disable_existing_loggers=False)

# or, for dictConfig

logging.config.dictConfig({
    'version': 1,              
    'disable_existing_loggers': False,  # this fixes the problem

    'formatters': {
        'standard': {
            'format': '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
        },
    },
    'handlers': {
        'default': {
            'level':'INFO',    
            'class':'logging.StreamHandler',
        },  
    },
    'loggers': {
        '': {                  
            'handlers': ['default'],        
            'level': 'INFO',  
            'propagate': True  
        }
    }
})

logger.info('It works!')

Use JSON or YAML logging configuration

You can configure your logging system in Python code, but that is not flexible; it's better to use a logging configuration file. Since Python 2.7, you can load the logging configuration from a dict, which means you can load it from a JSON or YAML file. You can still use the old .ini-style logging configuration, but it is difficult to read and write. Here is a logging configuration example in JSON and in YAML

logging.json

{
    "version": 1,
    "disable_existing_loggers": false,
    "formatters": {
        "simple": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        }
    },

    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simple",
            "stream": "ext://sys.stdout"
        },

        "info_file_handler": {
            "class": "logging.handlers.RotatingFileHandler",
            "level": "INFO",
            "formatter": "simple",
            "filename": "info.log",
            "maxBytes": 10485760,
            "backupCount": 20,
            "encoding": "utf8"
        },

        "error_file_handler": {
            "class": "logging.handlers.RotatingFileHandler",
            "level": "ERROR",
            "formatter": "simple",
            "filename": "errors.log",
            "maxBytes": 10485760,
            "backupCount": 20,
            "encoding": "utf8"
        }
    },

    "loggers": {
        "my_module": {
            "level": "ERROR",
            "handlers": ["console"],
            "propagate": "no"
        }
    },

    "root": {
        "level": "INFO",
        "handlers": ["console", "info_file_handler", "error_file_handler"]
    }
}

logging.yaml

---
version: 1
disable_existing_loggers: False
formatters:
    simple:
        format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

handlers:
    console:
        class: logging.StreamHandler
        level: DEBUG
        formatter: simple
        stream: ext://sys.stdout

    info_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: info.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8

    error_file_handler:
        class: logging.handlers.RotatingFileHandler
        level: ERROR
        formatter: simple
        filename: errors.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8

loggers:
    my_module:
        level: ERROR
        handlers: [console]
        propagate: no

root:
    level: INFO
    handlers: [console, info_file_handler, error_file_handler]
...

The following recipe shows how to read the logging configuration from a JSON file:

import os
import json
import logging.config

def setup_logging(
    default_path='logging.json', 
    default_level=logging.INFO,
    env_key='LOG_CFG'
):
    """Setup logging configuration

    """
    path = default_path
    value = os.getenv(env_key, None)
    if value:
        path = value
    if os.path.exists(path):
        with open(path, 'rt') as f:
            config = json.load(f)
        logging.config.dictConfig(config)
    else:
        logging.basicConfig(level=default_level)

One advantage of using the JSON configuration is that json is in the standard library, so you don't need to install anything. Personally, though, I prefer YAML: it's very clear to read and easy to write. You can load the YAML configuration with the following recipe

import os
import logging.config

import yaml

def setup_logging(
    default_path='logging.yaml', 
    default_level=logging.INFO,
    env_key='LOG_CFG'
):
    """Setup logging configuration

    """
    path = default_path
    value = os.getenv(env_key, None)
    if value:
        path = value
    if os.path.exists(path):
        with open(path, 'rt') as f:
            config = yaml.safe_load(f.read())
        logging.config.dictConfig(config)
    else:
        logging.basicConfig(level=default_level)

Now, to set up logging, you can call setup_logging when starting your program. It reads logging.json or logging.yaml by default. You can also set the LOG_CFG environment variable to load the logging configuration from a specific path. For example,

LOG_CFG=my_logging.json python my_server.py

or if you prefer YAML

LOG_CFG=my_logging.yaml python my_server.py

Use a rotating file handler

If you use FileHandler to write logs, the log file will grow over time and someday it will occupy all of your disk space. To avoid that, use RotatingFileHandler instead of FileHandler in a production environment.
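
For example, a minimal sketch (the file name, size and backup count are arbitrary choices, not from the original article):

import logging
import logging.handlers

# Minimal sketch: rotate hello.log when it reaches 10MB and keep at most
# 5 old files (hello.log.1 ... hello.log.5).
logger = logging.getLogger(__name__)
handler = logging.handlers.RotatingFileHandler(
    'hello.log', maxBytes=10 * 1024 * 1024, backupCount=5)
handler.setFormatter(logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)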

Set up a central log server when you have multiple servers

When you have multiple servers, you have multiple log files. You can set up a central log system to collect all the important messages (warnings and errors, in most cases). Then you can monitor it easily and notice when something is wrong in your system.
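
One simple way to implement the sending side is the standard SocketHandler; here is a minimal sketch (the host name is a placeholder, and the receiving end would be a small socket server like the one shown in the Logging Cookbook):

import logging
import logging.handlers

# Minimal sketch of the sending side: forward warning-and-above records to
# a central log server ("log-server.example.com" is a placeholder).
logger = logging.getLogger(__name__)

socket_handler = logging.handlers.SocketHandler(
    'log-server.example.com', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
socket_handler.setLevel(logging.WARNING)
logger.addHandler(socket_handler)

logger.warning('This record is sent to the central log server')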

Conclusions

I'm glad the Python logging library is so nicely designed, and the best part is that it is part of the standard library, so you don't have to pick one yourself. It is flexible: you can write your own handlers and filters, and there are also third-party handlers, such as the ZeroMQ logging handler provided by pyzmq, which lets you send logging messages over a zmq socket. If you didn't know how to use the logging system correctly, I hope this article was helpful. With good logging practice, you can find issues in your system more easily. It's a nice investment - don't you buy it? :D