Provision with Ansible from inside Docker

There are many deployment tools, such as Puppet, Chef and SaltStack; most of them are pull-based. That means when you deploy to a machine, the provisioning code is downloaded to the target machine and runs locally. Ansible is different: it is a push-based deployment tool, so instead of pulling code, it pushes SSH commands to the target machine. The push-based approach is great in many situations. For example, you don't need to install any Ansible runtime on the target machine before you can provision it. However, this approach also has shortcomings. Say you want to provision EC2 instances in an AWS auto-scaling group: you don't know when a new instance will be launched, and when one appears, it needs to be provisioned immediately. In this case, Ansible's push approach is not that useful, since you need to provision the target machine on demand.

There are a few ways to solve that problem, all of which boil down to running Ansible provisioning code in a pulling manner.

Ansible-pull

One obvious approach is to use ansible-pull, a command-line tool that clones your Ansible git repository and runs the playbooks locally. It works, but there are some drawbacks. The first is dependencies: to run ansible-pull on the target machine, you need to install the Ansible runtime there first, and if your playbook depends on a newer version of Ansible, you need a way to upgrade that runtime. Another problem is that the provisioning code is fetched via git or some other version control system, so it's hard to verify the integrity of those playbooks, and the code cannot be shipped as a single file.
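For context, a typical ansible-pull invocation looks something like this (the repository URL and playbook name here are just placeholders):

ansible-pull -U https://github.com/example/ansible-repo.git local.yml

It clones the repo on the target machine and runs the given playbook against localhost.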

Ansible Tower

Ansible Tower is the official commercial tool for managing and running Ansible. It provides an interesting feature, the so-called "phone home": when a new machine is launched, it makes an HTTP request to the Ansible Tower server, as if calling home to say

hey! I'm ready, please provision me

Then the server runs ansible-playbook against the machine. It works, but one problem we see is that for Ansible Tower to SSH into different machines and run sudo commands, you usually need to install your SSH private key on the Tower server and preinstall the corresponding public key on all the other machines. Allowing one machine to SSH into all the others makes me uncomfortable; it's like putting all your eggs in one basket. You could set a passphrase on the private key stored on the Tower server, but since your machines in the AWS auto-scaling group need to be provisioned at any time, you cannot really encrypt the key with a passphrase.
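For illustration, the phone-home call itself is conceptually just an HTTP request made from the newly launched instance, something along these lines (the URL, job template id and key below are made up):

curl --data "host_config_key=SECRET_KEY" \
    https://tower.example.com/api/v1/job_templates/42/callback/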

An interesting approach - Docker

With these requirements in mind

  • No runtime dependency issues
  • Provisioning code can be shipped as a single file
  • Provisioning code integrity can be verified (signed)

an interesting idea came to my mind: why don't I simply put the Ansible playbooks into a Docker image, ship the image to the target machine, then run the playbooks from inside a container and SSH against the host? With a Docker image, I don't need to worry about Ansible dependency issues; the Ansible runtime itself and any other necessary runtimes, such as boto, can all be installed into the image. And a Docker image can be shipped as a single file, which we can sign and verify on the target machine to ensure its integrity.
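As a sketch of the signing idea (file names are just examples), you could export the image to a single file and sign it with GPG:

# on the build machine: export the image and create a detached signature
sudo docker save ansible-examples > ansible-examples.tar
gpg --detach-sign ansible-examples.tar

# on the target machine: verify the signature, then load the image
gpg --verify ansible-examples.tar.sig ansible-examples.tar
sudo docker load < ansible-examples.tar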

A simple example

I wrote a simple project to demonstrate this idea; it can be found on GitHub. It's actually pretty simple. In the Dockerfile, we install the Ansible dependencies and the necessary roles, and copy our own site.yml into the image.

FROM phusion/baseimage:0.9.15

RUN apt-get update && \
    apt-get install -y python python-dev python-pip && \
    apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN pip install ansible
RUN ansible-galaxy install \
    Ansibles.hostname \
    Ansibles.apt \
    Ansibles.build-essential \
    Ansibles.perl \
    Ansibles.monit \
    ANXS.nginx
ADD site.yml /srv/ansible/site.yml

CMD ["/sbin/my_init"]

You can build the ansible image with

sudo docker build -t ansible-examples .

Before you run it, you need to create a hosts inventory file containing the private IP address of your host machine. Like this

10.0.2.15

Notice that since Ansible is executed inside the Docker container, localhost simply doesn't work; you need to specify an address that is reachable from the container's network. To allow SSH connections from the container, you also need a temporary SSH key pair: the public key installed on the host machine, and the private key available to the container.
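One minimal way to set up such a throwaway key pair (the file names here are just examples) is:

# generate a key pair with no passphrase
ssh-keygen -f insecure_private_key -N ''
# authorize it on the host machine
cat insecure_private_key.pub >> ~/.ssh/authorized_keys

With the key pair in place, here is pretty much the command you run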

sudo docker run -it \
    -v /vagrant/hosts:/tmp/hosts \
    -v /vagrant/insecure_private_key:/tmp/insecure_private_key \
    ansible-examples \
    /sbin/my_init --skip-startup-files --skip-runit -- \
    ansible-playbook /srv/ansible/site.yml \
    -i /tmp/hosts -u vagrant --private-key=/tmp/insecure_private_key

We map our hosts file into the container at /tmp/hosts and the SSH private key at /tmp/insecure_private_key, so we can reference them in the ansible-playbook arguments. That's it!

It's so powerful to combine Ansible and Docker

Combining Ansible and Docker is really powerful. As you can see, the software for provisioning machines is now packed as a Docker image, so you can run it anywhere. It's a solid unit: you can sign it, verify it, tag it, ship it, share it and test it. Everything is installed in the container, so you don't need to worry about missing plugins or roles on the target machine.

The only drawback I can think of is that you need to install Docker on the target machine before you can use this approach, but that's not much of a problem: Docker keeps getting more popular, and you can preinstall it in your AMI. The only thing I'm not happy with is Docker's image registry system; it's very slow to push or pull an image that has many layers and a big size. I actually have an idea about building a much better Docker registry; hopefully I'll find time to do it.

I am already using this approach to provision machines in our production environment, and it works like a charm so far. I am looking forward to seeing people use this technique to pack deployment code into Docker images. Imagine this:

sudo docker pull ansible-open-stack-swift
sudo docker run -it ansible-open-stack-swift

and boom! You have a fully functional Swift cluster in AWS EC2 now. Isn't that awesome?

Building docker images with ansible

Docker is something really hot recently. It lets you run your software in a Linux container easily. It's actually OS-level isolation rather than hardware or kernel simulation, so you don't pay much of a performance penalty but still get pretty nice virtual-machine-like features. I really like the analogy used by the Docker community: shipping software should be easier, and Docker serves just like the standard container in the shipping industry.

Building docker images is not hard, but ...

Although docker provides an easy way to deliver and run your software in a Linux container, there is still no obvious and easy way to build a docker image for a big project. To build a docker image for a large, complex project, you would probably need to

  • Clone your private software repo into the build folder
  • Ensure base images are built before your project image
  • Generate some files dynamically, such as the current git commit revision
  • Generate templates
  • Upload the image with your credentials

With a Dockerfile, you can only have static steps for building the image; it was obviously not designed for any of the tasks listed above. And since docker uses a layered file system, you probably don't want to put your GitHub credentials into the container and pull the repo inside it: layers work much like git commits, and once something is committed, it's hard to remove it from the history. So you definitely want to do these things outside the container and then put the pieces together.

My first solution - Crane

With these requirements in mind, I felt this was pretty similar to my experience with Omnibus - a tool for packing your software into a standalone deb package. So I built a simple tool in Python for building docker images, named Crane. It lets you define steps for building the image, and it also provides template generation with Jinja2.

The final solution - ansible

Crane was working fine, but I don't like to reinvent a wheel and maintain it when there is an obviously better solution. After playing with ansible for a while, I realized it is actually a much better fit for building docker images. So what is ansible, you may ask? Well, it's yet another deployment tool, like Puppet, Chef or SaltStack.

Wait, what? Why would you use a deployment tool to build docker images? It may sound odd at the very beginning, but ansible is not just yet another deployment tool. Its design is quite different from its predecessors: it uses SSH to push commands to target machines, while the other tools are pull-based. It also provides many modules for different operations, including creating instances in EC2 or other cloud computing providers. Most importantly, it can do orchestration easily.

Of course, it meets the requirements we mentioned before

  • Clone git repo? Check.
  • Build base image? Check.
  • Generate dynamic file? Check.
  • Generate templates? Check.
  • Upload images? Check.

Moreover, with ansible you can launch an EC2 instance, build the image inside it, and run a series of tests before you publish the image. Or you can simply build the image in your Vagrant machine or on your local machine. It makes building software extremely flexible: you can run the build process anywhere you want, as long as commands can be pushed over SSH. You can also provision the whole build environment, or even drive a fleet of machines in the cloud for building. Pretty awesome, isn't it?

Show me the code

Okay, enough talking; let's see the code. The tasks/main.yml looks like this

- assert:
    that: 
      - 'hellobaby_version != ""'

- name: install apt packages
  apt: "name='{{ item }}' state=present"
  with_items:
    - git
    - python-pip

- name: install docker-py
  pip: name=docker-py version=0.3.1

- name: create build directory
  file: >
    dest="{{ hellobaby_build_dir }}"
    state=directory

- name: clone hellobaby git repo
  git: >
    repo="{{ hellobaby_repo }}"
    dest="{{ hellobaby_build_dir }}/hellobaby"
    version="v{{ hellobaby_version }}"
  register: hellobaby_repo

- name: remove git deploy key
  file: dest=/tmp/github_deploy_key state=absent

- name: archive repo
  shell: >
    cd "{{ hellobaby_build_dir }}/{{ item }}" &&
    git archive -o ../{{ item }}.tar HEAD
  with_items:
    - hellobaby

- name: generate templates
  template: >
    src="{{ item.src }}"
    dest="{{ hellobaby_build_dir }}/{{ item.dest }}"
  with_items:
    - { src: "Dockerfile", dest: "Dockerfile" }
    - { src: "runapp.sh", dest: "runapp.sh" }

- name: build image
  docker_image: >
    name="{{ hellobaby_image_name }}"
    tag="{{ hellobaby_image_tag }}"
    path="{{ hellobaby_build_dir }}"
    state=build

- name: tag
  command: >
    docker tag -f
    {{ hellobaby_image_name }}:{{ hellobaby_image_tag }}
    {{ hellobaby_image_name }}:{{ hellobaby_extra_tag }}
  when: hellobaby_extra_tag != ''

and the playbook looks like this

---
- name: Build Hello baby app image
  hosts: all
  sudo: yes
  vars_prompt:
    - name: hellobaby_version
      prompt: "hellobaby_version"
      default: "1.0.0"
      private: no
    - name: hellobaby_iteration 
      prompt: "hellobaby_iteration"
      default: 1
      private: no
  roles:
    - Ansibles.apt
    - hellobaby_image

So, to build with Vagrant, you can run something like this

ansible-playbook \
  -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory \
  -u vagrant --private-key=~/.vagrant.d/insecure_private_key \
  playbooks/hellobaby.yml

You can find the complete example here - ansible-docker-demo.

A tool for deployment but also amazing for building software

Although ansible was not designed for building software, that doesn't necessarily mean you cannot do it, and surprisingly, it does very well at it. With its machine provisioning and orchestration capabilities, you can integrate building and deployment together easily. The build environment itself can be provisioned before building the software, and cloud computing resources can be leveraged. I feel there are actually lots more interesting things that can be done with ansible. Looking forward to seeing how people use it not just for deployment but also for building software :P

Auto post-commit PEP8 correction

It's always a hateful job to correct PEP8 warnings manually.

$ flake8 billy --ignore=E501,W293
billy/tests/integration/test_basic.py:401:45: W291 trailing whitespace
billy/models/processors/balanced_payments.py:116:44: W291 trailing whitespace
billy/models/processors/balanced_payments.py:133:30: W291 trailing whitespace

I bet you don't like this either. Today I couldn't take it anymore. I wondered: why should I do something a machine could do for me? So I looked for solutions on the Internet, and found a helpful article - Syntax+pep8 checking before committing in git. The basic idea is to add a pre-commit hook script to git that checks PEP8 style before each commit. With that in place, you can no longer commit code with PEP8 warnings; when you try, you see errors like this

$ git commit
billy/tests/integration/test_basic.py:401:45: W291 trailing whitespace
billy/models/processors/balanced_payments.py:116:44: W291 trailing whitespace
billy/models/processors/balanced_payments.py:133:30: W291 trailing whitespace
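A minimal pre-commit hook along those lines might look like this (a sketch, assuming flake8 is installed):

#!/bin/sh
# .git/hooks/pre-commit: abort the commit when flake8 reports warnings
exec flake8 billy --ignore=E501,W293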

This is great, but you still need to correct those style issues manually. I figured there must be something that can do this boring job for you, and yes, I found autopep8. I could use autopep8 in the pre-commit hook to correct PEP8 issues for me. However, I don't think it's a good idea to let a code-formatting tool modify your code silently while committing; I want to know what the tool changed. So here is another solution: I use a post-commit hook instead, and compare the latest commit with the previous one:

#!/bin/sh

FILES=$(git diff HEAD^ HEAD --name-only --diff-filter=ACM | grep -e '\.py$')

for f in $FILES
do
    # auto pep8 correction
    autopep8 --in-place --ignore=E309,E501,W293 $f
done

All you need to do is install autopep8

pip install autopep8

and put the post-commit script at .git/hooks/post-commit. This way, once I make a commit, it corrects PEP8 issues for me; I can review what was modified and make another PEP8 correction commit. With this, you can finally enjoy coding rather than wasting time removing trailing whitespace everywhere :D

Slow down, it's faster, let's build Python 2.8

I really love and enjoy programming in Python; it is one of my favorite programming languages, and I love recommending it to other developers and to people who are about to learn their very first programming language. However, there is always an awkward moment - when they ask whether to use Python 2 or Python 3. My answer would be:

Ah... you should learn Python 2 first; major libraries support Python 2 only. It will take about one or two years before these third-party resources catch up with Python 3.

Sadly, five years have passed since Python 3 was released, but still only 3% of Python developers are using version 3. If people asked me the same question now, I really wouldn't know how to answer. I've even started wondering whether this is the end of the programming language I like so much. These days, I've read a lot of articles discussing the Python 2 and Python 3 crisis:

About Python 3

Re: About Python 3

Google+ Post by Ian Bicking

the Gravity of Python 2

More About Unicode in Python 2 and 3

There are many different opinions. Some say you should kick people harder, so they will start using Python 3; some say you should make Python 3 even better, so people will want to use it. Me? I don't believe you can kick people hard enough to make them jump to the moon, and I also don't believe people can jump to the moon simply because you put treasure on it and say "come and get it if you can." Five years have passed; why are we still not on the moon?

I think the problem is simple: the goal is too far away, and the community is too eager. It reminds me of a story I heard as a child.

A young man carrying all kinds of goods was hurrying to a ship that was about to leave. He asked a local elder,

Can I make it in time?

The elder took a glance at him and said,

If you walk slowly, take it easy, you can make it.

The young man didn't take the advice; he ran to the port as fast as he could. Unfortunately, he fell on the road, all his goods scattered, and he didn't make it in time.

The Python community is a little like the eager young man in the story: urgent to build so many fancy, advanced features, but what's the point if nobody is using them? I think maybe it is time to slow down, so that we can go far.

Interestingly, there are calls for Python 2.8 in these discussions, and personally I believe Python 2.8 could be the missing bridge from Python 2 to Python 3. If necessary, maybe there should even be a Python 2.9 and a Python 2.10. I know it's in a developer's nature to discard old stuff and eagerly build and embrace awesome new things. But in the real software world, you don't get to awesome suddenly; you keep getting better gradually. So let's stop blaming anyone and build Python 2.8 :)

Technical debts

When you buy a house, you take out a mortgage. When you buy a car, you take out an auto loan. Maybe you are rich enough to clear all the debts at once, but either way, we all live with debt, more or less. In fact, software developers live with debt as well: so-called technical debt. I really like the debt analogy; technical debts are similar to debts in real life. Funny enough, most people know how real debts work, but technical debt is not that well understood by developers. It makes sense, though: people have lived with the idea of debt for maybe thousands of years, but computers have only a few decades of history.

Financial debts

It's really nice to have an accurate analogy, because it lets me explain things by borrowing mature concepts from the financial world. I am not an expert in finance, but when I saw a cash flow diagram, I realized it is exactly the right diagram for explaining technical debt. Let's see what the cash flow diagram looks like when you take out a loan from a bank.

As the name implies, it's all about the flow of cash. Green arrows above the axis are income; red arrows below the axis are costs. When you take out a loan, you have an immediate income at the beginning, but it doesn't come for free: you need to pay interest to the bank periodically, and eventually you need to repay the principal (not shown in the diagram). There are many variations; for example, you might rent the house out for recurring income, then sell it when the price rises and pay off the mortgage. But we are not teaching finance here; the point is that we can borrow this diagram to visualize technical debt.

Raise a technical debt

For software development, I see production as the income, and reduced productivity or extra time spent as the cost. So how do you raise technical debt, you may ask? Well, the fact is, you don't have to: there is plenty of built-in debt in the software development process.

Let's see an example. Say you are developing a system. At first there is only one feature in it, and every time you modify the code, you have to check that the feature still works correctly. The code you write is the income, and the time spent testing is the cost. Over time, you fix bugs and improve the code, and you always need to verify that the feature works. This is a very common built-in technical debt in software development: not writing automated tests is the debt itself. By skipping them you save some development time, an immediate income (productivity gain), but you pay interest every time you modify the code. The diagram would look like this

Things get even worse when new features are added to the system: you have more features to test each time you modify the code. The diagram for a growing system looks like this

You may say: hey, why not just ignore the tests? I believe the features won't break that easily. Well, that could be true; however, when you save time by not testing, the cost of testing turns into risk: your customers or end users end up testing those undetected issues for you.

Moreover, when the system grows to a really big scale, you may find yourself endlessly testing countless functions, with endless bugs to fix. That simply means the debt is too high: your productivity is all eaten by the interest, and you are never going to deliver the product unless you pay down the debt.

For the same case, imagine you have more and more features in the system, but you spent time on automated testing at the beginning. It's like repaying the debt up front, which keeps the interest under control. When you add a new feature, all you have to do is write the corresponding tests for it. This way, you can make some real progress.

technical_debt03.png

Sources of debt

Unlike real debts, there is no balance sheet telling you how much you owe; technical debt sometimes just isn't obvious. Nevertheless, we know some common sources of debt, such as

  • Bad coding style
  • Bad design
  • No automated testing
  • No documentation
  • No proper comments in code
  • No version control system
  • Dirty hacks

There may be other debts not listed above, but the point is that these debts all have a similar effect: you or your team members pay interest whenever you develop on top of that code. For example, with a badly written function, every developer who reads it needs extra time to understand it; that is the interest. Interestingly, although you always pay the interest, not all debts buy you a big boost in development; some debts can be avoided easily. Experienced developers can usually produce code with good style and design.

Debts are not all that bad

So far we've talked as if debts are all evil demons you should never deal with. But the truth is, taking on debt can sometimes be a good thing, because it buys you time. Even in the real financial world, debt can be a key to success: when a company has no debt at all, investors may actually see it as inefficient. So this is about trade-offs. Experienced developers not only produce code with less debt than inexperienced ones, they also know when to take on debt, and how much.

For example, say you are running a startup and you don't even know whether your product is going to work. At this moment, you can do some dirty hacks to make things work, and repay the technical debt later, after you survive.

Nice, I am not the one who pays the bill

People love a free lunch; it's really nice when you don't have to pay the bill, isn't it? Developers like it too. There are many situations where you are not the one paying interest on the technical debt. For example, you accept a software development contract and you are pretty sure that once you deliver the project, you are never going to see it again. In this case, many developers just don't care. They are not the ones paying the bill, so why should they?

This is a true moral hazard. Funny enough, it also happens in the real financial world: Wall Street bankers know taxpayers are going to pay the bill, so why should they care about risk? Unfortunately, unlike bankers, ignoring moral hazard won't earn you billions of dollars; it only earns you curses from the next developer. And sometimes you simply have no choice: the deadline is right ahead, and all you can say is

screw it!

Besides the situations where you have no choice, sometimes you can take on as much technical debt as you like without worrying about it. For instance, if you are writing a run-once, throw-away script, then by all means, raise all the debt you want.

Summary

For software development, it is important to understand technical debt. There is no easy or accurate way to measure it, but you can tell from experience. My debt analogy may not be 100% precise, but it surely gives you a feeling for the idea. To build successful software, keep technical debt in mind, and control it rather than letting it control you.

Keep a readable Git history

Readability of code is important. There is a well-known saying:

Code is read more than it is written

This is so true; even while you are writing code, you actually need to read the current code base again and again. Back in the ancient programming era, people were still using a forbidden black magic - the goto statement. Powered by that black magic there was a demon, spaghetti code, and it looks like this

Spaghetti.jpg

(From http://en.wikipedia.org/wiki/File:Spaghetti.jpg under Creative Commons 2.0 license)

Spaghetti code is code that is hard to read and maintain; it was killing countless developers. Then brave developers invented a new form of magic - structured programming - and eventually the demon was defeated and the black magic has been forbidden ever since.

This story tells us readability is important, but what about the readability of your Git history? There are times when we need to look into the development history: finding the corresponding ticket for a commit, who the author is, and when the changes were introduced. Although there are tools for this, sometimes you still need to read the history directly, and it can be simply unreadable and hard to understand. I bet you have seen much worse history than this one

1ced9f18-ffeb-11e2-854a-21d3b661b672.png

It makes reading painful. Although you read development history less often than code, it is still very helpful to have a clean, readable, linear history. Today, I am going to share some experience about keeping a readable history.

Use SourceTree

It is never pleasant to use a command-line tool when there is a nice GUI tool. I hate ASCII git history graphs; they are just ugly. Luckily, we have an awesome free GUI tool to use - SourceTree.

eae918a6-ffec-11e2-88a4-d6cd5858afd6.png

Always create a new branch for a ticket

When you are working on a ticket or issue, you should always create a branch for it.

add07604-fff0-11e2-92b6-1ac3af04ae5f.png

You should try to keep the commits in the branch focused on resolving that ticket. It is okay to include some typo corrections or minor changes, but if you put unrelated major changes into the branch, other developers cannot easily tell that it contains off-topic work. By keeping each branch to a single purpose, you get

  • Easier to understand what this branch is for
  • Easier to reverse changes introduced by this branch

So here you are, working on a new branch; now you can commit

2ef071bc-fff1-11e2-94f7-e9e99d61f194.png

After a while, you have several commits and they are good to merge

b9358abe-fff2-11e2-8bc0-45c240afd901.png

We want to keep the branch visible in history, so remember to use a non-fast-forward merge: check the "Do not fast-forward when merging, always create commit" option

7d245a0a-fff2-11e2-8be5-9289e6d86d8d.png

It's time to merge. First, right-click on the master branch and click Checkout; you should be on the master branch now. Then right-click the new-feature branch and click Merge.

73d9eafe-fff3-11e2-8086-799dd1e77edb.png

Remember to check "Commit merged changes immediately" to create the merge commit directly.

Whoa, here we are: a beautiful linear history that still keeps the branch information.

068000bc-fff6-11e2-9e81-641ba0e09083.png
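If you prefer the command line, the equivalent non-fast-forward merge is roughly:

git checkout master
git merge --no-ff new-feature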

Always rebase before merge

Now you are writing the next awesome feature - foobar-2000!

c67a7e4c-fff6-11e2-8d70-9bf3ad080a51.png

Things go well; however, in the meantime, a new branch gets merged from some other guy's repo. Oh my god, foobar 3000! Awesome!

f6953c1a-fff7-11e2-94c3-2676b2064b0a.png

Okay, let's see what it looks like if we merge it directly

3b3b1592-fff8-11e2-889c-858a6f3c65c0.png

Ugly. Let's try something better - rebase. First, right-click on foobar-2000 and click Checkout. Then right-click on master and click Rebase

0e721cbc-fff9-11e2-99c9-4b248a38a6ea.png

This is better! And we can merge it like before

72b1754c-fff9-11e2-8754-a50629b3bd95.png
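For reference, the command-line equivalent of this rebase step is roughly:

git checkout foobar-2000
git rebase master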

Rebase and force push

As usual, you keep working on this nice, beautiful linear history. However, you won't feel safe leaving your commits only on your local machine, will you? We always push our working branch to GitHub to keep it safe and to get reviews and feedback from others

01230738-fffe-11e2-8a02-9ea6e80c50a7.png

And yes, again, you may hate this: another branch has been merged into master.

d6860ede-fffe-11e2-9406-05781418eec3.png

Okay, you say, this is not a big deal; I can always rebase and merge as usual. So you rebase

12799f14-ffff-11e2-9a47-e421f30877c9.png

Well, the branch is still under development; you want to push it to your fork, not merge it yet. Then you push, and oops!

44715750-ffff-11e2-8d0e-417ae238e6e6.png

So what just happened?

As you can see, there is a branch origin/foobar-bugfix; that is the head of the branch in your origin remote, which is your GitHub fork repo. Because you rebased, pushing your local foobar-bugfix to the fork means the remote branch will be overwritten. It is a little bit dangerous to overwrite a branch head in a Git repo, so Git doesn't allow you to do this by default.

Again, it carries risk, so you need to be sure about what you are doing (the commits will still be stored in the repo, but without a ref pointing to them you cannot find them easily; you would need some low-level operations to get them back). In this case, we just want to rebase our commits on master and push them to our own fork, which is not a big problem in most cases. It appears SourceTree doesn't support --force push, so you need to click the Terminal button and type

git push origin foobar-bugfix -f

This will force git to push your local branch to overwrite the remote one. Let's see

$ git push origin foobar-bugfix -f
Counting objects: 11, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (9/9), 809 bytes, done.
Total 9 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (9/9), done.
To /Users/VictorLin/foobar_fork
 + 178c9a4...cc5d760 foobar-bugfix -> foobar-bugfix (forced update)

Here we are

c32d8b08-0000-11e3-9350-cd8bc9cb7ca4.png

(Tip: you can click Repository and Refresh Remote Status to update the git history status in the UI)

Notice: when you are the only one working on the branch, it is fine to do a force push; otherwise, be careful. For more details, please refer to
http://git-scm.com/book/ch3-6.html#The-Perils-of-Rebasing

Always rebase the current development branch when there are new commits

As you know, there can be conflicts when you merge or rebase, and the more new commits there are in the master branch, the more likely you are to face a ton of conflicts. So it is good practice to rebase your working branch on master whenever there are new commits there.

4727a568-0003-11e3-994d-fe37950d731b.png

However, sometimes you have work in the branch that is not committed yet, and you don't want to make a half-done commit just for this. But when you rebase, Git won't allow you to have changed files in the workspace. In this case, you can use Stash: click Repository and Stash Changes.

f1415576-0003-11e3-9654-16fb89934ea4.png

Then you can see your stash appear in the sidebar

260587e6-0004-11e3-99f2-32412bf3eae6.png

After you finish the rebase, right-click on the stash and click Apply Stash, and there you are: your saved changes are back in the working copy.
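For command-line users, the same stash-rebase-unstash dance is roughly:

git stash
git rebase master
git stash apply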

Again, happy ending :D

0ca5aa82-0005-11e3-9678-ee0b9f7b4354.png

Use interactive rebase to clean dirty commits

People make mistakes. Sometimes there are small commits just for formatting or fixing a typo, and these commits all sit on top of your own newly submitted commits.

fdd95d30-0006-11e3-8ba1-bccb8edc39f1.png

In this case, you might wonder: wouldn't it be nice to use some black magic to make your silly mistakes disappear? Well, yes, there is such magic: you can use interactive rebase to squash commits into the previous one. While on the awesome branch, right-click on the master branch, then click "Rebase children of xxx interactively". You will see an interface like this

8b2c2532-0007-11e3-9da6-cb3b965a02eb.png

Select the mistake-fixing commits and click "Squash with previous"; you will see the commits put together. You can click "Edit message" to modify the commit message of the squashed commit.

f4c735d6-0007-11e3-84d7-b385c85671ad.png

Then press OK and look at this!

233957d2-0008-11e3-889d-7b7148e46f65.png

Just like what he said in Mad Men, nothing happened!
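On the command line, the equivalent is an interactive rebase; the commit hashes below are made up for illustration:

git rebase -i master

# in the editor that opens, mark the fix-up commits as "squash":
#   pick   a1b2c3d add awesome feature
#   squash e4f5a6b fix typo
#   squash 9c8d7e6 fix formatting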

Interactive rebase is actually even more powerful: you can rearrange the order of commits, edit commit messages, and delete specific commits. But be careful; as Spider-Man told you

With great power comes great responsibility

It is a kind of history rewriting. It is fine to use it on a branch only you are working on, but you should not use it on a shared branch unless you know exactly what you are doing.

The benefits of readable history

That's it; the history is linear, and it's another wonderful day.

35d7c65c-0009-11e3-9dc1-12f77d3dc843.png

Readable history doesn't just look beautiful; it provides an easy-to-understand record of development. All team members on the project can follow the development progress easily. When something goes wrong, it is also easier to trace down the problem, especially when you need to fix it as soon as possible.

Fall in love with continuous testing and integration - Travis-CI

You know testing is important for software development. With good continuous integration and testing, you have confidence that your software has a certain quality. It doesn't mean your software is perfect, but when things break, you can catch them and fix them. Jenkins is a pretty awesome and easy-to-use open source continuous integration tool, but for my own hobby open source projects, I just don't want to rent a server and run Jenkins. So I kept wondering: wouldn't it be nice to have something like CI as a service? I could just put my code there, and it would do the rest for me.

Meet Travis-CI

Recently, I met an awesome service that really fits what I want - Travis-CI. It has GitHub integration, so all you have to do is grant some GitHub permissions to Travis-CI and write a .travis.yml configuration file like this:

language: python
python:
  - "2.7"
env:
  - PYRAMID_VERSION="pyramid>=1.2,<1.3"
  - PYRAMID_VERSION="pyramid>=1.3,<1.4"
  - PYRAMID_VERSION="pyramid>=1.4,<1.5"
# command to install dependencies

install: 
  - "pip install flake8 --use-mirrors"
  - "pip install $PYRAMID_VERSION"
  - "pip install -r requirements.txt --use-mirrors"
  - "pip install -r test_requirements.txt --use-mirrors"
  - "python setup.py develop"
before_script: "flake8 pyramid_genshi --ignore=E501,W291,W293"
# command to run tests

script: "python setup.py nosetests"

From https://github.com/victorlin/pyramid_genshi/blob/master/.travis.yml

And it works like a charm; you can see the build results here: https://travis-ci.org/victorlin/pyramid_genshi

The best part is, it is totally free for testing open source projects. I really love it!

Test Chef code on it, a dream within a dream

Currently, I am working on an open source Chef project for deployment. I thought it would be neat to set up Travis-CI to test my Chef code, so I tried to run Vagrant with VirtualBox on it. However, it turns out that the Travis-CI testing environment already runs inside a virtual machine based on OpenVZ, which is a container rather than hardware virtualization, and I could not find a way to install VirtualBox on it. Sadly, this is not Inception; I cannot have a dream within a dream.

Inception-Wallpaper-inception-2010-12396931-1440-900 (1).jpg

Fine, I changed my mind: it is already a virtual environment, so why don't I just run my Chef code against the Travis-CI environment itself?

The missing feature - interactive debugging

Okay, it appears that running Chef code against the Travis-CI instance is a better idea than having a dream within a dream. Nevertheless, it is still a pain to make my Chef code work on Travis-CI; you never get the thing done on the very first try. And you always have to push a commit to kick off a build, which results in a painful trial-and-error loop that looks like this

travis-ci-failures.PNG

In the process, you will see error output like this

  • The PostgreSQL server failed to start. Please check the log output. ...fail!

Okay... check the log output? But how? I could add a "cat /path/to/log_file" to the .travis.yml and push a commit to make it run again, but that would only be another painful wait. I tried to reproduce the testing environment with Vagrant on my own machine, but I could only find outdated information, and some important Vagrant boxes are missing.

Like what Niko said in GTA IV

This no touching rule is killing me!

This no-touching rule on Travis-CI is also killing me. It would be nice to have a chance to interact with the CI environment after something goes wrong. Fortunately, I contacted support, and they said they are working on it.

Green dots make you happy

Once I had set up Travis-CI for one project and realized how easy it is, I just couldn't wait to set it up for most of my open source projects. When there are red dots, you really want to erase them all. But when it is all green, like this

Traivs-CI-repo-list

it is really pleasant to see an all-green list in Travis-CI. If you are an open source lover too, just try it now; you will love it :)

Zero-downtime Internet process migration on the same host

When I am running my website, something troubles me. When there is a bug in the production server, I need to modify the code and restart the servers. Sounds fine, right? For most web servers, yes: they are stateless, so it is not a big deal to restart them whenever you want. But that is not true for me; mine are realtime audio streaming servers. When you restart a realtime streaming server, the audience connected to that server gets interrupted. Here is a diagram showing the problem:

server_diagram.png

You can see some gaps in the plot; those are caused by server restarts. For users, that is definitely a bad experience, so I have been thinking about how to solve this problem. Before we go into the design, let's look at the reasons for restarting a server first.

  • To deploy a new version of the program
  • To fix bugs
  • The process is using too much memory
  • To reload the environment, for example ulimit -n (the limit on the number of open file descriptors in a unix-like environment)
  • To migrate from host A to host B

For simply deploying a new version of the program, we could use Python's reload function to reload modules. But there are problems: reload only reruns the module, and instances that were already created are still there (if they were copied into some namespace), so it might only work for minor changes. Reloading also can't solve the memory usage problem or the process environment problem. And then there is the final reason, migrating a service from host A to host B. Since it is genuinely difficult to avoid downtime for cross-host migration, we will focus only on migration within the same host.

The solution

The biggest challenge is: how do we migrate the existing connections? I did some research and came up with an idea: create a new process, transfer the connections (socket file descriptors) to it, and shut the old one down. The following diagrams illustrate the solution.

mig_011.png

The master is a process in charge of managing the migration and receiving commands, and process A runs the service.

mig_02.png

Before we perform the migration, the master spawns process B and waits for it to say "I'm ready".

mig_03.png

When process B says "Hey! I'm ready", the master tells process A to send the connection state and socket descriptors to process B. Process B receives them and takes over responsibility for running the service.

mig_04.png

Finally, once process B has taken over the service, the master tells process A "You are done", and process A kills itself.

That’s it, the service was migrated from one process to the other, and there is no any down time.

The problem – socket transfer

The idea sounds good, right? But we still have a technical problem to solve: how do we transfer a socket (file descriptor) from one process to another? I did some study and eventually found two solutions.

Child process

On most unix-like operating systems, child processes inherit file descriptors from their parent. We could use this feature to migrate our service, but it has a limitation: you can only transfer file descriptors from a parent to its child.

Sendmsg

Another way to achieve the same goal is to use sendmsg through a unix domain socket to send the file descriptors. With sendmsg, you can transfer file descriptors to almost any process you like, which is much more flexible.

A simple demonstration

To simplify the example, we implement only process A and process B here; two processes are quite enough to demonstrate the migration. Before we go into the details, there is another problem to solve: sendmsg is not a standard function in Python 2. Fortunately, a third-party package called sendmsg provides it. To install it, just type

easy_install sendmsg

And here you are. Okay, here are the two programs.

a.py

import os
import socket
import sendmsg
import struct

# accept an inet connection and greet the client
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 5566))
s.listen(1)
conn, addr = s.accept()
conn.send('Hello, process %d is serving' % os.getpid())
print 'Accept inet connection', conn

# wait for the successor process on a unix domain socket
us = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
us.bind('mig.sock')
us.listen(1)
uconn, addr = us.accept()
print 'Accept unix connection', uconn

# pass the connection's file descriptor over as SCM_RIGHTS ancillary data
payload = struct.pack('i', conn.fileno())
sendmsg.sendmsg(
    uconn.fileno(), '', 0, (socket.SOL_SOCKET, sendmsg.SCM_RIGHTS, payload))
print 'Sent socket', conn.fileno()
print 'Done.'

b.py

import os
import socket
import sendmsg
import struct

# connect to a.py's unix domain socket
us = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
us.connect('mig.sock')
print 'Make unix connection', us

# receive the file descriptor as SCM_RIGHTS ancillary data
result = sendmsg.recvmsg(us.fileno())
identifier, flags, [(level, type, data)] = result
print identifier, flags, [(level, type, data)]
fd = struct.unpack('i', data)[0]
print 'Get fd', fd

# rebuild a socket object from the received fd and take over serving
conn = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
os.close(fd)
conn.send('Hello, process %d is serving' % os.getpid())
raw_input()

a.py accepts an inet connection and opens a unix domain socket, waiting for b.py to take over the service. When we run b.py, it connects to a.py, receives the file descriptor of the socket, takes it over, and continues serving.
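To reproduce the result below, you can run the demo in three terminals (assuming Python 2 and the sendmsg package are installed):

python a.py             # terminal 1: start serving and wait on mig.sock
telnet localhost 5566   # terminal 2: the client; stays connected the whole time
python b.py             # terminal 3: take the connection over from a.py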

The result

螢幕快照-2011-02-04-下午4.07.07.png

As the result shows, there is no downtime while the Internet service migrates between two processes on the same host.

This trick can be very useful in Internet programs that need to keep connections alive. You can even migrate connections from a Python program to a C/C++ program, or vice versa. Also, to keep memory usage low, you can periodically migrate the service to a new process running the same program.

However, although it works in our demo, for real-life complex server programs it would be very difficult to implement a migration mechanism like this: you need to dump the connection state completely in one process and restore it in the other, and since the internal state of connections can be very complex, it could be impractical.

DreamOnline

I started an MMORPG project named DreamOnline ten years ago, when I was a kid, and have never had the time to finish it. Recently, I restarted the project, and I hope I can finish it this time. :)

Screenshot:

client_screenshot_01.png

You can visit DreamOnline.tw to download the very basic demo.

Treeview v0.1 is released

To make it easy to manage folders and items in a web-based application, I developed a new open source project, Treeview.

螢幕快照-2013-02-18-上午11.25.56.png

It makes folders and items in a web-based tree draggable. You can see the online demo. It's under the MIT license; feel free to use it or fork it.