Bugbuzz is an online debugger, one of my pet projects. What I wanted to provide is a really easy-to-use debugging experience: I envisioned that debugging with it should be like dropping in one line, as you usually do with ipdb or pdb.

You can do it anywhere, whether on your MacBook or on a server. Then a fancy Ember.js-based online debugging UI comes up.

To make this happen, instead of providing debugging service on your local machine, a debugging API service is needed.

The architecture is pretty simple: the Python debugger library sends source code and local variables, along with all other necessary information, to the Bugbuzz API server, then the Bugbuzz API server notifies the Ember.js dashboard via PubNub. When the user clicks buttons like next line / next step, the dashboard calls the API server, then the API server publishes these commands via PubNub to the Python program being debugged. Upon receiving the commands, the debugging library executes them, then sends source code and local variables to the API server again.

### No, you should not trust me.

Although Bugbuzz does provide an easy-to-use service, it still concerns some developers, as all source code and local variables are passed to the server. You may ask:

Can I trust you with my source code and debugging data?

No, you should not trust me.

In fact, this concerns not only you but also me: I don't want to have any chance to read your source code or your debugging data either. It feels like a paradox to me. I want to provide you an easy-to-use service, but I want to know nothing about your data. So how do we solve this problem?

### Anonymous computing

I have had this concept for a long while; I call it Anonymous Computing. The idea is to provide a service without knowing the sensitive data being processed. As a service provider, it's really hard to do, as the less I know, the less I can provide. But if one can manage to do so, users don't need to trust the service provider; they can trust the encryption mechanism instead.

One approach is to encrypt the data at the source, pass it to the client via the server, then decrypt the data on the client side. As long as the server doesn't know the secret key, the data remains unknown to the server.

In the past, this was almost impossible to do on the web, as

• The web server rendered the web page, i.e. the server would know your data anyway
• The functionality of browsers was pretty limited

Fortunately, it's 2015 now. The browser is not merely a web-page viewer anymore, it's an application platform. Not just functionality but also performance has improved over time. Even better, there are booming web technology communities all around in this era; you can pick any one you like and start crafting awesome web apps without worrying about low-level details, enjoying the beautiful view from the shoulders of giants.

### How does it work?

For Bugbuzz, I use Ember.js for developing the dashboard app. It works like this

Instead of sending plain text source code and debugging information, when a debugging session starts, the library creates a secret key, then encrypts the source code and debugging information with the secret key and passes them to the server. All the Bugbuzz API server can see is encrypted data. To allow the Ember.js dashboard to decrypt the data, the secret key is passed to the dashboard as part of the hash in the URL.

It's in Ember.js's nature to use hash-style URLs. By doing this, the web server cannot see the secret key, as the browser only sends the part of the URL before the hash to the server. If you visit a debugging session without the secret key, all you see is a prompt asking you to provide the access key.
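To make the hash trick concrete, here is a minimal Python sketch (the dashboard itself is Ember.js, so this is only an illustration). The host name, the session id and the placement of the `access_key` parameter are hypothetical. The point is that everything after `#` ends up in the URL fragment, which is not part of the request target sent to the server.

```python
import base64
import secrets
from urllib.parse import urlsplit

# Hypothetical session URL: the host, path and parameter name are made up.
secret_key = secrets.token_bytes(32)
access_key = base64.urlsafe_b64encode(secret_key).decode("ascii")
session_url = (
    "https://dashboard.example.com/#/sessions/abc123?access_key=" + access_key
)

parts = urlsplit(session_url)

# What an HTTP client puts on the request line: path plus query,
# never the fragment.
request_target = parts.path or "/"
if parts.query:
    request_target += "?" + parts.query

print(request_target)                # /
print(access_key in request_target)  # False
print(access_key in parts.fragment)  # True
```

Because the `#` comes before the `?`, the whole route and the key live in the fragment, so only `/` reaches the web server.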

### Encryption with Ember.js in action

The encryption algorithm we use here is AES. I am not teaching you cryptography here; we will focus only on how encryption works with Ember.js. If you are interested in cryptography, you can read CRYPTO101.

To understand how encryption works with Ember.js model, let's see a very simple file model

As you can see, the model has a content property, which is supposed to be encrypted. I suggest you use base64 to encode it. Here is an example

Let's decode it and see what it looks like

Looks like complete nonsense, huh? Well, that's the point of encryption :P

To decrypt it, you need the access key. It will be passed in as a queryParam for the controller like we mentioned, you can define your access_key parameter like this

The secret key usually needs to be passed as a part of URL, you can also encode it in base64, but remember to use URL-safe base64 encoding. Upon receiving that access key, you can validate it and set it to the model like this

I also clear the access_key parameter then call self.transitionToRoute('session', session), that's because I don't like to leave the access key in URL.

Even with a wrong access key, the decryption still works; the output is just garbage. Sometimes it's hard to tell whether the output was decrypted correctly or not. In this case, you can provide a validation_code as plain text in the data along with an encrypted validation_code, so that you can decrypt it and see if the validation code matches, like this
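As a rough Python sketch of the same check (not the dashboard's Ember.js code): the session carries the validation code both in plain text and encrypted, and a candidate key is accepted only if decrypting the encrypted copy gives back the plain one. To stay within the standard library, a toy SHA-256 keystream stands in for AES here; it is not secure and only illustrates the flow. The field names `nonce` and `encrypted_code` are assumptions.

```python
import hashlib
import hmac
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Stand-in for AES, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def make_session(key: bytes) -> dict:
    """Build a hypothetical session payload: plain code plus its encrypted copy."""
    validation_code = secrets.token_hex(8).encode("ascii")
    nonce = secrets.token_bytes(16)
    return {
        "validation_code": validation_code,
        "nonce": nonce,
        "encrypted_code": keystream_xor(key, nonce, validation_code),
    }


def is_valid_access_key(session: dict, candidate_key: bytes) -> bool:
    """Decrypt the encrypted code with the candidate key and compare."""
    decrypted = keystream_xor(
        candidate_key, session["nonce"], session["encrypted_code"]
    )
    return hmac.compare_digest(decrypted, session["validation_code"])


right_key = secrets.token_bytes(32)
session = make_session(right_key)
print(is_valid_access_key(session, right_key))  # True
print(is_valid_access_key(session, bytes(32)))  # False
```

The validation code is random and carries no user data, so exposing it in plain text tells the server nothing about the source code.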

If the access key is not valid, you can prompt the user to input the correct one. With the access key properly set on the debugging session model, we can now write this:

It reads accessKey from the session model and decrypts the content. So in the template, you can access file.source_code

I open-sourced the Ember.js dashboard project, so you can see how it works and play around with it yourself. Oh, and by the way, the JavaScript AES library I use is aes-arraybuffer.

### The future

I feel we have only unleashed a minor portion of the power of cryptography with modern browser technologies. I envision that in the future, more interesting browser-based anonymous computing applications will be introduced, leveraging asymmetric key encryption, blind signatures, the Bitcoin block chain and all the awesome technologies in the cryptography world. Bugbuzz is just a very simple example that shows how we can build an accessible but also trustworthy service with Ember.js + encryption.

Feel free to let me know what you think :P


AWS Elastic Beanstalk is a PaaS service for web application hosting, pretty much like Heroku, but instead of being designed as a PaaS from the very beginning, it was actually built by combining different AWS services together. Since Elastic Beanstalk is a composition of different AWS services, it's an open box: you can tune the AWS service components in the system that you're already familiar with, like the load balancer, VPC, RDS and so on, and you can also log in to the provisioned EC2 instances in the cluster and do whatever you want. However, as these systems were not designed only for Elastic Beanstalk, a drawback is that the system is a little bit too complex. Sometimes when you adjust the configuration, it takes a while to take effect, and sometimes there are some glitches during the deployment process. Despite these minor issues, it's still a great platform. If you built a highly scalable and highly available cluster on your own, it would be way more time consuming, and you would probably run into more problems that Elastic Beanstalk has already solved for you.

### Overview of Elastic Beanstalk

Elastic Beanstalk supports many popular environments: Python, Java, Ruby, etc. The way it works is pretty simple: you upload a zip file which contains the application code in a certain predefined file structure, and that's it, AWS runs it for you. For example, to use Python, you need to provide a requirements.txt file in the root folder. The structure of the application zip file would look like this

In Elastic Beanstalk, this file is called a Version of the application. You can upload several versions to the application. Then, to deploy the application, you need to create an entity called an Environment. An environment is actually a cluster running a specific version of the application with certain adjustable configuration. An environment may look like this

• Application: my_application-1.0.1
• Min instances number: 3
• Max instances number: 5
• VPC: vpc-abcdefgh
• ...

And for the same application, you can have multiple environments, like this

It's pretty neat: you can run different versions of your application in different stacks with different configurations. This makes testing much easier. You can simply create a new environment, run some tests against it, and tear it down once the tests are done. You can also launch a new production environment, make sure it is working, then point the DNS record from the old environment to the new one.

### Deploy an application as a Docker image step by step

Although the Elastic Beanstalk system itself is very complex, using it is not so difficult. However, there seems to be no obvious walkthrough guide for setting things up. The official AWS documentation is really not so readable. And since Docker is still a pretty new technology, you can find very few articles out there about how to run Docker with Elastic Beanstalk. So here I write a step-by-step guide to running a simple application as a Docker image with Elastic Beanstalk.

#### Install Elastic Beanstalk commandline tool

Before we get started, you need to install the Elastic Beanstalk command line tool. It's written in Python, so you need to have pip installed on your system. Here you run

Then, remember to expose your AWS credentials in the bash environment

#### Get started with your project and application

Okay, now, let's get started with our demo project.

Next, init our Elastic Beanstalk app.

You have created an application now, to see it in the AWS dashboard, you can type

And you should be able to see our docker-eb-demo application there.

Actually, you can also create the application first in the dashboard, then use eb init command and select the existing application, either way it creates a config file at .elasticbeanstalk/config.yml.

#### Let's build a simple Flask app

We are here just to demonstrate how to run a Docker application with Elastic Beanstalk, so no need to build a complex system, just a simple Flask app.

What it does is very simple: it prints the WSGI environment dict of the request, hence we call it echoapp. You may notice that we read PRINT_INDENT as the print indent, along with many other variables for running the HTTP server. Since both Docker and Elastic Beanstalk use environment variables for application configuration, to make your application configurable, remember to always read application settings from environment variables instead of configuration files.
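As a minimal sketch of that habit (not the echoapp source itself), the settings can be collected in one place from the environment. `PRINT_INDENT` and `DEBUG` are the variables used in this post; `HOST`, `PORT` and all default values are assumptions made for illustration.

```python
import os


def load_settings(environ=os.environ) -> dict:
    """Read application settings from environment variables, with defaults."""
    return {
        # Indent used when pretty-printing the WSGI environ.
        "print_indent": int(environ.get("PRINT_INDENT", "2")),
        # Treat common truthy spellings as enabling debug mode.
        "debug": environ.get("DEBUG", "false").lower() in ("1", "true", "yes"),
        # Hypothetical server settings; names and defaults are assumed.
        "host": environ.get("HOST", "0.0.0.0"),
        "port": int(environ.get("PORT", "8000")),
    }


# Passing a plain dict makes the function easy to try without touching
# the real process environment.
print(load_settings({"PRINT_INDENT": "4", "DEBUG": "true"}))
```

Taking the mapping as a parameter keeps the function testable, while the default still reads the real environment inside a container.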

#### Build the docker image

To build the Docker image, I like to use git archive to make a snapshot of the project and add it into the container with the ADD command. By doing that, I won't accidentally build an image that contains some development modifications. However, since a Dockerfile is not good at doing preparation steps before building the image, I like to use a Makefile for that. Here you go

and for the Dockerfile

We use phusion/baseimage as the base image. It's basically a modified version of Ubuntu, made suitable for running inside a Docker container. It provides the runit service daemon, so we simply install the app and create the service at /etc/service/echoapp/run.

With these files, here you can run

to build the Docker image. Then you can test it by running

and use docker ps to see the mapped port

and curl to the server

Notice: if you are using boot2docker under OSX environment, you should run boot2docker ip to see what's the IP address of the virtual machine and connect to it instead of 0.0.0.0.

There are two ways to run Docker apps with Elastic Beanstalk. One is to let it build the Dockerfile for you every time you deploy an application. I don't really like this approach, since the value of Docker is that you can build your software as a solid unit, then test it and ship it anywhere. If you build the Docker image on the server every time you deploy, using Docker becomes meaningless. So I prefer the other approach, which is to create a Dockerrun.aws.json file in the root folder of your project. In that file, you indicate where your Docker image can be pulled from. Here is the JSON file

As you can see, we indicate that the Docker image name is victorlin/echoapp; Elastic Beanstalk will then pull and run it for you. If your Docker image on Docker Hub is a private one, you will need to provide Authentication information, which points to a file in S3 containing the .dockercfg file (the file is generated in your home directory by the docker login command). If you provide the S3 .dockercfg file, remember to add proper permissions to the EC2 instance profile used for running Elastic Beanstalk so that the file can be accessed. And yes, of course, in the previous step we didn't upload the image to Docker Hub. You can do it by

Or if you prefer to do it manually, you can also use docker push command to do that.

The Ports and Logging indicate which port your docker image exposes, and the path to logging files. Elastic Beanstalk will redirect traffic to the port and tail the logging files in that folder for you.
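For illustration, a small Python helper can generate such a file. This is a sketch under assumptions: it follows the version 1 (single container) layout with `Image`, `Ports`, `Logging` and an optional `Authentication` section, and the container port `8000` and the log folder `/var/log` are made-up values, not taken from the demo project.

```python
import json


def build_dockerrun(image, container_port, logging_dir,
                    auth_bucket=None, auth_key=None):
    """Return the content of a version 1 Dockerrun.aws.json as a dict."""
    doc = {
        "AWSEBDockerrunVersion": "1",
        "Image": {"Name": image, "Update": "true"},
        "Ports": [{"ContainerPort": str(container_port)}],
        "Logging": logging_dir,
    }
    # Only needed for private images: the S3 location of the .dockercfg file.
    if auth_bucket and auth_key:
        doc["Authentication"] = {"Bucket": auth_bucket, "Key": auth_key}
    return doc


# Port and logging folder are assumed values for the example.
print(json.dumps(build_dockerrun("victorlin/echoapp", 8000, "/var/log"), indent=2))
```

Generating the file from code is handy when the image tag changes on every release, but a hand-written JSON file works just as well.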

#### Create our development environment

Okay, we have our Docker image ready to be deployed now. To deploy it, we need to create an environment first. Here you run

It takes a while before the environment gets ready. To create an environment, you can also use AWS dashboard, then run eb use <name of environment> to bind current git branch with the created environment. To see your created environment, you can type eb console and see it in the dashboard

If you see the environment is red, or there were some errors when running eb create, you can run

and

to see what's going on there. You can also visit the application in the browser by typing

to see the status of environment, type

#### Deploy a new version

After you do some modifications to your app, you do a git commit, build a new Docker image and push it to the Docker hub. Then you can run

to deploy the new image to all servers.

For production usage, I would suggest you pin the version number in Dockerrun.aws.json file. For example, the image name should be something like this

That way, when you run eb deploy, it takes a snapshot of your current git commit and uploads it as a Version. When it gets deployed, that specific version of the Docker image is pulled and installed. If you don't specify the tag, the latest image will be pulled and installed. That's not a good idea for a production environment, since you may want to roll back to the previous version if the new one is broken.
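A small guard can catch an unpinned image name before deployment. The sketch below is a hypothetical helper, not part of the eb tool, and the tag `1.0.1` is only an example value. It treats a reference as pinned when its last path segment carries a tag other than `latest`; digest references (`name@sha256:...`) are not handled.

```python
def image_tag(image_ref):
    """Return the tag of a reference such as 'user/app:1.0.1', or None."""
    # Only the last path segment can carry a tag; a colon earlier in the
    # reference belongs to a registry host and port.
    last_segment = image_ref.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    return tag if sep and tag else None


def is_pinned(image_ref):
    """True when the reference names an explicit, non-'latest' tag."""
    tag = image_tag(image_ref)
    return tag is not None and tag != "latest"


print(is_pinned("victorlin/echoapp:1.0.1"))  # True
print(is_pinned("victorlin/echoapp"))        # False
print(is_pinned("localhost:5000/echoapp"))   # False
```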

#### Set the environment variable

To see current environment variables, it's easy, simply type

And to update them, for example to change PRINT_INDENT to 4 and enable DEBUG, here you type

### That's it

That's it. It's actually not that hard to run your Docker image with Elastic Beanstalk, just a matter of trivial details. Once you get familiar with it, it's a piece of cake. The whole demo project can be found here: docker-eb-demo. Hope you enjoy running Docker with Elastic Beanstalk as I do :)

There are many deployment tools, such as Puppet, Chef and Salt Stack, and most of them are pull-based. This means that when you deploy to a machine, the provisioning code is downloaded to the target machine and run locally. Unlike many others, Ansible is a push-based deployment tool: instead of pulling code, it pushes commands over SSH to the target machine. It's great to have a push-based approach in many situations. For example, you don't need to install Ansible runtimes on the target machine, you can simply provision it. However, there are also shortcomings to this approach. Say you want to provision EC2 instances in an AWS auto-scaling group: you don't know when a new instance will be launched, and when it happens, it needs to be provisioned immediately. In this case, Ansible's push approach is not that useful, since you need to provision the target machine on demand.

There are many ways to solve that problem, namely, to run Ansible provisioning code in a pulling manner.

### Ansible-pull

One obvious approach is to use ansible-pull, an Ansible command line tool that clones your Ansible git repo for you and runs the playbooks locally. It works; however, there are some drawbacks. The first is the dependency issue: to run ansible-pull on the target machine, you need to install Ansible runtimes on the machine first, and if you are running an Ansible playbook that depends on a newer version of Ansible, you need to find a way to upgrade the runtimes. Another problem is that the provisioning code is installed via git or another version control system, so it's hard to verify the integrity of those playbooks, and the code cannot be shipped as a single file.

### Ansible Tower

Ansible Tower is the official commercial tool for managing and running Ansible. It provides an interesting feature, the so-called "phone home". It works like this: when a new machine is launched, it makes an HTTP request to the Ansible Tower server, just like calling home and saying

Then the server runs ansible-playbook against the machine. It works, but one problem we see is that for your Ansible Tower to SSH into different machines and run sudo commands, you usually need to install your SSH private key on the tower server and preinstall the corresponding public key on all the other machines. Allowing one machine to SSH into all other machines makes me feel uncomfortable; it's like putting all your eggs in one basket. Although you could set a passphrase on the private key on the tower server, the machines in your AWS auto-scaling group need to be provisioned at any time, so in practice you cannot encrypt the private key with a passphrase.

### An interesting approach - Docker

With the requirements in mind

• No runtime dependencies issue
• Provision code can be shipped as a file
• Provision code integrity can be verified (signed)

an interesting idea came to my mind. Why don't I simply put the Ansible playbooks into a Docker container, ship the image to the target machine, then run the Ansible playbooks from inside the Docker image and SSH against the host? With a Docker image, I don't need to worry about the Ansible dependency issue: the Ansible runtimes themselves and many other necessary runtimes, such as boto, can all be installed into the Docker image. And the Docker image can be shipped as a single file, so we can sign the file and verify it on the target machine to ensure its integrity.
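The post does not say which signing scheme to use, so the following is only a sketch of the verify-before-run idea. It assumes the image has been exported to a single file (for example with `docker save`) and uses an HMAC-SHA256 tag with a shared key as a stand-in for a real signature; a public-key scheme such as GPG would avoid putting the signing key on the target machine.

```python
import hashlib
import hmac


def sign_file(path, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the file content, read in chunks."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so large image files do not fill memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_file(path, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_file(path, key), expected_tag)
```

On the build machine you would call `sign_file` and ship the tag next to the image file; on the target machine, `verify_file` decides whether the image is loaded and run at all.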

### A simple example

I wrote a simple project to demonstrate this idea; the project can be found on GitHub. It's actually pretty easy. In the Dockerfile, we install the Ansible dependencies and the necessary roles. We also copy our own site.yml into the Docker image.

You can build the ansible image with

Then, before you run it, you need to create a hosts file and insert the private IP address of your host machine. Like this

hosts:

You should notice that since Ansible is executed inside the Docker container, localhost simply doesn't work. You need to specify an address which is accessible from the Docker container network. To allow SSH connections from the Docker container, you also need to install a temporary SSH public key on the host machine and provide the private key for the container to connect to the host. Here is pretty much the command you run

We map our hosts file to the docker container at /tmp/hosts, and the SSH private key at /tmp/insecure_private_key, then we can use it in the ansible-playbook command arguments. That's it!
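As an illustration of that mapping, the command line can be assembled in Python. This is a sketch under assumptions: the image name is hypothetical, the image is assumed to have `site.yml` in its working directory and no entrypoint that swallows the arguments, and the read-only (`:ro`) mounts are my choice rather than something the demo project requires.

```python
import os
import shlex


def build_provision_command(image, hosts_file, private_key, playbook="site.yml"):
    """Build the `docker run` argv that runs ansible-playbook in the container."""
    return [
        "docker", "run", "--rm",
        # Map the inventory and the temporary key to the paths named above.
        "-v", f"{os.path.abspath(hosts_file)}:/tmp/hosts:ro",
        "-v", f"{os.path.abspath(private_key)}:/tmp/insecure_private_key:ro",
        image,
        "ansible-playbook",
        "-i", "/tmp/hosts",
        "--private-key", "/tmp/insecure_private_key",
        playbook,
    ]


# "my-ansible-image" is a made-up image name for the example.
cmd = build_provision_command("my-ansible-image", "hosts", "insecure_private_key")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; building the argv as a list avoids shell quoting problems with paths that contain spaces.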

### It's so powerful to combine Ansible and Docker

It's so powerful to combine Ansible and Docker together, as you can see, the software for provisioning machines now is packed as a Docker image, so that you can run it anywhere. It's a solid unit, you can sign it, verify it, tag it, ship it, share it and test it. Everything is installed in the container, you don't need to worry about missing some plugins or roles on the target machine.

The only drawback I can think of is that you need to install Docker on the target machine before you can use this approach, but it's not a problem since Docker is getting more and more popular, and you can preinstall it in your AMI. The only thing I am not happy with in Docker is the image registry system: it's very slow to push or pull an image if it has many layers and the size is big. Actually, I have an idea for building a much better Docker registry; hopefully I will have time to do it.

I am already using this approach for provisioning machines in our production environment, and it has worked like a charm so far. I am looking forward to seeing people use this technique to pack deployment code into Docker images. Imagine this:

and boom! you have a fully functional Swift cluster in AWS EC2 now, isn't that awesome?

Docker is something really hot recently. It allows you to run your software in a Linux container easily. It's actually a kind of OS-level isolation rather than hardware or kernel simulation, so you don't pay much of a performance penalty but still get pretty nice virtual machine features. I really like the analogy used by the Docker community: shipping software should be easier, and Docker serves as the standard container, just like the one in the shipping industry.

### Building docker images is not hard, but ...

Although Docker provides an easy way to deliver and run your software in a Linux container, there is still no obvious and easy way to build a Docker image for big projects. To build a Docker image for a large and complex project, you would probably need to

• Clone your private software repo into the build folder
• Ensure base images are built before your project image
• Generate some files dynamically, such as the current git commit revision
• Generate templates

With a Dockerfile, you can only have static steps for building the image. Obviously, it was not designed for doing any of the things listed above. And since Docker uses a kind of layered file system, you probably don't want to put your GitHub credentials into the container and pull the repo inside it: layers work much like git commits, so once you commit something, it's hard to remove it from the history. So you definitely want to do these things outside the container and then put them together.
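To picture what "outside the container" means, here is a minimal Python sketch of a preparation step that fills the build folder before `docker build` runs. The file names `REVISION` and `app.conf` are hypothetical, and `string.Template` stands in for a real template engine; in practice the revision would come from something like `git rev-parse HEAD`.

```python
from pathlib import Path
from string import Template


def prepare_build_context(build_dir, revision, template_text):
    """Render dynamic files into the build folder before the image is built."""
    build_dir = Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    # Record which revision the image is built from.
    (build_dir / "REVISION").write_text(revision + "\n")
    # Render a config file from a template (hypothetical file name).
    rendered = Template(template_text).substitute(revision=revision)
    (build_dir / "app.conf").write_text(rendered)
    return build_dir
```

The Dockerfile then only needs a static ADD of the prepared folder, and no credentials ever enter an image layer.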

### My first solution - Crane

With these requirements in mind, I feel it's pretty similar to the experience I had with Omnibus - a tool for packing your software into a standalone deb package. So I built a simple tool in Python for building Docker images, named Crane. It allows you to define steps for building the image, and it also provides template generation with jinja2.

### The final solution - ansible

Crane was working fine, but I don't like to reinvent the wheel and maintain it if there is already an obviously better solution. After I played with ansible for a while, I realized it is actually a way better solution for building Docker images. So, what is ansible, you may ask? Well, it's yet another deployment tool, like Puppet, Chef or SaltStack.

Wait, what? Why are you using a deployment tool for building Docker images? It may sound odd to you at the very beginning. But ansible is not just yet another deployment tool. Its design is pretty different from its predecessors. It uses SSH to push commands to target machines, while most other tools are pull-based. It also provides many modules for different operations, including creating instances in EC2 or on other cloud computing providers. Most importantly, it is able to do orchestration easily.

Of course it meets requirements we mentioned before

• Clone git repo? Check.
• Build base image? Check.
• Generate dynamic file? Check.
• Generate templates? Check.

Moreover, with ansible, you can launch an EC2 instance, build the image inside it, and run a series of tests before you publish the image. Or you can simply build the image in your vagrant machine or on the local machine. It makes building software extremely flexible, since you can run the build process anywhere you want as long as the steps can be pushed as commands via SSH. You can also provision the whole build environment, or even drive a fleet in the cloud for building. That's pretty awesome, isn't it?

### Show me the code

Okay, enough of talking, let's see the code. The tasks/main.yml looks like this

and the playbook looks like this

So, to build with vagrant, you can run something like this

You can find the complete example here - ansible-docker-demo.

### A tool for deployment but also amazing for building software

Although ansible was not designed for building software, that doesn't necessarily mean you cannot do it. And surprisingly, it does so well at building software. With its machine provisioning and orchestration capabilities, you can integrate building and deployment together easily. The build environment itself can also be provisioned before building the software. Cloud computing resources can also be leveraged. I feel there are lots more interesting things that can be done with ansible. Looking forward to seeing how people use it not just for deployment but also for building software :P

It's always a hateful job to correct PEP8 warnings manually.

I bet you don't like it either. Today I could not take it anymore. I was wondering, why should I do something a machine should do? So I looked for solutions on the Internet, and I found an article that looked helpful - Syntax+pep8 checking before committing in git. The basic idea is to add a pre-commit hook script to git for checking PEP8 syntax before each commit. By doing that, you cannot commit code with PEP8 warnings anymore; when you try, you see errors like this

Which is great, but still, you need to correct these syntax issues manually. I thought there must be something that can do this boring job for you. And yes, I found autopep8. I can use autopep8 in a pre-commit hook to correct PEP8 issues for me. However, I don't think it is a good idea to let a code formatting tool modify your code silently when committing. I want to know what is modified by the tool. So here comes another solution: I use a post-commit hook instead, to compare the latest commit with the previous commit:

All you need to do is install autopep8

and put the post-commit script at .git/hooks/post-commit. This way, once I make a commit, it corrects PEP8 issues for me. I can review what was modified, then make another PEP8 correction commit. With this, you can finally enjoy coding rather than wasting time removing trailing whitespace everywhere :D
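Since a git hook can be any executable, the same idea can be sketched in Python. This is my own rough version, not the script from this post: it lists the files touched by the latest commit, keeps the Python ones, and runs `autopep8 --in-place` on them so the fixes show up as uncommitted changes to review.

```python
import subprocess


def python_files(paths):
    """Keep only Python source files from a list of changed paths."""
    return [p for p in paths if p.endswith(".py")]


def changed_files_in_last_commit():
    """List files added, copied, modified or renamed by HEAD (not deleted)."""
    out = subprocess.run(
        ["git", "diff-tree", "--root", "--no-commit-id", "--name-only", "-r",
         "--diff-filter=ACMR", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main():
    files = python_files(changed_files_in_last_commit())
    if files:
        # Rewrite the files in the working tree; nothing is committed here.
        subprocess.run(["autopep8", "--in-place", *files], check=True)
        # Show a summary of what autopep8 changed, for review.
        subprocess.run(["git", "--no-pager", "diff", "--stat"], check=True)
    return 0
```

To use it as a hook, the file would need a Python shebang line, the executable bit, and a final `raise SystemExit(main())`.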

I really love and enjoy programming in Python; it is one of my favorite programming languages. I also love to recommend Python to other developers or to people who are about to learn their very first programming language. However, there is always an awkward moment - when they ask you which of Python 2 or Python 3 to use. My answer could be:

Ah..., you should learn Python 2 first, major libraries support Python 2 only. It will take about one or two years before these third-party resources catch up with Python 3.

Sadly, five years have passed since Python 3 was released, but still only 3% of Python developers are using version 3. If people ask me the same question now, I really don't know how to answer. And I have even started wondering whether this is the end of the programming language I like so much. These days, I have read a lot of articles talking about the Python 2 and Python 3 crisis:

the Gravity of Python 2

More About Unicode in Python 2 and 3

There are many different opinions. Some say you should kick people harder, so that they will start using Python 3. Some say you should make Python 3 even better, so people will start using it. For me? I don't believe you can kick people hard enough to make them jump to the moon, and I also don't believe people can jump to the moon simply because you put treasure on it and say come and get it if you can. Five years have passed, so why are we still not on the moon?

I think the problem is simple: the goal is too far, and the community is too eager. I recall a story I heard when I was a child.

There was a young person carrying all kinds of goods, and a ship was about to leave. He asked a local elder,

Can I make it in time?

The elder took a glance at him and said,

If you walk slowly, take it easy, you can make it.

The young guy didn't take the advice; he ran to the port as fast as he could. Unfortunately, he fell on the road, and all his goods scattered around. He didn't make it in time.

The Python community is a little bit like the eager young person in the story: in a hurry to build so many fancy advanced features, but what is the point if nobody is using them? I think maybe it is time to slow down, so that we can go far.

Interestingly, there are calls for Python 2.8 in these discussions, and personally, I also believe Python 2.8 could be the missing bridge from Python 2 to Python 3. And if it is necessary, maybe there should be a Python 2.9 and a Python 2.10. I know it is the nature of a developer to discard old stuff and to be eager to build and embrace awesome new widgets. But in the real software world, you don't get to awesome suddenly; instead, you keep getting better gradually. So, let's stop blaming anyone and build Python 2.8 :)

When you buy a house, you raise a mortgage. When you buy a car, you raise an auto loan. Maybe you are rich enough to clear all the debts at once, but anyway, we all live with debts, more or less. In fact, as software developers, we all live with debts as well. They're the so-called technical debts. I really like the debt analogy: technical debts are similar to debts in real life. Funnily enough, most people know how real debts work, but technical debts are not well understood by developers. However, it makes sense. People have lived with the idea of debt for maybe thousands of years, but computers have only a few decades of history.

### Financial debts

It's really nice to have an accurate analogy. It allows me to explain things by borrowing some mature experience from the financial world. I am not an expert in finance; however, when I saw a cash flow diagram, I realized it is exactly the diagram for explaining technical debts. Let's see what the cash flow diagram looks like when you raise a loan from a bank.

As the name implies, it's all about the flow of cash. Green arrows above the axis are income, red arrows below the axis are cost. When you raise a loan, you have an immediate income at the beginning, but it doesn't come for free. You need to pay interest to the bank periodically. And eventually, you need to repay the initial debt (not shown in the diagram). There are various situations; for example, you may be able to rent the house to others so that you have recurring income, then sell the house when the price has risen and repay the mortgage eventually. Nevertheless, we are not teaching finance here. The point is that we can borrow the diagram for visualizing technical debts.

### Raise a technical debt

For software development, I see production as the income, and reduced production or extra time spent as the cost. So, how do you raise technical debts, you may ask? Well, the fact is, you don't have to: there are many built-in debts in the software development process.

Let's see an example. Say you are developing a system. At first, there is only one feature in it, and every time you modify the code, you have to check that this feature still works correctly. The code you write is the income, and the time for testing is the cost. Over time, as you fix bugs and improve the code, you always need to make sure the feature works correctly. This is actually a very common built-in technical debt in software development. Not writing automated tests is the debt itself: by doing that, you save some development time, which is an immediate income (production gain); however, you need to pay interest every time you modify the code. The diagram would look like this

Things could get even worse when new features are added to the system: you have more features to test each time you modify the code. The diagram for a growing system looks like this

You may say, hey, why not just skip the testing? I believe the features will not break so easily. Well, this could be true. However, when you save time by not testing, the cost of testing turns into risk: your customers or end users are going to find those undetected issues for you.

Moreover, when the system grows to a really big scale, you may find yourself always testing countless functions, with endless bugs to fix. That simply means the debt is too high: your productivity is all eaten by the interest, and you are never going to deliver the product unless you pay off the debt.

Take the same case: you have more and more features in the system, but you spend your time on automated testing at the beginning. It is just like paying off the debt at the very start, which keeps the interest under control. When you add a new feature, all you have to do is write the corresponding tests for it. In this way, you can make some real progress.
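To put rough numbers on the two situations, here is a toy model in Python. Everything in it is an assumption made up for illustration: the function name `cumulative_cost`, the idea that every change re-checks every existing feature by hand, and the cost units. It is only a sketch of how the "interest" grows, not a way to measure real debt.

```python
def cumulative_cost(features, changes_per_feature, manual_check_cost, test_writing_cost):
    """Return (manual_total, automated_total) testing cost after adding `features` features.

    Hypothetical model:
    - Manual testing: while the system has n features, each change means
      re-checking all n of them by hand (this is the recurring interest).
    - Automated testing: each new feature costs one up-front test-writing
      effort, and re-running the tests is treated as free.
    """
    manual_total = 0
    automated_total = 0
    for n in range(1, features + 1):
        manual_total += changes_per_feature * n * manual_check_cost
        automated_total += test_writing_cost
    return manual_total, automated_total


if __name__ == "__main__":
    manual, automated = cumulative_cost(
        features=5, changes_per_feature=4, manual_check_cost=1, test_writing_cost=6
    )
    print("manual:", manual, "automated:", automated)
```

With these made-up numbers the manual total is 60 units against 30 for the automated case, and the gap keeps widening as features are added, because the manual cost grows with the square of the feature count while the automated cost grows linearly.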

### Source of debts

Unlike real debts, you don't have a sheet that tells you how much they are. Technical debts sometimes just aren't obvious. Nevertheless, we know some common sources of debt, such as

• No automated testing
• No documents
• No proper comments in code
• No version control system
• Dirty hacks

Maybe there are other debts not listed above; however, the point is that these debts all have a similar effect: you or your team members need to pay the interest when developing on this code. For example, take a badly written function: every time developers read it, they need extra time to understand it. That is the interest. Interestingly, although you have to pay the interest, not all debts give you a big boost in development, and some debts can be avoided easily. Experienced developers can usually produce code with good style and design.

### Debts are not all that bad

So far, we have talked as if debts are all evil, demons you should never deal with. But the truth is that raising debts can be a good thing sometimes, as raising debts buys you some time. Even in the real-life financial world, raising debts can be a key to success. When a company has no debts, investors may actually see the company as inefficient. So this is about trade-offs: experienced developers not only produce code with lower debts than inexperienced ones, they also know when to raise debts and how much to raise.

For example, you are running a startup, and you don't even know whether your product is going to work. At this moment, you can do some dirty hacks to make things work, and pay off the technical debts later, after you survive.

### Nice, I am not the one who pays the bill

People love a free lunch. It is really nice when you don't have to pay the bill, isn't it? Developers like it too. There are many situations where you are not the one paying interest on technical debts. For example, you accept a software development contract, and you are pretty sure that once you deliver the project, you are never going to see it again. In this case, many developers just don't care: they are not the ones who pay the bill, so why should they?

This is an actual moral hazard. Funnily enough, it also happens in the real financial world: Wall Street bankers know taxpayers are going to pay the bill, so why should they care about risk? Unfortunately, unlike for bankers, ignoring moral hazard won't earn you billions of dollars. It only earns you curses from the next developer. And sometimes, you just have no choice: the deadline is right ahead, and all you can say is

screw it!

Besides situations where you have no choice, sometimes you can raise as much technical debt as you like without worrying about it. For instance, if you are writing a run-once, throwaway script, then do your best to raise debts.

### Summary

For software development, it is important to understand technical debts. There is no easy or accurate way to measure them, but you can tell from experience. My debt analogy here may not be 100% precise, but it surely gives you a feeling for it. To build successful software, you should keep the idea of technical debts in mind, and you should control them rather than letting them control you.

Readability of code is important. As the saying goes

Code is read more than it is written

This is so true: even when you are writing code, you actually need to read the current code base again and again. Back in the ancient programming era, people were still using forbidden black magic - goto statements. Powered by the black magic, there was a demon, Spaghetti code, and it looked like this

(From http://en.wikipedia.org/wiki/File:Spaghetti.jpg under Creative Commons 2.0 license)

Spaghetti code is code that is hard to read and maintain, and it was killing countless developers. Then a brave developer invented a new form of magic - structured programming. Eventually, the demon was defeated, and the black magic has been forbidden ever since.

This story tells us readability is important, but what about the readability of Git commit history? There are times when we need to look into the development history: finding which ticket corresponds to a commit, who the author is, or when a change was introduced. Although there are tools to help, sometimes you still need to read the history, and it is just unreadable and hard to understand. I bet you have seen much worse history than this one

It makes reading painful. Although you read development history less often than you read code, it is still very helpful to have a clean, readable, linear history. Today, I am going to share some experience about keeping a readable history.

### Use SourceTree

It is never pleasant to use a command line tool when there is a nice GUI tool. I hate ASCII Git history graphs; they are just ugly. Luckily, we have an awesome free GUI tool to use - SourceTree.

### Always create a new branch for a ticket

When you are working on a ticket or issue, you should always create a branch for it.

You should try to keep the commits in the branch only for resolving that ticket. It is okay to have some typo corrections or minor changes in it. However, if you put unrelated commits with major changes into the branch, other developers cannot easily know that you have off-topic changes in that branch.
By using each branch for only one purpose, it is

• Easier to understand what this branch is for
• Easier to reverse changes introduced by this branch

Now you are working on a new branch, and you can commit

After a while, you have several commits, and they are ready to merge

We want to keep the branch in the history, so remember to use a non-fast-forward merge: check the "Do not fast-forward when merging, always create commit" option

It's time to merge. First, right click on the master branch and click Checkout. You should be on the master branch now. Then, right click the new-feature branch and click Merge.

Remember to check Commit merged changes immediately to make a new commit directly.

Whoa, here we are, a beautiful linear history still with branch information.

### Always rebase before merge

For now, you are writing the next awesome feature - foobar-2000!

Things go well; however, in the meantime, a new branch is merged from another guy's repo. Oh my god, foobar 3000! Awesome!

Okay, let's see what it looks like to merge it directly

Ugly. Let's try something better - rebase. First, right click on foobar-2000 and click Checkout. Then right click on master and click Rebase

This is better! And we can merge it like before

### Rebase and force push

As usual, you keep working on this nice and beautiful linear history. However, you won't feel safe leaving your commits only on your local machine, will you? We always push our working branch to GitHub to keep it safe and to get reviews and feedback from others

Yes, again, you may hate this: another branch has been merged into master.

Okay, you said, this is not a big deal, I can always rebase and merge as usual. Here you rebase

Well, it is still under development, so you want to push it to your fork but not merge it yet. Then you push, and oops!

So what just happened?

As you can see, there is a branch origin/foobar-bugfix; that is the branch head in your origin remote, which is your GitHub fork repo. When you push your rebased local foobar-bugfix to the fork repo, the remote one would be overwritten, because the rebased commits are no longer descendants of it. It is a little bit dangerous to overwrite a branch head in a Git repo, so Git doesn't allow you to do this by default.

Again, this has risk, so you need to be sure what you are doing (the commits will still be stored in the repo, but without a branch head pointing to them you cannot find them easily, and you will need some low-level operations to get them back). In this case, we just want to rebase our commits on master and push them to our own repo, which won't be a big problem in most cases. It appears SourceTree doesn't support --force push, so you need to click the Terminal button. Then type

This will force Git to push your local branch and overwrite the remote one. Let's see

Here we are

(Tip: you can click Repository and Refresh Remote Status to update your Git history status in the UI)

Note: when you are the only one working on the branch, it is fine to do a force push; otherwise, be careful. For more details, please refer to
http://git-scm.com/book/ch3-6.html#The-Perils-of-Rebasing

### Always rebase the current development branch when there are new commits

As you know, there can be conflicts when you merge or rebase. The more new commits there are in the master branch, the more likely you are to get tons of conflicts. So, it is a good practice to always rebase your working branch on master when there are new commits on it.

However, sometimes you have some work on the branch that is not committed yet, and you don't want to commit something half-done. But when you rebase, Git won't allow you to have uncommitted changes to files in the working directory. In this case, you can use Stash. Click Repository and Stash Changes.

Then you can see your stash appears in the sidebar

After you finish rebasing, you can right click on the stash and click Apply Stash, and here you are. Your saved changes are back in the working directory.

Again, happy ending :D

### Use interactive rebase to clean dirty commits

People make mistakes. Sometimes there are small commits that only fix formatting or typos, and these commits are all based on your own newly submitted commits.

In this case, you might wonder: wouldn't it be nice to use some black magic to make your stupid mistakes disappear? Well, yes, there is magic. You can use interactive rebase to squash some commits into the previous one. Now, you are on the awesome branch; right click on the master branch, then click Rebase children of xxx interactively. Then you will see an interface like this

Select those mistake-fixing commits and click Squash with previous. You will see multiple commits put together. You can click Edit message to modify the commit message of the squashed commit.

Then press OK and look at this!

Just like what he said in Mad Men, nothing happened!

This is actually even more powerful: you can rearrange the order of commits, edit commit messages, and delete specific commits. But be careful, like what Spider-Man told you

It is a kind of history rewriting. It is fine to use it on a branch only you are working on, but you should not use it on a shared branch unless you know exactly what you are doing.

### The benefits of readable history

That's it, the history is linear, another wonderful day.

Readable history doesn't only look beautiful; it provides an easy-to-understand development history. All team members in the project can follow the development progress easily. When something goes wrong, it is also easier to track down the problem, especially when you need to fix it as soon as possible.

You know, testing is important for software development. With good continuous integration and testing in place, you have confidence that your software has a certain quality. It doesn't mean your software is perfect; however, when things break, you can catch them and fix them. Jenkins is a pretty awesome and easy-to-use open source continuous integration tool, but for developing my own hobby open source projects, I just don't want to rent a server and run Jenkins. So, I have always wondered: wouldn't it be nice to have something like CI as a service? I could just put my code there, and it would do the rest for me.

### Meet Travis-CI

Recently, I met an awesome service which really fits what I want - Travis-CI. It has GitHub integration, so all you have to do is grant some GitHub permissions to Travis-CI and write a .travis.yml configuration file like this:

Then it works like a charm. You can see the build results here https://travis-ci.org/victorlin/pyramid_genshi

The best part is, if you are testing an open source project, it is totally free. I really love it!

### Test Chef code on it, a dream in the dream

Currently, I am working on an open source Chef project for deployment. I thought it would be neat to set up Travis-CI for testing my Chef code, so I tried to run Vagrant with VirtualBox on it. However, it turns out that the Travis-CI testing environment is already running in a virtual machine, and it is based on OpenVZ, which is actually a container rather than hardware virtualization. I could not find a way to install VirtualBox on Travis-CI. Sadly, this is not Inception; I cannot have a dream in the dream.

Fine, I changed my mind then, it is already a virtual environment, why don't I just run my Chef code against the Travis-CI environment?

### The missing feature - interactive debugging

Okay, it appears that it is a better idea to run Chef code against the Travis-CI instance instead of having a dream in the dream. Nevertheless, it is still a pain in the ass to make my Chef code work on Travis CI. You can never get the thing done on the very first try. And you always have to push a commit to kick off a build, so it results in a painful trial-and-error loop that looks like this

In the process, you will see error output like this

• The PostgreSQL server failed to start. Please check the log output. ...fail!

Okay... check the log output? But how? I can add a "cat /path/to/log_file" to the .travis.yml and push the commit to make it run again, but that would only be another painful wait. I tried to reproduce the testing environment with Vagrant on my machine, but I could only find some outdated information, and some important Vagrant boxes are missing.

Like what Niko said in GTA IV

This no touching rule is killing me!

This no touching rule of Travis CI is also killing me. I think it would be nice to have a chance to interact with the CI environment after something goes wrong. Fortunately, I contacted the support, and they said they are working on it.

### Green dots make you happy

Once I had set up Travis-CI for one project and realized how easy it is, I just couldn't wait to set it up for most of my open source projects. When there are red dots, you really want to erase them all. However, when it is all green, like this

It's really pleasant to see an all-green list in Travis-CI. If you are also an open source lover, just try it now, you will love it :)

When I am running my website, something troubles me. When there is a bug on the production servers, I need to modify the code and restart them. Sounds fine, right? Yes, most web servers are stateless, so it is not a big deal to restart them whenever you want. But that is not true for me: mine are realtime audio streaming servers. When you restart a realtime streaming server, the audience connected to it will be interrupted. Here is a diagram that shows the problem:

You can see there are some gaps in the plot; those are caused by server restarts. Of course, for users, that is definitely a bad experience. Therefore, I have recently been thinking about how to solve this problem. Before we go into the design, let’s look at the reasons for restarting a server first.

• To deploy a new version of the program
• To fix bugs
• The process is using too much memory
• To reload the environment, ulimit -n for example (the limit on the number of open file descriptors under unix-like environments)
• To migrate from host A to host B

For simply deploying a new version of the program, we can use Python's reload function to reload modules. But there are some problems: reload only reruns the module, and instances that were already created are still there (if they have been copied into other namespaces), so it might work only if the change is minor. On the other hand, reloading can’t solve the memory usage problem or the process environment change problem. And here comes the final reason, migrating the service from host A to host B. Indeed, it is difficult to avoid any downtime for such a migration, so we only focus on migration within the same host.
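The limitation of reload is easy to see in a small experiment. The following is a minimal sketch, assuming Python 3, where the function lives in `importlib.reload`; the module name `greeter_demo` and the `Greeter` class are hypothetical and exist only for this demonstration. It writes a tiny module to a temporary directory, creates an instance, rewrites the module, and reloads it.

```python
import importlib
import os
import sys
import tempfile

V1 = "class Greeter:\n    def hello(self):\n        return 'v1'\n"
V2 = "class Greeter:\n    def hello(self):\n        return 'version 2'\n"


def demo_reload():
    """Return (old_instance_result, new_instance_result) after a reload."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "greeter_demo.py")  # hypothetical module
        with open(path, "w") as f:
            f.write(V1)
        saved_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = True  # avoid stale .pyc files in this demo
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("greeter_demo")
            old_instance = module.Greeter()  # created before the reload

            with open(path, "w") as f:
                f.write(V2)
            importlib.reload(module)  # re-executes the module source

            new_instance = module.Greeter()  # created after the reload
            return old_instance.hello(), new_instance.hello()
        finally:
            sys.path.remove(tmp)
            sys.dont_write_bytecode = saved_flag
            sys.modules.pop("greeter_demo", None)


if __name__ == "__main__":
    print(demo_reload())
```

The instance created before the reload still points at the old class object, so it keeps answering with the old code, while only newly created instances pick up the change. In a long-running server, the live connection objects are exactly such old instances.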

### The solution

The biggest challenge is: how do we migrate the existing connections? I did some research and came up with an idea: create a new process, transfer the connections (socket file descriptors) to the new process, and shut the old one down. The following diagrams illustrate my solution.

The Master is a process which is in charge of managing the migration and receiving commands. Process A is the one running the service.

Before we perform the migration, the Master spawns process B and waits for it to say "I'm ready".

When process B says “Hey! I’m ready”, the Master tells process A to send the connection state and file descriptors to process B. Process B receives the state and takes over the responsibility of running the service.

Finally, once process B has taken over the service, the Master tells process A “You are done.” and process A kills itself.

That’s it: the service has been migrated from one process to the other, without any downtime.

### The problem – socket transfer

The idea sounds good, right? But we still have a technical problem to solve: how do we transfer a socket (file descriptor) from one process to another? To solve this problem, I did some study and eventually found solutions.

#### Child process

On most unix-like OSes, child processes inherit file descriptors from their parent. Of course we can use this feature to migrate our service, but it has its limitation: you can only transfer file descriptors from a parent to a child process.
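As a minimal sketch of that inheritance, assuming a unix-like system where `os.fork` is available, the child below writes through a pipe descriptor that was opened by the parent before the fork. The function name `read_from_child` is made up for this example, and a pipe stands in for the service socket to keep it short.

```python
import os


def read_from_child():
    """Fork a child that writes through a descriptor inherited from the parent."""
    read_fd, write_fd = os.pipe()  # opened in the parent, before the fork
    pid = os.fork()
    if pid == 0:
        # Child process: write_fd is valid here because it was inherited.
        os.close(read_fd)
        os.write(write_fd, b"hello from child")
        os._exit(0)
    # Parent process: close our copy of the write end and read the message.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return data


if __name__ == "__main__":
    print(read_from_child())
```

The same thing works with a listening socket instead of a pipe, but the direction is fixed: only descriptors that exist at fork time reach the child, and nothing can be handed back to the parent or to an unrelated process this way.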

#### Sendmsg

Another way to achieve the same goal is to use sendmsg through a unix domain socket to send the file descriptors. With sendmsg, you can transfer file descriptors to almost any process you like, which is much more flexible.

#### A simple demonstration

To simplify the example, we only implement process A and process B here; two processes are quite enough to complete the migration. Before we go into the details, there is another problem to solve: sendmsg is not a standard function in Python. Fortunately, there is a third-party package, sendmsg, that provides this function. To install sendmsg, just type

And there you go. Okay, the following are the two programs.

a.py

b.py

a.py accepts an inet socket connection and opens a unix domain socket, then waits for b.py to take over the service. Then we run b.py: it connects to a.py, receives the fd of the socket, takes it over, and runs the service.
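For readers on a current Python, here is a minimal sketch of the same handover. It is not the a.py and b.py described above: it assumes Python 3.9 or later on a unix-like system, where the standard library offers `socket.send_fds` and `socket.recv_fds` (thin wrappers around sendmsg with SCM_RIGHTS), and it plays both roles in one process over a `socketpair` so it can run on its own. The function names `hand_over` and `demo` are made up for this example.

```python
import socket


def hand_over(listener):
    """Pass a listening socket's fd over a unix domain socket and rebuild it."""
    old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # "Process A" side: send the descriptor along with a small payload.
        socket.send_fds(old_side, [b"fd"], [listener.fileno()])
        # "Process B" side: receive it; the kernel gives us a new fd number.
        _msg, fds, _flags, _addr = socket.recv_fds(new_side, 16, 1)
    finally:
        old_side.close()
        new_side.close()
    return socket.socket(fileno=fds[0])  # wrap the received fd in a socket object


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    taken_over = hand_over(listener)
    listener.close()  # the old owner lets go; the service keeps listening

    client = socket.create_connection(("127.0.0.1", port))
    conn, _peer = taken_over.accept()
    client.sendall(b"ping")
    data = conn.recv(4)
    same_port = taken_over.getsockname()[1] == port
    for s in (client, conn, taken_over):
        s.close()
    return same_port, data


if __name__ == "__main__":
    print(demo())
```

In a real migration the two ends of the unix domain socket would live in different processes, and already accepted connections could be passed in the same way as the listening socket.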

### The result

As the result shows, there is no downtime when migrating an Internet service between two processes on the same host.

It can be very useful to employ this trick in Internet programs which need to keep active connections. You can even migrate connections from a Python program to a C/C++ program, or vice versa. Also, to keep memory usage low, you can periodically migrate the service to the same program in a different process.

However, although it works in our demo program, for real-life complex server programs it would be very difficult to implement a migration mechanism like this. You need to dump the connection state completely in one process and restore it in the other. Because the internal state of connections can be very complex, it could be impractical.