In reality, logging is important. When you transfer money, there are transfer records. When an airplane is flying, black box (flight data recorder) is recording everything. If something goes wrong, people can read the log and has a chance to figure out what happened. Likewise, logging is important for system developing, debugging and running. When a program crashes, if there is no logging record, you have little chance to understand what happened. For example, when you are writing a server, logging is necessary. Following screenshot is the log file of a EZComet.com server.
Without the log, I can hardly know what’s wrong if a service goes down. Not only for the servers, logging is also important for desktop GUI applications. For instance, when your program crashes on your customer’s PC, you can ask them to send the log files to you, and you may can figure why. Trust me, you will never know what kind of strange issues there will be in different PC environments. I once received an error log report like this
And eventually figure out that the customer PC is infected by a virus which makes call to gethostname failed. See, how can you even know this if there is no log to read?
print is not a good idea
Although logging is important, not all developers know how to use them correctly. I saw some developers insert print statements when developing and remove those statements when it is finished. It may looks like this
It works when the program is a simple script, but for complex systems, you better not to use this approach. First of all, you cannot leave only important messages in the log, you may see a lots of garbage messages in the log, but can’t find anything useful.You also cannot control those print statements without modifying code, you may forgot to remove those unused prints. And all printed messages go into stdout, which is bad when you have data to output to stdout. Of course you can print messages to stderr, but still, it is not a good practice to use print for logging.
Use Python standard logging module
So, how do you do logging correctly? It’s easy, use the standard Python logging module. Thanks to Python community, logging is a standard module, it was well designed to be easy-to-use and very flexible. You can use the logging system like this
You can run it and see
What’s different between the “print” approach you asked. Well, of course there are benefits:
- You can control message level and filter out not important ones
- You can decide where and how to output later
There are different importance levels you can use, debug, info, warning, error and critical. By giving different level to logger or handler, you can write only error messages to specific log file, or record debug details when debugging. Let’s change the logger level to DEBUG and see the output again
As you can see, we adjust the logger level to DEBUG, then debug records appear in output. You can also decide how these messages are processed. For example, you can use a FileHandler to write records to a file.
There are different handlers, you can also send records to you mailbox or even a to a remote server. You can also write your own custom logging handler. I’m not going to tell you details, please reference to official documents: Basic Tutorial, Advanced Tutorial and Logging Cookbook.
Write logging records everywhere with proper level
With flexibility of the logging module, you can write logging record everywhere with proper level and configure them later. What is the proper level to use, you may ask. Here I share my experience.
In most cases, you don’t want to read too much details in the log file. Therefore, debug level is only enabled when you are debugging. I use debug level only for detail debugging information, especially when the data is big or the frequency is high, such as records of algorithm internal state changes in a for-loop.
I use info level for routines, for example, handling requests or server state changed.
I use warning when it is important, but not an error, for example, when a user attempts to login with wrong password or connection is slow.
I use error level when something is wrong, for example, an exception is thrown, IO operation failure or connectivity issue.
I seldom use critical, you can use it when something really bad happen, for example, out of memory, disk is full or a nuclear meltdown (Hope that will not happen :S).
Use _name_ as the logger name
You don’t have to set the logger name as _name, but by doing that, it brings us some benefits. The variable \name_ is current module name in Python. For example, you call logger.getLogger(_name_) in a module “foo.bar.my_module”, then it is logger.getLogger(“foo.bar.my_module”). When you need to configure the logger, you can configure to “foo”, then all modules in “foo” packages shares same configuration. You can also understand what is the module of message when reading the log.
Capture exceptions and record them with traceback
It is always a good practice to record when something goes wrong, but it won’t be helpful if there is no traceback. You should capture exceptions and record them with traceback. Following is an example:
By calling logger methods with exc_info=True parameter, traceback is dumped to the logger. As you can see the result
You can also call logger.exception(msg, *args), it equals to logger.error(msg, exc_info=True, *args).
Do not get logger at the module level unless disable_existing_loggers is False
You can see a lots of example out there (including this article, I did it just for giving example in short) get logger at module level. They looks harmless, but actually, there is a pitfall – Python logging module respects all created logger before you load the configuration from a file, if you get logger at the module level like this
And you expect to see the records appear in log, but you will see nothing. Why? Because you create the logger at module level, you then import the module before you load the logging configuration from a file. The logging.fileConfig and logging.dictConfig disables existing loggers by default. So, those setting in file will not be applied to your logger. It’s better to get the logger when you need it. It’s cheap to create or get a logger. You can write the code like this:
By doing that, the loggers will be created after you load the configuration. The setting will be applied correctly.
Since Python2.7, a new argument name “disable_existing_loggers” to fileConfig and dictConfig (as a parameter in schema) is added, by setting it to False, problem mentioned above can be solved. For example:
Use JSON or YAML logging configuration
You can configure your logging system in Python code, but it is not flexible. It’s better to use a logging configuration file. After Python 2.7, you can load logging configuration from a dict. It means you can load the logging configuration from a JSON or YAML file. Although you can use the old .ini style logging configuration, it is difficult to read and write. Here I show you an logging configuration example in JSON or YAML
Following recipe shows you how to read logging configuration from a JSON file:
One advantage of using JSON configuration is that the json is a standard library, you don’t need to install it. But personally, I prefer YAML. It’s very clear to read and easy to write. You can also load the YAML configuration with following recipes
Now, to setup logging, you can call setup_logging when starting your program. It reads logging.json or logging.yaml by default. You can also set LOG_CFG environment variable to load the logging configuration from specific path. For example,
or if you prefer YAML
Use rotating file handler
If you use FileHandler for writing logs, the size of log file will grow with time. Someday, it will occupy all of your disk. In order to avoid that situation, you should use RotatingFileHandler instead of FileHandler in production environment.
Setup a central log server when you have multiple servers
When you have multiple servers and different log files. You can setup a central log system to collect all important (warning and error messages in most cases). Then you can monitor it easily and notice what’s wrong in your system.
I’m glad that Python logging library is nicely designed, and the best part is that it is a standard library, you don’t have to choose. It is flexible, you can write your own handlers and filters. There are also third-party handlers such as ZeroMQ logging handler provided by pyzmq, it allows you to send logging messages through a zmq socket. If you don’t know how to use the logging system correctly, this article might be helpful. With good logging practice, you can find issues in your system easier. It’s a nice investment, don’t you buy it? :D