# Chapter 8: Dealing with Legacy Code
In this chapter, we will discuss the following topics:
• Reading a Django code base
• Discovering relevant documentation
• Incremental changes versus full rewrites
• Writing tests before changing code
• Legacy database integration
It sounds exciting when you are asked to join a project. Powerful new tools and cutting-edge technologies might await you. However, quite often, you are asked to work with an existing, possibly ancient, codebase.
To be fair, Django has not been around for that long. However, projects written for older versions of Django are sufficiently different to cause concern. Sometimes, having the entire source code and documentation might not be enough.
If you are asked to recreate the environment, then you might need to fumble with the configuration, database settings, and running services locally or on the network. There are so many pieces to this puzzle that you might wonder how and where to start.
Understanding the Django version used in the code is a key piece of information. As Django evolved, everything from the default project structure to the recommended best practices has changed. Therefore, identifying which version of Django was used is a vital part of understanding it.
##### Change of Guards

Sitting patiently on the ridiculously short beanbags in the training room, the SuperBook team waited for Hart. He had convened an emergency go-live meeting. Nobody understood the "emergency" part, since go-live was at least 3 months away.
Madam O rushed in holding a large designer coffee mug in one hand and a bunch of printouts of what looked like project timelines in the other. Without looking up, she said, "We are late, so I will get straight to the point. In the light of last week's attacks, the board has decided to summarily expedite the SuperBook project and has set the deadline to the end of next month. Any questions?"
"Yeah," said Brad, "where is Hart?" Madam O hesitated and replied, "Well, he resigned. Being the head of IT security, he took moral responsibility for the perimeter breach." Steve, evidently shocked, was shaking his head. "I am sorry," she continued, "but I have been assigned to head SuperBook and ensure that we have no roadblocks to meet the new deadline."
There was a collective groan. Undeterred, Madam O took one of the sheets and began, "It says here that the Remote Archive module is the most high-priority item in the incomplete status. I believe Evan is working on this."
"That's correct," said Evan from the far end of the room. "Nearly there," he smiled at others, as they shifted focus to him. Madam O peered above the rim of her glasses and smiled almost too politely. "Considering that we already have an extremely well-tested and working Archiver in our Sentinel code base, I would recommend that you leverage that instead of creating another redundant system."
"But," Steve interrupted, "it is hardly redundant. We can improve over a legacy archiver, can't we? If it isn't broken, then don't fix it, replied Madam O tersely. He said, "He is working on it," said Brad almost shouting, What about all that work he has already finished?
"Evan, how much of the work have you completed so far?" asked Madam O, rather impatiently. "About 12 percent," he replied, looking defensive. Everyone looked at him incredulously. "What? That was the hardest 12 percent," he added.
Madam O continued the rest of the meeting in the same pattern. Everybody's work was reprioritized and shoehorned to fit the new deadline. As she picked up her papers, readying to leave, she paused and removed her glasses.
"I know what all of you are thinking... literally. But you need to know that we had no choice about the deadline. All I can tell you now is that the world is counting on you to meet that date, somehow or other." Putting her glasses back on, she left the room.
"I am definitely going to bring my tinfoil hat," said Evan loudly to himself.
## Finding the Django version

Ideally, every project will have a requirements.txt or setup.py file at the root directory, and it will have the exact version of Django used for that project. Let's look for a line similar to this:
Django==1.5.9
Note that the version number is exactly mentioned (rather than Django>=1.5.9), which is called pinning. Pinning every package is considered a good practice since it reduces surprises and makes your build more deterministic.
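For illustration, a fully pinned requirements.txt for a Django 1.5-era project might look like the following sketch; the packages other than Django are hypothetical examples, not taken from any particular project:

Django==1.5.9
South==1.0.2
django-braces==1.4.0
psycopg2==2.5.4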
Unfortunately, there are real-world codebases where the requirements.txt file was not updated or even completely missing. In such cases, you will need to probe for various telltale signs to find out the exact version.
## Activating the virtual environment

In most cases, a Django project will be deployed within a virtual environment. Once you locate the virtual environment for the project, you can activate it by jumping to that directory and running the activate script for your OS. For Linux, the command is as follows:
$ source venv_path/bin/activate
Once the virtual environment is active, start a Python shell and query the Django version as follows:
$ python
>>> import django
>>> print(django.get_version())
1.5.9
The Django version used in this case is Version 1.5.9.
Alternatively, you can run the manage.py script in the project to get a similar output:
$ python manage.py --version
1.5.9
However, this option would not be available if the legacy project source snapshot was sent to you in an undeployed form. If the virtual environment (and packages) was also included, then you can easily locate the version number (in the form of a tuple) in the __init__.py
file of the Django directory. For example:
$ cd envs/foo_env/lib/python2.7/site-packages/django
$ cat __init__.py
VERSION = (1, 5, 9, 'final', 0)
...
If all these methods fail, then you will need to go through the release notes of past Django versions to determine the identifiable changes (for example, the AUTH_PROFILE_MODULE setting was deprecated since Version 1.5) and match them to your legacy code. Once you pinpoint the correct Django version, you can move on to analyzing the code.
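As a small illustration of this detective work, you could search the code base for a setting mentioned in the release notes. This is only a sketch, and the setting here is just one example:

$ grep -R --include="*.py" "AUTH_PROFILE_MODULE" .

A match would suggest that the project targets a version in which that setting was still in use, narrowing down the range further.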
## Where are the files? This is not PHP

One of the hardest things to get used to, especially if you come from the PHP or ASP.NET world, is that the source files are not located in your web server's document root, which is usually named wwwroot or public_html. Additionally, there is no direct relationship between the code's directory structure and the website's URL structure.
In fact, you will find that your Django website's source code is stored in an obscure path such as /opt/webapps/my-django-app. Why is this? Among many good reasons, it is often more secure to move your confidential data outside your public webroot. This way, a web crawler would not be able to accidentally stumble into your source code directory.
As you will read in Chapter 11, Production-Ready, the location of the source code can be found by examining your web server's configuration file. There, you will find either the DJANGO_SETTINGS_MODULE environment variable set to the module's path, or the request being passed on to a WSGI server that is configured to point to your project's wsgi.py file.
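If you are unsure which configuration file to examine, a quick recursive search of the usual configuration directories can point you to it. This is only a sketch that assumes a typical Linux layout; the directories shown are common defaults rather than a given:

$ grep -R -E "DJANGO_SETTINGS_MODULE|wsgi" /etc/nginx /etc/apache2 2>/dev/null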
## Starting with urls.py

Even if you have access to the entire source code of a Django site, figuring out how it works across various apps can be daunting. It is often best to start from the root urls.py URLconf file, since it is literally a map that ties every request to the respective views.
With normal Python programs, I often start reading from the start of its execution, say, from the top-level main module or wherever the if __name__ == "__main__" check is. In the case of Django applications, I usually start with urls.py, since it is easier to follow the flow of execution based on the various URL patterns a site has.
In Linux, you can use the following find command to locate the settings.py file and the corresponding line specifying the root urls.py:
$ find . -iname settings.py -exec grep -H 'ROOT_URLCONF' {} \;
./projectname/settings.py:ROOT_URLCONF = 'projectname.urls'
$ ls projectname/urls.py
projectname/urls.py
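To give an idea of what you will find there, a hypothetical root URLconf from a Django 1.5-era project might look roughly like the following; the app and view names are invented for illustration:

# projectname/urls.py (hypothetical example)
from django.conf.urls import patterns, include, url
from django.contrib import admin

admin.autodiscover()

urlpatterns = patterns('',
    # Each pattern maps a URL regular expression to a view or another URLconf
    url(r'^$', 'profiles.views.home', name='home'),
    url(r'^posts/', include('posts.urls')),
    url(r'^admin/', include(admin.site.urls)),
)

Following each include() into the corresponding app's urls.py, and then into its views, is usually the quickest way to build a mental model of the site.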
## Jumping around the code

Reading code sometimes feels like browsing the web without the hyperlinks. When you encounter a function or variable defined elsewhere, you will need to jump to the file that contains that definition. Some IDEs can do this automatically for you, as long as you tell them which files to track as part of the project.
If you use Emacs or Vim instead, then you can create a TAGS file to quickly navigate between files. Go to the project root and run a tool called Exuberant Ctags as follows:
$ find . -iname "*.py" -print | etags -
This creates a file called TAGS that contains the location information of where every syntactic unit, such as a class or function, is defined. In Emacs, you can find the definition of the tag at your cursor (or point, as it is called in Emacs) using the M-. command.
While using a tag file is extremely fast for large code bases, it is quite basic and is not aware of a virtual environment, where most definitions might be located. An excellent alternative is to use the elpy package in Emacs. It can be configured to detect a virtual environment. Jumping to the definition of a syntactic element uses the same M-. command. However, the search is not restricted to the tag file. So, you can even jump to a class definition within the Django source code seamlessly.
## Understanding the code base

It is quite rare to find legacy code with good documentation. Even if you do, the documentation might be out of sync with the code in subtle ways that can lead to further issues. Often, the best guides to understanding the application's functionality are the executable test cases and the code itself.
The official Django documentation is organized by version at https://docs.djangoproject.com. On any page, you can quickly switch to the corresponding page in a previous version of Django with the selector on the bottom right-hand section of the page:
(Image omitted: the version selector on the Django documentation site)
In the same way, the documentation for any Django package hosted on readthedocs.org can also be traced back to its previous versions. For example, you can view the documentation of django-braces all the way back to v1.0.0 by clicking on the selector on the bottom left-hand section of the page:
(Image omitted: the readthedocs.org version selector for django-braces)
## Creating the big picture

Most people find it easier to understand an application if you show them a high-level diagram. While this is ideally created by someone who understands the workings of the application, there are tools that can create very helpful high-level depictions of a Django application.
A graphical overview of all the models in your apps can be generated by the graph_models management command, which is provided by the django-extensions package. As shown in the following diagram, the model classes and their relationships can be understood at a glance:
(Image omitted)
Model classes used in the SuperBook project connected by arrows indicating their relationships
This visualization is actually created using PyGraphviz. This can get really large for projects of even medium complexity. Hence, it might be easier if the applications are logically grouped and visualized separately.
##### PyGraphviz Installation and Usage

If you find the installation of PyGraphviz challenging, then don't worry, you are not alone. Recently, I faced numerous issues while installing it on Ubuntu, ranging from Python 3 incompatibility to incomplete documentation. To save you time, I have listed the steps that worked for me to reach a working setup.
On Ubuntu, you will need the following packages installed to install PyGraphviz:
$ sudo apt-get install python3.4-dev graphviz libgraphviz-dev pkg-config
Now activate your virtual environment and run pip to install the development version of PyGraphviz directly from GitHub, which supports Python 3:
$ pip install git+http://github.com/pygraphviz/pygraphviz.git#egg=pygraphviz
Next, install django-extensions and add it to your INSTALLED_APPS. Now, you are all set.
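For reference, the settings change might look like this minimal sketch (your INSTALLED_APPS will, of course, contain other entries):

# settings.py
INSTALLED_APPS = (
    # ... existing apps ...
    'django_extensions',  # provides the graph_models management command
)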
Here is a sample usage to create a GraphViz dot file for just two apps and to convert it into a PNG image for viewing:
$ python manage.py graph_models app1 app2 > models.dot
$ dot -Tpng models.dot -o models.png
## Incremental change or a full rewrite?

Often, you will be handed legacy code by the application owners in the earnest hope that most of it can be used right away or after a couple of minor tweaks. However, reading and understanding a huge and often outdated code base is not an easy job. Unsurprisingly, most programmers prefer to work on greenfield development.
In the best case, the legacy code ought to be easily testable, well documented, and flexible enough to work in modern environments so that you can start making incremental changes in no time. In the worst case, you might recommend discarding the existing code and going for a full rewrite. Or, as is commonly decided, the short-term approach would be to keep making incremental changes while a parallel long-term effort is underway for a complete reimplementation.
A general rule of thumb to follow when making such decisions is this: if the cost of rewriting the application and maintaining the new application is lower than the cost of maintaining the old application over time, then a rewrite is recommended. Care must be taken to account for all the factors, such as the time taken to get new programmers up to speed, the cost of maintaining outdated hardware, and so on.
Sometimes, the complexity of the application domain becomes a huge barrier against a rewrite, since a lot of the knowledge learned in the process of building the older code gets lost. Often, this dependency on the legacy code is a sign of poor design in the application, such as failing to externalize the business rules from the application logic.

The worst form of rewrite you can probably undertake is a conversion, or a mechanical translation from one language to another, without taking advantage of existing best practices. In other words, you lose the opportunity to modernize the code base by removing years of cruft.

Code should be seen as a liability, not an asset. As counterintuitive as it might sound, if you can achieve your business goals with less code, you have dramatically increased your productivity. Having less code to test, debug, and maintain can not only reduce ongoing costs but also make your organization more agile and flexible to change.
Code is a liability, not an asset. Less code is more maintainable.
Irrespective of whether you are adding features or trimming your code, you must not touch your working legacy code without tests in place.
## Write tests before making any changes

In the book Working Effectively with Legacy Code, Michael Feathers defines legacy code as, simply, code without tests. He elaborates that with tests, one can modify the behavior of the code quickly and verifiably. In the absence of tests, it is impossible to gauge whether the change made the code better or worse.
Often, we do not know enough about legacy code to confidently write a test. Michael recommends writing tests that preserve and document the existing behavior, which are called characterization tests.
Unlike the usual approach of writing tests, while writing a characterization test, you will first write a failing test with a dummy output, say X, because you don't know what to expect. When the test harness fails with an error, such as "Expected output X but got Y", you will change your test to expect Y. So, now the test will pass, and it becomes a record of the code's existing behavior.
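Here is a minimal sketch of how such a test might evolve; the app, function, and values are entirely hypothetical:

# legacyapp/tests.py (hypothetical characterization test)
from django.test import TestCase

from legacyapp.utils import calculate_shipping


class ShippingCharacterizationTest(TestCase):
    def test_default_shipping_charge(self):
        # First run: assert a dummy value such as 0; the failure message
        # reveals the actual value, say 42. Then update the assertion to
        # expect 42, recording the current behavior, bug or not.
        self.assertEqual(calculate_shipping(cart_total=100), 42)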
Note that we might record buggy behavior as well. After all, this is unfamiliar code. Nevertheless, writing such tests is necessary before we start changing the code. Later, when we know the specifications and the code better, we can fix these bugs and update our tests (not necessarily in that order).
## Step-by-step process to writing tests

Writing tests before changing the code is similar to erecting scaffolding before the restoration of an old building. It provides a structural framework that helps you confidently undertake repairs.
You might want to approach this process in a stepwise manner as follows:
1. Identify the area you need to make changes to. Write characterization tests focusing on this area until you have satisfactorily captured its behavior.
2. Look at the changes you need to make and write specific test cases for those. Prefer smaller unit tests to larger and slower integration tests.
3. Introduce incremental changes and test in lockstep. If tests break, then try to analyze whether it was expected. Don't be afraid to break even the characterization tests if that behavior was intended to change.
If you have a good set of tests around your code, then you can quickly find the effect of changing your code.
On the other hand, if you decide to rewrite by discarding your code but not your data, then Django can help you considerably.
## Legacy databases

There is an entire section on legacy databases in the Django documentation, and rightly so, as you will run into them many times. Data is more important than code, and databases are the repositories of data in most enterprises.
You can modernize a legacy application written in other languages or frameworks by importing its database structure into Django. As an immediate advantage, you can use the Django admin interface to view and change your legacy data.
Django makes this easy with the inspectdb management command, which looks as follows:
$ python manage.py inspectdb > models.py
If your settings file has already been configured to use the legacy database, this command can automatically generate the Python code that goes into your models file.
Here are some best practices if you are integrating with a legacy database:
• Know the limitations of Django ORM beforehand. Currently, multicolumn (composite) primary keys and NoSQL databases are not supported.
• Don't forget to manually clean up the generated models, for example, by removing the redundant 'id' fields, since Django creates them automatically.
• Foreign key relationships may have to be manually defined. In some databases, the autogenerated models will have them as integer fields (suffixed with _id).
• Organize your models into separate apps. Later, it will be easier to add the views, forms, and tests in the appropriate folders.
• Remember that running the migrations will create Django's administrative tables (django_* and auth_*) in the legacy database.
In an ideal world, your auto-generated models would immediately start working, but in practice, it takes a lot of trial and error. Sometimes, the data type that Django inferred might not match your expectations. In other cases, you might want to add additional meta information such as unique_together to your model.
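To make this concrete, here is roughly how a generated model might be cleaned up by hand; the table, fields, and related model are invented for illustration, and the exact output of inspectdb varies with the Django version:

from django.db import models

# Roughly what inspectdb might generate for a hypothetical 'orders' table
class Orders(models.Model):
    id = models.IntegerField(primary_key=True)
    customer_id = models.IntegerField()

    class Meta:
        db_table = 'orders'

# After manual cleanup: the redundant id field is dropped (Django adds one
# automatically) and the foreign key is declared explicitly
class Order(models.Model):
    customer = models.ForeignKey('Customer', db_column='customer_id')

    class Meta:
        db_table = 'orders'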
Eventually, you should be able to see all the data that was locked inside that aging PHP application in your familiar Django admin interface. I am sure this will bring a smile to your face.
## Summary

In this chapter, we looked at various techniques to understand legacy code. Reading code is often an underrated skill. But rather than reinventing the wheel, we need to judiciously reuse good working code whenever possible. In this chapter and the rest of the book, we emphasize the importance of writing test cases as an integral part of coding.
In the next chapter, we will talk about writing test cases and the often frustrating task of debugging that follows.