To get my code, start with Getting code…
For Arduino, start with Arduino on Eclipse, and then follow the links…
For Agile, start with Agile Methodologies…
Categories
Recent Comments
-
Recent Posts
Meta
Archives
To get my code, start with Getting code…
For Arduino, start with Arduino on Eclipse, and then follow the links…
For Agile, start with Agile Methodologies…
The value of test driven development (TDD) is being shown in many projects and by many people. It is being shown in environments on-the-job, not in a rarefied or artificial environment. It isn’t theoretical but borne out in day-to-day work of developing real applications.
But what is the underlying mechanism of unit tests (UTs)? In the human effort required to program a piece of software, what is it that UTs bring to the table? Simply put, UTs form a set of “mini-requirements” around a function (Code Under Test, CUT). The CUT behaves according to the UTs that run it or the asserts within the UTs fail and so warn the software developer. So UTs provide a developer a powerful one-two punch: they can use UTs to specify how a CUT should behave and they can run the UTs to double-check the CUT actually behaves as the requirement specifies.
A small scale example is:
void test_square() { assert(f(0), 0); assert(f(1), 1); assert(f(-1), 1); assert(f(2), 4); assert(f(-2), 4); assert(f(3), 9); assert(f(-3), 9); } |
The function f() returns the square of it’s parameter. The asserts double check that behavior for representative positive and negative values and for zero.
A large scale example occurred on a project where we had approximately 20K+ lines of code written in C# (Linux Mono) and we wanted to convert it to Java. Luckily we had a full set of UTs for it. The conversion process was:
At the end of this process, our code base was converted to Java and we had strong confidence it behaved exactly the same as the old code base in C#.
But in some ways UTs are worse than a written set of requirements. To find out what the CUT actually does, you have to read the UT (which is yet more code) and infer the expected behavior. Here’s an example:
void test_func() { assert(f(1), 1); assert(f(2), 2); assert(f(3), 3); assert(f(4), 5); } |
What is f() supposed to be doing? Hard to tell, but perhaps it can be guessed.
If the UT is written well, this effort can be much simpler.
void test_prime() { // zeroth prime is not implemented try { f(0); assert(false); } catch (ParameterRangeExcp ex) { assert(true); } // negative indices should not be used try { f(-1); assert(false); } catch (BadParameterExcp ex) { assert(true); } // try some representative samples assert(f(1), 1); assert(f(2), 2); assert(f(3), 3); assert(f(4), 5); // can only handle the first 10 primes try { f(11); assert(false); } catch (ParameterRangeExcp ex) { assert(true); } } |
In this case it looks like f() returns one of the first 10 primes given an index. Well maybe it does and maybe it doesn’t, but it looks plausible enough based on the UT.
Once a UT is written and passing, it is tempting to ignore them after that. This can lead to false positives. For example a change is made to the CUT, and, by chance, the existing UTs continue to pass. But because they are passing, there is little impetus to the developer to review them to see if they should be changed.
A UT only tests what it tests. Does a set of UTs, in fact, test all of the functionality of the CUT? If a UT is missing, if there is some functionality of the CUT that isn’t tested, all of the existing UTs will blithely pass. There is no indication that a test is missing, that a part of the functionality is untested. And there’s no automatic way to determine it. It is up to the developer to ensure it (see Code Reviews for one way to double-check). The use of coverage tools can also help.
In other ways, UTs are better than a written set of requirements. You can run the UTs and explicitly find out if the CUT is behaving correctly or not. A text-based set of requirements can’t be run that way. Although there are some symbolic requirement systems (e.g. Z) that have verifier utilities that can check for bugs, they only find problems in the requirements themselves. There are none that I am aware of that check the requirements against the actual code.
The UTs create a close simulation of the final run-time environment. UTs execute the CUT in roughly the same run-time environment as when the CUT runs in the “real” application. This means we can trust the CUT is behaving as it will be in its final resting place in the application. The strong similarity between the two environments gives solid evidence for the validity of the assert results.
If a set of UTs are built up as development progresses, the portion of a CUT’s functionality that is relevant to the rest of the system will tend to be fully tested. There may be parts of the CUT’s functionality that is untested but they are very likely irrelevant because they aren’t used or are trivial. This implies that, on the whole, all of the application code is fully tested and more specifically, fully tested for all of the relevant behavior of the system as a whole.
But note that this also implies that writing UTs post-fact doesn’t work very well. There is no conclusive way to tell which part of a CUTs functionality are relevant. A profiler can give us a clue and known unused paths through the CUT could raise an exception, but neither technique is conclusive. But having too much unit testing is clearly better than not having enough!
There are two kinds of unit tests in TDD, both of which are heavily related to each other.
The first kind of test will confirm that the code you’ve just added works as you expect in and around the existing functionality. It examines slightly different but related parts of the solution space that are important to you. These do not have to be comprehensive, but they do have to be relevant. You write these until your anxiety is satisfied. This kind was explored in the first section.
The second kind is a speculative test, it will cause you to write code for a very small part of new functionality. The new code satisfies some small part of your spec, requirements, task, Story, etc.
The second kind moves the design forward in an area, the first kind fleshes out the design in that area. Bill Wake calls the second set “generative” and the first set “elaborative”.
Hardware folks use oscilloscopes to peek into a circuit while it is running. It shows the behavior of the circuit in real-time, in actual operation. UTs perform the same function for software.
A simple example is to use a UT to explore how a system call works. Don’t run the whole application just to find out how the Date function works in leap years. Isolate the function and test it’s behavior. Isolate the small, rare conditions and test them directly.
The odds of your code working correctly in ALL conditions suddenly went up. And that knowledge is pure advantage to you. Once you have a solid understanding of the low-level function behavior, write the next level design.
A more sophisticated use is to use of UTs to write tentative high(-er) level code and test it piecemeal, as you write it.
These UTs are exploration. If your code is straightforward you don’t need it. But sometimes complex designs or algorithms are necessary. Use UTs to display or check various aspects of the code as you develop the algorithm. Design X doesn’t work because Y happens when the Date function returns a particular function. But Design Z does work correctly in all these cases.
Once this code is working, write the final “elaborative” UTs. These can be simple tweaks or even rewrites of the “generative” versions. But sometimes they are completely scrapped and new UTs need to be written from scratch. This can occur if the generative tests do not clearly show the relevant expected behavior of the system. Perhaps they show some actual behavior but not as important or relevant. Or they don’t transmit the expected behavior as clearly or cleanly as they could.
This slide show has a very short (30 slides) introduction to XP introtoxp
The key points the presentation covers are:
This are some notes about refactoring legacy code that I’ve come up with from my own experiences doing the same. I do not claim it is in line with the “official” XP doctrine on refactoring. Any errors, idiosyncracies and other perturbations are strictly attributable to me.
The first question to answer is: Why do the refactoring in the first place? Considering the amount of time refactoring legacy code can use up, it is an important question to address.
There are two broad categories of refactorings: the architectural or redesign refactorings, and the “little” refactorings. The architectural refactorings readjust the interaction of various parts of the code base. They are used to clean the code at a higher level. I am not just cleaning up the code, I’m redesigning on the fly. For architectural refactorings, the emphasis is on class extraction and fixing the interaction between classes. When I’m done, the interface between classes and groups of classes is clean and simple. Individual classes have a single conceptualization and are generally small and simple.
The little refactorings are meant to clean the code at a low level. When I’m done the code for individual classes is clean, simple and clear. The code is transparent in it what it is trying to achieve. I can read the code and with little effort or comments determine what it is doing.
When both types of refactorings are complete, the code is easy to follow at a high level, the architecture is transparent and how each component of the architecture achieves its function is transparent. The code base is more maintainable when I’m done because other developers can read the code quickly and easily.
The next question to consider is: When am I done? The answer is it depends on what I want to achieve. Some examples:
If I’m clear on what I’m trying to achieve, it’s easier to know when I’m done. If it is to uncover and fix bugs, I’m done when I’ve found the bug and it’s fixed (with the new UTs proving it). If it is to understand the code, I’m done when I feel comfortable with the depth of my understanding. If it’s merciless refactoring, I’m done when I cannot see any other conceptual chunks to cleave into a separate class or method.
I’ve also taken the incremental approach and done several passes at the code, starting with coarse-grain refactorings and refining as I get more familiar with the code (and have more tests in place). This approach can span many weeks or even months. The code is slowly migrated to a fully refactored and unit tested form.
Let me state up front: I like rules-of-thumb, and heuristic and probabilistic reasoning. Therefore my approach to refactoring is based heavily on a few rules-of-thumb I’ve come up with.
If legacy code were written in a good OO style, there would be little need for refactoring except for polishing. Since I’m generally in the code because it’s in bad shape or indecipherable, my first priority is to disentangle the classes inherently in the code.
But wait, let me back up a little. The code is bad right? Otherwise I wouldn’t be in there except for pedagogic reasons. If I were refactoring strictly for pedagogy, I’m betting that I’ll learn more refactoring really bad code then by polishing good code.
But whether for pedagogic purposes or for practical reaons, there are intertwined classes in the code. If a class has a single, simple purpose then almost by default the code will be clear and simple. The only code in such a class is directly related to it’s purpose and it’s purpose is supported by all the code in the class. When there are two entangled classes in a piece of code, they interfere with each other, creating a sort of moire pattern in my understanding of what the code is trying to do. If the interference is weak, I can still make out what is going on but if the interference is strong, it may overwhelm my attempts at understanding it.
So understanding the entanglement and the teasing out the intertwined classes is my first priority.
The interface of a class is it’s public methods/variables and their signatures. The simpler that interface, the simpler the class tends to be. Or should I say “The simpler the class, the simpler the interface”? It doesn’t really matter, the two are heavily related and any effort in one area will generally benefit the other.
The fewer the number of methods the better. The fewer the number of parameters on a method the better. Most or all of the methods should be involved when typically using the class. If there is a partition of methods, i.e. certain methods are used in some scenarios, and other methods are used in other scenarios, then the class should be split (see above).
The interaction among the public methods also affects the simplicity of the interface. A common example is the Open() method must be called before the Close() method and the Close() must be called before the Open() is called again. There is a dependency between Open() and Close() that is not apparent by just looking at the signatures of the methods. The dependency, to be made clear to the user of the class, must be documented somewhere or be so idiomatic (as it is in this case) that the user already knows it a priori.
Almost all methods in the code base should have 1 line of code (excluding guard clauses, lines with just braces on them, wrapped lines, and other syntactic necessities.) I like this rule very much because it reinforces what “merciless” really means. When I feel like I’ve done enough refactoring on a class, I double check the number of lines in each method. If there are any over 1, I justify them to myself or continue refactoring them. The rule motivates me to extract code and methods. The end result are final classes that are very simple and singular in purpose.
When I’ve extracted out whatever objects I can and I’ve refactored the code, I’ll have a system that is more OO than it was before. In general, it will not be as clean as code from the normal XP merciless-refactoring-as-you-go style of coding.
If I have a perfectly clean code base, for example the very first line of code of a brand new application I’m writing. I refactor it mercilessly. I then add two or three lines of new code and refactor mercilessly. Compare that to refactoring a method with 250 lines of code.
The mental or psychological effort is nearly trivial in the first case. Any “code smells” are readily apparent. Complicated areas of code are easily found and corrected. The effort is substantially more for the legacy code. The qualitative difference is perhaps due to mental fatigue or some sort of psychological phenomenon, but it is real and translates into more time spent performing the refactoring.
XP’s test-first, UT and FT (functional test) practices work very well in the forward direction: I start with a clean system, add clean pieces and end with a clean system. But I am going the other way. I never know and will never know that I have a clean system in the way that XP allows me to know.
Here’s an outline of the method I use to refactor legacy code. I have listed most of the activities I do; I may or may not do them in the order listed.
It is very easy to introduce bugs when refactoring legacy code. I write a handful of ATs that test the basic functionality of the entire system. I make them completely automated and invocable by one command/button. I invoke the ATs many times so I take a few extra minutes to make them easy to use:
I don’t write UTs until I’ve got a stable set of objects. I usually don’t know they’re stable until I’m nearly done, but sometimes the class just seems done. When I’ve written UTs up front, I end up rewriting the UTs as I sling the code around. This slows me down and causes me to hesitate before I change a bunch of code. This is bad.
Is there more to say? For me, a pretty-printer makes things much easier to understand. Know the keyboard shortcuts for the editor. Have multiple monitors, multiple screens. A merge/diff tool is very useful too.
It is very easy to make mistakes. I try to make very small changes at any one time. Even if it looks “obvious” the code is correct, I’ve been wrong often enough to distrust that obviousness. I do not do global replaces. If there is a code change in 10 places, I do the first in its entirety before I move onto the next place. I recompile and run the ATs at least once per each of the 10 places.
Sometimes I’ve been forced into doing a large change. I prepare for these by writing a quick backup script, e.g. copy all the source files to a backup directory. I make sure the backup script works and try rolling back to the backup versions. Then I proceed with the large change. I’ve found a source control tool has been generally too clumsy for this kind of work. The simple backup is quicker, easier to rollback and I can have multiple versions of backup. I can also use my favorite full-directory tree comparison tool at will.
I put all final (or const) variables or classes into Constants. The objects in Constant have to be constant across the lifetime of the session and generally used by the rest of the system.
The objects put into Globals have to be global for the lifetime of the session and generally used by the rest of the system. If it is not possible to initialize all Global variables, I put an Init() method in Global and call it from the mainline to initialize. I add methods to Global as needed to access the Global instances. I try to create more than just setter/getter methods. I move as much code surrounding the use of a Global instance into the new Global method as I can.
I don’t use Globals and Constants loosely. On the other hand, I don’t agonize whether or not to put an object into them. If later on I feel that something should not be global, I just extract it out.
The Globals class usually starts out as a dumping ground. I need to have a place somewhere in the system where I can quickly place methods and variables. I don’t want to slow down the work just because I can’t find a spot to put a particular method. Eventually, I find that most variables I put in this class are not globals after all.
I tend to place as much as code as I can into Globals and then extract a class from it every so often. When the refactoring is done, Globals is usually empty or nearly empty. Anything left over should be truly a global method or instance: if my entire code base uses it, it is a global. If Globals is empty, I delete it.
A good “bang for the buck” effort is to break up long methods into smaller ones based on “hey these 10 lines of code *belong* together”. I don’t worry about performance or getting the break-up right the first time. And I don’t clean up the new methods. I don’t “fix” bugs or things just yet. I just break up the code into lots of pieces. I’ll be reorganizing and fixing things in a little while anyway.
Finding these 10 lines is sometimes very easy: there is a comment heading them up explaining what they do, or there is a blank line separating groups of lines. In other words, the original programmer intuitively knew that the lines belonged together but didn’t write a method to group them. I just help the code along by extracting the method.
Generally, poorly structured code also has poorly named variables. It’s amazing what a difference just these two steps (split long methods and rename variables) can make. I find that the structure of the code becomes substantially clearer after only a couple of passes.
Look at Switches, if and else blocks, for blocks, while blocks, try/catch blocks, etc. The language blocks are a pretty straightforward sign that those lines of code belong together.
Be careful though, it isn’t necessarily a good thing to extract out all of the blocks into methods. I look for lines of code that are logically associated or “belong” together, not that just happen to be together in the same block.
Sometimes I find many similar signatures in the methods of a class. This sometimes indicates that a class can be extracted. The formal parameters of the methods are the instance variables of the new class. The extracted methods then have empty or nearly empty parameter lists.
I create the new class as soon as I can, but I take a moment to think about why those particular methods group together. I name the new class based on that relationship, that becomes the conceptualization of the class.
When I extract a new class, I immediately look for as much code that I can add to the new class, but it has to fit the conceptualization of the class. While I do this, I keep track of how the methods in the new class are fitting together. Do they, as a group, make sense under the class’s name? If I need to, I change the name of the class to better reflect what the class is now doing. If the class has gotten too complex, I’ve gone too far and I need to split the new class.
The simplest classes have one instance variable. Multiple instance variables indicate multiple classes (but not necessarily so). I look at the code surrounding instance variables of a class. If there is repetitive use with exact or similar code around the instance variable, I extract a method. If I’ve extracted several methods that all use the same instance variable, I see if I can extract out a new class.
I look for repetitive uses of a particular API call or object, e.g. InputFileStream (java). If those calls have any commonality, e.g. I open a file with InputFileStream and look for a few special lines in the file. If so, I extract out a method. If the API calls are being applied to the same sort of object, e.g. the code is only opening an ini file, I extract out the new methods into a new class, e.g. an ini file class.
If a group of public methods in a class are logically associated to one another, I extract the methods into a new class. I periodically look at the Global methods with this rule in mind. Can I partition them into logical associations? If so, I group the methods into a new class. I move as much of the Global code as I can into the new object.
Do some of the methods only interact with a few instance variables and those instance variables are only used by those methods? If so, I extract those methods and instance variables into a new class. I move as much code as possible into the new class. See if you can turn the calls to those objects inside out: instead of an object calling a method on the Global class with a set of parameters, see if you can pass the object into the Global method. If it makes sense to do so for a number of Global methods, extract those methods out into a new class.
Doing all of the above might cause you to create objects that really don’t stand alone, so you may end up merging two objects after the fact. If you find that you have split a class, merged it and then split it again, stop. Find out why you are thrashing.
While you’re creating the objects, try to develop a deeper understanding of how the application is structured. Creating and naming new objects becomes a whole lot easier if you know what the major components of the code are and how they relate to each other. You might even be able to predict the existence of future objects this way and make it a self-fulfilling prophecy.
At some point in time, you will not be able to discover any new objects using the above methods. Try some other techniques to change your perspective and to take a breather: Review Refactoring by Michael Feathers and apply all his techniques to your code.
At this point, UTs (unit tests) may be helpful. Balance this off with the effort of writing the UTs. Have someone else look at the code. If they cannot understand the intent of the code easily, refactor the code until it’s structure and intent is easily apparent to both of you.
Resist adding comments! You can however, leave class level comments to clarify why that class exists and any notes on how to use it. It is better to change the code such that the comments are not necessary, e.g.
// jump to the next date curdate += 10; curdate %= get_days_of_the_month();
refactored to:
jump_to_the_next_date(currdate);
In the second case a comment is not required.
Apply a re-engineering tool to your new code. What does the UML look like? Does it make sense? Do any of the classes look like “Controller” classes? Are there any classes that are disconnected? Do the class names make sense from this perspective? Check the top level interfaces. Can you change them to reduce the total amount of code in the system? If so, keep the new macro-architecture in mind while you refactor at a micro level. You can slowly convert to the new architecture taking small and safe steps along the way.
Also flipping back and forth from the micro- to the macro- level can give you a different perspective. Things discovered at the macro- level can give you a clue to overall program structure and therefore to new objects. Look at the overall class structure of your program.
Apply these good OO heuristics: Is the class structure flat? Reduce the hierarchy depth as much as makes sense. Is there associations between classes? Are classes as dumb to the outside as possible? Reduce associations between classes to a minimum by making each class dumb about everything outside itself. Does any class know about other classes in the rest of the program? Refactor the class so that it knows only about itself.
There are two exceptions to these rules. First is that each class lives in the operating system or environment provided libraries. I freely include those anywhere they may be useful. The other exception is the main program or a small number of control objects. These act as glue to tie together all of the tools to solve the problem at hand.
You may object that this makes it more difficult to port to a new OS or library. I would argue that a well factored application using good OO style is more important to achieve than portability. It is of no use to sacrifice code maintainability and readability in the medium term (a few years) for a long term goal that may or may not come to fruition. I think that cross-platform compatibility is overrated as a requirement although there have been tragedies because it wasn’t. Simply said, cross-platform techniques are very difficult to do and to get right. Languages like Java and Perl mitigate these problems but do not make them go away. I would say the vast majority of applications would be better off if they were done right for one platform rather than done badly for several.
Remove dead code. Find it and remove it, don’t just comment it out. Rely on your source code repository to get it back if you really needed it.
Use idempotency. Allow methods to be called multiple times and still work correctly.
Use a class to encapsulate many formal parameters. When several methods have signatures with 4 or 5 or more parameters, create a class for them. Start with an empty class except for these parameters and then start moving bits of code into them.
When you just extracted a method, immediately look for duplicate code similar to the method I just extracted and replace it with a method call.
Look for lines of code that use the same instance variable, or same temporary. Extract them into a method. -
Look to extract very simple classes even if it had only one method and one member variable. Look for additional methods to add to the class.
Break up a big method into a class. The old temporaries and formal parameters to the method become instance variables in the class. All duplication within the old method will disappear it’s contents are refactored into the new class. That done, look at making the new class an instance variable in the original class and getting rid of the old method entirely.
Look for series of methods with similar signatures; extract them into a new class. The replication within the signatures is code duplication. The class extraction gets rid of that duplication.
Look for series of methods with similar prefixes or suffixes in their names. e.g. BobInit(), BobTerm(), BobOpen(), BobClose() strongly suggest that “Bob” is in fact a class.
Check if each class has a singular purpose. A quick *indirect* measure of this: there will be very few, if any, ‘if’ statements in any one class. If not, look at moving the ‘if’s higher up in the calling chain and using Design Patterns to remove the ‘if’s.
Another quick *indirect* measure of this: there are only 1 or 2 member variables in the class. More are ok but justify their existence.
Check if there’s no more duplication. A quick *indirect* measure for this: avg # of statements per method => between 1 and 3.
If a puff of air hits your eardrum, you hear a noise. If a puff of air vibrates back and forth very quickly against your eardrum, you hear a tone. If the vibration happens to be at 440 cycles per second (440 Hertz or Hz), you hear middle A (the A above middle C on the piano).
A line on a graph can represent the vibrations of the air. The rising and falling of the line as time proceeds represents the increase and decrease in air pressure as it propagates over time (aka the signal). In audio signals, the line represents the puff of air as it approaches or recedes away from your eardrum. If the air is approaching your ear, the line is above zero. If receding it is below zero. The distance from the signal to the zero line is called the amplitude.
The Fourier Transform depends on a theory that says that a signal can be represented by a infinite summation of sine waves (also called sinusoids). To recreate any given signal, you take an infinite number of sine waves of the right kind, add their amplitudes at all times, and they will faithfully reproduce the original signal.
Each of the sinusoids has three properties that define them:
The Frequency is the number of times the sine wave crosses the zero line per second. The Amplitude is how high above or below the zero line the sine wave rises to. The Phase indicates where the sine wave was at time = 0. If it was at zero amplitude and rising it has Phase 0 by definition. If it is at it’s maximum amplitude and falling it has Phase 90 degrees (PI/2). The Phase ranges from 0 to 2PI.
The vibrations of the puff of air are continuous. That is, between any two points in time, there is another point of time where the air is doing something. On the other hand, if you wish to represent a sound in a digital way, discretely that is, you must sample the sound periodically and represent the amplitude at those times as a number, positive or negative.
The Fourier Transform (FT) takes a continuous signal (the time domain) and tells you which sinusoids are required to represent it (the frequency domain).
F(f)= ∫+∞-∞s(t) e -j 2 π f t dt
F(f) is the transform at frequency f.
s(t) is the amplitude of the signal at time t.
e is the natural exponent.
j is the square root of -1 as used by complex numbers.
The Inverse Fourier Transform (IFT) goes from the frequency domain to the time domain, that is it takes a set of sinusoids and tells you what signal they represent. The IFT is
s(t)=∫+∞-∞F(f) e j 2 π f t df
To convert Fourier Transform formula to its discrete form, the DFT, we do the following:
F[f] = 1/N Σ s[k] e -j k π f / N for k=0..N-1, for f= 0..N-1
The Inverse DFT (IDFT) is similar:
s[t] = Σ F[f] e j k π f / N for k=0..N-1, for f= 0..N-1
i.e. there’s no 1/N coefficient and the -1 sign is removed from the exponent.
The identity
ejΘ = cos(Θ) + j sin(Θ)
gets rid of the exponent term:
F[f] = 1/N Σ s[k] cos( -j k Θ f / N) + j sin( -j k Θ f / N)
for k=0..N-1, for f= 0..N-1
To convert to C++, start with the innermost and common pieces. This piece:
-j k Θ f / N
is common in two places and if we temporarily remove the varying terms:
#include <complex> using namespace std; double arg = -1 * 2.0 * M_PI / (double) num_samples; |
Where num_samples is the number of samples, i.e. N. Substitute back into the equation:
F[f] = 1/N SumOf s[k]*(cos(k * f * arg) + j*sin(k * f * arg))
for k=0..N-1, for f=0..N-1
Multiply by the signal component assuming its samples are stored in an array sig:
sig[k] * complex<double>(cos(k * f * arg), sin(k * f * arg));
If the result is held in an array result[]:
result[f] += sig[k] * complex<double>;(cos(k * f * arg), sin(k * f * arg)); |
Now add the for loops:
for (int f = 0; f < num_samples; f++) for (int k = 0; k < num_samples; k++) result[f] += sig[k] * complex<complex>(cos(k * f * arg), sin(k * f * arg)); |
The indicies range from 0 to N-1 and are integers. Because of this the multiplication k*f will take on the same values as k and f range across 0..N-1. For example if N = 4:
f k f*k 0 0 0 0 1 0 0 2 0 0 3 0 1 0 0 1 1 1 1 2 2 1 3 3 2 0 0 2 1 2 2 2 4 2 3 6 3 0 0 3 1 3 3 2 6 3 3 9
In fact, k*f only has 7 values 0, 1, 2, 3, 4, 6, 9. Here’s a table comparing N to the number of unique values generated by k*f:
#unique N values 2 2 3 4 4 7 5 10 6 15 7 19 8 26 9 31
One observation is that a large number of k and f combinations result in a 0 value. We can speed up the algorithm by detecting when f or k is zero, and therefore k*f is zero, and using these identities:
cos(0) = 1
sin(0) = 0
if (f == 0 || k == 0) result[f] += sig[k] * complex<double>(1.0, 0.0); else result[f] += sig[k] * complex<double>(cos(k * f * arg), sin(k * f * arg)); |
Further increases in speed can be found by using the Fast Fourier Transform (see links below).
And finally divide the result by the number of samples:
result[f] /= num_samples; |
The whole function looks like this:
void dft() { double arg = -1 * 2.0 * M_PI / (double) num_samples; for (int f = 0; f < num_samples; f++) { for (int k = 0; k < num_samples; k++) { if (f == 0 || k == 0) result[f] += sig[k] * complex<double>(1.0, 0.0); else result[f] += sig[k] * complex<double>(cos(k * f * arg), sin(k * f * arg)); } result[f] /= num_samples; } } const double sample_rate; // samples / second const double sample_length; // in seconds const int num_samples; double* sig; complex<double>* result; |
The value of the zeroth term result[0] is the amplitude of the bias or DC (direct current) component, to use electronic terms. It is the offset above or below zero that the signal’s vibrates around. If the signal is all pure sine waves, the offset is zero and result[0] will be 0.0. But if the signal is offset slightly then result[0] will be non-zero.
The remainder of the array is information about the frequencies in the original signal. The frequency information is duplicated so only the first half of the array needs to be examined.
For each array item:
Note that the frequencies in the result array are an arithmetic progression. Each index i is related to i+1 by adding a constant value to it. The relationship between the input signal samples and the resulting frequencies in the result array is via the sampling rate. For example:
sample_rate = 3000
num_samples = 3000
bias: 0.4
freq: 110.00 ampl: 10.0000 phase: 1.1000
freq: 220.00 ampl: 20.0000 phase: 1.2000
freq: 330.00 ampl: 30.0000 phase: 1.3000
freq: 440.00 ampl: 40.0000 phase: 4.4000
The Bias, Frequency, Amplitude and Phase are:
double bias() { return result[0].real(); } double frequency(int i) { return (i * sample_rate) / num_samples; } double amplitude(int i) { return 2.0 * ::sqrt(result[i].real() * result[i].real() + result[i].imag() * result[i].imag()); } double phase(int i) { return (2.0 * ::atan(result[i].imag() / result[i].real())) + M_PI; } |
The first step to test the DFT is to generate known sine waves and then apply the routine to see if we get the same result back.
This code sets up an array of samples:
Samples(double r, double l) : sample_rate(r), sample_length(l), num_samples(r * l) { // allocate the result and signal arrays result = new complex<double> [num_samples]; sig = new double[num_samples]; // initialize signal array for (int i = 0; i < num_samples; ++i) { sig[i] = 0.0; } } // ... const double sample_rate; // samples / second const double sample_length; // in seconds const int num_samples; double* sig; complex<double>* result; }; |
And this code is used to populate the sig array with samples with a known bias, amplitude and bias:
void addSine(double frequency, double ampl, double phase, double bias) { if (sample_rate < 2.0 * frequency) { cout << "The sample_rate has to be at 2 times the highest frequency in the signal" << endl; return; } double omega = 2.0 * M_PI * frequency; double convertedPhase = phase / 2.0; // fill the signal array with samples of the sine wave for (int i = 0; i < num_samples; ++i) { // calculate the instantaneous time double t = (double) i / sample_rate; sig[i] += bias + (ampl * ::sin((omega * t) + convertedPhase)); } } |
Note the warning message. If the sampling rate is less than (2 * highest frequency) the DFT will not work correctly.
Also note the phase is divided by 2.0. This was done because of limitations in the sin() and atan() functions. The 2.0 factor is adjusted in the DFT code.
The main line code looks like:
Samples s(3000.0, 1.0); s.addSine(110.0, 10.0, 1.1, 0.1); s.addSine(220.0, 20.0, 1.2, 0.1); s.addSine(330.0, 30.0, 1.3, 0.1); s.addSine(440.0, 40.0, 4.4, 0.1); |
This create an array of 1024 samples for a 0.5 second long sound to get 512 samples. There are four frequency 110, 220, 330 and 440Hz. Each frequency had a bias of 0.1. The amplitudes and phases were chosen to make the numbers unique.
Applying the DFT is simply:
s.dft(); |
The result is printed out like so:
cout << "sample_rate = " << s.sample_rate << endl; cout << "num_samples = " << s.num_samples << endl; cout << "bias: " << s.bias() << endl; for (int i = 1; i < s.num_samples / 2; ++i) { double ampl = s.amplitude(i); if (ampl > 0.00001) { cout << setiosflags(ios::fixed) << "freq: " << setw(7) << setprecision(2) << s.frequency(i) << " ampl: " << setw(8) << setprecision(4) << ampl << " phase: " << s.phase(i) << endl; } } |
The check for the amplitude causes only significant frequencies to be displayed.
Here is the output:
sample_rate = 1024 num_samples = 512 bias: 0.4 freq: 110 ampl: 0.1 phase: 1.1 freq: 220 ampl: 0.2 phase: 1.2 freq: 330 ampl: 0.3 phase: 1.3 freq: 440 ampl: 0.4 phase: 4.4 |
The Bias is 0.4 which is a sum of the four 0.1 biases of each frequency. The Frequencies, Amplitudes and Phases all match with the generated samples.
http://arrizza.org/public-hg/arrizza.org-src/file/tip/jTestDft
The source code above can be found here:
http://www.arrizza.com/downloads/signalprocessing/signalprocessing.html
A good reference for the DFT is here:
http://astronomy.swin.edu.au/~prourke/analysis/dft
An article outlining the use of utcpp and utjava (both superseded by utx) utarticle.
The key points covered:
If you’ve worked with ISO9000 and the like, you know there is a Document Control Librarian. This person is responsible for handing out and maintaining “official” copies of documents as required by ISO. If you want to get access to a document, you go to the Librarian and request a copy.
The Librarian doesn’t know very much at all about the documents he’s handing out. Someone else created the document and gave it to him. He just knows a very few things:
From time to time, he’s given updates to a document. Again, he doesn’t know what the updates are or what they mean, he just knows that he now hands out copies of the new version.
You don’t know who gave the original copy to the Librarian. It may have been an outside consultant, it could be another Engineer in your company, or it may have even been you. You don’t know how that original copy was created in the first place. It may be a very complicated document requiring elaborate steps to create or it took many years to create.
You can’t ask the librarian for a slightly different version of a document, he won’t know what you’re talking about. However, the copy you received might give you a way to modify itself, perhaps with some fill-in-the-blank pages.
| Document Control Librarian | => | PrototypeManager class |
| the documents | => | the Prototypical classes |
| the photocopier | => | the Clone() method(s) on all the prototype classes |
| document created elsewhere | => | the construction of the prototype class is done by another class |
In contrast, a Factory RWE might be McDonalds. If you need a hamburger, you go to the local McDonalds to get one. You don’t go to the local bakery because they don’t make hamburgers there. You somehow magically know that hamburgers are created at McDonalds.
McDonalds builds the hamburger on the spot. They don’t, for example, have Ray Kroc’s first hamburger, ask it to copy itself and give you the copy. And you, as the customer, don’t have to know what’s in the secret sauce (1000 Island dressing).
The people who work at McDonalds have some knowledge about constructing hamburgers but can only generate what’s on the menu. You can tailor your hamburger on the spot, e.g. “Hold the pickles”. But you have to know what you can and can’t change about the hamburger, with a little help from the McDonald’s staff and menu, e.g. you have to know what “Supersize” means, and you can’t ask for Adovo sauce instead of secret sauce.
The people who work at McDonald’s don’t really create the hamburgers, they just assemble it according to instructions given to them from McDonald’s HQ. The “hamburger creation knowledge” is not at the local McDonald’s, just the ingredients and the instructions.
| McDonalds | => | the Factory class |
| the hamburger | => | the object instance |
| the staff | => | the Factory method used to create the object instance |
| McDonald’s HQ | => | the actual class constructor(s) |
Pair Drawing game based on Joshua Kierevsky’s http://industriallogic.com/games/pairdraw.html
Materials
Introduce the Game (5 minutes or less)
Pair up (5 minutes or less)
Discussion (30 – 45 minutes)
Change Partners (5 minutes or less)
Discussion (20 minutes or more)
Pair Programming (15 minutes or more)
Brain Storming (20 minutes or more)
(from ExtremeProgrammingForOne )
Code Division Multiple Access. CDMA developers group
CDMA is a “spread spectrum” technology, which means that it spreads the information contained in a particular signal of interest over a much greater bandwidth than the original signal.
First and second generation. Different types of cellular systems employ various methods of multiple access. The traditional first generation (1G) analog cellular systems, such as those based on the Advanced Mobile Phone Service (AMPS) and Total Access Communications System (TACS) standards, use Frequency Division Multiple Access (FDMA). FDMA channels are defined by a range of radio frequencies, usually expressed in a number of kilohertz (kHz), out of the radio spectrum.
For example, AMPS systems use 30 kHz “slices” of spectrum for each channel. Narrowband AMPS (NAMPS) requires only 10 kHz per channel. TACS channels are 25 kHz wide. With FDMA, only one subscriber at a time is assigned to a channel. No other conversations can access this channel until the subscriber’s call is finished, or until that original call is handed off to a different channel by the system.
CDMA is a “spread spectrum” technology, which means that it spreads the information contained in a particular signal of interest over a much greater bandwidth than the original signal.
A CDMA call starts with a standard rate of 9600 bits per second (9.6 kilobits per second). This is then spread to a transmitted rate of about 1.23 Megabits per second. Spreading means that digital codes are applied to the data bits associated with users in a cell. These data bits are transmitted along with the signals of all the other users in that cell. When the signal is received, the codes are removed from the desired signal, separating the users and returning the call to a rate of 9600 bps.
Traditional uses of spread spectrum are in military operations. Because of the wide bandwidth of a spread spectrum signal, it is very difficult to jam, difficult to interfere with, and difficult to identify. This is in contrast to technologies using a narrower bandwidth of frequencies. Since a wideband spread spectrum signal is very hard to detect, it appears as nothing more than a slight rise in the “noise floor” or interference level. With other technologies, the power of the signal is concentrated in a narrower band, which makes it easier to detect.
CDMA cell coverage is dependent upon the way the system is designed. In fact, three primary system characteristics-Coverage, Quality, and Capacity-must be balanced off of each other to arrive at the desired level of system performance. In a CDMA system these three characteristics are tightly inter-related. Even higher capacity might be achieved through some degree of degradation in coverage and/or quality. Since these parameters are all intertwined, operators cannot have the best of all worlds: three times wider coverage, 40 times capacity, and “CD” quality sound. For example, the 13 kbps vocoder provides better sound quality, but reduces system capacity as compared to an 8 kbps vocoder.
The name of the .cls file is important: the progid for that class is made up: projectname.classname.XX
Private Sub Class_Initialize()
'body
End Sub
Private Sub Class_Terminate()
'body
End Sub
Class_initialize() corresponds to the FinalInitialize() in COM/ATL. It is called once, when the first reference to this component is created.
Class_terminate() corresponds to the FinalTerminate() in COM/ATL. It is called once, when the reference count goes to 0.
dim XX as AReferenceGoesHere
set XX = new AReferenceGoesHere
The ‘Set’ is required. Intellisense works.
if abc = 21 then
abc = 22
ElseIf abc = 22 Then
abc = 23
end if
dim i
for i = 1 to 10
'loop body
next i
Do While Not aValue
' loop body
loop
For Each Child In Children
'loop body
Next
Exit Sub
Exit Function
Sub xx
On Error GoTo ErrHandler
'sub body here
Exit Sub
ErrHandler:
'error handler code here
errNbr = Err.Number 'holds the error id
errDescr = Err.Description 'a string explaining the error
End Sub
This causes any errors to set the Err global variable and continue on.
On Error Resume Next
This causes error handling to revert to the system default (i.e. put up a msg box.
On Error GoTo 0
UCase$("abc") ' returns "ABC"
Right$("ABC", 2) ' returns "BC"
Left$("ABC", 1) ' returns "A"
Len$("ABC") ' returns 3
Trim(" ABC ") ' returns "ABC"
Mid("ABC", 1, 1) ' returns "B"
CInt("123") ' returns an integer 123
CStr(123) ' returns a string "123"
CSng("1.23") ' returns a single (C++ => float)
Public Property Let CustomerID(RHS As String)
mLocalVrbl = RHS
End Property
Public Property Get CustomerID() As Variant
CustomerID = mLocalVrbl
End Property
private sub XX(ByVal theIndex As String, ByRef theValue As String)
All parameters are passed by reference. The ByVal is default and is not required.ByVal indicates that the reference is is a constant, e.g. it is an [in] parameter. The value contained in theIndex is not constant, it can be changed. However Strings are immutable and so the caller sees no changes. If theIndex was an object, it can be changed by XX.
ByRef indicates that it is an [out] parameter. The reference can be changed. If theValue points to a new string, the caller sees the change.
Dim fs As Scripting.FileSystemObject
Set fs = New Scripting.FileSystemObject
If Not fs.FolderExists(sBuildPath) Then
fs.CreateFolder (sBuildPath & "\")
endif
Dim aFlatFile As Scripting.TextStream
Set aFlatFile = fs.CreateTextFile(sFileName, True)
aFlatFile.WriteLine ("abc" & Chr(9) & "def")
aFlatFile.Close
MsgBox mAvariable