Thursday, June 26, 2008

Combinations of a String using recursion.


Implement a function that prints all possible combinations of the characters in a string. These combinations range in length from one to the length of the string. Two combinations that differ only in ordering of their characters are the same combination. In other words, “12” and “31” are different combinations from the input string “123”, but “21” is the same as “12”.

This is a companion problem to finding the permutations of the characters in a string. If you haven’t yet worked through that problem, you may want to do so before you tackle this one.

Following the model of the solution to the permutation problem, try working out an example by hand and see where that gets you. Because you are trying to divine an algorithm from the example, you again need to be systematic in your approach. You might try listing combinations in order of length. The input string “wxyz” is used in the example. Because the ordering of letters within each combination is arbitrary, they are kept in the same order as they are in the input string to minimize confusion.

w

wx

wxy

wxyz

x

wy

wxz

y

wz

wyz

z

xy

xyz

xz

yz

Open table as spreadsheet

Some interesting patterns seem to be emerging, but there’s nothing clear yet, certainly nothing that seems to suggest an algorithm. Listing output in terms of the order of the input string (alphabetical order, for this input string) turned out to be helpful in the permutation problem. Try rearranging the combinations you generated and see if that’s useful here.

w

x

y

z

wx

xy

yz

wxy

xyz

wxyz

xz

wxz

wy

wyz

wz

Open table as spreadsheet

This looks a little more productive. There is a column for each letter in the input string. The first combination in each column is a single letter from the input string. The remainder of each column’s combinations consist of that letter prepended to each of the combinations in the columns to the right. Take, for example, the “x” column. This column has the single letter combination “x”. The columns to the right of it have the combinations “y”, “yz”, and “z”, so if you prepend “x” to each of these combinations you find the remaining combinations in the “x” column: “xy”, “xyz”, and “xz”. You could use this rule to generate all of the combinations, starting with just “z” in the rightmost column and working your way to the left, each time writing a single letter from the input string at the top of the column and then completing the column with that letter prepended to each of the combinations in columns to the right. This is a recursive method for generating the combinations. It is space inefficient because it requires storage of all previously generated combinations, but it indicates that this problem can be solved recursively. See if you can gain some insight on a more-efficient recursive algorithm by examining the combinations you’ve written a little more closely.

Look at which letters appear in which positions. All four letters appear in the first position, but “w” never appears in the second position. Only “y” and “z” appear in the third position, and “z” is in the fourth position in the only combination that has a fourth position (“wxyz”). Therefore, a potential algorithm might involve iterating through all allowable letters at each position: w–z in the first position, x–z in the second position, and so on. Check this idea against the example to see if it works: It seems to successfully generate all the combinations in the first column. However, when you select “x” for the first position, this candidate algorithm would start with “x” in the second position, generating an illegal combination of “xx”. Apparently the algorithm needs some refinement.

To generate the correct combination “xy”, you really need to begin with “y”, not “x”, in the second position. When you select “y” for the first position (third column), you need to start with “z” because “yy” is illegal and “yx” and “yw” have already been generated as “xy” and “wy”. This suggests that in each output position you need to begin iterating with the letter in the input string following the letter selected for the preceding position in the output string. Call this letter your input start letter.

It may be helpful to summarize this a little more formally. You need to track the output position as well as the input start position. Begin with the first position as the output position, and the first character of the input as the input start position. For a given position, sequentially select all letters from the input start position to the last letter in the input string. For each letter you select, print the combination and then generate all other combinations beginning with this sequence by recursively calling the generating function with the input start position set to the next letter after the one you’ve just selected and the output position set to the next position. You should check this idea against the example to make sure it works. It does - no more problems in the second column. Before you code, it may be helpful to outline the algorithm just to make sure you have it.


Tip

For comparison, the performance side of the arms-length recursion style/performance trade-off discussed in the permutation problem was chosen. The performance and style differences between the two possible approaches are not as dramatic for the combination algorithm as they were for the permutation algorithm.

For each letter from input start position to end of input string
Select the letter into the current position in output string
Print letters in output string
If the current letter isn't the last in the input string
Generate remaining combinations starting at next position with iteration starting
at next letter beyond the letter just selected

After all that hard work, the algorithm looks pretty simple! You’re ready to code it. Here’s a C# version:

void combine( string str ){
int length = str.Length;
char[] instr = str.ToCharArray();
StringBuilder outstr = new StringBuilder();
doCombine( instr, outstr, length, 0, 0 );
}

void doCombine( char[] instr, StringBuilder outstr, int length,
int level, int start ){
for( int i = start; i < length; i++ ){
outstr.Append( instr[i] );
Console.WriteLine( outstr );

if( i < length - 1 ){
doCombine( instr, outstr, length, level + 1, i + 1 );
}

outstr.Length = outstr.Length - 1;
}
}

This solution is sufficient in most interviews. Nevertheless, you can make a rather minor optimization to doCombine that eliminates the if statement. Given that this is a recursive function, the performance increase is probably negligible compared to the function call overhead, but you might want to see if you can figure it out just for practice.

You can eliminate the if statement by removing the final iteration from the loop and moving the code it would have executed during that final iteration outside the loop. The general case of this optimization is referred to as loop partitioning, and if statements that can be removed by loop partitioning are called loop index dependent conditionals. Again, this optimization doesn’t make much difference here, but it can be important inside nested loops.

No comments: