Thursday, June 26, 2008

Permutations of a String using recursion.

Implement a routine that prints all possible orderings of the characters in a string. In other words, print all permutations that use all the characters from the original string. For example, given the string “hat”, your function should print the strings “tha”, “aht”, “tah”, “ath”, “hta”, and “hat”. Treat each character in the input string as a distinct character, even if it is repeated. Given the string “aaa”, your routine should print “aaa” six times. You may print the permutations in any order you choose.

Manually permuting a string is a relatively intuitive process, but describing an algorithm for the process can be difficult. In a sense, the problem here is like being asked to describe how you tie your shoes: You know the answer, but you probably still have to go through the process a few times to figure out what steps you’re taking.

Try applying that method to this problem: Manually permute a short string and try to reverse-engineer an algorithm out of the process. Take the string “abcd” as an example. Because you’re trying to construct an algorithm from an intuitive process, you want to go through the permutations in a systematic order. Exactly which systematic order you use isn’t terribly important - different orders are likely to lead to different algorithms, but as long as you’re systematic about the process you should be able to construct an algorithm. You want to choose a simple order that will make it easy to identify any permutations that you might accidentally skip.

You might consider listing all the permutations in alphabetical order. This means the first group of permutations will all start with “a”. Within this group, you first have the permutations with a second letter of “b”, then “c”, and finally “d”. Continue in a like fashion for the other first letters.

abcd

bacd

cabd

dabc

abdc

badc

cadb

dacb

acbd

bcad

cbad

dbac

acdb

bcda

cbda

dbca

adbc

bdac

cdab

dcab

adcb

bdca

cdba

dcba

Open table as spreadsheet

Before you continue, make sure you didn’t miss any permutations. Four possible letters can be placed in the first position. For each of these four possibilities, there are three remaining possible letters for the second position. Thus, there are 4×3 = 12 different possibilities for the first two letters of the permutations. Once you’ve selected the first two letters, two different letters remain available for the third position, and the last remaining letter is put in the fourth position. If you multiply 4×3×2×1 you have a total of 24 different permutations; there are 24 permutations in the previous list, so nothing has been missed.

This calculation can be expressed more succinctly as 4! - you may recall that n! is the number of possible arrangements of n objects.

Now examine the list of permutations for patterns. The rightmost letters vary faster than the leftmost letters. For each letter that you choose for the first (leftmost) position, you write out all the permutations beginning with that letter before you change the first letter. Likewise, once you’ve picked a letter for the second position, you write out all permutations beginning with this two-letter sequence before changing the letters in either the first or second position. In other words, you can define the permutation process as picking a letter for a given position and performing the permutation process starting at the next position to the right before coming back to change the letter you just picked. This sounds like the basis for a recursive definition of permutation. Try to rephrase it in explicitly recursive terms: To find all permutations starting at position n, successively place all allowable letters in position n, and for each new letter in position n find all permutations starting at position n + 1 (the recursive case). When n is greater than the number of characters in the input string, a permutation has been completed; print it and return to changing letters at positions less than n (the base case).

You almost have an algorithm; you just need to define “all allowable letters” a little more rigorously. Because each letter from the input string can appear only once in each permutation, “all allowable letters” can’t be defined as every letter in the input string. Think about how you did the permutations manually. For the group of permutations beginning with “b”, you never put a “b” anywhere but the first position because when you selected letters for later positions, “b” had already been used. For the group beginning “bc” you used only “a” and “d” in the third and fourth positions because both “b” and “c” had already been used. Therefore, “all allowable letters” means all letters in the input string that haven’t already been chosen for a position to the left of the current position (a position less than n). Algorithmically, you could check each candidate letter for position n against all the letters in positions less than n to determine whether it had been used. You can eliminate these inefficient scans by maintaining an array of Boolean values corresponding to the positions of the letters in the input string and using this array to mark letters as used or unused, as appropriate.

In outline form, this algorithm looks like the following:

If you're past the last position
Print the string
Return
Otherwise
For each letter in the input string
If it's marked as used, skip to the next letter
Else place the letter in the current position
Mark the letter as used
Permute remaining letters starting at current position + 1
Mark the letter as unused

Tip

Separating the base case from the recursive case as performed here is considered good style and may make the code easier to understand, but it does not provide optimum performance. You can significantly optimize the code by invoking the base case directly without a recursive call if the next recursive call invokes the base case. In this algorithm, that involves checking whether the letter just placed was the last letter - if so, you print the permutation and make no recursive call; otherwise, a recursive call is made. This eliminates n! function calls, reducing the function call overhead by approximately a factor of n (where n is the length of the input string). Short-circuiting the base case in this manner is called arms-length recursion and is considered poor style, especially in academic circles. Whichever way you choose to code the solution, it is worthwhile to mention the advantages of the alternate approach to your interviewer.

Here’s a Java version of this algorithm:

void permute( String str ){
int length = str.length();
boolean[] used = new boolean[ length ];
StringBuffer out = new StringBuffer();
char[] in = str.toCharArray();

doPermute( in, out, used, length, 0 );
}

void doPermute( char[] in, StringBuffer out,
boolean[] used, int length, int level ){
if( level == length ){
System.out.println( out.toString() );
return;
}

for( int i = 0; i < length; ++i ){
if( used[i] ) continue;

out.append( in[i] );
used[i] = true;
doPermute( in, out, used, length, level + 1 );
used[i] = false;
out.setLength( out.length() - 1 );
}
}

Structurally, this function uses a wrapper method called permute to allocate two arrays, one for the flags and one to hold the input characters, and a StringBuffer for building the output string. Then it calls the recursive portion, doPermute, to actually do the permutation. The characters are appended to the StringBuffer as appropriate. When the recursive call returns, the last character is dropped simply by reducing the buffer’s length.

1 comment:

Sake Sam said...

Post your sources. I think I know where you yanked this from.