問題描述
這是我朋友的一個編程課上的一個問題.
This was a question in one my friend's programming class.
問.如何對 int
數組進行排序,然后排列它們,使所有重復的元素都出現在數組的末尾?
Q. How do you sort an array of int
s and then arrange them such that all duplicate elements appear at the end of the array?
例如,給定輸入
{5, 2, 7, 6, 1, 1, 5, 6, 2}
輸出將是
{1, 2, 5, 6, 7, 1, 2, 5, 6}
注意,數字是排序的,重復的數字在7之后,這是數組中的最大值.
Note that the numbers are sorted and duplicate numbers are after 7, which is the maximum in the array.
這必須在不使用任何 Java 庫包/工具的情況下實現.
我建議先使用插入或冒泡排序對數組進行排序,然后遍歷數組,執行如下操作:
I suggested to sort the array first using insertion or bubble sort, and then go over the array, perform something like the following :
for (int i = 0; i < nums.length - 2; i++) {
for (int j = i + 1; j < nums.length; j++) {
//current and next are same, move elements up
//and place the next number at the end.
if (nums[i] == nums[j]) {
int temp = nums[j];
for (int k = j; k < nums.length - 1; k++) {
nums[k] = nums[k + 1];
}
nums[nums.length - 1] = temp;
break;
}
}
}
我后來自己嘗試了這個(上面的代碼就是這樣) - 當我嘗試這個時,我認為這可以通過使用更少的代碼來實現,效率更高.可能是我給出了錯誤的建議.
I tried this myself later (and that is how the code above) - As I try this out, I think this could be achieved by using less code, be more efficiently. And may be I gave a wrong advice.
有什么想法嗎?
推薦答案
根據你的問題的參數,有很多方法可以解決這個問題.
Depending on the parameters of your problem, there are many approaches to solving this.
如果不允許使用 O(n) 外部存儲器,那么一種選擇是使用標準排序算法在 O(n log n) 時間內對數組進行就地排序,然后在它上面運行第二遍以將重復項移動到最后(如您所建議的那樣).您在上面發布的代碼需要 O(n2) 時間,但我認為這一步可以使用稍微復雜的算法在 O(n log n) 時間內完成.這個想法分兩步起作用.在第一步中,在 O(n log n) 時間內,您將所有非重復元素按排序順序放在前面,并將所有重復元素按非排序順序放在后面.完成后,您可以使用第一步中的排序算法在 O(n log n) 時間內對數組的后半部分進行排序.
If you are not allowed to use O(n) external memory, then one option would be to use a standard sorting algorithm to sort the array in-place in O(n log n) time, then to run a second pass over it to move the duplicates to the end (as you've suggested). The code you posted above takes O(n2) time, but I think that this step can be done in O(n log n) time using a slightly more complicated algorithm. The idea works in two steps. In the first step, in O(n log n) time you bring all non-duplicated elements to the front in sorted order and bring all the duplicates to the back in non-sorted order. Once you've done that, you then sort the back half of the array in O(n log n) time using the sorting algorithm from the first step.
我不打算進入對數組進行排序的代碼.我真的很喜歡排序,但是關于如何對數組進行就地排序還有很多其他好的資源,所以我在這里沒有很好地利用我的時間/空間來進入它們.如果有幫助,這里是 堆排序、快速排序和smoothsort,所有這些都在 O(n log n) 時間內運行.堆排序和平滑排序僅使用 O(1) 外部內存,而快速排序在最壞的情況下可以使用 O(n)(盡管好的實現可以使用可愛的技巧將其限制為 O(log n)).
I'm not going to go into the code to sort the array. I really love sorting, but there are so many other good resources on how to sort arrays in-place that it's not a good use of my time/space here to go into them. If it helps, here's links to Java implementations of heapsort, quicksort, and smoothsort, all of which runs in O(n log n) time. Heapsort and smoothsort use only O(1) external memory, while quicksort can use O(n) in the worst case (though good implementations can limit this to O(log n) using cute tricks).
有趣的代碼是將所有非重復元素帶到范圍前面的邏輯.直觀地說,代碼通過存儲兩個指針來工作——一個讀指針和一個寫指針.讀指針指向下一個要讀取的元素,而寫指針指向應該放置下一個唯一元素的位置.例如,給定這個數組:
The interesting code is the logic to bring all the non-duplicated elements to the front of the range. Intuitively, the code works by storing two pointers - a read pointer and a write pointer. The read pointer points to the next element to read, while the write pointer points to the location where the next unique element should be placed. For example, given this array:
1 1 1 1 2 2 3 4 5 5
我們從最初指向 1 的讀寫指針開始:
We start with the read and write pointers initially pointing at 1:
write v
1 1 1 1 2 2 3 4 5 5
read ^
接下來,我們將讀取指針向前跳過到下一個不是 1 的元素.這會找到 2:
Next, we skip the read pointer ahead to the next element that isn't 1. This finds 2:
write v
1 1 1 1 2 2 3 4 5 5
read ^
然后,我們將寫指針撞到下一個位置:
Then, we bump the write pointer to the next location:
write v
1 1 1 1 2 2 3 4 5 5
read ^
現在,我們將 2 交換到寫指針所持有的位置:
Now, we swap the 2 into the spot held by the write pointer:
write v
1 2 1 1 1 2 3 4 5 5
read ^
將讀取指針前進到下一個不是 2 的值:
advance the read pointer to the next value that isn't 2:
write v
1 2 1 1 1 2 3 4 5 5
read ^
然后推進寫指針:
write v
1 2 1 1 1 2 3 4 5 5
read ^
再次,我們交換'read'和'write'指向的值,并將寫指針向前移動,然后將讀指針移動到下一個唯一值:
Again, we exchange the values pointed at by 'read' and 'write' and move the write pointer forward, then move the read pointer to the next unique value:
write v
1 2 3 1 1 2 1 4 5 5
read ^
再次收獲
write v
1 2 3 4 1 2 1 1 5 5
read ^
最后的迭代給出了
write v
1 2 3 4 5 2 1 1 1 5
read ^
如果我們現在從寫指針到讀指針排序,我們得到
If we now sort from the write pointer to the read pointer, we get
write v
1 2 3 4 5 1 1 1 2 5
read ^
還有賓果游戲!我們已經得到了我們正在尋找的答案.
and bingo! We've got the answer we're looking for.
在(未經測試,抱歉...)Java 代碼中,此修復步驟可能如下所示:
In (untested, sorry...) Java code, this fixup step might look like this:
int read = 0;
int write = 0;
while (read < array.length) {
/* Swap the values pointed at by read and write. */
int temp = array[write];
array[write] = array[read];
array[read] = temp;
/* Advance the read pointer forward to the next unique value. Since we
* moved the unique value to the write location, we compare values
* against array[write] instead of array[read].
*/
while (read < array.length && array[write] == array[read])
++ read;
/* Advance the write pointer. */
++ write;
}
該算法在 O(n) 時間內運行,這導致該問題的整體算法為 O(n log n).由于重新排序步驟使用 O(1) 內存,因此總體內存使用量將是 O(1)(對于平滑排序或堆排序等)或 O(log n)(對于快速排序等).
This algorithm runs in O(n) time, which leads to an overall O(n log n) algorithm for the problem. Since the reordering step uses O(1) memory, the overall memory usage would be either O(1) (for something like smoothsort or heapsort) or O(log n) (for something like quicksort).
在與朋友討論后,我認為基于快速排序的修改有一個更優雅的問題解決方案.通常,當您運行快速排序時,您最終會將數組劃分為三個區域:
After talking this over with a friend, I think that there is a much more elegant solution to the problem based on a modification of quicksort. Typically, when you run quicksort, you end up partitioning the array into three regions:
+----------------+----------------+----------------+
| values < pivot | values = pivot | values > pivot |
+----------------+----------------+----------------+
然后遞歸對第一個和最后一個區域進行排序,以將它們排序.但是,我們可以針對我們的問題版本進行修改.我們需要 rotation 算法作為原始算法,它在一個數組中獲取兩個相鄰的值塊,并在 O(n) 時間內交換它們.它不會改變這些塊中元素的相對順序.例如,我們可以使用旋轉來轉換數組
The recursion then sorts the first and last regions to put them into sorted order. However, we can modify this for our version of the problem. We'll need as a primitive the rotation algorithm, which takes two adjacent blocks of values in an array and exchanges them in O(n) time. It does not change the relative order of the elements in those blocks. For example, we could use rotation to convert the array
1 2 3 4 5 6 7 8
進入
3 4 5 6 7 8 1 2
并且可以在 O(n) 時間內完成.
and can do so in O(n) time.
快速排序的修改版本可以通過使用 Bentley-McIlroy 三路分區算法(描述 here) 使用 O(1) 額外空間,將數組元素重新排列為上面顯示的配置.接下來,我們應用旋轉來重新排序元素,使它們看起來像這樣:
The modified version of quicksort would work by using the Bentley-McIlroy three-way partition algortihm (described here) to, using O(1) extra space, rearrange the array elements into the configuration shown above. Next, we apply a rotation to reorder the elements so that they look like this:
+----------------+----------------+----------------+
| values < pivot | values > pivot | values = pivot |
+----------------+----------------+----------------+
接下來,我們執行交換,以便將樞軸元素的一個副本準確地移動到至少與樞軸一樣大的元素集合中.這可能在后面有額外的樞軸副本.然后我們遞歸地將排序算法應用于 <和 > 范圍.當我們這樣做時,結果數組將如下所示:
Next, we perform a swap so that we move exactly one copy of the pivot element into the set of elements at least as large as the pivot. This may have extra copies of the pivot behind. We then recursively apply the sorting algorithm to the < and > ranges. When we do this, the resulting array will look like this:
+---------+-------------+---------+-------------+---------+
| < pivot | dup < pivot | > pivot | dup > pivot | = pivot |
+---------+-------------+---------+-------------+---------+
然后我們對范圍應用兩次旋轉以將其放入最終順序.首先,使用大于樞軸的值旋轉小于樞軸的重復值.這給了
We then apply two rotations to the range to put it into the final order. First, rotate the duplicate values less than the pivot with the values greater than the pivot. This gives
+---------+---------+-------------+-------------+---------+
| < pivot | > pivot | dup < pivot | dup > pivot | = pivot |
+---------+---------+-------------+-------------+---------+
此時,第一個范圍是升序排列的唯一元素:
At this point, this first range is the unique elements in ascending order:
+---------------------+-------------+-------------+---------+
| sorted unique elems | dup < pivot | dup > pivot | = pivot |
+---------------------+-------------+-------------+---------+
最后,對大于樞軸的重復元素和等于樞軸的元素進行最后一次旋轉,得到這個:
Finally, do one last rotation of the duplicate elements greater than the pivot and the elements equal to the pivot to yield this:
+---------------------+-------------+---------+-------------+
| sorted unique elems | dup < pivot | = pivot | dup > pivot |
+---------------------+-------------+---------+-------------+
請注意,最后三個塊只是排序后的重復值:
Notice that these last three blocks are just the sorted duplicate values:
+---------------------+-------------------------------------+
| sorted unique elems | sorted duplicate elements |
+---------------------+-------------------------------------+
瞧!我們已經按照我們想要的順序得到了一切.使用與普通快速排序相同的分析,再加上我們在每個級別只做 O(n) 的工作(三個額外的旋轉),這在最好的情況下可以達到 O(n log n)O(log n) 內存使用量.在 O(log n) 內存的最壞情況下仍然是 O(n2),但這種情況發生的概率極低.
and voila! We've got everything in the order we want. Using the same analysis that you'd do for normal quicksort, plus the fact that we're only doing O(n) work at each level (three extra rotations), this works out to O(n log n) in the best case with O(log n) memory usage. It's still O(n2) in the worst case with O(log n) memory, but that happens with extremely low probability.
如果允許您使用 O(n) 內存,一種選擇是從所有存儲鍵/值對的元素中構建一個平衡的二叉搜索樹,其中每個鍵是數組的一個元素,值是它出現的次數.然后,您可以按以下格式對數組進行排序:
If you are allowed to use O(n) memory, one option would be to build a balanced binary search tree out of all of the elements that stores key/value pairs, where each key is an element of the array and the value is the number of times it appears. You could then sort the array in your format as follows:
- 對于數組中的每個元素:
- 如果該元素已存在于 BST 中,則增加其計數.
- 否則,向 BST 添加一個新節點,該節點的計數為 1.
這個算法的運行時間是 O(n log n),但是從頭開始編寫 BST 會非常棘手.它還需要外部空間,我不確定你是否可以這樣做.
The runtime of this algorithm is O(n log n), but it would be pretty tricky to code up a BST from scratch. It also requires external space, which I'm not sure you're allowed to do.
但是,如果您被允許使用外部空間并且您正在排序的數組很小并且包含小整數,您可以通過使用修改后的 計數排序.只需將 BST 替換為一個足夠大的數組,以使原始數組中的每個整數都成為鍵.這將運行時間減少到 O(n + k),內存使用 O(k),其中 k 是數組中的最大元素.
However, if you are allowed external space and the arrays you are sorting are small and contain small integers, you could modify the above approach by using a modified counting sort. Just replace the BST with an array large enough for each integer in the original array to be a key. This reduces the runtime to O(n + k), with memory usage O(k), where k is the largest element in the array.
希望這會有所幫助!
這篇關于在將重復項移動到末尾時對數組進行排序?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!