The sorting algorithm is used to sort a series of data into a specific sequence. For example, the sorting algorithm is used to sort a school’s exam scores or a company’s data reports in descending order. Common sorting algorithms include bubble sort, quicksort, dictionary sort, merge sort, and heap sort. Merge sort is a stable and efficient sorting algorithm, and the time complexity is N*log2N. This document uses the Go merge sort algorithm optimization case in the open source community as an example to describe common algorithm analysis and optimization methods.

### 1. Merge Go Sorting Algorithm

Go’s sort package supports insertion sort, merge sort, quicksort, and heapsort. Sort by merge is a stable sorting algorithm implemented by the sorting package. Go implements the recursion-based merge sort algorithm. The code is as follows:

``````// The code for merge sort before optimization uses recursion to merge two ordered subsequences: data[a:m] and data[m:b].
func symMerge(data Interface, a, m, b int) {
// Initialize the values of mid, start, and r. mid indicates the median of [a:b], start indicates the start subscript for data exchange in [a:m], and r indicates the rightmost subscript for data exchange in [a:m].
mid := int(uint(a+b) >> 1)
n := mid + m
var start, r int
if m > mid {
start = n - b
r = mid
} else {
start = a
r = m
}
p := n - 1

// Use binary search for symmetrical comparison of data block [a:m] and [m:b], and find the start subscript position where data block [a:m] begins to be greater than data block [m:b].
for start < r {
c := int(uint(start+r) >> 1)
if !data.Less(p-c, c) {
start = c + 1
} else {
r = c
}
}

// Flip the data blocks [start:m] and [m:end] to ensure that the data block [start:end] is in order.
end := n - start
if start < m && m < end {
rotate(data, start, m, end)
}
// Invoke the recursive function to sort two groups of data sub-blocks [a:start] and [start:mid], and [mid:end] and [end:b].
if a < start && start < mid {
symMerge(data, a, start, mid)
}
if mid < end && end < b {
symMerge(data, mid, end, b)
}
}``````

### 2. Scenario and problem analysis

The merge sort algorithm is mainly used for sequential data block merging, for example, sequential slice data merging [0:20] and [20:40]. The main treatment procedure is as follows:

1. Call the recursive function symMerge to initialize the slice parameters.
2. Slice data [0:20] becomes less than [20:40] via binary search, symmetric comparison and flipping.
3. Call the recursive symMerge function again to merge the sort data subslices [0:start] and [start:20]and return to step 1.
4. Call the recursive symMerge function again to merge the sort data subslices [20:end] and [end:40]and return to step 1.
5. All data slices [0:40] will be in order when all sub-slices of data are sequenced.

When processing the common application scenario, it is seen that the recursive function needs to be called continuously to process the subsequence in the merge sort, and this is also a main reason for the performance loss in the algorithm merge sort. In merge-sorted data slices, if recursive processing of sub-slices can be avoided or reduced, the execution performance of the algorithm can be improved. For example, in a merge scenario sort sub-slice data  whose length is 1 and sub-slice data [1:10] whose length is 9, insertion sort is used to avoid invoking recursion, so that the execution time of an algorithm can be reduced and the performance of the algorithm can be optimized.

### 3. Optimization solution and implementation

Based on code analysis and scenario of merge sort algorithm in Go, an optimization method is found. That is, in merge sort, for a block of data whose length is 1, a binary insertion sort algorithm is used to process the blocks of data, so that the invocation of the recursive function can be reduced and the performance of the algorithm can be improved. The open source Go community also uses the optimization method based on binary insertion sort to improve the performance of the merge sort algorithm. The following figure shows the code comparison before and after optimization:

Optimization code:

``````// The optimization code adds a judgment on the slice length. When the slice length is 1, the binary insertion sort is used to directly sort the slices and return the slices.
func symMerge(data Interface, a, m, b int) {
// If data[a:m] contains only one element, use the binary insertion sort to find the subsequential insertion point and insert data[a] to data[m:b].
// Avoid invoking recursion function to improve the algorithm performance.
if m-a == 1 {
i := m
j := b
// Use binary search to find a proper sequential insertion point.
for i < j {
h := int(uint(i+j) >> 1)
if data.Less(h, a) {
i = h + 1
} else {
j = h
}
}
// Insert data.
for k := a; k < i-1; k++ {
data.Swap(k, k+1)
}
return
}

// If data[m:b] contains only one element, use binary insertion to find the proper sequential insertion point and insert data[m] to data[a:m].
// Avoid invoking recursion function to improve the algorithm performance.
if b-m == 1 {
i := a
j := m
// Use binary search to find a proper sequential insertion point.
for i < j {
h := int(uint(i+j) >> 1)
if !data.Less(m, h) {
i = h + 1
} else {
j = h
}
}
// Insert data.
for k := m; k > i; k-- {
data.Swap(k, k-1)
}
return
}

// Recursive implementation of merge sort.
......
}``````

### 4. Result of optimization

Utilize Go to benchmark To test the performance of the algorithm before and after optimization, use Go to the test bench to compare performance test results before and after optimization, and collect statistics on the number of times the recursive function is called before and after optimization. The following table lists the statistics:

Test item Test cases Time per operation before optimization Time per operation Time after optimization Run Time Comparison Total number of recursions before optimization Total number of recursions after optimization Comparison of total recursions
BenchmarkStableString1K-8 1 KB string slice 302278 ns/op 288879 ns/op -2.71% 1110 790 -28.8%
BenchmarkStableInt1K-8 1 KB integer slice 144207 ns/op 139911 ns/op -0.96% 689 586 -14.9%
BenchmarkStableInt64K-8 64 KB full slice 12291195 ns/op 12119536 ns/op -0.75% 48465 41280 -14.8%
ReferenceStable1e2-8 1e2(100) structure slice 135357 ns/op 124875ns/operation -7.17% 1079 807 -25.2%
BenchmarkStable1e4-8 1e4(10000) structure slice 43507732 ns/op 40183173 ns/op -7.64% 449735 350205 -22.1%
BenchmarkStable1e6-8 1e6(1000000) structure slice 9005038733 ns/op 8440007994 ns/op -7.69% 79347843 61483110 -22.5%

To note: `-8` indicates the GOMAXPROCS value when the function is running, and `ns/op` indicates the average time in nanoseconds to execute the function each time.

The previous table lists performance test results for different data types, such as string slice, integer slice, and structure slice. The comparative analysis is as follows:

• Compare the 1 KB string slice data with the integer slice data. After optimization, string slice data recursion times are reduced by 28.8% and integer slice data recursion times are reduced by only 14.9%. This indicates that the optimization technology is playing a bigger role in the string slice dataset and the performance is significantly improved.
• Comparing the data of whole slices with length of 64 KB and 1 KB respectively shows that the total number of recursion times decreases by about 14.8% after optimization and the performance improvement is 0, 5% to 1%.
• Compare structure slice data whose lengths are 1e2, 1e4, and 1e6. After optimization, the total number of recursion times are all reduced by about 22%, and the performance is all improved by about 7%.

Comparison and analysis of performance test results show that the performance improvement after optimization is closely related to the test data set. The longer the data slice length of 1 occurred in the recursion, the more effect the optimization will have, and the smaller the number of recursion times, the better the performance will be.

### 5. Summary

The merge sort algorithm optimization case analyzes performance issues in a specific scenario, provides an optimization solution based on binary insertion sort, and finally verifies the optimization result. This is an algorithm optimization practice worth learning.

Share. 