Fast approximation of the traveling salesman problem shortest route by rectangular cell clustering pattern to parallelize solving

A method of quickly obtaining an approximate solution to the traveling salesman problem (TSP) is suggested, where a dramatic computational speedup is guaranteed. The initial TSP is broken into open-loop TSPs by a clustering method based on either imposing a rectangular lattice on the nodes or dividing the dataset iteratively until the open-loop TSPs become sufficiently small. The open-loop TSPs are independent, so they can be solved in parallel without synchronization, regardless of the solver. Then the open-loop subroutes are assembled into an approximately shortest route of the initial TSP via the shortest connections. The assemblage pattern is a symmetric rectangular closed-loop serpentine. The iterative clustering can use the rectangular assembling approach as well. Alternatively, the iterative clustering can use the centroid TSP assembling approach, which requires solving a supplementary closed-loop TSP whose nodes are the centroids of the open-loop-TSP clusters. Based on the results of numerical simulation, it is ascertained that the iterative clustering and the rectangular cell clustering pattern are roughly equally accurate, but the latter is far more computationally efficient on squarish datasets. The fast approximation of the TSP shortest route serves as an upper bound. In addition, the route can be studied to find its bottlenecks, which subsequently are separated so that another bunch of open-loop TSPs is approximately solved. For big-sized TSPs, where the balance of the accuracy loss and computational time is uncertain, the rectangular cell clustering pattern allows obtaining fast solution approximations on multitudinous non-synchronized parallel processor cores.

In combinatorial optimization, the traveling salesman problem (TSP) is often referred to as the most important problem, whose solution for any number of nodes would lead to solving a variety of related problems [25,36]. However, the existing methods of finding routes of minimal length are either limited to a few hundred nodes or designed for solving specific TSPs [10,22]. Moreover, the supplementary task of finding all versions of the TSP shortest route is much harder [36,1]. Whereas the exact methods are inapplicable to big-sized TSPs and do not return even a single solution, a tradeoff is quite acceptable here, both for balancing the accuracy loss against the computational time (resources) and for finding at least a subset of the TSP approximate solutions [38,52]. The tradeoff implies finding approximately shortest routes (whose lengths may differ) rather than the exactly shortest routes (of the same, minimized, length) by dramatically speeding up the process of solving [6,17]. However, the balance of the accuracy loss and computational time usually faces uncertainty [29,8]. It is reasoned and established mainly using experience and expert judgments depending on an engineering application [36,5,20].
Unlike the parallel tabu search algorithm [9], where several moves are performed concurrently and require some synchronization, the parallelization herein is implied in multiple set-ups [44,27]. The parallelization set-up is determined by how many processor cores or parallel computers can be used simultaneously to solve the open-loop TSPs. The set-up where only one open-loop TSP at a time can be solved on a single machine is not excluded. This case does not cancel the parallelization, though. For the reason mentioned above, it is naturally expected that solving the open-loop TSPs in a sequence, subproblem by subproblem, will take less computational time than solving the whole TSP. Thus, a TSP approximate solution can be obtained faster even by this, the "least parallel", parallelization set-up [44]. An additional speedup may be achieved by selecting the most appropriate number of clusters [7,31].
Therefore, the TSP parallelization by a set-up is efficient if the amount of computational resources spent on solving the subproblems according to this set-up is less than that spent on solving the whole problem. Along with reducing the usage of computational resources, the assembled route is supposed to be not much longer than the route in the solution of the whole problem. At least, the difference must be tolerable despite being mostly uncertain. Henceforward, the goal is to suggest an efficient method of fast approximation of the TSP shortest route by parallelizing its solving, while the difference between the exactly minimal route length and the approximated one is uncertain. The efficiency criterion is to reduce the usage of computational resources (in particular, to shorten the computational time) by lengthening the assembled route at most by some tolerance. For achieving the goal, the following eight tasks are to be fulfilled: 1. To describe TSP variables, flags, constraints, and objective. 2.
To describe the open-loop TSP. 3. To suggest a method by which the initial TSP is broken into open-loop TSPs the quickest and the initial TSP solution is efficiently assembled from the solutions of the open-loop TSPs.
4. To describe basic steps in the genetic algorithm as a solver of the TSP and open-loop TSP. 5. To describe how solving the initial TSP is parallelized and the approximately shortest route is subsequently assembled.
6. To obtain statistics of the parallelization performance in order to estimate its efficiency. 7. To discuss the parallelization efficiency and limitations. 8. To conclude on the scientific and practical contribution. The paper, whose structure is directly based on the list of these tasks, proceeds as follows. Section 3 presents a formalization of the TSP and its objective along with the variables, flags, and constraints used in it. The open-loop TSP is formalized in Section 4. Section 5 presents a pattern by which the initial TSP is broken into open-loop TSPs the quickest, whereupon these open-loop TSPs are solved and an approximate solution of the initial TSP is efficiently assembled from the solutions of the open-loop TSPs. The genetic algorithm as a solver of the TSP and open-loop TSP is described in Section 6. Section 7 describes how solving the initial TSP is parallelized and the approximately shortest route is subsequently assembled. The efficiency of the suggested approach is estimated in Section 8. It is further discussed in Section 9. Section 10 concludes on the scientific and practical contribution.

TSP variables, flags, constraints, objective
In the flat TSP of N nodes, where the depot has number 1, the salesman's route has only two-coordinate node locations, without ascents or descents [38,39,34,2,49], and the set of nodes (1) consists of locations with horizontal p_k1 and vertical p_k2 components for node k. If the salesman visits node j directly after node k, then this is flagged as x_kj = 1; otherwise x_kj = 0. Moreover, to avoid surplus flagging, the direct connection of nodes k and j is flagged only in one direction (the one by which the salesman factually goes): x_jk = 0 if x_kj = 1, and x_kj = 0 if x_jk = 1. So, the flags showing which nodes are connected are binary: x_kj ∈ {0, 1}. In fact, flags (3) by (2) and (4) are the variables in the TSP.
There is only one departure from node k towards only one following node, which is constrained by an equality ∑_{j=1, j≠k}^{N} x_kj = 1. Symmetrically, there is only one arrival at node j from only one node, which is constrained by an equality ∑_{k=1, k≠j}^{N} x_kj = 1. To exclude closed-loop subtours, so that the route is a single tour and not a union of smaller tours, the third constraint is ∑_{k∈Q} ∑_{j∈Q} x_kj ⩽ |Q| − 1 for every nonempty proper subset Q of the nodes. If the salesman visits node j directly after node k, the move is accomplished by a straight line with some constant speed.
ρ(k, j) = √((p_k1 − p_j1)² + (p_k2 − p_j2)²) is the distance covered by the salesman between nodes k and j. There are 0.5N(N − 1) nonzero symmetric distances (8) in the TSP of N nodes. Distances (9) can be mapped into time or other units implying the cost of completing the TSP route. The respective objective function is to be minimized subject to constraints (2)-(7). The minimization goal is to find the flags at which the accumulated route length is minimal. Flags (11) give the shortest route length ρ*_Σ by (12). In addition, flags (11) allow building a minimum-length route (an optimal route). Nevertheless, minimum (12) can be reached by more than one set (13) of flags (11). So, the TSP may have multiple minimum-length routes having the same length ρ*_Σ [25,1,34,2,49].
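The objective and constraints above can be illustrated by exhaustively enumerating routes on a tiny instance. The node coordinates below are hypothetical, and brute force is used only to show what minimum (12) means, not as the paper's solver (a sketch):

```python
import itertools, math

# Hypothetical small instance: node locations (p_k1, p_k2); node 0 is the depot.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 1.5)]

def dist(k, j):
    # Euclidean distance rho(k, j) between nodes k and j
    return math.hypot(nodes[k][0] - nodes[j][0], nodes[k][1] - nodes[j][1])

def shortest_closed_route(n):
    # Enumerate all (n-1)! permutations of non-depot nodes and
    # minimize the accumulated closed-loop route length.
    best_len, best_route = float("inf"), None
    for perm in itertools.permutations(range(1, n)):
        route = (0,) + perm + (0,)  # depart from and return to the depot
        length = sum(dist(route[i], route[i + 1]) for i in range(n))
        if length < best_len:
            best_len, best_route = length, route
    return best_len, best_route

length, route = shortest_closed_route(len(nodes))
print(round(length, 4), route)
```

For these five nodes (in convex position) the minimum is the hull perimeter, 3 + √2; note that it is reached by two flag sets (the two traversal directions), illustrating that minimum (12) need not be unique.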

Open-loop TSP
In the open-loop TSP of N nodes (1), the salesman must depart from the depot (node 1) and arrive at node N (the destination node). There is only one departure from node k, k being not node N, towards only one following node, which is constrained by equality (14). Symmetrically, there is only one arrival at node j from only one node being not node N, which is constrained by equality (15). Closed-loop subtours in the open-loop TSP are excluded by constraint (7) as well. The respective objective function (10) is to be minimized subject to constraints (2)-(4), (7), (14), (15).
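The open-loop variant differs only in fixing the start and destination nodes and dropping the return leg. A minimal brute-force sketch (hypothetical coordinates; not the paper's solver):

```python
import itertools, math

# Hypothetical open-loop instance: the salesman starts at node 0 (the depot)
# and must finish at the last node; only intermediate nodes are permuted.
nodes = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.2), (3.0, 0.0)]

def dist(k, j):
    return math.hypot(nodes[k][0] - nodes[j][0], nodes[k][1] - nodes[j][1])

def shortest_open_route(n):
    # Enumerate all (n-2)! permutations of the intermediate nodes and
    # minimize the open-loop route length (no return to the depot).
    best_len, best_route = float("inf"), None
    for perm in itertools.permutations(range(1, n - 1)):
        route = (0,) + perm + (n - 1,)
        length = sum(dist(route[i], route[i + 1]) for i in range(n - 1))
        if length < best_len:
            best_len, best_route = length, route
    return best_len, best_route

length, route = shortest_open_route(len(nodes))
print(route)
```

Here the optimal open-loop route visits node 2 before node 1 on the way to the destination node 3, which a closed-loop solver with a return leg would not necessarily choose.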

Rectangular cell clustering pattern
Nodes (1) of an initial TSP can be broken into multiple groups (clusters) by applying a clustering method. In particular, it can be the method of K-means or K-medoids [40,14]. Then each of the clusters corresponds to an open-loop TSP. Breaking into just two clusters is the quickest. As the number of clusters is increased, the clustering is performed more slowly. The slowdown becomes more significant as the number of nodes increases.
As the first two clusters are obtained by the quickest clustering, they can be clustered further. Each of the clusters is broken into two clusters, which is fulfilled the quickest. Such a clustering process can be kept up until the open-loop TSPs become sufficiently small. There are M = 2^n clusters by n ∈ N\{1}. At step n of the clustering, these clusters are given by (16)-(22). Obviously, these clusters are non-overlapping. It is clear that the clusters are broken more quickly as number n is increased. An example of 250000 nodes clustered into 16 groups is shown in Figure 1, where the depot is marked as a square, and the centroids of the clusters are marked as circles. The 16 clusters appear roughly squarish. The same set divided into 64 clusters is shown in Figure 2. The 64 clusters now appear far less squarish, even roughly. Nevertheless, the clusters in Figure 1 fit the square (rectangular) lattice pattern shown in Figure 3. In general, centroids (23) of real clusters are only approximately close to the cell centers but generally do not coincide with them. As the number of clusters grows too large, the respective rectangular lattice pattern becomes inappropriate (Figure 2). So, there is a number of clusters (or number n) for which the pattern and the factual clustering result are approximately the closest. This implies that, instead of the iterative clustering by (16)-(22), the 2^n clusters can be made by just the rectangular cell clustering pattern (exemplified for 16 clusters in Figure 3). The initial set of nodes (1) is thus divided into cells (25) for m_hor = 1, …, M_hor and m_vert = 1, …, M_vert. The clusters in this case are created by just checking whether a node belongs to the respective cell (25). An example of applying the rectangular cell clustering pattern to the set of nodes in Figure 1 is shown in Figure 4.
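The cell-membership check described above can be sketched as follows; the lattice dimensions and the uniform test data are assumptions for illustration:

```python
import random

# A minimal sketch of the rectangular cell clustering pattern: a node is
# assigned to a cluster by checking which lattice cell it falls into.
random.seed(1)
nodes = [(random.random(), random.random()) for _ in range(10000)]
M_hor, M_vert = 4, 4  # 16 = 2^4 clusters, as in the 16-cluster example

x_min = min(p[0] for p in nodes); x_max = max(p[0] for p in nodes)
y_min = min(p[1] for p in nodes); y_max = max(p[1] for p in nodes)

def cell_of(p):
    # Map a node location to its (m_hor, m_vert) cell; clamp the borders
    # so that nodes on the outer boundary fall into the last cell.
    m_h = min(int((p[0] - x_min) / (x_max - x_min) * M_hor), M_hor - 1)
    m_v = min(int((p[1] - y_min) / (y_max - y_min) * M_vert), M_vert - 1)
    return m_h, m_v

clusters = {}
for k, p in enumerate(nodes):
    clusters.setdefault(cell_of(p), []).append(k)

print(len(clusters), sum(len(c) for c in clusters.values()))
```

Because the assignment is a constant-time arithmetic check per node, the whole pass is linear in N, which is why this pattern is orders of magnitude faster than iterative 2-means-style splitting.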
Although the rectangular clusters in Figure 4 differ from those in Figure 1 (even though the "boundary" clusters in these figures are nearly similar), they are obtained a few hundred times faster. Thus, the division in Figure 4 takes up to 0.29 seconds, whereas the division in Figure 1 takes about 20 seconds. Dividing a million nodes into such a square of 16 clusters by the rectangular cell clustering pattern takes up to 1.3 seconds, whereas applying the iterative clustering by (16)-(22) takes more than 2 minutes in this case.
Every cluster corresponds to its respective open-loop TSP. Its (approximately) shortest open-loop route should be assembled into the (approximately) shortest route of the initial TSP. Owing to the rectangular lattice consisting of 2^n cells, the assembling is quite easy, being a symmetric rectangular closed-loop serpentine (Figure 5). It is worth noting that the serpentine patterns in Figure 5, except for the 4 and 8 cells, are not the only possible ones. There are other, symmetric and non-symmetric, patterns whose lengths equal the length of the respective pattern in Figure 5. Obviously, when the cell is a unit square, the length is 2^n units.

Genetic algorithm
The solver of both the TSP and open-loop TSP is the genetic algorithm, which principally requires at its input a set of node locations (1), the depot location (p_11, p_12), a population size, and a set of mutation operators. The other auxiliary options are usually set at their default values. The population is a series of pseudorandom routes called chromosomes. For the genetic algorithm of solving an ordinary (closed-loop) TSP, each element of the population is an (N − 1)-dimensional vector (26) of the non-depot nodes the salesman should visit. For every route of the population, the following routine is executed during an iteration of the algorithm. First, the distance to the node following the depot is calculated by (8). Second, the remaining distances except the last one are accumulated into the running variable d. Third, the distance of returning to the depot is included last. Then, the accumulated distance covered by route (26) is calculated and minimized over the population. Inequality (30) is the relationship between the length of a heuristically found route (26) and the exactly shortest route length from an exact solution to problem (12). A new population is generated using the mutation operators of slide, flip, swap, and crossover within a subpopulation of the currently best chromosomes. First, the slide operator moves the last node from each chromosome to the beginning of another one. Next, the flip operator reverses a random sequence of nodes inside a chromosome: a sequence is extracted from a route (33) and flipped as in (34).
Stat., Optim. Inf. Comput., Vol. 12, September 2024. V. Romanuke
The flip operator then returns an updated vector (33) after (34). The swap operator selects a same-index-and-length sequence of nodes from two chromosomes (33) and (35) using random integers h_3 and h_4, for r ≠ q, whereupon the selected sequences are interchanged. The swap operator then returns the updated vectors (33) and (35) after (39) and (40), respectively. The crossover operator, somewhat resembling the swapping, takes two chromosomes (33) and (35) for r ≠ q, and cuts each chromosome into two random parts using random integers h_r and h_q, where the h_r first nodes in chromosome (33) are left and the h_q first nodes in chromosome (35) are left. Thereupon the chromosome parts of H nodes long are interchanged. The crossover operator returns new (mutated) routes (44) and (45).
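The flip and swap operators can be sketched as list manipulations. The chromosome contents and random seed below are hypothetical, and any repair of chromosomes after an inter-chromosome swap (which would be needed to keep each one a valid permutation) is omitted from this sketch:

```python
import random

# A sketch of two mutation operators of the genetic algorithm: flip reverses
# a random sequence inside one chromosome, and swap interchanges a
# same-index-and-length sequence of nodes between two chromosomes.
# Chromosome encoding: a list of non-depot node numbers.

def flip(chrom, rng):
    h1, h2 = sorted(rng.sample(range(len(chrom)), 2))
    chrom[h1:h2 + 1] = reversed(chrom[h1:h2 + 1])  # reverse the slice in place
    return chrom

def swap(chrom_r, chrom_q, rng):
    h3, h4 = sorted(rng.sample(range(len(chrom_r)), 2))
    # Interchange the slices between the two chromosomes.
    chrom_r[h3:h4 + 1], chrom_q[h3:h4 + 1] = chrom_q[h3:h4 + 1], chrom_r[h3:h4 + 1]
    return chrom_r, chrom_q

rng = random.Random(0)
a = list(range(2, 10))               # non-depot nodes of one route
b = list(rng.sample(range(2, 10), 8))  # a shuffled second route
flip(a, rng)
swap(a, b, rng)
print(len(a), len(b))
```

Both operators preserve chromosome length; crossover differs in that the cut positions h_r and h_q are independent, so the interchanged tails need not start at the same index.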
The algorithm for solving the open-loop TSP is slightly modified: instead of (26), each element of the population is an (N − 2)-dimensional vector (46) of the non-start-end nodes the salesman should visit (where N is the number of nodes in an open-loop TSP, and the salesman starts its route at node 1 and completes it at node N). The distance to the node following the depot is calculated as (27), and the remaining distances except the last one are accumulated into the running variable d. The distance to node N is included last. Then, the accumulated distance covered by route (46) is calculated as (30). Instead of (31)-(45), a new population is generated similarly.

Parallelization and assembling the approximately shortest route
The shortest route passing through the cells of the rectangular lattice pattern is easily assembled by using the following routine.The cell centers can be numbered starting from the left top corner downwards (see, e. g., Figure 6 and Figure 7, where the lattices with 8 and 32 cells are shown in two versions).
u_{0.5M + mM_hor + q} = (M_vert − 2m + 1)M_hor − M_hor + q for q = 1, …, 0.5M_hor and m = 1, …, 0.5M_vert − 1. (66) It is clear that if √M is an integer, then M_hor = M_vert. Henceforward, the approximately shortest route by the rectangular lattice pattern is assembled in accordance with (62)-(67). The assembling is started with the cluster containing the depot. Prior to the assembling, the open-loop TSPs (subproblems) can be solved in any succession (e. g., sequentially, subproblem by subproblem) or in parallel (simultaneously, on parallel processor cores or on parallel computers). Thus, the sequential parallelization is distinguished from the in-parallel parallelization by the coherence and simultaneousness of solving the open-loop TSPs. Meanwhile, there is no requirement of any sort of synchronization.
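One way to generate a symmetric closed-loop serpentine over the lattice cells can be sketched as below; this is a common construction assuming an even M_vert, not necessarily the exact numbering (62)-(67):

```python
def serpentine_cycle(M_hor, M_vert):
    # Hamiltonian closed loop over an M_hor x M_vert lattice of cells:
    # go down the first column, then snake back up through the remaining
    # columns. M_vert is assumed even (true for M = 2^n lattices), so the
    # last cell ends up horizontally adjacent to the first one.
    order = [(0, r) for r in range(M_vert)]            # down the first column
    for i, r in enumerate(range(M_vert - 1, -1, -1)):  # snake back up, row by row
        cols = range(1, M_hor)
        order += [(c, r) for c in (cols if i % 2 == 0 else reversed(cols))]
    return order

cycle = serpentine_cycle(4, 4)
print(len(cycle), cycle[0], cycle[-1])
```

Every consecutive pair of cells (including the wrap-around from the last cell back to the first) is lattice-adjacent, so with unit cells the loop length is M = 2^n units, matching the serpentine length noted for Figure 5.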
The two subtours for open-loop TSP u_m and open-loop TSP u_{m+1}, m = 1, …, M − 1, should be connected by a node from cluster u_m which is the closest to cluster u_{m+1}. For example, if the depot is in cluster 14 (see Figure 6, second row), then the starting subtour for TSP 14 should be connected by a node from cluster 14 closest to cluster 13; cluster 13 should be connected by its node closest to cluster 9; cluster 9 should be connected by its node closest to cluster 10; ...; cluster 15 should be connected by its node closest to cluster 14. Instead of searching through all the nodes of both clusters, the closest node can be approximately determined as one of the four nodes within the rectangular cell (cluster) which are the westernmost, easternmost, southernmost, and northernmost. The westernmost node has the least value of its first (horizontal) component; the easternmost node has the largest value of its first (horizontal) component; the southernmost node has the least value of its second (vertical) component; the northernmost node has the largest value of its second (vertical) component.
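The extreme-node shortcut can be sketched as follows; using the next cluster's centroid as its proxy location is an assumption made here to keep the sketch short (the text compares against the neighboring cluster itself):

```python
import math

# Approximate the closest node between two neighboring clusters: instead of
# an all-pairs search, only the four extreme nodes (westernmost, easternmost,
# southernmost, northernmost) of the current cluster are tried.

def extreme_nodes(cluster):
    # cluster: list of (x, y) node locations
    return {
        "west":  min(cluster, key=lambda p: p[0]),
        "east":  max(cluster, key=lambda p: p[0]),
        "south": min(cluster, key=lambda p: p[1]),
        "north": max(cluster, key=lambda p: p[1]),
    }

def connector(cluster_m, cluster_next):
    # Centroid of the next cluster serves as its proxy location (assumption).
    cx = sum(p[0] for p in cluster_next) / len(cluster_next)
    cy = sum(p[1] for p in cluster_next) / len(cluster_next)
    candidates = extreme_nodes(cluster_m).values()
    return min(candidates, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))

left = [(0.1, 0.5), (0.4, 0.2), (0.45, 0.8)]   # hypothetical cluster u_m
right = [(1.2, 0.5), (1.5, 0.9)]               # hypothetical cluster u_{m+1}
print(connector(left, right))
```

The point of the shortcut is that only four candidates per cluster are examined, so the connector search is O(N_m) for the extreme-node scan instead of O(N_m · N_{m+1}) for an all-pairs search.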
The TSPs are generated for N nodes by (68), and three versions of the rectangular cell clustering pattern are used for M clusters by (69), i. e. n ∈ {2, 4, 6}. To randomly generate nodes, pseudorandom numbers independently drawn from the standard uniform distribution on the open interval (0; 1) and independently drawn from the standard normal distribution are used. Denote the uniformly distributed variates by θ_1, θ_2 and the normally distributed ones by η_1, η_2, η_3, η_4. Then the node locations are generated by (70), whereas the depot location is generated by (71). Denote the number of nodes in cluster m by N_m. The maximal number of iterations for solving the whole TSP is limited, and the respective route lengths are registered for the w-th whole problem generated for N nodes by (68) and M clusters by (69), w = 1, …, 30. Then ratio (72) reflects an accuracy gain of the parallelization by using the rectangular cell clustering pattern. In addition, denote by l*_Σ(N; w) the number of iterations taken to solve the whole TSP, and denote by l*(m)_Σ(N_m, M; w) the number of iterations taken to solve the open-loop TSP for cluster m. Then ratio (73) reflects a speed gain of the sequential parallelization by using the rectangular cell clustering pattern. Furthermore, the rectangular cell clustering pattern can be compared with the iterative clustering by (16)-(22). Then the approximately shortest route is still assembled in accordance with (62)-(67), where centroids (23) are used (e. g., see Figure 8) instead of the rectangular cell centers. For this case, the two successive subtours for open-loop TSP u_m and open-loop TSP u_{m+1}, m = 1, …, M − 1, are connected by a node from cluster u_m which is the closest to cluster u_{m+1} and simultaneously is the farthest from the depot. Inasmuch as this problem usually does not have a solution, a node from the respective Pareto set is selected [35,13]. Ratio (74) reflects an accuracy gain of the parallelization with respect to the iterative clustering by using the rectangular assembling approach. Ratio (75) reflects a speed gain of the sequential parallelization with respect to the iterative clustering by using the rectangular assembling approach, without taking into account the time spent on the iterative clustering.
Another approach is to build an assembling polyline by solving a supplementary TSP in which the nodes are centroids (23). It is called the centroid TSP. Taking into account the number of iterations taken to solve centroid-based subproblem m, ratio (77) reflects a speed gain of the sequential parallelization with respect to the iterative clustering by using the centroid TSP assembling approach, without taking into account the time spent on the iterative clustering.
Each of ratios (73), (75), (77) implies that the open-loop TSPs are not solved in parallel, though. When they are solved in parallel, the speed gains are (78)-(80), by the supposition that the open-loop TSPs are simultaneously solved on M parallel processor cores. In contrast to gains (78)-(80), gains (73), (75), (77) suppose that the TSPs are solved on a single processor core. If g(N, M; w) > 1, then the length of the route assembled from the rectangular-cell-clustering-based subroutes is shorter than the length ρ*_Σ(N; w) obtained without the parallelization. If g(N, M; w) < 1, then the parallelization worsens the accuracy of the given TSP solution. If f(N, M; w) > 1, then the length of the route assembled from the rectangular-cell-clustering-based subroutes is shorter than the length of the route assembled from the iterative-clustering-based subroutes. If f(N, M; w) < 1, then the rectangular-cell-clustering-based parallelization is worse than the iterative-clustering-based one for the given TSP. However, in the case f(N, M; w) = 1 the rectangular-cell-clustering-based parallelization still pays off because the iterative-clustering-based parallelization additionally spends computational time to cluster nodes (1) iteratively rather than to apply the rectangular cell clustering pattern. If c(N, M; w) > 1, then the length of the route assembled from the rectangular-cell-clustering-based subroutes is shorter than the length assembled from the centroid-based subroutes. If c(N, M; w) < 1, then the rectangular-cell-clustering-based parallelization is worse than the centroid-based one for the given TSP. However, in the case c(N, M; w) = 1 the rectangular-cell-clustering-based parallelization still pays off because the centroid-based parallelization additionally spends computational time to cluster nodes (1) iteratively, whereupon the centroid TSP is additionally solved before assembling the subroutes.
If λ(N, M; w) > 1, then, even when the open-loop TSPs are solved sequentially, the route assembled from the rectangular-cell-clustering-based subroutes is obtained faster than a solution without the parallelization. If λ^(par)(N, M; w) > 1, then the rectangular-cell-clustering-based parallelization works faster by solving the subproblems in parallel. If µ(N, M; w) > 1, then, even when the subproblems are solved sequentially, the rectangular-cell-clustering-based parallelization is faster than the iterative-clustering-based one for the given TSP; if µ^(par)(N, M; w) > 1, then the rectangular-cell-clustering-based parallelization is faster than the iterative-clustering-based one when the subproblems are solved in parallel. Speed gains (77) and (80) are treated similarly.
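Under the assumption that iterations serve as the time unit and that the in-parallel set-up is dominated by the slowest of the M cores, the sequential and in-parallel speed gains can be sketched as follows (the iteration counts are hypothetical):

```python
# A sketch of the speed gains: the sequential gain compares the iterations of
# solving the whole TSP with the total iterations of the M open-loop
# subproblems solved one after another, while the in-parallel gain assumes the
# subproblems run simultaneously on M cores, so the slowest core dominates.
def speed_gains(iters_whole, iters_subproblems):
    lam = iters_whole / sum(iters_subproblems)      # sequential parallelization
    lam_par = iters_whole / max(iters_subproblems)  # in-parallel, M cores
    return lam, lam_par

# Hypothetical iteration counts for one whole TSP and its 4 open-loop TSPs.
lam, lam_par = speed_gains(120000, [9000, 7000, 8000, 6000])
print(lam, lam_par)
```

Both gains exceed 1 here, i. e. even solving the subproblems sequentially beats solving the whole TSP, and running them on M cores amplifies the gain further, which is the qualitative behavior reported in Table 1.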
To study the statistics of the parallelization gains by (72)-(80), their minimal, average, and maximal values are to be considered. The statistics presented in Table 1 show that parallelizing the TSPs by the rectangular cell clustering is at least four times faster than solving the whole TSP. The accuracy losses by four clusters (M = 4) do not exceed 1.5 % on average. Table 2 shows the statistics with respect to the iterative clustering by (16)-(22), where centroids (23) are used to assemble a closed-loop route instead of using the rectangular cell centers. Based on comparing to the rectangular cell clustering pattern, accuracy gain (74) bounces above 1 (highlighted bold) and below. As the number of clusters is increased and as the TSP size increases, the assembling by the rectangular cell clustering becomes more accurate, reaching a 4.45 % accuracy gain at 4001 nodes and 64 clusters. The worst case is at 2001 nodes and 16 clusters, where the accuracy loss is about 2.5 %. The speedups by (75), (79), (88), μ(N, M) by (90), (93), and μ^(par)(N, M) by (95) are widely scattered around 1 (where both the iterative clustering and the rectangular cell clustering have the same computational speed). The rectangular cell clustering is 1.12 % to 24.79 % faster at the fewest clusters (M = 4), when the four open-loop TSPs are simultaneously solved on four cores. Nevertheless, it is twice as fast at 64 clusters and 6001 nodes. Contrary to that, there is a TSP with 4001 nodes divided into 16 open-loop TSPs, where the iterative clustering is more than 55 % faster on 16 cores. Highlighted bold, the speedups by the rectangular cell clustering have a vague trend, especially when the TSPs are solved on a single processor core. Solving on M cores seems more preferable for the rectangular cell clustering, where using four cores is always faster. Computational speed compared to the centroid TSP assembling approach (Table 3) does not have a distinct trend either. The speedups by (77), (80), (89), γ(N, M) by (90), (94), and γ^(par)(N, M) by (95) are also widely scattered around 1, although solving on M cores seems more preferable for the rectangular cell clustering (there are more speedups highlighted bold in the bottom three lines corresponding to solving on M parallel processor cores). Using four cores is almost always faster, with two exceptions at 2001 and 4001 nodes. Accuracy gain (76), bouncing above 1 (highlighted bold) and below, shows that the rectangular cell clustering is more accurate on average, with four exceptions: at 2001 nodes and M ∈ {4, 16}, at 6001 nodes and 16 clusters, and at 8001 nodes and 16 clusters.
A visual example of solving a whole TSP generated for 8001 nodes is presented in Figure 9, where the route length is 13035.4899. It is seen that the density of nodes closer to the margins is lesser. This is caused by partially using the normal distribution in generating nodes by (70) and (71). The rectangular cell clustering thus produces too scattered cluster sizes. By dividing the exemplified TSP into 64 clusters, the cluster size varies widely, starting from just 6 nodes (Figure 10, where the cluster of 6 nodes is at the top right corner), the assembled route being just 2.51 % longer than that in Figure 9. However, the approximated solution in Figure 10 is obtained 20.4818 times faster if the 64 open-loop TSPs are solved on a single processor core. Furthermore, it is obtained 425.8264 times faster if every open-loop TSP is solved on its own processor core; this is the worst case, wherein λ^(par)_min(8001, 64) = 425.8264 in Table 1.
When the TSP in Figure 9 is solved by using the iterative clustering by (16)-(22), where an approximately shortest route is assembled in accordance with (62)-(67) passing through 64 centroids (23), the results become better in speedup but worse in accuracy. Whereas the speed gains (75), (79) here are µ(8001, 64; 5) = 0.8451 and µ^(par)(8001, 64; 5) = 0.6299, the respective assembled route (Figure 11) turns out to be 1.54 % longer than that in Figure 10. The huge drop in the rectangular cell clustering speedup is explained by the fact that the cluster size upon the iterative clustering varies between 45 and 210 nodes, which is a significantly narrower range compared to that for the rectangular cell clustering.
Using the centroid TSP assembling approach results in similar rectangular cell clustering speedup drops and a similar accuracy gain. The assembled closed-loop route shown in Figure 12 is slightly shorter than that in Figure 11; its length is 13525.3716. It is also worth noting that the set of nodes {k***_m}_{m=1}^{63} connecting the open-loop subroutes differs from the analogous set in Figure 11, although the clusters are the same.
It may seem that the rectangular cell clustering pattern does not have a clear advantage over the iterative clustering, whether a route is assembled by the centroid TSP assembling approach or by the rectangular assembling approach. However, as the portion of the normal distribution in generating nodes by (70) and (71) is reduced, the rectangular cell clustering approach becomes more accurate and faster. Thus, if the node locations are generated with a reduced portion of the normal distribution instead of (70), then the rectangular cell clustering approach is 0.13 % more accurate and 1.02 % faster than the rectangular assembling approach (Table 4). Compared to the latter, the rectangular cell clustering approach is also 26.42 % faster when M parallel processor cores are used. In addition, it is still 0.06 % more accurate than the centroid TSP assembling approach, being 2.61 % slower on a single processor core and 52.22 % slower on M parallel processor cores (Table 5). Measuring computational time in seconds, the rectangular cell clustering approach is 67.16 % and 59.03 % faster than the rectangular assembling and centroid TSP assembling approaches, respectively. This time includes the amount of time spent on the clustering itself, i. e. on the preparation to solve. As has been mentioned above, the rectangular cell clustering is very fast, so it is no wonder that the preparation time of the rectangular cell clustering approach is 60000 to 70000 times shorter than preparing to solve by the other two approaches. If only the "pure" computational time spent on solving the subproblems is considered, without the preparation time, then the rectangular cell clustering approach is 2.43 % and 1 % faster than the rectangular assembling and centroid TSP assembling approaches, respectively. As the number of clusters is increased, the advantage strengthens with respect to the rectangular assembling approach, and it slightly weakens with respect to the centroid TSP assembling approach. As the TSP size increases, the advantage strengthens with respect to them both, following a quadratic pattern.
Figure 9. An approximately shortest route for a TSP with 8001 nodes solved without clustering and parallelization.
Figure 11. An approximately shortest route for the TSP in Figure 9 solved by the iterative clustering by (16)-(22), where centroids (23) are used to assemble a closed-loop route instead of using the rectangular cell centers in Figure 10; the route length is 13569.1552, which is 1.54 % longer than that in Figure 10.
It is quite obvious that numbering the cell centers by (62)-(67) is not the only possible version. Before the assembling, the clusters are re-numbered so that their new numbers correspond to the order in which the clusters are connected by the symmetric rectangular closed-loop serpentine. In particular, the cluster containing the depot is always re-numbered so that its number is 1. The same is done for the polyline closed-loop serpentine (see Figure 8), where the cluster closest to every cell center is determined.
Figure 12. The assembled closed-loop route for the TSP in Figure 9 solved by the centroid TSP assembling approach.
Connecting successive subtours for open-loop TSPs in the case of iterative clustering (when either the rectangular assembling or the centroid TSP assembling approach is used) is fulfilled by determining connectors {k***_m}, m = 1, ..., M - 1, where node k***_m belonging to cluster m (after re-numbering) is the destination node in open-loop TSP m. The assembling is done almost trivially for just two clusters (19) and (20). The salesman departing from the depot must complete the open-loop subtour at a node k*_1 of cluster (19) that is the farthest from the depot. On the other hand, this node must be the nearest to cluster (20) to resume building the route for the initial TSP via the shortest connection of the two clusters, so its number also should be some k**_1 ∈ K_1(1). Although the case k*_1 = k**_1 is not impossible, it is rather unlikely. So, in the case k*_1 ≠ k**_1, the best Pareto-efficient point is selected by comparing distances ρ(k_1, 1) to the depot and δ(k_1, k_2) between candidate nodes of the two clusters. In the general case, the connector is selected such that for every Ṽ_{k_m, k_{m+1}} ∈ Ṽ either one pair of inequalities on these distances holds or the other does. Node k***_{M-1} ∈ K_{M-1}(n) is the destination node for cluster M - 1 and it is the starting node for cluster M. The destination node for cluster M is the depot. The open-loop subroutes (subtours) are assembled through the depot and nodes {k***_m}, m = 1, ..., M - 1, thus making a closed-loop route as an approximate solution to the initial TSP. This method of determining the connectors is especially efficient for a few clusters. The reason is that moving farther away from the depot reduces the likelihood that these nodes are located too close to each other, which may lower the quality of an approximate solution. Indeed, if nodes {k***_m}, m = 1, ..., M - 1, are too close, then it is more probable that a route in which they are connected directly one after another is shorter than a route assembled by connecting the M open-loop subroutes through these nodes.
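A rough sketch of this Pareto-style connector selection, assuming Euclidean distances. The scalarized tie-breaking rule below is only illustrative; the paper's exact criterion compares pairs of inequalities over the distances ρ and δ, which cannot be reproduced here verbatim.

```python
import math

def pareto_connector(cluster_a, cluster_b, depot):
    """Pick a connector node in cluster_a that is (i) far from the depot and
    (ii) near cluster_b. Among the Pareto-efficient candidates, the one with
    the best normalized tradeoff is returned (an illustrative scalarization)."""
    rho = [math.dist(k, depot) for k in cluster_a]                        # distance to the depot
    delta = [min(math.dist(k, k2) for k2 in cluster_b) for k in cluster_a]  # distance to cluster B
    # Pareto-efficient: no other node is both farther from the depot and closer to B
    efficient = [i for i in range(len(cluster_a))
                 if not any(rho[j] > rho[i] and delta[j] < delta[i]
                            for j in range(len(cluster_a)))]
    rmax, dmax = max(rho), max(delta)
    best = max(efficient, key=lambda i: rho[i] / rmax - delta[i] / dmax)
    return cluster_a[best]

# toy example: depot at the origin, cluster (19) near it, cluster (20) to the right
print(pareto_connector([(1, 0), (2, 0), (2, 1)], [(5, 0)], (0, 0)))  # -> (2, 1)
```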
However, the M open-loop subroutes can be connected in a simpler manner, without moving farther away from the depot. For this, the depot is not initially considered as a specific node. The iterative clustering by (16)-(22) is slightly modified. At step n of the clustering, the clusters are still given by (16) through the inclusion union-intersection statements (18), where the union-intersection statements (21) and (22) hold. Then node k**_1 becomes the fictional depot. An example of applying such a technique is shown in Figure 13, where an approximate solution to the well-known Mona Lisa problem [3] is found by the rectangular cell clustering approach for 16 clusters. There are 10^5 nodes, and the assembled route length is 6195821.4779, while it is claimed that a lower bound found by the Concorde solver [32, 26] is 5757084, so no shorter route exists. Nevertheless, the Mona Lisa problem is solved far faster when it is divided into 64 clusters [41, 42]. Figure 14 presents another approximate solution, which is just 0.24 % longer than that in Figure 13. The assembling polyline uses the pattern from Figure 6. The computation has lasted for 186.123 hours on the abovementioned single CPU, which is at least 0.136 % faster than by the existing state-of-the-art parallel approximation algorithms [9, 44, 30, 18]. To solve the 64 open-loop TSPs, whose numbers of nodes vary between 158 and 3102, it has taken 53388914 iterations, which is 2.7887 times less than for the 16 clusters. Thus, the smaller-cluster division ensures a quite significant speedup (it is 8.5822 times faster) by worsening the approximate solution only by 0.24 %. Meanwhile, the resulting Mona Lisa image reconstruction appears to be of much the same quality (Figure 15). In this particular case, the approximate solution in Figure 14 nicely balances the accuracy loss and computational time, making the tradeoff appropriate.
Figure 14. An approximate solution to the Mona Lisa problem obtained with the rectangular cell clustering approach for 64 clusters (highlighted by varying colors); the route is 0.24 % longer than that in Figure 13, but obtained 8.5822 times faster

Discussion of efficiency and limitations
It is apparent that more squarish datasets fit the rectangular cell clustering pattern better. As both the rectangular assembling and the centroid TSP assembling approaches are slower, the rectangular cell clustering pattern is far more efficient than the iterative clustering on squarish datasets. Meanwhile, the accuracies of these three approaches are comparable, even when the dataset becomes less squarish and more roundish, oval, ellipsoidal, etc.
For a given TSP, the rectangular cell clustering pattern can be used for fast approximation of the TSP shortest route. An approximately shortest route suggests two things. First, it serves as an upper bound. Due to the uncertainty of the approximate solution, which depends on the initial state of the pseudorandom number generator [36, 52, 6, 39, 28, 23], the upper bound may be lowered by re-solving the TSP a few times in a row or in parallel. Second, the route can be scrutinized to find its bottlenecks (or vulnerable zones), which subsequently are separated, and another bunch of open-loop TSPs is approximately solved.
Another merit of the rectangular cell clustering pattern is the parallelization efficiency. The pattern quickly makes solving any TSP parallelizable. This is true for big-sized TSPs similar to those in Figures 12-14. Availability of multitudinous parallel processor cores is not a problem today. Besides, the processors are not really required to be perfectly synchronized, unlike in other parallel-based approaches [9, 18, 44, 30, 53]. For that matter, a TSP with a million nodes similar to the Mona Lisa problem can be approximately solved within an hour by using 1024 processor cores (here the average number of nodes per cluster is 976.5625, which requires about 45 minutes). A TSP with a billion nodes, to be approximately solved within an hour or so, would require 1048576 processor cores. If this amount is not available, the TSP is parallelized on the available amount. For instance, if 1024 processor cores are available, then the respective 1024 × 1024 rectangular lattice is imposed on the billion nodes, and 1024 bunches of 1024 open-loop TSPs are solved within roughly 1024 hours, which is about 43 days.
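The core-count arithmetic in this paragraph can be verified with a few lines; the one-hour-per-bunch figure is the estimate assumed above, not a measured value.

```python
# Back-of-the-envelope check of the parallelization figures quoted above
# (assumption: one core solves a roughly 1000-node open-loop TSP in about an hour).
million, billion, cores = 10**6, 10**9, 1024

# a million-node TSP on 1024 cores: one cluster per core
nodes_per_cluster = million / cores
assert nodes_per_cluster == 976.5625

# a billion-node TSP on a 1024 x 1024 lattice gives 1048576 clusters;
# with only 1024 cores they are processed in 1024 sequential bunches
clusters = 1024 * 1024
bunches = clusters // cores
days = bunches * 1 / 24          # about one hour per bunch
print(clusters, bunches, round(days, 1))  # 1048576 1024 42.7
```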
The key limitation of the suggested approach is its dependence upon squarish datasets. However, if a dataset is of an irregular nonconvex shape, its protuberances can be cut off as separate open-loop TSPs so that an open-loop "mother" TSP (of a roughly convex shape) is obtained and solved. Thereupon the approximate solutions of the open-loop TSPs, including the "mother" TSP, are assembled. Such an approach fits serpentine-like datasets as well [19, 43].
The assemblage, mostly implying a selection of the connectors, is also important. The suggested assemblage in accordance with (62)-(67), exemplified in Figures 6 and 7, is not the only possible one. There are closed-loop serpentines of other forms. For instance, the serpentine for a 4 × 4 rectangular lattice resembles the letter I (see the right subplot of the second row in Figure 6), but it can be "transposed" so that it resembles the letter H.
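For illustration, one simple closed-loop serpentine over a rectangular lattice can be constructed as follows. This is only a sketch of one possible serpentine form, not a reproduction of the exact patterns in Figure 6; the function name is illustrative.

```python
def serpentine_cycle(rows, cols):
    """One simple closed-loop serpentine over a rows x cols lattice
    (rows must be even): sweep row 0 left to right, snake through
    columns 1..cols-1 of the remaining rows, and return up column 0.
    Every consecutive pair of cells (and the last-to-first pair) is
    adjacent, so the cycle is closed."""
    assert rows % 2 == 0 and rows >= 2 and cols >= 2
    cycle = [(0, c) for c in range(cols)]               # row 0, left to right
    for r in range(1, rows):                            # snake over cols 1..cols-1
        cs = range(cols - 1, 0, -1) if r % 2 == 1 else range(1, cols)
        cycle += [(r, c) for c in cs]
    cycle += [(r, 0) for r in range(rows - 1, 0, -1)]   # back up column 0
    return cycle

# e.g. a 4 x 4 lattice yields a 16-cell closed-loop serpentine
print(serpentine_cycle(4, 4))
```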
Another limitation is the 2^n pattern. For instance, if a dataset is still squarish but its shape has an aspect ratio of 3-to-1, applying a 2 × 4 lattice (or 4 × 8, 8 × 16, 16 × 32, etc.) will result in stretched clusters. This may lead to accuracy losses from inefficient horizontal connections in the assemblage. However, a more relevant rectangular lattice, not necessarily of the 2^n pattern, can be applied by subsequently mixing it with the centroid TSP assembling approach. Therein, the iterative clustering must be modified so that not all clusters are further divided at step n of the clustering, but only those whose size is not sufficiently small. In this case, the iterative clustering resembles a binary tree with pruned branches, whose leaves are clusters of a sufficiently small size. The criteria for deciding when open-loop TSPs are sufficiently small depend on the topology of nodes, though. Overall, the sufficient smallness cannot be formalized with certainty.
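The pruned-binary-tree idea can be sketched as follows, assuming a crude median split along the wider coordinate in place of a proper two-group clustering method; the size threshold is a modeling choice, as noted above.

```python
import random

def bisect(nodes, max_size):
    """Iterative clustering as a pruned binary tree: a cluster is split into
    two halves (here by a median split along the wider coordinate) only while
    its size exceeds max_size; sufficiently small clusters become leaves."""
    if len(nodes) <= max_size:
        return [nodes]                       # leaf: this open-loop TSP is small enough
    xs = [p[0] for p in nodes]; ys = [p[1] for p in nodes]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    nodes = sorted(nodes, key=lambda p: p[axis])
    mid = len(nodes) // 2                    # median split keeps the tree balanced
    return bisect(nodes[:mid], max_size) + bisect(nodes[mid:], max_size)

random.seed(1)
pts = [(random.random(), random.random() * 3) for _ in range(1000)]  # 3:1 aspect ratio
clusters = bisect(pts, max_size=150)
print(len(clusters), max(len(c) for c in clusters))  # -> 8 125
```

Every leaf obeys the size threshold regardless of the dataset's aspect ratio, which is the point of mixing a non-2^n lattice with further division.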
If the density of nodes varies badly, this does not seem to affect the accuracy heavily. Indeed, the Mona Lisa problem has visible parts of badly varying density (the hair and shoulders against the upper part of the background and the other light parts like the face and neck), but the two divisions into 16 and 64 clusters (Figures 13 and 14) are not followed by a significant accuracy difference. Therefore, it is expected that the suggested approach can successfully handle real-world TSPs with capacity limits, time windows, vehicle speed inconstancy, etc., because such additional constraints are embedded into the genetic algorithm [49, 16, 41, 42].
The comparative analysis of Tables 1-5 has shown that the accuracy gain and speedup, if any, are not stable across smaller TSPs. Nevertheless, Table 1 confirms that the parallelization gain from using the rectangular cell clustering pattern is positively scalable: as either the TSP size increases, or the number of clusters is reasonably increased, or both, the gain grows.

Conclusion
This paper basically suggests two ways of dividing a closed-loop TSP into smaller-sized open-loop TSPs: the rectangular cell clustering and the iterative clustering. The latter has two ways to assemble the solutions of the open-loop TSPs: the rectangular assembling approach, by which the assemblage is done via a symmetric rectangular closed-loop serpentine, and the centroid TSP assembling approach, which requires solving a supplementary closed-loop TSP whose nodes are centroids of the open-loop-TSP clusters. The main intention of the clustering is to parallelize the TSP for quickly determining its approximate solution, which serves as an upper bound of the TSP solution and whose bottlenecks can also be studied in the approximation.
While the rectangular cell clustering is determined by only the lattice size, the iterative clustering is primarily determined by the clustering method. The clustering method can be any method that efficiently divides a set of nodes into two groups by minimizing the distance within a group and maximizing the distance between the groups, taking into consideration possible variation of node density.
Based on the results obtained from the numerical simulation, it is ascertained that the rectangular cell clustering pattern and the iterative clustering are roughly equal in accuracy, but a significant difference exists in the computational time. An approximate solution to the TSP is obtained faster by the rectangular cell clustering pattern, which performs much better on squarish datasets. The main scientific contribution consists in further improving the approaches to an efficient approximate analysis of closed-loop TSPs by shortening the computational time while not exceeding tolerable accuracy losses, which are uncertain for big-sized TSPs. This is done by using a subset of nodes connecting open-loop TSPs via either a rectangular closed-loop serpentine or the polyline from a solution of the supplementary centroid TSP. The suggested approach has a significant practical contribution, as it allows approximately solving big-sized TSPs on multitudinous parallel processor cores without requiring their synchronization.
The research can be extended to building approximately shortest routes accomplished by multiple salesmen. Although such TSPs are still parallelizable, their feasible solutions must obey specific constraints arising from the multiplicity of salesmen. In addition, minimization of the number of salesmen may be an additional criterion of solution optimality.

Figure 1. A set of 250000 nodes divided into 16 clusters

Figure 2. The set of 250000 nodes from Figure 1 divided into 64 clusters

Figure 4. The set of 250000 nodes from Figure 1 (the depot is not marked) divided into 16 clusters by the rectangular cell clustering pattern
If the lattice is of M_hor horizontal and M_vert vertical cells, where M = M_hor · M_vert, rectangle (24) is uniformly broken into M = 2^n cells (subrectangles) [a_{m_hor m_vert}; b_{m_hor m_vert}] × [g_{m_hor m_vert}; h_{m_hor m_vert}].
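A minimal sketch of this cell-binning step, assuming the bounding rectangle (24) is taken as the min/max node coordinates; the function name is illustrative, and the nodes are assumed to span a nondegenerate rectangle.

```python
def lattice_clusters(nodes, m_hor, m_vert):
    """Impose an m_hor x m_vert rectangular lattice on the bounding
    rectangle of the nodes and bin every node into its cell; each
    non-empty cell becomes one open-loop TSP cluster."""
    xs = [p[0] for p in nodes]; ys = [p[1] for p in nodes]
    a, b = min(xs), max(xs)   # horizontal bounds of rectangle (24)
    g, h = min(ys), max(ys)   # vertical bounds of rectangle (24)
    cells = {}
    for x, y in nodes:
        # cell indices in 0..m_hor-1 and 0..m_vert-1 (clamp the maxima)
        i = min(int((x - a) / (b - a) * m_hor), m_hor - 1)
        j = min(int((y - g) / (h - g) * m_vert), m_vert - 1)
        cells.setdefault((i, j), []).append((x, y))
    return cells

# toy example: three nodes binned on a 2 x 2 lattice
grid = lattice_clusters([(0.1, 0.2), (0.9, 0.8), (0.4, 0.6)], 2, 2)
```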

Figure 5. Assembling the open-loop routes by the serpentine patterns of the shortest route passing through the centers of the lattice cells (4, 8, 16, 32, 64, 128 cells)
Here, denote the shortest length of the subroute in the open-loop TSP for cluster m by ρ∃*(m)_Σ(N_m, M; w), and denote by l∃*(m)_Σ(N_m, M; w) the number of iterations taken to solve this open-loop TSP, whence a ratio f(N, M; w) of the iterations to obtain a solution is defined. For the case of the centroid TSP, denote the shortest length of the subroute in the subproblem for cluster m by ρ(C)*(m)_Σ(N_m, M; w). Then a ratio
c(N, M; w) = [Σ_{m=1}^{M} ρ(C)*(m)_Σ(N_m, M; w)] / [Σ_{m=1}^{M} ρ□∃*(m)_Σ(N_m, M; w)]   (76)

Figure 8. The set of 250000 nodes from Figure 1 (the depot is not marked), where the approximately shortest route is assembled in accordance with (62)-(67) passing through the 16 centroids of the clusters
Ratio (76) reflects an accuracy gain of the parallelization with respect to the iterative clustering by using the centroid TSP assembling approach.

Figure 10. An approximately shortest route for the TSP in Figure 9 solved by the rectangular cell clustering pattern; the route, whose length is 13363.2911, is assembled in accordance with (62)-(67) passing through the 64 numbered centers of the lattice cells, where nodes {k***_m}, m = 1, ..., 63, connecting the open-loop subroutes are marked as circles

Figure 13. An approximate solution to the Mona Lisa problem obtained with the rectangular cell clustering approach for 16 clusters (highlighted by varying colors); the assembling polyline is also shown along with the connectors {k***_m}

Figure 15. The original image (upper left) as the route drawn by Robert Bosch [3], the 16-clustered route by Figure 13 (upper right), and the 64-clustered route by Figure 14 (bottom)
Denote by ρ*_Σ(N; w) the shortest route length found for the w-th whole TSP generated for N nodes by (68). For the case of the rectangular cell clustering, denote the shortest length of the subroute in the open-loop TSP for cluster m by ρ□∃*(m)_Σ(N_m, M; w). The maximal number of iterations for solving the whole TSP is set to 200 · (N - 1), and the maximal number of iterations for solving the open-loop TSP for cluster m is set similarly: it is 200 · N_m. The algorithm early stop condition is used, by which a run of the algorithm is stopped if the shortest route length does not change for one tenth of the maximal number of iterations. To obtain reliable and stable statistical data, the whole TSP is re-generated 30 times for every (68) and (69).