Comparison of the Classical Algorithm with the Training Algorithm in Scheduling Problem ADI Production

A classical algorithm Tabu Search was compared with Q Learning (named learning) with regards to the scheduling problems in the Austempered Ductile Iron (ADI) manufacturing process. The first part comprised of a review of the literature concerning scheduling problems, machine learning and the ADI manufacturing process. Based on this, a simplified scheme of ADI production line was created, which a scheduling problem was described for. Moreover, a classic and training algorithm that is best suited to solve this scheduling problem was selected. In the second part, was made an implementation of chosen algorithms in Python programming language and the results were discussed. The most optimal algorithm to solve this problem was identified. In the end, all tests and their results for this project were presented


Introduction
In the production of castings, the problem of scheduling tasks concerns both the manufacturing process of a given product and the queuing of accepted orders.This allows to increase efficiency or optimize the production process itself.The development of artificial intelligence is also used in the production of steel.It allows you to create an intelligent knowledge system that allows for task scheduling and production planning in an optimal way.Integrated planning of steel production is a problem of high complexity, therefore intelligent algorithms are used, which will allow you to obtain more accurate results than the classic algorithms, in a shorter time [1].The tasks performed by training algorithms are often used for optimization.Their use requires less calculation power than the use of classical algorithms, especially for problems with a large number of possible solutions [2].Scheduling is the process of assigning limited resources to tasks in order to obtain the optimal value due to the selected criterion [3].In production systems, resources are understood as objects such as workstations, machines, means of transport, as well as fuel, energy and people.However, the tasks are the processes of processing, transporting or producing a given product [3] [4].In order to present and systematize the scheduling problems, the three-field notation α | β | γ is used, where α is the task processing system, β is the task characteristics, γ is optimization criterion [3] [4] [5] [6].The symbol α can be represented by equation 1, where α1 describes the way in which tasks are performed by the system, and the symbol α2 describes the amount of resources in the system, in a particular case it is the number of machines or stands.The α2 symbol can take the empty character "ο", which means that the number of resources is part of the problem and is not predetermined [3].

𝛼 = 𝛼1𝛼2
(1) The way in which tasks are performed by the system can be divided into two main groups: systems with parallel machines, where each task can be performed on one of the machines, and systems with dedicated machines, where each task must go through by specific positions.The second group can be further divided into the following subgroups: flow shop systems, where each task must pass through all machines in the same order, open systems, where each task must be performed at all workstations, but the order is arbitrary and general systems (called job shop), where each job position and the order is arbitrary.Table 1 shows examples of the values that the symbol α1 can take [3].

Description of the problem
Learning methods are also used in scheduling problems [7] [8] [9] [10].The most commonly used solutions are neural networks.An example of this is the Hopfield network.This is a selfassociative non-linear searching network that aims to minimize the function representing activation in the network units {system energy}.The operation of this network can be compared to an electrical circuit with an operational amplifier [11].The simplest form of a Hopfield network is a single-layer feedback network, an example of which is shown in Figure 1.In this figure, each neuron ni is excited by external signals such as threes holds or signals entering ain, additionally, it is excited by internal signals in the form of the feedback of the output signal aout,j multiplied by the respective weights in ij.The output signals are connected to all other neurons except the own [2].With other learning algorithms, decision trees are used for solving problems.In the considered example, a decision tree was used, for real-time planning of the flow system problem.In this case, the decision tree selects the scheduling rule from a previously prepared list.The applied solution is presented graphically in Figure 2 [12].

Fig. 1 Hopfield network
Three main elements can be distinguished in the applied solution: a real-time controller, a scheduler and a decision tree.The real state controller is receiving data from the flow system and sends jobs to be executed according to the rule issued by the scheduler.Additionally, it monitors the state of the system.The scheduler decides when a new scheduling rule is selected.It then fills in and releases the rules selected by the decision tree.The decision tree selects a new scheduling rule based on the state of the system.In addition, the flow system itself, the learning data, the decision tree and the system state, which stores current information about the system, are also distinguished.It is updated with every change.

Fig. 2 Planning program with decision tree
The ADI cast iron manufacturing process can be represented as scheduling problem Fn | 6 | Emin.It is a flow system with 6 stations and n tasks, the optimization criterion of which the purpose is to minimize the energy consumption of the process.The first station (M1) corresponds to the processes taking place in the furnace, the second station (M2) of the molding and casting line, the third station (M3) is optional technical processing depending on the order, the fourth station (M4) corresponds to the annealing process, station five (M5) to the cooling process and station six (M6) to the final technical treatment.In order to simplify the problem, it has been assumed that the machining is fully automated.The total energy consumption during the production of Ec was assumed as the optimization criterion.In the case under consideration, not only the duration of the process itself is of great importance for the costs incurred, but also the change of temperature parameters between individual tasks.For stations M2, M3 and M6, the power of devices and the duration of the process will be important factors.The power of these devices in turn is P2 equal to 350 kW, P3 equal to 185 kW, P6 equal to 140 kW.For stations M1, M4 and M5, an additional factor influencing energy consumption will be the influence of temperature on the consumed power.In the paper, it was assumed that in the considered temperature ranges this relationship will be linear as shown in equation 2.

P(θ)= c θ (2)
Where P (θ) -power is dependent on temperature, c-dependence coefficient, θ -process temperature expressed in Kelvin.The dependence coefficient is: 1.119 for the M1 station kW / K, for M4 0.1063 kW / K, for M5 0.2972 kW / K. Contrary to previous positions, the time from the beginning of the first tp1 task to the end of the last tkn task will be important here, not just the duration.For the remaining machines, it was assumed that they only consume energy while working.The optimization criterion for the entire process is presented in Equation 3, where Ei is the energy w consumed in the i-th station and is described by Equation 4. (3) Where Pi is the power consumed by the i-th station, θij is the temperature at the i-th station during the j-th task, and ie is the duration of the j-th task.The number of tasks depends on the number of ADI cast-iron variants with specific properties, listed in Table 1.It was assumed that one task allows for the production of 20 elements with a total weight of 3 tons.The work assumes the implementation of a maximum of 3 orders from each cast iron variant.Table 2 summarizes the required work parameters for each position for each task.Machining on an M3 machine depends on the requirements of the order and can be applied to any task.This work assumes that only the first two orders of each variant require this processing.

Classic algorithm
The search algorithm was selected from among the classic algorithms presented in this paper, due to the typical and selfdefined target function.Additionally, this algorithm is used, inter alia, for flow problems in scheduling tasks [14].In the problem under consideration, the algorithm will schedule tasks on the first machine.Then, in accordance with the ADI cast iron production process, will determine the energy consumed on both individual stations and across the entire process.The process of determining the energy consumption will be determined during each iteration.In order to fully describe the algorithm for this solution, the following should be defined: initial scheduling, traffic, final condition and the length of the prohibition for given traffic [13].In a given study, due to the lack of information on the optimal method of selecting the initial solution, the algorithm will be performed for two different initial orderings.In each of them, the criterion will be the total energy consumption of a single task, assuming that only that one task is involved in the process.In the first case, the tasks are ranked in ascending order of energy consumption, and in the second case, in descending order.The move will be to bring the selected item to the top of the list and move it the remaining elements to the empty space in the order [13].The final condition is that the maximum number of iterations, which is the product of the number of tasks and the number of stations, has been exceeded.Two values of the forbidden length were used and compared: 10% of the maximum number of iterations and half of the number of tasks rounded up because in the literature this value is not clearly defined for this problem.The algorithm receives the tasks with the data as input, as shown in Table 2.The results are presented in the form of a sequence of tasks on the first machine, the total energy consumption of this ranking, and a graphical representation of the ranking in the form of a Gantt chart.In addition, it will return data from subsequent iterations to determine which variant of the algorithm parameters gave the best results.4Fig. 3

TS algorithm scheme
The scheme of the algorithm's operation is shown in Figure 3, where the subsequent elements of the algorithm's operation include selecting the initial solution, searching for the best possible motion along with its execution, and checking the final condition.In addition, the diagram shows the elements that allow you to control the list of prohibited moves by adding more moves and reducing the forbidden time at the beginning of the loop, as well as controlling the improvement in subsequent moves to determine one of the conditions completion of the algorithm [13].

Learning algorithm
The learning algorithm was chosen due to the architecture properties that allow it to be used for optimization processes, without the need to create an additional database of learning examples.The algorithm checks the successive possibilities of ranking only the given problem and selects the best combination found.In the problem under consideration, the algorithm will change the order of tasks only on the first machine, and then, in accordance with the described process, it will determine energy consumption, the same as in the classic algorithm.The scheme of the algorithm's operation, adapted to the problem of scheduling tasks in the flow system, is shown in Figure 4.The algorithm uses the Q table to store values for each pair of environmental states (si) and actions that can be performed (a).The action to be performed is to select one of the tasks, which means the total number of possible actions is equal to the number of tasks to be performed, in this case, it is 12.The total number of environmental states is given by equation 5, which for the described case gives 4096 states.It follows that the size of the array Q is 4096 x 12.At the beginning of the algorithm, the array is initialized with the values 0 in each cell [14].
The next steps of the algorithm are cycled over a specified number of iterations.Based on the analysed literature, the number of iterations was determined as the product of the number of tasks and machines, in this case, it is 72 iterations.The first step in this loop is to initialize a randomly sorted set of tasks.In this implementation, it will be the alphabetical order of the task designation.Primary rankings in this algorithm have no effect on the final rankings [14].The next step is, the next loop and this is performed until all tasks from the set are completed.In this loop, a number between 0 and 1 is first drawn and compared with the 'greedy strategy' coefficient ε to decide whether the model will search arrays.If the value is greater than ε then the action is selected based on the Q table, such an action is called an array search because it is already based on the known actions in a given state and their values in the Q table.If the value is smaller than the action is selected randomly, it is called this is discovery because the model selects actions without checking the values in the Q table for that state.This situation is advantageous when, for example, there are no values other than 0 [14] in the table yet.This solution provides two values of the ε coefficient to choose from.In the first solution, it is a value that is constant and unchanging throughout the duration of the program and amounts to 0.1.In the latter case, the value decreases from 1 by 10% for each iteration.This allows the algorithm to initially discover the values of the Q array by randomizing actions, and the more information the model has about that array, the more likely it will be to choose an action based on the values of the Q array.However, there will always be a low probability of taking a random action.After selecting the action, it is performed, in this case, it consists in assigning a given task to the spot list.After completing the entire loop, the program has full task scheduling for a given iteration.After the action is performed, the values of the predicted future state (s) and the reward (r) determined according to equation 6 are examined.At the end of the loop, table Q is updated according to equation 7 and the task set by deleting the task that has been selected [14].Where, f (sopt) -the value of the objective function for a list of ordered tasks, Q '(s, a) -the value in the Q table after the update, Q (s, a) -the value of the Q table before the update, α ql -the constant learning coefficient, γql -constant discount factor [14].The learning coefficient αql can be interpreted as the degree of updating the values in the Q table, for the case considered in this paper its value was assumed to be 0.1.The discount factor γql determines the significance level of the future, anticipated reward, here adopted at the level of 0.8 [14].If the set of tasks in this iteration is already empty, it means that each task has been ranked.Then the obtained value of the objective function for this ranking is compared with the currently best ranking, and if the new ranking is better, it is stored as the currently best ranking.After all, iterations are completed, the algorithm returns the best-found solution.Additionally, the program has the ability to print out the solution after each iteration [14].Works on similar solutions have also been described in other literature books.[15 -18].

Description of achieved results of own research
The results of assigning tasks from the described problem were presented separately for each algorithm, and then their comparison was made.Each algorithm was performed for three data sets.The materials for checking the operation of the algorithms came from item [19].

Classical algorithm results
The classic algorithm allows the selection of three work parameters, two of which may affect the value of the result.These are the length of the prohibition and the primary alignments.The last parameter determines whether all results are saved or only one.Table 3 summarizes the results for all three data sets, performed for all combinations of parameters affecting the result.The first column contains the designation of sets, where 1 is a set with four tasks, 2 is a set with eight tasks, and 3 is a set with twelve tasks.In the second column, the selected prohibition length.Primary alignments are selected in the third column.The fourth column shows the final rankings.In the fifth, the value for this ranking.The last column shows the execution time of the algorithm.Only the versions with the best ranking were selected for these results.The table shows that different orderings give the same results, which is additionally confirmed by the analysis of the results after each iteration.Results after each iteration were also generated for each variant and each combination of parameters.By analysing the results after each iteration, it can be concluded that the rankings presented in the table are optimal.On the basis of the obtained results, there is no visible influence of the original ranking and the length of the prohibition on the solution.This may result from the characteristics of the considered example, and not the characteristics of the issue itself.For the first two data sets, there is no additional impact on the length of the algorithm's execution.In the option with the prohibition length equal to half the number of tasks, the time is significantly longer than with the 10% number of iterations.You can see the effect of the forbidden length on the algorithm by analysing the results after each iteration.For the bypass length equal to half the number of tasks, the algorithm more often returned the result already achieved earlier than for the bypass length equal to 10% of the number of iterations.Additionally, extreme results were achieved more often.

Half of tasks amount
From higher energy consumption g, j,

Learning algorithm results
The learning algorithm requires the selection of only two parameters: the ε coefficient and the result variant.Only the first factors can have an impact on finding the optimal solution.In Table 4, as for the classical algorithm, the results for all three data variants were collected for both options of the ε coefficient.Only the versions with the best ranking were selected for these results.When analyzing the ranking for this algorithm, it is worth highlighting its random nature.The results do not need to repeat when the algorithm is called again.The results after each iteration were also generated for each variant and each option of the ε coefficient.By analyzing these results, it can be concluded that the rankings presented in the table are optimal.No influence of the ε coefficient on the final solution was observed, nor on the algorithm execution time.However, you can see its effect on the operation of the algorithm by analyzing the results after each iteration.For the ε coefficient decreasing by 10% from 1, it can be seen how the algorithm initially selects solutions that are far from optimal, and only from a certain moment selects the optimal ones.For a constant value coefficient, optimal results started to appear earlier, leading to fewer extremely bad results appearing.

Comparison of algorithms
The comparison of algorithms was also made for all three data sets, but one set of algorithm operation parameters was selected.For comparison, the following were selected: for the classical algorithm, the length of the sequence equal to 10% of the number of iterations and the primary ordering from the highest energy consumption, and for the learning algorithm, the value of the ε coefficient equal to 0.1.It was decided to set these parameters because they were recognized for the most advantageous.For the classical algorithm, this parameter value gives a shorter algorithm execution time and greater differentiation during the search.In addition, this value more often allowed to find various optimal solutions.The comparison of the results is presented in Table 5, where the first column is the number of the data variant, in the second column is the name of the algorithm, in the third is the optimal sequence, in the fourth column is the objective function value for this ranking, and in the last column is the algorithm execution time.Algorithms return different orderings, but with similar values.After analysing the results after each iteration, these rankings can be considered optimal.The only difference between the algorithms shown in the table is the execution time for the set with the most tasks.Classic algorithms here return results much faster than the learning algorithm.When additionally analysing the results obtained after each iteration for the described parameters of the algorithms, it can be observed that the learning algorithm returns more differentiated orderings.No line-up repeats itself.This gives the possibility of finding a greater number of optimal orderings.The downside of this algorithm, in this case, is its randomness, which causes it to be returned in an extremely unfavourable case the end result will not be the optimal ranking.Table 5.

Conclusions
The goals of the work were fully achieved.Two algorithms were implemented and compared: Classic and Learning.An Algorithm has been implemented to solve the scheduling problem for the simplified ADI cast iron manufacturing process.The operation of the algorithms has been tested for various tasks.
For the described problem, the algorithms returned different orderings, but with the same value of the objective function.All the solutions returned were considered optimal.The work of the algorithms for different parameter values was checked and the sets best suited to this problem were selected.For the learning algorithm, the ε coefficient with a constant value equal to 0.1 more often, during the work of the algorithms, it found the optimal ordering and found different orderings, which reduces the chances of finding the local optimum instead of the global one.In the classical algorithm, for a sequence length equal to 10% of the number of iterations, the algorithm was faster and found more differentiated orderings during operation.The original rankings in this case did not matter.Of these two algorithms, Classic algorithms were found to be better for this problem.

Figure 5 Fig. 5 .
Figure5shows the Gantt charts for the optimal ordering.In Figure5 a) for option 1, at 5 b) for the 2nd option and at 5 c) for the 3rd option.In all examples, the forbidden length was chosen to be 10% of the number of iterations and the primary ordering decreasing with respect to energy consumption.The drawings clearly show the breaks in the work of individual machines and the desire to avoid breaks in the machines that draw power not only during the performance of tasks.Additionally, one can observe in Figures5 b) and 5c) interruptions resulting from the lack of performance of some tasks on the M3 machine.

Figure 6 Fig. 6
Figure6shows the Gantt charts for the optimal ordering.In Figure6 a) for variant 1, to 6 b) for the 2nd variant and to 6 c) for the 3rd variant.In all examples, a constant value of the ε coefficient equal to 0.1 was chosen.The charts have the same characteristics as the classical algorithm.

Table 1 .
Cast iron variants ADI

Table 3 .
Ranking results for the classical algorithm

Table 4 .
Classification results for the learning algorithm