Tool selection method based on transfer learning for CNC machines

Abstract. Owing to changes in product requirements and the development of new tool technology, the traditional tool selection approach based on human experience has become time-consuming and inefficient. By combining the historical data resources accumulated by manufacturing enterprises with human expert resources, a new tool selection mechanism can be established. In this paper, we apply transfer learning to the tool selection issue. Starting from the foundation of migration, we present a unified expression of expert experience and process cases in a multi-source heterogeneous environment. We then propose a transfer learning algorithm (TLrAdaBoost) based on AdaBoost, which uses a small amount of target domain data (expert experience samples) and a large number of low-quality source domain data (process case samples) to build a high-quality classification model. Experimental results show the effectiveness of the proposed algorithm.


Introduction
The tool is the executive part of a CNC machine and directly affects the machined surface: it realizes the function of the CNC machine tool on the upper layer and determines the processing quality of products on the lower layer. Whether a tool selection is reasonable relates not only to machining efficiency, workpiece dimensional accuracy, surface roughness and other indicators, but also plays an important role in production costs and enterprise efficiency. In the traditional tool selection process, because the experience and knowledge levels of process personnel are uneven, there are great differences in the cutting performance of tools selected by different personnel. In other words, the selection results are uncertain, which makes the process time-consuming and inefficient. In recent years, traditional tool selection based on human experience has become increasingly unable to meet the development needs of manufacturing enterprises, mainly for the following reasons: (1) the growing demand for personalized products leads to increasingly diversified product types and structures in manufacturing enterprises, and the product cycle changes rapidly (Car et al., 2009); (2) new tool materials and structures continue to emerge, and general process personnel lack tool selection experience and need more help from human experts. Therefore, for speed and accuracy of tool selection, a new tool selection mechanism is desirable to make up for the deficiencies above.
Manufacturing enterprises have accumulated a large amount of historical data on tool selection; tool selection knowledge exists in these massive data and information, and advances in machine learning and knowledge discovery make it possible to exploit them. For example, Ahmad et al. (2010) propose a new system approach to optimize tool sequences using genetic algorithms. Oral and Cakir (2004) define computer-aided optimum operation and tool sequencing to be used in the generative process planning system developed for rotational parts. These works mainly focus on tool sequence optimization. In addition, Geng et al. (2013) present a new method for selecting the optimal multi-cutter set for five-axis finish machining. Meng et al. (2014) present a new method of optimal barrel cutter selection for the flank milling of blisks. These works focus on tool geometry selection or path generation. In this paper, we focus on identifying and extracting features from tool selection data and information to learn a more intelligent selection method.
To identify and extract features, we first need to understand the composition of the data and information. They mainly come from two types of resources: (1) expert experience resources: the domain knowledge and technical level are high, but domain experts are scarce, so these resources are of high quality but small in number; (2) process case resources: manufacturing enterprises have a large quantity of process cases, so these resources are numerous but of lower quality. It can be seen that these two types of resources, and the sample sets extracted from them, are unbalanced in quantity and quality. Besides, within the same type of resource, there is almost no guarantee that the number of samples collected for each tool is equivalent. Therefore, owing to the imbalance in the data sources, tool selection is an unbalanced learning problem. If we only use a small number of high-quality expert experience resources as training samples, they are not enough to learn a reliable classifier; meanwhile, using only a large amount of low-quality process resources as training samples cannot guarantee that the learned classifier has a low error rate. Owing to the unbalanced nature of the data sources, we apply transfer learning to the knowledge balancing and integration of expert experience and process cases in tool selection.
The remaining sections are organized as follows: In the next section, we present the literature review on tool selection and transfer learning. After that, in Sect. 3, we generalize our approach, explain its principles, and decompose it into two main parts. In Sect. 4, we give a unified expression of multisource heterogeneous knowledge, establishing the basis of transfer learning. In Sect. 5, we define some notations we will use later and give our algorithm. Section 6 briefly introduces the scene vector similarity calculation method. In Sect. 7, we give examples to verify the validity of the method. Finally, concluding remarks are given in Sect. 8.

Related work
Over the years, there has been some reported work on transfer learning. First, a key point is to establish the basis of transfer learning, which means we must obtain a unified representation of the knowledge. It is difficult and laborious to extract empirical knowledge from human experts and formalize it into decision rules that characterize expert performance (Leake et al., 1996), but under certain circumstances it works well. Based on how human experts diagnose transformer faults, Shi et al. (2009) analyze and discuss the system structure, knowledge representation and reasoning mechanism to build a fault diagnosis expert system for transformers. Besides, the structured characteristics of case-based process systems and the large amount of process cases can help us model well. Therefore, we extract features from the multi-source heterogeneous knowledge resources of tools, which contain expert experience and process cases, and unify them to establish the basis of transfer learning.
Secondly, transfer learning theory resolves the problems of unbalanced learning and sample migration between different domains, and has mainly been used in internet applications such as text classification, clustering, collaborative filtering, image recognition and emotion classification. Currently, transfer learning has addressed the unbalanced-data problem well (Pan and Yang, 2009). Transfer learning first appeared in the study of human learning: people can quickly learn new knowledge largely owing to the ability of knowledge transfer; for example, knowledge transfer occurs between riding bicycles and riding motorcycles. It focuses on knowledge transfer between different but similar domains, tasks and distributions. When a task from a new domain arrives, relabeling new-domain samples is costly, and it would be a waste to discard all the old-domain data (Li et al., 2015). Wang et al. (2014) propose a transfer learning method for collaborative filtering, called Feature Subspace Transfer (FST), to overcome the sparsity problem in collaborative filtering. Kuhlmann and Stone (2007) proposed a graph-based method for identifying previously encountered games and applied this technique to automate domain mapping for value function transfer and speed up reinforcement learning on variants of previously played games. Wu and Dietterich (2004) integrated source-domain data into a Support Vector Machine (SVM) framework to improve classification performance. Argyriou et al. (2008) proposed a transfer learning algorithm for heterogeneous environments and presented methods for learning and expressing the heterogeneous environment structure.
In this paper, the data source characteristics of our transfer learning problem are similar to those addressed by the AdaBoost algorithm. We therefore ameliorate the AdaBoost algorithm with continuous confidence output to give it the ability of sample migration and improve its classification performance, so that it can be successfully applied to tool selection in the field of industrial manufacturing.

Principle explanation
The new tool selection method we propose is based on transfer learning. We first identify and extract features from the multi-source heterogeneous knowledge resources of tools and unify them to establish the basis of transfer learning. We then ameliorate the traditional AdaBoost algorithm and propose our transfer learning algorithm TLrAdaBoost to solve the problem of imbalance within and between domains in the sample sets. Finally, we use the scene similarity matching method to select the tool model or name. On this basis, a new tool selection mechanism is established; the schematic diagram is shown in Fig. 1. There are two key points in our approach. First, we establish the unified expression of expert experience and process cases. These two parts are multi-source and heterogeneous and require a uniform representation of knowledge from different resources; by doing so, a large number of high-quality training samples can be provided for the selected algorithm model. Secondly, the TLrAdaBoost algorithm, using the ideology of transfer learning, is proposed to deal with the problems of unbalanced learning and sample migration between different domains. Learning an effective classifier for the tool selection problem requires a sufficient number of high-quality training samples. The AdaBoost algorithm has a solid theoretical basis and efficient computing performance; its advantage is that, after several iterations, it can easily provide such high-quality training samples and thus improve the classification performance of weak classifiers, which has achieved great success in face recognition (Freund and Schapire, 1995; Schapire et al., 1998; Schapire and Singer, 1999). However, it has two shortcomings when dealing with the tool selection problem: (1) AdaBoost is based on the assumption that the class distribution within a domain is roughly balanced, but tool selection is an unbalanced classification problem within one field, which leads to a decline in classifier performance; (2) AdaBoost requires the samples for training and testing classifiers to come from the same domain, so it cannot migrate samples from other areas and cannot solve the problem of imbalance between domains. To resolve these problems of the AdaBoost algorithm, our TLrAdaBoost algorithm makes use of transfer learning theory, so that it can handle unbalanced classification learning and data migration. At the same time, we introduce similarity matching of scene vectors to finally realize the selection of tools.
In the following sections, we introduce the core content of our algorithm.

The unified representation of expert experience and process case
Based on the above analysis, we first extract empirical knowledge from human experts and formalize it into a clear representation. Empirical knowledge in the broad sense refers to a person's ability to identify and handle problems, and contains cognitive elements and skill elements (Von Krogh and Roos, 1996). The cognitive elements of problem identification are referred to as the "mental model" by Johnson-Laird (Polanyi, 1966), who believes it reflects the reasoning capability achieved through practice in dealing with similar problems. The skill elements of problem solving mainly include the specific knack, craft and skill in a certain context. Empirical knowledge exists in various forms of "cases" in everyday life, which are generally obscure. The knowledge embedded in these "cases" has three characteristics: it is operational, contextual and specific (Zhou et al., 2010). We use these characteristics to describe "cases": we use a scenario, the "scene information", to define general feature knowledge, and define the specific knowledge in the "cases" with the "sign information". We then further establish the knowledge expression model to solve the problem. Tool selection expert experience refers to the knowledge and skills about tool selection possessed by experts from tool manufacturers and manufacturing enterprises, which exists in the minds of these experts or in their related discourses. According to the method above, a piece of tool selection expert experience can be represented as a "case" using the logical process "scene information + sign information → tool parameters → tool type". The specific representation, which contains four elements, is as follows.

I = {E, S, C, B}
I - indicates a case; E - a finite set representing scene information; S - a finite nonempty set representing sign information; C - a finite nonempty set representing tool parameter information; B - a nonempty set representing the name or type of the cutting tool.
Scene information: the information about the machine tool, workpiece, process, etc., related to a specific processing work step.
Sign information: a detailed description of the main signs of different processing problems under certain scene information. The basic processing form serves as the sign information here.
Cutting tool parameter information: it refers to the attribute features of the selected tool under requirements of actual processing work step, including toolbar information, the clamping way of blade, blade information.
Name or type of cutting tool: the experts select the cutting tool parameter information according to the specific scene and sign information, and then determine the tool type for actual machining, such as a cylindrical turning tool (CTGNR2020K12).
For example, in the selection of CNC turning tools, an instance of expert experience can be expressed as a four-element group, which mainly includes the characteristic items in Table 1.
A process case refers to the result of cutting tool selection by technicians according to specific production requirements. It exists in a large number of CNC machining process cards and cutter specification cards. It is not hard to find that these two parts include all the information of the above four-element group. Therefore, the representation of a process case can follow the representation of expert experience. Tool selection expert experience and process cases can be expressed by the data structure shown in Fig. 2; the arrows reflect the internal logical relationships of the structure.
Through the above data structure, the sample sets of expert experience and process cases can be expressed by the vector I = {E, S, C, B}. In both groups of samples, the sign information S has only one characteristic attribute. Taking the selection of CNC lathe tools as an example, the characteristic attribute S refers to the basic processing form, which has 5 attribute values. Most models of CNC lathe tools can be divided into these 5 categories based on the basic processing form, so the sign information S can be used as the label information of the samples.
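As an illustration, the four-element case representation I = {E, S, C, B} can be sketched as a simple data structure. The attribute names and values below are illustrative examples, not taken verbatim from Table 1:

```python
def make_case(scene, sign, tool_params, tool_type):
    """Build a tool-selection case I = {E, S, C, B}."""
    return {
        "E": scene,        # scene information: machine, workpiece, process, ...
        "S": sign,         # sign information: basic processing form (class label)
        "C": tool_params,  # tool parameter information: toolbar, clamping, blade
        "B": tool_type,    # name or type of the selected cutting tool
    }

# Hypothetical case, modeled on the paper's CNC turning example
case = make_case(
    scene={"machine": "lathe", "material": "cast steel", "accuracy": "rough"},
    sign="inner hole turning",
    tool_params={"blade_shape": "T-blade"},
    tool_type="CTGNR2020K12",
)
```

Because S carries the basic processing form, it doubles as the sample's class label during training, while E supplies the scene vector used later for similarity matching.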

Relevant definition
To present the algorithm more clearly, we give definitions related to the problem. The algorithm concerns the migration of instances between similar domains that share the same classification objectives.

Test data set
S = {x_1, x_2, ..., x_m}, where m is the number of elements in the collection.

Training data set
T = T_T ∪ T_S = {(x_i, y_i)}, where y_i is the true label of x_i; T_T is the target training data set and T_S is the source training data set, comprising T_S1, T_S2, ..., T_SK.

Description
The description of the TLrAdaBoost algorithm is as follows.
Input: tagged data sets T_T, T_S1, T_S2, ..., T_SK; test data set S; basic classification algorithm "Learner"; iteration number N.
Step 1. Merge all source domain training sets and the target domain training set.
Step 2. Initialization: set the initial weight of every sample to w_i^1 = 1/m.
Step 3. For t = 1, 2, ..., N:
1. Based on the weights w_i^t of the training set, call the basic classification algorithm "Learner" to train a weak classifier h_t(x).
2. Compute the transfer efficiency of each source training data set with respect to the target training data set: α_t^k, k = 1, ..., K.
3. Adjust the sample weights.
Output: the final classifier, combining the weak classifiers via f(x, l) = Σ_{t=1}^{T} h_t(x, l).
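The loop above can be sketched in code. This is a simplified illustration only: the toy `majority_learner` and the TrAdaBoost-style reweighting below stand in for the paper's actual "Learner" and weight-update formulas, which involve the normalization factor Z_t and the transfer-efficiency term exp(α_t^k):

```python
import numpy as np

def majority_learner(X, y, w):
    """Toy weak learner: predicts the class with the largest total weight."""
    labels = np.unique(y)
    best = labels[np.argmax([w[y == l].sum() for l in labels])]
    return lambda Xq: np.full(len(Xq), best)

def tlr_adaboost_sketch(X_T, y_T, sources, learner, n_rounds):
    """Simplified boosting-with-transfer loop (illustrative only)."""
    # Step 1: merge target and all source training sets
    X = np.vstack([X_T] + [Xs for Xs, _ in sources])
    y = np.concatenate([y_T] + [ys for _, ys in sources])
    m = len(y)
    w = np.full(m, 1.0 / m)          # Step 2: uniform initial weights w_i^1 = 1/m
    n_t = len(y_T)                   # target-domain samples come first
    classifiers = []
    for _ in range(n_rounds):        # Step 3
        h = learner(X, y, w)         # 1. train weak classifier on weighted sample
        pred = h(X)
        # weighted error measured on the target-domain part only
        eps_t = np.sum(w[:n_t] * (pred[:n_t] != y[:n_t])) / np.sum(w[:n_t])
        beta = eps_t / max(1.0 - eps_t, 1e-12)
        mis = pred != y
        # 2.-3. up-weight misclassified target samples, down-weight
        # misclassified source samples (TrAdaBoost-style stand-in)
        w[:n_t] = w[:n_t] * np.where(mis[:n_t], 1.0 / max(beta, 1e-12), 1.0)
        w[n_t:] = w[n_t:] * np.where(mis[n_t:], beta, 1.0)
        w = w / w.sum()
        classifiers.append(h)
    return classifiers
```

In the paper's algorithm, the final prediction combines the weak classifiers' confidence outputs through f(x, l) = Σ_t h_t(x, l); the sketch keeps only the weight dynamics that give source samples a diminishing influence when they hurt the target task.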

Analysis
TLrAdaBoost is proposed based on real AdaBoost, which has continuous confidence output, and is able to handle data migration and sample imbalance among different classes.
The training sample set collected from the whole sample space is S = {(x_1, y_1), ..., (x_m, y_m)}; for the multi-classification problem, y_i ∈ {1, 2, ..., C}. The confidence level of label l is h_t(x, l), the output of the weak classifier h_t(x). A strong classifier with better classification performance is composed of T weak classifiers h_t(x) (t = 1, ..., T) in some way. Linear combination is the most commonly used, so we use the combination function f(x, l) = Σ_{t=1}^{T} h_t(x, l), giving the training error rate of Eq. (1). In the formula, w_i^1 = 1/m, and the indicator of H(x_i) = y_i is 1 if the condition holds and 0 otherwise.
When the class distribution is unbalanced, assuming that the prior probability of label l is p_l = Pr_{x∈S}(y = l), a reasonable approach is to change the average confidence value to f(x) = Σ_{l=1}^{C} p_l f(x, l). It can be proved that when T is very large, the h_t(x) are mutually independent and h_t(x, l) is uniformly bounded, so the training error rate expressed with the sign function can be transformed into an extreme-value problem of an exponential function. In other words, Eq. (1) can be approximated as Eq. (2).
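A minimal numeric sketch of the prior-weighted confidence f(x) = Σ_l p_l f(x, l); the labels and confidence values are made up for illustration:

```python
import numpy as np

labels = np.array([0, 0, 0, 1])             # toy labels drawn from the sample set S
priors = np.bincount(labels) / len(labels)  # p_l = Pr_{x in S}(y = l) -> [0.75, 0.25]
conf = np.array([0.2, 0.8])                 # f(x, l) for l = 0, 1 (illustrative)
f_x = float(np.dot(priors, conf))           # f(x) = sum_l p_l * f(x, l) = 0.35
```

Weighting by the priors keeps the confidence of a rare class from being drowned out by the majority class when the distribution is unbalanced.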
Minimizing Eq. (2) can be done by training the h_t(x) one by one in a recursive way, specifically by selecting h_t(x) such that Z_t = Σ_{j=1}^{n_t} Σ_{l=1}^{L} (p_t^{j,l} / p_l) p_l is minimized. Z_t is called the normalization factor. Therefore, the training error rate can be estimated as Eq. (3).
Transfer efficiency is a measure of the performance of the target task before and after migration. In the t-th iteration, the algorithm first trains a weak classifier h_t(x) on the target-domain training set, which obeys the distribution w(T_T)/‖w(T_T)‖, where ‖v‖ denotes the L1 norm of a vector v. Then, each source training set is combined with the target training set in turn to generate a new training set T_Si ∪ T_T, which obeys the distribution w(T_Si ∪ T_T)/‖w(T_Si ∪ T_T)‖. On this basis, a new classifier h_t*(x) is trained. The weighted error on the training set can be calculated by Eq. (4).
Calculated by the formula, ε_t is the weighted error of h_t(x), and ε_t^{*i} is the weighted error of h_t*(x). The transfer efficiency is defined as the difference of the weighted errors, α_t^i = ε_t − ε_t^{*i}. If T_Si yields positive migration, then exp(α_t^i) ∈ (1, e]; if it yields negative migration, exp(α_t^i) ∈ (1/e, 1); otherwise exp(α_t^i) = 1.
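The definition of transfer efficiency and the ranges of exp(α_t^i) can be checked with a small sketch; the error values are made up for illustration:

```python
import math

def transfer_efficiency(eps_t, eps_star_t):
    """alpha_t^i = eps_t - eps_t^{*i}: positive when adding the source set
    lowers the weighted error (positive migration)."""
    return eps_t - eps_star_t

# positive migration: weighted error drops from 0.40 to 0.25
alpha_pos = transfer_efficiency(0.40, 0.25)
# negative migration: weighted error rises from 0.25 to 0.40
alpha_neg = transfer_efficiency(0.25, 0.40)
# no change: exp(alpha) = 1
alpha_zero = transfer_efficiency(0.30, 0.30)
```

Since weighted errors lie in [0, 1], α_t^i lies in [−1, 1], which is exactly why exp(α_t^i) is bounded by (1, e] for positive migration and by (1/e, 1) for negative migration.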

Scene information vector similarity calculation
The ultimate goal of tool selection is to select a tool that meets the machining requirements according to the scene information of a practical CNC cutter application. The TLrAdaBoost algorithm only achieves a preliminary classification of tools; it is then necessary to select a specific tool name or model by measuring the similarity between the scene information vectors of all the samples in the class space and the actual application scene vector. Vector similarity refers to the degree of similarity between two vector objects of equal dimension. There are two kinds of measurement methods: distance measures and similarity functions (Zhang et al., 2009). In this paper, the angle cosine method (Eq. 5) is used to measure the degree of similarity between two vectors.
sim(x, y) = cos(x, y) = (x · y)/(‖x‖ · ‖y‖)
The geometric meaning of the angle cosine is the cosine of the angle between two N-element vectors in N-dimensional space. Each element of a vector must be made dimensionless before use, so that the elements are positive and the cosine value falls in [0, 1]. The larger the value, the smaller the angle between the two vectors and the more similar they are; when two vectors are exactly the same, the value is 1 (Tian and Xie, 2006).
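Equation (5) can be sketched directly; the function assumes the vectors have already been made dimensionless and non-negative, as described above:

```python
import math

def scene_similarity(x, y):
    """Angle-cosine similarity, Eq. (5): sim(x, y) = (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

# identical scene vectors give similarity 1
same = scene_similarity([1, 2, 1, 2], [1, 2, 1, 2])
```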

Instance verification
Our transfer-learning-based method, unlike the traditional one, selects tools automatically by computer, ensuring rapid and accurate tool selection. The method is verified by the following example.
A process diagram of the shaft part is shown in Fig. 3. Machining an inner hole of ∅28 × 25 is one step in the process. Requirements: turning, rough machining, and the workpiece material is cast steel. For convenience of calculation, we give a parametric representation and construct the training sample set only for machine type, workpiece material, machining accuracy, basic processing form, blade shape and tool number. The results are shown in Tables 2 and 3.
We use the target-domain sample set and the source-domain sample set as input data, and the basic processing form as the class label. The TLrAdaBoost algorithm is then used to learn from the training samples; training yields the following sample classes: inner hole turning, end surface turning, spherical turning, threading, groove turning, end milling and groove milling. Inner hole turning is the class that matches the requirements of the instance. The scene vectors of the samples in this class (see Tables 2 and 3, e.g. "Lathe + cast steel + finish turning + turning inner hole + T-blade + 2#" → (1, 2, 1, 3, 1, 2)) form a parametric matrix B. The parameterized representation of the instance's scene information vector is C = [1 2 1 2]. According to Eq. (5), the angle cosine between each row vector of matrix B and vector C is calculated separately. The final result shows that the scene of sample "number 1" in the target-domain sample set is exactly the same as the instance scene; therefore, "T-blade + 2#" is the final selection of CNC tool. The result proved to be correct. We also chose groove-machining tool selection as an experimental validation. The dataset comes from Xi'an Winway Tools Co. Ltd., including tool design drawings, process files, and expert experience data.
For the sake of data representation and experiment convenience, 12 properties of tool selection knowledge representation were selected as the sample shared attribute space, covering the machine tool factors, process factors and workpiece factors, as shown in Table 4.
A total of 50 sets of samples are used in the experiment, of which 35 groups are in the form of turning and the rest are in the form of milling. During training, samples A0001-A0049 are randomly input to the classifier, and sample A0050 is used as a test sample to verify the feasibility of the algorithm. The sample data set is shown in Appendix A: Table A1 shows the tool selection scene information and tool models of the sample dataset, and Table A2 shows the tool design drawings corresponding to the tool models.
To ensure uniform presentation of the tool selection scene information, data pre-processing is required before tool selection. Discrete attributes X_1-X_5 and X_9-X_11 need to be binarized; Table 5 uses the clamping method X_2 as an example, and other discrete attributes are handled in the same way. For the continuous attributes X_6-X_8, K-means clustering is used to divide the continuous attribute space into six intervals, and different integer values then represent the data falling in each sub-interval. The representation of the scene information is shown in Appendix B; in Table B1, the clustering center values represent the continuous attributes X_6-X_8.
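A sketch of this pre-processing, assuming illustrative category lists and clustering-centre values (not the actual contents of Tables 5 and B1):

```python
def binarize(value, categories):
    """One-hot encode a discrete attribute such as the clamping method X2."""
    return [1 if value == c else 0 for c in categories]

def to_interval(value, centres):
    """Assign a continuous attribute (e.g. diameter X6) to the interval whose
    clustering centre is nearest; returns a 1-based integer interval code."""
    return 1 + min(range(len(centres)), key=lambda i: abs(value - centres[i]))

# hypothetical category list and K-means centres for illustration
clamping = binarize("screw", ["screw", "lever", "wedge"])   # -> [1, 0, 0]
diam_code = to_interval(21.00, [5, 12, 20, 35, 50, 80])     # nearest centre is 20
```

After both steps, every sample reduces to a fixed-length integer/binary scene vector that the classifier and the similarity matching of Eq. (5) can consume.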
The scene information of test sample A0050 is: turning, screw fastening form, A03550, bar, cutting performance 2.5, diameter 21.00 mm, groove depth 0.75 mm, groove width 2.40 mm, finishing, Ra1.6. The size of the part slot is shown in Fig. 4. The continuous attributes diameter, groove depth and groove width are assigned to the appropriate clustering intervals according to their distance from the clustering centers, so the scene information of No. A0050 can be expressed as a vector P_V. With the basic processing form X_11 as the class label space, the class label of the scene vector P_V of test sample A0050 is predicted with the learned classifier; the predicted class label is cylindrical groove machining. The sample data under the same class label are then extracted, comprising A0002, A0003, A0012, A0014, A0016, A0026, A0027, A0031 and A0040. These samples are pre-processed to form a tool scene information matrix KV, expressed here as integer values. The row vectors of KV and the vector P_V are converted to binary form, the similarity of each pair is calculated, and the results are shown in Table 6. Among them, the similarity between the scene of sample A0002 and the experimental scene is greater than 85 %; that is, the selection result of No. A0002 may conform to the test sample scene. The tool scheme drawing is shown in Appendix A, Fig. A1. After inquiry with the technical department of Xi'an Winway Tools Co. Ltd, the tool of No. A0002 can process the slot of test sample A0050. The corresponding tool type is Winway CFIL2525P1902-GK-20XD. The actual selection result of test sample A0050 is Winway CFIL2525P04-T0881, as shown in Appendix A, Fig. A2. Comparing the two processing scenarios (Figs. 4 and 5), it can be found that the two tools are to some extent interchangeable.

Conclusion
In this paper, we applied transfer learning to the tool selection issue in the field of industrial manufacturing. Starting from the foundation of migration, we presented a unified expression of expert experience and process cases in a multi-source heterogeneous environment. In addition, we proposed a transfer learning algorithm (TLrAdaBoost) based on AdaBoost, which uses a small amount of target domain data (expert experience samples) and a large number of low-quality source domain data (process case samples) to build a high-quality classification model. In this process, the imbalanced-data problem that AdaBoost cannot solve is resolved. Finally, we used the scene similarity matching method to select the tool model or name. The results show that the proposed method using TLrAdaBoost can effectively classify samples by learning cross-domain knowledge.
www.mech-sci.net/9/123/2018/