TRC-2018 Learning Zone
by Carles CG, Reliable Dynamics
The second most important part of any analysis process is to have the right data correctly stored, labeled, and cleaned. The most important part is to have domain-specific knowledge of the problem we would like to solve. Value can be created even where the objective is not to predict outcomes, but to clean certain data sets and make them easily usable in the future. Furthermore, it is necessary to acknowledge that the machine learning world is evolving quickly. What might not be usable now might be in a few years, if and only if the raw material (the data) is correctly labeled and made usable. This is why leading companies open source their algorithms for everyone to use and develop: if a killer algorithm that gave you a competitive advantage existed, it most probably would not be open sourced that easily.

The same principles apply to fault data. Without proper labeling of failure data, it is quite challenging to extract meaningful information. Ideally, we would have information such as the failure initiator, mode, and mechanism, as well as the environmental conditions the component was designed to operate in and the conditions it was actually operating in, among others. Then we could go one layer deeper and gather more contextual information from the previous steps, such as the batch number, time in operation (for time-driven failures), warehousing conditions, the recommended maintenance plan, the maintenance plan actually executed, and so on (see the sketch at the end of this section).

Therefore, there is a long way to go from domain-specific knowledge (reliability engineering) to automatically feeding an algorithm and extracting more signal than noise. Instead of treating machine learning as the end purpose, I propose framing the process around the problem we are trying to solve. Tools are not the solution, but the means to try to find one.
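To make the labeling idea concrete, here is a minimal sketch of what such a failure record could look like as a Python dataclass. The field names (failure_initiator, failure_mode, etc.) and the example values are hypothetical illustrations of the information listed above, not a standard schema; a real taxonomy of initiators, modes, and mechanisms would come from the reliability engineering domain.

```python
# Minimal sketch of a labeled failure record. All field names are
# hypothetical illustrations; the actual taxonomy would be defined
# by the reliability engineering team, not by this example.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class FailureRecord:
    component_id: str
    failure_initiator: str                 # e.g. "corrosion", "overload"
    failure_mode: str                      # e.g. "leak", "crack"
    failure_mechanism: str                 # e.g. "fatigue", "pitting"
    design_conditions: dict                # environment the component was designed for
    operating_conditions: dict             # environment it was actually operating in
    # One layer deeper: contextual information from previous steps,
    # optional because it is often missing in practice.
    batch_number: Optional[str] = None
    time_in_operation_hours: Optional[float] = None  # for time-driven failures
    warehousing_conditions: Optional[dict] = None
    recommended_maintenance_plan: Optional[str] = None
    executed_maintenance_plan: Optional[str] = None
    recorded_at: datetime = field(default_factory=datetime.utcnow)


# Example usage: one labeled failure event.
record = FailureRecord(
    component_id="PUMP-104",
    failure_initiator="corrosion",
    failure_mode="leak",
    failure_mechanism="pitting",
    design_conditions={"temp_C": 60, "humidity_pct": 40},
    operating_conditions={"temp_C": 85, "humidity_pct": 70},
    batch_number="B-2017-113",
    time_in_operation_hours=12_400.0,
)
```

Keeping the contextual fields optional reflects the reality that the deeper layer of information is often unavailable; even so, a record carrying only the core failure labels is already far more usable for a future algorithm than free-text maintenance notes.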