A mining company has a two-way train line that runs from one of their mines to the port and back. Whenever one of the trains fails in the middle of the track, it costs a lot of money. It's especially problematic because they only have one garage for maintenance, and the train might break down all the way at the opposite end of the track, adding a lot of time to an already costly and slow process. How can we detect that a specific train is going to fail on its next trip and send it to maintenance before that happens?
After speaking with the client's specialists in the area, we identified that one of the most common issues requiring maintenance involves the wheel bearings. When they overheat, they can fail, leaving the train stranded on the tracks.
We can shape this as a binary classification problem, but first we need to define how to create the failure flag. Once again with the help of the client's specialists, we defined that if the wheel bearing temperature exceeded a certain threshold, the trip would be labeled a failure.
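A minimal sketch of that labeling step might look like this. The column names and the 90 °C threshold below are illustrative assumptions, not the client's actual values:

```python
# Hypothetical overheating threshold; the real value came from the
# client's specialists and is not disclosed here.
TEMP_THRESHOLD_C = 90.0

def label_failures(trips):
    """Flag each trip as a failure (1) if the maximum wheel bearing
    temperature recorded during the trip exceeded the threshold, else 0."""
    return [
        {**trip, "failure": int(trip["max_bearing_temp_c"] > TEMP_THRESHOLD_C)}
        for trip in trips
    ]

trips = [
    {"train_id": "T1", "max_bearing_temp_c": 85.2},
    {"train_id": "T2", "max_bearing_temp_c": 97.8},
]
labeled = label_failures(trips)
```

This turns raw sensor readings into the binary target the model will learn to predict.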
After cleaning and exploring the dataset, features based on several parts and sensors of the train were engineered to be used in a model. But one of the issues with this specific project is that the data was very imbalanced, with only around 1% of failures in our dataset. To counter that, we used undersampling (removing some of the non-failure rows) and oversampling (creating synthetic examples of our failure class) to produce a better class ratio for training.
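The idea behind the two techniques can be sketched with plain Python. This is a naive stand-in (random undersampling plus jittered copies instead of proper SMOTE-style synthesis), and the sizes and `ratio` parameter are illustrative assumptions:

```python
import random

random.seed(42)

def rebalance(majority, minority, ratio=1.0, noise=0.05):
    """Naive resampling sketch: undersample the majority class and
    oversample the minority class with noise-jittered copies (a rough
    stand-in for SMOTE-style synthetic examples). `ratio` is the target
    minority/majority size ratio after resampling."""
    # Oversample the minority class to ~3x its original size.
    target_minority = len(minority) * 3
    # Undersample: keep only enough majority rows to hit the target ratio.
    target_majority = int(target_minority / ratio)
    kept_majority = random.sample(majority, min(target_majority, len(majority)))

    # Duplicate minority rows with small Gaussian noise on each feature.
    synthetic = []
    while len(minority) + len(synthetic) < target_minority:
        row = random.choice(minority)
        synthetic.append([x + random.gauss(0, noise) for x in row])
    return kept_majority, minority + synthetic

# Toy data mirroring the ~1% failure rate described above.
majority = [[float(i)] for i in range(1000)]  # non-failure rows
minority = [[float(i)] for i in range(10)]    # failure rows
maj, mino = rebalance(majority, minority, ratio=1.0)
```

In practice a library such as imbalanced-learn handles this more carefully; the point here is only to show how both directions of resampling reshape the class ratio before training.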
A neural network was then trained on this data, and we used precision as the metric to tune it. This metric was chosen because the garage can only handle a limited number of trains at a time, so our alerts needed to be as precise as possible: every flagged train should be a train that truly needs maintenance.
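For clarity, precision is the fraction of flagged trains that actually failed. A small self-contained version of the computation (the labels below are made up for illustration):

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP): of all trains we flagged for
    maintenance, how many actually went on to fail."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

y_true = [1, 0, 1, 0, 0, 1]  # actual failures
y_pred = [1, 1, 1, 0, 0, 0]  # model alerts
p = precision(y_true, y_pred)  # 2 correct out of 3 alerts -> 2/3
```

Optimizing for precision means fewer wasted garage slots, at the cost of possibly missing some failures (lower recall), which was an acceptable trade-off given the limited maintenance capacity.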
After the model was deployed, technicians had access to a dashboard with information about the trains and could see whether any of them were likely to have a problem on their next trip, making it easier to decide which trains' maintenance should be prioritized.
Several model architectures were tested in Python using TensorFlow, with the final model being a Multilayer Perceptron using dropout regularization and early stopping to halt training once we stopped seeing improvement. The model was deployed on AWS using SageMaker, and a dashboard was developed with data refreshing at constant intervals.
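A Keras sketch of that setup might look like the following. The layer sizes, dropout rate, and patience value are illustrative assumptions, not the production architecture:

```python
import tensorflow as tf

def build_model(n_features):
    """Hypothetical MLP for binary failure prediction; layer sizes and
    the 0.3 dropout rate are illustrative choices."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.3),   # dropout regularization
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # failure probability
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.Precision(name="precision")],
    )
    return model

# Early stopping halts training once validation precision stops improving,
# keeping the weights from the best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_precision", mode="max", patience=5,
    restore_best_weights=True)

model = build_model(n_features=20)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])
```

The `EarlyStopping` callback is what cuts the number of epochs short when the validation metric plateaus, as described above.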
If you'd like to learn more about my projects or work together, feel free to reach out! You can also connect with me on LinkedIn