Proactive Train Maintenance through Predictive Modelling

Harnessing machine learning to forecast potential train failures, facilitating proactive maintenance and minimizing operational disruptions.
Mining train (image from Wikipedia)

A mining company operates a crucial two-way railway linking its mine with a port. A train breakdown is costly, both in the disruption it causes and in the repairs that follow. The problem is made worse by the fact that the only maintenance garage sits at one end of the track, so a train that fails at the far end can cause lengthy delays. The pressing question we set out to answer: how can we anticipate a specific train's impending failure and route it for preventive maintenance before a costly breakdown occurs?


Collaborating closely with the client's train experts, we learned that overheating wheel bearings frequently cause the trains to require maintenance. To address this, we structured our approach as a binary classification problem, where crossing a certain temperature threshold defined a bearing failure.
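The labelling step can be illustrated with a minimal sketch. The threshold value, the function name `label_cycles`, and the choice to label each travel cycle by the peak temperature of the following cycle are all assumptions made for illustration; the actual threshold came from the client's experts and is not stated here.

```python
from typing import List

# Hypothetical threshold in degrees Celsius (an assumption, not the
# value used in the project).
FAILURE_TEMP_C = 90.0


def label_cycles(peak_temps: List[float],
                 threshold: float = FAILURE_TEMP_C) -> List[int]:
    """Label each travel cycle by what happens in the NEXT cycle.

    Returns 1 if the next cycle's peak bearing temperature reaches the
    threshold, else 0. The last cycle has no successor, so it gets no
    label and the result is one element shorter than the input.
    """
    return [1 if nxt >= threshold else 0 for nxt in peak_temps[1:]]


# Peak bearing temperature per cycle for one (made-up) train.
peaks = [62.0, 70.5, 88.0, 93.2, 65.0]
labels = label_cycles(peaks)
print(labels)
```

Labelling by the following cycle keeps the target aligned with the goal of flagging a train before the overheating event rather than during it.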

After data preprocessing and exploratory analysis, we engineered features based on the readings from various train parts and sensors. The data was heavily imbalanced, with failures far rarer than normal readings, so we applied undersampling and oversampling to improve the failure to non-failure ratio.
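As an illustration of what sensor-based feature engineering can look like, the sketch below summarises a sliding window of temperature readings. The specific features (window mean, window maximum, and rise across the window), the window size, and the name `rolling_features` are hypothetical; the features actually used in the project are not listed here.

```python
from statistics import mean
from typing import Dict, List


def rolling_features(temps: List[float],
                     window: int = 3) -> List[Dict[str, float]]:
    """Summarise the most recent `window` readings at each time step.

    Produces one row per position that has a full window behind it.
    """
    rows = []
    for i in range(window - 1, len(temps)):
        recent = temps[i - window + 1: i + 1]
        rows.append({
            "mean": mean(recent),            # average level in the window
            "max": max(recent),              # hottest reading in the window
            "rise": recent[-1] - recent[0],  # change across the window
        })
    return rows


# Made-up bearing temperature readings.
readings = [60.0, 62.0, 67.0, 75.0]
features = rolling_features(readings)
for row in features:
    print(row)
```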

We then trained a neural network on the adjusted data, optimizing for precision, since the maintenance crew can only manage a limited number of trains at a time. After deployment, a dashboard accessible to the technicians displayed predicted train failure risks, thus enabling them to proactively prioritize maintenance for trains at highest risk of malfunction in the next cycle.


The first step in our process is gathering data, specifically sensor data from the trains, which is critical for identifying potential failures. We then clean this data and engineer features, creating additional attributes that improve the model's predictive ability. Because the dataset is highly imbalanced, we use the imbalanced-learn library (imblearn) to apply both undersampling and an oversampling technique called SMOTE (Synthetic Minority Over-sampling Technique), which improves the model's ability to detect failures.
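To show the idea behind the two resampling steps, here is a simplified, standard-library sketch. It is not the imbalanced-learn implementation (in that library the corresponding classes are `SMOTE` and `RandomUnderSampler`); the function names `smote_like` and `undersample`, the sample data, and the target counts are assumptions for illustration. The oversampler creates synthetic failures by interpolating between a failure sample and one of its nearest failure neighbours, and the undersampler randomly discards normal samples.

```python
import math
import random
from typing import List


def smote_like(minority: List[List[float]], n_new: int,
               k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority points by interpolation.

    Needs at least two minority samples. Each synthetic point lies on
    the segment between a random minority point and one of its k
    nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def undersample(majority: List[List[float]], n_keep: int,
                seed: int = 0) -> List[List[float]]:
    """Keep a random subset of the majority class without replacement."""
    return random.Random(seed).sample(majority, n_keep)


# Made-up samples: [bearing temperature, vibration level].
failures = [[90.0, 1.0], [92.0, 1.2], [95.0, 1.5]]
normal = [[60.0 + i, 0.5] for i in range(20)]

new_failures = failures + smote_like(failures, n_new=3)
kept_normal = undersample(normal, n_keep=12)
print(len(new_failures), len(kept_normal))
```

In this toy case the failure to non-failure ratio moves from 3:20 to 6:12. Resampling is normally applied to the training split only, so that evaluation still reflects the real class balance.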

Following this, we train a neural network in Python, using the TensorFlow library, on the balanced dataset. During training, our primary focus is on improving the model's precision. The reasoning is simple: because the maintenance team's capacity is limited, the model's alerts must be as accurate as possible so that resources are not wasted on false alerts.
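Precision is the share of raised alerts that turn out to be real failures, TP / (TP + FP). The sketch below computes it from model scores and picks the lowest alert threshold that still meets a precision target. The scores, labels, target values, and the helper `lowest_threshold_meeting` are hypothetical and only illustrate the trade-off; they are not the project's tuning procedure.

```python
from typing import List, Optional


def precision(y_true: List[int], y_score: List[float],
              threshold: float) -> float:
    """Fraction of alerts (score >= threshold) that are true failures."""
    alerts = [s >= threshold for s in y_score]
    tp = sum(1 for y, a in zip(y_true, alerts) if a and y == 1)
    fp = sum(1 for y, a in zip(y_true, alerts) if a and y == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0


def lowest_threshold_meeting(y_true: List[int], y_score: List[float],
                             target: float) -> Optional[float]:
    """Lowest observed score usable as a threshold with precision >= target.

    A lower threshold raises more alerts, so among thresholds that meet
    the target, the lowest one misses the fewest failures.
    """
    for t in sorted(set(y_score)):
        if precision(y_true, y_score, t) >= target:
            return t
    return None


# Made-up validation labels (1 = failure) and model scores.
y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]

print(precision(y_true, y_score, 0.5))
print(lowest_threshold_meeting(y_true, y_score, 0.75))
```

When training with Keras, `tf.keras.metrics.Precision` can report the same quantity at a fixed threshold during training.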

Once the model is adequately trained and tuned, we deploy it. Its output, namely which trains are likely to face an issue in their next travel cycle, is visualized on a dashboard. The dashboard helps technicians prioritize which train to send for maintenance, enabling efficient allocation of resources and timely maintenance.
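The prioritisation behind such a dashboard can be sketched as a simple ranking: sort trains by predicted risk and keep as many as the crew can handle. The train identifiers, risk values, capacity, and the function name `maintenance_queue` are made up for illustration and do not describe the deployed dashboard.

```python
from typing import Dict, List, Tuple


def maintenance_queue(risks: Dict[str, float],
                      capacity: int) -> List[Tuple[str, float]]:
    """Return the `capacity` highest-risk trains, highest risk first."""
    ranked = sorted(risks.items(), key=lambda item: item[1], reverse=True)
    return ranked[:capacity]


# Hypothetical predicted failure risk for the next travel cycle.
predicted_risk = {"T-01": 0.12, "T-02": 0.81, "T-03": 0.47, "T-04": 0.66}

queue = maintenance_queue(predicted_risk, capacity=2)
for train_id, risk in queue:
    print(f"{train_id}: {risk:.0%} predicted failure risk")
```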

