Context and problem statement
In a rail transport company, incidents and train delays are frequent. A program that accurately predicts the delay of each train is a major asset, both for the information provided to travelers and for traffic regulation decisions.
Goals
An API (or similar service) that exposes train delay predictions in real time, with accuracy improved by 30% compared to the current system.
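One way to read a 30% target is as a relative reduction in prediction error against the existing system. The sketch below assumes mean absolute error (MAE) in minutes as the metric; the source does not say which metric was used, and the function names and sample values are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and observed delays (minutes)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_improvement(baseline_error, new_error):
    """Fraction by which the new error is lower than the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


# Illustrative values only, not project data.
observed_delays = [5.0, 10.0, 0.0, 20.0]
current_system = [0.0, 10.0, 5.0, 10.0]
new_model = [2.0, 10.0, 4.0, 13.0]

baseline_mae = mean_absolute_error(current_system, observed_delays)
new_mae = mean_absolute_error(new_model, observed_delays)
improvement = relative_improvement(baseline_mae, new_mae)
```

Under this reading, the goal is met when `improvement` reaches 0.30 on held-out observations.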
Our intervention
2 Data Scientists
- Implementation of an algorithm for cleaning input data (train observations).
- Improvement of the embeddings that provide a vectorized representation of the network on which the trains run.
- Improvement of the existing deep learning algorithm (a transformer model with attention mechanisms) by tuning hyperparameters, input data, the loss function, etc.
- Deployment of the model to production, to provide access to real-time train delay predictions.
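As an illustration of the first step, cleaning train observations could look like the minimal sketch below. The `Observation` fields, the plausibility threshold, and the duplicate rule are all assumptions made for the example, not the rules used in the project.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    # Hypothetical schema for one train observation.
    train_id: str
    station: str
    observed_at: datetime
    delay_minutes: float


def clean_observations(observations, max_abs_delay=600.0):
    """Drop implausible and duplicate observations, then sort by train and time."""
    seen = set()
    cleaned = []
    for obs in observations:
        # Assumed plausibility rule: keep delays within +/- max_abs_delay.
        # Written as a negated <= so that NaN values are dropped as well.
        if not abs(obs.delay_minutes) <= max_abs_delay:
            continue
        # Assumed duplicate rule: same train, station, and timestamp;
        # the first occurrence is kept.
        key = (obs.train_id, obs.station, obs.observed_at)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(obs)
    return sorted(cleaned, key=lambda o: (o.train_id, o.observed_at))
```

Sorting by train and timestamp is a design choice for the example: a sequence model such as a transformer typically consumes observations in time order per train.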
Results
Technical environment
Python, AWS, PyTorch, Git, Airflow