Context and problem statement
Many of the client's data science projects rely on text documents. To process and model this data, Data Scientists repeatedly use the same tools, models, and functions.
To make these steps faster, more generic, and accessible to non-Data Scientists, we developed a shared package that handles all the standard tasks involved in document classification with Machine Learning.
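To give a sense of the kind of task such a package wraps, here is a minimal sketch of document classification using scikit-learn directly. It is not the package's actual API: the `build_classifier` helper, the TF-IDF plus logistic regression combination, and the toy tickets are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus invented for this example; it is not client data.
texts = [
    "cannot log in to my account",
    "password reset link does not work",
    "login page shows an error",
    "invoice amount is wrong",
    "charged twice on my invoice",
    "refund for duplicate payment",
]
labels = ["access", "access", "access", "billing", "billing", "billing"]


def build_classifier() -> Pipeline:
    """Hypothetical helper: chain text vectorization and a linear classifier."""
    return Pipeline(
        [
            # Turn raw text into TF-IDF weighted unigram and bigram features.
            ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
            # Linear model fitted on the sparse TF-IDF features.
            ("clf", LogisticRegression(max_iter=1000)),
        ]
    )


model = build_classifier()
model.fit(texts, labels)

# Predict a label for documents the model has not seen.
predictions = model.predict(["error on the login page", "wrong invoice amount"])
print(list(predictions))
```

Keeping vectorization and classification in a single `Pipeline` object means the same transformations are applied at training and prediction time, which is the sort of repeated boilerplate a shared package can hide.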
Goals
Our intervention
1 Data Scientist
- Development of data pre-processing pipelines
- Integration with the underlying scikit-learn objects
- Development of the interpretability module
- Development of the hyperparameter search module
- Publishing the package and demos to other Data Scientists
- Adding features based on the specific needs of Data Scientists
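The first item in the list above, pre-processing pipelines, can be pictured as a chain of small text-cleaning steps applied in order. The sketch below uses only the Python standard library; the step functions, the `compose_steps` helper, and the tiny stop-word list are hypothetical names chosen for illustration, not taken from the package itself.

```python
import re
from functools import reduce
from typing import Callable, Iterable

# A pre-processing step maps one string to another string.
Step = Callable[[str], str]

# Tiny illustrative stop-word list; a real project would more likely
# rely on a library such as nltk for this.
STOP_WORDS = frozenset({"a", "an", "the", "is", "to", "of", "and"})


def lowercase(text: str) -> str:
    """Normalize case so that 'Printer' and 'printer' match."""
    return text.lower()


def strip_punctuation(text: str) -> str:
    """Replace every character that is neither a word character nor whitespace with a space."""
    return re.sub(r"[^\w\s]", " ", text)


def remove_stop_words(text: str) -> str:
    """Drop stop words and collapse runs of whitespace."""
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def compose_steps(steps: Iterable[Step]) -> Step:
    """Build a single function that applies the steps from left to right."""
    ordered = list(steps)

    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), ordered, text)

    return run


preprocess = compose_steps([lowercase, strip_punctuation, remove_stop_words])
cleaned = preprocess("The printer is OFFLINE, again!")
print(cleaned)  # prints "printer offline again"
```

Because each step is a plain function, steps can be reordered, removed, or added per project without touching the others.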
Results
Package published!
Objectives achieved.
Used by several projects (including the support ticket classification project).
Technical environment
Python (scikit-learn, pandas, optuna, lime, shap, nltk, MLflow, plotly)
Pytest
Git/GitHub