Document classification tool

Context and problem statement

Many of the client's data science projects are based on text documents. To process and model this data, data scientists repeatedly rely on the same tools, models and functions.

To make these steps faster, more generic and accessible to people who are not data scientists, we developed a shared package that covers the standard tasks of machine-learning-based document classification.

Goals

  • Intuitive and very high-level user API
  • Support scikit-learn models and methods
  • Ability to customize each stage of the pipeline
  • Generic code making it easy for other data scientists to contribute
  • Ability to interpret predictions
  • Ability to search for hyperparameters
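
As a rough illustration of these goals, the sketch below wraps a scikit-learn Pipeline behind a small facade. The class name DocumentClassifier, its parameters and the toy data are hypothetical and are not the package's actual API; they only show how a high-level interface can stay compatible with scikit-learn while letting each step be swapped.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class DocumentClassifier:
    """Hypothetical high-level facade over a scikit-learn Pipeline."""

    def __init__(self, vectorizer=None, model=None):
        # Both steps can be replaced by any compatible scikit-learn
        # transformer or estimator; the defaults are assumptions.
        self.pipeline = Pipeline([
            ("vectorizer", vectorizer if vectorizer is not None else TfidfVectorizer()),
            ("model", model if model is not None else LogisticRegression(max_iter=1000)),
        ])

    def fit(self, documents, labels):
        self.pipeline.fit(documents, labels)
        return self

    def predict(self, documents):
        # Return plain Python strings rather than a NumPy array.
        return self.pipeline.predict(documents).tolist()


# Toy data, invented for illustration only.
train_docs = [
    "invoice payment due amount",
    "payment received invoice total",
    "password reset login error",
    "cannot login account locked",
]
train_labels = ["billing", "billing", "support", "support"]

classifier = DocumentClassifier().fit(train_docs, train_labels)
predictions = classifier.predict(["invoice amount overdue", "login password problem"])
print(predictions)
```

Because the facade only holds a standard Pipeline, any scikit-learn tool (cross-validation, grid search, model persistence) can still be applied to classifier.pipeline directly.
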
Our intervention

1 Data Scientist

  • Development of data pre-processing pipelines
  • Integration with the underlying scikit-learn objects
  • Development of the interpretability module
  • Development of the hyperparameter search module
  • Publishing the package and demos to other data scientists
  • Adding features based on the specific needs of data scientists

Results

Package published.
Objectives achieved.
Used by several projects, including the support ticket classification project.

Technical environment

Python (scikit-learn, pandas, optuna, lime, shap, nltk, MLflow, plotly)
pytest
Git/GitHub
