Context and problem statement
With hundreds of thousands of web pages and reliable knowledge of its data, Société Générale wanted to classify all the pages of its websites in order to improve its product recommendation processes based on user connection logs.
- Prove the added value of deep learning for text classification, even on small samples (<1,500 web pages)
- Provide an interpretable model: the business must be able to understand the model's decisions
- Classify the pages into several categories defined by the business
2 Data Scientists and 1 Data Engineer working in Scrum mode
- Web scraping and dataset cleaning
- Data preparation (normalization, etc.)
- Data encoding using TF-IDF, Word2Vec, and Doc2Vec
- Modeling using a sequential bidirectional LSTM neural network
- Model interpretability via heatmaps
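The TF-IDF encoding step above can be sketched from first principles. This is a minimal illustration with hypothetical example documents, not the project's actual pipeline; production code would typically use a library such as scikit-learn's `TfidfVectorizer`, which applies additional smoothing and normalization.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF vectors for a list of tokenised documents.

    TF  = term count / document length
    IDF = log(N / document frequency)  -- the unsmoothed textbook variant.
    """
    n = len(docs)
    df = Counter()                      # in how many documents each term appears
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({
            term: (count / length) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical web-page snippets standing in for scraped pages.
docs = [
    "open a savings account online".split(),
    "compare savings account rates".split(),
    "mobile banking app login".split(),
]
vecs = tfidf(docs)
# Terms unique to one document (e.g. "open") outweigh terms shared
# across documents (e.g. "savings"), which is the point of IDF weighting.
```

Word2Vec and Doc2Vec replace these sparse counts with dense learned embeddings, which is what makes them suitable inputs for the bidirectional LSTM.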
95% accuracy and 90% F-measure on the test set
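For reference, the two reported metrics can be computed as follows. This is a generic sketch with made-up labels, not the project's evaluation code; the category names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Macro-averaged F-measure: per-class F1, averaged over classes."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)

# Hypothetical page categories for illustration.
y_true = ["loans", "loans", "cards", "savings"]
y_pred = ["loans", "cards", "cards", "savings"]
acc = accuracy(y_true, y_pred)   # 3 of 4 correct -> 0.75
f1 = macro_f1(y_true, y_pred)
```

Macro averaging weighs every category equally, which matters when some page categories are much rarer than others.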