Context and problem statement
Medical forums contain a wealth of valuable information, but it remains difficult to extract and analyze.
Through these forums, one can genuinely understand the “utility of the patient”.
Goals
Analyze the content of medical forums to highlight trends that give a better understanding of patient issues and help initiate concrete actions.
Our intervention
Setting up a data collection system:
- Scraping the Sjogrensworld.org forum
- Inserting data into a MongoDB database
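The two collection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual spider: it uses Python's standard `html.parser` instead of Scrapy so that it runs on its own, and the page markup (`div class="post"` with `data-id` and `data-author` attributes, no nested `div`), the function names, and the `forum`/`posts` database and collection names are all assumptions made for the example.

```python
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collect posts from one thread page.

    Assumes a hypothetical markup: <div class="post" data-id=... data-author=...>
    with no nested <div> inside a post. The real forum markup may differ.
    """

    def __init__(self):
        super().__init__()
        self.posts = []        # finished post documents
        self._current = None   # post being read, or None outside a post

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "post":
            self._current = {
                "post_id": attrs.get("data-id"),
                "author": attrs.get("data-author"),
                "text": [],
            }

    def handle_data(self, data):
        # Keep only non-empty text fragments found inside a post.
        if self._current is not None and data.strip():
            self._current["text"].append(data.strip())

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._current["text"] = " ".join(self._current["text"])
            self.posts.append(self._current)
            self._current = None


def parse_thread(html, thread_url):
    """Turn one thread page into a list of MongoDB-ready documents."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return [dict(post, thread_url=thread_url) for post in parser.posts]


def store_posts(docs, uri="mongodb://localhost:27017"):
    """Insert the documents into MongoDB (needs pymongo and a running server).

    The database and collection names are placeholders for this sketch.
    """
    from pymongo import MongoClient  # third-party, imported only when storing

    client = MongoClient(uri)
    result = client["forum"]["posts"].insert_many(docs)
    return len(result.inserted_ids)


# Hypothetical sample page, used only to show the shape of the output.
sample = """
<div class="post" data-id="1" data-author="alice">Dry eyes are worse this week.</div>
<div class="post" data-id="2" data-author="bob">Same here, <b>especially</b> at night.</div>
"""
docs = parse_thread(sample, "https://example.org/thread/1")
```

Each document keeps the post identifier, author, text and thread URL, which is enough to rebuild threads later. Calling `store_posts(docs)` would write them to a local MongoDB instance, provided one is running.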
Data modeling:
- Descriptive analysis to get an overview of the collected data (number of posts, engagement rate, links between posts, links between users, etc.)
- Discovery of patient-specific themes via topic modeling, coupled with sentiment analysis to understand their impact and to detect and dissect current and past trends.
- Extracting latent knowledge from the data via word embedding techniques to detect certain patient issues as early as possible.
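As an illustration of the descriptive analysis step, the sketch below computes a few overview statistics from post records. The record fields (`post_id`, `thread_id`, `author`, `reply_to`), the sample data, and the definition of engagement used here (share of posts that received at least one reply) are assumptions chosen for the example; the project may have used different fields and metrics.

```python
from collections import Counter

# Hypothetical post records, shaped like documents read back from the database.
posts = [
    {"post_id": 1, "thread_id": "t1", "author": "alice", "reply_to": None},
    {"post_id": 2, "thread_id": "t1", "author": "bob", "reply_to": 1},
    {"post_id": 3, "thread_id": "t1", "author": "alice", "reply_to": 2},
    {"post_id": 4, "thread_id": "t2", "author": "carol", "reply_to": None},
]


def describe(posts):
    """Return overview statistics for a list of post records."""
    by_id = {p["post_id"]: p for p in posts}

    posts_per_user = Counter(p["author"] for p in posts)
    posts_per_thread = Counter(p["thread_id"] for p in posts)

    # Links between users: (replier, author of the post replied to).
    user_links = Counter()
    for p in posts:
        parent = by_id.get(p["reply_to"])
        if parent is not None and parent["author"] != p["author"]:
            user_links[(p["author"], parent["author"])] += 1

    # Assumed engagement measure: share of posts with at least one reply.
    replied_to = {p["reply_to"] for p in posts if p["reply_to"] is not None}
    reply_rate = len(replied_to & set(by_id)) / len(posts) if posts else 0.0

    return {
        "n_posts": len(posts),
        "posts_per_user": posts_per_user,
        "posts_per_thread": posts_per_thread,
        "user_links": user_links,
        "reply_rate": reply_rate,
    }


stats = describe(posts)
```

The `user_links` counter can serve as an edge list for a user interaction graph, and `posts_per_thread` gives a first view of which discussions attract the most activity.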
Results
Technical environment
Python
Scrapy
MongoDB
PyTorch
NLTK
spaCy
scikit-learn