Keeping up with the literature means every scientist regularly has to go through many published articles alongside their own research. The aim is to:
- know what other researchers are doing: they might be ahead of you, or they might have shown that your project is a dead end.
- get some context to interpret your research results.
Using specialised search engines can be inefficient if you don't use the "right" keywords. Researchers also tend to find bibliographic work boring, so it would be worth automating part of the process!
In my talk I'll answer the following question:
- can Python machine learning libraries (nltk, scikit-learn) be used to determine whether a research article is worth reading?
I'll use the TF-IDF measure to identify frequent topics appearing in specific scientific articles, and train a classifier to distinguish between relevant and non-relevant articles depending on someone's interests.
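
As a rough illustration of this idea (not the code from the talk), the sketch below computes smoothed TF-IDF weights by hand with the Python standard library and labels a new abstract by cosine similarity to the average vector of each class. The example abstracts, the stop-word list, and all function names are made up for illustration. In practice, scikit-learn's `TfidfVectorizer` combined with one of its classifiers would replace most of this.

```python
import math
from collections import Counter

# Hypothetical, tiny stop-word list; a real project would use a fuller one (e.g. from nltk).
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "with", "for", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def fit_idf(docs_tokens):
    """Smoothed inverse document frequency of each term in the training corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """L2-normalised TF-IDF vector as a dict; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    """Average of several sparse vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up training abstracts, labelled according to one imaginary reader's interests.
TRAINING = {
    "relevant": [
        "Protein folding simulation with molecular dynamics",
        "Molecular dynamics of membrane protein structure",
    ],
    "not relevant": [
        "Galaxy formation in the early universe",
        "Dark matter halo and galaxy cluster survey",
    ],
}

tokenized = {label: [tokenize(d) for d in docs] for label, docs in TRAINING.items()}
IDF = fit_idf([tokens for docs in tokenized.values() for tokens in docs])
CENTROIDS = {
    label: centroid([tfidf_vector(tokens, IDF) for tokens in docs])
    for label, docs in tokenized.items()
}


def classify(abstract):
    """Return the label whose class centroid is closest to the abstract."""
    vec = tfidf_vector(tokenize(abstract), IDF)
    return max(CENTROIDS, key=lambda label: cosine(vec, CENTROIDS[label]))


if __name__ == "__main__":
    print(classify("Simulation of protein dynamics in a membrane"))
```

A nearest-centroid rule is used here only because it fits in a few lines; it stands in for whatever classifier is actually trained, and four toy abstracts are far too few to say anything about real accuracy.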