All kinds of businesses are using data science and machine learning to understand themselves, lower costs, engineer better products, and improve customer experiences. Similarly, we use data science to improve science itself, by understanding how scientific topics are discovered and by modeling institutional expertise. In our work, we achieve this goal with a combination of Python-powered big data analytics and web-based tools, including pyspark (http://spark.apache.org), scikit-learn (http://scikit-learn.org), Django (https://www.djangoproject.com/), Celery (http://www.celeryproject.org/), and or-tools (https://developers.google.com/optimization).
First, we will present the infrastructure behind Scholarfy, a recommender system for massive scientific conferences (http://www.scholarfy.net). We will also present a machine learning approach to automatically match expert scientific reviewers to research proposals (http://pr.scienceofscience.org). Finally, we will present the work behind our award-winning visualization, World’s Science Map (http://map.scienceofscience.org), where we modeled the institutional expertise, collaboration network, and funding of all institutions in the world. At the end of our talk, we will argue that Python-powered data science can improve not only businesses but also science, making it more agile and accurate.