PyData Berlin 2016
Predicting what people like when they choose what to wear is a non-trivial task involving several ingredients. At Mallzee, the data is variegated, large and has to be processed quickly to produce recommendations on products, tailored to each user. We use PySpark to crunch large sets of different data and create models in order to generate robust and meaningful suggestions.
Mallzee is a fashion app where people can see products (clothes, shoes and accessories) and decide whether they like them or not. They can also buy products and create feeds of preferred brands and categories of items. We have large amounts of data generated by the users when they scroll and search through products and we use it to understand the user. We want to give everyone meaningful recommendations on the items they might like, hence tailoring the experience to who they are. We build pseudo-intelligent algorithms capable of extracting the style profile of a user and we crunch products data to match items to the user based on a classification model. The model is validated and statistical analyses are performed to determine the tipping point when recommendations are valuable to the user. The talk will go through the steps we implement by showing how the full data stack of Python is used in achieving this goal and how it is interfaced with a Spark cluster through PySpark in order to run Machine Learning algorithms on big data.