Curating quality datasets for machine learning
In contemporary machine learning, “data is the new oil”. For state-of-the-art ML algorithms to work their magic, they need a strong foundation of relevant data. Large volumes of raw data are available on the web, and what we need are the skills to identify and extract meaningful datasets from it. This talk presents the most fundamental aspect of machine learning, dataset curation, which often does not get the limelight it is due. It also walks the audience through the process of constructing good-quality datasets, as done in formal settings, using a simple hands-on Python example. The goal is to establish the importance of data, especially in a well-curated form, and the effect it has on building smart learning algorithms.
What will the audience learn from the talk?
The talk serves as the first act in the play of machine learning. It aims to introduce machine learning and artificial intelligence enthusiasts, practitioners, and data scientists to one of the fundamental aspects of the field: dataset curation. This stage often does not get the limelight it is due, yet it is highly relevant in both academia and industry. The highlight is a step-by-step guide to constructing a good-quality dataset from scratch.