Curating quality datasets for machine learning

In contemporary machine learning, “data is the new oil”. State-of-the-art ML algorithms only work well when they are built on a strong foundation of relevant data. Large volumes of raw data are available on the web, but identifying and extracting meaningful datasets from it takes skill. This talk presents the most fundamental aspect of machine learning, dataset curation, which often does not get the attention it deserves. It also walks the audience through the process of constructing good-quality datasets as done in formal settings, using a simple hands-on Python example. The goal is to convey the importance of data, especially in a well-prepared format, and its effect on building smart learning algorithms.

What will attendees learn in this talk?

This talk functions as the first act in the play of machine learning. It aims to introduce machine learning and artificial intelligence enthusiasts, practitioners, and data scientists to one of the fundamental aspects of the field: dataset curation. This stage often does not get the attention it deserves, yet it is highly relevant in both academia and industry. The highlight of the talk is a step-by-step guide to constructing a good-quality dataset from scratch.

Rishabh Misra

Rishabh has a Master's in CS from the University of California San Diego and currently works at Twitter as an ML Engineer in the Timelines...


Jigyasa Grover

Jigyasa Grover is a Machine Learning Engineer at Twitter and the co-author of the book ‘Sculpting Data for ML’. She has a myriad of...

110-minute short workshop

Beginner
Time

15:30-17:10
October 7


Target audience

tba


Topic area

Tool usage


Room

Britz


ID

KWS5


© 2021 Softwareforen Leipzig