On the Diagrammatic Diagnosis of Data
The wrong way to start your machine learning project is to “chuck everything into a model to see what happens”. The better way is to visualise your data first: to expose the relationships it contains, to confirm that it looks sound and to identify problems that are likely to make your life difficult. You’ll save time, you’ll understand “why” your data works and you’ll uncover problems sooner.
We’ll review ways to quickly and visually diagnose your data, to check it meets your assumptions and to prepare it for discussion with your colleagues. We’ll look at tools including Pandas, Seaborn and Pandas Profiling. At the end you’ll have new tools to help you confidently investigate new data with your associates.
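As a flavour of the quick checks described above, here is a minimal sketch using Pandas only. The helper name quick_diagnosis and the toy DataFrame are hypothetical, invented for illustration; they are not taken from the talk or from any of the tools it names.

```python
import numpy as np
import pandas as pd


def quick_diagnosis(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with basic data-quality indicators.

    Hypothetical helper for illustration; not part of any library.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),        # how Pandas parsed the column
            "missing_fraction": df.isna().mean(),  # share of missing values
            "n_unique": df.nunique(),              # distinct non-missing values
        }
    )


# Toy data, made up for this example.
df = pd.DataFrame(
    {
        "age": [22, 35, np.nan, 58, 41],
        "income": [18000, 42000, 39000, np.nan, 51000],
        "city": ["London", "Leeds", "London", "Bristol", "London"],
    }
)

report = quick_diagnosis(df)
print(report)

# Pairwise correlations between the numeric columns only.
correlations = df.select_dtypes("number").corr()
print(correlations)
```

From here a natural next step is a plot, for example seaborn's pairplot(df) for pairwise scatter plots, so that the numbers in the report can be checked by eye and shared with colleagues.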
This talk introduces Ian’s new discover_feature_relationships tool which will save you time during your Exploratory Data Analysis phase.
CEO, Mor Consulting Ltd
Ian is a Senior Data Science Coach. He co-organises the annual PyDataLondon conference, which has 500+ attendees, and the associated monthly meetup, which has 8,000+ members. He runs the established ModelInsight.io Data Science consultancy in London as Principal Data Scientist, gives conference talks internationally, often as a keynote speaker, and is the author of the bestselling O’Reilly book High Performance Python. He has 16 years of experience as a senior technical leader, data scientist and coach.