Step by Step Introduction to Python’s Matplotlib Library

In this tutorial series, we will look at how to get started with using Python and Matplotlib to visualise our data. In this first part we’ll take you through the basics of how to create a simple line chart and how to customise and format it while working with instances of Matplotlib’s figure and axes objects. In future parts we will focus on plotting different chart types and then concentrate on some intermediate methods such as adding secondary axes and creating subplots.

Before we begin, let’s quickly look at the data we’ll be using for the first part of this tutorial…
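As a rough illustration of the figure-and-axes workflow described above, here is a minimal sketch of a customised line chart. The data and labels (months, sales) are made up for this example and are not the dataset used in the tutorial.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [12, 15, 14, 19, 22, 21]

# plt.subplots() returns a Figure and an Axes instance to work with directly
fig, ax = plt.subplots(figsize=(8, 4))

# Draw the line on the axes and keep a handle to the resulting Line2D object
(line,) = ax.plot(months, sales, color="tab:blue", linewidth=2, marker="o", label="Sales")

# Customise and format the chart through the Axes methods
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.grid(True, linestyle="--", alpha=0.5)
ax.legend()

fig.tight_layout()
```

In an interactive session you would likely drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the chart, or call `fig.savefig(...)` to write it to a file.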


How to Quickly Get Started with One of the Most Popular Machine Learning Algorithms

In this tutorial we will see how to implement the CatBoost machine learning algorithm in Python. We will give a brief overview of what CatBoost is and what it can be used for before walking step by step through training a simple model, including how to tune parameters and analyse the model.

What is CatBoost?

CatBoost is a boosted decision tree machine learning algorithm developed by Yandex. …


Kaggle is Still Hard to Beat For Becoming a More Technical Data Scientist

OK, here we go, I’ll stick my head above the parapet. A debate has been gathering pace for a while now: a backlash against Kaggle from those who argue that Kaggle isn’t worth doing, that we shouldn’t hold winners in such high esteem, and that experience gained in competitions doesn’t transfer to real life. Some of the more common criticisms include:

  • You can only do well in Kaggle if you have access to expensive hardware
  • Kaggle favours overfitting and finding leaks
  • Kaggle is not representative of actually working in data science


Learn When to Use Stratified K-Fold and How to Implement in Python with Sci-Kit Learn

In this tutorial, we are going to look at stratified k-fold cross-validation: what it is and when we should use it. We’ll then walk through how to split data into 5 stratified folds using the StratifiedKFold class in scikit-learn and use those folds to train and test a model before exporting all the splits to CSV files.

What is Stratified K-Fold Cross-Validation?

Stratified k-fold cross-validation is an extension of regular k-fold cross-validation, specifically for classification problems, where rather than the splits being completely random, the ratio between the target classes is the same in each fold as it is in…
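The workflow described above (split into 5 stratified folds, train and test on each, export the splits) could be sketched roughly as follows. The synthetic imbalanced dataset, the logistic regression model, the column names and the output file names are all assumptions made for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic imbalanced data, roughly 80% / 20% (an assumption, for illustration)
X, y = make_classification(n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
df["target"] = y
features = df.drop(columns="target")

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
out_dir = Path(tempfile.mkdtemp())  # temporary directory for the exported CSV files

scores = []          # test accuracy per fold
positive_rates = []  # share of the positive class in each test fold

# split() needs the target so it can preserve the class ratio in every fold
for fold, (train_idx, test_idx) in enumerate(skf.split(features, df["target"])):
    train, test = df.iloc[train_idx], df.iloc[test_idx]

    model = LogisticRegression(max_iter=1000)
    model.fit(train.drop(columns="target"), train["target"])
    scores.append(model.score(test.drop(columns="target"), test["target"]))
    positive_rates.append(test["target"].mean())

    # Export this fold's train and test split to CSV
    train.to_csv(out_dir / f"train_fold_{fold}.csv", index=False)
    test.to_csv(out_dir / f"test_fold_{fold}.csv", index=False)
```

Comparing the values in `positive_rates` is a quick check that stratification worked: they should all sit close to the positive-class share of the full dataset.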


Model Interpretability

Be Careful of Evaluation Metric Tunnel Vision

As data scientists we pride ourselves on how accurately our machine learning models predict the future, striving for increasingly accurate results that make our predictions and classifications appear unbelievable to those outside the field. As progress in machine learning and deep learning has made businesses seeking a competitive advantage more dependent on data science methods, and with the advent of data science competition platforms like Kaggle, the accuracy of our models is often under the spotlight. …

Daniel Hargreaves

Data Scientist | Developer at www.datasnips.com
