Intro to Data Visualization with Python

Course Description

This course extends Intermediate Python for Data Science to provide a stronger foundation in data visualization in Python. It offers broader coverage of the Matplotlib library and an overview of Seaborn, a package for statistical graphics. Topics covered include customizing graphics, plotting two-dimensional arrays (e.g., pseudocolor plots, contour plots, and images), statistical graphics (e.g., visualizing distributions and regressions), and working with time series and image data.

Chapters:

1. Customizing plots

Following a review of basic plotting with Matplotlib, this chapter delves into customizing plots: overlaying plots, making subplots, controlling axes, adding legends and annotations, and using different plot styles.
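The customizations listed above can be sketched in a few lines of Matplotlib. This is a minimal, illustrative example (the data and file name are made up, not from the course):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Subplots: one figure, two axes side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Overlaying plots on one axes, controlling the x-axis, adding a legend
ax1.plot(x, np.sin(x), label="sin(x)")
ax1.plot(x, np.cos(x), label="cos(x)")
ax1.set_xlim(0, 2 * np.pi)
ax1.legend()

# Annotating a point of interest with an arrow
ax2.plot(x, np.sin(x))
ax2.annotate("peak", xy=(np.pi / 2, 1), xytext=(2, 0.5),
             arrowprops=dict(arrowstyle="->"))
```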

2. Plotting 2D arrays

This chapter showcases various techniques for visualizing two-dimensional arrays. This includes the use, presentation, and orientation of grids for representing two-variable functions followed by discussions of pseudocolor plots, contour plots, color maps, two-dimensional histograms, and images.
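The grid machinery behind those plot types can be sketched with NumPy alone. This is an illustrative example, not course code; in Matplotlib, `X`, `Y`, `Z` below would be rendered with `plt.pcolormesh(X, Y, Z)` or `plt.contour(X, Y, Z)`:

```python
import numpy as np

# meshgrid() orients two 1-D axes into 2-D coordinate arrays so a
# two-variable function can be evaluated on the whole grid at once.
x = np.linspace(-2, 2, 41)
y = np.linspace(-1, 1, 21)
X, Y = np.meshgrid(x, y)      # both have shape (21, 41): rows follow y
Z = X**2 + Y**2               # the two-variable function on the grid

# A two-dimensional histogram bins scattered points onto a grid
rng = np.random.default_rng(0)
pts_x, pts_y = rng.normal(size=(2, 1000))
counts, xedges, yedges = np.histogram2d(pts_x, pts_y, bins=20)
```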

3. Statistical plots with Seaborn

This is a high-level tour of the Seaborn plotting library for producing statistical graphics in Python. The tour covers Seaborn tools for computing and visualizing linear regressions as well as tools for visualizing univariate distributions (e.g., strip, swarm, and violin plots) and multivariate distributions (e.g., joint plots, pair plots, and heatmaps). This also includes a discussion of grouping categories in plots.

4. Analyzing time series and images

This chapter ties together the skills gained so far through examining time series data and images. This involves customizing plots of stock data, generating histograms of image pixel intensities, and enhancing image contrast through histogram equalization.
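Histogram equalization, the last technique mentioned, can be sketched with NumPy: map each pixel through the normalized cumulative distribution of the intensity histogram. The image below is synthetic (a real one would come from `plt.imread()`):

```python
import numpy as np

# A synthetic low-contrast grayscale "image" with values in 50..119
rng = np.random.default_rng(42)
image = rng.integers(50, 120, size=(64, 64))

# Histogram of pixel intensities, then the cumulative distribution in [0, 1]
hist, bins = np.histogram(image.ravel(), bins=256, range=(0, 256))
cdf = hist.cumsum() / hist.sum()

# Remap every pixel through the CDF: intensities now span the full 0-255 range
equalized = (255 * cdf[image]).astype(np.uint8)
```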

DATASETS:

Writing Functions in R

Course Description

Functions are a fundamental building block of the R language. You’ve probably used dozens (or even hundreds) of functions written by others, but in order to take your R game to the next level, you’ll need to learn to write your own functions. This course will teach you the fundamentals of writing functions in R so that, among other things, you can make your code more readable, avoid coding errors, and automate repetitive tasks.

Chapters:

1. A quick refresher

Before we embark, we’ll make sure you’re ready for this course by reviewing some of the prerequisites. You’ll review the syntax of writing a function in R, the basic data types in R, subsetting, and writing for loops. It won’t all be review, though: we’ll introduce a few new things that will be helpful throughout the course.

2. When and how you should write a function

Writing your own functions is one way to reduce duplication in your code. In this chapter, you’ll learn when to write a function, how to get started and what to keep in mind when you are writing. You’ll also learn to appreciate that functions have two audiences: the computer (which runs the code) and humans (who need to be able to understand the code).

3. Functional programming

You already know how to use a for loop. The goal of this chapter is to teach you how to use the map functions in the purrr package, which eliminate the code duplicated across multiple for loops. After completing this chapter you’ll be able to solve new iteration problems with greater ease (faster and with fewer bugs).
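R’s `purrr::map()` has a rough analogue in Python’s built-in `map()` and list comprehensions: one expression replaces a hand-written loop. This Python sketch (illustrative data only, not from the course) shows the same idea of removing loop duplication:

```python
# Three "columns" of numbers we want the mean of
columns = {"a": [1, 2, 3], "b": [10, 20, 30], "c": [5, 5, 5]}

# Instead of a for loop that appends results one by one...
means_loop = []
for values in columns.values():
    means_loop.append(sum(values) / len(values))

# ...a single mapping expression removes the duplicated loop code
means_map = [sum(v) / len(v) for v in columns.values()]

assert means_loop == means_map  # same result, less boilerplate
```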

4. Advanced inputs and outputs

Now that you’ve seen how useful the map functions are for reducing duplication, we’ll introduce you to a few more functions in purrr that allow you to handle more complicated inputs and outputs. In particular, you’ll learn how to deal with functions that might return an error, how to iterate over multiple arguments, and how to iterate over functions that have no output at all.

5. Robust functions

In this chapter we’ll focus on writing functions that don’t surprise you or your users. We’ll expose you to some functions that work 95% of the time, and 5% of the time fail in surprising ways. You’ll learn which functions you should avoid using inside a function and which you should use with care.

Dataset Used:

Swimming Pools

Summary:  

  • Functions automate repetitive tasks.
  • Functions make code more readable.
  • Functions help you avoid coding errors.
  • Using [ ] and [[ ]] for subsetting a data frame.
  • Scoping.
  • Best practices for defining functions.
  • How to use data moments inside a function.
  • The purrr package and its functions.
  • How to catch errors and display error messages.
  • How to change the global options.

 

What I took away from this course:

Functions reduce the amount of code in a script, which makes the script easier to read and understand. The R purrr package makes the code cleaner still by replacing verbose base R patterns with its own more concise functions.

 

 

Exploring 67 Years of LEGO

Project Description

In this project we will explore a database of every LEGO set ever built.

The Rebrickable database includes data on every LEGO set that has ever been sold: the names of the sets, what bricks they contain, what color the bricks are, and so on. It might be small bricks, but this is big data! In this project, you will get to explore the Rebrickable database. To do this you need to know your way around pandas DataFrames, and it’s recommended that you take a look at the courses pandas Foundations and Manipulating DataFrames with pandas.

Database:

LEGO Database

I used pandas to explore and manipulate the dataset, which was converted from a CSV file to a pandas DataFrame.
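A sketch of that kind of exploration, on a tiny stand-in frame rather than the real Rebrickable CSV (the column names here mirror the colors table but are assumptions for illustration):

```python
import pandas as pd

# A stand-in for a colors table loaded with pd.read_csv("colors.csv")
colors = pd.DataFrame({
    "name": ["Black", "White", "Trans-Red"],
    "is_trans": ["f", "f", "t"],
})

num_colors = len(colors)                      # how many distinct colors
by_trans = colors.groupby("is_trans").size()  # transparent vs. opaque counts
```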

GitHub project URL:

lego-bricks

Introduction to Git for Data Science

Version control is one of the power tools of programming. It allows you to keep track of what you did and when, undo any changes you decide you don’t want, and collaborate at scale with other people. This course will introduce you to Git, a modern version control tool that is very popular with data scientists and software developers alike, and show you how it can help you get more done in less time and with less pain.

Chapters:

  • Basic workflow

This chapter explains what version control is and why you should use it, and introduces the most common steps in a typical Git workflow.

  • Repositories

    This chapter digs a little deeper into how Git stores information and how you can explore a repository’s history.

  • Undo

    Since Git saves all the changes you’ve made to your files, you can use it to undo those changes. This chapter shows you several ways to do that.

  • Working with branches

    Branching is one of Git’s most powerful features, since it allows you to work on several things at once without tripping over yourself. This chapter shows you how to create and manage branches.

  • Collaborating

    This chapter shows Git’s other greatest feature: how you can share changes between repositories to collaborate at scale.

What I learned:

  • the concept of local and remote Git repos
  • the add, commit, and push commands
  • pull and merge
  • each commit is identified by a unique hash ID
  • the concept of the master branch and the branching model
  • cloning remote repos to work on them locally
  • using a Git Bash terminal to issue Git commands

Python Data Science Toolbox (Part 1)

It’s now time to push forward and develop your Python chops even further. There are lots and lots of fantastic functions in Python and its library ecosystem. However, as a data scientist, you’ll constantly need to write your own functions to solve problems that are dictated by your data. The art of function writing is what you’ll learn in this first Python Data Science Toolbox course. You’ll come out of this course being able to write your very own custom functions, complete with multiple parameters and multiple return values, along with default arguments and variable-length arguments. You’ll gain insight into scoping in Python and be able to write lambda functions and handle errors in your own function-writing practice. On top of this, you’ll wrap up each chapter by using your acquired skills to write functions that analyze Twitter DataFrames and are generalizable to broader data science contexts.

1. Writing your own functions

Here you will learn how to write your very own functions. In this chapter, you’ll learn how to write simple functions, as well as functions that accept multiple arguments and return multiple values. You’ll also have the opportunity to apply these newfound skills to questions that commonly arise in data science contexts.
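The chapter’s core ideas fit in a short sketch: a simple function, a multi-parameter function, and a function that returns multiple values as a tuple (all names and data here are made up for illustration):

```python
def shout(word):
    """Return the word with three exclamation marks."""
    return word + "!!!"

def raise_to(value, power):
    """Multiple parameters: raise value to the given power."""
    return value ** power

def min_max(numbers):
    """Multiple return values, packed into a tuple."""
    return min(numbers), max(numbers)

# Tuple unpacking splits the two return values apart
smallest, largest = min_max([3, 1, 4, 1, 5])
```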

Lessons: User-defined functions; Strings in Python; Recapping built-in functions; Write a simple function; Single-parameter functions; Functions that return single values; Multiple parameters and return values; Functions with multiple parameters; A brief introduction to tuples; Functions that return multiple values; Bringing it all together (1) and (2); Congratulations!

2. Default arguments, variable-length arguments and scope

In this chapter, you’ll learn to write functions with default arguments, so that the user doesn’t always need to specify them, and variable-length arguments, so that your functions can accept an arbitrary number of arguments. These are both incredibly useful tools! You’ll also learn about the essential concept of scope. Enjoy!
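A sketch of default arguments, `*args`, `**kwargs`, and the `global` keyword (all names are illustrative, not from the course exercises):

```python
def power(value, exponent=2):
    """Default argument: exponent is 2 unless the caller overrides it."""
    return value ** exponent

def total(*args):
    """Variable-length arguments: sum however many numbers are passed."""
    return sum(args)

def describe(**kwargs):
    """Variable-length keyword arguments arrive as a dict."""
    return ", ".join(f"{key}={val}" for key, val in kwargs.items())

count = 0  # global scope

def bump():
    global count  # rebind the global name instead of creating a local
    count += 1

bump()
```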

Lessons: Scope and user-defined functions; Pop quiz on understanding scope; The keyword global; Python’s built-in scope; Nested functions I and II; The keyword nonlocal and nested functions; Default and flexible arguments; Functions with one default argument; Functions with multiple default arguments; Functions with variable-length arguments (*args); Functions with variable-length keyword arguments (**kwargs); Bringing it all together (1) and (2)
3. Lambda functions and error-handling

Here you’ll learn about lambda functions, which allow you to write functions quickly and on the fly. You’ll also get practice handling the errors that your functions will, at some point, inevitably throw. You’ll wrap up by once again applying these skills to data science questions.
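The chapter’s ingredients in one sketch: lambdas with `map()`/`filter()`/`reduce()`, plus try-except and raising an error (illustrative data only):

```python
from functools import reduce

nums = [1, 2, 3, 4]

squares = list(map(lambda x: x ** 2, nums))        # [1, 4, 9, 16]
evens = list(filter(lambda x: x % 2 == 0, nums))   # [2, 4]
product = reduce(lambda a, b: a * b, nums)         # 24

def safe_sqrt(x):
    """Return the square root, handling bad input explicitly."""
    try:
        if x < 0:
            raise ValueError("x must be non-negative")  # raise our own error
        return x ** 0.5
    except TypeError:
        return None  # e.g. a string was passed in
```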

Lessons: Lambda functions; Pop quiz on lambda functions; Writing a lambda function you already know; map() and lambda functions; filter() and lambda functions; reduce() and lambda functions; Introduction to error handling; Pop quiz about errors; Error handling with try-except; Error handling by raising an error; Bringing it all together (1), (2), and (3); Testing your error-handling skills; Congratulations!

Conclusion:

  • Functions take parameters, default parameters, and variable-length parameters
  • Functions can return values
  • Docstrings (enclosed in triple quotes: """ """) describe what a function does
  • The concept of scope (local, enclosing, global), which determines how variables are resolved
  • The concept of nesting functions in an outer/inner model
  • How to construct lambda functions and pass them to other functions
  • How to handle errors and exceptions, and raise exceptions

Dataset used:

Tweets

Foundations of Probability in R

Probability is the study of making predictions about random phenomena. In this course, I learned about the concepts of random variables, distributions, and conditioning, using the example of coin flips. I also gained intuition for how to solve probability problems through random simulation. These principles will help me understand statistical inference and how it can be applied to draw conclusions from data.

Chapters:

  • The Binomial Distribution

One of the simplest and most common examples of a random phenomenon is a coin flip: an event that is either “yes” or “no” with some probability. Here I learned about the binomial distribution, which describes the behavior of a series of yes/no trials, and how to predict and simulate that behavior.

  • Laws of Probability

In this chapter I learned to combine multiple probabilities, such as the probability that two events both happen or that at least one happens, and to confirm each with random simulations. I also learned some of the properties of adding and multiplying random variables.

  • Bayesian Statistics

Bayesian statistics is a mathematically rigorous method for updating your beliefs based on evidence. In this chapter, I learned to apply Bayes’ theorem to draw conclusions about whether a coin is fair or biased, and to back it up with simulations.

  • Related Distributions

So far we’ve been talking about the binomial distribution, but this is one of many probability distributions a random variable can take. In this chapter we’ll introduce three more that are related to the binomial: the normal, the Poisson, and the geometric.

Functions I used:

The Binomial Distribution:

  • rbinom() – simulates coin flips. Its arguments are: number of draws, number of coins, and probability of heads

Density and Cumulative Density:

  • dbinom()
  • pbinom()

Expected value and Variance:

  • mean()
  • var()
  • sd() – the square root of the variance

Bayes’ theorem (posterior probability that the coin is fair):

  • fair / (fair + biased)

Normal Distribution:

  • rnorm()
  • pnorm()

The Poisson Distribution:

  • rpois()
  • dpois()

The Geometric Distribution:

  • replicate()
  • rgeom()
  • pgeom()
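The binomial simulation above has a close NumPy analogue; this is a Python sketch of the same idea, not the course’s R code (`rbinom()` roughly maps to `rng.binomial()`, and `mean()`/`var()`/`sd()` to `np.mean`/`np.var`/`np.std`):

```python
import numpy as np

rng = np.random.default_rng(0)

# Analogue of rbinom(10000, 10, 0.5): 10,000 draws of 10 coin flips each
flips = rng.binomial(n=10, p=0.5, size=10_000)

mean = flips.mean()  # expected value is n * p = 5
var = flips.var()    # variance is n * p * (1 - p) = 2.5
sd = flips.std()     # standard deviation: the square root of the variance
```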

 

Conclusion:

  • Some probabilities are dependent on prior events
  • Some probabilities are independent of prior events
  • Sampling with and without replacement can preserve or alter the probability

 

Importing Data in R (Part 1)

Before a data scientist can work with a dataset, it first has to be imported into RStudio. Since data comes in many formats, such as CSV, TSV, slash-separated, and XLS, using the right package is key to importing the data easily and efficiently.

    The packages used in this course:

  • utils

  • readr

  • data.table

  • readxl

  • gdata

  • XLConnect

    The packages were loaded using the library() function, after which you can call each package’s functions and pass in the required arguments.

The datasets used:

Conclusion:

  • Data comes in different formats.

  • There are multiple packages to choose from.

  • Many functions are wrappers around an inner core function.

  • Excel is a very widely used data analysis tool.

 

 

 

 

Cleaning Data in Python

The goal of exploring and cleaning data is to make it tidy for further analysis. In this course I used the following Python modules to clean the datasets:

  • NumPy

  • pandas

  • glob

  • re

The functions included:

  • concat()

  • drop_duplicates()

  • apply()

  • lambda expressions

  • pd.melt()

  • the .split() string method

Other cleaning tasks included:

  • Check for missing data

  • Converting data types

  • String manipulation

  • Regular Expressions

  • Test with assert statements
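Several of the steps above fit in one small pandas sketch on a made-up frame: dropping duplicates, melting wide data to long, converting a data type, and checking the result with an assert statement:

```python
import pandas as pd

# A tiny "wide" frame with a duplicated row and string-typed numbers
wide = pd.DataFrame({
    "name": ["anna", "bob", "anna"],
    "2019": ["1", "2", "1"],
    "2020": ["3", "4", "3"],
})

deduped = wide.drop_duplicates()                     # remove the repeated row
tidy = pd.melt(deduped, id_vars="name",
               var_name="year", value_name="value")  # wide -> long (tidy) form
tidy["value"] = tidy["value"].astype(int)            # convert the data type

assert tidy["value"].notnull().all()                 # check for missing data
```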

Datasets used:

The things about data cleaning I took from this course:

  • The importance of checking for missing values

  • Having the proper data types for the column variables

  • Dropping duplicate data

  • Combining data frames

 

Summarizing with dplyr in R

I used the dplyr and NHANES packages, loading them with the library() function. The dataset I used was from the National Center for Health Statistics (NCHS).

I used the filter() function on the rows, and the mean() and sd() functions for the average and standard deviation. The min(), max(), arrange(), and group_by() functions were also used, as was the %>% operator.

 

 

Phyllotaxis in R programming language

Link to my GitHub repo

    In this project I used R, math, and the data visualization package ggplot2 to simulate phyllotaxis: the arrangement of leaves on a plant stem, which is governed by spirals. I also used a Jupyter notebook.

    The first thing I did was call options() to set the image size. Then I loaded the ggplot2 package using the library() function.

    In the second step I initialized some variables using the sine and cosine functions, sin() and cos() respectively. I put sin() on the x axis and cos() on the y axis. Then I made a simple scatterplot of points in a circle.

    In the third step I defined variables for the number of points and the golden angle. The formula I used for the angle variable was pi * (3 - sqrt(5)). I set the points variable to 500.

    In step four I used the theme() function with element_blank() as an argument to remove some components of the plot.

    In step five I set the size, color, and alpha attributes of the plot.

    In step six I added a shape attribute and moved the size into aes() as its first argument. The aes() function was passed as the first argument to geom_point().

    In step seven I set the shape attribute to 17 which maps to a filled triangle and the color to “yellow”.

    In step eight I changed the angle between the points from 2.4 to 2.0, producing a very different image.

    In step nine I modified some elements and set the angle to  13 * pi / 180.
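The construction from steps two and three can be sketched in Python instead of R/ggplot2 (a NumPy analogue, not the project’s code): point k sits at angle k times the golden angle, at radius sqrt(k).

```python
import numpy as np

points = 500
golden_angle = np.pi * (3 - np.sqrt(5))   # about 2.4 radians

k = np.arange(1, points + 1)
r = np.sqrt(k)                 # radius grows with the square root of k
theta = k * golden_angle       # each point rotates by the golden angle

x = r * np.sin(theta)          # sin() on the x axis, as in the project
y = r * np.cos(theta)          # cos() on the y axis
```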

 
