Cleaning data in Python.

Cleaning Data in Python Course-1

The goal of exploring and cleaning data is to make it tidy for further analysis. In this course I used the following Python modules to clean the datasets:

  • NumPy

  • pandas

  • glob

  • re

The functions and methods I used included:

  • pd.concat()

  • drop_duplicates()

  • apply()

  • lambda expressions

  • pd.melt()

  • .split() method
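As a rough illustration of how these pieces fit together, here is a minimal sketch using pandas. The two small data frames, their column names, and their values are made up for the example and are not the course datasets; in practice the frames would more likely come from files found with glob and read with pd.read_csv().

```python
import pandas as pd

# Hypothetical frames standing in for two monthly files.
jan = pd.DataFrame({"name": ["Ann Lee", "Bob Ray"],
                    "treatment_a": [1, 4],
                    "treatment_b": [2, 5]})
feb = pd.DataFrame({"name": ["Bob Ray", "Cy Dunn"],
                    "treatment_a": [4, 7],
                    "treatment_b": [5, 8]})

# Stack the frames row-wise and renumber the index.
combined = pd.concat([jan, feb], ignore_index=True)

# Drop rows that are exact duplicates (the Bob Ray row appears twice).
deduped = combined.drop_duplicates().reset_index(drop=True)

# Reshape from wide to long: one row per (name, treatment) pair.
tidy = pd.melt(deduped, id_vars=["name"],
               var_name="treatment", value_name="result")

# apply() with a lambda: keep only the first word of each full name,
# using the plain string .split() method.
tidy["first_name"] = tidy["name"].apply(lambda full: full.split()[0])

print(tidy)
```

After melting, each treatment column becomes a value in the treatment column, which is the "tidy" shape the course aims for.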

Other cleaning tasks included:

  • Checking for missing data

  • Converting data types

  • String manipulation

  • Regular expressions

  • Testing with assert statements
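The tasks in this list can be sketched in plain Python as well. The records, the phone-number pattern, and the field names below are hypothetical and chosen only to show one missing-value check, one type conversion, some string cleanup, one regular expression, and a couple of assert statements.

```python
import re

# Hypothetical raw records; None marks a missing value.
rows = [
    {"name": " alice ", "phone": "555-123-4567", "total": "$17.50"},
    {"name": "BOB", "phone": "5551234567", "total": "$3.00"},
    {"name": "cy", "phone": None, "total": "$8.25"},
]

# Check for missing data: count None values per column.
missing = {col: sum(r[col] is None for r in rows) for col in rows[0]}

# Regular expression: phone numbers shaped like ddd-ddd-dddd.
phone_pattern = re.compile(r"^\d{3}-\d{3}-\d{4}$")

cleaned = []
for r in rows:
    phone = r["phone"]
    cleaned.append({
        # String manipulation: trim whitespace, normalize capitalization.
        "name": r["name"].strip().title(),
        # True only when a phone value exists and matches the pattern.
        "phone_ok": bool(phone and phone_pattern.match(phone)),
        # Convert the data type: "$17.50" (str) -> 17.5 (float).
        "total": float(r["total"].lstrip("$")),
    })

# Assert statements fail loudly if the cleaning assumptions are broken.
assert missing == {"name": 0, "phone": 1, "total": 0}
assert all(isinstance(r["total"], float) for r in cleaned)
```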

Datasets used:

What I took away about data cleaning from this course:

  • The importance of checking for missing values

  • Having the proper data types for the column variables

  • Dropping duplicate data

  • Combining data frames



Summarizing with dplyr in R

I loaded the dplyr and NHANES packages with the library() function. The dataset I used came from the National Center for Health Statistics (NCHS).

I used the filter() function to select rows, and the mean() and sd() functions to compute the average and standard deviation. I also used the min(), max(), arrange(), and group_by() functions, along with the %>% pipe operator.



Phyllotaxis in R programming language

Link to my GitHub repo

    In this project I used R, some math, and the data visualization package ggplot2 to simulate phyllotaxis, the spiral-governed arrangement of leaves on a plant stem. I worked in a Jupyter notebook.

    The first thing I did was call options() to set the image size. Then I loaded the ggplot2 package using the library() function.

    In the second step I initialized variables for the sine and cosine values, computed with sin() and cos() respectively. I put the sin() values on the x axis and the cos() values on the y axis. Then I made a simple scatterplot of points arranged in a circle.

    In the third step I defined variables for the number of points and the golden angle. I used the formula pi * (3 - sqrt(5)) for the angle variable and set the points variable to 500.
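    As a quick check on that formula, the golden angle works out as follows in radians and in degrees (values rounded):

```latex
\[
\theta = \pi\,(3 - \sqrt{5}) \approx 3.14159 \times 0.76393 \approx 2.39996 \ \text{rad}
\]
\[
\theta \cdot \frac{180^\circ}{\pi} = 180^\circ\,(3 - \sqrt{5}) \approx 137.51^\circ
\]
```

    Rounded to one decimal place this is the 2.4 that step eight starts from.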

    In step four I used the theme() function with element_blank() as an argument to remove some components of the plot.

    In step five I set the size, color, and alpha attributes of the plot.

    In step six I added a shape attribute and moved size into the aes() call as its first argument. The aes() call was passed as the first argument to geom_point().

    In step seven I set the shape attribute to 17, which maps to a filled triangle, and the color to “yellow”.

    In step eight I changed the angle between the points from 2.4 to 2.0 to get a very different image.

    In step nine I modified some elements and set the angle to 13 * pi / 180.
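    To show how the angle alone changes the pattern, here is a minimal Python sketch of the point coordinates. It is not the R code from the project. It assumes that point k sits at angle t = k * angle and at a distance t from the origin, with sin on x and cos on y as in step two; the function name spiral_points and its parameters are hypothetical.

```python
import math

def spiral_points(n_points=500, angle=math.pi * (3 - math.sqrt(5))):
    """Return (x, y) pairs for a simple phyllotaxis-style spiral.

    Assumption: point k has angle t = k * angle and radius t.
    """
    pts = []
    for k in range(1, n_points + 1):
        t = k * angle
        # sin on the x axis, cos on the y axis, scaled by the radius t.
        pts.append((t * math.sin(t), t * math.cos(t)))
    return pts

golden = spiral_points()                              # step three: golden angle
step_eight = spiral_points(angle=2.0)                 # step eight: angle 2.0
step_nine = spiral_points(angle=13 * math.pi / 180)   # step nine: 13 degrees
```

    Plotting any of these lists as a scatterplot gives a spiral; the three angles differ only in how far each new point is rotated from the previous one.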




My Stratego obsession

Who remembers the classic board game Stratego from the ’70s? Well, now you can play it online for free.

This is my profile, which shows my rank, wins and losses, and achievements. The achievements are split between profile and battle. I currently have 119 wins; when I reach 500 wins I will earn a 500-wins battle achievement.