In this analysis, we create a histogram of the average number of passes attempted per game during the 2014 World Cup in Brazil. The number of passes attempted in a game is an important measure of team performance and ball control. Good teams tend to pass the ball more and play a more poised game, whereas teams that lack control tend to hurry their kicks. Germany and Brazil are generally among the top teams that attempt more passes than other teams. View the code on Gist.
In this analysis, we look at the distribution of average game attendance at the World Cup. We restricted the data to 1970 onward to give an overall distribution of relatively recent attendance data. The analysis begins with basic data manipulation in pandas, after which we use Seaborn to plot the distribution. The analysis shows that the 1994 World Cup in the USA had the highest attendance of the modern era, followed by the most recent tournament, Brazil 2014. View the code on Gist.
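The filter-then-plot workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the Gist: the column names `year` and `attendance`, the helper `plot_distribution`, and every number in the table are placeholders invented for the example, and `sns.histplot` assumes seaborn 0.11 or later.

```python
import pandas as pd

# Placeholder rows for illustration only: these are NOT real attendance figures.
matches = pd.DataFrame({
    "year": [1966, 1970, 1970, 1994, 1994, 2014, 2014],
    "attendance": [100, 200, 400, 600, 800, 500, 700],
})

# Keep tournaments from 1970 onward, then average the attendance per tournament.
modern = matches[matches["year"] >= 1970]
avg_attendance = modern.groupby("year")["attendance"].mean()


def plot_distribution(values):
    """Draw a histogram with a density curve (hypothetical helper)."""
    # Imported lazily so the data preparation above runs without a plotting stack.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.histplot(values, kde=True)
    ax.set(xlabel="Average attendance per game", ylabel="Number of tournaments")
    plt.show()
```

Calling `plot_distribution(avg_attendance)` would then draw the distribution; with real data the table would hold one row per match.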
For the last few years, migrant deaths have made the news headlines. The horror of many migrants dying in the Mediterranean Sea while trying to reach Europe in small boats has shocked many around the world and remains a great concern for many nations. In this analysis, we show the distribution of sub-Saharan African migrant deaths in various parts of the world using data from the Missing Migrants Project. The analysis suggests that most sub-Saharan African migrant deaths occur in the Mediterranean Sea. In particular, the year 2016 registered over 5000 deaths. That number fell substantially in 2017, but the crisis is still ongoing. The region with the second-highest number of African migrant deaths is North Africa, mainly Libya, which has become a lawless country where jihadists kill and enslave migrants. View the code on Gist.
In this post I discuss a bug that I found in the statistics module (new in version 3.4) of Python 3.6.4, concerning the computation of the median, median_high and median_low functions when missing values (NaN, nan) are present in the data. I reported this bug to the Python team and investigated it with different data sets, with all results pointing to a bug in the current version. You are invited to leave a comment if you have any questions about this post. View the code on Gist.
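One way to sidestep the problem is to drop the missing values before calling the functions. The sketch below is an illustration under that assumption and is not the code from the Gist; `median_ignoring_nan` is a hypothetical helper name. The background is that `statistics.median` sorts its input, and a NaN compares as false against every number, so the sorted order (and therefore the reported median) can depend on where the NaN sits in the data.

```python
import math
import statistics


def median_ignoring_nan(values, func=statistics.median):
    """Apply a median function to the values that are not NaN.

    Hypothetical helper: raises statistics.StatisticsError if nothing is left.
    """
    cleaned = [v for v in values if not math.isnan(v)]
    return func(cleaned)


data = [3.0, float("nan"), 1.0, 2.0, 4.0]

# All three results are computed from the four non-missing values only.
mid = median_ignoring_nan(data)
low = median_ignoring_nan(data, statistics.median_low)
high = median_ignoring_nan(data, statistics.median_high)
```

Filtering first makes the result independent of the position of the missing values, at the cost of silently shrinking the sample, so it is worth reporting how many values were dropped.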
These code snippets will guide you through uploading and downloading files to and from Amazon S3 buckets with the required permissions to read and write objects. View the code on Gist.
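As a rough sketch of what such snippets can look like with the boto3 library (an assumption, since the excerpt does not say which client the Gist uses): the helper names and the bucket and key names below are hypothetical. The credentials in use would need permission to write and read objects in the bucket, typically the `s3:PutObject` and `s3:GetObject` actions.

```python
def make_s3_client():
    """Create an S3 client from the default AWS credential chain."""
    # Imported lazily so the helpers below can be exercised without boto3 installed.
    import boto3

    return boto3.client("s3")


def upload_to_s3(client, local_path, bucket, key):
    """Upload a local file to s3://bucket/key (needs write permission)."""
    client.upload_file(local_path, bucket, key)


def download_from_s3(client, bucket, key, local_path):
    """Download s3://bucket/key to a local file (needs read permission)."""
    client.download_file(bucket, key, local_path)


# Example usage (hypothetical names; requires valid AWS credentials):
# client = make_s3_client()
# upload_to_s3(client, "report.csv", "my-example-bucket", "reports/report.csv")
# download_from_s3(client, "my-example-bucket", "reports/report.csv", "copy.csv")
```

Passing the client in as an argument keeps the helpers easy to test with a stand-in object and makes it explicit which credentials are being used.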
For those of you interested in learning how to leverage the power of the Python Seaborn and Matplotlib libraries to build high-quality professional visualizations that answer many relevant Data Science questions, my new course, Data Visualization and Descriptive Statistics with Python 3, is the solution. The course uses a set of real-world data sources from the United Nations, sports and government to showcase the visualizations. In addition to data visualization, the second part of the course focuses on how to effectively analyze and compute descriptive statistics in Python 3. It shows how to leverage each of the libraries numpy, statistics, scipy and pandas to compute descriptive statistics. You can take the course by clicking on any of the links above for just 10% of the regular price, a reduction of about 90%. Register for the course for just $9.99. View the code on Gist.
Testing statistical hypotheses is an integral part of Data Science. In my new course, I use real-world data sets to test parametric and nonparametric hypotheses using Python 3. The course has several strengths that should not be ignored. It is hands-on, uses real-world data and focuses on testing statistical hypotheses using Python 3. It is taught by an Adjunct Professor of Statistics who taught statistics for twelve years. It is extensive and covers all aspects of testing statistical hypotheses using Python. It uses Jupyter notebooks and Markdown to clearly document the code and make it professional. It uses LaTeX to write out the statistical hypotheses, helping users understand what is being tested. In this course, you will learn how to test various statistical hypotheses using Python 3. The course covers the most relevant tests about population parameters for one, two and many samples. In addition, the course covers ANOVA (Analysis of Variance) and many nonparametric tests. The course is hands-on, with real-world datasets that help students understand how to carry out the various tests. View the code on Gist.
In this article, I visualize the prevalence of undernourishment using data from the Food and Agriculture Organization of the United Nations (FAO). The visualization uses data for the African and Asian continents. For each continent, we also use the world undernourishment statistics as a baseline reference point. According to FAO, “The prevalence of undernourishment is an estimate of the proportion of the population whose habitual food consumption is insufficient to provide the dietary energy levels that are required to maintain a normal active and healthy life. It is expressed as a percentage. Undernourishment means that a person is not able to acquire enough food to meet the daily minimum dietary energy requirements, over a period of one year. FAO defines hunger as being synonymous with chronic undernourishment.” The data used for this analysis cover 2005 to 2017, although measurements were available for only six of those years. For this analysis, I use a stacked bar chart with the prevalence of undernourishment plotted inside each region's section. This allows an easy comparison across years and also within a year for each region. The analysis of the prevalence of undernourishment in the African […]