Generating Data With Jupyter Notebook


This post is inspired by a problem I was too busy (or lazy) to solve when I first ran into it, and which I sidestepped at the time by finding an easier solution. Nevertheless, I’ve come back to it because there are times when the easier solutions won’t go far enough towards giving you access to the types of data you might want to work with.

The problem was that I wanted a store of data to practice with in Jupyter Notebook. At the time I had none, so I thought about a way to generate some random(ish) data to get my practice in. The solution is kind of meta: I use Jupyter Notebook to generate data so that I can work on that data in Jupyter Notebook.
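To give a flavour of the idea, here is a minimal sketch of generating random(ish) practice data in a notebook cell using only the Python standard library. The "sales" schema, the column names and the helper names `generate_rows` and `rows_to_csv` are all hypothetical, made up for illustration; this is not necessarily the approach the full post takes.

```python
import csv
import io
import random


def generate_rows(n, seed=None):
    """Return n fake 'sales' rows as dictionaries (hypothetical schema)."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    regions = ["North", "South", "East", "West"]
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "id": i,
            "region": rng.choice(regions),
            "units": rng.randint(1, 50),  # inclusive on both ends
            "unit_price": round(rng.uniform(5.0, 100.0), 2),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = generate_rows(5, seed=42)
print(rows_to_csv(rows))
```

Passing a seed means the same "random" data comes back every time, which is handy when you want to rerun a practice analysis and get the same numbers.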


Python Distributions Package, Testing and Github


I’m starting from the end on this one: the work is done, and now I’m going to document some important elements of creating a Python package, running some tests to ensure that it works, and storing it on GitHub. I’ll start with GitHub, then work through some testing, and finally we’ll discuss the code.


Sunday Quicky #2: Git Bash Tab Completion Setup


Right now I’m working away on a bigger post, and part of the topic being covered is the use of Git. If you don’t know what Git is, follow the link, but basically it helps you keep track of and manage changes made to the files in a project. If you do any sort of coding on your machine and you’re not currently using version control, you really should consider it; it’s a game changer. There’s even a free course on Udacity to get you started. I did a previous version of it and it was most definitely worth it.


Could the Real “Probability Density Function” Please Stand Up

This is a quicky post, not really worthy of being a full post in and of itself because the topic is so short. However, I’m hoping that the wording and content might lead a weary Data Analyst traveler to the answer they seek when they are just trying to find the correct Probability Density Function and the web insists on showing multiple formulae.


Sunday Quicky #1 – Ingest an SQL file into MySQL


Edit: I ran into an issue when setting up phpMyAdmin subsequent to this post. Issue and solution explained at the end.

I was asked earlier today how to view a .sql file in a friendly manner. I didn’t quite get to the answer to that question, so I’ve added it to my list. However, while attempting to arrive at a solution, I thought that ingesting the .sql file into a database system, MySQL for example, might be a step in the right direction. To keep things somewhat friendly I didn’t want to get down to the command line level, and I thought that installing phpMyAdmin would help me avoid that. It did. I won’t bore you with the steps of how to set that up; in fact, I found an excellent resource for setting it up on my system (Manjaro, by the way), and I bet there’s a handy resource available for whatever system you happen to be running. A word of warning: unless you want the servers running constantly in the background, skip the steps that say to enable the services, and just start them when you need them.


Business Email Compromise and Email Header Analysis


I recently had cause to dive into Business Email Compromise (CEO Fraud, Supplier Fraud, email redirect etc.). That leads naturally to email header analysis, as it is the first step in trying to identify the author of the fraudulent emails. Having done the research, it would be a shame to let it go to waste. Let’s go!


Some Probability Distribution Problems


Over in the Udacity course I’m working my way through (AWS Machine Learning Scholarship Program), I came across my first batch of probability distribution problems. I thought that working through their solutions would be a good jumping-off point for the blog. I found the answer to the first problem below very difficult to find. Any of my readers who are seasoned Data Analysts or statisticians will have a smug chuckle when they read it, and I don’t blame them, because the answer is pretty obvious. I will, however, ask you to bear in mind that it is about 20 years since I worked consistently with any type of statistics. I’ve dipped in now and again since then, but it really is a use-it-or-lose-it skill. I haven’t been using it, so…

The first batch of problems I came across were in a quiz, in multiple-choice format. I’m going to list the questions and possible answers first and then walk through the terminology involved and how to solve the problems. You smug chucklers, see if you can spot why the answer to question 1 should have been instantaneous.


First Post: An Introduction

This is my maiden post, so before I get into things I thought I should introduce the reason for this blog. Essentially, I’m looking to drive my career towards Data Analysis (DA), or some associated field, but I don’t currently work in DA. In my current role a kind of analysis of data does get performed, but it’s not DA as it’s understood when companies are looking for Data Analysts. So I wanted to come up with a platform to demonstrate my passion for and knowledge of DA.

An obvious solution is to develop some sort of portfolio of work that I can point to as evidence that I at least have some chops. Credit where it’s due, I was inspired by a friend who began a blog on the subject he was passionate about in order to have something to point to during interview situations, among other reasons. Thanks to Maciej of And so the idea for this blog was birthed.
