My Journey to Data Science

Zion Oladiran
3 min read · Jul 21, 2021

“The best way to learn data science is to do data science.” — Chanin Nantasenamat

In this post, I will share my learning process with you and introduce you to data analysis. The first thing I did was find a guide to help me work towards my goals. I’m really grateful to my mentor for putting me through. She did a good job of pointing out some useful resources. I strongly advise you to get yourself a mentor too.

What is Data Analysis?

It is the process of inspecting, cleansing, transforming, and modeling data to discover useful information, inform conclusions, and support decision-making.

For the inspection, cleansing, and transformation steps, I’ll be using Python and the pandas library. I will explain this in detail in my next post.

Data Analysis Process

[Figure: Data Analysis Lifecycle/Process]

1. Data Collection

This is the process of gathering quantitative and qualitative information on specific variables in order to evaluate outcomes or derive actionable insights. Today, with the help of web and analytics tools, data can be collected from website traffic, server activity, and other relevant sources, depending on the project. Note that the quality of the data plays a major role in getting accurate predictions.

2. Data Extraction

Data extraction is the process of retrieving data from data sources for further processing or storage. The retrieved data should include only what will be useful for the analysis. This is an important step because excess data slows down operations and can lead to biased or inaccurate results.
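As a minimal sketch of keeping only the useful data, assuming pandas is installed: the CSV text, the column names, and the choice of which columns matter are all made up for illustration. The `usecols` parameter of `read_csv` loads just the listed columns.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a real source (file, database, API).
RAW_CSV = """order_id,customer,amount,internal_note
1,Ada,120.5,checked
2,Ben,80.0,pending
3,Cy,45.25,checked
"""

# Assumption: only these columns matter for the analysis.
NEEDED_COLUMNS = ["order_id", "amount"]

# usecols tells pandas to load just the listed columns and skip the rest.
orders = pd.read_csv(io.StringIO(RAW_CSV), usecols=NEEDED_COLUMNS)

print(orders)
```

With a real file, the same call would take a file path instead of the in-memory text.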

3. Data Preparation

To prevent false and misleading results, we need to check the dataset for anomalies like missing values or empty data, duplicate values, incorrect data types, invalid values, outliers, and non-relevant data. Such anomalies need to be treated before analyzing the data.
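Here is a rough sketch of those checks with pandas. The dataset is invented, and the choices made here (dropping rows with missing values, using the 1.5 × IQR rule for outliers) are just one reasonable option among several, not a fixed recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset containing the kinds of anomalies described above.
raw = pd.DataFrame({
    "name": ["Ada", "Ben", "Ben", "Cy", "Dee", "Eli", "Fay"],
    # Numbers stored as text, with one invalid entry.
    "age": ["34", "29", "29", "n/a", "31", "30", "33"],
    # One missing value and one extreme value.
    "salary": [52000, 48000, 48000, 50000, np.nan, 51000, 900000],
})

# 1. Inspect: count missing values per column and fully duplicated rows.
print(raw.isna().sum())
print("duplicate rows:", raw.duplicated().sum())

# 2. Remove exact duplicate rows.
clean = raw.drop_duplicates().copy()

# 3. Fix the data type: invalid entries such as "n/a" become NaN.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# 4. Drop rows that still have missing values (filling them is another option).
clean = clean.dropna()

# 5. Filter outliers with the 1.5 * IQR rule (a common convention, not the only one).
q1, q3 = clean["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range]

print(clean)
```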

4. Data Mining

This is the process of extracting usable information by identifying trends and patterns among the variables of a dataset. It gives us insight into how the variables correlate with one another. In other words, it helps us learn from the raw data before moving on to the final model building.
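One simple way to look at the relationship between variables is a correlation matrix. The sketch below assumes pandas and uses invented numbers; `DataFrame.corr()` computes the Pearson correlation by default, which only captures linear relationships.

```python
import pandas as pd

# Hypothetical data: hours studied, exam score, and hours slept for six students.
students = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 68, 72, 79],
    "hours_slept": [8, 6, 7, 5, 8, 6],
})

# Pearson correlation between every pair of numeric columns (values from -1 to 1).
correlations = students.corr()
print(correlations.round(2))

# Pull out a single pair to see how strongly the two variables move together.
study_vs_score = correlations.loc["hours_studied", "exam_score"]
print("hours_studied vs exam_score:", round(study_vs_score, 2))
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.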

5. Data Visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to identify and understand trends, outliers, and patterns in data.
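For example, a line chart makes a dip or a rising trend easy to spot. This is a small sketch assuming pandas and Matplotlib are installed; the sales figures and the output file name are made up.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file, so no display window is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly revenue figures.
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "revenue": [120, 135, 90, 160, 175],
})

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(sales["month"], sales["revenue"], marker="o")
ax.set_title("Monthly revenue (hypothetical data)")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
fig.tight_layout()

# Save the chart as an image; the file name is arbitrary.
fig.savefig("monthly_revenue.png")
```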

6. Model Building

Model building helps us reach a result by analyzing the most important factors and offering a solution to the problem. The major steps involved are creating the model, validating the model, and making predictions with the model. Various machine learning techniques, such as classification, regression, and clustering, are used to build the model.
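The three steps can be sketched with a very small regression example. This is only an illustration under stated assumptions: the data is synthetic, and NumPy's `polyfit` stands in for a full machine learning library such as scikit-learn, which is what is more commonly used in practice.

```python
import numpy as np

# Hypothetical data: a noisy straight-line relationship, roughly y = 3x + 5.
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = 3 * x + 5 + rng.normal(scale=1.0, size=x.size)

# Shuffle the row indices, then hold out 20% of the rows for validation.
order = rng.permutation(x.size)
train_idx, test_idx = order[:40], order[40:]

# 1. Create the model: fit slope and intercept on the training rows only.
slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)

# 2. Validate the model: R^2 on rows the model has not seen.
predicted = slope * x[test_idx] + intercept
residual_ss = np.sum((y[test_idx] - predicted) ** 2)
total_ss = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
r_squared = 1 - residual_ss / total_ss

# 3. Predict for a new input.
new_x = 12.0
new_y = slope * new_x + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
print(f"prediction for x={new_x}: {new_y:.2f}")
```

Keeping the validation rows out of the fitting step is the important part: it shows how the model behaves on data it has not seen.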

Skills for a Data Scientist

Before going into data science, there are some skills you need to have:

1. Programming in Python or R (either works).

2. Fluency with popular packages and workflows for data science tasks in your language of choice.

3. Statistics knowledge and methods.

4. Workflow and collaboration skills (Git, command line/bash, etc.)

Don’t be scared; you only need to be more intentional about learning, a step at a time!

Comparison between R programming and Python

Why Python for Data Analysis?

  1. Very simple & intuitive to learn.
  2. Powerful Libraries (not just for data analysis).
  3. Free and open source, with an active community, docs, and conferences.

When to choose R

  1. When dealing with advanced statistical methods.
  2. When extreme performance is needed.

I believe you now understand what data analysis is and the processes involved in analyzing data. I’ll be using some of the processes listed above to analyze a dataset in my next post. Feel free to comment and share too 😊


Zion Oladiran

Daughter of God || B(Eng.) Computer Engineering || Tech Enthusiast || Interested in Data Science, Machine Learning and AI