Data Analytics Complete Reskill Certificate Program - Yayasan Peneraju (12.12.2023 to 21.12.2023)

My past experience as a Digital Marketer, Property Valuer, and Property Operation Executive has made me a fairly logical analyst, yet I found the R language to be another level of analysis I have yet to master. It is, without doubt, one of the best tools for extracting, exploring, and visualizing data efficiently and attractively, and it serves many business areas, including Real Estate.

Data analysis for project Real Estate Housing Price Prediction-2.png

My team and I decided to take on the Real Estate dataset. We held a series of serious discussions about our understanding of the business before we started Data Preprocessing, Data Exploration, and Data Visualization in Google Colab. Determining the dependent column and the independent columns from the get-go also helped keep us, as data analysts, on the right track!

Coding R in Google Colab is easier to handle. I reached a basic understanding and application of the R language in just 4 days.

I love the way Dr Sara ensures nobody is left behind. She always makes sure beginners like me can follow the class, while those with intermediate and advanced knowledge know which advanced formulas to apply and how. The one thing she always advised us in class: BE BOLD WITH YOUR ANALYSIS! Being bold is what I tried to do the entire time. And my classmates were all very supportive and cooperative.

Communication is key. Listening is the way. So is intense focus. Without good cooperation with each other in class, none of us could have finished what we started or gained a good understanding of R.

As always,
Emma Shahrir.

My journey into data analytics and R programming began when I applied for an upskilling program at AirAsia Academy and finally got the opportunity to join the classes. I would say this journey has been valuable for me because Dr. Sara has a gift for explaining both the theory and the practice of data analytics and R programming very well. I like her intonation when she teaches: her tone is not monotone, and she knows how to make complete beginners like us pay attention. Moreover, her storytelling style helps us as students grasp the information very well.

A specific moment that stands out is when I was assigned to a team and we had to complete a mini R project. The project was due less than 24 hours after we had learnt the theory and practice of R programming using Google Colaboratory. As a team, we chose the Olympic dataset for our mini R project.

We changed our problem statement a few times because we weren't sure which columns we could use to relate the data to the graphs we would later plot using ggplot. However, it was fascinating to delve into the world of the Winter Olympics. As a team, we decided to focus specifically on the participation of male and female athletes from the United States during the winter seasons.

There were missing values in the Age, Height, and Weight columns of the roughly 70,000-row dataset, so we decided to replace the missing values with each column's average.
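A minimal sketch of this mean-imputation step in base R. The tiny `olympictable` below is a made-up stand-in for the real dataset, used only so the example runs on its own.

```r
# Toy stand-in for the Olympic table; the values are invented for illustration.
olympictable <- data.frame(
  Age    = c(24, NA, 30, 22),
  Height = c(180, 175, NA, 165),
  Weight = c(NA, 70, 82, 58)
)

# Replace each NA with the mean of the non-missing values in the same column.
for (col in c("Age", "Height", "Weight")) {
  col_mean <- mean(olympictable[[col]], na.rm = TRUE)
  olympictable[[col]][is.na(olympictable[[col]])] <- col_mean
}

# Count the remaining missing values per column.
remaining_na <- colSums(is.na(olympictable))
print(remaining_na)
```

Mean imputation keeps every row but pulls the filled values toward the centre of the column, so it is worth noting in the write-up that the averages were used.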

Next, we converted the categorical columns into factors to check whether there were any missing values in Sex, Season, Sport, City, and NOC.
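The factor conversion and missing-value check could look roughly like this. It is a sketch on invented rows, and only three of the categorical columns are shown to keep it short.

```r
# Toy stand-in for the Olympic table; the values are invented for illustration.
olympictable <- data.frame(
  Sex    = c("M", "F", NA, "M"),
  Season = c("Winter", "Summer", "Winter", "Winter"),
  NOC    = c("USA", "CAN", "USA", NA),
  stringsAsFactors = FALSE
)

# Convert the chosen categorical columns to factors in one step.
cat_cols <- c("Sex", "Season", "NOC")
olympictable[cat_cols] <- lapply(olympictable[cat_cols], factor)

# summary() on a factor lists the count per level and reports NA's if any exist.
print(summary(olympictable$Sex))

# Number of missing values in each categorical column.
na_counts <- colSums(is.na(olympictable[cat_cols]))
print(na_counts)
```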

Initially, the Olympic dataset had around 70,000 rows, but we narrowed it down to the USA during winter only and stored the result as usa_winter_data using the R command below:

usa_winter_data <- subset(olympictable, NOC == "USA" & Season == "Winter")

To remove the unwanted columns, we loaded dplyr with library(dplyr) and created a new Olympic table containing only the columns we wanted to focus on: Sex, Age, NOC, Year, Season, City, and Sport.
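As a sketch, the column selection might be written with `dplyr::select()` as below. The rows and the name `new_olympic` are assumptions for illustration; the post does not show the actual command that was run.

```r
library(dplyr)

# Toy stand-in for usa_winter_data; the rows are invented for illustration.
usa_winter_data <- data.frame(
  ID     = 1:3,
  Name   = c("A", "B", "C"),
  Sex    = c("M", "F", "M"),
  Age    = c(24, 28, 31),
  Height = c(180, 165, 175),
  NOC    = "USA",
  Year   = c(1994, 1998, 2002),
  Season = "Winter",
  City   = c("Lillehammer", "Nagano", "Salt Lake City"),
  Sport  = c("Ice Hockey", "Curling", "Skeleton")
)

# Keep only the columns of interest; everything else is dropped.
new_olympic <- usa_winter_data %>%
  select(Sex, Age, NOC, Year, Season, City, Sport)

print(names(new_olympic))
```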

Further data cleaning was done by finding the duplicated rows in the new Olympic table and removing them. The new data was then checked again to confirm that no duplicates remained.
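One way to do this find, remove, and re-check cycle in base R is with `duplicated()`. The table below is a made-up example, and `new_olympic` is an assumed name.

```r
# Toy table with one exact duplicate row (row 3 repeats row 1).
new_olympic <- data.frame(
  Sex   = c("M", "F", "M", "F"),
  Age   = c(24, 28, 24, 28),
  Sport = c("Skeleton", "Curling", "Skeleton", "Luge")
)

# duplicated() flags every row that repeats an earlier row.
n_dupes <- sum(duplicated(new_olympic))

# Keep only the first occurrence of each row.
new_olympic <- new_olympic[!duplicated(new_olympic), ]

# Re-check: this should be FALSE once the duplicates are gone.
still_duplicated <- any(duplicated(new_olympic))
```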

Data visualization was the last goal of this mini R project. I would say this part was the most challenging, because we had to choose which variables to plot and form a hypothesis. In most cases, when plotting a graph in R, the numerical value is placed on the y-axis, representing the dependent variable, while the categorical value is placed on the x-axis, representing the independent variable. However, it's important to note that the axes can be switched depending on the specific analysis or visualization we want to achieve. Choosing which variable goes on which axis is the challenging part, as it depends on the nature and goal of the graph.
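To illustrate the axis convention, here is a small ggplot2 sketch with the categorical variable on the x-axis and the numerical one on the y-axis, followed by the same plot with the axes switched. The data and the choice of a box plot are assumptions, not the team's actual chart.

```r
library(ggplot2)

# Toy data; the values are invented for illustration.
usa_winter_data <- data.frame(
  Sex = c("M", "M", "F", "F", "M", "F"),
  Age = c(24, 31, 22, 27, 29, 25)
)

# Categorical variable (Sex) on x, numerical variable (Age) on y.
p <- ggplot(usa_winter_data, aes(x = Sex, y = Age)) +
  geom_boxplot() +
  labs(x = "Sex", y = "Age", title = "Age by sex")

# Same plot with the axes switched.
p_flipped <- p + coord_flip()

# Use print(p) or print(p_flipped) to display the plots.
```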

one variable.png

two variable.png

one variable flip 2.png

two variable betul.png

Four Variables (Two Numerical & Two Categorical)

four variables.png

  1. During the Winter Olympics, male athletes from the USA have a higher representation than female athletes.

  2. There are three sports where male athletes have a significant presence: Ski Jumping, Skeleton, and Nordic Combined.

Five Variables
five variables.png

1b4738cf-9a3e-48cf-810d-e1393254fd42-image.png

We have gained knowledge of data mining and exploratory data analysis by using R programming. Using this skill, we successfully answered our question about the factors affecting gross. We found that Certificate and movie duration are the real factors.

How did your journey into data analytics and R programming begin?
To be honest, I had zero idea how it works at first, but now I have a rough idea of how it works, though I still need more practice to get familiar with it.
It is quite hard, and I know it is going to take me more time, but I will try not to give up.
Let's walk together.

I would like to say thank you to Dr. Sara for being so patient in guiding us and for making this course so much fun.

In addition, I would like to say how lucky I am to be here.

As for our final mini project, to be honest it was not easy for me, but I am so grateful to my teammates for being so helpful, even though I might have dragged the team a bit since my pace is a bit slow in this area.

Anyway the end is just a new beginning.

graph.png

@saranyaravikumar

Sorry Dr Sara, we have mistakenly 'replied as a topic' to your post for our visualisation data presentation yesterday. We will reply here again with the results of our Colab.

First of all, our group consists of:

  1. Nur Fera Ereen
  2. As Noor Aqilah
  3. Nur Hidayah
  4. Nazatul Shahirah

We have selected the HR_Data.

Problem Statement: To identify whether the duration of the last job impacts an employee's decision to change jobs.

f6582a7e-308b-43a4-a56d-c29d6b6cacd5-image.png

Our dataframe has six variables:

  • gender: Categorical variable representing the gender of the individuals (e.g., "Male," "Female," "Other").

  • relevent_experience: Categorical variable indicating whether the individual has relevant experience (e.g., "Has relevent experience," "No relevent experience").

  • experience: Numeric variable representing the years of experience.

  • last_new_job: Numeric variable indicating the time since the last job change.

  • company_size: Numeric variable representing the size of the company.

  • target: Binary variable (0 or 1) indicating whether the individual is the target of interest.

Four Variables (Two Numerical, Two Categorical)
4 variables.png
Interpretation: This grouped bar plot visualizes the interaction between company size, last new job, gender, and relevant experience.

Conclusion: The plot suggests that the distribution of last new job changes varies across different company sizes, and this relationship differs between genders and relevant experience groups.
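A grouped bar plot of these four variables could be sketched in ggplot2 as follows. The rows are invented, and mapping gender to the fill colour and relevant experience to facets is an assumed layout, since the original chart code is not shown.

```r
library(ggplot2)

# Toy HR data; the values are invented for illustration.
hr_data <- data.frame(
  gender = c("Male", "Female", "Male", "Female", "Male", "Female"),
  relevent_experience = c(
    "Has relevent experience", "Has relevent experience",
    "No relevent experience", "No relevent experience",
    "Has relevent experience", "No relevent experience"
  ),
  company_size = c(50, 50, 500, 500, 5000, 5000),
  last_new_job = c(1, 2, 3, 1, 4, 2)
)

# Bars grouped by gender within each company size, one panel per experience group.
p <- ggplot(hr_data, aes(x = factor(company_size), y = last_new_job, fill = gender)) +
  geom_col(position = "dodge") +
  facet_wrap(~ relevent_experience) +
  labs(x = "Company size", y = "Last new job", fill = "Gender")

# Use print(p) to display the plot.
```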

Five Variables (Three Numerical, Two Categorical)
5 variables.png

Interpretation: This scatter plot shows the relationships between experience, last new job, and target, with points colored and shaped according to gender and relevant experience.

Conclusion: The plot provides a multidimensional view of the dataset, allowing for the examination of multiple variables simultaneously.
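A five-variable scatter plot along these lines might be built as below. Colour and shape follow the description above, while showing target as separate panels is an assumption made for this sketch, as is the toy data.

```r
library(ggplot2)

# Toy HR data; the values are invented for illustration.
hr_data <- data.frame(
  experience   = c(2, 5, 8, 12, 15, 20),
  last_new_job = c(1, 1, 2, 3, 4, 4),
  target       = c(1, 1, 0, 0, 0, 1),
  gender       = c("Male", "Female", "Male", "Female", "Male", "Female"),
  relevent_experience = c(
    "Has relevent experience", "No relevent experience",
    "Has relevent experience", "Has relevent experience",
    "No relevent experience", "Has relevent experience"
  )
)

# x and y carry two numerical variables, colour and shape carry the two
# categorical ones, and the panels split the points by target (0 or 1).
p <- ggplot(hr_data, aes(x = experience, y = last_new_job,
                         colour = gender, shape = relevent_experience)) +
  geom_point(size = 3) +
  facet_wrap(~ target) +
  labs(x = "Experience (years)", y = "Last new job",
       colour = "Gender", shape = "Relevant experience")

# Use print(p) to display the plot.
```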

Findings:
We have identified that the duration of the last job does have an impact on an employee's decision to change jobs. As the duration of their last job and their experience increase over the years, they have a greater tendency to stay in their jobs.

Thank you for your teaching. We truly appreciate your effort and patience in guiding us through! You have made us super interested in R coding 🙂 ❤

Mini Project-Heart

  1. Thanks to Dr. Saranya for the valuable knowledge in R programming.
  2. From this mini project we could apply the knowledge, see which parts we missed or are weak in, and improve on them.
  3. Our advice to those who read this is: don't stop practicing and never give up.

Attached is our team data from the heart mini project.

  • Yatt
  • Zulhusaini
  • Azmi
  • Ahmad Ariff

download.png download (1).png download (2).png download (3).png download (4).png download (5).png

@saranyaravikumar

Dear Dr. Sara, the whole AirAsia Academy community, and whoever is going to read this,

I feel really amazing to have been given the opportunity to gain deeper knowledge about data analytics from an expert like Dr. Sara. Thank you so much for a really great experience on this journey. I am taking the Data Analytics course because I want to gain deeper knowledge of how data can influence our lives.

On the first day of class, Dr. Sara's introduction really blew my mind: data can make a big change in our lives. Before this, I read a lot of data analysis summaries but didn't know the process behind them.

For example, in stock market trading we always need to read summaries of daily, weekly, and yearly prices. Now I know how to summarize the data using R and start finding correlations in it.

In our mini project, I asked Dr. Sara many questions to better understand the process of finding correlations between our variables, and which variables are dependent and which are independent. Understanding the data well is the key to solving problems and finding outcomes.

Our dataset is about drug classification: how different types of drug affect blood pressure and cholesterol level across different ages and genders. We analyzed the effect of drug type on patients with heart conditions.

My group consists of
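As a small illustration of summarizing prices and checking correlation in R, the sketch below uses two invented price series; `stock_a` and `stock_b` are hypothetical names, not data from the course.

```r
# Toy daily closing prices for two hypothetical stocks.
prices <- data.frame(
  day     = 1:6,
  stock_a = c(10.0, 10.5, 10.2, 11.0, 11.4, 11.1),
  stock_b = c(20.0, 20.8, 20.5, 21.9, 22.6, 22.0)
)

# Five-number summary plus the mean for one price series.
print(summary(prices$stock_a))

# Pearson correlation between the two series (a value between -1 and 1).
r <- cor(prices$stock_a, prices$stock_b)
print(r)
```

A value of `r` close to 1 means the two series tend to rise and fall together, which is a starting point for analysis rather than proof that one drives the other.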

  1. NUR AIN SHAFIKA BINTI ZAKARIA
  2. NUR HANIAH
  3. NUR AINA MUNIRAH
  4. SITI SABIHAH

Dataframe
Screenshot_20231217_215628_Chrome.jpg

Data Understanding and Preprocessing

From the dataframe we can see two types of data: numerical (integer and double) and character (binary, nominal, and identity). Character/categorical columns can be converted to numerical and summarized. There are no missing values or duplicates, so we proceeded to data visualization.
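These checks might look like the following in base R. The column names and rows are assumptions for illustration, and encoding BP as ordered integer codes is just one possible way to convert a categorical column to numbers.

```r
# Toy drug data; the column names and values are assumed for illustration.
drug_data <- data.frame(
  Age         = c(23L, 47L, 28L, 61L),
  Sex         = c("F", "M", "M", "F"),
  BP          = c("HIGH", "LOW", "NORMAL", "LOW"),
  Cholesterol = c("HIGH", "HIGH", "NORMAL", "NORMAL"),
  Na_to_K     = c(25.4, 13.1, 7.8, 18.0),
  Drug        = c("DrugY", "DrugC", "DrugX", "DrugY"),
  stringsAsFactors = FALSE
)

# Inspect the type of every column (integer, double, character).
str(drug_data)

# Check for missing values and duplicated rows.
n_missing <- sum(is.na(drug_data))
n_dupes   <- sum(duplicated(drug_data))

# Convert a categorical column to numeric codes with an explicit level order.
drug_data$BP_code <- as.integer(
  factor(drug_data$BP, levels = c("LOW", "NORMAL", "HIGH"))
)
```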

SmartSelect_20231217_220617_Chrome.jpg

Data Visualization using ggplot

SmartSelect_20231215_221425_Chrome.jpg SmartSelect_20231215_220525_Chrome.jpg

What we can see here is that even though Drug Y has the highest Na to K value, it appears across different blood pressure levels, while Drug A and Drug B show a high blood pressure level. This means that other factors, not necessarily related to the Na to K ratio, may affect blood pressure, such as a direct psychological effect on heart rate, electrolyte and mineral balance in the body, and metabolic processes (fluid retention, blood volume).

SmartSelect_20231215_223111_Chrome.jpg

As with blood pressure, different drugs have different effects on cholesterol level. From the graph we can see that Drug C shows a significantly high cholesterol level while Drug X shows a normal level. Both have a lower Na to K ratio. This may be because certain drugs have a direct effect on lipid metabolism, leading to increased cholesterol synthesis or decreased cholesterol clearance.

SmartSelect_20231215_232543_Chrome.jpg

Many other factors need to be considered when relating age to the Na to K ratio, such as lifestyle and dietary patterns, genetics, and overall health status.

SmartSelect_20231215_234526_Chrome.jpg

SmartSelect_20231215_234750_Chrome.jpg

Age Vs BP

SmartSelect_20231216_002653_Chrome.jpg

Age Vs CL

SmartSelect_20231216_003926_Chrome.jpg

Blood pressure and cholesterol level are not necessarily related to age.

Drug Vs Sex

SmartSelect_20231216_004912_Chrome.jpg

More males take Drug A, B, and C.
More females take Drug Y.
Drug X, meanwhile, is suitable for both males and females.

SmartSelect_20231216_005551_Chrome.jpg

Patients prescribed DrugA and DrugB have a higher BP level but a lower Na to K ratio.
Patients prescribed DrugC have a low BP level and a low Na to K ratio.
Patients prescribed DrugX have a normal BP level and a low Na to K ratio.
Patients prescribed DrugY have the highest Na to K ratio but varying BP levels.

SmartSelect_20231216_012911_Chrome.jpg

Each type of drug has its own effect on BP, CL, and the Na to K ratio in the blood. Doctors need this kind of data analysis to prescribe the best medication based on an individual's health status.

SmartSelect_20231216_085507_Chrome.jpg

That's all of our findings on the given drug classification dataset.
Last but not least, I just want to quote some interesting words as inspiration for myself and the whole data analytics community: "Big Data isn't about bits, it's about talent." And as Dr. Sara always says, "BE BOLD".

~ayyinzakaria~

My team and I decided to take on the Sample Superstore dataset. We held a series of serious discussions about our understanding of the business before we started Data Preprocessing, Data Exploration, and Data Visualization in Google Colab, beginning by determining the dependent and independent columns.

We carried out the data cleaning process by identifying missing data and duplicate data. After that, we applied data scaling to standardize the values.

The final step was visualizing and interpreting the data.

Thank you so much to Dr Saranya for being so patient and helpful in guiding us along the journey.
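A minimal sketch of the scaling step using base R's `scale()`, which centres each column to mean 0 and rescales it to standard deviation 1. The `Sales` and `Profit` values are invented, and z-score standardization is an assumption about which scaling method was used.

```r
# Toy Superstore-style data; the values are invented for illustration.
superstore <- data.frame(
  Sales  = c(100, 250, 400),
  Profit = c(10, -5, 60)
)

# scale() returns a matrix, so convert it back to a data frame.
scaled <- as.data.frame(scale(superstore))

# After standardizing, each column has mean ~0 and standard deviation 1.
col_means <- colMeans(scaled)
col_sds   <- apply(scaled, 2, sd)
print(scaled)
```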

Group Members :

  • Noor Nabilah Shahida Binti Fadhil
  • Muhammad Zulfadhli Bin Aswadi
  • Aiman Haziq Bin Abdul Jabar
  • Nik Mohd Aiman Bin Alwai

Screenshot 2023-12-19 004138.png Screenshot 2023-12-19 004112.png Screenshot 2023-12-19 004058.png Screenshot 2023-12-19 004027.png Screenshot 2023-12-19 004010.png Screenshot 2023-12-19 003946.png

@saranyaravikumar
First of all, I would like to thank Dr @saranyaravikumar for guiding us through the boot camp and also AirAsia for the opportunity here.
My journey into the world of data analytics and R programming has been wonderful from the start. It feels like a dream come true, since it was just a month ago that I decided to go deeper into data analytics. I also had a terrific time attending physical classes and meeting new people who are helping me start my new chapter in the data world. I was thrilled, because demand for data analytics is expected to be strong in the coming year.

I've been collaborating with my teammates, @Yushi 😃 & Haziq MudZakir 😀, on an R programming project about telecom provider churn. This was the first time I had heard the term "churn," and it gave me a rollercoaster ride because I needed to complete the job. Nevertheless, in a short amount of time, we were able to excel in the presentation. This is our video presentation on the churn dataset analysis, along with a few sample visualizations that we managed to create.
Churn in R Programming Project

The number of users by state in the USA
7e5cc820-c1f1-4578-99a5-0c2c4d581fde-OneVariableCat.jpg

The number of users by state vs the count of users that churned
b86fe8ca-beff-4896-aff9-95be8fb0a276-TwoVariable3.jpg
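The counts behind these two charts could be computed roughly as follows in base R. The `State` and `Churn` columns and their values are assumptions for illustration, not the real telecom dataset.

```r
# Toy churn data; the column names and values are assumed for illustration.
churn_data <- data.frame(
  State = c("NY", "NY", "CA", "CA", "CA", "TX"),
  Churn = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Number of users in each state.
users_by_state <- table(churn_data$State)
print(users_by_state)

# Number of churned users in each state (summing TRUE values counts them).
churn_by_state <- aggregate(Churn ~ State, data = churn_data, FUN = sum)
print(churn_by_state)
```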

Five Variable Chart
f6250170-c93d-4cf1-8e5e-216ff156ada8-FiveVariable.jpg

Still, it's a priceless memory that I might not be able to recreate anywhere. Everyone I've met here has been kind and eager to offer their knowledge and experiences.

My advice to all of my friends is that to learn something new, we must be passionate about the course or subject. That will then motivate you to give it your all.