What is Data Visualization?
Data Visualization is the representation of data in a visual format, and it is widely used in the field of data science. The data is usually visualized using graphs, plots, charts, etc. The various features of the data are represented pictorially for better understanding. Patterns and trends in large amounts of data can be understood and interpreted more easily with the help of a visual representation. It is important not only for Data Science or Data Analytics: in almost any career, it serves as a tool to understand data.
What is the need for Data Visualization?
We need Data Visualization to summarize the data and interpret its meaning. It is easier to understand the data by looking at a visual summary than by reading thousands of rows in an Excel sheet. It also makes it easier to communicate the data to others. In Data Science and Data Analytics, it is important to find trends or patterns and draw conclusions, and Data Visualization makes this far easier.
How is the Data Visualized?
How the data is visualized depends on the kind of inference we want to draw from it. Now that we have understood Data Visualization and the need for it, let us look at a few Data Visualization techniques.
Line Charts are used to show how a variable changes over time. In these plots, the X-axis usually represents time and the Y-axis represents the quantity under consideration.
Example: Line charts are used in time-series analysis, stock market analysis, population analysis, runs scored by a batsman in various years, etc. They are usually used when time is one of the variables.
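As a minimal sketch of a line chart, the snippet below plots runs scored per year. It assumes the matplotlib library (the article does not name a plotting library), and the numbers are made up purely for illustration.

```python
import matplotlib.pyplot as plt

# Made-up illustrative data: runs scored by a batsman in each year.
years = [2015, 2016, 2017, 2018, 2019, 2020]
runs = [540, 610, 480, 720, 690, 750]

fig, ax = plt.subplots()
# Time goes on the X-axis, the measured quantity on the Y-axis.
ax.plot(years, runs, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Runs scored")
ax.set_title("Runs scored per year")
fig.savefig("line_chart.png")  # or plt.show() in an interactive session
```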
Area charts are derived from Line Charts: the area under each line is filled to emphasize its magnitude. When there are several lines, the areas are filled with different colors to differentiate among them.
Example: These charts are usually used to compare the revenues of financial years, product sales per month, etc.
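An area chart could be sketched as follows, again assuming matplotlib. The two products and their monthly sales figures are hypothetical; `fill_between` shades the region between each line and zero.

```python
import matplotlib.pyplot as plt

# Made-up illustrative data: monthly sales for two hypothetical products.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
product_a = [12, 15, 14, 18, 21, 24]
product_b = [8, 9, 13, 12, 15, 17]
x = list(range(len(months)))

fig, ax = plt.subplots()
# Draw each line, then fill the area between the line and zero.
for sales, label in [(product_a, "Product A"), (product_b, "Product B")]:
    ax.plot(x, sales)
    ax.fill_between(x, sales, alpha=0.3, label=label)
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.set_ylabel("Units sold")
ax.legend()
fig.savefig("area_chart.png")
```

The partial transparency (`alpha=0.3`) keeps the overlapping areas readable.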
A bar chart is a graph that presents categorical data as rectangular bars whose heights are proportional to the values they represent. These charts work best for categorical data with a small number of categories. The difference between the categories can be easily seen from the heights of the bars.
Example: Comparison of marks scored by boys and girls, runs scored in every over in a cricket match, number of monthly sales, etc.
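A bar chart for a handful of categories might look like the sketch below. It assumes matplotlib, and the subjects and average marks are invented for illustration.

```python
import matplotlib.pyplot as plt

# Made-up illustrative data: average marks per subject.
subjects = ["Maths", "Science", "English", "History"]
average_marks = [78, 85, 69, 74]

fig, ax = plt.subplots()
# One bar per category; bar height is proportional to the value.
bars = ax.bar(subjects, average_marks)
ax.set_xlabel("Subject")
ax.set_ylabel("Average marks")
fig.savefig("bar_chart.png")
```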
Histograms are similar to bar charts, but they display the spread and distribution of continuous sample data. The range of the variable is divided into intervals called bins, which are laid out along the X-axis, and the Y-axis shows the frequency of observations that fall into each bin. Histograms are read in terms of areas rather than heights: the area of each bar is proportional to the number of observations in that bin.
Example: Histograms are usually used for distribution analysis, such as the distribution of income across people of various ages.
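The area-based reading of a histogram can be illustrated with a short sketch, assuming matplotlib and NumPy. The income sample is synthetic (drawn from a normal distribution with a fixed seed), not real data. With `density=True`, the bar areas sum to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic sample standing in for incomes (in thousands); not real data.
rng = np.random.default_rng(seed=0)
incomes = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots()
# density=True scales the bars so that their total area equals 1.
density, bin_edges, _ = ax.hist(incomes, bins=10, density=True, edgecolor="black")
ax.set_xlabel("Income (thousands)")
ax.set_ylabel("Density")
fig.savefig("histogram.png")

# Area of each bar = height * bin width.
bar_areas = density * np.diff(bin_edges)
print("Total area under the histogram:", bar_areas.sum())
```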
Scatter plots are used to represent the relationship between two variables. Each observation is drawn as a point, so clusters of similar observations, the overall trend, and the correlation between the variables can be seen at a glance.
Example: Comparing salaries and years of experience, comparing the height and weight of people, etc.
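A scatter plot of salary against years of experience could be sketched as below, assuming matplotlib and NumPy. The values are made up for illustration, and the Pearson correlation coefficient is computed with `np.corrcoef` to put a number on the visible trend.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up illustrative data: years of experience vs. salary (in thousands).
experience = [1, 2, 3, 4, 5, 6, 7, 8]
salary = [30, 34, 37, 43, 46, 52, 55, 61]

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(experience, salary)[0, 1]

fig, ax = plt.subplots()
ax.scatter(experience, salary)  # one point per observation
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary (thousands)")
ax.set_title(f"Correlation r = {r:.2f}")
fig.savefig("scatter_plot.png")
```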
Pie charts are used to illustrate percentages. Each variable is shown as a part of a whole, so proportions are easy to compare. The chart is drawn as a disc, and the different variables are shown as sectors of the circle.
Example: The percentage of people living in urban and rural areas, market share of various companies, etc.
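A pie chart of market share might be sketched as follows, assuming matplotlib. The companies and their shares are hypothetical. Each value becomes a sector whose angle is proportional to its share of the total.

```python
import matplotlib.pyplot as plt

# Made-up illustrative data: market share of hypothetical companies.
companies = ["Company A", "Company B", "Company C", "Others"]
shares = [40, 30, 20, 10]

fig, ax = plt.subplots()
# autopct prints each sector's percentage of the whole inside the sector.
ax.pie(shares, labels=companies, autopct="%1.1f%%", startangle=90)
ax.set_aspect("equal")  # keep the pie circular
ax.set_title("Market share")
fig.savefig("pie_chart.png")
```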
Box plots are used to draw statistical inferences from variables. A box plot shows the median, the quartiles, the interquartile range and the outliers. It makes it easy to compare variables with one another and across the levels of a categorical variable.
Example: Box plots are used to detect outliers in debits and credits, transportation problems, etc.
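A box plot that flags outliers could be sketched as below, assuming matplotlib. The debit and credit amounts are made up, and each list contains one deliberately extreme value. By default, matplotlib draws points lying more than 1.5 times the interquartile range beyond the quartiles as individual outlier markers ("fliers").

```python
import matplotlib.pyplot as plt

# Made-up illustrative data; each list contains one deliberately extreme value.
debits = [120, 135, 128, 140, 132, 125, 138, 130, 400]
credits = [200, 210, 190, 205, 215, 198, 207, 202, 20]

fig, ax = plt.subplots()
# One box per variable; the returned dict holds the drawn artists.
result = ax.boxplot([debits, credits])
ax.set_xticklabels(["Debits", "Credits"])
ax.set_ylabel("Amount")
fig.savefig("box_plot.png")

# Points drawn as outliers for each box.
debit_outliers = list(result["fliers"][0].get_ydata())
credit_outliers = list(result["fliers"][1].get_ydata())
print("Debit outliers:", debit_outliers)
print("Credit outliers:", credit_outliers)
```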
Data visualization is one of the most crucial steps in Data Science. It allows us to draw inferences from the data before building the actual models. Important trends, insights and generalizations could be missed if data visualization were not performed, and communicating the data properly would be much harder without it.
There are also other visualizations, such as heatmaps, KDE plots and ridgeline plots. Thus, data visualization can be seen as one of the important prerequisites for any Data Science or Data Analytics problem.