Top 50 matplotlib Visualizations – The Master Plots (with full python code)

Nov 28, 2018
Selva Prabhakaran

A compilation of the Top 50 matplotlib plots most useful in data analysis and visualization. This list lets you choose what visualization to show for what situation using Python's matplotlib and seaborn libraries.

Introduction

The charts are grouped based on the 7 different purposes of your visualization objective. For example, if you want to visualize the relationship between two variables, check out the plots under the 'Correlation' section. Or if you want to show how a value changed over time, look under the 'Change' section, and so on. An effective chart is one which:

Conveys the right and necessary information without distorting facts.
Is simple in design, so you don't have to strain in order to get it.
Has aesthetics that support the information rather than overshadow it.
Is not overloaded with information.

Related Posts: Matplotlib Full Tutorial, Matplotlib Subplots

Correlation

Scatter plot
Bubble plot with Encircling
Scatter plot with line of best fit
Jittering with stripplot
Counts Plot
Marginal Histogram
Marginal Boxplot
Correlogram
Pairwise Plot

Deviation

Diverging Bars
Diverging Texts
Diverging Dot Plot
Diverging Lollipop Chart with Markers
Area Chart

Ranking

Ordered Bar Chart
Lollipop Chart
Dot Plot
Slope Chart
Dumbbell Plot

Distribution

Histogram for Continuous Variable
Histogram for Categorical Variable
Density Plot
Density Curves with Histogram
Joy Plot
Distributed Dot Plot
Box Plot
Dot + Box Plot
Violin Plot
Population Pyramid
Categorical Plots

Composition

Waffle Chart
Pie Chart
Treemap
Bar Chart

Change

Time Series Plot
Time Series with Peaks and Troughs Annotated
Autocorrelation Plot
Cross Correlation Plot
Time Series Decomposition Plot
Multiple Time Series
Plotting with different scales using secondary Y axis
Time Series with Error Bands
Stacked Area Chart
Area Chart Unstacked
Calendar Heat Map
Seasonal Plot

Groups

Dendrogram
Cluster Plot
Andrews Curve
Parallel Coordinates

Setup

Run this once before the plot's code. The individual charts, however, may redefine their own aesthetics.

```python
# !pip install brewer2mpl
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.filterwarnings(action='once')

large = 22; med = 16; small = 12
params = {'legend.fontsize': med,
          'figure.figsize': (16, 10),
          'axes.labelsize': med,
          'axes.titlesize': med,
          'xtick.labelsize': med,
          'ytick.labelsize': med,
          'figure.titlesize': large}
plt.rcParams.update(params)
# On matplotlib >= 3.6 this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
sns.set_style("white")
%matplotlib inline
```

```python
# Version
print(mpl.__version__)  #> 3.0.0
print(sns.__version__)  #> 0.9.0
```

Correlation

The plots under correlation are used to visualize the relationship between 2 or more variables. That is, how one variable changes with respect to another.

1. Scatter plot

Scatterplot is a classic and fundamental plot used to study the relationship between two variables. If you have multiple groups in your data, you may want to visualise each group in a different color. In matplotlib, you can conveniently do this using plt.scatter().

```python
# Import dataset
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# Prepare Data
# Create as many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Draw Plot for Each Category
plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    # color= applies one RGBA color to the whole group
    plt.scatter('area', 'poptotal',
                data=midwest.loc[midwest.category==category, :],
                s=20, color=colors[i], label=str(category))

# Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
plt.legend(fontsize=12)
plt.show()
```

2. Bubble plot with Encircling

Sometimes you want to show a group of points within a boundary to emphasize their importance. In this case, you get the records from the dataframe that should be encircled and pass them to the encircle() function described in the code below.

```python
from matplotlib import patches
from scipy.spatial import ConvexHull
import warnings; warnings.simplefilter('ignore')
sns.set_style("white")

# Step 1: Prepare Data
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# As many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Step 2: Draw Scatterplot with unique color for each category
fig = plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :],
                s='dot_size', color=colors[i], label=str(category),
                edgecolors='black', linewidths=.5)

# Step 3: Encircling
# https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
def encircle(x, y, ax=None, **kw):
    """Draw a polygon along the convex hull of the (x, y) points."""
    if not ax: ax = plt.gca()
    p = np.c_[x, y]
    hull = ConvexHull(p)
    poly = plt.Polygon(p[hull.vertices, :], **kw)
    ax.add_patch(poly)

# Select data to be encircled
midwest_encircle_data = midwest.loc[midwest.state=='IN', :]

# Draw polygon surrounding vertices
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)

# Step 4: Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Bubble Plot with Encircling", fontsize=22)
plt.legend(fontsize=12)
plt.show()
```

3. Scatter plot with linear regression line of best fit

If you want to understand how two variables change with respect to each other, the line of best fit is the way to go. The plot below shows how the line of best fit differs among various groups in the data. To disable the groupings and simply draw one line of best fit for the entire dataset, remove the hue='cyl' parameter from the sns.lmplot() call below.

```python
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Plot
sns.set_style("white")
# robust=True fits a robust regression, which requires statsmodels
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                     height=7, aspect=1.6, robust=True, palette='tab10',
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
plt.show()
```
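
Without robust=True, the straight line that sns.lmplot() draws is an ordinary least-squares fit. As a minimal sketch on synthetic data (the data here is an assumption, not the mpg dataset), np.polyfit recovers the slope and intercept of such a line:

```python
import numpy as np

# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# Ordinary least-squares fit of a degree-1 polynomial -> [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

# Points on the fitted line, ready to pass to plt.plot(x, y_hat)
y_hat = slope * x + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With robust=True, seaborn fits a robust regression through statsmodels instead, so its line can differ from this one when the data has outliers.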

Each regression line in its own column

Alternately, you can show the best fit line for each group in its own column. You can do this by setting the col=groupingcolumn parameter inside sns.lmplot().

```python
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4, 8]), :]

# Each line in its own column
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy",
                     data=df_select,
                     height=7,
                     robust=True,
                     palette='Set1',
                     col="cyl",
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.show()
```

4. Jittering with stripplot

Often multiple datapoints have exactly the same X and Y values. As a result, multiple points get plotted over each other and hide. To avoid this, jitter the points slightly so you can visually see them. This is convenient to do using seaborn's stripplot().

```python
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Draw Stripplot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)

# Decorations
plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
plt.show()
```
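
The idea behind jittering can be sketched in a few lines: add a small random offset, drawn uniformly from [-jitter, jitter), to the position of each point. The DataFrame below is hypothetical; seaborn applies a similar uniform offset to the categorical axis internally when you pass jitter=0.25.

```python
import numpy as np
import pandas as pd

# Hypothetical data with many exact duplicates
df = pd.DataFrame({"cty": [18, 18, 18, 21, 21], "hwy": [29, 29, 29, 29, 29]})

jitter = 0.25  # half-width of the noise, like stripplot(jitter=0.25)
rng = np.random.default_rng(42)

# Shift each x position by a uniform random offset in [-jitter, jitter)
cty_jittered = df["cty"] + rng.uniform(-jitter, jitter, size=len(df))

# The y values are left untouched, so only horizontal spread is added
print(pd.DataFrame({"cty": cty_jittered.round(3), "hwy": df["hwy"]}))
```

Keeping the jitter below half the gap between neighbouring values (here 0.25 < 0.5) means a jittered point can never be mistaken for the next value.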

5. Counts Plot

Another option to avoid the problem of overlapping points is to increase the size of the dot depending on how many points lie in that spot. So, the larger the point, the greater the concentration of points around it.

```python
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')

# Draw plot: one marker per (cty, hwy) pair, marker area grows with the count
# (stripplot's size= takes a single value, so ax.scatter is used for per-point sizes)
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
ax.scatter(df_counts.cty, df_counts.hwy, s=(df_counts.counts*2)**2)

# Decorations
plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
plt.show()
```
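
The counts come from the aggregation step: groupby([...]).size() collapses duplicate (hwy, cty) pairs into one row with a count. A small hypothetical DataFrame shows the result, along with the marker area that a marker size of counts * 2 points corresponds to, (counts * 2) ** 2 points squared.

```python
import pandas as pd

# Hypothetical mileage data with repeated (hwy, cty) pairs
df = pd.DataFrame({"hwy": [29, 29, 29, 28, 20, 29],
                   "cty": [18, 18, 18, 21, 15, 21]})

# One row per unique (hwy, cty) pair, with the number of duplicates
df_counts = df.groupby(["hwy", "cty"]).size().reset_index(name="counts")

# Marker diameter of counts * 2 points -> marker area of (counts * 2) ** 2 points^2
df_counts["area"] = (df_counts["counts"] * 2) ** 2
print(df_counts)
```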

6. Marginal Histogram

Marginal histograms have a histogram along the X and Y axis variables. This is used to visualize the relationship between X and Y along with the univariate distribution of X and Y individually. This plot is often used in exploratory data analysis (EDA).

```python
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi=80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes,
                alpha=.9, data=df, cmap="tab10", edgecolors='gray', linewidths=.5)

# histogram on the bottom
ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink')
ax_bottom.invert_yaxis()

# histogram on the right
ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')

# Decorations
ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy')
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
    item.set_fontsize(14)

xlabels = ax_main.get_xticks().tolist()
ax_main.set_xticklabels(xlabels)
plt.show()
```

7. Marginal Boxplot

Marginal boxplot serves a similar purpose as marginal histogram. However, the boxplot helps to pinpoint the median, 25th and 75th percentiles of the X and the Y.

```python
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi=80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes,
                alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)

# Add a boxplot in each margin: vertical for hwy, horizontal for displ
sns.boxplot(y=df.hwy, ax=ax_right)
sns.boxplot(x=df.displ, ax=ax_bottom)

# Decorations ------------------
# Remove the axis labels added by the boxplots
ax_bottom.set(xlabel='')
ax_right.set(ylabel='')

# Main Title, Xlabel and YLabel
ax_main.set(title='Scatterplot with Boxplots \n displ vs hwy', xlabel='displ', ylabel='hwy')

# Set font size of different components
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
    item.set_fontsize(14)

plt.show()
```
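
To see exactly what each marginal box encodes, the numbers can be computed directly. A minimal sketch with made-up values, assuming matplotlib's defaults (linearly interpolated percentiles and whiskers at 1.5 × IQR), which seaborn's boxplot also uses by default:

```python
import numpy as np

# Made-up sample values, including one extreme observation
values = np.array([12, 15, 17, 20, 22, 25, 31, 48])

# The box spans the 25th to 75th percentile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme data points within 1.5 * IQR of the box
lower_whisker = values[values >= q1 - 1.5 * iqr].min()
upper_whisker = values[values <= q3 + 1.5 * iqr].max()

# Anything beyond the whiskers is drawn as an individual outlier point
outliers = values[(values < lower_whisker) | (values > upper_whisker)]
print(q1, median, q3, lower_whisker, upper_whisker, outliers)
```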

8. Correlogram

Correlogram is used to visually see the correlation metric between all possible pairs of numeric variables in a given dataframe (or 2D array).

```python
# Import Dataset
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")

# Correlate the numeric columns only (mtcars also holds text columns with car names)
corr = df.select_dtypes('number').corr()

# Plot
plt.figure(figsize=(12, 10), dpi=80)
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdYlGn', center=0, annot=True)

# Decorations
plt.title('Correlogram of mtcars', fontsize=22)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()
```
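
Each cell of the correlogram is a pairwise Pearson correlation from DataFrame.corr(). A small hypothetical frame shows the two extremes (+1 and -1) and why the text column is dropped first with select_dtypes('number'): there is no correlation to compute for text, and recent pandas versions raise an error rather than silently skipping it.

```python
import pandas as pd

# Hypothetical data: b rises with a, c falls as a rises, name is text
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "name": ["w", "x", "y", "z"],
})

# Keep numeric columns only, then compute the Pearson correlation matrix
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```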

9. Pairwise Plot

Pairwise plot is a favorite in exploratory analysis to understand the relationship between all possible pairs of numeric variables. It is a must-have tool for bivariate analysis.

```python
# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
plt.show()
```

```python
# Load Dataset
df = sns.load_dataset('iris')

# Plot
plt.figure(figsize=(10, 8), dpi=80)
sns.pairplot(df, kind="reg", hue="species")
plt.show()
```

Deviation

10. Diverging Bars

If you want to see how the items are varying based on a single metric and visualize the order and amount of this variance, diverging bars is a great tool. It helps to quickly differentiate the performance of groups in your data and is quite intuitive and instantly conveys the point.

```python
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 10), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)

# Decorations
plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()
```
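
The bars are drawn from a z-score: each value minus the mean, divided by the standard deviation (pandas' .std() is the sample standard deviation, ddof=1). A minimal sketch with a handful of sample mpg values shows how negative scores become red bars and positive ones green:

```python
import pandas as pd

# A handful of sample mileage values
mpg = pd.Series([21.0, 22.8, 18.7, 14.3, 32.4], name="mpg")

# Standardise: how many standard deviations each car is from the mean
mpg_z = (mpg - mpg.mean()) / mpg.std()

# Below-average cars diverge to the left (red), above-average to the right (green)
colors = ["red" if z < 0 else "green" for z in mpg_z]
print(pd.DataFrame({"mpg": mpg, "mpg_z": mpg_z.round(2), "color": colors}))
```

Because the scores are centred on zero with unit spread, bars for different metrics land on a comparable scale.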

11. Diverging Texts

Diverging texts is similar to diverging bars and is preferred if you want to show the value of each item within the chart in a nice and presentable way.

```python
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 14), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left',
                 verticalalignment='center', fontdict={'color':'red' if x < 0 else 'green', 'size':14})

# Decorations
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Text Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()
```

12. Diverging Dot Plot

Diverging dot plot is also similar to the diverging bars. However, compared to diverging bars, the absence of bars reduces the amount of contrast and disparity between the groups.

```python
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['crimson' if x < 0 else 'darkgreen' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14, 16), dpi=80)
plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
    t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                 verticalalignment='center', fontdict={'color':'white'})

# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)

plt.yticks(df.index, df.cars)
plt.title('Diverging Dotplot of Car Mileage', fontdict={'size':20})
plt.xlabel('$Mileage$')
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()
```

13. Diverging Lollipop Chart with Markers

Lollipop with markers provides a flexible way of visualizing the divergence by laying emphasis on any significant datapoints you want to bring attention to and giving reasoning within the chart appropriately.

```python
# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df['mpg']
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = 'black'

# color fiat differently
df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange'
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
import matplotlib.patches as patches

plt.figure(figsize=(14, 16), dpi=80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1)
plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6)
plt.yticks(df.index, df.cars)
plt.xticks(fontsize=12)

# Annotate
plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data',
             fontsize=15, ha='center', va='center',
             bbox=dict(boxstyle='square', fc='firebrick'),
             arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white')

# Add Patches
p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red')
p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green')
plt.gca().add_patch(p1)
plt.gca().add_patch(p2)

# Decorate
plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()
```

14. Area Chart

By coloring the area between the axis and the lines, the area chart throws more emphasis not just on the peaks and troughs but also on the duration of the highs and lows. The longer the duration of the highs, the larger the area under the line.

```python
import numpy as np
import pandas as pd

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
x = np.arange(df.shape[0])
y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100

# Plot
plt.figure(figsize=(16, 10), dpi=80)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7)

# Annotate
plt.annotate('Peak \n1975', xy=(94.0, 21.0), xytext=(88.0, 28),
             bbox=dict(boxstyle='square', fc='firebrick'),
             arrowprops=dict(facecolor='steelblue', shrink=0.05), fontsize=15, color='white')

# Decorations
xtickvals = [str(m)[:3].upper()+"-"+str(y) for y, m in zip(df.date.dt.year, df.date.dt.month_name())]
plt.gca().set_xticks(x[::6])
plt.gca().set_xticklabels(xtickvals[::6], rotation=90, fontdict={'horizontalalignment': 'center', 'verticalalignment': 'center_baseline'})
plt.ylim(-35, 35)
plt.xlim(1, 100)
plt.title("Month Economics Return %", fontsize=22)
plt.ylabel('Monthly returns %')
plt.grid(alpha=0.5)
plt.show()
```
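
The y_returns series in the area chart is the month-over-month percentage change of psavert, with the undefined first value set to 0. A minimal sketch of that expression on a few sample values:

```python
import pandas as pd

# A few sample monthly savings-rate values
psavert = pd.Series([12.6, 12.6, 11.9, 12.9, 12.8])

# Same expression as the chart code: change / previous value, first month -> 0
y_returns = (psavert.diff().fillna(0) / psavert.shift(1)).fillna(0) * 100

# Positive months are filled green, negative months red
print(y_returns.round(2).tolist())
```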

Ranking

15. Ordered Bar Chart

Ordered bar chart conveys the rank order of the items effectively. By adding the value of the metric above each bar, the user gets the precise information from the chart itself.

```python
# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# Average city mileage (cty) per manufacturer
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)

# Draw plot
import matplotlib.patches as patches

fig, ax = plt.subplots(figsize=(16, 10), facecolor='white', dpi=80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20)

# Annotate Text
for i, cty in enumerate(df.cty):
    ax.text(i, cty+0.5, round(cty, 1), horizontalalignment='center')

# Title, Label, Ticks and Ylim
ax.set_title('Bar Chart for Highway Mileage', fontdict={'size':22})
ax.set(ylabel='Miles Per Gallon', ylim=(0, 30))
plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12)

# Add patches to color the X axis labels
p1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure)
p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure)
fig.add_artist(p1)
fig.add_artist(p2)
plt.show()
```
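
The ranking itself comes from two pandas steps: average the metric per group, then sort ascending so the lowest value sits at the left. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical city-mileage records
df_raw = pd.DataFrame({
    "manufacturer": ["audi", "audi", "honda", "honda", "ford"],
    "cty": [18, 21, 28, 24, 15],
})

# Mean per manufacturer, sorted so bar position 0 holds the lowest value
df = (df_raw.groupby("manufacturer", as_index=False)["cty"].mean()
            .sort_values("cty")
            .reset_index(drop=True))
print(df)
```

After reset_index, the row number is what the chart uses as each bar's x position.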

16. Lollipop Chart

Lollipop chart serves a similar purpose as an ordered bar chart in a visually pleasing way.

```python
# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)

# Draw plot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2)
ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7)

# Title, Label, Ticks and Ylim
ax.set_title('Lollipop Chart for Highway Mileage', fontdict={'size':22})
ax.set_ylabel('Miles Per Gallon')
ax.set_xticks(df.index)
ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size':12})
ax.set_ylim(0, 30)

# Annotate
for row in df.itertuples():
    ax.text(row.Index, row.cty+.5, s=round(row.cty, 2), horizontalalignment='center', verticalalignment='bottom', fontsize=14)

plt.show()
```

17. Dot Plot

The dot plot conveys the rank order of the items. And since it is aligned along the horizontal axis, you can visualize how far the points are from each other more easily.

```python
# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').mean()
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)

# Draw plot
fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
ax.hlines(y=df.index, xmin=11, xmax=26, color='gray', alpha=0.7, linewidth=1, linestyles='dashdot')
ax.scatter(y=df.index, x=df.cty, s=75, color='firebrick', alpha=0.7)

# Title, Label, Ticks and Xlim
ax.set_title('Dot Plot for Highway Mileage', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon')
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'})
ax.set_xlim(10, 27)
plt.show()
```

18. Slope Chart

Slope chart is most suitable for comparing the 'Before' and 'After' positions of a given person/item.

```python
import matplotlib.lines as mlines
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")

left_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1952'])]
right_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1957'])]
klass = ['red' if (y1-y2) < 0 else 'green' for y1, y2 in zip(df['1952'], df['1957'])]

# draw line
# https://stackoverflow.com/questions/36470343/how-to-draw-a-line-with-matplotlib/36479941
def newline(p1, p2, color='black'):
    ax = plt.gca()
    # red when the value drops from p1 to p2, green otherwise
    l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color='red' if p1[1]-p2[1] > 0 else 'green', marker='o', markersize=6)
    ax.add_line(l)
    return l

fig, ax = plt.subplots(1, 1, figsize=(14, 14), dpi=80)

# Vertical Lines
ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7)
ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)

# Line Segments and Annotation
for p1, p2, c in zip(df['1952'], df['1957'], df['continent']):
    newline([1, p1], [3, p2])
    ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14})
    ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14})

# 'Before' and 'After' Annotations
ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700})
ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700})

# Decoration
ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22})
ax.set(xlim=(0, 4), ylim=(0, 14000), ylabel='Mean GDP Per Capita')
ax.set_xticks([1, 3])
ax.set_xticklabels(["1952", "1957"])
plt.yticks(np.arange(500, 13000, 2000), fontsize=12)

# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.0)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.0)
plt.show()
```
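
The coloring rule inside newline() boils down to comparing each item's 'before' and 'after' values: a drop is drawn red, anything else green. A minimal sketch of that rule on hypothetical values:

```python
import pandas as pd

# Hypothetical before/after values for three items
df = pd.DataFrame({"item": ["A", "B", "C"],
                   "before": [100, 250, 180],
                   "after": [140, 210, 180]})

# Same test as newline(): red if the value fell, green otherwise (ties stay green)
df["color"] = ["red" if b - a > 0 else "green"
               for b, a in zip(df["before"], df["after"])]
df["change"] = df["after"] - df["before"]
print(df)
```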

19. Dumbbell Plot

Dumbbell plot conveys the 'before' and 'after' positions of various items along with the rank ordering of the items. It's very useful if you want to visualize the effect of a particular project / initiative on different objects.

```python
import matplotlib.lines as mlines

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")
df.sort_values('pct_2014', inplace=True)
df.reset_index(inplace=True)

# Func to draw line segment
def newline(p1, p2, color='black'):
    ax = plt.gca()
    l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color='skyblue')
    ax.add_line(l)
    return l

# Figure and Axes
fig, ax = plt.subplots(1, 1, figsize=(14, 14), facecolor='#f7f7f7', dpi=80)

# Vertical Lines
ax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')

# Points
ax.scatter(y=df['index'], x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7)
ax.scatter(y=df['index'], x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)

# Line Segments
for i, p1, p2 in zip(df['index'], df['pct_2013'], df['pct_2014']):
    newline([p1, i], [p2, i])

# Decoration
ax.set_facecolor('#f7f7f7')
ax.set_title("Dumbbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22})
ax.set(xlim=(0, .25), ylim=(-1, 27), ylabel='Mean GDP Per Capita')
ax.set_xticks([.05, .1, .15, .20])
# One label per tick position above
ax.set_xticklabels(['5%', '10%', '15%', '20%'])
plt.show()
```

Distribution

20. Histogram for Continuous Variable

Histogram shows the frequency distribution of a given variable. The representation below groups the frequency bars based on a categorical variable, giving greater insight into the continuous variable and the categorical variable in tandem.

```python
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data: one list of displ values per class
x_var = 'displ'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [grp[x_var].values.tolist() for _, grp in df_agg]
labels = [name for name, _ in df_agg]

# Draw
plt.figure(figsize=(16, 9), dpi=80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors[:len(vals)], label=labels)

# Decoration
plt.legend()
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 25)
plt.xticks(ticks=bins[::3], labels=[round(b, 1) for b in bins[::3]])
plt.show()
```
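
For a stacked histogram, matplotlib bins all groups with one shared set of edges and then stacks each group's counts. A minimal sketch of that arithmetic with NumPy, using made-up displacement values for two hypothetical classes:

```python
import numpy as np

# Made-up engine displacements for two groups
groups = {
    "compact": np.array([1.8, 2.0, 2.8, 3.1]),
    "suv": np.array([4.2, 5.7, 6.2]),
}

# One set of bin edges computed over all groups combined
edges = np.histogram_bin_edges(np.concatenate(list(groups.values())), bins=3)

# Count each group separately using the shared edges
per_group = {name: np.histogram(v, bins=edges)[0] for name, v in groups.items()}

# Stacked bar heights: the per-group counts added bin by bin
stacked_totals = sum(per_group.values())
print(edges.round(2), per_group, stacked_totals)
```

Sharing the edges is what makes stacking meaningful: each segment in a bar covers exactly the same value range.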

21. Histogram for Categorical Variable

The histogram of a categorical variable shows the frequency distribution of that variable. By coloring the bars, you can visualize the distribution in connection with another categorical variable representing the colors.

```python
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Prepare data
x_var = 'manufacturer'
groupby_var = 'class'
# Map each category to an integer position so it gets exactly one bin
categories = np.unique(df[x_var]).tolist()
codes = {cat: i for i, cat in enumerate(categories)}
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [grp[x_var].map(codes).tolist() for _, grp in df_agg]
labels = [name for name, _ in df_agg]

# Draw
plt.figure(figsize=(16, 9), dpi=80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
bin_edges = np.arange(len(categories) + 1) - 0.5  # one bin centred on each category
n, bins, patches = plt.hist(vals, bin_edges, stacked=True, density=False, color=colors[:len(vals)], label=labels)

# Decoration
plt.legend()
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 40)
plt.xticks(ticks=range(len(categories)), labels=categories, rotation=90)
plt.show()
```
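
The bar heights in this chart are just counts of each manufacturer within each class. pd.crosstab produces the same table directly, which is handy for checking the plot (hypothetical rows below):

```python
import pandas as pd

# Hypothetical rows: manufacturer on the x axis, class as the stacking color
df = pd.DataFrame({
    "manufacturer": ["audi", "audi", "ford", "ford", "ford"],
    "class": ["compact", "midsize", "suv", "suv", "pickup"],
})

# Rows = manufacturers (bars), columns = classes (stacked segments)
table = pd.crosstab(df["manufacturer"], df["class"])

# Total bar height per manufacturer
totals = table.sum(axis=1)
print(table)
```

If you prefer pandas plotting, table.plot(kind='bar', stacked=True) draws a comparable stacked bar chart from the same table.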

22. Density Plot

Density plots are a commonly used tool to visualise the distribution of a continuous variable. By grouping them by the 'response' variable, you can inspect the relationship between the X and the Y. The example below is for representational purposes, to depict how the distribution of city mileage varies with respect to the number of cylinders.

```python
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

# Draw Plot (fill=True needs seaborn >= 0.11; older versions use shade=True)
plt.figure(figsize=(16, 10), dpi=80)
sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], fill=True, color="g", label="Cyl=4", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], fill=True, color="deeppink", label="Cyl=5", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], fill=True, color="dodgerblue", label="Cyl=6", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], fill=True, color="orange", label="Cyl=8", alpha=.7)

# Decoration
plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)
plt.legend()
plt.show()
```
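
A density curve is a kernel density estimate: a small Gaussian bump is placed on every observation and the bumps are averaged. A minimal NumPy sketch, assuming a Gaussian kernel and Scott's rule for the bandwidth (seaborn's default bandwidth rule); gaussian_kde_1d is a hypothetical helper written for this sketch, and the cty values are made up:

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian kernels centred on each sample."""
    samples = np.asarray(samples, dtype=float)[:, None]    # shape (n, 1)
    u = (grid[None, :] - samples) / bandwidth              # shape (n, m)
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # standard normal pdf
    return kernels.mean(axis=0) / bandwidth                # shape (m,)

# Made-up city-mileage values for one group
cty = np.array([18, 21, 20, 21, 16, 18, 18, 19])

# Scott's rule: sample standard deviation * n ** (-1/5)
h = cty.std(ddof=1) * len(cty) ** (-1 / 5)

grid = np.linspace(cty.min() - 5, cty.max() + 5, 500)
density = gaussian_kde_1d(cty, grid, h)

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(round(h, 3), round(area, 3))
```

A larger bandwidth smooths the curve more; seaborn exposes this through bw_adjust in recent versions.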

23. Density Curves with Histogram

A density curve with a histogram brings together the collective information conveyed by the two plots, so you can have them both in a single figure instead of two.

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    # Note: sns.distplot is deprecated since seaborn 0.11; sns.histplot(..., kde=True, stat="density") replaces it
    plt.figure(figsize=(13,10), dpi=80)
    sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
    sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
    sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="m", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
    plt.ylim(0, 0.35)

    # Decoration
    plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)
    plt.legend()
    plt.show()
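A histogram and a density curve can share one y-axis only because the histogram is rescaled to a density: each bar's height is count / (n × bin width), so the bar areas sum to 1, just like the area under the curve. A minimal sketch of that scaling, with made-up values and bin edges:

```python
def density_histogram(values, bin_edges):
    """Return per-bin heights scaled so the bar areas sum to 1 (a density-scaled histogram)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            left, right = bin_edges[i], bin_edges[i + 1]
            # Bins are half-open [left, right), except the last one, which also includes its right edge
            last = i == len(counts) - 1
            if left <= v < right or (last and v == right):
                counts[i] += 1
                break
    n = sum(counts)
    return [c / (n * (bin_edges[i + 1] - bin_edges[i])) for i, c in enumerate(counts)]

cty = [18, 21, 20, 21, 16, 18, 19, 22, 18, 17]   # hypothetical mileage values
edges = [15, 17, 19, 21, 23]
heights = density_histogram(cty, edges)          # heights per bin, areas sum to 1
```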

24. Joy Plot

A Joy Plot allows the density curves of different groups to overlap. It is a great way to visualize the distribution of a larger number of groups in relation to each other. It looks pleasing to the eye and conveys just the right information clearly. It can be easily built using the joypy package, which is based on matplotlib.

    # !pip install joypy
    import joypy

    # Import Data
    mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot (joyplot creates its own figure, so no separate plt.figure() call is needed)
    fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10))

    # Decoration
    plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)
    plt.show()

25. Distributed Dot Plot

A distributed dot plot shows the univariate distribution of points segmented by groups. The darker the points, the greater the concentration of data points in that region. By coloring the median differently, the real positioning of the groups becomes apparent instantly.

    import matplotlib.patches as mpatches

    # Prepare Data
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
    cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'}
    df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)

    # Mean and Median city mileage by make (select the numeric column before aggregating)
    df = df_raw.groupby('manufacturer')[['cty']].mean()
    df.sort_values('cty', ascending=False, inplace=True)
    df.reset_index(inplace=True)
    df_median = df_raw.groupby('manufacturer')[['cty']].median()

    # Draw horizontal lines
    fig, ax = plt.subplots(figsize=(16,10), dpi=80)
    ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')

    # Draw the Dots
    for i, make in enumerate(df.manufacturer):
        df_make = df_raw.loc[df_raw.manufacturer==make, :]
        ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5)
        ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')

    # Annotate
    ax.text(33, 13, r"$red \; dots \; are \; the \: median$", fontdict={'size':12}, color='firebrick')

    # Decorations
    red_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median")
    plt.legend(handles=red_patch)
    ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22})
    ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7)
    ax.set_yticks(df.index)
    ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7)
    ax.set_xlim(1, 40)
    plt.xticks(alpha=0.7)
    plt.gca().spines["top"].set_visible(False)
    plt.gca().spines["bottom"].set_visible(False)
    plt.gca().spines["right"].set_visible(False)
    plt.gca().spines["left"].set_visible(False)
    plt.grid(axis='both', alpha=.4, linewidth=.1)
    plt.show()
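The chart relies on two per-group summaries: the mean, used to order the rows, and the median, drawn as the highlighted dot. Stripped of pandas and plotting, that bookkeeping looks roughly like this (a pure-Python sketch; the rows are hypothetical, not taken from mpg_ggplot2.csv).

```python
import statistics
from collections import defaultdict

# Hypothetical (manufacturer, city mpg) pairs
rows = [("audi", 18), ("audi", 21), ("audi", 20), ("honda", 28), ("honda", 24),
        ("honda", 25), ("ford", 14), ("ford", 15), ("ford", 18)]

by_make = defaultdict(list)
for make, cty in rows:
    by_make[make].append(cty)

# Order the rows of the dot plot by mean mileage, highest first ...
order = sorted(by_make, key=lambda m: statistics.mean(by_make[m]), reverse=True)
# ... and compute the median that gets its own highlighted dot on each row
medians = {m: statistics.median(v) for m, v in by_make.items()}
print(order, medians)
```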

26. Box Plot

Box plots are a great way to visualize the distribution, keeping the median, the 25th and 75th percentiles (the first and third quartiles) and the outliers in mind. However, you need to be careful when interpreting the size of the boxes, which can give a distorted impression of the number of points contained within each group. Manually providing the number of observations in each box can help overcome this drawback. For example, the first two boxes on the left have the same size even though they contain 5 and 47 observations respectively, so writing the number of observations in each group becomes necessary.

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi=80)
    sns.boxplot(x='class', y='hwy', data=df, notch=False)

    # Add N Obs inside boxplot (optional)
    def add_n_obs(df, group_col, y):
        medians = df.groupby(group_col)[y].median()
        n_obs = df.groupby(group_col)[y].size()
        xticklabels = [t.get_text() for t in plt.gca().get_xticklabels()]
        # Look up each group by its tick label, since the boxes follow the data order, not the sorted groupby order
        for x, xticklabel in enumerate(xticklabels):
            plt.text(x, medians[xticklabel]*1.01, "#obs : "+str(n_obs[xticklabel]), horizontalalignment='center', fontdict={'size':14}, color='white')

    add_n_obs(df, group_col='class', y='hwy')

    # Decoration
    plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
    plt.ylim(10, 40)
    plt.show()
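What each box encodes can be sketched in a few lines of pure Python: the quartiles, whiskers that stop at the most extreme points within 1.5 × IQR of the box (the default rule in matplotlib and seaborn), the outliers beyond them, and the observation count that `add_n_obs` writes onto each box. The sample values are made up.

```python
import statistics

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    # 'inclusive' interpolation matches NumPy's default linear percentile method
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "n_obs": len(data),
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }

stats = box_stats([23, 24, 25, 26, 26, 27, 28, 29, 44])   # hypothetical hwy values
print(stats)
```

Note that nothing in these numbers depends on how many observations there are, which is why two boxes built from 5 and 47 points can look alike.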

27. Dot + Box Plot

A Dot + Box plot conveys similar information as a boxplot split into groups. The dots, in addition, give a sense of how many data points lie within each group.

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi=80)
    sns.boxplot(x='class', y='hwy', data=df, hue='cyl')
    sns.stripplot(x='class', y='hwy', data=df, color='black', size=3, jitter=1)

    for i in range(len(df['class'].unique())-1):
        plt.vlines(i+.5, 10, 45, linestyles='solid', colors='gray', alpha=0.2)

    # Decoration
    plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
    plt.legend(title='Cylinders')
    plt.show()

28. Violin Plot

The violin plot is a visually pleasing alternative to box plots. The shape of the violin follows the density of the data; with scale='count', its width also reflects the number of observations it holds, while the scale='width' setting used below gives every violin the same maximum width. However, violin plots can be harder to read, and they are not commonly used in professional settings.

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi=80)
    sns.violinplot(x='class', y='hwy', data=df, scale='width', inner='quartile')

    # Decoration
    plt.title('Violin Plot of Highway Mileage by Vehicle Class', fontsize=22)
    plt.show()

29. Population Pyramid

A population pyramid can be used to show the distribution of groups ordered by volume. It can also be used to show the stage-by-stage filtering of a population, as it is used below to show how many people pass through each stage of a marketing funnel.

    # Read data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi=80)
    group_col = 'Gender'
    order_of_bars = df.Stage.unique()[::-1]
    colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]

    for c, group in zip(colors, df[group_col].unique()):
        sns.barplot(x='Users', y='Stage', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)

    # Decorations
    plt.xlabel("$Users$")
    plt.ylabel("Stage of Purchase")
    plt.yticks(fontsize=12)
    plt.title("Population Pyramid of the Marketing Funnel", fontsize=22)
    plt.legend()
    plt.show()
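Read as a funnel, the chart comes down to pass-through rates: what share of the previous stage, and of the first stage, reaches each step. A small sketch with hypothetical stage names and counts (not the contents of email_campaign_funnel.csv):

```python
# Hypothetical stage counts for one group, ordered from the top of the funnel down
stages = [("Visited", 1000), ("Signed up", 400), ("Added to cart", 150), ("Purchased", 60)]

def funnel_rates(stages):
    """Share of the previous stage and of the first stage that reaches each stage."""
    top = stages[0][1]
    rates = []
    for i, (name, users) in enumerate(stages):
        prev = stages[i - 1][1] if i > 0 else users
        rates.append((name, users / prev, users / top))
    return rates

for name, step_rate, overall in funnel_rates(stages):
    print(f"{name}: {step_rate:.1%} of previous stage, {overall:.1%} of top")
```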

30. Categorical Plots

Categorical plots provided by the seaborn library can be used to visualize the count distribution of two or more categorical variables in relation to each other.

    # Load Dataset
    titanic = sns.load_dataset("titanic")

    # Plot (recent seaborn versions require x to be passed as a keyword)
    g = sns.catplot(x="alive", col="deck", col_wrap=4,
                    data=titanic[titanic.deck.notnull()],
                    kind="count", height=3.5, aspect=.8,
                    palette='tab20')

    plt.show()

    # Load Dataset
    titanic = sns.load_dataset("titanic")

    # Plot
    sns.catplot(x="age", y="embark_town",
                hue="sex", col="class",
                data=titanic[titanic.embark_town.notnull()],
                orient="h", height=5, aspect=1, palette="tab10",
                kind="violin", dodge=True, cut=0, bw=.2)
    plt.show()
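A `kind="count"` categorical plot is essentially a cross-tabulation drawn as bars: one count per combination of categories. The sketch below computes those counts with `collections.Counter` on a few made-up (deck, alive) pairs standing in for rows of the titanic dataset.

```python
from collections import Counter

# Hypothetical (deck, alive) pairs
passengers = [("B", "yes"), ("B", "yes"), ("B", "no"), ("C", "no"),
              ("C", "yes"), ("E", "yes"), ("E", "no"), ("E", "no")]

# One count per (deck, alive) combination: the height of each bar in a
# count-type plot faceted by deck
counts = Counter(passengers)
for deck in sorted({d for d, _ in passengers}):
    print(deck, {status: counts[(deck, status)] for status in ("yes", "no")})
```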

Composition

31. Waffle Chart

The waffle chart can be created using the pywaffle package and is used to show the composition of groups in a larger population.

    #! pip install pywaffle
    # Reference: https://stackoverflow.com/questions/41400136/how-to-do-waffle-charts-in-python-foursquare-piechart
    from pywaffle import Waffle

    # Import
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size().reset_index(name='counts')
    n_categories = df.shape[0]
    colors = [plt.cm.inferno_r(i/float(n_categories)) for i in range(n_categories)]

    # Draw Plot and Decorate
    fig = plt.figure(
        FigureClass=Waffle,
        plots={
            111: {
                'values': df['counts'],
                # index=False so that n[0] is the class and n[1] the count
                'labels': ["{0} ({1})".format(n[0], n[1]) for n in df[['class', 'counts']].itertuples(index=False)],
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12},
                'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18}
            },
        },
        rows=7,
        colors=colors,
        figsize=(16, 9)
    )


    #! pip install pywaffle
    from pywaffle import Waffle

    # Import
    # df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    # By Class Data
    df_class = df_raw.groupby('class').size().reset_index(name='counts_class')
    n_categories = df_class.shape[0]
    colors_class = [plt.cm.Set3(i/float(n_categories)) for i in range(n_categories)]

    # By Cylinders Data
    df_cyl = df_raw.groupby('cyl').size().reset_index(name='counts_cyl')
    n_categories = df_cyl.shape[0]
    colors_cyl = [plt.cm.Spectral(i/float(n_categories)) for i in range(n_categories)]

    # By Make Data
    df_make = df_raw.groupby('manufacturer').size().reset_index(name='counts_make')
    n_categories = df_make.shape[0]
    colors_make = [plt.cm.tab20b(i/float(n_categories)) for i in range(n_categories)]

    # Draw Plot and Decorate
    fig = plt.figure(
        FigureClass=Waffle,
        plots={
            311: {
                'values': df_class['counts_class'],
                'labels': df_class['class'].astype(str).tolist(),
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Class'},
                'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18},
                'colors': colors_class
            },
            312: {
                'values': df_cyl['counts_cyl'],
                'labels': df_cyl['cyl'].astype(str).tolist(),
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Cyl'},
                'title': {'label': '# Vehicles by Cyl', 'loc': 'center', 'fontsize':18},
                'colors': colors_cyl
            },
            313: {
                'values': df_make['counts_make'],
                'labels': df_make['manufacturer'].astype(str).tolist(),
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Manufacturer'},
                'title': {'label': '# Vehicles by Make', 'loc': 'center', 'fontsize':18},
                'colors': colors_make
            }
        },
        rows=9,
        figsize=(16, 14)
    )
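A waffle chart has a fixed number of squares, so group counts must be rounded to whole cells that still add up to the grid size. pywaffle handles this internally (its exact rounding rule may differ); the sketch below shows one standard approach, the largest-remainder method, using hypothetical counts and a 7 × 10 grid.

```python
import math

def waffle_cells(counts, total_cells):
    """Split `total_cells` squares among groups in proportion to `counts`
    using the largest-remainder method, so the cells always add up exactly."""
    total = sum(counts.values())
    exact = {k: v * total_cells / total for k, v in counts.items()}
    cells = {k: math.floor(x) for k, x in exact.items()}
    leftover = total_cells - sum(cells.values())
    # Hand the remaining squares to the groups with the largest fractional parts
    for k in sorted(exact, key=lambda k: exact[k] - cells[k], reverse=True)[:leftover]:
        cells[k] += 1
    return cells

# Hypothetical vehicle counts per class, drawn on a 7 x 10 grid
cells = waffle_cells({"suv": 62, "compact": 47, "midsize": 41, "2seater": 5}, 70)
print(cells)
```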

32. Pie Chart

The pie chart is a classic way to show the composition of groups. However, it is generally not advisable to use nowadays, because the area of the pie portions can sometimes be misleading. So, if you are going to use a pie chart, it is highly recommended to explicitly write down the percentage or numbers for each portion of the pie.

    # Import
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size()

    # Make the plot with pandas
    df.plot(kind='pie', subplots=True, figsize=(8, 8), dpi=80)
    plt.title("Pie Chart of Vehicle Class - Bad")
    plt.ylabel("")
    plt.show()


    # Import
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size().reset_index(name='counts')

    # Draw Plot
    fig, ax = plt.subplots(figsize=(12, 7), subplot_kw=dict(aspect="equal"), dpi=80)

    data = df['counts']
    categories = df['class']
    explode = [0,0,0,0,0,0.1,0]

    def func(pct, allvals):
        # round() rather than plain int() so float error cannot truncate e.g. 29 to 28
        absolute = int(round(pct/100.*np.sum(allvals)))
        return "{:.1f}% ({:d})".format(pct, absolute)

    wedges, texts, autotexts = ax.pie(data,
                                      autopct=lambda pct: func(pct, data),
                                      textprops=dict(color="w"),
                                      colors=plt.cm.Dark2.colors,
                                      startangle=140,
                                      explode=explode)

    # Decoration
    ax.legend(wedges, categories, title="Vehicle Class", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
    plt.setp(autotexts, size=10, weight=700)
    ax.set_title("Class of Vehicles: Pie Chart")
    plt.show()
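An `autopct` callback only receives the percentage, so the count has to be recovered from it, and that step is sensitive to floating-point error: in Python, `0.29 * 100` evaluates to `28.999999999999996`, so truncating with `int()` would report 28 instead of 29. A small sketch of label building that rounds instead (the counts are hypothetical):

```python
def pie_labels(counts):
    """Percentage-and-count labels of the form '29.0% (29)'."""
    total = sum(counts)
    labels = []
    for c in counts:
        pct = 100.0 * c / total
        # Recover the count from the percentage, as an autopct callback must;
        # round() avoids off-by-one truncation caused by float error
        absolute = round(pct / 100.0 * total)
        labels.append(f"{pct:.1f}% ({absolute:d})")
    return labels

print(pie_labels([29, 71]))
```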

33. Treemap

A treemap is similar to a pie chart, but it does a better job of showing the contribution of each group without being misleading.

    # pip install squarify
    import squarify

    # Import Data
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size().reset_index(name='counts')
    # Access row values by column name (positional x[0] on a row Series is deprecated in pandas)
    labels = df.apply(lambda x: str(x['class']) + "\n (" + str(x['counts']) + ")", axis=1)
    sizes = df['counts'].values.tolist()
    colors = [plt.cm.Spectral(i/float(len(labels))) for i in range(len(labels))]

    # Draw Plot
    plt.figure(figsize=(12,8), dpi=80)
    squarify.plot(sizes=sizes, label=labels, color=colors, alpha=.8)

    # Decorate
    plt.title('Treemap of Vehicle Class')
    plt.axis('off')
    plt.show()
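What makes a treemap readable is that each rectangle's area is proportional to its group's value. squarify arranges the rectangles so they stay close to square; the sketch below uses the simpler slice-and-dice layout instead (it is not squarify's algorithm), purely to illustrate the area-proportional idea with hypothetical counts.

```python
def slice_and_dice(sizes, x=0.0, y=0.0, width=1.0, height=1.0):
    """Lay out rectangles whose areas are proportional to `sizes` by cutting
    the bounding box into vertical strips."""
    total = sum(sizes)
    rects, left = [], x
    for s in sizes:
        w = width * s / total
        rects.append((left, y, w, height))   # (x, y, dx, dy)
        left += w
    return rects

sizes = [62, 47, 41, 5]   # hypothetical counts per vehicle class
rects = slice_and_dice(sizes, width=100.0, height=50.0)
for r in rects:
    print(r)
```

Long thin strips are exactly what squarify's layout tries to avoid, which is why it is preferred for real charts.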

34. Bar Chart

A bar chart is a classic way of visualizing items based on counts or any given metric. In the chart below, I have used a different color for each item, but you might typically want to pick one color for all items unless you want to color them by groups. The color names get stored in all_colors in the code below. You can change the color of the bars by setting the color parameter in plt.bar().

    import random

    # Import Data
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('manufacturer').size().reset_index(name='counts')
    n = df['manufacturer'].unique().__len__()+1
    all_colors = list(plt.cm.colors.cnames.keys())
    random.seed(100)
    c = random.choices(all_colors, k=n)

    # Plot Bars
    plt.figure(figsize=(16,10), dpi=80)
    plt.bar(df['manufacturer'], df['counts'], color=c, width=.5)
    for i, val in enumerate(df['counts'].values):
        plt.text(i, val, float(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':12})

    # Decoration
    plt.xticks(rotation=60, horizontalalignment='right')
    plt.title("Number of Vehicles by Manufacturers", fontsize=22)
    plt.ylabel('# Vehicles')
    plt.ylim(0, 45)
    plt.show()

Change

35. Time Series Plot

A time series plot is used to visualise how a given metric changes over time. Here you can see how the Air Passenger traffic changed between 1949 and 1960.

    # Import Data
    df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

    # Draw Plot
    plt.figure(figsize=(16,10), dpi=80)
    plt.plot('date', 'traffic', data=df, color='tab:red')

    # Decoration
    plt.ylim(50, 750)
    xtick_location = df.index.tolist()[::12]
    xtick_labels = [x[-4:] for x in df.date.tolist()[::12]]
    plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=0, fontsize=12, horizontalalignment='center', alpha=.7)
    plt.yticks(fontsize=12, alpha=.7)
    plt.title("Air Passengers Traffic (1949 - 1960)", fontsize=22)
    plt.grid(axis='both', alpha=.3)

    # Remove borders
    plt.gca().spines["top"].set_alpha(0.0)
    plt.gca().spines["bottom"].set_alpha(0.3)
    plt.gca().spines["right"].set_alpha(0.0)
    plt.gca().spines["left"].set_alpha(0.3)
    plt.show()

36. Time Series with Peaks and Troughs Annotated

The time series below plots all the peaks and troughs and annotates the dates of a selected subset of them.

    import matplotlib as mpl

    # Import Data
    df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

    # Get the Peaks and Troughs
    data = df['traffic'].values
    doublediff = np.diff(np.sign(np.diff(data)))
    peak_locations = np.where(doublediff == -2)[0] + 1

    doublediff2 = np.diff(np.sign(np.diff(-1*data)))
    trough_locations = np.where(doublediff2 == -2)[0] + 1

    # Draw Plot
    plt.figure(figsize=(16,10), dpi=80)
    plt.plot('date', 'traffic', data=df, color='tab:blue', label='Air Traffic')
    plt.scatter(df.date[peak_locations], df.traffic[peak_locations], marker=mpl.markers.CARETUPBASE, color='tab:green', s=100, label='Peaks')
    plt.scatter(df.date[trough_locations], df.traffic[trough_locations], marker=mpl.markers.CARETDOWNBASE, color='tab:red', s=100, label='Troughs')

    # Annotate
    for t, p in zip(trough_locations[1::5], peak_locations[::3]):
        plt.text(df.date[p], df.traffic[p]+15, df.date[p], horizontalalignment='center', color='darkgreen')
        plt.text(df.date[t], df.traffic[t]-35, df.date[t], horizontalalignment='center', color='darkred')

    # Decoration
    plt.ylim(50,750)
    xtick_location = df.index.tolist()[::6]
    xtick_labels = df.date.tolist()[::6]
    plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=90, fontsize=12, alpha=.7)
    plt.title("Peak and Troughs of Air Passengers Traffic (1949 - 1960)", fontsize=22)
    plt.yticks(fontsize=12, alpha=.7)

    # Lighten borders
    plt.gca().spines["top"].set_alpha(.0)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(.0)
    plt.gca().spines["left"].set_alpha(.3)

    plt.legend(loc='upper left')
    plt.grid(axis='y', alpha=.3)
    plt.show()
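The detection logic works on the sign of the first difference: at a peak the slope changes from positive to negative, so the difference of the signs is -2; the `-1*data` line finds troughs by applying the same test to the flipped series, which is equivalent to looking for +2 on the original. The same rule in plain Python, on a short hypothetical series:

```python
def peaks_and_troughs(values):
    """Indices where the series turns from rising to falling (peaks) or from
    falling to rising (troughs), using the sign-of-difference rule."""
    sign = lambda d: (d > 0) - (d < 0)
    slopes = [sign(b - a) for a, b in zip(values, values[1:])]
    # A change in slope sign of -2 (+1 to -1) marks a peak, +2 (-1 to +1) a trough
    turns = [b - a for a, b in zip(slopes, slopes[1:])]
    peaks = [i + 1 for i, t in enumerate(turns) if t == -2]
    troughs = [i + 1 for i, t in enumerate(turns) if t == 2]
    return peaks, troughs

traffic = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]   # hypothetical monthly values
print(peaks_and_troughs(traffic))
```

A flat top such as the two consecutive 148s produces sign changes of -1 rather than -2, so it is not flagged, both here and in the NumPy version.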

37. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot

The ACF plot shows the correlation of the time series with its own lags. Each vertical line (on the autocorrelation plot) represents the correlation between the series and its lag, starting from lag 0. The blue shaded region in the plot is the significance level; lags whose bars extend beyond it are the significant lags. So how do you interpret this? For AirPassengers, we see that up to 14 lags cross the blue region and so are significant. Since the data is monthly, this means the Air Passengers traffic seen up to 14 months back has an influence on the traffic seen today. PACF, on the other hand, shows the autocorrelation of any given lag (of the time series) against the current series, but with the contributions of the intermediate lags removed.

    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # Import Data
    df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

    # Draw Plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16,6), dpi=80)
    plot_acf(df.traffic.tolist(), ax=ax1, lags=50)
    plot_pacf(df.traffic.tolist(), ax=ax2, lags=20)

    # Decorate
    # lighten the borders
    ax1.spines["top"].set_alpha(.3); ax2.spines["top"].set_alpha(.3)
    ax1.spines["bottom"].set_alpha(.3); ax2.spines["bottom"].set_alpha(.3)
    ax1.spines["right"].set_alpha(.3); ax2.spines["right"].set_alpha(.3)
    ax1.spines["left"].set_alpha(.3); ax2.spines["left"].set_alpha(.3)

    # font size of tick labels
    ax1.tick_params(axis='both', labelsize=12)
    ax2.tick_params(axis='both', labelsize=12)
    plt.show()
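The bars in the ACF plot are sample autocorrelations. The sketch below computes them with the usual divide-by-n estimator, along with a simple constant ±1.96/√N band for white noise. statsmodels' `plot_acf` draws its shaded region with Bartlett's formula by default, which widens as the lag grows, so its band will not match this constant one exactly.

```python
import math

def acf(series, nlags):
    """Sample autocorrelation for lags 0..nlags (divide-by-n estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(nlags + 1)]

def approx_band(n, z=1.96):
    """Rough 95% white-noise band: +/- z / sqrt(n)."""
    return z / math.sqrt(n)

series = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4]   # hypothetical zig-zag series
r = acf(series, nlags=4)
print([round(v, 3) for v in r], round(approx_band(len(series)), 3))
```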

38. Cross Correlation Plot

A cross correlation plot shows the correlation between two time series at different lags of one relative to the other.

    import statsmodels.tsa.stattools as stattools

    # Import Data
    df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv')
    x = df['mdeaths']
    y = df['fdeaths']

    # Compute Cross Correlations
    ccs = stattools.ccf(x, y)[:100]
    nlags = len(ccs)

    # Compute the Significance level: approx. 2 / sqrt(N), N = number of observations
    # ref: https://stats.stackexchange.com/questions/3115/cross-correlation-significance-in-r/3128#3128
    conf_level = 2 / np.sqrt(len(x))

    # Draw Plot
    plt.figure(figsize=(12,7), dpi=80)

    plt.hlines(0, xmin=0, xmax=100, color='gray')  # 0 axis
    plt.hlines(conf_level, xmin=0, xmax=100, color='gray')
    plt.hlines(-conf_level, xmin=0, xmax=100, color='gray')

    plt.bar(x=np.arange(len(ccs)), height=ccs, width=.3)

    # Decoration
    plt.title(r'$Cross\; Correlation\; Plot:\; mdeaths\; vs\; fdeaths$', fontsize=22)
    plt.xlim(0,len(ccs))
    plt.show()
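Each bar of the cross correlation plot is the correlation between one series and a shifted copy of the other. A minimal sketch, assuming the divide-by-n normalization; `stattools.ccf` defaults to an adjusted (divide-by-(n − k)) estimator, and libraries differ on which series gets shifted, so treat this as an illustration of the idea rather than a drop-in replacement.

```python
import math

def ccf(x, y, nlags):
    """Correlation between x[t + k] and y[t] for k = 0..nlags-1 (divide-by-n estimator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return [sum((x[t + k] - mx) * (y[t] - my) for t in range(n - k)) / (n * sx * sy)
            for k in range(nlags)]

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]   # hypothetical series moving in opposite directions
print(ccf(x, y, nlags=3))
```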

39. Time Series Decomposition Plot

A time series decomposition plot shows the breakdown of the time series into trend, seasonal and residual components.

    from statsmodels.tsa.seasonal import seasonal_decompose
    from dateutil.parser import parse

    # Import Data
    df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')
    dates = pd.DatetimeIndex([parse(d).strftime('%Y-%m-01') for d in df['date']])
    df.set_index(dates, inplace=True)

    # Decompose (period=12: monthly data with a yearly cycle)
    result = seasonal_decompose(df['traffic'], model='multiplicative', period=12)

    # Plot
    plt.rcParams.update({'figure.figsize': (10,10)})
    result.plot().suptitle('Time Series Decomposition of Air Passengers')
    plt.show()
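For monthly data, the classical decomposition estimates the trend with a centered 2 × 12 moving average; under the multiplicative model used above, the seasonal component then comes from averaging observed / trend for each calendar month. A pure-Python sketch of the trend step follows. My understanding is that `seasonal_decompose` uses this kind of centered filter for even periods, but treat that as an assumption and check the statsmodels documentation for the details.

```python
def centered_moving_average(values, period=12):
    """Trend for an even period: a 2 x period moving average, i.e. period + 1 points
    with half weight on the two ends. Ends without a full window are None."""
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        weighted = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = weighted / period
    return trend

def detrended_ratios(values, trend):
    """Multiplicative model: observed / trend, to be averaged per month for the seasonal part."""
    return [v / tr if tr is not None else None for v, tr in zip(values, trend)]

series = list(range(30))   # hypothetical straight-line series
trend = centered_moving_average(series)
```

On a straight line the centered average reproduces the line exactly, which is a quick sanity check that the weights are symmetric.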

40. Multiple Time Series

You can plot multiple time series that measure the same value on the same chart, as shown below.

    # Import Data
    df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv')

    # Define the upper limit, lower limit, interval of Y axis and colors
    y_LL = 100
    y_UL = int(df.iloc[:, 1:].max().max()*1.1)
    y_interval = 400
    mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']

    # Draw Plot and Annotate
    fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi=80)

    columns = df.columns[1:]
    for i, column in enumerate(columns):
        plt.plot(df.date.values, df[column].values, lw=1.5, color=mycolors[i])
        plt.text(df.shape[0]+1, df[column].values[-1], column, fontsize=14, color=mycolors[i])

    # Draw Tick lines
    for y in range(y_LL, y_UL, y_interval):
        plt.hlines(y, xmin=0, xmax=71, colors='black', alpha=0.3, linestyles="--", lw=0.5)

    # Decorations
    plt.tick_params(axis="both", which="both", bottom=False, top=False,
                    labelbottom=True, left=False, right=False, labelleft=True)

    # Lighten borders
    plt.gca().spines["top"].set_alpha(.3)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(.3)
    plt.gca().spines["left"].set_alpha(.3)

    plt.title('Number of Deaths from Lung Diseases in the UK (1974-1979)', fontsize=22)
    plt.yticks(range(y_LL, y_UL, y_interval), [str(y) for y in range(y_LL, y_UL, y_interval)], fontsize=12)
    plt.xticks(range(0, df.shape[0], 12), df.date.values[::12], horizontalalignment='left', fontsize=12)
    plt.ylim(y_LL, y_UL)
    plt.xlim(-2, 80)
    plt.show()

41. Plotting with different scales using secondary Y axis

If you want to show two time series that measure two different quantities at the same points in time, you can plot the second series against a secondary Y axis on the right.

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")

    x = df['date']
    y1 = df['psavert']
    y2 = df['unemploy']

    # Plot Line1 (Left Y Axis)
    fig, ax1 = plt.subplots(1,1,figsize=(16,9), dpi=80)
    ax1.plot(x, y1, color='tab:red')

    # Plot Line2 (Right Y Axis)
    ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
    ax2.plot(x, y2, color='tab:blue')

    # Decorations
    # ax1 (left Y axis)
    ax1.set_xlabel('Year', fontsize=20)
    ax1.tick_params(axis='x', rotation=0, labelsize=12)
    ax1.set_ylabel('Personal Savings Rate', color='tab:red', fontsize=20)
    ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red')
    ax1.grid(alpha=.4)

    # ax2 (right Y axis)
    ax2.set_ylabel("# Unemployed (1000's)", color='tab:blue', fontsize=20)
    ax2.tick_params(axis='y', labelcolor='tab:blue')
    ax2.set_xticks(np.arange(0, len(x), 60))
    ax2.set_xticklabels(x[::60], rotation=90, fontdict={'fontsize':10})
    ax2.set_title("Personal Savings Rate vs Unemployed: Plotting in Secondary Y Axis", fontsize=22)
    fig.tight_layout()
    plt.show()

42. Time Series with Error Bands

A time series with error bands can be constructed if you have a time series dataset with multiple observations for each time point (date / timestamp). Below you can see two examples: one based on the orders coming in at various times of the day, and another on the number of orders arriving over a duration of 45 days. In this approach, the mean number of orders is denoted by the white line, and a 95% confidence band (the mean ± 1.96 standard errors) is computed and drawn around the mean.

    from scipy.stats import sem

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/user_orders_hourofday.csv")
    df_mean = df.groupby('order_hour_of_day').quantity.mean()
    df_se = df.groupby('order_hour_of_day').quantity.apply(sem).mul(1.96)

    # Plot
    plt.figure(figsize=(16,10), dpi=80)
    plt.ylabel("# Orders", fontsize=16)
    x = df_mean.index
    plt.plot(x, df_mean, color="white", lw=2)
    plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")

    # Decorations
    # Lighten borders
    plt.gca().spines["top"].set_alpha(0)
    plt.gca().spines["bottom"].set_alpha(1)
    plt.gca().spines["right"].set_alpha(0)
    plt.gca().spines["left"].set_alpha(1)
    plt.xticks(x[::2], [str(d) for d in x[::2]], fontsize=12)
    plt.title("User Orders by Hour of Day (95% confidence)", fontsize=22)
    plt.xlabel("Hour of Day")

    s, e = plt.gca().get_xlim()
    plt.xlim(s, e)

    # Draw Horizontal Tick lines
    for y in range(8, 20, 2):
        plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

    plt.show()
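The band drawn around the white line is the mean plus or minus 1.96 standard errors, a normal-approximation 95% interval for the mean at each time point. For a single time point the computation is just this (the order quantities are made up):

```python
import math
import statistics

def mean_and_band(values, z=1.96):
    """Mean and half-width of an approximate 95% confidence band: z * standard error."""
    mean = statistics.mean(values)
    # Standard error of the mean with the sample standard deviation (ddof=1), as scipy.stats.sem uses by default
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * se

# Hypothetical order quantities observed at one hour of the day
orders_at_9am = [10, 12, 9, 11, 13, 8, 12, 11]
mean, half_width = mean_and_band(orders_at_9am)
lower, upper = mean - half_width, mean + half_width
print(round(lower, 2), mean, round(upper, 2))
```

Repeating this for every hour (or day) and joining the lower and upper values gives the shaded region passed to `fill_between`.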

    # Data Source: https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_orders_dataset.csv
    from scipy.stats import sem

    # Import Data
    df_raw = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/orders_45d.csv',
                         parse_dates=['purchase_time', 'purchase_date'])

    # Prepare Data: Daily Mean and SE Bands
    df_mean = df_raw.groupby('purchase_date').quantity.mean()
    df_se = df_raw.groupby('purchase_date').quantity.apply(sem).mul(1.96)

    # Plot
    plt.figure(figsize=(16,10), dpi=80)
    plt.ylabel("# Daily Orders", fontsize=16)
    x = [d.date().strftime('%Y-%m-%d') for d in df_mean.index]
    plt.plot(x, df_mean, color="white", lw=2)
    plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")

    # Decorations
    # Lighten borders
    plt.gca().spines["top"].set_alpha(0)
    plt.gca().spines["bottom"].set_alpha(1)
    plt.gca().spines["right"].set_alpha(0)
    plt.gca().spines["left"].set_alpha(1)
    plt.xticks(x[::6], [str(d) for d in x[::6]], fontsize=12)
    plt.title("Daily Order Quantity of Brazilian Retail with Error Bands (95% confidence)", fontsize=20)

    # Axis limits
    s, e = plt.gca().get_xlim()
    plt.xlim(s, e-2)
    plt.ylim(4, 10)

    # Draw Horizontal Tick lines
    for y in range(5, 10, 1):
        plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)

    plt.show()

43. Stacked Area Chart

A stacked area chart gives a visual representation of the extent of contribution from multiple time series, so that they are easy to compare against each other.

    # Import Data
    df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/nightvisitors.csv')

    # Decide Colors
    mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:gray', 'tab:pink', 'tab:olive']

    # Draw Plot and Annotate
    fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi=80)
    columns = df.columns[1:]

    # Prepare data: stack the columns in a chosen order and keep the labels in the same order
    x = df['yearmon'].values.tolist()
    order = [0, 2, 4, 6, 7, 5, 1, 3]
    y = np.vstack([df[columns[i]].values.tolist() for i in order])
    labs = [columns[i] for i in order]

    # Plot for each column
    ax.stackplot(x, y, labels=labs, colors=mycolors, alpha=0.8)

    # Decorations
    ax.set_title('Night Visitors in Australian Regions', fontsize=18)
    ax.set(ylim=[0, 100000])
    ax.legend(fontsize=10, ncol=4)
    plt.xticks(x[::5], fontsize=10, horizontalalignment='center')
    plt.yticks(np.arange(10000, 100000, 20000), fontsize=10)
    plt.xlim(x[0], x[-1])

    # Lighten borders
    plt.gca().spines["top"].set_alpha(0)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(0)
    plt.gca().spines["left"].set_alpha(.3)

    plt.show()
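Stacking means each series is drawn on top of the running total of the series before it, which is what `ax.stackplot` does with the rows of `y`. As a consequence, the row order changes the picture, and the legend labels must follow that same order. A pure-Python sketch of the layer boundaries, with made-up numbers:

```python
def stack(series_list):
    """Return (lower, upper) boundaries for each layer of a stacked area chart.
    Layer i sits on top of the running total of layers 0..i-1."""
    n = len(series_list[0])
    baseline = [0.0] * n
    layers = []
    for series in series_list:
        upper = [b + v for b, v in zip(baseline, series)]
        layers.append((baseline, upper))
        baseline = upper
    return layers

# Hypothetical visitor counts for three regions over four periods
layers = stack([[10, 12, 11, 13], [5, 6, 7, 6], [3, 2, 4, 5]])
for lower, upper in layers:
    print(lower, upper)
```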

44. Area Chart Unstacked

An unstacked area chart is used to visualize the progress (ups and downs) of two or more series with respect to each other. In the chart below, you can clearly see how the personal savings rate comes down as the median duration of unemployment increases. The unstacked area chart brings out this phenomenon nicely.

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")

    # Prepare Data
    x = df['date'].values.tolist()
    y1 = df['psavert'].values.tolist()
    y2 = df['uempmed'].values.tolist()
    mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:gray', 'tab:pink', 'tab:olive']
    columns = ['psavert', 'uempmed']

    # Draw Plot (each area is labeled with the column it actually shows)
    fig, ax = plt.subplots(1, 1, figsize=(16,9), dpi=80)
    ax.fill_between(x, y1=y1, y2=0, label=columns[0], alpha=0.5, color=mycolors[1], linewidth=2)
    ax.fill_between(x, y1=y2, y2=0, label=columns[1], alpha=0.5, color=mycolors[0], linewidth=2)

    # Decorations
    ax.set_title('Personal Savings Rate vs Median Duration of Unemployment', fontsize=18)
    ax.set(ylim=[0, 30])
    ax.legend(loc='best', fontsize=12)
    plt.xticks(x[::50], fontsize=10, horizontalalignment='center')
    plt.yticks(np.arange(2.5, 30.0, 2.5), fontsize=10)
    plt.xlim(-10, x[-1])

    # Draw Tick lines
    for y in np.arange(2.5, 30.0, 2.5):
        plt.hlines(y, xmin=0, xmax=len(x), colors='black', alpha=0.3, linestyles="--", lw=0.5)

    # Lighten borders
    plt.gca().spines["top"].set_alpha(0)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(0)
    plt.gca().spines["left"].set_alpha(.3)
    plt.show()

45. Calendar Heat Map

A calendar heat map is an alternative, and less preferred, option for visualising time-based data compared to a time series plot. Though it can be visually appealing, the numeric values are not quite evident. It is, however, effective at picturising extreme values and holiday effects.

    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import pandas as pd
    import calmap

    # Import Data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/yahoo.csv", parse_dates=['date'])
    df.set_index('date', inplace=True)

    # Plot (calendarplot creates its own figure; its size is set through fig_kws)
    calmap.calendarplot(df.loc['2014', 'VIX.Close'], fig_kws={'figsize': (16,10)}, yearlabel_kws={'color':'black', 'fontsize':14}, subplot_kws={'title':'Yahoo Stock Prices'})
    plt.show()
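To see what a calendar heat map actually does, here is a minimal sketch that builds one with plain matplotlib instead of calmap: each day becomes one cell, rows are weekdays and columns are weeks of the year. The helper names `calendar_grid` and `plot_calendar`, the Monday-first week convention, and the `RdYlGn_r` colormap are assumptions for illustration, not calmap's internals.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def calendar_grid(series, year):
    """Arrange one year of daily values into a 7 x n_weeks array.

    Rows are weekdays (Mon=0 ... Sun=6); column k is the k-th week counted
    from the Monday on or before 1 January. Days without data stay NaN.
    """
    s = series[series.index.year == year]
    jan1 = pd.Timestamp(year=year, month=1, day=1)
    start = jan1 - pd.Timedelta(days=jan1.weekday())
    dec31 = pd.Timestamp(year=year, month=12, day=31)
    n_weeks = (dec31 - start).days // 7 + 1
    grid = np.full((7, n_weeks), np.nan)
    days_since_start = np.asarray((s.index - start).days)
    grid[np.asarray(s.index.weekday), days_since_start // 7] = s.to_numpy()
    return grid


def plot_calendar(series, year, ax=None, cmap="RdYlGn_r"):
    """Draw the calendar grid as a heat map; missing days are left blank."""
    ax = ax or plt.gca()
    grid = calendar_grid(series, year)
    mesh = ax.pcolormesh(np.ma.masked_invalid(grid), cmap=cmap, edgecolors="white", linewidth=1)
    ax.set_yticks(np.arange(7) + 0.5)
    ax.set_yticklabels(WEEKDAYS)
    ax.invert_yaxis()  # Monday on top
    ax.set_aspect("equal")
    ax.set_title(str(year))
    return mesh


# Usage with synthetic daily data (not the Yahoo prices)
days = pd.date_range("2014-01-01", "2014-12-31", freq="D")
demo = pd.Series(np.arange(len(days), dtype=float), index=days)
fig, ax = plt.subplots(figsize=(16, 3))
plot_calendar(demo, 2014, ax=ax)
```

With the Yahoo data loaded as above, `plot_calendar(df['VIX.Close'], 2014)` should give a comparable view. Weekends and market holidays have no rows in a stock-price series, so they simply stay blank, which is how holiday effects show up as gaps.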

46. Seasonal Plot

The seasonal plot can be used to compare how the time series performed on the same day in the previous season (year / month / week etc.).

    from dateutil.parser import parse
    import pandas as pd
    import matplotlib.pyplot as plt

    # Import Data
    df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

    # Prepare data
    df['year'] = [parse(d).year for d in df.date]
    df['month'] = [parse(d).strftime('%b') for d in df.date]
    years = df['year'].unique()

    # Draw Plot
    mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive', 'deeppink', 'steelblue', 'firebrick', 'mediumseagreen']
    plt.figure(figsize=(16,10), dpi= 80)
    for i, y in enumerate(years):
        plt.plot('month', 'traffic', data=df.loc[df.year==y, :], color=mycolors[i], label=y)
        # Label each line with its year, just right of its last data point
        plt.text(df.loc[df.year==y, :].shape[0]-.9, df.loc[df.year==y, 'traffic'].iloc[-1], y, fontsize=12, color=mycolors[i])

    # Decoration
    plt.ylim(50,750)
    plt.xlim(-0.3, 11)
    plt.ylabel('Air Traffic')
    plt.yticks(fontsize=12, alpha=.7)
    plt.title("Monthly Seasonal Plot: Air Passengers Traffic (1949 - 1960)", fontsize=22)
    plt.grid(axis='y', alpha=.3)

    # Remove borders
    plt.gca().spines["top"].set_alpha(0.0)
    plt.gca().spines["bottom"].set_alpha(0.5)
    plt.gca().spines["right"].set_alpha(0.0)
    plt.gca().spines["left"].set_alpha(0.5)
    # plt.legend(loc='upper right', ncol=2, fontsize=12)
    plt.show()
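The core of a seasonal plot is a reshape: one row per position in the season (here, month) and one column per cycle (here, year). Below is a minimal sketch of that reshape with `DataFrame.pivot`, assuming a monthly series with a `DatetimeIndex`; the helper names `seasonal_table` and `plot_seasonal` are hypothetical, and the demo uses synthetic values, not AirPassengers.

```python
import calendar

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def seasonal_table(series):
    """Reshape a monthly series into a month x year table (rows 1..12)."""
    frame = pd.DataFrame({
        "value": series.to_numpy(),
        "year": series.index.year,
        "month": series.index.month,
    })
    return frame.pivot(index="month", columns="year", values="value")


def plot_seasonal(series, ax=None):
    """One line per year over Jan..Dec, labelled at its last observed month."""
    ax = ax or plt.gca()
    table = seasonal_table(series)
    for year in table.columns:
        line = table[year].dropna()
        ax.plot(line.index, line.to_numpy(), label=str(year))
        ax.text(line.index[-1] + 0.1, line.iloc[-1], str(year), fontsize=9)
    ax.set_xticks(range(1, 13))
    ax.set_xticklabels(calendar.month_abbr[1:13])
    return ax


# Usage with synthetic monthly data covering two years
idx = pd.date_range("1949-01-01", periods=24, freq="MS")
demo = pd.Series(np.arange(24, dtype=float), index=idx)
fig, ax = plt.subplots(figsize=(10, 5))
plot_seasonal(demo, ax=ax)
```

Compared with the loop above, the pivot avoids parsing each date string by hand and does not need one hard-coded color per year, since matplotlib's color cycle is used instead.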

Groups

47. Dendrogram

A dendrogram groups similar points together based on a given distance metric and organizes them in tree-like links based on the points' similarity.

    import scipy.cluster.hierarchy as shc
    import pandas as pd
    import matplotlib.pyplot as plt

    # Import Data
    df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')

    # Plot
    plt.figure(figsize=(16, 10), dpi= 80)
    plt.title("USArrests Dendrograms", fontsize=22)
    dend = shc.dendrogram(shc.linkage(df[['Murder', 'Assault', 'UrbanPop', 'Rape']], method='ward'), labels=df.State.values, color_threshold=100)
    plt.xticks(fontsize=12)
    plt.show()
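A dendrogram is a picture of a hierarchy. To turn it into actual group labels, cut the tree at a height. The sketch below does that with scipy's `fcluster` using `criterion="distance"`; cutting at the same height as `color_threshold` should group the leaves much like the coloured subtrees do. The helper name `cut_dendrogram` and the four toy points are assumptions for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as shc


def cut_dendrogram(X, threshold, method="ward"):
    """Return flat cluster labels (1..k) by cutting the linkage tree at `threshold`.

    With criterion="distance", observations share a cluster only if their
    cophenetic distance (merge height in the tree) is at most `threshold`.
    """
    Z = shc.linkage(X, method=method)
    return shc.fcluster(Z, t=threshold, criterion="distance")


# Two tight, well separated pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = cut_dendrogram(X, threshold=5)
print(labels)
```

For these points, each pair merges at height 1 and the two pairs merge at a Ward height of 20, so a cut at 5 gives two clusters and a cut above 20 gives one. On USArrests, `cut_dendrogram(df[['Murder', 'Assault', 'UrbanPop', 'Rape']], 100)` would be the analogous call.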

48. Cluster Plot

A cluster plot can be used to demarcate points that belong to the same cluster. Below is a representational example that groups the US states into 5 groups based on the USArrests dataset. This cluster plot uses the 'Murder' and 'Assault' columns as the X and Y axes. Alternatively, you can use the first two principal components as the X and Y axes.

    from sklearn.cluster import AgglomerativeClustering
    from scipy.spatial import ConvexHull
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Import Data
    df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')

    # Agglomerative Clustering (Ward linkage always uses Euclidean distance, the default)
    cluster = AgglomerativeClustering(n_clusters=5, linkage='ward')
    cluster.fit_predict(df[['Murder', 'Assault', 'UrbanPop', 'Rape']])

    # Plot
    plt.figure(figsize=(14, 10), dpi= 80)
    plt.scatter(df['Murder'], df['Assault'], c=cluster.labels_, cmap='tab10')

    # Encircle
    def encircle(x, y, ax=None, **kw):
        """Shade the convex hull around the points (x, y)."""
        if not ax: ax = plt.gca()
        p = np.c_[x, y]
        hull = ConvexHull(p)
        poly = plt.Polygon(p[hull.vertices, :], **kw)
        ax.add_patch(poly)

    # Draw polygon surrounding the vertices of each cluster
    for label, color in enumerate(['gold', 'tab:blue', 'tab:red', 'tab:green', 'tab:orange']):
        mask = cluster.labels_ == label
        encircle(df.loc[mask, 'Murder'], df.loc[mask, 'Assault'], ec="k", fc=color, alpha=0.2, linewidth=0)

    # Decorations
    plt.xlabel('Murder'); plt.xticks(fontsize=12)
    plt.ylabel('Assault'); plt.yticks(fontsize=12)
    plt.title('Agglomerative Clustering of USArrests (5 Groups)', fontsize=22)
    plt.show()
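For the principal-component alternative mentioned above, here is a minimal sketch. It swaps in a plain NumPy SVD for the projection instead of `sklearn.decomposition.PCA`, and it standardizes each column first, an assumption made because features like Assault and Murder live on very different scales (it also assumes no constant columns). The helper names `first_two_pcs` and `plot_clusters_pca` are hypothetical, and the demo data is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull


def first_two_pcs(X):
    """Project the rows of X onto the first two principal components.

    Columns are standardized first (assumption: no constant columns).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T


def plot_clusters_pca(X, labels, ax=None):
    """Scatter clusters on PC1/PC2 and shade each cluster's convex hull."""
    ax = ax or plt.gca()
    pcs = first_two_pcs(X)
    labels = np.asarray(labels)
    for k in np.unique(labels):
        pts = pcs[labels == k]
        color = plt.cm.tab10(int(k) % 10)
        ax.scatter(pts[:, 0], pts[:, 1], color=color, label=f"cluster {k}")
        if len(pts) >= 3:  # a 2-D hull needs at least 3 non-collinear points
            hull = ConvexHull(pts)
            ax.add_patch(plt.Polygon(pts[hull.vertices], fc=color, ec="k", alpha=0.2, linewidth=0))
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend()
    return ax


# Usage with synthetic 4-feature data in three blobs
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]], dtype=float)
X = np.vstack([c + rng.normal(size=(20, 4)) for c in centers])
labels = np.repeat([0, 1, 2], 20)
fig, ax = plt.subplots(figsize=(8, 6))
plot_clusters_pca(X, labels, ax=ax)
```

With the USArrests example, the analogous call would be `plot_clusters_pca(df[['Murder', 'Assault', 'UrbanPop', 'Rape']], cluster.labels_)`. Looping over the cluster labels also replaces the five hand-written `encircle` calls and keeps the hull color in step with the point color.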

49. Andrews Curve

Andrews Curve helps visualize if there are inherent groupings of the numerical features based on a given grouping. If the features (columns in the dataset) don't help discriminate the group (cyl), then the lines will not be well segregated, as you see below.

    from pandas.plotting import andrews_curves
    import pandas as pd
    import matplotlib.pyplot as plt

    # Import
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    df.drop(['cars', 'carname'], axis=1, inplace=True)

    # Plot
    plt.figure(figsize=(12,9), dpi= 80)
    andrews_curves(df, 'cyl', colormap='Set1')

    # Lighten borders
    plt.gca().spines["top"].set_alpha(0)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(0)
    plt.gca().spines["left"].set_alpha(.3)

    plt.title('Andrews Curves of mtcars', fontsize=22)
    plt.xlim(-3,3)
    plt.grid(alpha=0.3)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.show()
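Each row of the dataset becomes one curve. The pandas documentation for `andrews_curves` describes the functional form f(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + x5·cos(2t) + ..., evaluated over t in [-π, π], roughly the range that `plt.xlim(-3,3)` frames above. Below is a minimal NumPy sketch of that formula. The names `andrews_curve` and `andrews_matrix` are hypothetical, and this is an independent illustration rather than pandas' own implementation.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate the Andrews curve of one observation x at angles t.

    f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full(t.shape, x[0] / np.sqrt(2.0))
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1                 # harmonic order: 1, 1, 2, 2, 3, 3, ...
        trig = np.sin if i % 2 == 0 else np.cos
        result += coef * trig(k * t)
    return result


def andrews_matrix(X, samples=200):
    """Curves for every row of X over t in [-pi, pi]; returns (t, curves)."""
    t = np.linspace(-np.pi, np.pi, samples)
    curves = np.array([andrews_curve(row, t) for row in np.asarray(X, dtype=float)])
    return t, curves
```

Because each feature enters as a raw amplitude, columns with large values (in mtcars, for example, disp and hp) dominate the shape of every curve. Standardizing the feature columns before plotting is a common choice when you want each feature to contribute comparably.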

50. Parallel Coordinates

Parallel coordinates help to visualize whether a feature segregates the groups effectively. If a feature does separate the groups, it is likely to be very useful in predicting the group.

    from pandas.plotting import parallel_coordinates
    import pandas as pd
    import matplotlib.pyplot as plt

    # Import Data
    df_final = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/diamonds_filter.csv")

    # Plot
    plt.figure(figsize=(12,9), dpi= 80)
    parallel_coordinates(df_final, 'cut', colormap='Dark2')

    # Lighten borders
    plt.gca().spines["top"].set_alpha(0)
    plt.gca().spines["bottom"].set_alpha(.3)
    plt.gca().spines["right"].set_alpha(0)
    plt.gca().spines["left"].set_alpha(.3)

    plt.title('Parallel Coordinates of Diamonds', fontsize=22)
    plt.grid(alpha=0.3)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.show()
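`parallel_coordinates` draws every feature on one shared y-axis, so a feature with a large range (such as price in raw diamonds data) can flatten all the others into a line. If your features are on very different scales, rescaling them first usually makes the group separation easier to judge. Below is a minimal sketch of a min-max rescale to [0, 1]; the helper name `minmax_scale` and the tiny demo frame are hypothetical, and whether diamonds_filter.csv needs this depends on its columns.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates


def minmax_scale(df, class_column):
    """Rescale every numeric feature to [0, 1], keeping the class column as is.

    Constant columns are mapped to 0 instead of dividing by zero.
    """
    features = df.drop(columns=[class_column]).select_dtypes("number")
    lo = features.min()
    span = (features.max() - lo).replace(0, 1)
    scaled = (features - lo) / span
    scaled[class_column] = df[class_column].to_numpy()
    return scaled


# Usage with a tiny synthetic frame: unscaled, 'price' would flatten 'depth'
demo = pd.DataFrame({
    "price": [300.0, 5000.0, 18000.0, 950.0],
    "depth": [61.0, 62.5, 59.8, 63.0],
    "cut": ["Ideal", "Premium", "Ideal", "Good"],
})
scaled = minmax_scale(demo, "cut")
fig, ax = plt.subplots(figsize=(8, 5))
parallel_coordinates(scaled, "cut", colormap="Dark2", ax=ax)
```

Min-max scaling keeps each feature's ordering and relative spacing intact, so any separation between groups that was already there stays visible; only the units change.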

That's all for now! If you encounter any error or bug, please notify us here.

Selva Prabhakaran

Selva is the Chief Author and Editor of Machine Learning Plus, with 4 Million+ readership. He has authored courses and books with 100K+ students, and is the Principal Data Scientist of a global firm.