Pandas Bokeh: High-Level Charting API for Bokeh with Pandas Integration

Introduction

Pandas Bokeh is a high-level charting library that integrates Bokeh’s interactive plotting capabilities into Pandas. With just a few additional lines of code, you can turn static visualizations into dynamic, engaging plots. Importing the library adds a plot_bokeh() method to DataFrames and Series, and registering it as the Pandas plotting backend makes the standard .plot() function produce interactive Bokeh charts as well.

Pandas Bokeh supports a wide range of charts, including line plots, bar charts, scatter plots, histograms, pie charts, area plots, and geographic map plots.

Installation

To install Pandas Bokeh, simply run:

pip install pandas-bokeh

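After installation, import pandas_bokeh (which adds the plot_bokeh() method), optionally set it as the default Pandas plotting backend, and call output_notebook() to render plots inline in Jupyter: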

import pandas as pd
import pandas_bokeh
import numpy as np

pd.set_option("plotting.backend", "pandas_bokeh")
pandas_bokeh.output_notebook()

Key Features & Explanation

  1. Interactive Visualizations
  • Enables zooming, panning, and tooltips for enhanced user experience.

  • Interactive legends to toggle the visibility of individual data series.

  • Hover tools to display precise data values dynamically.

  2. Seamless Pandas Integration
  • Works directly with Pandas DataFrames and Series, reducing the need for additional data transformations.

  • Provides a simple API for plotting without complex configuration.

  3. Multiple Plot Types
  • Supports various charts such as:

    • Line charts for trend analysis.

    • Bar charts for categorical comparisons.

    • Scatter plots for correlation analysis.

    • Histograms for frequency distributions.

    • Geospatial maps for location-based data visualization.

  4. Output Flexibility
  • Supports rendering plots in:

    • Jupyter Notebooks for quick prototyping.

    • Standalone HTML files for easy sharing.

    • Web applications via Bokeh server for dynamic dashboards.

  5. Customization Options
  • Allows fine-tuning of:

    • Colors, labels, titles, and legends to improve readability.

    • Tooltips and hover information for deeper insights.

    • Themes and styles to match branding and aesthetic preferences.

  6. Streaming & Real-Time Data
  • The plots are regular Bokeh objects, so Bokeh’s server and data-streaming features can be used for live updates, ideal for:

    • Financial dashboards tracking stock price fluctuations.

    • IoT applications monitoring sensor data in real-time.

    • Live analytics dashboards in business intelligence.

  7. Map Visualizations
  • Provides interactive geospatial mapping functionality.

  • Supports plotting geographic data points with overlays.

  • Useful for applications like demographic analysis and logistics tracking.

Code Examples

Line plot

# Synthetic daily price changes: random noise plus an upward drift
df = pd.DataFrame({
    "Samsung": np.random.randn(600) + 0.3,
    "Apple": np.random.randn(600) + 0.55
}, index=pd.date_range('1/1/2010', periods=600))

# Accumulate the changes into price series that start near 20
df = df.cumsum() + 20

# Plot Samsung and Apple stock prices
df.plot_bokeh(kind="line",
              title="Stock Prices of Samsung & Apple",
              xlabel="Date",
              ylabel="Stock Price ($)",
              colormap=["blue", "red"],
              figsize=(900, 500))
  • The code above generates synthetic stock price data and visualizes it with pandas-bokeh.
  • The plot is interactive, supporting zooming, panning, and hover tooltips.
  • Clicking a legend entry hides the corresponding line.
  • Unlike Matplotlib, which renders static images, pandas-bokeh produces interactive plots.
  • Plots of this type are commonly used for stock price trends, weather data, and sensor readings.
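The synthetic data above is a random walk with drift: each day adds standard normal noise plus a constant 0.3 (Samsung) or 0.55 (Apple), and cumsum() accumulates those changes. After 600 days the expected levels are therefore about 20 + 600 × 0.3 = 200 and 20 + 600 × 0.55 = 350, with noise of roughly ±√600 ≈ 24.5 around them. A minimal sketch that checks this arithmetic, assuming a fixed seed (the original example sets none):

```python
import numpy as np
import pandas as pd

# Assumed seed so the sketch is reproducible
np.random.seed(0)
days = 600
start = 20

# Daily price changes: standard normal noise plus a constant drift per stock
changes = pd.DataFrame({
    "Samsung": np.random.randn(days) + 0.3,
    "Apple": np.random.randn(days) + 0.55,
}, index=pd.date_range("2010-01-01", periods=days))

# cumsum() turns daily changes into price levels (a random walk with drift)
prices = changes.cumsum() + start

# Expected final level = start + days * drift; noise spreads it by about sqrt(days)
expected = {"Samsung": start + days * 0.3, "Apple": start + days * 0.55}
for name, value in expected.items():
    print(f"{name}: final={prices[name].iloc[-1]:.1f}, expected~{value:.0f}")
```

The drift term dominates over long horizons, which is why both lines trend upward in the chart despite the day-to-day noise.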
df.plot_bokeh(rangetool=True,
              colormap=["blue", "red"],
              figsize=(900, 500))
  • Advanced variants such as point plots or line plots with a range tool make long series easier to explore.
  • The plot above is a line plot with a range tool: dragging or resizing the selection box in the lower panel changes the range shown in the main plot.
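The range tool changes the visible window interactively. When a report needs one fixed window instead, slicing the DataFrame by date before plotting gives the same span. A minimal sketch, with the June to August 2010 window and the seed chosen arbitrarily for illustration:

```python
import numpy as np
import pandas as pd

# Same shape of data as the line-plot example: 600 daily rows from 2010-01-01
np.random.seed(0)
df = pd.DataFrame({
    "Samsung": np.random.randn(600) + 0.3,
    "Apple": np.random.randn(600) + 0.55,
}, index=pd.date_range("2010-01-01", periods=600)).cumsum() + 20

# Label-based slicing on a DatetimeIndex is inclusive at both ends
window = df.loc["2010-06-01":"2010-08-31"]
print(len(window))  # 92 days: June (30) + July (31) + August (31)
print(window.index.min().date(), window.index.max().date())

# The fixed window could then be plotted on its own, e.g.
# window.plot_bokeh(kind="line", figsize=(900, 500))
```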

Histogram

# Generate synthetic exam score data for two subjects
df = pd.DataFrame({
    "Math Scores": np.random.normal(75, 10, 1000),  
    "Science Scores": np.random.normal(80, 12, 1000)  
})

# Plot Exam Score Distribution (topontop)
df.plot_bokeh.hist(
    bins=30, 
    histogram_type="topontop",
    title="Exam Score Distribution: Math vs Science (topontop)",
    xlabel="Exam Scores", 
    ylabel="Number of Students",
    colormap=["blue", "green"],
    figsize=(900, 500),
    line_color="black"
)

# Plot Exam Score Distribution (Sidebyside)
df.plot_bokeh.hist(
    bins=30,
    histogram_type="sidebyside",
    title="Exam Score Distribution: Math vs Science (Sidebyside)",
    xlabel="Exam Scores", 
    ylabel="Number of Students",
    colormap=["blue", "green"],
    figsize=(900, 500),
    line_color="black"
)

# Plot Exam Score Distribution (Stacked)
df.plot_bokeh.hist(
    bins=30,
    alpha=0.6,
    histogram_type="stacked",
    stacked=True, 
    title="Exam Score Distribution: Math vs Science (Stacked)",
    xlabel="Exam Scores", 
    ylabel="Number of Students",
    colormap=["blue", "green"],
    figsize=(900, 500),
    line_color="black"
)
  • For histograms (kind="hist" or .plot_bokeh.hist), pandas-bokeh offers several layouts through the histogram_type parameter: "sidebyside", "topontop", or "stacked".
  • Hovering over a bar shows its exact values, which helps with detailed analysis.
  • Unlike a static Matplotlib histogram, the plot supports zooming and hovering, which makes it far more interactive.
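The three histogram_type options differ only in how the per-bin counts are drawn. The sketch below reproduces that arithmetic with NumPy, assuming both columns share the same 30 bin edges; pandas-bokeh's own binning code may differ in detail, and the seed is an assumption added for reproducibility:

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # assumed seed for reproducibility
df = pd.DataFrame({
    "Math Scores": np.random.normal(75, 10, 1000),
    "Science Scores": np.random.normal(80, 12, 1000),
})

# Shared bin edges across both columns so the bars line up
edges = np.histogram_bin_edges(df.values.ravel(), bins=30)
counts = {col: np.histogram(df[col], bins=edges)[0] for col in df.columns}
math_counts = counts["Math Scores"]
science_counts = counts["Science Scores"]

# "topontop": both series drawn from zero and overlapping (hence alpha matters)
# "sidebyside": same heights, bars offset horizontally within each bin
# "stacked": Science bars start where Math bars end, so tops are the sums
stacked_tops = math_counts + science_counts

print(len(edges) - 1)       # 30 bins
print(stacked_tops.sum())   # 2000: every score falls in some bin
```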

Pie plot

data = {
    "Company": ["Apple", "Samsung", "Huawei", "Xiaomi", "Oppo", "Others"],
    "2000": [0, 5, 1, 0, 0, 94],
    "2005": [2, 15, 5, 0, 0, 78],
    "2010": [25, 30, 10, 5, 3, 27],
    "2015": [45, 20, 15, 8, 5, 7],
    "2020": [50, 18, 12, 10, 7, 3]
}

df_market = pd.DataFrame(data)

df_market.plot_bokeh.pie(
    x="Company",
    y="2020",
    colormap=["red", "blue", "green", "orange", "purple", "grey"],
    title="Smartphone Market Share in 2020"
)
df_market.plot_bokeh.pie(
    x="Company",
    colormap=["red", "blue", "green", "orange", "purple", "grey"],
    title="Smartphone Market Share (2000-2020)",
    line_color="grey"
)
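A pie chart scales each column to a full circle, so it is worth checking that every year adds up to 100 before reading the slices as market shares. The second call above omits y, so each year column is drawn as its own nested ring; the check below covers all years at once and also lists the companies whose 0 share in 2000 produces no visible slice in that ring:

```python
import pandas as pd

# Same market-share data as the pie example (values in percent)
data = {
    "Company": ["Apple", "Samsung", "Huawei", "Xiaomi", "Oppo", "Others"],
    "2000": [0, 5, 1, 0, 0, 94],
    "2005": [2, 15, 5, 0, 0, 78],
    "2010": [25, 30, 10, 5, 3, 27],
    "2015": [45, 20, 15, 8, 5, 7],
    "2020": [50, 18, 12, 10, 7, 3],
}
df_market = pd.DataFrame(data).set_index("Company")

# Each year column should cover the whole market, i.e. sum to 100
totals = df_market.sum()
print(totals.tolist())  # [100, 100, 100, 100, 100]

# Companies with a 0 share in 2000 produce no visible slice in that ring
zero_2000 = df_market.index[df_market["2000"] == 0].tolist()
print(zero_2000)  # ['Apple', 'Xiaomi', 'Oppo']
```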

Bar plot

data = {
    "Region": ["North America", "Europe", "Asia-Pacific", "Latin America", "Middle East & Africa"],
    "2015": [120, 150, 500, 80, 60],
    "2016": [130, 160, 550, 85, 65],
    "2017": [140, 170, 600, 90, 70],
    "2018": [150, 180, 650, 100, 75],
    "2019": [160, 190, 700, 110, 80],
    "2020": [170, 200, 750, 120, 85]
}

df_sales = pd.DataFrame(data).set_index("Region")

df_sales.plot_bokeh.bar(
    ylabel="Sales (in millions)",
    title="Smartphone Sales by Region (2015-2020)",
    alpha=0.6
)
df_sales.plot_bokeh.bar(
    ylabel="Sales (in millions)",
    title="Smartphone Sales by Region (2015-2020)",
    stacked=True,
    alpha=0.6
)
df_sales.reset_index(inplace=True)

df_sales.plot_bokeh(
    kind="barh",
    x="Region",
    xlabel="Sales (in millions)",
    title="Smartphone Sales by Region (2015-2020)",
    alpha=0.6,
    legend="bottom_right"
)
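In the stacked chart, each region's bar height is the sum of its six yearly values, while the grouped chart compares the years side by side. The totals can be verified directly with pandas, using the same data as above:

```python
import pandas as pd

# Same sales data as the bar-plot example (in millions)
data = {
    "Region": ["North America", "Europe", "Asia-Pacific", "Latin America", "Middle East & Africa"],
    "2015": [120, 150, 500, 80, 60],
    "2016": [130, 160, 550, 85, 65],
    "2017": [140, 170, 600, 90, 70],
    "2018": [150, 180, 650, 100, 75],
    "2019": [160, 190, 700, 110, 80],
    "2020": [170, 200, 750, 120, 85],
}
df_sales = pd.DataFrame(data).set_index("Region")

# Stacked bar height per region = sum over the six year columns
totals = df_sales.sum(axis=1)
print(totals)
# North America 870, Europe 1050, Asia-Pacific 3750,
# Latin America 585, Middle East & Africa 435

# Largest total, i.e. the tallest stacked bar
print(totals.idxmax())  # Asia-Pacific
```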

Scatter plot

Scatter plots help visualize the relationship between two numerical variables. With Pandas-Bokeh, we can create interactive scatter plots that allow zooming and hovering over data points.

# Age vs. Annual Spending
np.random.seed(42)  # For reproducibility
df = pd.DataFrame({
    "Age": np.random.randint(18, 65, 100),  # Random ages between 18 and 65
    "Annual_Spending": np.random.normal(50000, 15000, 100).astype(int),  # Normally distributed spending
    "Spending_Category": np.random.choice(["Low", "Medium", "High"], 100, p=[0.3, 0.5, 0.2])  # Spending categories
})

# Create scatter plot
df.plot_bokeh.scatter(
    x="Age",
    y="Annual_Spending",
    category="Spending_Category",
    title="Customer Age vs. Annual Spending",
    colormap=["green", "orange", "blue"],
    size=6,
    hovertool_string="Age: @Age <br> Annual_Spending: @Annual_Spending <br> Spending_Category: @Spending_Category"
)
import numpy as np

# Generate synthetic data
df = pd.DataFrame({
    "Age": np.random.randint(20, 60, 200),
    "Income": np.random.randint(30000, 100000, 200)
})

# Create scatter plot
df.plot_bokeh.scatter(
    x="Age",
    y="Income",
    title="Age vs. Income Scatter Plot",
    size=3,
    hovertool_string="Age: @Age <br> Income: @Income"
)
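Scatter plots are listed above as a tool for correlation analysis, and the visual impression can be backed by Pearson's correlation coefficient r. In the examples above the columns are generated independently, so r should be close to 0. The sketch contrasts that with a hypothetical dataset in which income rises by 1,200 per year of age plus noise; all numbers and the seed are illustrative assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator for reproducibility

# Independent columns, as in the example above: expect r near zero
independent = pd.DataFrame({
    "Age": rng.integers(20, 60, 200),
    "Income": rng.integers(30000, 100000, 200),
})

# Hypothetical contrast: income that rises with age, plus normal noise
age = rng.integers(20, 60, 200)
dependent = pd.DataFrame({
    "Age": age,
    "Income": 30000 + 1200 * age + rng.normal(0, 5000, 200),
})

# Pearson correlation between the two plotted columns
r_independent = independent["Age"].corr(independent["Income"])
r_dependent = dependent["Age"].corr(dependent["Income"])
print(f"independent r = {r_independent:.2f}, dependent r = {r_dependent:.2f}")
```

Plotting the dependent frame with the same plot_bokeh.scatter call would show the points forming a clear upward band, matching the high r.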

GeoPlots

Pandas-Bokeh also supports interactive map plotting with GeoPandas through the geopandas.GeoDataFrame.plot_bokeh() method. It can plot the following geometry types on a map:

  • Points/MultiPoints
  • Lines/MultiLines
  • Polygons/MultiPolygons
import geopandas as gpd
import warnings
warnings.filterwarnings("ignore", message="The 'type' attribute is deprecated")
warnings.filterwarnings("ignore", category=UserWarning)

# Read in GeoJSONs from URL:
df_states = gpd.read_file("https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/states/states.geojson")
df_cities = gpd.read_file("https://raw.githubusercontent.com/PatrikHlobil/Pandas-Bokeh/master/docs/Testdata/populated%20places/ne_10m_populated_places_simple_bigcities.geojson")

df_cities["size"] = df_cities.pop_max / 400000  # Scale city population to a marker size
df_states["geometry"] = df_states["geometry"].apply(
    lambda geom: geom if geom.geom_type == "Polygon" else geom.convex_hull
)

# Plot shapes of US states (pass figure options to this initial plot):
figure = df_states.plot_bokeh(
    figsize=(800, 450),
    simplify_shapes=10000,
    show_figure=False,
    xlim=[-170, -80],
    ylim=[10, 70],
    category="REGION",
    colormap="Dark2",
    legend="States",
    show_colorbar=False,
)

# Plot cities as points on top of the US states layer by passing the figure:
df_cities.plot_bokeh(
    figure=figure,         # <== pass figure here!
    category="pop_max",
    colormap="Viridis",
    colormap_uselog=True,
    size="size",
    hovertool_string="""<h1>@name</h1>
                        <h3>Population: @pop_max </h3>""",
    marker="inverted_triangle",
    legend="Cities",
)
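The size column maps population linearly to marker size: dividing pop_max by 400,000 means a city of 8 million is drawn with size 20. Very small or very large cities can then produce tiny or oversized markers, so clipping the values is one option. The populations and clip bounds below are hypothetical:

```python
import pandas as pd

# Hypothetical populations (not taken from the GeoJSON file)
pop_max = pd.Series([500_000, 2_000_000, 8_000_000, 20_000_000],
                    index=["City A", "City B", "City C", "City D"])

# Same linear rule as the example: marker size = population / 400,000
size = pop_max / 400000
print(size.tolist())  # [1.25, 5.0, 20.0, 50.0]

# Optional: clip to a readable range (bounds are an arbitrary choice)
size_clipped = size.clip(lower=4, upper=30)
print(size_clipped.tolist())  # [4.0, 5.0, 20.0, 30.0]
```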

Combining Pandas-Bokeh with GeoPandas allows visualization of city locations, travel routes, population density, etc.

# Read world country boundaries
df_countries = gpd.read_file("https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json")
# Explode MultiPolygons into individual Polygons (one row per part)
df_countries = df_countries.explode(index_parts=False)
# Give every country one integer id so all of its parts share a color
df_countries["country_id"] = pd.factorize(df_countries["name"])[0]
df_countries.plot_bokeh(category="country_id", colormap="Category20")
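In the world-map example, explode() turns each MultiPolygon into several Polygon rows that keep the country's name, and pd.factorize then assigns one integer per distinct name, so every part of a country gets the same country_id and therefore the same color. A small sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows as they might look after explode(): a country made of
# several islands keeps its name on every resulting Polygon row
rows = pd.DataFrame({"name": ["Japan", "Japan", "Japan", "France", "Spain", "France"]})

# factorize assigns one integer code per distinct name, in order of first appearance
codes, uniques = pd.factorize(rows["name"])
rows["country_id"] = codes

print(rows["country_id"].tolist())  # [0, 0, 0, 1, 2, 1]
print(list(uniques))                # ['Japan', 'France', 'Spain']
```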

Conclusion

Pandas-Bokeh offers an effortless way to add interactivity to Pandas-based plots, making data analysis more interactive and insightful. It couples Bokeh’s strong charting features with Pandas, so users can produce dynamic plots with little effort. With support for many chart types, geospatial visualizations, and, through Bokeh, real-time data streaming, it is a valuable asset for analysts, data scientists, and researchers. Whether used in Jupyter Notebooks, standalone HTML documents, or web applications powered by Bokeh, Pandas-Bokeh adds storytelling power to data visualization.

Use-Cases

  • Exploratory Data Analysis (EDA)
    • Quickly generate interactive plots to analyze patterns and trends in datasets.
    • Use hover tools and zooming to inspect outliers and distributions.
  • Time-Series Analysis
    • Create interactive line charts for financial data, stock trends, and sensor readings.
    • Utilize range tools to dynamically adjust the displayed time window.
  • Geospatial Data Visualization
    • Visualize geographic distributions with interactive map plots.
    • Overlay city locations, population density, and regional data on maps.
  • Real-Time Data Monitoring
    • Build dashboards for stock market tracking, IoT data monitoring, and business intelligence.
    • Update plots dynamically with live-streamed data sources.
  • Business Intelligence & Reporting
    • Enhance reports with interactive bar, pie, and scatter plots for better decision-making.
    • Allow users to filter and zoom into key insights effortlessly.

By leveraging Pandas-Bokeh, users can transform static visualizations into interactive experiences, making data exploration more intuitive and impactful.

References

Pandas-Bokeh : https://github.com/PatrikHlobil/Pandas-Bokeh