Create A Pandas Dataframe AI Agent With Generative AI, Python And OpenAI

Written by Matt Dancho



Hey guys, this is the first article in my NEW GenAI / ML Tips Newsletter. Today, we’re diving into the world of Generative AI and exploring how it can help companies automate common data science tasks. Specifically, we’ll learn how to create a Pandas dataframe agent that can answer questions about your dataset using Python, Pandas, LangChain, and OpenAI’s API. Let’s get started!

This is what you are making today

We’ll use this Generative AI Workflow to combine data (from CSVs or SQL databases) with a Pandas Data Frame Agent that helps us produce common analytics outputs like visualizations and reports.

Make A Pandas Data Analysis Agent with Python and Generative AI

Get the Code (In the AI-Tip 001 Folder)


SPECIAL ANNOUNCEMENT: AI for Data Scientists Workshop on December 18th

Inside the workshop I’ll share how I built a SQL-Writing Business Intelligence Agent with Generative AI:

Generative AI for Data Scientists

What: GenAI for Data Scientists

When: Wednesday December 18th, 2pm EST

How It Will Help You: Whether you are new to data science or are an expert, Generative AI is changing the game. There’s a ton of hype. But how can Generative AI actually help you become a better data scientist and help you stand out in your career? I’ll show you inside my free Generative AI for Data Scientists workshop.

Price: Does Free sound good?

How To Join: 👉 Register Here


GenAI/ML-Tips Weekly

This article is part of GenAI/ML Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common Data Science and Generative AI coding tasks. Pretty cool, right?

Here is the link to get set up. 👇

Get the Code (In the GenAI/ML Tip 001 Folder)

This Tutorial is Available in Video (9-minutes)

I have a 9-minute video that walks you through setting up the Pandas Data Frame Agent and running data analysis with it. 👇

Why Generative AI is Transforming Data Science

Generative AI, powered by models like OpenAI’s GPT series, is reshaping the data science landscape. These models can understand and generate human-like text, making it possible to interact with data in more intuitive ways. By integrating Generative AI into data science, you can:

  • Automate Data Insights: Quickly generate summaries and insights from complex datasets.
  • Enhance Decision Making: Obtain answers to specific questions without manually sifting through data.
  • Improve Accessibility: Make data science more accessible to non-technical stakeholders.

Creating a Pandas dataframe agent combines the power of AI with data science, letting you explore and interpret your data using nothing more than natural-language prompts.

What is a Pandas Data Frame Agent?

A Pandas Data Frame Agent automates common Pandas operations from Natural Language inputs.

It can be used to perform:

  • GroupBy + Aggregate
  • Math calculations (that normal LLMs struggle with)
  • Filters
  • Pivots
  • Window calculations
  • Resampling (Time Series)
  • Binning
  • Log Transformations
  • Summary Statistics (Mean, Median, IQR, Min/Max, Count (frequency), etc.)

All from Natural Language prompts.
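To make that list concrete, here is a small sketch of the kind of Pandas code an agent like this writes and runs on your behalf. The toy data and the column names (date, geography, sales) are made up for illustration and are not the tutorial’s dataset.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are assumptions for illustration only
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18"]),
    "geography": ["East", "West", "East", "West"],
    "sales": [100.0, 250.0, 300.0, 50.0],
})

# GroupBy + Aggregate: total sales per geography
total_by_geo = df.groupby("geography", as_index=False)["sales"].sum()

# Filter: keep only the rows with sales above 75
big_orders = df[df["sales"] > 75]

# Resampling (time series): monthly sales totals
monthly = df.set_index("date")["sales"].resample("MS").sum()

# Binning: label each sale as small or large
df["sales_bin"] = pd.cut(df["sales"], bins=[0, 100, 1000], labels=["small", "large"])

# Log transformation
df["log_sales"] = np.log(df["sales"])

# Summary statistics
summary = df["sales"].agg(["mean", "median", "min", "max", "count"])
```

With an agent, each of these one-liners becomes a plain-English request such as “What are the total sales by geography?”.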

Make A Pandas Data Frame Agent

Let’s walk through the steps to create a Pandas data frame agent that can answer questions about a dataset using Python, OpenAI’s API, Pandas, and LangChain.

Quick Reminder: You can get all of the code and datasets shown in a Python Script and Jupyter Notebook when you join my GenAI/ML Tips Newsletter.

Code Location: /001_pandas_dataframe_agent

Step 1: Setting Up the Python Environment

First, you’ll need to set up your Python environment and install the required libraries.

pip install openai langchain langchain_openai langchain_experimental pandas plotly pyyaml

Next, import the libraries.

Libraries

Then run this to access our utility function, parse_json_to_dataframe().

Utility Function

The last part is to set up your OpenAI API Key. Make sure to get an API Key from OpenAI’s API website.

OpenAI API Key

Note: Replace ‘credentials.yml’ with the path to your YAML file containing the OpenAI API key or set the ‘OPENAI_API_KEY’ environment variable directly.
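As a rough sketch of the two options in that note, the helper below first checks the OPENAI_API_KEY environment variable and only then falls back to a YAML file. The function name and the YAML field name (openai) are assumptions for illustration, so match them to however your own credentials file is laid out.

```python
import os
from pathlib import Path


def load_openai_api_key(credentials_path="credentials.yml", key_name="openai"):
    """Return the OpenAI API key from the environment or a YAML file.

    `key_name` is a hypothetical YAML field name; match it to your own file.
    """
    # Prefer an environment variable that is already set
    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:
        return api_key

    path = Path(credentials_path)
    if not path.exists():
        raise FileNotFoundError(f"No OPENAI_API_KEY set and {path} was not found")

    import yaml  # PyYAML; imported lazily because only the file route needs it

    with path.open() as f:
        credentials = yaml.safe_load(f)

    api_key = credentials[key_name]
    # LangChain's OpenAI integration reads the key from this variable
    os.environ["OPENAI_API_KEY"] = api_key
    return api_key
```

Whichever route you use, keep the key out of version control.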

Step 2: Loading and Exploring the Dataset

Load your dataset into a Pandas DataFrame. For this tutorial, we’ll use a sample customer data CSV file. But you could easily use any data that you can get into a Pandas Data Frame:

  • SQL Database
  • CSV
  • Excel File

Run this code to load the customer dataset:

Load The Customer Dataset

This dataset contains customer information, including sales and geography data.
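Here is a minimal sketch of getting data into a Pandas data frame from two of those sources. The rows and column names (customer_id, geography, sales) are invented stand-ins, not the tutorial’s customer file. With a real CSV you would pass its path to pd.read_csv().

```python
import io
import sqlite3

import pandas as pd

# Stand-in for a CSV file on disk; column names here are assumptions
csv_text = """customer_id,geography,sales
1,East,100
2,West,250
3,East,300
"""
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same rows loaded through a SQL query, using an in-memory SQLite database
conn = sqlite3.connect(":memory:")
df_csv.to_sql("customers", conn, index=False)
df_sql = pd.read_sql("SELECT * FROM customers WHERE sales > 150", conn)
conn.close()

# Quick exploration before handing the data to an agent
print(df_csv.head())
print(df_csv.dtypes)
```

Excel files follow the same pattern with pd.read_excel(), provided an Excel engine such as openpyxl is installed.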

Step 3: Create the Pandas Data Analysis Agent with LangChain

Initialize the language model and create the Pandas data analysis agent using LangChain.

Create The Pandas Data Frame Agent

This is what’s happening:

  • ChatOpenAI: Initializes the OpenAI language model.
  • create_pandas_dataframe_agent: Creates an agent that can interact with the Pandas DataFrame.
  • agent_type: Specifies the type of agent (using OpenAI functions).
  • suffix: Instructs the agent to return results in JSON format for easy parsing.

Pro-Tip: The secret sauce is to use the suffix parameter to specify the output format. Under the hood, this adds extra instructions to the end of the agent’s prompt template that describe how to return the information.
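The four pieces described above could fit together roughly like this. Treat it as a sketch under assumptions rather than the tutorial’s exact code: the suffix wording, the function name, and the model name are all illustrative, and argument names or defaults can differ between LangChain releases, so check them against the version you have installed.

```python
import pandas as pd

# Hypothetical suffix wording that asks the agent for parseable JSON
JSON_SUFFIX = (
    "Always return a JSON dictionary that can be parsed into a data frame "
    "containing the requested information."
)


def build_data_analysis_agent(df: pd.DataFrame, model: str = "gpt-4o-mini"):
    """Create a Pandas data frame agent that is asked to answer in JSON."""
    # Imported here so this module loads even without the LangChain packages
    from langchain.agents import AgentType
    from langchain_experimental.agents import create_pandas_dataframe_agent
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model)  # reads OPENAI_API_KEY from the environment

    return create_pandas_dataframe_agent(
        llm,
        df,
        agent_type=AgentType.OPENAI_FUNCTIONS,
        suffix=JSON_SUFFIX,
        verbose=True,
        # Recent langchain_experimental releases require this opt-in because
        # the agent executes model-generated Python code
        allow_dangerous_code=True,
    )
```

Because the agent runs code that the model writes, only point it at data and environments you are comfortable exposing to that code.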

Step 4: Interacting with the Pandas Data Frame Agent

Now, you can ask the agent questions about your data. Try running this code with a Natural Language analysis question:

“What are the total sales by geography?”

Invoke the agent

The agent processes the query and returns a response.

Process Query
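In code, asking the question comes down to one call. LangChain agent executors accept a dictionary with an "input" key and return a dictionary whose "output" key holds the final answer. The sketch below wraps that in a small helper. The FakeAgent class is a hypothetical stand-in so the example runs without an API call, and you would pass your real agent instead.

```python
def ask_agent(agent, question: str) -> str:
    """Send a natural-language question to the agent and return its text answer."""
    # The executor takes {"input": ...} and returns a dict with an "output" key
    response = agent.invoke({"input": question})
    return response["output"]


class FakeAgent:
    """Hypothetical stand-in so the sketch runs without an API call."""

    def invoke(self, payload):
        return {"input": payload["input"], "output": "stubbed answer"}


answer = ask_agent(FakeAgent(), "What are the total sales by geography?")
print(answer)
```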

This is where Post Processing comes into play. Remember when I added the suffix parameter to return JSON? The Agent actually buries the JSON in a string.

JSON String

That’s OK, because I have created a handy little parsing tool that extracts the JSON from the string and converts it to a Pandas Data Frame for us.

Convert JSON To Pandas
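If you want to see how a parser like this can work, here is one possible sketch. It is a guess at the general idea, not the version that ships with the tutorial code, and the sample agent output is invented for illustration.

```python
import json
import re

import pandas as pd


def parse_json_to_dataframe(text: str) -> pd.DataFrame:
    """Extract the first JSON object or array found in `text` as a DataFrame."""
    # Grab everything from the first opening brace/bracket to the last closing one
    match = re.search(r"(\{.*\}|\[.*\])", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON found in the agent output")
    data = json.loads(match.group(1))
    return pd.DataFrame(data)


# Invented example of an agent answer with JSON buried in prose
agent_output = (
    "Here are the total sales by geography: "
    '{"geography": ["East", "West"], "sales": [400, 300]} '
    "Let me know if you need anything else."
)

sales_df = parse_json_to_dataframe(agent_output)
print(sales_df)
```

This simple pattern assumes the surrounding prose contains no stray braces or brackets. Production code would need sturdier extraction and error handling.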

Step 5: Visualizing the Results

With the results in a Pandas data frame, we can now report them. I’ll do this manually with Plotly, but a great challenge is to extend the code to create an AI agent that writes the visualization code and executes it automatically.

Data Visualization

This visualization provides a clear view of sales distribution across different geographical regions.
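A bar chart like this takes only a few lines with Plotly Express. The sketch below uses invented values and assumed column names (geography, sales). Swap in the data frame that came back from the agent.

```python
import pandas as pd

# Illustrative result; column names and values are assumptions
sales_by_geo = pd.DataFrame(
    {"geography": ["East", "West", "North"], "sales": [400, 300, 550]}
).sort_values("sales", ascending=False)


def make_sales_bar_chart(df: pd.DataFrame):
    """Build a Plotly bar chart of sales by geography."""
    import plotly.express as px  # imported lazily; requires the plotly package

    return px.bar(df, x="geography", y="sales", title="Total Sales by Geography")


# In a notebook: make_sales_bar_chart(sales_by_geo).show()
```

Sorting before plotting keeps the bars ordered from largest to smallest, which makes regional comparisons easier to read.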

Conclusion

By integrating Generative AI with data science, you’ve created a powerful tool that can interact with your data in natural language. This Pandas data analysis agent simplifies the process of extracting insights and lets non-technical stakeholders automate common data manipulations so they can make data-driven decisions.

But there’s so much more to learn in Generative AI and data science.

If you’re excited to become a Generative AI Data Scientist with Python, then keep reading…

Become A Generative AI Data Scientist

The future of data science is AI / ML.

I’ve helped 6,107+ students learn data science and now I’m helping them become Generative AI Data Scientists, skilled in combining Generative AI / ML. With this system they have:

  • Landed Promotions to Manager of AI/ML Teams ($200,000+ Role)
  • Made Proof-Of-Concepts for Clients ($25,000+ Consulting Projects)
  • Grown their data science skills with Generative AI (Career Growth)

Here’s the system they are taking to become Generative AI Data Scientists:

Generative AI Bootcamp

This is a Live 8-Week Generative AI Bootcamp for Data Scientists that covers:

  • Week 1: Live Kickoff Clinic + Local LLM Training + AI Fast Track

  • Week 2: Retrieval Augmented Generation (RAG)

  • Week 3: Business Intelligence AI Copilot (SQL + Pandas Tools)

  • Week 4: Customer Analytics Team (Multi-Agent Workflows)

  • Week 5: Time Series Forecasting Team (Multi-Agent Machine Learning Workflows)

  • Week 6: LLM Model Deployment AWS Bedrock

  • Week 7: Fine-Tuning LLM Models AWS Bedrock

  • Week 8: AI App Deployment With AWS Cloud

Enroll In The Next Cohort Here
(And Become A Generative AI Data Scientist in 2025)