Methods #
Let’s start by storing our data in a dictionary and loading it into a Pandas dataframe.
In [1]: import pandas as pd
In [2]: data = {
...: 'Date': pd.to_datetime(['2025-08-05', '2025-08-10', '2025-08-15', '2025-08-20', '2024-08-25']),
...: 'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
...: 'Sales': [250, 180, 300, 120, 400],
...: 'Category': ['Furniture', 'Office Supplies', 'Technology', 'Furniture', 'Technology'],
...: 'Quantity': [3, 5, 2, 4, 1]
...: }
In [3]: df = pd.DataFrame(data)
In [4]: df
Out[4]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 Furniture 3
1 2025-08-10 Texas 180 Office Supplies 5
2 2025-08-15 California 300 Technology 2
3 2025-08-20 New York 120 Furniture 4
4 2024-08-25 Texas 400 Technology 1
query() #
With the query() method you can filter rows using SQL-like syntax.
In [5]: df.query("State == 'New York' and Sales > 150")
Out[5]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 Furniture 3
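query() can also reference local Python variables with the @ prefix, which is handy when a threshold is computed elsewhere in your code rather than hard-coded into the string. A minimal sketch, using a trimmed version of the dataframe above:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
    'Sales': [250, 180, 300, 120, 400],
})

threshold = 150  # e.g. computed earlier in your pipeline

# @threshold pulls the local variable into the query expression
result = df.query("State == 'New York' and Sales > @threshold")
print(result)
```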
assign() #
With the assign() method, you can easily append new columns to your dataframe using a simple lambda function, without modifying the original dataframe. You can also easily chain several methods together, which makes for clear and elegant code.
In [6]: df.assign(Discount=lambda x: x['Sales'] * 0.10)
Out[6]:
Date State Sales Category Quantity Discount
0 2025-08-05 New York 250 Furniture 3 25.0
1 2025-08-10 Texas 180 Office Supplies 5 18.0
2 2025-08-15 California 300 Technology 2 30.0
3 2025-08-20 New York 120 Furniture 4 12.0
4 2024-08-25 Texas 400 Technology 1 40.0
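Because assign() returns a new dataframe, it chains naturally with the other methods in this post. A sketch combining it with query() on a trimmed version of the data:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
    'Sales': [250, 180, 300, 120, 400],
})

# Add a Discount column, then keep only the larger discounts,
# all without touching the original df
result = (
    df.assign(Discount=lambda x: x['Sales'] * 0.10)
      .query('Discount > 20')
)
print(result)
```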
explode() #
If you have a column with multiple category entries, you can use the explode() method to place each category on a separate row. Say you have the following dataframe:
In [9]: df
Out[9]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 [Furniture, Technology] 3
1 2025-08-10 Texas 180 [Office Supplies] 5
2 2025-08-15 California 300 [Technology] 2
3 2025-08-20 New York 120 [Technology, Furniture] 4
4 2024-08-25 Texas 400 [Technology, Office Supplies] 1
Instead of the lists you see above, you can have a dataframe where each category entry gets its own row, with all the other columns duplicated appropriately.
In [12]: df.explode('Category')
Out[12]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 Furniture 3
0 2025-08-05 New York 250 Technology 3
1 2025-08-10 Texas 180 Office Supplies 5
2 2025-08-15 California 300 Technology 2
3 2025-08-20 New York 120 Technology 4
3 2025-08-20 New York 120 Furniture 4
4 2024-08-25 Texas 400 Technology 1
4 2024-08-25 Texas 400 Office Supplies 1
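In practice, a list-valued column often starts life as a delimited string (say, from a CSV export). One common pattern, assuming a comma-separated Category column, is to split first and then explode. Note above how explode() repeats the original index; a reset_index(drop=True) gives each row a fresh one:

```python
import pandas as pd

df = pd.DataFrame({
    'Sales': [250, 180],
    'Category': ['Furniture,Technology', 'Office Supplies'],
})

# Split the string into a list column, explode it, and re-number the rows
exploded = (
    df.assign(Category=df['Category'].str.split(','))
      .explode('Category')
      .reset_index(drop=True)
)
print(exploded)
```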
melt() #
Imagine that you have many entries in your dataframe, like sales over different months.
In [15]: df
Out[15]:
State Jan Sales Feb Sales Mar Sales
0 New York 250 340 140
1 Texas 180 220 200
2 California 300 410 210
3 New York 120 80 110
4 Texas 400 330 280
With the melt() method, you can transform a dataframe from the wide format to the long one.
In [18]: df.melt(id_vars='State', var_name='Month', value_name='Sales')
Out[18]:
State Month Sales
0 New York Jan Sales 250
1 Texas Jan Sales 180
2 California Jan Sales 300
3 New York Jan Sales 120
4 Texas Jan Sales 400
5 New York Feb Sales 340
6 Texas Feb Sales 220
7 California Feb Sales 410
8 New York Feb Sales 80
9 Texas Feb Sales 330
10 New York Mar Sales 140
11 Texas Mar Sales 200
12 California Mar Sales 210
13 New York Mar Sales 110
14 Texas Mar Sales 280
Once you get to this point, the nice thing is that grouping by month and state, for example, is straightforward!
In [22]: df.melt(id_vars='State', var_name='Month', value_name='Sales').groupby(['Month', 'State']).sum()
Out[22]:
Sales
Month State
Feb Sales California 410
New York 420
Texas 550
Jan Sales California 300
New York 370
Texas 580
Mar Sales California 210
New York 250
Texas 480
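If you’d rather have plain month names than the original column labels ("Jan Sales", "Feb Sales", …), you can strip the suffix as part of the same chain. A sketch on a trimmed version of the wide dataframe:

```python
import pandas as pd

wide = pd.DataFrame({
    'State': ['New York', 'Texas'],
    'Jan Sales': [250, 180],
    'Feb Sales': [340, 220],
})

# Melt to long format, then drop the ' Sales' suffix from the labels
long_df = (
    wide.melt(id_vars='State', var_name='Month', value_name='Sales')
        .assign(Month=lambda x: x['Month'].str.replace(' Sales', '', regex=False))
)
print(long_df)
```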
At this point you’ll see a pattern emerging with these methods: you can take advantage of many Pandas methods by simply chaining them together.
transform() #
What if you wanted to add each state’s average sales as a column? Most people would do something like this:
In [24]: state_avg = df['Sales'].groupby(df['State']).mean()
In [25]: df.merge(state_avg.rename('Mean'), left_on='State', right_index=True)
Out[25]:
Date State Sales Category Quantity Mean
0 2025-08-05 New York 250 Furniture 3 185.0
1 2025-08-10 Texas 180 Office Supplies 5 290.0
2 2025-08-15 California 300 Technology 2 300.0
3 2025-08-20 New York 120 Furniture 4 185.0
4 2024-08-25 Texas 400 Technology 1 290.0
However, with the transform() method, you can accomplish the same in just one line of code:
In [28]: df['Mean'] = df.groupby('State')['Sales'].transform('mean')
In [29]: df
Out[29]:
Date State Sales Category Quantity Mean
0 2025-08-05 New York 250 Furniture 3 185.0
1 2025-08-10 Texas 180 Office Supplies 5 290.0
2 2025-08-15 California 300 Technology 2 300.0
3 2025-08-20 New York 120 Furniture 4 185.0
4 2024-08-25 Texas 400 Technology 1 290.0
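Because transform() returns a result aligned with the original index, it is also handy for group-aware filtering, with no merge required. For example, keeping only the rows whose sales beat their state’s average:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
    'Sales': [250, 180, 300, 120, 400],
})

# The transformed mean lines up row-by-row with df, so it can be
# used directly in a boolean mask
above_avg = df[df['Sales'] > df.groupby('State')['Sales'].transform('mean')]
print(above_avg)
```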
nlargest() & nsmallest() #
You’ve probably sorted a dataframe many times with the following code, right?
In [28]: df.sort_values('Sales', ascending=False)
Out[28]:
Date State Sales Category Quantity
4 2024-08-25 Texas 400 Technology 1
2 2025-08-15 California 300 Technology 2
0 2025-08-05 New York 250 Furniture 3
1 2025-08-10 Texas 180 Office Supplies 5
3 2025-08-20 New York 120 Furniture 4
Instead of writing the code above, which can be computationally expensive for large dataframes, all you need to do is use nlargest() and nsmallest().
In [31]: df.nlargest(3, 'Sales')
Out[31]:
Date State Sales Category Quantity
4 2024-08-25 Texas 400 Technology 1
2 2025-08-15 California 300 Technology 2
0 2025-08-05 New York 250 Furniture 3
In [32]: df.nsmallest(3, 'Sales')
Out[32]:
Date State Sales Category Quantity
3 2025-08-20 New York 120 Furniture 4
1 2025-08-10 Texas 180 Office Supplies 5
0 2025-08-05 New York 250 Furniture 3
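One detail worth knowing: when there are ties at the cutoff, nlargest() keeps the first occurrence by default. Passing keep='all' keeps every tied row instead, even if that returns more than n rows. A small sketch with made-up tied values:

```python
import pandas as pd

df = pd.DataFrame({'Sales': [400, 300, 300, 120]})

top2 = df.nlargest(2, 'Sales')                  # default: first of the tied 300s
top2_all = df.nlargest(2, 'Sales', keep='all')  # both tied 300s kept -> 3 rows
print(top2_all)
```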
where() #
Instead of writing complex boolean logic, the where() method lets you replace values based on conditions in one line of code. So, do you need to cap sales at 250 to remove outliers? Just write:
In [34]: df['Capped_Sales'] = df['Sales'].where(df['Sales'] <= 250, 250)
In [35]: df
Out[35]:
Date State Sales Category Quantity Mean Capped_Sales
0 2025-08-05 New York 250 Furniture 3 185.0 250
1 2025-08-10 Texas 180 Office Supplies 5 290.0 180
2 2025-08-15 California 300 Technology 2 300.0 250
3 2025-08-20 New York 120 Furniture 4 185.0 120
4 2024-08-25 Texas 400 Technology 1 290.0 250
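where() keeps values where the condition is True and replaces them where it is False; its mirror image, mask(), replaces values where the condition is True. The same cap can be written either way, so you can pick whichever condition reads more naturally:

```python
import pandas as pd

sales = pd.Series([250, 180, 300, 120, 400])

capped_where = sales.where(sales <= 250, 250)  # keep if <= 250, else replace
capped_mask = sales.mask(sales > 250, 250)     # replace if > 250, else keep
print(capped_where.tolist())
```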