Methods #
Let’s start by storing our data in a dictionary and loading it into a Pandas dataframe.
In [1]: import pandas as pd
In [2]: data = {
...: 'Date': pd.to_datetime(['2025-08-05', '2025-08-10', '2025-08-15', '2025-08-20', '2024-08-25']),
...: 'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
...: 'Sales': [250, 180, 300, 120, 400],
...: 'Category': ['Furniture', 'Office Supplies', 'Technology', 'Furniture', 'Technology'],
...: 'Quantity': [3, 5, 2, 4, 1]
...: }
In [3]: df = pd.DataFrame(data)
In [4]: df
Out[4]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 Furniture 3
1 2025-08-10 Texas 180 Office Supplies 5
2 2025-08-15 California 300 Technology 2
3 2025-08-20 New York 120 Furniture 4
4 2024-08-25 Texas 400 Technology 1
query() #
With the query() method you can filter rows using SQL-like syntax.
In [5]: df.query("State == 'New York' and Sales > 150")
Out[5]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 Furniture 3
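query() can also reference local Python variables with the @ prefix, which is handy when a threshold is computed elsewhere in your code rather than hard-coded into the string. A minimal sketch, using a trimmed version of the dataframe above:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
    'Sales': [250, 180, 300, 120, 400],
})

threshold = 150  # e.g. computed earlier in your pipeline

# @threshold pulls the local variable into the query expression
result = df.query("State == 'New York' and Sales > @threshold")
print(result)
```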
assign() #
With the assign() method, you can easily append new columns to your dataframe using a simple lambda function, without modifying the original dataframe. You can also easily chain several methods together, which makes for clear and elegant code.
In [6]: df.assign(Discount=lambda x: x['Sales'] * 0.10)
Out[6]:
Date State Sales Category Quantity Discount
0 2025-08-05 New York 250 Furniture 3 25.0
1 2025-08-10 Texas 180 Office Supplies 5 18.0
2 2025-08-15 California 300 Technology 2 30.0
3 2025-08-20 New York 120 Furniture 4 12.0
4 2024-08-25 Texas 400 Technology 1 40.0
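Because assign() returns a new dataframe, it chains naturally with the other methods in this post. A sketch combining it with query() on a trimmed version of the data:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
    'Sales': [250, 180, 300, 120, 400],
})

# Add a Discount column, then keep only the larger discounts,
# all without touching the original df
result = (
    df.assign(Discount=lambda x: x['Sales'] * 0.10)
      .query('Discount > 20')
)
print(result)
```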
explode() #
If you have a column with multiple category entries, you can use the explode() method to place each category on a separate row. Say you have the following dataframe:
In [9]: df
Out[9]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 [Furniture, Technology] 3
1 2025-08-10 Texas 180 [Office Supplies] 5
2 2025-08-15 California 300 [Technology] 2
3 2025-08-20 New York 120 [Technology, Furniture] 4
4 2024-08-25 Texas 400 [Technology, Office Supplies] 1
Instead of the lists you see above, you can have a dataframe where each category entry gets its own row, with all the other columns duplicated appropriately.
In [12]: df.explode('Category')
Out[12]:
Date State Sales Category Quantity
0 2025-08-05 New York 250 Furniture 3
0 2025-08-05 New York 250 Technology 3
1 2025-08-10 Texas 180 Office Supplies 5
2 2025-08-15 California 300 Technology 2
3 2025-08-20 New York 120 Technology 4
3 2025-08-20 New York 120 Furniture 4
4 2024-08-25 Texas 400 Technology 1
4 2024-08-25 Texas 400 Office Supplies 1
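In practice, a list-valued column often starts life as a delimited string (say, from a CSV export). One common pattern, assuming a comma-separated Category column, is to split first and then explode. Note above how explode() repeats the original index; a reset_index(drop=True) gives each row a fresh one:

```python
import pandas as pd

df = pd.DataFrame({
    'Sales': [250, 180],
    'Category': ['Furniture,Technology', 'Office Supplies'],
})

# Split the string into a list column, explode it, and re-number the rows
exploded = (
    df.assign(Category=df['Category'].str.split(','))
      .explode('Category')
      .reset_index(drop=True)
)
print(exploded)
```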
melt() #
Imagine that you have many entries in your dataframe, like sales over different months.
In [15]: df
Out[15]:
State Jan Sales Feb Sales Mar Sales
0 New York 250 340 140
1 Texas 180 220 200
2 California 300 410 210
3 New York 120 80 110
4 Texas 400 330 280
With the melt() method, you can transform a dataframe from the wide format to the long one.
In [18]: df.melt(id_vars='State', var_name='Month', value_name='Sales')
Out[18]:
State Month Sales
0 New York Jan Sales 250
1 Texas Jan Sales 180
2 California Jan Sales 300
3 New York Jan Sales 120
4 Texas Jan Sales 400
5 New York Feb Sales 340
6 Texas Feb Sales 220
7 California Feb Sales 410
8 New York Feb Sales 80
9 Texas Feb Sales 330
10 New York Mar Sales 140
11 Texas Mar Sales 200
12 California Mar Sales 210
13 New York Mar Sales 110
14 Texas Mar Sales 280
Once you get to this point, the nice thing is that grouping by month and state, for example, is straightforward!
In [22]: df.melt(id_vars='State', var_name='Month', value_name='Sales').groupby(['Month', 'State']).sum()
Out[22]:
Sales
Month State
Feb Sales California 410
New York 420
Texas 550
Jan Sales California 300
New York 370
Texas 580
Mar Sales California 210
New York 250
Texas 480
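If you’d rather have plain month names than the original column labels ("Jan Sales", "Feb Sales", …), you can strip the suffix as part of the same chain. A sketch on a trimmed version of the wide dataframe:

```python
import pandas as pd

wide = pd.DataFrame({
    'State': ['New York', 'Texas'],
    'Jan Sales': [250, 180],
    'Feb Sales': [340, 220],
})

# Melt to long format, then drop the ' Sales' suffix from the labels
long_df = (
    wide.melt(id_vars='State', var_name='Month', value_name='Sales')
        .assign(Month=lambda x: x['Month'].str.replace(' Sales', '', regex=False))
)
print(long_df)
```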
At this point you’ll see a pattern emerging with these methods: you can take advantage of many Pandas methods by simply chaining them together.
transform() #
What if you wanted to add each state’s average sales as a column? Most people would do something like this:
In [24]: state_avg = df['Sales'].groupby(df['State']).mean()
In [25]: df.merge(state_avg.rename('Mean'), left_on='State', right_index=True)
Out[25]:
Date State Sales Category Quantity Mean
0 2025-08-05 New York 250 Furniture 3 185.0
1 2025-08-10 Texas 180 Office Supplies 5 290.0
2 2025-08-15 California 300 Technology 2 300.0
3 2025-08-20 New York 120 Furniture 4 185.0
4 2024-08-25 Texas 400 Technology 1 290.0
However, with the transform() method, you can accomplish the same in just one line of code:
In [28]: df['Mean'] = df.groupby('State')['Sales'].transform('mean')
In [29]: df
Out[29]:
Date State Sales Category Quantity Mean
0 2025-08-05 New York 250 Furniture 3 185.0
1 2025-08-10 Texas 180 Office Supplies 5 290.0
2 2025-08-15 California 300 Technology 2 300.0
3 2025-08-20 New York 120 Furniture 4 185.0
4 2024-08-25 Texas 400 Technology 1 290.0
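Because transform() returns a result aligned with the original index, it is also handy for group-aware filtering, with no merge required. For example, keeping only the rows whose sales beat their state’s average:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['New York', 'Texas', 'California', 'New York', 'Texas'],
    'Sales': [250, 180, 300, 120, 400],
})

# The transformed mean lines up row-by-row with df, so it can be
# used directly in a boolean mask
above_avg = df[df['Sales'] > df.groupby('State')['Sales'].transform('mean')]
print(above_avg)
```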
nlargest() & nsmallest() #
You’ve probably sorted a dataframe many times with the following code, right?
In [28]: df.sort_values('Sales', ascending=False)
Out[28]:
Date State Sales Category Quantity
4 2024-08-25 Texas 400 Technology 1
2 2025-08-15 California 300 Technology 2
0 2025-08-05 New York 250 Furniture 3
1 2025-08-10 Texas 180 Office Supplies 5
3 2025-08-20 New York 120 Furniture 4
Instead of writing the code above, which can be computationally expensive for large dataframes, all you need to do is use nlargest() and nsmallest().
In [31]: df.nlargest(3, 'Sales')
Out[31]:
Date State Sales Category Quantity
4 2024-08-25 Texas 400 Technology 1
2 2025-08-15 California 300 Technology 2
0 2025-08-05 New York 250 Furniture 3
In [32]: df.nsmallest(3, 'Sales')
Out[32]:
Date State Sales Category Quantity
3 2025-08-20 New York 120 Furniture 4
1 2025-08-10 Texas 180 Office Supplies 5
0 2025-08-05 New York 250 Furniture 3
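One detail worth knowing: when there are ties at the cutoff, nlargest() keeps the first occurrence by default. Passing keep='all' keeps every tied row instead, even if that returns more than n rows. A small sketch with made-up tied values:

```python
import pandas as pd

df = pd.DataFrame({'Sales': [400, 300, 300, 120]})

top2 = df.nlargest(2, 'Sales')                  # default: first of the tied 300s
top2_all = df.nlargest(2, 'Sales', keep='all')  # both tied 300s kept -> 3 rows
print(top2_all)
```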
where() #
Instead of writing complex boolean logic, the where() method lets you replace values based on conditions in one line of code. So, do you need to cap sales at 250 to remove outliers? Just write:
In [34]: df['Capped_Sales'] = df['Sales'].where(df['Sales'] <= 250, 250)
In [35]: df
Out[35]:
Date State Sales Category Quantity Mean Capped_Sales
0 2025-08-05 New York 250 Furniture 3 185.0 250
1 2025-08-10 Texas 180 Office Supplies 5 290.0 180
2 2025-08-15 California 300 Technology 2 300.0 250
3 2025-08-20 New York 120 Furniture 4 185.0 120
4 2024-08-25 Texas 400 Technology 1 290.0 250
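where() keeps values where the condition is True and replaces them where it is False; its mirror image, mask(), replaces values where the condition is True. The same cap can be written either way, so you can pick whichever condition reads more naturally:

```python
import pandas as pd

sales = pd.Series([250, 180, 300, 120, 400])

capped_where = sales.where(sales <= 250, 250)  # keep if <= 250, else replace
capped_mask = sales.mask(sales > 250, 250)     # replace if > 250, else keep
print(capped_where.tolist())
```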