Pandas

Data manipulation and analysis

Details

Language / Topic: Python

Rules

Pandas · balanced

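- Prefer vectorized operations (`df['col'] * 2`) over `iterrows()` or row-wise `apply()`. Vectorized operations run in compiled code and are often orders of magnitude faster on large frames, though the exact speedup depends on the operation and the data size.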

- Use `df.loc[]` for label-based indexing and `df.iloc[]` for integer-position indexing. Avoid chained indexing (`df[col][row]`), especially in assignments: it can raise `SettingWithCopyWarning` and may leave the original DataFrame unchanged.

- Pass `dtype={...}` to `pd.read_csv()` to declare column types up front. This avoids surprises from type inference and can reduce memory usage, for example by using `category` for low-cardinality strings or a narrower integer type.

- Pass `parse_dates=[...]` to `pd.read_csv()` so date columns load as datetimes rather than strings.

- Use `.groupby()` with `.agg()` for aggregation, passing a dict that maps column names to aggregation functions.

- Use `.merge()` for SQL-like joins and `pd.concat()` for stacking DataFrames.

- Use `.pipe()` for composable data transformation pipelines.
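
The rules above can be sketched end to end on a small in-memory dataset. This is only an illustration: the column names, sample values, the `managers` table, and the helper functions `add_tax` and `keep_large` are all made up, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """order_id,region,amount,ordered_at
1,north,10.0,2024-01-05
2,south,20.5,2024-01-06
3,north,5.0,2024-02-01
4,south,4.5,2024-02-03
"""

# Declare column types and date columns up front instead of relying on inference.
orders = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"order_id": "int32", "region": "category", "amount": "float64"},
    parse_dates=["ordered_at"],
)

# Label-based assignment in a single .loc step
# (instead of chained indexing such as orders["amount"][2] = 6.0).
orders.loc[orders["order_id"] == 3, "amount"] = 6.0

# Integer-position lookup: first row, third column ("amount").
first_amount = orders.iloc[0, 2]

# Vectorized arithmetic: one operation over the whole column, no Python loop.
doubled = orders["amount"] * 2

# Aggregation with a dict of column -> function mappings.
# observed=True limits the result to categories that actually occur.
summary = orders.groupby("region", observed=True).agg(
    {"amount": "sum", "order_id": "count"}
)

# SQL-like left join against a hypothetical lookup table.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ana", "Ben"]})
enriched = orders.merge(managers, on="region", how="left")

# Stack DataFrames vertically with the top-level pd.concat function.
stacked = pd.concat([orders, orders.head(2)], ignore_index=True)


def add_tax(df: pd.DataFrame, rate: float) -> pd.DataFrame:
    """Return a copy with a tax-inclusive amount column."""
    return df.assign(amount_with_tax=df["amount"] * (1 + rate))


def keep_large(df: pd.DataFrame, minimum: float) -> pd.DataFrame:
    """Keep only rows whose amount is at least `minimum`."""
    return df.loc[df["amount"] >= minimum]


# Compose the steps into a readable pipeline with .pipe().
result = orders.pipe(add_tax, rate=0.2).pipe(keep_large, minimum=6.0)
```

Each helper passed to `.pipe()` takes a DataFrame and returns a new one, so steps can be reordered or tested in isolation without mutating `orders`.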