Details

Language / Topic
Python
Category
Libraries

Rules

balanced
- Use vectorized operations (`df['col'] * 2`) instead of `iterrows()` or row-wise `apply()`; vectorized ops run in compiled code and are typically orders of magnitude faster on large DataFrames.
- Use `df.loc[]` for label-based indexing and `df.iloc[]` for integer-position indexing; avoid chained indexing such as `df[col][row] = value`, which may assign to a temporary copy and can trigger `SettingWithCopyWarning`.
- Use `pd.read_csv()` with `dtype={...}` and `parse_dates=[...]` to declare column types upfront; this avoids type-inference surprises (e.g. zero-padded codes like `"007"` losing their leading zeros) and can reduce memory use, for instance by loading low-cardinality strings as `category`.
- Use `.groupby()` with `.agg()` for aggregation; pass a dict mapping each column to a function (e.g. `{'amount': 'sum'}`).
- Use `df.merge()` for SQL-like joins and `pd.concat()` (a module-level function, not a DataFrame method) for stacking DataFrames.
- Use `.pipe()` for composable data transformation pipelines.
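A minimal sketch of the vectorization and indexing rules, using a small made-up frame (the column names `price`, `qty`, `total`, and `size` are illustrative only). One column-wise expression replaces an `iterrows()` loop, `np.where` stands in for a row-wise `apply()` on conditional logic, and `.loc`/`.iloc` do selection and assignment in a single call.

```python
import numpy as np
import pandas as pd

# Illustrative data; column names are hypothetical.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Vectorized: one column-wise multiplication, no Python-level loop.
df["total"] = df["price"] * df["qty"]

# Row-by-row equivalent with iterrows(), shown only for contrast.
totals = [row["price"] * row["qty"] for _, row in df.iterrows()]
assert df["total"].tolist() == totals

# Conditional logic stays vectorized with np.where instead of apply().
df["size"] = np.where(df["qty"] >= 2, "bulk", "single")

# Label-based selection and assignment in one .loc call (no chaining).
# Avoid: df["price"][1] = 18.0  -> chained assignment may hit a copy.
df.loc[df["size"] == "bulk", "price"] *= 0.9

# Integer-position access with .iloc: row 0, column 0 ("price").
first_price = df.iloc[0, 0]
print(df)
```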
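The loading rule can be sketched by reading an inline CSV (its contents and column names are hypothetical) with explicit `dtype` and `parse_dates`. Declaring `sku` as `str` keeps its leading zeros, `category` stores the repeated `region` labels once plus integer codes, and `order_date` is parsed into a datetime column instead of plain strings.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the example is self-contained.
csv_text = """order_id,sku,region,amount,order_date
1,007,north,19.99,2024-01-05
2,012,south,5.50,2024-01-06
3,007,north,12.00,2024-02-01
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    # Declare types upfront instead of relying on inference.
    dtype={
        "order_id": "int32",
        "sku": str,            # keeps "007" as text rather than the int 7
        "region": "category",  # repeated labels stored once, plus codes
        "amount": "float64",
    },
    parse_dates=["order_date"],  # parsed to datetime, not left as strings
)
print(df.dtypes)
```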
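The aggregation, join, and stacking rules combine naturally in one short sketch; the `orders` and `customers` tables are invented for illustration. Note that stacking is done with the module-level `pd.concat`, while `merge` is called here as a DataFrame method, and `.agg()` receives a dict mapping each column to one function.

```python
import pandas as pd

# Invented example tables.
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 1, 2, 3],
    "amount": [10.0, 15.0, 7.5, 20.0],
})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})

# Stack a second batch of rows; ignore_index renumbers the result 0..n-1.
late_orders = pd.DataFrame({"order_id": [105], "customer_id": [2], "amount": [5.0]})
all_orders = pd.concat([orders, late_orders], ignore_index=True)

# Aggregate with a dict of column -> function.
summary = (
    all_orders.groupby("customer_id")
    .agg({"amount": "sum", "order_id": "count"})
    .rename(columns={"order_id": "n_orders"})
    .reset_index()
)

# SQL-style left join on the shared key column.
report = summary.merge(customers, on="customer_id", how="left")
print(report)
```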
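For `.pipe()`, a minimal sketch assuming two hypothetical helper functions, `add_tax` and `keep_large`, each of which takes a DataFrame and returns a new one so the steps chain left to right. Keyword arguments given to `.pipe()` are forwarded to the function being piped.

```python
import pandas as pd

def add_tax(df: pd.DataFrame, rate: float) -> pd.DataFrame:
    # Hypothetical step: add a tax-inclusive column without mutating the input.
    return df.assign(gross=df["amount"] * (1 + rate))

def keep_large(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    # Hypothetical step: keep rows whose gross value meets the threshold.
    return df.loc[df["gross"] >= threshold]

orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 50.0, 100.0]})

# Each .pipe() call passes the current frame as the first argument.
result = (
    orders
    .pipe(add_tax, rate=0.2)
    .pipe(keep_large, threshold=50.0)
)
print(result)
```

Because each helper returns a new frame (via `assign` and boolean `.loc` selection), the original `orders` is left unchanged, which keeps individual pipeline steps easy to test in isolation.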