DataFrame

The DataFrame class is the primary data structure in Polars-PHP. It represents a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled columns.

Constructor

Create a new DataFrame from a PHP array.

param array $data:

Associative array where keys are column names and values are arrays of column data

param bool $byKeys:

Whether to parse data by keys (default: true)

raises Polars\Exception:

If the data cannot be converted to a DataFrame

Example:

use Polars\DataFrame;

$df = new DataFrame([
    'name' => ['Alice', 'Bob', 'Charlie'],
    'age' => [25, 30, 35],
    'city' => ['NYC', 'LA', 'Chicago']
]);

Static Methods

readCsv

Read a DataFrame from a CSV file.

param string $path:

Path to the CSV file

param bool $hasHeader:

Whether the first row contains column headers (default: true)

param string $separator:

Column separator character (default: ',')

returns:

DataFrame

raises Polars\Exception:

If file cannot be read or parsed

Example:

$df = DataFrame::readCsv('data.csv');
$df = DataFrame::readCsv('data.tsv', hasHeader: true, separator: "\t");

readJson

Read a DataFrame from a JSON file.

param string $path:

Path to the JSON file

returns:

DataFrame

raises Polars\Exception:

If file cannot be read or parsed

Example:

$df = DataFrame::readJson('data.json');

readNdjson

Read a DataFrame from an NDJSON (newline-delimited JSON) file.

param string $path:

Path to the NDJSON file

returns:

DataFrame

raises Polars\Exception:

If file cannot be read or parsed

Example:

$df = DataFrame::readNdjson('data.ndjson');

readParquet

Read a DataFrame from a Parquet file.

param string $path:

Path to the Parquet file

returns:

DataFrame

raises Polars\Exception:

If file cannot be read or parsed

Example:

$df = DataFrame::readParquet('data.parquet');

Properties

getColumns / setColumns

Get column names as an array of strings.

returns:

string[] - Array of column names

Set column names.

param array $columns:

Array of new column names (must match current column count)

raises Polars\Exception:

If column count doesn’t match

Example:

$columns = $df->getColumns(); // ['name', 'age', 'city']
$df->setColumns(['first_name', 'years', 'location']);

dtypes

Get data types of all columns.

returns:

DataType[] - Array of DataType objects

Series Access

column

Get a single column as a Series.

param string $name:

Column name

returns:

Series

raises Polars\Exception:

If column doesn’t exist

Example:

$df = new DataFrame([
    'a' => [1, 2, 3],
    'b' => [4, 5, 6],
]);

$series = $df->column('a');
$series->getName(); // 'a'
$series->sum();     // 6

getSeries

Get all columns as an array of Series.

returns:

Series[] - Array of Series objects

Example:

$df = new DataFrame([
    'x' => [1, 2],
    'y' => [3, 4],
]);

$seriesArr = $df->getSeries();
// $seriesArr[0] is Series 'x'
// $seriesArr[1] is Series 'y'

Dimensions

height

Get the number of rows in the DataFrame.

returns:

int - Number of rows

width

Get the number of columns in the DataFrame.

returns:

int - Number of columns

shape

Get the shape of the DataFrame as [rows, columns].

returns:

int[] - Array with [height, width]

Example:

$df->height(); // 3
$df->width();  // 3
$df->shape();  // [3, 3]

Array Access

DataFrame implements PHP's ArrayAccess interface, so columns and rows can be read with bracket notation.

offsetExists

Check if an offset (column name or row index) exists.

offsetGet

Get value at offset. Supports multiple access patterns:

param mixed $offset:

Can be string, int, or array

returns:

DataFrame

Access patterns:

Pattern            Description                                Example
string             Single column                              $df['name']
int                Single row (supports negative indexing)    $df[0], $df[-1]
array of strings   Multiple columns                           $df[['name', 'age']]
array with int     Listed columns, restricted to one row      $df[['name', 'age', 0]]

Example:

$df['name'];           // Single column as DataFrame
$df[0];                // First row as DataFrame
$df[-1];               // Last row as DataFrame
$df[['name', 'age']];  // Multiple columns
$df[['name', 1]];      // 'name' column, row index 1

offsetSet / offsetUnset

Not supported: bracket notation is read-only. Use withColumns() or drop() instead.

Row Selection

head

Get the first n rows.

param int $n:

Number of rows to return (default: 10)

returns:

DataFrame

tail

Get the last n rows.

param int $n:

Number of rows to return (default: 10)

returns:

DataFrame

Example:

$df->head(5);  // First 5 rows
$df->tail(3);  // Last 3 rows

Aggregations

count

Return the number of non-null elements for each column.

returns:

DataFrame - Single row with counts per column

max

Aggregate columns to their maximum value.

returns:

DataFrame - Single row with max values

min

Aggregate columns to their minimum value.

returns:

DataFrame - Single row with min values

mean

Aggregate columns to their mean value.

returns:

DataFrame - Single row with mean values

std

Aggregate columns to their standard deviation.

param int $ddof:

Delta degrees of freedom (default: 0)

returns:

DataFrame - Single row with std values

Example:

$df = new DataFrame(['values' => [1, 2, 3, 4, 5]]);
$df->min()->item();   // 1
$df->max()->item();   // 5
$df->mean()->item();  // 3.0

sum

Aggregate columns to their sum.

returns:

DataFrame - Single row with sum values

median

Aggregate columns to their median value.

returns:

DataFrame - Single row with median values

variance

Aggregate columns to their variance.

param int $ddof:

Delta degrees of freedom (default: 0)

returns:

DataFrame - Single row with variance values
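The ddof parameter of std and variance changes the divisor from n to n - ddof. The plain-PHP sketch below uses hypothetical helpers (varianceDdof and stdDdof, not part of Polars-PHP) to show that arithmetic for a single column of numbers. It illustrates the formula only, not how the library computes it.

```php
<?php
// Hypothetical helpers for illustration only; not part of Polars-PHP.

// Variance with "delta degrees of freedom": the sum of squared deviations
// from the mean, divided by (n - ddof).
function varianceDdof(array $values, int $ddof = 0): float
{
    $n = count($values);
    if ($n - $ddof <= 0) {
        throw new InvalidArgumentException('n - ddof must be positive');
    }
    $mean = array_sum($values) / $n;
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2;
    }
    return $sumSq / ($n - $ddof);
}

// The standard deviation is the square root of the variance with the same ddof.
function stdDdof(array $values, int $ddof = 0): float
{
    return sqrt(varianceDdof($values, $ddof));
}

$values = [1, 2, 3, 4, 5];
$population = varianceDdof($values, 0); // 10 / 5 = 2.0
$sample = varianceDdof($values, 1);     // 10 / 4 = 2.5
```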

quantile

Aggregate columns to their quantile value.

param float $quantile:

Quantile value between 0 and 1

returns:

DataFrame - Single row with quantile values

nullCount

Get the number of null values per column.

returns:

DataFrame - Single row with null counts

product

Aggregate columns to their product.

returns:

DataFrame - Single row with product values

Selection

select

Select columns based on expressions.

param array $expressions:

Array of Expr objects

returns:

DataFrame

raises Polars\Exception:

If expressions are invalid

Example:

use Polars\Expr;

$df = new DataFrame([
    'a' => [1, 2, 3],
    'b' => [4, 5, 6]
]);

// Select with comparison
$result = $df->select([Expr::col('a')->gt(1)]);

// Select with arithmetic
$result = $df->select([Expr::col('a')->add(Expr::col('b'))]);

Core Manipulation

sort

Sort DataFrame by one or more columns.

param string|array $by:

Column name or array of column names to sort by

param bool $descending:

Sort in descending order (default: false)

param bool $nullsLast:

Place nulls last (default: true)

param bool $maintainOrder:

Maintain order of equal elements - stable sort (default: false)

param bool $multithreaded:

Use multithreaded sorting (default: true)

returns:

DataFrame

Example:

$df->sort('age');
$df->sort('age', descending: true);
$df->sort(['city', 'age'], maintainOrder: true);

drop

Drop specified columns.

param array $columns:

Column names to drop

returns:

DataFrame

Example:

$df->drop(['age', 'city']);

rename

Rename columns.

param array $existing:

Old column names

param array $newNames:

New column names

returns:

DataFrame

Example:

$df->rename(['name', 'age'], ['fullName', 'years']);

filter

Filter rows by expression.

param Expr $expression:

Filter expression

returns:

DataFrame

Example:

$df->filter(Expr::col('age')->gt(30));

withColumns

Add or modify columns using expressions.

param array $expressions:

Array of Expr objects

returns:

DataFrame

Example:

$df->withColumns([
    Expr::col('age')->mul(2)->alias('double_age'),
]);

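groupBy

Group rows by one or more expressions. The result is a LazyGroupBy: call an aggregation on it, then collect() to materialize a DataFrame.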

param array $expressions:

Array of Expr objects for grouping

returns:

LazyGroupBy

Example:

$result = $df->groupBy([Expr::col('city')])->sum()->collect();

Row/Column Manipulation

unique

Get unique rows.

param array|null $subset:

Column names to consider for uniqueness (default: all columns)

param string $keep:

Which row to keep from each group of duplicates (default: 'first'): 'first', 'last', 'any' (no guarantee which one), or 'none' (drop every row that has a duplicate)

returns:

DataFrame
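The keep strategies are easiest to compare on a tiny input. The plain-PHP sketch below works on rows shaped like the output of toArray(). uniqueRows is a hypothetical helper, not a Polars-PHP method. It returns rows in their original order, which is an assumption of the sketch rather than a documented guarantee of unique().

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// Rows are associative arrays (column name => value), as returned by toArray().
function uniqueRows(array $rows, ?array $subset = null, string $keep = 'first'): array
{
    // Group row positions by the values of the columns that define uniqueness.
    $groups = [];
    foreach ($rows as $i => $row) {
        $columns = $subset ?? array_keys($row);
        $key = serialize(array_map(fn ($c) => $row[$c], $columns));
        $groups[$key][] = $i;
    }

    $kept = [];
    foreach ($groups as $positions) {
        switch ($keep) {
            case 'first':
            case 'any': // 'any' guarantees no particular row; this sketch takes the first
                $kept[] = $positions[0];
                break;
            case 'last':
                $kept[] = $positions[count($positions) - 1];
                break;
            case 'none': // drop every row that has a duplicate
                if (count($positions) === 1) {
                    $kept[] = $positions[0];
                }
                break;
            default:
                throw new InvalidArgumentException("Unknown keep strategy: $keep");
        }
    }

    sort($kept); // restore the original row order
    return array_map(fn ($i) => $rows[$i], $kept);
}

$rows = [
    ['a' => 1, 'b' => 'x'],
    ['a' => 1, 'b' => 'y'],
    ['a' => 2, 'b' => 'z'],
];
uniqueRows($rows, ['a'], 'first'); // rows 'x' and 'z'
uniqueRows($rows, ['a'], 'last');  // rows 'y' and 'z'
uniqueRows($rows, ['a'], 'none');  // row 'z' only
```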

dropNulls

Drop rows with null values.

param array|null $subset:

Column names to check for nulls (default: all columns)

returns:

DataFrame

fillNull

Fill null values with a value or expression.

param mixed $value:

Value to fill nulls with (int, float, string, bool, null, or Expr)

returns:

DataFrame

fillNan

Fill NaN values with a value or expression.

param mixed $value:

Value to fill NaN with

returns:

DataFrame

reverse

Reverse row order.

returns:

DataFrame

slice

Get a slice of rows.

param int $offset:

Start offset

param int $length:

Number of rows

returns:

DataFrame

limit

Return the first n rows (alias for head).

param int $n:

Number of rows (default: 10)

returns:

DataFrame

join

Join with another DataFrame.

param DataFrame $other:

The right DataFrame

param array $on:

Array of Expr objects for join columns (used for both sides when leftOn/rightOn not given)

param string $how:

Join type: 'inner', 'left', 'right', 'full', or 'cross' (default: 'inner')

param array|null $leftOn:

Left join columns (overrides $on for left side)

param array|null $rightOn:

Right join columns (overrides $on for right side)

param string|null $suffix:

Suffix for duplicate column names (default: '_right')

param string|null $validate:

Join validation: 'm:m', 'm:1', '1:m', or '1:1'. A '1' side must have unique join keys; 'm:m' performs no check.

param bool|null $coalesce:

Whether to coalesce join columns

returns:

DataFrame

Example:

$result = $df1->join($df2, [Expr::col('id')], how: 'left');
$result = $df1->join($df2, [], 'inner',
    leftOn: [Expr::col('id')],
    rightOn: [Expr::col('key')]
);

withRowIndex

Add a row index column.

param string $name:

Name of the index column (default: 'index')

param int $offset:

Starting offset (default: 0)

returns:

DataFrame

Export/Row Access

toArray

Convert DataFrame to a PHP array of associative arrays (rows).

returns:

array - Array of associative arrays

Example:

$arr = $df->toArray();
// [['name' => 'Alice', 'age' => 25], ['name' => 'Bob', 'age' => 30], ...]

row

Get a single row as an associative array. Supports negative indexing.

param int $index:

Row index (negative for counting from end)

returns:

array - Associative array

Example:

$row = $df->row(0);   // First row
$row = $df->row(-1);  // Last row

rows

Get all rows as array of associative arrays (alias for toArray).

returns:

array

DataFrame Operations

vstack

Grow this DataFrame vertically by stacking another DataFrame.

param DataFrame $other:

DataFrame to stack

returns:

DataFrame

hstack

Grow this DataFrame horizontally by adding Series columns.

param array $columns:

Array of Series objects

returns:

DataFrame

equals

Check if two DataFrames are equal.

param DataFrame $other:

DataFrame to compare with

returns:

bool

estimatedSize

Get the estimated size in bytes.

returns:

int

getColumnIndex

Get the column index by name. Returns -1 if not found.

param string $name:

Column name

returns:

int

clear

Create an empty copy of the DataFrame (same schema, no rows).

returns:

DataFrame

rechunk

Rechunk the DataFrame into contiguous memory.

returns:

DataFrame

shrinkToFit

Shrink memory usage of the DataFrame.

isDuplicated

Get a boolean mask that is true for every row occurring more than once (all copies are flagged).

returns:

Series - Boolean Series

isUnique

Get a boolean mask that is true for rows occurring exactly once.

returns:

Series - Boolean Series
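The two masks are complements of each other, and isDuplicated flags every copy of a repeated row, not only the later ones. Here is a plain-PHP sketch over rows shaped like the output of toArray(), using a hypothetical duplicateMasks helper that is not part of Polars-PHP:

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// Returns [isDuplicated, isUnique] as two lists of booleans, one per row.
function duplicateMasks(array $rows): array
{
    // Count how many times each distinct row occurs.
    $keys = array_map('serialize', $rows);
    $counts = array_count_values($keys);

    // Every copy of a repeated row is flagged, not just the later ones.
    $isDuplicated = array_map(fn (string $k) => $counts[$k] > 1, $keys);
    // isUnique is the element-wise negation of isDuplicated.
    $isUnique = array_map(fn (bool $d) => !$d, $isDuplicated);
    return [$isDuplicated, $isUnique];
}

[$dup, $uniq] = duplicateMasks([['a' => 1], ['a' => 2], ['a' => 3], ['a' => 1]]);
// $dup  is [true, false, false, true]
// $uniq is [false, true, true, false]
```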

Advanced Operations

shift

Shift column values by n positions. Positions left empty by the shift are filled with null.

param int $n:

Number of positions to shift (positive = down, negative = up)

returns:

DataFrame
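Here is a plain-PHP sketch of what shift does to a single column. It uses a hypothetical shiftColumn helper (not part of Polars-PHP) and assumes the column is a plain list. The length stays the same and vacated positions become null:

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// Positive $n moves values down, negative $n moves them up. Values pushed
// past either end are dropped; vacated positions become null.
function shiftColumn(array $values, int $n): array
{
    $values = array_values($values);
    $len = count($values);
    $result = array_fill(0, $len, null);
    foreach ($values as $i => $v) {
        $target = $i + $n;
        if ($target >= 0 && $target < $len) {
            $result[$target] = $v;
        }
    }
    return $result;
}

shiftColumn([1, 2, 3], 1);  // [null, 1, 2]
shiftColumn([1, 2, 3], -1); // [2, 3, null]
```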

gatherEvery

Take every nth row.

param int $n:

Step between selected rows

param int $offset:

Starting offset (default: 0)

returns:

DataFrame
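The selection rule amounts to the positions $offset, $offset + $n, $offset + 2 * $n, and so on. Here is a plain-PHP sketch with a hypothetical gatherEveryRows helper (not part of Polars-PHP):

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// Keeps the rows at positions $offset, $offset + $n, $offset + 2 * $n, ...
function gatherEveryRows(array $rows, int $n, int $offset = 0): array
{
    if ($n < 1 || $offset < 0) {
        throw new InvalidArgumentException('$n must be >= 1 and $offset >= 0');
    }
    $rows = array_values($rows);
    $out = [];
    for ($i = $offset; $i < count($rows); $i += $n) {
        $out[] = $rows[$i];
    }
    return $out;
}

gatherEveryRows([0, 1, 2, 3, 4, 5, 6], 3);    // [0, 3, 6]
gatherEveryRows([0, 1, 2, 3, 4, 5, 6], 3, 1); // [1, 4]
```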

cast

Cast columns to different data types.

param array $dtypes:

Associative array of column name => data type string

param bool $strict:

Use strict casting (default: false)

returns:

DataFrame

Example:

$df->cast(['age' => 'float64', 'score' => 'int32']);

unpivot

Unpivot a DataFrame from wide to long format.

param array $on:

Column names to use as values

param array $index:

Column names to use as identifiers

param string|null $variableName:

Custom name for the variable column (default: 'variable')

param string|null $valueName:

Custom name for the value column (default: 'value')

returns:

DataFrame
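The plain-PHP sketch below illustrates the wide-to-long reshape on column-oriented data shaped like the constructor input. unpivotColumns is a hypothetical helper, not the library's implementation. It stacks the $on columns one after another and repeats the $index values for each of them.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// $data is column-oriented: column name => list of values.
function unpivotColumns(
    array $data,
    array $on,
    array $index,
    string $variableName = 'variable',
    string $valueName = 'value'
): array {
    $out = [];
    foreach ($index as $col) {
        $out[$col] = [];
    }
    $out[$variableName] = [];
    $out[$valueName] = [];

    // Each $on column is stacked below the previous one.
    foreach ($on as $col) {
        foreach ($data[$col] as $row => $value) {
            // Identifier columns are repeated for every stacked value.
            foreach ($index as $idx) {
                $out[$idx][] = $data[$idx][$row];
            }
            $out[$variableName][] = $col; // which original column the value came from
            $out[$valueName][] = $value;
        }
    }
    return $out;
}

$wide = ['name' => ['Alice', 'Bob'], 'math' => [90, 80], 'art' => [70, 60]];
$long = unpivotColumns($wide, ['math', 'art'], ['name']);
// name:     ['Alice', 'Bob', 'Alice', 'Bob']
// variable: ['math', 'math', 'art', 'art']
// value:    [90, 80, 70, 60]
```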

explode

Explode list columns into rows.

param array $columns:

Column names to explode

returns:

DataFrame
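Here is a plain-PHP sketch of exploding one list column on column-oriented data. It uses a hypothetical explodeColumn helper (not part of Polars-PHP). Each list element gets its own row and the other columns are repeated. The sketch assumes, as Polars does, that an empty or null list produces a single null row.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// $data is column-oriented; $data[$column] holds one list per row.
function explodeColumn(array $data, string $column): array
{
    $out = array_fill_keys(array_keys($data), []);
    foreach ($data[$column] as $row => $list) {
        // An empty (or null) list still yields one row, holding null.
        $items = ($list === null || $list === []) ? [null] : $list;
        foreach ($items as $item) {
            foreach (array_keys($data) as $col) {
                // The exploded column takes the element; others repeat the row value.
                $out[$col][] = $col === $column ? $item : $data[$col][$row];
            }
        }
    }
    return $out;
}

$out = explodeColumn(['id' => [1, 2, 3], 'tags' => [['a', 'b'], ['c'], []]], 'tags');
// id:   [1, 1, 2, 3]
// tags: ['a', 'b', 'c', null]
```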

melt

Deprecated alias for unpivot; takes the same parameters.

param array $on:

Column names to use as values

param array $index:

Column names to use as identifiers

param string|null $variableName:

Custom name for the variable column

param string|null $valueName:

Custom name for the value column

returns:

DataFrame

interpolate

Interpolate null values using linear interpolation.

returns:

DataFrame
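Here is a plain-PHP sketch of linear interpolation on a single column, using a hypothetical interpolateLinear helper (not part of Polars-PHP). Nulls between two known values are filled in proportion to their position. Leading and trailing nulls have no known value on one side, so this sketch leaves them null.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// Fills each run of nulls that lies between two known values by drawing a
// straight line between them; leading and trailing nulls stay null.
function interpolateLinear(array $values): array
{
    $values = array_values($values);
    $out = $values;
    $prev = null; // position of the last non-null value seen so far
    foreach ($values as $i => $v) {
        if ($v === null) {
            continue;
        }
        if ($prev !== null && $i - $prev > 1) {
            // Constant increment per position between the two known values.
            $step = ($v - $values[$prev]) / ($i - $prev);
            for ($j = $prev + 1; $j < $i; $j++) {
                $out[$j] = $values[$prev] + $step * ($j - $prev);
            }
        }
        $prev = $i;
    }
    return $out;
}

interpolateLinear([null, 1, null, 3, null, null, 9, null]);
// [null, 1, 2, 3, 5, 7, 9, null]
```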

Column Mutation

dropInPlace

Remove a column and return it as a Series. Modifies the DataFrame in place.

param string $name:

Column name to remove

returns:

Series - The removed column

raises Polars\Exception:

If column not found

replaceColumn

Replace a column at a given index. Modifies the DataFrame in place.

param int $index:

Column index to replace

param Series $series:

New column data

raises Polars\Exception:

If index out of bounds or shape mismatch

insertColumn

Insert a column at a given index. Modifies the DataFrame in place.

param int $index:

Position to insert at

param Series $series:

Column to insert

raises Polars\Exception:

If column name already exists

extend

Extend this DataFrame with rows from another DataFrame. Modifies the DataFrame in place.

param DataFrame $other:

DataFrame with matching schema to append

raises Polars\Exception:

If schemas don’t match

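setSorted

Flag a column as already sorted. This only sets metadata and does not reorder the data, so setting an incorrect flag can lead to wrong results. Modifies the DataFrame in place.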

param string $column:

Column name

param bool $descending:

Whether the column is sorted descending (default: false)

Sequential Operations

selectSeq

Same as select, but evaluates the expressions sequentially instead of in parallel.

param array $expressions:

Array of Expr objects

returns:

DataFrame

withColumnsSeq

Same as withColumns, but evaluates the expressions sequentially instead of in parallel.

param array $expressions:

Array of Expr objects

returns:

DataFrame

Conversion

toSeries

Convert a single-column DataFrame to a Series.

returns:

Series

raises Polars\Exception:

If DataFrame has more than one column

toDummies

Convert columns to one-hot encoded (dummy) variables.

param array|null $columns:

Columns to encode (null = all columns)

param string $separator:

Separator between column name and value (default: '_')

param bool $dropFirst:

Drop the first category to avoid multicollinearity (default: false)

returns:

DataFrame

Example:

$df = new DataFrame(['color' => ['red', 'blue', 'red']]);
$dummies = $df->toDummies();
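In the example above, the result holds one 0/1 indicator column per distinct color. Each column is named from the column name, the separator, and the value. The plain-PHP sketch below reproduces that encoding for a single column with a hypothetical toDummiesColumn helper (not part of Polars-PHP). It orders the new columns by first appearance, which may differ from the library's ordering.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// One-hot encodes a single column: one 0/1 column per distinct value.
function toDummiesColumn(string $name, array $values, string $separator = '_', bool $dropFirst = false): array
{
    // Distinct categories, in order of first appearance.
    $categories = array_values(array_unique($values));
    if ($dropFirst) {
        // Dropping one category avoids perfectly collinear indicator columns.
        array_shift($categories);
    }
    $out = [];
    foreach ($categories as $category) {
        $out[$name . $separator . $category] = array_map(
            fn ($v) => $v === $category ? 1 : 0,
            $values
        );
    }
    return $out;
}

toDummiesColumn('color', ['red', 'blue', 'red']);
// ['color_red' => [1, 0, 1], 'color_blue' => [0, 1, 0]]
```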

Partitioning

partitionBy

Split DataFrame into multiple DataFrames based on unique values in given columns.

param array $by:

Column names to partition by

param bool $maintainOrder:

Maintain the order of the original DataFrame (default: true)

param bool $includeKey:

Include the partition key columns in each partition (default: true)

returns:

array - Array of DataFrame objects
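Here is a plain-PHP sketch of the partitioning rule, using a hypothetical partitionRows helper (not part of Polars-PHP) on rows shaped like the output of toArray(). Partitions appear in order of each key's first occurrence, which mirrors $maintainOrder: true. Passing $includeKey: false strips the key columns from each row.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
function partitionRows(array $rows, array $by, bool $includeKey = true): array
{
    $partitions = [];
    foreach ($rows as $row) {
        // Rows with equal values in the $by columns share a partition key.
        $key = serialize(array_map(fn ($c) => $row[$c], $by));
        if (!$includeKey) {
            $row = array_diff_key($row, array_flip($by));
        }
        $partitions[$key][] = $row;
    }
    // PHP arrays keep insertion order, so partitions follow first appearance.
    return array_values($partitions);
}

$rows = [['g' => 'a', 'v' => 1], ['g' => 'b', 'v' => 2], ['g' => 'a', 'v' => 3]];
$parts = partitionRows($rows, ['g']);
// $parts[0] holds the two 'a' rows, $parts[1] the single 'b' row
```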

remove

Remove a row at the given index. Supports negative indexing.

param int $index:

Row index to remove

returns:

DataFrame

raises Polars\Exception:

If index out of bounds
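Negative indices count from the end, so -1 is the last row. Here is a plain-PHP sketch of the index normalization and bounds check. It uses a hypothetical removeRow helper (not part of Polars-PHP) that returns a new list and leaves its input unchanged:

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
function removeRow(array $rows, int $index): array
{
    $height = count($rows);
    // Map a negative index onto the equivalent non-negative position.
    $pos = $index < 0 ? $height + $index : $index;
    if ($pos < 0 || $pos >= $height) {
        throw new OutOfRangeException("Row index $index is out of bounds for height $height");
    }
    // PHP arrays are copied on assignment, so the caller's array is untouched.
    $copy = array_values($rows);
    array_splice($copy, $pos, 1);
    return $copy;
}

removeRow([10, 20, 30], -1); // [10, 20]
removeRow([10, 20, 30], 1);  // [10, 30]
```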

Pivot

pivot

Pivot a DataFrame from long to wide format.

param array $on:

Column(s) whose values become the new column names

param array|null $index:

Column(s) that identify each output row

param array|null $values:

Column(s) whose values fill the new columns

param string|null $aggregateFunction:

How to combine multiple values that land in the same cell: 'first', 'last', 'sum', 'mean', 'median', 'min', 'max', 'count', or 'len'

param bool $sortColumns:

Sort the resulting pivot columns (default: false)

returns:

DataFrame

Example:

$result = $df->pivot(['subject'], ['name'], ['score'], 'first');

Merge

mergeSorted

Merge two sorted DataFrames by a key column.

param DataFrame $other:

The other sorted DataFrame

param string $key:

Column to merge on (must be sorted in both DataFrames)

returns:

DataFrame
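mergeSorted is the merge step of merge sort. Because both inputs are already ordered by $key, one linear pass produces an ordered result without re-sorting. Here is a plain-PHP sketch with a hypothetical mergeSortedRows helper (not part of Polars-PHP). Taking the left row first on equal keys is a choice of this sketch, not a documented guarantee of the library.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// Both inputs must already be sorted ascending by $key.
function mergeSortedRows(array $left, array $right, string $key): array
{
    $left = array_values($left);
    $right = array_values($right);
    $out = [];
    $i = $j = 0;
    while ($i < count($left) && $j < count($right)) {
        // Take from the left on ties so equal keys keep left-before-right order.
        if ($left[$i][$key] <= $right[$j][$key]) {
            $out[] = $left[$i++];
        } else {
            $out[] = $right[$j++];
        }
    }
    // One side is exhausted; append whatever remains of the other.
    while ($i < count($left)) {
        $out[] = $left[$i++];
    }
    while ($j < count($right)) {
        $out[] = $right[$j++];
    }
    return $out;
}
```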

unnest

Unnest struct columns into separate columns.

param array $columns:

Names of struct columns to unnest

returns:

DataFrame

raises Polars\Exception:

If columns are not of Struct type

Advanced Joins

joinWhere

Join with another DataFrame using arbitrary predicates.

param DataFrame $other:

The right DataFrame

param array $predicates:

Array of Expr predicate objects

returns:

DataFrame

Example:

$result = $df1->joinWhere($df2, [Expr::col('a')->le(Expr::col('b'))]);

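joinAsof

Perform an asof join. Each left row is matched to the right row with the closest key in the direction given by $strategy, instead of requiring an exact key match.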

param DataFrame $other:

The right DataFrame

param string $on:

Column to join on (must be sorted)

param string|null $strategy:

Join strategy: 'backward' (default), 'forward', or 'nearest'

param string|null $leftBy:

Column in the left DataFrame that must match exactly before the asof search

param string|null $rightBy:

Column in the right DataFrame that must match exactly before the asof search

param string|null $tolerance:

Maximum allowed distance between matched keys, as a duration string such as '5m'

returns:

DataFrame
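The three strategies differ only in which right-hand key a left key may match. The plain-PHP sketch below uses a hypothetical asofMatch helper (not part of Polars-PHP) that takes two ascending key lists. For each left key, it returns the position of the matched right key, or null. It ignores $leftBy, $rightBy, and $tolerance. It also breaks 'nearest' ties toward the earlier right key, which is an assumption of the sketch.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// $leftKeys and $rightKeys must both be sorted ascending.
function asofMatch(array $leftKeys, array $rightKeys, string $strategy = 'backward'): array
{
    if (!in_array($strategy, ['backward', 'forward', 'nearest'], true)) {
        throw new InvalidArgumentException("Unknown strategy: $strategy");
    }
    $rightKeys = array_values($rightKeys);
    $matches = [];
    foreach ($leftKeys as $lk) {
        $best = null;
        foreach ($rightKeys as $j => $rk) {
            if ($strategy === 'backward') {
                if ($rk > $lk) {
                    break; // keys are sorted, so no later right key can match
                }
                $best = $j; // last right key <= left key so far
            } elseif ($strategy === 'forward') {
                if ($rk >= $lk) {
                    $best = $j; // first right key >= left key
                    break;
                }
            } elseif ($best === null || abs($rk - $lk) < abs($rightKeys[$best] - $lk)) {
                $best = $j; // nearest: smallest distance; ties keep the earlier key
            }
        }
        $matches[] = $best;
    }
    return $matches;
}

asofMatch([1, 5, 10], [2, 4, 8], 'backward'); // [null, 1, 2]
asofMatch([1, 5, 10], [2, 4, 8], 'forward');  // [0, 2, null]
asofMatch([1, 5, 10], [2, 4, 8], 'nearest');  // [0, 1, 2]
```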

SQL

sql

Execute a SQL query against this DataFrame. The DataFrame is registered as a table named self, so queries refer to it as FROM self.

param string $query:

SQL query string

returns:

DataFrame

Example:

$result = $df->sql("SELECT name, age FROM self WHERE age > 30");

Deprecated

withRowCount

Add a row count column. Deprecated alias for withRowIndex.

param string $name:

Name of the count column (default: 'row_nr')

param int $offset:

Starting offset (default: 0)

returns:

DataFrame

Descriptive Methods

schema (property)

Get a description of the schema as a string.

returns:

string

nUnique

Get the number of unique values per column.

returns:

DataFrame - Single row with unique counts

glimpse

Get a quick summary of the DataFrame.

returns:

string

describe

Get descriptive statistics (count, mean, std, min, max, median, etc.).

returns:

DataFrame

Sampling

sample

Randomly sample rows by count or fraction.

param int $n:

Number of rows to sample (ignored if $fraction is set)

param bool $withReplacement:

Allow sampling with replacement (default: false)

param bool $shuffle:

Shuffle the result (default: true)

param float|null $fraction:

Fraction of rows to sample (0.0 to 1.0), overrides $n

param int|null $seed:

Random seed for reproducibility

returns:

DataFrame

Example:

$df->sample(10, seed: 42);
$df->sample(fraction: 0.5, seed: 42);

transpose

Transpose the DataFrame.

param bool $includeHeader:

Include the original column names as the first column (default: false)

param string $headerName:

Name for that header column (default: 'column')

param array|null $columnNames:

Custom names for the transposed columns

returns:

DataFrame
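Here is a plain-PHP sketch of transposing column-oriented data, using a hypothetical transposeColumns helper (not part of Polars-PHP). Each original row becomes a column. The default names column_0, column_1, and so on follow the Polars convention and are an assumption here. $columnNames is not modelled.

```php
<?php
// Hypothetical helper for illustration only; not part of Polars-PHP.
// $data is column-oriented: column name => list of values.
function transposeColumns(array $data, bool $includeHeader = false, string $headerName = 'column'): array
{
    $names = array_keys($data);
    // Assumes every column has the same length as the first one.
    $height = $names === [] ? 0 : count($data[$names[0]]);
    $out = [];
    if ($includeHeader) {
        // The original column names become the first column.
        $out[$headerName] = $names;
    }
    for ($r = 0; $r < $height; $r++) {
        // Row $r of the input becomes column "column_$r" of the output.
        $out["column_$r"] = array_map(fn ($name) => $data[$name][$r], $names);
    }
    return $out;
}

transposeColumns(['a' => [1, 2], 'b' => [3, 4]]);
// ['column_0' => [1, 3], 'column_1' => [2, 4]]
```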

topK

Get the top k rows by a column (largest values first).

param int $k:

Number of rows

param string $by:

Column to sort by

returns:

DataFrame

bottomK

Get the bottom k rows by a column (smallest values first).

param int $k:

Number of rows

param string $by:

Column to sort by

returns:

DataFrame
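Semantically, topK and bottomK match a sort on $by followed by taking the first k rows. The plain-PHP sketch below does exactly that with hypothetical topKRows and bottomKRows helpers (not part of Polars-PHP). It is not necessarily how the library computes the result.

```php
<?php
// Hypothetical helpers for illustration only; not part of Polars-PHP.
// Rows are associative arrays, as returned by toArray().
function topKRows(array $rows, int $k, string $by): array
{
    // Largest values first; $rows is a local copy, so the caller's array is untouched.
    usort($rows, fn (array $a, array $b) => $b[$by] <=> $a[$by]);
    return array_slice($rows, 0, $k);
}

function bottomKRows(array $rows, int $k, string $by): array
{
    // Smallest values first.
    usort($rows, fn (array $a, array $b) => $a[$by] <=> $b[$by]);
    return array_slice($rows, 0, $k);
}

$rows = [['n' => 'a', 'score' => 3], ['n' => 'b', 'score' => 9], ['n' => 'c', 'score' => 1], ['n' => 'd', 'score' => 7]];
topKRows($rows, 2, 'score');    // rows with scores 9 and 7
bottomKRows($rows, 2, 'score'); // rows with scores 1 and 3
```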

Utilities

item

Return the DataFrame as a scalar value. The DataFrame must contain exactly one element (1 row, 1 column).

returns:

mixed - The scalar value (int, float, string, bool, or null)

raises Polars\Exception:

If DataFrame doesn’t have exactly one element

Example:

$df = new DataFrame(['x' => [42]]);
$value = $df->item(); // 42

isEmpty

Check whether the DataFrame has no rows.

returns:

bool

copy

Create a copy of the DataFrame.

returns:

DataFrame

Output

writeCsv

Write DataFrame to a CSV file.

param string $path:

Output file path

param bool $includeHeader:

Whether to include column headers

param string $separator:

Column separator character

raises Polars\Exception:

If file cannot be written

Example:

$df->writeCsv('output.csv');
$df->writeCsv('output.tsv', includeHeader: true, separator: "\t");

writeJson

Write DataFrame to a JSON file.

param string $path:

Output file path

raises Polars\Exception:

If file cannot be written

Example:

$df->writeJson('output.json');

writeNdjson

Write DataFrame to an NDJSON (newline-delimited JSON) file.

param string $path:

Output file path

raises Polars\Exception:

If file cannot be written

Example:

$df->writeNdjson('output.ndjson');

writeParquet

Write DataFrame to a Parquet file.

param string $path:

Output file path

raises Polars\Exception:

If file cannot be written

Example:

$df->writeParquet('output.parquet');

__toString

Return a formatted string representation of the DataFrame.

Example:

echo $df;
// shape: (3, 3)
// ┌─────────┬─────┬─────────┐
// │ name    ┆ age ┆ city    │
// │ ---     ┆ --- ┆ ---     │
// │ str     ┆ i64 ┆ str     │
// ╞═════════╪═════╪═════════╡
// │ Alice   ┆ 25  ┆ NYC     │
// │ Bob     ┆ 30  ┆ LA      │
// │ Charlie ┆ 35  ┆ Chicago │
// └─────────┴─────┴─────────┘