DataFrame

The DataFrame class is the primary data structure in Polars-PHP. It represents a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled columns.

Constructor

Create a new DataFrame from a PHP array.

param array $data:: Associative array where keys are column names and values are arrays of column data
param bool $byKeys:: Whether to parse data by keys (default: true)
raises Polars\Exception:: If the data cannot be converted to a DataFrame

Example:

use Polars\DataFrame;

$df = new DataFrame([
    'name' => ['Alice', 'Bob', 'Charlie'],
    'age' => [25, 30, 35],
    'city' => ['NYC', 'LA', 'Chicago']
]);

Static Methods

readCsv

Read a DataFrame from a CSV file.

param string $path:: Path to the CSV file
param bool $hasHeader:: Whether the first row contains column headers (default: true)
param string $separator:: Column separator character (default: ",")
returns:: DataFrame
raises Polars\Exception:: If file cannot be read or parsed

Example:

$df = DataFrame::readCsv('data.csv');
$df = DataFrame::readCsv('data.tsv', hasHeader: true, separator: "\t");

readJson

Read a DataFrame from a JSON file.

param string $path:: Path to the JSON file
returns:: DataFrame
raises Polars\Exception:: If file cannot be read or parsed

Example:

$df = DataFrame::readJson('data.json');

readNdjson

Read a DataFrame from an NDJSON (newline-delimited JSON) file.

param string $path:: Path to the NDJSON file
returns:: DataFrame
raises Polars\Exception:: If file cannot be read or parsed

Example:

$df = DataFrame::readNdjson('data.ndjson');

readParquet

Read a DataFrame from a Parquet file.

param string $path:: Path to the Parquet file
returns:: DataFrame
raises Polars\Exception:: If file cannot be read or parsed

Example:

$df = DataFrame::readParquet('data.parquet');

Properties

getColumns / setColumns

Get column names as an array of strings.

returns:: string[] - Array of column names

Set column names.

param array $columns:: Array of new column names (must match current column count)
raises Polars\Exception:: If column count doesn’t match

Example:

$columns = $df->getColumns(); // ['name', 'age', 'city']
$df->setColumns(['first_name', 'years', 'location']);

dtypes

Get data types of all columns.

returns:: DataType[] - Array of DataType objects

Series Access

column

Get a single column as a Series.

param string $name:: Column name
returns:: Series
raises Polars\Exception:: If column doesn’t exist

Example:

$df = new DataFrame([
    'a' => [1, 2, 3],
    'b' => [4, 5, 6],
]);

$series = $df->column('a');
$series->getName(); // 'a'
$series->sum();     // 6

getSeries

Get all columns as an array of Series.

returns:: Series[] - Array of Series objects

Example:

$df = new DataFrame([
    'x' => [1, 2],
    'y' => [3, 4],
]);

$seriesArr = $df->getSeries();
// $seriesArr[0] is Series 'x'
// $seriesArr[1] is Series 'y'

Dimensions

height

Get the number of rows in the DataFrame.

returns:: int - Number of rows

width

Get the number of columns in the DataFrame.

returns:: int - Number of columns

shape

Get the shape of the DataFrame as [rows, columns].

returns:: int[] - Array with [height, width]

Example:

// $df is the three-row, three-column DataFrame from the constructor example
$df->height(); // 3
$df->width();  // 3
$df->shape();  // [3, 3]

Array Access

DataFrame implements ArrayAccess, allowing bracket notation for accessing data.

offsetExists

Check if an offset (column name or row index) exists.

offsetGet

Get the data at the given offset. Supports multiple access patterns:

param mixed $offset:: Can be string, int, or array
returns:: DataFrame

Access patterns:

Pattern	Description	Example
`string`	Single column	`$df['name']`
`int`	Single row (supports negative indexing)	`$df[0]`, `$df[-1]`
`array` of strings	Multiple columns	`$df[['name', 'age']]`
`array` of strings plus an int	Row at that index, restricted to the listed columns	`$df[['name', 'age', 0]]`

Example:

$df['name'];           // Single column as DataFrame
$df[0];                // First row as DataFrame
$df[-1];               // Last row as DataFrame
$df[['name', 'age']];  // Multiple columns
$df[['name', 1]];      // 'name' column, row index 1
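Negative row offsets count back from the end, so in a three-row DataFrame `$df[-1]` is row 2. The plain-PHP helper below illustrates that index normalization. It is not Polars-PHP code: `normalizeRowIndex` is a hypothetical name, and `OutOfRangeException` only stands in for whatever error the library reports for a bad index.

```php
<?php

/**
 * Hypothetical helper: map a possibly negative row index onto 0..height-1,
 * the way $df[-1] is documented to address the last row.
 */
function normalizeRowIndex(int $index, int $height): int
{
    // Negative indices count back from the end: -1 => height - 1.
    $resolved = $index < 0 ? $height + $index : $index;

    if ($resolved < 0 || $resolved >= $height) {
        throw new OutOfRangeException("Row index $index out of bounds for height $height");
    }
    return $resolved;
}

echo normalizeRowIndex(0, 3), PHP_EOL;  // 0
echo normalizeRowIndex(-1, 3), PHP_EOL; // 2
```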

offsetSet / offsetUnset

Not supported. Use withColumns() or drop() instead.

Row Selection

head

Get the first n rows.

param int $n:: Number of rows to return (default: 10)
returns:: DataFrame

tail

Get the last n rows.

param int $n:: Number of rows to return (default: 10)
returns:: DataFrame

Example:

$df->head(5);  // First 5 rows
$df->tail(3);  // Last 3 rows

Aggregations

count

Return the number of non-null elements for each column.

returns:: DataFrame - Single row with counts per column

max

Aggregate columns to their maximum value.

returns:: DataFrame - Single row with max values

min

Aggregate columns to their minimum value.

returns:: DataFrame - Single row with min values

mean

Aggregate columns to their mean value.

returns:: DataFrame - Single row with mean values

std

Aggregate columns to their standard deviation.

param int $ddof:: Delta degrees of freedom (default: 0)
returns:: DataFrame - Single row with std values

Example:

$df = new DataFrame(['values' => [1, 2, 3, 4, 5]]);
$df->min()->item();   // 1
$df->max()->item();   // 5
$df->mean()->item();  // 3.0
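In std() and variance(), `$ddof` is subtracted from the row count in the denominator. With `ddof: 0` you get the population statistic; with `ddof: 1` you get the sample statistic. The plain-PHP sketch below shows that formula. The helper names are hypothetical, and this is not the library's implementation.

```php
<?php

/** Variance with delta degrees of freedom: sum((x - mean)^2) / (n - ddof). */
function varianceDdof(array $values, int $ddof = 0): float
{
    $n = count($values);
    if ($n - $ddof <= 0) {
        throw new InvalidArgumentException("Need more than $ddof values");
    }
    $mean = array_sum($values) / $n;
    $squared = 0.0;
    foreach ($values as $x) {
        $squared += ($x - $mean) ** 2;
    }
    return $squared / ($n - $ddof);
}

/** Standard deviation is the square root of the variance. */
function stdDdof(array $values, int $ddof = 0): float
{
    return sqrt(varianceDdof($values, $ddof));
}

$values = [1, 2, 3, 4, 5];
printf("%.4f\n", stdDdof($values, 0)); // 1.4142 (population, sqrt(2))
printf("%.4f\n", stdDdof($values, 1)); // 1.5811 (sample, sqrt(2.5))
```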

sum

Aggregate columns to their sum.

returns:: DataFrame - Single row with sum values

median

Aggregate columns to their median value.

returns:: DataFrame - Single row with median values

variance

Aggregate columns to their variance.

param int $ddof:: Delta degrees of freedom (default: 0)
returns:: DataFrame - Single row with variance values

quantile

Aggregate columns to their quantile value.

param float $quantile:: Quantile value between 0 and 1
returns:: DataFrame - Single row with quantile values

nullCount

Get the number of null values per column.

returns:: DataFrame - Single row with null counts

product

Aggregate columns to their product.

returns:: DataFrame - Single row with product values

Selection

select

Select columns based on expressions.

param array $expressions:: Array of Expr objects
returns:: DataFrame
raises Polars\Exception:: If expressions are invalid

Example:

use Polars\Expr;

$df = new DataFrame([
    'a' => [1, 2, 3],
    'b' => [4, 5, 6]
]);

// Select with comparison
$result = $df->select([Expr::col('a')->gt(1)]);

// Select with arithmetic
$result = $df->select([Expr::col('a')->add(Expr::col('b'))]);

Core Manipulation

sort

Sort DataFrame by one or more columns.

param string|array $by:: Column name or array of column names to sort by
param bool $descending:: Sort in descending order (default: false)
param bool $nullsLast:: Place nulls last (default: true)
param bool $maintainOrder:: Maintain order of equal elements - stable sort (default: false)
param bool $multithreaded:: Use multithreaded sorting (default: true)
returns:: DataFrame

Example:

$df->sort('age');
$df->sort('age', descending: true);
$df->sort(['city', 'age'], maintainOrder: true);

drop

Drop specified columns.

param array $columns:: Column names to drop
returns:: DataFrame

Example:

$df->drop(['age', 'city']);

rename

Rename columns.

param array $existing:: Old column names
param array $newNames:: New column names
returns:: DataFrame

Example:

$df->rename(['name', 'age'], ['fullName', 'years']);

filter

Filter rows by expression.

param Expr $expression:: Filter expression
returns:: DataFrame

Example:

$df->filter(Expr::col('age')->gt(30));

withColumns

Add or modify columns using expressions.

param array $expressions:: Array of Expr objects
returns:: DataFrame

Example:

$df->withColumns([
    Expr::col('age')->mul(2)->alias('double_age'),
]);

groupBy

Group by expressions.

param array $expressions:: Array of Expr objects for grouping
returns:: LazyGroupBy

Example:

$result = $df->groupBy([Expr::col('city')])->sum()->collect();

Row/Column Manipulation

unique

Get unique rows.

param array|null $subset:: Column names to consider for uniqueness (default: all columns)
param string $keep:: Keep strategy - 'first', 'last', 'any', or 'none' (default: 'first')
returns:: DataFrame
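The plain-PHP sketch below shows how the `$keep` strategies differ, using arrays of rows in place of a DataFrame. `uniqueRows` is a hypothetical helper. `'any'` is left out because it makes no promise about which duplicate survives. The sketch also keeps input row order, which the documentation above does not guarantee for the library.

```php
<?php

/**
 * Hypothetical sketch of unique($subset, $keep) over a list of associative rows.
 * Supports keep = 'first', 'last' or 'none'; output keeps input row order.
 */
function uniqueRows(array $rows, ?array $subset = null, string $keep = 'first'): array
{
    // Identify a row by the values of the subset columns (or all columns).
    $keyOf = function (array $row) use ($subset): string {
        $cols = $subset ?? array_keys($row);
        return serialize(array_map(fn ($c) => $row[$c], $cols));
    };

    // First pass: how often each key occurs and where it occurs last.
    $counts = [];
    $lastPos = [];
    foreach ($rows as $i => $row) {
        $k = $keyOf($row);
        $counts[$k] = ($counts[$k] ?? 0) + 1;
        $lastPos[$k] = $i;
    }

    // Second pass: decide per row according to the keep strategy.
    $seen = [];
    $result = [];
    foreach ($rows as $i => $row) {
        $k = $keyOf($row);
        $keepRow = match ($keep) {
            'first' => !isset($seen[$k]),
            'last'  => $lastPos[$k] === $i,
            'none'  => $counts[$k] === 1,
            default => throw new InvalidArgumentException("Unsupported keep strategy: $keep"),
        };
        $seen[$k] = true;
        if ($keepRow) {
            $result[] = $row;
        }
    }
    return $result;
}

// With rows NYC/1, LA/2, NYC/3 and subset ['city']:
// 'first' keeps NYC/1 and LA/2, 'last' keeps LA/2 and NYC/3, 'none' keeps only LA/2.
```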

dropNulls

Drop rows with null values.

param array|null $subset:: Column names to check for nulls (default: all columns)
returns:: DataFrame

fillNull

Fill null values with a value or expression.

param mixed $value:: Value to fill nulls with (int, float, string, bool, null, or Expr)
returns:: DataFrame

fillNan

Fill NaN values with a value or expression.

param mixed $value:: Value to fill NaN with
returns:: DataFrame

reverse

Reverse row order.

returns:: DataFrame

slice

Get a slice of rows.

param int $offset:: Start offset
param int $length:: Number of rows
returns:: DataFrame

limit

Limit to n rows (alias for head).

param int $n:: Number of rows (default: 10)
returns:: DataFrame

join

Join with another DataFrame.

param DataFrame $other:: The right DataFrame
param array $on:: Array of Expr objects for join columns (used for both sides when leftOn/rightOn are not given)
param string $how:: Join type - 'inner', 'left', 'right', 'full', 'cross' (default: 'inner')
param array|null $leftOn:: Left join columns (overrides $on for left side)
param array|null $rightOn:: Right join columns (overrides $on for right side)
param string|null $suffix:: Suffix for duplicate column names (default: '_right')
param string|null $validate:: Join validation - 'm:m', 'm:1', '1:m', '1:1'
param bool|null $coalesce:: Whether to coalesce join columns
returns:: DataFrame

Example:

$result = $df1->join($df2, [Expr::col('id')], how: 'left');
$result = $df1->join($df2, [], 'inner',
    leftOn: [Expr::col('id')],
    rightOn: [Expr::col('key')]
);
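To show what `how: 'left'` adds over `'inner'`, here is a sketch of an equi-join on one key over plain PHP rows. It is illustrative only: `joinRows` is hypothetical, and the join key is emitted once. Clashing right-hand columns get `$suffix`, which defaults to `'_right'` as documented above.

```php
<?php

/**
 * Hypothetical sketch of an equi-join on one key column over plain PHP rows,
 * covering how: 'inner' and 'left'. Clashing right-hand column names get $suffix.
 */
function joinRows(array $left, array $right, string $key, string $how = 'inner', string $suffix = '_right'): array
{
    // Index right rows by key value (several right rows may share a key).
    $index = [];
    foreach ($right as $r) {
        $index[$r[$key]][] = $r;
    }
    $rightCols = $right === [] ? [] : array_keys($right[0]);

    $out = [];
    foreach ($left as $l) {
        $matches = $index[$l[$key]] ?? [];
        if ($matches === [] && $how === 'left') {
            // A left join keeps unmatched left rows, padding right columns with null.
            $matches = [array_fill_keys($rightCols, null)];
        }
        foreach ($matches as $r) {
            $row = $l;
            foreach ($r as $col => $val) {
                if ($col === $key) {
                    continue; // the join key is emitted once
                }
                $row[array_key_exists($col, $l) ? $col . $suffix : $col] = $val;
            }
            $out[] = $row;
        }
    }
    return $out;
}

// Users 1 (Alice) and 2 (Bob), two orders for user 1:
// 'inner' yields two Alice rows; 'left' adds a Bob row whose 'total' is null.
```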

withRowIndex

Add a row index column.

param string $name:: Name of the index column (default: "index")
param int $offset:: Starting offset (default: 0)
returns:: DataFrame

Export/Row Access

toArray

Convert DataFrame to a PHP array of associative arrays (rows).

returns:: array - Array of associative arrays

Example:

$arr = $df->toArray();
// [['name' => 'Alice', 'age' => 25], ['name' => 'Bob', 'age' => 30], ...]

row

Get a single row as an associative array. Supports negative indexing.

param int $index:: Row index (negative for counting from end)
returns:: array - Associative array

Example:

$row = $df->row(0);   // First row
$row = $df->row(-1);  // Last row

rows

Get all rows as array of associative arrays (alias for toArray).

returns:: array

DataFrame Operations

vstack

Grow this DataFrame vertically by stacking another DataFrame.

param DataFrame $other:: DataFrame to stack
returns:: DataFrame

hstack

Grow this DataFrame horizontally by adding Series columns.

param array $columns:: Array of Series objects
returns:: DataFrame

equals

Check if two DataFrames are equal.

param DataFrame $other:: DataFrame to compare with
returns:: bool

estimatedSize

Get the estimated size in bytes.

returns:: int

getColumnIndex

Get the column index by name. Returns -1 if not found.

param string $name:: Column name
returns:: int

clear

Create an empty copy of the DataFrame (same schema, no rows).

returns:: DataFrame

rechunk

Rechunk the DataFrame into contiguous memory.

returns:: DataFrame

shrinkToFit

Shrink memory usage of the DataFrame.

isDuplicated

Get a boolean mask of duplicated rows.

returns:: Series - Boolean Series

isUnique

Get a boolean mask of unique rows.

returns:: Series - Boolean Series
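Both masks compare whole rows. The sketch below assumes the upstream Polars behaviour, in which every occurrence of a repeated row is flagged, not only the later ones. It is plain PHP with hypothetical helper names, not Polars-PHP code.

```php
<?php

/** Hypothetical sketch: flag every row whose full contents occur more than once. */
function isDuplicatedMask(array $rows): array
{
    // Serialize each row so identical rows produce identical string keys.
    $keys = array_map('serialize', $rows);
    $counts = array_count_values($keys);
    return array_map(fn (string $k): bool => $counts[$k] > 1, $keys);
}

/** isUnique is the element-wise negation of isDuplicated. */
function isUniqueMask(array $rows): array
{
    return array_map(fn (bool $d): bool => !$d, isDuplicatedMask($rows));
}
```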

Advanced Operations

shift

Shift column values by n positions.

param int $n:: Number of positions to shift (positive = down, negative = up)
returns:: DataFrame
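Here is a plain-PHP sketch of shift() on a single column. It assumes vacated positions are filled with null, which is the usual Polars behaviour. `shiftColumn` is a hypothetical helper.

```php
<?php

/**
 * Hypothetical sketch of shift() on one column: positive $n moves values down,
 * negative $n moves them up, and vacated slots become null.
 */
function shiftColumn(array $values, int $n): array
{
    $len = count($values);
    $out = array_fill(0, $len, null);
    foreach ($values as $i => $v) {
        $target = $i + $n;
        // Values shifted past either end are dropped.
        if ($target >= 0 && $target < $len) {
            $out[$target] = $v;
        }
    }
    return $out;
}

// shiftColumn([1, 2, 3], 1)  gives [null, 1, 2]
// shiftColumn([1, 2, 3], -1) gives [2, 3, null]
```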

gatherEvery

Take every nth row.

param int $n:: Take every nth row
param int $offset:: Starting offset (default: 0)
returns:: DataFrame
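gatherEvery() keeps rows `$offset`, `$offset + $n`, `$offset + 2 * $n`, and so on. Here is a plain-PHP sketch over a list of rows, using `gatherEveryRows` as a hypothetical name.

```php
<?php

/** Hypothetical sketch of gatherEvery($n, $offset): rows offset, offset + n, offset + 2n, ... */
function gatherEveryRows(array $rows, int $n, int $offset = 0): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('n must be at least 1');
    }
    $out = [];
    for ($i = $offset; $i < count($rows); $i += $n) {
        $out[] = $rows[$i];
    }
    return $out;
}

// gatherEveryRows(range(0, 9), 3, 1) gives [1, 4, 7]
```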

cast

Cast columns to different data types.

param array $dtypes:: Associative array of column name => data type string
param bool $strict:: Use strict casting (default: false)
returns:: DataFrame

Example:

$df->cast(['age' => 'float64', 'score' => 'int32']);

unpivot

Unpivot a DataFrame from wide to long format.

param array $on:: Column names to use as values
param array $index:: Column names to use as identifiers
param string|null $variableName:: Custom name for the variable column (default: 'variable')
param string|null $valueName:: Custom name for the value column (default: 'value')
returns:: DataFrame
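Here is a plain-PHP sketch of the wide-to-long reshaping, with rows as associative arrays and `unpivotRows` as a hypothetical helper. It orders the output the way upstream Polars examples do: all rows for the first `$on` column, then all rows for the next.

```php
<?php

/**
 * Hypothetical sketch of unpivot($on, $index): each wide row becomes one long row
 * per column in $on, carrying the $index columns along.
 */
function unpivotRows(array $rows, array $on, array $index, string $variableName = 'variable', string $valueName = 'value'): array
{
    $out = [];
    // All rows for the first $on column, then all rows for the next one.
    foreach ($on as $col) {
        foreach ($rows as $row) {
            $long = [];
            foreach ($index as $idx) {
                $long[$idx] = $row[$idx];
            }
            $long[$variableName] = $col;
            $long[$valueName] = $row[$col];
            $out[] = $long;
        }
    }
    return $out;
}
```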

explode

Explode list columns into rows.

param array $columns:: Column names to explode
returns:: DataFrame
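Here is a plain-PHP sketch of exploding one list column, where `explodeRows` is a hypothetical helper. Mapping an empty or null list to a single null row follows upstream Polars; treat it as an assumption for this binding.

```php
<?php

/**
 * Hypothetical sketch of explode() for one list column: each list element gets
 * its own row, the other columns are repeated, and an empty list yields a null row.
 */
function explodeRows(array $rows, string $column): array
{
    $out = [];
    foreach ($rows as $row) {
        $items = $row[$column];
        if ($items === [] || $items === null) {
            $items = [null];
        }
        foreach ($items as $item) {
            // Copy the row, replacing the list with a single element.
            $out[] = array_replace($row, [$column => $item]);
        }
    }
    return $out;
}
```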

melt

Deprecated alias for unpivot(); takes the same parameters.

param array $on:: Column names to use as values
param array $index:: Column names to use as identifiers
param string|null $variableName:: Custom name for the variable column
param string|null $valueName:: Custom name for the value column
returns:: DataFrame

interpolate

Interpolate null values using linear interpolation.

returns:: DataFrame
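Here is a plain-PHP sketch of linear interpolation on one column. Each run of nulls between two known values is filled along a straight line. Leading and trailing nulls stay null because nothing brackets them; that edge behaviour mirrors upstream Polars rather than anything stated above. `interpolateLinear` is hypothetical.

```php
<?php

/**
 * Hypothetical sketch of linear interpolation over one column: nulls between two
 * known values are filled on a straight line; edge nulls are left as null.
 */
function interpolateLinear(array $values): array
{
    $out = $values;
    $prev = null; // index of the last non-null value seen
    foreach ($values as $i => $v) {
        if ($v === null) {
            continue;
        }
        // Fill the gap between the previous known value and this one.
        if ($prev !== null && $i - $prev > 1) {
            $step = ($v - $values[$prev]) / ($i - $prev);
            for ($j = $prev + 1; $j < $i; $j++) {
                $out[$j] = $values[$prev] + $step * ($j - $prev);
            }
        }
        $prev = $i;
    }
    return $out;
}

// interpolateLinear([null, 1, null, null, 4, null]) gives [null, 1, 2, 3, 4, null]
```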

Column Mutation

dropInPlace

Remove a column and return it as a Series. Modifies the DataFrame in place.

param string $name:: Column name to remove
returns:: Series - The removed column
raises Polars\Exception:: If column not found

replaceColumn

Replace a column at a given index. Modifies the DataFrame in place.

param int $index:: Column index to replace
param Series $series:: New column data
raises Polars\Exception:: If index out of bounds or shape mismatch

insertColumn

Insert a column at a given index. Modifies the DataFrame in place.

param int $index:: Position to insert at
param Series $series:: Column to insert
raises Polars\Exception:: If column name already exists

extend

Extend this DataFrame with rows from another DataFrame. Modifies the DataFrame in place.

param DataFrame $other:: DataFrame with matching schema to append
raises Polars\Exception:: If schemas don’t match

setSorted

Set the sorted flag on a column. Modifies the DataFrame in place.

param string $column:: Column name
param bool $descending:: Whether the column is sorted descending (default: false)

Sequential Operations

selectSeq

Same as select(), but evaluates the expressions sequentially instead of in parallel.

param array $expressions:: Array of Expr objects
returns:: DataFrame

withColumnsSeq

Same as withColumns(), but evaluates the expressions sequentially instead of in parallel.

param array $expressions:: Array of Expr objects
returns:: DataFrame

Conversion

toSeries

Convert a single-column DataFrame to a Series.

returns:: Series
raises Polars\Exception:: If DataFrame has more than one column

toDummies

Convert columns to one-hot encoded (dummy) variables.

param array|null $columns:: Columns to encode (null = all columns)
param string $separator:: Separator between column name and value (default: "_")
param bool $dropFirst:: Drop the first category to avoid multicollinearity (default: false)
returns:: DataFrame

Example:

$df = new DataFrame(['color' => ['red', 'blue', 'red']]);
$dummies = $df->toDummies();
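The example above produces one indicator column per distinct color. The plain-PHP sketch below shows the `<column><separator><value>` naming and the effect of `$dropFirst`. `toDummiesColumn` is hypothetical. It orders categories by first appearance, which may not match the library's column order.

```php
<?php

/**
 * Hypothetical sketch of one-hot encoding one column: a 0/1 column per distinct
 * value, named "<column><separator><value>", in first-appearance order.
 */
function toDummiesColumn(string $name, array $values, string $separator = '_', bool $dropFirst = false): array
{
    $categories = array_values(array_unique($values));
    if ($dropFirst) {
        array_shift($categories); // drop one category to avoid perfect collinearity
    }
    $out = [];
    foreach ($categories as $cat) {
        $out[$name . $separator . $cat] = array_map(fn ($v) => $v === $cat ? 1 : 0, $values);
    }
    return $out;
}

// toDummiesColumn('color', ['red', 'blue', 'red'])
// gives ['color_red' => [1, 0, 1], 'color_blue' => [0, 1, 0]]
```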

Partitioning

partitionBy

Split DataFrame into multiple DataFrames based on unique values in given columns.

param array $by:: Column names to partition by
param bool $maintainOrder:: Maintain the order of the original DataFrame (default: true)
param bool $includeKey:: Include the partition key columns in each partition (default: true)
returns:: array - Array of DataFrame objects
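Here is a plain-PHP sketch of partitionBy() with `$maintainOrder` on. Groups appear in order of first occurrence, and rows keep their original order inside each group. `partitionRows` is a hypothetical helper.

```php
<?php

/**
 * Hypothetical sketch of partitionBy($by, maintainOrder: true, $includeKey):
 * one list of rows per distinct combination of key values.
 */
function partitionRows(array $rows, array $by, bool $includeKey = true): array
{
    $groups = [];
    foreach ($rows as $row) {
        $key = serialize(array_map(fn ($c) => $row[$c], $by));
        if (!$includeKey) {
            // Drop the partition key columns from the emitted row.
            $row = array_diff_key($row, array_flip($by));
        }
        $groups[$key][] = $row;
    }
    return array_values($groups);
}
```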

remove

Remove a row at the given index. Supports negative indexing.

param int $index:: Row index to remove
returns:: DataFrame
raises Polars\Exception:: If index out of bounds

Pivot

pivot

Pivot a DataFrame from long to wide format.

param array $on:: Column(s) whose values become the new column names
param array|null $index:: Column(s) to use as row index
param array|null $values:: Column(s) whose values fill the new columns
param string|null $aggregateFunction:: Aggregation applied when an index/column combination occurs more than once - 'first', 'last', 'sum', 'mean', 'median', 'min', 'max', 'count', 'len'
param bool $sortColumns:: Sort the resulting pivot columns (default: false)
returns:: DataFrame

Example:

$result = $df->pivot(['subject'], ['name'], ['score'], 'first');
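In the example above, `'subject'` values become new columns, `'name'` identifies rows, and `'score'` fills the cells. When a name/subject pair repeats, the first score is kept. Here is a plain-PHP sketch of that `'first'` aggregation for single columns only, using a hypothetical `pivotFirst`.

```php
<?php

/**
 * Hypothetical sketch of pivot($on, $index, $values, 'first') for one column each:
 * one output row per distinct $index value, one output column per distinct $on value.
 */
function pivotFirst(array $rows, string $on, string $index, string $values): array
{
    // Output columns: distinct $on values in first-appearance order.
    $columns = array_values(array_unique(array_column($rows, $on)));
    $out = [];
    $filled = [];
    foreach ($rows as $row) {
        $id = $row[$index];
        $col = $row[$on];
        if (!isset($out[$id])) {
            // New output row: the index value plus one null cell per pivoted column.
            $out[$id] = [$index => $id] + array_fill_keys($columns, null);
        }
        // 'first' aggregation: keep only the first value seen for this cell.
        if (!isset($filled[$id][$col])) {
            $out[$id][$col] = $row[$values];
            $filled[$id][$col] = true;
        }
    }
    return array_values($out);
}
```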

Merge

mergeSorted

Merge two sorted DataFrames by a key column.

param DataFrame $other:: The other sorted DataFrame
param string $key:: Column to merge on (must be sorted in both DataFrames)
returns:: DataFrame
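Because both inputs are already sorted, merging takes a single linear pass instead of a re-sort. Here is a plain-PHP sketch using a hypothetical `mergeSortedRows`. Taking the left row on ties is a choice of this sketch, not documented library behaviour.

```php
<?php

/**
 * Hypothetical sketch of mergeSorted($other, $key): one linear pass over two row
 * lists that are each already sorted ascending by $key.
 */
function mergeSortedRows(array $left, array $right, string $key): array
{
    $out = [];
    $i = $j = 0;
    while ($i < count($left) && $j < count($right)) {
        // Take from the left on ties, so equal keys come out left-then-right.
        if ($left[$i][$key] <= $right[$j][$key]) {
            $out[] = $left[$i++];
        } else {
            $out[] = $right[$j++];
        }
    }
    // One side is exhausted; append whatever remains of the other.
    return array_merge($out, array_slice($left, $i), array_slice($right, $j));
}
```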

unnest

Unnest struct columns into separate columns.

param array $columns:: Names of struct columns to unnest
returns:: DataFrame
raises Polars\Exception:: If columns are not of Struct type

Advanced Joins

joinWhere

Join with another DataFrame using arbitrary predicates.

param DataFrame $other:: The right DataFrame
param array $predicates:: Array of Expr predicate objects
returns:: DataFrame

Example:

$result = $df1->joinWhere($df2, [Expr::col('a')->le(Expr::col('b'))]);

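joinAsof

Perform an asof join with another DataFrame. Instead of requiring an exact key match, each left row is matched with the right row whose key is closest according to $strategy.

param DataFrame $other:: The right DataFrame
param string $on:: Column to join on (must be sorted in both DataFrames)
param string|null $strategy:: Join strategy - 'backward' (default), 'forward', 'nearest'
param string|null $leftBy:: Column on the left DataFrame that must match exactly before the asof match is made
param string|null $rightBy:: Column on the right DataFrame that must match exactly before the asof match is made
param string|null $tolerance:: Tolerance for the asof join (time duration string, e.g. "5m")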
returns:: DataFrame
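Here is a plain-PHP sketch of the default `'backward'` strategy. Each left row takes the last right row whose key is less than or equal to its own, or nulls if no such row exists. `joinAsofBackward` is hypothetical and ignores `$leftBy`, `$rightBy` and `$tolerance`.

```php
<?php

/**
 * Hypothetical sketch of a 'backward' asof join on one key: each left row is
 * paired with the last right row whose key is <= the left key (or with nulls).
 * Both inputs must be sorted ascending by $on.
 */
function joinAsofBackward(array $left, array $right, string $on): array
{
    $out = [];
    $j = -1; // index of the latest right row whose key is <= the current left key
    foreach ($left as $l) {
        // Both inputs are sorted, so the right-hand pointer only ever moves forward.
        while ($j + 1 < count($right) && $right[$j + 1][$on] <= $l[$on]) {
            $j++;
        }
        $match = $j >= 0 ? $right[$j] : [];
        $row = $l;
        // Add the right-hand columns other than the key (null when unmatched).
        // Assumes no other column names clash between the two sides.
        foreach (($right[0] ?? []) as $col => $unused) {
            if ($col !== $on) {
                $row[$col] = $match[$col] ?? null;
            }
        }
        $out[] = $row;
    }
    return $out;
}
```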

SQL

sql

Execute a SQL query against this DataFrame, which is registered as a table named "self".

param string $query:: SQL query string
returns:: DataFrame

Example:

$result = $df->sql("SELECT name, age FROM self WHERE age > 30");

Deprecated

withRowCount

Add a row count column. Deprecated alias for withRowIndex.

param string $name:: Name of the count column (default: "row_nr")
param int $offset:: Starting offset (default: 0)
returns:: DataFrame

Descriptive Methods

schema (property)

Get schema description as string.

returns:: string

nUnique

Get the number of unique values per column.

returns:: DataFrame - Single row with unique counts

glimpse

Get a quick summary of the DataFrame.

returns:: string

describe

Get descriptive statistics (count, mean, std, min, max, median, etc.).

returns:: DataFrame

Sampling

sample

Randomly sample rows by count or fraction.

param int $n:: Number of rows to sample (ignored if $fraction is set)
param bool $withReplacement:: Allow sampling with replacement (default: false)
param bool $shuffle:: Shuffle the result (default: true)
param float|null $fraction:: Fraction of rows to sample (0.0 to 1.0), overrides $n
param int|null $seed:: Random seed for reproducibility
returns:: DataFrame

Example:

$df->sample(10, seed: 42);
$df->sample(fraction: 0.5, seed: 42);

transpose

Transpose the DataFrame.

param bool $includeHeader:: Include column names as a column (default: false)
param string $headerName:: Name for the header column (default: "column")
param array|null $columnNames:: Custom names for the transposed columns
returns:: DataFrame

topK

Get the top k rows by a column (largest values first).

param int $k:: Number of rows
param string $by:: Column to sort by
returns:: DataFrame

bottomK

Get the bottom k rows by a column (smallest values first).

param int $k:: Number of rows
param string $by:: Column to sort by
returns:: DataFrame
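Here is a plain-PHP sketch of both methods, returning rows in the order the descriptions above state. `topKRows` and `bottomKRows` are hypothetical. They sort the whole input for clarity, whereas a real implementation can use a cheaper partial selection.

```php
<?php

/** Hypothetical sketch of topK: the $k rows with the largest $by values, largest first. */
function topKRows(array $rows, int $k, string $by): array
{
    usort($rows, fn (array $a, array $b): int => $b[$by] <=> $a[$by]);
    return array_slice($rows, 0, $k);
}

/** bottomK is the mirror image: the $k smallest values, smallest first. */
function bottomKRows(array $rows, int $k, string $by): array
{
    usort($rows, fn (array $a, array $b): int => $a[$by] <=> $b[$by]);
    return array_slice($rows, 0, $k);
}
```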

Utilities

item

Return the DataFrame as a scalar value. The DataFrame must contain exactly one element (1 row, 1 column).

returns:: mixed - The scalar value (int, float, string, bool, or null)
raises Polars\Exception:: If DataFrame doesn’t have exactly one element

Example:

$df = new DataFrame(['x' => [42]]);
$value = $df->item(); // 42

isEmpty

Check if DataFrame is empty.

returns:: bool

copy

Create a copy of the DataFrame.

returns:: DataFrame

Output

writeCsv

Write DataFrame to a CSV file.

param string $path:: Output file path
param bool $includeHeader:: Whether to include column headers
param string $separator:: Column separator character
raises Polars\Exception:: If file cannot be written

Example:

$df->writeCsv('output.csv');
$df->writeCsv('output.tsv', includeHeader: true, separator: "\t");

writeJson

Write DataFrame to a JSON file.

param string $path:: Output file path
raises Polars\Exception:: If file cannot be written

Example:

$df->writeJson('output.json');

writeNdjson

Write DataFrame to an NDJSON (newline-delimited JSON) file.

param string $path:: Output file path
raises Polars\Exception:: If file cannot be written

Example:

$df->writeNdjson('output.ndjson');

writeParquet

Write DataFrame to a Parquet file.

param string $path:: Output file path
raises Polars\Exception:: If file cannot be written

Example:

$df->writeParquet('output.parquet');

__toString

Return a formatted string representation of the DataFrame.

Example:

echo $df;
// shape: (3, 3)
// ┌─────────┬─────┬─────────┐
// │ name    ┆ age ┆ city    │
// │ ---     ┆ --- ┆ ---     │
// │ str     ┆ i64 ┆ str     │
// ╞═════════╪═════╪═════════╡
// │ Alice   ┆ 25  ┆ NYC     │
// │ Bob     ┆ 30  ┆ LA      │
// │ Charlie ┆ 35  ┆ Chicago │
// └─────────┴─────┴─────────┘