DataFrame
The DataFrame class is the primary data structure in Polars-PHP. It represents a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled columns.
Constructor
Create a new DataFrame from a PHP array.
- param array $data:
Associative array where keys are column names and values are arrays of column data
- param bool $byKeys:
Whether to parse data by keys (default: true)
- raises Polars\Exception:
If data cannot be converted to DataFrame
Example:
use Polars\DataFrame;
$df = new DataFrame([
'name' => ['Alice', 'Bob', 'Charlie'],
'age' => [25, 30, 35],
'city' => ['NYC', 'LA', 'Chicago']
]);
Static Methods
readCsv
Read a DataFrame from a CSV file.
- param string $path:
Path to the CSV file
- param bool $hasHeader:
Whether the first row contains column headers (default: true)
- param string $separator:
Column separator character (default: ",")
- returns:
DataFrame
- raises Polars\Exception:
If file cannot be read or parsed
Example:
$df = DataFrame::readCsv('data.csv');
$df = DataFrame::readCsv('data.tsv', hasHeader: true, separator: "\t");
readJson
Read a DataFrame from a JSON file.
- param string $path:
Path to the JSON file
- returns:
DataFrame
- raises Polars\Exception:
If file cannot be read or parsed
Example:
$df = DataFrame::readJson('data.json');
readNdjson
Read a DataFrame from an NDJSON (newline-delimited JSON) file.
- param string $path:
Path to the NDJSON file
- returns:
DataFrame
- raises Polars\Exception:
If file cannot be read or parsed
Example:
$df = DataFrame::readNdjson('data.ndjson');
readParquet
Read a DataFrame from a Parquet file.
- param string $path:
Path to the Parquet file
- returns:
DataFrame
- raises Polars\Exception:
If file cannot be read or parsed
Example:
$df = DataFrame::readParquet('data.parquet');
Properties
getColumns / setColumns
Get column names as an array of strings.
- returns:
string[] - Array of column names
Set column names.
- param array $columns:
Array of new column names (must match current column count)
- raises Polars\Exception:
If column count doesn’t match
Example:
$columns = $df->getColumns(); // ['name', 'age', 'city']
$df->setColumns(['first_name', 'years', 'location']);
dtypes
Get data types of all columns.
- returns:
DataType[] - Array of DataType objects
Series Access
column
Get a single column as a Series.
- param string $name:
Column name
- returns:
Series
- raises Polars\Exception:
If column doesn’t exist
Example:
$df = new DataFrame([
'a' => [1, 2, 3],
'b' => [4, 5, 6],
]);
$series = $df->column('a');
$series->getName(); // 'a'
$series->sum(); // 6
getSeries
Get all columns as an array of Series.
- returns:
Series[] - Array of Series objects
Example:
$df = new DataFrame([
'x' => [1, 2],
'y' => [3, 4],
]);
$seriesArr = $df->getSeries();
// $seriesArr[0] is Series 'x'
// $seriesArr[1] is Series 'y'
Dimensions
height
Get the number of rows in the DataFrame.
- returns:
int - Number of rows
width
Get the number of columns in the DataFrame.
- returns:
int - Number of columns
shape
Get the shape of the DataFrame as [rows, columns].
- returns:
int[] - Array with [height, width]
Example:
$df->height(); // 3
$df->width(); // 3
$df->shape(); // [3, 3]
Array Access
DataFrame implements ArrayAccess, allowing bracket notation for accessing data.
offsetExists
Check if an offset (column name or row index) exists.
offsetGet
Get value at offset. Supports multiple access patterns:
- param mixed $offset:
Can be string, int, or array
- returns:
DataFrame
Access patterns:
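| Pattern | Description | Example |
|---|---|---|
| string | Single column | $df['name'] |
| int | Single row (supports negative indexing) | $df[0], $df[-1] |
| array of strings | Multiple columns | $df[['name', 'age']] |
| [string, int] | Specific row from a column | $df[['name', 1]] |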
Example:
$df['name']; // Single column as DataFrame
$df[0]; // First row as DataFrame
$df[-1]; // Last row as DataFrame
$df[['name', 'age']]; // Multiple columns
$df[['name', 1]]; // 'name' column, row index 1
offsetSet / offsetUnset
Not supported: a DataFrame cannot be modified through bracket assignment or unset(). Use withColumns() or drop() instead.
Row Selection
head
Get the first n rows.
- param int $n:
Number of rows to return (default: 10)
- returns:
DataFrame
tail
Get the last n rows.
- param int $n:
Number of rows to return (default: 10)
- returns:
DataFrame
Example:
$df->head(5); // First 5 rows
$df->tail(3); // Last 3 rows
Aggregations
count
Return the number of non-null elements for each column.
- returns:
DataFrame - Single row with counts per column
max
Aggregate columns to their maximum value.
- returns:
DataFrame - Single row with max values
min
Aggregate columns to their minimum value.
- returns:
DataFrame - Single row with min values
mean
Aggregate columns to their mean value.
- returns:
DataFrame - Single row with mean values
std
Aggregate columns to their standard deviation.
- param int $ddof:
Delta degrees of freedom (default: 0)
- returns:
DataFrame - Single row with std values
Example:
$df = new DataFrame(['values' => [1, 2, 3, 4, 5]]);
$df->min()->item(); // 1
$df->max()->item(); // 5
$df->mean()->item(); // 3.0
sum
Aggregate columns to their sum.
- returns:
DataFrame - Single row with sum values
median
Aggregate columns to their median value.
- returns:
DataFrame - Single row with median values
variance
Aggregate columns to their variance.
- param int $ddof:
Delta degrees of freedom (default: 0)
- returns:
DataFrame - Single row with variance values
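The $ddof parameter of std() and variance() sets the divisor to n - ddof. A value of 0 gives the population statistic and 1 gives the sample statistic. The plain-PHP sketch below illustrates that definition without the extension. varianceWithDdof() and stdWithDdof() are hypothetical helpers, not part of Polars-PHP.

```php
<?php
// Hypothetical helpers illustrating $ddof; not part of Polars-PHP.
// Variance = sum of squared deviations from the mean / (n - ddof).
function varianceWithDdof(array $values, int $ddof = 0): float
{
    $n = count($values);
    if ($n - $ddof <= 0) {
        throw new InvalidArgumentException('n - ddof must be positive');
    }
    $mean = array_sum($values) / $n;
    $squaredDeviations = array_map(fn ($x) => ($x - $mean) ** 2, $values);
    return array_sum($squaredDeviations) / ($n - $ddof);
}

// Standard deviation is the square root of the variance.
function stdWithDdof(array $values, int $ddof = 0): float
{
    return sqrt(varianceWithDdof($values, $ddof));
}

$values = [1, 2, 3, 4, 5];
$population = varianceWithDdof($values);   // 2.0 (divide by 5)
$sample = varianceWithDdof($values, 1);    // 2.5 (divide by 4)
```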
quantile
Aggregate columns to the given quantile.
- param float $quantile:
Quantile value between 0 and 1
- returns:
DataFrame - Single row with quantile values
nullCount
Get the number of null values per column.
- returns:
DataFrame - Single row with null counts
product
Aggregate columns to their product.
- returns:
DataFrame - Single row with product values
Selection
select
Select columns based on expressions.
- param array $expressions:
Array of Expr objects
- returns:
DataFrame
- raises Polars\Exception:
If expressions are invalid
Example:
use Polars\Expr;
$df = new DataFrame([
'a' => [1, 2, 3],
'b' => [4, 5, 6]
]);
// Select with comparison
$result = $df->select([Expr::col('a')->gt(1)]);
// Select with arithmetic
$result = $df->select([Expr::col('a')->add(Expr::col('b'))]);
Core Manipulation
sort
Sort DataFrame by one or more columns.
- param string|array $by:
Column name or array of column names to sort by
- param bool $descending:
Sort in descending order (default: false)
- param bool $nullsLast:
Place nulls last (default: true)
- param bool $maintainOrder:
Maintain order of equal elements - stable sort (default: false)
- param bool $multithreaded:
Use multithreaded sorting (default: true)
- returns:
DataFrame
Example:
$df->sort('age');
$df->sort('age', descending: true);
$df->sort(['city', 'age'], maintainOrder: true);
drop
Drop specified columns.
- param array $columns:
Column names to drop
- returns:
DataFrame
Example:
$df->drop(['age', 'city']);
rename
Rename columns.
- param array $existing:
Old column names
- param array $newNames:
New column names
- returns:
DataFrame
Example:
$df->rename(['name', 'age'], ['fullName', 'years']);
filter
Keep only the rows for which the boolean expression evaluates to true.
- param Expr $expression:
Filter expression
- returns:
DataFrame
Example:
$df->filter(Expr::col('age')->gt(30));
withColumns
Add or modify columns using expressions.
- param array $expressions:
Array of Expr objects
- returns:
DataFrame
Example:
$df->withColumns([
Expr::col('age')->mul(2)->alias('double_age'),
]);
groupBy
Group by expressions.
- param array $expressions:
Array of Expr objects for grouping
- returns:
LazyGroupBy
Example:
$result = $df->groupBy([Expr::col('city')])->sum()->collect();
Row/Column Manipulation
unique
Get unique rows.
- param array|null $subset:
Column names to consider for uniqueness (default: all columns)
- param string $keep:
Keep strategy - 'first', 'last', 'any', or 'none' (default: 'first')
- returns:
DataFrame
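The plain-PHP sketch below applies the 'first', 'last' and 'none' keep strategies to rows shaped like the output of toArray(). uniqueRows() is a hypothetical helper. It leaves out 'any' because that strategy keeps an unspecified occurrence. The sketch also preserves input order, which the real method is not assumed to guarantee.

```php
<?php
// Hypothetical helper mirroring unique()'s keep strategies on PHP rows.
function uniqueRows(array $rows, ?array $subset = null, string $keep = 'first'): array
{
    if (!in_array($keep, ['first', 'last', 'none'], true)) {
        throw new InvalidArgumentException("Unsupported keep strategy: $keep");
    }
    // Group row positions by the values of the subset columns (or all columns).
    $positions = [];
    foreach ($rows as $i => $row) {
        $columns = $subset ?? array_keys($row);
        $key = serialize(array_map(fn ($c) => $row[$c], $columns));
        $positions[$key][] = $i;
    }
    $kept = [];
    foreach ($positions as $indices) {
        if ($keep === 'first') {
            $kept[] = $indices[0];                   // first occurrence
        } elseif ($keep === 'last') {
            $kept[] = $indices[count($indices) - 1]; // last occurrence
        } elseif (count($indices) === 1) {
            $kept[] = $indices[0];                   // 'none': never-duplicated rows only
        }
    }
    sort($kept); // restore input order
    return array_map(fn ($i) => $rows[$i], $kept);
}

$rows = [
    ['name' => 'Alice', 'city' => 'NYC'],
    ['name' => 'Bob', 'city' => 'LA'],
    ['name' => 'Alice', 'city' => 'NYC'],
];
$onlyBob = uniqueRows($rows, keep: 'none'); // Bob's row is the only survivor
```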
dropNulls
Drop rows with null values.
- param array|null $subset:
Column names to check for nulls (default: all columns)
- returns:
DataFrame
fillNull
Fill null values with a value or expression.
- param mixed $value:
Value to fill nulls with (int, float, string, bool, null, or Expr)
- returns:
DataFrame
fillNan
Fill NaN values with a value or expression.
- param mixed $value:
Value to fill NaN with
- returns:
DataFrame
reverse
Reverse row order.
- returns:
DataFrame
slice
Get a slice of rows.
- param int $offset:
Start offset
- param int $length:
Number of rows
- returns:
DataFrame
limit
Limit to n rows (alias for head).
- param int $n:
Number of rows (default: 10)
- returns:
DataFrame
join
Join with another DataFrame.
- param DataFrame $other:
The right DataFrame
- param array $on:
Array of Expr objects for join columns (used for both sides when leftOn/rightOn not given)
- param string $how:
Join type - 'inner', 'left', 'right', 'full', or 'cross' (default: 'inner')
- param array|null $leftOn:
Left join columns (overrides $on for the left side)
- param array|null $rightOn:
Right join columns (overrides $on for the right side)
- param string|null $suffix:
Suffix for duplicate column names (default: '_right')
- param string|null $validate:
Join validation - 'm:m', 'm:1', '1:m', or '1:1'
- param bool|null $coalesce:
Whether to coalesce join columns
- returns:
DataFrame
Example:
$result = $df1->join($df2, [Expr::col('id')], how: 'left');
$result = $df1->join($df2, [], 'inner',
leftOn: [Expr::col('id')],
rightOn: [Expr::col('key')]
);
withRowIndex
Add a row index column.
- param string $name:
Name of the index column (default: "index")
- param int $offset:
Starting offset (default: 0)
- returns:
DataFrame
Export/Row Access
toArray
Convert DataFrame to a PHP array of associative arrays (rows).
- returns:
array - Array of associative arrays
Example:
$arr = $df->toArray();
// [['name' => 'Alice', 'age' => 25], ['name' => 'Bob', 'age' => 30], ...]
row
Get a single row as an associative array. Supports negative indexing.
- param int $index:
Row index (negative for counting from end)
- returns:
array - Associative array
Example:
$row = $df->row(0); // First row
$row = $df->row(-1); // Last row
rows
Get all rows as an array of associative arrays (alias for toArray).
- returns:
array
DataFrame Operations
vstack
Grow this DataFrame vertically by stacking another DataFrame.
- param DataFrame $other:
DataFrame to stack
- returns:
DataFrame
hstack
Grow this DataFrame horizontally by adding Series columns.
- param array $columns:
Array of Series objects
- returns:
DataFrame
equals
Check if two DataFrames are equal.
- param DataFrame $other:
DataFrame to compare with
- returns:
bool
estimatedSize
Get the estimated size in bytes.
- returns:
int
getColumnIndex
Get the column index by name. Returns -1 if not found.
- param string $name:
Column name
- returns:
int
clear
Create an empty copy of the DataFrame (same schema, no rows).
- returns:
DataFrame
rechunk
Rechunk the DataFrame into contiguous memory.
- returns:
DataFrame
shrinkToFit
Shrink the memory allocated for the DataFrame so that it fits the current data.
isDuplicated
Get a boolean mask of duplicated rows.
- returns:
Series - Boolean Series
isUnique
Get a boolean mask of unique rows.
- returns:
Series - Boolean Series
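Assuming these methods follow Polars' semantics, every occurrence of a repeated row is flagged, including the first copy. Under the same assumption, isUnique() is the element-wise negation of isDuplicated(). The plain-PHP sketch below reproduces that rule on rows. Both helpers are hypothetical.

```php
<?php
// Hypothetical helpers, not part of Polars-PHP: boolean masks in the
// spirit of isDuplicated() and isUnique(), computed over PHP rows.
function duplicatedMask(array $rows): array
{
    // Serialize each row so identical rows produce identical keys.
    $keys = array_map('serialize', $rows);
    $counts = array_count_values($keys);
    // Every occurrence of a repeated row is flagged, including the first.
    return array_map(fn ($key) => $counts[$key] > 1, $keys);
}

function uniqueMask(array $rows): array
{
    return array_map(fn ($isDuplicate) => !$isDuplicate, duplicatedMask($rows));
}

$rows = [
    ['a' => 1, 'b' => 'x'],
    ['a' => 2, 'b' => 'y'],
    ['a' => 1, 'b' => 'x'],
];
$mask = duplicatedMask($rows); // [true, false, true]
```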
Advanced Operations
shift
Shift column values by n positions.
- param int $n:
Number of positions to shift (positive = down, negative = up)
- returns:
DataFrame
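The sketch below applies the same shifting rule to a single column held in a PHP array. It assumes, as in Polars, that positions vacated by the shift are filled with null. shiftValues() is a hypothetical helper.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: shifts one column's values.
// Positive $n moves values down, negative $n moves them up; positions left
// empty become null (assumed to match Polars).
function shiftValues(array $values, int $n): array
{
    $length = count($values);
    $result = array_fill(0, $length, null);
    foreach (array_values($values) as $i => $value) {
        $target = $i + $n;
        if ($target >= 0 && $target < $length) {
            $result[$target] = $value;
        }
    }
    return $result;
}

$down = shiftValues([1, 2, 3], 1);  // [null, 1, 2]
$up = shiftValues([1, 2, 3], -1);   // [2, 3, null]
```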
gatherEvery
Take every nth row.
- param int $n:
Take every nth row
- param int $offset:
Starting offset (default: 0)
- returns:
DataFrame
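gatherEvery() keeps the rows at positions $offset, $offset + $n, $offset + 2 * $n, and so on. The plain-PHP sketch below computes those positions for a table of a given height. gatherEveryIndices() is hypothetical and assumes a non-negative offset.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: the row positions that
// gatherEvery($n, $offset) would select from a table of $height rows.
function gatherEveryIndices(int $height, int $n, int $offset = 0): array
{
    if ($n <= 0 || $offset < 0) {
        throw new InvalidArgumentException('$n must be positive and $offset non-negative');
    }
    $indices = [];
    for ($i = $offset; $i < $height; $i += $n) {
        $indices[] = $i;
    }
    return $indices;
}

$every3 = gatherEveryIndices(10, 3);     // [0, 3, 6, 9]
$fromOne = gatherEveryIndices(10, 3, 1); // [1, 4, 7]
```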
cast
Cast columns to different data types.
- param array $dtypes:
Associative array of column name => data type string
- param bool $strict:
Use strict casting (default: false)
- returns:
DataFrame
Example:
$df->cast(['age' => 'float64', 'score' => 'int32']);
unpivot
Unpivot a DataFrame from wide to long format.
- param array $on:
Column names to use as values
- param array $index:
Column names to use as identifiers
- param string|null $variableName:
Custom name for the variable column (default: 'variable')
- param string|null $valueName:
Custom name for the value column (default: 'value')
- returns:
DataFrame
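The plain-PHP sketch below shows the wide-to-long reshaping. Each value in an $on column becomes its own row, the $index columns are carried along, and the source column name and value are stored under 'variable' and 'value'. unpivotRows() is hypothetical. The output order (all rows for one $on column before the next) is an assumption based on Polars' behaviour.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: wide-to-long reshaping on
// PHP rows. Each value of each $on column becomes one output row.
function unpivotRows(
    array $rows,
    array $on,
    array $index,
    string $variableName = 'variable',
    string $valueName = 'value'
): array {
    $long = [];
    foreach ($on as $column) {            // one block of rows per $on column
        foreach ($rows as $row) {
            $out = [];
            foreach ($index as $id) {     // carry identifier columns along
                $out[$id] = $row[$id];
            }
            $out[$variableName] = $column;    // source column name
            $out[$valueName] = $row[$column]; // its value
            $long[] = $out;
        }
    }
    return $long;
}

$wide = [
    ['name' => 'Alice', 'math' => 90, 'art' => 80],
    ['name' => 'Bob', 'math' => 70, 'art' => 60],
];
$long = unpivotRows($wide, ['math', 'art'], ['name']);
// $long[0] is ['name' => 'Alice', 'variable' => 'math', 'value' => 90]
```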
explode
Explode list columns into rows.
- param array $columns:
Column names to explode
- returns:
DataFrame
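The plain-PHP sketch below shows the effect of explode() on a single list column. Each list element gets its own row, and the other columns are repeated. explodeRows() is hypothetical. Mapping an empty list to a single null row is assumed from Polars' behaviour. The sketch does not cover exploding several columns at once, which requires lists of equal length.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: explodes one list column.
// Each list element gets its own row; other columns are repeated. An empty
// list yields a single row with null (assumed to match Polars).
function explodeRows(array $rows, string $column): array
{
    $out = [];
    foreach ($rows as $row) {
        $items = $row[$column] === [] ? [null] : $row[$column];
        foreach ($items as $item) {
            $exploded = $row;           // copy the whole row
            $exploded[$column] = $item; // replace the list with one element
            $out[] = $exploded;
        }
    }
    return $out;
}

$rows = [
    ['id' => 1, 'tags' => ['a', 'b']],
    ['id' => 2, 'tags' => []],
];
$exploded = explodeRows($rows, 'tags');
// [['id' => 1, 'tags' => 'a'], ['id' => 1, 'tags' => 'b'], ['id' => 2, 'tags' => null]]
```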
melt
Deprecated alias for unpivot(); takes the same parameters.
- param array $on:
Column names to use as values
- param array $index:
Column names to use as identifiers
- param string|null $variableName:
Custom name for the variable column
- param string|null $valueName:
Custom name for the value column
- returns:
DataFrame
interpolate
Interpolate null values using linear interpolation.
- returns:
DataFrame
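Linear interpolation fills each run of nulls that lies between two known values with evenly spaced points on the line joining them. The plain-PHP sketch below applies that rule to one column. interpolateLinear() is hypothetical. Leaving leading and trailing nulls untouched is assumed from Polars' behaviour.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: linear interpolation of
// nulls in one column. Leading and trailing nulls stay null (assumed to
// match Polars).
function interpolateLinear(array $values): array
{
    $values = array_values($values);
    $result = $values;
    $previous = null; // position of the last non-null value seen
    foreach ($values as $i => $value) {
        if ($value === null) {
            continue;
        }
        if ($previous !== null && $i - $previous > 1) {
            // Evenly spaced points on the line between the two known values.
            $step = ($value - $values[$previous]) / ($i - $previous);
            for ($j = $previous + 1; $j < $i; $j++) {
                $result[$j] = $values[$previous] + $step * ($j - $previous);
            }
        }
        $previous = $i;
    }
    return $result;
}

$filled = interpolateLinear([null, 0, null, null, 6, null]);
// [null, 0, 2, 4, 6, null]
```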
Column Mutation
dropInPlace
Remove a column and return it as a Series. Modifies the DataFrame in place.
- param string $name:
Column name to remove
- returns:
Series - The removed column
- raises Polars\Exception:
If column not found
replaceColumn
Replace a column at a given index. Modifies the DataFrame in place.
- param int $index:
Column index to replace
- param Series $series:
New column data
- raises Polars\Exception:
If index out of bounds or shape mismatch
insertColumn
Insert a column at a given index. Modifies the DataFrame in place.
- param int $index:
Position to insert at
- param Series $series:
Column to insert
- raises Polars\Exception:
If column name already exists
extend
Extend this DataFrame with rows from another DataFrame. Modifies the DataFrame in place.
- param DataFrame $other:
DataFrame with matching schema to append
- raises Polars\Exception:
If schemas don’t match
setSorted
Set the sorted flag on a column. Modifies the DataFrame in place.
- param string $column:
Column name
- param bool $descending:
Whether the column is sorted descending (default: false)
Sequential Operations
selectSeq
Same as select(), but the expressions are evaluated sequentially instead of in parallel.
- param array $expressions:
Array of Expr objects
- returns:
DataFrame
withColumnsSeq
Same as withColumns(), but the expressions are evaluated sequentially instead of in parallel.
- param array $expressions:
Array of Expr objects
- returns:
DataFrame
Conversion
toSeries
Convert a single-column DataFrame to a Series.
- returns:
Series
- raises Polars\Exception:
If DataFrame has more than one column
toDummies
Convert columns to one-hot encoded (dummy) variables.
- param array|null $columns:
Columns to encode (null = all columns)
- param string $separator:
Separator between column name and value (default: "_")
- param bool $dropFirst:
Drop the first category to avoid multicollinearity (default: false)
- returns:
DataFrame
Example:
$df = new DataFrame(['color' => ['red', 'blue', 'red']]);
$dummies = $df->toDummies();
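In the example above, each distinct color becomes its own 0/1 column. The column name is built from the column name, $separator and the value, giving color_blue and color_red. The plain-PHP sketch below builds the same encoding in column-oriented form, the shape the DataFrame constructor accepts. dummies() is hypothetical. Its alphabetical category order, which also decides what $dropFirst removes, is an assumption, as is the integer type of the encoded values.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: one-hot encodes one column
// into column-oriented output (column name => values). Categories are
// sorted alphabetically here; that order is an assumption.
function dummies(array $values, string $column, string $separator = '_', bool $dropFirst = false): array
{
    $categories = array_values(array_unique($values));
    sort($categories);
    if ($dropFirst) {
        array_shift($categories); // drop one category to avoid collinearity
    }
    $encoded = [];
    foreach ($categories as $category) {
        $encoded[$column . $separator . $category] = array_map(
            fn ($value) => $value === $category ? 1 : 0,
            $values
        );
    }
    return $encoded;
}

$encoded = dummies(['red', 'blue', 'red'], 'color');
// ['color_blue' => [0, 1, 0], 'color_red' => [1, 0, 1]]
```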
Partitioning
partitionBy
Split DataFrame into multiple DataFrames based on unique values in given columns.
- param array $by:
Column names to partition by
- param bool $maintainOrder:
Maintain the order of the original DataFrame (default: true)
- param bool $includeKey:
Include the partition key columns in each partition (default: true)
- returns:
array - Array of DataFrame objects
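The plain-PHP sketch below shows the grouping behind partitionBy(). Rows that share the same values in the $by columns go into the same partition. Partitions appear in order of first occurrence, and rows keep their original order, matching $maintainOrder: true. partitionRows() is hypothetical.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: splits rows into groups by
// the values of the $by columns, keeping first-occurrence order.
function partitionRows(array $rows, array $by, bool $includeKey = true): array
{
    $groups = [];
    foreach ($rows as $row) {
        // Build the partition key from the $by columns before any removal.
        $key = serialize(array_map(fn ($column) => $row[$column], $by));
        if (!$includeKey) {
            $row = array_diff_key($row, array_flip($by)); // drop key columns
        }
        $groups[$key][] = $row;
    }
    return array_values($groups);
}

$rows = [
    ['city' => 'NYC', 'n' => 1],
    ['city' => 'LA', 'n' => 2],
    ['city' => 'NYC', 'n' => 3],
];
$parts = partitionRows($rows, ['city']);
// $parts[0] holds both NYC rows, $parts[1] the LA row
```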
remove
Remove a row at the given index. Supports negative indexing.
- param int $index:
Row index to remove
- returns:
DataFrame
- raises Polars\Exception:
If index out of bounds
Pivot
pivot
Pivot a DataFrame from long to wide format.
- param array $on:
Column(s) whose values become the new column names
- param array|null $index:
Column(s) to use as row index
- param array|null $values:
Column(s) to aggregate
- param string|null $aggregateFunction:
Aggregation function - 'first', 'last', 'sum', 'mean', 'median', 'min', 'max', 'count', 'len'
- param bool $sortColumns:
Sort the resulting pivot columns (default: false)
- returns:
DataFrame
Example:
$result = $df->pivot(['subject'], ['name'], ['score'], 'first');
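In the example above, each distinct subject becomes a column and each name becomes a row. The cell for each (name, subject) pair holds the 'first' score. The plain-PHP sketch below performs that reshaping on rows. pivotFirst() is hypothetical and assumes string labels. It orders new columns by first appearance, which is assumed to match $sortColumns: false, and leaves missing combinations as null.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: long-to-wide pivot with the
// 'first' aggregate.
function pivotFirst(array $rows, string $on, string $index, string $values): array
{
    $columns = []; // distinct $on values, in order of first appearance
    $cells = [];   // index value => [on value => aggregated value]
    foreach ($rows as $row) {
        $columns[$row[$on]] = true;
        $cells[$row[$index]] ??= [];
        // 'first': keep the first value seen for each (index, on) pair
        if (!array_key_exists($row[$on], $cells[$row[$index]])) {
            $cells[$row[$index]][$row[$on]] = $row[$values];
        }
    }
    $wide = [];
    foreach ($cells as $indexValue => $rowCells) {
        $out = [$index => $indexValue];
        foreach (array_keys($columns) as $column) {
            $out[$column] = $rowCells[$column] ?? null; // missing pair -> null
        }
        $wide[] = $out;
    }
    return $wide;
}

$long = [
    ['name' => 'Alice', 'subject' => 'math', 'score' => 90],
    ['name' => 'Alice', 'subject' => 'art', 'score' => 80],
    ['name' => 'Bob', 'subject' => 'math', 'score' => 70],
];
$wide = pivotFirst($long, 'subject', 'name', 'score');
// [['name' => 'Alice', 'math' => 90, 'art' => 80], ['name' => 'Bob', 'math' => 70, 'art' => null]]
```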
Merge
mergeSorted
Merge two sorted DataFrames by a key column.
- param DataFrame $other:
The other sorted DataFrame
- param string $key:
Column to merge on (must be sorted in both DataFrames)
- returns:
DataFrame
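Because both inputs are already sorted by $key, they can be combined in a single linear pass instead of a full re-sort. The plain-PHP sketch below shows that two-pointer merge on rows. mergeSortedRows() is hypothetical. Putting the left row first on ties is a choice made for the sketch.

```php
<?php
// Hypothetical helper, not part of Polars-PHP: merges two row lists that are
// already sorted ascending on $key, in one linear pass.
function mergeSortedRows(array $left, array $right, string $key): array
{
    $merged = [];
    $i = 0;
    $j = 0;
    while ($i < count($left) && $j < count($right)) {
        // Take the smaller head; on ties the left row goes first.
        if ($left[$i][$key] <= $right[$j][$key]) {
            $merged[] = $left[$i++];
        } else {
            $merged[] = $right[$j++];
        }
    }
    // One side is exhausted; append the rest of the other.
    return array_merge($merged, array_slice($left, $i), array_slice($right, $j));
}

$merged = mergeSortedRows([['t' => 1], ['t' => 4]], [['t' => 2], ['t' => 3], ['t' => 5]], 't');
// t values: 1, 2, 3, 4, 5
```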
unnest
Unnest struct columns into separate columns.
- param array $columns:
Names of struct columns to unnest
- returns:
DataFrame
- raises Polars\Exception:
If columns are not of Struct type
Advanced Joins
joinWhere
Join with another DataFrame using arbitrary predicates.
- param DataFrame $other:
The right DataFrame
- param array $predicates:
Array of Expr predicate objects
- returns:
DataFrame
Example:
$result = $df1->joinWhere($df2, [Expr::col('a')->le(Expr::col('b'))]);
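joinAsof
Perform an asof join with another DataFrame. Instead of requiring equal keys, each left row is matched to the nearest right row on the $on column, in the direction given by $strategy.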
- param DataFrame $other:
The right DataFrame
- param string $on:
Column to join on (must be sorted)
- param string|null $strategy:
Join strategy - 'backward' (default), 'forward', 'nearest'
- param string|null $leftBy:
Group by column for left DataFrame
- param string|null $rightBy:
Group by column for right DataFrame
- param string|null $tolerance:
Tolerance for the asof join (a duration string, e.g. "5m")
- returns:
DataFrame
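With the default 'backward' strategy, each left row is matched to the last right row whose $on value is less than or equal to its own, which is why both sides must be sorted. The plain-PHP sketch below implements only that strategy for numeric keys and omits $leftBy, $rightBy and $tolerance. asofBackward() is hypothetical. In the sketch, unmatched left rows receive null right-hand values and clashing column names keep the left value (no suffix handling).

```php
<?php
// Hypothetical helper, not part of Polars-PHP: 'backward' asof join on rows.
// Both inputs must be sorted ascending on $on. Each left row takes the last
// right row whose key is <= its own; unmatched rows get null right values.
function asofBackward(array $left, array $right, string $on): array
{
    $rightColumns = $right === [] ? [] : array_keys($right[0]);
    $joined = [];
    $j = -1; // last right position with key <= current left key
    foreach ($left as $row) {
        // Keys only increase, so the pointer never moves backwards.
        while ($j + 1 < count($right) && $right[$j + 1][$on] <= $row[$on]) {
            $j++;
        }
        $match = $j >= 0 ? $right[$j] : array_fill_keys($rightColumns, null);
        unset($match[$on]); // keep the left row's key column
        // '+' keeps left values on name clashes (no suffix handling here).
        $joined[] = $row + $match;
    }
    return $joined;
}

$trades = [['time' => 0, 'qty' => 5], ['time' => 3, 'qty' => 10], ['time' => 7, 'qty' => 20]];
$quotes = [['time' => 1, 'price' => 100], ['time' => 5, 'price' => 101]];
$joined = asofBackward($trades, $quotes, 'time');
// prices: null, 100, 101
```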
SQL
sql
Execute a SQL query against this DataFrame. The DataFrame is registered as a table named "self".
- param string $query:
SQL query string
- returns:
DataFrame
Example:
$result = $df->sql("SELECT name, age FROM self WHERE age > 30");
Deprecated
withRowCount
Add a row count column. Deprecated alias for withRowIndex.
- param string $name:
Name of the count column (default: "row_nr")
- param int $offset:
Starting offset (default: 0)
- returns:
DataFrame
Descriptive Methods
schema (property)
Get schema description as string.
- returns:
string
nUnique
Get the number of unique values per column.
- returns:
DataFrame - Single row with unique counts
glimpse
Get a quick summary of the DataFrame.
- returns:
string
describe
Get descriptive statistics (count, mean, std, min, max, median, etc.).
- returns:
DataFrame
Sampling
sample
Randomly sample rows by count or fraction.
- param int $n:
Number of rows to sample (ignored if $fraction is set)
- param bool $withReplacement:
Allow sampling with replacement (default: false)
- param bool $shuffle:
Shuffle the result (default: true)
- param float|null $fraction:
Fraction of rows to sample (0.0 to 1.0), overrides $n
- param int|null $seed:
Random seed for reproducibility
- returns:
DataFrame
Example:
$df->sample(10, seed: 42);
$df->sample(fraction: 0.5, seed: 42);
transpose
Transpose the DataFrame.
- param bool $includeHeader:
Include column names as a column (default: false)
- param string $headerName:
Name for the header column (default: "column")
- param array|null $columnNames:
Custom names for the transposed columns
- returns:
DataFrame
topK
Get the top k rows by a column (largest values first).
- param int $k:
Number of rows
- param string $by:
Column to sort by
- returns:
DataFrame
bottomK
Get the bottom k rows by a column (smallest values first).
- param int $k:
Number of rows
- param string $by:
Column to sort by
- returns:
DataFrame
Utilities
item
Return the DataFrame as a scalar value. The DataFrame must contain exactly one element (1 row, 1 column).
- returns:
mixed - The scalar value (int, float, string, bool, or null)
- raises Polars\Exception:
If DataFrame doesn’t have exactly one element
Example:
$df = new DataFrame(['x' => [42]]);
$value = $df->item(); // 42
isEmpty
Check if DataFrame is empty.
- returns:
bool
copy
Create a copy of the DataFrame.
- returns:
DataFrame
Output
writeCsv
Write DataFrame to a CSV file.
- param string $path:
Output file path
- param bool $includeHeader:
Whether to include column headers
- param string $separator:
Column separator character
- raises Polars\Exception:
If file cannot be written
Example:
$df->writeCsv('output.csv');
$df->writeCsv('output.tsv', includeHeader: true, separator: "\t");
writeJson
Write DataFrame to a JSON file.
- param string $path:
Output file path
- raises Polars\Exception:
If file cannot be written
Example:
$df->writeJson('output.json');
writeNdjson
Write DataFrame to an NDJSON (newline-delimited JSON) file.
- param string $path:
Output file path
- raises Polars\Exception:
If file cannot be written
Example:
$df->writeNdjson('output.ndjson');
writeParquet
Write DataFrame to a Parquet file.
- param string $path:
Output file path
- raises Polars\Exception:
If file cannot be written
Example:
$df->writeParquet('output.parquet');
__toString
Return a formatted string representation of the DataFrame.
Example:
echo $df;
// shape: (3, 3)
// ┌─────────┬─────┬─────────┐
// │ name ┆ age ┆ city │
// │ --- ┆ --- ┆ --- │
// │ str ┆ i64 ┆ str │
// ╞═════════╪═════╪═════════╡
// │ Alice ┆ 25 ┆ NYC │
// │ Bob ┆ 30 ┆ LA │
// │ Charlie ┆ 35 ┆ Chicago │
// └─────────┴─────┴─────────┘