LazyFrame#
The LazyFrame class represents a lazy computation graph. Operations on a LazyFrame are not executed immediately; instead, they build a query plan that is optimized and executed when collect() is called.
Create a LazyFrame via DataFrame::lazy(), or by scanning a file with LazyFrame::scanCsv(), LazyFrame::scanNdjson(), or LazyFrame::scanParquet().
Scan Methods (Static Constructors)#
scanCsv#
Scan a CSV file into a LazyFrame. The file is not fully read into memory; instead, a query plan is created that reads data on demand.
- param string $path:
Path to the CSV file
- param bool $hasHeader:
Whether the first row contains column headers (default: true)
- param string $separator:
Column separator character (default: “,”)
- returns:
LazyFrame
- raises Polars\Exception:
If file cannot be scanned
Example:
$lf = LazyFrame::scanCsv('data.csv');
$df = $lf->filter(Expr::col('age')->gt(30))->collect();
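The $hasHeader and $separator parameters can be combined to scan other delimited formats. The following is a minimal sketch, not taken from the library's own examples: it assumes the classes are exposed under the Polars namespace (as Polars\Expr and Polars\Exception suggest), and it writes a hypothetical tab-separated file to the system temp directory so that it is self-contained.

```php
<?php
// Assumption: LazyFrame and Expr live in the Polars namespace.
use Polars\Expr;
use Polars\LazyFrame;

// Write a small tab-separated file (hypothetical sample data).
$path = sys_get_temp_dir() . '/people.tsv';
file_put_contents($path, "name\tage\nAda\t36\nLin\t28\n");

// Build the query plan with an explicit header flag and separator.
$lf = LazyFrame::scanCsv($path, hasHeader: true, separator: "\t");

// The filter is only applied when collect() executes the plan.
$df = $lf->filter(Expr::col('age')->gt(30))->collect();
```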
scanNdjson#
Scan an NDJSON (newline-delimited JSON) file into a LazyFrame.
- param string $path:
Path to the NDJSON file
- returns:
LazyFrame
- raises Polars\Exception:
If file cannot be scanned
Example:
$lf = LazyFrame::scanNdjson('data.ndjson');
$df = $lf->select([Expr::col('name'), Expr::col('age')])->collect();
scanParquet#
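Scan a Parquet file into a LazyFrame. Parquet scans are efficient because the format is columnar, so only the columns a query references need to be read, and because filters can be pushed down into the scan (predicate pushdown).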
- param string $path:
Path to the Parquet file
- returns:
LazyFrame
- raises Polars\Exception:
If file cannot be scanned
Example:
$lf = LazyFrame::scanParquet('data.parquet');
$df = $lf->filter(Expr::col('salary')->gt(50000))->collect();
Core Methods#
collect#
Execute the lazy query and return a materialized DataFrame.
- returns:
DataFrame
- raises Polars\Exception:
If the query plan fails to execute
Example:
$df = new DataFrame(['a' => [1, 2, 3]]);
$result = $df->lazy()
    ->filter(Expr::col('a')->gt(1))
    ->collect();
select#
Select columns by expression.
- param array $expressions:
Array of Polars\Expr objects
- returns:
LazyFrame
filter#
Filter rows by a boolean expression.
- param Expr $expression:
Boolean expression to filter by
- returns:
LazyFrame
withColumns#
Add or overwrite columns using expressions.
- param array $expressions:
Array of Polars\Expr objects
- returns:
LazyFrame
groupBy#
Group by one or more expressions. Returns a LazyGroupBy object.
- param array $expressions:
Array of Polars\Expr objects to group by
- returns:
LazyGroupBy
sort#
Sort by a column.
- param string $column:
Column name to sort by
- param bool $descending:
Sort in descending order (default: false)
- param bool $nullsLast:
Place null values last (default: true)
- returns:
LazyFrame
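The core methods each return a LazyFrame, so they can be chained into a single plan before collect() runs it. The sketch below uses only the signatures documented above; the Polars namespace for DataFrame, the sample data, and the use of string columns in the DataFrame constructor are assumptions made for illustration.

```php
<?php
// Assumption: DataFrame and Expr live in the Polars namespace.
use Polars\DataFrame;
use Polars\Expr;

// Hypothetical sample data.
$df = new DataFrame([
    'name' => ['Ada', 'Lin', 'Sam'],
    'age'  => [36, 28, 41],
]);

$result = $df->lazy()
    // Keep rows where age > 30.
    ->filter(Expr::col('age')->gt(30))
    // Project the two columns of interest.
    ->select([Expr::col('name'), Expr::col('age')])
    // Oldest first; nulls (none here) would be placed last.
    ->sort('age', descending: true, nullsLast: true)
    // Nothing above runs until this call.
    ->collect();
```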
Attributes#
columns (getter)#
Get column names.
- returns:
string[]
dtypes (getter)#
Get data types of all columns.
- returns:
DataType[]
width#
Get the number of columns.
schema#
Get the schema description as a string.
Row Operations#
head#
Get the first n rows.
tail#
Get the last n rows.
first#
Get the first row.
last#
Get the last row.
slice#
Get a slice of rows.
limit#
Limit to n rows (alias for head).
Aggregations#
count#
Count non-null elements per column.
sum / mean / median / min / max#
Aggregate each column to its sum, mean, median, minimum, or maximum, respectively.
std#
Standard deviation with configurable degrees of freedom.
variance#
Variance with configurable degrees of freedom.
quantile#
Aggregate each column to a quantile, using the nearest interpolation method.
nullCount#
Count null values per column.
Column Manipulation#
drop#
Drop columns by name.
- param array $columns:
Column names to drop (string[])
rename#
Rename columns.
- param array $existing:
Old column names
- param array $newNames:
New column names
unique#
Remove duplicate rows.
- param array|null $subset:
Column names to consider (null = all columns)
- param string $keep:
Which of the duplicate rows to keep: ‘first’, ‘last’, ‘any’, or ‘none’
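As a rough illustration of how drop, rename, and unique fit together, the sketch below chains them using the parameters listed above. The Polars namespace for DataFrame and the sample data are assumptions; both unique() arguments are passed explicitly because their defaults are not stated here.

```php
<?php
// Assumption: DataFrame lives in the Polars namespace.
use Polars\DataFrame;

// Hypothetical sample data with a duplicated id and a scratch column.
$df = new DataFrame([
    'id'    => [1, 1, 2],
    'score' => [10, 10, 20],
    'tmp'   => [0, 0, 0],
]);

$result = $df->lazy()
    // Remove the scratch column by name.
    ->drop(['tmp'])
    // Old names first, new names second, matched by position.
    ->rename(['score'], ['points'])
    // Deduplicate on 'id', keeping the first row of each group.
    ->unique(subset: ['id'], keep: 'first')
    ->collect();
```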
Null Handling#
dropNulls#
Drop rows containing null values.
- param array|null $subset:
Column names to check (null = all columns)
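The difference between passing a subset and passing null can be sketched as follows. This assumes that the DataFrame constructor accepts PHP null values inside column arrays and that DataFrame lives in the Polars namespace; neither is confirmed on this page.

```php
<?php
// Assumption: DataFrame lives in the Polars namespace.
use Polars\DataFrame;

// Hypothetical sample data; nulls in the arrays are an assumption.
$df = new DataFrame([
    'a' => [1, null, 3],
    'b' => [4, 5, null],
]);

// Only column 'a' is checked, so nulls in 'b' do not drop a row.
$checkedA = $df->lazy()->dropNulls(['a'])->collect();

// null means every column is checked.
$checkedAll = $df->lazy()->dropNulls(null)->collect();
```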
fillNull#
Fill null values with a literal or expression.
fillNan#
Fill NaN values with a literal or expression.
Join#
join#
Join with another LazyFrame.
- param LazyFrame $other:
Right side of the join
- param array $on:
Array of Polars\Expr objects for join columns
- param string $how:
Join type: ‘inner’, ‘left’, ‘right’, ‘full’, ‘cross’
Example:
$result = $df1->lazy()
    ->join($df2->lazy(), [Expr::col('key')], how: 'inner')
    ->collect();
Miscellaneous#
withRowIndex#
Add a row index column.
- param string $name:
Name of the index column (default: “index”)
- param int $offset:
Starting offset (default: 0)
- returns:
LazyFrame
reverse#
Reverse the row order.
explain#
Return the query plan as a string.
- param bool $optimized:
Show optimized plan (default: true)
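Comparing the two plan variants is a convenient way to see what the optimizer changed. The sketch below is illustrative only: the Polars namespace for DataFrame, the sample data, and the index column name 'row' are assumptions, and the exact plan text depends on the library version.

```php
<?php
// Assumption: DataFrame and Expr live in the Polars namespace.
use Polars\DataFrame;
use Polars\Expr;

$df = new DataFrame(['a' => [1, 2, 3]]);

// Add an index column named 'row' starting at 1, then filter.
$lf = $df->lazy()
    ->withRowIndex('row', 1)
    ->filter(Expr::col('a')->gt(1));

// Plan as written, before optimization.
$rawPlan = $lf->explain(optimized: false);

// Plan after optimization (the documented default).
$optimizedPlan = $lf->explain();

echo $rawPlan, PHP_EOL, $optimizedPlan, PHP_EOL;
```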
cache#
Cache intermediate results.
Sink Methods#
Sink methods execute the lazy query plan and write results directly to a file. They return a DataFrame with the result.
sinkCsv#
Sink the LazyFrame to a CSV file.
- param string $path:
Output file path
- param bool $includeHeader:
Whether to include column headers (default: true)
- param string $separator:
Column separator character (default: “,”)
- returns:
DataFrame
- raises Polars\Exception:
If sink operation fails
Example:
$df = new DataFrame(['a' => [1, 2, 3], 'b' => [4, 5, 6]]);
$df->lazy()
    ->filter(Expr::col('a')->gt(1))
    ->sinkCsv('output.csv');
sinkParquet#
Sink the LazyFrame to a Parquet file.
- param string $path:
Output file path
- returns:
DataFrame
- raises Polars\Exception:
If sink operation fails
Example:
$df->lazy()->sinkParquet('output.parquet');
sinkNdjson#
Sink the LazyFrame to an NDJSON (newline-delimited JSON) file.
- param string $path:
Output file path
- returns:
DataFrame
- raises Polars\Exception:
If sink operation fails
Example:
$df->lazy()->sinkNdjson('output.ndjson');
__toString#
Returns the unoptimized query plan as a string.