LazyFrame#

The LazyFrame class represents a lazy computation graph. Operations on a LazyFrame are not executed immediately. Instead, they build a query plan that is optimized and executed when collect() or one of the sink methods is called.

Create a LazyFrame via DataFrame::lazy(), or by scanning a file with LazyFrame::scanCsv(), LazyFrame::scanNdjson(), or LazyFrame::scanParquet().

Scan Methods (Static Constructors)#

scanCsv#

Scan a CSV file into a LazyFrame. The file is not fully read into memory; instead, a query plan is created that reads data on demand.

param string $path:

Path to the CSV file

param bool $hasHeader:

Whether the first row contains column headers (default: true)

param string $separator:

Column separator character (default: ",")

returns:

LazyFrame

raises Polars\Exception:

If the file cannot be scanned

Example:

$lf = LazyFrame::scanCsv('data.csv');
$df = $lf->filter(Expr::col('age')->gt(30))->collect();
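The optional parameters and the documented exception can be combined as in the sketch below. It assumes the extension is loaded, that DataFrame and LazyFrame live in the Polars namespace alongside Polars\Expr and Polars\Exception, that Polars\Exception behaves like a standard PHP exception, and that the documented parameter names can be passed as named arguments. The file name people.tsv and its contents are made up for illustration.

```php
<?php
use Polars\Expr;
use Polars\LazyFrame;

// Write a small tab-separated file so the example is self-contained (hypothetical data).
$path = sys_get_temp_dir() . '/people.tsv';
file_put_contents($path, "name\tage\nAda\t36\nBob\t25\n");

$df = null;
try {
    // Scanning builds a query plan; the file is not fully read at this point.
    $lf = LazyFrame::scanCsv($path, hasHeader: true, separator: "\t");

    // The plan runs, and the data is read, when collect() is called.
    $df = $lf->filter(Expr::col('age')->gt(30))->collect();
} catch (\Polars\Exception $e) {
    // Documented as raised when the file cannot be scanned or the plan fails.
    // getMessage() assumes the class extends PHP's base exception.
    error_log('Scan failed: ' . $e->getMessage());
}
```

Wrapping both the scan and collect() in the same try block is a deliberate choice here: with lazy evaluation, a problem with the file may only surface once the plan executes.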

scanNdjson#

Scan an NDJSON (newline-delimited JSON) file into a LazyFrame.

param string $path:

Path to the NDJSON file

returns:

LazyFrame

raises Polars\Exception:

If the file cannot be scanned

Example:

$lf = LazyFrame::scanNdjson('data.ndjson');
$df = $lf->select([Expr::col('name'), Expr::col('age')])->collect();

scanParquet#

Scan a Parquet file into a LazyFrame. Parquet scans are efficient because the columnar format allows reading only the columns a query needs, and filters can be pushed down into the scan (predicate pushdown).

param string $path:

Path to the Parquet file

returns:

LazyFrame

raises Polars\Exception:

If the file cannot be scanned

Example:

$lf = LazyFrame::scanParquet('data.parquet');
$df = $lf->filter(Expr::col('salary')->gt(50000))->collect();

Core Methods#

collect#

Execute the lazy query and return a materialized DataFrame.

returns:

DataFrame

raises Polars\Exception:

If the query plan fails to execute

Example:

$df = new DataFrame(['a' => [1, 2, 3]]);
$result = $df->lazy()
    ->filter(Expr::col('a')->gt(1))
    ->collect();

select#

Select columns by expression.

param array $expressions:

Array of Polars\Expr objects

returns:

LazyFrame
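As a minimal sketch of select, the snippet below keeps two of three columns. It assumes the extension is loaded, that DataFrame lives in the Polars namespace, and that the DataFrame constructor accepts string columns in the same way as the integer columns shown elsewhere on this page. The column names and values are invented.

```php
<?php
use Polars\DataFrame;
use Polars\Expr;

// Hypothetical data with three columns.
$df = new DataFrame([
    'name' => ['Ada', 'Bob'],
    'age'  => [36, 25],
    'city' => ['Oslo', 'Rome'],
]);

// select() only extends the query plan; collect() executes it.
$result = $df->lazy()
    ->select([Expr::col('name'), Expr::col('age')])
    ->collect();
```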

filter#

Filter rows by a boolean expression.

param Expr $expression:

Boolean expression to filter by

returns:

LazyFrame

withColumns#

Add or overwrite columns using expressions.

param array $expressions:

Array of Polars\Expr objects

returns:

LazyFrame

groupBy#

Group by one or more expressions. Returns a LazyGroupBy object.

param array $expressions:

Array of Polars\Expr objects to group by

returns:

LazyGroupBy

sort#

Sort by a column.

param string $column:

Column name to sort by

param bool $descending:

Sort in descending order (default: false)

param bool $nullsLast:

Place null values last (default: true)

returns:

LazyFrame
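The three sort parameters can be seen together in the following sketch. It assumes the extension is loaded, that DataFrame lives in the Polars namespace, and that the documented parameter names can be passed as named arguments.

```php
<?php
use Polars\DataFrame;

$df = new DataFrame(['a' => [2, 3, 1]]);

// Sort by column "a", largest values first; nulls (none here) would go last.
$sorted = $df->lazy()
    ->sort('a', descending: true, nullsLast: true)
    ->collect();
```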

Attributes#

columns (getter)#

Get column names.

returns:

string[]

dtypes (getter)#

Get data types of all columns.

returns:

DataType[]

width#

Get the number of columns.

schema#

Get the schema description as a string.

Row Operations#

tail#

Get the last n rows.

first#

Get the first row.

last#

Get the last row.

slice#

Get a slice of rows.

limit#

Limit to n rows (alias for head).

Aggregations#

count#

Count non-null elements per column.

sum / mean / median / min / max#

Aggregate each column to a single value: its sum, mean, median, minimum, or maximum, depending on the method called.

std#

Standard deviation with configurable degrees of freedom.

variance#

Variance with configurable degrees of freedom.

quantile#

Quantile aggregation (uses the "nearest" interpolation method).

nullCount#

Count null values per column.

Column Manipulation#

drop#

Drop columns by name.

param array $columns:

string[] column names to drop

rename#

Rename columns.

param array $existing:

Old column names

param array $newNames:

New column names

unique#

Remove duplicate rows.

param array|null $subset:

Column names to consider (null = all columns)

param string $keep:

Strategy: 'first', 'last', 'any', or 'none'
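The column manipulation methods above can be chained, as in this sketch. The return types of drop, rename, and unique are not stated on this page, so the example assumes each returns a LazyFrame. It also assumes the extension is loaded and that DataFrame lives in the Polars namespace. Both unique arguments are passed explicitly because their defaults are not documented here.

```php
<?php
use Polars\DataFrame;

$df = new DataFrame([
    'a' => [1, 1, 2],
    'b' => [4, 4, 5],
    'c' => [7, 8, 9],
]);

$result = $df->lazy()
    ->drop(['c'])                          // remove column "c"
    ->rename(['a', 'b'], ['id', 'score'])  // a -> id, b -> score
    ->unique(null, 'first')                // null = compare all columns
    ->collect();

// After dropping "c", the first two rows are identical,
// so two distinct rows remain.
```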

Null Handling#

dropNulls#

Drop rows containing null values.

param array|null $subset:

Column names to check (null = all columns)

fillNull#

Fill null values with a literal or expression.

fillNan#

Fill NaN values with a literal or expression.

Join#

join#

Join with another LazyFrame.

param LazyFrame $other:

Right side of the join

param array $on:

Array of Polars\Expr objects for join columns

param string $how:

Join type: 'inner', 'left', 'right', 'full', 'cross'

Example:

$result = $df1->lazy()
    ->join($df2->lazy(), [Expr::col('key')], how: 'inner')
    ->collect();

Miscellaneous#

withRowIndex#

Add a row index column.

param string $name:

Name of the index column (default: "index")

param int $offset:

Starting offset (default: 0)

returns:

LazyFrame
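A short sketch of withRowIndex with both parameters set, assuming the extension is loaded and that DataFrame lives in the Polars namespace. The column name "row" is an arbitrary choice.

```php
<?php
use Polars\DataFrame;

$df = new DataFrame(['a' => [10, 20, 30]]);

// Add an index column named "row" that starts counting at 1 instead of 0.
$indexed = $df->lazy()
    ->withRowIndex('row', 1)
    ->collect();
```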

reverse#

Reverse the row order.

explain#

Return the query plan as a string.

param bool $optimized:

Show optimized plan (default: true)
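Comparing the two plans is a quick way to see what the optimizer changed. The sketch below assumes the extension is loaded, that DataFrame lives in the Polars namespace, and that the documented parameter name can be passed as a named argument. The exact plan text depends on the underlying Polars version, so no output is shown.

```php
<?php
use Polars\DataFrame;
use Polars\Expr;

$df = new DataFrame(['a' => [1, 2, 3], 'b' => [4, 5, 6]]);

// Build a plan without executing it.
$lf = $df->lazy()
    ->filter(Expr::col('a')->gt(1))
    ->select([Expr::col('a')]);

$asWritten = $lf->explain(optimized: false); // plan as written
$optimized = $lf->explain(optimized: true);  // plan after optimization

echo $asWritten, PHP_EOL, PHP_EOL, $optimized, PHP_EOL;
```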

cache#

Cache intermediate results.

Sink Methods#

Sink methods execute the lazy query plan and write results directly to a file. They return a DataFrame with the result.

sinkCsv#

Sink the LazyFrame to a CSV file.

param string $path:

Output file path

param bool $includeHeader:

Whether to include column headers (default: true)

param string $separator:

Column separator character (default: ",")

returns:

DataFrame

raises Polars\Exception:

If the sink operation fails

Example:

$df = new DataFrame(['a' => [1, 2, 3], 'b' => [4, 5, 6]]);
$df->lazy()
    ->filter(Expr::col('a')->gt(1))
    ->sinkCsv('output.csv');
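The optional parameters and the returned DataFrame can be used as in the sketch below. It assumes the extension is loaded, that DataFrame lives in the Polars namespace, and that the documented parameter names can be passed as named arguments. The output path in the system temp directory is an arbitrary choice.

```php
<?php
use Polars\DataFrame;
use Polars\Expr;

$df = new DataFrame(['a' => [1, 2, 3], 'b' => [4, 5, 6]]);
$path = sys_get_temp_dir() . '/output.csv';

// Execute the plan, write a semicolon-separated file with a header row,
// and keep the DataFrame that the sink is documented to return.
$written = $df->lazy()
    ->filter(Expr::col('a')->gt(1))
    ->sinkCsv($path, includeHeader: true, separator: ';');
```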

sinkParquet#

Sink the LazyFrame to a Parquet file.

param string $path:

Output file path

returns:

DataFrame

raises Polars\Exception:

If the sink operation fails

Example:

$df->lazy()->sinkParquet('output.parquet');

sinkNdjson#

Sink the LazyFrame to an NDJSON (newline-delimited JSON) file.

param string $path:

Output file path

returns:

DataFrame

raises Polars\Exception:

If the sink operation fails

Example:

$df->lazy()->sinkNdjson('output.ndjson');

__toString#

Returns the unoptimized query plan as a string.