Polars Cloud client 0.4.3

@ritchie46 released this 14 Jan 11:26
eaf818a

✨ Enhancements

  • Support multi-sink on single node queries
  • Cost-based planner: shuffle cost based on row count and schema
  • Record physical metrics per-attempt
  • Count stage metrics per stage attempt
  • Physical properties description
  • Stage status through scheduler
  • Simplify leaf cache
  • Filter observatory spans on the client side
  • Remove Option<> from physical metrics
  • Add engine and query type to endpoint
  • Save query owner
  • Enable cost-based-planner with client arguments
  • Add Hash "by" field to output partitioning
  • Add Observatory query eviction
  • Execute stages in topological order
  • Cost-based planner POC
  • Checkpoint shuffle data with configurable period
  • Allow configuring the number of partitions per worker
  • Limit statusmap to a default of 150k queries
  • Prune old QueryResults in scheduler
  • Remove dead nodes on Observatory
  • Add S3 options to anonymous results

🐞 Bug fixes

  • Properly log worker exit errors
  • Change node for join
  • Cost-based planner: top-k lowering
  • Allow parallel reading of large single parquet file
  • Cost-based planner: map-reduce by key edge partitioning
  • Cost-based planner: fix join with slice lowering
  • Fix attempt_number match logic
  • Fix wrong condition for MapReduceByKey direct lowering
  • Fix summary method on query profile
  • Fix query state after worker failure
  • Manually derive FromPyObject
  • Properly activate the created span when queueing proxy mode
  • Remove duplicate metrics
  • Fix potential panic when dropping query result
  • Really support optional anonymous results
  • Only warn once if we see an unknown OTEL metric
  • Use UTC instead of FixedOffset

🛠️ Other improvements

  • Cost-based planner: always estimate row count, add a cost function
  • Move POLARS_SKIP_DSL_HASH_VERIFICATION into binary
  • Add fn to unwrap a single output of a graph node
  • Prefix local shuffle path with worker identifier
  • Cost-based planner: write edge stats with debug output
  • Cost-based planner: add PhysEdge, clean up phys nodes
  • Remove unnecessary fields from IRNodeProperties
  • Move visualization data generation into polars cloud
  • Cost-based planner: disallow single-node map-reduce
  • Cost-based planner: disallow single-node distinct
  • Cost-based planner: simplify stage building pass
  • Split planner into modules
  • Remove needless clone in run_query
  • Remove physical metrics SSE endpoint
  • Only shuffle read from workers that successfully finished
  • Remove unneeded realloc
  • Convert physical metrics endpoint from SSE to REST
  • Use fetch failure to inform worker failure
  • Update Polars and Pyo3 patch
  • Use jiff to deserialize durations
  • Make dot graph node inputs ordered
  • Remove unused StageProperties::partitioned_by field
  • Add a schema field to logical plan edge
  • Make InputNo into a struct
  • Make forest generic over dag keys
  • Move query count in the catalog
  • Use empty DataFrameScan for shuffle reads with no paths
  • Move the shuffle read params from IR to its struct
  • Delete remove query on drop
  • Spawn in drop
  • More internal mutability in client
  • Release GIL in client auth
  • Release the GIL on entry