Functional Coverage & Roadmap

This document captures how fpstreams applies functional programming concepts today, what gaps are worth filling next, and how Rust could accelerate expensive workloads while preserving the Python-first API.

Current Functional Programming Coverage

fpstreams already embodies several FP staples:

  • Composable pipelines via Stream/ParallelStream transformations such as map, filter, flat_map, zip, scan, batch, and window.
  • Lazy evaluation in stream pipelines, with terminal operations like collect, reduce, to_list, and count.
  • Functional helpers like pipe, curry, and retry for composition, currying, and robust async retries.
  • Container types (Option, Result) that encode nullability and error handling in a functional style.
  • Collectors that provide grouping, summarizing, partitioning, and mapping into aggregate results.
  • Async and parallel variants to keep functional pipelines consistent across sync/async/CPU-bound workloads.
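The interplay of composable transforms and lazy terminal operations can be illustrated with plain Python generators. This is a concept sketch only, not the fpstreams API itself: nothing executes until a terminal operation consumes the pipeline.

```python
def lazy_map(fn, it):
    # No work happens here; elements are produced on demand.
    for x in it:
        yield fn(x)

def lazy_filter(pred, it):
    for x in it:
        if pred(x):
            yield x

# Build the pipeline: still nothing has run.
pipeline = lazy_map(lambda x: x * x, lazy_filter(lambda x: x % 2 == 0, range(10)))

# A terminal operation (materializing to a list) finally drives the generators.
result = list(pipeline)
# result == [0, 4, 16, 36, 64]
```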

Potential Functional Additions

If you want deeper FP ergonomics, these additions would keep the API aligned with the existing stream/collector style:

  1. Stream combinators
     • partition(predicate) → returns (matches, non_matches) without forcing collectors.
     • chunk_by(key_fn) → starts a new chunk when the key changes (useful for run-length-style grouping).
     • distinct_by(key_fn) → distinct with projection, complementing distinct().
     • take_until(predicate) / drop_until(predicate) → common FP flow-control operations.
     • merge_sorted(other, key=None) → stream-friendly merges for pre-sorted inputs.
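Two of these combinators can be prototyped as generator functions over plain iterables before being lifted onto Stream. The names follow the list above; the implementations (including the inclusive semantics of take_until, which some libraries instead make exclusive) are assumptions for illustration:

```python
from itertools import groupby

def chunk_by(key_fn, it):
    # Start a new chunk whenever key_fn's value changes (run-length grouping).
    return [list(group) for _, group in groupby(it, key=key_fn)]

def take_until(pred, it):
    # Yield elements up to and including the first one matching pred.
    for x in it:
        yield x
        if pred(x):
            return

print(chunk_by(len, ["a", "b", "cc", "dd", "e"]))
# → [['a', 'b'], ['cc', 'dd'], ['e']]
print(list(take_until(lambda x: x > 3, [1, 2, 5, 1, 7])))
# → [1, 2, 5]
```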

  2. Collector extensions
     • median, percentile, and histogram collectors for richer stats.
     • top_n / bottom_n collectors to avoid full sorts for large datasets.
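A top_n collector can avoid a full sort by keeping a bounded heap, and a nearest-rank percentile is a short reduction over a sorted copy. Both functions below are illustrative sketches, not fpstreams APIs:

```python
import heapq

def top_n(n, it, key=None):
    # Bounded heap: O(len * log n) rather than the O(len * log len) of a full sort.
    return heapq.nlargest(n, it, key=key)

def percentile(p, values):
    # Nearest-rank percentile over a materialized, sorted copy.
    ordered = sorted(values)
    idx = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[idx]

print(top_n(3, [5, 1, 9, 3, 7]))        # → [9, 7, 5]
print(percentile(50, [1, 2, 3, 4, 5]))  # → 3
```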

  3. Option/Result ergonomics
     • zip, sequence, and traverse helpers to combine Option/Result values across collections.
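The essence of sequence and traverse is all-or-nothing combination: a list of optional values becomes an optional list. The sketch below uses Python's Optional (None as the empty case) as a stand-in for fpstreams' Option container, whose exact API is not assumed here:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def sequence(options: Iterable[Optional[T]]) -> Optional[list[T]]:
    # All-or-nothing: a single None collapses the whole result to None.
    out: list[T] = []
    for o in options:
        if o is None:
            return None
        out.append(o)
    return out

def traverse(fn: Callable[[T], Optional[U]], items: Iterable[T]) -> Optional[list[U]]:
    # map + sequence in a single pass.
    return sequence(fn(x) for x in items)

print(sequence([1, 2, 3]))     # → [1, 2, 3]
print(sequence([1, None, 3]))  # → None
print(traverse(lambda s: int(s) if s.isdigit() else None, ["1", "2"]))  # → [1, 2]
```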

  4. Type-focused affordances
     • map_typed / filter_typed variants or overloads that narrow types for better IDE guidance.

These are additive and can remain optional, preserving the current API surface while giving power users more functional vocabulary.

Rust Acceleration Plan

Certain operations are CPU-heavy or memory-sensitive and are prime candidates for a Rust extension module. The key is to keep Python ergonomics while enabling a fast path for large data or numeric workloads.

Candidate hotspots

  • Numeric collectors: summarizing, summing, averaging, quantiles.
  • High-volume transforms: map/filter/flat_map on numeric streams.
  • Windowing/scan: especially on large sequences of numeric data.
  • Group-by and distinct for large datasets (hashing overhead in Python can be high).
  • Parallel operations: a Rust-backed parallel() pipeline using rayon for consistent throughput.

Proposed approach

  1. Optional extension module
     • Build an fpstreams_rust extension via pyo3 + maturin.
     • Ship as an extra (e.g., pip install fpstreams[fast]) that preserves the pure-Python fallback.

  2. Stable Python API
     • Keep the public classes and method signatures unchanged.
     • Route to Rust fast paths when the stream contains Rust-friendly types (e.g., numeric lists, NumPy arrays, or buffer-protocol inputs).

  3. Interoperability strategy
     • Support arbitrary Python iterables for compatibility, but add an optimized path for lists/tuples/arrays.
     • Use PyBuffer/NumPy views for zero-copy operations where possible.

  4. Incremental rollout
     • Start with collectors (summarizing, summing, averaging) and window/scan operations.
     • Add parallel map/filter/reduce after functional parity is proven.
     • Gate each fast path on benchmarks that demonstrate real-world wins.

  5. Testing & CI
     • Add Python/Rust parity tests so both code paths produce identical results.
     • Build wheels for major platforms in CI to keep installation friction low.
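The optional-extension pattern typically reduces to a guarded import with a pure-Python fallback, so the public API never changes whether or not the wheel with the compiled module is installed. The module name fpstreams_rust matches the plan above, but the sum_f64 function is an illustrative assumption:

```python
# Guarded import: prefer the compiled fast path, fall back to pure Python.
# fpstreams_rust matches the proposed module name; sum_f64 is hypothetical.
try:
    from fpstreams_rust import sum_f64 as _sum_fast  # type: ignore
    HAVE_RUST = True
except ImportError:
    HAVE_RUST = False

def summing(values):
    # Route numeric lists to the Rust kernel when available; the public
    # signature and behavior stay identical either way.
    if HAVE_RUST and isinstance(values, list):
        return _sum_fast(values)
    return sum(values)

print(summing([1.0, 2.5, 3.5]))  # → 7.0
```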

This plan keeps fpstreams easy to install while unlocking a high-performance option for complex workloads.