From 07ff9cd5880c5168c1fc91327b42e2289f282907 Mon Sep 17 00:00:00 2001
From: Samer Hamood
Date: Mon, 27 Jan 2025 17:52:37 +0000
Subject: [PATCH] Improve docs (#212)

* Correct table format
* Use single-line JSON label for code block
* Make grammar consistent, add punctuation and generally fix language a bit
* Make language used more uniformly consistent, structured and accurate and correct some spelling, grammar and punctuation
* Link file references to files
* Format names of function parameters
* Add language improvements to CHANGELOG.md
---
 CHANGELOG.md | 63 ++++++++++++++++++++++++++----------------------
 README.md    | 68 ++++++++++++++++++++++++++--------------------------
 2 files changed, 69 insertions(+), 62 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index dbc7f02..60e3e9c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,30 +1,34 @@
 # Changelog
 
+## Release 1.6
+
+- Corrected and improved language consistency in [readme](README.md) and `CHANGELOG.md`
+
 ## Release 1.5
 
 ## Release 1.4
 
-- Allow empty sequence expressions `seq()`, `pseq()` (#159)
-- Add `no_wrap` option to `head()`, `head_option()`, `first()`, `last()` and `last_option()`, as well as to `seq()`, `pseq()` and `Sequence` constructor
+- Added support for empty sequence expressions `seq()`, `pseq()` (#159)
+- Added `no_wrap` option to `head()`, `head_option()`, `first()`, `last()` and `last_option()`, as well as to `seq()`, `pseq()` and `Sequence` constructor
 
 ## Release 1.3.0
 
-- added precompute attribute to reverse transformation (#137)
-- Update setup.py dill to requirements.txt (#138)
+- Added precompute attribute to reverse transformation (#137)
+- Updated setup.py dill to requirements.txt (#138)
 - Docstring of tail fixed (#140)
-- adding extend feature (#144)
+- Added extend feature (#144)
 
 ## Release 1.2.0
 
-- Fix Broken link in readme
-- Loosen version requirements #129
-- Fix lint errors
-- Fix StopIteration errors for Python 3.7 #132
-- Drop support for python 3.4
+- Fixed Broken link in readme
+- Loosened version requirements #129
+- Fixed lint errors
+- Fixed StopIteration errors for Python 3.7 #132
+- Dropped support for python 3.4
 
 ## Release 1.1.3
 
-- Fix bug in `partition` https://github.com/EntilZha/PyFunctional/issues/124
+- Fixed bug in `partition` https://github.com/EntilZha/PyFunctional/issues/124
 
 ## Release 1.1.0
 
@@ -32,19 +36,22 @@
 - Implemented `count_by_key`
 - Implemented `count_by_value`
 - Implemented `accumulate` https://github.com/EntilZha/PyFunctional/pull/104
-- Fix bug in `grouped` https://github.com/EntilZha/PyFunctional/pull/123
-- Fix bug in `to_csv` https://github.com/EntilZha/PyFunctional/pull/123
-- Fix bug with incorrect wrapping of pandas dataframes https://github.com/EntilZha/PyFunctional/pull/122
-- Allow variance on versions of certain packages: https://github.com/EntilZha/PyFunctional/pull/117 and https://github.com/EntilZha/PyFunctional/pull/116
+- Added support for variance on versions of certain packages: https://github.com/EntilZha/PyFunctional/pull/117 and https://github.com/EntilZha/PyFunctional/pull/116
 - Various typo fixes
 - Various CI fixes
-- Fix issue with `first/head` evaluating entire sequence https://github.com/EntilZha/PyFunctional/commit/fb8f3686cf94f072f4e6ed23a361952de1447dc8
-- Drop CI testing and official support for Python 3.3
-- Make import much faster by loading pandas more lazily https://github.com/EntilZha/PyFunctional/issues/99
+- Dropped CI testing and official support for Python 3.3
+- Made import much faster by loading pandas more lazily https://github.com/EntilZha/PyFunctional/issues/99
+
+### Bug Fixes
+
+- Fixed bug in `grouped` https://github.com/EntilZha/PyFunctional/pull/123
+- Fixed bug in `to_csv` https://github.com/EntilZha/PyFunctional/pull/123
+- Fixed bug with incorrect wrapping of pandas dataframes https://github.com/EntilZha/PyFunctional/pull/122
+- Fixed issue with `first/head` evaluating entire sequence https://github.com/EntilZha/PyFunctional/commit/fb8f3686cf94f072f4e6ed23a361952de1447dc8
 
 ## Release 1.0.0
 
-Reaching `1.0` primarily means that API stability has been reached so I don't expect to run into many new breaking changes.
+Reaching `1.0` primarily means that API stability has been reached, so I don't expect to run into many new breaking changes.
 
 ### New Features
 
@@ -67,13 +74,13 @@ Reaching `1.0` primarily means that API stability has been reached so I don't ex
 
 - Implemented pretty html repr for Jupyter
 - Implemented proper parsing of pandas DataFrames
-- Detect when its possible to pretty print a table and do so
+- Added feature to detect when it's possible to pretty print a table and do so
 - `list`/`to_list` have a parameter `n` to limit number of results
 
 ### Bug Fixes
 
 - Fixed bug where `grouped` unnecessarily forces precomputation of sequence
-- Remove package installations from default requirements that sometimes break installation on barebones systems in python 2.7
+- Removed package installations from default requirements that sometimes break installation on barebones systems in python 2.7
 
 ## Release 0.7.0
 
@@ -96,14 +103,14 @@ Reaching `1.0` primarily means that API stability has been reached so I don't ex
 ### Contributors
 
 - Thanks to [versae](https://github.com/versae) for implementing most of the `pseq` feature!
-- Thanks to [ChuyuHsu](https://github.com/ChuyuHsu) for implemented large parts of the compression feature!
+- Thanks to [ChuyuHsu](https://github.com/ChuyuHsu) for implementing large parts of the compression feature!
 
 ## Release 0.6.0
 
 ### New Features
 
 - Added support for reading to and from SQLite databases
-- Change project name to `PyFunctional` from `ScalaFunctional`
+- Changed project name from `ScalaFunctional` to `PyFunctional`
 - Added `to_pandas` call integration
 
 ### Internal Changes
@@ -125,13 +132,13 @@ Reaching `1.0` primarily means that API stability has been reached so I don't ex
 
 - Fixed case where `_wrap` is changing named tuples to arrays when it should preserve them
 - Fixed documentation on `to_file` which incorrectly copied from `seq.open` delimiter parameter
-- Fixed `Sequence.zip_with_index` behavior. used to mimic `enumerate` by zipping on the left size
-  while scala and spark do zip on the right side. This introduces different behavior and more flexible
-  behavior in combination with `enumerate` A start parameter was also added like in `enumerate`
+- Fixed `Sequence.zip_with_index` behavior, which used to mimic `enumerate` by zipping on the left side
+  while Scala and Spark zip on the right side. This introduces different but more flexible
+  behavior in combination with `enumerate`. A start parameter was also added like in `enumerate`
 
 ## Release 0.4.1
 
-Fix python 3 build error due to wheel installation of enum34. Package no longer depends on enum34
+Fixed python 3 build error due to wheel installation of enum34. Package no longer depends on enum34
 
 ## Release 0.4.0
 
@@ -156,7 +163,7 @@ Fix python 3 build error due to wheel installation of enum34. Package no longer
 - `Sequence.to_file` to save files
 - `Sequence.to_csv` to save csv files
 - Improved documentation with more examples and mention LINQ explicitly
-- Change PyPi keywords to improve discoverability
+- Changed PyPi keywords to improve discoverability
 - Created [Google groups mailing list](https://groups.google.com/forum/#!forum/scalafunctional)
 
 ### Bug Fixes
diff --git a/README.md b/README.md
index 2f1997e..14d1237 100644
--- a/README.md
+++ b/README.md
@@ -135,9 +135,9 @@ seq(words).map(lambda word: (word, 1)).reduce_by_key(lambda x, y: x + y)
 
 In the next example we have chat logs formatted in [json lines (jsonl)](http://jsonlines.org/) which
 contain messages and metadata. A typical jsonl file will have one valid json on each line of a file.
-Below are a few lines out of `examples/chat_logs.jsonl`.
+Below are a few lines out of [examples/chat_logs.jsonl](examples/chat_logs.jsonl).
 
-```json
+```json lines
 {"message":"hello anyone there?","date":"10/09","user":"bob"}
 {"message":"need some help with a program","date":"10/09","user":"bob"}
 {"message":"sure thing. What do you need help with?","date":"10/09","user":"dave"}
@@ -160,8 +160,8 @@ word_counts = messages\
 
 ```
 
-Next, lets continue that example but introduce a json database of users from `examples/users.json`.
-In the previous example we showed how `PyFunctional` can do word counts, in the next example lets
+Next, let's continue that example but introduce a json database of users from [examples/users.json](examples/users.json).
+In the previous example we showed how `PyFunctional` can do word counts, in the next example let's
 show how `PyFunctional` can join different data sources.
 
 ```python
@@ -187,8 +187,8 @@ data = users.inner_join(message_tuples)
 
 ### CSV, Aggregate Functions, and Set functions
 
-In `examples/camping_purchases.csv` there are a list of camping purchases. Lets do some cost
-analysis and compare it the required camping gear list stored in `examples/gear_list.txt`.
+In [examples/camping_purchases.csv](examples/camping_purchases.csv) there is a list of camping purchases. Let's do some
+cost analysis and compare it to the required camping gear list stored in [examples/gear_list.txt](examples/gear_list.txt).
 
 ```python
 purchases = seq.csv('examples/camping_purchases.csv')
@@ -277,7 +277,7 @@ operations are run in parallel with more to be implemented in a future release:
 
 Parallelization uses python `multiprocessing` and squashes chains of embarrassingly parallel
 operations to reduce overhead costs. For example, a sequence of maps and filters would be executed
-all at once rather than in multiple loops using `multiprocessing`
+all at once rather than in multiple loops using `multiprocessing`.
 
 ## Documentation
 
@@ -337,29 +337,29 @@ complete documentation reference
 [transformation and actions API](http://docs.pyfunctional.pedro.ai/en/latest/functional.html#module-functional.pipeline).
 
 | Function | Description | Type |
-| ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- |
+|---------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
 | `map(func)/select(func)` | Maps `func` onto elements of sequence | transformation |
-| `starmap(func)/smap(func)` | Apply `func` to sequence with `itertools.starmap` | transformation |
+| `starmap(func)/smap(func)` | Applies `func` to sequence with `itertools.starmap` | transformation |
 | `filter(func)/where(func)` | Filters elements of sequence to only those where `func(element)` is `True` | transformation |
 | `filter_not(func)` | Filters elements of sequence to only those where `func(element)` is `False` | transformation |
 | `flatten()` | Flattens sequence of lists to a single sequence | transformation |
-| `flat_map(func)` | `func` must return an iterable. Maps `func` to each element, then merges the result to one flat sequence | transformation |
+| `flat_map(func)` | Maps `func` to each element, then merges the result to one flat sequence. `func` must return an iterable | transformation |
 | `group_by(func)` | Groups sequence into `(key, value)` pairs where `key=func(element)` and `value` is from the original sequence | transformation |
 | `group_by_key()` | Groups sequence of `(key, value)` pairs by `key` | transformation |
 | `reduce_by_key(func)` | Reduces list of `(key, value)` pairs using `func` | transformation |
-| `count_by_key()` | Counts occurrences of each `key` in list of `(key, value)` pairs | transformation |
-| `count_by_value()` | Counts occurrence of each value in a list | transformation |
+| `count_by_key()` | Counts occurrence of each `key` in sequence of `(key, value)` pairs | transformation |
+| `count_by_value()` | Counts occurrence of each value in the sequence | transformation |
 | `union(other)` | Union of unique elements in sequence and `other` | transformation |
 | `intersection(other)` | Intersection of unique elements in sequence and `other` | transformation |
 | `difference(other)` | New sequence with unique elements present in sequence but not in `other` | transformation |
 | `symmetric_difference(other)` | New sequence with unique elements present in sequence or `other`, but not both | transformation |
 | `distinct()` | Returns distinct elements of sequence. Elements must be hashable | transformation |
 | `distinct_by(func)` | Returns distinct elements of sequence using `func` as a key | transformation |
-| `drop(n)` | Drop the first `n` elements of the sequence | transformation |
-| `drop_right(n)` | Drop the last `n` elements of the sequence | transformation |
-| `drop_while(func)` | Drop elements while `func` evaluates to `True`, then returns the rest | transformation |
+| `drop(n)` | Drops the first `n` elements of the sequence | transformation |
+| `drop_right(n)` | Drops the last `n` elements of the sequence | transformation |
+| `drop_while(func)` | Drops elements while `func` evaluates to `True`, returning the rest | transformation |
 | `take(n)` | Returns sequence of first `n` elements | transformation |
-| `take_while(func)` | Take elements while `func` evaluates to `True`, then drops the rest | transformation |
+| `take_while(func)` | Takes elements while `func` evaluates to `True`, dropping the rest | transformation |
 | `init()` | Returns sequence without the last element | transformation |
 | `tail()` | Returns sequence without the first element | transformation |
 | `inits()` | Returns consecutive inits of sequence | transformation |
@@ -368,12 +368,12 @@ complete documentation reference
 | `zip_with_index(start=0)` | Zips the sequence with the index starting at `start` on the right side | transformation |
 | `enumerate(start=0)` | Zips the sequence with the index starting at `start` on the left side | transformation |
 | `cartesian(*iterables, repeat=1)` | Returns cartesian product from itertools.product | transformation |
-| `inner_join(other)` | Returns inner join of sequence with other. Must be a sequence of `(key, value)` pairs | transformation |
-| `outer_join(other)` | Returns outer join of sequence with other. Must be a sequence of `(key, value)` pairs | transformation |
-| `left_join(other)` | Returns left join of sequence with other. Must be a sequence of `(key, value)` pairs | transformation |
-| `right_join(other)` | Returns right join of sequence with other. Must be a sequence of `(key, value)` pairs | transformation |
-| `join(other, join_type='inner')` | Returns join of sequence with other as specified by `join_type`. Must be a sequence of `(key, value)` pairs | transformation |
-| `partition(func)` | Partitions the sequence into elements which satisfy `func(element)` and those that don't | transformation |
+| `inner_join(other)` | Returns inner join of sequence with `other`. Must be a sequence of `(key, value)` pairs | transformation |
+| `outer_join(other)` | Returns outer join of sequence with `other`. Must be a sequence of `(key, value)` pairs | transformation |
+| `left_join(other)` | Returns left join of sequence with `other`. Must be a sequence of `(key, value)` pairs | transformation |
+| `right_join(other)` | Returns right join of sequence with `other`. Must be a sequence of `(key, value)` pairs | transformation |
+| `join(other, join_type='inner')` | Returns join of sequence with `other` as specified by `join_type`. Must be a sequence of `(key, value)` pairs | transformation |
+| `partition(func)` | Partitions the sequence into elements that satisfy `func(element)` and those that don't | transformation |
 | `grouped(size)` | Partitions the elements into groups of size `size` | transformation |
 | `sorted(key=None, reverse=False)/order_by(func)` | Returns elements sorted according to python `sorted` | transformation |
 | `reverse()` | Returns the reversed sequence | transformation |
@@ -389,7 +389,7 @@ complete documentation reference
 | `all()` | Returns `True` if all elements in sequence are truthy | action |
 | `exists(func)` | Returns `True` if `func(element)` for any element in the sequence is `True` | action |
 | `for_all(func)` | Returns `True` if `func(element)` is `True` for all elements in the sequence | action |
-| `find(func)` | Returns the element that first evaluates `func(element)` to `True` | action |
+| `find(func)` | Returns the first element for which `func(element)` evaluates to `True` | action |
 | `any()` | Returns `True` if any element in sequence is truthy | action |
 | `max()` | Returns maximal element in sequence | action |
 | `min()` | Returns minimal element in sequence | action |
@@ -398,33 +398,33 @@ complete documentation reference
 | `sum()/sum(projection)` | Returns the sum of elements possibly using a projection | action |
 | `product()/product(projection)` | Returns the product of elements possibly using a projection | action |
 | `average()/average(projection)` | Returns the average of elements possibly using a projection | action |
-| `aggregate(func)/aggregate(seed, func)/aggregate(seed, func, result_map)` | Aggregate using `func` starting with `seed` or first element of list then apply `result_map` to the result | action |
+| `aggregate(func)/aggregate(seed, func)/aggregate(seed, func, result_map)` | Aggregates using `func` starting with `seed` or first element of list then applies `result_map` to the result | action |
 | `fold_left(zero_value, func)` | Reduces element from left to right using `func` and initial value `zero_value` | action |
 | `fold_right(zero_value, func)` | Reduces element from right to left using `func` and initial value `zero_value` | action |
 | `make_string(separator)` | Returns string with `separator` between each `str(element)` | action |
 | `dict(default=None)` / `to_dict(default=None)` | Converts a sequence of `(Key, Value)` pairs to a `dictionary`. If `default` is not None, it must be a value or zero argument callable which will be used to create a `collections.defaultdict` | action |
 | `list()` / `to_list()` | Converts sequence to a list | action |
 | `set() / to_set()` | Converts sequence to a set | action |
-| `to_file(path)` | Saves the sequence to a file at path with each element on a newline | action |
-| `to_csv(path)` | Saves the sequence to a csv file at path with each element representing a row | action |
+| `to_file(path)` | Saves the sequence to a file at `path` with each element on a newline | action |
+| `to_csv(path)` | Saves the sequence to a csv file at `path` with each element representing a row | action |
 | `to_jsonl(path)` | Saves the sequence to a jsonl file with each element being transformed to json and printed to a new line | action |
 | `to_json(path)` | Saves the sequence to a json file. The contents depend on if the json root is an array or dictionary | action |
-| `to_sqlite3(conn, tablename_or_query, *args, **kwargs)` | Save the sequence to a SQLite3 db. The target table must be created in advance. | action |
+| `to_sqlite3(conn, tablename_or_query, *args, **kwargs)` | Saves the sequence to a SQLite3 db. The target table must be created in advance | action |
 | `to_pandas(columns=None)` | Converts the sequence to a pandas DataFrame | action |
 | `cache()` | Forces evaluation of sequence immediately and caches the result | action |
 | `for_each(func)` | Executes `func` on each element of the sequence | action |
-| `peek(func)` | Executes `func` on each element of the sequence but returns the element | transformation |
+| `peek(func)` | Executes `func` on each element of the sequence and returns it | transformation |
 
 ### Lazy Execution
 
 Whenever possible, `PyFunctional` will compute lazily. This is accomplished by tracking the list
 of transformations that have been applied to the sequence and only evaluating them when an action is
-called. In `PyFunctional` this is called tracking lineage. This is also responsible for the
-ability for `PyFunctional` to cache results of computation to prevent expensive re-computation.
+called. In `PyFunctional` this is called tracking lineage. This is also responsible for `PyFunctional`'s
+ability to cache the results of computations to prevent expensive re-computation.
 This is predominantly done to preserve sensible behavior and used sparingly. For example, calling
 `size()` will cache the underlying sequence. If this was not done and the input was an iterator,
 then further calls would operate on an expired iterator since it was used to compute the length.
-Similarly, `repr` also caches since it is most often used during interactive sessions where its
+Similarly, `repr` also caches since it is most often used during interactive sessions where it's
 undesirable to keep recomputing the same value. Below are some examples of inspecting lineage.
 
 ```python
@@ -457,7 +457,7 @@ l_elements = elements.to_list()
 
 Files are given special treatment if opened through the `seq.open` and related APIs.
 `functional.util.ReusableFile` implements a wrapper around the standard python file to support
-multiple iteration over a single file object while correctly handling iteration termination and
+multiple iterations over a single file object while correctly handling iteration termination and
 file closing.
 
 ### `no_wrap` option
@@ -478,7 +478,7 @@ That behaviour can be changed with `no_wrap` option:
 
 ```
 
-The option is also accpeted by `seq()`/`pseq()` as well as `Sequence()` constructor, for example:
+The option is also accepted by `seq()`/`pseq()` as well as `Sequence()` constructor, for example:
 
 ```
 >>> type(seq([list(), list()], no_wrap=True).last())
@@ -501,7 +501,7 @@ To contribute, create a fork of `PyFunctional`, make your changes, then make sur
 In order to be merged, all pull requests must:
 
 - Pass all the unit tests
-- Pass all the pylint tests, or ignore warnings with explanation of why its correct to do so
+- Pass all the pylint tests, or ignore warnings with explanation of why it's correct to do so
 - Not significantly reduce coverage without a good reason ([coveralls.io](coveralls.io/github/EntilZha/PyFunctional))
 - Edit the `CHANGELOG.md` file in the `Next Release` heading with changes