updated python efficient coding

zoupeicheng · zoupeicheng · commit 3712a9d96b79 · 2020-04-05T18:12:34.000+08:00
diff --git a/EfficientCoding.md b/EfficientCoding.md
@@ -93,6 +93,8 @@ print(times.average)
 
 ```
 ## Code profiling
+
+### Profiling time consumption
 Let's see the frequency and timing of each lines of a code
 ```python
 !pip install line_profiler
@@ -107,4 +109,130 @@ Let's see the frequency and timing of each lines of a code
 
 ```
 
+### Profiling memory consumption
+
+```python
+!pip install memory_profiler
+
+%load_ext memory_profiler
+
+## Magic command
+%mprun
+
+## Profile a function (the function must be imported from a file, not defined in the notebook)
+%mprun -f function_name function_name(arg1, arg2, arg3)
+
+```
+
+## Efficient Functions
+
+### zip
+
+```python
+## Each argument is an iterable, so a list can be zipped with, for example, a map object.
+new_list = [*zip(l1, l2, l3)]  ## zip() accepts any number of iterables
+```
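As a small illustration of the note above, the sketch below zips a list with a map object. The names `names` and `lengths` are made up for this example.

```python
# Hypothetical data, for illustration only
names = ["Ann", "Bob", "Cleo"]

# map() returns a lazy iterator, not a list
lengths = map(len, names)

# zip() pairs items from any iterables; unpacking with * builds the list
pairs = [*zip(names, lengths)]
print(pairs)
```

Because `zip` stops at the shortest input, iterables of unequal length are silently truncated.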
+
+### Counter
+
+```python
+
+from collections import Counter
+
+# Use a list comprehension to get each name's starting letter
+starting_letters = [a[0] for a in names]
+
+# Collect the count of names for each starting_letter
+starting_letters_count = Counter(starting_letters)
+```
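The snippet above assumes a `names` list that is not shown. A self-contained sketch with a hypothetical sample list might look like this:

```python
from collections import Counter

# Hypothetical sample; the original `names` list is not shown
names = ["Alice", "Adam", "Bob", "Carol", "Cindy", "Chris"]

# First letter of every name
starting_letters = [name[0] for name in names]

# Counter maps each distinct letter to the number of times it occurs
starting_letters_count = Counter(starting_letters)

# most_common(n) returns the n most frequent items with their counts
top_letter = starting_letters_count.most_common(1)
print(starting_letters_count)
print(top_letter)
```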
+
+
+### itertools
+
+#### combinations
+```python
+from itertools import combinations
+
+list_tup = [*combinations(obj_list,num)] ## choose num from obj_list
+
+```
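A minimal runnable version of the same idea, assuming a hypothetical four-item `obj_list` and `num = 2`:

```python
from itertools import combinations

# Hypothetical input, for illustration only
obj_list = ["a", "b", "c", "d"]
num = 2

# Every way to choose `num` items from obj_list, ignoring order
list_tup = [*combinations(obj_list, num)]
print(len(list_tup))
print(list_tup[:3])
```

Each result is a tuple, and the tuples follow the order of the input.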
+
+### set
+Set methods:
+* intersection()
+* difference() # a.difference(b) = a\b
+* symmetric_difference() # all elements in exactly one of the two sets
+* union()
+
+Use the built-in `in` operator for membership testing.
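The set methods and membership test listed above can be sketched as follows; the sets `a` and `b` are arbitrary example values.

```python
# Arbitrary example sets
a = {1, 2, 3, 4}
b = {3, 4, 5}

common = a.intersection(b)          # elements in both a and b
only_a = a.difference(b)            # elements in a but not in b
in_one = a.symmetric_difference(b)  # elements in exactly one of a, b
either = a.union(b)                 # elements in a, b, or both

# Membership testing on a set is a hash lookup, not a linear scan
has_three = 3 in a
print(common, only_a, in_one, either, has_three)
```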
+
+### Eliminating Loops
+Replace explicit loops with NumPy vectorization, `map()`, or `itertools`.
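As a rough sketch using only the standard library, the loop below can be replaced by `map()`, and `itertools.starmap()` covers the case where each item holds several arguments. The data is hypothetical.

```python
from itertools import starmap

# Hypothetical rows of numbers
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Loop version
totals_loop = []
for row in rows:
    totals_loop.append(sum(row))

# Same result with map(): no explicit loop
totals_map = [*map(sum, rows)]

# starmap() unpacks each tuple into the function's arguments
powers = [*starmap(pow, [(2, 3), (3, 2), (10, 0)])]
print(totals_map, powers)
```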
+
+### Writing more efficient Loops
+
+Move operations outside the loop as much as possible:
+```python
+# Collect all possible pairs using combinations()
+possible_pairs = [*combinations(types, 2)]
+
+# Create an empty list called enumerated_tuples
+enumerated_tuples = []
+
+# Add a line to append each enumerated_pair_tuple to the empty list above
+for i,pair in enumerate(possible_pairs, 1):
+    enumerated_pair_tuple = (i,) + pair
+    enumerated_tuples.append(enumerated_pair_tuple)
+
+# Convert all tuples in enumerated_tuples to a list
+enumerated_pairs = [*map(list, enumerated_tuples)]
+print(enumerated_pairs)
+```
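A smaller sketch of the same principle, with hypothetical data: a value that does not change between iterations is computed once, before the loop, instead of on every pass.

```python
# Hypothetical data
values = [3, 8, 5, 12]

# Inefficient: the total is recomputed on every iteration
shares_slow = []
for v in values:
    total = sum(values)
    shares_slow.append(v / total)

# Better: compute the loop-invariant value once, outside the loop
total = sum(values)
shares_fast = [v / total for v in values]
print(shares_fast)
```

Both versions give the same result, but the second calls `sum()` once rather than once per element.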
+
+
+## Iterating over a pandas DataFrame
+```python
+for i,row in df.iterrows():
+    print(i,row)
+```
+This is much faster than a `for` loop that indexes rows with `.iloc`.
+
+However, there is an even faster approach:
+
+```python
+for row_namedtuple in team_wins_df.itertuples():
+    print(row_namedtuple.Team)
+```
+
+To apply a function to each column (`axis=0`) or each row (`axis=1`), use `df.apply`.
+
+### Pandas internals
+
+Vectorized operations are even faster than `apply`:
+
+```python
+%%timeit
+win_perc_preds_loop = []
+
+# Use a loop and .itertuples() to collect each row's predicted win percentage
+for row in baseball_df.itertuples():
+    runs_scored = row.RS
+    runs_allowed = row.RA
+    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
+    win_perc_preds_loop.append(win_perc_pred)
+
+%%timeit
+# Apply predict_win_perc to each row of the DataFrame
+win_perc_preds_apply = baseball_df.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)
+
+%%timeit
+# Calculate the win percentage predictions using NumPy arrays
+win_perc_preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)
+baseball_df['WP_preds'] = win_perc_preds_np
+
+print(baseball_df.head())
+```
+
 
diff --git a/FunctionsInPython.md b/FunctionsInPython.md
@@ -0,0 +1,45 @@
+# Functions in Python
+I recommend this excellent [course](https://campus.datacamp.com/courses/writing-efficient-code-with-pandas/), alongside the official documentation.
+
+## Best Practices
+
+### Docstrings
+
+#### Google Style
+
+```python
+def function(arg_1, arg_2=42):
+    """Imperative language description here
+    
+
+    Args:
+      arg_1 (str): Description
+      arg_2 (int, optional): Write optional when an argument has a
+        default value.
+
+
+
+    Returns:
+      bool: Optional description of the return value
+      Extra lines are not indented.
+
+    Raises:
+      ValueError: Include any error types that the function 
+        intentionally raises
+
+    Notes:
+      See https://www.website.com
+      for more info.
+    """
+    return something
+
+
+## Retrieve the docstring
+print(function.__doc__)
+
+import inspect
+print(inspect.getdoc(function))
+
+```
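The template above can be filled in for a concrete function. The sketch below uses a hypothetical helper, `count_letter`, which is not part of the original notes, to show a complete Google-style docstring and how `inspect.getdoc` returns it.

```python
import inspect


def count_letter(content, letter):
    """Count the number of times `letter` appears in `content`.

    Args:
      content (str): The string to search.
      letter (str): The single character to count.

    Returns:
      int: The number of occurrences of `letter` in `content`.

    Raises:
      ValueError: If `letter` is not a one-character string.
    """
    if not isinstance(letter, str) or len(letter) != 1:
        raise ValueError("`letter` must be a single character string.")
    return content.count(letter)


# inspect.getdoc() strips the docstring's leading indentation
doc = inspect.getdoc(count_letter)
print(doc)
```

Unlike `count_letter.__doc__`, which keeps the raw indentation, `inspect.getdoc` returns the text cleaned up for display.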
+
+
diff --git a/README.md b/README.md
@@ -77,4 +77,5 @@ There are many great sources to learn Data Science, and here are some advice to
 12. [DeploymentTools](DeploymentTools.md)
 13. Others coming ...
     * [Efficient Coding in Python](EfficientCoding.md)
+    * [Writing Functions in Python](FunctionsInPython.md)
     * Data Structure and Algorithms