* [simple_linear_regression] Review lecture pandas code, spelling with update to american spelling
* update data location
* update all fl to data_url
* TST: add label but no caption
* update numbered and captioned figures
* ensure only one figure is returned
* remove tip
-However we need to think about formalising this guessing process by thinking of this problem as an optimization problem.
+However we need to think about formalizing this guessing process by thinking of this problem as an optimization problem.

Let's consider the error $\epsilon_i$ and define the difference between the observed values $y_i$ and the estimated values $\hat{y}_i$ which we will call the residuals
-The Ordinary Least Squares (OLS) method, as the name suggests, chooses $\alpha$ and $\beta$ in such a way that **minimises** the Sum of the Squared Residuals (SSR).
+The Ordinary Least Squares (OLS) method chooses $\alpha$ and $\beta$ in such a way that **minimizes** the sum of the squared residuals (SSR).
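
As a rough illustration of the minimization described above, the sketch below computes $\alpha$ and $\beta$ with the standard closed-form OLS solution for a single regressor and evaluates the SSR. The helper names `ols_fit` and `ssr` and the five data points are made up for illustration and are not taken from the lecture.

```python
import numpy as np


def ols_fit(x, y):
    """Return (alpha, beta) minimizing the SSR of y = alpha + beta * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Closed-form slope: sample covariance of x and y over sample variance of x
    beta = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # The fitted line passes through the point of means
    alpha = y_bar - beta * x_bar
    return alpha, beta


def ssr(alpha, beta, x, y):
    """Sum of squared residuals for the line alpha + beta * x."""
    residuals = np.asarray(y, dtype=float) - (alpha + beta * np.asarray(x, dtype=float))
    return np.sum(residuals ** 2)


# Hypothetical data, chosen only to exercise the functions
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

alpha_hat, beta_hat = ols_fit(x, y)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
print(f"SSR at the OLS estimates: {ssr(alpha_hat, beta_hat, x, y):.4f}")
print(f"SSR after nudging alpha:  {ssr(alpha_hat + 0.1, beta_hat, x, y):.4f}")
```

Moving either parameter away from the OLS estimates can only increase the SSR, which is what the last two printed lines are meant to show.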
-You can download {download}`a copy of the data here <_static/lecture_specific/simple_linear_regression/life-expectancy-vs-gdp-per-capita.csv>` if you get stuck
+You can download {download}`a copy of the data here <https://github.com/QuantEcon/lecture-python-intro/raw/main/lectures/_static/lecture_specific/simple_linear_regression/life-expectancy-vs-gdp-per-capita.csv>` if you get stuck

**Q3:** Use `pandas` to import the `csv` formatted data and plot a few different countries of interest
```{code-cell} ipython3
-fl = "_static/lecture_specific/simple_linear_regression/life-expectancy-vs-gdp-per-capita.csv" # TODO: Replace with GitHub link
-df.plot(x='gdppc', y='life_expectancy', kind='scatter', xlabel="GDP per capita", ylabel="Life Expectancy (Years)",);
+df.plot(x='gdppc', y='life_expectancy', kind='scatter', xlabel="GDP per capita", ylabel="Life expectancy (years)",);
```
This data shows a couple of interesting relationships.
1. there are a number of countries with similar GDP per capita levels but a wide range in Life Expectancy
2. there appears to be a positive relationship between GDP per capita and life expectancy. Countries with higher GDP per capita tend to have higher life expectancy outcomes

-Even though OLS is solving linear equations -- one option we have is to transform the variables, such as through a log transform, and then use OLS to estimate the transformed variables
-
-:::{tip}
-ln -> ln == elasticities
-:::
+Even though OLS is solving linear equations -- one option we have is to transform the variables, such as through a log transform, and then use OLS to estimate the transformed variables.

By specifying `logx` you can plot the GDP per Capita data on a log scale
```{code-cell} ipython3
-df.plot(x='gdppc', y='life_expectancy', kind='scatter', xlabel="GDP per capita", ylabel="Life Expectancy (Years)", logx=True);
+df.plot(x='gdppc', y='life_expectancy', kind='scatter', xlabel="GDP per capita", ylabel="Life expectancy (years)", logx=True);
```
As you can see from this transformation -- a linear model fits the shape of the data more closely.
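
One way to check this claim numerically is to fit OLS once on the level of GDP per capita and once on its logarithm, then compare the $R^2$ of the two fits. The sketch below does this on synthetic data, since the lecture's CSV is not loaded here: the arrays `gdppc` and `life_expectancy` borrow the column names used above, but their values are simulated from an assumed log-linear relationship, and the helper `r_squared` is hypothetical.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of a simple OLS fit of y on x (with an intercept)."""
    beta, alpha = np.polyfit(x, y, 1)   # polyfit returns [slope, intercept]
    residuals = y - (alpha + beta * x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic stand-in for the lecture data (an assumption, not real figures):
# life expectancy is simulated as linear in log GDP per capita plus noise.
rng = np.random.default_rng(0)
gdppc = np.exp(rng.uniform(np.log(500), np.log(100_000), size=200))
life_expectancy = 20 + 6 * np.log(gdppc) + rng.normal(0, 3, size=200)

r2_level = r_squared(gdppc, life_expectancy)
r2_log = r_squared(np.log(gdppc), life_expectancy)

print(f"R^2 with gdppc in levels: {r2_level:.3f}")
print(f"R^2 with log(gdppc):      {r2_log:.3f}")
```

Because the simulated data are log-linear by construction, the log specification fits better here. Whether the same holds for the real dataset has to be checked by running the comparison on `df` itself.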