Merge pull request #4 from mpuren/main
update
Showing 7 changed files with 3,539 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
numero|legislature | ||
1|17/06/1789 - 30/09/1791 ; Assemblée nationale constituante | ||
2|01/10/1791 - 20/09/1792 ; Assemblée nationale législative | ||
3|21/09/1792 - 26/10/1795 ; Convention nationale | ||
4|27/10/1795 - 26/12/1799 ; Conseil des Cinq-Cents | ||
5|01/01/1800 - 04/06/1814 ; Corps législatif | ||
12|04/06/1814 - 20/03/1815 ; Chambre des députés des départements | ||
6|03/06/1815 - 13/07/1815 ; Chambre des représentants | ||
7|07/10/1815 - 05/09/1816 ; Ire législature | ||
8|04/11/1816 - 24/12/1823 ; IIe législature | ||
9|23/03/1824 - 05/11/1827 ; IIIe législature | ||
10|05/02/1828 - 16/05/1830 ; IVe législature | ||
13|23/06/1830 - 31/05/1831 ; Ire législature | ||
14|23/06/1831 - 25/05/1834 ; IIe législature | ||
15|31/07/1834 - 03/10/1837 ; IIIe législature | ||
16|18/12/1837 - 02/02/1839 ; IVe législature | ||
17|04/04/1839 - 12/06/1842 ; Ve législature | ||
18|26/07/1842 - 06/07/1846 ; VIe législature | ||
19|17/08/1846 - 24/02/1848 ; VIIe législature | ||
20|04/05/1848 - 26/05/1849 ; Assemblée nationale constituante | ||
21|28/05/1849 - 02/12/1851 ; Assemblée nationale législative | ||
22|29/03/1852 - 27/11/1857 ; Ire législature | ||
23|28/11/1857 - 04/11/1863 ; IIe législature | ||
24|05/11/1863 - 27/04/1869 ; IIIe législature | ||
25|28/06/1869 - 04/09/1870 ; IVe législature | ||
26|12/02/1871 - 07/03/1876 ; | ||
27|08/03/1876 - 25/06/1877 ; Ire législature | ||
28|07/11/1877 - 27/10/1881 ; IIe législature | ||
29|28/10/1881 - 09/11/1885 ; IIIe législature | ||
30|10/11/1885 - 11/11/1889 ; IVe législature | ||
31|12/11/1889 - 14/10/1893 ; Ve législature | ||
32|15/10/1893 - 31/05/1898 ; VIe législature | ||
33|01/06/1898 - 31/05/1902 ; VIIe législature | ||
34|01/06/1902 - 31/05/1906 ; VIIIe législature | ||
35|01/06/1906 - 31/05/1910 ; IXe législature | ||
36|01/06/1910 - 31/05/1914 ; Xe législature | ||
37|01/06/1914 - 07/12/1919 ; XIe législature | ||
38|08/12/1919 - 31/05/1924 ; XIIe législature | ||
39|01/06/1924 - 31/05/1928 ; XIIIe législature | ||
40|01/06/1928 - 31/05/1932 ; XIVe législature | ||
41|01/06/1932 - 31/05/1936 ; XVe législature | ||
42|01/06/1936 - 31/05/1942 ; XVIe législature | ||
43|06/11/1945 - 10/06/1946 ; Ire Assemblée nationale constituante | ||
44|11/06/1946 - 27/11/1946 ; 2e Assemblée nationale constituante | ||
45|28/11/1946 - 04/07/1951 ; Ire législature | ||
46|05/07/1951 - 01/12/1955 ; IIe législature | ||
47|19/01/1956 - 08/12/1958 ; IIIe législature | ||
48|09/12/1958 - 09/10/1962 ; Ire législature | ||
49|06/12/1962 - 02/04/1967 ; IIe législature | ||
50|03/04/1967 - 30/05/1968 ; IIIe législature | ||
51|11/07/1968 - 01/04/1973 ; IVe législature | ||
52|02/04/1973 - 02/04/1978 ; Ve législature | ||
53|03/04/1978 - 22/05/1981 ; VIe législature | ||
54|02/07/1981 - 01/04/1986 ; VIIe législature | ||
55|02/04/1986 - 14/05/1988 ; VIIIe législature | ||
56|23/06/1988 - 01/04/1993 ; IXe législature | ||
57|02/04/1993 - 21/04/1997 ; Xe législature | ||
58|01/06/1997 - 18/06/2002 ; XIe législature | ||
59|19/06/2002 - 25/06/2007 ; XIIe législature | ||
60|26/06/2007 - 19/06/2012 ; XIIIe législature | ||
61|20/06/2012 - 20/06/2017 ; XIVe législature | ||
62|21/06/2017 - 17/04/2020 ; XVe législature |
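As a reading aid for the file above: each row holds a numeric identifier, then a single `legislature` field that packs the start date, the end date and an optional label. The sketch below shows one way to split that field, assuming the layout is always `DD/MM/YYYY - DD/MM/YYYY ; label`. The function name `parse_legislatures` and the inline `SAMPLE` string are illustrative and not part of the commit; the three sample rows are copied from the file.

```python
import csv
import io
from datetime import date, datetime

# Three rows copied from liste_legislatures.csv, used as inline sample data.
SAMPLE = """numero|legislature
26|12/02/1871 - 07/03/1876 ;
30|10/11/1885 - 11/11/1889 ; IVe législature
31|12/11/1889 - 14/10/1893 ; Ve législature
"""


def parse_legislatures(text):
    """Return {numero: (start, end, label)} from pipe-separated text.

    Hypothetical helper: assumes every row follows
    'DD/MM/YYYY - DD/MM/YYYY ; label', where the label may be empty.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        # Everything before ';' is the period, everything after is the label.
        period, _, label = row["legislature"].partition(";")
        start_s, end_s = (part.strip() for part in period.split(" - "))
        start = datetime.strptime(start_s, "%d/%m/%Y").date()
        end = datetime.strptime(end_s, "%d/%m/%Y").date()
        result[int(row["numero"])] = (start, end, label.strip())
    return result


legislatures = parse_legislatures(SAMPLE)
```

Row 26 has no label after the semicolon, so its label comes back as an empty string rather than raising an error.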
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
Files contained in this directory: | ||
|
||
* list of deputies of the 4th legislature (IVe législature) of the Third Republic: 10/11/1885 - 11/11/1889 (legislature30.csv) | ||
* list of deputies of the 5th legislature (Ve législature) of the Third Republic: 12/11/1889 - 14/10/1893 (legislature31.csv) | ||
* list of deputies of the 6th legislature (VIe législature) of the Third Republic: 15/10/1893 - 31/05/1898 (legislature32.csv) | ||
* list of deputies who sat during any of the three legislatures, without duplicates (legislature30_32.csv) | ||
* list of legislatures with the corresponding numeric identifier used on the Assemblée nationale website (liste_legislatures.csv) | ||
* the notebook containing the function used to retrieve the first three files (scrap_AN_legislatures.ipynb) |
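The README describes `legislature30_32.csv` as the union of the three per-legislature lists with duplicates removed. A minimal sketch of that merge using only the standard library is given here. It assumes that `id_depute` identifies a deputy uniquely, whereas the notebook itself drops rows that are identical in every column. The rows below are placeholders rather than real deputies, and `read_rows` and `merge_unique` are hypothetical helper names.

```python
import csv
import io

# Placeholder rows (not real deputies), pipe-separated like the files in this directory.
LEG_A = "id_depute|nom|prenom\n1|NOM_A|Prenom A\n2|NOM_B|Prenom B\n"
LEG_B = "id_depute|nom|prenom\n2|NOM_B|Prenom B\n3|NOM_C|Prenom C\n"


def read_rows(text):
    """Parse pipe-separated text into a list of dicts (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def merge_unique(*tables):
    """Merge row lists, keeping the first row seen for each id_depute."""
    seen = {}
    for rows in tables:
        for row in rows:
            # setdefault leaves an existing entry untouched, so repeats are skipped.
            seen.setdefault(row["id_depute"], row)
    return list(seen.values())


merged = merge_unique(read_rows(LEG_A), read_rows(LEG_B))
```

A deputy who sat in several legislatures then appears once in `merged`, in order of first appearance.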
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,176 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Web scraping the deputies of the Assemblée nationale (AN) for AGODA" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The function below takes a range of legislatures (see the file \"liste_legislatures.csv\" for the mapping between legislatures and their numbers) and writes the list of deputies who sat during those legislatures to a .csv file. For each deputy, the following fields are available:\n", | ||
"\n", | ||
"* their numeric identifier in the database published on the Assemblée nationale website\n", | ||
"* their surname\n", | ||
"* their given names\n", | ||
"* the link to their page on the Assemblée nationale website\n", | ||
"* their year of birth\n", | ||
"* their month of birth\n", | ||
"* their day of birth\n", | ||
"* their year of death\n", | ||
"* their month of death\n", | ||
"* their day of death" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"def scrap_AN_legislatures(debut, fin):\n", | ||
"    import requests\n", | ||
"    import bs4 as bs\n", | ||
"    import pandas as pd\n", | ||
"    debut = int(debut)\n", | ||
"    fin = int(fin) + 1  # exclusive upper bound for range()\n", | ||
"    legislatures = range(debut, fin)\n", | ||
"    url = \"http://www2.assemblee-nationale.fr/sycomore/liste/(legislature)/\"\n", | ||
"    df_complet = pd.DataFrame(columns=[\"id_depute\", \"nom\", \"prenom\", \"lien\", \"naissance_j\", \"naissance_m\", \"naissance_a\", \"deces_j\", \"deces_m\", \"deces_a\"])\n", | ||
"\n", | ||
"    for leg in legislatures:\n", | ||
"        mois = {\"janvier\": \"1\", \"février\": \"2\", \"mars\": \"3\", \"avril\": \"4\", \"mai\": \"5\", \"juin\": \"6\",\n", | ||
"                \"juillet\": \"7\", \"août\": \"8\", \"septembre\": \"9\", \"octobre\": \"10\", \"novembre\": \"11\", \"décembre\": \"12\"}\n", | ||
"        r = requests.get(url + str(leg))\n", | ||
"        pageWeb = r.text\n", | ||
"        localisation_depart = pageWeb.find(\"<table class=\\\"sycomore\\\">\")\n", | ||
"        localisation_fin = pageWeb.find(\"</table>\", localisation_depart)  # first closing tag after the table start\n", | ||
"        tableau_seul = pageWeb[localisation_depart:localisation_fin]\n", | ||
"        soup = bs.BeautifulSoup(tableau_seul, 'lxml')\n", | ||
"        all_tr = list(soup.find_all(\"tr\"))\n", | ||
"        all_tr_data = all_tr[1:]  # skip the header row\n", | ||
"\n", | ||
"        nom = []\n", | ||
"        prenom = []\n", | ||
"        lien = []\n", | ||
"        id_depute = []\n", | ||
"        date_naissance = []\n", | ||
"        date_deces = []\n", | ||
"\n", | ||
"        for data in all_tr_data:\n", | ||
"\n", | ||
"            id_depute.append(int(data.find('a').get('href').replace(\"/sycomore/fiche/(num_dept)/\", \"\")))\n", | ||
"            nom.append(data.td['data-sort'])\n", | ||
"            prenom.append(data.td.get_text().replace(data.find('strong').string, \"\").strip())\n", | ||
"            lien.append(\"http://www2.assemblee-nationale.fr\" + data.find('a').get('href'))\n", | ||
"            date_naissance.append(data.find_all('td')[1].string)\n", | ||
"            date_deces.append(data.find_all('td')[2].string)\n", | ||
"\n", | ||
"        data_dic = {\"id_depute\": id_depute,\n", | ||
"                    \"nom\": nom,\n", | ||
"                    \"prenom\": prenom,\n", | ||
"                    \"lien\": lien,\n", | ||
"                    \"date_naissance\": date_naissance,\n", | ||
"                    \"date_deces\": date_deces}\n", | ||
"\n", | ||
"        df = pd.DataFrame(data_dic)\n", | ||
"        df[[\"date_naissance\", \"naissance_m\", \"naissance_a\"]] = df[\"date_naissance\"].str.split(expand=True).replace(\"1er\", \"1\")\n", | ||
"        df[[\"date_deces\", \"deces_m\", \"deces_a\"]] = df[\"date_deces\"].str.split(expand=True).replace(\"1er\", \"1\")\n", | ||
"        df.rename(columns={\"date_naissance\": \"naissance_j\", \"date_deces\": \"deces_j\"}, inplace=True)\n", | ||
"        df = df[[\"id_depute\", \"nom\", \"prenom\", \"lien\", \"naissance_j\", \"naissance_m\", \"naissance_a\",\n", | ||
"                 \"deces_j\", \"deces_m\", \"deces_a\"]]\n", | ||
"\n", | ||
"        df[\"naissance_m\"] = df[\"naissance_m\"].map(mois, na_action='ignore')\n", | ||
"        df[\"deces_m\"] = df[\"deces_m\"].map(mois, na_action='ignore')\n", | ||
"        # Save the DataFrame holding this legislature's list of deputies locally\n", | ||
"        df.to_csv(\"legislature\" + str(leg) + \".csv\", sep=\"|\", index=False)\n", | ||
"        # Append this legislature's deputies to df_complet, dropping duplicate rows\n", | ||
"        df_complet = pd.concat([df_complet, df], ignore_index=True).drop_duplicates()\n", | ||
"\n", | ||
"    df_complet.to_csv(\"legislature\" + str(debut) + \"_\" + str(fin - 1) + \".csv\", sep=\"|\", index=False)\n" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In the list of legislatures, the following three\n", | ||
"\n", | ||
"* 10/11/1885 - 11/11/1889 ; IVe législature\n", | ||
"* 12/11/1889 - 14/10/1893 ; Ve législature\n", | ||
"* 15/10/1893 - 31/05/1898 ; VIe législature\n", | ||
"\n", | ||
"are numbered 30, 31 and 32 respectively.\n", | ||
"\n", | ||
"The call below produces the list of deputies for each legislature, as well as the overall list of deputies who sat during these legislatures, without duplicates.\n", | ||
"\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"scrap_AN_legislatures(30, 32)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.5" | ||
}, | ||
"latex_envs": { | ||
"LaTeX_envs_menu_present": true, | ||
"autoclose": false, | ||
"autocomplete": true, | ||
"bibliofile": "biblio.bib", | ||
"cite_by": "apalike", | ||
"current_citInitial": 1, | ||
"eqLabelWithNumbers": true, | ||
"eqNumInitial": 1, | ||
"hotkeys": { | ||
"equation": "Ctrl-E", | ||
"itemize": "Ctrl-I" | ||
}, | ||
"labels_anchors": false, | ||
"latex_user_defs": false, | ||
"report_style_numbering": false, | ||
"user_envs_cfg": false | ||
}, | ||
"toc": { | ||
"base_numbering": 1, | ||
"nav_menu": {}, | ||
"number_sections": true, | ||
"sideBar": true, | ||
"skip_h1_title": false, | ||
"title_cell": "Table of Contents", | ||
"title_sidebar": "Contents", | ||
"toc_cell": false, | ||
"toc_position": {}, | ||
"toc_section_display": true, | ||
"toc_window_display": false | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
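The notebook splits each birth or death date into day, month and year columns, mapping "1er" to "1" and French month names to numbers. The sketch below isolates that step in plain Python so it can be checked without network access. It assumes the site's date cells look like `1er août 1850` (three whitespace-separated parts), which is inferred from the notebook code rather than confirmed against the live page. `split_french_date` is a hypothetical helper, not a function from the commit.

```python
# French month names mapped to month numbers.
MOIS = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}


def split_french_date(text):
    """Return (day, month, year) for a string such as '1er août 1850'.

    Hypothetical helper: returns None when the cell is empty or does not
    have exactly three parts with a known month name.
    """
    if not text or not text.strip():
        return None
    parts = text.split()
    if len(parts) != 3:
        return None
    day_s, month_s, year_s = parts
    month = MOIS.get(month_s.lower())
    if month is None:
        return None
    # French writes the first day of a month as the ordinal '1er'.
    day = 1 if day_s == "1er" else int(day_s)
    return day, month, int(year_s)
```

Returning `None` for missing or unexpected values plays the same role as the missing values left by `na_action='ignore'` in the notebook.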