Commit 4cf94ac

🐟 🦈 🐠 🦀 🐙 🦑
1 parent 2f6a66f commit 4cf94ac

4 files changed

+118
-205
lines changed

NEWS.md

Lines changed: 26 additions & 1 deletion
@@ -23,10 +23,35 @@ And constructed with the following guidelines:
 
 For more information on SemVer, please visit http://semver.org/.
 
+v 5.0.0
+-------
+
+Another streamlined re-design following new abilities for data hosting and access.
+This release relies on a HuggingFace datasets hosting for data and metadata hosting
+in parquet and schema.org.
+
+Data access is simplified to use the simple HuggingFace datasets API instead
+of the previous contentid-based resolution. This allows metadata to be defined
+with directly alongside the data platform independent of the R package.
+
+A simplified access protocol relies on `duckdbfs` for direct reads of tables.
+Several functions previously used only to manage connections are now deprecated
+or removed, along with a significant number of dependencies.
+
+Core use still centers around the same package API using the `fb_tbl()` function,
+with legacy helper functions for common tables like `species()` are still accessible and
+can still optionally filter by species name where appropriate. As before, loading the
+full tables and sub-setting manually is still recommended.
+
+Historic helper functions like `load_taxa()` (combining the taxonomic classification from Species,
+Genus, Family and Order tables), `validate_names()`, and `common_to_sci()` and
+`sci_to_common()` should be in working order, all using table-based outputs.
+
+
 v 4.1.1
 -------
 
-* hotfix for bug in 4.1.0 on Windows -- duckdb httpfs on windows creates sigfault
+* hotfix for bug in 4.1.0 on Windows -- `duckdb` `httpfs` on windows created `segfault`
 
 v 4.1.0
 --------

README.Rmd

Lines changed: 22 additions & 56 deletions
@@ -13,9 +13,30 @@ output: github_document
 [![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/rfishbase)](https://github.com/r-hub/cranlogs.app)
 <!-- badges: end -->
 
+Welcome to `rfishbase 5`! This is the fourth rewrite of the original `rfishbase` package described in [Boettiger et al. (2012)](https://doi.org/10.1111/j.1095-8649.2012.03464.x).
 
 
-Welcome to `rfishbase 4`. This is the fourth rewrite of the original `rfishbase` package described in [Boettiger et al. (2012)](https://doi.org/10.1111/j.1095-8649.2012.03464.x).
+Another streamlined re-design following new abilities for data hosting and access.
+This release relies on a HuggingFace datasets hosting for data and metadata hosting
+in parquet and schema.org.
+
+Data access is simplified to use the simple HuggingFace datasets API instead
+of the previous contentid-based resolution. This allows metadata to be defined
+with directly alongside the data platform independent of the R package.
+
+A simplified access protocol relies on `duckdbfs` for direct reads of tables.
+Several functions previously used only to manage connections are now deprecated
+or removed, along with a significant number of dependencies.
+
+Core use still centers around the same package API using the `fb_tbl()` function,
+with legacy helper functions for common tables like `species()` are still accessible and
+can still optionally filter by species name where appropriate. As before, loading the
+full tables and sub-setting manually is still recommended.
+
+Historic helper functions like `load_taxa()` (combining the taxonomic classification from Species,
+Genus, Family and Order tables), `validate_names()`, and `common_to_sci()` and
+`sci_to_common()` should be in working order, all using table-based outputs.
+
 
 - `rfishbase 1.0` relied on parsing of XML pages served directly from Fishbase.org.
 - `rfishbase 2.0` relied on calls to a ruby-based API, `fishbaseapi`, that provided access to SQL snapshots of about 20 of the more popular tables in FishBase or SeaLifeBase.
@@ -91,61 +112,6 @@ available_releases()
 ```
 
 
-
-## Low-memory environments
-
-If you have very limited RAM (e.g. <= 1 GB available) it may be helpful to use `fishbase` tables in remote form by setting `collect = FALSE`. This allows the tables to remain on disk, while the user is still able to use almost all `dplyr` functions (see the `dbplyr` vignette). Once the table is appropriately subset, the user will need to call `dplyr::collect()` to use generic non-dplyr functions, such as plotting commands.
-
-```{r}
-fb_tbl("occurrence")
-```
-
-
-## Local copy
-
-Set the option "rfishbase_local_db" = TRUE to create a local copy, otherwise will use a remote copy.
-Local copy will get better performance after initial import, but may experience conflicts when
-`duckdb` is upgraded or when multiple sessions attempt to access the directory. Remove the default
-storage directory (given by `db_dir()`) after upgrading duckdb if using a local copy.
-
-```{r}
-options("rfishbase_local_db" = TRUE)
-db_disconnect() # close previous remote connection
-
-conn <- fb_conn()
-conn
-```
-
-Users can trigger a one-time download of all fishbase tables (or a list of desired tables) using `fb_import()`. This will ensure later use of any function can operate smoothly even when no internet connection is available. Any table already downloaded will not be re-downloaded. (Note: `fb_import()` also returns a remote duckdb database connection to the tables, for users who prefer to work with the remote data objects.)
-
-```{r}
-fb_import()
-```
-
-
-
-```{r include=FALSE}
-db_disconnect(conn)
-```
-
-
-
-## Interactive RStudio pane
-
-RStudio users can also browse all fishbase tables interactively in the RStudio connection browser by using the function `fisbase_pane()`. Note that this function will first download a complete set of the fishbase tables.
-
-## Backwards compatibility
-
-
-`rfishbase` 4.0 tries to maintain as much backwards compatibility as possible with rfishbase 3.0. Because parquet preserves native data types, some encoded types may differ from earlier versions. As before, these are not always the native type -- e.g. fishbase encodes some boolean (logical TRUE/FALSE) values as integer (-1, 0) or character types. Use `as.logical()` to coerce into the appropriate type in that case.
-
-Toggling between fishbase and sealifebase servers using an environmental variable, `FISHBASE_API`, is now deprecated.
-
-Note that fishbase will store downloaded files by hash in the app directory, given by `db_dir()`. The default location can be set by configuring the desired path in the environmental variable, `FISHBASE_HOME`.
-
-
-
-
 -----------
 
 Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/). By contributing to this project, you agree to abide by its terms.

README.md

Lines changed: 65 additions & 133 deletions
@@ -3,7 +3,7 @@
 
 <!-- badges: start -->
 
-[![R-CMD-check](https://github.com/ropensci/rfishbase/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/rfishbase/actions)
+[![R-CMD-check](https://github.com/ropensci/rfishbase/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ropensci/rfishbase/actions/workflows/R-CMD-check.yaml)
 [![Coverage
 status](https://codecov.io/gh/ropensci/rfishbase/branch/master/graph/badge.svg)](https://codecov.io/github/ropensci/rfishbase?branch=master)
 [![Onboarding](https://badges.ropensci.org/137_status.svg)](https://github.com/ropensci/software-review/issues/137)
@@ -12,10 +12,35 @@ status](https://www.r-pkg.org/badges/version/rfishbase)](https://cran.r-project.
 [![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/rfishbase)](https://github.com/r-hub/cranlogs.app)
 <!-- badges: end -->
 
-Welcome to `rfishbase 4`. This is the fourth rewrite of the original
+Welcome to `rfishbase 5`! This is the fourth rewrite of the original
 `rfishbase` package described in [Boettiger et
 al. (2012)](https://doi.org/10.1111/j.1095-8649.2012.03464.x).
 
+Another streamlined re-design following new abilities for data hosting
+and access. This release relies on a HuggingFace datasets hosting for
+data and metadata hosting in parquet and schema.org.
+
+Data access is simplified to use the simple HuggingFace datasets API
+instead of the previous contentid-based resolution. This allows metadata
+to be defined with directly alongside the data platform independent of
+the R package.
+
+A simplified access protocol relies on `duckdbfs` for direct reads of
+tables. Several functions previously used only to manage connections are
+now deprecated or removed, along with a significant number of
+dependencies.
+
+Core use still centers around the same package API using the `fb_tbl()`
+function, with legacy helper functions for common tables like
+`species()` are still accessible and can still optionally filter by
+species name where appropriate. As before, loading the full tables and
+sub-setting manually is still recommended.
+
+Historic helper functions like `load_taxa()` (combining the taxonomic
+classification from Species, Genus, Family and Order tables),
+`validate_names()`, and `common_to_sci()` and `sci_to_common()` should
+be in working order, all using table-based outputs.
+
 - `rfishbase 1.0` relied on parsing of XML pages served directly from
 Fishbase.org.
 - `rfishbase 2.0` relied on calls to a ruby-based API, `fishbaseapi`,
@@ -57,24 +82,23 @@ function:
 fb_tbl("ecosystem")
 ```
 
-# A tibble: 157,870 × 18
-autoctr E_CODE Ecosy…¹ Specc…² Stock…³ Status Curre…⁴ Abund…⁵ LifeS…⁶ Remarks
-<int> <int> <int> <int> <int> <chr> <chr> <chr> <chr> <chr>
-1 1 1 50628 549 565 native Present <NA> adults <NA>
-2 2 1 189 552 568 native Present <NA> adults <NA>
-3 3 1 189 554 570 native Present <NA> adults <NA>
-4 4 1 79732 873 889 native Present <NA> adults <NA>
-5 5 1 5217 948 964 native Present <NA> adults <NA>
-6 7 1 39852 956 972 native Present <NA> adults <NA>
-7 8 1 39852 957 973 native Present <NA> adults <NA>
-8 9 1 39852 958 974 native Present <NA> adults <NA>
-9 10 1 188 1526 1719 native Present <NA> adults <NA>
-10 11 1 188 1626 1819 native Present <NA> adults <NA>
-# … with 157,860 more rows, 8 more variables: Entered <int>,
-# Dateentered <dttm>, Modified <int>, Datemodified <dttm>, Expert <int>,
-# Datechecked <dttm>, WebURL <chr>, TS <dttm>, and abbreviated variable names
-# ¹EcosystemRefno, ²Speccode, ³Stockcode, ⁴CurrentPresence, ⁵Abundance,
-# ⁶LifeStage
+# A tibble: 160,334 × 18
+autoctr E_CODE EcosystemRefno Speccode Stockcode Status CurrentPresence
+<int> <int> <int> <int> <int> <chr> <chr>
+1 1 1 50628 549 565 native Present
+2 2 1 189 552 568 native Present
+3 3 1 189 554 570 native Present
+4 4 1 79732 873 889 native Present
+5 5 1 5217 948 964 native Present
+6 7 1 39852 956 972 native Present
+7 8 1 39852 957 973 native Present
+8 9 1 39852 958 974 native Present
+9 10 1 188 1526 1719 native Present
+10 11 1 188 1626 1819 native Present
+# ℹ 160,324 more rows
+# ℹ 11 more variables: Abundance <chr>, LifeStage <chr>, Remarks <chr>,
+# Entered <int>, Dateentered <dttm>, Modified <int>, Datemodified <dttm>,
+# Expert <int>, Datechecked <dttm>, WebURL <chr>, TS <dttm>
 
 You can see all the tables using `fb_tables()` to see a list of all the
 table names (specify `sealifebase` if desired). Careful, there are a lot
@@ -115,26 +139,26 @@ parallels the database structure of Fishbase. As such, almost all
 fb_tbl("species", "sealifebase")
 ```
 
-# A tibble: 103,169 × 109
-SpecCode Genus Species Author Speci…¹ FBname FamCode Subfa…² GenCode TaxIs…³
-<int> <chr> <chr> <chr> <int> <chr> <int> <chr> <int> <int>
-1 10217 Abyss… cidaris Poore3113 <NA> 512 <NA> 9280 0
-2 10218 Abyss… panope Poore3113 <NA> 512 <NA> 9280 0
-3 90399 Abyss… averin… Kussa3113 <NA> 502 <NA> 17490 0
-4 52610 Abyss… millari Monni… 2585 <NA> 978 <NA> 9281 0
-5 52611 Abyss… wyvill… Herdm2892 <NA> 978 <NA> 9281 0
-6 138684 Abyss… planus (Slad81020 <NA> 1615 <NA> 24229 0
-7 90400 Abyss… acutil… Doti … 3113 <NA> 587 <NA> 9282 0
-8 10219 Abyss… argent… Menzi3113 <NA> 587 <NA> 9282 0
-9 10220 Abyss… bathya… Just,3113 <NA> 587 <NA> 9282 0
-10 10221 Abyss… dentif… Menzi3113 <NA> 587 <NA> 9282 0
-# … with 103,159 more rows, 99 more variables: Remark <chr>,
-# PicPreferredName <chr>, PicPreferredNameM <chr>, PicPreferredNameF <chr>,
-# PicPreferredNameJ <chr>, Source <chr>, AuthorRef <int>, SubGenCode <int>,
-# Fresh <int>, Brack <int>, Saltwater <int>, Land <int>, BodyShapeI <chr>,
-# DemersPelag <chr>, AnaCat <chr>, MigratRef <int>, DepthRangeShallow <int>,
-# DepthRangeDeep <int>, DepthRangeRef <int>, DepthRangeComShallow <int>,
-# DepthRangeComDeep <int>, DepthComRef <int>, LongevityWild <dbl>, …
+# A tibble: 102,464 × 111
+SpecCode Genus Species Author SpeciesRefNo FBname FamCode Subfamily GenCode
+<int> <chr> <chr> <chr> <int> <chr> <int> <chr> <int>
+1 57969 Abdopus horrid… (D'Or 96968 Red S… 1890 Octopodi… 24384
+2 57836 Abdopus tenebr… (Smit 19 <NA> 1890 Octopodi… 24384
+3 57142 Abdopus tongan… (Hoyl 19 <NA> 1890 Octopodi… 24384
+4 2381155 Abdopus undula… Huffa… 84307 <NA> 1890 <NA> 24384
+5 14647 Abebai… troglo… Vande 19 <NA> 572 <NA> 9260
+6 165283 Aberom… muranoi Baces 104101 <NA> 616 <NA> 33537
+7 140720 Aberra… banyul… Macki… 85340 <NA> 174 <NA> 9262
+8 40346 Aberra… enigma… unspe 19 <NA> 174 <NA> 9262
+9 20199 Aberra… aberra… (Barn 19 <NA> 308 <NA> 9263
+10 93706 Aberro… verruc… Kasat 3696 <NA> 922 <NA> 17969
+# ℹ 102,454 more rows
+# ℹ 102 more variables: TaxIssue <int>, Remark <chr>, PicPreferredName <chr>,
+# PicPreferredNameM <chr>, PicPreferredNameF <chr>, PicPreferredNameJ <chr>,
+# Source <chr>, AuthorRef <int>, SubGenCode <int>, Fresh <int>, Brack <int>,
+# Saltwater <int>, Land <int>, BodyShapeI <chr>, DemersPelag <chr>,
+# Amphibious <chr>, AmphibiousRef <int>, AnaCat <chr>, MigratRef <int>,
+# DepthRangeShallow <int>, DepthRangeDeep <int>, DepthRangeRef <int>, …
 
 ## Versions and importing all tables
 
@@ -147,99 +171,7 @@ fishbase.org. Check available releases:
 available_releases()
 ```
 
-[1] "23.01" "21.06" "19.04"
-
-## Low-memory environments
-
-If you have very limited RAM (e.g. \<= 1 GB available) it may be helpful
-to use `fishbase` tables in remote form by setting `collect = FALSE`.
-This allows the tables to remain on disk, while the user is still able
-to use almost all `dplyr` functions (see the `dbplyr` vignette). Once
-the table is appropriately subset, the user will need to call
-`dplyr::collect()` to use generic non-dplyr functions, such as plotting
-commands.
-
-``` r
-fb_tbl("occurrence")
-```
-
-# A tibble: 1,097,303 × 106
-catnum2 OccurrenceR…¹ SpecC…² Syncode Stock…³ Genus…⁴ Speci…⁵ ColName PicName
-<int> <int> <int> <int> <int> <chr> <chr> <chr> <chr>
-1 34424 36653 227 22902 241 "Megal… "cypri… "Megal… ""
-2 95154 45880 NA NA NA "" "" "" ""
-3 97606 45880 NA NA NA "" "" "" ""
-4 100025 45880 5520 25676 5809 "Johni… "belan… "" ""
-5 98993 45880 5676 16650 5969 "Chrom… "retro… "" ""
-6 99316 45880 454 23112 468 "Drepa… "punct… "" ""
-7 99676 45880 5388 145485 5647 "Gymno… "bosch… "" ""
-8 99843 45880 16813 119925 15264 "Hemir… "balin… "" ""
-9 100607 45880 8288 59635 8601 "Ostra… "rhino… "" ""
-10 101529 45880 NA NA NA "Scomb… "toloo… "" ""
-# … with 1,097,293 more rows, 97 more variables: CatNum <chr>, URL <chr>,
-# Station <chr>, Cruise <chr>, Gazetteer <chr>, LocalityType <chr>,
-# WaterDepthMin <dbl>, WaterDepthMax <dbl>, AltitudeMin <int>,
-# AltitudeMax <int>, LatitudeDeg <int>, LatitudeMin <dbl>, NorthSouth <chr>,
-# LatitudeDec <dbl>, LongitudeDeg <int>, LongitudeMIn <dbl>, EastWest <chr>,
-# LongitudeDec <dbl>, Accuracy <chr>, Salinity <chr>, LatitudeTo <dbl>,
-# LongitudeTo <dbl>, LatitudeDegTo <int>, LatitudeMinTo <dbl>, …
-
-## Local copy
-
-Set the option “rfishbase_local_db” = TRUE to create a local copy,
-otherwise will use a remote copy. Local copy will get better performance
-after initial import, but may experience conflicts when `duckdb` is
-upgraded or when multiple sessions attempt to access the directory.
-Remove the default storage directory (given by `db_dir()`) after
-upgrading duckdb if using a local copy.
-
-``` r
-options("rfishbase_local_db" = TRUE)
-db_disconnect() # close previous remote connection
-
-conn <- fb_conn()
-conn
-```
-
-<duckdb_connection 5fa20 driver=<duckdb_driver 543a0 dbdir='/home/cboettig/.local/share/R/rfishbase/fishbase_23.01' read_only=FALSE bigint=numeric>>
-
-Users can trigger a one-time download of all fishbase tables (or a list
-of desired tables) using `fb_import()`. This will ensure later use of
-any function can operate smoothly even when no internet connection is
-available. Any table already downloaded will not be re-downloaded.
-(Note: `fb_import()` also returns a remote duckdb database connection to
-the tables, for users who prefer to work with the remote data objects.)
-
-``` r
-fb_import()
-```
-
-<duckdb_connection 5fa20 driver=<duckdb_driver 543a0 dbdir='/home/cboettig/.local/share/R/rfishbase/fishbase_23.01' read_only=FALSE bigint=numeric>>
-
-## Interactive RStudio pane
-
-RStudio users can also browse all fishbase tables interactively in the
-RStudio connection browser by using the function `fisbase_pane()`. Note
-that this function will first download a complete set of the fishbase
-tables.
-
-## Backwards compatibility
-
-`rfishbase` 4.0 tries to maintain as much backwards compatibility as
-possible with rfishbase 3.0. Because parquet preserves native data
-types, some encoded types may differ from earlier versions. As before,
-these are not always the native type – e.g. fishbase encodes some
-boolean (logical TRUE/FALSE) values as integer (-1, 0) or character
-types. Use `as.logical()` to coerce into the appropriate type in that
-case.
-
-Toggling between fishbase and sealifebase servers using an environmental
-variable, `FISHBASE_API`, is now deprecated.
-
-Note that fishbase will store downloaded files by hash in the app
-directory, given by `db_dir()`. The default location can be set by
-configuring the desired path in the environmental variable,
-`FISHBASE_HOME`.
+[1] "19.04" "21.06" "23.01" "23.05" "24.07"
 
 ------------------------------------------------------------------------
 
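The README text added in this commit recommends loading full tables and sub-setting them manually. A minimal sketch of that pattern in base R follows. It uses a small made-up data frame as a stand-in for the result of `fb_tbl("species")`, so it runs without the package or network access. The column names `SpecCode`, `Genus`, and `Species` come from the output shown in the diff; the rows and the names `GenusA`/`GenusB` are hypothetical.

```r
# Hypothetical stand-in for the table returned by fb_tbl("species").
# Column names follow the README output; the rows are invented for illustration.
species_tbl <- data.frame(
  SpecCode = c(1L, 2L, 3L),
  Genus    = c("GenusA", "GenusA", "GenusB"),
  Species  = c("alpha", "beta", "gamma"),
  stringsAsFactors = FALSE
)

# Subset the full table manually rather than relying on a helper's name filter.
genus_a <- species_tbl[species_tbl$Genus == "GenusA", ]
print(genus_a)
```

With the package installed, `species_tbl` would presumably come from `fb_tbl("species")` instead, and the same subsetting step would apply unchanged.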