Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

robustness to relational assumptions #89

Open
mdsumner opened this issue Nov 27, 2018 · 4 comments
Open

robustness to relational assumptions #89

mdsumner opened this issue Nov 27, 2018 · 4 comments
Milestone

Comments

@mdsumner
Copy link
Member

mdsumner commented Nov 27, 2018

silicate is currently very naive in what it tries to do, and if there's a disconnected between the set of object$object_ and object$object_link_edge then conversions to SC0 and so on will fail (try SC0(x)).

Assuming:

  • parent table ids should be unique, effectively primary keys
  • all parent ids exist in link tables

we can get around both of these, by filtering out parents that don't get reference, and ignoring links that go nowhere - but there'll need to be some kind of formalism around this.

Reprex for a given example from osmdata_sc:

l
#> Error in eval(expr, envir, enclos): object 'l' not found
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
x <- opq ("hampi india") %>%
  add_osm_feature (key="historic", value="ruins") %>%
  osmdata_sc ()

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
nrow(distinct(x$object, object_)) == nrow(x$object)
#> [1] FALSE


setdiff(x$object$object_, x$object_link_edge$object_)
#>  [1] "339933896"  "339934121"  "340357917"  "340371192"  "340371784" 
#>  [6] "340371819"  "340372074"  "340372141"  "977472885"  "980400748" 
#> [11] "980400822"  "1148814767" "1427080593" "1427116083" "1676740758"
#> [16] "1676740761" "3456932077" "3907790727" "4066327806" "4550329208"


silicate::SC0(x)
#> Error in `[[<-.data.frame`(`*tmp*`, "topology_", value = list(`1` = structure(list(: replacement has 21 rows, data has 97
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Ubuntu 18.04.1 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Etc/UTC                     
#>  date     2018-11-27                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                           
#>  assertthat    0.2.0      2017-04-11 [2] CRAN (R 3.5.1)                   
#>  backports     1.1.2      2017-12-13 [2] CRAN (R 3.5.1)                   
#>  base64enc     0.1-3      2015-07-28 [2] CRAN (R 3.5.1)                   
#>  bindr         0.1.1      2018-03-13 [2] CRAN (R 3.5.1)                   
#>  bindrcpp      0.2.2      2018-03-29 [2] CRAN (R 3.5.1)                   
#>  callr         3.0.0.9001 2018-11-13 [2] Github (r-lib/callr@be7707d)     
#>  cli           1.0.1.9000 2018-11-13 [2] Github (r-lib/cli@56538e3)       
#>  crayon        1.3.4      2017-09-16 [2] CRAN (R 3.5.1)                   
#>  curl          3.2        2018-03-28 [2] CRAN (R 3.5.1)                   
#>  debugme       1.1.0      2017-10-22 [2] CRAN (R 3.5.1)                   
#>  desc          1.2.0      2018-11-13 [2] Github (r-lib/desc@7c12d36)      
#>  devtools      2.0.1      2018-10-26 [2] CRAN (R 3.5.1)                   
#>  digest        0.6.18     2018-10-10 [2] CRAN (R 3.5.1)                   
#>  dplyr       * 0.7.8      2018-11-10 [2] CRAN (R 3.5.1)                   
#>  evaluate      0.12       2018-10-09 [2] CRAN (R 3.5.1)                   
#>  fs            1.2.6      2018-08-23 [2] CRAN (R 3.5.1)                   
#>  gibble        0.1.2.9001 2018-11-27 [1] local                            
#>  glue          1.3.0      2018-07-17 [2] CRAN (R 3.5.1)                   
#>  htmltools     0.3.6      2017-04-28 [2] CRAN (R 3.5.1)                   
#>  httr          1.3.1      2017-08-20 [2] CRAN (R 3.5.1)                   
#>  jsonlite      1.5        2017-06-01 [2] CRAN (R 3.5.1)                   
#>  knitr         1.20       2018-02-20 [2] CRAN (R 3.5.1)                   
#>  lattice       0.20-38    2018-11-04 [4] CRAN (R 3.5.1)                   
#>  lubridate     1.7.4      2018-04-11 [2] CRAN (R 3.5.1)                   
#>  magrittr      1.5        2014-11-22 [2] CRAN (R 3.5.1)                   
#>  memoise       1.1.0      2017-04-21 [2] CRAN (R 3.5.1)                   
#>  osmdata     * 0.0.8.099  2018-11-27 [1] Github (ropensci/osmdata@94e377e)
#>  pillar        1.3.0      2018-07-14 [2] CRAN (R 3.5.1)                   
#>  pkgbuild      1.0.2      2018-10-16 [2] CRAN (R 3.5.1)                   
#>  pkgconfig     2.0.2      2018-08-16 [2] CRAN (R 3.5.1)                   
#>  pkgload       1.0.2      2018-10-29 [2] CRAN (R 3.5.1)                   
#>  prettyunits   1.0.2      2015-07-13 [2] CRAN (R 3.5.1)                   
#>  processx      3.2.0.9000 2018-11-13 [2] Github (r-lib/processx@8374340)  
#>  ps            1.2.1      2018-11-06 [2] CRAN (R 3.5.1)                   
#>  purrr         0.2.5      2018-05-29 [2] CRAN (R 3.5.1)                   
#>  R6            2.3.0      2018-10-04 [2] CRAN (R 3.5.1)                   
#>  Rcpp          1.0.0      2018-11-07 [2] CRAN (R 3.5.1)                   
#>  remotes       2.0.2      2018-10-30 [2] CRAN (R 3.5.1)                   
#>  rlang         0.3.0.1    2018-10-25 [2] CRAN (R 3.5.1)                   
#>  rmarkdown     1.10       2018-06-11 [2] CRAN (R 3.5.1)                   
#>  rprojroot     1.3-2      2018-01-03 [2] CRAN (R 3.5.1)                   
#>  rvest         0.3.2      2016-06-17 [2] CRAN (R 3.5.1)                   
#>  sessioninfo   1.1.1      2018-11-05 [2] CRAN (R 3.5.1)                   
#>  silicate      0.1.5.9001 2018-11-27 [1] local                            
#>  sp            1.3-1      2018-06-05 [2] CRAN (R 3.5.1)                   
#>  stringi       1.2.4      2018-07-20 [2] CRAN (R 3.5.1)                   
#>  stringr       1.3.1      2018-05-10 [2] CRAN (R 3.5.1)                   
#>  testthat      2.0.1      2018-10-13 [2] CRAN (R 3.5.1)                   
#>  tibble        1.4.2      2018-01-22 [2] CRAN (R 3.5.1)                   
#>  tidyr         0.8.2      2018-10-28 [2] CRAN (R 3.5.1)                   
#>  tidyselect    0.2.5      2018-10-11 [2] CRAN (R 3.5.1)                   
#>  usethis       1.4.0      2018-08-14 [2] CRAN (R 3.5.1)                   
#>  withr         2.1.2      2018-03-15 [2] CRAN (R 3.5.1)                   
#>  xml2          1.2.0      2018-01-24 [2] CRAN (R 3.5.1)                   
#>  yaml          2.2.0      2018-07-25 [2] CRAN (R 3.5.1)                   
#> 
#> [1] /perm_storage/home/mdsumner/R/x86_64-pc-linux-gnu-library/3.5
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

Created on 2018-11-27 by the reprex package (v0.2.1)

@mdsumner mdsumner added this to the 0.2.0 milestone Nov 27, 2018
@mpadge
Copy link
Member

mpadge commented Nov 27, 2018

A superficial fix for the start of some of the issues here is in this remote commit. That at least introduces an initial degree of robustness that I think should be present, although of course this has effects that will still need to be processed downstream because the resultant table will have empty topology_ entries.

There are also of course larger issues at play. I do concur with your general assumptions / visions up until now that the object and object_link_edge tables should generally accord. I do think this also makes sense in the osmdata_sc context, and can mostly be reconciled by ensuring that every object_link_edge$object_ is also present in the object table itself. This will not necessarily be so, because my osmdata_sc vision to now has been of the object table more as equivalent to a "feature" matrix so that, for example, SC(an_sfc_list) would yield an empty object table. I can easily reconcile that with your vision by ensuring that nrow(SC(an_sfc_list)$object) == nrow(SC(an_sfc_list)$object_link_edge). This requires simply inserting some effectively empty entries in the object table, much as the commit linked above does for SC0.

It does not, however, solve a coupla larger issues.

  1. osmdata_sc will still generate a lot of duplicated object$object_ entries for reasons described in the osmdata_sc issue, so in general nrow(x$object) will be quite a bit more than nrow(x$object_link_edge), and I don't see any way around that (all sphier-like considerations and whatnot). That of course may make it tricky when joining object and object_link_edge tables, but the object table could easily be spread prior to that act, I'd suggest.
  2. (This one is entirely my fault, no criticism at all, but ...) I still don't see the real logic behind
identical(SC(minimal_mesh), SC(minimal_mesh) %>% SC()) # FALSE

? I need that vision to really be able to work towards a solution here.

@mdsumner
Copy link
Member Author

mdsumner commented Nov 27, 2018

They are randomly generate UIDs, no two will ever be the same (though SC0 ought to be, with some caveats around coordinate precision/identity).

identical(SC(minimal_mesh), SC(minimal_mesh))
# [1] FALSE

set.seed(10)
x <- SC(minimal_mesh)
set.seed(10)
identical(x, SC(minimal_mesh))
[1] TRUE

@mpadge
Copy link
Member

mpadge commented Nov 27, 2018

Oh yeah, stupid me, of course. Ta! Oh no, but hey, this is going to affect the osmdata_sc kind of stuff, because those IDs should be enduring and propagated throughout. I'll delve in myself there ...

@mdsumner
Copy link
Member Author

There is a missing SC.SC <- function(x, ...) x - the other models have that shortcut - but I guess you meant more generally as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants