Main problem: Nodes appear in order of data frame under some conditions (such as symmetric flows) but under unknown conditions (some asymmetric flows, but not all), they appear out of order according to other, unknown rules. Manual positioning using node.x and node.y also has unclear rules. I'm trying to work around the lack of a sorting feature but hitting snags all over the place.
Forgive me, I'm rather new to plotly and don't understand how plotly.R interacts with python or js plotly. In trying to solve this problem, I see Issue #4373 for plotly.js describes lack of a sort feature and Issue #3002 for plotly.py states that node.x and node.y cannot be 0.
My use case is that I want to produce a large set of sankey graphs for flows between 5 specific nodes at Time1 and 5 specific nodes at Time2. For this reason, I would like my nodes to be drawn in the same order every time, no matter the size of the nodes or flows. I wrote a script to dynamically find the correct node.y positions for nodes based on their order and size. Even this workaround is running into problems, as noted in the code below.
More broadly, why is the data frame order of the nodes being overridden, such as in the varying_flows example (fig1) below?
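A minimal base R sketch of the position calculation described above. `node_midpoints` is a hypothetical helper, not a plotly function. It assumes node.y is each node's midpoint measured from the top of the plot (which is what the workaround below ended up needing, not something the reference page confirms), it ignores the padding drawn between nodes, and it nudges values away from 0 and 1 because exact zeros are reported to be ignored.

```r
# Hypothetical helper (not part of plotly): midpoint of each node in one
# column, as a fraction of that column's total flow, in data frame order.
node_midpoints <- function(sizes, eps = 1e-3) {
  stopifnot(is.numeric(sizes), all(sizes >= 0), sum(sizes) > 0)
  mid <- (cumsum(sizes) - sizes / 2) / sum(sizes)
  # Assumption: keep values strictly inside (0, 1) to avoid exact zeros
  pmin(pmax(mid, eps), 1 - eps)
}

# Totals leaving source nodes 0..4 in the varying_flows data of this issue
source_totals <- c(93, 68, 20, 152, 21)
source_y <- node_midpoints(source_totals)
source_y
```

The same function could be applied to the target-node totals, and the two results concatenated in node order before being passed as node.y.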
library(plotly)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout
library(tidyverse)
my_labels<-
c(
"Node 0",
"Node 1",
"Node 2",
"Node 3",
"Node 4",
"Node 5",
"Node 6",
"Node 7",
"Node 8",
"Node 9"
)
# Uses original data, which includes some flows much larger than others
source_ids<-
c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
target_ids<-
c(5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9)
varying_flows<-
c(60, 23, 1, 0, 9, 15, 33, 13, 4, 3, 0, 9, 8, 2, 1, 0, 4, 12, 127, 9, 4, 4, 1, 11, 1)
my_varying_flows<-data.frame(source_ids, target_ids, varying_flows)
fig1<- plot_ly(
type="sankey",
arrangement="snap",
node=list(
label=my_labels),
link=list(
source=my_varying_flows$source_ids,
target=my_varying_flows$target_ids,
value=my_varying_flows$varying_flows))
fig1<-fig1 %>%
layout(
title=list(
text="fig1 - varying flows out of order"
)
)
# Nodes do not appear in intended order. Node 3, the largest node, appears below
# Node 4, and the right hand nodes are also out of order.
fig1
# Build a new set of test data with even, identical flows
even_flows<- rep(10, times=25)
my_even_flows<-data.frame(source_ids, target_ids, even_flows)
fig2<- plot_ly(
type="sankey",
arrangement="snap",
node=list(
label=my_labels),
link=list(
source=my_even_flows$source_ids,
target=my_even_flows$target_ids,
value=my_even_flows$even_flows))
fig2<-fig2 %>%
layout(
title=list(
text="fig2 - even flows in order"
)
)
# Displays nodes in intended order, apparently because something behind the
# scenes likes the even flows and keeps the default arrangement.
fig2
# Workaround to dynamically determine node.y positions relative to size of nodes
# and sorting order in original data. But even this behaves in unexpected ways,
# and in the node.y argument we need to take the complement of them (i.e., 1 -
# the value generated here).
label_pos_dfs<-list(
# Label positions of source node labels
my_varying_flows %>%
group_by(source_ids) %>%
summarize(n= sum(varying_flows)) %>%
rename(node.name=source_ids) %>%
mutate(label.pos=1- (cumsum(n) -n/2) / sum(n)),
# Label positions of target node labels
my_varying_flows %>%
group_by(target_ids) %>%
summarize(n= sum(varying_flows)) %>%
rename(node.name=target_ids) %>%
mutate(label.pos=1- (cumsum(n) -n/2) / sum(n))
)
my_node_label_y_positions<-
lapply(label_pos_dfs, "[", "label.pos") %>%
bind_rows() %>%
pull(label.pos)
fig3<- plot_ly(
type="sankey",
arrangement="snap",
node=list(
label=my_labels,
# Avoiding 0 values seemed to help
x= c(1e-03, 1e-03, 1e-03, 1e-03, 1e-03, 1, 1, 1, 1, 1),
# Not clear to me why these didn't work and we instead need their
# complements (e.g., 1 - original value) for correct placement, as if the
# node.y positions were the distance from the top, not the bottom?
y=my_node_label_y_positions*-1+1),
link=list(
source=my_varying_flows$source_ids,
target=my_varying_flows$target_ids,
value=my_varying_flows$varying_flows))
fig3<-fig3 %>%
layout(
title=list(
text="fig3 - varying flows in intended order with odd workaround!"
)
)
# Nodes appear in intended order.
# fig3
I would like to thank @even-of-the-hour for sharing this. This is a great solution for sorting the nodes exactly how you want them. I have added to this code to accommodate 3 levels and wanted to share in case it helps out anyone else.
# Creating dummy data; 3 levels, each with 4 nodes.
d1<-
c(0:3) %>% rep(4) %>% rep(4) %>% sort
d2<-
c(4:7) %>% rep(4) %>% sort %>% rep(4)
d3<-
c(8:11) %>% sort %>% rep(4) %>% rep(4)
varying_flows<- rpois(64,0.25)
my_labels<- paste0("Node ", 1:12)
my_varying_flows<-data.frame(d1, d2, d3, varying_flows)
# Convert the data to the format required by sankey function
for(i in 1:2){
group1<- c("d1","d2")[i]
group2<- c("d2","d3")[i]
my_varying_flows_thick<-my_varying_flows %>% group_by(!!as.name(group1),!!as.name(group2)) %>% summarise(sum(varying_flows))
colnames(my_varying_flows_thick) <- c("source", "target", "thickness")
source.label.pos<-my_varying_flows_thick %>%
group_by(source) %>%
summarize(n= sum(thickness)) %>%
mutate(source.label.pos=1- (cumsum(n) -n/2) / (sum(n)))
target.label.pos<-my_varying_flows_thick %>%
group_by(target) %>%
summarize(n= sum(thickness)) %>%
mutate(target.label.pos=1- (cumsum(n) -n/2) / (sum(n)))
my_varying_flows_thick$source.label.pos<-source.label.pos$source.label.pos[match(my_varying_flows_thick$source, source.label.pos$source)]
my_varying_flows_thick$target.label.pos<-target.label.pos$target.label.pos[match(my_varying_flows_thick$target, target.label.pos$target)]
if(i==1){
my_varying_flows_data<-my_varying_flows_thick; next
}
my_varying_flows_data<- rbind(my_varying_flows_data,my_varying_flows_thick)
}
# Calculate x,y position
node_x<- sort(rep(c(0:2),4))/2+ c(rep(0.001, 4), rep(0,8))
node_y<-my_varying_flows_data[,c("source","source.label.pos")] %>% group_by() %>% unique %>%
select(source.label.pos) %>%
unlist %>% as.numeric
node_y<- c(node_y,my_varying_flows_data[,c("target","target.label.pos")] %>% group_by() %>%
filter(!target%in%my_varying_flows_data$source) %>% unique %>%
select(target.label.pos) %>%
unlist %>% as.numeric)
node_y<-node_y*-1+ max(node_y)
node_y<-node_y %>% round(3)
node_y[node_y==0] <-0.001
node_y
# Plot
fig4<- plot_ly(
type="sankey",
arrangement="snap",
node=list(
label=my_labels,
# Avoiding 0 values seemed to help
x=node_x,
# Not clear to me why these didn't work and we instead need their
# complements (e.g., 1 - original value) for correct placement, as if the
# node.y positions were the distance from the top, not the bottom?
y=node_y
),
link=list(
source=my_varying_flows_data$source,
target=my_varying_flows_data$target,
value=my_varying_flows_data$thickness
)
)
fig4<-fig4 %>%
layout(
title=list(
text="fig4 - varying flows in intended order with odd workaround; 3 levels"
)
)
# Nodes appear in intended order.
fig4
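The node_x line in the code above hard-codes three levels of four nodes. A sketch of how that step might be generalised in base R follows; `node_x_for_levels` is a hypothetical helper, not a plotly function, and it assumes levels are spread evenly across the plot with only an exact 0 nudged away, as in the code above.

```r
# Hypothetical helper (not part of plotly): one x position per node, with
# levels spread evenly from left to right.
node_x_for_levels <- function(nodes_per_level, eps = 1e-3) {
  stopifnot(length(nodes_per_level) >= 2, all(nodes_per_level >= 1))
  x <- seq(0, 1, length.out = length(nodes_per_level))
  # Assumption carried over from this thread: an exact 0 is ignored
  x[x == 0] <- eps
  rep(x, times = nodes_per_level)
}

# Three levels with four nodes each, as in the dummy data above
level_x <- node_x_for_levels(c(4, 4, 4))
level_x
```

Passing a vector of counts instead of a single number allows levels with different numbers of nodes, provided the nodes are listed level by level in the label vector.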
Minimally, I guess I'm looking for more detailed documentation about node.x and node.y compared to what is currently in the reference page.
Created on 2022-01-29 by the reprex package (v2.0.1)