Prework - Edgelist-to-Network
Converting empirical data to networks
This workshop is focused on the integration of empirical data and theory in pollination networks. An important first step toward that integration is to know how to take empirical data and convert them into the standardized matrix format that is used in most packages and functions. For mutualistic networks like plant-pollinator networks, as discussed above, that format is a bipartite matrix with plants as rows, pollinators as columns, and (typically) counts of interactions filling the matrix.
Typically, empirical data collected on ecological networks take the form of an edge list. This is the case for pollination networks in particular. To collect pollination network data, we typically observe a set of flowers. When a flower visitor (presumed pollinator) visits a plant, we record on a data sheet or in a field notebook the identity of the plant species and the identity of the flower visitor species. (In the field, we might capture an insect pollinator for later identification, but the idea remains the same: at some point we would have a data sheet with the identities of the plants and the pollinators.)
Similarly, you could also have an edge list created through pollen DNA metabarcoding; while you might need to play around with the formatting a bit, it is common to end up with an edge list like the one you would have from field data collection.
That might look something like this:
id.num | plant | pollinator |
---|---|---|
2371 | Delphinium barbeyi | Bombus flavifrons |
2372 | Heracleum spondophyllum | Thricops sp. |
2373 | Mertensia fusiformis | Bombus flavifrons |
2374 | Delphinium barbeyi | Bombus appositus |
2375 | Delphinium barbeyi | Bombus flavifrons |
2376 | Mertensia fusiformis | Colias sp. |
2377 | Delphinium barbeyi | Bombus flavifrons |
… but we want to turn that into a bipartite matrix. One approach for coding that transformation is in the code chunk below, focused on the tidyverse way of doing things. For those of you familiar with the bipartite package in R, it contains a convenience function to carry out this same transformation (frame2webs()) and you can certainly just use that function without having to think much about it… though perhaps you do have to think about it a little more than you might like.
For prework for the workshop, we are going through the tidyverse approach precisely because it does make you think a little more, and is good practice for some of the data programming that we will be doing throughout the workshop. At the end of this file, we have also included code for using the frame2webs approach.
The general tidyverse approach is:
- group our data by plant species and pollinator species
  - ultimately we don’t want repeated rows or columns for a plant or pollinator; we want them all combined together
- tally the number of visits for each pollinator species to that plant species, basically “collapsing” the rows that have the same plant-pollinator combination and counting up how many interactions there were for each unique plant-pollinator combination
- create a new column for each pollinator species: keeping the plants as rows, we’ll do this with the pivot_wider function in tidyr (part of the tidyverse), in the process filling in for each plant-pollinator combination the number of interactions we calculated in the prior step
  - in doing this step, we are also “collapsing” all of the plant rows such that there will be one row per plant
Let’s see this in action. We’ll start with the data frame called edgelist, which is exactly the same as the data in the table above. Before you begin, you’ll want to load the tidyverse and kableExtra packages. The latter is not necessary for conducting the data transformation, but it’s helpful for displaying your data cleanly in table form in R Markdown.
library(tidyverse)
library(kableExtra)
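The chunks that follow assume that a data frame called edgelist already exists in your session. As a minimal sketch, one way to build it by hand from the table above is shown here; using base data.frame() is just an illustrative choice, and in practice you would more likely import the data from a file.

```r
# Build the example edge list by hand; values are copied from the table above
edgelist <- data.frame(
  id.num = 2371:2377,
  plant = c("Delphinium barbeyi", "Heracleum spondophyllum",
            "Mertensia fusiformis", "Delphinium barbeyi",
            "Delphinium barbeyi", "Mertensia fusiformis",
            "Delphinium barbeyi"),
  pollinator = c("Bombus flavifrons", "Thricops sp.", "Bombus flavifrons",
                 "Bombus appositus", "Bombus flavifrons", "Colias sp.",
                 "Bombus flavifrons")
)

# each row is one observed visit
nrow(edgelist)
```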
Here is the code for the first two steps of the transformation, and what the result looks like:
# step 1: group by plant and pollinator
weighted <- edgelist %>%
  group_by(plant, pollinator) %>%
  # step 2: count the rows in each group
  tally()
# show in nicely-formatted table:
kable(weighted, row.names = FALSE) %>%
  kable_styling(full_width = FALSE, position = "left",
                bootstrap_options = "condensed")
plant | pollinator | n |
---|---|---|
Delphinium barbeyi | Bombus appositus | 1 |
Delphinium barbeyi | Bombus flavifrons | 3 |
Heracleum spondophyllum | Thricops sp. | 1 |
Mertensia fusiformis | Bombus flavifrons | 1 |
Mertensia fusiformis | Colias sp. | 1 |
The new weighted data frame is displayed above. You’ll see that it has a new column called “n” that contains the tallied number of times each unique interaction occurred. It also “collapsed” the rows into unique plant-pollinator interactions; previously there were 7 rows, and now there are 5, because one of the unique interactions (between Bombus flavifrons and Delphinium barbeyi) happened three times.
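As an aside, dplyr also offers count(), which combines the grouping and tallying steps in a single call. The sketch below rebuilds the example edge list so that it runs on its own and compares the two approaches; the name weighted2 is introduced here only for illustration.

```r
library(dplyr)

# example edge list (plant and pollinator columns from the table above)
edgelist <- data.frame(
  plant = c("Delphinium barbeyi", "Heracleum spondophyllum",
            "Mertensia fusiformis", "Delphinium barbeyi",
            "Delphinium barbeyi", "Mertensia fusiformis",
            "Delphinium barbeyi"),
  pollinator = c("Bombus flavifrons", "Thricops sp.", "Bombus flavifrons",
                 "Bombus appositus", "Bombus flavifrons", "Colias sp.",
                 "Bombus flavifrons")
)

# two-step version, as in the workshop code
weighted <- edgelist %>%
  group_by(plant, pollinator) %>%
  tally()

# one-step version: count() groups and tallies in a single call
weighted2 <- edgelist %>%
  count(plant, pollinator)

# compare the tallies produced by the two approaches
identical(weighted$n, weighted2$n)
```

One difference worth knowing: the result of group_by() followed by tally() is still grouped by plant, whereas count() on an ungrouped data frame returns an ungrouped result.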
Moving from this, let’s check out the next step, where we use the pivot_wider function to pivot the single “pollinator” column into one new column for each unique entry in the old column. We will fill in the values for each row-by-column combination from the tallied interactions (the new “n” column).
The weighted data frame has 5 rows, with 4 unique pollinator values (Bombus flavifrons is represented twice), so we expect to see four pollinator columns emerge in the next step. Similarly, this should collapse the 5 current plant rows into 3, one for each unique plant species. So we should see a 3-row \(\times\) 4-column bipartite matrix when we are done.
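The expected dimensions can also be read directly off the edge list before pivoting. A small sketch, assuming the same example data (rebuilt here so the chunk is self-contained; expected.dims is a name made up for this illustration):

```r
library(dplyr)

# example edge list (plant and pollinator columns from the table above)
edgelist <- data.frame(
  plant = c("Delphinium barbeyi", "Heracleum spondophyllum",
            "Mertensia fusiformis", "Delphinium barbeyi",
            "Delphinium barbeyi", "Mertensia fusiformis",
            "Delphinium barbeyi"),
  pollinator = c("Bombus flavifrons", "Thricops sp.", "Bombus flavifrons",
                 "Bombus appositus", "Bombus flavifrons", "Colias sp.",
                 "Bombus flavifrons")
)

# number of unique plants (rows) and unique pollinators (columns)
expected.dims <- c(rows = n_distinct(edgelist$plant),
                   cols = n_distinct(edgelist$pollinator))
expected.dims
```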
This is how to get there:
# step 3 (pivot_wider)
bipart.net <- weighted %>%
  pivot_wider(id_cols = c("plant"),    # plants are the rows
              names_from = pollinator, # names of new columns: unique pollinators
              values_from = n,         # values in the matrix from tallied interactions
              values_fill = 0)         # if no value, enter zero (otherwise NA)
# show in nicely-formatted table:
kable(bipart.net, row.names = FALSE) %>%
  kable_styling(full_width = FALSE, position = "left",
                bootstrap_options = "condensed")
plant | Bombus appositus | Bombus flavifrons | Thricops sp. | Colias sp. |
---|---|---|---|---|
Delphinium barbeyi | 1 | 3 | 0 | 0 |
Heracleum spondophyllum | 0 | 0 | 1 | 0 |
Mertensia fusiformis | 0 | 1 | 0 | 1 |
Looks just as hoped for: a 3-row by 4-column matrix. Note in the code above that we specified values_fill = 0. We did this because otherwise R would not know how to fill in the combinations for which it doesn’t have any information, and so it would fill them in with NA if we left that argument out.
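Many network functions expect a plain numeric matrix with plant names as row names, rather than a data frame with a plant column. The following is a minimal sketch of one way to make that conversion and to sanity-check the result; the object name bipart.mat is made up for this illustration, and the example data are rebuilt so the chunk runs on its own.

```r
library(dplyr)
library(tidyr)

# example edge list (plant and pollinator columns from the table above)
edgelist <- data.frame(
  plant = c("Delphinium barbeyi", "Heracleum spondophyllum",
            "Mertensia fusiformis", "Delphinium barbeyi",
            "Delphinium barbeyi", "Mertensia fusiformis",
            "Delphinium barbeyi"),
  pollinator = c("Bombus flavifrons", "Thricops sp.", "Bombus flavifrons",
                 "Bombus appositus", "Bombus flavifrons", "Colias sp.",
                 "Bombus flavifrons")
)

# tally the interactions, then pivot to one column per pollinator
bipart.net <- edgelist %>%
  count(plant, pollinator) %>%
  pivot_wider(id_cols = plant, names_from = pollinator,
              values_from = n, values_fill = 0)

# keep only the numeric columns and move plant names into the row names
bipart.mat <- as.matrix(bipart.net[, -1])
rownames(bipart.mat) <- bipart.net$plant

# sanity check: total interactions should equal the number of edge-list rows
sum(bipart.mat) == nrow(edgelist)
```

The final check is a quick way to catch interactions dropped or double-counted during the transformation.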
Exercise
Let’s practice turning an edge list into a bipartite network; this time we will do it at a (slightly) larger scale. We are going to use data collected by Berry’s Community Ecology class at the University of Washington in spring 2022, at the UW Medicinal Plant Garden (just outside the Life Sciences Building, where the Brosi Lab is based). The data for pollinators were collected at the family level.
The data file is medgarden.csv (click on the name to download). Import it and follow the steps above to generate a bipartite network. Call the resulting data frame bipartite.med.
You should get this exact bipartite network back:
plant | Osmia | Bombus | Syrphid | Halictid | Muscid | Ant | Hummingbird |
---|---|---|---|---|---|---|---|
Borago officinalis | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Camassia leichtlinii | 0 | 1 | 2 | 0 | 0 | 0 | 0 |
Catharanthus roseus | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Coclearia afficinalis | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
Eriogonum umbellatum | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
Erysimum asperum | 0 | 0 | 4 | 1 | 0 | 2 | 0 |
Gilia capitata | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Glaucium flavum | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Helleborus orientalis | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Heuchera micrantha | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
Heuchera sanguinea | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Horminum pyrenaicum | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
Hydrophyllum virginiana | 0 | 0 | 4 | 0 | 1 | 0 | 0 |
Iris douglasiana | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Polemonium reptans | 3 | 0 | 0 | 0 | 2 | 0 | 0 |
Polygonum bisorta | 0 | 0 | 0 | 3 | 1 | 0 | 0 |
Rhodiola rosea | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Ruta graveolens | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
Sedua spathulifolium | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Smilacina stellata | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Thymus leucospermus | 1 | 0 | 1 | 2 | 0 | 0 | 0 |
Tragopogon porrifolius | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Wyethia angustifolia | 1 | 6 | 2 | 0 | 0 | 0 | 0 |
To format your network data in a nice table like the one above, in R Markdown, you can use this code:
# show in nicely-formatted table:
kable(bipartite.med, row.names = FALSE) %>%
  kable_styling(full_width = FALSE, position = "left",
                bootstrap_options = "condensed")
Easier (?) alternative with bipartite
The code above gives an understanding of how the data are transformed from an edge list into a bipartite network. Having that understanding of the code is helpful when your edge list data are formatted in an atypical way or have other challenges associated with them.
Still, as noted above, there is a potentially faster / easier way to accomplish the same task in the bipartite package in R, using the frame2webs() function. The downside is not just a lack of understanding of what is happening “under the hood”; it’s also that the function has some, well, idiosyncrasies. We walk through its use below (to read more about the function, enter ?frame2webs into the console).
A couple of things to keep in mind with this function:
- frame2webs() assumes that you have data with multiple sites; if you only have one site you still have to input the site name to the function call. Thus, if you don’t have a site name in your data (as in the included “medgarden” dataset), you have to add one.
- The default output for frame2webs() is a list of arrays, where each site has its own network in the format of an array. That can be a powerful approach if you are doing a lot of computation across several distinct / separate networks, but at the cost of not being the easiest to work with.
  - Below, we have instead set type.out = "array" so that we get a numeric array instead of a list
  - We then convert that array back to a data frame for easier use
- When converting directly from an array to a data frame, because the array contains information about the site (even if there is only one site!), the column names will be altered to include both the pollinator and the site
  - We have fixed this (converted back to only the pollinator names) in the example below
- In this format, in contrast to our tidyverse example, the plants are stored as row names rather than as a variable. That structure again has some advantages, but if you’d rather have the plants as their own variable, we included code for that as well.
library(bipartite)
# assumes the medgarden data were already imported as a data frame called `med`
# add a "site" variable
med$site = "site1"
# run `frame2webs`
bipartite.med = frame2webs(med, varnames = c(lower = "plant", higher = "pollinator",
webID = "site"), type.out = "array")
# optional: view the output (array)
bipartite.med
# optional: convert to data frame
bipartite.med = data.frame(bipartite.med)
# optional: view the output (data frame); note pollinator names
bipartite.med
# fix pollinator names (other possible ways to do this in the above step)
names(bipartite.med) = sort(unique(med$pollinator))
# optional; if you want a separate variable (column) for plant names
bipartite.med$plant = row.names(bipartite.med)
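As one possible alternative to renaming the columns afterwards, the single-site array can be sliced on its third dimension, which should leave a plain plant-by-pollinator matrix with the original pollinator names. The sketch below applies this to the small example edge list rather than the medgarden data. It assumes that the third dimension of the array is labelled by the site name ("site1", the placeholder used above), and the object names web.array and web.mat are made up for this illustration.

```r
library(bipartite)

# example edge list (plant and pollinator columns from the first table)
edgelist <- data.frame(
  plant = c("Delphinium barbeyi", "Heracleum spondophyllum",
            "Mertensia fusiformis", "Delphinium barbeyi",
            "Delphinium barbeyi", "Mertensia fusiformis",
            "Delphinium barbeyi"),
  pollinator = c("Bombus flavifrons", "Thricops sp.", "Bombus flavifrons",
                 "Bombus appositus", "Bombus flavifrons", "Colias sp.",
                 "Bombus flavifrons")
)

# add a placeholder site label, as in the code above
edgelist$site <- "site1"

# same call pattern as above, applied to the small example
web.array <- frame2webs(edgelist,
                        varnames = c(lower = "plant", higher = "pollinator",
                                     webID = "site"),
                        type.out = "array")

# slice out the single site (assumes the third dimension is named by site)
web.mat <- web.array[, , "site1"]
dim(web.mat)
```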
Optional exercise
Convert the raw medgarden data to network format using frame2webs(), following the instructions above.