GitHub

The code for the herein described process can also be freely downloaded from https://github.com/majvdvel/meteorites.

Introduction

In the previous blog post, we talked about visualising the locations where meteorites struck Earth and we made some conclusions about the locations where these meteorites were found. What we didn’t touch upon yet is the meteorites’ mass. We will keep working with our cleaned meteorite data set, and we will aim to find out whether we can get some kind of mass distribution, and if we can use the meteorites’ mass in a global visualisation.

The packages which will be used are data.table, leaflet, ggplot2, maps, maptools, raster, dplyr and htmltools.

Meteorites’ mass descriptive analysis

As an initial step, we can check the minimum and maximum mass of meteorites to get an idea of the mass range.

met[, min(mass)]
## [1] 0
met[, max(mass)]
## [1] 6e+07

We get a mass range of 0 g to 60 t. Such a lower boundary might reveal some missing data, so for our further investigations, we will omit the meteorites with a mass of 0 g. To be sure we don’t throw away a significant number of meteorites, we first query our data to know what fraction of objects has been recorded with zero mass. data.table’s .N functionality again provides a fast way to retrieve this information.

met[mass == 0, .N] / met[, .N]
## [1] 0.0005637508
met <- met[mass != 0]
met[, min(mass)]
## [1] 0.01

To get an idea of the mass distribution, we will start with plotting a basic histogram with ggplot2.

ggplot(data = met, aes(x = mass)) +
    geom_histogram() +
    xlab('Mass (g)') +
    ylab('Count') +
    scale_y_log10()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can see that even though the mass has a large range, the y-axis logarithmic scale reveals that the utmost majority of all masses can be found on the lower end of the histogram. By adding a logarithmic scale to the x-axis, we can keep the large masses in sight but enhancing the detail in the smaller masses, while preserving a linear y-axis. To keep R from complaining, we can add the argument bins = 50 to the ggplot2::geom_histogram() function. For a cleaner plot we also add theme_minimal() from ggplot2.

ggplot(data = met, aes(x = mass)) +
    geom_histogram(bins = 50) +
    xlab('Mass (g)') +
    ylab('Count') +
    scale_x_log10() +
    theme_minimal()

A possible explanation for the steep rise in the lower bounds of the distribution can be explained by the nature of the meteorites located in this area. These are generally very small, and will not be seen easily. Another factor contributing is that a lot of meteorites of this size burn up in the atmosphere, and as such they never reach the ground.

It might be interesting to investigate whether both the “Found”- and “Fell”-categorised meteorites follow similar distributions. For this purpose, we split the graph according to this category. By using only one ggplot2::geom_histogram() function we would create a stacked histogram. Instead, we call two separate functions, each with their own subset of the full data.

ggplot(data = met, aes(x = mass)) +
    geom_histogram(data = subset(met, fall == 'Found'), bins = 50, aes(fill = fall), alpha = 0.2) +
    geom_histogram(data = subset(met, fall == 'Fell'), bins = 50, aes(fill = fall), alpha = 0.2) +
    scale_fill_manual(name = 'Discovery', values = c('blue', 'red')) +
    xlab('Mass (g)') +
    ylab('Count') +
    scale_x_log10() +
    theme_minimal()

Besides the scaling factor between the distributions, their shape is also different. Indeed, the mass peak of the “Fell”-category meteorites is higher and their distribution is more symmetric. A possible explanation could involve the aforementioned arguments about the size of the smaller meteorites. If we consider the most common meteorites of the total distribution, their mass is around 10 g. We can reason that meteorites of this size and smaller are less likely to be seen when falling.

Global meteorite mass

An interesting statistic might be the average meteorite mass per country area. To visualise it, let us follow the example guidelines on making choroplets on the Leaflet for R Github page, while making some necessary adaptations.

To prepare our data for visualisation, we first need to obtain the total meteorite mass for every country. We can do this with data.table’s functionalities where we sum the mass column for all country entries. We obtain a data.table object containing the masses per country.

masses <- met[, .(mass = sum(mass)), by = country]
head(masses %>% rename(`mass (g)` = mass))
country mass (g)
Germany 1956832.50
Denmark 58245.81
Canada 1328864.44
Mexico 72370561.40
Argentina 53862045.50
Pakistan 138956.60

To begin our visualisation, we need to get a map of the world where every country is represented by a polygon. We can do this using the maps package. A convenient format to manipulate and enhance geographical information is sp.

Note: while we did not explicitly load the sp package, this has been done automatically by loading maps.

# Map object
mapWorld <- map('world', fill = TRUE, plot = FALSE)

# Some countries are represented in the format countryname:countrypart
# Only countryname is extracted
IDs <- sapply(strsplit(mapWorld$names, ':'), function(x) x[1])
# Convertion of map object to SpatialPolygons
world <- map2SpatialPolygons(mapWorld, IDs = IDs, proj4string = CRS('+proj=longlat +datum=WGS84'))

# Country names are merged to the SpatialPolygons into a SpatialPolygonsDataFrame
world_df <- data.frame(ID = sort(unique(IDs)))
row.names(world_df) <- world_df$ID
world_SPDF <- SpatialPolygonsDataFrame(world, data = world_df)
names(world_SPDF) <- 'country'

# Masses are merged to the SpatialPolygonDataFrame
world <- merge(world_SPDF, masses, by = 'country')

To get the total area for every country, we can search for an external data set, or we can choose to take an easier and less accurate approach, where we calculate every area in km\(^2\) by using the raster::area() function to calculate the area of every country’s polygon. For our own comfort, let us choose the latter option.

Now that we have the total area and total meteorite mass for every country, it is easy to calculate the average mass per square kilometre for every country.

world$area_sqkm <- raster::area(world) / 1000000
world$m_avg <- world$mass / world$area_sqkm

Now that our data is prepared, let us start constructing the actual visualisation using leaflet. The first step will be to plot the Earth where every country is given by its corresponding polygon.

leaflet(world, width = '100%') %>%
    setView(0, 0, zoom = 1) %>%
    addProviderTiles('CartoDB.Positron') %>%
    addPolygons()

To create colour bins for our data, we should first check in what range the mass per area lies. As not all countries are represented in NASA’s database, we will have some NA values introduced in the mass entry by the merging step. We omit them in the minimum and maximum calculations.

min(na.omit(world$m_avg))
## [1] 0.0004658533
max(na.omit(world$m_avg))
## [1] 104.9483

We find that the average meteorite mass per area lies between 0 and 105 g/km\(^2\). Hence we can define our mass bins. Since we want more than 9 bins to map the masses, we can not use the YlOrRd color palette as proposed by the Leaflet for R - Choropleths webpage. Hence we define our own palette through colorRampPalette(). Let us also add a color legend with the addLegend() function.

bins <- c(0, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6, 51.2, 102.4, Inf)
pal <- colorBin(colorRampPalette(colors = c('#ffeda0', '#ff4500', '#bd0026'))(12), domain = world$m_avg, bins = bins)

leaflet(world, width = '100%') %>%
    setView(0, 0, zoom = 2) %>%
    addProviderTiles('CartoDB.Positron') %>%
    addPolygons(fillColor = ~pal(m_avg),
                weight = 1.2,
                opacity = 1,
                color = 'white',
                fillOpacity = 0.7) %>%
    addLegend(pal = pal,
              values = ~m_avg,
              opacity = 0.7,
              title = NULL,
              position = 'bottomright')

Great, we are almost there! The last step will consist of constructing a label which gets displayed when a country is hovered over. Our choices of information to be displayed will be the country name, the average meteorite mass per area, the minimum and maximum meteorite size, the amount of meteorites that have been found and the percentage of “Fell” versus “Found” meteorites.

country_data <- met[, .(m_min = min(mass), m_max = max(mass), tot = .N, fell = sum(fall == 'Fell')), by = 'country']
world <- merge(world, country_data, by = 'country')

Now we can use the resulting SpatialPolygonsDataFrame to add information to the labels. These labels should be transformed to HTML-friendly text, which is why we use the htmltools::HTML() function. We have to add <span style="float:left"> to every line of text instead of just declaring this in the div style definition since Firefox tends to have issues displaying the labels otherwise.

labels <- sprintf('
                <div style="width:350px">
                    <strong>%s</strong><br/>
                    <span style="float:left">Average meteorite mass per km<sup>2</sup>:</span>
                        <span style="float:right">%0.4g g</span><br/>
                    <span style="float:left">Total number of meteorites:</span>
                        <span style="float:right">%d</span><br/>
                    <span style="float:left">Minimum meteorite mass:</span>
                        <span style="float:right">%0.4g g</span><br/>
                    <span style="float:left">Maximum meteorite mass:</span>
                        <span style="float:right">%0.4g g</span><br/>

                    <span style="float:left">Fell</span><span style="float:right">Found</span><br/>
                    <span style="color:#67a9cf;float:left">%0.4s%%</span>
                        <span style="color:#ef8a62;float:right">%0.4s%%</span><br/>
                    <span style="background:#67a9cf;width:%s%%;float:left">&nbsp;</span>
                        <span style="background:#ef8a62;width:%s%%;float:right">&nbsp;</span>
                </div>',
                world$country,
                world$m_avg,
                world$tot,
                world$m_min,
                world$m_max,
                100 * world$fell / world$tot,
                100 * (1 - world$fell / world$tot),
                100 * world$fell / world$tot,
                100 * (1 - world$fell / world$tot)) %>%
    lapply(htmltools::HTML)

As a last step, we can now add these labels and a hover functionality to the leaflet map to obtain our final result.

leaflet(world, width = '100%') %>%
    setView(0, 0, zoom = 2) %>%
    addProviderTiles('CartoDB.Positron', options = providerTileOptions(minZoom = 2)) %>%
    addPolygons(fillColor = ~pal(m_avg),
                weight = 1.2,
                opacity = 1,
                color = 'white',
                fillOpacity = 0.7,
                highlight = highlightOptions(weight = 2, color = '#666', fillOpacity = 0.7, bringToFront = TRUE),
                label = labels,
                labelOptions = labelOptions(style = list('font-weight' = 'normal', 'padding' = '3px 8px'),
                                            'textsize' = '15px', 'direction' = 'auto')) %>%
    addLegend(pal = pal,
              values = ~m_avg,
              opacity = 0.7,
              title = htmltools::HTML('Average mass in g/km<sup>2</sup>'),
              position = 'bottomright')

Conclusions

After the rather basic plotting methods included in the previous blog post, we now delved into more advanced visualisation methods, especially by designing the custom labels. Not a lot of sources are available for this purpose, at least not in an R data visualisation context. As can be seen in the final plot, custom labels can provide an additional layer of interesting information while preventing unneccesary cluttering of the main image.

More analysis can be done on the meteorites’ data set, however this requires additional background knowledge. Material scientists can investigate properties of the meteorites’ composition, while others should be able to provide scientific explanations for the shape of the obtained mass distributions.