We met our Science Gossip challenge!

Thanks to all those who took part in the Science Gossip challenge! In the last 2 weeks you contributed ~110,000 new classifications to the data, and completed approximately 21,000 pages! Talk was very active too, which is great. A number of volunteers discovered some great images including:

Beautiful maps of Wiltshire – http://talk.sciencegossip.org/#/subjects/ASC0000oto

A full-page plate full of dragonflies – http://talk.sciencegossip.org/#/subjects/ASC000027r

Facsimiles of notices from the 1604 plague – http://talk.sciencegossip.org/#/subjects/ASC0000rfb

And some lovely looking photographs of slime mold! http://talk.sciencegossip.org/#/subjects/ASC0003hww

Not only did we meet our challenge goal of 100k classifications, we were also able to complete 3 of the 5 new journals that were uploaded earlier this month: Botanical Miscellany (1830–1833), Journal of Botany: Being a Second Series of the Botanical Miscellany (1834–1842), and London Journal of Botany (1842–1848). Two of the older journals are very close to being done: Wiltshire Archaeological and Natural History Magazine (99% complete) and Hardwicke’s Science-Gossip: An Illustrated Medium of Interchange and Gossip for Students and Lovers of Nature (82% complete).

You can also dip in and see some of the data generated over the project’s lifetime at http://explore.sciencegossip.org/. There you’ll find pages displaying the aggregated and individual assessments made by volunteers. All of this has been anonymized, but it’s still interesting to see how many people picked out particular keywords.

More content is on the way, so stay tuned at www.sciencegossip.org

Thanks so much for all your hard work,
Trish, Geoff, Jim, Victoria and everyone on the Science Gossip Team

New journals available in Science Gossip: Let’s celebrate by classifying all of the data, old and new!

The Science Gossip team recently uploaded five new journals. Four of them were established and edited by William Jackson Hooker, founding director of the Royal Botanic Gardens at Kew and one of the most important botanists of the nineteenth century:

Botanical Miscellany (1830–1833), Journal of Botany: Being a Second Series of the Botanical Miscellany (1834–1842), London Journal of Botany (1842–1848) and Hooker’s Journal of Botany and Kew Garden Miscellany (1849–1857). The fifth, the Journal of the Quekett Microscopical Club (1868 to present), is the longest running ‘amateur’ journal for microscopical societies. The society, which was established in 1865, came directly out of the community that developed around the natural history periodical Science Gossip, with founding members including the publisher Robert Hardwicke and the editor and mycologist Mordecai Cubitt Cooke.


In addition to the new journals, there are approximately 24,000 pages still to be classified across the five original journals: Gardener’s Magazine and Register of Rural & Domestic Improvement (23% complete), Gardener’s Chronicle (2% complete), Hardwicke’s Science Gossip (77% complete), the Quarterly Journal of the Geological Society of London (86% complete), and Wiltshire Archaeological and Natural History Magazine (96% complete).

With your help we can identify the wealth of illustrations locked in these wonderful Victorian natural history journals, and make them available for researchers and any interested parties for years to come!

Citizen Scientist’s Algorithm Helps Science Gossip Team to Reduce Text-only Pages

Thanks to the efforts of an active Zooniverse volunteer and the Science Gossip project team, users can now focus on the beautiful illustrations found in Science Gossip’s 19th-century natural history periodicals and spend less time marking pages with no illustrations.

The Biodiversity Heritage Library (BHL) had already developed algorithms for filtering out pages without images in an earlier, related project called Art of Life, for which project partners from the Indianapolis Museum of Art Lab developed 4 algorithms. The two deemed most useful were based on 1) coordinate metadata from ABBYY software and 2) contrast properties of the pages.

The Science Gossip project team had considered using these algorithms to filter pages before uploading them to the Zooniverse site, but decided against it on the assumption that users might like to view all pages in a journal for contextual reference. After the launch in March 2015, it became clear that many Science Gossip users wanted the team to reduce the number of pages without illustrations: they didn’t want to spend their time on such pages when the project was really about illustrations.
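To make the coordinate-metadata idea a little more concrete, here is a minimal Python sketch of how such a filter might work. It is not the BHL or Art of Life code. It assumes the OCR step yields a list of layout blocks, each with a type label and a bounding box; the `Block` class, the "Picture" label, the `has_illustration` function and the 2% area threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One layout region reported by an OCR tool (hypothetical structure)."""
    kind: str    # assumed labels, e.g. "Text" or "Picture"
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        # Clamp at zero so malformed boxes never count as negative area.
        width = max(0, self.right - self.left)
        height = max(0, self.bottom - self.top)
        return width * height


def has_illustration(blocks, page_width, page_height, min_fraction=0.02):
    """Return True if picture blocks cover at least min_fraction of the page.

    min_fraction is an illustrative threshold, not a value from the project.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        return False
    picture_area = sum(b.area() for b in blocks if b.kind == "Picture")
    return picture_area / page_area >= min_fraction


# A text-only page versus a page with a large plate.
text_page = [Block("Text", 100, 100, 900, 1400)]
plate_page = [Block("Text", 100, 100, 900, 300),
              Block("Picture", 200, 400, 700, 900)]

keep = [has_illustration(p, 1000, 1500) for p in (text_page, plate_page)]
```

In this sketch only the second page would be kept for volunteers. A real filter would need its threshold tuned against hand-checked pages.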

An active volunteer, Briana Harder (aka Quia on Zooniverse), prodded the Science Gossip team to consider using automated methods for filtering, and even put together an algorithm herself for the team to test against its existing algorithms. Briana’s algorithm picks out chunks of images, although if the background is too variable it sometimes picks out text as well. When the accuracy of the 3 algorithms was compared, Briana’s algorithm and the BHL ABBYY algorithm both performed well, with less than a 1% margin of error. The contrast algorithm performed poorly, and Briana and the team deemed it not useful for filtering. In the end the team decided to use only the ABBYY algorithm, since the pages had already been processed by it and it would be much quicker to implement.
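For readers curious what a comparison like this can look like in practice, below is a small Python sketch. It assumes that each filter’s yes/no verdict per page is checked against a hand-labelled sample and that the figure of interest is the fraction of pages a filter gets wrong; that reading, the function names and the toy data are assumptions for illustration, not the team’s actual evaluation.

```python
def error_rate(predicted, actual):
    """Fraction of pages where a filter's verdict differs from the hand label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labelled page")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


def compare_filters(predictions_by_filter, actual):
    """Map each filter name to its error rate on the same labelled pages."""
    return {name: error_rate(preds, actual)
            for name, preds in predictions_by_filter.items()}


# Toy data: True means "page has an illustration". Filter names are made up.
hand_labels = [True, False, False, True]
results = compare_filters(
    {
        "filter_a": [True, False, False, True],
        "filter_b": [True, True, False, True],
    },
    hand_labels,
)
```

With a sample this small the numbers mean little; a real comparison would use many more labelled pages.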

The filter was applied in mid-May. Since then the number of pages without illustrations has been reduced considerably, hopefully resulting in a more satisfying experience for our users. Thanks go out to Briana and the folks on the team who worked on this task. When asked what her motivations were for contributing to Science Gossip and other Zooniverse projects, Briana explained:

My involvement with Science Gossip is an adventure in serendipity. Darren McRoy, Zooniverse’s community builder, gave me a nudge to go check out some of the newer Zoo projects, among others I ended up on Floating Forests, and in one of their blog posts, they asked for help in improving their pipeline for selecting coastline images for classifications. […] I wrote an algorithm that improved the pipeline […]

Briana’s work on Floating Forests led that team to reprocess their data and dramatically speed up the project. While this reprocessing was underway, Science Gossip launched a beta test. Briana noticed that there were a lot of text-only pages in this project—another opportunity for an algorithm!

I thought ‘There has to be a good way to filter out all these pages, text recognition is a well developed field…I bet I could write something to filter these so the project can be more efficient.’

And she was absolutely right. Thank you, Briana, and all Zooniverse users who go above and beyond to help us improve our projects! The collaborative spirit of this community continues to benefit all of us in ways we never expected. We hope everyone is benefiting from the reduced noise in the Science Gossip dataset, and we would love feedback from our users on the impact of the filtered pages.

Trish Rose-Sandler, Data Analyst, BHL and Data Projects Coordinator, Missouri Botanical Garden