rOpenSci | HTTP tools

HTTP tools

Interact with Web Resources

Record HTTP Calls to Disk

Scott Chamberlain
Description

Records test suite HTTP requests and replays them during future runs. A port of the Ruby gem of the same name (https://github.com/vcr/vcr/). Works by hooking into the webmockr R package to match HTTP requests by various rules (HTTP method, URL, query parameters, headers, body, etc.), and then caches real HTTP responses on disk in cassettes. Subsequent HTTP requests that match a previous request in the same cassette use the cached HTTP response.
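The record-and-replay idea described above can be sketched in a few lines. This is a language-neutral illustration written in Python, not the package's R API: the `Cassette` class, the `fake_fetch` stand-in for a real network call, and the JSON file layout are all assumptions made for the example, and matching is reduced to HTTP method plus URL.

```python
import json
import tempfile
from pathlib import Path


class Cassette:
    """Toy record/replay store keyed on HTTP method and URL (hypothetical)."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # callable that performs the real request
        # Load previously recorded interactions if the cassette file exists.
        if self.path.exists():
            self.records = json.loads(self.path.read_text())
        else:
            self.records = {}

    def request(self, method, url):
        key = f"{method.upper()} {url}"
        if key not in self.records:
            # No match in the cassette: make the real call and record it.
            self.records[key] = self.fetch(method, url)
            self.path.write_text(json.dumps(self.records, indent=2))
        # Matching request: serve the recorded response.
        return self.records[key]


calls = []


def fake_fetch(method, url):
    """Stand-in for a real network call; tracks how often it is invoked."""
    calls.append((method, url))
    return {"status": 200, "body": "hello"}


cassette_file = Path(tempfile.mkdtemp()) / "example.json"

# First run records the response; the second run replays it from disk.
first = Cassette(cassette_file, fake_fetch).request("get", "https://example.org/data")
second = Cassette(cassette_file, fake_fetch).request("get", "https://example.org/data")
```

In this sketch the second `Cassette` never calls `fake_fetch`, because the request matches an interaction already stored in the file.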

View Documentation

General Purpose GraphQL Client

Scott Chamberlain
Description

A GraphQL client, with an R6 interface for initializing a connection to a GraphQL instance, and methods for constructing queries, including fragments and parameterized queries. Queries are checked with the libgraphqlparser C++ parser via the graphql package.

View Documentation
trufflesniffer
Staff maintained

Scan Secrets in R Scripts, Packages, or Projects

Scott Chamberlain
Description

Scan secrets in R scripts, packages, or projects.

View Documentation

HTTP Client

Scott Chamberlain
Description

A simple HTTP client, with tools for making and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby's 'faraday' gem (https://rubygems.org/gems/faraday). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package curl, an interface to libcurl (https://curl.haxx.se/libcurl).

View Documentation

Stubbing and Setting Expectations on HTTP Requests

Scott Chamberlain
Description

Stubbing and setting expectations on HTTP requests. Includes tools for stubbing HTTP requests, including expected request conditions and response conditions. Match on HTTP method, query parameters, request body, headers and more. Can be used for unit tests or outside of a testing context.
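As a rough illustration of request stubbing, the sketch below registers an expected request (method, URL, headers) together with a canned response, then answers matching requests without touching the network. It is written in Python for illustration only; `StubRegistry` and its methods are hypothetical names, not the package's R interface.

```python
class StubRegistry:
    """Minimal stub store: expected request conditions -> canned response."""

    def __init__(self):
        self.stubs = []

    def stub_request(self, method, url, headers=None, status=200, body=""):
        # Register the request conditions and the response to return.
        self.stubs.append({
            "method": method.upper(),
            "url": url,
            "headers": headers or {},
            "status": status,
            "body": body,
        })

    def handle(self, method, url, headers=None):
        headers = headers or {}
        for stub in self.stubs:
            same_target = stub["method"] == method.upper() and stub["url"] == url
            # Every header named in the stub must be present with the same value.
            headers_ok = all(headers.get(k) == v for k, v in stub["headers"].items())
            if same_target and headers_ok:
                return {"status": stub["status"], "body": stub["body"]}
        # Unstubbed requests fail loudly instead of reaching the network.
        raise LookupError(f"no stub registered for {method.upper()} {url}")


registry = StubRegistry()
registry.stub_request(
    "get",
    "https://example.org/api",
    headers={"Accept": "application/json"},
    body='{"ok": true}',
)
response = registry.handle(
    "GET", "https://example.org/api", headers={"Accept": "application/json"}
)
```

Raising on unmatched requests is a design choice of this sketch; it makes accidental real HTTP calls in a test suite visible immediately.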

View Documentation

HTTP Error Helpers

Scott Chamberlain
Description

HTTP error helpers. Includes methods for general purpose HTTP error handling, as well as individual methods for every HTTP status code, addressable both by status code number and by descriptive name. The behavior on error can be set to stop, message, or warning. A custom whisker template can be supplied to produce any combination of status code, short description, and verbose message. Currently supports integration with crul, curl, and httr.
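The combination of a configurable behavior and a message template might look roughly like the following. This is a Python sketch built on the standard library's `http.HTTPStatus`, not the package's R API; `check_status`, `HTTPStatusError`, and the template placeholders are names invented for the example, and Python's `str.format` stands in for a whisker template.

```python
import warnings
from http import HTTPStatus


class HTTPStatusError(Exception):
    """Raised when behavior is 'stop' and the status code signals an error."""


def check_status(code, behavior="stop", template="HTTP {code}: {phrase}"):
    """Handle an HTTP status code according to the chosen behavior."""
    if code < 400:
        return None  # not an error status, nothing to report
    status = HTTPStatus(code)
    # Fill the template with the code, short phrase, and verbose description.
    text = template.format(
        code=code, phrase=status.phrase, description=status.description
    )
    if behavior == "stop":
        raise HTTPStatusError(text)
    if behavior == "warning":
        warnings.warn(text)
        return text
    if behavior == "message":
        print(text)
        return text
    raise ValueError(f"unknown behavior: {behavior!r}")


ok = check_status(200)
```

With the defaults, `check_status(404)` raises `HTTPStatusError`, while `check_status(404, behavior="warning")` only emits a warning and returns the formatted text.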

View Documentation

rOpenSci's blog guidance

Maëlle Salmon
Description

Provides templates for roweb2 blogging and help with a GitHub forking workflow.

View Documentation
binman
CRAN

A Binary Download Manager

Ju Yeong Kim
Description

Tools and functions for managing the download of binary files. Binary repositories are defined in YAML format. Defining new pre-download, download and post-download templates allows additional repositories to be added.

View Documentation
robotstxt
CRAN Peer-reviewed

A robots.txt Parser and Webbot/Spider/Crawler Permissions Checker

Peter Meissner
Description

Provides functions to download and parse robots.txt files. Ultimately the package makes it easy to check if bots (spiders, crawlers, scrapers, …) are allowed to access specific resources on a domain.
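For readers unfamiliar with how such a permission check works, here is a minimal sketch using Python's standard `urllib.robotparser` module rather than the R package itself. The robots.txt content, the bot name `my-bot`, and the example.org URLs are made up for the example; the rules are parsed from a string so no download is needed.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical): everything under /private/ is
# off limits for all user agents, the rest of the site is open.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a given bot may fetch specific resources on the domain.
allowed = parser.can_fetch("my-bot", "https://example.org/public/page.html")
blocked = parser.can_fetch("my-bot", "https://example.org/private/data.csv")
```

Here `allowed` is `True` and `blocked` is `False`, since only the second path falls under the `Disallow` rule.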

Scientific use cases
  1. Dogucu, M., & Çetinkaya-Rundel, M. (2020). Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities. Journal of Statistics Education, 1–11. https://doi.org/10.1080/10691898.2020.1787116
View Documentation
RSelenium
CRAN

R Bindings for Selenium WebDriver

Ju Yeong Kim
Description

Provides a set of R bindings for the Selenium 2.0 WebDriver (see https://selenium.dev/documentation/en/ for more information) using the JsonWireProtocol (see https://github.com/SeleniumHQ/selenium/wiki/JsonWireProtocol for more information). Selenium 2.0 WebDriver drives a web browser natively, as a user would, either locally or on a remote machine using the Selenium server; it marks a leap forward in web browser automation. Selenium automates web browsers (commonly referred to as browsers). Using RSelenium you can automate browsers locally or remotely.

Scientific use cases
  1. Silva, D., & Meireles, F. (2015). Ciência Política na era do Big Data: automação na coleta de dados digitais. Politica Hoje, v.2, (pp. 87-102). https://github.com/meirelesff/meirelesff.github.io/raw/master/files/bigdata2016.pdf
  2. Nousiainen, K., Kanduri, K., Ricaño-Ponce, I., Wijmenga, C., Lahesmaa, R., Kumar, V., & Lähdesmäki, H. (2018). snpEnrichR: analyzing co-localization of SNPs and their proxies in genomic regions. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty460
  3. Blankers, M., van der Gouwe, D., & van Laar, M. (2019). 4-Fluoramphetamine in the Netherlands: Text-mining and sentiment analysis of internet forums. International Journal of Drug Policy, 64, 34–39. https://doi.org/10.1016/j.drugpo.2018.11.016
  4. Krah, F.-S., Bates, S., & Miller, A. (2019). rMyCoPortal - an R package to interface with the Mycology Collections Portal. Biodiversity Data Journal, 7. https://doi.org/10.3897/bdj.7.e31511
  5. Lee, A. J., Jones, B. C., & DeBruine, L. M. (2019, January 21). Investigating the association between mating-relevant self-concepts and mate preferences through a data-driven analysis of online personal descriptions. https://doi.org/10.31234/osf.io/38zef
  6. Mitchell, J. M., & Moseley, H. N. B. (2019). Deriving Accurate Lipid Classification based on Molecular Formula. https://doi.org/10.1101/572883
  7. Rybinski, K. (2019). A machine learning framework for automated analysis of central bank communication and media discourse. The case of Narodowy Bank Polski. Bank & Credit, 50(1), 1-20. http://bankikredyt.nbp.pl/content/2019/01/BIK_01_2019_01.pdf
  8. Fioravanti, G., Piervitali, E., & Desiato, F. (2019). A new homogenized daily data set for temperature variability assessment in Italy. International Journal of Climatology. https://doi.org/10.1002/joc.6177
  9. Roh, T., Jeong, Y., Jang, H., & Yoon, B. (2019). Technology opportunity discovery by structuring user needs based on natural language processing and machine learning. PLOS ONE, 14(10), e0223404. https://doi.org/10.1371/journal.pone.0223404
  10. Nüst, D., Eddelbuettel, D., Bennett, D., Cannoodt, R., Clark, D., Daroczi, G., … & Marwick, B. (2020). The Rockerverse: Packages and Applications for Containerization with R. arXiv preprint arXiv:2001.10641 https://arxiv.org/pdf/2001.10641.pdf
  11. Salgado, D., & Oancea, B. (2020). On new data sources for the production of official statistics. arXiv preprint https://arxiv.org/pdf/2003.06797.pdf
  12. Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2020). The relationship between bioRxiv preprints, citations and altmetrics. Quantitative Science Studies, 1–21. https://doi.org/10.1162/qss_a_00043
  13. Hannon, B. A., Fairfield, W. D., Adams, B., Kyle, T., Crow, M., & Thomas, D. M. (2020). Use and abuse of dietary supplements in persons with diabetes. Nutrition & Diabetes, 10(1). https://doi.org/10.1038/s41387-020-0117-6
  14. Stringham, O., Toomes, A., Kanishka, A. M., Mitchell, L., Heinrich, S., Ross, J. V., & Cassey, P. (2020). A guide to using the Internet to monitor and quantify the wildlife trade. https://ecoevorxiv.org/5yzw9/download?format=pdf
  15. Bisbee, J., & Honig, D. (2020). Flight to Safety: 2020 Democratic Primary Election Results and COVID-19. Covid Economics, 3(10), 54-84. http://www.amcham-egypt.org/bic/pdf/corona1/Covid%20Economics%20by%20CEPR.pdf
  16. Göbel, S. (2020). Voting and Social Media-Based Political Participation. https://doi.org/10.31235/osf.io/sjq4g
  17. Mancosu, M., & Vegetti, F. (2020). What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data. Social Media + Society, 6(3), 205630512094070. https://doi.org/10.1177/2056305120940703
  18. Gessa, A., Jiménez, A., & Sancha, P. (2020). Open Innovation in Digital Healthcare: Users’ Discrimination between Certified and Non-Certified mHealth Applications. Journal of Open Innovation: Technology, Market, and Complexity, 6(4), 130. https://doi.org/10.3390/joitmc6040130
View Documentation
wdman
CRAN

Webdriver/Selenium Binary Manager

Ju Yeong Kim
Description

There are a number of binary files associated with the Webdriver/Selenium project (see http://www.seleniumhq.org/download/, https://sites.google.com/a/chromium.org/chromedriver/, https://github.com/mozilla/geckodriver, http://phantomjs.org/download.html and https://github.com/SeleniumHQ/selenium/wiki/InternetExplorerDriver for more information). This package provides functions to download these binaries and to manage processes involving them.

View Documentation

A GraphQL Query Parser

Jeroen Ooms
Description

Bindings to the libgraphqlparser C++ library. Parses GraphQL syntax and exports the AST in JSON format.

View Documentation
decapitated

Headless Chrome Orchestration

Bob Rudis
Description

The Chrome browser (https://www.google.com/chrome/) has a headless mode which can be instrumented programmatically. Tools are provided to perform headless Chrome instrumentation from the command line and will eventually provide support for the DevTools instrumentation API or the forthcoming phantomjs-like higher-level API being promised by the development team.

View Documentation