
Black Ostrich: Web Application Scanning with String Solvers

Date and time: 27 January 2023, 11:00 – 12:00 CET – hybrid seminar
Speaker: Prof. Andrei Sabelfeld, Chalmers
Title: Black Ostrich: Web Application Scanning with String Solvers

Where: KTH Campus, Room 1440, Lindstedtsvägen 3, floor 3


Abstract: Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short of deep crawling. A major roadblock is the crawlers’ limited ability to pass input validation checks when web applications require data in a certain format, such as an email address, phone number, or zip code. This talk presents Black Ostrich, a principled approach to deep web crawling and scanning. The key idea is to equip web crawling with string constraint-solving capabilities to dynamically infer suitable inputs from regular expression patterns in web applications and pass input validation checks. To enable this use of constraint solvers, we develop new automata-based techniques to handle complex real-world regular expressions, including support for the relevant features of ECMA JavaScript regular expressions.
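To make the idea of inferring inputs from validation patterns concrete, the following is a minimal sketch and not the Black Ostrich implementation. It replaces the string constraint solver with a naive brute-force search over a small alphabet, and it uses Python's `re` module rather than JavaScript regular expression semantics. The function name `find_matching_input` and its parameters are hypothetical, chosen for illustration only.

```python
import itertools
import re
from typing import Optional


def find_matching_input(pattern: str, alphabet: str = "a1@.",
                        max_len: int = 6) -> Optional[str]:
    """Return the shortest string over `alphabet` that fully matches `pattern`.

    Naive stand-in for a string solver: enumerate candidates by increasing
    length and test each one. A whole-string match mirrors how an HTML
    `pattern` attribute must match the entire input value.
    """
    regex = re.compile(pattern)
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if regex.fullmatch(candidate):
                return candidate
    return None  # no match within the search bounds


# A zip-code-like pattern and an email-like pattern (illustrative only).
print(find_matching_input(r"[0-9]{5}", alphabet="01", max_len=5))
print(find_matching_input(r"[a-z]+@[a-z]+\.[a-z]+", alphabet="a@.", max_len=5))
```

Brute force grows exponentially with input length and alphabet size, which is one reason a solver-based approach is attractive for the complex real-world patterns the abstract describes.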

We implement our approach by extending and combining the Ostrich constraint solver with the Black Widow web crawler. We evaluate Black Ostrich on a set of 8,820 unique validation patterns gathered from 21,667,978 forms from a combination of the July 2021 Common Crawl and Tranco top 100K. For these forms and reconstructions of input elements corresponding to the patterns, we demonstrate that Black Ostrich achieves 99% coverage of the form validations, compared to an average of 36% for the state-of-the-art scanners.

Moreover, out of the 66,377 domains using these patterns, we solve all patterns on 66,309 (99%), while the combined efforts of the other scanners cover 52,632 (79%). We further show that our approach can boost coverage by evaluating it on three open-source applications. Our empirical studies include a study of email validation patterns, simultaneously demonstrating that our regular expression encoding is practical. We find that 213 (26%) out of the 825 found email validation patterns liberally admit XSS injection payloads.
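As an illustration of how an email validation pattern can admit an XSS payload, here is a small sketch. The two patterns and the payload are assumptions made up for this example; they are not taken from the 825 patterns in the study. The helper name `admits_payload` is likewise hypothetical, and Python's `re` module stands in for JavaScript regular expressions.

```python
import re

# Hypothetical payload shaped like an email address (no whitespace, one "@").
PAYLOAD = "<script>alert(1)</script>@example.com"


def admits_payload(pattern: str, payload: str = PAYLOAD) -> bool:
    """Return True if `pattern` accepts `payload` as a whole-string match."""
    return re.fullmatch(pattern, payload) is not None


# A liberal pattern: any non-whitespace text around an "@".
liberal = r"\S+@\S+"
# A stricter pattern: restricts the characters allowed in each part.
strict = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

print(admits_payload(liberal))  # True: angle brackets count as non-whitespace
print(admits_payload(strict))   # False: "<" is outside the allowed characters
```

Passing the pattern check is only one step; whether the payload actually leads to script execution depends on how the application handles the value afterwards.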

Joint work with Benjamin Eriksson, Amanda Stjerna, Riccardo De Masellis, and Philipp Ruemmer.

Bio: Andrei Sabelfeld is a Professor at the Chalmers University of Technology. Before joining Chalmers as faculty, he was a Research Associate at Cornell University in Ithaca, NY, USA. Andrei Sabelfeld’s research spans foundations and practice across a range of topics in computer security and privacy. He is a recipient of a number of prestigious prizes and awards from ERC, SSF, VR, WASP, Chalmers, Google, Meta (Facebook), and Amazon. Today, he leads a group of researchers at Chalmers engaged in a number of internationally visible projects on software security, web security, IoT security, security foundations, and applied cryptography.

Link to the profile of Andrei Sabelfeld