Hello,

I've applied libFuzzer (http://tutorial.libfuzzer.info) to the regex library and, in just half an hour, found 5 heap-buffer-overflows, a stack overflow, an assertion failure, a use of uninitialized data, a SIGSEGV, an infinite loop, an undefined shift, an invalid enum value, and a number of memory leaks:

SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:132:10 in char const* boost::re_detail_106300::re_skip_past_null<char>(char const*)

SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:221:29 in __gnu_cxx::__normal_iterator<char const*, std::string> boost::re_detail_106300::re_is_set_member<__gnu_cxx::__normal_iterator<char const*, std::string>, char, boost::regex_traits<char, boost::cpp_regex_traits<char> >, unsigned int>(__gnu_cxx::__normal_iterator<char const*, std::string>, __gnu_cxx::__normal_iterator<char const*, std::string>, boost::re_detail_106300::re_set_long<unsigned int> const*, boost::re_detail_106300::regex_data<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, bool)

SUMMARY: AddressSanitizer: heap-buffer-overflow /sanitizer_common_interceptors.inc:278 in __interceptor_strlen

SUMMARY: AddressSanitizer: heap-buffer-overflow boost/regex/v4/perl_matcher.hpp:166:19 in __gnu_cxx::__normal_iterator<char const*, std::string> boost::re_detail_106300::re_is_set_member<__gnu_cxx::__normal_iterator<char const*, std::string>, char, boost::regex_traits<char, boost::cpp_regex_traits<char> >, unsigned int>(__gnu_cxx::__normal_iterator<char const*, std::string>, __gnu_cxx::__normal_iterator<char const*, std::string>, boost::re_detail_106300::re_set_long<unsigned int> const*, boost::re_detail_106300::regex_data<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, bool)

a.out: boost/regex/v4/perl_matcher_common.hpp:606: bool boost::re_detail_106300::perl_matcher<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> > > >, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::match_backref() [BidiIterator = __gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, Allocator = std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> > > >, traits = boost::regex_traits<char, boost::cpp_regex_traits<char> >]: Assertion `r.first != r.second' failed.

SUMMARY: MemorySanitizer: use-of-uninitialized-value boost/regex/v4/perl_matcher.hpp:166:13 in std::__1::__wrap_iter<char const*> boost::re_detail_106300::re_is_set_member<std::__1::__wrap_iter<char const*>, char, boost::regex_traits<char, boost::cpp_regex_traits<char> >, unsigned int>(std::__1::__wrap_iter<char const*>, std::__1::__wrap_iter<char const*>, boost::re_detail_106300::re_set_long<unsigned int> const*, boost::re_detail_106300::regex_data<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > > const&, bool)

SUMMARY: AddressSanitizer: heap-buffer-overflow ./boost/regex/v4/basic_regex_parser.hpp:2599:68 in boost::re_detail_106300::basic_regex_parser<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::parse_perl_extension()

boost/regex/v4/basic_regex_parser.hpp:2599:68: runtime error: load of value 56794092, which is not a valid value for type 'boost::re_detail_106300::syntax_element_type'

Direct leak of 4096 byte(s) in 1 object(s) allocated from:

SUMMARY: AddressSanitizer: stack-overflow ./boost/regex/v4/basic_regex_creator.hpp:1054 in boost::re_detail_106300::basic_regex_creator<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::create_startmap(boost::re_detail_106300::re_syntax_base*, unsigned char*, unsigned int*, unsigned char)

SUMMARY: AddressSanitizer: SEGV

ALARM: working on the last Unit for 17 seconds

boost/regex/v4/basic_regex_parser.hpp:904:49: runtime error: shift exponent 325804978 is too large for 32-bit type 'unsigned int'

Full reports and triggering inputs for each bug are attached.

The test I used is simply:

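#include <boost/regex.hpp>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  try {
    // The fuzzer input serves as both the pattern and the text to match.
    std::string str(reinterpret_cast<const char *>(Data), Size);
    boost::regex e(str);
    boost::match_results<std::string::const_iterator> what;
    boost::regex_match(str, what, e, boost::match_default | boost::match_partial);
  } catch (const std::exception &) {
    // Exceptions (e.g. for an invalid pattern) are not bugs.
  }
  return 0;
}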
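
A binary linked with libFuzzer replays a saved input when given its path as an argument. In case it helps to reproduce the attached inputs without libFuzzer (for example under a debugger), here is a minimal sketch of a standalone replay helper. The names `FuzzTarget` and `replay_input` are hypothetical and are not part of libFuzzer or Boost; the helper only assumes the standard fuzz target signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Signature shared by libFuzzer entry points such as LLVMFuzzerTestOneInput.
using FuzzTarget = int (*)(const uint8_t *Data, size_t Size);

// Hypothetical helper (not part of libFuzzer or Boost): reads one saved
// input file and passes its bytes to the target exactly once.
// Returns the target's result, or -1 if the file cannot be opened.
int replay_input(const std::string &path, FuzzTarget target) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    std::fprintf(stderr, "cannot open %s\n", path.c_str());
    return -1;
  }
  // Read the whole file as raw bytes, without newline translation.
  std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
  return target(bytes.data(), bytes.size());
}
```

A small main() could then call replay_input(argv[i], LLVMFuzzerTestOneInput) for each attached file, with the program built with the same sanitizer that produced the corresponding report.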

I would suggest rerunning the fuzzer once these bugs are fixed: they are easy to trigger, so the fuzzer mostly kept hitting them instead of exploring further.

It may also make sense to set up continuous fuzzing with https://github.com/google/oss-fuzz, which automatically tests the latest code.