Dealing with Verification Data Overload
Massive amounts of test logs and console output are generated every day at the Eclipse Adoptium and Eclipse OpenJ9 projects because of the huge number of tests multiplied by the number of versions, platforms, and implementations tested. This 'noise' from daily, personal, and pull request builds must be processed before anyone can consume and make sense of it (87 million+ tests are executed per release and 20 GB+ of test output is produced daily). Not only do we have the challenge of completing all tests in a timely fashion with limited machine resources, but we must also take this raw data and refine it so we can understand, more easily and effectively, what action to take next.
This presentation covers our approaches to parallelization and to handling copious amounts of verification data. The goals are to optimize test build throughput, make the verification data meaningful, and guide next actions. From a quality assurance perspective, there is value in result summarization and aggregation. We need best practices in data visualization, filtering, and categorization. We need to continuously evolve and improve, employing relevant technologies such as deep learning, GitHub Actions, and VM cloud agents. Ultimately, the goal of our data refinery efforts is to display results efficiently, allow users to quickly narrow down problems, and better monitor the build farm that supports our complex continuous delivery pipelines.
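To illustrate the kind of filtering and categorization this involves, here is a minimal sketch that buckets failing console logs into categories so a triager sees a summary instead of raw text. The category names, regex patterns, and the "test-output" directory layout are assumptions for illustration, not the projects' actual tooling.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical failure categories and the log patterns that signal them.
# Illustrative only; real triage rules would be far more extensive.
CATEGORIES = {
    "infrastructure": re.compile(r"No space left on device|Connection (reset|refused)"),
    "timeout":        re.compile(r"TIMEOUT|Timed out", re.IGNORECASE),
    "crash":          re.compile(r"hs_err_pid|core dumped|Segmentation fault"),
    "assertion":      re.compile(r"AssertionError|FAILED:"),
}

def categorize(log_text: str) -> str:
    """Return the first category whose pattern appears in the log, else 'unclassified'."""
    for name, pattern in CATEGORIES.items():
        if pattern.search(log_text):
            return name
    return "unclassified"

def summarize(log_dir: str) -> Counter:
    """Count failing logs per category across a directory of console output."""
    counts = Counter()
    for log_file in Path(log_dir).glob("**/*.log"):
        counts[categorize(log_file.read_text(errors="replace"))] += 1
    return counts

if __name__ == "__main__":
    for category, count in summarize("test-output").most_common():
        print(f"{category:15} {count}")
```

Aggregated counts like these are a starting point for the dashboards and deeper (e.g., learning-based) classification discussed in the talk.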