Code Intelligence in Large C(++) Repositories - Colin Grant and Benjamin Davis
Providing a productive development environment for developers working on large C/C++ codebases with extensive configuration is challenging. Accurate symbol resolution depends on an understanding of compilation configuration, but users may need to switch between configurations, branches, and compilation trees frequently. This talk will address a number of the problems of providing accurate code intelligence in such projects and some of the approaches and tradeoffs that we have explored to address them.
The gold standard for understanding a C/C++ codebase is the full record of its compilation, including all configuration. The clangd language server requires such information, in the form of a compile_commands.json or compile_flags.txt file, to supply its code intelligence, and in the best case its performance is excellent. However, generating the necessary compilation data is laborious: users must often actually run a build, which is impractical for those who switch contexts frequently and impossible for those who need to browse broken code. Some of the difficulties associated with clangd's data requirements can be addressed by sharing databases, and recent clangd versions also support a remote index. However, setting up such infrastructure is still a hurdle, and it requires users to have access to the remote resource at all times.
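To make the data requirement concrete, here is a minimal sketch of what a compilation database contains: one entry per translation unit, recording the working directory, the source file, and the exact compiler arguments. The project paths, file names, flags, and the helper names make_compile_commands and write_compile_commands are hypothetical and chosen only for illustration. In practice the file is normally emitted by the build system or by a tool that intercepts the build, which is why a build usually has to run first.

```python
import json
from pathlib import Path


def make_compile_commands(build_dir, sources, flags):
    """Return compilation-database entries, one per source file.

    Each entry uses the "directory", "file", and "arguments" keys of the
    JSON compilation database format. All inputs here are illustrative.
    """
    entries = []
    for src in sources:
        entries.append({
            "directory": str(build_dir),
            "file": str(src),
            # The full argument vector the compiler would be invoked with.
            "arguments": ["clang++", *flags, "-c", str(src)],
        })
    return entries


def write_compile_commands(path, entries):
    """Write the entries to a compile_commands.json file at `path`."""
    Path(path).write_text(json.dumps(entries, indent=2))


if __name__ == "__main__":
    # Hypothetical project layout and configuration flags.
    entries = make_compile_commands(
        "/work/project/build",
        ["src/main.cpp", "src/net.cpp"],
        ["-std=c++17", "-DUSE_TLS=1", "-Iinclude"],
    )
    print(json.dumps(entries, indent=2))
```

Because flags such as -DUSE_TLS=1 decide which preprocessor branches and headers are visible, switching configuration or branch changes these entries, and the database has to be regenerated before symbol resolution can be trusted again.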
Another approach to providing effective indexing is to rely less on certainty about compilation and to provide alternative means of symbol resolution. One combination that we have had success with at Ericsson uses CDT's CScope for parsing and reference tracking, and CTags for symbol resolution in cases of uncertainty. For external symbol definitions, this approach reliably includes the correct definition among its results, but it may also return extraneous definitions where the same name is defined more than once: it trades certainty for comprehensiveness.
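The tradeoff can be illustrated with a small sketch of name-based lookup over a tags file. It assumes the common ctags line layout (tag name, file, and address separated by tabs, with optional extension fields after `;"`). It is not the implementation used at Ericsson. The sample entries and the function names parse_tags and find_definitions are hypothetical.

```python
from collections import defaultdict

# Hypothetical tags-file content: two unrelated functions share the name "init".
SAMPLE = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/",
    'init\tdrivers/uart.c\t42;"\tf',
    'init\tdrivers/spi.c\t17;"\tf',
    'send\tnet/socket.c\t88;"\tf',
]


def parse_tags(lines):
    """Build a name -> [(file, address)] index from ctags-format lines."""
    index = defaultdict(list)
    for line in lines:
        if line.startswith("!_TAG_"):  # skip metadata pseudo-tags
            continue
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:  # skip malformed lines
            continue
        name, path, address = parts[0], parts[1], parts[2]
        # Drop the `;"` marker that introduces extension fields.
        if address.endswith(';"'):
            address = address[:-2]
        index[name].append((path, address))
    return index


def find_definitions(index, name):
    """Return every recorded definition of `name`, without disambiguation."""
    return list(index.get(name, []))


if __name__ == "__main__":
    index = parse_tags(SAMPLE)
    for path, address in find_definitions(index, "init"):
        print(f"init defined at {path}:{address}")
```

A lookup for "init" returns both candidates because nothing in the index records which one is linked into the current configuration. The correct definition is among the results, and the user or a further heuristic has to choose between them.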