{"id":368,"date":"2012-09-17T00:44:33","date_gmt":"2012-09-17T07:44:33","guid":{"rendered":"http:\/\/www.myopictopics.com\/?p=368"},"modified":"2012-09-17T14:29:26","modified_gmt":"2012-09-17T21:29:26","slug":"game-engine-metadata-creation-with-clang","status":"publish","type":"post","link":"https:\/\/www.myopictopics.com\/?p=368","title":{"rendered":"Game Engine Metadata Creation with Clang"},"content":{"rendered":"<p style=\"text-align: justify;\">Game engines often have some form of metadata system that can be used for a myriad of tasks. My little home brew engine, for example, uses metadata to facilitate serialization, allow object allocation by name, content-updating, etc. It\u2019s all quite common, but creating such a system is actually pretty complex when you start to get into the nitty gritty implementation details. Your metadata design choices very quickly start to inform many other areas of your engine design.<\/p>\n<p style=\"text-align: justify;\">Various engines create and store their metadata in various ways. Unreal Engine 3, for example, uses UnrealScript to describe game logic as well as provide the source for the engine metadata. Their UC compiler creates C++ headers which are compiled into the game binary while the metadata is shunted over to the .u script packages. I\u2019ve never liked this scheme for a couple of reasons. Perhaps chief among them is that it requires programmers to use one language to write their type definitions and then use a different language for their code. In other words, they have to author their C++ header files by proxy. There are a few other problems with that system, but it\u2019s not helpful to dwell on it.<\/p>\n<h2>A Little Context to Start<\/h2>\n<p style=\"text-align: justify;\">My hobby engine uses a system where any class or struct can have metadata, but that\u2019s not strictly required. Additionally, each metadata-enabled type is not required to have a virtual function table. 
Let\u2019s consider a simple class sitting in a header file called ChildType.h:<\/p>\n<pre><span style=\"font-family: ariel;\"><strong><em>ChildType.h<\/em><\/strong><\/span><hr\/><span style=\"font-size: 85%;\">class childType : public parentType\r\n{\r\n  <em>DECLARE_TYPE( childType, parentType );<\/em>\r\n  void <strong>RedactedFunction<\/strong>( float <span style=\"color: #0000ff;\">secretSauce<\/span> );\r\n  float <span style=\"color: #0000ff;\">unsavedValue<\/span>; <span style=\"color: #333333;\">\/\/\/ +NotSaved<\/span>\r\n  double * <span style=\"color: #0000ff;\">ptrToDbl<\/span>;\r\n  ValueType_t <span style=\"color: #0000ff;\">someValues<\/span>[3];\r\n};<\/span><\/pre>\n<p style=\"text-align: justify;\">By virtue of using the <span style=\"font-family: courier new;\">DECLARE_TYPE<\/span> macro, it is clear this type is intended to be metadata-enabled, but it\u2019s not obvious where that data comes from. We need to express all the critical information about the class in such a way that we can serialize it or generically inspect it. 
The hows and whys of my design choices aren\u2019t important, but my solution is to define a new cpp file that looks like this:<\/p>\n<pre><span style=\"font-family: ariel;\"><strong><em>ChildTypeClass.cpp<\/em><\/strong><\/span><hr\/><span style=\"font-size: 85%;\">MemberMetadata const childType::TypeMemberInfo[] = {\r\n  {\"<strong>unsavedValue<\/strong>\", eDT_Float,       eDT_None,    eDF_NotSaved, offsetof(childType, unsavedValue), 1, NULL },\r\n  {\"<strong>ptrToDbl<\/strong>\",     eDT_Pointer,     eDT_Float64, eDF_None,     offsetof(childType, ptrToDbl),     1, NULL },\r\n  {\"<strong>someValues<\/strong>\",   eDT_StaticArray, eDT_Struct,  eDF_None,     offsetof(childType, someValues),   3, ValueStructClass }\r\n};\r\n<strong>IMPLEMENT_TYPE<\/strong>( childType, parentType );<\/span>\r\n<\/pre>\n<p style=\"text-align: justify;\">There\u2019s a lot going on here, and there is a lot of unimportant plumbing hidden behind those <span style=\"font-family: courier new;\">DECLARE_TYPE<\/span> and <span style=\"font-family: courier new;\">IMPLEMENT_TYPE<\/span> macros. For this discussion, the <span style=\"font-family: courier new;\">DECLARE_TYPE<\/span> macro adds a class-static array of MemberMetadata structures. Each element of that array specifies a name, primary type, secondary type, flags (like +NotSaved), <span style=\"font-family: courier new;\">offsetof<\/span> the member inside the structure, static array length (usually 1), and target metadata type (or <span style=\"font-family: courier new;\">NULL<\/span>).<\/p>\n<p style=\"text-align: justify;\">That\u2019s a lot of data to type in correctly and maintain over the life of an engine. 
I knew very early on that while manual typing was ok to bootstrap the project, automation would have to enter the picture eventually.<\/p>\n<h2>Automation Enters the Picture<\/h2>\n<p style=\"text-align: justify;\">As I said, my design choices aren\u2019t the focus of this article &#8211; perhaps another time. This article is about how I\u2019m going about creating my metadata.<\/p>\n<p style=\"text-align: justify;\">I very briefly considered writing a C++ parser\u2026 BAHAHAHAHA! Oh, man\u2026that\u2019s rich! But seriously, compiler grammars are one of those weird things that amuse me, and I actually considered writing just enough of a \u201cloose\u201d C++ parser to pull out the information I wanted. When I stopped to think about the magnitude of language features and weird cases in C++, however, reality came crashing in, and it\u2019s easy to see why I abandoned the idea.<\/p>\n<h2>Enter Clang<\/h2>\n<p style=\"text-align: justify;\"><a href=\"http:\/\/clang.llvm.org\/\">Clang<\/a> is a C language family front-end for LLVM. Clang is free (BSD), and Clang is awesome! There are a few layers to the onion, but at a high level, Clang is a C++ to LLVM compiler. When used in tandem with LLVM, it\u2019s a complete optimizing compiler ecosystem. More importantly for our purposes, however, libClang provides a relatively simple C-based API into the abstract syntax tree (AST) created by the language parser. That\u2019s HUGE for creating metadata. Seriously, it\u2019s almost all of the heavy lifting in something like this. Making matters even easier, there are <a href=\"http:\/\/www.python.org\/\">Python<\/a> bindings if that\u2019s your thing. 
The Python bindings are probably a little easier to work with due to the fast iteration times and cleaner string handling.<\/p>\n<h2>A Little Glossary Action<\/h2>\n<p style=\"text-align: justify;\">The libClang API uses a handful of simple concepts to model your code in AST form.<\/p>\n<p style=\"text-align: justify;\"><strong>Translation Unit<\/strong> \u2013 In practical terms, it\u2019s one run of the compiler that creates one AST. We can think of it as one compiled file + any files it includes. In terms of libClang, it\u2019s the base-level container and the jumping-off point for the data-gathering work we\u2019ll need to do.<\/p>\n<p style=\"text-align: justify;\"><strong>Cursor<\/strong> \u2013 A cursor represents one syntactic construct in the AST. They can represent an entire namespace or a simple variable name. They also maintain parent\/child relationships with other cursors. For our purposes, the cursors are the nodes in the AST we need to traverse. They also contain references to the source file position where they were found.<\/p>\n<p style=\"text-align: justify;\"><strong>Type<\/strong> \u2013 This one is easy. The cursors we\u2019re looking for will often reference a type. These are literally the language types. Keep in mind that Clang models the entire type system, so typedefs are different from the types they alias. 
We\u2019ll get into that.<\/p>\n<h2>The Plan<\/h2>\n<p>Once the parsing is taken care of, the solution is pretty straightforward.<\/p>\n<ol>\n<li>Set up the environment.<\/li>\n<li>For each header, make a Translation Unit.<\/li>\n<li>Traverse the AST for interesting type definition cursors.<\/li>\n<li>For each type definition, look for member data cursors and other information.<\/li>\n<li>Once we have all the information we need about a type, dump text to a file.<\/li>\n<\/ol>\n<p><strong><em>Setup &#8211; Compiler Environment<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Obviously, we need one or more header files to work on, but there\u2019s more &#8211; more than I naively anticipated anyway. Even if we just hardcode a list of headers, we won\u2019t be able to compile them by themselves. We need to replicate enough of your normal project environment to get the same parsing result that your normal compiler would create. We need to use the same command line preprocessor defines (-D\u2019s) as well as the same additional include folders (-I\u2019s).<\/p>\n<p style=\"text-align: justify;\">I\u2019m going to leave that as an exercise for the reader, but my solution involved parsing my Visual Studio project files. It wasn\u2019t a huge deal, but this is where project structure will play a decently sized role, and a well-structured project will be easier to configure.<\/p>\n<p><strong><em>Setup &#8211; Header Environment<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Another thing I hadn\u2019t considered is what sort of environment a header lives in. There\u2019s no way to know what files need to be included prior to the inclusion of a given header. There are really only two things you can assume when dealing with this problem &#8211; assume any pre-compiled headers are included before your header, and assume that nothing else needs to be included because your header can stand on its own. 
That second point jibes with how I generally structure my headers anyway, but it is graduated to a hard requirement in this case. Header cascades can kill your compile time performance, but external header order dependencies are worse. Obviously, it\u2019s a good idea to mitigate header cascades with forward declarations where possible.<\/p>\n<p><strong><em>Time to Make the Doughnuts<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Once we\u2019ve gathered all our include paths and preprocessor symbols, invoking clang is pretty easy. We can\u2019t just pass the header to <span style=\"font-family: Courier New;\">clang_parseTranslationUnit<\/span> and call it done. I suppose we can, but \u201c.h\u201d is an ambiguous extension. Clang won\u2019t know how to act without some additional arguments to indicate the language to use. I also needed to include my PCH file anyway, so I ended up creating an ephemeral .cpp file to kill two birds with one stone. Conveniently, Clang has support for in-memory or \u201cunsaved\u201d files. Blatting a few #include strings into a buffer is all it takes. Here is the basic setup for building a translation unit for a single header called \u201cMyEngineHeader.h\u201d. 
Obviously, your environment arguments will be a bit different.<\/p>\n<pre><span style=\"font-family: ariel; \"><strong><em>Build a Translation Unit<\/em><\/strong><\/span><hr\/><span style=\"font-size: 80%;\">char const * <span style=\"color: #0000ff;\">args<\/span>[] = {\"<span style=\"color: #808080\">-Wmicrosoft<\/span>\"\r\n            , \"<span style=\"color: #808080\">-Wunknown-pragmas<\/span>\"\r\n            , \"<span style=\"color: #808080\">-I\\\\MyEngine\\\\Src<\/span>\"\r\n            , \"<span style=\"color: #808080\">-I\\\\MyEngine\\\\Src\\\\Core<\/span>\"\r\n            , \"<span style=\"color: #808080\">-D_DEBUG=1<\/span>\" };\r\n\r\nCXUnsavedFile <span style=\"color: #0000ff;\">dummyFile<\/span>;\r\ndummyFile.Filename = \"<span style=\"color: #808080\">dummy.cpp<\/span>\";\r\ndummyFile.Contents = \"<span style=\"color: #808080\">#include \\\"MyEnginePCH.h\\\"\\n#include \\\"MyEngineHeader.h\\\"<\/span>\";\r\ndummyFile.Length = strlen( dummyFile.Contents );\r\n\r\nCXIndex <span style=\"color: #0000ff;\">CIdx<\/span> = <strong>clang_createIndex<\/strong>(1, 1);\r\nCXTranslationUnit <span style=\"color: #0000ff;\">tu<\/span> = <strong>clang_parseTranslationUnit<\/strong>( CIdx, \"<span style=\"color: #808080\">dummy.cpp<\/span>\"\r\n                                 , args, ARRAY_SIZE(args)\r\n                                 , &dummyFile, 1\r\n                                 , CXTranslationUnit_None );\r\n<\/span>\r\n<\/pre>\n<p><strong><em>Build Errors? WTF?!<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Compiling your code with Clang will probably output some unexpected errors. Remember that Clang is a C++ compiler with nuances like any other, and Clang\u2019s nuances won\u2019t necessarily match your other C++ compilers\u2019 nuances. This is a good thing &#8211; seriously! Anyone who\u2019s done cross-platform work will tell you that with every additional platform, long-standing bugs show themselves. 
Embracing a multi-compiler situation will force you to keep cleaner, more standards-compliant code.<\/p>\n<p style=\"text-align: justify;\">Unfortunately, resolving the major problems is not optional here. We need to create a valid AST for traversal that isn&#8217;t missing any attributes that describe our data. In order to get what we want, the parser actually has to finish what it\u2019s doing. Use <span style=\"font-family: courier new;\">clang_getDiagnostic<\/span>, <span style=\"font-family: courier new;\">clang_getDiagnosticSpelling<\/span>, etc. to get human-readable error messages.<\/p>\n<pre><span style=\"font-family: ariel;\"><strong><em>Diagnostics Dump<\/em><\/strong><\/span><hr\/><span style=\"font-size: 80%;\">unsigned int <span style=\"color: #0000ff;\">numDiagnostics<\/span> = <strong>clang_getNumDiagnostics<\/strong>( tu );\r\nfor ( unsigned int <span style=\"color: #0000ff;\">iDiagIdx<\/span>=0; iDiagIdx &lt; numDiagnostics; ++iDiagIdx )\r\n{\r\n  CXDiagnostic <span style=\"color: #0000ff;\">diagnostic<\/span> = <strong>clang_getDiagnostic<\/strong>( tu, iDiagIdx );\r\n\r\n  CXString <span style=\"color: #0000ff;\">diagCategory<\/span> = <strong>clang_getDiagnosticCategoryText<\/strong>( diagnostic );\r\n  CXString <span style=\"color: #0000ff;\">diagText<\/span> = <strong>clang_getDiagnosticSpelling<\/strong>( diagnostic );\r\n  CXDiagnosticSeverity <span style=\"color: #0000ff;\">severity<\/span> = <strong>clang_getDiagnosticSeverity<\/strong>( diagnostic );\r\n  \r\n  printf( \"<span style=\"color: #808080\">Diagnostic[%u] - %s(%d)- %s\\n<\/span>\"\r\n                               , iDiagIdx\r\n                               , <strong>clang_getCString<\/strong>( diagCategory )\r\n                               , severity\r\n                               , <strong>clang_getCString<\/strong>( diagText ) );\r\n                               \r\n  <strong>clang_disposeString<\/strong>( diagText );\r\n  <strong>clang_disposeString<\/strong>( diagCategory );\r\n\r\n 
 <strong>clang_disposeDiagnostic<\/strong>( diagnostic );\r\n}<\/span><\/pre>\n<p><strong><em>Time to Start Digging!<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">The compile step should have provided you with a valid translation unit. We need to keep that around, but we\u2019re not going to do much with it once we&#8217;ve checked for errors. Once we get the top-level cursor with <span style=\"font-family: Courier New;\">clang_getTranslationUnitCursor()<\/span>, we&#8217;ll put the translation unit in a safe place and use the cursor as the top-level object from then on.<\/p>\n<p style=\"text-align: justify;\">We want to find relevant types, but we have to be smart about it. The C-language Clang interface uses a clunky callback API called <span style=\"font-family: Courier New;\">clang_visitChildren<\/span>. (Note: the Python bindings provide a simpler non-recursive <span style=\"font-family: Courier New;\">get_children<\/span> interface that returns an iterator.) Clang will call your callback for each child cursor it encounters. Your callback, in turn, returns a value indicating whether the iteration should recurse to deeper children, continue to this child\u2019s siblings, or quit entirely.<\/p>\n<p>We\u2019re only interested in type declarations at this stage, but C++ allows new types to appear in several places. 
Fortunately, we can pare down the file pretty quickly.<\/p>\n<pre style=\"font-size: 80%\"><table border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\r\n<tbody>\r\n<tr>\r\n<td valign=\"top\" width=\"133\"><strong>Item<\/strong><\/td>\r\n<td valign=\"top\" width=\"198\"><strong>Cursor Kind<\/strong><\/td>\r\n<td valign=\"top\" width=\"93\"><strong>Recurse?<\/strong><\/td>\r\n<td valign=\"top\" width=\"111\"><strong>Remember?<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Typedef<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_TypedefDecl<\/td>\r\n<td valign=\"top\" width=\"93\">Yes<\/td>\r\n<td valign=\"top\" width=\"111\">No<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Class Decl<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_ClassDecl<\/td>\r\n<td valign=\"top\" width=\"93\">Yes<\/td>\r\n<td valign=\"top\" width=\"111\">Yes<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Struct Decl<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_StructDecl<\/td>\r\n<td valign=\"top\" width=\"93\">Yes<\/td>\r\n<td valign=\"top\" width=\"111\">Yes<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Namespace Decl<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_Namespace<\/td>\r\n<td valign=\"top\" width=\"93\">Yes<\/td>\r\n<td valign=\"top\" width=\"111\">No<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Enumerations<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_EnumDecl<\/td>\r\n<td valign=\"top\" width=\"93\">No<\/td>\r\n<td valign=\"top\" width=\"111\">Yes?<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\"><em>Anything Else<\/em><\/td>\r\n<td valign=\"top\" width=\"198\">???<\/td>\r\n<td valign=\"top\" width=\"93\">No<\/td>\r\n<td valign=\"top\" width=\"111\">No<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table><\/pre>\n<p style=\"text-align: justify;\">It should be fairly obvious for an experienced programmer what to look for \u2013 the table above is the rule-set I\u2019ve been using. 
There are a few other cases that aren\u2019t covered \u2013 function-private types and unions. It\u2019d be easy enough to deal with these cases too, but I haven\u2019t had a need to serialize a union just yet, and function-private types have limited utility for serialization.<\/p>\n<pre><span style=\"font-family: ariel;\"><strong><em>Traverse For Types<\/em><\/strong><\/span><hr\/><span style=\"font-size: 80%;\">\r\nMyTraversalContext <span style=\"color: #0000ff;\">typeTrav<\/span>;\r\nclang_visitChildren( <strong>clang_getTranslationUnitCursor<\/strong>( tu ), <strong>GatherTypesCB<\/strong>, &typeTrav );\r\n\r\nenum CXChildVisitResult <strong>GatherTypesCB<\/strong>( CXCursor <span style=\"color: #0000ff;\">cursor<\/span>, CXCursor <span style=\"color: #0000ff;\">parent<\/span>, CXClientData <span style=\"color: #0000ff;\">client_data<\/span> )\r\n{\r\n  MyTraversalContext * <span style=\"color: #0000ff;\">typeTrav<\/span> = reinterpret_cast&lt;MyTraversalContext*&gt;( client_data );\r\n  CXCursorKind <span style=\"color: #0000ff;\">kind<\/span> = <strong>clang_getCursorKind<\/strong>( cursor );\r\n\r\n  CXChildVisitResult <span style=\"color: #0000ff;\">result<\/span> = <span style=\"color: #00Af00;\">CXChildVisit_Continue<\/span>;\r\n\r\n  switch( kind )\r\n  {\r\n      case <em>CXCursor_EnumDecl<\/em>:\r\n        typeTrav->AddEnumCursor( cursor );\r\n        break;\r\n\r\n      case <em>CXCursor_StructDecl<\/em>:\r\n      case <em>CXCursor_ClassDecl<\/em>:\r\n        typeTrav->AddNewTypeCursor( cursor );\r\n        result = <span style=\"color: #00Af00;\">CXChildVisit_Recurse<\/span>;\r\n        break;\r\n\r\n      case <em>CXCursor_TypedefDecl<\/em>:\r\n      case <em>CXCursor_Namespace<\/em>:\r\n        result = <span style=\"color: #00Af00;\">CXChildVisit_Recurse<\/span>;\r\n        break;\r\n  }\r\n\r\n  return result;\r\n}<\/span><\/pre>\n<p style=\"text-align: justify;\">Enumerations are in the mix even though they\u2019re a bit of a special case. 
For metadata purposes, you might get away with just treating them as integers. You can, however, be more robust in the face of changing types if you store them as symbolic strings until you do the final bake of your data.<\/p>\n<p><strong><em>Panning For Gold<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Now that we have a bunch of types, we might want to filter them. Remember that we might have encountered a massive header cascade in the compilation step. Logically, we\u2019re only interested in types that were declared in the header we\u2019re directly processing. We\u2019ll get the other ones when we process their headers in turn. Fortunately, we can iterate the list of interesting type cursors we just created in the last step, and ask each one what file location it came from. Types from other headers can be safely culled.<\/p>\n<p><strong><em>Data Gathering &#8211; Internal<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">After culling types from other headers, we should have a much smaller list of interesting types, so we can use their cursors as starting points to learn more about them. This is where we start gathering all that data I mentioned earlier. We&#8217;ll iterate for this data in much the same way we got the type cursors in the first place. I\u2019m sure you can do them both in one sweep, but I find the problem domain a little easier to think about in two phases. This time, instead of starting at the top of the translation unit, we can start the iteration at the type cursor. 
We want to iterate the type declaration cursor completely and look for a few different things.<\/p>\n<pre style=\"font-size: 80%\"><table border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\r\n<tbody>\r\n<tr>\r\n<td valign=\"top\" width=\"133\"><strong>Item<\/strong><\/td>\r\n<td valign=\"top\" width=\"198\"><strong>Cursor Kind<\/strong><\/td>\r\n<td valign=\"top\" width=\"93\"><strong>Recurse?<\/strong><\/td>\r\n<td valign=\"top\" width=\"111\"><strong>Remember?<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Base Type Ref<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_CXXBaseSpecifier<\/td>\r\n<td valign=\"top\" width=\"93\">No<\/td>\r\n<td valign=\"top\" width=\"111\">Yes<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Member Var<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_FieldDecl<\/td>\r\n<td valign=\"top\" width=\"93\">Not yet(*)<\/td>\r\n<td valign=\"top\" width=\"111\">Yes<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Static Class Var<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_VarDecl<\/td>\r\n<td valign=\"top\" width=\"93\">No<\/td>\r\n<td valign=\"top\" width=\"111\">No<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"133\">Methods<\/td>\r\n<td valign=\"top\" width=\"198\">CXCursor_CXXMethod<\/td>\r\n<td valign=\"top\" width=\"93\">No<\/td>\r\n<td valign=\"top\" width=\"111\">???<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n* <em>See examples below<\/em><\/pre>\n<p style=\"text-align: justify;\">Base types and member variables should be fairly obvious as to why we want them, but static class variables might seem odd for this list. I use them for another level of filtering. I know that any class that supports metadata uses that <span style=\"font-family: courier new;\">DECLARE_TYPE<\/span> macro from earlier. 
Of course, macros are all resolved by the preprocessor, so the C-language parser never sees that symbol, but buried within is a single static class variable with a known name and type that I can find. If it\u2019s not there, then this class is incapable of supporting metadata, and I can just skip it entirely. Looking at the problem the other way, the only thing I need to do in order to enable metadata for a given class is add the <span style=\"font-family: courier new;\">DECLARE_TYPE<\/span> macro. The rest takes care of itself.<\/p>\n<p style=\"text-align: justify;\">As an aside, I haven\u2019t bothered adding any method-invoke plumbing to my engine, but it\u2019s fairly easy to see how one might suss that out of the data. Lots of engines do that sort of thing, so I won\u2019t be surprised if I end up digging into it eventually.<\/p>\n<p><strong><em>Data Gathering &#8211; External<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Once we have all the data we\u2019ll need from inside the class, we only need a few external tidbits before we can generate the metadata file. We need to know the fully-qualified namespace of the target type as well as those of all the base types. This, too, is pretty straightforward though, as you can simply ask any cursor what its lexical parent happens to be. By iterating until you hit the translation unit, you can capture all the containing scopes of a given type. There is an important caveat that bit me only after a good while. 
Consider this code:<\/p>\n<pre><span style=\"font-family: ariel;\"><strong><em>Ambiguity Operator<\/em><\/strong><\/span><hr\/><span style=\"font-size: 80%;\">namespace <strong>FooNS<\/strong>\r\n{\r\n  class <strong>Foo<\/strong>\r\n  {\r\n    int <span style=\"color: #0000ff;\">dataFoo<\/span>;\r\n  };\r\n}\r\n\r\nusing namespace <span style=\"color: #0000ff;\">FooNS<\/span>;\r\ntypedef Foo FooAlias;\r\n\r\nclass <strong>Bar<\/strong> : public <strong>FooAlias<\/strong>\r\n{\r\n  int <span style=\"color: #0000ff;\">dataBar<\/span>;\r\n};<\/span><\/pre>\n<p style=\"text-align: justify;\">When we try to find the full scope of the base class of Bar, we won\u2019t be aware that Foo actually lives inside of FooNS, and the <span style=\"font-family: courier new;\">using<\/span> directive is what allows this to work. I am not a fan of <span style=\"font-family: courier new;\">using<\/span> and often refer to it as the ambiguity operator. I suppose it happens often enough in production code, however, that we should deal with it correctly.<\/p>\n<p style=\"text-align: justify;\">The way to deal with this situation is to walk the lexical parent chain as I already mentioned, but at every step along the way, we need to see if the parent scope is a class-type, struct-type, or typedef. If it is any of those, then we need to get the type record from the parent cursor, then ask for the canonical type record in order to eliminate the typedef indirections, and finally get the cursor from the type record using the <span style=\"font-family: courier new;\">clang_getTypeDeclaration<\/span> function. This might seem needlessly complex, but consider gathering data from <span style=\"font-family: courier new;\">Bar<\/span> in the above code example. Walking the lexical parents of <span style=\"font-family: courier new;\">Bar<\/span> works as expected because it doesn&#8217;t quietly live inside of any scopes that aren&#8217;t obvious. 
Doing the same for the base class (<span style=\"font-family: courier new;\">FooAlias<\/span>) is a different story. In that case, the base class is actually a <span style=\"font-family: courier new;\">typedef<\/span> of <span style=\"font-family: courier new;\">Foo<\/span> which is quietly defined inside of the <span style=\"font-family: courier new;\">FooNS<\/span> namespace.<\/p>\n<p><strong><em>Data Gathering &#8211; Off-World<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">We\u2019ve already dug into the class as well as ascended the various scopes in which a class might live. With all that accounted for, what else could there be to gather? Way back in the first section where I said my data members can have flags such as \u201c+NotSaved\u201d, I never really said how that information was found. Unfortunately, C++ doesn\u2019t really provide a code-annotation scheme that integrates with the mechanics of the parser. There\u2019s a little room for a <span style=\"font-family: courier new;\">#pragma<\/span> or an <span style=\"font-family: courier new;\">__attribute__<\/span> interface, but I was unable to make those systems work how I wanted. Additionally, I don\u2019t really like how cumbersome they would have to be in order to give per-member-data attribute granularity. Instead I simply opted to use code comments. That way, the regular code would be completely unaware of them, and I would have free rein to implement any metadata features I wanted. Obviously, this is where we step outside of the AST that has served us so well up to this point, but not really very far outside. We already have a cursor for each data member, and we can use <span style=\"font-family: courier new;\">clang_getCursorExtent<\/span> to get the exact positions in the source files where this cursor occurs. 
From there, it becomes fairly trivial to do a localized scan for comments using any syntax you happen to want.<\/p>\n<p><strong><em>Writing it out<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Ok, finally we should have all the data we need to write out our metadata. As I said earlier, I write everything out to a .cpp file for inclusion into the engine project. That\u2019s a nice, human-readable method, but it has an implicit requirement that any metadata generation run might require a small additional build. In Visual Studio, that can also mean you have to reload the project if you\u2019ve added any new classes. On the bright side, all your metadata is there at engine startup without load-order or chicken and egg problems.<\/p>\n<p style=\"text-align: justify;\">There are other ways, however. For example, you could dump all this information out to a binary archive that is slurped into the engine on startup. It could also be demand-loaded and unloaded if the overhead starts to be an issue.<\/p>\n<h2>Details Details\u2026<\/h2>\n<p style=\"text-align: justify;\">Now that you\u2019re all sleeping soundly after traversing the AST, I thought I&#8217;d give a few examples of how the AST is structured for common cases.<\/p>\n<p style=\"text-align: justify;\">Let\u2019s briefly consider some normal member data:<\/p>\n<pre>class <strong>Normal<\/strong>\r\n{\r\n  int <span style=\"color: #0000ff;\">data1<\/span>;\r\n  float <span style=\"color: #0000ff;\">data2<\/span>;\r\n};<\/pre>\n<p style=\"text-align: justify;\">The cursor hierarchy looks like this:<\/p>\n<pre style=\"font-size: 80%\"><table><thead><tr><th>Cursor Text<\/th><th>Cursor Kind<\/th><th>Type Kind<\/th><\/tr><\/thead>\r\n<tbody>\r\n<tr><td>Translation Unit<\/td><td><\/td><td><\/td><\/tr>\r\n<tr><td>    Normal<\/td><td>CXCursor_ClassDecl<\/td><td>CXType_Record<\/td><\/tr>\r\n<tr><td>        data1<\/td><td>CXCursor_FieldDecl<\/td><td>CXType_Int<\/td><\/tr>\r\n<tr><td>        
data2<\/td><td>CXCursor_FieldDecl<\/td><td>CXType_Float<\/td><\/tr>\r\n<\/tbody><\/table><\/pre>\n<p style=\"text-align: justify;\">It\u2019s actually pretty intuitive once you\u2019re comfortable with some basic compiler concepts. Types are separated from semantic constructs, and POD types are directly represented in the Clang API.<\/p>\n<p style=\"text-align: justify;\">Now consider only slightly more complexity:<\/p>\n<pre>class <strong>StillPrettyNormal<\/strong>\r\n{\r\n  int * <span style=\"color: #0000ff;\">dataPtr1<\/span>;\r\n  struct DataType * <span style=\"color: #0000ff;\">dataPtr2<\/span>;\r\n};<\/pre>\n<pre style=\"font-size: 80%\"><table><thead><tr><th>Cursor Text<\/th><th>Cursor Kind<\/th><th>Type Kind<\/th><th>Pointee Type<\/th><\/tr><\/thead>\r\n<tbody>\r\n<tr><td>Translation Unit<\/td><td><\/td><td><\/td><\/tr>\r\n<tr><td>    StillPrettyNormal<\/td><td>CXCursor_ClassDecl<\/td><td>CXType_Record<\/td><td><\/td><\/tr>\r\n<tr><td>        dataPtr1<\/td><td>CXCursor_FieldDecl<\/td><td>CXType_Pointer<\/td><td>CXType_Int<\/td><\/tr>\r\n<tr><td>        DataType<\/td><td>CXCursor_StructDecl<\/td><td>CXType_Record<\/td><td><\/td><\/tr>\r\n<tr><td>        dataPtr2<\/td><td>CXCursor_FieldDecl<\/td><td>CXType_Pointer<\/td><td>CXType_Record(*)<\/td><\/tr>\r\n<\/tbody>\r\n<\/table>* <em>Causes ref to DataTypeClass<\/em><\/pre>\n<p style=\"text-align: justify;\">There are two odd parts here. First, we\u2019ve lost the notion of \u2018integer\u2019 for dataPtr1 &#8211; we\u2019ve only been told that it\u2019s some sort of pointer. This isn\u2019t really a problem though, because Clang provides the <span style=\"font-family: courier new;\">clang_getPointeeType<\/span> function. You can call this on any pointer type to get the next type in the chain. 
Pointers to pointers to pointers can be resolved this way through multiple calls if need be.<\/p>\n<p style=\"text-align: justify;\">Second, we have an unexpected struct declaration in the middle of our class declaration. Well, it\u2019s not completely unexpected, actually. The inclusion of \u2018struct\u2019 in the field declaration of dataPtr2 is also a forward declaration for the type, and Clang represents this. Fortunately, it\u2019s a sibling of the actual field declarations and we can safely ignore it.<\/p>\n<p style=\"text-align: justify;\">The final part of this example is the addition of the type reference to the externally-defined <span style=\"font-family: courier new;\">DataType<\/span>. Forward declarations in C++ allow types to be mentioned and not fully defined until they are used, so for now I have to assume that <span style=\"font-family: courier new;\">DataTypeClass<\/span> actually exists somewhere else. The possibility exists, however, that DataType is not a metadata-enabled class, and the reference to DataTypeClass will break at link-time. Obviously, this needs to be tightened down in my engine, and ideally, I&#8217;d like to avoid some sort of comment-flag mark-up. 
In order to solve this problem the correct way, I&#8217;ll probably have to shuffle the tool to look at all headers multiple times and create a dependency\/attribute graph.<\/p>\n<p style=\"text-align: justify;\">Ok, one more example, but I\u2019ll warn you ahead of time that this goes a little past the edge of where I wanted to go with my metadata-creation tool.<\/p>\n<pre>class <strong>WackyTown<\/strong>\r\n{\r\n     MyContainer&lt; MyData* &gt; <span style=\"color: #0000ff;\">cacheMisser<\/span>;\r\n     MyContainer&lt; MyData*, PoolAllocator&lt;32768&gt; &gt; <span style=\"color: #0000ff;\">sendHelp<\/span>;\r\n};<\/pre>\n<pre style=\"font-size: 80%\"><table><thead><tr><th>Cursor Text<\/th><th>Cursor Kind<\/th><th>Type Kind<\/th><\/tr><\/thead>\r\n<tbody>\r\n<tr><td>Translation Unit<\/td><td><\/td><td><\/td><\/tr>\r\n<tr><td>    WackyTown<\/td><td>CXCursor_ClassDecl<\/td><td>CXType_Record<\/td><\/tr>\r\n<tr><td>        cacheMisser<\/td><td>CXCursor_FieldDecl<\/td><td>CXType_Unexposed(*)<\/td><\/tr>\r\n<tr><td>            MyContainer<\/td><td>CXCursor_TemplateRef<\/td><td>CXType_Invalid<\/td><\/tr>\r\n<tr><td>            MyData<\/td><td>CXCursor_TypeRef<\/td><td>CXType_Record<\/td><\/tr>\r\n\r\n<tr><td>        sendHelp<\/td><td>CXCursor_FieldDecl<\/td><td>CXType_Unexposed(*)<\/td><\/tr>\r\n<tr><td>            MyContainer<\/td><td>CXCursor_TemplateRef<\/td><td>CXType_Invalid<\/td><\/tr>\r\n<tr><td>            MyData<\/td><td>CXCursor_TypeRef<\/td><td>CXType_Record<\/td><\/tr>\r\n<tr><td>            PoolAllocator<\/td><td>CXCursor_TemplateRef<\/td><td>CXType_Invalid<\/td><\/tr>\r\n<tr><td>            <em>(no-name)<\/em><\/td><td>CXCursor_IntegerLiteral<\/td><td>CXType_Int<\/td><\/tr>\r\n<\/tbody><\/table>\r\n* <em>See the Canonical Type!<\/em><\/pre>\n<p style=\"font-size: 90%; text-align: justify; overflow: auto; background-color: #E7E7F7; padding: 2em; line-height: 1.5em; border: 1px solid #ddd; margin: 1.5em 0;\"><em>A quick confession: I started out using 
Clang\u2019s C-interface, and I\u2019ve attempted to write this article using C-language references. However, I actually did most of my experimentation using the Python bindings. They appear a little incomplete in comparison, so it\u2019s possible that I simply missed the correct course on this one.<\/em><\/p>\n<p style=\"text-align: justify;\">There is a lot of missing data in there! First, the types for <span style=\"font-family: courier new;\">cacheMisser<\/span> and <span style=\"font-family: courier new;\">sendHelp<\/span> are both listed as \u201cUnexposed\u201d. Second, the template-ref types are invalid with no obvious way to get to the template definition. Third, nowhere is it exposed to the AST that the MyData references are actually pointers. Fourth, there is no way to know what integer value was passed to <span style=\"font-family: courier new;\">PoolAllocator<\/span> in either instance. Fifth, the implicit hierarchy of the template type arguments has been flattened. What a mess!<\/p>\n<p style=\"text-align: justify;\">Experimentally, I found that I could get the canonical type of the \u201cUnexposed\u201d type for cacheMisser and then get the declaration of the canonical type. That gave me a new <span style=\"font-family: courier new;\">CXCursor_ClassDecl<\/span> cursor which made a certain amount of sense if you think of templates as meta-types and template usages as real types. The new cursor looked like this:<\/p>\n<pre style=\"font-size: 80%\"><table><thead><tr><th>Cursor Text<\/th><th>Cursor Kind<\/th><th>Type Kind<\/th><\/tr><\/thead>\r\n<tbody>\r\n<tr><td>MyContainer&lt; MyData *, PoolAllocator&lt; 1000 &gt; &gt;<\/td><td>CXCursor_ClassDecl<\/td><td>CXType_Record<\/td><\/tr>\r\n<\/tbody><\/table><\/pre>\n<p style=\"text-align: justify;\">On the surface, it seems very promising, but there were no children of this cursor. So while it looks like all of our information is represented at the top level, there was no way to dig into it. 
Fortunately, I&#8217;m not very template-heavy in my engine project, but this is something I&#8217;m still trying to work through for completeness.<\/p>\n<p><strong><em>A Note on Performance<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\">Running my little utility script on a single header takes about 5 seconds from start to finish. Much of that time is because I&#8217;m using a fairly heavy pre-compiled header in my regular engine builds, so building an otherwise trivial file actually touches several dozen core-systems files in addition to any deep rabbit holes caused by C-Runtime inclusions. I&#8217;ve mitigated this for the most part by using Clang&#8217;s actual pre-compiled header functionality instead of just including my PCH as a normal header. As expected, this turns out to be a huge win when I&#8217;m running this across all my engine files. Much like ordinary PCHs, I pay the 5-second penalty once and then each source file is virtually instantaneous. <\/p>\n<h2>That&#8217;s All, Folks!<\/h2>\n<p style=\"text-align: justify;\">I think that&#8217;ll just about do it for my engine metadata generator. While I hope someone out there can benefit from this little bit of yak-shaving for my on-going engine project, I&#8217;m sure this wasn&#8217;t a wholly original idea. I&#8217;d love to hear what others are doing in this vein using Clang or some other tools. Until next time &#8211; hopefully not in another year.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Game engines often have some form of metadata system that can be used for a myriad of tasks. My little home brew engine, for example, uses metadata to facilitate serialization, allow object allocation by name, content-updating, etc. 
It\u2019s all quite common, but creating such a system is actually pretty complex when you start to get <a href='https:\/\/www.myopictopics.com\/?p=368' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-368","post","type-post","status-publish","format-standard","hentry","category-engineering","category-3-id","post-seq-1","post-parity-odd","meta-position-corners","fix"],"_links":{"self":[{"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=\/wp\/v2\/posts\/368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=368"}],"version-history":[{"count":112,"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=\/wp\/v2\/posts\/368\/revisions"}],"predecessor-version":[{"id":482,"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=\/wp\/v2\/posts\/368\/revisions\/482"}],"wp:attachment":[{"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.myopictopics.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}