-->

Builders in Java versus C++?

2020-08-17 05:37发布

问题:

In Google's Protocol Buffer API for Java, they use these nice Builders that create an object (see here):

Person john =
  Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .addPhone(
      Person.PhoneNumber.newBuilder()
        .setNumber("555-4321")
        .setType(Person.PhoneType.HOME))
    .build();

But the corresponding C++ API does not use such Builders (see here)

The C++ and the Java API are supposed to be doing the same thing, so I'm wondering why they didn't use builders in C++ as well. Are there language reasons behind that, i.e. it's not idiomatic or it's frowned upon in C++? Or probably just the personal preference of the person who wrote the C++ version of Protocol Buffers?

回答1:

The proper way to implement something like that in C++ would use setters that return a reference to *this.

class Person {
  std::string name;
public:
  Person &setName(string const &s) { name = s; return *this; }
  Person &addPhone(PhoneNumber const &n);
};

The class could be used like this, assuming similarly defined PhoneNumber:

Person p = Person()
  .setName("foo")
  .addPhone(PhoneNumber()
    .setNumber("123-4567"));

If a separate builder class is wanted, then that can be done too. Such builders should be allocated in stack, of course.



回答2:

I would go with the "not idiomatic", although I have seen examples of such fluent-interface styles in C++ code.

It may be because there are a number of ways to tackle the same underlying problem. Usually, the problem being solved here is that of named arguments (or rather their lack of). An arguably more C++-like solution to this problem might be Boost's Parameter library.



回答3:

Your claim that "the C++ and the Java API are supposed to be doing the same thing" is unfounded. They're not documented to do the same things. Each output language can create a different interpretation of the structure described in the .proto file. The advantage of that is that what you get in each language is idiomatic for that language. It minimizes the feeling that you're, say, "writing Java in C++." That would definitely be how I'd feel if there were a separate builder class for each message class.

For an integer field foo, the C++ output from protoc will include a method void set_foo(int32 value) in the class for the given message.

The Java output will instead generate two classes. One directly represents the message, but only has getters for the field. The other class is the builder class and only has setters for the field.

The Python output is different still. The class generated will include a field that you can manipulate directly. I expect the plug-ins for C, Haskell, and Ruby are also quite different. As long as they can all represent a structure that can be translated to equivalent bits on the wire, they're done their jobs. Remember these are "protocol buffers," not "API buffers."

The source for the C++ plug-in is provided with the protoc distribution. If you want to change the return type for the set_foo function, you're welcome to do so. I normally avoid responses that amount to, "It's open source, so anyone can modify it" because it's not usually helpful to recommend that someone learn an entirely new project well enough to make major changes just to solve a problem. However, I don't expect it would be very hard in this case. The hardest part would be finding the section of code that generates setters for fields. Once you find that, making the change you need will probably be straightforward. Change the return type, and add a return *this statement to the end of the generated code. You should then be able to write code in the style given in Hrnt's answer.



回答4:

To follow up on my comment...

struct Person
{
   int id;
   std::string name;

   struct Builder
   {
      int id;
      std::string name;
      Builder &setId(int id_)
      {
         id = id_;
         return *this;
      }
      Builder &setName(std::string name_)
      {
         name = name_;
         return *this;
      }
   };

   static Builder build(/* insert mandatory values here */)
   {
      return Builder(/* and then use mandatory values here */)/* or here: .setId(val) */;
   }

   Person(const Builder &builder)
      : id(builder.id), name(builder.name)
   {
   }
};

void Foo()
{
   Person p = Person::build().setId(2).setName("Derek Jeter");
}

This ends up getting compiled into roughly the same assembler as the equivalent code:

struct Person
{
   int id;
   std::string name;
};

Person p;
p.id = 2;
p.name = "Derek Jeter";


回答5:

The difference is partially idiomatic, but is also the result of the C++ library being more heavily optimized.

One thing you failed to note in your question is that the Java classes emitted by protoc are immutable and thus must have constructors with (potentially) very long argument lists and no setter methods. The immutable pattern is used commonly in Java to avoid complexity related to multi-threading (at the expense of performance) and the builder pattern is used to avoid the pain of squinting at large constructor invocations and needing to have all the values available at the same point in the code.

The C++ classes emitted by protoc are not immutable and are designed so that the objects can be reused over multiple message receptions (see the "Optimization Tips" section on the C++ Basics Page); they are thus harder and more dangerous to use, but more efficient.

It is certainly the case that the two implementations could have been written in the same style, but the developers seemed to feel that ease of use was more important for Java and performance was more important for C++, perhaps mirroring the usage patterns for these languages at Google.



回答6:

In C++ you have to explicitly manage memory, which would probably make the idiom more painful to use - either build() has to call the destructor for the builder, or else you have to keep it around to delete it after constructing the Person object. Either is a little scary to me.