Heap Space: hibernate

Wednesday, February 3, 2010

Hibernate/JPA confuses "legacy" with "high throughput", "scalable", and "complex"

I like Hibernate/JPA for dead-simple inserts and updates. I detest it for queries. I also detest how the Hibernate documentation totally misuses the term "legacy". When I hear the term "legacy", I think "old and dumb". But on planet Hibernate, "legacy" means "Any application which operates on more than a piddly amount of data and doesn't buy into our myopic, Java-centric view of the cosmos in which a relational database is just a pain-in-the-ass file system whose performance and data integrity issues exist only to perpetuate the hiring of very expensive DBAs".

For example, Hibernate says it doesn't support trigger-driven sequence creation because that's a "legacy" kind of a thing. What if my database is accessed by Scala, Ruby, Perl, and Java? I'd like to encode some core business rules directly in the database so I don't have to duplicate logic in different languages. I don't call that "legacy". I call it "reality".

Hibernate thinks that compound unique keys are indicative of a "legacy" database. There are perfectly good reasons--from a (gasp!) data modeling perspective--to use complex unique keys and dispense with primary keys. Having to introduce new artificial primary keys into all my tables to make life easier for the object layer is something I am now more or less required to do so as not to infuriate application developers. But there is a real cost to the data model: the data model is forced to violate DRY. The data model has to maintain two redundant definitions of uniqueness: one that is natural and one that is imposed by Hibernate. Hibernate claims that having a foreign key refer to a unique key of the associated table rather than the primary key is "complicated and confusing". This is bullshit.

Hibernate thinks that only "legacy" tables have lots of columns. In reality, it's often very useful to make really fat tables to avoid what would otherwise be complex, possibly inefficient multi-table queries. I've made fat tables before and I'll make them again. It's not legacy, it's reality.

Hibernate thinks that everyone should have "fewer tables than classes", and then claims that storing some attributes in a secondary table (hello, @SecondaryTable annotation) is really only for...you guessed it..."legacy" databases. Hold the phone. What if I have some bit of data that is part of an object but needs database-level security? For example, if I have a patient table and I want to control access to social security numbers, perhaps I'd like to put the social security numbers in a separate table, or in a separate schema, and have the database manage the security of this table slightly differently? What if I have some huge freakin' columns that map to a particular subclass of an object, but I'd like to keep them stored separately for query performance? What if these columns are extremely rarely used? It's not just that I don't want to risk pulling them all into memory (which I can control with FetchType.LAZY). I want to manage the storage for these data differently. Maybe the data is so large that I need to partition the backing tables. Hibernate seems to be saying that these situations are for those stupid old "legacy" applications. Just let your object model define the data model! What could possibly go wrong?

My needs aren't legacy needs. These are the needs of reality.

Friday, April 24, 2009

If Hibernate is so great for developers, why does it make my unit tests run 400x slower?

I am insanely frustrated by how sluggish Hibernate startup time is during unit tests. I have explored ways to optimize this, but I believe Hibernate has made a fundamental mistake with boot-time loading. It would be far, far better to do these sorts of checks at compile time instead of at load time. It would be so much better to do it at compile time that I consider it a moronic blunder to do it at load time.

Here are some of the things I've tried to do to speed up Hibernate load times, both for unit tests and for deploying to our dev/test tomcats.

1. Try -Dhibernate.use_lazy_proxy_factory=true

Nope, that doesn't do a thing. ~19s startup for my piddly unit test that needs to grab one row, from one table, and parse it. Without Hibernate, it would take about 3.6 microseconds.

2. Make your own custom persistence.xml

Nope, this doesn't work because we have a highly interconnected schema, with over 700 tables. Some folks will point out that my organization must be brain dead for having that many tables in the same schema, but our application is all about interrogating data. Besides, the schema evolved over ten years, starting back when Java was in its infancy. So don't tell me that the solution to long startup overhead in a unit test is to refactor a multi-terabyte schema.

This interconnectedness, even on a smaller schema, causes problems. Hibernate isn't a compiler; it's not doing static analysis to determine which entity classes to load. It does a brute-force load of everything at boot time, regardless of whether the JVM will ever actually need the classes in question. Our homegrown RO tools did this once...at compile time. Ironically, developers complained about this. I thought it was great; I love static analysis. Hibernate doesn't do this once at compile time, it does it hundreds of times per day, each time a developer runs a unit test. By my calculation, the per-developer Hibernate tax, if you run about 50 unit tests a day during your personal development work, eats at least 25 minutes. It's not just 25 minutes, though, since each occurrence is a distraction, and distractions are a waste-multiplier (like the military's "force multiplier", only for wasting time).
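
For anyone who wants to plug in their own numbers, here is a back-of-the-envelope sketch of that tax. It assumes every test run pays the same fixed startup cost; the class and method names are made up for illustration and are not part of Hibernate. With the ~19s startup quoted above and 50 runs a day, the total comes to about 16 minutes, and a 25-minute daily total corresponds to an average of roughly 30 seconds per run.

```java
import java.util.Locale;

// Hypothetical helper for estimating the daily startup overhead.
// Assumption: each test run pays the same fixed ORM startup cost.
public class StartupTax {

    /** Total seconds lost per day to startup across all test runs. */
    public static double dailyOverheadSeconds(int runsPerDay, double startupSeconds) {
        return runsPerDay * startupSeconds;
    }

    /** Average startup seconds per run implied by a given daily total in minutes. */
    public static double startupSecondsFor(double dailyMinutes, int runsPerDay) {
        return dailyMinutes * 60.0 / runsPerDay;
    }

    public static void main(String[] args) {
        int runsPerDay = 50;          // test runs per developer per day
        double startupSeconds = 19.0; // startup time observed for one small test

        double totalSeconds = dailyOverheadSeconds(runsPerDay, startupSeconds);
        System.out.printf(Locale.ROOT, "Daily overhead: %.1f minutes%n", totalSeconds / 60.0);
        System.out.printf(Locale.ROOT, "Per-run cost implied by 25 minutes a day: %.1f seconds%n",
                startupSecondsFor(25.0, runsPerDay));
    }
}
```

Multiply the result by the size of the team to get a rough sense of what the whole group pays each day, before counting the cost of the interruptions themselves.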

3. Just write your own weird class loader.

Seriously? If Hibernate is a tool whose goal is dumbing-down SQL and relational databases, do you really think that your average Hibernate user is going to be able to navigate a custom class loader?

4. Re-architect your code so that all database access goes through a separately deployed, long-lived service layer like a web service or RMI.

Everything that makes "option" 2 above impossible also makes this advice worthless. When you have a large (table-wise), highly interconnected schema, you can't just start shuffling logical subgroups of tables into separate schemas without spending a few years refactoring everything to go through those service layers.

TL;DR: Please, Hibernate, I beg you: move load-time checks into compile-time checks, or just optimize the bejeezus out of whatever is going on during load time so that it takes less than 2 seconds.
