If you ever spend a period of time writing open source (free) software, you'll almost certainly come across many different attitudes among the people who choose to use it.
There will be some people who'll actually say thanks for providing this software, that it's helped them with their project, saved them many hours of time that otherwise they would have had to spend writing something similar. The software was suitable for their project needs, so great, your effort in making the software open source has benefitted people. Feel good.
There'll be people who maybe say thanks but typically just accept the software as it is, some sort of given, and when they have a problem they take the source code and try to work out where the problem is. They may ask you questions about how the code works, or where to look to get started in resolving the problem. And some time later they may come back with a patch for their problem, so this makes the software better. Again, a good thing.
There will also be a group who take the software and use it for their projects. If a problem occurs they will report it. Their report may provide a way of reproducing the problem, or it may not. If a problem is reported with a way of being reproduced then it can be fixed when the people writing the software have some spare time to do it. Another good thing. If there is no way provided to demonstrate it then the problem report is of little use to anyone ... unless the person who has the problem is willing to get their hands dirty, get the code and fix it (with help where necessary) since only they can see it.
A final group will take the software and, if a problem occurs, keep it to themselves; it's as if they expect you to be aware of everything that could possibly happen with the software, as if the developer wields a crystal ball. The people who develop that software only have a certain amount of time, and they will typically use it and test it against what their own project requires. That doesn't mean that their use-cases are the same as yours. So don't expect open source developers' testing to cover what you need for your project; you could contribute tests to their suite, or donate for their time to run against other datastores if this is important to you.
You may find people who ask the question "should I ditch use of your software?" when faced with a problem, something that your software doesn't cater for, or fails on. Maybe this is meant as some kind of "threat": fix this problem or I leave? Well, the answer to that is simple really. People should do what is right for their project. They've demonstrated one way or another whether they wish to contribute anything to the open source software (problem reports, testcases, patches, documentation, blogs, testimonials, donations, etc; there are many ways). If they haven't demonstrated the willingness to do anything for the project then their input won't be missed if they go off somewhere else. Do they pay the people who develop the software? Well, no. Does the license of that software imply any guarantee that all problems will be fixed immediately when the toys are thrown from the pram? Nope. Maybe this software is not the correct tool for their project? In which case use the correct tool for the job, and don't vent your frustration at your choices on the people who have provided something for nothing. Further to this, stick to the old adage "don't ask someone to do what you wouldn't be prepared to do yourself"; didn't your mum teach you that?
People seem to have got accustomed to having an open source solution these days, and that somehow it's their "right" to have it and their "right" to have any problems found fixed. While open source (free) software gives projects a leg up in reaching their end goal more rapidly and is a great thing for software developers, open source software owes the end user nothing. Best understand this. The end user has the opportunity to do many things to contribute to that software, make it better, repay those people who put their time into developing it. The time of these people who wrote it is important to them, even if it isn't to you; at least respect that.
Some things are for sure: when you embark on writing open source software, it can be very rewarding, very beneficial if you want a way to demonstrate your coding skills to potential employers, and an excellent opportunity for exploring other technologies, gaining experience, and working with other people with different viewpoints, but don't go into it for the gratitude :-)
[Disclaimer : while there is such a thing as commercial open source software, providing the source code yet charging for the software, what is being discussed here is the much more common open source free software]
DataNucleus
Flexible standardised Java persistence to any conceivable type of datastore
Performance - effect of various features
Two years ago I made a post about performance/benchmarking, and the fact that some groups like a magic black-and-white "X is better than Y" (as if there is only one measure of performance, so it doesn't matter what object graphs are used because the result will always be the same). The evidence is that they are wrong. Needless to say there will always be groups that don't share our philosophy, or don't have time to do a complete analysis (though they publish their results knowing that they are incomplete and likely invalid; after all, it's not their software that they're maybe not presenting in a fair light). Recently we had another performance exercise. This came to the conclusion "Hibernate is better than DataNucleus, and you should really just get ObjectDB". So we're back in the territory of black and white. Yes, an OODBMS ought to be way faster than an RDBMS, particularly when the RDBMS has a persistence layer in front of it (and you have to pay for the OODBMS besides), but that is not the subject of this post. We'll concentrate on the former component of that conclusion.
There is nothing to add to the previous blog post in terms of correctness; we stand by all of it and nothing has been demonstrated to the contrary. This blog post simply takes the recent exercise sample and demonstrates how enabling/disabling certain features has a major impact on (DataNucleus) performance. The author of that exercise demonstrated results showing that JDO and JPA with DataNucleus were on a par in terms of performance, but below Hibernate in terms of INSERTs (anything between 1.5 and 2 times) and on a par for SELECTs (some faster, some slower but more or less the same). Since JDO and JPA are shown to be equivalent, we'll just run the exercise with JDO here, but the same is easily demonstrable using JPA (because in DataNucleus you have full control over all persistence properties and features regardless of API).
The sample data used by this case is that of 3 classes. Student has a (1-N unidirectional) List of Credit and has a (1-1 unidirectional) Thesis. We persist 100000 Students each with 1 Credit and 1 Thesis. So that's 300000 objects to be inserted, and then 100000 Students queried.
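For readers who want to picture the model, here is a minimal plain-Java sketch of the three classes. It is an assumption-laden illustration, not the original author's code: the persistence metadata (JDO/JPA annotations or XML) is omitted, the wrapper class `StudentModel` and the helper `newStudent()` are hypothetical names, and only the accessors used by the INSERT code in this post (`setThesis`, `setCredits`, `setComplete`) are taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class so the sketch compiles as a single file.
class StudentModel {

    // 1-1 unidirectional target of Student.thesis
    static class Thesis {
        private boolean complete;
        public boolean isComplete() { return complete; }
        public void setComplete(boolean complete) { this.complete = complete; }
    }

    // Element type of Student.credits; the post does not list its fields, so none are shown
    static class Credit {
    }

    static class Student {
        // 1-N unidirectional List of Credit
        private List<Credit> credits = new ArrayList<>();
        // 1-1 unidirectional Thesis
        private Thesis thesis;

        public List<Credit> getCredits() { return credits; }
        public void setCredits(List<Credit> credits) { this.credits = credits; }
        public Thesis getThesis() { return thesis; }
        public void setThesis(Thesis thesis) { this.thesis = thesis; }
    }

    // Builds one Student with 1 Credit and 1 completed Thesis, as in the exercise
    static Student newStudent() {
        Student student = new Student();
        Thesis thesis = new Thesis();
        thesis.setComplete(true);
        student.setThesis(thesis);
        List<Credit> credits = new ArrayList<>();
        credits.add(new Credit());
        student.setCredits(credits);
        return student;
    }

    public static void main(String[] args) {
        Student s = newStudent();
        System.out.println("credits=" + s.getCredits().size()
            + ", thesis complete=" + s.getThesis().isComplete());
    }
}
```

Each such Student accounts for three persisted objects, which is where the 300000 figure for 100000 Students comes from.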
One important thing to note is that it is extremely useful to have the ability to set many of these properties on a PersistenceManager (or EntityManager) basis (so you could have a PM just for bulk inserts and disable L2 caching, or set the transaction to not be "optimistic"). JDO 3.1 adds the ability to set persistence properties on the PersistenceManager, though DataNucleus currently only supports a minimal set there; SVN trunk now has the ability to turn off the L2 cache in a PM while having it enabled for the PMF as a whole.
The INSERT is as follows
    try
    {
        pm.currentTransaction().begin();
        // Persist 100000 Students, each with one Thesis and one Credit
        for (int x = 0; x < 100000; x++)
        {
            Student student = new Student();
            Thesis thesis = new Thesis();
            thesis.setComplete(true);
            student.setThesis(thesis);
            List<Credit> credits = new ArrayList<Credit>();
            Credit credit = new Credit();
            credits.add(credit);
            student.setCredits(credits);
            // Thesis and Credit are reachable from Student, so they are persisted with it
            pm.makePersistent(student);
        }
        pm.currentTransaction().commit();
    }
    finally
    {
        pm.close();
    }
and the SELECT is as follows
    try
    {
        // Single-string JDOQL: Students with a completed Thesis and exactly one Credit
        Query q = pm.newQuery(
            "select from " + Student.class.getName() +
            " where thesis.complete == true && credits.size()==1");
        Collection result = (Collection) q.execute();
        // ... loop through results, so we know they're loaded
    }
    finally
    {
        pm.close();
    }
So we'll run it (on an H2 database, on a Core i5 64-bit PC running Linux with 4GB RAM) and vary our persistence properties to see the effect.
Original persistence properties (from original author)
optimistic=true, L2 cache=true, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=builtin
INSERT = 120s, SELECT = 6.5s
Disabled L2 cache
Since we're persisting huge numbers of objects and it takes time to cache those, and in the original author's case Hibernate had no L2 cache enabled, let's turn the L2 cache off. So we now have
optimistic=true, L2 cache=false, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=builtin
INSERT = 106s, SELECT = 4.0s
Why the improvement? : because objects didn't need caching, so DataNucleus didn't need to generate the cacheable form of those 300000 objects on INSERT, and 100000 objects on SELECT.
Disabled Optimistic Locking
Now instead of using optimistic locking (queue all operations until commit/flush), we allow all persists to be auto-flushed. As our exercise is bulk-insert we don't care about optimistic locking since we're creating the objects. So we now have
optimistic=false, L2 cache=false, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=builtin
INSERT = 42s, SELECT = 4.0s
Why the improvement ? : because objects are flushed as they are encountered so we don't have to hang on to a large number of changes, so the memory impact is less. Note that we could have observed a noticeable speed up also if we had instead called "pm.flush()" in the loop after every 1000 or 10000 objects. See the performance tuning guide for that.
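The alternative mentioned above (keeping the transaction as it is but flushing every N objects) can be sketched as a small helper. This is only an illustration: `BatchedPersist` and `persistInBatches` are hypothetical names, and the persist and flush actions are passed in as callbacks so the sketch runs without any persistence library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of DataNucleus or JDO.
class BatchedPersist {

    /**
     * Hands each item to the persist callback and runs the flush callback
     * after every batchSize items. Returns the number of flushes triggered;
     * any remainder is left for the transaction commit to write.
     */
    static <T> int persistInBatches(List<T> items, int batchSize,
                                    Consumer<? super T> persist, Runnable flush) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        int count = 0;
        for (T item : items) {
            persist.accept(item);
            count++;
            if (count % batchSize == 0) {
                flush.run();
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            items.add(i);
        }
        List<Integer> persisted = new ArrayList<>();
        int flushes = persistInBatches(items, 1000, persisted::add, () -> { });
        System.out.println(persisted.size() + " persisted, " + flushes + " flushes");
    }
}
```

With JDO the callbacks would presumably be `pm::makePersistent` and `pm::flush`, called between the transaction's begin and commit; the batch size of 1000 or 10000 is the one suggested in the text and would need measuring for your own object graph.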
Use BoneCP connection-pooling
Use BoneCP instead of built-in DBCP, so we have
optimistic=false, L2 cache=false, persistenceByReachabilityAtCommit=false, detachAllOnCommit=false, detachOnClose=false, manageRelationships=false, connectionPooling=bonecp
INSERT = 42s, SELECT = 3.8s
Why the (slight) improvement ? : because BoneCP has benchmarks showing that it has less overhead than DBCP
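As a rough sketch, the final combination of settings could be assembled in Java as below. The shorthand names used in this post (optimistic, L2 cache, connectionPooling, ...) are not the real property keys; the keys shown here are my assumption of the corresponding DataNucleus 3.x persistence property names and should be checked against the documentation for your version. The class name `TunedProperties` is hypothetical.

```java
import java.util.Properties;

// Hypothetical holder for the tuned settings; all keys are assumptions to verify.
class TunedProperties {

    static Properties tuned() {
        Properties props = new Properties();
        // optimistic=false : flush changes as they happen rather than queueing until commit
        props.setProperty("datanucleus.Optimistic", "false");
        // L2 cache=false : skip generating cacheable forms of the objects
        props.setProperty("datanucleus.cache.level2.type", "none");
        // Flags that were already off in the original author's setup
        props.setProperty("datanucleus.persistenceByReachabilityAtCommit", "false");
        props.setProperty("datanucleus.DetachAllOnCommit", "false");
        props.setProperty("datanucleus.DetachOnClose", "false");
        props.setProperty("datanucleus.manageRelationships", "false");
        // connectionPooling=bonecp
        props.setProperty("datanucleus.connectionPoolingType", "BoneCP");
        return props;
    }

    public static void main(String[] args) {
        tuned().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These would be combined with the usual connection settings (URL, driver, user) and passed to `JDOHelper.getPersistenceManagerFactory(...)`, or placed in a properties file or persistence unit instead.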
Conclusion
As you can see, with very minimal tweaking we've reduced the INSERT time by a factor of 3, and the SELECT time by a factor of 1.7! That would equate to being noticeably faster than Hibernate in the author's original timings (for both INSERT and SELECT). Note that we already had the detach flags set to not detach anything, so they didn't need tuning (but they should be included if you hadn't already looked at them in your performance tests, along with all of the other features listed in the Performance Tuning Guide referenced above).
Does the above mean that "DataNucleus is faster than Hibernate" ? Not as such, it is in some situations and not in others. We can turn on/off many things and get different results, just as Hibernate likely can (though I'd say DataNucleus is more configurable than the majority if not all of the other persistence solutions so at least you have significant flexibility to do this with DataNucleus). In the same way we could persist other object graphs and get different results due to some parts of the persistence process being more optimised than others. One thing you can definitely say is that DataNucleus has very good performance (300000 objects persisted in 42secs on a PC, and 100000 objects queried in less than 4secs) and that performance can be significantly tuned.
The other thing that we said in the original blog post and repeat here, if you are serious about performance analysis you have to dig into the details to understand why and, as a consequence, you have an idea what to tune. You also need to assess what your application really needs to perform and what is considered acceptable performance; if you're not going to make a proper attempt at tuning a persistence solution (whether that is DataNucleus, Hibernate, or any other), best not bother at all and just use what you were going to use anyway since you don't have the time to give a fair representation (which is why we don't present any Hibernate results here, so nothing hypocritical in that).
Enhancing in v3.2
Whilst a "final release" of version 3.2 of DataNucleus is still some way off, some important changes have been made to the enhancement process that people need to be aware of, and can benefit from.
JDO : Ability to enhance all classes as "detachable" without updating metadata
When you enhance classes for the JPA API they are all made detachable without a need to specify anything in the metadata (since JPA doesn't have a concept of not being detachable). With JDO the default is not detachable (for backwards compatibility with JDO1 which didn't have the detachment concept). In v3.2 of DataNucleus you can set the alwaysDetachable option (see the enhancer docs) and all classes will be enhanced detachable without the need to touch the metadata; much easier than opening up every class or metadata file and adding detachable="true"!
JPA : Throwing of exceptions due to the bytecode enhancement contract
The bytecode enhancement contract requires that classes throw exceptions under some specific situations where information is either not present or not valid. These always used JDO-based exceptions before to match the JDO bytecode enhancement contract exactly. These are now changed to better suit the JPA API, and remove a need to understand JDO when using JPA.
- if a non-detached field was accessed then a JDODetachedFieldAccessException was thrown; this is now changed to an (java.lang.)IllegalAccessException.
- in some cases where an internal error occurred a JDOFatalInternalException would be thrown; this is now changed to an (java.lang.)IllegalStateException.
No "datanucleus-enhancer.jar", and no need of external "asm.jar"
The DataNucleus enhancer was always maintained as a separate project, but is now merged into datanucleus-core.jar and so will be available directly whenever you have DataNucleus in your CLASSPATH. Taking this further, the enhancer makes use of the excellent ASM library and in v3.2 datanucleus-core.jar includes a repackaged version of the ASM v4.1 classes internally. This means that you have one less dependency also and can do enhancement with less thinking.
PS Remember, bytecode enhancement is "evil", developers of some other persistence solution told you that back in 2003, and you should never forget it! ;-)
Persistence to Neo4j graph datastores
Whilst DataNucleus JDO/JPA already supported persistence and querying of objects to/from RDBMS (all variants), ODBMS (NeoDatis), Documents (XML, Excel, ODF), Web (JSON), Document-based (MongoDB), Map-based (HBase, AppEngine, Cassandra), as well as others like LDAP and VMForce, it was clear that we didn't yet have a plugin to any of the nice new graph datastores like Neo4j. To this end, we now provide a new store plugin, supporting persistence to Neo4j.
Usage
Just like all of the other store plugins we aim to make its usage as seamless and transparent as possible so that you, the user, have a high level of portability for your application. In simple terms you just mark your model classes with JDO or JPA metadata (annotations or XML) just as you would do for RDBMS (or any other datastore), and write your JDO or JPA persistence code in the normal way. The only difference is that the data is persisted into Neo4j transparently. I've not had time to write up a tutorial yet, but the model and persistence code would be identical to persisting to any other datastore, just that in the definition of the datastore "URL" it would be something like
datanucleus.ConnectionURL=neo4j:{my_datastore_location}
Refer to the DataNucleus docs for more details. Note that the plugin is not yet released, but is available as a nightly build for anyone wishing to give it a try
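To make the "only the URL changes" point concrete, here is a small sketch that builds the persistence properties for a Neo4j datastore. The `datanucleus.ConnectionURL` key and the `neo4j:` prefix come from the line above; the class name `Neo4jProps`, the helper `forLocation`, and the example path `/tmp/my-graph-db` are hypothetical and only there for illustration.

```java
import java.util.Properties;

// Hypothetical helper; only the ConnectionURL key and "neo4j:" prefix come from the post.
class Neo4jProps {

    // Builds properties pointing DataNucleus at a Neo4j datastore location
    static Properties forLocation(String datastoreLocation) {
        Properties props = new Properties();
        props.setProperty("datanucleus.ConnectionURL", "neo4j:" + datastoreLocation);
        return props;
    }

    public static void main(String[] args) {
        // "/tmp/my-graph-db" is a placeholder location, not a recommended path
        Properties props = forLocation("/tmp/my-graph-db");
        System.out.println(props.getProperty("datanucleus.ConnectionURL"));
    }
}
```

The model classes and the JDO/JPA persistence code would stay exactly as they are for any other datastore.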
Currently supported
- Each object of a class becomes a Neo4j Node.
- Supports datastore identity, application identity, and nondurable identity
- Supports versioned objects
- Fields of all primitive and primitive wrappers can be persisted
- Fields of many other standard Java types can be persisted (Date, URL, URI, Locale, Currency, JodaTime, javax.time, plus many more)
- 1-1, 1-N, M-N, N-1 relations are persisted as Neo4j Relationships (Map fields are not currently supported)
- JDOQL/JPQL queries can be performed, and the operators &&, ||, ==, !=, >, >=, <, <= are processed using Cypher, with any remaining syntax handled in-memory currently.
- Support for using Neo4j-assigned "node id" for "identity" value strategy.
- Checks for duplicate object identity
- Embedded (and nested embedded) 1-1 fields, and querying of these fields
Likely supported soon
- Processing of more JDOQL/JPQL syntax in Cypher to minimise any in-memory processing
- Support for backed SCO collection wrappers allowing more efficient Relationship management.
Feedback is welcome (over on the DataNucleus Forum, or below in the comments). Additionally, if anyone with more experience in Neo4j would like this plugin's capabilities to be enhanced, why not get involved? You could contribute a few patches, for example; the source code is available here, and the issue tracker is a good place to start.
Enjoy!
DataNucleus AccessPlatform v3.1 coming soon ...
Almost a year from the release of version 3.0 and we move close to the release of version 3.1 (due late in July 2012). So what has changed in that time ?
Consolidation
While DataNucleus' plugin architecture is very flexible, it can lead to a large number of plugins being available. This in itself is not a bad thing but, if your application is using many features, you do have to keep track of more plugins and their versions. Version 3.1 merges the following plugins into other plugins
- datanucleus-management was a plugin providing JMX capabilities to DataNucleus usage. It is now merged into datanucleus-core and is now part of a new statistics monitoring API.
- datanucleus-javaxtime was a plugin providing support for the new javax.time classes that will provide a real Date/Time API for Java. This will be part of JDK 1.8 IIRC, so we have moved support for these Java types into datanucleus-core. More and more people will be using them and expecting their persistence to be seamless.
- datanucleus-cache had support for an early version of the forthcoming javax.cache standardised Caching API, but the API has since changed and is reaching a level of maturity. As a result we now provide support for the latest javax.cache API in datanucleus-core, so the typical user (when javax.cache is widely implemented) will not need the datanucleus-cache plugin
- datanucleus-xmltypeoracle was a plugin providing support for persisting String fields to XMLType JDBC columns for Oracle. It is now merged into the datanucleus-rdbms plugin.
As a result of these changes the typical application will only need datanucleus-core, datanucleus-api-jdo or datanucleus-api-jpa, as well as the datanucleus-{datastore} plugin of your choice in the CLASSPATH at runtime. In addition some persistence properties have more sensible defaults that will mean that more applications won't need some value setting to work optimally.
JPA 2.1
The latest revision of the JPA spec (JSR 338) is under way, and has some new features already fleshed out. In Version 3.1 of DataNucleus we provide early access support for
- Stored Procedure API : This allows users of JPA to invoke stored procedures in their RDBMS and get back output parameters and/or result sets. Obviously not applicable when using JPA with a non-RDBMS datastore.
- Type Converter API : This defines a way in which a user can have a field in their Entity and wish to convert the value before it gets to the datastore (and back on retrieval). For example if you have some Java type of your own and want to persist it as a String you could define an attribute converter.
Obviously as JPA2.1 continues we will continue adding features to match their spec.
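To illustrate the type converter idea described above, here is a self-contained sketch. It deliberately does not import the JPA API: `AttributeConverterLike` is a local stand-in interface whose two method names follow the JPA 2.1 `AttributeConverter` as I understand it from the specification, and the early-access API in DataNucleus 3.1 may differ. The `Point` type and `PointStringConverter` are hypothetical.

```java
// Hypothetical sketch of the converter concept; not tied to any JPA jar.
class ConverterSketch {

    // Local stand-in (an assumption) for the JPA 2.1 converter interface
    interface AttributeConverterLike<X, Y> {
        Y convertToDatabaseColumn(X attribute);
        X convertToEntityAttribute(Y dbData);
    }

    // Hypothetical user-defined Java type that we want stored as a String
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Converts a Point to "x,y" on the way to the datastore, and back on retrieval
    static final class PointStringConverter implements AttributeConverterLike<Point, String> {
        @Override
        public String convertToDatabaseColumn(Point attribute) {
            return attribute == null ? null : attribute.x + "," + attribute.y;
        }

        @Override
        public Point convertToEntityAttribute(String dbData) {
            if (dbData == null) {
                return null;
            }
            String[] parts = dbData.split(",");
            return new Point(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }
    }

    public static void main(String[] args) {
        PointStringConverter conv = new PointStringConverter();
        String stored = conv.convertToDatabaseColumn(new Point(3, -4));
        Point loaded = conv.convertToEntityAttribute(stored);
        System.out.println(stored + " -> (" + loaded.x + ", " + loaded.y + ")");
    }
}
```

In a real application the converter would implement the JPA interface and be attached to the field through metadata, so the conversion happens without the entity code knowing about it.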
Other New Features
Whilst some of these could be argued to deserve their own section in this blog, I list here other prominent changes in version 3.1
- The REST API has had significant work, and now provides much more enhanced support for JDOQL/JPQL including order clauses etc. It also now supports use of datastore identity, bulk delete, and much more.
- The enhancer will now work with JDK1.7 (and higher), using the latest version of ASM.
- JTA handling with JPA is now complete
- Support for nondurable identity is now provided for RDBMS, MongoDB, HBase, Excel and ODF.
- You can now have any nontransactional updates persisted atomically. Previously only nontransactional persists and deletes were able to be performed atomically. This means we now have a real "auto-commit" mode of operation
- The HBase plugin adds support for multitenancy, as well as obeying JDO/JPA naming strategies.
- The MongoDB plugin adds support for embedded objects with inheritance, obeys JDO/JPA naming strategies, and adds support for several new query features being evaluated in the datastore.
- The Excel and ODF plugins add support for JDO/JPA naming strategies.
- The plugin for the Google AppEngine datastore has had a long-needed upgrade, and now works with DataNucleus v3.x. So users of that platform can get access to all of the work that has happened since 2009, finally!
DB4O dropped!
Whilst it generally is policy to add capabilities with every release, it occasionally makes sense to remove functionality that is not considered worthwhile. Support for persisting to db4o datastores now falls under this category. As versions of db4o have been released, public APIs have changed making it hard to follow their development. Additionally Versant, the parent company of db4o, have recently released their primary object datastore with a JPA API (to add to its existing JDO API). Consequently it is felt that as Versant have done absolutely nothing to assist in the process of us providing a standards based API for their software, as they are commercial and perfectly capable of committing resource to their projects, our support is now withdrawn. It remains in DataNucleus SVN for anyone who needs such like, but no resource from this project will be directed at their (commercial) datastore.
And that's it. Maintenance of version 3.0 is now at an end (except commercial), and maintenance of version 3.1 will start once we release 3.1 as well as, at some point, the start of development for version 3.2.
GAE/J and DataNucleus v3 - Part 2
In the previous post we saw some initial changes to make GAE/J DataNucleus plugin work with the latest version of DataNucleus plugins. In this post we describe some further features of interest to GAE users that they weren't able to use before.
Storage Version
With v2 of the plugin it will, by default, persist using a new "storage version". In v1 of the plugin it persisted no explicit information about relations, and instead relied on doing queries for parent key to find related objects; obviously when all relations were owned then this was valid. In v2 of the plugin it persists a property in the Entity for each relation (containing the Key(s) of the related object(s)), at the owner side always. In the case of unowned relations (see below) it also will persist a property in the Entity at the non-owner side of a bidirectional relation. Obviously all existing data uses v1 of the storage version, but don't let that concern you since the plugin will check for presence of this property, and if not present then fall back to v1 behaviour to get the related objects. As entities are updated the data will be migrated to v2 storage version (a migration tool to do the job in one pass is in the works also).
Unowned Relations
By default in GAE/J all relations are owned, meaning that any child objects have the parent object Key as part of their Key, and are persisted as part of the same entity-group. This is obviously useful in optimising retrieval of data, but there are times when you simply want your model persisting and not have imposition of ownership. In v2 of the plugin you can have unowned relations, where each object is in its own entity-group. To define a relation like this, see the following example
    @PersistenceCapable
    public class A
    {
        @Persistent(primaryKey="true", valueStrategy=IdGeneratorStrategy.IDENTITY)
        long id;
        @Unowned
        B b;
        ...
    }
    @PersistenceCapable
    public class B
    {
        @Persistent(primaryKey="true", valueStrategy=IdGeneratorStrategy.IDENTITY)
        long id;
        @Unowned
        @Persistent(mappedBy="b")
        A a;
        String name;
        ...
    }
So when we persist an object of type A with related B it will do the following
- PUT the A, generating its Key, but without property for B
- PUT the B, generating its Key, and with a property referring to the key of A
- PUT the A with the property referring to the key of B.
Be aware that if you persist unowned relations in a transaction then you will need to have multi-entity-group transactions enabled, since each object is in its own entity group.
Datastore Identity
With JDO the user has the choice of having their own primary key field (application-identity), or having the identity of the object defined for them (datastore-identity). GAE v1 only allowed application-identity. In v2 of the GAE DN plugin it also allows datastore-identity. To give an example
    @PersistenceCapable
    @DatastoreIdentity(strategy=IdGeneratorStrategy.IDENTITY)
    public class MyClass
    {
        ...
    }
So with this class it will persist an Entity and its Key will use IDENTITY strategy.
Interface Fields
With v2 of the plugin you can now have fields of interface type (representing a persistable type) and persist them as you would normally do. Refer to the DataNucleus docs for how to do it (paying particular attention to the type of the field in metadata)
That's a brief summary of some of the more noteworthy improvements, and hopefully now GAE/J using JDO (or JPA) is a much more pleasant place to be, and you can refer almost directly to the DataNucleus docs for many more features now. In addition to the above changes, and in fixing various other minor bugs, the code structure has been changed quite a bit so future enhancements ought to be much more rapidly achievable
GAE/J and DataNucleus v3 - Part 1
Some time ago I wrote a post about GAE/J and how it provides JDO/JPA. It had many limitations and shortcomings. Recently we have had the chance to update their DataNucleus plugin to work with version 3.0. Here are the major changes that users of that plugin will see if they build and use GAE/J DataNucleus plugin from SVN trunk.
JDOQL/JPQL : Support for methods/operators
If a user sets the query extension/hint "datanucleus.query.evaluateInMemory" then the query will be evaluated in-memory. This has an obvious drawback in terms of memory utilisation (if the number of results is large), but the big plus is that it will evaluate almost all JDOQL/JPQL syntax.
JDOQL : Support for input candidate collection
You can now specify the instances that you want to query over using query.setCandidates(...). This means that if you have a list of instances, you can query which of them match a particular filter.
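As a plain-Java analogy of what querying over an input candidate collection means, the sketch below filters a list that is already in memory. It deliberately does not use the JDO API, so it runs without extra libraries. The Product class, the matching helper and the sample data are all hypothetical, and the lambda merely stands in for a JDOQL filter such as "price < 50".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Plain-Java analogy only; a real JDOQL query would go through javax.jdo.Query. */
class CandidateCollectionSketch {

    /** Hypothetical persistable type used only for this illustration. */
    static class Product {
        final String name;
        final double price;

        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    /** Returns the candidates that satisfy the filter, preserving their order. */
    static List<Product> matching(List<Product> candidates, Predicate<Product> filter) {
        return candidates.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> candidates = Arrays.asList(
                new Product("pen", 1.50),
                new Product("desk", 120.00),
                new Product("lamp", 35.00));

        // Stand-in for a filter like "price < 50" applied to the supplied instances.
        List<Product> cheap = matching(candidates, product -> product.price < 50);
        for (Product product : cheap) {
            System.out.println(product.name);
        }
    }
}
```

The point of the analogy is that the candidates are supplied by the caller rather than fetched from the datastore.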
Primary Key Types
Previously a primary key field could only be of type Long, String or Key. It can now also be the primitive type long.
Plugin package naming
The plugin now uses com.google.appengine.datanucleus as its package root, so it no longer uses the DataNucleus-owned domain.
JDOQL/JPQL setResultClass
This is now supported for the standard types of result classes, so you no longer need to manually convert the result into your required type.
Value Generation
GAE/J users can now make use of other DataNucleus value generators, such as "uuid" and "uuid-hex".
PersistenceManagerFactory
The PersistenceManagerFactory used is now the standard DataNucleus PMF, not any custom GAE variant. To be specific the PersistenceManagerFactoryClass is now org.datanucleus.api.jdo.JDOPersistenceManagerFactory. If you want to have a singleton PMF, simply set the persistence property datanucleus.singletonPMFForName to true. This will then return any existing PMF if present for the requested persistence-unit, or create it if not present.
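A minimal sketch of the properties described above, built with java.util.Properties so that it runs without the JDO jars. The class name PmfPropertiesSketch and its helper are hypothetical. The key javax.jdo.PersistenceManagerFactoryClass is the standard JDO property for naming the PMF class, and the other key and both values are taken from the text above.

```java
import java.util.Properties;

/** Hypothetical helper that only assembles the properties; it does not create a PMF. */
class PmfPropertiesSketch {

    static Properties pmfProperties() {
        Properties props = new Properties();
        // Standard DataNucleus PMF rather than a custom GAE variant.
        props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                "org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
        // Ask for an existing PMF of the same name to be reused instead of creating a new one.
        props.setProperty("datanucleus.singletonPMFForName", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = pmfProperties();
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

With the JDO API and DataNucleus jars on the classpath, a Properties object like this could be passed to JDOHelper.getPersistenceManagerFactory(props). That call is left out here so the sketch stays self-contained.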
EntityManagerFactory
The EntityManagerFactory used is now the standard DataNucleus EMF, not any custom GAE variant. To be specific the PersistenceProvider is org.datanucleus.api.jpa.PersistenceProviderImpl. If you want to have a singleton EMF, simply set the persistence property datanucleus.singletonEMFForName to true. This will then return any existing EMF if present for the requested persistence-unit, or create it if not present.
JPA2
By using DataNucleus v3 you now have available all of the changes made in JPA2, such as Criteria queries and the metamodel.
JDO3
By using DataNucleus v3 you now have available all of the changes made in JDO3.0/JDO3.1. This means query timeouts, metadata API, enhancer API, as well as the DataNucleus proposal for Typesafe JDO queries.
Level2 Caching
Level2 Caching is enabled by default, using an internal map-based cache. You can improve this further by setting the persistence property datanucleus.cache.level2.type to "javax.cache" and including datanucleus-cache.jar in your CLASSPATH. This will then cache using GAE Memcache.
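The cache setting can be layered on top of whatever persistence properties you already have. The sketch below is one way to do that in plain Java. The class Level2CacheSketch and the method withJavaxCacheLevel2 are hypothetical names, the "example.existing" key exists only for the demonstration, and the property name and value come from the paragraph above.

```java
import java.util.Properties;

/** Hypothetical helper; it only edits properties and does not touch any cache. */
class Level2CacheSketch {

    /** Returns a copy of the base properties with the Level2 cache type set to "javax.cache". */
    static Properties withJavaxCacheLevel2(Properties base) {
        Properties props = new Properties();
        props.putAll(base); // copy, so the caller's properties are left unchanged
        props.setProperty("datanucleus.cache.level2.type", "javax.cache");
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("example.existing", "kept"); // placeholder entry for the demo
        Properties props = withJavaxCacheLevel2(base);
        System.out.println(props.getProperty("datanucleus.cache.level2.type"));
    }
}
```

The setting only has an effect at runtime if datanucleus-cache.jar is on the CLASSPATH, as noted above.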
Non-transactional Persistence
DataNucleus non-transactional behaviour is different now, with any call to pm.makePersistent, pm.deletePersistent, em.persist, em.merge, em.remove being atomic, sent to the datastore immediately. Any updates to fields via setters are still queued.
JPA RetainValues
JPA usage, by default, has datanucleus.RetainValues set to true now. This means that when you commit a transaction the object will retain the values of its fields (previously it migrated to hollow state).
Persistence of other java types
In GAE/J v1 you could only persist fields of the following types: primitive, primitive wrapper, String, Date, Enum, BigDecimal, some com.google.appengine types, as well as Collection types. With v2 you can now persist fields of types Currency, Locale, TimeZone, BigInteger, Color, Point, StringBuffer, Jodatime, javax.time, and many more.
Be aware though ... there is more to come