Object-Relational Mapping as a Persistence Strategy


By Douglas Minnaar


 

Introduction

 

Object-Relational Mapping (ORM) is currently one of the most hotly debated topics in software development circles. With all the hype surrounding ORM, it is easy to forget the ‘why’. Why does ORM exist? ORM has become something of a religious war, and one can only wonder whether that war will leave people in the future asking ‘why’. The focus of Object-Relational Mapping is to solve the object-relational impedance mismatch: the object model does not equal the relational model, and many consider this to be a problem. There are those who have subscribed to the viability of ORM as a persistence strategy, and there are those who have not. Some have questioned whether the object-relational impedance mismatch is a problem at all. Others argue that the mismatch is being solved on the wrong side of the equation; they advocate Object-Oriented databases, not ORM, as the solution. The fact that ORM has attracted so much attention and heated debate implies that there is something of importance, or at least of perceived importance, at stake. I leave it up to you to decide. That something deserves a closer inspection to separate the hype from the fact. The aim of this article is to present important views for and against ORM. In so doing, I hope to establish a different, or at least more informed, perception of ORM, regardless of what your current perception may be.

 

Intent

 

To be fair, the topic of Object-Relational Mapping requires a book to address all the matters relating to it in its entirety. I provide just enough information in this article to allow one to consider ORM from a pragmatic perspective.

 

Object-Relational Mapping as a persistence strategy is both a statement and a question. As a statement, it implies that ORM is a persistence strategy; in other words, that ORM is a viable technology for satisfying one’s persistence requirements. As a question, it asks whether ORM is a viable solution for satisfying one’s persistence requirements. The focus of this article is not on the best ORM tools that are available. ‘Best’ is at best transient, and what is best today may not be what’s best tomorrow. Instead, the focus is on ORM at a conceptual level, and the reason is the nature of change. ORM tools will appear, change, disappear and reappear. Technology will evolve and so will ORM tools. I consider the concepts behind ORM to be of far more significance and value, and concentrating on them allows one to view ORM pragmatically and, hopefully, without bias. Therefore, the intent of this article is to view ORM at a conceptual level, in an unbiased manner, and with a technology-agnostic approach.

 

For the purpose of clarity, the intent of this article is further summarized to be as follows:

 

  • Description of ORM
    • What is ORM?
    • Why does it exist?
    • In what form does it exist?
    • What are the problems associated with ORM?

  • Viability of ORM as a persistence strategy
    • Discussion on the requirements of a persistence strategy. Determining how well if at all ORM satisfies persistence requirements.
    • Reusability
    • Maturity
    • Time and Cost
    • Architecture

 

  • Choosing an ORM
    • Commercial vs. Open Source vs. In-house

 

  • Succeeding with ORM

 

  • Failing with ORM

 

Object-Relational Mapping

 

Often, a good way of understanding what something does is to understand the reason for its existence. In enterprise software development, most if not all software systems deal with getting data in and out of a data store. In this context the form of the data store is unimportant. What is important is how one deals with persistence, that is, with saving and retrieving data. It is a common and recurring development challenge. For those who are familiar with design patterns, that statement should sound familiar: design patterns, at their simplest, are tried and tested solutions to common and recurring challenges. Are there, then, design patterns that deal with data persistence? Yes, there are many patterns that one might employ. Is ORM such a design pattern? ORM may use many design patterns to achieve its end. Whether it is in and of itself a pattern is probably a discussion best left to another article or forum. ORM is a technique that one might choose as a data persistence strategy. ORM is ‘A’ solution, not ‘THE’ solution. It does help address a common development challenge, namely data persistence. The popular term for the underlying problem is the object-relational impedance mismatch. The question that arises is whether the impedance mismatch is really an issue. The Java development community seems to have adopted ORM well before any of the other development communities. I use the word ‘seems’ because I have not encountered all development communities and therefore cannot say with certainty that this is the case. The Microsoft development community appears to have ignored the object-relational mismatch until recently. I say recently because the boom in the availability of ORM tools seems to be a recent one. I am not, however, implying that Microsoft itself has chosen to ignore the mismatch issue. Before continuing the discussion of ORM, the following points summarize the definition of ORM.

 

  • It provides a way to resolve the object-relational impedance mismatch, which is considered to be the core problem.
  • It is a technique for converting data between a relational database and an object-oriented programming language.
  • It is an abstraction of data persistence code.
  • The result of an ORM implementation is often likened to a ‘virtual object database’.
  • ORM can provide a degree of database independence.

To further elaborate on what ORM is, I discuss ORM in terms of the following challenges:

 

  • Core Challenge
  • Conceptual Challenge
  • Legacy Challenge
  • Usability Challenge

 

Core Challenge

 

 


Simply put, the object-relational mismatch is that the relational model does not equal the object-oriented model.

 

If the object-relational impedance mismatch really is an issue, then what makes it an issue? The following questions offer a clue.

 

  • How does one map columns, rows, and tables to objects?
  • How does one deal with relationships?
    • How does one map object inheritance to relational tables?
    • How does one deal with composition and aggregation?
  • How does one deal with conflicting type systems between databases?
  • How does one deal with the different design goals?
    • The relational model is designed for data storage and retrieval. Its focus is in terms of how to best manage data.
    • The OO model is all about how to best model behavior.
  • How does one make objects persistent?

 

It is important to consider the context in which one’s application has been designed to work with data. Context refers to the facts or circumstances that surround an idea and help determine its interpretation or relevance. I therefore provide context in terms of three approaches or views.

 

  • Table Approach
  • Entity Approach
  • Domain Approach 

These approaches refer to the way in which one views one’s data, and are explained as follows:

 

  • The Table approach assumes a data-centric viewpoint. When one thinks about a list of customers, one typically views a customer as a record in the customer table. This implies that one’s chosen development technology needs to provide a means of interacting with data in a ‘table manner’. Microsoft ADO.NET, for example, provides an API of data components in the form of DataSets, DataTables, and DataRows. This does not imply that one needs to be doing Microsoft development in order to use this approach. Stored procedures are also often used in this approach.

 

  • The Entity approach implies that the relational model is represented by an abstract model in which records are viewed as entities. Therefore, one does not work with tables and rows but rather entities. This way of thinking is more suited towards an ORM way of thinking.

 

  • The Domain approach is an object representation of the domain in terms of data and behavior. Often a domain model resembles the data model. The difference is that the domain model combines data with behavior. Also, the domain model can have relationships that are not typically available in a relational model. Inheritance, for example, is a relationship that is unique to an object or domain model. A rich domain model can be incredibly difficult and complex to map to a database; that is, persisting a domain model to a relational model can be difficult and complex.

In summary, given the core problem and different data approaches, one can theorize that the core problem is mostly applicable to an Entity or Domain approach. As a result the same holds true for ORM. This is an important facet to contemplate when considering the viability of ORM as a persistence strategy. Furthermore, to understand the core problem, one needs to understand in which context it exists, and that it may not be relevant in all contexts.

 

Conceptual Challenges

 

Now that we have an understanding of what ORM is and why it exists, we can explore the conceptual problems that an ORM implementation is required to address. This section is intended to provide some insight into the difficult problems that an ORM implementation must address. I have attempted to present complex issues in as simple a manner as possible. After reading this section however, one will realize that addressing the noted conceptual problems is no trivial task.

 

Efficient data retrieval

 

Currently, ORM implementations are not well suited to write-centric applications. Instead, they are better suited to read-modify-write applications. Therefore, data retrieval must be efficient. A danger of ORM implementations is that they retrieve the entire object graph when a single object is requested. For example, consider an object model rooted at a Company class.

 

What would happen if I were to instantiate Company?

 

  1. Company would have a list of Departments
  2. For each Department, there will be a list of Employees
  3. For each Employee, there will be a list of Skills and a list of Addresses
  4. Company would also have a list of ProductLine items.
  5. For each ProductLine, there is a Product.
  6. For each Product there is a Category

With a single Company instance, we have managed to pull through the entire object graph along with data. This is extremely inefficient as one may only need an instance of Company and nothing else. Instead, data has been pulled through needlessly. Luckily there are ways in which to address this data retrieval problem. Two possible ways are listed as follows:

 

  • Lazy-Loading

 

Lazy loading means that related data is loaded only when it is first requested. This interrupts the loading process so that data is retrieved only when it is required. If implemented correctly and appropriately, lazy loading can significantly improve performance. In the example above, the list of Departments will only be populated (loaded) when that reference is first accessed on a Company instance. The same principle applies to the other objects that are part of the object graph.

 

  • Eager-Joining

 

Where Lazy-Loading loads data on demand, Eager-Joining does a pre-fetch of data.

 

Good ORM implementations allow for the configuration of how data should be loaded. Whether one should use Lazy-Loading or Eager-Joining depends on the application and the application environment.
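
As a minimal sketch of the difference between the two loading styles, the following Python example simulates the Company and Department relationship described above. The in-memory `DEPARTMENTS` dictionary, the `QUERY_LOG` list and the `fetch_departments` function are hypothetical stand-ins for a real database and are not taken from any particular ORM tool. A real eager join would typically fetch the company and its departments in a single joined query; here the pre-fetch is simply simulated in the constructor.

```python
from typing import List, Optional

# Hypothetical in-memory "database" standing in for real tables.
DEPARTMENTS = {1: ["Sales", "Engineering"]}  # company_id -> department names
QUERY_LOG: List[str] = []  # records each simulated database call


def fetch_departments(company_id: int) -> List[str]:
    """Simulate a SELECT against the Department table."""
    QUERY_LOG.append(f"SELECT departments WHERE company_id={company_id}")
    return list(DEPARTMENTS.get(company_id, []))


class Company:
    def __init__(self, company_id: int, name: str, eager: bool = False):
        self.company_id = company_id
        self.name = name
        self._departments: Optional[List[str]] = None
        if eager:
            # Eager: pre-fetch the departments together with the company.
            self._departments = fetch_departments(company_id)

    @property
    def departments(self) -> List[str]:
        # Lazy: hit the "database" only on first access, then reuse the result.
        if self._departments is None:
            self._departments = fetch_departments(self.company_id)
        return self._departments


lazy_company = Company(1, "Acme")               # no department query yet
eager_company = Company(1, "Acme", eager=True)  # department query runs immediately
```

With the lazy instance, nothing is fetched until `lazy_company.departments` is read, and repeated reads reuse the cached list. Which style is preferable depends, as noted above, on the application and its environment.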

 

Minimize the number of updates

 

As mentioned before, ORM implementations are not well suited to write-centric operations. Therefore, one would want to minimize the number of updates made to the database. A possible solution is to work with one’s data in a disconnected manner; offline, so to speak. Working with data offline raises two requirements. Firstly, one needs to minimize the number of database calls. Secondly, one requires a way of dealing with data concurrency.

 

In order to minimize the number of database calls, one needs to maintain object status offline so that any data-related operation can be delayed until a transaction is complete. Ideally, one would have a mechanism that abstracts the task of object tracking away from the developer. Such a mechanism is available in the form of a design pattern, namely the ‘Unit of Work’. Briefly, the ‘Unit of Work’ design pattern keeps track of objects that have changed. In so doing, one knows which objects (in-memory object data) must be synchronized with the database. The ‘Unit of Work’ also helps minimize the number of database calls, which ultimately improves the efficiency of data-related operations. Furthermore, it can help one solve more complicated issues, such as business transactions that span multiple system transactions. It does this by making use of the ‘Optimistic Offline Lock’ and ‘Pessimistic Offline Lock’ design patterns. For further information, please refer to ‘Patterns of Enterprise Application Architecture’ by Martin Fowler.
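
The idea can be sketched roughly as follows. This is a simplified illustration of the ‘Unit of Work’ pattern and not the implementation used by any particular ORM tool; the `Employee` class, the `employee` table and the method names `register_new` and `register_dirty` are assumptions made for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Employee:
    id: int
    name: str


class UnitOfWork:
    """Tracks new and changed objects and writes them in one transaction."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        self._new = {}    # id -> Employee awaiting INSERT
        self._dirty = {}  # id -> Employee awaiting UPDATE

    def register_new(self, employee: Employee) -> None:
        self._new[employee.id] = employee

    def register_dirty(self, employee: Employee) -> None:
        # An object that is still pending insertion needs no separate UPDATE.
        if employee.id not in self._new:
            self._dirty[employee.id] = employee

    def commit(self) -> None:
        # "with conn" commits on success and rolls back if an error is raised.
        with self._conn:
            self._conn.executemany(
                "INSERT INTO employee (id, name) VALUES (?, ?)",
                [(e.id, e.name) for e in self._new.values()],
            )
            self._conn.executemany(
                "UPDATE employee SET name = ? WHERE id = ?",
                [(e.name, e.id) for e in self._dirty.values()],
            )
        self._new.clear()
        self._dirty.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ann')")
conn.commit()

uow = UnitOfWork(conn)
ann = Employee(1, "Ann")
ann.name = "Anne"
uow.register_dirty(ann)
ann.name = "Anna"
uow.register_dirty(ann)               # still a single pending UPDATE
uow.register_new(Employee(2, "Bob"))
uow.commit()                          # one transaction for all pending work

rows = conn.execute("SELECT id, name FROM employee ORDER BY id").fetchall()
```

Note that the two in-memory changes to the same object result in only one UPDATE statement when the unit of work is committed.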

 

Data concurrency simply refers to the idea of multiple sources accessing data concurrently (at the same time). Data concurrency control refers to the way in which one deals with data being accessed concurrently. Different methods of concurrency control have different side effects. The two primary methods of concurrency control are as follows:

 

  • Pessimistic concurrency control

When a record is accessed, that record is effectively locked. No other source can gain access to the locked record until it is released. Pessimistic concurrency control is suited to environments where data contention is high.

 

  • Optimistic concurrency control 

With optimistic concurrency control, data is not locked. When an update occurs, a check is done to verify whether the data has changed since it was last read. If it has, an error is raised. Typically, the transaction rolls back and needs to be restarted. Optimistic concurrency control is suited to environments where data contention is low.
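
One common way of performing this check is with a version column, sketched below. This is only one possible approach, shown under stated assumptions: the `employee` table, its `version` column, the `update_name` function and the `StaleDataError` exception are all hypothetical names introduced for the example.

```python
import sqlite3


class StaleDataError(Exception):
    """Raised when the row changed after it was read (hypothetical name)."""


def update_name(conn, emp_id, new_name, expected_version):
    # The UPDATE only matches if the version is still the one that was read.
    cur = conn.execute(
        "UPDATE employee SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, emp_id, expected_version),
    )
    if cur.rowcount == 0:
        conn.rollback()
        raise StaleDataError(f"employee {emp_id} was modified by someone else")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
    "version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 1)")
conn.commit()

# Two users read the same row while it is at version 1.
version_seen_by_a = 1
version_seen_by_b = 1

update_name(conn, 1, "Anne", version_seen_by_a)  # succeeds; version becomes 2

try:
    update_name(conn, 1, "Annie", version_seen_by_b)  # stale version
    conflict_detected = False
except StaleDataError:
    conflict_detected = True

final_row = conn.execute("SELECT id, name, version FROM employee").fetchone()
```

The second update is rejected rather than silently overwriting the first, and the caller is left to re-read the row and retry.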

 

The following effects typically result in the absence of concurrency control:

 

  • Lost Updates 

When two or more transactions read the same record and then each perform an update, data will be lost, because the transactions are unaware of one another. The last update wins, meaning that whatever changes were made by the earlier transactions are lost.

 

  • Dirty Read 

A dirty read occurs when a second transaction reads a row that is in the process of being updated by another transaction. The second transaction is reading uncommitted data, which may still be changed or rolled back by the first transaction.

 

  • Non-repeatable Read 

A non-repeatable read is similar to a dirty read, except that the data read has been committed by the other transaction. A transaction reads the same row more than once and, because another transaction has committed a change in between, each read returns a different result. Hence the term non-repeatable read.

 

  • Phantom Read 

A phantom read occurs when a transaction reads a range of rows while another transaction inserts or deletes a row belonging to that range. The transaction’s result may then contain a row that no longer exists because it has been removed by the other transaction. Similarly, the result may omit a row that has been inserted by the other transaction. In either event, the transaction sees an inaccurate picture of the data.

 

The conceptual problem of minimizing the number of updates may at first seem a simple one. However, addressing object status tracking and concurrency control is no trivial task. Any database developers or administrators reading this article must be wondering why software developers are trying to do this in code when concurrency control is readily available as a standard feature of most mainstream relational database management systems. Not only is it available, but the database’s implementation is probably far superior. Surely one should be taking advantage of this fact. I don’t have all the answers; however, the following reasons may offer some illumination.

 

  • Minimise number of updates thereby improving overall efficiency as explained above.
  • An RDBMS may only offer one particular type of concurrency control. If an alternative is required, it must be implemented in code.
  • This point may be the beginnings of another controversial topic, but in my experience, I have found software developers (myself included) to be control freaks. Therefore, developers may want more control over the persistence mechanism for whatever reason that might be.
  • Then there are other issues like the ‘Not invented by me’ syndrome or the ‘I can do it better’ syndrome.

Expressing queries in OO

 

Queries need to be structured in such a way that one does not need to think in terms of a relational model, since doing so results in tight coupling between the object model and the database schema. ORM tools may therefore implement their own flavour of an Object Query Language. The question that is raised is “What’s wrong with thinking in terms of the relational model?” Provided that one does not try to design a domain model based on a database, I don’t think anything is wrong with thinking in terms of the relational model. In fact, I think that a developer should be able to think in terms of the relational model and also in terms of the OO model. What is important is to not allow one’s OO thinking to infect one’s relational thinking, and vice versa. That is, it would be foolish to try to design a relational database in an object-oriented way. (I am not implying that there is anything wrong with OO databases. I am merely saying that forcing a relational database to behave in an OO manner is a poor idea.) Equally, it would be foolish to try to design an OO model in a strictly relational way. What’s relational should be relational and what’s object-oriented should be object-oriented. Why do I say this? The answer is almost an entire article on its own, so for brevity’s sake I will say only this. Years have been spent on improving the relational model in terms of how one effectively creates, retrieves, updates and deletes data. This is what relational databases do best: work with data in a relational way. Inheritance is not natively supported by the relational model, and even where it is emulated there are implications for the overall effectiveness of such an implementation. Similarly, it would be unwise to design an OO model in a relational way, as this would neglect one of Object-Orientation’s foundational pillars, that being inheritance (which gives us reuse, especially if one has designed with behaviour reuse in mind).
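
To illustrate what expressing a query in terms of objects might look like, here is a deliberately small sketch of a query object that translates attribute-level criteria into SQL. It is not modelled on the query language of any specific ORM tool; the `Query` class, the `EMPLOYEE_MAPPING` metadata and the table and column names are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping metadata: object attribute -> column name.
EMPLOYEE_MAPPING = {
    "table": "TBL_EMPLOYEE",
    "columns": {"id": "EMP_ID", "last_name": "EMP_LNAME"},
}


class Query:
    """Builds a parameterised SELECT from attribute-level criteria."""

    def __init__(self, mapping: Dict[str, Any]):
        self._mapping = mapping
        self._criteria: List[Tuple[str, Any]] = []

    def where(self, attribute: str, value: Any) -> "Query":
        if attribute not in self._mapping["columns"]:
            raise KeyError(f"unmapped attribute: {attribute}")
        self._criteria.append((attribute, value))
        return self

    def to_sql(self) -> Tuple[str, List[Any]]:
        columns = self._mapping["columns"]
        sql = f"SELECT {', '.join(columns.values())} FROM {self._mapping['table']}"
        if self._criteria:
            conditions = [f"{columns[attr]} = ?" for attr, _ in self._criteria]
            sql += " WHERE " + " AND ".join(conditions)
        return sql, [value for _, value in self._criteria]


# The caller names object attributes only; column names stay in the mapping.
sql, params = Query(EMPLOYEE_MAPPING).where("last_name", "Smith").to_sql()
```

The calling code refers to `last_name` and never to `EMP_LNAME`, so a change to the column name would be confined to the mapping metadata.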

 

Mapping Inheritance

 

In this section, I am going to discuss how one might deal with the issue of mapping inheritance from an OO model to a relational model. For this discussion, I am going to assume a Domain Approach. Although I discuss how to map objects to the relational model, the approach generally applies to mapping tables to objects as well. Therefore, once again it is important to understand the chosen approach one is implementing. Some ORM tools are better suited to a Domain Model approach. Others are better suited to a Data Driven approach.

 

Because inheritance is not natively supported by relational database management systems, much thought is required as to how one would represent an inheritance hierarchy in a database. There are several ways of mapping an inheritance hierarchy to tables. I briefly discuss three of them, namely:

 

  • Map hierarchy to a single table
  • Map concrete class to a table
  • Map every class to a table

 

Map hierarchy to a single table

 

 

In this scenario, the attributes from all the classes participating in a hierarchy are extracted into a single table. The table name is usually that of the parent class. An additional attribute is added to indicate the type of Department. The Department type would typically be stored in a lookup table (DepartmentType).

 

  • Advantages

This approach is simple and it allows one to easily add new classes. The DepartmentType would need to be updated and any new attributes can be added to the table as columns.

 

Because there is only one table (two if you create a lookup table), there is no need for table joins when querying data. Fewer table joins typically mean more efficient data retrieval.

 

  • Disadvantages 

Not all attributes are relevant to all Department types. This may result in many null or empty attribute values in the database table. The effect of this would be wasted database space.

 

Coupling is increased between class hierarchy and table. This is undesirable as a change in one class may affect the table which may affect other classes within the hierarchy.

 

The resultant table may not be in line with best practices in terms of database normalization.
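
 

A rough sketch of this mapping is shown below. The subclasses `SalesDepartment` and `ResearchDepartment` and their attributes are hypothetical, chosen only to give the hierarchy some shape, and for brevity the department type is stored as text in the table itself rather than in a separate DepartmentType lookup table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Department:
    id: int
    name: str


@dataclass
class SalesDepartment(Department):      # hypothetical subclass
    region: str


@dataclass
class ResearchDepartment(Department):   # hypothetical subclass
    budget: float


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE department (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_type TEXT NOT NULL,  -- indicates the concrete class
        region TEXT,                    -- used only by SalesDepartment
        budget REAL                     -- used only by ResearchDepartment
    )
    """
)


def save(dept: Department) -> None:
    # Attributes that a class does not have are stored as NULL.
    conn.execute(
        "INSERT INTO department VALUES (?, ?, ?, ?, ?)",
        (dept.id, dept.name, type(dept).__name__,
         getattr(dept, "region", None), getattr(dept, "budget", None)),
    )


def load(dept_id: int) -> Department:
    id_, name, kind, region, budget = conn.execute(
        "SELECT id, name, department_type, region, budget "
        "FROM department WHERE id = ?",
        (dept_id,),
    ).fetchone()
    if kind == "SalesDepartment":
        return SalesDepartment(id_, name, region)
    if kind == "ResearchDepartment":
        return ResearchDepartment(id_, name, budget)
    return Department(id_, name)


save(SalesDepartment(1, "EMEA Sales", "EMEA"))
save(ResearchDepartment(2, "Materials Lab", 250000.0))
loaded = load(1)
```

The NULL `budget` stored for the sales department is an instance of the wasted space mentioned above.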

 

Map Concrete Class to a Table

 

 

In this scenario, a table is created for each concrete class. The attributes of the base class are included for each table.

 

  • Advantages 

Good performance when accessing a single object’s data.

 

  • Disadvantages            

When a class changes, one is required to change not only its corresponding table but also the corresponding tables of its child classes.

 

Can result in repeated data, since the attributes of the base class are duplicated in each table.
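
 

The following sketch shows what such a schema might look like, again using hypothetical `sales_department` and `research_department` tables for two assumed concrete subclasses of Department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per concrete class; the base-class columns (id, name)
    -- are repeated in each table.
    CREATE TABLE sales_department (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT NOT NULL
    );
    CREATE TABLE research_department (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, budget REAL NOT NULL
    );
    INSERT INTO sales_department VALUES (1, 'EMEA Sales', 'EMEA');
    INSERT INTO research_department VALUES (2, 'Materials Lab', 250000.0);
    """
)

# Loading one concrete object touches exactly one table and needs no join.
sales = conn.execute(
    "SELECT id, name, region FROM sales_department WHERE id = ?", (1,)
).fetchone()

# Listing every department, whatever its type, needs a UNION over all tables.
all_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sales_department "
        "UNION ALL SELECT name FROM research_department ORDER BY name"
    )
]
```

The second query hints at a further cost of this approach: a query across the whole hierarchy has to visit every concrete table.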

 

Map each Class to a Table

 

 

Every class has an associated table in the database. There is a direct mapping between classes and tables.

 

  • Advantages

Easy to add or modify subclasses

Easy to modify base class

The one-to-one mapping makes this approach easy to understand

 

  • Disadvantages

Data access, in terms of reads and writes, can be slow because more tables are involved. More table joins may be required in order to perform a data-related operation.
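
 

As a minimal sketch, assuming a base `department` table and a hypothetical `sales_department` table for one subclass, the mapping and the join it requires might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The base class and each subclass get their own table. The subclass
    -- table shares the primary key of the base table.
    CREATE TABLE department (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL
    );
    CREATE TABLE sales_department (
        id INTEGER PRIMARY KEY REFERENCES department(id),
        region TEXT NOT NULL
    );
    INSERT INTO department VALUES (1, 'EMEA Sales');
    INSERT INTO sales_department VALUES (1, 'EMEA');
    """
)

# Reading one SalesDepartment object requires joining both tables.
row = conn.execute(
    "SELECT d.id, d.name, s.region "
    "FROM department d JOIN sales_department s ON s.id = d.id "
    "WHERE d.id = ?",
    (1,),
).fetchone()
```

Writing a single object likewise involves two INSERT statements, one per table, which is the cost noted in the disadvantage above.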

 

 

Mapping Relationships

 

There are three types of object relationships that may need to be mapped into the relational model.

 

  • Association

Association represents a structural relationship that describes a set of connections between objects. It is a conceptual connection between classes.

 

Both classes are conceptually at the same level (importance)

 

Each class performs a role in the association. However, associations may be unidirectional (one-way, where only one class plays a role) or bidirectional (where both classes play a role in the relationship).

 

  • Aggregation

Aggregation is a special type of association that represents a relationship between a whole and its parts.

 

The whole is the larger class (aggregate class) and the parts are the component classes.

 

Aggregation represents a “has-a” relationship. I.e. An object of the aggregate class has objects of the component classes.

 

There must be at least one component class, but there may be any number of component classes per aggregate class (whole).

 

Therefore, when a class is made up of component classes, the relationship that exists between the class and component class is an aggregation. We can say that a class is an aggregation of its component classes. A component class is a class that is part of a larger class. Aggregation is a part-of or part-whole association meaning that a component class is part-of the aggregate class.

 

  • Composition

Composition is aggregation that involves a strong relationship between the aggregate object and its component objects. This means that if you remove the composite object, all of its component objects will also be removed. This is not the case for an aggregate. The important thing to remember is that, for composition, a component exists as a component only within the composite object.

 

When one thinks about the aforementioned relationships, one must go a step further. The aforementioned object relationships can be thought of in terms of two classifications namely multiplicity and directionality.

 

  • Multiplicity

    • One-to-one. For example, an Employee may only have one Identity. An Identity may only belong to one Employee.
    • One-to-many. For example, an Order may have many OrderItems. An OrderItem may only belong to one Order.
    • Many-to-many. For example, an Employee may have one or many Skills. A Skill may belong to many Employees

  • Directionality
    • Uni-Directional.

A unidirectional association implies that only one class performs a role in the association. For example, only an Employee needs to know of its Identity. An identity does not need to know of the Employee it is assigned to.

 

    • Bi-Directional

A bidirectional association implies that both classes perform a role in the relationship. For example, an Employee knows what Department it works for. A Department also knows what Employees work for it.

 

An association, regardless of whether it is an aggregation or a composition, is mapped to a relational relationship in terms of its multiplicity and directionality. Only when one implements referential integrity does the true nature of the association matter. Composition, for example, describes what will happen when we add or remove a part of an aggregate: if you delete one object, will it lead to the deletion of another? This is known as cascading. For every action there is a reaction, and composition describes this reaction.
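
 

The examples used above can be sketched in relational terms as follows. The table names are hypothetical, and the sketch assumes that OrderItem is treated as a composition part of Order, so that deleting an Order cascades to its OrderItems. The many-to-many relationship between Employee and Skill is mapped through a join table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    -- One-to-many with cascading delete (composition).
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_item (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE
    );

    -- Many-to-many through a join table.
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skill (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee_skill (
        employee_id INTEGER NOT NULL REFERENCES employee(id),
        skill_id INTEGER NOT NULL REFERENCES skill(id),
        PRIMARY KEY (employee_id, skill_id)
    );

    INSERT INTO orders VALUES (1);
    INSERT INTO order_item VALUES (10, 1), (11, 1);
    INSERT INTO employee VALUES (1, 'Ann');
    INSERT INTO skill VALUES (1, 'SQL'), (2, 'Python');
    INSERT INTO employee_skill VALUES (1, 1), (1, 2);
    """
)

# Deleting the whole (Order) removes its parts (OrderItems) as well.
conn.execute("DELETE FROM orders WHERE id = ?", (1,))
conn.commit()
remaining_items = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]

# Navigating Employee -> Skills means joining through employee_skill.
ann_skills = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM skill s "
        "JOIN employee_skill es ON es.skill_id = s.id "
        "WHERE es.employee_id = ? ORDER BY s.name",
        (1,),
    )
]
```

Directionality does not appear in the schema at all: the same foreign key supports navigation in either direction, and it is the object model that decides which side holds a reference.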

 

Legacy Challenges

 

Software developers and architects often presume to have control over the database schema. This is a mistake as there are many database administrators that implement strict operational rules in terms of what actions are allowed and disallowed on the database.

 

Some database administrators may insist that all database interaction occurs via stored procedures. One of the advantages of an ORM is that it generates dynamic SQL to perform most or all data related actions on the database. Therefore, if one is required to use stored procedures to access the database, it raises a question as to the viability of an ORM in the aforementioned instance.

 

One is often required to work with existing database schemas; therefore, an ORM implementation must be able to work with (or map to) existing database schemas in an effective manner.

 

Usability Challenges

 

I list the following problems for completeness’ sake; however, they apply to most if not all software tools.

 

Ease of use – The user experience is paramount to achieving long term success and use.

 

Know one’s users – There is a constant trade-off between one’s tool being intrusive or non-intrusive. An intrusive tool may automate many things upfront but may prove to be a pain when it comes to maintenance. A non-intrusive tool may provide more control over functionality but may mean more work for the developer.

 

Addressing different levels of usability – A tool should consider how a user would interact with it at a novice, intermediate and advanced level. A novice may want many things to be automated just to get the job done. As more complicated needs arise, a user may need more advanced ways of completing a task. Essentially, at an advanced level one would expect more control.

 

API (Application Programming Interface) - An API should be simple to use from a developer’s perspective. A developer may be required to perform more specialised tasks. Therefore, an API should be extensible to allow a developer to add specialised behaviour to a system.

 

Lessons from history

 

Complexity and Usability

 

It would seem that the most abused term in software development at the moment is “Do the simplest thing that works”. The term is often quoted out of context, and so its true meaning is misunderstood. Some interpret it as “Let’s not think about what we do. Let’s hack away as this is the simplest thing that works”. I agree that hacking away is simple to do, but this is not what was intended by the original meaning of the term. Entropy is often confused with complexity. Entropy is a concept from thermodynamics, but it is appropriate in terms of software too. It essentially refers to the uncertainty of an outcome. Entropy tends towards a maximum, and so does the level of uncertainty of a system. I have found that entropy is often a result of a misapplied “Doing the simplest thing that works”. Most software systems or tools will grow in complexity. Complexity can and should be managed. Alternatively, software systems can grow in terms of entropy. Entropy should not be confused with complexity.

 

“A complex system that works is invariably found to have evolved from a simple system that worked…A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.” — John Gall in Systemantics: How Systems Really Work and How They Fail

 

I think the aforementioned statement makes the point that I am trying to convey. Simplicity is good, but plan for complexity. An ORM tool that is very simple to use initially may not be so simple to use when more complex scenarios arise. Tools can be used to hide complexity. Tools should not be used to hide entropy.

 

Also the true meaning of simplicity must be understood. Writing less code does not mean it’s simpler. It simply means doing less work. Once again, doing less work initially (perceived initial simplicity) may result in more work (possibly more complicated too) at a later stage.

 

Application life span versus data life span

 

  • It is often found that data outlives the application that uses it.
  • A common misconception is that the application is more important than the data. However, applications may come and go but the data typically remains.
  • One cannot assume that the application controls the database schema.

Allow relational databases to do what they do best. I have personally seen a system that tried to force a relational database to behave in an object-oriented manner. Needless to say, the performance was poor. Furthermore, databases are really good at performing set-based queries. Therefore, one may need to operate outside the boundaries of an ORM to perform specific work such as reporting queries or batch updates. It is important for an ORM to allow such an exit mechanism.
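As a minimal sketch of what such an exit mechanism might look like, assume Python and its built-in sqlite3 module. The Order class, the OrderMapper class and its raw method are hypothetical names invented for illustration and are not taken from any particular ORM product.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    customer: str
    total: float


class OrderMapper:
    """Hypothetical, minimal mapper: rows in, objects out."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get(self, order_id: int) -> Order:
        # Object-at-a-time access, the usual ORM path.
        row = self.conn.execute(
            "SELECT id, customer, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return Order(*row)

    def raw(self, sql: str, params: tuple = ()) -> list:
        # Exit mechanism: hand a set-based query straight to the database.
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
    [(1, "acme", 10.0), (2, "acme", 15.0), (3, "globex", 7.5)],
)

mapper = OrderMapper(conn)
order = mapper.get(2)

# Reporting query: let the database aggregate the whole set in one statement.
report = mapper.raw(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
)
```

The mapper still serves the everyday object access, while the aggregate is left to the database, which is the division of labour argued for above.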

 

In-house frameworks

 

Many of the following points are relevant to most in-house frameworks; however, I discuss them within the context of an ORM.

 

·         Complex to write and maintain

 

·         Excessive time spent on data persistence instead of application logic

 

If the scope of one’s development task is to develop enterprise applications, why is one developing custom ORM frameworks? There are many open source communities and companies that have created ORM frameworks. Furthermore, ORM frameworks are their area of expertise. It would be wise to utilise the existing frameworks.

 

·         Intellectual property

 

Documentation is a way of maintaining intellectual property. In my experience, documentation is probably the bane of every developer’s existence. Therefore, the following questions arise:

 

Where will the intellectual property of an in-house framework reside?

·         Will it be documentation?

o        Who will write and maintain documentation?

o        How recent will the documentation be at any given time?

o        How accurate will the documentation be?

o        What will the quality of documentation be?

·         Will it be in the minds of the developers who created it?

·         Will it only be in the code once the developers who created it leave?

 

·         Quality

 

Quality can be understood in terms of the following points:

 

  • The International Organization for Standardization (ISO) defines quality as “the totality of characteristics of an entity that bear on its ability to satisfy stated or implied needs.”
  • Conformance to requirements
  • Fitness for use

Therefore, what kind of measures will be put into place to ensure quality?

 

·         Typically, in-house frameworks are poorly implemented and documented. This is mostly due to the fact that developing in-house frameworks may not be the focus of the development task at hand. As a result, quality will be questionable.

 

·         More code results in a higher probability of software defects. This can obviously be mitigated and managed; however, the questions that arise are as follows:

o        How much additional work is involved?

o        Will the additional work and management that is required be followed through?

 

·         Testing

 

Testing is a way of assuring software quality. It is also additional overhead, as test cases must be written and maintained. Therefore, when one considers the work involved in writing an in-house framework, one must consider testing.

 

·         Support

 

Once the framework is in place, how will support be implemented?

 

  • If it’s an in-house framework, there won’t be an active community using it. The consequence of this is as follows:
    • No forum
    • Limited or no feedback
  • Will support be in the form of documentation?
  • Will support be in the form of formal training?
  • Will there be any additional reference material such as books?
  • New developers must learn the framework. A new developer cannot leverage existing knowledge of open source or commercial ORM frameworks. If the documentation is poor, a new developer cannot find a book to aid that developer’s understanding.

Having written my own in-house ORM implementation (DEZO – Dougs Easy ORM), I have the opinion that developing an in-house framework should be avoided if similar frameworks already exist. However, this may not be true for all scenarios, for the following reasons:

 

  • There may be good impetus from ‘higher powers’. Therefore, perhaps budget has been allocated to allow one to truly develop an in-house framework appropriately and properly. As the old mantra goes “Build the proper product and then build it properly.”
  • It may be a strategic decision to help an organization gain a competitive advantage in which case the aforementioned point would need to hold true.
  • Perhaps there is really something specific that is required that no other framework can provide.

Provided that the aforementioned points are in place, one would still require technical leadership that is aware of the dangers of in-house frameworks. The only thing worse than re-inventing the wheel is repeating all the mistakes that go along with it.

 

Viability of ORM as a Persistence Strategy

 

What makes an ORM viable in terms of satisfying one’s persistence requirements?

 

  • Architecturally

Persistence is a significant part of most enterprise application software architectures. Therefore, what elements would one require in terms of satisfying one’s persistence requirements? Without specifically referring to an ORM implementation, the following non-functional requirements may be relevant to most data persistence strategies or architectures.

 

·         Flexibility

o        Extensibility

o        Maintainability

·         Usability

·         Portability

·         Testability

·         Performance

 

One must understand the effect of using an ORM within the context of one’s overall architecture.

 

  • Time

Where or at what point can one save time?

 

Saving time and money is probably the most punted and overused sales pitch in terms of promoting software products. As with most things concerning software development, there is almost always a trade-off. It is important to understand that saving time up-front may result in more time spent in maintenance. Alternatively, time spent up-front may save time in maintenance.

 

  • Cost

Some ORM implementations are freely available at no cost. Others are commercially available at a price. Therefore, there is a trade-off between development costs and commercial costs. The cheaper option up-front is not necessarily cheaper in the long term.

 

What will the ROI (Return on Investment) be?

 

  • Support

Is it well understood and supported? I.e. how mature is the ORM implementation?

·         Is the ORM widely used?

·         Does a community exist that could aid one’s understanding of the ORM?

·         What kind of support exists?

o        Forum

o        Email

o        Documentation

o        Formal training

 

  • The ‘People’ issue

People aren’t an issue; however, people may have issues when it comes to using an ORM, and that may become an issue. After much debate and discussion with colleagues as to the viability of an ORM, I have reached the conclusion that consideration of the people who must use the ORM is of utmost importance. This applies to any data persistence strategy that one might choose to employ. It does not matter how good an ORM may be. If the people (developers) who must use it are using it only because they have been told to, the ORM implementation will fail. One may argue from a dogmatic perspective that it is not for the developer to decide what is appropriate and what is not appropriate in terms of persistence. The fact remains that if developers don’t like it, then they aren’t going to use it as one should. Have you ever tried to herd a group of cats? Have you ever tried to teach a cat to roll over, sit or play dead? It would be folly to even try. Developers are like cats in this regard. One can stand up on one’s soap box and shout till blue in the face that ORM must be used. It won’t matter. If one does not learn to consider a strategy in terms of the people who must use it, one will fail.

 

Therefore, to summarise, when deciding on the viability of ORM as a persistence strategy one must consider people (the developers that are intended to use the ORM) first. Only then can one decide on whether an ORM is architecturally suitable, time efficient, cost effective, and well supported. To substantiate my opinion, I have been involved in several technical or architectural evaluations. In fact, unfortunately, they were more post-mortems than evaluations. By that I mean that the architecture was for all intents and purposes ‘dead’. The reason was that a decision had been made to use an ORM implementation without considering the skill and acceptance of the developers that would ultimately use it. The developers wanted to use vanilla data access platforms for whatever programming language was chosen. Why? Firstly, they considered ‘plumbing’ code to be easier because it was what they felt more comfortable with. Secondly, they had already built up a strong knowledge base from which to write the aforementioned code. They were not interested in using something that felt unnatural to them. Therefore, time was not invested in acquiring the skills required for effectively implementing an ORM. This ultimately led to the demise of the entire application, as one could no longer maintain or extend it, and even where one could, the time-to-market was incredibly slow. I should also note that this was mostly applicable to Microsoft development environments. ORM implementations seem to have been better received by the Java community. For the bigots out there, this has little to do with development skill and mostly to do with the different technologies available within the two environments and with personal choice.

 

ORM is architecturally significant. Therefore, deciding on ORM as a data persistence strategy is an architectural decision. However, I have the opinion that it is the responsibility of the architect to involve the development team in the ultimate decision. A developer’s focus is on implementation detail. An architect’s (technical lead’s) focus is at a more abstract level, enabling them to think of the bigger picture. I think that only by combining the two viewpoints can one make a good decision.

 

 

After all that has been noted in this article, the question arises “Why ORM as a data persistence strategy and why not?”

 

Why ORM as a data persistence strategy?

 

  • Maturity 

ORM implementations have matured considerably and are continuing to do so. ORM implementations may reach a level of maturity that completely invalidates any arguments against them. One merely needs to consider existing technologies that are widely in use. The cellular phone, once considered impractical because of its size, is now a cornerstone technology in the mobile space. An argument once existed as to the practicality of using a cellular phone to browse the internet. Well, browsing the internet with one’s mobile is now something that some people can hardly live without. The point is, technologies will change (improve) and so will the generations that use them.

 

Provided all the thought that is required to implement an ORM successfully has occurred, the following points hold true:

 

  • Boost in productivity
    • Less data code to be written
    • Less code to test
    • Less code to maintain

Repetitive code implies a different problem to that of the impedance mismatch issue. A possible solution to ‘plumbing’ and repetitive code is to use code generation. There may be ORM tools available that are capable of performing code generation. This does not mean that ORM tools are code generators nor does it mean that we should use ORM tools because they give us code generation. There are tools available that specifically perform code generation. If code generation is your issue then perhaps some research into good code generation tools is more applicable. Something that ORM does do is build queries dynamically.
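To illustrate what building queries dynamically can mean, here is a small sketch in Python. It is an assumption-laden toy, not the mechanism of any real ORM: the Customer class and the build_select function are hypothetical, and the SQL is derived from the entity’s declared fields plus whatever criteria the caller supplies.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    id: int
    name: str
    city: str


def build_select(entity, table, **criteria):
    """Derive a parameterised SELECT from the entity's fields and the given criteria."""
    valid = [f.name for f in fields(entity)]
    unknown = set(criteria) - set(valid)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    sql = f"SELECT {', '.join(valid)} FROM {table}"
    if criteria:
        # One placeholder per criterion; the values travel separately as parameters.
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in criteria)
    return sql, tuple(criteria.values())


sql, params = build_select(Customer, "customers", city="Durban", name="Doug")
print(sql)
print(params)
```

The caller never writes SQL by hand, yet a different statement is produced for each combination of criteria, which is the kind of ‘plumbing’ an ORM takes over.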

 

  • Improved ease of maintenance

 

  • Provides database independence.

 

  • If domain model thinking is required, an ORM can foster OO domain thinking

 

  • If one’s application exhibits a typical read-modify-write lifecycle, then ORM may be an appropriate fit

 

  • An ORM may be appropriate if there is little or no need for stored procedures.

 

Furthermore, to succeed with ORM, one must consider the following:

 

  • Validate the ORM choice by implementing a vertical slice, i.e. implement the ORM as it would be required to behave within the context of all tiers of the architecture.
  • Agree on acceptable performance metrics; then test against the agreed upon metrics
  • Measure the effect of an ORM across a network and database
  • Ensure that there is adequate potential for growth and optimization
  • Know one’s chosen ORM well. It is important to understand what the code is doing and not doing. It is important to understand which patterns have been employed in order to better understand how the ORM will behave.
  • Know the database. An ORM is not an excuse to plead ignorance of database design and access.

 

Now that we have considered the advantages of using an ORM, we must now consider the disadvantages of ORM and its associated pitfalls.

 

Why not use ORM as a persistence strategy?

 

  • Data Access Policies

It may be policy that all data access may only occur through the use of stored procedures. ORM implementations are not well suited to such a scenario.

 

Some architectures model stored procedures as services. If one’s view of a stored procedure is that of a service, then ORM will also be unsuitable.

 

It may be required to enforce security at the stored procedure level.

 

  • Some loss of control over data persistence

 

  • It may be more difficult to tune database queries. One would need to run a profiler on the database to see what queries are being executed and in what form they are being executed. The limited visibility of queries running on a database may also be an issue.

 

  • Typically, the performance of a tool affects the performance of an application. If the tool misbehaves in terms of performance, for example, one would require a mechanism to control persistence at a lower level. It may be an issue if one’s tool disallows such a mechanism.

 

  • An ORM can be more difficult to maintain. It is paramount to match an appropriate ORM to the architecture scenario or scenarios at hand.

 

  • Some ORM implementations require the learning of a new object query language pertaining specifically to the ORM in question. This may not be ideal as one would need to learn a new query language each time a different ORM is used.
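The visibility concern raised above can be made concrete with a short sketch. It assumes Python’s built-in sqlite3 module, whose connections accept a trace callback through set_trace_callback; the find_names function is a hypothetical stand-in for an ORM call whose SQL the caller never sees. Other databases and ORM products expose their own logging or profiling facilities, which are not shown here.

```python
import sqlite3

executed = []  # every statement the database actually receives

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(executed.append)

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Doug')")


def find_names(connection):
    """Stand-in for an ORM call that hides its SQL from the caller."""
    return [row[0] for row in connection.execute("SELECT name FROM customers ORDER BY name")]


names = find_names(conn)

# Reviewing the trace shows which queries ran and in what form.
for statement in executed:
    print(statement)
```

Whatever the tool, the idea is the same: capture the statements at the database boundary so that generated queries can be inspected and tuned.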

 

  • Batch processing

 

Databases are good at performing batch processing as they have the ability to perform set-based queries. ORM implementations, which typically work on one object at a time, are not good at performing batch operations.
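A rough sketch of the difference, assuming Python with the built-in sqlite3 module and a hypothetical products table. The first loop mimics the object-at-a-time pattern that an ORM encourages; the second does the same kind of work as a single set-based statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO products (price) VALUES (?)", [(10.0,), (20.0,), (40.0,)])

# Object-at-a-time style: load every row, change it in memory, write each one back.
statements = 1  # the SELECT that loads the rows
for product_id, price in conn.execute("SELECT id, price FROM products").fetchall():
    conn.execute("UPDATE products SET price = ? WHERE id = ?", (price * 2, product_id))
    statements += 1

# Set-based style: one statement, and the database does the iteration.
cursor = conn.execute("UPDATE products SET price = price / 2")
rows_touched = cursor.rowcount

prices = [row[0] for row in conn.execute("SELECT price FROM products ORDER BY id")]
print(statements, rows_touched, prices)
```

With three rows the first approach issues four statements and the second issues one; the gap grows with the size of the table, and across a network each extra statement is an extra round trip.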

 

  • Reporting

 

It is not the intention of an ORM to perform reporting functionality. If reporting is required, explore the different reporting solutions that are available.

 

  • Stored Procedures

 

If there is a significant use of stored procedures, it would be better to explore different persistence strategies to ORM.

 

  • The following points address the question of how one would typically fail using an ORM:

 

·         Insist on doing everything with ORM

·         Refuse to use stored procedures or triggers

·         Using ORM as an excuse to plead ignorance of SQL and database design/implementation

·         Hacking ORM into an existing data persistence strategy.

o        Less of a problem if one’s architecture allows for the use of multiple persistence strategies, and one’s chosen ORM exhibits an architecture that is compatible with one’s overall architecture

 

  • Architecture by Product

It is very wrong to design one’s architecture around a product. It is not the product that determines one’s architecture. One should design one’s architecture based on satisfying functional and non-functional requirements. After the architecture is in place, choose a product that best fits the architecture.

 

  • Developers

 

    • Consider the skill of developers on the project
    • Consider the personalities of the developers on the project
    • Consider what you know the developers on the project will feel most comfortable with
    • Consider whether ORM would be conducive to developers that must maintain the system 

As mentioned before, if one does not have buy-in from the development team as a whole, it may be found that chances of failure are high.

 

Choosing an ORM

 

So, a decision has been made that an ORM will indeed be used as a persistence strategy. One should consider everything that has been mentioned in this article when deciding on the ORM tool. Additionally, consider the following points:

 

  • Commercial or Open Source

If Open Source

    • Is it active?
    • Maturity (How long has it been in existence?)
    • To what degree is it being used?
    • What kind of support exists if at all? 

If Commercial

    • How large is the vendor?
    • How long has the vendor been in existence?
    • Is ORM the vendor’s core business?
    • What kind of support exists if at all?
    • Does one have access to the source code or is it merely the binaries?
    • Can one download an evaluation copy? What are the limitations of the demo?

  • Is documentation provided? In what form is documentation provided?
  • Is formal training available?
  • To what degree is the ORM plug-and-play?
  • Or is the ORM plug-and-pray?
  • What are the consequences of integrating an ORM with one’s architecture?
  • What is the learning curve?
  • One’s architecture will exhibit a degree of style. ORMs also exhibit a degree of style. Some may be intrusive but offer advanced features. Others may be non-intrusive but offer more flexibility.
  • Therefore, does the ORM fit with one’s architectural style?

 

Conclusion

 

It is inconsequential whether one believes the object-relational impedance mismatch to be an issue or not. The fact remains that there seems to be enough impetus to help ORM grow further in terms of its usability and functionality. Therefore, one cannot ignore the value of ORM as a Data Persistence Strategy.

 

My only advice to people that think ORM implementations are useless is to state that opinion in a forum and wait for the overwhelming response. One will find quite the opposite to hold true. There are many organizations that are enjoying the benefits of ORM. If you have been involved with projects where ORM has failed miserably, consider whether ORM was appropriate or not. A poor decision is not the fault of a technology.

 

ORM is a tool for one’s toolbox. One would rather have it and not need to use it than need it and not have it. Just like any other tool, one should apply the same vigilance in determining whether the tool is truly suitable for the job or not.

 

In my opinion, ORM has reached a point of maturity where it may be considered to be viable as a persistence strategy in certain contexts. However, it is not viable in all contexts. I have explained the different contexts in terms of a Table approach, Entity approach and Domain approach. One should consider the aforementioned approaches when formulating an opinion.

 


Biography - Douglas Minnaar
After pursuing an Electronic Engineering diploma for several years, Douglas Minnaar worked as an Electronic Technician where he was involved with a number of engineering projects that mostly involved working with digital electronic systems. It was during this time that Douglas discovered his passion for software development. Douglas then went back to pursue a degree in Computer Science majoring in distributed computing systems. He has been practicing as a software developer predominantly in the Microsoft .NET space ever since.

Article Discussion: Object-Relational Mapping as a Persistence Strategy
Douglas Minnaar posted at Monday, June 11, 2007 6:15 AM