Wednesday, December 31, 2008

The Exception-Tolerant Organization - Part 5

As explained in my introductory post on the Exception-Tolerant Organization, ETOs do not tolerate waste, stifling of innovation, or inflexibility. These are the corrosive agents that prevent an organization from smoothly handling exceptions, maintaining a competitive edge, and operating effectively.

When I was at IBM Research I knew of a research Fellow who would monitor the keystrokes made by his administrative assistants while they were typing, and would spend quite a bit of time with them working on utilizing the fewest keystrokes possible. This is intolerance of waste taken to the extreme, where a few burning trees are saved while sacrificing the forest. Although it is quite easy to disdain waste made at this low level, this is not the kind of intolerance we are discussing here.

While the canonical definition of waste is anything that does not add value, what exactly constitutes waste for the Exception-Tolerant Organization? As previous posts have mentioned, exceptions happen to organizations, to customers, and to people every single day. Those organizations that cannot handle the exceptions and improve from them will fight a constant perception of lower value in the eyes of their customers.

For those who understand the benefits of mapping a value stream, waste would be anything that slows the velocity of movement through the value stream. The inability to swiftly handle exceptions can bring this velocity down to near zero. Being Exception-Tolerant can keep the value stream moving at the pace of innovation.

Waste for the ETO, then, is the set of roadblocks to handling exceptions: inflexibility combined with the stifling of innovation.

Inflexibility is tough to exhibit while being Exception-Tolerant, but is even tougher to recognize in an Exception-Intolerant organization. After all, there are certainly industries and products where strict and rigid standards must be constantly and consistently adhered to (watch-making, food preparation, chip manufacturing, aerospace), but these should not be mistaken for inflexibility. Chip manufacturers, despite adhering to the use of 30+ year old computer code, have been able to find avenues of flexibility in many ways over the past few years - from lower-voltage conduits to hyper-threading to multi-core pipelines.

Stifling of innovation is perhaps the touchiest subject when it comes to organizations who may be looking to become Exception-Tolerant. The urge to suppress innovation is strong in risk-averse environments, and is often exercised in such forms as:

- the boss consistently saying "No"
- a departmental control group demanding that a potential innovative division look and operate the exact same way as the existing divisions
- new ideas whose implementations are unduly laden with process, direct-to-archive documentation, or extreme executive input

The stifling of innovation is often a by-product of culture, and the effects may not be felt in world markets until long after the key culture-makers have moved on. Strong cultures that stifle innovation are not often discovered by the outside until the effect in the marketplace becomes obvious. ETOs, even ones with strict standards, have a tough time saying "No". Inflexible organizations find it all too easy to say "No" and close doors on new ideas and customer needs by stifling innovation.

But with all the good work that we as people do, and all the work that we endeavor to do, how do we let ourselves get into a mode of inflexibility and resistance to innovation? The hard truth here is that we most often become inflexible and stifle innovation when we put self-interests above working together as a team. There is good reason that this struggle is often labeled as the war within.

But the other side of this hard truth is that ETOs are given credit by their customers for handling the exceptions and increasing the overall value proposition. And when credit is given in this fashion, ETOs give their people, who have worked together as a team, every opportunity to take the credit and reap the rewards - thus making team achievement in the best self-interest.

Friday, December 19, 2008

Valuating Technology System Delivery

A well-worn but still prevailing theory of system delivery states that a system can be delivered according to three main factors:

- the system is of high quality and functionality (a "good" system)
- the system runs speedily under all conditions and usage loads (a "fast" system)
- the system can be built and delivered inexpensively (a "cheap" system)

The conventional wisdom is that a system can be delivered possessing AT MOST two of the three above factors. In other words:

- a good and fast system cannot be delivered cheaply
- a good and cheap system will not run speedily
- a fast and cheap system will not be a good system

You may or may not hold this view yourself, but many still do. Of course, it is quite possible (and rather beneficial) to have all three factors represented positively in a system delivery. But you need to look at the entire lifecycle of a system to understand this, not just the planning and construction stages.

Many under-weigh, or ignore completely, the usage and maintenance portions of a system's lifecycle when evaluating TCO and ROI. If the system's features, functionality, and construction are underrepresented before the system is delivered, higher costs will be made apparent later in the lifecycle:

- the system's users will invent workarounds to their normal business processing, just to accommodate the system's lack of performance or mis-matched view of the world. This usually leads to weaker controls and gaps in business processing, which leads to anywhere from higher operating costs to lost revenue.

- the system performs poorly or contains only a few compelling features, but just barely enough to be useful. While there could be no justifiable loss in cost, the cost to keep the system alive plus the cost to improve and re-architect it could easily be more than double the initial funds promised. Building a more desirable and well-performing system might have cost more up front, but still could have been less than the funds needed later for repairs.

- the system performs so poorly, or contains such a lack of compelling features, that it will not be used at all. This represents a loss equal to the cost of construction of the system, plus the difference in current operating costs that may have been saved, plus the cost of lost opportunity to improve and be competitive.
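
To make the third case concrete, the loss it describes can be written as a simple sum. The sketch below is only an illustration: the function name, parameter names, and all dollar figures are hypothetical and do not come from any real project.

```python
def unused_system_loss(construction_cost, operating_savings_forgone, lost_opportunity_cost):
    """Loss when a delivered system is never used: the three terms of the third case."""
    return construction_cost + operating_savings_forgone + lost_opportunity_cost


# Hypothetical figures, for illustration only.
loss = unused_system_loss(
    construction_cost=500_000,
    operating_savings_forgone=120_000,
    lost_opportunity_cost=200_000,
)
print(loss)  # 820000
```

The point of writing it out is that the last two terms are usually the ones left off the estimate, yet they can rival the construction cost itself.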

So how do you deliver a system that is good, fast, and cheap? Stating the system's costs and its benefits over the ENTIRE system lifecycle, both with equal precision, will help you arrive at acceptable definitions for "good", "fast", and "cheap" for your organization. But without the equality in precision, someone is bound to be disappointed upon delivery of the system.

So which lifecycle phases and cost factors are you accounting for in your system delivery analysis? Do you feel that you have all the bases covered? And how is the precision measured?

Act Like An Owner

Does your organization ask you to act, and think, like an owner? Does your organization ask this of you when you come aboard?

What would an Owner do in your organization? Would an Owner carry a vision? Innovate? Seek improvement and new opportunities?

Would an Owner cut corners? Cut costs? Maintain the status quo?

Would only an Owner understand the key drivers to the business? Or could anyone else in the organization achieve such an understanding?

Does your organization educate and allow you to understand the key drivers to its business?

Does your organization allow you to cut corners and costs? To maintain the status quo?

Does your organization allow you to carry a vision? Innovate? Seek improvement and new opportunities?

Your organization may ask you to act like an owner - but does it allow it?

The Exception-Tolerant Organization - Part 4

As explained in my introductory post on the Exception-Tolerant Organization, ETOs practice daily Risk Management. This is another concept that sounds quite simple, but organizations find it difficult to sustain due to these key factors:

- lack of inertia and momentum
- lack of systematized and/or automated assistance
- key-man risk and blame

Lack of inertia and momentum comes from not instilling review activities into the organization's daily operations. Just asking these fundamental questions on a daily basis goes a long way towards an effective review process:

- What did I accomplish yesterday?
- What am I working on today?
- What obstacles or problems am I facing?
- How confident am I that the current goals will be accomplished on-time?

For many organizations that rely on reviewing large amounts of performance or other time-sensitive data, having a systematized and automated collation and presentation of this data is key to establishing a daily momentum. The systems and processes supporting them should have the appropriate fail-safes and redundancies to handle exceptional processing cases so that timely delivery (and thus Risk Management momentum) can be maintained.

Automated assistance is key for the human side as well. In reviewing an organization's performance or exception conditions, leaning on automation can clear our heads of the mundane and manual steps, and allow us to focus on the exception conditions. Plus, what Director of Risk Management wants to arrive at work at 7 am just to push a button, or wait for reports 1 through 10 to finish running and printing before tackling the issues of the day?

Being a Director of Risk Management is a tough proposition to satisfy for an organization. Not only must you have volumes of data and information at your disposal, you must have the prescience to understand what is going to happen seconds from now in both your own building and halfway around the world. A Risk Management Director could be hailed as the heroic steward of the ship that avoids the icebergs of a competitive landscape, or could easily be the goat when the ship is steered right when perhaps it should have turned left.

For an organization, it is easy to both funnel singular responsibility and cast blame on one person when things blow up. But why do this, when after the blame subsides, the organization is still in an unfavorable situation when a risk materializes? Risk Management is one of those cross-cutting activities that the entire organization can practice effectively on a daily basis via continuous review and improvement. Allow your Risk Management Directors to get the support of the entire organization, and the Director will support the viability and competitive health of the organization in return.

Saturday, November 15, 2008

The Exception-Tolerant Organization - Part 3

As explained in my introductory post on the Exception-Tolerant Organization, ETOs emphasize their strengths and compensate for weaknesses by working together and communicating openly. If the concept sounds simple, that's because it is simple. But with all of the advanced ways, means, and tools we have at our disposal to collaborate and communicate openly, we still find in today's world places and situations where these fundamentals are simply not done.

Working together involves several cultural practices within an organization:

- Total team involvement from the start
- Proactively seeking the improvement of the organization
- Staying open to new ideas and possibilities
- Shared goals without harmful personal agendas

The first point is easier done than said, but the other three involve a resistance factor to change within our organizations. ETOs have the cultural grounding to foster and promote all four points above. As discussed in Part 2, ETOs systematically implement the communication pathways and processes to support these points.

But as is often the case in life, the greater challenge in overcoming obstacles can come from within ourselves, especially on the last point above. We don't always realize that we are more in competition with ourselves than we are at odds with others, and many times we are better served improving ourselves and overcoming our own shortcomings. Instead, we often put up walls where requirements and miracle solutions are volleyed back and forth, neither ever really satisfying the concerns or goals of the parties on both sides of the wall. But when we take down the walls and improve ourselves, our best foot is then truly placed forward for the benefit of our organizations. Our strengths become the organization's strongest capabilities to be deployed across the greatest spectrum of benefit. And our weaknesses are compensated for by our continuous self-improvement, by identifying risks early, and especially by communicating openly.

Communicating openly, at its most basic level, involves face-to-face discussion, debate, and sometimes conflict - something that people can tend to avoid, in both their personal and professional lives. But for open communication to be effective, an environment needs to be created by the organization where intense debate and disagreement are tolerated, and conflicts can be satisfying to resolve. The organization should make it clear that it is okay to disagree and debate issues, but the result of each debate should still include a clear decision to move forward along a certain path of action.

As an exercise, have everyone in your organization begin to answer these questions out loud daily. Ideally, everyone related to a project or a core business of your organization should be in the same room (or on conference if necessary):

- What did I accomplish yesterday?
- What am I working on today?
- What obstacles or problems am I facing?
- How confident am I that the current goals will be accomplished on-time?

For the last question, use some kind of rating scale. Example: have each person rate their confidence from 1 to 5, with 5 being the most confident. Any answers below a 4 should be a cause for concern and those concerns should be addressed openly. The Agile practice called Scrum advocates the daily use of these questions.
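
The rating rule above is simple enough to automate if the answers are collected somewhere. The following is a minimal sketch, assuming the ratings are gathered as name-to-score pairs; the function name and the team members are hypothetical.

```python
def flag_concerns(ratings, threshold=4):
    """Return the names whose confidence rating falls below the threshold."""
    return [name for name, rating in ratings.items() if rating < threshold]


# Hypothetical ratings on the 1-to-5 scale, with 5 the most confident.
ratings = {"Ana": 5, "Ben": 3, "Chen": 4, "Dee": 2}
print(flag_concerns(ratings))  # ['Ben', 'Dee']
```

The list it produces is only a prompt for the open conversation; it does not replace it.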

Communicating openly also involves the appropriate use of communication tools. Communications involving urgency or time sensitivity should use direct methods: direct-line phone, internet, and video calls; direct text messages/pages; and of course face-to-face conversation. The use of email and instant messaging, while becoming increasingly mobile and location-independent, should not be relied on for urgent time-sensitive communication. These communication methods often have either multiple inboxes or streams/threads of communication occurring simultaneously, have lengthy queues of messages attached to them, or depend on having your communication device successfully "subscribe" to that message stream. You cannot guarantee that the number of inboxes/threads and the size of the inbox queues will be small enough for your urgent message to be received in time. Eliminate this frustration up front by identifying early a reachable line of communication to use for urgent matters.

Some very simple and fantastic exercises in working together and communicating openly (many taking 10 minutes or less to complete with a noisy room full of people) can be found here. While some of these are focused on Agile principles and practices, many of these deal with the core issues of communication and collaboration, and may help to expand your thinking. My thinking was certainly expanded after participating in some of the exercises. My thanks to Michael De La Maza for bringing these to my attention.

Monday, November 10, 2008

Challenging Our Assumptions

At a recent party I sat with a technology team whose members were dejected from having their latest project terminated by their company. This team spoke fondly about their months of development, their early adoption of cloud computing methods and technologies, and even their conferences and consultations with members of NASA regarding the use of advanced technologies. They shook their heads and couldn't understand why the system didn't succeed.

This team had hailed the project as the next great data transmission hub for the team's company. They specified communication protocols and APIs, worked with the application developers to foster understanding, and even reviewed a few sample data streams to understand the problem space. The project seemed to have everything going for it. But during construction of this hub, the team made a fateful design assumption regarding the separation of data streams within communication channels.

This assumption was that data streams could be separated by a data pattern that would "never be seen" in the actual live data itself. The team was warned by the application developers and certain managers that the possibility of this data pattern showing up in live data was in fact likely. Still, the team moved forward with completion of the system, performed some successful testing with a thin time-slice of data transmissions over a period of several days, and then went live with the system.

Within three hours of the system going live, this data pattern delimiter showed up in several places in the live data. As a result, the system was delivering incomplete data and crossing data transmission streams. Processes that were dependent on this new data transmission hub had to be reverted back to their old transmission methods. Beyond changing the data pattern delimiter to something else that might "never be seen" in the actual live data, the team did not have an alternative solution to this problem. And thus a short time later, the project was terminated.
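
The failure mode is easy to reproduce in a few lines. The sketch below is an illustration only: the delimiter, the payloads, and the function names are hypothetical, and the length-prefixed framing shown as a contrast is one common alternative, not something the team is known to have considered.

```python
import struct

DELIMITER = b"||"  # hypothetical pattern assumed to "never be seen" in live data


def split_by_delimiter(channel):
    """Separate streams by scanning the channel for the delimiter pattern."""
    return channel.split(DELIMITER)


def frame(payload):
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def unframe(channel):
    """Recover payloads by reading each length prefix, never inspecting content."""
    payloads, offset = [], 0
    while offset < len(channel):
        (length,) = struct.unpack_from(">I", channel, offset)
        offset += 4
        payloads.append(channel[offset:offset + length])
        offset += length
    return payloads


streams = [b"price=10", b"note=a||b", b"qty=3"]

# Delimiter framing: the second payload contains the pattern, so it is cut in two.
print(len(split_by_delimiter(DELIMITER.join(streams))))  # 4

# Length-prefixed framing: payload content cannot collide with the framing.
print(unframe(b"".join(frame(s) for s in streams)) == streams)  # True
```

A test like the first one, run against a realistic sample of live data, is the kind of early challenge that would have exposed the assumption before go-live.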

Unfortunately for this team, a fundamental design assumption was made early on that directly contradicted the problem space that their solution would address. The lack of challenge, investigation, and thorough testing of this assumption sealed the project's fate. While we are often proud of our assumptions, we are made better when our assumptions are challenged and tested at the earliest possible opportunity.

Thursday, November 6, 2008

Project and Technology Development Methodologies

Rather than be the trillionth blogger writing on the subject of Agile, refactoring, Test-Driven-Development, domain-driven-design, and other practices and paradigms, I'll lay out some pragmatic principles to keep in mind when executing projects, no matter which development practices you adopt. In the rush to absorb the hottest trends, we should keep in mind the tried-and-true principles that still work:

Start with iterative and incremental. Project phases are well and good, but in order to complete a project successfully, the path to completion needs to be traversed. And yes, you must have an idea of your destination. But will you wait until all requirements are specified to the last detail before moving on to any architecture or design concerns? Will you have a design project phase that takes into account the current state but fails to connect with the changes in the world three months later when the design phase is completed?

If you break down your project phases into iterations, with a guaranteed feedback session at the end of each iteration, you will be much more likely to keep up with changing requirements and conditions. You will also be able to identify failing efforts sooner, before they cause damage later.

Start as soon as you can. One of the common reasons for project slippage is that some critical component or project dependency was not available earlier in the project cycle. Was it because the need for this component was not foreseen? No, it was just not available at the appropriate time. One way to mitigate this is to start working on these components after just enough baseline requirements have been gathered. You may spend a little more up front to support this initial development, but it will cost you far more in the long run should you end up in an ill-timed slippage scenario. The ways it may cost you range from longer development cycles to mis-timed market entry to project personnel turnover. These are big-ticket costs.

Military leaders are trained to make decisions with 40% to 70% of the information necessary, and regularly provide feedback as more information is available. You can do the same on projects and be very effective.

Do your Risk Management. As stated so eloquently in the fantastic treatise on software project management Waltzing With Bears, "Risk Management is Project Management for adults." Listing and valuing project risks is something that can be done to 100% completion up front. But the great (and sometimes painfully realized) thing about Risk Management is that 100% completion is often not enough. Here is a great place to be iterative and incremental in providing feedback loops on the state of risks, mitigating circumstances, and changing requirements. If you build the iterations and feedback loops into your risk management on a project, the rest of your project practice will need to follow suit just to keep up.

You'll also be more inclined to tackle project deliverables that resolve the greatest risks first. The sooner these risks are resolved, the more accurately you can publish a date range for delivery.

Test early and continuously. Your project exists in some form from start to finish. It may start on paper, and live in hardware and software at the finish. But in whatever form it exists, it can be tested. As new requirements are gathered and formulated, test cases can be drafted and pitted against the project's assumptions. As new hardware is installed, its images and monitoring agents can be configured and proven. As new code is written, test cases can be written before or alongside the code.

Don't wait for a project to be 50% or 75% completed to start drafting up test cases and setting up a test facility. Automate much of your tests if you can, so that they can be run at least once a day. Automation is again a case of a little more effort up front, saving much more cost and headache later.

Thursday, October 30, 2008

The Exception-Tolerant Organization - Part 2

As explained in my introductory post on the Exception-Tolerant Organization, ETOs build the communication pathways, business processes, and technology features to handle exceptions systematically. Where the business processes and technology features are largely matters of construction, the essential communication pathways can often be the component most elusive to an organization.

To relate this to a recent business event: an organization was forming a new business venture, bringing mature and existing technologies and processes in house from one of their partners. In the suite of this venture's features, there was one particularly public-facing feature that was not even close to being on par with the rest. Very early on in the business venture, one principal surmised that this feature had the strong potential to sully the reputation of the organization, and told the other principals of his analysis and recommendations for a course of action. A few principals were sorely disappointed that this was brought to their attention so early in the business venture, as they felt that this "negative" analysis was not what was needed during the venture's "honeymoon" period.

About one year later, the other principals began to see the strongly negative customer reactions to this public-facing feature. When they approached the principal who originally gave his analysis, they said to him "You didn't tell me it was THAT bad." In this case, the communication pathways were cut off early in the venture, perhaps when they were needed the most. And when they were opened again a year later, it was done in a reactive fashion that does not lend itself towards a systematic way to handle exceptions.

So how can you create these communication pathways? You can begin an email thread, as the principal above did, but these threads either are ignored early, or become so long that the interest drops off quickly. You can appoint a single point of contact to receive and dispatch the exceptions, but that person ends up being a single choke point more often than not. Or you can set up a system purely to capture issues and exceptions for review, which relies on people willing to take the few necessary minutes to input their issues into the system. This system can provide an effective entry point for exceptions and issues, if your organization is culturally adaptable to putting an entry, approval, and management process in place. The organization above did go and implement such a system.

Whether this type of system is a cultural fit for your organization or not, a regularly-scheduled gathering of the principals solely for the purpose of reviewing exceptions would go a long way towards getting the effort off the ground. Your organization may need to lighten the "status meeting" load somewhat to make room for this type of gathering. But because of the regular schedule, you are that much closer to systematically handling your exceptions.

As stated above, the business processes and technology features are largely matters of construction. But where business processes are concerned, many organizations focus on constructing processes for project lifecycles, development lifecycles, change control, and administration. These can be helpful in measuring performance when they are not over-engineered, but they are all intended to handle the "normal" course of business. This still leaves out the what-just-happened, where-do-we-go, and what-do-we-do when an exception does occur.

As a starting point, take the lead from technology support teams. Great technology support teams have an initial point of contact, an exception entry and tracking system, and "run books" that contain procedures written in detailed step-wise fashion documenting EXACTLY what to do when a particular exception occurs. When an exception occurs and is identified, the support team executes the procedures step for step. If there is an exception that they cannot identify, they still have a procedure for this case that may involve contacting someone with more knowledge of the systems.

Handling exceptions in this way leaves little guesswork as to how to initially react. For many exceptions, recovery is a matter of reacting safely and sanely first, and then following the steps. So for your organization's exception-handling process, write down in step-wise fashion what people should do when there is an emergency exception, an impactful exception, and a minor exception. Every detail, including which people/departments should be contacted, the appropriate time frames to wait for responses, and any system or documentation entries should be noted. As a bonus, when you are out of the office and someone is covering for you, they can cover for you as effectively as possible during times of exception by following the steps.
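
If it helps to picture what such a written procedure looks like once it is captured in a system, here is a minimal sketch. Everything in it is a placeholder: the severity names follow the three levels above, but the steps, contacts, and time frames are hypothetical and would be replaced by your organization's own.

```python
# Hypothetical run book: severity -> ordered steps. All contacts and
# time frames below are placeholders, not recommendations.
RUN_BOOK = {
    "emergency": [
        "Call the on-duty contact on the direct phone line",
        "Wait up to 15 minutes for a response, then call the backup contact",
        "Record the exception in the tracking system",
    ],
    "impactful": [
        "Notify the owning department",
        "Wait up to 4 hours for a response, then escalate",
        "Record the exception in the tracking system",
    ],
    "minor": [
        "Record the exception in the tracking system",
    ],
    "unidentified": [
        "Contact someone with more knowledge of the systems",
        "Record the exception in the tracking system",
    ],
}


def steps_for(severity):
    """Return the steps for a severity, falling back to the unidentified procedure."""
    return RUN_BOOK.get(severity, RUN_BOOK["unidentified"])


for number, step in enumerate(steps_for("emergency"), start=1):
    print(f"{number}. {step}")
```

The fallback mirrors the support-team practice described earlier: even an exception nobody recognizes still has a procedure.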

For those of you who dislike process and procedure, keep in mind that the exception procedures exist to assist you, not to burden you. Your brain-power is most needed for problem-solving during the exception period - not remembering to make entries in systems A, B, and C; and worse, not remembering to notify people critical to your business. Another point of assistance is rendered when you can collect data from your exception-handling system and analyze the types and frequencies of the exceptions that occur in your organization. This may lead to some business insights your organization may otherwise not have reached.

Technology features for handling exceptions will be covered in a later post, but it is sufficient to say for now that your technology features that handle exceptions should interface and operate just like the technology features that run the normal business.

Tuesday, October 28, 2008

2008 Fairfield/Westchester .NET Code Camp

I will be presenting at the 2008 Fairfield/Westchester .NET Code Camp on Saturday November 8, 2008. The Code Camp is an all-day conference that allows professional developers and students the opportunity to hear from experts in the field on a variety of topics, from programming language tools to Web 2.0 development to the latest and greatest in Microsoft technologies. Several Microsoft MVPs, Evangelists, and some of my past colleagues will be presenting. The camp will be held at UCONN Stamford Campus in Stamford, CT, and you can get all the details here.

As part of the Web 2.0/Agile track, I will be leading a live interactive Test-Driven Development session, which will allow you to observe a test-driven approach to solving a real-world problem. I hope to see you there!

Friday, October 17, 2008

Does Our Technology Equate To Lies?

I recently had a conversation with a CTO with experience heading large and global teams. His recent work had concentrated on installing SOA-based systems into his organizations. He brought a viewpoint of his to my attention that related custom software development to a lie. The lie that you tell yourself, he said, was that a custom or one-off module or block of code solves your business problem the way you want it to. You're lying to yourself because the problem wasn't solved in a configurable, maintainable, and service-oriented way.

He went on to support this statement by relating a story about lining up the dates in a fiscal calendar of a system that did not lend itself easily to change. A custom solution working around the constraints of an existing system was needed to meet the demands of the business, and he was not quite satisfied that the solution had to be of a custom nature. In his view, he preferred the solution to be service-oriented and configurable.

While I understand the CTO's view towards a service-oriented and configurable architecture, I don't think that his lie is attached to the appropriate concepts. Service-oriented or not, you can effectively eliminate the lie he speaks of by managing the lifecycle of the customized solutions.

If your business is sufficiently pained by the problem where you need a customized solution implemented and deployed urgently, and the solution has been appropriately constructed and tested, then by all means deploy it. But if the custom solution is not the most optimal, configurable, or maintainable, then put an expiration date on it and schedule the time and resources to implement the optimal solution by the expiration date. This way you would provide urgently-needed relief of the pain your business is experiencing, while having an outlook toward the optimal future.

But it is crucial that the follow-up happens. The common occurrence, and the real fear here, is that once a solution is implemented and deployed, people move on to solving the next problem or working on the next great project without coming back to readdress that sub-optimal solution. This custom development often ends up being yet another buried non-catalogued nugget of logic.

In the CTO's anecdote above, the custom solution is necessary to work around the larger system's constraints. But what if this custom development was configurable, maintainable, and service-oriented from the start? It is still a separate custom component, one more component to be maintained in the catalog of assets. In maintaining a service-oriented architecture, having an up-to-date and discoverable catalog of components, the contracts they satisfy, and their development artifacts is key to performing effective maintenance. SOA implementations typically have more components than non-service solutions, not fewer. SOA implementations without these catalogs can quickly become burdensome and error-prone to maintain.
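
A catalog that also carries the expiration dates discussed earlier does not need to be elaborate. The following is a rough sketch of the idea, not a description of any particular SOA registry product; the class, field names, component names, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CatalogEntry:
    name: str
    contract: str                   # the contract the component satisfies
    expires: Optional[date] = None  # set only for sub-optimal stopgap solutions


def overdue(catalog, today):
    """Return entries whose expiration date has passed without replacement."""
    return [e for e in catalog if e.expires is not None and e.expires < today]


# Hypothetical catalog: one stopgap with an expiration date, one permanent service.
catalog = [
    CatalogEntry("fiscal-calendar-patch", "FiscalDates v1", expires=date(2009, 3, 31)),
    CatalogEntry("order-service", "Orders v2"),
]

for entry in overdue(catalog, today=date(2009, 6, 1)):
    print(entry.name)  # fiscal-calendar-patch
```

Reviewing the overdue list on a regular schedule is what keeps a stopgap from quietly becoming permanent.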

So I think that the real ways organizations lie about custom software development are:

- not assigning an expiration date to sub-optimal solutions, and not managing the transition process from sub-optimal to optimal

- not maintaining an effective catalog of solutions, contracts, and components

- expecting a silver bullet or all-inclusive solution to eliminate the need for custom development

- not actually solving the organization's problems because no custom (or market-leading) solution is considered to be the optimal solution for the business

Unlike the CTO's organization, I've seen organizations with very painful problems put off implementing solutions for years because the solution candidates don't fit into the "optimal" category, service-oriented or not. If the solutions are not considered to be the holy grail, then nothing will be implemented at all. And the business continues to limp along in pain without further technology assistance.

Wrapping up my conversation with the CTO, he was considering implementing one of the larger off-the-shelf tools that can effectively assist in service orientation. He left me with the impression that these solutions, their price tag, and their large deployment footprint offered him great comfort in removing the custom development lies from his organization. But alas, that is a lie for another day.

Sunday, October 12, 2008

Invited To An Idea

Once upon a time a company had an idea - an idea whose direction was generally contrary to the overall company's market. This company made their idea wildly successful, as the great sages of the company built systems and processes to effectively implement the company's idea. But the great sages became trapped by their systems, because they required so much of their hands-on manual effort and control to execute successfully.

Enter our hero. Early in our hero's career, a few great sages invited our hero to become part of this company's idea. They did this by inviting our hero into their office and explaining the reports they needed to systematize and automate their manual systems. The great sages also explained that having these reports in place would free them to continue implementing the company's idea.

The great sages took their time with our hero, explaining why the reports were important and badly needed, and how the reports fit into supporting the company's idea. They always took time to answer our hero's questions and concerns about the reports.

These reports solved certain problems with supporting the company's idea, and freed the great sages to pursue supporting the company's idea further. This led to building robust systems, and led to freedom for many to further implement the company's idea. Over time our hero became a great sage at the company, one who would be free to enlist others and further implement the company's idea.

But after the years of work successfully implementing the company's idea, our hero was still not free.

There were certain functions in the systems that required only our hero. Our hero had willingly executed these functions because our hero believed strongly in the company's idea. And when there were problems with the systems requiring corrective action, our hero was there to the rescue nearly every single time.

Over the years our hero grew to be responsible for executing more than double the number of functions originally intended. The company's great sages became used to the excellent service of our hero, and cheered our hero on all the way. Our hero had been invited to become part of the company's idea, but instead our hero had unwittingly been self-installed as a key cog in the systems implementing the idea. Under this arrangement, our hero could not be free.

When our hero finally realized the situation, our hero took corrective action. Our hero systematized and automated more, and faster, than ever before - freeing our hero, and many others, from having to execute the functions that only our hero used to perform.

But by this time, the company hit some turbulent times, lost sight of its idea, and lost one-third of its people. The company was no longer interested in inviting people to become part of its idea. But people were still needed to handle exceptions and support the company's idea using the systems the great sages and our hero had put in place. And without these people, our hero could not be free.

Our hero realized that at this point the only way to achieve true freedom to pursue the company's idea was by freeing himself from the company. And so our hero, exhausted but thoroughly grateful for the experience, moved on from the company.

Has your organization invited you to become part of its idea? Have you become stuck in the execution of your organization's systems like our hero was? Can you see a way to systematize and automate to free yourself, before moving on becomes the only path to freedom?

Monday, October 6, 2008

The Exception-Tolerant Organization – Part 1

As explained in my introductory post on the Exception-Tolerant Organization (ETO), ETOs can embrace change and uncertainty while systematically executing their business. There are two keys to making this practical:

- Being able to systematically execute the business

- Having an entry point in each business process for welcoming the change and uncertainty once an exception event occurs

First, ETOs are able to systematically execute their business. There is a system for each business process that the ETO’s people follow to execute, manage, and report on their business. This is not as complicated as it sounds, as there are systems everywhere in business: accounts payable, software development, computer machine and image preparation, accounting. And when the ETO’s people understand that the systems exist to support the major ideas and goals of the organization, no system is considered so mundane that it can be ignored, left unimproved, allowed to decay, or allowed to bloat in size. There are many books and references on the web related to understanding the importance of systems and business processes, so I will not expand further here.

If your organization is not systematically executing your business, but rather executing in an ad-hoc and undisciplined fashion, it can be difficult to embrace any changes or external events. Your organization is already dealing with so much noise and so many individual solutions to regular business issues that it will not be able to differentiate an exception event from a regular business event. Note that this can be an advantage when looking to create a system, in that your best ad-hoc process may work just as well for handling an exception event as it does for conducting your regular business. If you find your organization in this situation, use your best ad-hoc process as a starting point for implementing a system that can be executed repeatedly without fail.

Second, each one of an ETO’s systems and business processes has at least one entry point for addressing an exception or a change. Organizations need a way for someone to bring an event warranting change or representing uncertainty to the business’ attention. As an example, Toyota production workers are able to stop the line when they see a problem during the production process. Stopping the line is their entry point.

Entry points for processes that must be executed daily and on time can be more difficult to see with the naked eye. As an example, consider an investment portfolio that must be valued on a daily basis: one or two issues with the price of an investment can make the portfolio's reported value wildly inaccurate and grossly misleading to a fund manager's investors. The system for valuing the portfolio must have an entry point where exceptions with prices can be raised, diagnosed, and handled. Identifying and handling these exceptions is a systematic process itself, often assisted by robust technology. The entry point can be an automated review of the portfolio, followed by an exception reporting tool with a pricing exception report as a backup.
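As a rough illustration of what an automated review feeding a pricing exception report might look like, here is a small Python sketch. The function name, the 20% day-over-day threshold, and the sample prices are all assumptions made for the example; a real valuation system would have its own rules and data sources.

```python
def pricing_exceptions(prices_today, prices_previous, max_move=0.20):
    """Return (instrument, reason) pairs that a person should review.

    Assumes every previous price is positive. The threshold and the
    reason strings are illustrative, not taken from any real system.
    """
    exceptions = []
    for instrument, previous in sorted(prices_previous.items()):
        current = prices_today.get(instrument)
        if current is None:
            exceptions.append((instrument, "missing price"))
        elif current <= 0:
            exceptions.append((instrument, "non-positive price"))
        elif abs(current - previous) / previous > max_move:
            exceptions.append((instrument, "moved more than threshold"))
    return exceptions


# Hypothetical data: DEF looks like a misplaced decimal, XYZ has no price today.
previous = {"ABC": 50.0, "DEF": 12.0, "XYZ": 101.5}
today = {"ABC": 51.0, "DEF": 120.0}

for instrument, reason in pricing_exceptions(today, previous):
    print(instrument, "-", reason)
```

The point of the sketch is the entry point itself: questionable prices are surfaced as a list someone can act on before the portfolio value is reported, instead of being discovered afterward.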

An entry point for an organization experiencing a problem with internally-developed software is a help desk department, which has rules and procedures around when it is available to take calls, and its expected turnaround time when responding to issues and exceptions. In other words, it is a system. Other organizations may have a developer, system administrator, DBA, infrastructure engineer, or manager as the entry point for problems. All of these technologists have different schedules and structures to their day, and can serve as an effective entry point - when they are available. Business principals may even feel more comfortable going to them directly, as they feel that the technologists are closer to the solutions than the help desk professionals.

This can be more effective from time to time, but it is less systematic. Technologists are often steered away from their scheduled and time-sensitive work to handle exceptions, particularly after-hours and overnight. But here is a situation that calls out for leveraging a system so that effectiveness can be assured every single time the entry point is used. Being exception-tolerant allows us to handle these exceptions without negatively impacting regular business. This will be continued in Part 2.

Thursday, October 2, 2008

Leadership During Downtime

When I browsed to LinkedIn early this morning, this is what showed up on a web page entitled "Oops!":

Sorry, we can't display this page right now.

Something unexpected has gone wrong. Please wait a few seconds and try again by hitting the reload button.

We apologize for the inconvenience. An error report has been filed and our team is working on fixing the problem.

If you have any questions, please email us at customer_service@linkedin.com.

For many developers and business users of web applications, this is an all-too-familiar sight when a web site is experiencing problems. But does it need to be?

LinkedIn is no doubt a leader in on-line networking and community. But while there is a link to contact customer service via email, the web page is pretty much out of character with the rest of the web site: no ads, no links to its user community's services and sites, no information about LinkedIn itself - in other words, nothing useful. We might as well have received the standard error page from the browser.

In a period of downtime, LinkedIn is missing out on an opportunity to continue leading the way as a premier networking and community portal. Just a few links and paragraphs of text can make all the difference, so that during downtime LinkedIn would never be completely offline.

Wednesday, October 1, 2008

A Breakdown In System Testing

The latest release of the middle-office system has gone live, but somehow the pricing analysis screen, the most important screen in the entire system, will not accept prices entered for equity swap positions. It is at the end of the trading day, and the traders are getting heatedly upset.

Ask the developers, who are newcomers to developing for this system, if they tested the screen, and they say "Yes, but we didn't modify anything on that screen, so we didn't test it fully." Ask the business analyst if he tested the system, and he says "Yes, but I didn't test that screen because the developers said they didn't modify anything on that screen, so I just entered a few prices as a litmus test and that was it." Ask the development manager what went wrong, and he says "Didn't anyone bother to test the system? If I go into the system and try to enter this price, it's clear to me that it doesn't work! Are you all blind?"

Did the developers modify the screen directly? No. Is the business analyst lazy? No. Is everyone in the development manager's group blind? No. But is this breakdown of testing a common occurrence in software development? Yes. (A change to a pricing validator component shared by multiple components of the system, including the pricing analysis screen, was the cause of this situation.) And is everyone feeling sore about it? Absolutely.

So what is to be done? In this particular environment there is no systematic procedure for testing. The development group is weeks or months away from being fully up-and-running with automated development and testing practices and facilities, if they can carve out the time away from their regular responsibilities. Other than the business analyst, there is no budget for a dedicated QA/testing group. Yet something must be put in place quickly so that a system with shared yet inter-connected components can be reliably tested without negatively affecting the business users upon release.

This is the perfect time for this group to begin a testing practice built upon a foundation of acceptance testing. Acceptance testing proves the real-world conditions that every feature and function of the system must satisfy correctly and repeatedly.

Acceptance tests represent the common point of understanding and agreement between the business users and the technologists responsible for a system. If the system can satisfy all these tests consistently, every time the tests are run, then any changes to components can be readily verified as not having a detrimental effect on the system. And since acceptance tests are satisfying real-world conditions, the business users are given some measure of the system's reliability before the system is released. *

It does take a bit of work to get started and enumerate the acceptance test cases. It also takes some work to get both the business users and IT developers and analysts to buy into the process and realize the benefits. But ask the traders above and their staff whether they would rather be frustrated by a malfunctioning system. And building the foundation takes less time than you may think. You can start with one simple spreadsheet listing the test cases; the point is to start somewhere. The journey of 1000 miles begins with a single step.

Once you have this foundation, your developers can branch out into Test-Driven Development and other testing practices, and your business users can be more self-sufficient in setting up test cases. There are even open-source tools that can translate Excel files and "natural language" test cases into actionable code (FIT, for example). Imagine the situation where your business users can keep up with changing business conditions by submitting test cases in Excel on their own, without having to know XML or a cryptic language. To receive the most up-to-date feedback, automate the tests so that their execution is a convenience to be enjoyed by all, not a burdensome task to be carried out by a lone savior/scapegoat. You may even find your organization creating defect-free releases before too long.
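To show how small that first step can be, here is a minimal Python sketch of spreadsheet-style acceptance tests. It is not FIT and does not use FIT's API; it simply reads test cases from CSV text (as a spreadsheet might be exported) and runs them against a validator. The `accepts_price` function is a hypothetical stand-in for the shared pricing validator in the story, and the position types and prices are made up.

```python
import csv
import io


def accepts_price(position_type, price):
    """Hypothetical stand-in for the system's shared pricing validator."""
    known_types = {"equity", "bond", "equity swap"}
    return position_type in known_types and price > 0


# Test cases as business users might keep them in a spreadsheet, exported to CSV.
CASES_CSV = """position_type,price,expected
equity,25.10,accept
equity swap,103.75,accept
bond,-1,reject
unknown,10,reject
"""


def run_acceptance_tests(cases_csv, validator):
    """Run every row and return the failing rows (an empty list means all passed)."""
    failures = []
    for row in csv.DictReader(io.StringIO(cases_csv)):
        accepted = validator(row["position_type"], float(row["price"]))
        actual = "accept" if accepted else "reject"
        if actual != row["expected"]:
            failures.append((row["position_type"], row["price"], row["expected"], actual))
    return failures


failures = run_acceptance_tests(CASES_CSV, accepts_price)
print("all acceptance tests passed" if not failures else failures)
```

Had a row like the equity swap case been run before release, a change to the shared validator that stopped accepting equity swap prices would have shown up as a failing row rather than as a broken screen at the end of the trading day.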

And the next time the middle-office system is tested, the only things taking shots are the components, business processes, and assumptions made about the features and functions - but NOT the people developing, testing, or using the system.

*BONUS FEATURE: Acceptance tests provide some very valuable documentation of the functions and features of the system. The value of documentation will be addressed in a later post.

Tuesday, September 30, 2008

Entertained By Deception

When you watch a movie and see dinosaurs chasing people across a field, do you think to yourself "That's pretty amazing!", or do you think to yourself "This isn't real."?

When you watch a magician perform on stage and step out from a tiny little box that could not possibly fit the magician's body, do you think to yourself "That's pretty amazing!", or do you think to yourself "This isn't real."?

When a movie showcases special effects, or a magician performs on stage, the deception of the senses leads to amazing entertainment. If we later learn how the movie's special effects were created, or we wrap our brains around figuring out just how did that magician appear out of that tiny little box, the amazement subsides. We may be impressed by the special effects methods, or impressed by our own explanation of the magician's work, but we are no longer nearly as amazed or entertained. We arrive at "This isn't real."

Behaviorally, we suspend disbelief and allow the deception to amaze us and entertain us, and it can be a thrilling experience. Knowing that we can be thrilled in this way, this may explain why we suspend disbelief when we invest in and follow the financial and housing markets. We all want that thrill, so we suspend disbelief: we don't learn how the investments are being made, how the investments are constructed, who is constructing them, or even wrap our brains around figuring out just how did house prices rise 10% during the year. We are happy just to allow the deception of sharp upward market movements to amaze us and provide us with a thrill.

But after the movie, or after the magic show, if someone came up to us and said "Hey, that wasn't real, you know.", we would respond "Yeah, so?" without feeling like we were suddenly slapped back into reality. So then why, in 2008, when someone comes up to us and says about the financial and housing markets "Hey, that wasn't real, you know.", do we feel like we've been slapped?

The next time you watch the financial and housing markets climb so quickly and steeply that they lead to record highs in record time, will you still think to yourself "That's pretty amazing!", or will you think to yourself "This isn't real."?

Monday, September 29, 2008

The Exception-Tolerant Organization

Everyone by now knows of the Pareto 80/20 rule and where it applies to conducting business on a daily basis: people spend 80% of their effort on 20% of the issues. Many of these issues are exceptions to the normal course of business, and do warrant greater attention and effort. But the 80% of the effort spent begins to take away attention and resources from handling the “normal” 80% of the issues a business faces on a daily basis, the very issues that keep the business alive. This can manifest itself in an organization slowly at first, like an insidious virus that attacks from the inside out.

By the time an organization realizes that its ability to handle the “normal” issue has been compromised, the gross margins have declined and the organization has lost its competitive edge. But an Exception-Tolerant Organization has learned to rise above this, to make 100% of their effort effective on 100% of their issues while keeping pace with changing conditions. This post introduces the Exception-Tolerant Organization (ETO), and subsequent posts will cover the major principles in greater detail.

First, what does it mean to be Exception-Tolerant? Today’s management consultants and management thought leaders preach the mantras of embracing change and embracing uncertainty. Being able to embrace change and embrace uncertainty is indeed important. But being Exception-Tolerant is about taking these embracements and bringing them into the practicality of day-to-day business. Being Exception-Tolerant is not about solely reacting to business events and then making on-the-fly adjustments to your workflows and systems just to keep above the water level.

Being Exception-Tolerant is about anticipating these business events and proactively building workflows that allow both people and systems to adapt and adjust smoothly, sometimes within minutes or hours of the exception occurring. In current-generation software development tools, there are language constructs that allow you to anticipate the exceptions that may occur during processing, and build a framework to handle them. Since we can do this with software, why can we not create these types of constructs in our own business processes, and with our own people?

We can, but only if we have the facilities and communication channels to do so. An Exception-Tolerant Organization (ETO) has the Exception-Tolerant people, the Exception-Tolerant business processes, and the Exception-Tolerant technology to proactively execute during times of change, uncertainty, and exception.
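For readers less familiar with the software language constructs mentioned above, here is a minimal Python sketch of anticipating an exception and handling it with a planned path. The `PriceUnavailable` exception, the function names, and the fallback-to-prior-price rule are all hypothetical choices made for illustration.

```python
class PriceUnavailable(Exception):
    """Hypothetical exception raised when today's price cannot be found."""


def latest_price(instrument, feed):
    """Look up today's price, raising a named exception if it is missing."""
    try:
        return feed[instrument]
    except KeyError:
        raise PriceUnavailable(instrument) from None


def value_position(instrument, quantity, feed, fallback_prices):
    """Value a position, falling back to a prior price when today's is missing."""
    try:
        price = latest_price(instrument, feed)
        source = "live"
    except PriceUnavailable:
        # The anticipated exception: follow the planned path instead of stopping.
        price = fallback_prices[instrument]
        source = "fallback"
    return quantity * price, source


print(value_position("ABC", 10, {"ABC": 5.0}, {"ABC": 4.0}))
print(value_position("ABC", 10, {}, {"ABC": 4.0}))
```

The business-process analogy is that the exceptional case was named and given a handling path ahead of time, so regular processing continues and the exception is recorded (here, as the "fallback" source) for follow-up.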

Exception-Tolerant Organizations:

- embrace change and uncertainty while systematically executing their business

- build the communication pathways, business processes, and technology features to handle exceptions systematically

- emphasize their strengths and compensate for weaknesses by working together and communicating openly

- practice daily Risk Management

- do not tolerate waste, stifling of innovation, or inflexibility

And when ETOs practice the above effectively, they turn out to be exception-al organizations, not exception-less. For if you are exception-less then you don’t stand out and cannot be exception-al in business.

Can your people, your processes, and your technologies tolerate the physical, logical, internal and external events and forces that cause exceptions to your business to occur? Can your people, your processes, and your technologies adapt within minutes and hours to update, and perhaps even create new necessary business processes?

In other words, are they proactively built to execute during times of change, uncertainty, and exception? Or do they solely react?

Do you want your organization to be an Exception-Tolerant Organization?

Wednesday, September 24, 2008

Getting Our Hands Dirty

During my career I've gained deep experience with financial software development, management, and leadership. But for many years I also provided on-call after-hours and overnight system support, first on a rotating-schedule basis, then 24/7/365. From this I learned the most critical details of what works and what does not work regarding data architecture, data transmission, business workflow, information flow, and human communication pathways within an organization. Learning these critical details often requires us to get our hands dirty, as this true story demonstrates:

One company’s hot-stove issue of the day was the paper problem: "We're spending so much on paper! Most of our paper reports are printed overnight at 5 am. Why do we print out so many reports overnight?" To be sure, this problem represented a sizable cost during a company’s belt-tightening period that required the Business and IT to conscientiously team together to get to the heart of the issue.

The long-tenured business-side Managing Director and the recently-hired CTO sat in the same room with a few key people from both Business and IT, to discuss the issue with the overnight support specialist responsible for collating and distributing the reports. After a few discussions on the purpose of the reports and the distribution lists, no clear solutions were presented. It was then that the CTO declared, “Well if I have to come in at 5 am and see what is going on myself, then that’s what I’m going to do!”

A week passed by. Did the CTO make a call to action, or come in at 5 am? No - not even once. But one developer did not have to come in at 5 am to examine the company’s internal report-generation schedule and discover several large reports being generated and printed in duplicate, all delivered to the same recipient. Within just a few hours of that meeting, the development team removed the duplicates from the report-generation configuration, alleviating some of the printing headache and cost beginning with the next night’s report run.

But the developer went further than that, as he had done previously when investigating IT problems in his company - he DID come in overnight and get his hands dirty. He sat with the overnight support specialist, analyzed when the reports were actually electronically delivered and how the specialist was collating and distributing them, and documented their purpose. After reviewing the developer's findings, the Managing Director agreed that most of the non-duplicate printing and distribution justified the cost for the time being, until the Business could get a paper-less business process in place.

When the buzz of the CTO’s declaration died down, the CTO was not willing to get his hands dirty. And what could have been a defining and inspiring leadership moment ended up being an action-less statement. But one developer took the lead, providing some immediate relief, but also investigating the problem in-depth and providing a basis for discussing a long-term solution. The developer was recognized by the Business for his leadership, while the CTO moved on from the company shortly thereafter.

The developer here learned that getting your hands dirty can produce results, and that leadership often requires that we get our hands dirty when investigating problems and formulating solutions. As today’s Leader – are you willing to get YOUR hands dirty?

Sunday, September 21, 2008

The Intersection of Business and Technology

What does the Intersection of Business and Technology look like?

Is the Intersection a shining Emerald City, where goals, ideas, and feedback are freely shared? Where IT principals understand the business and drive the technology to satisfy the business first, even before designing and deploying Greatest System Ever v10.0? Where business principals are sympathetic to the needs and specifications required for effective IT? Where regular partnership and joint accountability are merely business-as-usual, rather than a rarely-employed practice?

Is the Intersection a tall and wide impenetrable wall, where business goals are not shared, but volleyed back and forth between Business and IT? A place where business principals throw their problems over the wall as they hope and wait for a technology solution – any technology solution - to their operational problems and inefficiencies? Where the business principals rejoice when something – anything - has been thrown back over the wall for the first time in weeks or months? Is what they receive a true solution to their problems, or is it merely what the IT principals dictated as sufficient for the business without the ability or willingness to deliver more?

Is the Intersection a single blinking yellow traffic light, encouraging people to slow down but not stay around long enough to take a good hard look at what is being achieved or what is even possible? Or is the Intersection merely a desolate, unvisited crossroads whose only sign of activity is a tumbleweed traversing in the breeze?

When organizations establish their own Intersections, many treat the process like it is a Burning Man festival – that is to say, an exercise in temporary community. Principals from both Business and IT congregate for a few days to a week - often offsite - where they proclaim cooperation through rigorous presentations and sessions, show off their latest projects and business plans, share meals and shake hands, and then disband. When they return to their places of work, they often end up going back to the same or older practices, procedures, and policies - without making much progress. The partnerships do not survive the offsite event, and the joint accountability never materializes.

What does the Intersection of Business and Technology look like in your organization? How do you want it to look? Do you want merely the synergy of the temporary community without further partnership, the safety and protection from accountability that the impenetrable wall provides, or the shining Emerald City?

However you want the Intersection to look, one thing is certain: much like the Burning Man festival, the Intersection will contain only what you take with you.

Tuesday, September 16, 2008

A blog is born...

Welcome to my latest venture! This is the place to read about and discuss why effective leadership in uniting Business and IT goals and groups is so crucial for organizations, as they thrive and work to maintain their edge in a competitive marketplace.

This blog will cover:

- topics on leadership and unity of Business and IT groups of thriving organizations

- real-world anecdotes of effective (and not-so-effective) technology and business leadership

- people profiles and topics, to consider some outside viewpoints

- what it means to be an Exception-Tolerant Organization (ETO). I will flesh out this concept and its principles and practices in the coming days.

Who should read this blog:

- technologists of all levels looking to make a greater impact in their work and their organizations, and actively working to sharpen their growing edge of leadership

- business owners, executives, managers, and associates seeking more fulfillment and assistance from their IT groups and partners in accomplishing their business objectives

I hope that the blog will provide insight and open up a dialogue to improve leadership at all levels of an organization as we face the challenges ahead of us in the rest of 2008 and beyond.

Happy Reading!
Jason Sliss

Leadership for Uniting Business and IT