IT World Canada

Data scientists can’t spell sustainability

As data science has grown in importance, it has encountered the IS department with increasing frequency. These encounters produce lots of misunderstandings, occasional conflict, gridlock and not much progress.

Data scientists and the IS department hold dramatically different views about how software will be used. Data scientists see only throw-away software that will be discarded once the breakthrough insights have been actioned by management. The IS department sees only thoroughly designed and tested software that will be used in production-quality applications.

Both parties are wrong in their opinion.

Goals of data scientists and the IS department are incompatible

The software goals of the two communities couldn’t be more different.

The IS department assumes everyone’s goal is production-quality applications that exhibit lots of sustainability features. The IS department operates with an elaborate definition of sustainability that includes the following components:

  1. Maintainability
  2. Performability
  3. Operability
  4. Reliability
  5. Availability
  6. Repeatability
  7. Recoverability
  8. Adoptability

Data scientists typically don’t care about any of the components of sustainability. For them, the holy grail is breakthrough insights that can advance the business plan by leaps and bounds.

Both parties are wrong in their approach. Data scientists should accept that it’s useful to run some software more than once, making sustainability features more important and perhaps essential. The IS department should accept that some software will be prototypes or informal solutions that will be quickly discarded and never placed into production.

Both parties need to adjust their thinking.

When should data scientists introduce sustainability?

After the data scientists present their fascinating insights to management and receive well-deserved adulation for the work completed, management often asks one or more of the following questions:

  1. Can you run the model again with the following tweaks?
  2. Can you enhance the model by adding the following data sources?
  3. Can you apply this model to a similar scenario in another department?
  4. Can you re-run the model on a regular basis, as the underlying data changes, indefinitely into the future?
  5. Can you reduce the error term associated with the model by improving data quality in some way?

Eager data scientists typically answer all these questions with “Yes, no problem.” That’s an inadequate answer. Even experienced data scientists usually fail to raise sustainability at this moment, when management is keen and willing to fund more work. Instead of responding with a simple “Yes,” data scientists should add the following statements:

  1. The breakthrough results you’ve seen today are based on prototype software, not a production-quality application. It only works when we babysit the execution of the model.
  2. We are pleased to continue our work, since we’ve obviously demonstrated value, but we will need additional resources to move toward a sustainable, production-quality application.

Without these caveats, management is left assuming that the software is ready for production use with little or no additional effort.

How much sustainability makes sense?

I believe that the larger the end-user community that will access the data scientists’ application, the more likely it is that the data scientists are building a production-quality application, whether they recognize it or not. For larger audiences, more sustainability features are a must-have requirement. Here are some likely scenarios:

  1. If the application is intended to support only a single individual, including only a few sustainability features such as basic backup and some error checking will improve consistency of results and reduce support effort.
  2. If the application is intended to support a work group, including modest sustainability features such as performance improvements, data integrity checking, and fast recovery will improve end-user satisfaction.
  3. If the application is intended to support a large number of end-users, adding considerable sustainability features such as a help system, high availability and easy rollback to a prior point in time will improve application stability.
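
To make the first scenario concrete, here is a minimal sketch of what “basic backup and some error checking” might look like around a one-person analysis script. Everything in it is an assumption made for illustration: the function names (`validate_rows`, `backup_file`, `run_model`), the `.bak` naming, and the simple average that stands in for a real model are all hypothetical.

```python
import shutil
from pathlib import Path


def validate_rows(rows):
    """Basic error checking: separate usable numeric values from bad input."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append(float(row))
        except (TypeError, ValueError):
            # Keep the bad value so it can be reported rather than silently lost.
            rejected.append(row)
    return clean, rejected


def backup_file(path, backup_dir):
    """Basic backup: copy the previous results file before it is overwritten."""
    path = Path(path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        return None  # first run, nothing to back up
    target = backup_dir / (path.name + ".bak")
    shutil.copy2(path, target)
    return target


def run_model(rows, results_path, backup_dir):
    """Validate input, back up the old results, then write the new ones."""
    clean, rejected = validate_rows(rows)
    if not clean:
        raise ValueError("no valid input rows")
    backup_file(results_path, backup_dir)
    mean = sum(clean) / len(clean)  # hypothetical stand-in for the real model
    Path(results_path).write_text(f"{mean}\n")
    return mean, rejected
```

Even this small amount of structure means a re-run reports which inputs were rejected and leaves the previous results recoverable, which is roughly the level of effort the single-user scenario calls for.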

I believe that in most cases data scientists are short-sighted about sustainability requirements and should improve their approach to software development by including effort for some of these features in their project proposals. The IS department, for its part, should lighten up and be less adamant and inflexible in its position.

What strategies would you pursue to introduce the value of sustainability to data scientists? Let us know in the comments below.