A data superhero is something to be
A data warehousing superhero is something to be
Not all
that glitters is Big Data, and Big Data has a long way to go before it can
deliver anything like the same satisfying results, tangible benefits and
organisational agility that a properly implemented Inmon Enterprise Data
Warehouse can provide.
Therefore,
I have a question for you.
Do you want
to win friends and influence people in the world of data architecture and
management? Do you want to do something in IT that, atypically, will bring kudos
and credibility? Do you want to enjoy what you are doing because you are
actually doing the right thing right for an appreciative audience?
Okay, the
recipe that I will now reveal has the power to turn you into not only a data
hero, but a 4th generation enterprise data warehousing superhero,
with Big Data bells and whistles attached. Even more amazingly, it is offered
for nothing, gratis, and for keeps.
Yes, you
read it right. I am feeling generous, and although a rare animal, there is such
a thing as a free lunch. In this instance, the free lunch takes the form of a
cookbook for successful data sourcing, warehousing and provisioning, one that
will turn you into a truly modern day digital superhero.
Follow the
suggestions to the letter and it will be hard to fail. However, drop any magic
ingredient from the mix and expect, eventually, to run out of luck (or 'Donald
Duck', as the rhyming slang goes down my way). Almost as important, please apply
your own criteria of good sense at every step of the way.
The craft of data
The craft
of data includes temporary-permanence in exploitation, revolution and institution.
When Sun
Tzu was talking about the Art of War, he was also talking about the craft of
data.
In the 21st
century the highest expression of the craft of data in an organisation, whether
public, private or military, is the enterprise data warehouse.
These are
some of the key rules and guidelines for ensuring that you prevail and not your
adversaries. The items are necessarily terse, but should provide a sound basis
for further research, thought and strategic practice.
So without
further ado, let us get to the crux of the matter.
1. This is the first piece of advice,
and it's a little bit of a 'downer', but you may just thank me for it later. The
business sponsor of any significant Data Warehouse initiative or iteration cannot
be the CIO, CTO or any member of the IT organisation. When this unfortunately happens,
and it happens far too often, you should know that this particular data
warehouse project is dead before it even gets off the ground - guaranteed. If
you can afford to walk away from such a project, then do so. Now for the more
positive aspects.
2. All data in the data warehouse must
be subject-oriented.
3. We must integrate all data before it
enters into the data warehouse.
4. All data in the data warehouse must
be time-variant or specifically indeterminate.
5. Data in the data warehouse must be
non-volatile – within periods of explicit and implicit snapshot coverage.
6. Data in the data warehouse is
primarily used to feed into management decision making (by order of importance:
strategic, tactical and then operational).
7. We build the data warehouse
iteratively and over time. We never build the data warehouse using a 'big bang'
approach.
8. We base each build iteration of the
data warehouse on a specific set of well-bound departmental-oriented
requirements, deliverable in a short and specific timeframe. We never try to
build the data warehouse using a 'boil the ocean' approach.
9. We never run more concurrent
iterative developments in a data warehouse programme than we would in any other
agile environment. This means that for a mature data warehousing setup, we run
a maximum of five concurrent developments. The less mature the organisation,
the fewer the concurrent iterations.
10. We use a contemporary two-tier
approach to the data-warehousing super-component. A well architected, designed
and engineered third-normal form database that supports true historicity and
time-variance-modelling forms the basis of the decision support database of
record.
11. We build departmental and
process-centric data marts on top of the data warehouse layer, as the
end-user-centric semantic-layer of the data warehouse.
12. We use 3NF to model the data
warehouse. We typically use dimensional modelling for the data
mart models, although other modelling options are also valid. Target use cases
will inform the decisions we make regarding the choice of data mart model.
13. Never trust anyone who claims that
we can service the strategic data needs of a complex and volatile enterprise by
implementing a faux data warehouse
built using a collection of conformed dimensions and facts. This approach may
initially appear to work; however, it is a massive strategic, tactical and
operational mistake, which will eventually involve costly reengineering, loss
of valuable data, organisational disruption and dissatisfied clients.
14. We store transaction data in the data
warehouse at the lowest possible level of granularity. We store transaction and
fact data in the data marts at the aggregation levels appropriate to the target
audience.
15. Based on use cases and performance
needs, we will accordingly aggregate data in the data marts. If, in the future,
lower level data granularity is required in the data mart then we can easily
provide that by reconstructing the data mart from atomic level data stored in
the data warehouse.
16. We should never second-guess
business requirements. No business imperative means no requirement. You're
aiming to be a successful data superhero, so keep that goal in mind. Don't be
beguiled into doing the wrong things, even when accosted by 'right-sounding
reasons'.
17. Data warehousing is about the permanent
incremental development and redefinition of minimum viable products and a
minimum viable service. Iteratively grow the data warehouse and ignore those
who claim that Inmon is about 'big bang', 'bottom up' and 'boil the ocean'.
18. Avoid pork barrel political games in
data warehouse programmes. You should not use a data warehouse programme as a
means to leverage a raft of other related data, operational and DevOps projects
in the organisation. For example, Corporate Data Governance, Data Quality and
Disaster Recovery/Business Continuity should not be packed into the data
warehousing programme, at any level. Again, this is a massive strategic,
tactical and operational mistake.
19. We ensure, as a minimum, that
data in the data warehouse is as reliable as the data at source. Simply stated,
we do not allow unnecessary entropy to affect the data on the journey from
source systems to the target data warehouse or data marts.
20. No data is 'corrected' or 'cleaned'
in the data warehouse without the explicit, verifiable and express consent of
the fiduciary duty holder with respect to that data. If the data warehouse is
to act as a system of record then it must also hold metadata relative to any
'cleaning' that has been applied to that data, and should also hold 'before'
and 'after' states of corrected data – for auditing purposes.
21. We secure all data in the data
warehouse in accordance with prevailing legislation and corporate rules and
guidelines. In any conflict between corporate rule and legal jurisdiction, the
current laws prevail.
22. Ensure that competent and
independent design authorities, with the support of the Data Warehouse
architect, are ultimately responsible for all data-warehouse architectural,
process and design decisions.
23. Architectural and process choices
govern the selection of methodology, product and partner. Always remember mens sana in corpore sano. Prejudice,
speculation and opinion generally lead to very bad data-warehouse acquisition
decisions, and can potentially lead to strategic, tactical and operational
mistakes.
24. Data warehousing iterations have
clear top-level phases: start-up; DW management phase; analysis phase; design
phase; build phase; testing phase; and, implementation phase. We complement
these phases with data warehousing tracks: project management track; user track
and requirements; data track; technical track; and, metadata track. This
approach is used by a number of data warehousing methodologies, including the
Cambriano methodology for data warehousing, information management and data
integration.
25. To conclude, I would like to reiterate
some of the reasons why we should follow an Inmon based approach to the
building of a Data Warehouse. The Inmon approach is very much based on:
i. Iteratively solving specific business challenges, iteration by iteration. This is not just
a flippant excuse for spending other people's money. The Inmon DW is not about 'boiling
the ocean', 'bottom up' or 'big bang'. Neither is it an insistence that one can
build a whale by carefully configuring a collection of minnows. There's a
'little bit more' to it than that.
ii. Delivering perceived and visible value within a reasonable timeframe.
iii. Achieving high returns on investment.
iv. Meeting or exceeding expectations.
v. Meeting user requirements, first time and every time.
vi. Delivering a quality data-warehouse solution on schedule, within budget, whilst
effectively utilizing the resources available.
vii. The rational and economic need to minimize the impact that any strategic data initiative
will have on operational systems and the organisation.
viii. The goal of maximizing information availability and analytical capabilities throughout
the organisation and even to stakeholders and clients, if we so wish.
ix. Designing towards maximum flexibility to ensure that we can accommodate much of the future
decision support needs immediately and that we swiftly and coherently address
new requirements.
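To make items 14 and 15 a little more concrete, here is a minimal Python sketch of the idea: the warehouse keeps atomic transactions, and each data mart is simply an aggregate that can be rebuilt at a finer grain whenever the business asks for it. The Transaction fields, the build_mart helper and the sample figures are all hypothetical, invented purely for illustration; a real warehouse would do this in the database, not in application code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """One atomic warehouse row (hypothetical schema, for illustration only)."""
    sold_on: date
    department: str
    product: str
    amount_cents: int  # integer cents avoid floating-point rounding


def build_mart(warehouse, grain):
    """Aggregate atomic rows into a mart keyed by the columns named in `grain`."""
    totals = defaultdict(int)
    for row in warehouse:
        key = tuple(getattr(row, column) for column in grain)
        totals[key] += row.amount_cents
    return dict(totals)


# The warehouse keeps every transaction at the lowest level of granularity.
# (Sample data is made up.)
warehouse = [
    Transaction(date(2015, 9, 1), "Retail", "Widget", 1200),
    Transaction(date(2015, 9, 1), "Retail", "Gadget", 800),
    Transaction(date(2015, 9, 2), "Retail", "Widget", 500),
    Transaction(date(2015, 9, 2), "Wholesale", "Widget", 9000),
]

# First iteration: the target audience only needs totals per department.
department_mart = build_mart(warehouse, grain=("department",))

# Later requirement: a finer grain. Rebuild the mart from the atomic data.
product_mart = build_mart(warehouse, grain=("department", "product"))
```

Because the atomic rows are never thrown away, moving from the department-level mart to the product-level mart is a rebuild, not a reengineering exercise. Had only the department totals been stored, the product breakdown could not have been recovered.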
Now what?
Now that I've
given out a wealth of valuable information and indications, you may be asking
'and now what?'
This is the
next step, dear budding data superhero:
1. Take each of the items mentioned
above and study them to the best of your ability. Do lots of research, and
start to fit together the pieces of the jigsaw.
2. Invent scenarios, or better still,
ask other people for scenarios and hypothetical challenges, and then work
through how you would go about responding to those scenarios and challenges.
3. If you have any questions that you
cannot research and answer yourself, then I will be glad to help. That is, if
the request is regarding a particular aspect of data warehousing or management.
Please email me your questions at martyn.jones@cambriano.es
Please send one email per question (e.g. if you have three
questions, send three emails), so that I can prioritise the questions and manage
the time I can set aside to respond to them.
The subtle evolution of Inmon's definitive Data Warehousing
What I have
described are elements and requisites of a solid, coherent and cohesive
approach to fourth generation Enterprise Data Warehousing, a proven approach to
the provision of quality data for management decision support. It is an
evolution of the classic Inmon approach, refined over the intervening
decades by Bill Inmon himself and by those who adopted and developed his
approach to cohesive, coherent and comprehensive data warehousing.
Many thanks for reading
So, that's
it. Many thanks for reading this piece and I sincerely hope you found it of
interest.
Do keep in
touch. You can connect with me via LinkedIn and you can also keep up to date
with my activities on Twitter (User handle @GoodStratTweet) and on my personal
blog http://www.goodstrat.com (GoodStrat.com)
I am the
manager of The Big Data Contrarians group on LinkedIn. Consider joining that
group, if only for the critical thinking that it could potentially provoke.
You may
also be interested in some other articles I have written on the subject of Data
Warehousing.
Martyn Richard Jones
Palma de Mallorca
23rd September 2015