jueves, 17 de marzo de 2016

Professional networking? Yo! BlankedOut Sucked

Martyn Richard Jones

Hello, readers.

Before my Aunt Dolly went to a better life, she received a handwritten letter from her dear friend and long-time admirer Sir Arthur Streeb-Greebling, which was to be passed on to the CEO of what he called an interweb professional dating site. Now, she didn't actually give me a precise name, so I now find myself at a loss. So, if there's anyone out there who recognises who this might have been written for, then please let me know.

What follows is Sir Arthur's text, as relayed to my Aunt Dolly. 

Dear Mister Def Archibald Quengler,

We've never met, and we probably never will, and I don't much like the cut of your jib, but I would like to take this opportunity to draw your attention to the demise of a once burgeoning professional dating site where decent chaps and chapesses and an assortment of pathetic likeminded individuals could share likeminded individual things. Such as pictures of cats, sexist crap, professional resumes, tips and tricks, insightful comments, 'me too' inanities, hype, baloney, mendacity, political detritus and even worse religious detritus.

Personally, I blame lack of national service and the parents… oh, and the teachers.

Anyway, I am writing to you in the tepid hope that your amazing and absolutely fabulous online concern does not fall victim to the same malaise.

You may have heard of the once significant, successful and utterly sensational BlankedOut web setup; it was a so-called professional link-up site for pros, or some such dreck.

I am reliably informed that people loved BlankedOut, just a bit, and that they also hated it even more. Many people said to me "BlankedOut is the Facebook of the crass and dim-witted wannabe class, and apart from some minority exceptions, it is a gushing channel of crap and a conduit of intense mediocrity." But, not being aware of the game, I was in no position to make a judgement. I will leave that for others.

BlankedOut has been variously described as "a place where capital interests took us for a ride", where members were generally treated as hapless schmucks, captive clowns and useful idiots, and where, according to the observers on the hustings, it was all done in such a way that people lapped up the ride.

My distinguished colleague Mister Bernice Hill, PhN observed that, as a role model for a Big Data and Big Data analytics company, BlankedOut "sucked, big time".

He went on to state "It sucks from top to bottom, from left to right, and around the whole global enchilada." Bernice was tough, a hard-hearted man, and he had a way with words.

The former judge Sir James Beauchamp also didn't hold back when he stated "From its obtuse, obnoxious and incessant promotions of sponsored rancid 'content' to its insipid, trite and fatuous love-affair with its god-damn-awful Effluence®s and fawning sycophants, BlankedOut stood as a shining internet beacon of manipulation, exploitation and hypocrisy." I seem to recall a certain Barnie Puddle as being one of the most mendacious and manipulative of the Effluence®s. But, whatever, as the young people are wont to say these days.

So, you see. I knew bugger all about the matter of these sorts of high-class professional career-oriented pimping sites, past or present. But now, and you may call me an incurable romantic, when I look upon the history of the deceased BlankedOut community, as dead as a Norwegian blue, what I see is something that leads my thoughts to visions of a massive work of misuse, abuse and deception. 

Which is not a good omen.

Of course, alternatives to BlankedOut existed, but they were ascetically professional and did not venture much into the wild-side of vulgarity, populism and cant. They stuck to their core competences, like troopers, and trusted their clientele to be just as serious, decent and professional as they were. More fool them, what?

But not so, at poor, dead and despised BlankedOut, lying in a state of disgrace, like some sort of dead pisshead society on a pyre of burning nothing.

So, Mister Def Whiner, heed my words: don't let your business turn into yet another BlankedOut. An object lesson, if ever there was one, in snatching failure from the jaws of success.

Carpe diem, man, carpe diem!

So, I just have this to tell you, and I will say it only once. Good will to all women and men and all of that. If you are still an admirer of what was that dreadful BlankedOut business model, then bugger off and take your bloody dogs with you!

Yours sincerely,

Sir Arthur Streeb-Greebling
Admiral of the Grand Fleet, retired

Well, nothing much to add from me. Sir Arthur seems to have said it all. Although, I would still like to know who this letter is supposed to have been written to, because try as I might, I can't track down anyone who goes by the name of Mister Def Archibald Quengler. That stated, the next time I am in Palma de Mallorca I will ask my Aunt Dolly, now that she is in a far better place and has more free time on her hands.

Next week I will be looking at financial scams that concern greenhorns, their parents and protectors, cultural establishments, incongruous financial arrangements, the government and, more importantly, the police and the judicial system.

Stay tuned.

Many thanks for reading.


domingo, 6 de marzo de 2016

Free Business Analytics Content – Thanks to Wikipedia – Part 1

Why buy when you can get it for free?
Here is the first fantastic delivery of an amazing and fabulous selection of free and widely available business analytics learning content, which has been prepared… just for you.
  1. A/B testing is a way to compare two versions of a single variable, typically by testing a subject’s response to variable A against variable B, and determining which of the two variables is more effective. https://en.wikipedia.org/wiki/A/B_testing
  2. Choice modelling attempts to model the decision process of an individual or segment via revealed or stated preferences made in a particular context or contexts. Typically, it attempts to use discrete choices (A over B; B over A, B & C) in order to infer positions of the items (A, B and C) on some relevant latent scale (typically “utility” in economics and various related fields). https://en.wikipedia.org/wiki/Choice_modelling
  3. Adaptive control is the control method used by a controller which must adapt to a controlled system with parameters which vary, or are initially uncertain. For example, as an aircraft flies, its mass will slowly decrease as a result of fuel consumption; a control law is needed that adapts itself to such changing conditions. https://en.wikipedia.org/wiki/Adaptive_control
  4. Multivariate Testing. In marketing, multivariate testing or multi-variable testing techniques apply statistical hypothesis testing on multi-variable systems, typically consumers on websites. Techniques of multivariate statistics are used. https://en.wikipedia.org/wiki/Multivariate_testing_in_marketing
  5. In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a gambler at a row of slot machines (sometimes known as “one-armed bandits”) has to decide which machines to play, how many times to play each machine and in which order to play them. https://en.wikipedia.org/wiki/Multi-armed_bandit
  6. A t-test is any statistical hypothesis test in which the test statistic follows a Student’s t-distribution if the null hypothesis is supported. https://en.wikipedia.org/wiki/Student%27s_t-test
  7. Visual analytics is an outgrowth of the fields of information visualization and scientific visualization that focuses on analytical reasoning facilitated by interactive visual interfaces. https://en.wikipedia.org/wiki/Visual_analytics
  8. In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence, though in common usage it most often refers to the extent to which two variables have a linear relationship with each other. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price. https://en.wikipedia.org/wiki/Correlation_and_dependence
  9. Scenario analysis is a process of analyzing possible future events by considering alternative possible outcomes (sometimes called “alternative worlds”). Thus, scenario analysis, which is a main method of projections, does not try to show one exact picture of the future. Instead, it consciously presents several alternative future developments. https://en.wikipedia.org/wiki/Scenario_analysis
  10. Forecasting is the process of making predictions of the future based on past and present data and analysis of trends. https://en.wikipedia.org/wiki/Forecasting
  11. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. https://en.wikipedia.org/wiki/Time_series
  12. Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets (“big data”) involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. https://en.wikipedia.org/wiki/Data_mining
  13. In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’). https://en.wikipedia.org/wiki/Regression_analysis
  14. Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. https://en.wikipedia.org/wiki/Text_mining
  15. Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. Sentiment analysis is widely applied to reviews and social media for a variety of applications, ranging from marketing to customer service. https://en.wikipedia.org/wiki/Sentiment_analysis
  16. Image analysis is the extraction of meaningful information from images, mainly from digital images by means of digital image processing. Image analysis tasks can be as simple as reading bar coded tags or as sophisticated as identifying a person from their face. https://en.wikipedia.org/wiki/Image_analysis
  17. Video content analysis (also video content analytics, VCA) is the capability of automatically analyzing video to detect and determine temporal and spatial events. https://en.wikipedia.org/wiki/Video_content_analysis
  18. Speech analytics is the process of analyzing recorded calls to gather information; it brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. https://en.wikipedia.org/wiki/Speech_analytics
  19. Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other mathematical methods. Monte Carlo methods are mainly used in three distinct problem classes: optimization, numerical integration, and generating draws from a probability distribution. https://en.wikipedia.org/wiki/Monte_Carlo_method
  20. Linear programming (LP; also called linear optimization) is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. Linear programming is a special case of mathematical programming (mathematical optimization). https://en.wikipedia.org/wiki/Linear_programming
  21. Cohort analysis is a subset of behavioral analytics that takes the data from a given eCommerce platform, web application, or online game and rather than looking at all users as one unit, it breaks them into related groups for analysis. These related groups, or cohorts, usually share common characteristics or experiences within a defined time-span. https://en.wikipedia.org/wiki/Cohort_analysis
  22. Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in, say, six observed variables mainly reflect the variations in two unobserved (underlying) variables. https://en.wikipedia.org/wiki/Factor_analysis
  23. Adaptive (or Artificial) Neural Networks. Like other machine learning methods – systems that learn from data – neural networks have been used to solve a wide variety of tasks that are hard to solve using ordinary rule-based programming, including computer vision and speech recognition. https://en.wikipedia.org/wiki/Artificial_neural_network
  24. Meta Analysis. The basic tenet of a meta-analysis is that there is a common truth behind all conceptually similar scientific studies, but which has been measured with a certain error within individual studies. The aim in meta-analysis then is to use approaches from statistics to derive a pooled estimate closest to the unknown common truth based on how this error is perceived. In essence, all existing methods yield a weighted average from the results of the individual studies and what differs is the manner in which these weights are allocated and also the manner in which the uncertainty is computed around the point estimate thus generated. https://en.wikipedia.org/wiki/Meta-analysis
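Since items 1 and 6 go hand in hand in practice, here is a minimal pure-Python sketch of how an A/B test might be evaluated with Welch's t statistic (the unequal-variances variant of the t-test). The conversion rates, sample sizes and function names are all invented for illustration:

```python
import math
import random

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance of a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance of b
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Simulated A/B test: 1 = visitor converted, 0 = did not convert.
random.seed(42)
variant_a = [1 if random.random() < 0.10 else 0 for _ in range(5000)]
variant_b = [1 if random.random() < 0.12 else 0 for _ in range(5000)]

t = welch_t(variant_b, variant_a)
print(f"conversion A: {sum(variant_a) / len(variant_a):.3f}")
print(f"conversion B: {sum(variant_b) / len(variant_b):.3f}")
print(f"Welch t statistic: {t:.2f}")  # roughly, |t| > 2 hints at a real difference
```

In a real analysis you would turn the statistic into a p-value against the Student's t-distribution (a statistics library does this for you), but the statistic alone shows the mechanics.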
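And to make item 19 concrete, the classic toy example of a Monte Carlo method: estimating π by repeated random sampling of points in the unit square. This is purely illustrative; the function name and sample count are my own choices:

```python
import random

def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that land inside the quarter circle, multiplied by 4."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # typically within ~0.01 of 3.14159
```

Note how the error shrinks only with the square root of the sample count, which is why Monte Carlo shines when other numerical methods are impractical rather than when high precision is cheap.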
I hope you find the content useful. Of course all thanks should really go to Wikipedia and their unpaid expert contributors.
I will try and get the next part of ‘Free Business Analytics Content’ onto Linked Pulse over the next weekend.
Many thanks for reading.
Just a few points before closing.
Firstly, please consider joining The Big Data Contrarians, here on LinkedIn: https://www.linkedin.com/groups/8338976
Secondly, keep in touch. My strategy blog is here: http://www.goodstrat.com and I can be followed on Twitter at @GoodStratTweet. Please also connect on LinkedIn if you wish. If you have any follow-up questions then leave a comment or send me an email on martyn.jones@cambriano.es
Thirdly, you may also be interested in some other articles I have written on the subject of Data Warehousing.

martes, 1 de marzo de 2016

A data superhero is something to be


A data warehousing superhero is something to be


Not all that glitters is Big Data, and Big Data has a long way to go before it can deliver anything like the same satisfying results, tangible benefits and organisational agility that a properly implemented Inmon Enterprise Data Warehouse can provide.

Therefore, I have a question for you.

Do you want to win friends and influence people in the world of data architecture and management? Do you want to do something in IT that, atypically, will bring kudos and credibility? Do you want to enjoy what you are doing because you are actually doing the right thing right for an appreciative audience?

Okay, the recipe that I will now reveal has the power to turn you into not only a data hero, but a 4th generation enterprise data warehousing superhero, with Big Data bells and whistles attached. Even more amazingly, it is offered for nothing, gratis, and for keeps.

Yes, you read it right. I am feeling generous, and although a rare animal, there is such a thing as a free lunch. In this instance, the free lunch takes the form of a cookbook for successful data sourcing, warehousing and provisioning, one that will turn you into a truly modern day digital superhero.

Follow the suggestions to the letter and it will be hard to fail. However, drop any magic ingredient from the mix and expect, eventually, to run out of luck – that is rhyming slang for Donald Duck, down my way. Almost as important, please apply your own criteria of good sense at every step of the way.

The craft of data  


The craft of data includes temporary-permanence in exploitation, revolution and institution.

When Sun Tzu was talking about the Art of War, he was also talking about the craft of data.

In the 21st century the highest expression of the craft of data in an organisation, whether public, private or military, is the enterprise data warehouse.

These are some of the key rules and guidelines for ensuring that you prevail and not your adversaries. The items are necessarily terse, but should provide a sound basis for further research, thought and strategic practice.

So without further ado, let us get to the crux of the matter.

1.       This is the first piece of advice, and it's a little bit of a 'downer', but you may just thank me for it later. The business sponsor of any significant Data Warehouse initiative or iteration cannot be the CIO, CTO or any member of the IT organisation. When this unfortunately happens, and it happens far too often, you should know that this particular data warehouse project is dead before it even gets off the ground - guaranteed. If you can afford to walk away from such a project, then do so. Now for the more positive aspects.

2.       All data in the data warehouse must be subject-oriented.

3.       We must integrate all data before it enters into the data warehouse.

4.       All data in the data warehouse must be time-variant or specifically indeterminate.

5.       Data in the data warehouse must be non-volatile – within periods of explicit and implicit snapshot coverage.

6.       Data in the data warehouse is primarily used to feed into management decision making (by order of importance: strategic, tactical and then operational).

7.       We build the data warehouse iteratively and over time. We never build the data warehouse using a 'big bang' approach.

8.       We base each build iteration of the data warehouse on a specific set of well-bound departmental-oriented requirements, deliverable in a short and specific timeframe. We never try to build the data warehouse using a 'boil the ocean' approach.

9.       We never run more concurrent iterative developments in a data warehouse programme than we would in any other agile environment. This means that for a mature data warehousing setup, we run a maximum of five concurrent developments. The more immature the organisation, the lower the number of concurrent iterations.

10.   We use a contemporary two-tier approach to the data-warehousing super-component. A well architected, designed and engineered third-normal form database that supports true historicity and time-variance-modelling forms the basis of the decision support database of record.

11.   We build departmental and process-centric data marts on top of the data warehouse layer, as the end-user-centric semantic-layer of the data warehouse.

12.   We use 3NF to model the data warehouse data-model. We typically use dimensional modelling to model the data mart models, although other modelling options are also valid. Target use cases will inform the decisions we make regarding the choice of data mart model.

13.   Never trust anyone who claims that we can service the strategic data needs of a complex and volatile enterprise by implementing a faux data warehouse built using a collection of conformed dimensions and facts. This approach may initially appear to work; however, it is a massive strategic, tactical and operational mistake, which will eventually involve costly reengineering, loss of valuable data, organisational disruption and dissatisfied clients.

14.   We store transaction data in the data warehouse at the lowest possible level of granularity. We store transaction and fact data in the data marts at the aggregation levels appropriate to the target audience.

15.   Based on use cases and performance needs, we will accordingly aggregate data in the data marts. If, in the future, lower level data granularity is required in the data mart then we can easily provide that by reconstructing the data mart from atomic level data stored in the data warehouse.

16.   We should never second-guess business requirements. No business imperatives means no requirement. You're aiming to be a successful data superhero, keep that goal in mind. Don't be beguiled into doing the wrong things even when accosted by 'right-sounding reasons'.

17.   Data warehousing is about the permanent incremental development and redefinition of minimum viable products and a minimum viable service. Iteratively grow the data warehouse and ignore those who claim that Inmon is about 'big bang', 'bottom up' and 'boil the ocean'.

18.   Avoid pork barrel political games in data warehouse programmes. You should not use a data warehouse programme as a means to leverage a raft of other related data, operational and DevOps projects in the organisation. For example, Corporate Data Governance, Data Quality and Disaster Recovery/Business Continuity should not be packed into the data warehousing programmes, at any level. Again, this is a massive strategic, tactical and operational mistake.

19.   We ensure, as a minimum, that data in the data warehouse is as reliable as the data at source. Simply stated, we do not allow unnecessary entropy to affect the data in the journey from source systems to the target data warehouse or data marts.

20.   No data is 'corrected' or 'cleaned' in the data warehouse without the explicit, verifiable and express consent of the fiduciary duty holder with respect to that data. If the data warehouse is to act as a system of record then it must also hold metadata relative to any 'cleaning' that has been applied to that data, and should also hold 'before' and 'after' states of corrected data – for auditing purposes.

21.   We secure all data in the data warehouse in accordance with prevailing legislation and corporate rules and guidelines. In any conflict between corporate rule and legal jurisdiction, the current laws prevail.

22.   Ensure that competent and independent design authorities, with the support of the Data Warehouse architect, are ultimately responsible for all data-warehouse architectural, process and design decisions.

23.   Architectural and process choices govern the selection of methodology, product and partner. Always remember mens sana in corpore sano. Prejudice, speculation and opinion generally lead to very bad data-warehouse acquisition decisions, and can potentially lead to strategic, tactical and operational mistakes.

24.   Data warehousing iterations have clear top-level phases: start-up; DW management phase; analysis phase; design phase; build phase; testing phase; and, implementation phase. We complement these phases with data warehousing tracks: project management track; user track and requirements; data track; technical track; and, metadata track. This approach is used by a number of data warehousing methodologies, including the Cambriano methodology for data warehousing, information management and data integration.

25.   To conclude, I would like to reiterate some of the reasons why we should follow an Inmon based approach to the building of a Data Warehouse. The Inmon approach is very much based on:

                    i.            Iteratively solving specific business challenges, iteration by iteration. This is not just a flippant excuse for spending other people's money. The Inmon DW is not about 'boiling the ocean', 'bottom up' or 'big bang'. Neither is it an insistence that one can build a whale by carefully configuring a collection of minnows. There's a 'little bit more' to it than that.

                   ii.            Delivering perceived and visible value within a reasonable timeframe.

                 iii.            Achieving high returns on investment.

                 iv.            Meeting or exceeding expectations.

                  v.            Meeting user requirements, first time and every time.

                 vi.            Delivering a quality data-warehouse solution on schedule, within budget, whilst effectively utilizing the resources available.

               vii.            The rational and economic need to minimize the impact that any strategic data initiative will have on operational systems and the organisation.

              viii.            The goal of maximizing information availability and analytical capabilities throughout the organisation and even to stakeholders and clients, if we so wish.

                 ix.            Designing towards maximum flexibility to ensure that we can accommodate much of the future decision support needs immediately and that we swiftly and coherently address new requirements.
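To make rules 14 and 15 a little more concrete, here is a minimal Python sketch of rebuilding an aggregated data mart from atomic-level warehouse data. All field names and figures are invented for illustration; a real implementation would of course live in the database layer, not in application code:

```python
from collections import defaultdict
from datetime import date

# Atomic-level transactions, as they might sit in the warehouse layer
# (the schema here is hypothetical, not from any real system).
warehouse_sales = [
    {"day": date(2016, 3, 1), "store": "S1", "product": "P1", "amount": 10.0},
    {"day": date(2016, 3, 1), "store": "S1", "product": "P2", "amount": 5.0},
    {"day": date(2016, 3, 2), "store": "S2", "product": "P1", "amount": 7.5},
]

def build_mart(transactions, key=lambda r: (r["day"].strftime("%Y-%m"), r["store"])):
    """Aggregate atomic warehouse rows into a coarser-grained data mart.
    Because the atomic rows are retained in the warehouse, the mart can
    always be rebuilt at a different grain by swapping the key function."""
    mart = defaultdict(float)
    for row in transactions:
        mart[key(row)] += row["amount"]
    return dict(mart)

monthly_by_store = build_mart(warehouse_sales)
print(monthly_by_store)  # {('2016-03', 'S1'): 15.0, ('2016-03', 'S2'): 7.5}
```

The point of the sketch is the direction of dependency: the mart is a disposable derivation of the atomic warehouse data, never the other way around.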

Now what?


Now that I've given out a wealth of valuable information and indications, you may be asking 'and now what?'

This is the next step, dear budding data superhero:

1.       Take each of the items mentioned above and study them to the best of your ability. Do lots of research, and start to fit together the pieces of the jigsaw.

2.       Invent scenarios, or better still, ask other people for scenarios and hypothetical challenges, and then work through how you would go about responding to those scenarios and challenges.

3.       If you have any questions that you cannot research and answer yourself, then I will be glad to help. That is, if the request is regarding a particular aspect of data warehousing or management. Please email me your questions at martyn.jones@cambriano.es and use one email per question (e.g. if you have three questions, send three emails), so that I can prioritise the questions and manage the time I can set aside to respond to them.

The subtle evolution of Inmon's definitive Data Warehousing


What I have described are elements and requisites of a solid, coherent and cohesive approach to fourth generation Enterprise Data Warehousing, a proven approach to the provision of quality data for management decision support. The approach is the evolution of the classic Inmon approach, which has evolved over the intervening decades, thanks to Bill Inmon himself, and those who adopted and developed his approach to cohesive, coherent and comprehensive data warehousing.

Many thanks for reading


So, that's it. Many thanks for reading this piece and I sincerely hope you found it of interest.

Do keep in touch. You can connect with me via LinkedIn and you can also keep up to date with my activities on Twitter (user handle @GoodStratTweet) and on my personal blog at http://www.goodstrat.com (GoodStrat.com).

I am the manager of The Big Data Contrarians group on LinkedIn. Consider joining that group, if only for the critical thinking that it could potentially provoke.

You may also be interested in some other articles I have written on the subject of Data Warehousing.








Martyn Richard Jones

Palma de Mallorca

23rd September 2015