Monday, 23 November 2015

Something funny happened on the way to the Data Lake



When one surveys the vast array of Big Data tomes and the even vaster output of the regularly amazing Big Data bullshit babblers, one thing that strikes you immediately is that Big Data isn’t explained by starting with a story about the marvellously useful applications that you might imagine are generating all of this additional Big Data. Instead, the explanations almost invariably dwell on the data generated by social media applications of dubious business value and even more dubious social value.

Basically we are being led to believe that applications of peripheral interest, that is, peripheral in the grander scheme of things, are capable of simultaneously meeting almost all of our immediate needs and also of generating a voluminous wealth of data detritus that is somehow going to bring about an amazing economic revolution, the awesome galvanising of innovation and the creation of absolutely fabulous riches beyond the dreams of avarice.

What’s more, every day some Big Data proselyting dope has a cunning plan and pops up on the web like a drunken gopher with some extrapolating and theorising boloney about the quite unusual successes of very unusual businesses such as Facebook, Twitter and Google and then hammers square pegs into round holes in order to produce a generalised theory of awesome Big Data irksome-foolishness that will certainly mean, without a doubt, that we will see the raising of the luck and fortunes of all businesses, everywhere, if only they would buy into the Big Data religion, now.

If you don’t believe me then check it out for yourself. For example, take a look at the Big Data channel over on LinkedIn. Now that it has apparently been taken over by the master guru of astro-turfing Big Data bullshit and babble himself, it comes with wall-to-wall Big Data hype, and it’s all the same, and it’s all so obviously wrong that it makes you wonder if LinkedIn is not deliberately converting itself into some species of professional laughing academy, for idiots by idiots, with no room for even the occasionally and superficially contrarian.


In fact, looking at some of the latest blunderbuss blasts of blatant bullshit I am reminded of the quote ascribed to Groucho Marx, which I will make over for modern-day consumption: “These are my Big Data principles, and if you don't like them... well, I have others.”

Big Data with BIG SMILES

Having got your attention I would like to introduce you to a pragmatic, real-world and business-centric approach to Big Data and Big Data Analytics. When I say that this is the best approach to Big Data you are ever likely to find in the whole universe and in your entire life, I am still significantly understating the magnificent utility, timeliness and the here-and-now facets of the approach.
Now with the introduction done and dusted, and the virtues of the BIG SMILES approach exalted, it should come as no surprise that this eminently sensible, highly rational and thoroughly reasonable, methodical and no-nonsense technique has been applied successfully in more than 500 business-oriented situations.
Best of all, this amazing Big Data approach is free of charge and with no strings attached – you don’t even have to buy my book. Now, isn’t that amazing?
Let’s start with the basics. The BIG in BIG SMILES refers to Business Insight Gains. This refers to the focus of SMILES. Simples, right? Now what does SMILES refer to:
Fig. The process chain of Big Data SMILES
SMILES is also an acronym, and it refers to the six major components of the SMILES Big Data approach (as illustrated above). Or, more precisely the various phases of the approach. Namely:
  1. Start with a significant data-centric business challenge
  2. Model high-level options and approaches
  3. Implement your chosen option
  4. Leverage the products
  5. Evaluate performance and value
  6. Socialise the outcomes
Let’s take a look at each of those aspects of Big Data SMILES in a little more detail.

1.      Start with a significant data-centric business challenge

It makes sound business sense to start any business initiative with a compelling business reason. If you don’t have one then don’t start, it’s as easy as that.
Fig. Start SMILES
Now, having identified your significant business challenge you should ask the following questions:
  • What: What do you want to accomplish with respect to the significant business challenge?
  • Why: Why do we want to address the challenge?
  • Who: Who should be involved in helping address this challenge?
  • When and where: Can you identify the time and place the challenge first comes into effect?
  • Windows of opportunity: During what periods can we most effectively address the challenge?
  • Which: Can you enumerate the requirements and constraints associated with the challenge and the possible responses?
When compiling your view of the significant business challenge, you could look for example at questions along the lines of:
  1. How do I find the Big Data I need?
  2. What is the original source of the Big Data?
  3. How was this summarization, enrichment or derivation created in the Big Data?
  4. What queries and mechanisms are available to access the Big Data?
  5. How have Big Data related business definitions and terms changed?
  6. How do interpretations of the Big Data/Data vary across organizations?
  7. What business assumptions have been made that are related to this Big Data?
Don’t forget, asking great questions about significant business challenges will lead to even more questions, which is where you will want to highlight the new questions that could quite possibly be answered, wholly or partially, using Big Data. Also, don’t confuse great questions with complex questions. The idea is not to impress the audience but to identify and address the challenge.
Another important thing to point out in relation to the SMILES approach is that you should never, ever, in no way, shape or form, try to boil the ocean. That is, do not try to implement and leverage Big Data analytics in your organisation in big leaps and bounds. Start out with baby steps, and treat it like a game of tennis. Win points, games and sets, and beat challenges and bad decisions one step at a time.
Finally, make sure each iteration of SMILES starts with an objective that is big enough to be significant yet small enough to be doable in a reasonably short timescale, preferably one made up of sprints of no more than 5 to 10 days.

2.      Model high-level options and approaches

Here we are looking at a process of discovery and insight; conceptualisation and creativity; design and innovation; and prototyping expertise and domain capabilities.
Fig. Model SMILES
This phase has three parts:
  1. Defining the problem and developing options
  2. Evaluating and selecting the best model
  3. Finalising and developing the implementable prototype

Defining the problem and developing options

Using Co-operative Prototype Ideation, Concretisation and Realisation (a unique rapid development feature of SMILES), you may now choose to develop options and models to various stages of maturity and extensiveness.

Evaluating and selecting the best model

Through a cycle of hypothesise and test you may arrive at the model most suited to your needs. If, however, no such model is forthcoming, or the best model simply is not good enough or promising enough, then do not proceed to the next step. Make sure you have a good reason to progress. Inertia is definitely not a good reason, and neither is ‘because we have to do something’.

Developing the implementable prototype

In this phase, you develop the chosen model through to the stage where it is ready to be ‘productised’ in the following prototype implementation phase.
In addition, as part of this phase, you will be looking at which technology options might provide the best fit with your requirements. Common technological options are typically associated in one way or another with the Hadoop ecosphere, so your problem may be adequately addressed using technology such as Hive, Pig, Bash, Spark or Python, or indeed with products such as MapR, Neo4j or EXASol.

3.      Implement your chosen option

In this phase, you take your well thought out proof of concept prototype and turn it into a business ready and production hardened product.
What is involved in turning a Big Data solution prototype into a product?
Here we are focusing on tools, build competence and teamwork; product piloting and development support; and, the realities of production and support.
Also, as a closing message for this phase description, please note that the implementation phase also includes the active participation of the development sponsors and target user group, as should every phase up to this point, and all of the follow on phases.

4.      Leverage the products

It’s been designed, prototyped, built and productised. Now what?
Well, here comes the moment of truth.
In this phase SMILES can help Big Data teams quickly react to changing business and market needs by capturing and managing new and changing requirements, and by constantly monitoring feedback. The framework can be totally integrated into true agile development processes especially where requirements are extremely dynamic and free-flowing but nonetheless must be managed at a complete Big Data product level.

5.      Evaluate performance and value

An assessment of the value of the Big Data initiative should be made periodically. However, there should be at least four event-pegged must-do valuations carried out during the life-time of the project products.
Fig. Evaluate SMILES
  1. Initial acceptance criteria alignment and qualitative valuation.
  2. Maturity performance and tangible ROI contribution. Include both the mitigation and avoidance of loss and the enablement of all direct and indirect gains.
  3. Life-time-value-to-date to be carried out before all major enhancements.
  4. Sunset life-time assessment and valuation. On replacement or withdrawal from service.
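To make the second valuation a little more concrete, here is a minimal sketch of a tangible ROI calculation that counts losses avoided alongside direct and indirect gains. The class name, the field names and the figures are all illustrative assumptions, not part of SMILES itself, and how you put a money value on an indirect gain remains your call.

```python
from dataclasses import dataclass


@dataclass
class Valuation:
    """Figures for one valuation point (all names are illustrative assumptions)."""
    cost: float            # total spend to date
    direct_gains: float    # e.g. revenue attributed directly to the product
    indirect_gains: float  # e.g. time saved elsewhere, expressed in money
    losses_avoided: float  # mitigation and avoidance of loss


def roi(v: Valuation) -> float:
    """Return ROI as a fraction of cost: (total benefit - cost) / cost."""
    if v.cost <= 0:
        raise ValueError("cost must be positive")
    benefit = v.direct_gains + v.indirect_gains + v.losses_avoided
    return (benefit - v.cost) / v.cost


# Hypothetical figures for the maturity valuation.
maturity = Valuation(cost=100_000, direct_gains=80_000,
                     indirect_gains=30_000, losses_avoided=40_000)
print(f"ROI at maturity: {roi(maturity):.0%}")
```

Running the same function at each of the four event-pegged valuations gives a simple series that can be compared over the life-time of the product.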

6.      Socialise the outcomes

These are the activities that generally put the smiles in SMILES.
Whatever happens with your initiative you must never fail to socialise the outcomes, however kitsch, cute or painful the exercise may appear to you or anyone else.
Put it this way. You’ve gone to all the effort and trouble of making a Big Data initiative work, and it’s working well, people like it and it’s delivering value. So, what else is there to do? It’s a success, so you shout it from the rooftops, tell all your colleagues and peers, put the news on the intranet and in the company house magazine, and organise a party.
If your project is killed off early or late, or simply fails to deliver value, and sometimes this just happens, then hold a wake, in the Celtic manner, and learn all the lessons that are worth shaking a stick at and make them part of your corporate data management story and knowledge base.

The golden rules of SMILES for beginners

When entering the new dynamic world of Big Data and Big Data Analytics, and to ensure that SMILES delivers the kind of success that others enjoy, you must ensure that the 9 golden rules of SMILES are also followed. These are the golden rules for beginners.
Fig. The 9 golden rules of SMILES
  1. Infrastructure – Ensure that the infrastructure is adequate for the needs of the project and ensure that executive management disconnects your prototyping, development and deployment cycles from the necessarily rigorous, time-consuming and constricting requirements imposed on core business operational systems. Remember, this is operational, but it is not life-threatening: customers will not be lost, nor will serious money be burned. So make sure your senior management unchains Big Data from Big IT bureaucracy – and if possible, on a forever basis.
  2. Pilot – When you first take on Big Data and Big Data analytics, always start with pilot prototypes and projects. Again, small enough to be doable (avoid overreach at all costs) and large enough to be significant (small and useful doesn’t have to be trivial).
  3. Timescale – Aim to deliver initial pilot Big Data prototypes in around the 3 to 6 week mark, and aim to get the first projects into the leverage phase in around 3 to 5 months, tops.
  4. Long-term – Aim to deliver fast, simply and elegantly, but also keep a keen eye on the long-term prospects and issues for Big Data and Big Data Analytics.
  5. Cash-flow – Control the cash but make sure you have enough to do what you need to do. Focus on value, keep sprints and iterations short, and be intelligent in the management of funding. (see comments on funding later in this piece)
  6. Continuous involvement and justification – Justify every decision in terms of business (mandatory) and technological (optional) drivers. Involve business partners, continuously. Make this an absolute mandatory condition for starting the project and continuing. If involvement from business stops, then stop the project until the engagement and involvement of the business stakeholders picks up again. Make all of this clear from the outset. Continually seek and reaffirm business justifications for the project’s existence – this is a showcase, and people are watching.
  7. Sponsors – Ensure that your project has high-level business sponsorship. This cannot come from IT and it cannot come from the CIO or the CDO, unless your Big Data project is to measure and report the performance of aspects of IT and Data Governance.
  8. Clean – Make sure that the data that you use is to the quality levels required. The data that you analyse must be at least as good at that stage as when it was sourced. In many cases you will need to scrub and clean data, especially when it’s coming from badly designed, tragically engineered and shoddily built web applications where the designers and developers have only had a passing acquaintance with sound database engineering principles, if at all.
  9. Tenacity – finally, never give up until it’s time to do so. If you believe that success is achievable then go for it. If you see that the project is on a suicide mission then kill it quickly, don’t wait until you’ve burned all your hours and cash.
When using SMILES keep these 9 golden nuggets of rules in mind, and you won’t go far wrong.
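As a small illustration of rule 8, here is a minimal sketch of one way to check that data has not got worse between sourcing and analysis. It measures only completeness, which is one quality dimension among many. The function name, the field names and the sample records are hypothetical.

```python
def completeness(rows, required):
    """Fraction of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows
        if all(row.get(field) not in (None, "") for field in required)
    )
    return complete / len(rows)


# Hypothetical records as sourced from a web application.
sourced = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": None},
]

# A deliberately crude scrub: drop records with no usable email.
cleaned = [row for row in sourced if row.get("email")]

before = completeness(sourced, ["id", "email"])
after = completeness(cleaned, ["id", "email"])
print(f"completeness at source: {before:.0%}, at analysis: {after:.0%}")

# The rule: data at the analysis stage is at least as good as when sourced.
assert after >= before
```

Dropping records is the bluntest possible fix and it shrinks the data set, so in practice you would also want to report how many rows were lost.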

A note on funding

Try to make sure that your funding aligns with the phases of SMILES.
However, split your funding requests into three parts.
  • Start with a significant data-centric business challenge, and Model high-level options and approaches.
  • Implement your chosen option.
  • Leverage the products, Evaluate performance and value, and Socialise the outcomes.
This ensures an optimum allocation of resources and provides additional executive and project management safeguards and options, and helps to ensure alignment of contractual assurances and obligations with committed and planned budgets.
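The three-part split can be written down as data, so that anyone can check that no phase has gone unfunded. This is a minimal sketch. The short phase names and the helper function are illustrative assumptions, and the grouping reflects my reading of the three funding requests listed above.

```python
# The six SMILES phases, in order (short names for brevity).
PHASES = ["Start", "Model", "Implement", "Leverage", "Evaluate", "Socialise"]

# Assumed grouping of the phases into three funding requests.
FUNDING_PARTS = {
    1: ["Start", "Model"],
    2: ["Implement"],
    3: ["Leverage", "Evaluate", "Socialise"],
}


def funding_part(phase: str) -> int:
    """Return the number of the funding request that covers a phase."""
    for part, phases in FUNDING_PARTS.items():
        if phase in phases:
            return part
    raise KeyError(f"unknown phase: {phase}")


# Sanity check: every phase is funded exactly once, in phase order.
funded = [p for part in sorted(FUNDING_PARTS) for p in FUNDING_PARTS[part]]
assert funded == PHASES

print(funding_part("Evaluate"))
```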

A note on testing

Testing is an integral element of every phase of the SMILES approach. For brevity the approach has not been detailed in this document, but the philosophy is essentially one of test early and test often. Under normal circumstances the SMILES approach does not require a User Acceptance Testing phase, and if we are in shops that do require this testing then the UAT becomes a mere bureaucratic formality.

Things to remember

Some people may be surprised that the Big Data SMILES approach does not start with a strategy. In my view, the idea that strategy can begin without the need for identifying a significant challenge is to get things wrong on two counts: i. It’s not the way to go about strategy, and ii. It’s not the way to go about Big Data.
The BIG SMILES approach is not in itself a strategy, it is a guide, reference and framework for those who wish to develop a specific Big Data oriented strategy for addressing a significant business challenge. This is where it is powerful, useful and relevant. It’s a roadmap, a cookbook and a method to understand issues, formulate questions, and provide adequate, appropriate and timely responses, whilst separating the wheat from the chaff, the core from the periphery, and the important from the inconsequential.
Lastly, if you or your Big Data analysis, design and development team haven’t done so already, I suggest that you take a look at the Cambriano Information Supply Framework, which provides a solid basis upon which to architect and design Big Data oriented solutions.
That’s all from me for now. Have fun and enjoy the Big Data journey.
Thank you for reading.

If you would like to know more about SMILES, the Information Supply Framework, Core Statistics, Core Data Sourcing, Data Governors, the Analytics Data Store or 4th generation Enterprise Data Warehousing then please drop me an email or visit:
On a lighter note, readers may also be interested in joining The Big Data Contrarians, the friendliest, most relevant and massively irreverent Big Data community on the whole of the entire world wide web. So, if you think you are up for it then we can be found on LinkedIn at this address:

Big Data, Content and Analytics in HR

To begin at the beginning

As has been stated elsewhere, human resource management is a content and process intensive activity, which makes it somewhat amenable to the deployment of content and process centric IT solutions. In particular, Enterprise Content Management tools that also offer advanced process design and deployment, would seem to be an ideal fit for any significant and continuous human resource activity.
Like many other activities in business, the roles and responsibilities embodied in human resource management have emerged, developed and transformed over the years, and with subjective improvements and innovations the field has become more complex, more varied and more concentrated – in a wide range of aspects, but especially in terms of the explosive proliferation of process, business rules and content.
The reasons why HR activity has become more intense are manifold.
Of course times, if management consultants are to be believed, are always critical for every business, every market, every department, every role, and for every responsibility, function and process. But sometimes, by providence or serendipity, the hype mysteriously coincides with the reality.
In recent times human resources has been highly affected by the explosion in the digital economy, with web sites such as LinkedIn and Xing appearing on the scene and creating certain degrees of disruption.
In addition, just as it is, and has been, possible for businesses to reinvent their past, present and future, without being overly rigorous with regard to the facts, the realities and honesty, so too are people in the jobs market finessing their own curriculums in much the same way that companies embellish their products, inflate their service offerings and exaggerate their achievements. This is one of the problems in human resource management: CVs that do not represent realities, or that bend realities out of all recognition.
Another problem with human resource management is also to be found in job descriptions and job offers that are badly aligned with prevailing realities. Just as companies find that a hired candidate does not actually align very well to the job or to their own CV in quite such an ideal way, so too are some job descriptions woefully inadequate, insincere and misleading.
So, we have a situation where companies, hiring managers and candidates gild the lily.
Now we probably don’t want to waste time and effort on getting companies to be totally honest about themselves, but as businesses we can take steps to ensure that we interpret job requirements and candidate curriculums correctly by using the power of content management and comfortably contented analytics.
So, what content can we analyse, how do we analyse it and what do we do with the results?

Analysing the Curriculum Base

As Teresa Rees put it “[Shirley] Dex maintains that it is possible to construct and test theories in a range of disciplines using life histories, and argues that they represent a blurring of the traditional boundary between qualitative and quantitative methods.” This, in my view, is particularly relevant and potentially effective when it comes to the analysis of curricula, especially where a vast collection of such content is available, for example in businesses that specialise in providing human resource and project team building, management and dismantling.
Using tools like IBM’s Watson Content Analytics, it is possible to mine a whole range of hidden correlations, from symbolic, through quantitative to quasi-qualitative data. Not only that, but by applying arbitrary predictive analytics, it will be possible to create detailed, dramatic and realistic scenarios based on a whole heap of factors and unsubstantiated correspondences.
This heralds an exciting time for the automated generation of a wide rainbow of Cartesian products, on many planes, in many dimensions and in many interpretations.
By using the power of today’s cheap-commodity computer technology and the vast offer of ‘free’ open software, it will be possible at some time in the future to successfully replicate the skill and art of human resource management practised in bygone days but at a more attractive and increased multiplier of the previous cost. So, this will also please suppliers and those who take an undeclared incentivised cut from services and artefacts that are billed for.

Analysing the Job Description Base

As we saw in the example of the curriculum, so too will we be able to mine the vast collections of job descriptions, offers and mentions in order to create an all-encompassing view of market drivers, demands and movers.
In the future, mega-HR-corporations will be using Big Data Analytics, Enterprise Content Management and Content Analytics to suck up the nickels and dimes of the job market, chipping away financial benefit from wherever it may be accrued – from clients and workers alike, especially workers – like ginormous bottom-feeding catfish in a universe-sized version of the Everglades or the sewers of a major inter-galactic city somewhere in our dreamy and dystopian inheritance.

Analysing social media and professional social media

Do you have your ducks lined up in a row and hanging from your living-room wall?
If a company is serious about HR, ECM and content analytics, it also needs to think about other things. We need to think of more metrics. Or as Social Media Examiner put it (March 28, 2013):
“Reach. You might want to measure the number of fans, followers, blog subscribers and other statistics to gauge the size of your community.
Engagement is measuring retweets, comments, average time on site, bounce rate, clicks, video views, white paper downloads and anything else that requires the user to engage.
Competitive data may include the brand’s “share of voice” across the web or number of competitors’ brand mentions.
Sentiment. You might want to measure the numbers of mentions with positive or negative sentiment.
Sales conversions. Do you want to measure social media referral traffic to the top of the sales funnel or number of sales aided by social media efforts?”
I couldn’t have stated it better myself, that is, if I had wanted to state it, which I probably didn’t. But it seemed like a good idea at the time.

Correlating results

Now it’s time to pull together all that wonderful analytical magic and coalesce it into a tangible and meaningful whole. That’s the whole idea, the answer is in the whole, their whole, our whole, and (as they say in Ireland) “your whole!”
Wikipedia states that "Correlation does not imply causation is a phrase used in statistics to emphasize that a correlation between two variables does not necessarily imply that one causes the other. Many statistical tests calculate correlation between variables. A few go further, using correlation as a basis for testing a hypothesis of a true causal relationship; examples are the Granger causality test and convergent cross mapping". Now, we know for an indisputable and unarguable fact that this is arrant nonsense invented by professional statisticians to defend their very shaky and untenable turf.
Believe me, as a fully paid up member of the Royal Order of Gentleman Data Scientists I know full well that correlation is king and that causation is for wimps. It’s been scientifically proven, not only by errant Reader’s Digest bloggers, but also by the most venerable members of the IT community itself.
This is why and how I can claim, with no element of doubt, shame or certainty, that when we derive nuggets of gold from the mining of CVs, job descriptions and gossip on social media - even the Facebook of the ‘connected and voguish’ professions - we are deriving a value that surpasses that of the holy grail of all things analytic.
If you don’t take my word for it, just try it in your business, and you will soon become a believer, too.
Just remember, do it right, or all bets are off.
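For the wimps among us who still worry about causation, here is a minimal sketch of how two variables can correlate strongly simply because a third one drives them both. The figures are invented for illustration only: years of experience, CV length in pages and salary in thousands are all hypothetical, and the Pearson coefficient is computed by hand rather than with any particular analytics product.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: experience plausibly drives both CV length and salary.
years_experience = [1, 3, 5, 8, 12, 20]
cv_pages = [1, 2, 2, 3, 4, 6]
salary_k = [25, 32, 40, 52, 65, 95]

r = pearson(cv_pages, salary_k)
print(f"correlation between CV length and salary: {r:.2f}")
```

The coefficient comes out very high, yet nobody gets a pay rise by padding their CV to six pages. The correlation here is inherited from the shared driver, which is exactly the trap the Wikipedia passage describes.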

Driving insight and taking decisions

That’s why we can use what we learn in terms of insight to better perform:
  • Hiring - planning and decisions
  • Profiling – intelligence services, spying and subterfuge
  • Insider trading – rationality, pragmatism and coincidence
  • Requirements – planning, decisions and execution
  • Training - planning and decisions
  • Workforce realignment – planning, decisions and actions
And a million other things that take our fancy.

Analysing the analysis, after the event

Moreover, we can also analyse the analysis, to see what worked, what didn’t and, in extreme cases, which Big Data driven decisions led to massive success or fraud, bankruptcy, mass redundancies and jail time.
This part is particularly reliant on what I term the 4th generation Enterprise Data Warehouse.

Conclusions

There is a whole new world opening up in terms of data, Big Data, Big Data analytics, content analytics, human resources, abuse, social media, data privacy violations, Chinese walls and Enterprise Content Management.
Are you going to be part of the success story or will you be left to play catch-up at some time in the future, just when it might be too late to join in the fun or reap the massive benefits?
Many thanks for reading.

Sunday, 22 November 2015

The golden age of the Big Data babbling bullshitters



ONCE MORE... WITH A VENGEANCE!!!

If you enjoy this piece or find it useful then please consider joining The Big Data Contrarians: https://www.linkedin.com/grp/home?gid=8338976
Many thanks, Martyn.
Pundits far and wide are hailing the end of the period of big data babble, hyperbole and bullshit and are looking forward to an epoch of practical, tangible and verifiable Big Data success stories.
Gartner themselves came out some time ago and declared that Big Data was no longer in the hype cycle. Some took this as a sign that the Big Data bullshit bonanza was over, others were more cynical and suspected a highly orchestrated ruse, a move to the next level in the game plan.
But does this new attitude towards Big Data really ring true?
Accompanying this apparent bold openness, frankness and humility in the ranks of the rehabilitated Big Data bullshit babblers there is an awful lot of what appears to be ‘more of the same’. Or as the people of Thailand might say, “same, same, but different”.
As some of you might know, I am the administrative owner of The Big Data Contrarians community group on LinkedIn, and even I was somewhat taken aback by a recent piece by Bernard Marr entitled 20 Stupid Claims About Big Data. So much so that I wrote a fairly complimentary comment on LinkedIn about it. The thing is, even as I posted it I was thinking to myself “you’ll be sorry”.
Today I read yet another Big Data ‘reformation’ piece on LinkedIn Pulse, this time from Matthew Reaney and with the compelling title of The 5 Myths of Big Data.
Call me naïve, call me a dreamer and a believer in humankind’s need for basic decency, but I frequently have the idea that praising moderately acceptable behaviour leads to even more good behaviour. But it was not to be, and as fast as one could say ‘what the hell is going on here?’ back came a surfeit of astroturfed Big Data bananas – from all directions – bigger, brasher and more bogus than ever before.
Make no mistake, Big Data hype hasn’t gone away, it has become more subtle, more cunning and even more misleading.
Leading the charge is the initiative to discredit Data Warehousing by all means possible, and the amount of bullshit, disinformation and blatant lies doing the rounds is beginning to look like Big Data hype reflecting Big Data itself, if only in terms of the vast volumes, varieties and velocities that this Big Data babbling bullshit comes in.
But seriously, we are simply getting more of the same. As the end of the Big Data hype war is declared, we are subjected to a bombardment of Big Data boloney via Cloud, IoT, the Hadoop ecosphere (as if using Hadoop was somehow linked to ecology and saving the planet), and especially this incredibly obnoxious and dopey vehicle for Big Data tripe known widely as the Data Lake – more on that stupidity at some other time. But onwards and upwards…
This all reminds me of a joke from many decades ago, retold in part from memory.
A teacher was looking for a subject about which her class pupils could write, to set as a homework exercise.
After much deliberation she decided to ask the children to write about what they thought of the police.
Sure, not a good question, I know, and as I stated, this was many decades ago, when even grown-ups could be innocent and naïve and hopeful.
Anyway, when the children had handed in all their essays, the teacher read the essays and was disappointed to find that most of them were very wishy-washy and that the children were almost all unanimously indifferent or grudgingly respectful of the police, except for one. One of the children, let’s call him Dave, was very critical and had written “I don’t think much of the police.” When the teacher asked Dave why he had written that, he replied “All police is bastards, Miss”. The teacher was vexed by the reply, but being a good and caring teacher she considered how she could change this obviously hostile view of the bobby on the beat and the police detective taking evil doers out of circulation, so she decided to do something about it.
She had a bright idea and took her problem to the police and discussed what could be done to give the children a much more positive view of the police and the work they did, so they would see the police as a necessary part of society, to be respected but not feared.
As a result, the teacher and the police organised a police day at the school. It was a big party, with lots of free goodies, badges and posters, rides in patrol cars, sirens, interesting stories and a movie, and a big discussion with the police dog handler and his faithful and brave police-dog, Ajax. The police took special interest in Dave, he was the one they wanted to convince the most, and he was the one they made the most fuss of.
At the end of the day, the teacher again asked the children to write about what they got from the school police day that she had organised.
The following Monday, after all the essays had been handed in by the children, she sought out and read Dave’s essay, eager with anticipation.
This time it contained the surprising phrase of “I really, really don’t think much of the police.”
Again, the teacher asked Dave why he had written what he had written, especially considering all the effort the police had gone to in order to leave a good and lasting impression with the children in general, and Dave in particular.
He simply replied “the Police is cunning bastards, Miss.”
Personally, I have respect for the professionalism, courage and hard work of many officers in our police forces, but when it comes to my view of certain Big Data pundits – and naming no names, just watch my eyes - the feeling is not the same.
Make of that what you will.
Many thanks for reading.
If you enjoyed this piece or found it useful then please consider joining The Big Data Contrarians: https://www.linkedin.com/grp/home?gid=8338976
Many thanks,
Martyn.