Tuesday, March 20, 2012

Preparing for Predictive Analytics

It’s been a while since I have written here because I’ve been heads down on project work and have only just had a chance to come up for air. I want to begin with some short posts to discuss a number of challenges that I’m currently working through. The major theme will be how companies can prepare for predictive analytics. If you aren’t familiar with the term, here’s a quick link to the Wikipedia topic to get you started.

About a year ago I joined a small company that is focused on delivering top-notch predictive analytics software to address some very specific commercial and educational market needs. Since joining the company as the solution architect, I have found that the single most complex and dynamic part of our engagement and solution delivery process is data acquisition, normalization, and access. This is largely because of the diversity in the platforms, technologies, and applications written for and used by our clients.

We work with our clients to analyze their data for the purposes of building predictive models. The requirements for model building are pretty straightforward. We need a clean and consistent view of the data - a requirement not unlike that of any other analytical or reporting process. But the complexity of today’s enterprise environments makes this more challenging. Additionally, we aren’t looking at a snapshot of data at a single point in time. We are looking at the data in real time or near real time in many cases. We are often working with both structured and unstructured data that come in a myriad of formats and are accessed using many protocols: everything from relational data in databases, to web services, to flat files, spreadsheets, etc. All of the information needs to be identified, cataloged, gathered, date/time stamped, and recorded for time-based analysis.
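To make the "identify, catalog, gather, date/time stamp, record" steps a little more concrete, here is a minimal sketch in Python. It is not our actual pipeline; the names (CapturedRecord, read_csv_text, read_sqlite, gather) and the sample data are hypothetical, chosen only for illustration. It assumes each source can be wrapped in a small reader function that yields rows as dictionaries, and that stamping each row with a UTC capture time is enough for later time-based analysis.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


@dataclass
class CapturedRecord:
    """One row pulled from a source, tagged with where and when it was captured."""
    source: str
    captured_at: datetime
    payload: Dict[str, Any]


def read_csv_text(text: str) -> Iterator[Dict[str, Any]]:
    """Yield rows from flat-file (CSV) content as dictionaries."""
    yield from csv.DictReader(io.StringIO(text))


def read_sqlite(conn: sqlite3.Connection, query: str) -> Iterator[Dict[str, Any]]:
    """Yield rows from a relational query as dictionaries keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


# The "catalog" is simply a mapping from a source name to a reader function.
Catalog = Dict[str, Callable[[], Iterable[Dict[str, Any]]]]


def gather(catalog: Catalog, now: Optional[datetime] = None) -> List[CapturedRecord]:
    """Read every cataloged source and stamp each row with a UTC capture time."""
    records: List[CapturedRecord] = []
    for name, reader in catalog.items():
        for row in reader():
            stamp = now or datetime.now(timezone.utc)
            records.append(CapturedRecord(name, stamp, dict(row)))
    return records


if __name__ == "__main__":
    # Hypothetical sample sources: an in-memory database and a CSV string.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 19.99)")

    catalog: Catalog = {
        "orders_db": lambda: read_sqlite(conn, "SELECT id, total FROM orders"),
        "customers_csv": lambda: read_csv_text("id,name\n7,Acme\n"),
    }
    for record in gather(catalog):
        print(record.source, record.captured_at.isoformat(), record.payload)
```

A web service or spreadsheet would be added the same way, as one more reader function registered in the catalog. Because the downstream code only ever sees the common record shape, the diversity of platforms stays contained at the edges.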

Clients that understand master data management and have sound data governance policies are easier to work with because they understand the value of their data and, most importantly, how to get at it. At the other end of the spectrum are those companies that have their data in many disparate systems, have no data governance or ownership policies, and don’t know the value of their data. Getting access to their data and getting it into a clean and consistent form can be quite a challenge.

Therefore, my next few posts will talk about the challenges we are facing and our approach to solving the data integration and normalization needs for a predictive analytics solution. My hope is that the information you find here will help you prepare for using predictive analytics in your organization to improve and optimize the decision making you do on a daily basis using one of your company’s most valuable assets – your data.

/imapcgeek