Friday, October 30, 2020
The Art of Utilizing Connections In Your Data

Turbo Charged Data

July 25th, 2018 by jwubbel-admin

I offer one-on-one consulting to executives who have a data science vision for their enterprise but not necessarily the technical wherewithal to know where to start, or the most cost-effective technical path forward. Without that guidance, a data science project can fall into the same failure mode as many early client/server application projects that never made the cut to a successful deployment. Consulting to start, save, or salvage a project can make all the difference in the world: once key milestones show strong returns, a successful data science initiative becomes self-propagating, even viral, within the enterprise.

Turbo Charged Data might mean applying vector analysis tools such as ANOVA (analysis of variance) and regression to non-orthogonal observational data matrices, a practice commonly called data mining. Before getting excited about those possibilities, remember that empowering data starts with the vision and support of the executives who oversee the big execution picture within the enterprise. It is extremely difficult for experts in the organization to propagate the business case bottom-up to the C-Suite, due to the amount of buffering between organizational layers and cross-functional walls. Breaking on through to the other side is cumbersome, and it may simply upset individuals who think you are going around them and catching them off guard.
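The one-way ANOVA idea can be sketched in a few lines of plain Python. The production-line yields below are made-up, illustrative numbers; the F statistic compares between-group to within-group variation.

```python
# A minimal one-way ANOVA sketch in plain Python (illustrative, made-up data):
# does mean yield differ across three production lines?
from statistics import mean

groups = {
    "line_a": [92.1, 93.4, 91.8, 94.0, 92.7],
    "line_b": [89.5, 90.2, 88.9, 91.1, 90.0],
    "line_c": [93.2, 94.1, 92.8, 93.9, 94.5],
}

grand = mean(x for g in groups.values() for x in g)
k = len(groups)                               # number of groups
n = sum(len(g) for g in groups.values())      # total observations

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"F = {f_stat:.2f}")  # a large F suggests the line means differ
```

In practice you would hand this comparison to a statistics package, which also reports the p-value; the sketch only shows where the F statistic comes from.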

If a firm is just starting out with machine learning initiatives, it should go for projects that support the primary key performance indicators executives rely on to make business decisions. Some of those can be very complex, such as accurately predicting time-to-market for seasonally manufactured, formulated, and fulfilled products. One key point I make to executive clients is to keep a finger on the initiative, because for some reason people feel there is no need to let the higher levels of management know how processes are performing. The excuse is that upper management should stay focused on the big picture and be notified only when there is a problem. Usually that notification comes too late to manage. In my opinion, a predictive analytical value should be a continuous metric that supports the KPI, because, like the weather, the environment is constantly changing, and early prediction is there to augment intelligence around decision making.

Posted in Data Industrialization, Data Mining, Predictive Analytics | No Comments »

When You Need To Score Big!

July 22nd, 2018 by jwubbel-admin

In my previous blog article, based on solid experience with process monitoring, I described the SPCE, or Statistical Process Control Engine. Even at a univariate level of monitoring, the SPCE goes a long way toward making your data ready for advanced analytics as a function of its outputs. The typical classical statistics approach is perhaps to reduce the number of univariate charts and spend cash developing multivariate models for production use. In reality, however, with the software tools available today, more thought should be given to predictive and prescriptive analytics. Consequently, the thinking around machine learning is actually a departure from the classical statistics approach.
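The SPCE itself is this blog's engine; the univariate control-chart check at its core can be sketched in plain Python. All data below are illustrative: limits are set from a known in-control baseline window, then applied to new readings.

```python
# Minimal univariate SPC sketch: establish mean +/- 3-sigma control limits
# from an in-control baseline, then flag new readings outside the limits.
from statistics import mean, stdev

# Phase I: baseline window assumed to be in control (made-up data)
baseline = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 10.2]
center = mean(baseline)
sigma = stdev(baseline)
ucl, lcl = center + 3 * sigma, center - 3 * sigma

# Phase II: monitor incoming readings against the fixed limits
new_readings = [10.0, 10.1, 11.2, 9.9]
out_of_control = [i for i, x in enumerate(new_readings)
                  if not (lcl <= x <= ucl)]
print(out_of_control)  # indices of readings breaching a control limit
```

A real SPC engine adds run rules, rational subgrouping, and range-based sigma estimates, but the flagged output above is the kind of signal that feeds downstream analytics.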

The proof in the data science industry today can be seen at state-of-the-art conferences and summits. I recently attended the AWS Summit in NYC, where the turnout was beyond phenomenal. The broad base of online tools for building infrastructure in the cloud, and the overwhelming interest in machine learning among the attendees, show just how fast the world is marching onward in the direction of advanced analytics. The number of very aggressive and progressive companies engaged is impressive. By engaged, I mean they are solid Amazon customers.

Large enterprises not already engaged in or experimenting with AWS technologies are probably already behind the eight ball, particularly from a competitive standpoint. I should also say that it can be easy to be taken in, or sold, on AWS because they make it look so easy. Many facets are in fact easy: use, deployment to production, and metered charges that reduce costs. But the fact remains that data modeling is still hard intellectual work, particularly when your process is complex to begin with. Iris demonstrations and other pre-built data sets used in learning and model construction serve to show the progression, or production cycle, of actually deploying a model into production, or to explain the algorithm used in solving a certain problem.

So what is the bridge to get from what you think is a good starting big data set to a model deployment in the Cloud and how do you cross the bridge?

JMP Pro happens to be a tool that can quickly make that transition and carry you over the bridge. Say you have 5,500 process parameters, with data records going back five years and new data coming in daily. The first step is to decide which sub-process is the most important and most in need of further understanding and insight. Yield is a very common attribute in many industries that may have variability and require better optimization. So the first step might be to bring in your subject matter experts to discuss parameter selection with a Y response parameter in mind; the Y is what you want to predict. Alternatively, if experts are not available, you can sit down with batch process records and use the JMP Pro Process Screening feature to study the data landscape. It should be relatively easy to exclude parameters not associated with the target sub-process, knowing you may have to revisit the screening later for upstream process data that could have an impact on your sub-process.
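Process Screening in JMP Pro is point-and-click, but the same first-pass idea can be sketched in plain Python: drop parameters whose variation is negligible, since a flat column cannot explain variability in the Y response. The column names and threshold below are illustrative assumptions, not part of any JMP output.

```python
# First-pass screening sketch: exclude parameters with (near-)zero variation.
# Column names, values, and the threshold are all illustrative assumptions.
from statistics import pstdev

process_data = {
    "reactor_temp":  [71.2, 73.5, 70.8, 74.1, 72.0],
    "line_pressure": [30.0, 30.0, 30.0, 30.0, 30.0],  # flat: carries no information
    "feed_rate":     [5.1, 5.3, 4.9, 5.4, 5.0],
}

THRESHOLD = 1e-6
candidates = [name for name, values in process_data.items()
              if pstdev(values) > THRESHOLD]
print(candidates)  # parameters retained for further screening
```

This is only the crudest cut; domain exclusions (parameters unrelated to the target sub-process) still come from the subject matter experts.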

Now that you are on your way to reducing the feature set, you can use other JMP Pro features to further eliminate parameters that contribute nothing as factors impacting the response. These features include Principal Component Analysis (PCA), preliminary multivariate analysis, and Predictor Screening. Suffice it to say that the type of model you want to build could be a neural network, a partition model, or a decision tree, so you will likely also want to know whether your data is categorical, continuous, or a little of both before selecting which JMP platform to use.
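JMP's Predictor Screening actually uses bootstrap forests under the hood; a much cruder stand-in, sketched below with made-up data, ranks parameters by the absolute Pearson correlation of each with the Y response. It conveys the idea of ordering candidate factors by apparent contribution.

```python
# Crude predictor-ranking sketch: order parameters by |Pearson r| with Y.
# All data are made up; this is a stand-in, not JMP's actual algorithm.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

y = [90.0, 92.5, 88.0, 94.0, 89.5]          # Y response (e.g. yield)
params = {
    "temp":  [70.0, 72.0, 69.0, 74.0, 69.5],  # tracks y closely
    "noise": [1.2, 0.4, 1.1, 0.5, 1.3],
}

ranked = sorted(params, key=lambda p: abs(pearson(params[p], y)), reverse=True)
print(ranked)  # strongest apparent predictor first
```

Correlation only catches linear, pairwise structure, which is exactly why tree-based screening and PCA earn their keep on real process data.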

At this point, I usually take a second look at exactly how clean the data set is once it has been subset down to the candidate parameters. Additionally, I cannot emphasize enough how beneficial it is to hold meetings with subject matter experts, process monitoring people, and others over the course of this work to review and refine the parameter selection. The review matters because feedback on the relationships that exist throughout a process, the data semantics, and the dependencies both uncovers unknown knowledge about the process and clarifies false assumptions about how things work. Often you will find an excluded parameter that absolutely needs to be included. It has to be a team effort, because as a data scientist your software engineering and statistical analysis skill sets probably do not include being an expert on the process.

At this point the data science team can take the data set and start building models. In the Neural Net (NN) platform in JMP Pro, you will likely run many models while looking for the best one to use in production. These should be reviewed by the same people who gave input on the factors the NN would need to learn from. The Model Launch dialog is configurable, and its settings change the type of learning that takes place when you select "Go" to build the model, in terms of the layers and the depth of learning.
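The "layers" knobs in the Model Launch dialog map onto the basic feed-forward structure sketched below: each layer multiplies its inputs by a weight matrix, adds biases, and applies an activation. The weights here are illustrative constants, not trained values; JMP fits them for you.

```python
# Forward pass of a tiny feed-forward network (weights are illustrative
# constants; in JMP Pro these would be fitted during model building).
import math

def layer(inputs, weights, biases):
    """One fully connected layer with tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def predict(x):
    # one hidden layer with two tanh nodes
    hidden = layer(x, [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1])
    # linear output node combining the hidden activations
    return 1.5 * hidden[0] - 0.7 * hidden[1] + 0.2

print(round(predict([1.0, 2.0]), 3))
```

Adding nodes widens a layer and stacking more `layer` calls deepens the network, which is exactly what the dialog's layer settings control.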

If a team is developing multiple models, it can go into the Predictive Modeling platform and run a Model Comparison. Again, a great deal of intellectual work on the data can be done before the production deployment stage. Your model can be incorporated back into your SPCE data table to make predictions on Y response variables as an in-process mechanism for monitoring the process.
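The essence of Model Comparison can be sketched in a few lines: score each candidate on the same holdout data and keep the one with the lowest error. The two "models" below are toy formulas, purely for illustration.

```python
# Minimal model-comparison sketch: evaluate candidates on a shared holdout
# set and pick the lowest RMSE. The models and data are toy examples.
import math

holdout_x = [1.0, 2.0, 3.0, 4.0]
holdout_y = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x

models = {
    "model_a": lambda x: 2.0 * x,   # close fit
    "model_b": lambda x: x + 1.0,   # poor fit
}

def rmse(model):
    errs = [(model(x) - y) ** 2 for x, y in zip(holdout_x, holdout_y)]
    return math.sqrt(sum(errs) / len(errs))

best = min(models, key=lambda name: rmse(models[name]))
print(best)
```

JMP's comparison adds richer diagnostics (R-squared, lift, ROC for categorical responses), but the decision rule is the same: one honest metric on data the models have not seen.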

Once a particular model has been chosen for production, you have the ability to "Publish Prediction Formulas," which takes you to the Formula Depot in JMP Pro. This feature facilitates generating code for external use in the C programming language, Python, JavaScript, SAS, or SQL. Thus, if JMP generates code in Python, it opens a wide-ranging choice of deployment options on different architectures, such as serverless frameworks like AWS Lambda. Or perhaps you want to do real-time monitoring inline on process historian data. Applying a model to new data is known as scoring: the process of using a model created by a data analysis application like JMP Pro to make predictions on new data, the sight-unseen workhorse of data mining.
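As a sketch of how a Formula Depot Python export might be wired into an AWS Lambda-style scoring endpoint: `predicted_yield` below stands in for the generated prediction formula (its coefficients are made up), and the handler follows the standard Lambda `(event, context)` convention.

```python
# Hypothetical scoring wrapper: a placeholder prediction formula exposed
# through an AWS Lambda-style handler. Coefficients are illustrative only.
import json

def predicted_yield(temp, pressure):
    # placeholder for the JMP-generated prediction formula
    return 50.0 + 0.8 * temp - 0.3 * pressure

def lambda_handler(event, context):
    body = json.loads(event["body"])
    score = predicted_yield(body["temp"], body["pressure"])
    return {"statusCode": 200, "body": json.dumps({"predicted_yield": score})}

# local smoke test of the handler
event = {"body": json.dumps({"temp": 72.0, "pressure": 30.0})}
print(lambda_handler(event, None))
```

The same scoring function could just as easily be called in a loop over process historian records for inline monitoring; only the surrounding plumbing changes.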

Posted in Data Mining, Predictive PM, Scoring | No Comments »