Friday, October 30, 2020
The Art of Utilizing Connections In Your Data

Pulling Out All The Stops

September 29th, 2018 by jwubbel-admin

From the outside, it would seem that mining Big Data and applying machine learning to it may well be picking the low-hanging fruit. What about small data from complex processes or methods? I am not sure much has been written about the more difficult problems of success or failure in learning or discovery from such data.

I allude to this notion of small data in my book JMP CONNECTIONS. If you choose to tackle such a scenario, you really must pull out all the stops. Assuming you have achieved a very clean, large data set from a lengthy, complex process containing many sub-processes, here is how I have gone about putting the squeeze on my small data sets.

1. Know that you will probably need to hold several series of meetings with your data science team.
2. Know that in those meetings you will have to bring in certain people in “Just-In-Time” fashion, such as statisticians, subject matter experts, scientists and engineers, to supplement knowledge, confirm assumptions and provide evaluations.
3. Pick a realistic goal. Are we trying to find insight about the process that is not currently known? Do we want to provide an analytical tool to production folks? Maybe maintain a consistent level of yield even though the process is variable at times? Everyone likes it when yields are high. What is it we want to predict?

As it turns out, you might have thousands of parameters collected, so you have to go about the work of reducing your feature set. Initially you can do this in JMP with the Process Screening platform, because it gives you the opportunity to learn more about your data. Once a goal has been selected, the team might zero in on a particular sub-process, at which point the data set can all of a sudden become smaller. The subject matter experts can explain the sub-process and the associated data, but the scientist or engineer will want to identify several y Response variables to make predictions on.

The potential factors in the smaller data set may or may not have anything to do with what you want to predict when it comes to building a model. You could bring the data set into a Multivariate analysis to get a graphical picture, or use JMP PCA, at which point you may discover an outlier that needs to be explained or excluded from the data set. There is also a Predictor Screening tool in JMP that may quickly show you which factors likely have no impact on the y Response variable when building a model.
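
To make the outlier idea concrete, here is a minimal Python sketch of one common multivariate outlier measure, the squared Mahalanobis distance, for two factors. It is not JMP's implementation; the function name and the example readings are hypothetical. A row can look ordinary on each factor taken alone and still sit far from the pattern the two factors follow together, which is the kind of point a multivariate view tends to expose.

```python
import statistics

def mahalanobis_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) row from the data centroid."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("factors are perfectly collinear; covariance is singular")
    # Entries of the inverse of the 2x2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        distances.append(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)
    return distances

# Hypothetical readings: factor_b tracks twice factor_a, except in row 4.
factor_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
factor_b = [2, 4, 6, 8, 16, 12, 14, 16, 18, 20]
d2 = mahalanobis_2d(factor_a, factor_b)
suspect = max(range(len(d2)), key=d2.__getitem__)
print(suspect, round(d2[suspect], 2))
```

Row 4 is unremarkable on either factor alone but breaks their joint relationship, so it gets the largest distance. A flagged row is something to explain or exclude together with the subject matter experts, not something to delete automatically.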

Now you are faced with a wide array of methods to choose from. Do not let that intimidate you. Pick one to start with, such as the JMP Neural Net (NN). You can use it with categorical or continuous numeric data to build your first model. Specify your y Response and the set of factors from your small data set, then go to the Model Launch dialog to configure it before running the NN algorithm. Given that the data set is small, select KFold as the first validation approach to help avoid overfitting the data.
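
Why k-fold validation suits a small data set can be shown with a short Python sketch: every row is held out exactly once and scored by a model that never saw it, so no rows are permanently given up to a holdout set. As a stand-in for the neural net, the sketch fits a straight line by least squares; the function names, the seed and k = 5 are illustrative assumptions, not JMP's internals.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of rows")
    rows = list(range(n))
    random.Random(seed).shuffle(rows)
    return [rows[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x, a simple stand-in for a neural net."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def kfold_rmse(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and pool the squared errors."""
    squared_errors = []
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical small data set: 12 batches of (temperature, yield).
temps = [60, 62, 63, 65, 66, 68, 70, 71, 73, 74, 76, 78]
yields = [71.2, 72.0, 72.9, 73.5, 74.6, 75.1, 76.4, 76.8, 78.1, 78.3, 79.9, 80.6]
print(round(kfold_rmse(temps, yields), 3))
```

The pooled error estimates how the model does on rows it was not trained on, which is the number to watch when deciding whether a model is overfitting.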

It is important to start conducting demos early with the appropriate team members and end users. Remember, they have never had this different look at their data, nor seen how close the prediction is to the actual value when JMP Pro performs the training and validation of the model. Model evaluation is a must before any decision to use the model in production.

When the team gets a chance to experiment with the Prediction Profiler, inevitably someone will remember some data that is missing, something upstream of the sub-process. Or two additional variables that could also be predicted. This is the time to start making a list of all the possible models that could enhance and control the sub-process to optimize yields. It gives everyone a chance to look at the problem from different angles. Think of it as the analog of a music recording studio, where the settings on the mixers must be just right to make the perfect recording. It could be that a chained set of models is needed, where a prediction from one model is an input to a second model. Or someone will speak up to say that in order to find the optimal settings for three predicted responses, one needs a simulator. Lo and behold, the Prediction Profiler in JMP has a Simulator function.
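
The chained-model and simulator ideas above can be sketched in plain Python. Everything here is hypothetical: the two model functions, their coefficients, the 2-degree temperature spread and the 85% spec limit are illustrative assumptions, not JMP output. The loop only mimics the spirit of the Profiler's Simulator: give a factor random variation, push it through the chained models and look at the resulting distribution of the response.

```python
import random

def model_a(temperature):
    """Hypothetical upstream model: predicts an intermediate quality from temperature."""
    return 0.5 * temperature - 10.0

def model_b(intermediate, pressure):
    """Hypothetical downstream model: yield (%) from model A's output and pressure."""
    return 60.0 + 1.5 * intermediate + 0.8 * pressure - 0.01 * pressure ** 2

def simulate_yield(temp_mean, pressure, n=10_000, temp_sd=2.0, spec=85.0, seed=1):
    """Monte Carlo: vary temperature, chain model A into model B, and return
    (mean predicted yield, fraction of simulated yields below the spec limit)."""
    rng = random.Random(seed)  # same seed for every setting: common random numbers
    yields = [model_b(model_a(rng.gauss(temp_mean, temp_sd)), pressure) for _ in range(n)]
    below = sum(1 for y in yields if y < spec) / n
    return sum(yields) / n, below

# Search a grid of candidate pressure settings for the highest mean yield.
pressures = range(20, 61, 5)
best = max(pressures, key=lambda p: simulate_yield(40.0, p)[0])
print(best, simulate_yield(40.0, best))
```

Reusing the same seed for every candidate setting means the settings are compared on identical simulated temperature draws, so differences in the results come from the setting rather than from sampling noise.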

Pulling out all the stops is the best way to put this into perspective when the data sets around a complex process are small. Once one or several models are deployed to production, evaluation continues while they are in use. Comparative study will help refine or fine-tune model configurations for future builds and training. The basis for doing all this work, of course, is a good business case with a positive, measurable financial component to justify your investment of time and talent.

Posted in Factors, Feature Reduction, Model Building, Neural Net, Prediction, Response Variable, Screening, Small Data

Turbo Charged Data

July 25th, 2018 by jwubbel-admin

I offer one-on-one consulting to executives who have the Data Science Vision for their enterprise but not necessarily the technical wherewithal to know where to start, or the logical, cost-effective technical path forward. Otherwise, a data science project could fall into a failure mode similar to the early days of many client/server application projects that never made the cut to a successful deployment. Consulting to start, save or salvage a project can make all the difference in the world, because a successful data science initiative will be self-propagating, or go viral in the enterprise, once key milestones show great returns.

Turbo Charged Data might mean using Vector Analysis tools like ANOVA (Analysis of Variance) and regression applied to non-orthogonal observational data matrices. This is called data mining. Before you get all excited about those possibilities, empowering data starts with the vision and support of the executives who oversee the big execution picture within the enterprise. It is extremely difficult for experts in the organization to propagate the business case from the bottom up to the C-Suite, due to the amount of buffering between organizational layers and cross-functional walls. Breaking on through to the other side is cumbersome, or it may simply upset individuals who think you are going around them and catching them off guard.
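
Since ANOVA comes up here, a minimal Python sketch of the one-way ANOVA F statistic may help: it compares the spread between group means with the spread within groups. The groups below are hypothetical yields from three production lines. With observational, non-orthogonal data, a large F says the group means differ, but not why.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical yields (%) from three production lines.
line_yields = [[81.0, 82.5, 80.9, 81.7], [84.2, 85.0, 83.8, 84.9], [80.1, 79.5, 81.2, 80.4]]
print(one_way_anova(line_yields))
```

The F value would then be compared against an F distribution with the returned degrees of freedom to obtain a p-value, which a statistics package such as JMP reports directly.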

If a firm is just starting out with machine learning initiatives, it should go after projects that support the primary key performance indicators (KPIs) executives rely on to make business decisions. Some of those might be very complex, such as accurately predicting “Time-To-Market” for seasonally manufactured, formulated and fulfilled products. One of the key points I make is to advise executive clients to keep a finger on the initiative, because for some reason people feel there is no need to let the higher levels of management know how processes are performing. The excuse is: we will let upper management stay focused on the big picture, and when there is a problem we will notify them. Usually the notification comes too late to manage. In my opinion, a predictive analytical value should be a continuous metric that supports the KPI because, like the weather, the environment is constantly changing, and the early prediction is there to augment intelligence around decision making.

Posted in Data Industrialization, Data Mining, Predictive Analytics

When You Need To Score Big!

July 22nd, 2018 by jwubbel-admin

As discussed in my previous blog article, and based on solid experience with process monitoring, the SPCE (Statistical Process Control Engine), even when monitoring at the univariate level, goes a long way toward making your data ready for advanced analytics through its outputs. The typical classical statistics approach is perhaps to reduce the number of univariate charts and spend money developing multivariate models for production use. In reality, however, with the software tools available today, more thought should be given to predictive and prescriptive analytics. Consequently, the thinking around machine learning is actually a departure from the classical statistics approach.
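
SPCE is written in JSL and its internals are not described here, so the following is only a generic Python sketch of the kind of univariate calculation a monitoring engine performs for each parameter: individuals (X) chart limits, with sigma estimated from the average moving range divided by the standard constant d2 = 1.128. The function names and the baseline readings are hypothetical.

```python
def individuals_limits(values):
    """Return (LCL, center, UCL) for an individuals chart.
    Sigma is estimated as the average moving range divided by d2 = 1.128."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(values, limits=None):
    """Indices of readings outside the limits (limits default to the values' own)."""
    lcl, _, ucl = limits or individuals_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical baseline readings for one parameter, then a new reading appended.
baseline = [10, 12, 11, 13, 12, 11, 12, 10, 11, 12]
limits = individuals_limits(baseline)
print(limits, out_of_control(baseline + [25], limits))
```

Computing limits from a stable baseline and then applying them to new readings keeps a single wild value from widening the very limits it is being judged against.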

The proof can be seen in today's state-of-the-art data science conferences and summits. I recently attended the AWS Summit in NYC, where the turnout was beyond phenomenal. The broad base of online tools for building infrastructure in the Cloud and the overwhelming interest in machine learning among attendees show just how fast the world is marching toward advanced analytics. The number of very aggressive and progressive companies engaged is impressive. By engaged, I mean they are solid Amazon customers.

Large enterprises not already engaged or experimenting with AWS technologies are probably already behind the eight ball, particularly from a competitive standpoint. I should also say that it can be easy to be taken in or sold on AWS because they make it look so easy. And in fact, many facets are made easier: ease of use, deployment to production and metered charges to reduce costs. But the fact remains that data modeling is hard intellectual work, particularly when your process is complex to begin with. Iris demonstrations or pre-built data sets used for learning and model construction serve to show the progression or production cycle needed to actually deploy a model into production, or to explain the algorithm used to solve a certain problem.

So what is the bridge to get from what you think is a good starting big data set to a model deployment in the Cloud and how do you cross the bridge?

JMP Pro happens to be the tool to make that transition quickly and get across the bridge. Let’s say you have 5,500 process parameters, with data records (rows) going back 5 years and new data coming in daily. The first step is to think about which sub-process is the most important and requires further understanding and insight. Yield is a very common attribute in many industries that may have variability and require better optimization. So in practice you might bring in your subject matter experts to discuss the selection of parameters with a Y Response parameter in mind. The Y is what you want to predict. Alternatively, if experts are not available, you can sit down with batch process records and use the JMP Pro Process Screening feature to study the data landscape. It should be relatively easy to exclude parameters not associated with the target sub-process, knowing that you might have to revisit with more screening later for upstream process data that could have an impact on your sub-process.

Now that you are on your way to reducing the feature set, you can use other JMP Pro features to further eliminate parameters that make absolutely no contribution as a factor affecting the response. These features include Principal Component Analysis (PCA), preliminary Multivariate analysis and Predictor Screening. Suffice it to say here that the type of model you want to build could be a Neural Network, Partition or Decision Tree. You would therefore also want an idea of whether your data is categorical, continuous or a little of both before selecting which JMP platform to use.
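
To illustrate the screening idea, here is a minimal Python sketch that ranks candidate factors by the absolute Pearson correlation with the response. This is a deliberately simpler swapped-in technique: JMP's Predictor Screening platform is, to my understanding, based on bootstrap forest contributions, and a linear correlation ranking can miss nonlinear effects. The column names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column, which carries no information."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def screen_factors(factors, response, keep=2):
    """Rank factor columns by |r| with the response; return the top names and all scores."""
    scores = {name: abs(pearson(col, response)) for name, col in factors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores

# Hypothetical candidate factors and response for six batches.
response = [1, 2, 3, 4, 5, 6]
factors = {
    "temp": [2, 4, 6, 8, 10, 12],
    "speed": [6, 5, 4, 3, 2, 1],
    "noise": [3, 1, 4, 1, 5, 9],
    "const": [7, 7, 7, 7, 7, 7],
}
top, scores = screen_factors(factors, response)
print(top, scores)
```

A low score is a prompt to ask the subject matter experts whether the factor can be dropped, not proof that it is irrelevant.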

At this point, I usually take a second look at exactly how clean the data set is once it is subset down to the candidate parameters. Additionally, I cannot emphasize enough how beneficial it is to hold meetings with subject matter experts, process monitoring people, etc., over the course of this work to review and refine the selection of parameters. The review matters because of the feedback it produces on the relationships that exist throughout a process and on the data semantics and dependencies, uncovering previously unknown knowledge about the process and clarifying false assumptions about how things work. Often you will find an excluded parameter that absolutely needs to be included. It has to be a team effort because, as a data scientist, your software engineering and statistical analysis skill sets probably do not include being an expert on the process.
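
For that second look at cleanliness, a quick report on each candidate column helps. This Python sketch flags columns that are mostly missing or hold a single value; the 20% threshold, the column names and the use of None for a missing cell are hypothetical choices.

```python
def cleanliness_report(table, max_missing=0.2):
    """Flag columns that are mostly missing or carry a single constant value.
    `table` maps column name -> list of values, with None marking a missing cell."""
    report = {}
    for name, col in table.items():
        present = [v for v in col if v is not None]
        missing = 1 - len(present) / len(col) if col else 1.0
        issues = []
        if missing > max_missing:
            issues.append(f"{missing:.0%} missing")
        if len(set(present)) <= 1:
            issues.append("constant or empty")
        if issues:
            report[name] = issues
    return report

# Hypothetical subset of candidate parameters for five batches.
table = {
    "temp": [71.0, 72.5, None, 70.8, 71.9],
    "ph": [7.0, 7.0, 7.0, 7.0, 7.0],
    "flow": [None, None, None, 12.1, 12.4],
    "pressure": [30.1, 29.8, 30.4, 30.0, 29.9],
}
print(cleanliness_report(table))
```

Each flagged column is a talking point for the review meeting: a constant column may be a sensor problem, and a mostly missing one may be recorded somewhere else.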

At this point the data science team can take the data set and start building models. If we go into the Neural Net (NN) platform in JMP Pro, you will likely run many models in search of the best one to use in production. These should be reviewed by the same people who gave input on the factors the NN would need to learn from. The Model Launch dialog is configurable, and its settings, in terms of the layers and depth of learning, change the type of learning that takes place when you select “Go” to build the model.

If a team is developing multiple models, they can go into the Predictive Modeling platform and do a Model Comparison. Again, a great deal of intellectual work can be done on the data before getting to the production deployment stage. Your model can be incorporated back into your SPCE Engine data table to make predictions on Y Response variables as an in-process mechanism for monitoring the process.
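
The core of a model comparison can be sketched in a few lines of Python: score every candidate on the same validation rows and rank by a fit statistic. The model names and predictions below are hypothetical, and JMP's Model Comparison report shows more statistics than the two computed here (R-square and root mean squared error).

```python
import math

def fit_stats(actual, predicted):
    """R-square and root mean squared error of predictions against actual values."""
    n = len(actual)
    mean = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return {"rsquare": 1 - sse / sst, "rmse": math.sqrt(sse / n)}

def compare_models(actual, predictions):
    """`predictions` maps model name -> predictions on the same validation rows.
    Returns (name, stats) pairs sorted from lowest to highest RMSE."""
    rows = [(name, fit_stats(actual, preds)) for name, preds in predictions.items()]
    return sorted(rows, key=lambda row: row[1]["rmse"])

# Hypothetical validation-set yields and two candidate models' predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predictions = {"nn": [1.1, 1.9, 3.2, 3.8], "tree": [1.0, 1.0, 4.0, 4.0]}
for name, stats in compare_models(actual, predictions):
    print(name, round(stats["rsquare"], 3), round(stats["rmse"], 3))
```

The comparison is only fair when every model is scored on rows none of them was trained on.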

Once a particular model has been chosen for production, you can use “Publish Prediction Formulas”, which takes you to the Formula Depot in JMP Pro. This feature generates code for external use in C, Python, JavaScript, SAS or SQL. Thus, if JMP generates code in Python, it opens a wide range of deployment options on different architectures, such as serverless frameworks like AWS Lambda. Or perhaps you want to do real-time inline monitoring on process historian data. Applying a model to new data is known as scoring: using a model created by a data analysis application like JMP Pro to make predictions on new data, the unseen workhorse of data mining.
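
To show what serverless scoring can look like, here is a minimal Python sketch. It is not the code JMP's Formula Depot generates; the weights, the one-hidden-layer TanH network and the `event["rows"]` payload shape are hypothetical assumptions. Only the `handler(event, context)` entry-point convention follows AWS Lambda's Python handler style.

```python
import json
import math

# Hypothetical weights standing in for a published prediction formula.
HIDDEN = [  # (bias, weight_temp, weight_pressure) for each TanH hidden node
    (0.10, 0.04, -0.02),
    (-0.30, -0.01, 0.05),
]
OUTPUT = (80.0, 6.0, 4.0)  # (bias, weight_hidden_1, weight_hidden_2)

def predict_yield(temp, pressure):
    """Score one row with a one-hidden-layer TanH network."""
    hidden = [math.tanh(b + wt * temp + wp * pressure) for b, wt, wp in HIDDEN]
    bias, *weights = OUTPUT
    return bias + sum(w * h for w, h in zip(weights, hidden))

def handler(event, context):
    """Lambda-style entry point: score every row in the (hypothetical) event["rows"]."""
    rows = event.get("rows", [])
    scores = [predict_yield(r["temp"], r["pressure"]) for r in rows]
    return {"statusCode": 200, "body": json.dumps({"predictions": scores})}

# Local usage example with two new rows of process data.
print(handler({"rows": [{"temp": 20, "pressure": 10}, {"temp": 25, "pressure": 12}]}, None))
```

Keeping the prediction formula in its own function means the same scoring logic can be called from a serverless handler, a batch job or an inline monitor on historian data.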

Posted in Data Mining, Predictive PM, Scoring

SPCE – Engines To Power The Machines

June 5th, 2018 by jwubbel-admin

JMP CONNECTIONS is about the art of using your data in business, a take on the maturity of information in the enterprise or, perhaps better yet, organizational maturity. While data might make us smarter or more informed, nothing can really substitute for the experience and good judgment that result in optimal decisions. Unfortunately, experience can be undervalued in many corporate enterprises today. It may show up as job experience on a resume, but that does not equate to the type of experience I am speaking about here. The undervaluing is easy to spot, because such experience is often not rewarded or promoted where it is most useful within business units or departments. That is a sure mark of resource immaturity across the enterprise with regard to human resource utilization and to allowing experienced individuals to use the connections in the data to make decisions.

When it comes to building models and incorporating Artificial Intelligence, or the branch known as Machine Learning (ML), most of the literature repeated across media outlets says that getting your data ready is 80% of the work.

So one way to complete the 80% quickly is to start monitoring your process data. Whether it comes from business processes, clinical processes, manufacturing processes or customer service, monitoring will quickly force the data gathering, cleaning and preparatory tasks necessary to achieve clean data sets for analytics. I felt that making the CONNECTIONS in JMP was so important that we developed the SPCE, or Statistical Process Control Engine. SPCE is an automated program written in JSL that processes thousands of parameters very quickly. Basic engine functions calculate parameters on the fly, generate appropriate charting and alerts, and output a Wide Data Table containing all the parameters processed by the engine. The Wide Data Table is very clean and ready for use by other peripheral JMP scripts for extended analysis. It is ready for multivariate analysis but, most importantly, it is real-time ready for Model Building. The first step in building a model is selecting the feature set. Whether you select parameters through manual review or a technique such as PCA, you are now entering the 20% area of utilizing your data. The advantage of the SPCE data table outputs is that they allow subject matter experts and process monitoring teams to review the data, so that SPCE Engine directives, modified over time, can refine the engine's performance and outputs. This goes back to what I wrote in JMP CONNECTIONS about elevating the capability maturity model of your enterprise data.
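
SPCE itself is written in JSL, and this post does not describe its internals. As a rough illustration only, the Python sketch below shows the general idea of turning long-format readings into a wide table (one row per batch, one column per parameter) while raising alerts against limits. The reading format, limit values and function name are hypothetical.

```python
from collections import defaultdict

def build_wide_table(readings, limits):
    """Pivot long (batch, parameter, value) readings into one row per batch and
    record an alert for every value outside its parameter's (low, high) limits."""
    wide = defaultdict(dict)
    alerts = []
    for batch, param, value in readings:
        wide[batch][param] = value  # a repeated reading for the same cell overwrites
        low, high = limits.get(param, (float("-inf"), float("inf")))
        if not low <= value <= high:
            alerts.append((batch, param, value))
    columns = sorted({p for row in wide.values() for p in row})
    # Missing batch/parameter combinations become None in the wide table.
    table = [{"batch": b, **{c: wide[b].get(c) for c in columns}} for b in sorted(wide)]
    return table, alerts

# Hypothetical readings and limits.
readings = [("B1", "temp", 71.0), ("B1", "ph", 6.9), ("B2", "temp", 78.5),
            ("B2", "ph", 7.1), ("B3", "temp", 70.2)]
limits = {"temp": (65.0, 75.0), "ph": (6.5, 7.5)}
table, alerts = build_wide_table(readings, limits)
print(table)
print(alerts)
```

The wide layout is what most modeling tools expect, which is why a monitoring step that already produces it shortens the path to model building.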

For example, if you use the Neural Net platform in JMP to build a model on a subset of the data table generated by SPCE, you can incorporate that finished model back into the SPCE as a formula column that predicts a variable of particular interest. This feedback loop makes the SPCE an ML-like tool that is easy to understand, extensible and practical from a cost standpoint. The gain in experience is subtle: the outcomes of decisions become evidence. Because the process can be adjusted, and the model re-evaluated along with the parameter control criteria, people can hone their combined objective, subjective and empirical experience around the knowledge or insight gained to make great decisions, sound judgment calls or even on-target “Right First Time” process execution.
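
The feedback loop of putting a model back into the table as a formula column can be sketched in Python as a computed column applied row by row. The linear `predicted_yield` formula and the column names are hypothetical stand-ins for a published Neural Net formula; JMP formula columns themselves are defined in JSL.

```python
def add_formula_column(table, name, formula):
    """Append a computed column to every row, much as a formula column is derived
    from other columns; rows missing an input (KeyError or None value) get None."""
    for row in table:
        try:
            row[name] = formula(row)
        except (KeyError, TypeError):
            row[name] = None
    return table

def predicted_yield(row):
    """Hypothetical fitted model standing in for a published prediction formula."""
    return 40.0 + 0.5 * row["temp"] + 2.0 * row["ph"]

# Hypothetical monitoring rows; the second batch is missing its pH reading.
monitor = [{"batch": "B1", "temp": 70.0, "ph": 7.0}, {"batch": "B2", "temp": 72.0, "ph": None}]
add_formula_column(monitor, "pred_yield", predicted_yield)
print(monitor)
```

Once actual yields arrive, comparing them with the stored predictions is how the model gets re-evaluated over time.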

Posted in Data Industrialization

Where Does The Story Go From Here?

January 21st, 2018 by jwubbel-admin

JMP CONNECTIONS was intended to elevate the utilization of enterprise data toward an advanced capability model or vision: to bring your organization to the edge of the Data Science world of practical intelligence augmentation, machine learning algorithms and artificial intelligence methods, where the real work of making the connections that matter to your business, beyond classical statistics, begins. At this cusp in the maturation of data from simple information to actionable knowledge, statistics plays an advanced role in areas such as machine learning. What was once strictly a science has seen more and more computing and machine learning approaches to solving problems prove themselves. So it now becomes an art form to figure out how to apply what has been learned from data science to our own unique business problems and questions, using the data carefully cultivated through the JMP CONNECTIONS story.

The JMP Pro product is a good platform for exploring advanced predictive tools, with tutorials, web-based instruction and JMP Summit demonstrations, clinics or training. However, as this blog evolves, our consultancy will be offering web-based professional development programs for corporate customers, teaching a business intelligence competency organization how to make effective use of data science, such as active machine learning, to ferret out and harvest the optimal connections an organization should be seeking in order to derive the value hidden therein. We coined the term “Connections In Your Data” because once data is structured, for example in supply chains, those connections become plainly evident and come to light visually in tools such as graph browsers.

Making connections in your data is not the end of the story. Using those connections is the story we want to show and tell for the purpose of developing the intellectual work product of the competency center teams in the enterprise.

Posted in Storyboard

Value For The Money Spent

November 24th, 2017 by jwubbel-admin

The Art of Making Connections In Your Data comes from my deep interest in the Data Science field, which includes such things as machine learning. At the most elementary level, data in most businesses can be very much underutilized or easy to ignore altogether. JMP CONNECTIONS does not go technically deep into the science or into what it means to perform data mining. Rather, the conversation strives to help the reader internalize that organizing business data allows for the realization of knowledge, or optimized insights, for better business decision making at a foundational level, leading to higher capability maturity and improved business performance.

A Pennyworth investment in this book will give you many returns and can be obtained from the following locations:

John Wiley & SAS Business Series

SAS The Power To Know


Of course it goes without saying that a great statistics application like JMP could quickly become your prime business tool of choice.


Posted in Pennyworth

Making Your Connections

November 7th, 2017 by jwubbel-admin

Welcome to my blog and to this first post introducing my new book, JMP CONNECTIONS, published by John Wiley and the SAS Business Series. The Art of Utilizing Connections In Your Data was released on October 23rd, and I cannot tell you how excited I am to have a place to converse about one of my favorite topics, the science of data.

Making your connections is not about changing planes to arrive at your destination on time. However, in a simplistic way, and perhaps unconsciously, a few pieces of data are involved in making that a success, that is, arriving at the gate on time for a connecting flight.

As my WordPress theme symbolically implies, a bridge across the water connecting two cities leaves no gap. And so it is with data: the connections that can be formed, or the semantic links between nodes, become the knowledge bridge for making better-informed decisions in business and in life in general.

As a forum, I will be communicating about data science subject matter. The broad range of topics to be posted includes structuring data, active machine learning, statistical process control and advancing metrics toward practical applications. Through this blog, with reference to JMP CONNECTIONS, video clips, webinars, PowerPoint presentations and open source reference works, we will try to make this a bi-directional channel for learning and advancing the collective knowledge of all our readers.

Posted in Uncategorized