You can access the raw code at: https://github.com/Tacosushi/SVM-for-polymer-phase-seperation Check out my personal site at: koshu.me

Part 1 Overview:

Here I create a first-pass SVM model to see whether SVMs can, in principle, be used to predict phase separation of polymer blends. I will be using the original Flory–Huggins derivation, in which the chi parameter is a function of neither concentration nor temperature. A short comment on how the program could be applied to more research-oriented questions is given at the end.

Part 2 Implementation:

I plan on using the Flory–Huggins equation, with the criteria that the Gibbs free energy of mixing is less than zero and that its second derivative with respect to the volume fraction of species 1 is greater than zero as the conditions for miscibility. Random values of chi, temperature, volume fraction of species 1, and the molecular weights of species 1 and 2 will be chosen to generate a data set. Because this is merely an approximation, I am not too worried about how physically realistic my values are.
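For reference, the standard per-lattice-site form of the Flory–Huggins free energy of mixing (with chi independent of composition and temperature) and the miscibility condition described above are:

```latex
\frac{\Delta G_{mix}}{k_B T}
  = \frac{\phi_1}{N_1}\ln\phi_1 + \frac{1-\phi_1}{N_2}\ln(1-\phi_1) + \chi\,\phi_1(1-\phi_1)

\Delta G_{mix} < 0
  \quad\text{and}\quad
  \frac{\partial^2 \Delta G_{mix}}{\partial \phi_1^2} > 0
  \qquad\Longrightarrow\qquad \text{miscible (single phase)}
```

Here N1 and N2 are the chain lengths of species 1 and 2; the exact prefactors in the repo's implementation may differ from this textbook form.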

Part 1: Creating the data

We start by importing our dependencies (numpy, pandas, and random).

I then define the number of datapoints I want in num_val.

chi, mw1, mw2, and frac represent, in order: the Flory–Huggins parameter, the molecular weight of species 1, the molecular weight of species 2, and the volume fraction of species 1.

gibbsfree and sec_gibbs are the arrays where I temporarily hold the values for the change in Gibbs free energy and its second derivative with respect to volume fraction.

The phase array tells me whether the system has phase separated or not.

The function gibbs_free calculates the change in Gibbs free energy of mixing, using the equation Flory and Huggins derived.
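A minimal sketch of what gibbs_free might look like (the argument names mirror the variables above, but the exact implementation in the repo may differ):

```python
import numpy as np

def gibbs_free(frac, chi, mw1, mw2):
    """Flory-Huggins free energy of mixing per lattice site, in units of kT.

    frac     : volume fraction of species 1 (phi_1), strictly between 0 and 1
    chi      : Flory-Huggins interaction parameter
    mw1, mw2 : chain lengths of species 1 and 2
    """
    frac2 = 1.0 - frac
    return (frac / mw1) * np.log(frac) + (frac2 / mw2) * np.log(frac2) + chi * frac * frac2
```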

I then use a numerical method to calculate the derivative of the Gibbs free energy, and another numerical method to calculate its second derivative.
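The repo uses its own numerical scheme; a standard central-difference sketch of the second derivative would look roughly like this:

```python
def second_derivative(f, x, h=1e-5, **kwargs):
    """Central-difference approximation of f''(x)."""
    return (f(x + h, **kwargs) - 2.0 * f(x, **kwargs) + f(x - h, **kwargs)) / h**2

# example: curvature of the mixing free energy at frac = 0.3
# sec = second_derivative(gibbs_free, 0.3, chi=0.4, mw1=100, mw2=100)
```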

After that, I generate random chi, mw1, mw2, and volume-fraction values using a uniform random number generator between the bounds I wanted. I use volume fractions between 0 and 1 (effectively), and chose molecular weights (chain lengths) ranging from 1 to 50, 51 to 500, 501 to 5,000, and 5,001 to 50,000, because I wanted to separate the "classes" of molecular weights I used. I also used a chi value between -0.2 and 0.5 because these are physically realistic values.
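Roughly, the generation step could look like the following sketch (the exact endpoint handling, and whether the molecular-weight class is picked per sample or per run, are my assumptions):

```python
import random

num_val = 800  # number of datapoints to generate

chi, mw1, mw2, frac = [], [], [], []
mw_classes = [(1, 50), (51, 500), (501, 5000), (5001, 50000)]

for _ in range(num_val):
    frac.append(random.uniform(0.001, 0.999))  # volume fraction of species 1, kept inside (0, 1)
    chi.append(random.uniform(-0.2, 0.5))      # physically realistic chi range
    lo, hi = random.choice(mw_classes)         # pick a molecular-weight "class" for species 1
    mw1.append(random.uniform(lo, hi))
    lo, hi = random.choice(mw_classes)         # and independently for species 2
    mw2.append(random.uniform(lo, hi))
```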

Phase separation was determined by calculating the change in Gibbs free energy and its second derivative. If the change in Gibbs free energy was negative and its second derivative was positive, the blend was labeled as mixed (no phase separation); otherwise, it was labeled as phase separated.
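Putting the pieces above together, the labeling loop might look like this (the 0/1 encoding of the phase column is an assumption):

```python
gibbsfree, sec_gibbs, phase = [], [], []

for i in range(num_val):
    dg = gibbs_free(frac[i], chi[i], mw1[i], mw2[i])
    d2g = second_derivative(gibbs_free, frac[i], chi=chi[i], mw1=mw1[i], mw2=mw2[i])
    gibbsfree.append(dg)
    sec_gibbs.append(d2g)
    # dG < 0 and positive curvature -> stable mixture (0); otherwise phase separated (1)
    phase.append(0 if (dg < 0 and d2g > 0) else 1)
```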

Finally, I created a decoy feature for my SVM and saved everything into a dataframe, written out as a CSV called flory.csv.
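Something like the following writes out the dataset; the decoy column here is pure noise, and the column names are my guesses rather than the ones used in the repo:

```python
import random
import pandas as pd

decoy = [random.uniform(0, 1) for _ in range(num_val)]  # uninformative feature for the SVM

df = pd.DataFrame({
    "chi": chi, "mw1": mw1, "mw2": mw2, "frac": frac,
    "decoy": decoy, "phase": phase,
})
df.to_csv("flory.csv", index=False)
```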

Part 2: Creating a SVM

I create my SVM using scikit-learn and the following code.

I begin by importing my dependencies: numpy, pandas, and matplotlib (which I don't end up using). From scikit-learn I import train_test_split, the svm module, the accuracy_score metric, and MinMaxScaler.

Having imported my dependencies, I load the phase separation file I previously made and split it into a feature array/dataframe X and a target (classifier) array y.
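In scikit-learn terms, the imports and the feature/target split might look like this (column names are assumed from Part 1):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("flory.csv")
X = df.drop(columns=["phase"])  # features: chi, mw1, mw2, frac (and the decoy column, when used)
y = df["phase"]                 # target: phase separated or not
```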

I then scale my features so they are normalized to the range 0 to 1. I do this because otherwise my program would take a really long time fitting against the mw columns, where the numbers range from 1 to 50,000.
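With MinMaxScaler the rescaling is only a couple of lines, continuing from the sketch above:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()             # maps every feature column onto the [0, 1] range
X_scaled = scaler.fit_transform(X)  # keeps the large mw columns from dominating the fit
```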

Lines 16–22 split my data into train and test sets.
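A sketch of that step (the split proportion and random_state are my assumptions, not necessarily what lines 16–22 use):

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)
```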

Afterwards I create an SVM model in lines 37–39, where I declare which kernel I want to use as well as which cost (C) and gamma values to try. I iterate through different cost and gamma values to see which combination gives the highest accuracy.
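A sketch of that grid search (the exact C and gamma grids here are my guesses):

```python
from sklearn import svm
from sklearn.metrics import accuracy_score

results = []
for C in [0.1, 1, 10, 100, 1000]:
    for gamma in [0.01, 0.1, 1, 10, 100]:
        clf = svm.SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        results.append({"C": C, "gamma": gamma, "accuracy": acc})
```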

I finish by saving the accuracies into a CSV file.
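For example:

```python
import pandas as pd

pd.DataFrame(results).to_csv("accuracy.csv", index=False)  # one row per (C, gamma) pair
```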

Part 3: Results of my SVM

I redid the above program using four SVM models.

1. Linear
2. Radial basis function (RBF)
3. 2nd-order polynomial
4. 3rd-order polynomial
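In scikit-learn terms, those four choices correspond to kernel settings along these lines (the C and gamma values shown are placeholders):

```python
from sklearn import svm

kernel_configs = [
    {"kernel": "linear"},
    {"kernel": "rbf"},                # radial basis function
    {"kernel": "poly", "degree": 2},  # 2nd-order polynomial
    {"kernel": "poly", "degree": 3},  # 3rd-order polynomial
]

models = [svm.SVC(C=1.0, gamma="scale", **cfg) for cfg in kernel_configs]
```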

I also changed the amount of data used to train/test the model: 80, 400, and 800 datapoints. I additionally tested the model with and without the dummy (decoy) variable to see how the results would change; all of the results are shown below.

For each heat map, C increases from right to left along the horizontal axis, and gamma increases from top to bottom along the vertical axis.

It seems that, despite really high cost and gamma values, the model did not overfit, which is surprising.

We also notice that the model seems to perform better when there are no dummy variables. This means that in actual practice, being able to reduce the number of features and use only the ones that are meaningful will be highly useful.

We also notice that the 2nd-order polynomial kernel performed better than the 3rd-order polynomial. There also seems to be a limit to how many datapoints are actually needed to build a sufficient SVM model, and 500 seemed plentiful.

Part 4: Comments

I only mentioned some of the more noteworthy and immediate findings of my SVM model, but you can look further into it if you want.

From this experiment, I do believe that, as a first pass, it is possible to use machine learning methods to predict fundamental phase separation behavior in physically realistic systems. A more rigorous experiment would extend the Flory–Huggins equation so that the chi parameter varies as a function of species concentration and temperature. Also, because experiments tend to have standard deviations, it might be useful to add Gaussian noise to the generated data and then retest whether an SVM can still predict phase separation accurately.
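As a rough sketch of that last idea (the 2% noise level and the column names are arbitrary assumptions on my part):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("flory.csv")
rng = np.random.default_rng(0)

# perturb each feature with zero-mean Gaussian noise scaled to 2% of that feature's spread
for col in ["chi", "mw1", "mw2", "frac"]:
    spread = df[col].max() - df[col].min()
    df[col] = df[col] + rng.normal(0.0, 0.02 * spread, size=len(df))

df.to_csv("flory_noisy.csv", index=False)
```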