INTRODUCTION – Text Summary with PowerShell :

This is a PowerShell script that summarizes long text documents to a word limit you choose. It uses an algorithm that scores each sentence on two parameters, important words and common content. It then builds the summary from the highest-scored sentences, kept in the order in which they occur in the original text.

HOW IT WORKS :

GET THE CONTENT : Get the contents of a file, or of the clipboard, and store them in a temporary variable.

SPLIT INTO SENTENCES : Split the complete document into sentences on newline characters, and remove empty or blank lines.

RANK EACH SENTENCE : Once you have all the sentences, rank each one with a score that depends mainly on the following two criteria.

IMPORTANT WORDS : To identify the important words in the content, calculate the frequency distribution of every word, remove words of three letters or fewer (for example “The”, “Are”, “For”, “As”), group the remaining words, sort them by count, and select the top 10 as important words. Each important word is given a weight that is a multiple of its frequency in the content.

COMMON WORDS IN EACH SENTENCE : To get an idea of how many words in each sentence are shared with the other sentences, find the intersection of each sentence's words with the words of every other sentence. Each sentence is then scored on these common words: the more words a sentence shares with all the other sentences, the better it represents, and summarizes, the complete document.

SELECT THE BEST SENTENCES :
Once each sentence has been scored on the two parameters above, add the individual scores ( CommonContentScore + ImportanceScore = SentenceScore ) and sort the sentences from highest to lowest score. Count the words in each sentence and select only the highest-scored sentences that fit within the word limit.

NOTE: The best sentences must be put back in the order in which they occur in the content so that they make sense together. Otherwise they look jumbled and won't read like a summary.

OUTPUT SUMMARY: Display the best sentences on screen as a paragraph, which is the summary of the complete document.
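The ranking step described above can be sketched in PowerShell as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the function name `Get-SentenceScore` and its `-TopWords` parameter are hypothetical, words are taken to be runs of the letters a to z compared case-insensitively, and the importance score of a sentence is assumed to be the sum of the frequencies of the important words it contains.

```powershell
# Minimal sketch of the RANK EACH SENTENCE step (hypothetical function, not the module's code).
# Assumptions: one sentence per non-blank line, words are runs of a-z (case-insensitive),
# words of 3 letters or fewer are ignored, and ImportanceScore sums the
# frequencies of the important words found in a sentence.
function Get-SentenceScore {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [int] $TopWords = 10
    )

    # SPLIT INTO SENTENCES: split on newlines and drop empty or blank lines.
    $sentences = @($Text -split '\r?\n' | Where-Object { $_.Trim() -ne '' })

    # Helper: lower-cased words longer than 3 letters.
    $getWords = {
        param([string] $s)
        [regex]::Matches($s.ToLower(), '[a-z]+') |
            ForEach-Object { $_.Value } |
            Where-Object { $_.Length -gt 3 }
    }

    # IMPORTANT WORDS: frequency distribution, keep the top N words by count.
    $important = @{}
    & $getWords $Text | Group-Object | Sort-Object Count -Descending |
        Select-Object -First $TopWords |
        ForEach-Object { $important[$_.Name] = $_.Count }

    # Unique word set per sentence, used for the intersections below.
    $sets = [System.Collections.Generic.List[object]]::new()
    foreach ($s in $sentences) {
        $set = [System.Collections.Generic.HashSet[string]]::new()
        foreach ($w in @(& $getWords $s)) { [void] $set.Add($w) }
        $sets.Add($set)
    }

    for ($i = 0; $i -lt $sentences.Count; $i++) {
        # COMMON WORDS: size of the intersection with every other sentence, summed.
        $common = 0
        for ($j = 0; $j -lt $sentences.Count; $j++) {
            if ($i -eq $j) { continue }
            foreach ($w in $sets[$i]) { if ($sets[$j].Contains($w)) { $common++ } }
        }

        # IMPORTANCE: each occurrence of an important word adds that word's frequency.
        $importance = 0
        foreach ($w in @(& $getWords $sentences[$i])) {
            if ($important.ContainsKey($w)) { $importance += $important[$w] }
        }

        [pscustomobject]@{
            Index              = $i
            Sentence           = $sentences[$i]
            CommonContentScore = $common
            ImportanceScore    = $importance
            SentenceScore      = $common + $importance
        }
    }
}
```

Each output object keeps the sentence's original position in `Index`, which is what allows the selected sentences to be restored to document order later.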

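The selection step can be sketched the same way. `Select-BestSentence` is a hypothetical helper that expects objects with `Index`, `Sentence` and `SentenceScore` properties. It assumes that a sentence too long for the remaining word budget is skipped rather than ending the selection, and it re-sorts the chosen sentences by `Index` so they appear in their original order, as the NOTE above requires.

```powershell
# Hypothetical helper (not the module's code): keep the highest-scored sentences
# whose combined word count stays within -WordLimit, then restore document order.
function Select-BestSentence {
    param(
        [Parameter(Mandatory)] [object[]] $ScoredSentence,
        [int] $WordLimit = 100
    )

    $total = 0
    $chosen = foreach ($s in ($ScoredSentence | Sort-Object SentenceScore -Descending)) {
        # Words are counted as runs of non-whitespace characters.
        $count = @($s.Sentence -split '\s+' | Where-Object { $_ }).Count
        # Assumption: a sentence that would exceed the limit is skipped, not a stop signal.
        if ($total + $count -le $WordLimit) {
            $total += $count
            $s
        }
    }

    # Put the chosen sentences back in their original order and join them into a paragraph.
    ($chosen | Sort-Object Index | ForEach-Object { $_.Sentence }) -join ' '
}
```

Because the loop walks sentences from highest to lowest score, a lower-scored short sentence can still fill space that a longer, higher-scored one could not.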
INSTALLATION:

The module is available on the PowerShell Gallery, and if you have PowerShell v5 you can install it directly with the Install-Module cmdlet.

SCRIPT:

You can also download the module from TechNet or from my GitHub repository.


HOW TO RUN IT :

Once you've downloaded the module, import it into your PowerShell host session with Import-Module.

Provide the path to a text file to the cmdlet and it will generate a summary for you. By default, the summary is at most 100 words long.

You can also pass a value to the ‘-WordLimit’ parameter to lengthen or shorten the summary.

Add the ‘-Verbose’ switch to view the summarization ratio, i.e. the number of words in the original text compared with the number of words in the summary.
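As an illustration of that ratio, here is a hypothetical helper, `Get-SummaryRatio`, that counts whitespace-separated words in the original text and in the summary. Because it uses `[CmdletBinding()]`, its `Write-Verbose` message appears only when the caller passes `-Verbose`, which is standard PowerShell behavior; how the module itself formats its verbose output is not shown here, so treat this as a sketch.

```powershell
# Hypothetical helper (not the module's code): word counts and ratio of original to summary.
function Get-SummaryRatio {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Original,
        [Parameter(Mandatory)] [string] $Summary
    )

    # Words are counted as runs of non-whitespace characters.
    $originalWords = @($Original -split '\s+' | Where-Object { $_ }).Count
    $summaryWords  = @($Summary -split '\s+' | Where-Object { $_ }).Count

    # Shown only when the caller adds -Verbose (enabled by [CmdletBinding()]).
    Write-Verbose ('Summarized {0} words into {1} words' -f $originalWords, $summaryWords)

    [pscustomobject]@{
        OriginalWords = $originalWords
        SummaryWords  = $summaryWords
        Ratio         = '{0}:{1}' -f $originalWords, $summaryWords
    }
}
```

For example, `Get-SummaryRatio -Original $text -Summary $summary -Verbose` prints the verbose line and returns the counts as an object.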

You can also use the ‘-FromClipboard’ switch to summarize content copied to the clipboard.
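Reading from either a file or the clipboard maps naturally onto PowerShell parameter sets. The sketch below uses a hypothetical function name, `Get-SourceText`, and reads the whole file with `Get-Content -Raw` or the clipboard with `Get-Clipboard -Raw`. It is an assumption about how such input handling could look, not the module's own code.

```powershell
# Hypothetical helper for the GET THE CONTENT step (not the module's code):
# return the whole text as one string, from a file or from the clipboard.
function Get-SourceText {
    [CmdletBinding(DefaultParameterSetName = 'File')]
    param(
        [Parameter(Mandatory, ParameterSetName = 'File')] [string] $Path,
        [Parameter(ParameterSetName = 'Clipboard')] [switch] $FromClipboard
    )

    if ($PSCmdlet.ParameterSetName -eq 'Clipboard') {
        # -Raw returns the clipboard text as a single string instead of an array of lines.
        Get-Clipboard -Raw
    }
    else {
        # -Raw reads the file as one string, preserving its newlines for sentence splitting.
        Get-Content -LiteralPath $Path -Raw
    }
}
```

Putting the switch in its own parameter set means `-Path` and `-FromClipboard` cannot be combined by mistake.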

If you find this script useful, you may also like my previous blog post on highlighting keywords in the PowerShell console, which works very well with this module. Here is a screenshot of both working together.

Follow @SinghPrateik



