One of my script is scheduled to download everyday the proxy logs files from multiple proxies (Approx 1>2GB per file) of the previous day. The second step is to parse each of them and get the top 200 domain names within a specific environment. Finally at the end of the month another script create a report on the monthly internet usage.

I thought this was an interesting exercise even if some tools would probably do a better job ($$$). Also we shouldn’t take those results too seriously since some protocols like Ajax or HTML5 talk a lot to the servers, keep refreshing pages even if you are not actively working on them.

In this post, I will talk about the last part of this process and how I combine all those files to get a real monthly top domains.

Example of daily report

Here is a sample of daily CSV file result after the parsing has done its job:

So at the end of the month I have a bunch of CSV files that were generated during the months and I need to know how many time each domains were accessed.

One-Liner Magic!

To calculate the Sum of each domains for each months I can simply use the following One-Liner:

Get-ChildItem -Path .\*.csv | # Get each CSV files ForEach-Object -Process { Import-Csv -Path $PSItem.FullName # Import CSV data } | Group-Object -Property Name | # Group per Domain Name Select-Object -Unique -Property Name, @{ Label = "Sum"; Expression = { # Sum all the counts for each domain ($PSItem.group | Measure-Object -Property Count -sum).Sum } } | Sort-Object -Property Sum -Descending | Out-GridView -Title "Top Domains"

Step by Step

Get all the CSV File

```

```

Now we import the data inside each file. I added a Sort-Object so you can see that we have duplicates (since we have the same values coming from different files).

Using Group-Object by Domain Name . This will group also all the different “counts” for each domain