





[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

Introduction

Solution

Data Model

Metadata

Setup

TME: Epoch time, CLNG: Calling number, CLED: Called number, FCLED: Call forwarded number (80000 is voicemail), TYP: Call type, ANS: Answered or Voicemail, FWD: Forwarded number, CPNY:Company name TME,CLNG,CLED,FCLED,DUR,TYP,ANS,FWD,CPNY 1449158574,4045551212,4041212555,80000,70,Incoming,VM,FWD,AmeyaSoft 1449159132,9195551212,8191212555,8191212555,458,Outgoing,Ans,Orig,AmeyaSoft 1449159135,8045551212,5401212555,80000,304,Incoming,VM,FWD,AmeyaSoft 1449159146,2705551212,8151212555,8151212555,451,Incoming,Ans,FWD,AmeyaSoft

CREATE CONSTRAINT ON ( connect:Connect ) ASSERT connect.id IS UNIQUE; USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/.../CDR_Gist3.csv" As line WITH line, apoc.date.format((ToInteger(line.TME)),('s'), ('yyyy/MM/dd'), ('PST')) as dates WITH line, dates, REPLACE(dates, "/", "") as dt MERGE (company:Company {name: line.CPNY}) MERGE (calldate:CallDate {id: toInt(dt)}) MERGE (company)-[:CALL_DATE]->(calldate) MERGE (connect:Connect {id: TOINT(line.TME)}) MERGE (calldate)-[:CONNECT_TIME {date: toInt(dt)}]->(connect) FOREACH(ignoreMe IN CASE WHEN trim(line.TYP)="Incoming" THEN [1] ELSE [] END | FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="Ans" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (calling:IncomingFrom {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:INCOMING]->(calling) CREATE (calling)-[t:CALL_TO]->(called) SET r.ans = CASE line.ANS WHEN 'Ans' THEN true ELSE false END, r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END SET t.fwd = CASE line.FWD WHEN 'Orig' THEN false ELSE true END ) FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="VM" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (callingvm:IncomingVM {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:VOICEMAIL {call: "Incoming"}]->(callingvm) CREATE (callingvm)-[:CALL_TO]->(called) SET r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END ) ) FOREACH(ignoreMe IN CASE WHEN trim(line.TYP)="Outgoing" THEN [1] ELSE [] END | FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="Ans" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (calling:CallFrom {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:OUTGOING]->(calling) CREATE (calling)-[t:CALL_TO]->(called) SET r.ans = CASE line.ANS WHEN 'Ans' THEN true ELSE false END, r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END SET t.fwd = CASE line.FWD WHEN 'Orig' THEN false ELSE true END ) FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="VM" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (callingvm:VoiceCall {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:VOICEMAIL {call: "Outgoing"}]->(callingvm) CREATE (callingvm)-[:CALL_TO]->(called) SET r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END ) );

// Added 152 labels, created 152 nodes, set 422 properties, created 151 relationships

// graph

Incoming Calls

MATCH path=(c:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]-> (n:Connect)-[r:INCOMING]->(n1)-[]->(n2) RETURN path;

//graph_result

Voicemail

MATCH path=(c:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]-> (n:Connect)-[r:VOICEMAIL]->(n1)-[]->(n2) RETURN path;

//graph_result

Call Distributions

MATCH (n:Connect) WITH n, size((n)-[:INCOMING]->()) as Incoming, size((n)-[:OUTGOING]->()) as Outgoing, size((n)-[:VOICEMAIL]->()) as Voicemail RETURN count(*) as Total, sum(Incoming) as Incoming, sum(Outgoing) as Outgoing, sum(Voicemail) as Voicemail;

//table

Analysis of call detail records (CDR) provides greater insights into the telecommunications activity like inbound calls, outbound calls, dropped calls, abandoned calls and unanswered calls. This helps organizations track details on extensions, toll fraud and multiple locations, and it also helps them prevent productivity loss and control expenses.CDR data can reach big data proportions for some enterprise businesses. Updating very large tables in Microsoft SQL Server can be a time-consuming task and sometimes it might take hours to finish. In addition to this, it might also cause blocking issues. Neo4j has been used in the telecom industry before for both churn reduction and event log data , but both of these applications are at the telecom provider level.Here, I use Neo4j at the business level that consumes the services provided by the telecom provider.Call detail records data files come with about 120 columns. For this gist, I selected call time, calling number, called number, call forwarded number, call duration, call type (incoming/outgoing/voicemail), call answered flag, call forwarded flag and company name.: Answered or went to voicemail.: Forwarded, answered or went to voicemail.: Answered or went to voicemail.: Called.: Voicemail.This graph data model stores call detail records (CDR) in nodes, relationships and properties. Call date is modelled as an integer following the style of this article Here is a sample of data in the CSV file: Cypher query to setup the database (I used only 50 call records):Call detail records provide detailed logs about every phone activity and this allows a business to make better management and personnel decisions by analyzing patterns and trends. A report can also be generated on money spent on calling minutes by a specific department or a business unit at a different location.Employee utilization is an extremely important part of a business. A report on all incoming calls received by each employee can show call volume and minutes for each employee. Higher call volume indicates the need to add more resources for better customer service.A busy network or not enough personnel may result in a high volume of calls going to voicemail. This may lead to lost calls and ultimately this will lead to losing good customers. Analysis of all unanswered calls allows a business to make better management and personnel decisions.An organization can use CDR data to verify billing disputes and keep track of the usage of phone systems for the entire organization. A report on total calls grouped by date, call type (incoming/outgoing/unanswered/toll free calls/conference calls) gives an aggregate view of all the calling activities.Total incoming, unanswered, and outgoing calls:

Total incoming, voicemail, and outgoing calls by each employee. This way employees performance can be evaluated by managers.

//Total calls by phone number MATCH (n:Connect)-[]->()-[:CALL_TO]->(n2:CallTo) RETURN "Total" as Type,n2.orig as PhoneNbr, COUNT(*) as Count ORDER BY Count DESC UNION MATCH (n:Connect)-[r]->(n1)-[:CALL_TO]->(n2:CallTo) RETURN type(r) as Type, n2.orig as PhoneNbr, COUNT(*) as Count ORDER BY Count DESC;

//table

Total call count per employee

Incoming calls per employee

Outgoing calls per employee

Voicemail count per employee

More Details

MATCH (n:Connect)-[r:VOICEMAIL {call: 'Incoming'}]-(n1:VoiceCall) WITH DISTINCT n MATCH (n1:Connect)-[r1:VOICEMAIL {call: 'Outgoing'}]-(n2:VoiceCall) WHERE n1.id = n.id WITH DISTINCT n1 MATCH path = (x:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]-> (n2:Connect)-[r2:INCOMING|VOICEMAIL|OUTGOING]-> (n11)-[:CALL_TO]->(n21) WHERE n2.id = n1.id RETURN path;

Conclusion

Conference calls



Toll-free calls



Returned voicemail messages

Resources

Want to learn more about what you can do with graph databases? Click below to get your free copy the O’Reilly Graph Databases book and discover how to harness the power of graph technology.



Get My Free Copy

10 out of 11 calls to 4041212555 went to voicemail: Perhaps a signal for poor performance by the employee?In a multi-site organization, there will be a central location which receives the call detail records from every site. The data from each site will have a field to store a code that identifies the site.For a single-site organization, CDRs contain one record for each call time. For multi-site organizations, there can be multiple records (one from each site) for one call time. The following query displays this multi-site scenario (with a bigger dataset).Call detail records provide a wealth of information on telecommunications activity for an entire organization. Modeling the data with Neo4j provided orders of magnitude improvement in generating statistics of all call types compared to Microsoft SQL Server.In this gist, I covered only a few metrics. There are many other metrics to consider that I did not cover here, including:Dataset: https://github.com/kaisesha/cdrgraph/blob/master/CDR_Gist3.csv