There are many ways to import data into R. However, importing data into a matrix or data frame is only one step in preparing it for analysis. In this R tutorial, we will create a data.frame by hand instead of importing the data.

Many organizations conduct yearly employee performance ratings within the first few weeks of the new year. Based on those ratings, employees may be put up for promotion if they reach a certain rank.

Employee Promotion Rankings

The rank is the total of an employee's five ratings.

Rank between 19 and 25: eligible for promotion
Rank between 14 and 18: not eligible for promotion
Rank lower than or equal to 13: must complete a performance probationary period

Employee Performance Rankings

Rating of 1 – Needs Improvement
Rating of 2 – Not Meeting Expectations
Rating of 3 – Meeting Expectations
Rating of 4 – Exceeding Expectations
Rating of 5 – Strongly Exceeding Expectations

Below is a table with the five ratings for each of the ten employees.

    Employee ID   Date        Gender   Age   R1   R2   R3   R4   R5
    A12OI         1/22/2018   F        54    5    4    5    4    3
    C90R2         1/23/2018   M        37    5    3    4    3    5
    LOI98         1/24/2018   F        23    3    5    4    3    3
    M908Y         1/25/2018   M        43    4    3    5    3    5
    J908R         1/26/2018   F        33    2    2    3    2    3
    BNL98         1/22/2018   F        42    5    3    3    4    5
    EW09P         1/23/2018   M        58    2    3    3    2    2
    QA214         1/24/2018   F        31    4    4    5    3    1
    JU87Y         1/25/2018   M        22    3    4    4    4    3
    CO43R         1/26/2018   M        38    5    4    3    5    4

As you can see, each employee is rated by their manager in five separate areas of work (R1 through R5).

With the data above, we can create the employee data frame. We will do this with the c() function, which combines its arguments into a vector (a one-dimensional array). We create one vector per column.

Input:

    empid  <- c("A12OI", "C90R2", "LOI98", "M908Y", "J908R", "BNL98", "EW09P", "QA214", "JU87Y", "CO43R")
    date   <- c("1/22/2018", "1/23/2018", "1/24/2018", "1/25/2018", "1/26/2018", "1/22/2018", "1/23/2018", "1/24/2018", "1/25/2018", "1/26/2018")
    gender <- c("F", "M", "F", "M", "F", "F", "M", "F", "M", "M")
    age    <- c(54, 37, 23, 43, 33, 42, 58, 31, 22, 38)
    r1     <- c(5, 5, 3, 4, 2, 5, 2, 4, 3, 5)
    r2     <- c(4, 3, 5, 3, 2, 3, 3, 4, 4, 4)
    r3     <- c(5, 4, 4, 5, 3, 3, 3, 5, 4, 3)
    r4     <- c(4, 3, 3, 3, 2, 4, 2, 3, 4, 5)
    r5     <- c(3, 5, 3, 5, 3, 5, 2, 1, 3, 4)
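Before combining vectors into a data frame, it can help to confirm that they all have the same length, since each vector becomes one column. The following is a minimal sketch of such a check; the names cols, lens and same_length are our own and are not part of the tutorial's code, and only three of the nine vectors are repeated to keep it short.

```r
# Three of the tutorial's vectors, repeated so this sketch runs on its own
empid <- c("A12OI", "C90R2", "LOI98", "M908Y", "J908R",
           "BNL98", "EW09P", "QA214", "JU87Y", "CO43R")
age   <- c(54, 37, 23, 43, 33, 42, 58, 31, 22, 38)
r1    <- c(5, 5, 3, 4, 2, 5, 2, 4, 3, 5)

# Collect the vectors in a named list and count the elements in each
cols <- list(empid = empid, age = age, r1 = r1)
lens <- lengths(cols)
print(lens)

# TRUE when every vector has the same number of elements
same_length <- length(unique(lens)) == 1
print(same_length)

# c() keeps a single type per vector: quoted values give a character
# vector, plain numbers give a numeric vector
class(empid)
class(age)
```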

Now that the vectors are created, let’s move on to creating the data.frame from them.

Input:

    empratings <- data.frame(empid, date, gender, age, r1, r2, r3, r4, r5,
                             stringsAsFactors = FALSE)

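Creating the data.frame is very simple: we only pass in the vectors that we created, and each vector becomes a column with one row per employee. Note that the result is a data frame, not a matrix, so the columns can hold different types (character and numeric). Printing the object shows its contents.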

Input:

    empratings

Output:

       empid      date gender age r1 r2 r3 r4 r5
    1  A12OI 1/22/2018      F  54  5  4  5  4  3
    2  C90R2 1/23/2018      M  37  5  3  4  3  5
    3  LOI98 1/24/2018      F  23  3  5  4  3  3
    4  M908Y 1/25/2018      M  43  4  3  5  3  5
    5  J908R 1/26/2018      F  33  2  2  3  2  3
    6  BNL98 1/22/2018      F  42  5  3  3  4  5
    7  EW09P 1/23/2018      M  58  2  3  3  2  2
    8  QA214 1/24/2018      F  31  4  4  5  3  1
    9  JU87Y 1/25/2018      M  22  3  4  4  4  3
    10 CO43R 1/26/2018      M  38  5  4  3  5  4
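Printing the data frame shows the values but not how R stores them. As a small sketch (using only the first three employees and a few columns so that it runs on its own), str(), dim() and sapply() can be used to inspect the structure. The name col_classes is our own.

```r
# A cut-down version of the tutorial's data frame (first three employees)
empratings <- data.frame(
  empid  = c("A12OI", "C90R2", "LOI98"),
  date   = c("1/22/2018", "1/23/2018", "1/24/2018"),
  gender = c("F", "M", "F"),
  age    = c(54, 37, 23),
  r1     = c(5, 5, 3),
  stringsAsFactors = FALSE
)

# Compact overview: one line per column with its type and first values
str(empratings)

# Number of rows and columns
dim(empratings)

# Class of every column; with stringsAsFactors = FALSE the text
# columns stay character instead of being converted to factors
col_classes <- sapply(empratings, class)
print(col_classes)
```

Note that date is stored as plain text here, which matters later if the year needs to be checked.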

Before we move to the next step, it’s a good idea to be familiar with the arithmetic and logical operators in R.

Arithmetic Operators

    Operator    Description        Example
    -           Subtraction        5 - 1 = 4
    +           Addition           5 + 1 = 6
    *           Multiplication     5 * 3 = 15
    /           Division           10 / 2 = 5
    ^ or **     Exponentiation     2^5 = 32 (that is, 2 * 2 * 2 * 2 * 2)
    x %% y      Modulus            5 %% 2 is 1
    x %/% y     Integer division   5 %/% 2 is 2
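The arithmetic operators can be tried directly at the R console. The short sketch below evaluates the examples from the table and also shows that the operators work element by element on vectors, which is what the later steps of this tutorial rely on. The variable names are ours.

```r
# Scalar examples from the table
diff_val  <- 5 - 1     # 4
sum_val   <- 5 + 1     # 6
prod_val  <- 5 * 3     # 15
quot_val  <- 10 / 2    # 5
pow_val   <- 2^5       # 32; 2**5 gives the same result
mod_val   <- 5 %% 2    # 1, the remainder of 5 divided by 2
intdiv    <- 5 %/% 2   # 2, the whole-number part of 5 divided by 2

# Operators are applied element by element to vectors of equal length
r1 <- c(5, 5, 3)
r2 <- c(4, 3, 5)
pair_sum <- r1 + r2    # 9 8 8
print(pair_sum)
```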

Logical Operators

    Operator     Description                 Example
    <            less than                   5 < 10
    <=           less than or equal to       5 <= 5
    >            greater than                10 > 5
    >=           greater than or equal to    10 >= 10
    ==           exactly equal to            10 == 10
    !=           not equal to                10 != 5
    !x           not x                       x <- c(5), !x
    x | y        x or y                      x <- c(5), y <- c(10), x | y
    x & y        x and y                     x <- c(5), y <- c(10), x & y
    isTRUE(x)    tests whether x is TRUE     x <- TRUE, isTRUE(x) returns [1] TRUE
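Comparison and logical operators are also applied element by element, returning one TRUE or FALSE per value. The sketch below applies them to a short vector of rating totals; the vector and the variable names are made up for illustration.

```r
# Example totals for five employees (illustrative values)
total <- c(21, 20, 18, 20, 12)

# One TRUE/FALSE per element
high   <- total >= 19                  # TRUE TRUE FALSE TRUE FALSE
middle <- total >= 14 & total <= 18    # FALSE FALSE TRUE FALSE FALSE
outer  <- total <= 13 | total >= 19    # TRUE TRUE FALSE TRUE TRUE
not_hi <- !high                        # FALSE FALSE TRUE FALSE TRUE

print(high)
print(middle)

# isTRUE() is TRUE only for a single TRUE value, so it returns
# FALSE for a logical vector with more than one element
isTRUE(TRUE)    # TRUE
isTRUE(high)    # FALSE

# A logical vector can be used to pick out matching elements
total[high]     # 21 20 20
```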

Learning the above arithmetic and logical operators will help a great deal in solving the next few tasks for this R tutorial.

With the above employee data and operators, we can now create an additional column in the empratings data.frame that sums each employee's five ratings.

Input:

    empratings$total <- (r1 + r2 + r3 + r4 + r5)
    empratings

As you will see below, the total column is added after the five rating columns.

Output:

       empid      date gender age r1 r2 r3 r4 r5 total
    1  A12OI 1/22/2018      F  54  5  4  5  4  3    21
    2  C90R2 1/23/2018      M  37  5  3  4  3  5    20
    3  LOI98 1/24/2018      F  23  3  5  4  3  3    18
    4  M908Y 1/25/2018      M  43  4  3  5  3  5    20
    5  J908R 1/26/2018      F  33  2  2  3  2  3    12
    6  BNL98 1/22/2018      F  42  5  3  3  4  5    20
    7  EW09P 1/23/2018      M  58  2  3  3  2  2    12
    8  QA214 1/24/2018      F  31  4  4  5  3  1    17
    9  JU87Y 1/25/2018      M  22  3  4  4  4  3    18
    10 CO43R 1/26/2018      M  38  5  4  3  5  4    21
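The total above is built from the standalone vectors r1 through r5. One possible alternative, sketched here rather than taken from the tutorial, is to compute it from the columns already stored in the data frame with rowSums(). The helper name rating_cols is our own, and only the first three employees are included.

```r
# First three employees and their five ratings
empratings <- data.frame(
  empid = c("A12OI", "C90R2", "LOI98"),
  r1 = c(5, 5, 3),
  r2 = c(4, 3, 5),
  r3 = c(5, 4, 4),
  r4 = c(4, 3, 3),
  r5 = c(3, 5, 3),
  stringsAsFactors = FALSE
)

# Names of the columns that hold ratings
rating_cols <- c("r1", "r2", "r3", "r4", "r5")

# rowSums() adds the selected columns across each row
empratings$total <- rowSums(empratings[, rating_cols])
print(empratings)
```

Working from the data frame columns keeps the total in step with the data even if the original vectors are later changed or removed.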

Before moving forward, let’s find the average of each employee's ratings. We can accomplish this by taking the total of the ratings and dividing it by the number of ratings (five).

Input:

    empratings$average <- (empratings$total / 5)
    empratings

Output:

       empid      date gender age r1 r2 r3 r4 r5 total average
    1  A12OI 1/22/2018      F  54  5  4  5  4  3    21     4.2
    2  C90R2 1/23/2018      M  37  5  3  4  3  5    20     4.0
    3  LOI98 1/24/2018      F  23  3  5  4  3  3    18     3.6
    4  M908Y 1/25/2018      M  43  4  3  5  3  5    20     4.0
    5  J908R 1/26/2018      F  33  2  2  3  2  3    12     2.4
    6  BNL98 1/22/2018      F  42  5  3  3  4  5    20     4.0
    7  EW09P 1/23/2018      M  58  2  3  3  2  2    12     2.4
    8  QA214 1/24/2018      F  31  4  4  5  3  1    17     3.4
    9  JU87Y 1/25/2018      M  22  3  4  4  4  3    18     3.6
    10 CO43R 1/26/2018      M  38  5  4  3  5  4    21     4.2
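Dividing the total by five works because every employee has exactly five ratings. As a hedged alternative, rowMeans() computes the same average directly from the rating columns without hard-coding the count. The sketch below uses the first three employees; rating_cols is a name we introduce for illustration.

```r
# First three employees and their five ratings
empratings <- data.frame(
  empid = c("A12OI", "C90R2", "LOI98"),
  r1 = c(5, 5, 3),
  r2 = c(4, 3, 5),
  r3 = c(5, 4, 4),
  r4 = c(4, 3, 3),
  r5 = c(3, 5, 3),
  stringsAsFactors = FALSE
)

rating_cols <- c("r1", "r2", "r3", "r4", "r5")

# Average across the rating columns of each row
empratings$average <- rowMeans(empratings[, rating_cols])

# Same result as dividing the row total by the number of ratings
by_division <- rowSums(empratings[, rating_cols]) / length(rating_cols)
all.equal(as.numeric(empratings$average), as.numeric(by_division))

print(empratings[, c("empid", "average")])
```

Because the average is just the total divided by a constant, sorting employees by average gives the same order as sorting by total.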

As you can see above, the average column is now added to the empratings data frame for each employee. Could this have any impact on employee promotion? Probably not: promotions will most likely be based on the total of the ratings and not on the average.

Now that we have the total of the ratings for each employee, let’s create an additional variable that places each employee in one of the three promotion categories.

Input:

    empratings$performance[empratings$total <= 25 & empratings$total >= 19] <- "Promotion Eligible"
    empratings$performance[empratings$total <= 18 & empratings$total >= 14] <- "Not Promotion Eligible"
    empratings$performance[empratings$total <= 13] <- "Performance Probation"
    empratings

Output:

       empid      date gender age r1 r2 r3 r4 r5 total average            performance
    1  A12OI 1/22/2018      F  54  5  4  5  4  3    21     4.2     Promotion Eligible
    2  C90R2 1/23/2018      M  37  5  3  4  3  5    20     4.0     Promotion Eligible
    3  LOI98 1/24/2018      F  23  3  5  4  3  3    18     3.6 Not Promotion Eligible
    4  M908Y 1/25/2018      M  43  4  3  5  3  5    20     4.0     Promotion Eligible
    5  J908R 1/26/2018      F  33  2  2  3  2  3    12     2.4  Performance Probation
    6  BNL98 1/22/2018      F  42  5  3  3  4  5    20     4.0     Promotion Eligible
    7  EW09P 1/23/2018      M  58  2  3  3  2  2    12     2.4  Performance Probation
    8  QA214 1/24/2018      F  31  4  4  5  3  1    17     3.4 Not Promotion Eligible
    9  JU87Y 1/25/2018      M  22  3  4  4  4  3    18     3.6 Not Promotion Eligible
    10 CO43R 1/26/2018      M  38  5  4  3  5  4    21     4.2     Promotion Eligible
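The three assignments above each handle one range of totals. A different technique, shown here only as a sketch and not as the tutorial's method, is base R's cut(), which maps a numeric vector onto labelled intervals in one call. The break points below assume whole-number totals, so the interval (13, 18] covers 14 to 18 and (18, 25] covers 19 to 25.

```r
# Totals for the ten employees, in the same order as the data frame
total <- c(21, 20, 18, 20, 12, 20, 12, 17, 18, 21)

# cut() assigns each total to an interval; by default intervals are
# open on the left and closed on the right: (-Inf, 13], (13, 18], (18, 25]
performance <- cut(
  total,
  breaks = c(-Inf, 13, 18, 25),
  labels = c("Performance Probation",
             "Not Promotion Eligible",
             "Promotion Eligible")
)

# cut() returns a factor; convert to character to match the tutorial's column
performance_chr <- as.character(performance)
print(performance_chr)

# Count the employees in each category
table(performance)
```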

Now, with the additional performance column, we can select observations by pulling only the employees in a given category: Promotion Eligible, Not Promotion Eligible or Performance Probation.

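Below are three data frames, one for each performance category that we created. Each is selected with a logical condition on the rows; leaving the position after the comma empty keeps all columns.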

Promotion Eligible Employees

Input:

    promotionEligible <- empratings[empratings$performance == "Promotion Eligible", ]
    promotionEligible

Output:

       empid      date gender age r1 r2 r3 r4 r5 total average        performance
    1  A12OI 1/22/2018      F  54  5  4  5  4  3    21     4.2 Promotion Eligible
    2  C90R2 1/23/2018      M  37  5  3  4  3  5    20     4.0 Promotion Eligible
    4  M908Y 1/25/2018      M  43  4  3  5  3  5    20     4.0 Promotion Eligible
    6  BNL98 1/22/2018      F  42  5  3  3  4  5    20     4.0 Promotion Eligible
    10 CO43R 1/26/2018      M  38  5  4  3  5  4    21     4.2 Promotion Eligible

Not Promotion Eligible Employees

Input:

    notPromotionEligible <- empratings[empratings$performance == "Not Promotion Eligible", ]
    notPromotionEligible

Output:

      empid      date gender age r1 r2 r3 r4 r5 total average            performance
    3 LOI98 1/24/2018      F  23  3  5  4  3  3    18     3.6 Not Promotion Eligible
    8 QA214 1/24/2018      F  31  4  4  5  3  1    17     3.4 Not Promotion Eligible
    9 JU87Y 1/25/2018      M  22  3  4  4  4  3    18     3.6 Not Promotion Eligible

Performance Probation Employees

Input:

    performanceProbation <- empratings[empratings$performance == "Performance Probation", ]
    performanceProbation

Output:

      empid      date gender age r1 r2 r3 r4 r5 total average           performance
    5 J908R 1/26/2018      F  33  2  2  3  2  3    12     2.4 Performance Probation
    7 EW09P 1/23/2018      M  58  2  3  3  2  2    12     2.4 Performance Probation
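The three selections above repeat the same pattern with a different label each time. As an optional sketch, base R's split() produces all three groups in a single call, returning a list with one data frame per category. The variable names groups and group_sizes are ours, and only the columns needed for the example are included.

```r
# Employee IDs, totals and categories as computed in the tutorial
empratings <- data.frame(
  empid = c("A12OI", "C90R2", "LOI98", "M908Y", "J908R",
            "BNL98", "EW09P", "QA214", "JU87Y", "CO43R"),
  total = c(21, 20, 18, 20, 12, 20, 12, 17, 18, 21),
  performance = c("Promotion Eligible", "Promotion Eligible",
                  "Not Promotion Eligible", "Promotion Eligible",
                  "Performance Probation", "Promotion Eligible",
                  "Performance Probation", "Not Promotion Eligible",
                  "Not Promotion Eligible", "Promotion Eligible"),
  stringsAsFactors = FALSE
)

# One data frame per distinct value of the performance column
groups <- split(empratings, empratings$performance)
names(groups)

# Pull a single group out of the list by its label
groups[["Performance Probation"]]

# Number of employees in each group
group_sizes <- sapply(groups, nrow)
print(group_sizes)
```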

Now with the given data above, a manager will be able to meet with each employee and discuss what actions need to be taken to increase employee performance throughout the new year.

With the newly added data for each employee, the manager may want only the columns that matter for the promotion decision. In that case, the only data needed is empid, date (to ensure it’s the correct year), total and performance. We can pull this data with the subset() function, which filters rows with a condition and picks columns with its select argument.

Promotion Eligible subset()

Input:

    promotionEligible <- subset(empratings, performance == "Promotion Eligible",
                                select = c(empid, date, total, performance))
    promotionEligible

Output:

       empid      date total        performance
    1  A12OI 1/22/2018    21 Promotion Eligible
    2  C90R2 1/23/2018    20 Promotion Eligible
    4  M908Y 1/25/2018    20 Promotion Eligible
    6  BNL98 1/22/2018    20 Promotion Eligible
    10 CO43R 1/26/2018    21 Promotion Eligible

The subset() function is a convenient way to pull only the data that’s necessary and leave out columns and rows that have no bearing on the outcome.
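The same selection can also be written with bracket indexing, which R's own documentation for subset() recommends when writing code for reuse rather than working interactively. The sketch below shows that form and adds a simple check that every rating date falls in 2018. The names keep_rows, keep_cols and rating_year are our own, and the expected year is an assumption based on the dates used in this tutorial.

```r
# The columns needed for the promotion decision
empratings <- data.frame(
  empid = c("A12OI", "C90R2", "LOI98", "M908Y", "J908R",
            "BNL98", "EW09P", "QA214", "JU87Y", "CO43R"),
  date  = c("1/22/2018", "1/23/2018", "1/24/2018", "1/25/2018", "1/26/2018",
            "1/22/2018", "1/23/2018", "1/24/2018", "1/25/2018", "1/26/2018"),
  total = c(21, 20, 18, 20, 12, 20, 12, 17, 18, 21),
  performance = c("Promotion Eligible", "Promotion Eligible",
                  "Not Promotion Eligible", "Promotion Eligible",
                  "Performance Probation", "Promotion Eligible",
                  "Performance Probation", "Not Promotion Eligible",
                  "Not Promotion Eligible", "Promotion Eligible"),
  stringsAsFactors = FALSE
)

# Rows to keep (logical vector) and columns to keep (names)
keep_rows <- empratings$performance == "Promotion Eligible"
keep_cols <- c("empid", "date", "total", "performance")
promotionEligible <- empratings[keep_rows, keep_cols]
print(promotionEligible)

# Parse the month/day/year text and extract the year of each rating
rating_year <- format(as.Date(promotionEligible$date, format = "%m/%d/%Y"), "%Y")

# TRUE when every eligible employee was rated in the expected year
all(rating_year == "2018")
```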