After surviving the first Valentine's Day with my girlfriend, I felt like I had the romantic planning down to a science. Buy three sunset-colored roses. Take a stroll through Central Park in New York City. Indulge in a single overpriced drink at some rooftop bar with a view. End with dinner at her favorite French restaurant.

It was all going according to plan again on Valentine's Day this year — until I decided to splurge on an Uber to take us back home.

We sat together holding hands, making small talk with the driver about whether he'd caught anyone getting a little too intimate in the backseat that night. As we pulled up to a red light a few blocks from our apartment, the driver turned to me and said, "You seem like a nice guy, so why do you have a 4.5 rating?" I felt my girlfriend's hand loosen its grip on mine ever so slightly in shock. My "nice guy" image was blown to hell.

Anyone who has hailed a ride with Uber or Lyft, or requested household services from fast-growing on-demand startups like Handy, is probably accustomed to the many prompts to rate professionals on a scale of one to five stars. Less known, at least until recently, is that these professionals are rating you too.

This two-way rating system was popularized by Uber in 2010 and adopted by its growing number of billion-dollar startup peers. Unlike the peer ratings eBay has offered for years, these scores are not the verdict of some faceless reviewer but a collection of ratings from people you've met in the flesh.

"In eBay's case it's mostly about the transaction. What's novel about this is the face-to-face piece and the immediacy of it," says Ravi Dhar, a professor at the Yale School of Management who focuses on consumer behavior. "That creates some anxiety — and how people deal with the anxiety might differ."

My Uber rating is 4.7 and I'm kind of bummed I'm not a 5. Where did I go wrong? — Daniela Cadena (@DanielaCadena) September 22, 2015

My Uber rating is 4.9. I wonder what I did to lose the 5 rating. Probably that time I ate my baon in the car and it stank. But I apologized! — Camille (@camiejuan) September 23, 2015

My uber rating is a 4.4 and I'm feeling pretty salty about that 0.6 — Emmy Jo Favilla (@em_dash3) September 22, 2015

Grad School: Hey what's your GPA Me: Well my Uber rating is a 5 and I think that says a lot — Alex Miller (@AlexTheMiller) September 29, 2015

The douchebag matrix

The goal of the two-way ratings system, as one spokesperson for Handy put it ever so delicately, is to "incentivize excellence on both sides of the supply and demand equation." In reality, it can lead both sides to feel unfairly judged — with potentially serious consequences.

For the contractors who work with these startups, enough bad ratings (and bad is really anything below 5 in this grade-inflated world) can get them temporarily deactivated, threatening their very livelihood. For customers, a particularly bad overall rating can make workers think twice before picking them up or offering service, effectively limiting their access to the increasingly ubiquitous "on-demand" economy.

"Personally, I don’t select what rides I accept based on a passenger's rating, but there are a good number of drivers that do," says Jonah Price, a driver for Lyft who has dabbled with Uber as well.

That is more of an extreme scenario, however. More common: a bad rating may simply dent your sense of self.

"It's a douchebag rating of some kind," says Peter Ashlock, a longtime Uber driver based in San Francisco.

Interviews with and informal surveys of on-demand workers, combined with reviews of multiple online forums for drivers, reveal a dizzying and unexpected range of reasons for giving customers a lower rating. On the short list: the customer doesn't tip, fails to show up at the right time and place, takes too short a trip (a surprisingly common reason among Uber drivers) or is simply too disrespectful, too rude or too damn drunk.

"It's a lot of small things," says Harry Campbell, a driver for Uber and Lyft and creator of The Rideshare Guy, a popular blog for drivers in the industry. "If you are slamming the doors, and trying to be real nosey with directions, that's when I start doing two or three-star ratings."

Ashlock sums up the thinking of drivers best: "It comes down to the sense that this person doesn’t think I’m present — that there's not another human being present to offend."

Uber, but for unintended consequences

Five years after Uber launched, ratings are on the cusp of jumping the shark. We are not just rating (and being rated by) drivers, but also food delivery workers, housecleaners and plumbers. And soon, everyone: a new, stunt-like app called Peeple appears to be intended for rating friends, co-workers and, well, people, using the same five-star scale as Uber and Yelp. It seems too aggressive to be true.

“This is about abundance for all, and lifting people up and finding out who the really good people are," Julia Cordray, the app's co-creator, explained in an introductory video. "I’ve never believed in anything more than I believe in this product."

It probably won't be that simple. A chief complaint among the workers and users we spoke with is that ratings are often a sloppy science. Many say they reflexively give four- or five-star ratings, only to realize after the fact that the person didn't deserve it. Some complained of fat-finger syndrome: pressing the wrong star. Still others succumb to peer pressure from contractors or customers pleading for a better rating.

Uber, for its part, frowns on this. In one stock email to a driver reviewed by Mashable, the company advises: "Never ask for a 5-star review, but focus instead on providing an excellent experience."

That fear of a bad rating can nonetheless incentivize questionable behavior, whether it's passengers forcing themselves to be overly friendly (one driver complained of men calling him "pal" and making excessive small talk like he's "the shoe shine man") or drivers who speed a little too much or park in traffic to avoid asking the passenger to wait or walk a longer distance to the car.

In some cases, the fear of a bad rating may also undermine one of the better selling points of Uber and Lyft: preventing DUIs. The reason: some drivers struggle with drunk passengers' behavior, and with the ratings those passengers give, so they steer clear of late-night shifts.

"There's definitely drivers who have tried doing the late nights and for whatever reason they couldn’t because their ratings were much higher during the day," says Campbell. "You often find drivers who for whatever reason don’t mesh well with that crowd at night."

For some, this journalist included, the ratings can lead to positive changes. News of my disappointing 4.5 star rating led to some soul searching. I cringed remembering the times I kept drivers waiting a few minutes too long or talked too loudly on my phone in the backseat. I pledged to do better: asking drivers about their work, texting rather than chatting on the phone and pushing myself to be punctual.

Seven months after that fateful Valentine's Day night, I requested my rating from Uber. A few hours later, I nervously opened the email to find my new personality score: 4.6.

Progress comes slowly.