About this talk

Watch how to transition from front-end to full-stack development using Node.js. After this talk you will be able to write efficient Node.js applications that can act as an API for your front-end.

Transcript

So I was talking to a recruiter on the phone recently, and he was telling me, "Hey, Ben, I see you have a lot of experience with Node." And I was like, "Yeah, it's pretty much what I do." And he asked me, "Do you know React as well?" You know, I've used it every now and then. And the next question he asked me was, "Do you know JavaScript as well?" Uh, no, never heard of it. I guess he just got his checklist upside-down or something, I don't know what went wrong there. But the basic idea that there are key differences between the front-end and the back-end, even though they both use JavaScript, is still true. And I want to talk to you tonight about the ones that I personally think are the most important to keep in mind when you come from a front-end background and want to do some server-side coding with Node. The things I want to talk about tonight are pretty much these, so we're gonna go through them in a little more detail.
But first, by a show of hands, have any of you ever done anything with Node whatsoever, a little Hello World at least, or something more? All right, so you all know Node already, at least a little, that's good. Nonetheless, I got this from the Node website, and it essentially says: Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. And three bits of that are actually important to us. The first one is that it's built on Chrome's V8 JavaScript engine, so if you've ever done anything for Chrome whatsoever, you've basically worked with the same JavaScript engine. The second one is event-driven. You all probably know the event loop and how it works; it's pretty much the same on the front-end as it is in Node, so you should be aware of it, and if you're not, read up on it, it's important. And the last thing is non-blocking I/O. Again, coming from the front-end, it's probably not a big surprise to you.
It probably comes very naturally to you. If you come from other back-end technologies like PHP, Java, Ruby, and so on, it's more of a big deal. But basically it just means you can handle a lot of I/O requests without blocking your process, which is quite important given that Node runs your JavaScript in a single process on a single thread.
Alright, moving on. If you've done any front-end work whatsoever, you've probably come across this meme at some point or another. And if you've been doing it for a little longer, you might even have come across this one. The good thing about Node is that you don't have to deal with that. You have a very controlled environment. If it runs on your server, it runs on your server. You define what your server is. You don't have to deal with all those different runtimes. The big difference is that on the front-end you take your code, you send it out to the client, and the client executes it, and you don't have any control whatsoever over that execution. Whereas with Node you do: it runs on your own server, and you have full control over that.
There are obviously a few exceptions to that. First off, it's not strictly true that you have only one runtime. You probably have your development machine, you might have a testing environment, you probably have a staging environment, and hopefully your production environment. And all of those can be different. A lot of you might use a MacBook to develop while your server runs Ubuntu, and those are essentially very different runtime environments. You might have a different version of Node on your local machine than on your server, and all those little things can make your code behave in unexpected ways. So you have to keep those in mind. But obviously you only have a handful of environments, and you have full control over them, so it's still a lot better than what you have to deal with on the front-end. The next thing you can run into is a dependency mismatch.
So you probably use npm. You run npm install for something on your local machine, you deploy to your server, and the server pulls it from npm as well. What can happen is that you get a slightly different version. By default npm allows minor updates, which usually shouldn't break anything; it's for security patches and so on, so no big deal. But every now and then it can make something happen that you didn't quite expect. So you should keep that in mind as well. And then, of course, there's third-party behaviour, and by third-party I mean everything like your databases, which might be MS SQL, Redis, or Memcached, and the third-party APIs you might use, you know, somebody to send off your emails or whatever. They might behave slightly differently in a development environment than they do in a production environment, so you have to keep that in mind.
All of those are really small issues that usually don't affect you at all. And on top of that, there are thousands of solutions for them. If you use something like Docker, that pretty much solves all three of them right away. If you use something like package-lock or shrinkwrap, it takes care of your dependencies. So there are lots of solutions for those rather small problems.
The next thing I want to talk to you about is essentially the bad thing about Node. So far we've said that on the front-end every user downloads the source code and runs it in their own runtime. With Node you have your own runtime, but every user accesses that runtime, so if anything goes wrong in it, it affects every single user. In the browser, if something goes wrong in one user's runtime, it just affects that user at that particular point in time. If you mess anything up on your server, you now have a problem. So basically, concurrency is nice but can be a problem. If you look at this example here, you have a simple API that takes a name or whatever as a parameter to reuse it at a later stage. You can do it like so.
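The slide shown at this point is not part of the transcript. As a rough sketch of the pattern being described, assuming a module-level variable and hypothetical function names (`rememberName`, `greet`, and the session-keyed variants are illustrative, not taken from the talk), it might look like this:

```javascript
// Illustrative sketch only; all names here are hypothetical.

// Module-level state: a single variable shared by every request
// that this Node process handles.
let currentName = null;

function rememberName(name) {
  currentName = name; // replaces whatever the previous caller stored
}

function greet() {
  return `Hello, ${currentName}`;
}

// A more carefully scoped variant: the value is stored per user,
// keyed by something that identifies the caller (assumed here to be
// a session ID).
const namesBySession = new Map();

function rememberNameFor(sessionId, name) {
  namesBySession.set(sessionId, name);
}

function greetFor(sessionId) {
  return `Hello, ${namesBySession.get(sessionId) ?? 'stranger'}`;
}
```

In the first variant there is exactly one `currentName` for the whole process, so the last caller of `rememberName` decides what everyone sees. The second variant keeps one entry per session, which is one possible way to limit who can read a value.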
The problem you will experience is that as soon as a second user uses your app, it will overwrite the name of the first user, which you probably don't want. So we have to keep that in mind. If you're working on the front-end, you're always very aware of your scopes in JavaScript, or at least you should be. You probably learned that you shouldn't put too much in the global scope, and that's even more important in Node. For everything you do, be aware of who has access to it, who should have access to it, and who shouldn't. In this case not everybody should have access to this particular value, just one user should. So think about the scopes of your variables very carefully.
On the other hand, you can do really cool stuff. You might do something like this, just a simple counter, and every time somebody requests something from your API, you increment your counter. And that works really well for the same reason, since everybody accesses the same runtime.
A friend of mine founded a startup a few years ago, and as it goes with startups, he didn't really have any money. So he just wrote the MVP himself. And he did something similar to this for session management: he implemented his own authentication and session management, and basically put the sessions in a map, in an object, similar to the counter here. Every time a user sent a request, it would include their authentication token, and he would just look it up in his map; if it's in there, then that must be the user. Fair enough, that works very well, you can do that, it's fine. But this startup actually got a lot of traction, and it was featured on Product Hunt and stuff like that.
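The counter and the in-memory session map from this story are not reproduced in the transcript. A minimal sketch of both, assuming hypothetical names (`countRequest`, `login`, `authenticate`) and random hex tokens, could look like this:

```javascript
const { randomBytes } = require('node:crypto');

// Shared counter: works because every request runs in the same process.
let requestCount = 0;

function countRequest() {
  requestCount += 1;
  return requestCount;
}

// In-memory session store: token -> user ID.
// This lives in the memory of ONE Node process only.
const sessions = new Map();

function login(userId) {
  const token = randomBytes(16).toString('hex'); // 32 hex characters
  sessions.set(token, userId);
  return token;
}

function authenticate(token) {
  // Returns the user ID for a known token, or null if this process
  // has never seen the token.
  return sessions.get(token) ?? null;
}
```

A request handler would call `countRequest()` once per request and `authenticate(token)` with whatever token the client sent. Both rely on all requests reaching the same process.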
So he hit the limit of his server, and what he did was just spin up a second instance and put a load balancer in front of it. What happened was that users suddenly had to log in every three requests or so; they got kicked out of their session all the time. Basically, the load balancer sent you to instance A of the server for maybe two or three requests, and then afterwards to instance B. Instance B would look up the token in its map and wouldn't find it, because instance B doesn't share its memory with instance A; they can't access each other's memory. So if you scale, you have to keep in mind that essentially everything that is shared between users, or must last longer than one request, can't be stored in the memory of a single process. To be clear, you can totally do that and it's fine. I do it all the time, but only as long as you have one instance. Keep that in mind, document it, and if you want to scale, you have to do it differently. There are tonnes of solutions. You can use something like Redis as an in-memory database, and then the load balancer goes to instance A or instance B, and instances A and B just access the same storage. As I said, for a single instance it's fine to do it that way: it's cheap, it's easy, it's straightforward. Just keep in mind that if you start to scale, you have to move it to a different storage solution.
The next thing I want to talk to you about is security. If you've done any kind of work on the front-end, you're probably aware of security issues. You keep them in mind, I hope so at least. But the security issues you come across on the front-end are vastly different from the ones you have on the back-end. And one of the biggest things I've found is user input. Whenever you take input from a user, you have to assume it's malicious. They will send you stuff that just doesn't make sense.
On the front-end that happens as well, but in the worst case they mess up their own runtime environment, 'cause, you know, they run it locally. On a Node server you hold a lot of data, and they can access a lot of stuff that you might not want to share with them. Take this simple example here, which is a Node server that allows you to download files. Essentially we just take the download parameter, look it up, and send back the file, whichever one that might be. And you can probably already guess what can happen here. For example, a user might try to download your source code. He could just download your source file, which might not even be such a big deal for you; you just say, "Okay, my code's open source anyway, whatever." But he might also access your config files and get the keys to your database or something similar. Or even worse, they could try to do something like this and get access to your server. So now you haven't just put your application in danger, but the whole server it runs on. So whenever a user gives you any kind of data, make sure to validate it, make sure it makes sense in the context you expect.
A few other things, the first one being, again, validate user input. A different story on that. I was building an API, and that API would take JSON as a request parameter. I think it was something like an update-profile call, and you could send a name and I think an age, a date of birth, something like that. And we would validate each of those fields, like the name makes sense, it's between three and 20 characters, all those kinds of things. But what happened then was that a user sent us JSON with all those fields, plus a whole lot of other JSON that didn't make any sense. So he sent us something like a one-megabyte JSON document with the three fields we wanted and just a tonne of other stuff that didn't make any sense whatsoever.
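Coming back to the file download example from a moment ago: the code on the slide is not in the transcript, but one common way to validate such a parameter is to resolve it against a fixed directory and reject anything that ends up outside it. This is a sketch under assumptions; `DOWNLOAD_ROOT` and `resolveDownload` are hypothetical names, and the directory is made up.

```javascript
const path = require('node:path');

// Hypothetical directory that holds the files users may download.
const DOWNLOAD_ROOT = path.resolve('/srv/app/downloads');

// Returns an absolute path inside DOWNLOAD_ROOT, or null when the
// requested name is empty, malformed, or points outside that directory.
function resolveDownload(requested) {
  if (typeof requested !== 'string' || requested.length === 0) return null;
  if (requested.includes('\0')) return null; // reject embedded NUL bytes

  const target = path.resolve(DOWNLOAD_ROOT, requested);
  const relative = path.relative(DOWNLOAD_ROOT, target);

  const escapesRoot =
    relative === '' || // the root directory itself, not a file
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows

  // Note: this check does not follow symlinks inside DOWNLOAD_ROOT.
  return escapesRoot ? null : target;
}
```

With this check, a request for `report.pdf` resolves to a path inside the download directory, while `../config.json` or `/etc/passwd` yields `null`, and the handler can answer with an error instead of reading the file.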
And what happened was, we were just parsing it and then validating it, which made sense, right? But the problem was that we'd parse this whole thing, the whole JSON, and it basically blocked Node for something like 100 milliseconds. I'm not sure if you're aware of that, but JSON parsing in JavaScript is reasonably efficient, yet it blocks the event loop. As long as it's parsing, Node can't do anything else. So when we got this one-megabyte user input, it basically made the whole app unresponsive for something like 100 milliseconds. So it's a super straightforward and easy denial-of-service attack that we experienced here. So again, make sure to validate your user input on all levels. In this example, if you take JSON, make sure that it's not bigger than it could reasonably be. Make sure it's not bigger than the few kilobytes you expect before you start parsing it. And all those little things. When users send you something, it's probably malicious. Always work with the assumption that they're trying to mess with you in some way or another.
The other thing is dependencies; we talked about that earlier. Sometimes npm gives you minor updates, and in general you probably have a lot of dependencies, and you probably didn't actually go through the code, read it yourself, and audit it. So you run code on your server without fully knowing what it does. You have to be aware of that and look into it. There are cool projects that parse your package.json for you and tell you whether there are known vulnerabilities in your dependencies. Those are really helpful, so you might want to use them to make sure you don't run code that makes your project vulnerable even though your own code is actually safe.
Then the other thing is rate limiting, which kind of goes with what I talked about earlier with the JSON. DDoS is a real problem, and a lot of applications experience it.
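The size check on the JSON input described above can be sketched as follows. The 10 KB limit and the names `MAX_BODY_BYTES` and `parseJsonBody` are assumptions for illustration; the talk only says "a few kilobytes".

```javascript
// Assumed upper bound for a profile update; tune it to your own payloads.
const MAX_BODY_BYTES = 10 * 1024;

// Checks the size BEFORE calling JSON.parse, so an oversized payload
// never gets the chance to block the event loop while being parsed.
function parseJsonBody(rawBody) {
  if (typeof rawBody !== 'string') {
    return { ok: false, error: 'invalid JSON' };
  }
  if (Buffer.byteLength(rawBody, 'utf8') > MAX_BODY_BYTES) {
    return { ok: false, error: 'payload too large' };
  }
  try {
    return { ok: true, value: JSON.parse(rawBody) };
  } catch {
    return { ok: false, error: 'invalid JSON' };
  }
}
```

In a real server you would probably also enforce the limit while the body is still being received, so that a huge request is cut off early instead of being buffered in full. Field-level validation (name length and so on) would then run on `value`.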
An easy solution to that is just to rate limit all your calls. If you're doing an API or a website, just put rate limiting in; it takes you five minutes, it's super helpful, and most of the frameworks either support it natively or have a plugin that does it for you. So, easiest thing to do, really helpful.
And the last thing: verify authentication. There are two bits to that. For resources that are just for users, make sure the user is logged in, but also make sure the user is who he says he is. What I saw not too long ago was that somebody always verified that the user was logged in and had a session, but didn't check that the user was who he said he was. I think it was again around editing a profile or something like that. In the request the user would just change the user ID, and the server didn't check whether the session I'm logged in with belongs to the same user whose resource I'm requesting to change. So always make sure not just that the user is logged in, but also that the user is who he says he is. It's a simple thing to forget, but it can be quite, you know, not so good for you.
Right, isomorphic websites. Whenever you come from the front-end, one of the big things is: I can reuse the JavaScript code from the front-end on the server side, so we can kind of share the code. This is what you call isomorphic websites, and the one big use case for that is that you can render the pages both on the client and on the server. Essentially what that means is, if you do front-end right now, you probably use something like React or Angular; the browser downloads all the assets and your source code, builds the DOM, and then starts rendering it, which can take quite a while. So the idea is that you build the DOM server-side, send the static DOM to the client, and then send all the assets and so on, but the client can already start rendering a lot sooner.
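Going back to the authentication point above, the two checks (is there a session at all, and does it belong to the user whose resource is being changed) can be sketched like this. The function name, the session shape, and the status codes chosen are assumptions for illustration, not taken from the talk.

```javascript
// Decides whether the logged-in user may update the given profile.
// `session` is assumed to be whatever your session layer attaches to
// the request, with a `userId` field; `targetUserId` comes from the
// request itself and must therefore be treated as untrusted.
function authorizeProfileUpdate(session, targetUserId) {
  // Check 1: is anybody logged in?
  if (!session || session.userId === undefined || session.userId === null) {
    return { allowed: false, status: 401 };
  }
  // Check 2: is the logged-in user the owner of the resource?
  // IDs are compared as strings so that 42 and "42" count as the same user.
  if (String(session.userId) !== String(targetUserId)) {
    return { allowed: false, status: 403 };
  }
  return { allowed: true, status: 200 };
}
```

The bug described in the talk corresponds to doing only the first check. A simpler alternative design is to ignore the user ID in the request entirely and always use the one from the session.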
So you can improve the performance and the user experience for your end users. That can be really powerful, and I think most of these frameworks, like React and Meteor and so on, do that for you, so you don't have to worry about it, which is really, really powerful. But that's pretty much the only thing you can use on both the client side and the server side. There are a few things like form validation and a few others, but generally speaking the stuff you do on the server side is vastly different from the stuff you do on the front-end. So it doesn't make much sense to reuse it anyway. For this single use case it's great, and you should use it for that. For everything else, it's probably not all that helpful after all.
Right, let's talk about when Node actually makes sense. It's great for a lot of things, but there are a few things it's not so great at. As we already said, if you have a lot of concurrent requests and it's I/O-heavy, Node is great, because of the event loop. I'm not sure how well you know other stacks such as PHP or Ruby on Rails. But essentially, whenever somebody makes a request to your page, you create a new process and keep this process open until the request is finished. So if you have 100 users using a page, you have 100 processes, which can take up quite a lot of your server's resources. Whereas Node always has just this one process; it keeps a reference to your request and puts it on the event loop, and that takes up a little bit of memory, but not much more. So it's quite good at that compared to most other solutions. For pretty much the same reason, it's quite good at real-time applications as well. So if you do chatrooms, HTML5 games, or anything like that, Node is probably great for you on the back-end. However, for everything CPU-heavy it's usually a bad solution.
Because it can't deal with more than one thing at a time, since it's all in one process, if you do anything that takes a lot of CPU cycles it can't do anything else at the same time, which means you can't take any other requests and your app becomes unresponsive. So if you have anything like image processing, video processing, or such, you can't really use it for that. With that in mind, you can obviously just build a small service that does your image processing for you and calls back into Node. So essentially Node just says, "Here, can you do that for me? And once it's done, just tell me you're done," and then Node can go on and give the result back to the client. But just don't do it directly in Node, it's not the best idea.
Another thing is serving static content. You can do it, it's perfectly fine, but it's not the best use of your Node application. You want it to focus on its core workload, which is your dynamic content. And that's one of the big reasons why you usually see NGINX or something similar in front of a Node application: it can serve all the static content for you and free up your Node application to take care of the things you really care about. For small apps it really doesn't matter, but if you want to get the most out of your Node server and, you know, improve your performance a little, then make sure to serve the static content through other means.
Right, to go over what we talked about. We don't have an Internet Explorer in Node, which is awesome. You have to be careful about your scopes, 'cause otherwise you share data you really don't want to share. Be careful what you store in memory if you want to scale. Always validate your user input. Pre-render your sites on the server side if you can, if you do websites. And use Node for everything high-concurrency, but not so much for CPU-heavy stuff. Alright, thank you. I hope that was helpful for some of you.