Wednesday, March 19, 2008

How to become a Unix Administrator

So you want to become a Unix Jedi?  Plenty of things have been suggested as requirements, but here is the short list:

  • Read a lot of books
  • Enjoy Unix
  • Find a mentor

The bulk of the Unix System Administrators out there are self-taught to varying degrees.  If you want to be a great System Administrator you need to love to learn and spend a lot of your free time doing so.  Beyond reading you will get a lot of mileage out of just doing it.  Run your own unix server for fun - if you don't have the spare hardware, download VMware Server, and run your server in a Virtual Machine.  

You need to find something that interests you.  If you don't already have a project in mind, here are some ideas:

  • Web Server (Apache)
  • FTP Server
  • Email server (First using sendmail, then try it with postfix)
  • DNS Server


There are those who would recommend running some flavor of unix as your workstation OS.  If that is something that excites you, then that would make a great project for you.  

Many believe that finding a mentor is very important to success, but that can be easier said than done.  If you don't have a mentor you can often find great assistance in online communities.  I would highly recommend joining SAGE and USENIX.  

I've collected a recommended reading list at the bottom of this message to help you get started on your journey.  Go read through the reviews online and decide which books are right for your current ability level.  The list below runs from beginner to advanced, with some great references thrown in for good measure (the majority of these were recommended reading on the SAGE Members mailing list). 


Unix Books

  • Unix in a Nutshell, Fourth Edition by Arnold Robbins
  • Unix Programming Environment by Brian W. Kernighan, Rob Pike
  • The UNIX Philosophy by Mike Gancarz
  • Essential System Administration, Third Edition by Æleen Frisch
  • Unix System Administration Handbook by Evi Nemeth
  • Unix Power Tools, Third Edition by Shelley Powers, Jerry Peek, Tim O'Reilly, Mike Loukides
  • The Practice of System and Network Administration (2nd Edition) by Thomas A. Limoncelli, Christina J. Hogan, Strata R. Chalup

FreeBSD Specific Books
  • Absolute BSD: The Ultimate Guide to FreeBSD by Michael Lucas, Jordan Hubbard
  • The Complete FreeBSD: Documentation from the Source by Greg Lehey
  • The Design and Implementation of the FreeBSD Operating System by Marshall Kirk McKusick, George V. Neville-Neil

Thursday, June 21, 2007

Walled Garden: FreeBSD + natd + ipfw + squid

This is going to be an overview of the steps it takes to create a Walled Garden using FreeBSD, natd, ipfw and squid.

The basic scenario: you have a private IP network that you want to allow people to connect to, and you give them basic web access (we'll just do port 80 for now). By default you only want to allow these users to reach certain URLs - anything else gets redirected to your "portal" page. Presumably your portal would have software that handles account signups and the like, and once you authorize an IP you would allow it to connect to anything on the internet. Portal design won't be discussed here, but I will show you how to punch a hole through the firewall.


For this exercise we are going to have a private IP network and a public IP. Splitting off a management IP is highly advisable, but that won't be covered here.

Our private IP network is going to be 10.7.0.0/16, and our "public IP" is going to be 192.168.0.1 (which is really a private address, but ignore that - when deploying this, substitute a real public IP).

First things first: you need to make sure your kernel has some options compiled into it. Before doing anything else, go compile these in right now (a rough build recipe follows the list):

options IPFIREWALL
options IPDIVERT
options IPFIREWALL_FORWARD
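
If you have never rolled a custom kernel before, the dance goes roughly like this - a sketch only, assuming your sources live in /usr/src, an i386 box, and a config name (WALLEDGARDEN) that I made up for this example:

# cd /usr/src/sys/i386/conf
# cp GENERIC WALLEDGARDEN
# vi WALLEDGARDEN            (add the three options lines above)
# cd /usr/src
# make buildkernel KERNCONF=WALLEDGARDEN
# make installkernel KERNCONF=WALLEDGARDEN
# shutdown -r now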

Once you install that kernel and reboot your server, we can proceed with configuration.

For the next step let's go ahead and install squid. This can be done using whatever method you prefer for installing software, but I'm going to show the package method because it's so simple:

# pkg_add -r squid


And that will get your squid installed.

For my installation the public interface is em0 and the private is em1.

Put the following in your rc.conf:

defaultrouter="192.168.0.254" #make sure to put YOUR defaultrouter in, not this one
hostname="wall.yourdomain.com"
ifconfig_em0="inet 192.168.0.1 netmask 255.255.255.0" #again, your IP, and your netmask
ifconfig_em1="inet 10.7.255.254 netmask 255.255.0.0" #we are setting 10.7.255.254 to be the gateway of our walled garden machines
gateway_enable="YES"
firewall_enable="YES"
firewall_script="/etc/rc.ipfw-walledgarden"
natd_enable="YES"
natd_interface="em0"
squid_enable="YES"


A note here - if you confuse your internal (private) and external (public) interfaces, you are not going to be able to pass traffic from inside your private network to the world. You can waste a lot of time fighting the silliness of having typed em1 instead of em0 (or vice-versa).

Now let's edit /usr/local/etc/squid/squid.conf
(Edit: Squid changed its config format at 2.6; I've modified this entry to include both versions)
Squid <2.6:
acl garden_customers src 10.7.0.0/16 127.0.0.1
http_access allow garden_customers
http_reply_access allow all
httpd_accel_host virtual
httpd_accel_uses_host_header on
httpd_accel_with_proxy on
ie_refresh on
redirect_program /usr/local/bin/walled_garden
Squid >= 2.6:
acl garden_customers src 10.7.0.0/16 127.0.0.1
http_access allow garden_customers
http_reply_access allow all
http_port 127.0.0.1:3128 transparent
ie_refresh on
redirect_program /usr/local/bin/walled_garden

This sets squid up to act as a transparent proxy for the relevant networks, and hands off the job of deciding which URLs to redirect to an external program named "walled_garden" (included later).
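
To make the hand-off concrete: squid writes one request per line to the redirector's STDIN, roughly "URL client-ip/hostname ..." (the exact trailing fields vary a bit between squid versions, so treat this as illustrative), and reads the rewritten line back on STDOUT. For an unauthorized client browsing somewhere random it looks something like this:

squid sends:    http://www.example.com/index.html 10.7.0.23/- - GET
script replies: http://portal.yourdomain.com 10.7.0.23/-

That reply is what the walled_garden script included below produces - the client gets bounced to the portal no matter what they asked for.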

Here is a sample you can use to start yourself off for /etc/rc.ipfw-walledgarden:
#!/bin/sh


ipfw="ipfw -q"

$ipfw -f flush

private_if="em1"
public_if="em0"
public_ip="192.168.0.1"
private_ip="10.7.255.254"


$ipfw add 00050 divert natd ip4 from any to any via $public_if


#Setup loopback
$ipfw add 00060 allow ip from any to any via lo0
$ipfw add 00061 deny ip from any to 127.0.0.0/8
$ipfw add 00062 deny ip from 127.0.0.0/8 to any

#allow our firewall to talk DNS directly
$ipfw add 00070 allow udp from $public_ip to any 53
$ipfw add 00071 allow udp from any 53 to $public_ip

$ipfw add 00074 allow tcp from $public_ip to any 80
$ipfw add 00075 allow tcp from any 80 to $public_ip

#Allow icmp to the gateway IP, deny everything else from private network from talking to gateway
$ipfw add 00100 allow icmp from 10.7.0.0/16 to $private_ip
$ipfw add 00101 deny ip from 10.7.0.0/16 to $private_ip

#This needed?
$ipfw add 00120 allow tcp from 10.7.0.0/16 to $private_ip 3128

#Allow dns for private network
$ipfw add 00130 allow udp from 10.7.0.0/16 to any 53 via em1
$ipfw add 00131 allow udp from any 53 to 10.7.0.0/16

#An authorized client
$ipfw add 10000 skipto 65000 ip from 10.7.0.1 to any


#walled garden - forces through transparent squid proxy
$ipfw add 64000 fwd 127.0.0.1,3128 tcp from 10.7.0.0/16 to any dst-port 80

#Allow web for private network
$ipfw add 64140 allow tcp from 10.7.0.0/16 to any 80 via em1
$ipfw add 64141 allow tcp from any 80 to 10.7.0.0/16


#$ipfw add 65000 allow log logamount 1000 ip from any to any
$ipfw add 64500 deny log logamount 1000 ip from any to any

$ipfw add 65500 pass all from any to any


This is a really rudimentary firewall setup; it just allows DNS and port 80 web traffic through. Rule 10000 is the one specifically exempting the IP 10.7.0.1 from being trapped in the walled garden. It does this by skipping past the section of rules that does the trapping and jumping straight to the end of the ruleset, where it gets to do whatever it wants. For a production deployment you should also get more serious about protecting the firewall box itself.
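
Whatever signup software your portal runs just has to add and remove rules like rule 10000 on the fly. A minimal sketch, assuming a newly authorized client at 10.7.0.42 (an address I made up) and that you reserve the 10000-63999 range for per-client rules:

# punch a hole - let this client's traffic skip the walled garden section
ipfw add 10001 skipto 65000 ip from 10.7.0.42 to any

# close it again when the session ends or the account is disabled
ipfw delete 10001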

The last bit you need is your walled_garden script, which lets you decide what is good and what isn't. I've written a pretty lame one, but it does the trick as an example:

Contents of /usr/local/bin/walled_garden:
#!/usr/bin/perl

use warnings;
use strict;
use Sys::Syslog qw(:DEFAULT setlogsock);
setlogsock('unix');
openlog("walled_garden", "pid", "auth");
syslog('info', "started");

my $portal = 'portal.yourdomain.com';

$| = 1;

# squid hands us one request per line: the URL followed by client-ip/hostname
while(<>) {
chomp;

my ($orig_url, $ip, $host) = (m#^(\S+)\s([^/]+)/(\S+)#);

my $new_url = $orig_url;
my $accept = 0;
my ($mech,$request) = $new_url =~ m#^(\w+)://(\S+)#;

if ($request =~ m#^(?:portal\.yourdomain\.com|www\.yourdomain\.com|webmail\.yourdomain\.com/webmail)#) {
$accept = 1;
}


unless ($accept) {
$new_url = $mech . '://' . $portal;
}

my $out = $new_url . " " . $ip . '/' . $host;

syslog('info', "client:$ip host:$host requested $orig_url, sending to $new_url");
print "$out\n";
}
syslog('info', "finished");

And that is it! The general gist isn't hard - but any detail you mess up can be quite tricky to troubleshoot, so move in baby steps if you have to. When setting up your firewall I'd recommend this trick - first turn your firewall off, then:
sysctl net.inet.ip.fw.enable=1 && sleep 10 && sysctl net.inet.ip.fw.enable=0
It will turn your firewall on for 10 seconds and then automatically shut it off. If you are working remotely this is a very good thing, as botching your firewall rules and locking yourself out is very frustrating.

Thursday, May 24, 2007

Reputations and Accountability

The internet is a wonderful tool, but the reality is that it's a hostile environment. There are a lot of bad actors out there trying to cause mischief, and the internet provides them a vast playground.

In the spam arena we've put a lot of work into RBLs (lists of naughty IP addresses), but I believe it's time we took this to the next level. All spam filtering companies collect IP based statistics, and identifying the individual bad actors doing the sending isn't terribly difficult - but we can do better.

IP addresses are assigned through regional registries such as ARIN - and you can look this information up for any given IP. This ties the IP to a network that was assigned to a specific entity (and possibly delegated) - what this represents is the chain of accountability for that IP space. It is time to get really serious about combining that registry data with our spam statistics and light a more serious fire under network owners.

We need a new generation of publicly available tools for holding these organizations to account. My expertise is spam fighting, but this holds just as true for security threats - networks that originate hostile attacks need to be held to account just as much as the spam networks do. The registries give us physical addresses and possible company names - add in some other databases, start seriously applying reputation scores, and get these in the public eye. Some parts of the internet are always going to be cesspools - let's identify them and build a framework that responsible network administrators can use to start walling off the worst of it.

I would particularly like to see a reputation score like this prominently displayed in google search results for a company. Let those search results give the searcher fair warning that they are about to step into the internet slums.

Wednesday, May 16, 2007

nagios

Another application I would like to draw your attention to is nagios. Nagios is an open source monitoring tool that has all the bits and pieces you need to be able to keep tabs on your environment.

One of the first complex applications I built was a suite of monitoring programs I called Imp (never publicly released), and for its time it was nice and all, but it was entirely command line driven and extending it wasn't the friendliest thing. Then one year we came across Nagios, found it was an exceptional replacement across the board, and phased Imp out. I'm bringing this up to highlight an important lesson - you will come up with a brilliant tool, and then someone will come up with a better one. You need to retain objectivity and know when it's best to put your pride aside and move to the better solution.

So back to nagios - the tool has some great logic built into it: by supporting both active and passive checks you can monitor anything, even behind NAT. The main genius of nagios is the ease of writing checks. They defined a very easy to use standard for plugins (a line of status text on STDOUT plus an exit code) and they let NRPE/NSCA do all the network heavy lifting. So you have a script you can run on its own and it'll tell you whether what you are checking is working or not - all in all it makes building and debugging new checks a snap.
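
To give you a feel for how little a check needs, here is a minimal sketch of a plugin. Everything about it is hypothetical (the name, the spool path, the thresholds) - the only real requirement is the plugin contract: one line of status text on STDOUT and an exit code of 0 (OK), 1 (WARNING) or 2 (CRITICAL):

#!/bin/sh
# check_spool - hypothetical example: alert when a queue directory gets too deep
SPOOL=${1:-/var/spool/myqueue}
WARN=${2:-100}
CRIT=${3:-500}

COUNT=$(ls "$SPOOL" 2>/dev/null | wc -l | tr -d ' ')

if [ "$COUNT" -ge "$CRIT" ]; then
    echo "CRITICAL - $COUNT items in $SPOOL"
    exit 2
elif [ "$COUNT" -ge "$WARN" ]; then
    echo "WARNING - $COUNT items in $SPOOL"
    exit 1
fi

echo "OK - $COUNT items in $SPOOL"
exit 0

Drop something like that in with your other plugins, point a service definition (or NRPE on the remote box) at it, and you are monitoring one more thing.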

This was a great example of elegant design: by making checks so easy to write, they pushed us to adopt a policy - if an outage occurs that we didn't catch via monitoring, then you must write a check that will find it next time. This policy has been in place for a number of years, and we almost never experience an outage of any kind without the monitoring service letting us know about it.

Another note I'd like to bring up is the value of pairing heavy RRD use (with Cacti if you'd like) with Nagios. Graphs are great, but they tend to just grow in the corner where no one looks at them. Nagios gives you an easy mechanism to create alerts that make an admin go look at those graphs - ideally long before anything is service impacting.

RRDtool

Understanding what is really going on with a complex system is a tough job - fortunately there is a relatively easy to use tool that can help you paint a visual picture.

That tool is RRDtool, written by Tobias Oetiker. An RRD is a "Round Robin Database"; the principle is that you collect data over time, and the most interesting data is the recent data, but you still want a general picture of the historic data. What this means is that you can define a data set to have, say, a granularity of 15 minutes for the last week, but by the time you are looking at data from a year ago it only has a 12 hour granularity.
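
To make that example concrete, here is what creating such an RRD looks like - the data source name (iops) and the heartbeat are placeholders I picked for illustration:

rrdtool create diskio.rrd --step 900 \
    DS:iops:GAUGE:1800:0:U \
    RRA:AVERAGE:0.5:1:672 \
    RRA:AVERAGE:0.5:48:730

The first RRA keeps 672 fifteen-minute samples (one week at full resolution); the second averages every 48 samples into a 12-hour point and keeps 730 of those (roughly a year).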

The beauty of this tool is that once you get it set up and start shoving data into it, it manages all the magic of compressing your data over time for you. Just keep pushing things in and it'll take care of the rest.

Now if this was all it did, it wouldn't be terribly useful - where its real power comes in is the ability to generate graphs. Instead of just seeing a giant pile of numbers in your reports, you can actually see how your operation flows over time. This can help you zero in on problem areas, particularly the ones that happen at 2 am when no one is watching all that closely.
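
Feeding the database and drawing the picture are each one small command - a hedged sketch, reusing the hypothetical diskio.rrd from above:

# push a sample in (N means "now") - typically run from cron or a collector script
rrdtool update diskio.rrd N:1234

# render the last day (86400 seconds) as a PNG
rrdtool graph diskio.png --start -86400 \
    DEF:iops=diskio.rrd:iops:AVERAGE \
    LINE2:iops#0000ff:"disk iops"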

Another huge value of having tons of data shoveled into these RRDs all the time is that many problems don't spring up overnight. When a service starts flaking out and you don't know why, you can go look at your graphs and see "oh wow, disk I/O has been increasing every day for the last week" - and zero in on the action you need to take much more rapidly. If you aren't collecting that data, all you would know is that you are now out of disk I/O, and you might assume your only option is to add disks, because you can't see that the problem really started a week ago when you added a poorly written application that is slowly gobbling up all your I/O.

What I'm trying to get across with this series of blogs is that there are some excellent tools out there, and you can build really amazing things if you know how to find and then use them. I started this all off by talking about how I'm a perl junkie - the fact is perl is exceptional at stringing all kinds of totally unrelated applications together to make something new and uniquely useful.

On a closing note relating to RRD, I would encourage you to also take a look at Cacti. The cacti developers have put together a pretty slick tool with tons of templates, so if you have a relatively standard environment (routers, switches, servers, etc) odds are good you can drop Cacti in and hit the ground running with great graphs of the most important things you need to keep an eye on. You should spend your time solving new and interesting problems, not re-solving old ones.

MySQL

Next on the list is an application that shouldn't be a stranger to most - MySQL.

Before diving in, I should note that the real powerhouse here is SQL, and there are other options such as Postgres that could almost certainly do as well or better.

The reason I wanted to bring this up is that having a powerhouse database server sitting behind you when you are trying to put applications together is amazing. There are a ton of data management problems you solve in one fell swoop just by choosing MySQL as the back end for your data.

If you have an application that needs hundreds of thousands of entries added per day, and has to manage that insane quantity of information so you can use it productively, then MySQL is a great tool.

A thing to note here - if you have an application that doesn't need any persistent data and doesn't ever need to interact with anything, then of course you don't need a relational database added to the mix. But the really interesting projects tend to be complicated, so I lean on MySQL a lot. In my last blog I talked about postfix - a key lesson I took away from postfix is that it's great to write small apps that each do a targeted job, and MySQL is a great enabler of this approach. Say you have an application that requires real time threat analysis and triggers actions to stop hostile attackers. Using MySQL as the point of IPC (Inter-Process Communication) lets you split the job up into relatively easy to manage chunks: agents that collect your data and feed it back, a series of analysis scripts that look for trends in that data, and finally scripts that turn that analysis into actions.

One of the big weaknesses of relying too heavily on MySQL is that it creates a central point of failure for your application. I hate designing applications with this kind of dependency in them - fortunately there are ways to deal with it too. First, I like to use queues for data that needs to be written to a SQL server. Rather than make a mission critical service (like email delivery) depend on the MySQL server, I have my application dump its logs into a queue, and a secondary program processes that queue of log data and loads it into the SQL server. That way, if the SQL server is down, service still progresses just fine, and when it comes back up it gets all the logs sent over to it, so it doesn't miss anything.
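
Here is a minimal sketch of that queue-draining idea. Everything in it is an assumption for illustration - the spool path, the database name, and the notion that each queued file holds ready-to-run INSERT statements - but it shows how little machinery is involved:

#!/bin/sh
# drain a spool of SQL files into MySQL; run this periodically from cron.
# Assumes ~/.my.cnf (or similar) supplies credentials for the "maillogs" database.
SPOOL=/var/spool/sql-queue

for f in "$SPOOL"/*.sql; do
    [ -f "$f" ] || continue        # spool may be empty
    if mysql maillogs < "$f"; then
        rm "$f"                    # only remove a file once its load succeeded
    else
        break                      # SQL server down? leave the queue alone and retry next run
    fi
done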

If you are using your SQL server to drive the configuration of applications, there are other tricks to get around that dependency. The most basic one is to keep all of your configuration in SQL, then write scripts that generate application-specific configuration files and distribute them to your remote servers. This way you get all the wonderful advantages of SQL for managing your data, and none of your services depend on the SQL server being up in order to function.
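
A sketch of that configuration-generation trick, with made-up database, table and host names - the point is that the mail server only ever reads a flat file and never has to talk to MySQL:

#!/bin/sh
# rebuild a postfix lookup table from SQL and push it out to a mail server.
# "mailconfig", the "domains" table, and mx1 are all illustrative names.
OUT=/tmp/relay_domains

mysql -N -e 'SELECT name FROM domains' mailconfig | \
    awk '{print $1 " OK"}' > "$OUT"

scp "$OUT" mx1.yourdomain.com:/usr/local/etc/postfix/relay_domains
ssh mx1.yourdomain.com 'postmap /usr/local/etc/postfix/relay_domains'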

The preceding all underscores an important reality for many applications - the application must have 99.99% uptime, but the ability to modify settings for that application has a whole lot more breathing room. To illustrate: how mad are you if you can't get your email? How mad are you if you can't change your password?

I'm trying to communicate a lot of best practices in these blogs - you can solve any problem a whole lot of different ways, but I would encourage you to look over these ideas and at least think of them when you are diving in to make your own "useful things".

Tuesday, May 15, 2007

Postfix - better than sendmail

I'll dive into the C# stuff as I actually get a clue what I'm talking about, in the mean time I'm going to start going over a number of the tools I've used over the years that I really like and want to share some information on them.

To start this off I'd like to talk to you about postfix. Postfix is an incredibly powerful mail server designed by Wietse Venema. You can read up on its history on their site, but it is a much nicer tool to use than sendmail in every way I've ever cared about.

I ran my ISP on sendmail prior to 2002, and it worked well enough. Upgrades were a pain in the neck, and security exploits were disturbingly common. Making configuration changes was horrifically complicated. Sendmail is extremely powerful, but it makes everything far too complicated. I like to build things I can hand off to someone else so that I don't have to maintain them for all time - sendmail's configuration complexity made that very hard to do.

When we first wrote mailarmory we put it together using milters, which are a very cool idea. The very first iteration we tried used mimedefang from Roaring Penguin Software. It was an excellent tool, but it couldn't fix the underlying weaknesses of sendmail. If anything the situation got worse with the added complexity of milters.

For the uninitiated, milters give you enormous amounts of control over everything that happens in an SMTP conversation. This could let you do something like reject all mail at the door that contains the word "monkey" and send a response like "550 Server does not permit the discussion of monkeys".

I really don't want to trash talk sendmail - it's a great application, and many people have had great luck using it. But postfix was designed by someone who took a really good look at the weaknesses of sendmail and built something better. One of the first things I'd like to draw attention to is the compartmentalized nature of postfix. Sendmail is basically one executable that does everything remotely related to mail. By contrast, postfix has lots of smaller daemons that each just do their little part. This compartmentalization has allowed each one of these little daemons to be optimized and very well secured. It's hard to make a huge application immune to exploitation; it's relatively easy to secure simple programs.

One of the big differences between sendmail and postfix is how the message queues themselves are handled. Postfix now allows control over "before queue" filtering via the recently added Postfix milter support or the Postfix policy daemons, but back in 2001 neither of these existed, and the only option for postfix was after-queue filtering.

This distinction is really important - once you queue a message, you have accepted responsibility for it, for better or worse. So which is better - before-queue filtering or after-queue filtering? The answer isn't as clear as you might think, because content filtering is really expensive processing-wise, and anything that makes your SMTP conversation take longer kills your performance. Ideally you would want all content filtering to happen before the queue, and you would want it to take no time at all, so that your server could handle an unlimited amount of email. The sendmail model we originally employed worked more or less like this, with the downside of a rather serious performance hit from adding the content filtering in. There was also a really nasty side effect of this approach - you are handing spammers the ability to DOS your server. By putting so much processing near the edge you are giving them a tool to grind your servers to a halt. This hit us pretty hard in the early days as we were learning the ropes. (FYI - morally bankrupt != stupid; many spammers are quite intelligent.)

Let me try to show why this can hurt so badly - if a single SMTP process can accept one message per second, then to accept 10 messages per second you need 10 SMTP processes. If you are doing before-queue filtering and it now takes 5 seconds to accept each message, you now need 50 SMTP processes. Now figure in the ugly fact that a server under load does everything slower, so that 5 seconds becomes 10 seconds, which means you need 100 SMTP processes to keep up, which slows things further and .... you are having a really bad time of it. The thing to keep in mind is that when things start going bad, they go really bad really fast, and your server can handle fewer and fewer messages per second the more that tries to come in.

After-queue filtering doesn't entirely solve this problem, but it gives you some great tools that help quite a bit - most particularly, it lets you control the rate at which you send messages to your filter engine while still accepting messages from the outside world. This is still not ideal, as unprocessed mail backs up in your queues and you get some mail delays, but they are much more minor than what you get with before-queue filtering.

So while both filter methods might let you handle, say, 10 messages/second, they behave very differently when the incoming rate climbs above 10:

Before queue:
15 msgs/sec incoming -> can only handle 5 msgs/sec
20 msgs/sec incoming -> can only handle 1 msg/sec
100 msgs/sec incoming -> server probably crashes; at the least it will fail incoming SMTP requests

After queue:
15 msgs/sec incoming -> handles 10 msgs/sec
20 msgs/sec incoming -> handles 10 msgs/sec
100 msgs/sec incoming -> handles 10 msgs/sec

The numbers above are made up, but the impact described is very real - the lessons here were learned the hard way.

But this isn't to say before-queue filtering is not a really cool and powerful thing - remember, it lets us stop a message before we accept it into our queue, and if we block it at the door we don't have to bounce it later when we decide we don't want it. The key lesson you should walk away with is: keep all of your before-queue filtering extremely lightweight, and do your heavy lifting in after-queue filters.
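
To ground that in postfix terms, here is a hedged sketch of the relevant main.cf settings. The parameter names are real postfix knobs, but the specific values (the RBL, the filter address and port) are just examples you would tune for your own setup:

# lightweight before-queue checks: cheap lookups only, no message bodies
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    reject_unknown_sender_domain,
    reject_rbl_client zen.spamhaus.org

# heavy lifting after the queue: hand accepted mail to a content filter
# listening on localhost (amavisd-new, or something you wrote yourself)
content_filter = smtp:[127.0.0.1]:10024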

Which brings me back to the point of this blog: postfix puts the controls for all of this in your hands in a much easier to understand way than sendmail does, and it gives you a whole lot more options for doing whatever you want to do. So if you do things one way at first, it is often fairly straightforward to change your mind and do them differently later.

There is a philosophy I picked up from perl that I encourage you to think about: "The simple things should be easy, and the hard things should be possible." I believe postfix nails this, and it's one of the reasons I strongly endorse it.

I'd like to close on a note for you to consider: when our environment ran sendmail, I believe we had to do emergency sendmail upgrades due to late breaking remote root exploits about a dozen or so times. Since we moved to postfix, we have not made a single security related upgrade ever. Every upgrade we've done (and they are pretty easy) has been done because postfix added some cool new feature we wanted to take advantage of.

If you have any questions about anything I'm writing about, speak up - I know I get rambly explaining all of this, and if you'd like to know more (or some actual details) about how to do things with postfix, I'd be happy to share that as well.

Take care,
Neil