Saturday, January 29, 2011

What is the difference between motherboards/processors for servers and the ones for desktops?

I'm looking to build a Windows Home Server and am looking into a low-powered CPU/motherboard that fits into a small form factor. Reading through articles on the web gives the impression that motherboards/processors for servers are constructed differently compared to desktop technology.

One focuses on stability, the other on high performance over short periods of time.

Can you help me understand how to differentiate between motherboards/processors that were made for servers as opposed to desktops? What do I need to look for?

  • When I attended the training for ASE Compaq certification, the teacher explained that the main difference is the optimization of the chipset/structure. Servers are optimized for good I/O and (relatively) poor "interactive" experience.

    It also depends on the mission of the server: if it is a file or database server you need good I/O, but if we are talking about a number-crunching server, the I/O is not as important as the CPU horsepower.

    From a home/SOHO point of view, I don't think you can even notice the difference between a good PC and a "standard" server, from the hardware point of view.

    ronaldwidha : Fair point. The only thing I expect might happen often is serving multiple files at one time. Thanks.
    From lrosa
  • In very general terms, server (and some/most workstation) mobos and processors vary from desktop equivalents in the following ways:

    • multi-CPU capabilities - this is a major one really, most desktops get the one CPU slot as far as I know
    • more memory slots - some servers have 72 memory slots these days!
    • often no, or low end, on-board disk controllers - this is on the assumption that buyers will be adding their own specific disk controllers
    • no sound other than a beeper
    • more on-board (and more capable) NICs, often specifically laid out so as to be on different PCI buses
    • lesser GPU, often very low end
    • mobo capable of taking power from multiple PSUs - this is another biggy
    • MUCH more instrumentation to spot and predict failure (more temperature sensors, voltage sensors, etc.)
    • out of band management processors and dedicated NICs - of HUGE benefit to lights-out data centres
    • better power management options
    • CPUs expected to run 24/7/365 and handle IO far better, also deal with things like single-bit errors better
    • memory controllers to work with 'better'/more-reliable memory types and deal with failure more gracefully
    • more PCI slots
    • built for rack-mounting or blade enclosures

    As I say these are very general differences, I'm sure there are exceptions but thought it might be of use.

    ronaldwidha : excellent explanation. Thanks
    Dan Carley : Congrats on the 10k :)
    Chopper3 : cheers dude! :)
    From Chopper3
  • With regard to CPUs: server CPUs have bigger cache sizes.

    From DmitryK
  • WHS is so lightweight and requires such a small footprint that, while Chopper3's list is accurate, you're NEVER going to notice the differences between mobos with similar CPUs and memory. More than anything else, you'll notice external issues.

    • Boot-up speed will depend on the speed of your drives and how much memory you have
    • Backups will depend on how much bandwidth you have on your LAN (is it Gigabit?)
    • Remote access will depend on your router (is it compatible?), your ISP (do they block ports?) and your upload speed (downloading to the laptop you brought on vacation).

    I have a homebuilt WHS machine that uses a low-end mobo and a dual-core AMD CPU, and I am very happy with it - except that, for some reason, if the system reboots (like from an update) while an external USB hard drive is attached, something in the BIOS hangs. It doesn't matter if I have a USB DVD burner attached, that's OK; just a hard drive makes everything stop. So my requirement to have as many internal SATA ports as possible left me with plenty of space.

    Next time, though, I'm going with a pre-packaged HP deal. Probably when the next version of WHS comes out.

    ronaldwidha : I have similar concerns. The only thing is I can't get HP MediaSmart anywhere in the country I currently live in. Sucks, I know. The closest place that provides WHS is London, and it costs a bomb there. I was hoping that I could save some moolah by building it myself.
    David : The cheapest method is taking someone's old PC, adding some disks to it and 'repurposing' the whole thing. Then the only significant cost is the WHS software itself.
    From David

OPENVPN error "Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)"

Hi, I get the following error when I try to run OpenVPN on my Ubuntu server:

Fri Jan 8 02:12:59 2010 OpenVPN 2.1_rc11 i486-pc-linux-gnu [SSL] [LZO2] [EPOLL] [PKCS11] built on Mar 9 2009
Fri Jan 8 02:12:59 2010 WARNING: --keepalive option is missing from server config
Fri Jan 8 02:12:59 2010 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Fri Jan 8 02:12:59 2010 Diffie-Hellman initialized with 1024 bit key
Fri Jan 8 02:12:59 2010 WARNING: file '/etc/openvpn/easy-rsa/2.0/keys/server.key' is group or others accessible
Fri Jan 8 02:12:59 2010 /usr/bin/openssl-vulnkey -q -b 1024 -m
Fri Jan 8 02:12:59 2010 TLS-Auth MTU parms [ L:1543 D:140 EF:40 EB:0 ET:0 EL:0 ]
Fri Jan 8 02:12:59 2010 ROUTE default_gateway=192.0.2.1
Fri Jan 8 02:12:59 2010 Note: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Fri Jan 8 02:12:59 2010 Note: Attempting fallback to kernel 2.2 TUN/TAP interface
Fri Jan 8 02:12:59 2010 Cannot allocate TUN/TAP dev dynamically
Fri Jan 8 02:12:59 2010 Exiting

This is my config file for the server side:

dev tun
proto tcp
port 1194

ca /etc/openvpn/easy-rsa/2.0/keys/ca.crt
cert /etc/openvpn/easy-rsa/2.0/keys/server.crt
key /etc/openvpn/easy-rsa/2.0/keys/server.key
dh /etc/openvpn/easy-rsa/2.0/keys/dh1024.pem

user nobody
group nogroup
server 10.8.0.0 255.255.255.0

persist-key
persist-tun

#status openvpn-status.log
#verb 3
client-to-client

push "redirect-gateway def1"

log-append /var/log/openvpn
comp-lzo 

I'm running it from my root account, so I don't know why permission is denied. Also, if I type in modprobe tun I get the following output.

WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/. FATAL: Could not load /lib/modules/2.6.18-128.2.1.el5.028stab064.7ent/modules.dep: No such file or directory

  • The tun/tap module isn't loading, because it looks like your kernel isn't installed correctly. Therefore while you probably do have permission to use the device node, there's no device answering on the kernel side. Resolve the modprobe errors (just running depmod -a as root might do it) and see what happens then.
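
    A minimal diagnostic sequence along those lines (a sketch; assumes a root shell, with the kernel version taken from the error message above) would be:

    ls /lib/modules/$(uname -r)   # does the module tree for the running kernel exist?
    depmod -a                     # rebuild modules.dep
    modprobe tun                  # try loading the module again
    ls -l /dev/net/tun            # confirm the device node is present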

    Andrew McGregor : So, /lib/modules/2.6.18-128.2.1.el5.028stab064.7ent actually exists? If not, something is badly wrong with your kernel installation, and you want to track that down. I presume you ran depmod as root.
    Andrew McGregor : Ok, there is a problem with the way the kernel was installed in that VM image. So, either contact the provider, or install a new kernel yourself.
  • Hope this helps

    http://forums.quantact.com/viewtopic.php?f=25&t=1106

  • I wish I had caught this before, as this is something that has happened to me many times before.

    You're running on an OpenVZ VPS. Therefore, kernel modules such as tun will not work. You will need your provider to enable them for you.

    Consequently, things such as FUSE will not work without the provider enabling that as well, and also things like swap cannot work whatsoever.
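
    For reference, a quick check from inside the container shows both conditions (a sketch; the paths are the conventional ones, but hosting setups vary):

    cat /proc/user_beancounters   # present only inside OpenVZ containers
    ls -l /dev/net/tun            # exists only if the provider has enabled tun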

How to determine which NIC the traffic is using?

How do I determine which NIC the traffic is using? I want to make sure routing is correct.

  • You can use IPTRAF, which I've found is nice, and will let you see the data that is traveling between each device in real-time.

    Farseeker : Worth noting that IPTRAF is *nix only, AFAIK
    From JPerkSter
  • To find out which route the traffic is taking to a certain address, use the 'Trace Route' tool.

    For Windows, the command is:

    tracert [host]

    Where [host] is what you want to check, say google.com or mymachine.domain.local.

    By looking at the first hop, you'll know where it went. For example:

    C:\Users\Mark>tracert google.com
    
    Tracing route to google.com [66.102.11.104]
    over a maximum of 30 hops:
    
      1    14 ms    13 ms    14 ms  nexthop.nsw.iinet.net.au [203.55.231.88]

    This tells me that my traffic to Google is going over my iiNet NIC, but if I tracert 118.82.45.35 (where I have a static route going to a different NIC) I see:

    Tracing route to 118.82.45.35 over a maximum of 30 hops
    
      1     1 ms    <1 ms    <1 ms  192.168.161.1
      2    29 ms    28 ms    28 ms  lns20.syd6.internode.on.net [150.101.199.159]
    

    Which tells me that it's going to a different gateway (and different NIC), and over my Internode internet connection.

    -- Update --

    If this is not enough evidence, or the first hops are the same, then your only option will be to do as Zoredache said, and use WireShark and watch the actual packets flow through the NIC.

    mattlandis : This doesn't tell me what NIC it left my PC on. Or am I missing something?
    Farseeker : You can tell which NIC by looking at the first hop. Each NIC will have a different gateway (and if they don't, they should), so you should see the gateway's IP address there.
    mattlandis : I just checked and tracert does not help me at all.
    Farseeker : If you have two NICs on different subnets, then there should be a clear difference between the two traces, unless your network is configured in a highly peculiar manner.
    From Farseeker
  • How to determine which NIC the traffic is using?

    If I was in doubt, I would check the routing tables and things like other people mentioned, but then I would fire up my sniffer (wireshark, tcpdump) and perform a capture on the interface in question while I generate some traffic to actually determine if it is doing what it is supposed to be doing.
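
    For instance, a capture along these lines shows whether the traffic actually leaves on the NIC you expect (the interface name and address are placeholders):

    # terminal 1: watch the suspected interface
    tcpdump -ni eth0 host 192.0.2.10
    # terminal 2: generate some traffic to that destination
    ping 192.0.2.10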

    From Zoredache
  • Zoredache's comment about a sniffer is, I think, the gold standard.

    Another option: use the ifconfig command to see if the number of transmitted and received packets on the interface is increasing. You can also see if any other interfaces are being used.

    For shell scripts, you may be able to query the proc data structures directly.
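
    As a rough sketch of that approach on Linux (assuming the standard /proc/net/dev layout), sample the counters before and after generating traffic:

    grep eth /proc/net/dev   # counting the interface name as field 1: RX bytes is field 2, TX bytes field 10
    sleep 5
    grep eth /proc/net/dev   # whichever NIC's counters jumped is carrying the traffic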

    Farseeker : Note that ifconfig is also a *nix utility, for future reference
    From EricJLN

What happens during a live SQL Server backup?

Some of my coworkers were surprised when I told them that I can back up an SQL Server database while it's still running and wondered how that's possible. I know that SQL Server is capable of backing up a database while it is still online but I'm not sure how to explain why it's possible. My question is what effect does this have on the database?

If data is modified (by an insert, update, or delete) while the backup is running, will the backup contain those changes, or will they be added to the database afterwards?

I'm assuming that the log file plays an important role here but I'm not quite sure how.

edit: Just as a note, my case involves backing up the databases using SQL Server Agent and the effects of database modifications during this process.

  • You can't just copy it over since there can be alterations to the database mid-copy as you alluded to in the question.

    It has to be done with agents that are aware of the database functionality and take a "snapshot" via OS functions, or with a utility that dumps the database in a safe state (like mysqldump, if using MySQL).

    Otherwise you get a backup that can be corrupted and you won't know it until you restore it. I think Joel and Jeff talked about it a little on a recent Stack Overflow podcast.

    And you're right in that the log file is important. If the journal/log file is out of sync with the actual data, restoring the files will result in corruption.

    It boils down to a backup taken from a safe state of the database: either through a database-aware agent or snapshot application, or through an application that knows how to properly hook the database into dumping its data without interfering with concurrent updates, then backing up the resulting file.

    Dynamo : This is a more generalized answer but I forgot to include that I was working through SQL Server Agent at first. It still provided some good details as to what might be going on during this process despite the method so it helped anyways! Thanks.
  • There are many ways to do this (generally speaking; I have no idea how MSSQL normally does it), from simply dumping the database to a file while appending any changes to a log file that is committed after the dump is completed, to utilizing filesystem-specific snapshot features like VSS on Windows.

    David Spillett : Also databases that are MVCC based from the ground up (like PostGres) can make use of that set of behaviours to maintain a snapshot that the backup copies from while still allowing updates to the datafiles by transactions that start during the backup run. As you say there are several methods, the one used depends upon the core design of the data storage engine and transaction processing features.
  • A full backup contains both the data and the log. For the data, it simply copies each page of the database into the backup, as it is at the moment it reads the page. It then appends into the backup media all the 'relevant' log. This includes, at the very least, all the log between the LSN at the start of the backup operation and the LSN at the end of the backup operation. In reality there is usually more log, as it has to include all transactions active at the start of the backup and any log needed by replication. See Debunking a couple of myths around full database backups.

    When the database is restored, all the data pages are copied out into the database files, then all the log pages are copied out into the log file(s). The database is inconsistent at this moment, since it contains data page images that may be out of sync with one another. But now a normal recovery is run. Since the log contains all the log during the backup, at the end of the recovery the database is consistent.
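
    To see the first half of that in action, a full backup can be taken from the command line while the database is in use (a sketch; the server name, database name, and path are placeholders):

    sqlcmd -S localhost -E -Q "BACKUP DATABASE [MyDb] TO DISK = N'C:\Backups\MyDb.bak' WITH INIT"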

    Dynamo : Great post and the link helped provide a good example. Thanks.

BSD - Remove non-ASCII characters from all files in a directory recursively...

I'm trying to migrate a bunch (300GB+) of files from a FAT32 drive to my FreeNAS ZFS filesystem, but every command I throw at it (tar, pax, mv, cp) throws an 'invalid argument' when it encounters a non-ASCII filename - it's usually something that's been created under Windows and it reads something along the lines of "foo?s bar.mp3..." where the ? may have been an apostrophe or such.

Can anyone help with a few lines of code to recursively go through the directory tree and rename files to remove the offending characters.

Much appreciated.

  • rename can do this.

    try something like

    find dir -depth -exec rename -n 's/[^[:ascii:]]/_/g' {} \; | cat -v
    

    you may need the cat -v to properly display any weird characters without your terminal getting screwed.

    If that prints acceptable substitutions, change the -n to -v.

    That said, it sounds like the charset on your filesystem is wrong (mount -o utf8?), since this sort of thing should really work...

    Dan : Thank you for the reply-- I have read that I should be able to mount my filesystem as something different but the web seemed to indicate that it doesn't apply to FAT32 partitions? I'd like to be corrected though if this isn't the case? FreeNAS auto-mounts the drive when it starts but I do believe there's the option to override/re-mount etc.
    Dan : Hmm... I don't seem to have the rename command on this box? I've tried man rename with no luck
    From Justin
  • Try mounting the filesystem with the iocharset option set to the encoding it uses.

    From man mount under the "Mount options for fat" section:

       iocharset=value
              Character set to use for converting between 8 bit characters and
              16 bit Unicode characters. The default is iso8859-1. Long
              filenames are stored on disk in Unicode format.
    

    See also under the "Mount options for vfat" section:

       uni_xlate
              Translate unhandled Unicode characters to special escaped
              sequences. This lets you backup and restore filenames that are
              created with any Unicode characters. Without this option, a '?'
              is used when no translation is possible. The escape character is
              ':' because it is otherwise illegal on the vfat filesystem. The
              escape sequence that gets used, where u is the unicode
              character, is: ':', (u & 0x3f), ((u>>6) & 0x3f), (u>>12).
    

    and

       utf8   UTF8 is the filesystem safe 8-bit encoding of Unicode that is
              used by the console. It can be enabled for the filesystem
              with this option or disabled with utf8=0, utf8=no or utf8=false.
              If `uni_xlate' gets set, UTF8 gets disabled.
    

    Edit:

    I'm sorry, that was Linux; this is for BSD (from man mount_msdosfs):

     -L locale
         Specify locale name used for file name conversions for DOS and
         Win'95 names.  By default ISO 8859-1 assumed as local character
         set.
    
     -D DOS_codepage
         Specify the MS-DOS code page (aka IBM/OEM code page) name used
         for file name conversions for DOS names.
    
    Dan : Thanks for the reply, here's what I tried: `mount -t msdos -o iocharset=utf8 /dev/ad6s1 /mnt/Elements`. But this failed with: `mount: Using "-t msdosfs", since "-t msdos" is deprecated. mount_msdosfs: /dev/ad6s1: mount option is unknown: Invalid argument`
    Dennis Williamson : Code formatting doesn't work in comments, use backticks instead.
    Dan : Okay, I tried with `mount_msdosfs` but I'm unclear what to specify for the L or D switches as the docs don't go into any specific detail. I figured the drive's size meant I needed the `large` option, and here's what I tried: `mount_msdosfs -o large /dev/ad6s1 /mnt/Elements`. Unfortunately this didn't work either; I'm still getting pax (and others) choking on certain filenames containing 'special' characters.
    Dennis Williamson : What is the origin of the disk? Was it from a Windows system? What language version? Likely values for -L or -D would include CP437 or IBM437, CP1252, ASCII, ISO-646, other ISO-8859-* or variations of those names or others. On my Ubuntu system there's a directory at `/usr/share/i18n/charmaps/` with files of character maps; the filenames and the header text in them can be informative. See also: http://en.wikipedia.org/wiki/Character_encoding
    Dan : It was a Western Digital "Elements" external USB HDD whose case failed, so I removed the drive and popped it into a spare SATA port in my NAS box. OS X showed it as a FAT drive so I made the assumption of it being FAT32. I've used it with UK versions of Windows XP and OS X.
    Dennis Williamson : If you look at WD's tech support pages, they have warnings about sharing their drives between OS X and Windows. My opinion is that it's just CYA, but you might look into it. However, are the problematic files exclusively of OS X origin? Does the problem have anything to do with resource forks?
    Dan : CYA? The files are a mix but mainly of Windows origin
    Dennis Williamson : CYA=Cover Their Posterior (approximately)
    Dan : Haha, I'll note that for future use. So back home, I tried your advice: `oracle:/mnt/Elements# mount_msdosfs -L ISO8859-1 -D CP1252 /dev/ad6s1 /mnt/Elements/` gave `mount_msdosfs: ISO8859-1: No such file or directory`. So I tried without the 'L' switch thusly: `oracle:/mnt/Elements# mount_msdosfs -D CP1252 /dev/ad6s1 /mnt/Elements/` gave `mount_msdosfs: cannot find or load "msdosfs_iconv" kernel module mount_msdosfs: msdosfs_iconv: No such file or directory`. Bugger! I seem to have stalled on this one again. It does appear that there are some issues with this switch though `http://bit.ly/6KsTGW`
  • Use convmv to convert the file names if they are really incorrectly encoded. You should prefer mounting the filesystem with the correct encoding in the first place.
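
    A convmv invocation for this case might look like the following (a sketch; the source encoding is an assumption you would need to confirm, and convmv only prints what it would do until --notest is added):

    convmv -f iso-8859-1 -t utf-8 -r /mnt/Elements            # dry run, prints the planned renames
    convmv -f iso-8859-1 -t utf-8 -r --notest /mnt/Elements   # actually performs the renames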

    From joschi

Fiber link suddenly degrades, cold weather? (Turns out not.)

I live in a dorm where the network is mostly resident-managed and budget is limited.

The network is a basic star topology, with a central switch in the server room branching out to switches that feed two hallways each, with no redundancy. The links from the server room to the hallway switches are fiber optical runs, running at 1 Gbps.

Yesterday, one of the links suddenly degraded heavily and began intermittently losing large numbers of packets. The link still somewhat works, but there are periods ranging from a few seconds to upwards of 15 minutes where almost all packets are lost. There are no signs of faults in router or switch software or hardware. Our testing shows it is most likely the fiber run that is failing.

We have tried downgrading the link to 100 Mbps; this does not improve the quality. We still haven't conclusively determined the fiber link to be the fault, but we intend to make a final check for that later.

There is no option of running a copper cable.

I suspect that it may somehow be caused by the winter weather, although there shouldn't be a good explanation for this; I assume there is proper insulation where the cable runs. The real question here is: is cold weather a plausible explanation for this sudden fault, and in that case, can we expect the link to improve again once the weather gets warmer?

Update: It turns out that the fiber pair is in fact fine and it has to be a failing client causing the problem. We should be able to handle that.

  • If parts of the system are getting very cold, there could be mechanical changes due to thermal expansion. I seem to recall that fiber is very sensitive to alignment issues.

    Though I would be surprised to find that an issue with this cause was intermittent.

    From dmckee
  • I've seen fiber do weird things when it fails. Usually though it just drops, period, and won't sync up. You're saying that it just "degrades".

    Have you determined that it isn't being flooded with something? It's odd that it works but is just dropping packets. No excess network traffic? Mainly because, this being a dorm, I wouldn't be surprised if the degradation came from traffic load.

    Is it possible that winter weather causes something? It's possible. Depends on how exposed the conduit is. Glass is glass, if exposed to extreme temps it could do something odd. Ideally it's not exposed to such circumstances, though.

    One thing to try, if you're not using VPNs or special network configuration, is to swap the working fiber on the switch with the non-working one and see if the behavior suddenly shows up on the other hallway while it goes away on the first. That will tell you if it's the fiber connection on the switch and not the fiber run. If it's a Cisco switch with modules you can swap the modules on the dorm side to see if it's the hardware causing issues.

    I'd definitely rule out someone trying to use file sharing or torrent software though. Any of that can bog down a switch fairly easily, depending on the hardware.

    nielsm : It would be very hard for a single room to overload the fiber link, as the links to individual rooms are 100 Mbit. The traffic graphs towards the Internet don't show any irregular traffic patterns. I forgot to mention, both fibres seem affected, traffic in either direction is lost so there is no fallback. It's good to know that the cold may affect it, unfortunately the forecast shows the freezing weather may continue for another two weeks.
    Bart Silverstrim : Not if the routing tables are being overwhelmed. We've had that happen with malware that hits hundreds of IP's in a very short span of time. For the fibers, you mean two pair (one to building one, one to building two) are affected, or one building is being affected? It's *possible* that cold can cause it. But it shouldn't be. If it's that exposed, you're going to probably have trouble over time and it should be replaced with new fiber (better insulated) or a radio link of some kind, depending on the configuration of the site.
    Bart Silverstrim : Better insulated meaning the conduit path, not some special fiber clad in space age cladding :-)
    nielsm : It's just one fiber pair that is affected, this pair is servicing two hallways in the building. (There is only one building.) The remaining 7 fiber pairs seem to be unaffected. I'm not sure how thorough a malware check has been performed, so another may be in order.
    Bart Silverstrim : If your switch supports it, you can mirror traffic to another port. Then monitor traffic through traffic sniffing. You see a lot of weird traffic via all email, or random streams of requests all over with no rhyme or reason, chances are malware or torrents can be contributing. SNMP may help also with gathering stats on how much traffic is going by protocol where and on what port.
    Bart Silverstrim : When you're asking about temperature, is the fiber actually exposed outside or in a particular location? I.e., does it make sense that it's affecting this one pair and not the others, when or if other pairs are running in similar paths to this one and similar environments?
    Bart Silverstrim : Again, moving that pair to another fiber module on the switch where they all consolidate links in the central server area can narrow down if it's the ports on the switch vs. the run. If you just swap them on the central concentrator and the problem suddenly moves to a new location, bing! You know it's the fiber module on the switch. If you can swap modules on the remote side of the link that can narrow that down too.
    mfarver : While it would not make you popular with the other hallway, you could try swapping the fiber pair to another hallway. That would narrow it down to a switch/user/application issue, or show it really is in the fiber. Fiber properly installed is fairly resistant to environmental issues, but there are a lot of mistakes that can be made. In this case you may have a pinch or a kink somewhere and the movement of the fiber with temperature shrinkage has caused a failure. If that is the case it may not come back when temps warm up.
    nielsm : We tried hooking up a PC directly to the fiber, instead of the switch, obviously taking the entire hallway down. There were no problems with the link then, so the problem in fact has to be a failing client. We'll figure out which port is causing the problem and talk to the resident. Thank you for your help!

(updated) Subfolder needs whitelist and standard redirect for all others

How can I allow access to the foo.html files in the .com/song/private/ subfolder for:

  • a logged-in Wordpress user; or
  • any referral domains (including subfolders) I add; or
  • any URL on our own domain from the com/song/private folder;

For all others, the user should be redirected to the corresponding public version of the Post, which has the same html filename and is structured .com/song/foo.html. (The private versions use a different template with different custom fields for each Post.)


Update: Here's what I have so far:

<Limit GET POST>

 order deny,allow
 deny from all
 allow from domain.com/song/private
 allow from otherdomain.com

</Limit>

RewriteRule ^(.*)$ ../$ [NC,L]

More:

  1. Will that last rewrite rule take people back to the public version, from com/song/private/foo.html to com/song/foo.html?

  2. I found the following rule for detecting Wordpress logged-in status, but what do I put afterward with a RewriteRule, and will it work anyway? (If not, is there an alternative?)

    RewriteCond %{HTTP_COOKIE} !^.*wordpress_logged_in_.*$

    N.B.

    • I have added code to my root .htaccess allowing me to insert additional .htaccess files in other subfolders as needed.

    • Copied from Stack Overflow, where they suggested I ask here.

  • If I understood you correctly, what you ask is probably impossible to do at the HTTP server level, and needs to be (at least partially) done in PHP.

    The allow from directives that you wrote are probably wrong; they work on the client IP address / hostname, not on the referrer of the URL, as you can read in Apache's documentation.

    Also, you shouldn't trust anything sent by the client to determine logged-in status, as that can be easily forged - you should always compare the session data sent by the client with the session info stored server side (and Apache alone won't do that). This also means that the first person who finds out that you're allowing access from certain referrers (2nd and 3rd point in your list) will just have to fake them to obtain access to your 'protected' resource.
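
    That said, purely to illustrate the (forgeable) cookie/referrer approach being asked about, a mod_rewrite sketch might look like this (the domain names are placeholders from the question; this is not secure access control):

    RewriteEngine On
    # let logged-in WordPress users through (cookie check, trivially forgeable)
    RewriteCond %{HTTP_COOKIE} !wordpress_logged_in_ [NC]
    # let whitelisted referrers through (also forgeable)
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/ [NC]
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?otherdomain\.com/ [NC]
    # everyone else: send /song/private/foo.html to the public /song/foo.html
    RewriteRule ^song/private/(.+\.html)$ /song/$1 [R=302,L]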

    From Luke404

Move VMware Server 2.0 Image to ESXi

I need to move a VMware Server 2.0 image to ESXi and need info on how to accomplish this task. The host is Windows Server 2008.

Any help would be greatly appreciated.

  • I recently converted from a VMware Server environment to ESXi 4. For each VM, I ran the VMware Converter app and converted from VMware Server (the server 2.0 machine) to virtual (the ESXi machine). Worked great. The only wrinkle was that I had to reactivate each Windows machine. Apparently the "hardware" changed enough to trigger a new activation.

    Brandon Grossutti : The issue was that I didn't include all the files; in previous moves of servers I only needed the vmx and the disk. I guess the converter uses more, so I included all the files and it worked just fine. Thanks guys.
    From Chris_K
  • If for whatever reason you can't use the converter to move the machines, there are a few ways to do this. First shut down the guest and find its data files on the VMware Server 2.0 machine.

    You then need to copy those files to the ESXi server, either through the Infrastructure Client (host->configuration->storage; right click the datastore and click browse) or by using SCP (e.g. WinSCP) and enabling SSH on the ESXi machine (http://www.yellow-bricks.com/2008/08/10/howto-esxi-and-ssh/).
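
    As a sketch, the SCP copy once SSH is enabled might look like this (the hostname, paths, and datastore name are placeholders):

    scp -r "/path/to/Virtual Machines/myvm" root@esxi-host:/vmfs/volumes/datastore1/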

    Then import the machine by browsing to its .vmx file in the datastore, right-clicking and selecting "Add to inventory".

    Hope these instructions work for VMware ESXi 4...

    pehrs : I am pretty sure the file format changed between Server 2.0 and ESXi4 and you need to use the converter.
    Antitribu : I've always been able to get away with it, as most of the time they are quite good with backward compatibility and the disk images rarely change. The converter is a much better idea but I've had a few scenarios where that hasn't been an option for a raft of reasons.
    From Antitribu

Using an SMTP server that isn't attached to my domain?

I have some clients for whom I manage hosting but I don't actually run the hosting servers. Most of the domains are together on one server/IP address (along with other domains that aren't mine).

One of my clients has started getting a bunch of her emails rejected because they are being blocked by IP-based spam filters. I wrote to my hosting company and their suggestion is that she use her ISP's SMTP server instead of smtp.herdomain.com .

Is that a reasonable request? {She is totally a non-technical person so changing this on her outlook/iphone/blackberry (yes, BOTH) is going to be a huge hassle that I'd like to avoid.} Do most people use their own/ISP SMTP addresses instead of the ones attached to their domain? I was under the impression that using a not-your-domain SMTP server was actually MORE likely to get you marked as spam.

(Apologies if this is not appropriately "answerable". I'm just not even sure whether this is standard practice or a lame hacky solution by my hosting company. Thanks!)

  • There really is no general answer (as you pointed out yourself :-)), but AFAIK it's fairly common to use your ISP's SMTP server for outgoing mail, among other things because many systems will view mail coming from a hosted server with suspicion (i.e. flag it as spam), particularly if it's from a dialup IP (there are even blacklists for that).

    That said, one simple solution would be to keep your own SMTP server, and have it forward all outgoing mail to your ISP's SMTP server. That is the common setup for intranets: An SMTP server in the intranet, which delivers internal mail and forwards outgoing mail to the ISP's SMTP server.
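
    For example, assuming a Postfix-based server (the MTA choice is an assumption; the hostname is a placeholder, and port/authentication depend on the ISP), that smarthost setup is a one-liner:

    postconf -e 'relayhost = [smtp.isp.example.com]:587'   # brackets suppress the MX lookup
    postfix reload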

    I was under the impression that using a not-your-domain SMTP server was actually MORE likely to get you marked as spam.

    Hard to say (spam filters vary a lot), but this is quite common, as many home users and small businesses have their own domain and use it in their mail address, but don't have their own server; so they will all send via their ISP (I do that myself).

    Another option: Instead of your ISP's SMTP, use your hosting provider's SMTP server (they should also have one), so SMTP server and domain will "match".

    Scott Lundberg : Concerning the "Another option", that's what her client is already doing... Using the hosting provider SMTP server. BTW, it's SMTP not STMP
    sleske : Thanks, fixed the typo.
    sleske : As to "using the hosting provider's SMTP server": If she's already doing that, then the hoster apparently somehow landed on a blacklist, so she should probably complain to them, as in Scott Lundberg's answer. Or just use her ISP's server.
    From sleske
  • Yes, I believe it is a reasonable request; however, it suggests that the hosting service is allowing people to use their SMTP service for SPAM, which is why RBLs are saying her emails are SPAM.
    Anyway, as long as her domain's SPF record (in her domain's DNS) lists her ISP's SMTP server as a permitted sender, it should not increase the probability that mail from her domain via her ISP's SMTP server will get tagged as SPAM.
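
    For reference, you can check what SPF record a domain currently publishes; a typical record authorizing an ISP's outbound servers looks like the commented example (the domain and include host are placeholders):

    dig +short TXT herdomain.example
    # e.g. "v=spf1 include:_spf.isp.example.com ~all"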

  • If emails are being blocked by IP-based filters, it is possible that:

    1. Your IP is on some RBL somewhere
    2. Your email server needs some additional configuration that other servers expect.

    If it is a simple problem of getting your IP off some RBL, your client may be able to continue with the current SMTP server.

    To check out what could enable you to continue using your SMTP Server, try this Email Server Test

    If you are not able to find any issues reported by this test (or) you are unable to carry out the technical changes suggested, the best option would be to use the ISP's SMTP Server.
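
    A quick manual RBL check reverses the octets of the sending IP and queries the blacklist's DNS zone; for example (192.0.2.4 stands in for the real IP):

    dig +short 4.2.0.192.zen.spamhaus.org   # any answer means the IP is listed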

How does RAID detect a faulty HD?

I have been looking over RAID levels for the past 3 days and have been weighing up the pros/cons of hardware and software RAID controllers. I understand that RAID is not a backup solution and I'm perfectly fine with that, though one question still remains.

How does a RAID controller, anywhere from RAID 1 to RAID 6, actually detect that a hard disk drive is failing? The research that I have done has shown that most common hard disk drive manufacturers use ECC in their hard disk drive design, which is supposed to protect against 1-bit failures and, to an extent, up to 3-bit failures.

Though when thinking about this, let's say you have RAID 1 and two hard disk drives that are identical. Let's say data is read from drive 0 and, at the same time, from drive 1, but drive 1 reports an ECC read failure to the RAID controller.

Now this is the big question: with hardware RAID, what would the RAID controller do? It's got a signal from the hard disk that the read failed. It can report the hard disk drive as faulty and needing replacement.

Does the RAID controller seek to a different hard disk drive for the data until it gets a successful read from a drive? (Yes, a drive can report a correct read and the data can still be corrupted, and RAID does not check parity or ECC on read.)

  • The answer to the question is going to depend greatly on the RAID controller manufacturer and how they implemented error/failed drive detection.

    Chad : The trouble is, I cannot find proper documentation on how any RAID controllers do this! It's really frustrating to try to find documentation on their error recovery procedures. I think they do not openly release this because of trade secrets. Though if you know of any RAID controllers that do tell you what happens, I would love to read their docs. For example, the hard drives I use are server grade and report read failures, though how/what/why/who reads this information is a mystery.
    womble : If they're trade secrets, then publishing them on this site would be a very stupid thing to do.
    Zypher : The exact algorithms they use are definitely going to be "secret sauce"
    Bart Silverstrim : Publishing secrets doesn't matter. If a competitor wants to know about it they just reverse engineer it. Pepsi knows very well how to make Coke, and Coke can make Pepsi if they wanted. They don't because there's no point in Coke making an exact duplicate of Pepsi. Same with RAID cards...why make a card that's already out there? Make your own and make it perform better and be more reliable. If it can be proven they reverse engineered anything or "stole" code (which would probably be copyrighted), they'd be sued into oblivion and skewered in public opinion anyway.
    From Zypher
  • There are various methods by which a RAID implementation can assess the "health" of a disk (SMART, SCSI "Check Condition" and "Sense Key" messages), but I'm not aware of any published "standard" as to how RAID implementations should act on these methods. The specific steps that each make and model of RAID controller firmware (or, for that matter, a software RAID implementation in an OS) uses are going to vary depending on the manufacturer's design.

    All hard disk drives use error correcting codes (ECC) today. At the data densities we're working at, bit errors are just a fact of life. Unrecoverable read errors are what matter to a RAID controller. At the level you're interested in, you'd have to have the design specs for both the RAID controller and the drive firmware to really understand how media errors would be reported up the device stack to the OS, and ultimately the user.
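
    For what it's worth, you can see the raw SMART health data that an implementation might consume using smartmontools on Linux (the device name is a placeholder):

    smartctl -H /dev/sda   # the drive's overall health self-assessment
    smartctl -A /dev/sda   # vendor attributes, e.g. reallocated and pending sector counts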

  • Implementation is entirely up to the manufacturer. They could use any mix of tools: calculating parity of data as it's written to the drive and flagging a possible issue if it's wrong; watching hard disk status if there's onboard SMART reporting; reading errors straight from the drive; watching for multiple errors from a particular drive; etc.

    I've had a controller that didn't KNOW there was an issue with a drive. We had a three-drive RAID 5 where one disk completely failed. We installed a new drive, and in the process of rebuilding, one of the good disks upchucked an unrecoverable read error, which is more and more of an issue as drives get bigger and manufacturers allow a certain number of these defects in the manufacturing process. End result? Rebuild from bare-metal backup. So when you ask how the controller "knows" the drive is bad: it doesn't necessarily know.

    In other words, RAID controllers just do the best they can. They still fail.

    The end result is that RAID controllers usually simplify your setup by abstracting the work from the software, they offload processing power to dedicated hardware, and they add (usually) some better support for telling the end user which drive is bad (through software tools and/or blinky lights) so you don't have to guess which one is bad.

    Software RAID is integrated with the OS, it's far far cheaper, and it's just about as reliable now (if you're talking about Linux especially) and nearly as speedy (in some cases, faster). It also doesn't need special drivers unlike many controllers. If you use a high-end card it'll probably perform better but for most home-grade RAID they tend to be comparable in speed.

    If you're talking about motherboard RAID, it's not really RAID. It is a crappy version of software RAID, and it makes it nigh impossible to recover data if your motherboard goes south because often they're vendor-specific in how they mess with data on the drive. I've had cases where a system failed and you couldn't take the drive from the array to another system to recover data from.

    Overall, unless you're talking about RAID for servers in a business or have really specialized needs, software RAID is probably on par with hardware RAID for 90% of what home users would use it for.
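
    As a concrete illustration on the software-RAID side, Linux md will tell you directly how it currently judges each member disk (the device name is a placeholder):

    cat /proc/mdstat          # array status; failed members are flagged with (F)
    mdadm --detail /dev/md0   # per-disk state: active sync, faulty, spare, rebuilding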

  • I asked a NetApp engineer who was giving us a talk this very question. His answer, more or less, was:

    Nobody reads the checksums on reads. There's no point. Reading a checksum means you have to read the entire slice plus checksum, then compute the checksum to verify you have the correct data. Plus the orthogonal checksum if you are running RAID-6 or whatever. It is a total performance killer because it breaks the ability to randomly seek to totally different sectors on different disks at the same time. Similarly, almost nobody reads both sides of a mirror in RAID-1 because if you only read one side you can alternate which side of the mirror you read from so that you get faster throughput, and if you suddenly have a mismatch, which disk do you take as correct and which do you take as broken? All modern RAID systems depend on the on-disk controllers to signal the RAID controller that they are in distress (through SMART or the like), at which point that disk is almost always kicked out of the array. Checksums are used for rebuilding arrays, not for read-verification.

    Chad : Thanks, exactly what I needed to know. RAID is not a real backup solution, so the only conclusion I can come to is that RAID should be used purely for the performance boost and not for data recovery or data fault tolerance. If it does recover from a hard disk drive failure, great, but a pure backup and restore solution is better for fault tolerance.
    John Gardeniers : Wrong Chad. The real purpose of RAID is to provide redundancy. Redundant Array of Independent (originally it was Inexpensive) Drives. Performance boosts are an added bonus for some configurations.
    Chad : John, there is no point talking about redundancy if you cannot guarantee that the data you're reading from the drive is correct. I see no redundancy in RAID: if it does recover then it's all good, but you have no validation of whether the data you're reading is corrupt or has been silently corrupted by a faulty hard disk drive. So what I'm saying is that redundancy is useless if you cannot guarantee integrity, something that RAID does not provide. So you're better off using it for performance and using backups that provide integrity.
    Chad : Though correct me if I'm wrong: you can have all the redundancy in the world, but it could all be corrupt, so it becomes worthless.
    Helvick : It could always be corrupt but the redundancy gives you additional confidence that you can support service continuity to meet a target SLA. Nobody will give you 100% guarantees, what you get are better guarantees for higher costs and you need to balance those. No matter what your business continuity targets and mechanisms are you will still need a disaster recovery plan and a mechanism to selectively restore (or recover from archive) that can deal with the situation(s) when redundancy isn't enough or doesn't provide the service you need.
    David Mackintosh : Chad, you can never make guarantees. There are always possible causes of errors -- cosmic rays, phantom writes, meteor showers, maintenance staff plugging vacuums into the wrong circuits, whatever. What RAID-1, -5, -6, -10 do is increase the possibility that for some errors you will be able to recover from it without losing data and (depending on the controller) without any downtime. RAID is not a backup.
    Chad : Thank you Helvick and David Mackintosh for leaving the comments. You two are correct, and I was wrong. I rang up a server professional to discuss RAID and he said exactly the same thing: RAID gives you redundancy as a fallback, a degraded but still-running service while you correctly restore your system. It's a larger discussion than just RAID, though; it went into backup and restore and long storage durations.

Windows Server 2008 task scheduler History issue

I am baffled by the new Task Scheduler in Windows Server 2008. I have an application I wrote that performs some data-related tasks. I run this app every 10 mins. If new data is present then it is processed. This app has run for years under Win2k3 Server.

I set up the task in Windows Server 2008 using the "create basic task" wizard. Most of the menus look familiar and it looks like there are even more options now. When I get to the end, I open the dialog to tweak the settings and set it to run every 10 mins.

The first thing the application does is create a log file, so I wait for the log file to appear. It never does. If I launch the application myself by double-clicking, the log file appears, so the app runs fine (under the administrator account in which I created the task).

Next I let a day go by and return to examine the "History" tab. According to this, the app has been run every 10 mins for 24 hours or more yet no log file has been created!

Close inspection reveals the following "events" associated with each occurrence: EventID - Task Category - Operational Code

107 - Task triggered on schedule - (1)
319 - Task Engine received message to start - (1)
100 - Task Started - (1)
200 - Action Started - Info
129 - Created Task Process - (2)
201 - Action completed - (2)
102 - Task completed - Info

The application creates a Windows form with which the user can interrupt the processing if need be. On Win2k3 I would observe it popping up for a second or two and then disappearing, confirming that the app was being triggered during the day. I do not see the dialog now.

To deepen the mystery, the app does actually perform the tasks, meaning it is running.

Can someone explain what is going on here please?

  • Sounds like a security\permissions issue related to the creation of the log file to me. Your script isn't breaking, so that means either you're not trapping that error, or it is actually creating the log file but the account you're using to look for it doesn't have access, although that last possibility seems unlikely.

    I think your title is inaccurate - you say that the app does perform the processing tasks, so the scheduler is telling the truth and executing what you ask it to when required. What's not working is the precise behavior of the script you are running in the security context provided by the scheduler on Win2k8. Have you tried explicitly scheduling it in a particular user context? Simply scheduling a task while logged in as an Administrator will lead to the task being run in the System context, which may not do what you expect. There have been changes in this area between W2K3 and W2K8, so it's not surprising (to me at any rate) that something that worked OK on W2K3 would fail on W2K8.
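
    As a sketch, creating the task explicitly under a named user from an elevated prompt would look something like this (the task name, path, and account are placeholders; /RP with no value prompts for the password):

    schtasks /Create /TN "DataApp" /TR "C:\Apps\DataApp.exe" /SC MINUTE /MO 10 /RU MYDOMAIN\svc-data /RP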

    Mike Trader : Wow that was fast. I am open to suggestions for a better title
    Helvick : Much better title now. :) . Without knowing more about your application I cant be sure but it looks to me that you might be running into a security boundary problem. W2K8 has tightened the controls on some things like having scheduled tasks running in system\service\admin contexts providing interactive interfaces to users as these have been used for privilege escalation attacks in the past.
    From Helvick

How to boot machine with failed drive to run SeaTools diagnostics ?

I have a 750GByte SATA Seagate drive which has failed within the first year of its 3-year warranty.

The Seagate warranty site stresses the importance of testing the drive with "SeaTools" (and obtaining some diagnostic error code) before returning it.

If I attempt to boot up a machine (Dell T5400, XP64) with the drive connected by a SATA cable, the machine doesn't get past POST (I waited 5+ minutes to see if there was a timeout... maybe I should have waited longer?). If I leave the drive connected but disable that SATA channel and reboot, it still doesn't POST. If I disconnect the drive I can boot, but if I hotplug the drive once past POST, SeaTools doesn't detect it.

Is the drive just very dead (too dead for SeaTools to even notice it) or is there some way I can get it to be visible to SeaTools?

I've tried the drive in 2 machines. The other one does eventually POST after a couple of minutes (and there's no hint of the drive in the BIOS's view of drives), but it is a Linux box (so no SeaTools).

  • I'd tend to agree with your "very dead" assessment-- especially if you can get the same behavior out of another machine.

    I have seen the failure mode you describe with failed SATA and PATA disks on various machines going back into the late 90's. I don't know the specifics of the communications protocol used by ATA devices, but it certainly seems to be a lot more prone to a "crazy" device causing the kinds of issues you're seeing than, say, a SCSI device.

    Seagate has taken returns from me, in the past, with a trouble report indicating that SeaTools could not be run because the BIOS either did not detect the drive or would not POST with the drive attached. Your luck, obviously, may vary.

    Dave M : Seagate has also taken returns based on the fact that the system is "dead" and you cannot run SeaTools. Sometimes a second call gets a different agent with a different attitude or understanding of the nature of things.
    timday : The return authorization system had some option like "system won't POST" so I picked that. They sent me a replacement drive.
  • You can say on the RMA form that you're running a Mac, and since they don't provide SeaTools for Mac, they waive the requirement for a SeaTools diagnostic code.

    I got a drive replaced a few weeks ago and this was my experience.

    From frou
  • I had the same issue with a Dell Vostro 1520. What I did was boot a Fedora 12 live CD, and I got all my data back from the 250GB HDD.

    timday : Data recovery isn't an issue: the failed drive was one half of a RAID-1ed pair, already restored from degraded mode with a fresh (non-Seagate!) drive. Hurrah for RAID1! Actual downtime on the production system was maybe 20 mins. The question is about getting Seagate to honor their warranty, and the problem of running SeaTools when a system with the drive in won't POST.
    From Rajat

Do NATs verify the source port of a SYN received after a SYN has been sent?

I'm unsure of whether this question is more appropriate for stackoverflow or serverfault. If you think it's more suited for stackoverflow, let me know and I'll delete this and move it over.

I have a STUNT implementation. If you don't know what STUNT is, it's a protocol used to make a direct TCP connection between two peers behind separate NATs. I'll give a brief overview, though my question doesn't directly relate to the protocol.

It's done by using a third party to predict which port each of the peers' NATs will map their next outgoing connection to. The third party then tells each peer the other's predicted port, and they both attempt to connect to each other on the predicted ports. When they fire off their SYN packets to each other, it opens a 'hole' in each NAT at that port, which allows the other's SYN packet through and the handshake to take place.

One of my colleagues suggested having the peer that initiates the STUNT connection attempt to fire off SYN packets to the predicted port, as well as the next four ports, in case the predicted port is used by another application (or even our application) before our connection is attempted, but after the predictions have been made.
An example of this would be predicting that the other peer is going to connect to us from port 80, but then another application ends up using port 80, so the connection actually comes from port 81; by firing off SYN packets to five different ports, we would (in theory) succeed if it comes from 80, 81, 82, 83, or 84.

However, this isn't the case; when tested, only the first SYN packet has a chance of succeeding. Even if the first SYN packet is sent to the wrong port, but one of the next four are sent to the correct port, they're all silently dropped; there's no response, the connection attempt just times out.

A quick example:
Peer A is initiating a STUNT connection to Peer B.
Peer A is predicted to connect to Peer B from port 1000.
Peer B is predicted to connect to Peer A from port 2000.
The server sends 1000 to Peer B and 2000 to Peer A.
Another application on Peer B's computer makes a new connection, taking over port 2000.
Another application on Peer A's computer makes a new connection, taking over port 1000.
Peer A simultaneously attempts to connect to Peer B on ports 2000, 2001, 2002, 2003, and 2004; the connection attempts come from ports 1001, 1002, 1003, 1004, and 1005.
Peer B attempts to connect to Peer A on port 1000; the connection attempt comes from port 2001.
A hole is created in Peer A and Peer B's NATs, at ports 1001 and 2001 respectively; it should allow SYN packets through.

What I believe is happening is that Peer B's NAT isn't allowing Peer A's SYN packet through port 2001, because it was expecting the connection to come from 1000 but it came from 1002. This suggests to me that the NAT is verifying the source of the SYN packet.

However, from what I've read, one flaw of STUNT is that when creating the connection there's a window of vulnerability where the connection can be hijacked by another source. If it's standard behavior for the NAT to verify the source of the incoming connection, then I don't see how this window of vulnerability can exist.

Note that my implementation is not flawed. If either of the predicted ports is correct, the connection will succeed; the problem happens when both of the predicted ports are wrong and it resorts to attempting multiple ports at once.

As a side note to anyone familiar with the protocol, I'm using the method that involves sending a SYN that I expect to be silently dropped before attempting the actual connection. I'm not using the low-TTL implementation.

Is my NAT rejecting the SYN packets because they aren't coming from the expected port, or is it something else that's rejecting them? If it is my NAT, is this the expected/standard behavior? Note that by rejected I mean 'silently dropping,' as no RST or any response at all is being returned, the connections just time out.

EDIT: The routers in question are both the kind you'd buy at Walmart for home-use, not the kind large businesses would use.

  • Short answer: Yes-- your NAT is rejecting packets because they're not exactly what it expects. This is going to vary from NAT implementation to NAT implementation.

    Editorial aside: I hadn't heard of STUNT before... (STUN, yes-- not STUNT). Oh, ick... what a hack. It makes me weep that the Internet has turned into this NAT trailer park and that we're resorting to hacks like this.

    After reading the paper I get what they're doing. It's basically STUN with the added bit of packet sniffing on each "endpoint" to sniff the initial sequence numbers and trade them via a third party on the 'net who can accept arbitrary incoming TCP connections. It's a cute trick, actually.

    You're never going to get 100% reliable communication like this. I see that the researchers who initially implemented this have some test results that look fairly good at port prediction, though... That's actually a very interesting, albeit dense, little chart. The paper goes into all the prediction issues in more depth, and gives some success percentages. It's shocking to me that they did so well, actually.

    Any sane NAT implementation is going to use, at minimum, source IP, destination IP, protocol, protocol source port, and protocol destination port to identify "connections". (Hopefully they're tracking sequence / acknowledgement numbers, too.)

    If your SYN doesn't exactly match the combination of these attributes stored in the device's NAT table the NAT implementation really ought to silently drop it. That would be the behaviour that I'd expect.
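
    The same strictness is visible in Linux's connection tracker, where each NAT entry is exactly such a tuple and stateful rules only admit packets matching one (a sketch; assumes conntrack-tools and iptables are installed):

    conntrack -L -p tcp   # list the tracked tuples: protocol, src/dst addresses and ports
    iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT   # inbound traffic must match a tracked connection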

    squillman : Nice. Even though I live in a trailer park. (jk)
    dauphic : Thank you, I'll take it. I just wanted a second opinion before deciding whether it would make sense to leave this 'scatter shot' implementation in, or just completely remove it because it wouldn't do much to increase the chance of success.
    Evan Anderson : So many of us live in Internet trailer parks. I'm horrified that we're not going to see IPv6 adoption but, rather, the rise of "carrier grade NAT" implementations. I can hear the dripping as Time Warner salivates over putting their Customers behind gigantic NAT implementations.
    Farseeker : It disappoints me that even in 2010 ISPs "do not believe that running out of IPv4 addresses is anything to be concerned about". Only two or three Australian ISPs support it, and NONE of the big ISPs do. They don't even offer IPv6 tunnels.
    dauphic : Luckily, UPnP is supported by most routers these days. It's a bit annoying, but I believe UPnP gives us a solution that isn't completely terrible like STUNT for most of our peer-to-peer communication difficulties.
    joeqwerty : It doesn't make sense to me that a firewall would track the SYNs and ACKs in its NAT, state, or session table, as it would introduce additional memory and CPU overhead; also, UDP is connectionless and doesn't use the three-way handshake, so it would only be effective for TCP connections, which would seem like a half-baked way of manufacturing a firewall. AFAIK, the NAT table tracks the 4-tuple that comprises the connection: source port, source address, destination port, destination address.
    Evan Anderson : The NAT implementor can be really slipshod in their tracking of TCP state, or they can be really precise. It's up to them. (Obviously, one can use different tuples for tracking different protocols. TCP and UDP are just two protocols that might need session-level tracking. There could be lots more, and you could be tracking application-layer tuples, too. Ideally, you design your NAT engine to be extensible on a per-protocol basis.) Personally, I'd want a NAT implementation that tracked the TCP state very closely because it would be much more difficult for spoofed packets to "leak" by.
    Evan Anderson : TCP sequence number tracking and initial sequence number (ISN) randomization might be nice if I were NAT'ing a box stuck running an operating system that had easily guessable ISNs, too. Mangling the sequence number, source port (on outbound connections), and possibly the TOS would do a great job of obscuring the operating systems and number of hosts behind the NAT device, too.
    Teddy : You *read the paper* **and** wrote this answer in just 40 minutes? FGITW indeed.
    Evan Anderson : Oh, no-- I skipped around in the paper and didn't read it word-for-word. I got the idea they were trying to communicate fairly quickly, since it's basically another STUN implementation, just with some added wrinkles.