IBM/Lenovo BNT G8000 – the official support response

This is a follow-up to my previous post about a firmware upgrade of an IBM/Lenovo BNT G8000 gone bad.

I was hoping to keep talking to Erik, because he seemed to understand what I was looking for. Unfortunately, he was off on Friday, and I ended up trying to explain this to a new person (whom I will not name).

So, I’ve again been mailing back and forth with the System-X support for a day, trying to explain that I am not looking for support – but rather looking to report a severe error in their firmware for the BNT G8000 that can bring down entire networks if a certain hardware error occurs.

As a side note, thinking about this for a while and talking it over with some colleagues made me think that there might be a serious security flaw involved as well. What if someone created something that mimics the behaviour of the broken switch – would they be able to take any BNT-based network offline?

Food for thought, but it is concerning.

Anyway, I know you want to know the official response from the IBM System-X support, and it is as follows:

 

I have talked to my managers about this issue, and it is as I informed in my previous e-mail.

Since the machine is not covered by a support agreement, you are using a firmware that you are not entitled to use. If you want further help with this, you must approve the cost suggestion for support, then we can move forward.

Please let us know how you want to proceed.

 

So yes, the support organisation of IBM/Lenovo System-X wants me to pay 553 EUR per hour to help them sort out what seems to be a severe flaw in their BNT product line.

It seems IBM might have made the decision on moving to another brand very easy, but I have not yet given up completely. It could be that I am just stuck with a support organisation following their instructions to the letter without room to handle special cases.

I have some other approaches to look into on Monday to get this report handled, but the frustration of banging my head against the wall here is big.

Addition. Just to clarify, as some people seem to misunderstand this post for some reason: I am not trying to “open a support case”. I am not looking for support at all. The switch is broken – a new one will be bought. Shit happens. I have however tried to report a severe system flaw and potential security issue in their product, and the existence of a support agreement or “right to use the firmware” (which was a “quiet change” in IBM’s terms less than a year ago, shame on me for missing it) is not relevant at all here.

Please add your comments, thoughts and questions in the comment section below – and don’t forget, you can always reach me at blog@engren.se if you want a personal contact.

 

IBM/Lenovo BNT G8000 – firmware upgrade gone wrong – part 1


Hello reader!

You most likely found your way here because Google didn’t show that many posts on this subject – so let’s get straight to my last experience of the IBM/Lenovo BNT G8000 switch and the IBM/Lenovo System-X support organisation.

I mean, when the IBM/Lenovo BNT G8000 is working, it’s a beast. It handles the network smoothly and extremely fast – and as a network administrator, the features for sure kept me happy. In combination with the selectable CLI (either ISCLI, which is “Cisco compatible”, or “Menu”, which is basically the Nortel/Alteon switch interface <3 ) – it kept me happy, and the system administrators with their iSCSI-dependent systems very happy as well.

We all slept like babies, knowing that our IBM/Lenovo BNT G8000 switches were performing extremely well, 365/24/7.

We did regular firmware upgrades on the units, following all of the v6 series of the firmware, and decided to hold back for a while when they ramped up to v7 and included MCP Linux kernel updates in the firmware.

Two days ago, it was finally time to take the step. Go from v6 firmware, up to the latest available v7 release. Obviously, a lot of stuff has changed, but I went through the release notes several times and found nothing that seemed alarming or a cause for massive concern. Some bug fixes, some security fixes – the usual stuff.

I started the process by prepping the switches with the new firmware as image2, and uploaded the latest boot code as well. Smooth and straightforward, as expected based on previous experiences.

Leaned back, waiting for the service window at this particular site to start. Talking to the system administrator who wanted to be kept in the loop, joking and talking about general things.

Service window time. Woop!

Rebooted the first switch. The magical seconds that feel like minutes passed, and the switch came up again. Slightly faster reaction in the command line interface, some new options in the menu. Seemed good.

Checked in with the systems guy, and as expected, he noticed no downtime – this specific site, where this set of IBM/Lenovo BNT G8000 units is located, is built with redundancy and resilience in mind, with every host server connected to the two IBM/Lenovo BNT G8000 units with 2 (or more) ethernet cables.

Not one single component shall be able to have a noticeable impact on anything if it goes down.

Sounds pretty normal these days, right?

I know.

I take a short break to leave the switch in place with the new firmware running, just in case something would be acting up. iSCSI traffic flowing as expected, nothing weird on interface counters anywhere. Everything seems perfectly normal.

After a while, I decided it was time to get moving with the second IBM/Lenovo BNT G8000 switch. Since everything is prepped, it’s just a matter of setting what image to use on boot – and then reset it.

Said and done.

The magical seconds pass. Some more seconds pass. Hm. What’s going on?

Oh, there. It responds to ping again! But only 8. WTF?

At this time, our system administrator is losing connectivity to the servers, and I am shortly after booted out of switch 1 that is no longer responding to ping.

Every now and then, roughly every other minute, I get access to it for 15-20 seconds – then access is lost again. Switch #2 is completely unreachable.

Okay, this is bad. Really really bad. This site has iSCSI based storage for all guest servers in a virtualized environment, and a routine firmware upgrade just caused it to go down?!

Thoughts coming through my head right now are many, including what headaches this will give to the system administrators and the users of the services in this site.

Okay, time to get my priorities straight. Get at least one of the switches online ASAFP, and look at what happened second.

I get in touch with our hosting provider, and ask them for remote hands. As usual, they respond quickly and promptly, and go on site to power off both switches, and then power on just switch #1. After all, things were working fine with switch #1 before I updated switch #2, so it seemed like a logical choice to move forward quickly.

I get access to switch #1, and log in to disable all LACP ports to switch #2. With that done, I ask the hosting provider to turn on switch #2 again, so I can connect to the switch using a serial console cable and see what’s going on.

Upon connecting to the serial port, the following is flooding the serial terminal so badly that I have no real idea what is going on in the system interface.

ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=23
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=2
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=18
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=1
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=4
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=6
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=15
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=16
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=17
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=19
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=23
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=2
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=18
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=1
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=4
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=6
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=15
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=16
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=17
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=19
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=23
ERROR: mp_bpdu_send_ucast failed to send packet to u=1 p=2
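
A flood like this is easier to reason about once it is condensed. As a minimal sketch, assuming the serial session was captured to a plain text file (the capture file and the function name summarize_bpdu_errors are hypothetical, not something from the original session), the failures could be counted per port like this:

```shell
# Count mp_bpdu_send_ucast failures per port in a captured console log.
# $1: path to the captured log file (hypothetical; any plain text capture works)
summarize_bpdu_errors() {
    grep 'mp_bpdu_send_ucast failed' "$1" \
        | sed -n 's/.* p=\([0-9][0-9]*\).*/\1/p' \
        | sort -n \
        | uniq -c \
        | awk '{ printf "port %s: %s failures\n", $2, $1 }'
}
```

Running it against a capture shows at a glance whether every port is affected or only a subset, which helps when describing the fault to a vendor.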

Ok, this is obviously STP … that can’t send packets to machines connected to itself? Okay, that’s weird… but hey, let’s get rid of the logging.

/cfg/sys/syslog/log all dis
/cfg/sys/syslog/console dis
apply

Pasted the above once I figured out that I was at a password prompt and managed to log in. No more logging should be presented to the screen now, and yet, I am still flooded with the above error messages. Guess it’s handled outside the ordinary logging routines. This alone made me very curious, and worried – errors bypassing the logging routines are not common at all, and are basically debug code left behind by the developers.

But, okay. It’s STP. If I kill off STP completely, this should no longer happen.

/cfg/l2/stp off
apply

Yep. Error messages stop. At this time, I’m slowly starting to realize that this switch is broken in a way that is really catastrophic. If STP has gone b0nkers on this switch, it has likely sent invalid STP data to the firewalls and to the neighbouring switches – which could explain the lost connectivity to neighbouring switches and weird behaviour in the entire environment.

I go through some of the logs, and realize that STP has been killing off every single port, and enabling them again – but what caught my eye made me start suspecting an actual hardware failure:

?????? 7:21:27 COMPANY-SW2 NOTICE server: link up on port 3

This should be compared with the log entries on switch #1:

Mar 25 7:22:01 COMPANY-SW1 NOTICE link: link up on port 43


Yeah, something really has gone wrong with the firmware upgrade, and it sort of smells like a hardware failure. As I’m sitting here thinking about what I have just seen happening, another error message presents itself – again, like the BPDU messages, completely outside the scope of the normal logging routines:

UNIT 0 ERROR interrupt: 4 PCI Fatal Errors on Memory read for TX

Right. Fatal memory read error.

Now, these units are not under a support agreement or hardware warranty agreement, so in a normal case, it’s a matter of “let me just order a new one” – but the effects of this error are so extremely severe that I decide to report this to IBM/Lenovo System-X either way.

I got in touch with an awesome support guy named Erik, who completely understood my situation and realized that I wasn’t really looking to get “support”.

I wanted to report a severe software issue, both in regards to a switch being brought down by an erroring neighbour, as well as in regards to hardware checks and how severe hardware failures are handled.

Now, I got a ticket number just today, and Erik just recently got my massive reply to his questions which I guess he’ll be looking into tomorrow – but I am expecting a call from somebody that sees the severity in this situation and wants to accept my error report.

Heck, I’ll even send them this broken IBM/Lenovo BNT G8000 switch to be able to investigate this properly – NO environment should ever be able to get taken offline like this because of ONE component failure .. and still, this happened.

And yes, this means that firmware updates on the other units on other sites are currently halted and will not proceed until this has been sorted by IBM/Lenovo.

Oh, right, almost forgot – the automatic IBM/Lenovo support system has sent me a mail with a price suggestion on support for equipment without a support agreement – 553 EUR per hour, minimum billable time 2h.

Funny. There will be no 553 EUR per hour paid to be allowed to help IBM/Lenovo System-X fix a severe system flaw in their BNT product line – but it will be considered, if the 2 hours would result in a replacement IBM/Lenovo BNT G8000 ending up in the datacenter – but it’s more tempting to go look at another brand and replace all IBM/Lenovo BNT G8000 at the other sites if this report isn’t handled properly by IBM/Lenovo.

To be continued …

Additional comment: The System-X support has informed me that they updated the terms for downloading and using firmware less than a year ago, to require a support contract to be “allowed” to use it. Oops! OK, that’s a separate issue to look into. However, it’s important to note that I am not looking for support here. I do not want to open a support case at all. I want to report a fatal flaw in their software – and this flaw exists, no matter if there is a support agreement in place or not…

2015-03-28 – I’ve gotten a response. Read the follow up post here.

 

Some pfSense commands to keep handy!


 

command             description
pfctl -d            Deactivate the pf packet filter – disables all fw functions
pfctl -e            Activate the pf packet filter – enables all fw functions
pfctl -sn           Shows current NAT rules
pfctl -sr           Shows current filter rules
pfctl -ss           Shows the current state table
pfctl -sa           Show as much as possible.
viconfig            Manually edit the configuration in /conf/config.xml. Once file has been saved and editor exited, the /tmp/config.cache is removed so the next config reload event will load config.xml, not the cached version. You could run the next command to trigger an instant reload.
/etc/rc.reload_all  Reload the Firewall with all the configuration. This also restarts the webgui and sshd – but keeps the current ssh sessions active just as a regular sshd restart.
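
One pattern these commands are often combined into is a timed pause of the packet filter, so that a locked-out administrator gets a short window to fix a rule before filtering resumes. The following is a minimal sketch only: the function name pf_pause and the PFCTL override variable are made up for illustration, it assumes a Bourne-style shell (sh), and it relies on nothing but the pfctl -d and pfctl -e commands listed above.

```shell
# Command used to control pf; overridable so the function can be dry-run.
# (PFCTL is a hypothetical variable name, not a pfSense setting.)
PFCTL="${PFCTL:-pfctl}"

# Disable pf, wait, then enable it again.
# $1: pause length in seconds (assumed default: 300)
pf_pause() {
    delay="${1:-300}"
    "$PFCTL" -d || return 1   # stop filtering; bail out if that fails
    sleep "$delay"
    "$PFCTL" -e               # resume filtering
}
```

Keep in mind that the box is unprotected for the whole pause, so the delay should be as short as the fix allows.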
 
 

Auto updating WordPress CentOS VPS with GleSYS Internet Services

Update 2013-11-01: GleSYS has released a new version of their control panel, so I’ve remade the steps necessary to create the server in this post.

Hi guys,

A friend/ex-colleague of mine has been on the look-out for a decent web hotel that would be able to serve their need of WordPress, preferably with phpmyadmin and other web based tools for administering the server. I’ve left out phpmyadmin here, but I’d be happy to post a follow up if anyone wants to.

Just because of that, I figured I’d put together a small tutorial on how to create an OK VPS for a usual WordPress installation. The advantage of using a VPS is of course that you can crank up the performance in case your WordPress site becomes very popular.

First of all, you need an account with GleSYS Internet Services AB – http://www.glesys.se/ – you can register online, and of course, they accept Paypal payments so no need to hand out your card details anywhere.

Once registered and logged into the control panel, go to the server tab at the top. This brings you to an overview page. Before we start to create a server, we need to reserve two IP addresses for your server – one IPv4, and one IPv6. Select “Manage IPs”, and then fill out the setup like this. Click “reserve address” once you have selected one of the available addresses.

And repeat for IPv6:

Now, to set up your server – make sure you are still on the “Server” tab, and on the right you have the option “Create new server”. That brings you to a separate page with multiple steps.

Step 1 – select a datacenter and type of VPS, and operating system.

 


 

As the operating system, I’ve chosen CentOS 6 64bit. You might prefer another flavour for your Linux distribution, but please keep in mind that CentOS is basically a rebranded Red Hat – one of very few enterprise grade Linux distributions out there. I’d recommend CentOS any day simply because of the stability it provides – and this guide is of course also aimed fully towards CentOS. ;)

Depending on what you are looking for in regards to VPS type, you can choose between Xen, OpenVZ and VMware. In this example, I’ve chosen Xen simply because it’s the best option available at this point. OpenVZ is cheaper, but not a fully virtualised system. VMware is still in beta, but the difference between Xen and VMware won’t really be noticeable for you, unless you have special needs.

Moving on to step #2 !

In this step, you will configure the base bits of the server. Hostname, root password, amount of RAM, disk space and how much you initially expect the server to be using in regards to transfer. I’ve chosen a minimalistic approach here – but if you have a very popular WordPress site, you might want to increase the CPU cores and amount of transfer already.

Do not forget to select the IPv4 and IPv6 addresses in the dropdown selection boxes on the right. You want to be able to reach your server once it’s installed.

Click “create server”, and GleSYS’ automated system will create the server for you in a few seconds, and then present you with a new summary screen:

Once completed, you will see your server active and running – and you are now able to get your hands dirty with some Linux and CentOS commands to get automated updates in place, Apache, MySQL, WordPress and other tools that you might be interested in!

Use your favourite SSH client to log into the server. I prefer PuTTY, but you might already have your favourite. Log in using the username “root” and the password that you selected. This should bring you right into the system, with the following prompt blinking restlessly on your screen, wanting you to do something:

[root@myhostname ~]# 

The first command to run is yum upgrade to bring your installation of CentOS 6 to the latest available. This process will be automated later on in this tutorial, but for now, let’s just get everything up to speed with the latest release to be able to move forward!

[root@myhostname ~]# yum upgrade

You will be prompted with the following text once you’ve typed yum upgrade

Transaction Summary
================================================================================
Install 1 Package(s)
Upgrade 143 Package(s)

Total download size: 119 M
Is this ok [y/N]:

Press ‘y’, and the system will start upgrading. This might take a few minutes.

Once upgraded, you want to add a user that is not root. This is to avoid using the superuser user for anything really. If your root password happens to slip into the public, you have major problems!

The following will add the user ‘hans’, with home directory in /home – and we’ll also make sure to change the password to something that we’ll remember but that is hard to guess.

[root@myhostname ~]# adduser -d /home/hans -c "Hans Engren" -m hans
[root@myhostname ~]# passwd hans
Changing password for user hans.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Now, before proceeding, follow the tutorial available at http://www.pendrivelinux.com/how-to-add-a-user-to-the-sudoers-list/ to add your new user to the sudoers file. Once that is done, you will never have to touch the root user again.
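
If you would rather not edit the main sudoers file by hand, a drop-in file is one alternative. This is only a sketch and not what the linked tutorial describes: it assumes that your /etc/sudoers ends with an includedir line for /etc/sudoers.d, and the function name write_sudoers_dropin is made up for this example.

```shell
# Write a sudoers drop-in that grants one user full sudo rights.
# $1: user name
# $2: target directory (assumed default: /etc/sudoers.d)
write_sudoers_dropin() {
    user="$1"
    dir="${2:-/etc/sudoers.d}"
    file="$dir/$user"
    printf '%s ALL=(ALL) ALL\n' "$user" > "$file" || return 1
    chmod 0440 "$file"   # sudo refuses drop-ins with loose permissions
}

# Usage sketch (commented out; run as root and validate with visudo afterwards):
# write_sudoers_dropin hans && visudo -c
```

Keep your root session open until you have confirmed in a second session that sudo works for the new user.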

In order to make sure you don’t have to do complex scripting, I’ve taken the liberty to create some ready shell scripts for you, to be placed in different locations. I’ll try to explain each script for you.

yumupdate.sh should be placed in /etc/cron.daily/. This script will perform a ‘yum update’ once per day, to ensure that your system is always up to date with the latest updates and security patches around. Important!! => This is only for OS level – your WordPress installation will still have to be maintained and updated by yourself!

[root@myhostname ~]# wget -O /etc/cron.daily/yumupdate.sh https://www.engren.se/scripts/yumupdate.sh
[root@myhostname ~]# chmod +x /etc/cron.daily/yumupdate.sh
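
The contents of yumupdate.sh are not reproduced in this post, so if you prefer to write the cron script yourself instead of downloading it, something along these lines would do a similar job. Treat it as a hypothetical stand-in: the log path, the YUM and LOG variables and the function name are my assumptions for this sketch, not the contents of the downloaded file.

```shell
#!/bin/sh
# Hypothetical stand-in for yumupdate.sh; variable names and log path are assumptions.
YUM="${YUM:-/usr/bin/yum}"
LOG="${LOG:-/var/log/yumupdate.log}"

# Run a non-interactive update and append everything it prints to the log.
run_daily_update() {
    {
        echo "=== yum update started: $(date) ==="
        "$YUM" -y update
        status=$?
        echo "=== yum update finished with exit code $status ==="
    } >> "$LOG" 2>&1
    return "$status"
}

# When placed in /etc/cron.daily, the script would end by calling the function:
# run_daily_update
```

The -y flag answers the "Is this ok" prompt automatically, which is what makes the update safe to run from cron without anyone at the keyboard.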

You also want to install rpmforge, so your system can use suPHP. suPHP is a pretty nifty Apache extension that ensures that only the user that owns a certain PHP script is allowed to run it. This means that nobody can sneak in bad scripts on your system and fool you into running them. Probably very little risk on a dedicated system like this, but I prefer to use this as a best practice.

[root@myhostname ~]# rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
[root@myhostname ~]# cd /tmp
[root@myhostname tmp]# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm

Now, make sure that the rpmforge you downloaded is the right one.

[root@myhostname tmp]# rpm -K rpmforge-release-0.5.2-2.el6.rf.*.rpm

The result should say something like:

rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm: (sha1) dsa sha1 md5 gpg OK

Now, time to install the package.

[root@myhostname tmp]# rpm -i rpmforge-release-0.5.2-2.el6.rf.*.rpm
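
The check and the install can also be tied together, so that the package is only installed when the signature line really says what it should. The sketch below looks for a lowercase gpg followed by OK, because a package without any signature can still end in OK as long as its digests match. The function name rpm_sig_ok is made up for this example.

```shell
# Succeed only if a `rpm -K` result line reports a good GPG signature.
# $1: the single line printed by `rpm -K <package>`
rpm_sig_ok() {
    case "$1" in
        *"NOT OK"*) return 1 ;;   # failed digest, bad signature or missing key
        *gpg*" OK") return 0 ;;   # signature present and verified
        *)          return 1 ;;   # e.g. unsigned package: digests only
    esac
}

# Usage sketch (commented out so nothing is installed by accident):
# result=$(rpm -K rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm)
# rpm_sig_ok "$result" && rpm -i rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm
```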

Move on with installing Apache, taking backups of the configuration file and autostarting Apache:

[root@myhostname tmp]# yum install httpd
[root@myhostname tmp]# cp /etc/httpd/conf/httpd.conf ~/httpd.conf.backup
[root@myhostname tmp]# /sbin/chkconfig --levels 235 httpd on

Moving on, we’re installing MySQL:

[root@myhostname tmp]# yum install mysql-server

The output will be something like this:

================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
mysql-server x86_64 5.1.61-1.el6_2.1 updates 8.1 M
Installing for dependencies:
mysql x86_64 5.1.61-1.el6_2.1 updates 881 k
mysql-libs x86_64 5.1.61-1.el6_2.1 updates 1.2 M
perl-DBD-MySQL x86_64 4.013-3.el6 base 134 k
perl-DBI x86_64 1.609-4.el6 base 705 k

Transaction Summary
================================================================================
Install 5 Package(s)

Total download size: 11 M
Installed size: 32 M
Is this ok [y/N]: y

Of course, you answer ‘y’ on this prompt. Once installed, you want to autostart MySQL too.

[root@myhostname tmp]# /sbin/chkconfig --levels 235 mysqld on

Then, let’s start MySQL and launch the configuration script mysql_secure_installation.

[root@myhostname tmp]# /etc/init.d/mysqld start
[root@myhostname tmp]# mysql_secure_installation

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MySQL
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!
In order to log into MySQL to secure it, we'll need the current
password for the root user. If you've just installed MySQL, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MySQL
root user without the proper authorisation.

Set root password? [Y/n] Y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
... Success!
By default, a MySQL installation has an anonymous user, allowing anyone
to log into MySQL without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] Y
... Success!

Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] Y
... Success!

By default, MySQL comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] Y
- Dropping test database...
... Success!
- Removing privileges on test database...
... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] y
... Success!

Cleaning up...

All done! If you've completed all of the above steps, your MySQL installation should now be secure.

Thanks for using MySQL!

This has secured your database a bit, and we can now move on with creating the database for your WordPress. Use the MySQL database root password that you entered in the secure installation script previously.

[root@myhostname tmp]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.1.61 Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database wordpress;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on wordpress.* to 'myuser' identified by 'mypassword';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
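
If you want to repeat this on more servers, the same statements can be generated and piped into the mysql client instead of typed at the prompt. This is a sketch under a few assumptions: the function name wp_db_sql is made up, the password must not contain a single quote, and unlike the session above it limits the user to connections from localhost, which is enough when WordPress runs on the same machine. The GRANT ... IDENTIFIED BY form matches the MySQL 5.1 used here.

```shell
# Print the SQL needed to create a WordPress database and a dedicated user.
# $1: database name, $2: user name, $3: password (must not contain a single quote)
wp_db_sql() {
    cat <<EOF
CREATE DATABASE $1;
GRANT ALL ON $1.* TO '$2'@'localhost' IDENTIFIED BY '$3';
FLUSH PRIVILEGES;
EOF
}

# Usage sketch (commented out; mysql prompts for the root password set earlier):
# wp_db_sql wordpress myuser mypassword | mysql -u root -p
```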

That’s MySQL configured, with database created for WordPress. All good. Time to install PHP and suPHP.

[root@myhostname tmp]# yum install php php-pear php-mysql php-gd php-cli mod_suphp

Just answer ‘yes’ on the prompt that pops up, and you’ll have the necessary components installed.

The next step is to add a user for WordPress PHP files.

[root@myhostname tmp]# adduser -d /var/www/html -c "Wordpress" wordpress

You also want to ensure that you create a password for the wordpress user. Not having a password might be a bad idea, so setting a password for freshly created accounts is a good thing. You can choose something complicated here – you won’t be using it, as WordPress 3 can do auto updates without an FTP account.

[root@myhostname tmp]# passwd wordpress

Follow the on screen instructions.

Next step, change ownership of /var/www/html:

[root@myhostname tmp]# chown wordpress:wordpress /var/www/html

Also, there are some files that you need to replace. I have prepared them for you, but in short, /etc/suphp.conf needs to have some options enclosed within quotes (“) and the package from rpmforge does not have that – and the Apache suphp.conf requires some modifications to override the built in PHP configuration and run it via suPHP instead.

[root@myhostname tmp]# wget -O /etc/suphp.conf https://www.engren.se/scripts/new-suphp.conf
[root@myhostname tmp]# wget -O /etc/httpd/conf.d/suphp.conf https://www.engren.se/scripts/suphp.conf
[root@myhostname tmp]# wget -O /var/www/html/phpinfo.php https://www.engren.se/scripts/info.php.txt
[root@myhostname tmp]# chown wordpress:wordpress /var/www/html/phpinfo.php

Now, restart Apache.

[root@myhostname tmp]# /etc/init.d/httpd restart

It’s now time to check if your PHP and Apache installation worked – much simpler than you might think!

Use a web browser, and browse to http://YOURIP/phpinfo.php

If you see something similar to this picture, PHP is installed and functional just as expected! If not, something failed – contact me either via the comments field or via mail and I’ll be happy to help you out.

This means it’s time to move ahead with the WordPress installation.

[root@myhostname tmp]# wget -O /usr/local/src/wordpress.tar.gz http://wordpress.org/latest.tar.gz
[root@myhostname tmp]# cd /var/www/html
[root@myhostname html]# tar xfz /usr/local/src/wordpress.tar.gz
[root@myhostname html]# mv wordpress/* .
[root@myhostname html]# rmdir wordpress
[root@myhostname html]# chown -Rf wordpress:wordpress *
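
The unpack, move and rmdir steps can also be collapsed into a single tar call. This is an alternative sketch rather than a correction: the function name extract_wordpress is made up, and it relies on the --strip-components option of tar to drop the leading wordpress/ directory while unpacking, which also carries over any hidden files that a plain mv wordpress/* would leave behind.

```shell
# Unpack a WordPress tarball straight into the web root, dropping the
# leading "wordpress/" path component.
# $1: path to the tarball, $2: destination directory
extract_wordpress() {
    tar -xzf "$1" -C "$2" --strip-components=1
}

# Usage sketch (commented out; ownership still needs fixing afterwards):
# extract_wordpress /usr/local/src/wordpress.tar.gz /var/www/html
# chown -R wordpress:wordpress /var/www/html
```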

Congratulations, we’re almost done with the WordPress installation! Open up a browser, browse to http://YOURIP/ and follow these picture instructions for the remaining bits’n’pieces. It should be pretty clear from here on!

 

 

 

 

 

 

 

 

That’s it – you now have a working WordPress installation on a self-updating CentOS 6 system, where WP updates are working as expected from day 1, hosted on a VPS with GleSYS Internet Services, one of Sweden’s absolute top VPS providers.

For further information on how WordPress works, see http://www.siteground.com/tutorials/wordpress/wordpress_start.htm

Please let me know via mail or the comment field below if you feel that I have missed something in this document – or even if you just want to provide feedback if you used it. :-)

Enjoy!