Dan Larsen's posterous
Dan Larsen

Systems architect, developer and entrepreneur.
Tags: #PHP #C/C++ #Ajax #iPhone #Debian #Cloud


November 28th, 12:14pm 4 comments

Solving the SPAM issues - once and for all!

I can't be the only one who is fed up with SPAM?!?!

I have an e-mail address dating back to 1999! Look it up if you don't believe me :-)

As a result, more than 99% or so of the mail I get is SPAM!

Why don't I just get a new e-mail address? I have used the address for all kinds of sites, contact information, company info, etc... I wouldn't know where to begin if I had to let everybody know that my e-mail address has changed.

I have started the process by handing out a new address, but the e-mails still get sent to the old mailbox.

Why? Well... some websites don't even let you provide a new address...... :-S

 

Anyways... To the solution:

It's actually not that hard, and some have already tried it half-assed (e.g. Microsoft).

 

The real power lies in simplicity!

Basically there are 2 kinds of mail-servers: senders and receivers...

These mail-servers talk to DNS (Domain Name System) servers to find out where mail for any given domain goes.

This is the problem: I can write a seriously simple script that sends e-mails to almost any receiving mail-server, claiming to be whoever I like (I could be steve@apple.com).

Most receiving mail servers today have a SPAM filter - so what? If I am a bit careful, my mail will still get through - even worse: who cares if 2 out of 10 don't? 1 billion SPAM e-mails will still get you a very nice profit (I have heard), even if "only" 827 million of them get through...

This is because there are MANY people out there who don't know enough to separate SPAM from the rest, and who actually buy the €#%"#@.

 

The simple solution: Stop accepting mail from servers (and computers, botnets, mobile phones, etc.) that haven't been pre-approved as senders for a given domain!

We have DNS servers, so let's put in another line: servers a.b.c.d and v.x.y.z are allowed to send mail from techba.se

Now we can start changing the mail flow in 2 steps:

 

  1. Stop accepting mail from computers that aren't pre-approved in the DNS - if the domain has pre-approved computers
  2. Stop accepting mail from computers that aren't pre-approved in the DNS, whether or not the domain has listed any
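
The two steps above can be sketched roughly like this in Go. It is only an illustration, not a mail server: the names (`Step`, `Accept`) are made up for the example, and the approved list is passed in directly instead of being looked up in DNS. An idea very close to this exists as SPF (Sender Policy Framework), where a domain publishes its approved senders in a DNS TXT record.

```go
package main

import (
	"fmt"
	"net"
)

// Step selects how strict the receiving server is (hypothetical names).
type Step int

const (
	StepOne Step = 1 // reject unapproved senders only if the domain lists approved ones
	StepTwo Step = 2 // reject every sender that is not approved in DNS
)

// Accept decides whether mail claiming to come from a domain should be
// accepted from the connecting IP. approved is the list of addresses the
// domain has published in DNS; an empty list means "nothing published".
func Accept(step Step, approved []string, sender string) bool {
	ip := net.ParseIP(sender)
	if ip == nil {
		return false // not a valid IP address at all
	}
	if len(approved) == 0 {
		// No pre-approved senders published: fine in step 1, rejected in step 2.
		return step == StepOne
	}
	for _, a := range approved {
		if allowed := net.ParseIP(a); allowed != nil && allowed.Equal(ip) {
			return true
		}
	}
	return false
}

func main() {
	// Documentation-range addresses stand in for a.b.c.d and v.x.y.z.
	approved := []string{"192.0.2.10", "198.51.100.7"}
	fmt.Println(Accept(StepOne, approved, "192.0.2.10"))  // approved sender
	fmt.Println(Accept(StepOne, approved, "203.0.113.5")) // unknown sender
	fmt.Println(Accept(StepOne, nil, "203.0.113.5"))      // domain published nothing
	fmt.Println(Accept(StepTwo, nil, "203.0.113.5"))      // step 2 is strict
}
```

The only difference between the two steps is what happens when a domain has published nothing, which is why the change can be rolled out gradually.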

Of course some will think that this would make everything incredibly hard and cumbersome, not to mention increase the amount of administration...
... I have a shit-load of domains that all have mail records, which would need to be changed once in a while...
I wouldn't mind though... I would gladly spend 1/20th - 1/10th of the time that I would save by almost never having to deal with SPAM!

 

Most forget that today we spend hours and hours setting up SPAM filters and systems that almost never work anyway.

I have set up my own SPAM filter - it works pretty well, which means: it catches ~98% of the SPAM.

But... Sometimes it thinks a perfectly fine e-mail is SPAM - this is the biggest of my problems! I have to look through ALL (~1500/day) of my SPAM if I want to be sure I don't lose any mails...

That is not a good enough solution!.. Actually that is not even making it much less of a problem!..

 

Come on people! Let's change the world of e-mail, once and for all! :-)

Filed under DNS SPAM e-mail solution
November 18th, 10:34am 0 comments

Cost of Clouds vs Dedicated Servers

Yesterday one of my clients needed me to advise them on which servers to pick for a project that needs scaling capabilities.
For that, I wanted a better overview of some of the providers out there.
I haven't included expensive managed solutions like Rackspace's dedicated servers, as these don't fit in this category (they are managed).
The reason for this comparison was to get a lesser-known factor into the equation when deciding whether to choose a dedicated server or a cloud server.

The question that I wanted answered was: "What is the comparable price, for self-managed servers?".

Some of the cloud companies are professionally vague in describing what they are giving you.
To make things even more difficult to compare, in reality they aren't even giving you the same product.
Amazon allocates a fixed amount of CPU to you, while Rackspace Cloud only limits you if necessary.
This can make a Rackspace Cloud machine VERY much faster, as it will have access to something like dual quad core 2GHz CPUs.
What makes this an uninteresting fact for me - actually close to a negative point - is that you can run a test 100 times at one point, but the result is completely useless, as you cannot tell whether it will hold at a later point in time.

Amazon allocates a certain amount of EC2 Compute Units (ECU), which they describe as:
One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

If this isn't vague enough for you, then try out this description from Rackspace:
Each cloud server has 2 quad core processors that are at least 2Ghz+. The 256MB plan will get 1/64 of the CPU allocation, the 512MB plan will get 1/32 of the CPU allocation, and the 1GB plan will get 1/16 of the CPU allocation. The 2GB plan will get 1/8 CPU, the 4GB plan will get 1/4, the 8GB plan will get 1/2, and the 15.5GB plan will get all CPU allocation in the server.
Which fortunately is comparably vague to Amazon's description, as the calculation gives you 1+GHz per 1GB RAM :-P
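
To spell out that calculation: assuming exactly 2 GHz per core (Rackspace only says "at least"), one host has 2 x 4 x 2 GHz = 16 GHz in total, and each plan gets its fraction of that. A small sketch in Go; the type and function names are made up, and the plan table is copied from the quote above.

```go
package main

import "fmt"

// hostGHz is the total CPU of one host, ASSUMING exactly 2 GHz per core:
// 2 processors x 4 cores x 2 GHz. Rackspace only promises "at least 2Ghz+".
const hostGHz = 2 * 4 * 2.0

// plan is one Rackspace Cloud plan from the quote: RAM size and the
// denominator of its CPU share (1/64, 1/32, ...).
type plan struct {
	ramGB    float64
	shareDen float64
}

var plans = []plan{
	{0.25, 64}, {0.5, 32}, {1, 16}, {2, 8}, {4, 4}, {8, 2}, {15.5, 1},
}

// GHzPerGB returns how much CPU a plan gets per GB of RAM.
func GHzPerGB(p plan) float64 {
	return hostGHz / p.shareDen / p.ramGB
}

func main() {
	for _, p := range plans {
		fmt.Printf("%5.2f GB RAM: %5.2f GHz total, %.2f GHz per GB\n",
			p.ramGB, hostGHz/p.shareDen, GHzPerGB(p))
	}
}
```

Every plan comes out at 1 GHz per GB, except the 15.5GB plan, which gets slightly more because it has the whole machine.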

The worst of them all, I am sad to say, is definitely Media Temple.
I have had servers hosted at Media Temple several times.
The service is good, customer support fine and pricing reasonable - but... They are the vaguest "cloud" hosting service of them all.
The only way I could do some comparison was by checking out their "nitro" product, finding out what that server was physically, and then assuming the "dedicated virtual" servers were running on the same hardware. Finally I calculated the CPU like Rackspace Cloud does - by dividing.

There are a lot of factors that are not calculated into this little experiment, like:
  1. Which hard drives, how many and how much capacity
  2. Backed up or not
  3. How many CPU cores
  4. Connection to the internet - but: clouds generally have massive connections, while dedicated servers are more diverse
  5. General hardware: server-grade or not? E.g. consumer CPUs or server-grade? etc.

Also, traffic is an important factor - especially for the cloud services, as these are "pay-as-you-go".
I chose to calculate 200GB of traffic into the price.
The traffic is spread as: 25% traffic from client to server, 75% from server to client.
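
As a sketch of how the traffic ends up in the price: 200GB split 25/75 is 50GB in and 150GB out, each multiplied by the provider's per-GB rate. The rates in the example are placeholders, not any provider's real prices, and the function name is made up.

```go
package main

import "fmt"

const (
	totalGB  = 200.0 // monthly traffic assumed in the comparison
	inShare  = 0.25  // client -> server
	outShare = 0.75  // server -> client
)

// TrafficCost returns the monthly traffic cost for the given per-GB rates.
func TrafficCost(inPerGB, outPerGB float64) float64 {
	return totalGB*inShare*inPerGB + totalGB*outShare*outPerGB
}

func main() {
	// Placeholder rates (currency units per GB), NOT real prices.
	in, out := 0.10, 0.20
	fmt.Printf("in: %.0f GB, out: %.0f GB, cost: %.2f\n",
		totalGB*inShare, totalGB*outShare, TrafficCost(in, out))
}
```

For providers that charge differently for incoming and outgoing traffic, the 25/75 split matters quite a bit, since the outgoing part is three times as large.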

Anyways... Here it is! The pretty little chart that roughly gives an idea of the cost of 1GHz CPU + 1GB RAM + 200GB data per month.
There is a different number of points per provider - these are different configurations.
Amazon and Rackspace Cloud have an amazingly consistent price!

[Chart: Screen_shot_2009-11-18_at_09 - cost of 1GHz CPU + 1GB RAM + 200GB data per month, by provider]

Just a couple of final reminders:

Most of us already knew that clouds were more expensive than dedicated servers.
But this gives an idea of how much.

Cloud servers are not directly comparable to dedicated servers, as dedicated servers have ALL resources allocated to you.
Cloud servers are influenced by being managed by virtualization software, sharing resources with a lot of other servers, sometimes not having real disks, etc.

The upside of cloud servers is that you can start / stop them at any time, while dedicated servers are usually paid per month or more + it can take days before one is up.
It's always a good thing to have your servers physically close - with clouds you can start e.g. 100 servers for 2 days and then shut them down again, while maintaining all of the goodness of physical closeness, if your other servers are in the cloud.

Hope this helps a couple of decision makers! :-)
November 14th, 2:14pm 1 comment

CO2 neutrality or lousy math? (originally in Danish)

Every time I hear the expression "CO2 neutrality", something in me wants to yell and scream: "HERESY!!!"...

Let me make one thing clear right away: I do care about our planet and its climate!

But when I see initiatives like this one: CO2 neutralt website ("CO2 neutral website"), which tempts you with lines like "Hjemmesider med god samvittighed" ("Websites with a clear conscience") - well, then there is something that doesn't quite add up for me...

This is the meaning of neutral, according to ordbogen.com:

which neither harms nor benefits

So... The claim is that you can pay to make the CO2 emissions of your website completely cease to exist...

This is of course NOT true!

If you emit 1 ton of CO2, then 1 ton of CO2 has been emitted.

If at the same time you prevent others from emitting 1 ton of CO2 - well, then 1 ton of CO2 has still been emitted.

I was personally involved back when we naively believed that it made a difference to buy up and cancel CO2 quotas...

Today I am fairly convinced that people will manage to find the CO2 quotas they need... Africa is supposed to be a good market - they don't use theirs anyway...

Which only makes the claim of neutrality worse - the result will be 1 ton of emitted CO2 + 1 ton of emitted CO2 - your money!

Regardless of the above, the claim of "CO2 neutrality" is a falsehood!

The only way to bring down CO2 emissions is: cut down on your own emissions!

If you really care about CO2 emissions in connection with your website, you should talk to your favorite nerd about Cloud computing!

The advantages of Cloud computing include sharing of resources, which means lower total consumption, which means lower CO2 emissions.

If you are interested in learning more, here are a couple of links (in English):

Amazon EC2 (Cloud Computing)

Cloud Computing's Green Benefits <- some really interesting calculations of normal server use compared to cloud computing

If there is interest, I would be happy to write a more in-depth article about what makes the big CO2 difference with Cloud Computing.

November 13th, 10:47am 0 comments

Google at it again! Go! Google Go...

An article about my initial experience with Google's new programming language "Go"!

This is probably a subject for the more nerdy of us ;-)

Google has finally had it with decades-old programming languages and their cumbersome ways!

Anybody working with the technical side of the online industry will know the names of at least 10+ programming, scripting or markup languages.
Some of the more common are:
HTML, XHTML, CSS, PHP, ASP, JSP, Python, Ruby, Java, JavaScript, ActionScript, ECMAScript, C, C++, C#, Visual Basic, Bash, etc...
While some of the less common or less known are:
Lisp, Erlang, Lua, Smalltalk, Java (oops! dreaming...), Tcl, etc...

Then there are technologies, protocols and standard/typical ways of doing things, which then get a name... Like AJAX...

Well... It looks like Google wants to clean up a bit...
So why not create a new programming language?!?! :-P

It's actually one of their "Innovation Time Off" a.k.a. "20% projects", but it's being evaluated alongside other technologies for use in new systems and setups.

So... What is it?
It's kind of a new programming language, but not really, but then again... :-P
It's a compiled language. The compiler is written in C, and some C code can be linked with Go programs, which makes the language a little hard to define.
I think you can say that Go is a new C++.

But why?..
WE NEED IT!
As they state on their site (http://golang.org/), we are using decades-old programming languages that were not designed to do what we need today.

Most languages have been "updated", hacked, expanded, call it what you like, to do what we need...
But doing what we need is not the same as doing it well or easily.

I have created a highly scalable multiserver program in C/C++, where the servers know about each other - the core of the code is very complicated, is spread over many files and probably has a bug or two I haven't discovered yet.
With Google's Go language, I could have written it all in 1 (ONE!) file!!!
It would almost be hard to introduce a bug!

But how is that possible?
Well... When working with multiple threads or forks, a lot of network connections, a bunch of files and a memory cache layer... You are in for a ride...
There are SO many things that can go wrong - so many things that need to be handled correctly, so many things that need locks!

What Google has done with Go is to encapsulate a lot! of these tasks.
More specifically, they have encapsulated the parts that are hardest and yet most used.

When the parallel parts of your program need to communicate, you establish a channel and communicate!
In C/C++ this is done through either socket communication with all its glory and weirdness - or through shared memory queues or variables, needing locks and unlocks on everything...
The last time I wrote code for message-queue inter-process communication, I ended up with 500+ lines of code (if I remember correctly).
The basics of this would, in Go, be a couple of lines of code, and with good error handling etc. it would probably be in the tens of lines!!!
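
For comparison, here is roughly what the basics look like in Go as it exists today (the language has changed a bit since this was written): a worker running in parallel receives messages on one channel and sends results back on another. The names (`worker`, `Process`, `jobs`, `results`) are made up for the example. Note that a channel connects parallel parts of the same program, and that there are no explicit locks anywhere.

```go
package main

import (
	"fmt"
	"strings"
)

// worker reads messages from jobs until the channel is closed and sends
// an upper-cased copy of each one to results.
func worker(jobs <-chan string, results chan<- string) {
	for msg := range jobs {
		results <- strings.ToUpper(msg)
	}
	close(results) // tell the receiver that no more results are coming
}

// Process sends every message through a worker and collects the replies.
func Process(msgs []string) []string {
	jobs := make(chan string)
	results := make(chan string)
	go worker(jobs, results)

	// Feed the queue from its own goroutine so sending and receiving overlap.
	go func() {
		for _, m := range msgs {
			jobs <- m
		}
		close(jobs)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(Process([]string{"hello", "from", "a", "channel"}))
}
```

Closing a channel is how the sender says "that was the last message", which replaces the end-of-queue bookkeeping you would otherwise write by hand.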

But... Even better yet is the thread handling... In Go a parallel process is called... a goroutine :-)
Whenever you need to run something in parallel, you just write "go" in front of the function call!
What is interesting is what happens in the background!

Go handles thread creation for you, but even better: it handles running your code in these threads!
Every system has a thread limit (and a fork limit), which limits you to a set number of truly parallel processes.
But parallel processing acts like everything else on a computer - it waits... it waits A LOT! Mostly for resources to be freed and become accessible.

I'll try to describe this, by giving a quick overview of an example webserver:
If you run a thread per connection AND lower the default allocated memory per thread, you would normally get about 6,000-7,000 threads on a typical webserver.
This means you can handle 6,000-7,000 connections every second, as long as you can deliver the content within a second...

That's a lot... if you have a small commercial site that tells a little about your company etc...
But let's say you had a video delivery system...
If YouTube were running on a single server with this kind of setup, only 6,000-7,000 people would be served over a minute or so.
6,000-7,000 clients would connect, the same number of threads would start - but then... the threads would spend most of their time just waiting for all kinds of things, like a response from the client when it has received a packet of data, etc.

So... When you create this kind of server, you would look into letting one thread handle more than 1 client.

Go does this for you!
It even selects the thread to execute the code, based on whether it is in a blocking mode or not!
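
A small sketch of what that looks like in today's Go: 10,000 parallel "clients" that each spend their time waiting, the way a slow connection would. The Go runtime runs them all on a handful of operating system threads. The numbers and the function name `serveAll` are made up for the example, and `time.Sleep` stands in for waiting on a client.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// serveAll starts one goroutine per "client" and waits for all of them.
// It returns how many clients were served.
func serveAll(clients int, wait time.Duration) int64 {
	var served int64
	var wg sync.WaitGroup
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() { // "go" in front of a call runs it in parallel
			defer wg.Done()
			time.Sleep(wait) // pretend to wait for the client
			atomic.AddInt64(&served, 1)
		}()
	}
	wg.Wait()
	return served
}

func main() {
	start := time.Now()
	n := serveAll(10000, 100*time.Millisecond)
	fmt.Printf("served %d clients in %v\n", n, time.Since(start).Round(10*time.Millisecond))
}
```

Because the waiting overlaps, the whole run should take roughly as long as one client's wait, not 10,000 times that.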

To make things even more beautiful, it encapsulates the network communication in the same way!
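
As an illustration using today's standard library: a tiny TCP echo server where every connection gets its own parallel handler, and the blocking reads and writes are scheduled by the runtime. The function names (`serve`, `echoOnce`) are made up, and error handling is kept to a minimum.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
)

// serve accepts connections until the listener is closed and echoes
// everything back, one goroutine per connection.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func(c net.Conn) {
			defer c.Close()
			io.Copy(c, c) // waiting here blocks only this goroutine
		}(conn)
	}
}

// echoOnce starts a server on a free local port, sends one line and
// returns the reply.
func echoOnce(line string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintln(conn, line); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := echoOnce("hello")
	fmt.Printf("%q %v\n", reply, err)
}
```

The code reads as if every connection had its own blocking thread, which is exactly the simple model you would like to write in C/C++ but can't afford at scale.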

I mean it very seriously when I claim that I would be able to boil the core of my current C/C++ project down to less than 1/10th of its current size, just by rewriting it in Go!!!

So... Why don't I?
There are 2 simple reasons:
1) Google discourages production use, just yet...
2) Go doesn't have a MySQL connector

To sum up, why I believe Go will be my preferred language in a year or two:
* Unlike Java, Erlang and the like, it is a compiled language! (no buggy VMs!!!!)
* It encapsulates most of the absolutely most difficult stuff like threads, inter-process communication and network communication
* It's a new language for new problems, aiming at doing high load, online server stuff the right way, easily!
* It rides on the back of C, removing the need for a completely rewritten compiler, etc.
* It's kinda C/C++, but less difficult - e.g. it has garbage collection, nicer pointer handling, etc.
* Because it's the only compiled language aiming at the online server market

I just want to make 1 thing extremely clear:
Go is a compiled language!
Like C/C++ you need a compiler on a (one) machine...
When you are done coding, compiling, testing - you just copy the final program to other machines!
You can cross-compile for other systems!

A lot of people love the "write once, run anywhere" of Java... I hate it!
It's a big fat lie!
There's like a million Java VMs (Virtual Machines) out there!
If you want to run your code on any machine, you need a VM for that OS and CPU.
When you have installed that - you need to test your code on that machine!!!
"But shouldn't it just run?" - Yeah... but it doesn't!
There is a different set of capabilities and incapabilities per OS+CPU and a different set of bugs.
If you have Java VM xxx.xx.01 on one machine and Java VM xxx.xx.02 on another... You need to test! Trust me!

I'd rather spend some extra time, coding and compiling for a couple of platforms, than testing and getting hit by weird bugs in live systems!

I hope this will entice others to give Go a go!..
I think it will be worth your time, to at least get into what this is all about...

Filed under Go Google Programming
November 13th, 9:17am 0 comments

Microsoft granted a patent on 30 year old UNIX technology!..?!.. :-?

Microsoft was granted a patent, that basically describes the UNIX program sudo!

It's like patenting the grinding of coffee beans, before pouring hot water over them!

For those who don't know much about UNIX/Linux:

root: UNIX/Linux administrator

sudo: a small program whose purpose is to let "normal" users act as root, by "becoming" root or executing commands as root - of course root must grant you access to use the program!..

So... It seems Microsoft was granted a patent that describes something that *NIX users have known and used for 29 years!!!!

Am I the only one who just wants to denounce the authority of the US patent system? I mean... If somebody says to you: "you are violating our patent", I really just want to respond with: "what's that... pad-end? What is it? AND WHAT THE #@!&% ARE YOU ACTUALLY ACCUSING ME OF!?!?! VIOLATING??!?!"... And so on... you get my drift :-P

 

Read more:

Groklaw article

Wikipedia article on sudo

 

Filed under linux microsoft patents unix