Many programmers prefer typeless, interpreted languages like PHP and Ruby for several reasons. They are more concise and easier for a novice to read and write. Because they are interpreted rather than compiled, they are simpler to use and typically offer a faster round trip between making a change and seeing the result. They also support a “google, cut and paste” workflow more easily, which, frankly, is how many programmers operate these days.
And yet strongly typed languages are still more widely used, particularly as the complexity of the project and the number of developers grow. I have discussed this issue with a number of colleagues and wanted to write down my thoughts. It’s important to choose the right language for the right job, and today, unfortunately, there’s no one-size-fits-all answer, so knowing the details may help. My opinions were formed by poking around in the guts of the JVM, Python, PHP, Ruby, and Flash interpreters, and from coding extensively in Java, C, and C++.
Typeless versus Typed
One reason I believe typed languages are used is the robustness of the code itself. Typeless languages offer a single point of failure with each code construct. If you misspell a variable name, you do not find out until runtime, and only by debugging the problem or through code inspection. With a typed language, each misspelling is caught at compile time because every name must occur in the program at least twice: once for the declaration and once for the usage. This fact alone will often make up for the extra keystrokes you need in a typed language.
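Here is a minimal Java sketch of the point (the class and field names are hypothetical, just for this example):

```java
public class Order {
    private double totalPrice;

    public double discountedTotal() {
        // If this were misspelled as "totalPrce", javac would reject the file
        // with "cannot find symbol" before the program ever ran. In a typeless
        // language the same typo would only surface at runtime, and only on the
        // code path that actually executes it.
        return totalPrice * 0.9;
    }
}
```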
With typed languages, more is known about the system during the code editing process. This makes the tooling opportunities richer and reduces keystrokes, which can make it faster to write code in a typed language than in an untyped one even though the typed language is more verbose: think of automatic import handling or completion of member and method names. The “find all usages” feature is extremely valuable for tracing code paths and doing refactoring. Typeless languages may offer such features, but they are much less precise because they can only match names, not types plus names. The ability to change a field or method name and reliably update all references is a big time saver when modifying a large existing project.
Another reason people prefer typed languages, of course, is runtime performance. But why exactly do typed languages run so much faster? The biggest reason is that they offer a much faster way to evaluate “a.b” expressions and do method lookups (a.b()) at runtime. With a dynamic language, every single indirection requires a hashtable or binary search, which turns into dozens or hundreds of instructions. With a typed language, a compiler can frequently generate an “a.b” in just a few instructions using a “load from fixed offset” pattern. That’s why a typeless language will usually run at least 10x slower than a typed language, no matter how many engineers Facebook puts on the problem.
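Here is a rough Java sketch of the difference. The “dynamic” side is only a stand-in for what a dynamic runtime does internally; the names are mine, just for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class FieldAccessSketch {
    // Typed version: the compiler knows the layout of Point, so "p.x" becomes
    // a load from a fixed offset inside the object, a few instructions.
    static class Point {
        int x;
        int y;
    }

    // Rough stand-in for a dynamic runtime: properties live in a per-object
    // table keyed by name, so every "a.b" is a hash lookup plus a string compare.
    static class DynamicObject {
        final Map<String, Object> slots = new HashMap<>();
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 42;
        int typed = p.x;                              // fixed-offset load

        DynamicObject a = new DynamicObject();
        a.slots.put("b", 42);
        int dynamic = (Integer) a.slots.get("b");     // hash + compare, many instructions

        System.out.println(typed + " " + dynamic);
    }
}
```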
Some folks today are trying to infer types in typeless languages to improve runtime performance. In limited cases they could compile typeless code to use fixed offsets. That may well be an area of research which improves the performance of some typeless code. I suspect, though, that the code which speeds up will need to be well organized around common types, and so will end up written a lot like code in a typed language.
It is also perhaps poorly understood that even typed languages do not always realize the a.b speedup from fixed offsets. For example, when you use a feature like interfaces in Java, you do end up with some searching to find the right method in the general case. You may not see this all the time because Java employs a trick to cache the offset for the last type seen, which eliminates that search in many cases. I have a project in which changing one interface to an abstract class improved performance by over 50%.
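To make the shape of that change concrete, here is a hedged sketch with hypothetical type names; the 50% number came from profiling my own project, and real results depend heavily on the JVM’s inline caches:

```java
// Calling greet() through an interface uses invokeinterface, which in the
// general case may have to search the receiver's interface method table,
// although the JVM's inline caches usually hide that cost at call sites
// that keep seeing the same concrete type.
interface Greeter {
    String greet();
}

// Calling greet() through an abstract class uses invokevirtual: the method
// sits at a fixed vtable slot, so dispatch is an indexed load, not a search.
abstract class AbstractGreeter {
    abstract String greet();
}

class EnglishGreeter extends AbstractGreeter implements Greeter {
    @Override
    public String greet() {
        return "hello";
    }
}
```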
One other poorly understood performance factor in comparing typeless and typed languages is what happens when interpreted code calls native code, for example when PHP or Java calls some C function. Native transitions are usually substantially slower than normal method calls because of the extra work they need to do in translating data types, pinning down memory used by the native code, copying memory from an unmanaged to a managed environment, and so on.
Though both typed and typeless languages suffer the same problem, typeless languages generally lean more heavily on higher-level C libraries. That’s probably because writing those libraries in the language itself is too slow, or the effort involved is too high given the limited commercial support for typeless languages. With more native transitions, the performance hit for this design increases, so just moving more code into the native layer may not make things faster when you need to make lots of native method calls.
Of course, heavier use of native code turns into an advantage when you have a small amount of typeless code that just strings together a few efficient but long-running native methods, like copying a file. In these systems the typeless language is almost as fast as C.
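For illustration, here is roughly what that native boundary looks like from the Java side via JNI. The library and method names are hypothetical, and the C implementation is omitted:

```java
public class NativeChecksum {
    static {
        // Loads a hypothetical native library, e.g. libchecksum.so or checksum.dll.
        System.loadLibrary("checksum");
    }

    // Declared in Java, implemented in C. Every call crosses the managed/native
    // boundary: the byte[] may need to be pinned or copied before the C code can
    // touch it, which is the per-call overhead described above.
    public static native long checksum(byte[] data);

    public static void main(String[] args) {
        // One call over a large buffer amortizes the transition cost well;
        // many calls over tiny buffers are dominated by it.
        byte[] data = new byte[1 << 20];
        System.out.println(checksum(data));
    }
}
```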
In general typeless languages have faster round-trip times between changing code and seeing the change. Because they are typeless, when you update a module you do not have to update the entire application; changed code constructs can co-exist with unchanged constructs. In a typed language, however, you have to update the type in a way that preserves the stricter typing contracts. Since the code itself relies on fixed offsets, when those offsets change you have to update all of the code atomically, which is hard to do and get right. Most typed languages cannot do that seamlessly and, worse still, there’s no way to know when it will or won’t work, making “class patching” useful only in special cases where you can isolate all dependencies on the class being changed.
Interpreted versus Compiled
To get good performance as a project grows, even interpreted languages these days must cache compiled descriptions of the code. They do, however, retain the ease-of-use benefits in most cases because this is all done transparently by the browser or the runtime engine. When the code changes, these caches are updated automatically. Without such a feature, interpreted languages bog down as code sizes grow: each time a process restarts, too much code must be interpreted before you can use the system.
Thread Architecture
Java, C, and C++ are all multithreaded using operating system thread scheduling. In general, this means that all code must be “thread aware,” though in practice frameworks try to reduce the likelihood of thread conflicts. When a framework is well designed, the burden of synchronization is not imposed on application code.
You need a threaded architecture when you need to share a large pool of memory or efficiently perform I/O with a bunch of sockets or files. You can also more easily leverage a multi-CPU environment with OS threading.
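As a minimal Java sketch of that pattern (the file names are placeholders): a pool of OS threads reads files in parallel while sharing a single in-process cache.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelReadSketch {
    // One shared, in-process cache: every thread sees the same copy,
    // which is exactly what a process-per-request model gives up.
    static final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);       // OS threads
        for (String path : List.of("a.dat", "b.dat", "c.dat")) {      // hypothetical files
            pool.submit(() -> {
                cache.computeIfAbsent(path, p -> {
                    try {
                        return Files.readAllBytes(Paths.get(p));      // I/O proceeds in parallel
                    } catch (Exception e) {
                        return new byte[0];
                    }
                });
            });
        }
        pool.shutdown();
    }
}
```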
In contrast, even multi-threaded VMs like Python may have a global interpreter lock or do VM-based thread scheduling. Either of these architectures eliminates opportunities to do parallel I/O unless you switch to a multi-process model. For example, PHP runs each HTTP request in a separate process and so achieves some parallelism that way. But in doing so, it gives up shared memory, which reduces the efficiency of memory caching. It also means that any data structure used by all HTTP requests must be replicated across all PHP processes, further increasing both computation and RAM usage.
So for PHP, you’ll need even more memory and more CPU to populate that memory. You do still benefit from OS level file caching of course.
What about the Future?
I tried to be neutral in my analysis, but you can probably tell from the above that I like the benefits of typed languages. When you consider long-term costs, including modifications, enhancements, transfer of code between developers, and runtime efficiency for large-scale or mobile deployments, strongly typed languages win out.
I agree with Ruby and PHP developers, however, that we are not yet at the point where any strongly typed language today will beat out PHP and Ruby for any given project. As long as the code is easier for most people to read and edit, the typed-language advantages may easily be outweighed by availability of people, cost, and the poor workflows that complex typed languages like Java, C, and C++ offer designers, analysts, and admins.
To bridge the gap, we need a strongly typed language which has:
- simplified tools – Java IDEs are too complex for entry-level programmers and others who work with PHP and Ruby code today
- syntax improvements – eliminating imports, inferring types, and generally simplifying the syntax would bring typed languages much closer to untyped languages in readability and brevity
- mixed interpreted/compiled modes and a way to migrate code between them as it solidifies
- updating of types in the common cases so code changes take effect immediately; when that’s not possible, the ability to know as soon as the code is changed that a restart is required
- built-in compilation and dependency management for automated builds, updates, and deployments – Maven, Ant, and IDE configuration are too complex today
What do you think? Did I miss any important issues that affect your choice of a language? Let me know in the comments!
I’ve been using Scala lately and it’s a good solid advance over Java, with the advantage that it sits on top of the JVM and lets you use Java libraries and idioms without forcing you into a functional programming style, although it does encourage one.
Scala does have a number of syntax improvements and stylistic flourishes. I particularly like the way that case classes interact with pattern matching, and the implicit handling that lets you pass around a “context” to every method in a class or convert parameters from UUID to String and vice versa.
Play 2.0 is the web framework that Scala will be using in the future. It inherits a number of stylistic features from Rails, is written to be simple to use, and addresses updating of types, runtime compiles, and dependency management through tight integration with sbt (a Scala version of Ant that uses Ivy under the hood). It’s still in beta, but it’s possible to automatically create Eclipse or IDEA projects from a Play project using gen-idea or eclipsify.
The big problem that Scala has is that it’s not a simple language… or rather, it has a number of concepts that work well individually but allow for very terse and packed code when used all at once. In addition, Scala has historically suffered from a userbase familiar with Haskell and intent on making Scala more like Haskell by adding typeclasses, lenses, monads and the like — concepts that most developers are unfamiliar with and that may be overkill for the particular problem they are trying to address.
The creator of Scala has recently formed a company, Typesafe, which is explicitly working on the Scala IDE to make it do more of the heavy lifting, and which bundles Akka (a concurrency library built on Actors) and Play as the “Typesafe Stack.” It’s worth checking out.
Thanks for the great comment, Will. I have seen the Play video for Java. It is very cool for building stateless web apps. If the app is stateless, the “code refresh” problem becomes a lot easier. Do you know how it works if you need stateful features in your app or use stateful libraries?
I’m not a big fan of tools like Play or Roo which do a lot of “source generation” behind the scenes, though I guess it’s a necessary reality for the current state of Java frameworks.
If you have stateful features in your app, you typically rely on some sort of state caching — people started off with memcached, but I think NoSQL systems like Redis, Cassandra and MongoDB are more in use these days — they’re fast enough to provide run-time state to systems that expect to see a session.
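As a rough sketch of that pattern using the Jedis client (the host, key, and TTL here are just placeholders; memcached or any other external store follows the same shape):

```java
import redis.clients.jedis.Jedis;

public class SessionCacheSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {            // hypothetical Redis host
            jedis.setex("session:abc123", 1800, "{\"userId\": 42}");  // state expires in 30 minutes
            String state = jedis.get("session:abc123");               // any app server can read it back
            System.out.println(state);
        }
    }
}
```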
If you’re using a stateful library that expects to have access to javax.servlet.http.HttpSession (or even HttpServletRequest) then you have a problem in Play, because they’re just flat out not available. I had to fake one for Shiro: https://github.com/wsargent/play-shiro
More than that, many of the older Java libraries expect there to be a “session” of some kind available to the app — Hibernate uses one, Shiro uses one, and under the newer models with async callbacks you may not be guaranteed to end up in the same thread, or even the same VM. As in most architectures, the implicit assumptions are worse to deal with than the explicit compile-time issues.
Regarding the source generation… yeah, compiled code is still faster than interpreted. Play 1.2.4 has templates based on Groovy — compare to Play Scala 0.9.1: http://www.jtict.com/blog/rails-wicket-grails-play-lift-jsp/
Great post, Jeff. I am still attracted to how quickly I can get a website up and running on RoR versus Java. But I do agree that as the project gets bigger, the benefits of RoR decrease quickly.
I think one other issue for the small developer is hosting. Though Java hosting is becoming more available, the ease and number of choices for hosting typeless apps are far greater.
Any thoughts on Node.js?
Also, what do you think of Go, http://golang.org/? It sounds like the designers were thinking about the same things you were when they created the language. Google is supporting it in its App Engine.
On the hosting question, I guess with VPS prices being so cheap these days, particularly if you can run with a low memory footprint, hosting is not a big deal. As one data point, I have two Java servers running on a 256MB VPS for $15/month. I recently set up a VPS for a client and needed 512MB for Apache/PHP at $30/month, and we already had to reconfigure because we ran out of memory when a bit of load came in (Apache was spawning too many processes, I think).
I haven’t coded with Node.js, but I haven’t seen a technology achieve such rapid adoption in a long time. JavaScript as a language, I’d say, suffers the brittleness of typeless languages without the productivity advantages you get with PHP or Ruby. I played around with GWT but feel like it may be too heavyweight a solution: compile times are forever and debugging is challenging. I’d rather see a simpler, faster, more direct translation of Java to JavaScript, one that wraps Node.js instead of implementing a new toolkit.
Go seems like a solid candidate to replace C for systems programming but who would want to replace C? I love C!
I don’t trust Node.js — it seems to me as if the rapid adoption is directly from JS engineers who see a chance to break out of client side apps into server side code. For me, the time to start using a new framework is when people start complaining about it and proposing workarounds.
Honestly, I don’t think anything will replace Javascript on the client side; it’s “good enough” for what it does and it is possible to use Coffeescript (or Uberscript if you really want types) to abstract the worst parts of Javascript into something usable.
Jeff, good points about VPS. Which Java servers do you prefer using?
Jeff and Will, thanks for the feedback on Node.js. I found Node.js interesting because of how fast it ran. It seemed like a good solution for a server for mobile apps. But your feedback is deterring me. 🙂
Oops… I’m embarrassed. I thought Node.js was just jQuery, not server-side JavaScript. jQuery is what I was talking about 🙂 Node.js seems like a bad idea, though I have not looked at it at all.
For simple things I use Jetty. I like Resin for performance/scalability/support.
I have heard CoffeeScript is hard to debug, and I find the code hard to read. I think Dart may be a better alternative down the road.
Heroku prefers Jetty too.
Great discussion Jeff! 🙂
…Jeff, every language you called “typeless” is, in fact, dynamic. There /are/ typeless languages, mostly dialects of assembly. See at least http://en.wikipedia.org/wiki/Dynamic_typing#Dynamic_typing for a brief refresher, and ideally the rest of the article as well.
Thanks for that link. I’ll update this article at some point to be consistent with that terminology. I don’t think it changes any conclusions. A type system that permits an invalid “a.b” reference without an error, even at runtime, leads me to the conclusion that you cannot rely on that type system. It’s like mountain climbing without a rope: OK for short distances, but a potentially fatal choice for a large system you hope to be maintained by a large team over a long time.
Possibly relevant: http://existentialtype.wordpress.com/2011/03/19/dynamic-languages-are-static-languages/
Wrote a blog post about Scala, going into more detail: http://tersesystems.com/2012/12/16/problems-scala-fixes
Hey everyone, guess what? I was wrong! (mostly).
Type theorists use “untyped” (like Jeff) where most (like me) use “dynamic”, because dynamic typing doesn’t fit their definition of typing. (see: stackoverflow.com/questions/9154388/does-untyped-also-mean-dynamically-typed-in-the-academic-cs-world ) You (jv) could add a note to that effect, or not.
@Will: (haha) re: the first link…I really respect this guy’s knowledge of type theory, and he’s probably more right than I realize, but nonetheless I have a few objections:
He claims dynamic languages are constrained, even straitjacketed by having only one type. This is backwards: types /are/ constraints. Sometimes they’re liberating constraints, sometimes…not.
Example: I prefer C (and family) to Pascal (and family), despite the former’s bizarre operator precedence and tendency to surprise, because C manages to work more often than Pascal without the whips and chains.
Now, I /think/ Robert would claim that both families have primitive type systems and so land in an uncanny valley between “unityped” (dynamic) and the One True Type System, and I prefer the more permissive of the two only because I don’t know the glory of the OTTS.
Here’s the catch: the OTTS either hasn’t been discovered, or hasn’t yet attracted the mindshare/dev time a language needs to get most things right. If you like your languages with documentation, tutorials, optimization, libraries, or more than 12 other developers, your choice is between explicit dynamic typing, bondage-and-discipline static typing, or technically static but superweak to compensate.
I acknowledge that we may eventually arrive at the Promised Land of OTTS, where type systems are sufficiently expressive and powerful to be called “dynamic”. But when I say “we”, I mean programmers in general, not you or I. I’m thinking grandchildren here.
re: Scala: Ooh! Lots of interesting stuff there. I’ll give it a whirl, despite the JVM dependence.
I think the big difference between practical languages (Java, C, C++) and theoretical languages is how they handle errors. The practical languages either DON’T (it’s your problem) or throw exceptions (it’s someone else’s problem). The theoretical languages return a failure case or punt and call unsafeIO. It’s frustrating, because error handling is never the focus of a language, but it’s always a presence in the background.
Anyways. More fun with type systems: http://blog.vasilrem.com/tic-tac-toe-api-with-phantom-types
I think C++11 addresses most of what you ask for at the end of your article. The changes to auto make a huge difference in the read/write-ability of code. Being able to just type for (auto i = foo.begin()…) is a joy.
As far as compilation systems and partial builds go, C++ has always championed this. Header files and implementation files are all you need. Change an implementation file and that’s all that recompiles; the rest of your code base is oblivious.
If building a C++ project after making modest changes takes a long time, there are serious issues with the fundamental design of that project.
Thanks, I’ll take a look at C++11. Is there any way to run applications in a memory-safe way? I feel like exposing C and C++ code directly to the internet lowers your system security in such a fundamental way that I would not build websites on it. When I switched to Java from C/C++ I noticed a 2x speedup in my coding efficiency from not having to free memory. I lament having to architect systems around a relatively small heap size (2-3GB), but I feel like the memory safety, lack of system crashes, and reduced memory leaks are worth that tradeoff.
In terms of round-trip times, it’s not just the recompilation time but also the application restart that can be a factor. Starting servers, loading configuration, etc.
Obviously C++ is still the right language for so many problems. Glad to hear they are making it better.