Intoduction

Synchronous unsubscription

IObservable

IObserver

IEnumerable

IEnumerator

IObservable

subscribe()

IDisposable

interface IDisposable { void dispose(); } interface IObserver<T> { void onNext(T t); } interface IObservable<T> { IDisposable subscribe(IObserver<T> observer); } IObservable<Integer> source = o -> { for (int i = 0; i < Integer.MAX_VALUE; i++) { o.onNext(i); } return () -> { }; }; IDisposable d = o.subscribe(System.out::println); d.dispose();

IDisposable

range()

range()

IDisposable

range()

Subscriber

Observable<Integer> source = Observable.create(s -> { for (int i = 0; i < Integer.MAX_VALUE && !s.isUnsubscribed(); i++) { o.onNext(i); } }); Subscription d = o.subscribe(new Subscriber<Integer>() { @Override public void onNext(Integer v) { System.out.println(v); unsubscribe(); } // ... });

take(1)

unsubscribe()

isUnsubscribed()

Subscription

subscribe()

Subscriber<Integer> s = new Subscriber<Integer>() { @Override public void onNext(Integer v) { System.out.println(); } // ... } Schedulers.computation().schedule(s::unsubscribe, 1, TimeUnit.SECONDS); source.subscribe(s);

Observable

Observable.subscribe()

Subscription

Subscriber

Resources of the Subscriber

Subscriber

Subscription

Subscriber

SubscriptionList

range()

SubscriptionList

Subscriber

Subscriber

SubscriptionList

SubscriptionList

Subscriber

Subscriber request

Subscriber

request()

Producer

setProducer

Producer

Producer.request

Producer

Subscriber

Producer

Subscriber

Subscriber.request()

Producer.request()

Subscriber

Subscription

request()

Subscriber

AsyncSubscriber

Subscriber

Lift

Observable.lift()

Subscriber

Observable

map()

public final <R> Observable<R> map(Func1<? super T, ? extends R> func) { OperatorMap<T, R> op = new OperatorMap<T, R>(func); return new Observable<R>(new OnSubscribe<R>() { @Override public void call(Subscriber<? super R> child) { Subscriber<? super T> parent = op.call(child); Observable.this.unsafeSubscribe(parent); } }); }

Operator

Observable

OnSubscribe

flatMap

Observable

Observable.range(1, 1_000_000).flatMap(v -> Observable.just(v).observeOn(Schedulers.computation()).map(v -> v * v)) .subscribe(...);

lift

OnSubscribe

Observable

Operator.call

Subscriber

OnSubscribe

Observable

Observable

Operator

lift()

lift()

Create

Observable.create()

Observable

create()

Observable

Observable

OnSubscribe

Observable

create()

Observable

OnSubscribe

create()

Observable

Observable

public final <R> Observable<R> map(Func1<? super T, ? extends R> func) { return new ObservableMap<T, R>(this, func); }

lift()

create()

map()

Observable

create()

Conclusion

RxJava is now out more than 3 years and lived through several significant version changes. In this blog post, I'll point out design and implementation decisions that I personally think wasn't such a good idea.Don't get me wrong, it doesn't mean that RxJava is bad or I knew all along how to do it "properly". It was a learning process for all of us involved, but the question is, can we learn from those mistakes and do it better in the next major version?In the early days, RxJava mirrored the architecture of Rx.NET which consisted of two important interfaces,and, derived through dualizing theand. (This was also true for my own library, Reactive4Java).If we look at, we find themethod that returns an. This returned object allows one to dispose or cancel a running sequence. However, it has a critical problem I demonstrate with a minimalistic reactive program:If we run this code, it starts to print a lot of numbers to the console, despite we called dispose on the returned object by the subscribe method. What's wrong?The problem is that the source observable can only return itsobject only after the for-loop finishes, but then it has nothing to do. The whole setup is synchronous and thus this structure can't be reasonably cancelled.Although Rx is good at async processing, many steps in a typical pipeline is synchronous and is affected by this synchronous cancellation requirement. Since Rx.NET is at least 3 years older than RxJava, how could this shortcoming still be in today's Rx.NET?The example code above is just the well knownoperator and if we run a similar code in C#, we find that it doesn't print or stops printing almost immediately. The secret is that Rx.NET'soperator runs on async scheduler by default, so that for loop runs on a different thread and the operator can immediately return with a meaningful. Therefore, the synchronous cancellation issue is averted, but I wonder, was it a conscious or unconscious decision to sidestep the underlying problem? Who knows.If you look at the source code of Rx.NET's range, you'll find something more complicated. It uses a recursive scheduling technique to deliver each value to the observer. When I measured it, it could only sustain 1M ops/second on the same machine which could do 250M ops/second with RxJava while delivering 1M elements.Now RxJava'snever used a scheduler thus the synchronous cancellation problem was discovered and mitigated by introducing theclass. An instance can be checked if it still wants events or not. The example above can be rewritten so the for loop checks its subscriber and quits accordingly.The Subscriber acts like aand unsubscribes itself after the first item. Thiscall internally sets a volatile boolean flag that is read byand the loop above is stopped. Note, however, that you still can't unsubscribe via thereturned bybecause the lambda with the for loop doesn't exit until its terminal condition is met.It doesn't seem to solve our initial problem that well, does it? Luckily, this new structure has the property that it can be cancelled before or while the loop is running, the latter done from another thread of course:The fact that you can basically inject the cancellation support upfront, before anything is even subscribed to, allows proper propagation of unsubscription even with the most complicated operators.In addition, there is a deeper implication of this new structure. In the new, the lambda doesn't return anything, butdoes still return awhich is practically the samesent in as the parameter. (Technically, it is a bit more involved process; see Jake Wharton's excellent video talk on the subject).The insight: you can't be fully reactive if you return something. Returning something implies synchronous behavior and the method has to provide some result, even though it can't at that moment. This is when one is forced to block or sleep until the real logic can produce the relevant object to be returned. I showed this in my post about the OSGi Asynchronous Event Streams initiative.Theclass offers the ability to associate resources with it in the form ofinstances. When the operator is cancelled (or terminates), these resources are unsubscribed as well.This is quite a convenience for operator developers, however, has its own cost: allocation.Whenever ais instantiated with the default constructor, an innerinstance is also always created, whether or not the Subscriber is likely to hold resources or not. In the previous example,doesn't need resources thus theis never really used.On one hand, there exist many operators that don't manage resources so creating the extra container is wasteful. On the other hand, many operators do use resources and expect this convenience to be present.In addition, you may recall thathas a constructor that takes anotherand gives the option to share the underlying. Certainly, this could help reduce the allocation count, but most operators that use resources themselves can't share the same underlyingas this would allow them to unsubscribe resources downstream (see pitfall #2 ). Thus, the currentstructure is more of a burden, performance wise, than a win for operator writers.You may now think, what's wrong with giving convenience tools to operator writers? I agree that operators implemented outside RxJava should get as much help as reasonable, however, I believe internal operators should take the diligence and have efficient implementations to begin with.I've tried a few times to resolve this problem but given the architecture of 1.x, I have doubts it can be achieved. Fortunately, the Reactive-Streams' architecture and thus RxJava 2.x solves this problem by making the resource management the responsibility of the operators.If you look into howis implemented, you'll see the protected finalmethod. This is a convenience method that makes sure if there is aset via, the request is forwarded to it or accumulated until onearrives. Basically, this is an inlined producer-arbiter One might think the method's implementation gives significant overhead to request management, but JMH benchmarks confirmed they don't really affect the overhead outside a small +/- 3% difference, that may also be due to noise.The real problem with this method is that it has the same name as, making it impossible to implementwhen one extendsat the same time.This has the unfortunate consequence that one usually needs an extraobject along with the mainif the operator does some request-manipulation.This has the consequence of extra allocation during subscription time which affects GC the most with short-lived sequences. The second property is that it increases the call-stack depth and may prevent some JIT optimization.Sinceis also part of the public API, it can't be renamed in 1.x to make room forAgain, the solution will come with 2.x: there, since Reactive-Streamsandare both interfaces, both can appear the same time, plus, a conveniencemethod can be moved into a convenience implementation of(i.e.,) without affecting the operator internals. (This also means it will be discouraged to use conveniences within operators.)Along with backpressure, the methodis considered by many as the best addition to the library. It lets you step into the subscription process and given a Subscriber from downstream, you can return anotherfor upstream that does the business logic for that operator.It became so popular almost all instance operators ofare now using it.Unfortunately, the convenience has a cost: allocation. For most operators, applying that operator to a sequence incurs 3 object allocations. To show this, let's unroll the application of theoperator:We have 1) theinstance, 2) theinstance and 3) theinstance for each application.This may not of concern for direct sequences that use map, but imagine you have this 3 allocation a million times because you happen tosomething whose inners have operators applied to them:Theoperator is practically aninstance that captures the upstreamand callswith the downstream. Clearly, one could just implement operators directly withand have the upstreamas a parameter; the total instance sizes wouldn't change much but both the allocation count and stack depth get reduced.The current lift structure has another adverse effect: it makes operator-fusion difficult to impossible in its current form because 1) it is an anonymous class and one can't discover its upstreamandeasily, and 2) even if made a named class, the two classes are hidden behind indirection and any discovery process now faces more overhead.Luckily, the shortcomings mentioned so far can be remedied without affecting the public API, but requires diligence of writing and reviewing thousands of lines of code changes.Unfortunately, when I implemented RxJava 2.0 developer-preview last September, I did not think of this overhead thus the current 2.x branch still usesextensively.However, there is light at the end of the tunnel: Project Reactor 2.5 doesn't go down on thepath and now has lower overhead than RxJava.Lately, I'm quite outspoken againstand now I think it should be named something more scary so beginners avoid it and look for proper factory methods inthat do backpressure and unsubscription properly. I can see it as a tool for demonstating to one's audience how to enter the reactive world, but I'm convinced it should receive less spotlight in those presentations.Regardless, the problem withis that it encourages creating 2 instances per: 1) theinstance itself and theholding the subscription logic.The approach that one creates aninstance withwas born from the encouragement: "composition over inheritance". From general design perspective, this sounds okay, but one has to note that in Java, composition means object allocation: outer objects and inner objects, and more inner-inner objects.To avoid all these allocations, the solution would be to makenot hold an instance ofby default (but keepas the lambda-factory version) and operators (both source and intermediate) should extenddirectly. All operator methods would still reside inThus, withoutandwould allocate a singleinstance per application.Such change, I believe, wouldn't affect the public API sincemethods are static or final to begin with and operators would be still a subclass of. The change also would help with operator-fusion because each upstream source can now be directly identified and its parameters exposed without indirection.Again, Project Reactor 2.5 is ahead of RxJava and doesn't use themechanics. Its operators are implemented extending a base class,, the way suggested above.Designing and implementing RxJava version was and is a learning process as well with unanticipated effects on complexity and performance.You may think, why the hassle about structures and allocations that clearly work in their current form? Two reasons: the Cloud and Android/IoT. For the cloud, where billions of events happen, any inefficiency or unnecessary overhead is amplified along with the numbers. You may not easily calculate how much does that range-flatmap example above cost you on your laptop, but Cloud suppliers will make you pay for each second, gigabyte and gigahertz of using their service. For Android and IoT, the resource constraints of the devices and the expectancy of more and more features requires one - eventually - to budget memory usage, GC and battery life.