15.11.14

Just thinking: modern GAPI

Disclaimer: this is another useless post without much technical information.

I wanted to include this text in the previous post, but it was hard to divide it logically, so I split it into two posts. This post holds my thoughts about graphics APIs and their evolutions/revolutions. In earlier times we programmed the graphics hardware ourselves, which gave us direct-to-metal access: it sounded good, but it was impractical, and porting to other hardware caused a lot of problems. So Microsoft made Direct3D as such a graphics abstraction, and SGI made IrisGL, which was later opened up and transformed into OpenGL. Both of them are the gold standard for graphics. They do a lot to make graphics programming easier, and we can see the richest graphics in today's games (like Crysis by Crytek or Battlefield by DICE). Anyway, abstraction is made for simplification, but graphics got richer, and now we have another problem: huge CPU usage with high numbers of draw calls.

That's why there is Mantle by AMD and Metal by Apple (only for iOS for now), and Direct3D 12 is on the way. They are made to drop this draw-call overhead by moving more work to the programmer. It is really funny to see that the idea of "low-level" Direct3D 12 goes back to Direct3D 1: we create a command buffer, fill it with commands, and then the driver consumes the buffer. The same might be said about Mantle and Metal: they are also "low-level" with somewhat the "same" functionality.

Alex St. John has written a good post comparing Direct3D and Metal, with some thoughts about OpenGL. I'd like to talk about OpenGL too... It's stuck. Literally. I know the Khronos Group is trying to push OpenGL to the top of the graphics industry, and they are doing a good job (though OpenGL still lacks some cool features, like context-independent concurrency), but the main problem is the IHVs. Every IHV makes its own implementation, because OpenGL is only a specification and doesn't have a runtime component like Direct3D has. It's really bad that there is no one at Khronos who could bang the table and say "it must be as I say!" to stop this nonsense of varying OpenGL implementations. I don't want to dwell on this too much, because there are already plenty of negative posts expressing the same opinion about OpenGL. Anyway, I'll still wait for OpenGL NG; I believe the Khronos Group will make it.

That's enough "void" talking; let's move on to my thoughts about a modern graphics API.
First of all, the API must provide a view of resources as raw memory. This means we would work with them like we do in our own programs via malloc/realloc/free, just like in CUDA. The second thought is a continuation of the first: we must control the allocated chunk of memory ourselves, inserting the needed memory barriers or performing sequentially consistent reads/writes, etc. Third, the API must provide an interface to control the GPU scheduler. That would give us continuous kernel execution like in CUDA, where we can write a kernel that can launch other kernels. The third point means we would no longer be CPU-bound. Ideally, programming the GPU would be just like programming the CPU. The best reference for this approach is CUDA. Also, it's good to see something
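To make the first point a bit more concrete, here is a minimal sketch of the malloc/free-style resource model I mean, using the host-side CUDA runtime API as the reference; the buffer and its size are just for illustration, not engine code:

#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main()
{
    const size_t count = 1024;
    std::vector< float > host( count, 1.0f );

    // Allocate a raw chunk of GPU memory, just like malloc().
    float* device = nullptr;
    if( cudaMalloc( (void**)&device, count * sizeof( float ) ) != cudaSuccess )
        return 1;

    // The programmer, not the driver, decides when the data moves and when to synchronize.
    cudaMemcpy( device, host.data(), count * sizeof( float ), cudaMemcpyHostToDevice );
    cudaDeviceSynchronize();

    // Release the memory explicitly, just like free().
    cudaFree( device );
    std::printf( "done\n" );
    return 0;
}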

That's all I wanted to say. This is the last useless post; next time I'll write about our engine.

30.6.14

It was so quiet here...

...because there was no reason to write another useless/pointless post. Now I have a lot of information to share, so let's start.

Engine update:
NanoEngine will be open-sourced forever. Yes, that's it. It will be under the BSD license. For now, the engine is in a Bitbucket repository and it's not complete. Foundation is the most stable and complete module of the engine, while the others are not, or exist only as drafts on my PC. The complete list of modules will be:
    - Foundation: all the low-level (in engine terms) routines live here: compiler definitions, containers, memory allocators, threading primitives, the multitasking system, etc.

    - Rendering, Audio, Input: foundation-like low-level implementations that wrap the selected platform APIs in a common form. They are just back-ends for the high-level implementations.

    - Framework: one of the glue elements of the engine; it ties all the low-level modules together and boots them up for further use by the higher-level modules. There is also a central dispatching engine, which automatically calculates how many jobs to use.

    - RML: the centralized resource management layer; the place where all resources are processed asynchronously.

    - Graphics: the high-level rendering implementation, based on the Rendering module, with more specific implementations of the features the application needs.

    - Sound: the high-level audio implementation, based on the Audio module; it features event-based audio playback and custom sound effects that can be applied to individual sounds or to the places where they are played.

    - Physics: a wrapper around ODE calls.

    - Game: the highest level of the engine; it implements game mechanics and scripting basics.

    - Launcher: there isn't much to say; it's just a launcher that starts up the Framework module. All platform-specific window handling is connected here to the other parts of the engine, like Rendering and Input.

Current progress (26.06.2014):
    - Foundation: ~90% (In repository)
    - Rendering: ~70% (In repository)
    - Audio: ~30% (In repository)
    - Input: ~10% (Not in repository)
    - Framework: ~20% (In repository)
    - RML: ~50% (Not in repository)
    - Graphics: ~30% (Not in repository)
    - Sound: N/A
    - Physics: ~10% (Not in repository)
    - Game: ~10% (Not in repository)
    - Launcher: ~70% (In repository)


Engine code:
There are two things that always collide: code complexity and code clarity. I use a lot of templates in the engine. Yes, templates are one of the cool features of the language, but it's also vital to use them only where they are needed.
Let's consider this example from the code:
/// Intrusive smart pointer declaration
template< typename Type, typename Policy > class IntrusivePtr;
/// Policy declaration
template< typename Type, typename Tag > class SmartPtrPolicy;
/// Policy tags
struct SmartPointerRelaxedTag {};
struct SmartPointerThreadSafeTag {};

/// Use case example
IntrusivePtr< MyClass, SmartPtrPolicy< MyClass, SmartPointerRelaxedTag > > myClassPtr;
As you can see, there is a big problem with templates in terms of textual complexity. But they are used here for synchronization specialization: automatic compile-time selection of the code path for threading purposes. One might say it would be better to change this from a fully type-templated specialization to a value-templated one, making something like this:
template< typename Type, Threading::MemoryOrder accessOrder, Threading::MemoryOrder writeOrder > class IntrusivePtr;
// Use case example
IntrusivePtr< MyClass, Threading::MemoryOrderEnum::Acquire, Threading::MemoryOrderEnum::Release > myClassPtr;
As you can see, it is as bad as... I don't know... the worst thing you can imagine: too many characters to write and lines that are too long, so the code becomes harder to read and understand because of the "extreme navigation". Another argument for it might be automatic elimination of dead code at compile time, but not every compiler will do it, because you need to enable it yourself. In Visual C++ you can specify /O2, but that only applies to release builds, not debug builds, where you want to explore the generated code without any optimizations. So that's why I'm using the type-templated version with a tag selector: the good old method, which is still not abandoned today.

That was something of an introduction to the engine's use of specialization. I made the templated smart pointer example to show that the engine will make use of such pointers for automatic garbage collection. There are various types of them: IntrusivePtr, with intrusive reference counting inside the class (deletion also needs to be specified inside the class), and SharedPtr, with a self-allocated reference count. Both of them use SmartPtrPolicy for threading-model selection.
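To give an idea of how the tag selection could work in practice, here is a minimal sketch of per-tag SmartPtrPolicy specializations; only the declarations shown earlier come from the engine, while the counter types and function bodies below are my own illustration:

#include <atomic>
#include <cstdint>

struct SmartPointerRelaxedTag {};
struct SmartPointerThreadSafeTag {};

/// Primary template is left undefined; only the tag specializations exist.
template< typename Type, typename Tag > class SmartPtrPolicy;

/// Relaxed (single-threaded) policy: a plain counter, no synchronization cost.
template< typename Type >
class SmartPtrPolicy< Type, SmartPointerRelaxedTag >
{
public:
    using Counter = std::int32_t;
    static void Increment( Counter& count ) { ++count; }
    static bool Decrement( Counter& count ) { return --count == 0; }
};

/// Thread-safe policy: atomic operations with acquire/release ordering.
template< typename Type >
class SmartPtrPolicy< Type, SmartPointerThreadSafeTag >
{
public:
    using Counter = std::atomic< std::int32_t >;
    static void Increment( Counter& count )
    {
        count.fetch_add( 1, std::memory_order_relaxed );
    }
    static bool Decrement( Counter& count )
    {
        // Returns true when the last reference has been released.
        return count.fetch_sub( 1, std::memory_order_acq_rel ) == 1;
    }
};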
As mentioned in the Engine update section, the engine aims at aggressive multithreading, so we need containers for inter-thread communication. There are various list, queue and ring buffer implementations: lock-free SPSC (single-producer/single-consumer), lock-free MPSC (multi-producer/single-consumer), lock-free MPMC (multi-producer/multi-consumer), and blocking ones.
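For reference, here is a minimal sketch of the SPSC flavor as a bounded lock-free ring buffer; the engine's real containers are not published yet, so the interface and names below are only illustrative:

#include <atomic>
#include <array>
#include <cstddef>

template< typename Type, std::size_t Capacity >
class SpscRingBuffer
{
public:
    /// Called only from the single producer thread.
    bool Push( const Type& value )
    {
        const std::size_t head = m_head.load( std::memory_order_relaxed );
        const std::size_t next = ( head + 1 ) % Capacity;
        if( next == m_tail.load( std::memory_order_acquire ) )
            return false; // buffer is full
        m_items[ head ] = value;
        m_head.store( next, std::memory_order_release );
        return true;
    }

    /// Called only from the single consumer thread.
    bool Pop( Type& value )
    {
        const std::size_t tail = m_tail.load( std::memory_order_relaxed );
        if( tail == m_head.load( std::memory_order_acquire ) )
            return false; // buffer is empty
        value = m_items[ tail ];
        m_tail.store( ( tail + 1 ) % Capacity, std::memory_order_release );
        return true;
    }

private:
    std::array< Type, Capacity > m_items;
    std::atomic< std::size_t >   m_head { 0 };
    std::atomic< std::size_t >   m_tail { 0 };
};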

That's all for the code section for now, because I don't want to share more details before uploading the new version to Git. Also, if you want to see the code, I'll give you access to our Git storage; just email me: techzdev@4dvisionzgamesz.com (remove all "z").