Channel: Intel® Threading Building Blocks

Making Parallel Sort Comparator Thread Safe


Hello,

I'm using parallel_sort and unfortunately my comparator needs to use an object that is not thread safe. I can make copies of this object for each thread, and the comparator would then just need to use the right copy for the thread it is on, much like a thread_local but not global.

I'm looking for some tips on how to achieve this with TBB. I know task_scheduler_observer and enumerable_thread_specific are both options for implementing thread-local storage, but I'm not sure which is better in my case.

Thanks in advance!


TBB Warning: Leaked 1 observer_proxy objects.


Hi,

Recently I added my custom task scheduler observer. I got the following warning when debugging:

TBB Warning: Leaked 1 observer_proxy objects.

Does anyone know how to get rid of this warning?

My custom observer is pretty simple:

    class PipelineObserver : public tbb::task_scheduler_observer
    {
        ADrawContext& mDC;
        void operator=(const PipelineObserver&);
        PipelineObserver();
    public:
        PipelineObserver(ADrawContext& dc)
            : mDC(dc)
        {
            observe(true);
        }

        ~PipelineObserver()
        {
            observe(false);
        }

        virtual void on_scheduler_entry(bool  /* is_worker */ )
        {
            if(mDC.HasVirtualDevice())
                mDC.VirtualDevice().AcquireLocalOGLContext();
        }
    };

I used the observer as below:

                {
                    // Pipeline scheduler observer. It observes the worker threads entering the 
                    // task arena.
                    PipelineObserver o(dc);

                    // use TBB to execute a parallel while
                    tbb::parallel_while<ApplyIterator> w;
                    ApplyIterator body(*this, iteratorStream, eyePosition, maxCell, &w);
                    w.run(iteratorStream, body);
                }

Thanks!

What's New? Intel® Threading Building Blocks 4.3 Update 4


Changes (w.r.t. Intel TBB 4.3 Update 3):
- Added a C++11 variadic constructor for enumerable_thread_specific.
    The arguments from this constructor are used to construct
    thread-local values.
- Improved exception safety for enumerable_thread_specific.
- Added documentation for tbb::flow::tagged_msg class and
    tbb::flow::output_port function.
- Fixed build errors for systems that do not support dynamic linking.
- C++11 move-aware insert and emplace methods have been added to
    concurrent unordered containers.

Preview Features:
- Interface-breaking change: typedefs changed for node predecessor and
    successor lists, affecting copy_predecessors and copy_successors
    methods.
- Added template class composite_node to the flow graph API. It packages
    a subgraph to represent it as a first-class flow graph node.
- make_edge and remove_edge now accept multiport nodes as arguments,
    automatically using the node port with index 0 for an edge.

Open-source contributions integrated:
- Draft code for enumerable_thread_specific constructor with multiple
    arguments (see above) by Adrien Guinet.
- Fix for GCC invocation on IBM* Blue Gene*
    by Jeff Hammond and Raf Schietekat.
- Extended testing with smart pointers for Clang & libc++
    by Raf Schietekat.


    can concurrent_bounded_queue deadlock if producer and consumer have different thread priorities?


    I am debugging an issue in my code where I have a concurrent_bounded_queue, two producer threads, and one consumer thread. The consumer has default thread priority and the producers have low thread priority.

    Sometimes I get a deadlock with these stack traces:

    Thread 66 (Thread 0x7ffedffff700 (LWP 9126)):

    #0  0x00007ffff2025737 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
    #1  0x00007ffff7981f4d in pause (this=<synthetic pointer>) at src/tbb/tbb/include/tbb/tbb_machine.h:333
    #2  __TBB_LockByte (flag=@0x7ffff1b37b08: 0 '\000') at src/tbb/tbb/include/tbb/tbb_machine.h:846
    #3  scoped_lock (m=..., this=<synthetic pointer>) at src/tbb/tbb/include/tbb/internal/../spin_mutex.h:93
    #4  tbb::internal::concurrent_monitor::notify_relaxed<tbb::internal::predicate_leq> (this=this@entry=0x7ffff1b37b08, predicate=...) at src/tbb/tbb/src/tbb/concurrent_monitor.h:223
    #5  0x00007ffff798177f in notify<tbb::internal::predicate_leq> (predicate=..., this=0x7ffff1b37b08) at src/tbb/tbb/src/tbb/concurrent_monitor.h:181
    #6  tbb::internal::concurrent_queue_base_v3::internal_push (this=this@entry=0x145d368, src=src@entry=0x7ffedfffe530) at src/tbb/tbb/src/tbb/concurrent_queue.cpp:403
    #7  0x00000000004a0840 in push (source=..., this=0x145d368) at ../src/tbb/tbb/include/tbb/concurrent_queue.h:263
     

    Thread 64 (Thread 0x7ffee57fa700 (LWP 9124)):
    #0  0x00007ffff2025737 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
    #1  0x00007ffff7981f4d in pause (this=<synthetic pointer>) at src/tbb/tbb/include/tbb/tbb_machine.h:333
    #2  __TBB_LockByte (flag=@0x7ffff1b37b08: 0 '\000') at src/tbb/tbb/include/tbb/tbb_machine.h:846
    #3  scoped_lock (m=..., this=<synthetic pointer>) at src/tbb/tbb/include/tbb/internal/../spin_mutex.h:93
    #4  tbb::internal::concurrent_monitor::notify_relaxed<tbb::internal::predicate_leq> (this=this@entry=0x7ffff1b37b08, predicate=...) at src/tbb/tbb/src/tbb/concurrent_monitor.h:223
    #5  0x00007ffff798177f in notify<tbb::internal::predicate_leq> (predicate=..., this=0x7ffff1b37b08) at src/tbb/tbb/src/tbb/concurrent_monitor.h:181
    #6  tbb::internal::concurrent_queue_base_v3::internal_push (this=this@entry=0x145d368, src=src@entry=0x7ffee57f9530) at src/tbb/tbb/src/tbb/concurrent_queue.cpp:403
    #7  0x00000000004a0840 in push (source=..., this=0x145d368) at ../src/tbb/tbb/include/tbb/concurrent_queue.h:263
     

    Consumer thread:

    Thread 33 (Thread 0x7fff7ffff700 (LWP 9092)):
    #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
    #1  0x00007ffff7981a3b in futex_wait (comparand=2, futex=0x7fff7fffe960) at src/tbb/tbb/include/tbb/machine/linux_common.h:67
    #2  P (this=0x7fff7fffe960) at src/tbb/tbb/src/tbb/semaphore.h:212
    #3  commit_wait (thr=..., this=<optimized out>) at src/tbb/tbb/src/tbb/concurrent_monitor.h:152
    #4  commit_wait (thr=..., this=<optimized out>) at src/tbb/tbb/src/tbb/concurrent_queue.cpp:406
    #5  tbb::internal::concurrent_queue_base_v3::internal_pop (this=this@entry=0x145d368, dst=dst@entry=0x7fff7fffea20) at src/tbb/tbb/src/tbb/concurrent_queue.cpp:426
    #6  0x00000000004a274b in pop (destination=..., this=<optimized out>) at ../src/tbb/tbb/include/tbb/concurrent_queue.h:269
     

    Is this a known bug? Or is the problem likely to be in my code?

    I am using TBB 4.1 (6100) on a Linux 64-bit machine.

     

     

    getting errors while using tbb library


    Hello,
    I have built a library (call it MYLIB) that uses TBB for parallelization. When I try to build an application that uses MYLIB, I get the following errors:

    /usr/include/tbb/internal/_concurrent_unordered_impl.h:146:85: error:   expected a type, got ‘397’
    /usr/include/tbb/internal/_concurrent_unordered_impl.h:146:85: error: template argument 2 is invalid
    /usr/include/tbb/internal/_concurrent_unordered_impl.h:146:90: error: ‘bool tbb::interface5::internal::operator!=(const int&, const int&)’ must have an argument of class or enumerated type
         friend bool operator!=( const solist_iterator<M,T>& i, const solist_iterator<M,U>& j );

    Can anyone please help? I am new to TBB.

    spawn with enqueue


    Hi,

    I want to keep the main thread free. I have long-running work that should run independently of the main thread, and that work has to be split into multiple tasks. I tried creating a root task and enqueueing it; the root task then spawned the child tasks.

    For example,

    longtask* root_task = new(task::allocate_root()) longtask();
    task::enqueue(*root_task);

    class longtask : public tbb::task
    {
        task* execute()
        {
            set_ref_count(m_task_count + 1);

            for(int i = 0; i < m_task_count; ++i)
            {
                spawn(*new(allocate_child()) task_func(m_func, m_data, i, m_task_count));
            }

            wait_for_all();
            return NULL; // execute() must return a task*
        }
    };

    Why doesn't it work? It works when I explicitly create a worker thread (the worker thread spawns the children and calls wait_for_all; there is no enqueue). So can only the main thread, or an explicitly created worker thread, spawn children?

    Using Flow Graph for a node editor


    Hi,

    I've been developing a node-based editor and figured the TBB Flow Graph would be a perfect fit for my application. However, I can't seem to figure out how to set up the graph. The problem is as follows (I'm also attaching an image to illustrate this):

    I have a bunch of nodes, each of which has a varying (not known at compile time) number of input and output sockets. The nodes that do not have any incoming connections (nodes 0 and 3 in the attached image) are considered root nodes and should be evaluated first. These nodes have edges to the "source_node" node, which does N iterations (a run-time value that determines how many times the graph should be evaluated).

    The sockets are simply value containers that can hold an integer, a float, or any other type defined by my application (boost::variant).

    Node 2 has an additional gray socket; this simply indicates that there could be an additional input there, which is not known at compile time.

    Now the question is: how do I set up my graph so that each node has access to the values of each of the output ports to which it is connected? For example, node 2 needs 3 input values to evaluate successfully; in this case it needs to grab the values from both node 0 and node 3.

    I've tried searching for examples, but all that I was able to find were examples dealing with a fixed number of inputs/outputs (e.g. a multifunction_node using a tuple)

    Any help would be greatly appreciated!

    Thanks,
    Martijn

    Attachment: nodes.gif (8.96 KB)

    Intel Software Tools Technical Webinar Series


    These free technical webinars cover tips and techniques that will help sharpen your development skills to create faster, more reliable applications. Technical experts will cover topics including vectorization, code migration, code optimization, advanced threading techniques (e.g., OpenMP 4.0, Intel® Cilk™ Plus, Intel® TBB), and error checking. Bring programming questions to the live session for our technical experts to answer. A replay of each webinar will be available shortly after the live session so you can share it with those unable to attend.

    Times indicated are Pacific time. PST: Standard (UTC/GMT -8 hours), PDT: Daylight Savings (UTC/GMT -7 hours)

    Archived Webinars

    Upcoming Webinars

    Webinar Details | Description

    New Vectorization Features of the Intel Compiler

    Apr 7 9:00 A.M. Pacific

    Presenter:
    Martyn Corden

    Register

    The vectorization features of the Intel compiler continue to get more powerful with each succeeding version. In this webinar, we will look beyond the vectorization of simple loops over intrinsic data types, to examples involving STL (Standard Template Library) vectors; indirect addressing (gathers and scatters); multi-dimensional arrays, including data alignment; and explicit outer loop vectorization, using the SIMD feature of OpenMP 4.0. Code samples will include C, C++ and Fortran.

    Vectorize or Die – unlock performance secrets with data driven software design

    Apr 14 9:00 A.M. Pacific

    Presenter:
    Kevin O'Leary

    Register

     

    The free ride of faster performance with increased clock speeds is long gone. Software must be both threaded and vectorized to fully utilize today’s and tomorrow’s hardware. But modernization is not without cost. Not all threading or vectorization designs are worthwhile. How do you choose which designs to implement without disrupting ongoing development? Learn how data driven threading and vectorization design can yield long term performance growth with less risk and more impact.

    Parallel programming models - tips and tricks

    Apr 21 9:00 A.M. Pacific

    Presenter:
    James Tullos

    Register

     

    As computing advances, parallel architectures are becoming more common. In order to take advantage of parallel systems, software must adapt and use more parallelism. In this webinar, I will discuss various parallel programming models for shared memory and distributed memory parallelism, and give advice for how to utilize each of these models. I will also discuss how Intel® Advisor XE, Intel® VTune™ Amplifier XE, and Intel® Trace Analyzer and Collector can assist with adding parallelism to your programs or to improve your current parallelism.

    Fast, light weight, scalable MPI performance analysis

    May 5 9:00 A.M. Pacific

    Presenter:
    Gergana Slavova

    Register

     

    Developers of modern HPC applications face a challenge when scaling out their hybrid (MPI/OpenMP) applications. Cluster sizes continue to grow, the amount of analysis data collected can easily become overwhelming when going from 10s to 1000s of ranks and it’s tough to identify which are the key metrics to track. There is a need for a visual tool that aggregates the performance data in a simple and intuitive way, provides advice on next optimizations steps, and hones in on performance issues. In this webinar, we’ll discuss a brand new tool that helps quickly gather and analyze statistics up to 100,000 ranks. We’ll give examples of the type of information provided by the MPI Performance Snapshot including memory and counter usage, MPI and OpenMP imbalance analysis, and total communication vs. computation time. We’ll feature screenshots of the tool running in real-time and showcase some of its runtime and filtering capabilities.

    Respect programming models – manage Intel Xeon Phi’s in your Clusters for enhanced user experience

    May 12 9:00 A.M. Pacific

    Presenter:
    Michael Hebenstreit

    Register

     

    HPC cluster programming model number one has been MPI for the past 10 or more years. The advent of coprocessors and accelerators forced many users to rethink their strategies and restructure their code, even though a clever setup of a Xeon Phi system allows using it without being forced to do so. This webinar will present a number of techniques to help system administrators with this task.

    • Overview of Xeon Phi programming techniques
    • Basic requirements to get Xeon Phi running
    • How to mount cluster file systems on the Xeon Phi
    • User integration
    • Use of startup scripts and their role in batch scheduling
    • Live demo

     

    3 Tuning Secrets for better OpenMP performance using Intel® VTune Amplifier XE

    May 19 9:00 A.M. Pacific

    Presenter:
    Sumedh Naik

    Register

     

    Parallelism delivers the capability High Performance Computing (HPC) requires. The parallelism runs across several layers: super scalar, vector instructions, threading and distributed memory with message passing. OpenMP* is a commonly used threading abstraction, especially in HPC. Many HPC applications are moving to a hybrid shared memory/distributed programming model where both OpenMP* and MPI* are used. This webinar focuses on the OpenMP parallel model, and particularly on profiling the performance of OpenMP-based applications. Intel supplies a powerful performance profiling tool, Intel® VTune™ Amplifier XE, that is quite handy for finding performance bottlenecks in OpenMP codes. In this webinar, we will go through the steps necessary to profile OpenMP applications, and will describe how you can quickly identify performance issues with task granularity, workload imbalance and synchronization using Intel® VTune™ Amplifier XE.

    Vectorizing Fortran using OpenMP 4.x - filling the SIMD lanes

    May 26 9:00 A.M. Pacific

    Presenter:
    Ron Green

    Register

     

    The Intel® Fortran Composer XE adopted the OpenMP* 4.x standard's new SIMD clause in 2014. The OpenMP SIMD directive is a portable and easy-to-use feature, particularly for those already familiar with OpenMP. The SIMD feature allows Fortran programmers to more directly control vectorization and thereby extract maximum performance from modern Intel® Architecture Processors. Some existing knowledge of vectorization, memory alignment, and OpenMP is helpful but not necessary. Ronald W. Green from the Intel Fortran Support team at Intel will lead this discussion of Fortran OpenMP SIMD directives and their use, vectorization, and optimizations that will power your development efforts.


    Dynamic allocator replacement on OS X* with Intel® TBB


    The Intel® Threading Building Blocks (Intel® TBB) library provides an alternative way to dynamically allocate memory - Intel TBB scalable allocator (tbbmalloc). Its purpose is to provide better performance and scalability for memory allocation/deallocation operations in multithreaded applications, compared to the default allocator.

    There are two general ways to employ Intel TBB scalable allocator in your application:

    1. Explicitly specifying the TBB scalable allocator in source code, either by using memory allocation routines (like "scalable_malloc") or by specifying the Intel TBB scalable allocator for containers:
    #include <tbb/scalable_allocator.h>
    std::vector<int, tbb::scalable_allocator<int> > v;
    2. Automatic replacement of all calls to standard functions for dynamic memory allocation (such as malloc) with the Intel TBB scalable equivalents. This option was introduced in Intel TBB 4.3.

    One way to do the automatic replacement is to link the main executable file with the Intel TBB malloc proxy library:

    clang++ foo.o bar.o -ltbbmalloc_proxy -o a.out

    Another way does not even require re-building, so you can provide a new memory allocator to the same binary. This is done by loading the malloc proxy library at application start time using the DYLD_INSERT_LIBRARIES environment variable:

    DYLD_INSERT_LIBRARIES=libtbbmalloc_proxy.dylib ./a.out

    In OS X, simply loading libraries with DYLD_INSERT_LIBRARIES requires using flat namespaces in order to access the shared library symbols. If an application was built with two-level namespaces, this approach will not work, and forcing the use of flat namespaces may lead to a crash.

    Intel TBB overcomes this problem in a smart way. When the libtbbmalloc_proxy library is loaded into the process, its static constructor is called and registers a "malloc zone" for TBB memory allocation routines. This allows redirecting memory allocation calls from the standard library into TBB scalable allocator routines. The application doesn't need to use TBB malloc library symbols; it continues to call standard "libc" routines, so there are no namespace problems. Also, the OS X "malloc zones" mechanism allows applications to have several memory allocators (e.g., used by different libraries) and still manage memory correctly: it guarantees that Intel TBB will use the same allocator for allocations and deallocations, which is a safeguard against crashes due to calling a deallocation routine on a memory object allocated by another allocator.

    Additional links:

    Intel TBB: Memory Allocation
    Intel TBB documentation: dynamic memory interface replacement on OS X
    Intel TBB documentation: Memory Allocation reference

     


    parallel task with several parallel stages


    I have an algorithm that can be parallelised using the divide-and-conquer approach, but requires more than one parallel stage per task. In other words, a given task can be split into several consecutive stages which cannot be done in parallel, but each stage can potentially be split into several parallel tasks. I have implemented this as sketched below.

    struct staged_task
      : tbb::task
    {
      const int num_stages=2;              ///< total number of parallel stages
      int stage=0;                         ///< stage counter
      staged_task(task_data const&);       ///< constructor
      bool little_work();                  ///< is this task best done serially?
      void serial_execution();             ///< do this task serially
      /// add another child to list of children
      void add_child(int&child_count, tbb::task_list&children, task_data const&data)
      {
        ++child_count;
        children.push_back(*new(allocate_child()) staged_task(data));
      }
      /// create child tasks by calling add_child()
      void make_children(int&child_count, tbb::task_list&children);
      /// execute
      tbb::task* execute() final
      {
        tbb::task* child = nullptr;
        // 1     serial execution
        if(0==stage && little_work())
          serial_execution();
        // 2     next non-empty parallel stage
        else if(stage<num_stages) {
          // 2.1 allocate children, skipping empty stages
          int child_count=0;
          tbb::task_list children;
          for(; stage<num_stages && child_count==0; ++stage)
            make_children(child_count,children);
          // 2.2 prepare for continuation and spawn children
          if(child_count) {
            recycle_as_continuation();
            set_ref_count(child_count);
            child = children.pop_front();
            spawn(children);
          }
        }
        return child;
      }
    };

    The real code is more complex, with different types of staged tasks, but this is the essence. The code appears to work fine, but recently it ran into a problem where the number of active threads dropped to 2 (as judged by the top command) and the wall-clock time was very long (much longer than 2 threads should have needed).

    I was wondering whether my code above could be at fault, and how I might debug this issue.

     

    [aarch64][patch] Enable GCC builtin intrinsics (__sync_*) on Clang


    Hi,

    Clang supports the GCC builtin intrinsics (__sync_*) used for atomics. I tested the master branch and Clang 3.5 in Fedora 21 on x86_64 and aarch64 machines. include/clang/Basic/Builtins.def from Clang contained every intrinsic used in TBB. I compiled TBB (current version, tbb43_20150316oss) on x86_64 with Clang with -DTBB_USE_GCC_BUILTINS=1 and it compiled. It seems this Clang configuration isn't fully maintained, as some tests fail even on x86_64. Similar tests failed on aarch64 (plus one additional failure).

    TBB headers are parsed by a C++ interpreter (Cling), and that's where it failed for us on aarch64. Considering that Clang supports the needed bits, I would suggest enabling the use of GCC builtins on Clang too. Patch below. I enabled it for 3.5 and later, because that's the oldest version I tried.

    BTW, I was not able to log in to the TBB contribution page: "The website encountered an unexpected error."

     

    diff --git a/include/tbb/tbb_config.h b/include/tbb/tbb_config.h
    index c65976c..567c36a 100644
    --- a/include/tbb/tbb_config.h
    +++ b/include/tbb/tbb_config.h
    @@ -261,7 +261,7 @@
     /* Actually ICC supports gcc __sync_* intrinsics starting 11.1,
      * but 64 bit support for 32 bit target comes in later ones*/
     /* TODO: change the version back to 4.1.2 once macro __TBB_WORD_SIZE become optional */
    -#if __TBB_GCC_VERSION >= 40306 || __INTEL_COMPILER >= 1200
    +#if __TBB_GCC_VERSION >= 40306 || __INTEL_COMPILER >= 1200 || __TBB_CLANG_VERSION >= 30500
         /** built-in atomics available in GCC since 4.1.2 **/
         #define __TBB_GCC_BUILTIN_ATOMICS_PRESENT 1
     #endif
    @@ -542,7 +542,7 @@
         #define __TBB_SSE_STACK_ALIGNMENT_BROKEN 0
     #endif
    
    -#if __GNUC__==4 && __GNUC_MINOR__==3 && __GNUC_PATCHLEVEL__==0
    +#if !defined(__clang__) && (__GNUC__==4 && __GNUC_MINOR__==3 && __GNUC_PATCHLEVEL__==0)
         /* GCC of this version may rashly ignore control dependencies */
         #define __TBB_GCC_OPTIMIZER_ORDERING_BROKEN 1
     #endif

     

    How to download file attachment in private email?


    Sorry, this is off-topic, but I could not find any other forum to post this in.

    I received file attachments (at least, the sender told me so) from Intel support, but I do not see any download button or link in the private message thread. Does anybody know how to download these files?

     

    Thanks! 

    Multiple FlowGraph instances


    Hi all,

    Is it possible to run multiple FlowGraph instances concurrently? If so, what would be the best way to execute them? My immediate thought is to spawn 2 threads and run each flow graph in a separate thread, checking for termination, etc.

    This is a rather weird design case, where I need to parse the same (huge) data stream twice, and hence would be buffering to a temporary file mid-flow. The two flow graphs would then be pre-file and post-file. Although it is possible to achieve this using a single flow graph, there are all sorts of issues surrounding the stalling of the graph should all the data not arrive on time, which is why it's desirable to keep things separate.

     

    Any ideas?

    Mat

    custom_scheduler calling pure virtual function task::execute()


    Hi,

    We are occasionally experiencing a crash using tbb::parallel_pipeline that I'm hoping someone can help me narrow down. Any help, or suggestions for additional areas to check, would be greatly appreciated.

    #0 0x00007f74b6311425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x00007f74b6314b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
    #2 0x00007f74b6c0cb05 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #3 0x00007f74b6c0ac76 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #4 0x00007f74b6c0aca3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #5 0x00007f74b6c0b77f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #6 0x00007f74ba018672 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f7478c18100, parent=..., child=<optimized out>)
    at ../../src/tbb/custom_scheduler.h:455
    #7 0x00007f74ba014356 in tbb::internal::arena::process (this=0x7f5a1adf0080, s=...) at ../../src/tbb/arena.cpp:106
    #8 0x00007f74ba013a7b in tbb::internal::market::process (this=0x7f74b2cf1b00, j=...) at ../../src/tbb/market.cpp:479
    #9 0x00007f74ba00fa0f in tbb::internal::rml::private_worker::run (this=0x7f74af81af00) at ../../src/tbb/private_server.cpp:283
    #10 0x00007f74ba00fc09 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:240
    #11 0x00007f74b8426e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    #12 0x00007f74b63ceccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
    #13 0x0000000000000000 in ?? ()

    The crash is happening when trying to invoke a pure virtual function execute() on a task pointer:

    custom_scheduler.h:455:
    t_next = t->execute();

    We run this pipeline with 4 outstanding tasks and 4 filters. The first and last filters are very fast, the second filter is the slowest, and the third filter takes about 1/5th the time of the second.

    tbb::parallel_pipeline(4,
         tbb::make_filter<void, long>(tbb::filter::serial_in_order,
            [&] (tbb::flow_control& fc) -> long
            {...})
       & tbb::make_filter<long, long>(tbb::filter::parallel,
            [&] (long& offset) -> long
            {...})
       & tbb::make_filter<long, long>(tbb::filter::parallel,
            [&] (long& r) -> long
            {...})
       & tbb::make_filter<long, void>(tbb::filter::serial_out_of_order,
            [&] (long& count)
            {...})
         );

    About 5 million elements are generated by the first pipeline stage. We've noticed that each time the crash happens, it is always with 4 elements left in the pipeline: 1 waiting to execute on the 3rd stage, and 3 waiting to enter the 4th stage. We are of course critically interrogating our filter code, but this recurring theme of 4 elements remaining led us to suspect the pipeline.

    We are running TBB 4.2. We have not seen this on 4.3, but we also don't consider our testing on 4.3 to date conclusive enough to say that we won't see it there in the future.

    Below are two additional stacktraces for non-idle tbb threads at this time:

    This thread appears to have just finished a task:

    #0  0x00007f74b63caee9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007f74ba00f888 in futex_wakeup_one (futex=0x7f74af81a1ac) at ../../include/tbb/machine/linux_common.h:77
    #2  V (this=0x7f74af81a1ac) at ../../src/tbb/semaphore.h:225
    #3  notify (this=0x7f74af81a1a0) at ../../src/rml/include/../server/thread_monitor.h:250
    #4  wake_or_launch (this=0x7f74af81a180) at ../../src/tbb/private_server.cpp:322
    #5  tbb::internal::rml::private_server::wake_some (this=<optimized out>, additional_slack=<optimized out>, additional_slack@entry=0) at ../../src/tbb/private_server.cpp:401
    #6  0x00007f74ba00fb88 in propagate_chain_reaction (this=<optimized out>) at ../../src/tbb/private_server.cpp:174
    #7  tbb::internal::rml::private_worker::run (this=0x7f74af81ac80) at ../../src/tbb/private_server.cpp:291
    #8  0x00007f74ba00fc09 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:240
    #9  0x00007f74b8426e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    #10 0x00007f74b63ceccd in clone () from /lib/x86_64-linux-gnu/libc.so.6

    This is the thread calling tbb::parallel_pipeline:

    #0  0x00007f74ba00b7f9 in tbb::internal::stage_task::execute (this=0x7f73e88189c0) at ../../src/tbb/pipeline.cpp:363
    #1  0x00007f74ba018672 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f5a1addc100, parent=..., child=<optimized out>)
        at ../../src/tbb/custom_scheduler.h:455
    #2  0x00007f74ba016ad0 in tbb::internal::generic_scheduler::local_spawn_root_and_wait (this=0x7f5a1addc100, first=..., next=@0x7f5a1add6838: 0x7f5a1add69c0) at ../../src/tbb/scheduler.cpp:668
    #3  0x00007f74ba00c621 in spawn_root_and_wait (root=...) at ../../include/tbb/task.h:705
    #4  tbb::pipeline::run (this=this@entry=0x7f591efde000, max_number_of_live_tokens=max_number_of_live_tokens@entry=4, context=...) at ../../src/tbb/pipeline.cpp:666
    #5  0x0000000000ba48a9 in parallel_pipeline (context=..., filter_chain=..., max_number_of_live_tokens=4) at /opt/sfdev-6.28/include/tbb/pipeline.h:654
    #6  parallel_pipeline (filter_chain=..., max_number_of_live_tokens=4) at /opt/sfdev-6.28/include/tbb/pipeline.h:660

    Thanks!

    Jared

    Errors in tbb interface and concurrent_unordered_impl


    I had successfully compiled my code 6 months back with TBB version 3, but now when I try with version 4 or 3, it gives the errors as shown below.
    Please help, as I'm stuck with this for 4 days already.

    Am using Linux localhost.localdomain 2.6.32-431.el6.x86_64
    Thread model: posix
    gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)

    TBB version 4:
    On using /usr/local/tbb43_20150316oss/include
    /usr/local/tbb43_20150316oss/lib/intel64/gcc4.4/libtbb.so

    g++ -c -g -Iinclude -I../Estimator/include -I/opt/intel/composer_xe_2015.2.164/mkl/include -I/usr/local/tbb43_20150316oss/include -MMD -MP -MF build/Debug/GNU-Linux-x86/source/Group.o.d -o build/Debug/GNU-Linux-x86/source/Group.o source/Group.cpp
    In file included from /usr/local/tbb43_20150316oss/include/tbb/concurrent_hash_map.h:47,
    from /usr/local/tbb43_20150316oss/include/tbb/tbb.h:42,
    from source/Group.cpp:17:
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:270: error: ‘__TBB_FORWARDING_REF’ has not been declared
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:270: error: expected ‘,’ or ‘...’ before ‘t’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:286: error: expected nested-name-specifier before ‘__TBB_PARAMETER_PACK’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:286: error: expected ‘>’ before ‘Args’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:287: error: ‘Args’ was not declared in this scope
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:287: error: there are no arguments to ‘__TBB_FORWARDING_REF’ that depend on a template parameter, so a declaration of ‘__TBB_FORWARDING_REF’ must be available
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:287: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:287: error: expected ‘)’ before ‘__TBB_PARAMETER_PACK’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:287: error: ISO C++ forbids initialization of member ‘create_node_v’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:287: error: making ‘create_node_v’ static
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:287: error: template declaration of ‘tbb::interface5::internal::split_ordered_list::node* tbb::interface5::internal::create_node_v’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h: In member function ‘tbb::interface5::internal::split_ordered_list::node* tbb::interface5::internal::split_ordered_list::create_node(tbb::interface5::internal::sokey_t, int (*)(Arg))’:
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:275: error: ‘forward’ is not a member of ‘tbb::internal’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:275: error: expected primary-expression before ‘>’ token
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:275: error: ‘t’ was not declared in this scope
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h: At global scope:
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:1288: error: expected primary-expression before ‘)’ token
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:1288: error: there are no arguments to ‘__TBB_FORWARDING_REF’ that depend on a template parameter, so a declaration of ‘__TBB_FORWARDING_REF’ must be available
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:1288: error: expected ‘)’ before ‘value’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:1288: error: expected primary-expression before ‘pnode’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:1288: error: ISO C++ forbids initialization of member ‘internal_insert’
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:1288: error: making ‘internal_insert’ static
    /usr/local/tbb43_20150316oss/include/tbb/internal/_concurrent_unordered_impl.h:1288: error: template declaration of ‘std::pair::iterator, bool> tbb::interface5::internal::internal_insert’
    In file included from /usr/local/tbb43_20150316oss/include/tbb/flow_graph.h:2910,
    from /usr/local/tbb43_20150316oss/include/tbb/tbb.h:53,
    from source/Group.cpp:17:
    /usr/local/tbb43_20150316oss/include/tbb/internal/_flow_graph_indexer_impl.h:37: error: expected ‘>’ before numeric constant
    gmake[2]: *** [build/Debug/GNU-Linux-x86/source/Group.o] Error 1
    gmake[2]: Leaving directory `/home/nkipe/NetBeansProjects/Assoc'
    gmake[1]: *** [.build-conf] Error 2
    gmake[1]: Leaving directory `/home/nkipe/NetBeansProjects/Assoc'
    gmake: *** [.build-impl] Error 2

    BUILD FAILED (exit value 2, total time: 9s)

    ----------------------------------------------------------------------------------------------------------------------------------------

    TBB version 3
    On trying /usr/local/tbb30_20100406oss/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/libtbb.so and /usr/local/tbb30_20100406oss/include

    g++ -o dist/Debug/GNU-Linux-x86/Assoc build/Debug/GNU-Linux-x86/source/AssignmentPair.o build/Debug/GNU-Linux-x86/source/Assoc.o build/Debug/GNU-Linux-x86/source/CoordinateConversion.o build/Debug/GNU-Linux-x86/source/DummyPredictedTrack.o build/Debug/GNU-Linux-x86/source/GateInfo.o build/Debug/GNU-Linux-x86/source/GenerateCostMatrix.o build/Debug/GNU-Linux-x86/source/Group.o build/Debug/GNU-Linux-x86/source/GroupHypo.o build/Debug/GNU-Linux-x86/source/GroupNode.o build/Debug/GNU-Linux-x86/source/Hungarian.o build/Debug/GNU-Linux-x86/source/Hypo.o build/Debug/GNU-Linux-x86/source/LapJv.o build/Debug/GNU-Linux-x86/source/Main.o build/Debug/GNU-Linux-x86/source/Measurement.o build/Debug/GNU-Linux-x86/source/MergeGroup.o build/Debug/GNU-Linux-x86/source/Mht.o build/Debug/GNU-Linux-x86/source/MhtProcessingThread.o build/Debug/GNU-Linux-x86/source/MhtSendThread.o build/Debug/GNU-Linux-x86/source/Murty.o build/Debug/GNU-Linux-x86/source/PlotAsterixDecoder.o build/Debug/GNU-Linux-x86/source/ReceiveData.o build/Debug/GNU-Linux-x86/source/SplitGroup.o build/Debug/GNU-Linux-x86/source/Track.o build/Debug/GNU-Linux-x86/source/TrackAsterix.o build/Debug/GNU-Linux-x86/source/UdpServer_.o ../Estimator/dist/Debug/GNU-Linux-x86/libestimator.a /opt/intel/mkl/lib/intel64/libmkl_rt.so /usr/local/tbb30_20100406oss/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/libtbb.so /usr/local/tbb30_20100406oss/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/libtbbmalloc.so
    build/Debug/GNU-Linux-x86/source/Assoc.o: In function `~parallel_do_feeder_impl':
    /usr/local/tbb30_20100406oss/include/tbb/parallel_do.h:189: undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
    build/Debug/GNU-Linux-x86/source/Group.o: In function `~task_group':
    /usr/local/tbb30_20100406oss/include/tbb/task_group.h:164: undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
    /usr/local/tbb30_20100406oss/include/tbb/task_group.h:168: undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
    build/Debug/GNU-Linux-x86/source/Group.o: In function `~parallel_do_feeder_impl':
    /usr/local/tbb30_20100406oss/include/tbb/parallel_do.h:189: undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
    build/Debug/GNU-Linux-x86/source/Mht.o: In function `~parallel_do_feeder_impl':
    /usr/local/tbb30_20100406oss/include/tbb/parallel_do.h:189: undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
    build/Debug/GNU-Linux-x86/source/Mht.o:/usr/local/tbb30_20100406oss/include/tbb/parallel_do.h:189: more undefined references to `tbb::interface5::internal::task_base::destroy(tbb::task&)' follow
    collect2: ld returned 1 exit status
    gmake[2]: *** [dist/Debug/GNU-Linux-x86/Assoc] Error 1
    gmake[2]: Leaving directory `/home/nkipe/NetBeansProjects/Assoc'
    gmake[1]: *** [.build-conf] Error 2
    gmake[1]: Leaving directory `/home/nkipe/NetBeansProjects/Assoc'
    gmake: *** [.build-impl] Error 2

    BUILD FAILED (exit value 2, total time: 38s)


    License changes in Intel® Parallel Studio XE 2016 Beta


    This Beta release of the Intel® Parallel Studio XE 2016 introduces a major change to the 'Named-user' licensing scheme (provided as default for the 2016 Beta licenses).  Read below for more details on this new functionality as well as a list of special exceptions.  Following a thorough Beta testing period, implementation will carry forward into the product release.

    Description of changes:

    The ‘Named-user’ license provisions in the Intel software EULA have changed to only allow the software to be installed on up to three systems.  During the Intel® Parallel Studio XE 2016 Beta program, product licensing will be updated to check for this when it checks for valid licenses, and it will track systems by the system host ID.  The installer will automatically detect the host ID and create the appropriate license.  If your system cannot access the internet during install-time, you will need to manually create a host-specific Beta license.  For more details on how to determine the host ID on your machine, follow the directions in this article.

    We would love to get your feedback on this new license scheme. If you reach the allowable number of activations or have other 'Named-user' license problems, please contact us at the Intel® Premier Customer Support website. You will also be asked to complete a Beta survey at the end of the Beta program where you can give some final thoughts on this new functionality.

    Limitations:

    Using this new 'Named-user' license scheme may not be possible in one of the following cases:

    • Doing a distributed cluster install of the Beta software on a cluster with more than 3 nodes
      • NOTE: You will only hit this issue if the directory where the Beta tools are being installed is not NFS-mounted across the cluster and a distributed installation is required
    • Installation of the following stand-alone packages:
      • Intel® Advisor XE Beta (Linux*, Windows*)
      • Intel® VTune™ Amplifier XE Beta - OS X* host only
      • NOTE: You will only hit this issue if installing the stand-alone packages.  This does not affect installation of these individual components when done via the Intel Parallel Studio XE 2016 Beta installer.

    Workaround:

    We expect that the new 'Named-user' license scheme will work in the majority of installation cases.  If you encounter either of the situations described previously, you can easily replace the default 'Named-user' license provided during the Beta with a new license for manual offline installation.

    In order to do this, return to the Intel® Parallel Studio XE 2016 Beta registration page and select the first option ("Generate license for manual offline installation") under the Email field:

    Enter your email address and select the "Continue" button.  A new license file will be emailed to you.  You do NOT have to download the packages again.


    Message Flow Graph Example


    This example calculates the sum x*x + x*x*x for all x = 1 to 10. The layout of this example is shown in the figure below.

    A simple message flow graph.

    Each value enters through the broadcast_node<int> input. This node broadcasts the value to both squarer and cuber, which calculate x*x and x*x*x respectively. The output of each of these nodes is sent to one of join's input ports. A tuple containing both values is created by join_node< tuple<int,int> > join and forwarded to summer, which adds both values to the running total. Both squarer and cuber allow unlimited concurrency; that is, each may process multiple values simultaneously. The final summer, which updates a shared total, is only allowed to process a single incoming tuple at a time, eliminating the need for a lock around the shared value.

    #include <cstdio>
    #include "tbb/flow_graph.h"
    
    using namespace tbb::flow;
    
    struct square {
      int operator()(int v) { return v*v; }
    };
    
    struct cube {
      int operator()(int v) { return v*v*v; }
    };
    
    class sum {
      int &my_sum;
    public:
      sum( int &s ) : my_sum(s) {}
      int operator()( tuple< int, int > v ) {
        my_sum += get<0>(v) + get<1>(v);
        return my_sum;
      }
    };
    
    int main() {
      int result = 0;
    
      graph g;
      broadcast_node<int> input(g);
      function_node<int,int> squarer( g, unlimited, square() );
      function_node<int,int> cuber( g, unlimited, cube() );
      join_node< tuple<int,int>, queueing > join( g );
      function_node<tuple<int,int>,int>
          summer( g, serial, sum(result) );
    
      make_edge( input, squarer );
      make_edge( input, cuber );
      make_edge( squarer, get<0>( join.input_ports() ) );
      make_edge( cuber, get<1>( join.input_ports() ) );
      make_edge( join, summer );
    
      for (int i = 1; i <= 10; ++i)
          input.try_put(i);
      g.wait_for_all();
    
      printf("Final result is %d\n", result);
      return 0;
    }

    In the example code above, the classes square, cube and sum define the three user-defined operations. Each class is used to create a function_node.

    In function main, the flow graph is set up and then the values 1-10 are put into the node input. All the nodes in this example pass around values of type int. The nodes used in this example are all class templates and therefore can be used with any type that supports copy construction, including pointers and objects.

    Caution

    Values are copied as they pass between nodes and therefore passing around large objects should be avoided. To avoid large copy overheads, pointers to large objects can be passed instead.
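    As an illustrative sketch of the pointer-passing approach (this is not part of the original example; the BigObject type, node names, and the run_pointer_demo helper are invented here), nodes can pass std::shared_ptr so that forwarding a message copies only a pointer rather than the whole payload:

    ```cpp
    #include <cstdio>
    #include <memory>
    #include "tbb/flow_graph.h"
    
    using namespace tbb::flow;
    
    // Hypothetical large payload; only a shared_ptr to it travels between nodes.
    struct BigObject {
      int data[1024];
    };
    
    int run_pointer_demo() {
      graph g;
      int result = 0;
      // Copying a message now copies only the pointer, not the whole array.
      function_node< std::shared_ptr<BigObject>, std::shared_ptr<BigObject> >
          filler( g, unlimited, []( std::shared_ptr<BigObject> p ) {
            p->data[0] = 42;
            return p;
          } );
      function_node< std::shared_ptr<BigObject>, int >
          reader( g, serial, [&result]( std::shared_ptr<BigObject> p ) {
            result = p->data[0];
            return result;
          } );
      make_edge( filler, reader );
      filler.try_put( std::make_shared<BigObject>() );
      g.wait_for_all();
      return result;
    }
    
    int main() {
      std::printf("first element: %d\n", run_pointer_demo());
      return 0;
    }
    ```

    The shared_ptr also takes care of freeing the object once the last node has finished with it, which avoids having to decide which node owns the payload.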

    Note

    This is a simple syntactic example only. Since each node in a flow graph may execute as an independent task, the granularity of each node should follow the general guidelines for tasks as described in Section 3.2.3 of the Intel® Threading Building Blocks Tutorial.

    The classes and functions used in this example are described in detail in topics linked from the Flow Graph parent topic.


    concurrent_bounded_queue Template Class


    Summary

    Template class for bounded dual queue with concurrent operations.

    Syntax

    template<typename T, class Alloc=cache_aligned_allocator<T> >
                class concurrent_bounded_queue;

    Header

    #include "tbb/concurrent_queue.h"

    Description

    A concurrent_bounded_queue is similar to a concurrent_queue, but with the following differences:

    • Adds the ability to specify a capacity. The default capacity makes the queue practically unbounded.

    • Changes the push operation so that it waits until it can complete without exceeding the capacity.

    • Adds a waiting pop operation that waits until it can pop an item.

    • Changes the size_type to a signed type.

    • Changes the size() operation to return the number of push operations minus the number of pop operations. For example, if there are 3 pop operations waiting on an empty queue, size() returns -3.

    • Adds an abort operation that causes any waiting push or pop operation to abort and throw an exception.
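    As an illustrative sketch of these semantics (this example is not part of the original reference; the run_bounded_demo helper and the capacity of 2 are choices made here), a producer thread blocks in push() whenever the queue is at capacity, while the consumer's pop() waits for items:

    ```cpp
    #include <cstdio>
    #include <thread>
    #include "tbb/concurrent_queue.h"
    
    // The producer outruns the consumer; with a capacity of 2, push()
    // blocks until the consumer makes room by popping.
    int run_bounded_demo() {
      tbb::concurrent_bounded_queue<int> q;
      q.set_capacity(2);
      int sum = 0;
      std::thread producer( [&q] {
        for (int i = 1; i <= 5; ++i)
          q.push(i);          // blocks while the queue already holds 2 items
      } );
      for (int n = 0; n < 5; ++n) {
        int item;
        q.pop(item);          // blocks until an item is available
        sum += item;
      }
      producer.join();
      return sum;             // 1+2+3+4+5 = 15
    }
    
    int main() {
      std::printf("sum = %d\n", run_bounded_demo());
      return 0;
    }
    ```

    The blocking push() and pop() pair gives simple flow control between producer and consumer without any explicit condition variables.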

    Members

    To aid comparison, the parts that differ from concurrent_queue are annotated with comments.

    namespace tbb {
        template<typename T, typename A=cache_aligned_allocator<T> >
        class concurrent_bounded_queue {
        public:
            // types
            typedef T value_type;
            typedef A allocator_type;
            typedef T& reference;
            typedef const T& const_reference;
            // size_type is a signed type
            typedef std::ptrdiff_t size_type;
            typedef std::ptrdiff_t difference_type;
    
            explicit concurrent_bounded_queue(const allocator_type& a = allocator_type());
            concurrent_bounded_queue( const concurrent_bounded_queue& src,
                                      const allocator_type& a = allocator_type());
            template<typename InputIterator>
            concurrent_bounded_queue( InputIterator begin, InputIterator end,
                                      const allocator_type& a = allocator_type());
            // C++11 specific
            concurrent_bounded_queue( concurrent_bounded_queue&& src );
            concurrent_bounded_queue( concurrent_bounded_queue&& src,
                                      const allocator_type& a );
    
            ~concurrent_bounded_queue();
    
            // waits until it can push without exceeding capacity
            void push( const T& source );
            // waits if *this is empty
            void pop( T& destination );
            // C++11 specific
            void push( T&& source );
            // the same as push except that the item is constructed with given arguments
            template<typename... Arguments>
            void emplace(Arguments&&... args);
            // skips push if it would exceed capacity
            bool try_push( const T& source );
            bool try_pop( T& destination );
            // C++11 specific
            bool try_push( T&& source );
            // the same as try_push except that the item is constructed with given arguments
            template<typename... Arguments>
            bool try_emplace(Arguments&&... args);
            void abort();
            void clear();
            // safe to call during concurrent modification; can return negative size
            size_type size() const;
            bool empty() const;
            size_type capacity() const;
            void set_capacity( size_type capacity );
            allocator_type get_allocator() const;
    
            typedef implementation-defined iterator;
            typedef implementation-defined const_iterator;
            // iterators (these are slow and intended only for debugging)
            iterator unsafe_begin();
            iterator unsafe_end();
            const_iterator unsafe_begin() const;
            const_iterator unsafe_end() const;
        };
    }

    Because concurrent_bounded_queue is similar to concurrent_queue, the following table describes only methods that differ.
    Member / Description
    void push( const T& source )

    Waits until size()<capacity, and then pushes a copy of source onto the back of the queue.

    void push( T&& source)

    C++11 specific; waits until size()<capacity, and then moves source onto the back of the queue.

    template<typename... Arguments> void emplace(Arguments&&... args);

    C++11 specific; Waits until size()<capacity, and then pushes a new element into the queue. The element is constructed with given arguments.

    void pop( T& destination )

    Waits until a value becomes available and pops it from the queue. Assigns it to destination. Destroys the original value.

    void abort()

    Wakes up any threads that are waiting on the queue via the push and pop operations and raises the tbb::user_abort exception on those threads. This feature is unavailable if TBB_USE_EXCEPTIONS is not set.

    bool try_push( const T& source )

    If size()<capacity, pushes a copy of source onto the back of the queue.

    Returns: True if a copy was pushed; false otherwise.

    bool try_push( T&& source )

    C++11 specific; if size()<capacity, moves source onto the back of the queue.

    Returns: True if an item was moved; false otherwise.

    template<typename... Arguments> bool try_emplace(Arguments&&... args);

    C++11 specific; if size()<capacity, constructs an item with given arguments and moves it onto the back of the queue.

    Returns: True if an item was moved; false otherwise.

    bool try_pop( T& destination )

    If a value is available, pops it from the queue, assigns it to destination, and destroys the original value. Otherwise does nothing.

    Returns: True if a value was popped; false otherwise.

    size_type size() const

    Returns: Number of pushes minus number of pops. The result is negative if there are pop operations waiting for corresponding pushes. The result can exceed capacity() if the queue is full and there are push operations waiting for corresponding pops.

    bool empty() const

    Returns: size()<=0

    size_type capacity() const

    Returns

    Maximum number of values that the queue can hold.

    void set_capacity( size_type capacity )

    Sets the maximum number of values that the queue can hold.
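    To sketch the abort() behavior described above (this example is not from the original reference; the run_abort_demo helper and the 100 ms delay are choices made here, and the delay assumes the consumer has had time to block before abort() is called):

    ```cpp
    #include <chrono>
    #include <thread>
    #include "tbb/concurrent_queue.h"
    #include "tbb/tbb_exception.h"
    
    // A consumer blocks in pop() on an empty queue; abort() wakes it by
    // raising tbb::user_abort in the waiting thread.
    bool run_abort_demo() {
      tbb::concurrent_bounded_queue<int> q;
      bool aborted = false;
      std::thread consumer( [&] {
        try {
          int item;
          q.pop(item);               // blocks: nothing is ever pushed
        } catch ( tbb::user_abort& ) {
          aborted = true;            // woken by abort()
        }
      } );
      // Give the consumer time to block, then cancel the wait.
      std::this_thread::sleep_for( std::chrono::milliseconds(100) );
      q.abort();
      consumer.join();
      return aborted;
    }
    ```

    This pattern is a common way to shut down worker threads that are parked on a bounded queue; remember that it requires TBB_USE_EXCEPTIONS.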


    Modifiers

    The following table provides additional information on the members of the concurrent_unordered_set and concurrent_unordered_multiset template classes.
    Member / Description
    std::pair<iterator, bool> insert(const value_type& x)

    Constructs copy of x and attempts to insert it into the set. If the attempt fails because an item with the same key already exists, the copy is destroyed.

    Returns: std::pair(iterator,success). The value iterator points to an item in the set with a matching key. The value of success is true if the item was inserted; false otherwise.

    iterator insert(const_iterator hint, const value_type& x)

    Same as insert(x).

    Note

    The current implementation ignores the hint argument. Other implementations might not ignore it. It exists for similarity with the C++11 classes unordered_set and unordered_multiset. It hints to the implementation about where to start searching. Typically it should point to an item adjacent to where the item will be inserted.

    Returns: Iterator pointing to inserted item, or item already in the set with the same key.

    std::pair<iterator, bool> insert(value_type&& x)

    C++11 specific. Moves x into new instance of value_type and attempts to insert it into the set. If the attempt fails because an item with the same key already exists, this instance is destroyed.

    Returns: the same as insert(const value_type& x) version.

    iterator insert(const_iterator hint, value_type&& x)

    Same as insert(x).

    Note

    The current implementation ignores the hint argument. Other implementations might not ignore it. It exists for similarity with the C++11 classes unordered_set and unordered_multiset. It hints to the implementation about where to start searching. Typically it should point to an item adjacent to where the item will be inserted.

    Returns: the same as insert(const_iterator hint, const value_type& x) version.

    template<class InputIterator> void insert(InputIterator first, InputIterator last)

    Does insert(*i) where i is in the half-open interval [first,last).

    void insert(std::initializer_list<value_type> il)

    C++11 specific. Inserts a sequence to the set by inserting each element from the initializer list.

    template<typename... Args> std::pair<iterator, bool> emplace(Args&&... args);

    C++11 specific. Constructs new instance of value_type from args and attempts to insert it into the set. If the attempt fails because an item with the same key already exists, this instance is destroyed.

    Returns: the same as insert(const value_type& x) version.

    template<typename... Args> iterator emplace_hint(const_iterator hint, Args&&... args);

    Same as emplace(args).

    Note

    The current implementation ignores the hint argument. Other implementations might not ignore it. It exists for similarity with the C++11 classes unordered_set and unordered_multiset. It hints to the implementation about where to start searching. Typically it should point to an item adjacent to where the item will be inserted.

    Returns: Iterator pointing to inserted item, or item already in the set with the same key.

    iterator unsafe_erase(const_iterator position)

    Removes the item pointed to by position from the set.

    Returns: Iterator pointing to item that was immediately after the erased item, or end() if erased item was the last item in the set.

    size_type unsafe_erase(const key_type& k)

    Removes item with key k if such an item exists.

    Returns: 1 if an item was removed; 0 otherwise.

    iterator unsafe_erase(const_iterator first, const_iterator last)

    Removes *i where i is in the half-open interval [first,last).

    Returns: last

    void clear()

    Removes all items from the set.

    The following table provides additional information on the concurrent_unordered_set template class.
    Member / Description
    void swap(concurrent_unordered_set& m)

    Swaps contents of *this and m.

    The following table provides additional information on the concurrent_unordered_multiset template class.
    Member / Description
    void swap(concurrent_unordered_multiset& m)

    Swaps contents of *this and m.
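    As a minimal sketch of concurrent insertion (not from the original reference; the run_insert_demo helper, thread count, and key range are invented here), several threads can call insert() on the same set at once, and duplicate keys are simply rejected:

    ```cpp
    #include <cstddef>
    #include <thread>
    #include <vector>
    #include "tbb/concurrent_unordered_set.h"
    
    // Four threads insert the same range concurrently; duplicates are
    // rejected, so the set ends up with exactly one copy of each key.
    std::size_t run_insert_demo() {
      tbb::concurrent_unordered_set<int> s;
      std::vector<std::thread> workers;
      for (int t = 0; t < 4; ++t)
        workers.emplace_back( [&s] {
          for (int i = 0; i < 100; ++i)
            s.insert(i);    // thread-safe; returns pair<iterator,bool>
        } );
      for (auto& w : workers) w.join();
      return s.size();      // 100, regardless of thread interleaving
    }
    ```

    Note that the unsafe_erase() methods shown in the table above must not run concurrently with insertions; only insertion and lookup are safe to mix.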


    Containers


    The container classes permit multiple threads to simultaneously invoke certain methods on the same container.

    Like STL, Intel® Threading Building Blocks (Intel® TBB) containers are templated with respect to an allocator argument. Each container uses its allocator to allocate memory for user-visible items. A container may use a different allocator for strictly internal structures.
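    For example (a minimal sketch, not from the original text; the run_alloc_demo helper is invented here), the allocator argument can be supplied explicitly when declaring a container:

    ```cpp
    #include <cstddef>
    #include "tbb/cache_aligned_allocator.h"
    #include "tbb/concurrent_vector.h"
    
    // The allocator template argument controls how user-visible items are
    // allocated; cache_aligned_allocator pads allocations to cache-line
    // boundaries to avoid false sharing between threads.
    std::size_t run_alloc_demo() {
      tbb::concurrent_vector<int, tbb::cache_aligned_allocator<int> > v;
      for (int i = 0; i < 10; ++i)
        v.push_back(i);
      return v.size();   // 10
    }
    ```

    The padding trades some memory for scalability, so tbb::tbb_allocator can be a better fit when many small containers are created.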
