I'll pick an easy one of the "TODO"s first and will add some more later, in no particular order -- so there'll be a few edits, return for more :-)

99. Given an integer n and a 2D array X, select from X the rows which can be interpreted as draws from a multinomial distribution with n degrees, i.e., the rows which only contain integers and which sum to n:

```perl
use strict;
use warnings;
use PDL;

my $x = pdl( [ 1.0, 0.0, 3.0, 8.0 ],
             [ 2.0, 0.0, 1.0, 1.0 ],
             [ 1.5, 2.5, 1.0, 0.0 ] );
my $n = 4;

my $mask = ( $x == $x->rint )->andover & ( $x->sumover == $n );
print $x->transpose->whereND( $mask )->transpose;
```

Added:

```perl
print $PDL::Version::VERSION;
```

Is this the officially recommended way? What's the benefit over the usual $PDL::VERSION? I also wonder about the $VERSION = eval $VERSION; in that tiny module.

--------------------------

4. How to find the memory size of any matrix?

```perl
print $z->info( '%M' );
```

--------------------------

35. How to compute ((A+B)*(-A/2)) in place? There's maybe a typo or two in the Python solution (what's the "C"?). But to operate in place, I'd do this:

```perl
my $a = ones( 3 );
my $b = 2 * ones( 3 );
$b += $a;
$a /= -2;
$b *= $a;
print $b;
```

IIRC, combined assignment operators are overloaded to work in place, but I can't find a reference right now; will do it later.

--------------------------

53. How to convert a float (32 bits) array into an integer (32 bits) in place? Good question. The ceil and floor convert double to long in place. Not sure if PDL allows doing so for 32-bit types.

--------------------------

45. Create a random vector of size 10 and replace the maximum value by 0. More efficient:

```perl
use PDL::NiceSlice;    # for the $z( ... ) slicing syntax

my $z = random( 10 );
$z( $z->maximum_ind ) .= 0;
print $z;
```

--------------------------

64. Consider a given vector, how to add 1 to each element indexed by a second vector (be careful with repeated indices)?

```perl
my $z = zeroes( 10 );
my $i = pdl( 1, 3, 5, 3, 1 );
indadd( 1, $i, $z );
print $z;
```

--------------------------

81. Consider an array Z = [1,2,3,4,5,6,7,8,9,10,11,12,13,14], how to generate an array R = [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]]?

```perl
my $z = 1 + sequence 14;
my $len = 4;
print $z->lags( 0, 1, 1 + $z->nelem - $len )->slice( '', '-1:0' );
```

--------------------------

87. Consider a 16x16 array, how to get the block-sum (block size is 4x4)? If I understand the task correctly, and now that I've learned about lags:

```perl
my $x = sequence 16, 16;
print $x->lags( 1, 4, 4 )->slice( '', '', '-1:0' )->xchg( 0, 1 )
        ->sumover->lags( 0, 4, 4 )->slice( '', '-1:0' )->sumover;
```

(Sigh...) Utilizing the benefits of idle commuting and thinking things over:

```perl
my $x = sequence 16, 16;
print $x->reshape( 4, 4, 4, 4 )->reorder( 0, 2, 1, 3 )->clump( 2 )->sumover;
```
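Purely as a cross-check of exercise 81 against the original Python question set, here is a plain-Python (stdlib-only, no NumPy) sketch of the same sliding-window construction that lags performs:

```python
# Exercise 81 in plain Python: all length-4 sliding windows over 1..14.
z = list(range(1, 15))
length = 4
r = [z[i:i + length] for i in range(len(z) - length + 1)]

assert r[0] == [1, 2, 3, 4]
assert r[1] == [2, 3, 4, 5]
assert r[-1] == [11, 12, 13, 14]
assert len(r) == 11
```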

Thanks for all of these! I assume you wouldn't mind me merging the solutions into my 'master' POD, with credit of course. Just from reading your solutions I learned a number of new functions, including maximum_ind (and its sibling functions maximum_n_ind, minimum_ind, minimum_n_ind and max2d_ind) and lags. This is what I hoped would happen, as everyone seems to know a different subset of PDL functionality.

Hi, mxb, of course I wouldn't mind.

Note, your recipe #66 doesn't do what's expected -- but shifting bytes left manually, as the Python guys do, isn't nice either. I'd do this:

```perl
pdl> $x = sequence 2,2,3      # "2x2 planar RGB" image, 4 unique colors
pdl> $x = $x->glue( 0, $x )   # "4x2 RGB" image, 4 unique colors
pdl> $x = $x->glue( 1, $x )   # "4x4 RGB" image, still 4 unique colors
pdl> $x->set( 2,2,2, 100 )    # make them 5
pdl> p $x
[
 [
  [0 1 0 1]
  [2 3 2 3]
  [0 1 0 1]
  [2 3 2 3]
 ]
 [
  [4 5 4 5]
  [6 7 6 7]
  [4 5 4 5]
  [6 7 6 7]
 ]
 [
  [  8   9   8   9]
  [ 10  11  10  11]
  [  8   9 100   9]
  [ 10  11  10  11]
 ]
]
pdl> p $x->clump(2)->transpose->uniqvec->getdim( 1 )
5
```

-----------------

As to combined assignment operators working in place, here is a simple experiment (Windows), with either line #1 or #2 un-commented on different runs:

```perl
use strict;
use warnings;
use feature 'say';
use PDL;

my $x = zeroes 1e8;
my $y = ones 1e8;

$x = $x + $y;    # 1
#$x += $y;       # 2

say qx{ typeperf "\\Process(perl)\\Working Set Peak" -sc 1 } =~ /.+"(.+)"/s;

__END__
>perl pdl180504.pl
2427752448.000000

>perl pdl180504.pl
1627779072.000000
```
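The same distinction between rebinding and in-place mutation exists in Python, and can be demonstrated without any array library at all, using plain lists and object identity (a minimal sketch, not a PDL claim):

```python
# "x = x + y" builds a brand-new object, then rebinds the name;
# "x += y" asks the object to mutate itself in place.
a = [1, 2, 3]
b = [4, 5, 6]

before = id(a)
a = a + b            # new list allocated while the old one still exists
assert id(a) != before

a = [1, 2, 3]
before = id(a)
a += b               # list.__iadd__ extends the existing list in place
assert id(a) == before
assert a == [1, 2, 3, 4, 5, 6]
```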

Good idea! Here is my take on #100 (TODO). The long version, a standalone program (6 kB), is elided here. The short version:

```perl
# Get a random sample with replacement from an input vector.
# 2nd param is an RNG object which will select N random
# elements (with replacement) from the input vector.
# 3rd param is OPTIONAL: the size of the sample;
# default is the size of the input vector.
sub resample {
    my $original_sample = $_[0];
    my $rng = $_[1];
    my $No = $_[2] || $original_sample->getdim(0);
    my $indices = ( $No * $rng->get_uniform($No) )->floor();
    my $newsample = $original_sample->index($indices);
    return $newsample;
}

my $M = 1000;             # num bootstrap resamplings
my $X = ... ;             # input data piddle (1D)
my $R = $X->getdim(0);    # size of bootstrap resamples
my $rng = PDL::GSL::RNG->new('taus')->set_seed($seed);
my $means = zeroes($M);

for ( my $i = 0; $i < $M; $i++ ) {
    # get a re-sample from the original data with size R,
    # using our RNG to do the re-sampling:
    my $asample = resample($X, $rng, $R);
    $means->set($i, PDL::Ufunc::avg($asample));
}

# now sort the means vector and pick the elements at
# the confidence intervals specified, e.g. 5%
my $sorted_means = PDL::Ufunc::qsort($means);
my $confidence_intervals_values = [
    $sorted_means->at(int(0.05*$M)),
    $sorted_means->at(int(0.95*$M))
];
```

EDIT 1: The above script can easily be parallelised. Input data (X) is read-only. Each worker writes results (mean, stdev) to the same piddle, but at guaranteed-distinct array locations, so there is no need for locking afaics. Unfortunately I cannot get it to parallelise with threads, because PDL::GSL::RNG seems not to like threads. The RNG is needed in order to get a random sample from the original data on every bootstrap iteration. In fact, each worker/thread can have its own RNG -- not a copy, but a different RNG local to each thread. However, even like this I get the dreaded

    pointer being freed was not allocated
    *** set a breakpoint in malloc_error_break to debug

Any ideas?

EDIT 2: the parallelised version, a standalone program (15 kB), is elided here. Corrections welcome (especially on good practices about how to parallelise).

Hi, bliako, thank you so much for the detailed answer; my statistics skills were (hopefully) auto-vivified :). After following your links and code in earnest, I felt brave enough to make some experiments and write a comment, but in the process I discovered something strange ;).

----------

First, my impression is that solutions to the exercises were supposed to be simple (as in, KISS). So, perhaps translating the Python solution to PDL almost verbatim, the answer to #100 can be:

```perl
use strict;
use warnings;
use feature 'say';
use PDL;

my $n = 100;     # input sample size
my $m = 1000;    # number of bootstrap repeats
my $r = $n;      # re-sample size

my $x = random $n;
my $idx = random $r, $m;
$idx *= $n;

say $x->index( $idx )->avgover->pctover( pdl 0.05, 0.95 );

__END__
[ 0.4608755 0.55562806]
```

Interesting: here, PDL DWIMs for me -- no need to floor an index to thread over a piddle (just as with Perl's array indices). I also stand corrected on "floor converts to Long in-place" -- it rounds in place, but the piddle stays Double.

This 'never explicitly loop in a vectorized language' answer unfortunately hides the ugly truth that, for very large data, we can end up with huge R x M matrices of random indices and equally huge (equally unnecessary) matrices of all re-samplings, and thus die because of 'Out of memory!'. I was experimenting with this and that (PDL's automatic parallelization, in particular), which I'm skipping now, because next comes something weird.

Consider this version of the above, which avoids the 2-dimensional index matrix and the matrix of re-sampling results, but is still un-parallel:

```perl
use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';
use PDL;

srand( 123 );
my $time = time;

my $n = 30000;    # input sample size
my $m = 10000;    # number of bootstrap repeats
my $r = $n;       # re-sample size

my $x = random $n;
my $avg = zeroes $m;

for ( 0 .. $m - 1 ) {
    my $idx = random $r;
    $idx *= $n;
    $avg->set( $_, $x->index( $idx )->avg )
}

say $avg->pctover( pdl 0.05, 0.95 );
say time - $time;

__END__
[0.49384165 0.49941814]
6.11959099769592
```

Next is a solution where I'm starting to try to parallelize, but because of the selected parameters (a single thread) I'm not only expecting no gain -- due to overhead, it must be slower. And yet:

```perl
use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';
use PDL;
use PDL::Parallel::threads qw/ share_pdls retrieve_pdls /;
use threads;

srand( 123 );
my $time = time;

my $n = 30000;    # input sample size
my $m = 10000;    # number of bootstrap repeats
my $r = $n;       # re-sample size

my $x = random $n;
my $avg = zeroes $m;
share_pdls x => $x, avg => $avg;

threads->create( sub {
    my ( $x, $avg ) = retrieve_pdls qw/ x avg /;
    for ( 0 .. $m - 1 ) {
        my $idx = random $r;
        $idx *= $n;
        $avg->set( $_, $x->index( $idx )->avg )
    }
});
$_->join for threads->list;

say $avg->pctover( pdl 0.05, 0.95 );
say time - $time;

__END__
[0.49384165 0.49941814]
4.57857203483582
```

Why is that? :) I tried to insert

```perl
use PDL::Parallel::threads qw/ share_pdls retrieve_pdls /;
share_pdls x => $x, avg => $avg;
( $x, $avg ) = retrieve_pdls qw/ x avg /;
```

into the no-threads solution (does retrieve_pdls set any flags that speed things up? Nope.)

```
$ perl -v

This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)

$ perl -MPDL -E 'say $PDL::VERSION'
2.019
```
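On the "no need to floor an index" DWIM noted above: Python is the opposite, which is why the NumPy originals carry explicit floor() calls. A tiny stdlib-only illustration of the contrast:

```python
# Perl (and, per the observation above, PDL's index) truncates
# fractional indices; Python lists insist on integers.
xs = [10, 20, 30]

assert xs[int(2.9)] == 30    # explicit truncation works

try:
    xs[2.9]                  # TypeError: list indices must be integers
except TypeError:
    rejected = True
else:
    rejected = False
assert rejected
```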

vr, your code is superior to the long code I posted! If I may add: using oddpctover() might be preferred, because it does not interpolate when there is no data at the exact percentile position. Regarding the time difference when running with and without "use threads", I have discovered that avg() is the culprit. If you use $x->index( $idx )->at(0) rather than $x->index( $idx )->avg, the performance is the same (which means index() is also excluded as a possible cause).

Hi, vr. Tonight I came across your post and modified your demonstration to run with 4 threads.

```perl
# https://www.perlmonks.org/?node_id=1214227

use strict;
use warnings;
use feature 'say';
use PDL;
use PDL::Parallel::threads qw(retrieve_pdls);

use threads;
use MCE::Shared;
use Time::HiRes 'time';

srand( 123 );
my $time = time;

my $n = 30000;    # input sample size
my $m = 10000;    # number of bootstrap repeats
my $r = $n;       # re-sample size

my $x = random( $n );     $x->share_as('x');
my $avg = zeroes( $m );   $avg->share_as('avg');
my $seq = MCE::Shared->sequence( 0, $m - 1 );

sub parallel_task {
    srand;
    my ( $x, $avg ) = retrieve_pdls('x', 'avg');
    while ( defined ( my $seq_n = $seq->next() ) ) {
        my $idx = random $r;
        $idx *= $n;
        $avg->set( $seq_n, $x->index( $idx )->avg );
    }
}

threads->create( \&parallel_task ) for 1 .. 4;

# ... do other stuff ...

$_->join() for threads->list();

say $avg->pctover( pdl 0.05, 0.95 );
say time - $time, ' seconds';

__END__

# Output
[0.49395242 0.49936752]
1.28744792938232 seconds
```

Afterwards, I re-validated PDL with MCE and released 1.847. The effort is mainly for folks running a Perl lacking threads support. Here it is, PDL and MCE::Shared running similarly.

```perl
# https://www.perlmonks.org/?node_id=1214227

use strict;
use warnings;
use feature 'say';
use PDL;  # must load PDL before MCE::Shared

use MCE::Hobo;
use MCE::Shared 1.847;
use Time::HiRes 'time';

srand( 123 );
my $time = time;

my $n = 30000;    # input sample size
my $m = 10000;    # number of bootstrap repeats
my $r = $n;       # re-sample size

# On Windows, the non-shared piddle ($x) is unblessed in threads.
# Therefore, construct the piddle inside the worker. UNIX
# platforms benefit from copy-on-write. Thus, one copy.
my $x = ( $^O eq 'MSWin32' ) ? undef : random( $n );

my $avg = MCE::Shared->pdl_zeroes( $m );
my $seq = MCE::Shared->sequence( 0, $m - 1 );

sub parallel_task {
    $x = random( $n ) unless ( defined $x );
    while ( defined ( my $seq_n = $seq->next() ) ) {
        my $idx = random $r;
        $idx *= $n;
        # $avg is a shared piddle which resides inside the shared-
        # manager process or thread. The piddle is accessible via the
        # OO interface only.
        $avg->set( $seq_n, $x->index( $idx )->avg );
    }
}

MCE::Hobo->create( \&parallel_task ) for 1 .. 4;

# ... do other stuff ...

MCE::Hobo->wait_all();

# MCE sets the seed of the base generator uniquely between workers.
# Unfortunately, predictable results require running with one worker,
# as there is no guarantee in the order in which workers compute the
# next input chunk.
say $avg->pctover( pdl 0.05, 0.95 );
say time - $time, ' seconds';

__END__

# Output
[0.49387191 0.49937053]
1.29038286209106 seconds
```

Regards, Mario
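The divide-the-iterations pattern above (shared read-only input, each worker writing disjoint result slots, a private RNG per worker) maps directly onto Python's stdlib pool. A rough sketch under those assumptions -- the chunk sizes, seeds, and worker count here are arbitrary, and a thread pool is used only for the work-distribution shape, not for speed:

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

N, M, WORKERS = 3000, 1000, 4
rng0 = random.Random(123)
x = [rng0.random() for _ in range(N)]   # shared, read-only input

def chunk_means(args):
    seed, count = args
    rng = random.Random(seed)           # a private RNG per worker
    return [statistics.fmean(rng.choices(x, k=N)) for _ in range(count)]

# each worker handles M/WORKERS bootstrap repeats
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    parts = pool.map(chunk_means,
                     [(seed, M // WORKERS) for seed in range(WORKERS)])

means = sorted(m for part in parts for m in part)
lo, hi = means[int(0.05 * M)], means[int(0.95 * M)]
assert lo < hi
```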

Hi. While I cannot assist with your parallelisation issue, many thanks for your contribution. I hope someone else with more experience may be able to assist. Thanks

Very nice idea! ++ Wouldn't GitHub be a more suitable place where people can create pull requests with fixed exercises?

Thanks! I originally posted it to PM as it's one of the few active Perl forums, so I hoped there would be people with PDL experience around. I wasn't too sure of the reaction or whether people would be interested in helping. Seems like I have my answer now! :P I have no objection to putting the completed documents on Github. They are in POD format, so could easily add a makefile to generate html/pdf/etc. Thanks to everyone who is contributing solutions.

shhhhh, PM needs the traffic ;-)

++BRILLIANT! I've been using R for statistics and AI/ML just because it was used in the class I took. I've seen PDL, but was unable to grok the documentation as a beginner - same boat as you it seems. I just saw your meditation so haven't gone through it yet, but I always install the Strawberry Perl with PDL so I have it and I'll take a look when I have some free time. Thanks again!

That's great! I have made a few attempts to learn PDL, and while I've been able to do a few things, I think going through this exercise will help me solidify some ideas. Once the TODOs get fleshed out, I think this should be added to the Tutorials. I would suggest wrapping the code/answers in <spoilers> tags for the final version of the tutorial. Also, for (at least some of) the "n/a" or numpy-specific questions, it may be possible to convert them to perl equivalents.

Excellent idea ++. But I am unfortunately not able to help with PDL. It's a bit funny: just a few days ago, I was thinking about possibly carrying out a somewhat similar exercise (namely a port to Perl) with this: https://krother.gitbooks.io/python-3-basics-tutorial/content/en/.

PM is a good choice for this project. It's got the discussion of the best solution(s) to each question. Hopefully someone jumps in and offers a better solution than mine; it feels a little ugly explicitly stating the indices.

Given two arrays, X and Y, construct the Cauchy matrix C (C(i,j) = 1/(x_i - y_j)).

First, what's a Cauchy matrix? Ahh, this question is just how to construct a matrix from 2 arrays, where no elements from one array are in the other. Just create a sequence of numbers for the first array and then make the second array 0.5 more than the first to get the inputs. Here's a brute force method.

```perl
use PDL;
use PDL::NiceSlice;

my $x = sequence(8);
my $y = $x + 0.5;

my ($nx, $ny) = (nelem($x), nelem($y));
my $C = zeroes($nx, $ny);

for (my $i = 0; $i < $nx; $i++) {
    for (my $j = 0; $j < $ny; $j++) {
        $C($i,$j) .= 1/($x($i) - $y($j));
    }
}
print $C;
```

I like PDL::NiceSlice for indexing; it makes sense to me. I could have also created the matrix with my $C = outer($x, $y); or gotten the size of the arrays with $x->getdim(0). And if I grokked threading, rather than just skimming PDL::Threading, this might look way cooler. NB: the ".=" in the assignment breaks the link between the matrix and the 2 arrays. It's important.

That was just the Cauchy matrix. Some people want the Cauchy determinant (as long as the 2 arrays are the same size). Easy! Just import PDL::MatrixOps:

```perl
use PDL::MatrixOps;
print det $C;
```
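For comparison with the NumPy original, the same brute-force Cauchy construction in plain Python (no NumPy) is a pair of nested comprehensions; the 0.5 offset guarantees no denominator is ever zero, exactly as above:

```python
# Cauchy matrix C[i][j] = 1/(x[i] - y[j]).
x = [float(i) for i in range(8)]
y = [xi + 0.5 for xi in x]

C = [[1.0 / (xi - yj) for yj in y] for xi in x]

assert len(C) == 8 and len(C[0]) == 8
assert C[0][0] == -2.0    # 1/(0 - 0.5)
assert C[1][0] == 2.0     # 1/(1 - 0.5)
```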

This is Much Better. When I was young, I coded in for loops, but when I read the documentation, I put away these childish things and learned to Thread.

```perl
$x = sequence(8);
$y = sequence(7) + 0.5;
$c = 1/($x->dummy(0,$y->nelem)->transpose - $y->dummy(0,$x->nelem));
```

TaDAAA!! You create 2 vectors (of different sizes), inflate each along the other dimension using dummy to fit the other vector's size (using nelem), flip one of them using transpose, and do the calculation in one line. PDL takes care of the loops and does it faster than you can in Perl. I had to transpose $x to get the same result as the for loops above, in order to prove they are the same by

```
pdl> p $C - $c
```
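The equivalence claimed above (inflate-and-subtract gives the same matrix as the explicit double loop) can be checked in a few lines of plain Python, using the same 8- and 7-element vectors (a sketch only -- no PDL or NumPy involved):

```python
x = list(range(8))
y = [j + 0.5 for j in range(7)]

# explicit double loop, as in the brute-force version
loop = [[1.0 / (x[i] - y[j]) for j in range(len(y))]
        for i in range(len(x))]

# "inflate and subtract" in one comprehension, mirroring dummy()/transpose
bcast = [[1.0 / (xi - yj) for yj in y] for xi in x]

assert loop == bcast
```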

I was working on something very similar (different question set, but same idea). I am working from 101 NumPy Exercises for Data Analysis (Python). The only significant difference is that I am trying to do them in one-liner format; Python's semantically meaningful whitespace makes command-line scripting painful. Very quick data analyses can be done at the terminal in Perl, if you know which PDL modules to use. I posted a few examples on my public scratchpad. I don't have a whole lot written, but some guidance and feedback would be helpful. PDL is a lot of fun to use!

93. Consider two arrays A and B of shape (8,3) and (2,2). How to find rows of A that contain elements of each row of B regardless of the order of the elements in B? (★★★)

```perl
use strict;
use warnings;
use PDL;

my $A = floor(random(3,8) * 6);
my $B = floor(random(2,2) * 6);

my $C = $A->in($B->slice(',0'))->sumover * $A->in($B->slice(',1'))->sumover;
my $rows = which($C > 0);
print $rows;
```
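The same row-selection logic reads naturally with Python sets (a stdlib-only sketch for comparison; the seeded data here is illustrative, not the exercise's): a row of A qualifies when it shares at least one element with every row of B, which is what the sumover product above encodes.

```python
import random

rng = random.Random(0)
A = [[rng.randrange(6) for _ in range(3)] for _ in range(8)]
B = [[rng.randrange(6) for _ in range(2)] for _ in range(2)]

# rows of A sharing at least one element with every row of B
rows = [i for i, row in enumerate(A)
        if all(set(row) & set(b) for b in B)]

# every selected row really does intersect every row of B
assert all(set(A[i]) & set(b) for i in rows for b in B)
```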

GitHub the thing, pump it up on blogs.perl.org and the perl reddit channel and watch the PRs come in for the TODOs. Seriously.

Duh. I found it on reddit. :-( Need more coffee.

Hi, I took the liberty of putting this on GitLab; hope that's alright.



https://gitlab.com/jtym/pdl-100/blob/master/README.md
