[dpdk-dev] [PATCH v5 1/3] examples/eventdev_pipeline: added sample app

Hunt, David david.hunt at intel.com
Wed Jul 5 13:15:51 CEST 2017


Hi Jerin,


On 5/7/2017 6:30 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Tue, 4 Jul 2017 08:55:25 +0100
>> From: "Hunt, David" <david.hunt at intel.com>
>> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>> CC: dev at dpdk.org, harry.van.haaren at intel.com, Gage Eads
>>   <gage.eads at intel.com>, Bruce Richardson <bruce.richardson at intel.com>
>> Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
>> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>>   Thunderbird/45.8.0
>>
>> Hi Jerin,
> Hi David,
>
> I have checked the v6. Some comments below.
>
>>
>> On 3/7/2017 4:57 AM, Jerin Jacob wrote:
>>> -----Original Message-----
>>>
--snip--
>>>> +#
>>>> +#   Redistribution and use in source and binary forms, with or without
>>>> +#   modification, are permitted provided that the following conditions
>>>> +#   are met:
>>>> +#
>>>> +
>>>> +static unsigned int active_cores;
>>>> +static unsigned int num_workers;
>>>> +static long num_packets = (1L << 25); /* do ~32M packets */
>>>> +static unsigned int num_fids = 512;
>>>> +static unsigned int num_stages = 1;
>>>> +static unsigned int worker_cq_depth = 16;
> Any reason not move this to struct config_data?

Sure. All vars now moved to either config_data or fastpath_data.

>>>> +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY;
>>>> +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1};
>>>> +static int16_t qid[MAX_NUM_STAGES] = {-1};
>>>> +static int worker_cycles;
> used in fastpath move to struct fastpath_data.

Performance drops by ~20% when I put these in fastpath_data. No 
performace drop when in config_data.
I think we need to leaving in config_data for now.

>>>> +static int enable_queue_priorities;
> struct config_data ?

Done.

>>>> +
>>>> +struct prod_data {
>>>> +	uint8_t dev_id;
>>>> +	uint8_t port_id;
>>>> +	int32_t qid;
>>>> +	unsigned int num_nic_ports;
>>>> +} __rte_cache_aligned;
>>>> +
>>>> +struct cons_data {
>>>> +	uint8_t dev_id;
>>>> +	uint8_t port_id;
>>>> +} __rte_cache_aligned;
>>>> +
>>>> +static struct prod_data prod_data;
>>>> +static struct cons_data cons_data;
>>>> +
>>>> +struct worker_data {
>>>> +	uint8_t dev_id;
>>>> +	uint8_t port_id;
>>>> +} __rte_cache_aligned;
>>>> +
>>>> +static unsigned int *enqueue_cnt;
>>>> +static unsigned int *dequeue_cnt;
>>> Not been consumed. Remove it.
>> Done.
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +
>>>> +static inline void
>>>> +work(struct rte_mbuf *m)
>>>> +{
>>>> +	struct ether_hdr *eth;
>>>> +	struct ether_addr addr;
>>>> +
>>>> +	/* change mac addresses on packet (to use mbuf data) */
>>>> +	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>> +	ether_addr_copy(&eth->d_addr, &addr);
>>>> +	ether_addr_copy(&eth->s_addr, &eth->d_addr);
>>>> +	ether_addr_copy(&addr, &eth->s_addr);
>>> If it is even number of stages(say 2), Will mac swap be negated? as we are
>>> swapping on each stage NOT in consumer?
>> The mac swap is just to touch the mbuf. It does not matter if it is negated.
> Are you sure or I am missing something here? for example,If I add following piece
> of code, irrespective number of stages it should send the same packet if
> source packet is same. Right ?
> But not happening as expected(Check the src and dest mac address)
>
> stage == 1
> 00000000: 00 0F B7 11 27 2B 00 0F B7 11 27 2C 08 00 45 00 |....'+....',..E.
> 00000010: 00 2E 00 00 00 00 04 11 D5 B1 C0 A8 12 02 0E 01 |................
> 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
> 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
>
> stage == 2
> 00000000: 00 0F B7 11 27 2C 00 0F B7 11 27 2B 08 00 45 00 |....',....'+..E.
> 00000010: 00 2E 00 00 00 00 04 11 D5 B0 C0 A8 12 03 0E 01 |................
> 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
> 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
>
>
> diff --git a/examples/eventdev_pipeline_sw_pmd/main.c b/examples/eventdev_pipeline_sw_pmd/main.c
> index c62cba2..a7aaf37 100644
> --- a/examples/eventdev_pipeline_sw_pmd/main.c
> +++ b/examples/eventdev_pipeline_sw_pmd/main.c
> @@ -147,8 +147,9 @@ consumer(void)
>          if (start_time == 0)
> @@ -157,6 +158,7 @@ consumer(void)
>          received += n;
>          for (i = 0; i < n; i++) {
>                  uint8_t outport = packets[i].mbuf->port;
> +               rte_pktmbuf_dump(stdout, packets[i].mbuf, 64);
>                  rte_eth_tx_buffer(outport, 0, fdata->tx_buf[outport],
>                                  packets[i].mbuf);
>
> Either fix the mac swap properly or remove it.

Temporary fix added. Now reading in addr and writing it back without 
swapping. Not ideal,
will probably need more work in the future. Added a FIXME in the code 
with agreement from Jerin.

>
>>>> +
>>>> +	if (!quiet) {
>>>> +		printf("\nPort Workload distribution:\n");
>>>> +		uint32_t i;
>>>> +		uint64_t tot_pkts = 0;
>>>> +		uint64_t pkts_per_wkr[RTE_MAX_LCORE] = {0};
>>>> +		for (i = 0; i < num_workers; i++) {
>>>> +			char statname[64];
>>>> +			snprintf(statname, sizeof(statname), "port_%u_rx",
>>>> +					worker_data[i].port_id);
>>>> +			pkts_per_wkr[i] = rte_event_dev_xstats_by_name_get(
>>>> +					dev_id, statname, NULL);
>>> As discussed, Check the the given xstat supported on the PMD first.
>> Checking has now been implemented. It'd done by calling
>> rte_event_dev_xstats_by_name_get()
>> and seeing if the result is -ENOTSUP. However there is a bug in the function
>> in that it is declared
>> as a uint64_t, but then returns a -ENOTSUP, so I have to cast the -ENOTSUP
>> as a uint64_t for
>> comparison. This will need to be fixed when the function is patched.
>>
>>                          retval = rte_event_dev_xstats_by_name_get(
>>                                          dev_id, statname, NULL);
>>                          if (retval != (uint64_t)-ENOTSUP) {
>>                                  pkts_per_wkr[i] =  retval;
>>                                  tot_pkts += pkts_per_wkr[i];
>>                          }
>>
>>
>>
>>>> +			tot_pkts += pkts_per_wkr[i];
>>>> +		}
>>>> +		for (i = 0; i < num_workers; i++) {
>>>> +			float pc = pkts_per_wkr[i]  * 100 /
>>>> +				((float)tot_pkts);
> This will print NAN.
>
> How about, move the specific xstat as static inline function.
> port_stat(...)
> {
> 		char statname[64];
> 		uint64_t retval;
> 		snprintf(statname, sizeof(statname),"port_%u_rx",
> 				worker_data[i].port_id);
> 		retval = rte_event_dev_xstats_by_name_get(
> 				dev_id, statname, NULL);
> }
> and add check in the beginning of the "if" condition.
> ie.
>
> 	if (!cdata.quiet && port_stat(,,) != (uint64_t)-ENOTSUP) {
> 		printf("\nPort Workload distribution:\n");
> 		...
> 	}
>

Sure. Adding the port_stat simplifies a lot. Will do.

>>>> +			printf("worker %i :\t%.1f %% (%"PRIu64" pkts)\n",
>>>> +					i, pc, pkts_per_wkr[i]);
>>>> +		}
>>>> +
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}



More information about the dev mailing list