[dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info

Ananyev, Konstantin konstantin.ananyev at intel.com
Fri Jun 12 14:30:58 CEST 2015



> -----Original Message-----
> From: Wang, Liang-min
> Sent: Thursday, June 11, 2015 10:51 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, June 11, 2015 9:07 AM
> > To: Wang, Liang-min; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access
> > device info
> >
> >
> >
> > > -----Original Message-----
> > > From: Wang, Liang-min
> > > Sent: Thursday, June 11, 2015 1:58 PM
> > > To: Ananyev, Konstantin; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > access device info
> > >
> > > Hi Konstantin,
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Thursday, June 11, 2015 8:26 AM
> > > > To: Wang, Liang-min; dev at dpdk.org
> > > > Cc: Wang, Liang-min
> > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > > access device info
> > > >
> > > > Hi Larry,
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang-Min
> > > > > Larry Wang
> > > > > Sent: Wednesday, June 10, 2015 4:10 PM
> > > > > To: dev at dpdk.org
> > > > > Cc: Wang, Liang-min
> > > > > Subject: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > > > access device info
> > > > >
> > > > > add new apis:
> > > > > - rte_eth_dev_default_mac_addr_set
> > > > > - rte_eth_dev_reg_leng
> > > > > - rte_eth_dev_reg_info
> > > > > - rte_eth_dev_eeprom_leng
> > > > > - rte_eth_dev_get_eeprom
> > > > > - rte_eth_dev_set_eeprom
> > > > > - rte_eth_dev_get_ringparam
> > > > > - rte_eth_dev_set_ringparam
> > > > >
> > > > > to enable reading device parameters (mac-addr, register, eeprom,
> > > > > ring) based upon ethtool alike data parameter specification.
> > > > >
> > > > > Signed-off-by: Liang-Min Larry Wang <liang-min.wang at intel.com>
> > > > > ---
> > > > >  lib/librte_ether/Makefile              |   1 +
> > > > >  lib/librte_ether/rte_eth_dev_info.h    |  80 +++++++++++++++++
> > > > >  lib/librte_ether/rte_ethdev.c          | 159
> > > > +++++++++++++++++++++++++++++++++
> > > > >  lib/librte_ether/rte_ethdev.h          | 158
> > > > ++++++++++++++++++++++++++++++++
> > > > >  lib/librte_ether/rte_ether_version.map |   8 ++
> > > > >  5 files changed, 406 insertions(+)  create mode 100644
> > > > > lib/librte_ether/rte_eth_dev_info.h
> > > > >
> > > > > diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> > > > > index c0e5768..05209e9 100644
> > > > > --- a/lib/librte_ether/Makefile
> > > > > +++ b/lib/librte_ether/Makefile
> > > > > @@ -51,6 +51,7 @@ SRCS-y += rte_ethdev.c  SYMLINK-y-include +=
> > > > > rte_ether.h  SYMLINK-y-include += rte_ethdev.h  SYMLINK-y-include
> > > > > += rte_eth_ctrl.h
> > > > > +SYMLINK-y-include += rte_eth_dev_info.h
> > > > >
> > > > >  # this lib depends upon:
> > > > >  DEPDIRS-y += lib/librte_eal lib/librte_mempool lib/librte_ring
> > > > > lib/librte_mbuf diff --git a/lib/librte_ether/rte_eth_dev_info.h
> > > > > b/lib/librte_ether/rte_eth_dev_info.h
> > > > > new file mode 100644
> > > > > index 0000000..002c4b5
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_ether/rte_eth_dev_info.h
> > > > > @@ -0,0 +1,80 @@
> > > > > +/*-
> > > > > + *   BSD LICENSE
> > > > > + *
> > > > > + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> > > > > + *   All rights reserved.
> > > > > + *
> > > > > + *   Redistribution and use in source and binary forms, with or without
> > > > > + *   modification, are permitted provided that the following conditions
> > > > > + *   are met:
> > > > > + *
> > > > > + *     * Redistributions of source code must retain the above copyright
> > > > > + *       notice, this list of conditions and the following disclaimer.
> > > > > + *     * Redistributions in binary form must reproduce the above
> > copyright
> > > > > + *       notice, this list of conditions and the following disclaimer in
> > > > > + *       the documentation and/or other materials provided with the
> > > > > + *       distribution.
> > > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > > + *       contributors may be used to endorse or promote products
> > derived
> > > > > + *       from this software without specific prior written permission.
> > > > > + *
> > > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> > > > CONTRIBUTORS
> > > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
> > BUT
> > > > NOT
> > > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> > > > FITNESS FOR
> > > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> > > > COPYRIGHT
> > > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> > > > INCIDENTAL,
> > > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> > (INCLUDING,
> > > > BUT NOT
> > > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> > > > LOSS OF USE,
> > > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
> > CAUSED
> > > > AND ON ANY
> > > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> > OR
> > > > TORT
> > > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> > OUT
> > > > OF THE USE
> > > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> > > > DAMAGE.
> > > > > + */
> > > > > +
> > > > > +#ifndef _RTE_ETH_DEV_INFO_H_
> > > > > +#define _RTE_ETH_DEV_INFO_H_
> > > > > +
> > > > > +
> > > > > +/*
> > > > > + * Placeholder for accessing device registers  */ struct
> > > > > +rte_dev_reg_info {
> > > > > +	void *buf; /**< Buffer for register */
> > > > > +	uint32_t offset; /**< Offset for 1st register to fetch */
> > > > > +	uint32_t leng; /**< Number of registers to fetch */
> > > > > +	uint32_t version; /**< Device version */ };
> > > > > +
> > > > > +/*
> > > > > + * Placeholder for accessing device eeprom  */ struct
> > > > > +rte_dev_eeprom_info {
> > > > > +	void *buf; /**< Buffer for eeprom */
> > > > > +	uint32_t offset; /**< Offset for 1st eeprom location to access */
> > > > > +	uint32_t leng; /**< Length of eeprom region to access */
> > > > > +	uint32_t magic; /**< Device ID */ };
> > > > > +
> > > > > +/*
> > > > > + * Placeholder for accessing device ring parameters  */ struct
> > > > > +rte_dev_ring_info {
> > > > > +	uint32_t rx_pending; /**< Number of outstanding Rx ring */
> > > > > +	uint32_t tx_pending; /**< Number of outstanding Tx ring */
> > > > > +	uint32_t rx_max_pending; /**< Maximum number of outstanding Rx
> > > > ring */
> > > > > +	uint32_t tx_max_pending; /**< Maximum number of outstanding Tx
> > > > ring
> > > > > +*/ };
> > > > > +
> > > > > +/*
> > > > > + * A data structure captures information as defined in struct
> > > > > +ifla_vf_info
> > > > > + * for user-space api
> > > > > + */
> > > > > +struct rte_dev_vf_info {
> > > > > +	uint32_t vf;
> > > > > +	uint8_t mac[ETHER_ADDR_LEN];
> > > > > +	uint32_t vlan;
> > > > > +	uint32_t tx_rate;
> > > > > +	uint32_t spoofchk;
> > > > > +};
> > > >
> > > >
> > > > Wonder what that structure is for?
> > > > I can't see it used in any function below?
> > > >
> > >
> > > Good catch, this is designed for other ethtool ops that I did not include in
> > this release, I will remove this from next fix.
> > >
> > > > > +
> > > > > +#endif /* _RTE_ETH_DEV_INFO_H_ */
> > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > b/lib/librte_ether/rte_ethdev.c index 5a94654..186e85c 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > @@ -2751,6 +2751,32 @@ rte_eth_dev_mac_addr_remove(uint8_t
> > > > port_id,
> > > > > struct ether_addr *addr)  }
> > > > >
> > > > >  int
> > > > > +rte_eth_dev_default_mac_addr_set(uint8_t port_id, struct
> > > > > +ether_addr
> > > > > +*addr) {
> > > > > +	struct rte_eth_dev *dev;
> > > > > +	const int index = 0;
> > > > > +	const uint32_t pool = 0;
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	dev = &rte_eth_devices[port_id];
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mac_addr_remove, -
> > > > ENOTSUP);
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mac_addr_add, -
> > > > ENOTSUP);
> > > > > +
> > > > > +	/* Update NIC default MAC address*/
> > > > > +	(*dev->dev_ops->mac_addr_remove)(dev, index);
> > > > > +	(*dev->dev_ops->mac_addr_add)(dev, addr, index, pool);
> > > > > +
> > > > > +	/* Update default address in NIC data structure */
> > > > > +	ether_addr_copy(addr, &dev->data->mac_addrs[index]);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +int
> > > > >  rte_eth_dev_set_vf_rxmode(uint8_t port_id,  uint16_t vf,
> > > > >  				uint16_t rx_mode, uint8_t on)  { @@ -3627,3
> > +3653,136 @@
> > > > > rte_eth_remove_tx_callback(uint8_t port_id,
> > > > uint16_t queue_id,
> > > > >  	/* Callback wasn't found. */
> > > > >  	return -EINVAL;
> > > > >  }
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_reg_leng(uint8_t port_id) {
> > > > > +	struct rte_eth_dev *dev;
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg_length, -
> > > > ENOTSUP);
> > > > > +	return (*dev->dev_ops->get_reg_length)(dev);
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_reg_info(uint8_t port_id, struct rte_dev_reg_info
> > > > > +*info) {
> > > > > +	struct rte_eth_dev *dev;
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg, -ENOTSUP);
> > > > > +	return (*dev->dev_ops->get_reg)(dev, info); }
> > > >
> > > > Seems that *get_reg* stuff, will be really good addition for DPDK
> > > > debugging abilities.
> > > > Though, I'd suggest we change it a bit to make more generic and flexible:
> > > >
> > > > Introduce rte_eth_reg_read/rte_eth_reg_write(),
> > > > or probably even better rte_pcidev_reg_read /rte_pcidev_reg_write at
> > EAL.
> > > > Something similar to what port_pci_reg_read/port_pci_reg_write() are
> > > > doing now at testpmd.h.
> > > >
> > > > struct rte_pcidev_reg_info {
> > > >    const char *name;
> > > >    uint32_t endianes, bar, offset, size, count; };
> > > >
> > > > int rte_pcidev_reg_read(const struct rte_pci_device *, const struct
> > > > rte_pcidev_reg_info *, uint64_t *reg_val);
> > > >
> > > > Then:
> > > > int rte_eth_dev_get_reg_info(port_id, const struct
> > > > rte_pcidev_reg_info **info);
> > > >
> > > > So each device would store in info a pointer to an array of it's
> > > > register descriptions (finished by zero elem?).
> > > >
> > > > Then your ethtool (or any other upper layer) can do the following to
> > > > read all device regs:
> > > >
> > >
> > > The proposed reg info structure allows future improvement to support
> > individual register read/write.
> > > Also, because each NIC device has a very distinguish register definition.
> > > So, the plan is to have more comprehensive interface to support query
> > > operation (for example, register name) before introduce individual/group
> > register access.
> > > Points taken, the support will be in future release.
> >
> > Sorry, didn't get you.
> > So you are ok to make these changes in next patch version?
> >
> I would like to get a consensus from dpdk community on how to provide register information.

Well, that' ok, but if it is just a trial patch that is not intended to be applied, then you should mark it as RFC.

> Currently, it's designed for debug dumping. The register information is very hardware dependent.
> Need to consider current supported NIC device and future devices for DPDK, so we won't make it a bulky interface.

Ok, could you explain what exactly  concerns you in the approach described above?
What part you feel is bulky?

> > >
> > > > const struct rte_eth_dev_reg_info *reg_info; struct rte_eth_dev_info
> > > > dev_info;
> > > >
> > > > rte_eth_dev_info_get(pid, &dev_info);
> > > > rte_eth_dev_get_reg_info(port_id, &reg_info);
> > > >
> > > > for (i = 0; reg_info[i].name != NULL; i++) {
> > > >    ...
> > > >    rte_pcidev_read_reg(dev_info. pci_dev, reg_info[i], &v);
> > > >   ..
> > > > }
> > > >
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_eeprom_leng(uint8_t port_id) {
> > > > > +	struct rte_eth_dev *dev;
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_eeprom_length, -
> > > > ENOTSUP);
> > > > > +	return (*dev->dev_ops->get_eeprom_length)(dev);
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_get_eeprom(uint8_t port_id, struct
> > > > > +rte_dev_eeprom_info
> > > > > +*info) {
> > > > > +	struct rte_eth_dev *dev;
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_eeprom, -
> > > > ENOTSUP);
> > > > > +	return (*dev->dev_ops->get_eeprom)(dev, info); }
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_set_eeprom(uint8_t port_id, struct
> > > > > +rte_dev_eeprom_info
> > > > > +*info) {
> > > > > +	struct rte_eth_dev *dev;
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_eeprom, -
> > > > ENOTSUP);
> > > > > +	return (*dev->dev_ops->set_eeprom)(dev, info); }
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_get_ringparam(uint8_t port_id, struct
> > > > > +rte_dev_ring_info
> > > > > +*info) {
> > > > > +	struct rte_eth_dev *dev;
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > +		return -ENODEV;
> > > > > +	}
> > > > > +
> > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_ringparam, -
> > > > ENOTSUP);
> > > > > +	return (*dev->dev_ops->get_ringparam)(dev, info); }
> > > >
> > > > I think it will be a useful addition to the ethdev API  to have an
> > > > ability to retrieve current RX/TX queue parameters.
> > > > Though again, it need to be more generic, so it could be useful for
> > > > non- ethtool upper layer too.
> > > > So I suggest to modify it a bit.
> > > > Something like that:
> > > >
> > > > struct rte_eth_tx_queue_info {
> > > >     struct rte_eth_txconf txconf;
> > > >     uint32_t nb_tx_desc;
> > > >     uint32_t nb_max_tx_desc; /*max allowable TXDs for that queue */
> > > >     uint32_t nb_tx_free;            /* number of free TXDs at the moment of
> > call.
> > > > */
> > > >     /* other tx queue data. */
> > > > };
> > > >
> > > > int rte_etdev_get_tx_queue_info(portid, queue_id, struct
> > > > rte_eth_tx_queue_info *qinfo)
> > > >
> > > > Then, your upper layer ethtool wrapper, can implement yours
> > > > ethtool_get_ringparam() by:
> > > >
> > > >  ...
> > > >  struct rte_eth_tx_queue_info qinfo;
> > > > rte_ethdev_get_tx_queue_info(port, 0, &qinfo);
> > > > ring_param->tx_pending = qinfo.nb_tx_desc -
> > > > rte_eth_rx_queue_count(port, 0);
> > > >
> > > > Or probably even:
> > > > ring_param->tx_pending = qinfo.nb_tx_desc - qinfo.nb_tx_free;
> > > >
> > > > Same for RX.
> > > >
> > > For now, this descriptor ring information is used by the ethtool op.
> > > To make this interface simple, i.e. caller doesn't need to access other
> > queue information.
> >
> > I just repeat what I said to you in off-line conversation:
> > ethdev API is not equal ethtool API.
> > It is ok to add  a new function/structure to ethdev if it really needed, but we
> > should do mechanical one to one copy.
> > It is much better to add  a function/structure that would be more generic,
> > and suit other users, not only ethtool.
> > There is no point to have dozen functions in rte_ethdev API providing similar
> > information.
> > BTW, I don't see how API I proposed is much more  complex, then yours one.
> The ring parameter is a run-time information which is different than data structure described in this discussion.

I don't see how they are different.
Looking at ixgbe_get_ringparam(), it returns:
rx_max_pending - that's a static IXGBE PMD value (max possible number of RXDs per one queue).
rx_pending - number of RXD currently in use by the HW for queue 0 (that information can be changed at each call).

With the approach I suggesting - you can get same information for each RX queue by
calling rte_ethdev_get_rx_queue_info() and rte_eth_rx_queue_count(). 
Plus you are getting other RX queue data.

Another thing - what is practical usage of the information you retrieving now by get_ringparam()?
Let say it returned to you: rx_max_pending=4096; rx_pending=128;
How that information would help to understand what is going on with the device?
Without knowing  value of nb_tx_desc for the queue, you can't say is you queue full or not.
Again, it could be that all your traffic going through some other queue (not 0).
So from my point rte_eth_dev_get_ringparam()  usage is very limited, and doesn't
provide enough information about current queue state.

Same thing applies for TX. 

> It's the desire of this patch to separate each data structure to avoid cross dependency.

That's too cryptic to me.
Could you explain what cross dependency you are talking about?

Konstantin



More information about the dev mailing list